Page 1
Dept. of Computer Science & Engg.
Indian Institute of Technology Kharagpur
Part-of-Speech Tagging for Bengali with Hidden Markov Model
Sandipan Dandapat, Sudeshna Sarkar
Department of Computer Science & Engineering
Indian Institute of Technology Kharagpur
Page 2
Machine Learning to Resolve POS Tagging
HMM
  Supervised (DeRose,88; Meteer,91; Brants,2000; etc.)
  Semi-supervised (Cutting,92; Merialdo,94; Kupiec,92; etc.)
Maximum Entropy (Ratnaparkhi,96; etc.)
TB(ED)L (Brill,92,94,95; etc.)
Decision Tree (Black,92; Marquez,97; etc.)
Page 3
Our Approach
HMM based
  Simplicity of the model
  Language independence
  Reasonably good accuracy
  Data intensive
  Sparseness problem when extending the order
We adopt a first-order HMM
Page 4
POS Tagging Schema
[Diagram: Raw text -> POS tagging (Language Model and Disambiguation Algorithm, with Possible POS Class Restriction …) -> Tagged text]
Page 5
POS Tagging: Our Approach
Language model: first-order HMM
First-order HMM: the current state depends on the previous state
$$S = \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
[Diagram: Raw text -> POS tagging (First-order HMM and Disambiguation Algorithm, with Possible POS Class Restriction …) -> Tagged text]
Page 6
POS Tagging: Our Approach
Model parameters of the first-order HMM: µ = (π, A, B)
  $\pi = \{P_{start}(t_i)\}$ (initial tag probabilities)
  $A = \{P(t_i \mid t_{i-1})\}$ (transition probabilities)
  $B = \{P(w_i \mid t_i)\}$ (emission probabilities)
[Diagram: Raw text -> POS tagging (µ = (π, A, B) and Disambiguation Algorithm, with Possible POS Class Restriction …) -> Tagged text]
Page 7
POS Tagging: Our Approach
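Language model: first-order HMM, µ = (π, A, B)
Tag restriction for each word $w_i$: $t_i \in \{T\}$ or $t_i \in T_{MA}(w_i)$
  {T}: set of all tags
  $T_{MA}(w_i)$: set of tags computed by the Morphological Analyzer
[Diagram: Raw text -> POS tagging (µ = (π, A, B) and Disambiguation Algorithm, with the tag restriction above) -> Tagged text]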
Page 8
POS Tagging: Our Approach
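Language model: first-order HMM, µ = (π, A, B)
Disambiguation algorithm: Viterbi algorithm
Tag restriction for each word $w_i$: $t_i \in \{T\}$ or $t_i \in T_{MA}(w_i)$
  {T}: set of all tags
  $T_{MA}(w_i)$: set of tags computed by the Morphological Analyzer
[Diagram: Raw text -> POS tagging (µ = (π, A, B) and Viterbi Algorithm, with the tag restriction above) -> Tagged text]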
Page 9
Disambiguation Algorithm
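$$S = \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
Text: $w_1\ w_2\ w_3\ \ldots\ w_n$
Tags: [Diagram: lattice of candidate tags under each word, with the full tag set available at every position]
where $t_i \in \{T\}\ \forall w_i$, and {T} = set of tags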
Page 10
Disambiguation Algorithm
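$$S = \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
Text: $w_1\ w_2\ w_3\ \ldots\ w_n$
Tags: [Diagram: lattice of candidate tags under each word, with fewer candidates at each position than in the unrestricted case]
where $t_i \in T_{MA}(w_i)\ \forall w_i$, and {T} = set of tags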
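The disambiguation step can be sketched in Python as a Viterbi search over the tag lattice. This is a minimal illustration, not the authors' implementation. The function name `viterbi`, the dictionary layout of the parameters, and the `allowed_tags` callable (a stand-in for the Morphological Analyzer's $T_{MA}(w_i)$) are all assumptions made for this sketch, and the probabilities in the usage example are made-up toy values.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, allowed_tags=None):
    """Most probable tag sequence under a first-order HMM.

    start_p[t], trans_p[(prev, t)] and emit_p[(word, t)] are probabilities.
    allowed_tags is a hypothetical callable word -> set of tags that plays
    the role of T_MA(w_i); None means every tag in `tags` is a candidate.
    """
    if not words:
        return []

    def log(p):
        # Work in log space to avoid underflow on long sentences.
        return math.log(p) if p > 0 else float("-inf")

    def candidates(word):
        if allowed_tags is None:
            return list(tags)
        restricted = [t for t in tags if t in allowed_tags(word)]
        # Assumption: fall back to the full tag set if the restriction is empty.
        return restricted or list(tags)

    # Each column maps tag -> (best log score ending in tag, previous tag).
    columns = [{
        t: (log(start_p.get(t, 0.0)) + log(emit_p.get((words[0], t), 0.0)), None)
        for t in candidates(words[0])
    }]
    for word in words[1:]:
        prev_col = columns[-1]
        col = {}
        for t in candidates(word):
            best_prev = max(
                prev_col,
                key=lambda p: prev_col[p][0] + log(trans_p.get((p, t), 0.0)),
            )
            score = (prev_col[best_prev][0]
                     + log(trans_p.get((best_prev, t), 0.0))
                     + log(emit_p.get((word, t), 0.0)))
            col[t] = (score, best_prev)
        columns.append(col)

    # Backtrace from the best final tag.
    tag = max(columns[-1], key=lambda t: columns[-1][t][0])
    path = [tag]
    for col in reversed(columns[1:]):
        tag = col[tag][1]
        path.append(tag)
    path.reverse()
    return path


# Toy parameters (made-up values, for illustration only).
TAGS = ["NN", "VFM"]
START = {"NN": 0.8, "VFM": 0.2}
TRANS = {("NN", "NN"): 0.3, ("NN", "VFM"): 0.7,
         ("VFM", "NN"): 0.6, ("VFM", "VFM"): 0.4}
EMIT = {("a", "NN"): 0.5, ("a", "VFM"): 0.1,
        ("b", "NN"): 0.2, ("b", "VFM"): 0.6}

unrestricted = viterbi(["a", "b"], TAGS, START, TRANS, EMIT)
restricted = viterbi(["a", "b"], TAGS, START, TRANS, EMIT,
                     allowed_tags=lambda w: {"NN"} if w == "b" else set(TAGS))
```

With the restriction, the search only scores the tags the analyzer permits for each word, which is how the lattice shrinks between the two cases.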
Page 11
Learning HMM Parameters
Supervised Learning (HMM-S)
Estimates three parameters directly from the tagged corpus
$$P_{start}(t_i) = \frac{\text{no. of sentences which begin with } t_i}{\text{no. of sentences}}$$
$$P(t_i \mid t_{i-1}) = \frac{count(t_{i-1}\, t_i)}{count(t_{i-1})}$$
$$P(w_i \mid t_i) = \frac{count(w_i \text{ with } t_i)}{count(t_i)}$$
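The three relative-frequency estimates can be sketched as follows. This is a minimal illustration, assuming the tagged corpus is a list of sentences, each a list of (word, tag) pairs. The function name `estimate_hmm` and the toy corpus are hypothetical.

```python
from collections import Counter


def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates of (start, transition, emission) tables."""
    start_counts = Counter()    # sentences beginning with tag t
    tag_counts = Counter()      # count(t)
    bigram_counts = Counter()   # count(t_{i-1} t_i)
    emit_counts = Counter()     # count(w with t)

    for sentence in tagged_sentences:
        if not sentence:
            continue
        start_counts[sentence[0][1]] += 1
        for word, tag in sentence:
            tag_counts[tag] += 1
            emit_counts[(word, tag)] += 1
        for (_, prev), (_, cur) in zip(sentence, sentence[1:]):
            bigram_counts[(prev, cur)] += 1

    n_sentences = sum(start_counts.values())
    start_p = {t: c / n_sentences for t, c in start_counts.items()}
    trans_p = {(p, t): c / tag_counts[p] for (p, t), c in bigram_counts.items()}
    emit_p = {(w, t): c / tag_counts[t] for (w, t), c in emit_counts.items()}
    return start_p, trans_p, emit_p


# Toy corpus with placeholder word forms (not real data).
toy_corpus = [
    [("a", "NN"), ("b", "VFM")],
    [("a", "NN"), ("a", "NN")],
]
start_p, trans_p, emit_p = estimate_hmm(toy_corpus)
```

Unseen events get no entry at all here, which is the gap that smoothing is meant to fill.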
Page 12
Learning HMM Parameters
Semi-supervised Learning (HMM-SS)
Untagged data (observations) are used to find the model that is most likely to produce the observation sequence
  An initial model is created from the tagged training data
  Based on the initial model and the untagged data, the model parameters are updated
$$\arg\max_{\mu} P(O_{untagged} \mid \mu)$$
New model parameters are estimated using the Baum-Welch algorithm
$$P(O \mid \hat{\mu}) \ge P(O \mid \mu)$$
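The quantity $P(O \mid \mu)$ that re-estimation is meant to increase can be computed with the forward algorithm. The sketch below covers only that likelihood computation, not the Baum-Welch update itself. The function name `sequence_likelihood`, the dictionary layout, and the toy probabilities are assumptions made for illustration.

```python
def sequence_likelihood(words, tags, start_p, trans_p, emit_p):
    """P(O | mu) for one untagged sentence, summed over all tag sequences."""
    if not words:
        return 1.0
    # alpha[t] = probability of the words so far, ending in tag t.
    alpha = {
        t: start_p.get(t, 0.0) * emit_p.get((words[0], t), 0.0) for t in tags
    }
    for word in words[1:]:
        alpha = {
            t: emit_p.get((word, t), 0.0)
            * sum(alpha[p] * trans_p.get((p, t), 0.0) for p in tags)
            for t in tags
        }
    return sum(alpha.values())


# Toy model (made-up values, for illustration only).
tags = ["NN", "VFM"]
start_p = {"NN": 0.8, "VFM": 0.2}
trans_p = {("NN", "NN"): 0.3, ("NN", "VFM"): 0.7,
           ("VFM", "NN"): 0.6, ("VFM", "VFM"): 0.4}
emit_p = {("a", "NN"): 0.5, ("a", "VFM"): 0.1,
          ("b", "NN"): 0.2, ("b", "VFM"): 0.6}

likelihood = sequence_likelihood(["a", "b"], tags, start_p, trans_p, emit_p)
```

Comparing this value before and after a re-estimation step is one way to check that an update did not lower the likelihood of the untagged data.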
Page 13
Smoothing and Unknown Word Hypothesis
Not all emissions and transitions are observed in the training data
  Add-one smoothing is used to estimate both emission and transition probabilities
Not all words are known to the Morphological Analyzer
  Unknown words are assumed to belong to the open-class grammatical categories
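Add-one smoothing can be sketched for the emission case as below. The slides do not say how the number of outcomes is chosen, so taking it to be the vocabulary size for emissions (and the tagset size for transitions) is an assumption of this sketch. The helper name `add_one_probability` and the counts are hypothetical.

```python
from collections import Counter


def add_one_probability(pair_count, context_count, n_outcomes):
    """Add-one estimate: (count(x, c) + 1) / (count(c) + number of outcomes)."""
    return (pair_count + 1) / (context_count + n_outcomes)


# Toy counts: the tag NN was seen 4 times, emitting "a" 3 times and "b" once.
emit_counts = Counter({("a", "NN"): 3, ("b", "NN"): 1})
tag_counts = Counter({"NN": 4})
vocabulary = ["a", "b", "c"]  # "c" was never seen with NN

smoothed = {
    w: add_one_probability(emit_counts[(w, "NN")], tag_counts["NN"], len(vocabulary))
    for w in vocabulary
}
```

The unseen pair ("c", NN) now receives a small non-zero probability, and the smoothed emission probabilities for NN still sum to one over the vocabulary.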
Page 14
Experiments
Baseline Model
Supervised bigram HMM (HMM-S)
  HMM-S
  HMM-S + IMA
  HMM-S + CMA
Semi-supervised bigram HMM (HMM-SS)
  HMM-SS
  HMM-SS + IMA
  HMM-SS + CMA
Page 15
Data Used
Tagged data: 3085 sentences (~41,000 words)
  Includes the data from both the non-privileged and privileged modes
Untagged corpus from CIIL: 11,000 sentences (100,000 words), unclean
  Used to re-estimate the model parameters with the Baum-Welch algorithm
Page 16
Tagset and Corpus Ambiguity
Tagset consists of 27 grammatical classes
Corpus ambiguity
  Mean number of possible tags for each word
  Measured in the tagged training data

Language    Corpus ambiguity
Dutch       1.11
Spanish     1.19
German      1.3
English     1.34
French      1.69
Bengali     2.09
(Dermatas et al 1995)
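The ambiguity measure can be sketched as below. The slides do not state whether the mean is taken over running words (tokens) or over distinct word forms, so the sketch supports both and defaults to tokens. That default, the function name `corpus_ambiguity`, and the toy corpus are all assumptions.

```python
from collections import defaultdict


def corpus_ambiguity(tagged_sentences, by_token=True):
    """Mean number of distinct tags observed per word in a tagged corpus."""
    tags_of = defaultdict(set)  # word form -> tags seen with it
    tokens = []
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tags_of[word].add(tag)
            tokens.append(word)
    # Average over running words, or over distinct word forms.
    items = tokens if by_token else list(tags_of)
    if not items:
        return 0.0
    return sum(len(tags_of[w]) for w in items) / len(items)


# Toy corpus with placeholder word forms (not real data).
toy_corpus = [
    [("a", "NN"), ("b", "VFM")],
    [("a", "JJ"), ("c", "NN")],
]
per_token = corpus_ambiguity(toy_corpus)
per_type = corpus_ambiguity(toy_corpus, by_token=False)
```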
Page 17
Results on Development set
[Figure: four panels plotting tagging accuracy (%, axis from 30 to 100) against the size of the training corpus (1000x words, from 5 to 40). Panels: Baseline; HMM-S, HMM-S + IMA and HMM-S + CMA; ACOPOST; HMM-SS, HMM-SS + IMA and HMM-SS + CMA.]
Page 18
Results on Development set

Method          Accuracy (%)
Baseline        69.11
ACOPOST         83.45
HMM-S           74.53
HMM-S + IMA     78.65
HMM-S + CMA     88.83
HMM-SS          73.77
HMM-SS + IMA    77.98
HMM-SS + CMA    89.65

[Figure: bar chart of tagging accuracy (%, axis from 85.5 to 90) for HMM-S + CMA and HMM-SS + CMA on known data, seen data and unknown data. Bar labels in extraction order: 89.61, 89.03, 87.09, 87.4, 89.36, 88.92.]
Page 19
Error Analysis
Actual Class    Predicted Class    % of total error    % of class error
NNC             NN                 14.2                4.0
VRB             VFM                7.1                 8.7
JJ              NN                 5.9                 1.7
QF              JJ                 5.1                 3.7
RB              JJ                 5.0                 3.6
NLOC            NN                 4.5                 1.3
VNN             VFM                3.7                 4.5
Page 20
Results on Test Set
Tested on 458 sentences (5127 words)
Precision: 84.32%    Recall: 84.36%    Fβ=1: 84.34%

Top 4 classes in terms of F-measure
Type      Precision (%)    Recall (%)    Fβ=1     Frequency
SYM       100              99.78         99.89    911
NEG       95.45            100           97.67    44
PRP       95.72            93.18         94.43    257
QFNUM     94.70            91.24         92.94    132
Page 21
Results on Test Set
Tested on 458 sentences (5127 words)
Precision: 84.32%    Recall: 84.36%    Fβ=1: 84.34%

Bottom 4 classes in terms of F-measure
Type      Precision (%)    Recall (%)    Fβ=1     Frequency
VJJ       0                0             0        0
NVB       0                0             0        28
JVB       0                0             0        12
INF       100              12.5          22.22    1
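The Fβ=1 column is consistent with the standard F-measure, the harmonic mean of precision and recall when β = 1. The short sketch below reproduces reported values from the precision and recall figures on the slides. The function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; with beta = 1 this is the harmonic mean 2PR / (P + R)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero for classes never predicted or found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


overall = round(f_measure(84.32, 84.36), 2)  # overall test-set scores
inf_class = round(f_measure(100, 12.5), 2)   # INF row
sym_class = round(f_measure(100, 99.78), 2)  # SYM row
```

A class such as INF shows why the harmonic mean is used: perfect precision cannot compensate for a recall of 12.5%.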
Page 22
Further Improvement
Uses suffix information to handle unknown words
  Calculates the probability of a tag given the last m letters (suffix) of a word
  Each symbol emission probability of an unknown word is normalized
$$P(\text{Unknown\_word} \mid t_i) = \frac{P(t_i \mid \text{Unknown\_word})\, P(\text{Unknown\_word})}{P(t_i)} = \frac{P(t_i \mid l_{n-m+1}, \ldots, l_n)\, P(\text{Unknown\_word})}{P(t_i)}$$
where $l_{n-m+1}, \ldots, l_n$ are the last m letters of an n-letter word
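A suffix-based score for unknown words can be sketched as below. The slides do not say how different suffix lengths are combined or how the normalization is done, so this sketch simply uses the longest suffix seen in training and returns scores proportional to $P(t_i \mid \text{suffix}) / P(t_i)$. It drops the factor $P(\text{Unknown\_word})$, which is the same for every tag. The function names, the `max_suffix_len` parameter, and the toy words are all hypothetical.

```python
from collections import Counter, defaultdict


def train_suffix_model(tagged_words, max_suffix_len=3):
    """Count tags per word-final suffix of length 1..max_suffix_len."""
    suffix_tag = defaultdict(Counter)
    tag_counts = Counter()
    for word, tag in tagged_words:
        tag_counts[tag] += 1
        for m in range(1, min(max_suffix_len, len(word)) + 1):
            suffix_tag[word[-m:]][tag] += 1
    return suffix_tag, tag_counts


def unknown_word_scores(word, suffix_tag, tag_counts, max_suffix_len=3):
    """Scores proportional to P(Unknown_word | t) = P(t | suffix) P(Unknown_word) / P(t)."""
    total = sum(tag_counts.values())
    # Assumption: back off from the longest suffix to shorter ones.
    for m in range(min(max_suffix_len, len(word)), 0, -1):
        counts = suffix_tag.get(word[-m:])
        if counts:
            n = sum(counts.values())
            return {t: (c / n) / (tag_counts[t] / total) for t, c in counts.items()}
    # No suffix evidence: P(t | suffix) falls back to P(t), so every ratio is 1.
    return {t: 1.0 for t in tag_counts}


# Toy training words (made-up strings, not real data).
training = [("abche", "VFM"), ("cdche", "VFM"), ("efche", "NN"), ("ghi", "NN")]
suffix_tag, tag_counts = train_suffix_model(training)
scores = unknown_word_scores("zzche", suffix_tag, tag_counts)
no_evidence = unknown_word_scores("q", suffix_tag, tag_counts)
```

Because the dropped factor is constant across tags, these scores rank tags in the same order as the full emission probability would.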
Page 23
Further Improvement
Accuracy reflected on development set
[Figure: two bar charts of tagging accuracy (%, axis from 70 to 100). First panel: HMM-SS, HMM-SS+IMA, HMM-SS+CMA, with bar labels in extraction order 73.77, 89.65, 77.98, 90.33, 84.61, 83.33. Second panel: HMM-S, HMM-S+IMA, HMM-S+CMA, with bar labels in extraction order 90.17, 78.65, 88.83, 74.53, 85.04, 85.95. Legend entries: IMA, CMA.]
Page 24
Conclusion and Future Scope
Morphological restriction on tags gives an efficient tagging model even when only a small amount of labeled text is available
Semi-supervised learning performs better compared to supervised learning
Better adjustment of emission probabilities can be adopted for both unknown words and less frequent words
A higher-order Markov model can be adopted
Page 25
Thank You