Part 6 HMM in Practice CSE717, SPRING 2008 CUBS, Univ at Buffalo
Jan 15, 2016
Practical Problems in the HMM
Computation with Probabilities
Configuration of HMM
Robust Parameter Estimation (Feature Optimization, Tying)
Efficient Model Evaluation (Beam Search, Pruning)
Computation with Probabilities
Logarithmic Probability Representation
Lower Bounds for Probabilities
Codebook for Semi-Continuous HMMs
Probability of State Sequence s for a Given Model λ
$$\Pr(s \mid \lambda) = \prod_{t=1}^{T} a_{s_{t-1},s_t}$$
If all $a_{s_{t-1},s_t} \approx 0.1$, then for a sequence of length $T > 100$,
$$\Pr(s \mid \lambda) \lesssim 10^{-100}$$
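The shrinking product above can be reproduced in a few lines. This is a minimal sketch, not part of the original slides: the function name `sequence_probability` and the lengths 100 and 400 are illustrative assumptions, and every transition probability is simply set to 0.1.

```python
def sequence_probability(a, T):
    """Multiply T transition probabilities that all equal a (linear domain)."""
    prob = 1.0
    for _ in range(T):
        prob *= a
    return prob

# T = 100 is still representable as a 64-bit float ...
p_100 = sequence_probability(0.1, 100)
# ... but a longer sequence underflows to exactly 0.0,
# so all paths of that length become indistinguishable.
p_400 = sequence_probability(0.1, 400)
print(p_100, p_400)
```

Once the product reaches 0.0, no later comparison between paths is possible, which is the practical motivation for changing the representation of probabilities.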
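Logarithm Transformation
$$\tilde{p} = -\ln p$$
$$\delta_{t+1}(j) = \max_i \{\delta_t(i)\, a_{ij}\}\, b_j(O_{t+1})$$
$$\tilde{\delta}_{t+1}(j) = \min_i \{\tilde{\delta}_t(i) + \tilde{a}_{ij}\} + \tilde{b}_j(O_{t+1})$$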
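The transformed Viterbi recursion replaces products by sums and the maximum by a minimum over negative-log scores. The sketch below illustrates this for a discrete-output HMM. The function names, the start vector `pi`, the backpointer bookkeeping, and the toy model in the usage note are assumptions added for illustration; the slides only give the recursion itself.

```python
import math

def neg_log(p):
    """Negative-log score; a probability of 0 maps to +infinity."""
    return math.inf if p == 0.0 else -math.log(p)

def viterbi_neg_log(pi, A, B, obs):
    """Return (cost, path) of the best state sequence.

    pi[i]: start probability, A[i][j]: transition probability,
    B[j][o]: probability of emitting symbol o in state j.
    The cost is the negative log of the best path probability.
    """
    n = len(pi)
    delta = [neg_log(pi[i]) + neg_log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            # min over predecessors of (previous cost + transition cost)
            costs = [delta[i] + neg_log(A[i][j]) for i in range(n)]
            best_i = min(range(n), key=costs.__getitem__)
            new_delta.append(costs[best_i] + neg_log(B[j][o]))
            pointers.append(best_i)
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best path back from the cheapest final state.
    state = min(range(n), key=delta.__getitem__)
    cost = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return cost, path
```

For a hypothetical two-state model with `pi = [1.0, 0.0]`, `A = [[0.5, 0.5], [0.0, 1.0]]`, `B = [[0.9, 0.1], [0.2, 0.8]]` and observations `[0, 0, 1, 1]`, the best path is `[0, 0, 1, 1]`, and `math.exp(-cost)` recovers its probability 0.1296.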
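Kingsbury-Rayner Formula
For $p_3 = p_1 + p_2$:
$$
\begin{aligned}
\tilde{p}_3 &= -\ln(p_1 + p_2) \\
            &= -\ln\bigl(p_1 (1 + p_2 / p_1)\bigr) \\
            &= -\{\ln p_1 + \ln(1 + e^{\ln p_2 - \ln p_1})\} \\
            &= \tilde{p}_1 - \ln\bigl(1 + e^{-(\tilde{p}_2 - \tilde{p}_1)}\bigr)
\end{aligned}
$$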
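The Kingsbury-Rayner formula adds two probabilities that are only available as negative logarithms, without converting either one back to the linear domain. A minimal sketch follows. The function name `neg_log_add`, the ordering of the two arguments so that the exponent is never positive, and the use of `math.log1p` are implementation choices assumed here, not taken from the slides.

```python
import math

def neg_log_add(p1_tilde, p2_tilde):
    """Return -ln(p1 + p2), given p1_tilde = -ln(p1) and p2_tilde = -ln(p2)."""
    # Put the smaller score (the larger probability) first so that the
    # argument of exp() is <= 0 and cannot overflow.
    small, large = min(p1_tilde, p2_tilde), max(p1_tilde, p2_tilde)
    if math.isinf(large):
        # Adding a probability of zero changes nothing.
        return small
    return small - math.log1p(math.exp(-(large - small)))

# Check against the linear domain: 0.2 + 0.3 = 0.5
result = neg_log_add(-math.log(0.2), -math.log(0.3))
print(result, -math.log(0.5))
```

The same call also works for scores such as 1000.0, whose linear-domain probabilities (about `exp(-1000)`) are not representable as 64-bit floats.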
Mixture Density Model
$$b_j(x) = \sum_{k=1}^{M} c_{jk}\, g_{jk}(x)$$
Kingsbury-Rayner Formula is not advisable here (too many exps and logs)
Approximation
$$b_j(x) \approx \max_k \{c_{jk}\, g_{jk}(x)\}$$
$$\tilde{b}_j(x) \approx \min_k \{\tilde{c}_{jk} + \tilde{g}_{jk}(x)\}$$
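The approximation keeps only the dominant mixture component, so in the negative-log domain it needs one addition per component and a single minimum, with no `exp` or `log` calls. The sketch below compares it with the exact sum on made-up numbers; the function name and the weights and density values are illustrative assumptions.

```python
import math

def mixture_neg_log_approx(c_tilde, g_tilde):
    """Approximate -ln(sum_k c_k * g_k) by min_k (c_tilde_k + g_tilde_k)."""
    return min(c + g for c, g in zip(c_tilde, g_tilde))

# Hypothetical mixture weights and component density values for one x.
weights = [0.7, 0.2, 0.1]
densities = [0.05, 0.4, 0.001]

c_tilde = [-math.log(c) for c in weights]
g_tilde = [-math.log(g) for g in densities]

approx = mixture_neg_log_approx(c_tilde, g_tilde)
exact = -math.log(sum(c * g for c, g in zip(weights, densities)))
print(approx, exact)
```

Because the sum of M non-negative terms is at most M times the largest term, the approximate score exceeds the exact one by at most ln M.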
Lower Bounds for Probabilities
Choose a minimal probability $p_{\min}$, which corresponds to a maximal score $\tilde{p}_{\max} = -\ln p_{\min}$
For example: $b_j(x) \ge b_{\min}$
In training, this prevents certain states from being left out of parameter estimation
In decoding, this prevents paths through states with vanishing output probabilities from being discarded immediately
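Flooring can be applied in either domain: a probability is raised to at least the chosen minimum, or equivalently a negative-log score is capped at the corresponding maximum. In the sketch below the constant `P_MIN = 1e-30` and both function names are hypothetical; the slides do not prescribe a particular value.

```python
import math

P_MIN = 1e-30                      # assumed floor, chosen only for illustration
P_TILDE_MAX = -math.log(P_MIN)     # the same bound in the negative-log domain

def floor_probability(p, p_min=P_MIN):
    """Linear domain: never return less than p_min."""
    return max(p, p_min)

def cap_neg_log(p_tilde, p_tilde_max=P_TILDE_MAX):
    """Negative-log domain: never return more than -ln(p_min)."""
    return min(p_tilde, p_tilde_max)

# A vanishing output probability no longer removes a path outright.
print(floor_probability(0.0), cap_neg_log(math.inf))
```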
Codebook Evaluation for Semi-Continuous HMMs
Semi-Continuous HMM
$$b_j(x) = \sum_{k=1}^{M} c_{jk}\, g_k(x) = \sum_{k=1}^{M} c_{jk}\, p(x \mid \omega_k)$$
$\omega_k$: class label
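Codebook Evaluation for Semi-Continuous HMMs
By Bayes’ Law
$$\Pr(\omega_k \mid x) = \frac{\Pr(\omega_k)\, p(x \mid \omega_k)}{p(x)} = \frac{\Pr(\omega_k)\, p(x \mid \omega_k)}{\sum_{m=1}^{M} \Pr(\omega_m)\, p(x \mid \omega_m)}$$
Assume $\Pr(\omega_k)$ can be approximated by a uniform distribution, then
$$\Pr(\omega_k \mid x) \approx \frac{p(x \mid \omega_k)}{\sum_{m=1}^{M} p(x \mid \omega_m)}$$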
Codebook Evaluation for Semi-Continuous HMMs
This reduces the dynamic range of all quantities involved
$$b'_j(x) = \sum_{k=1}^{M} c_{jk} \Pr(\omega_k \mid x) \approx \sum_{k=1}^{M} c_{jk}\, \frac{p(x \mid \omega_k)}{\sum_{m=1}^{M} p(x \mid \omega_m)}$$
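The normalized codebook evaluation can be sketched as follows: each density value of the shared codebook is divided by the sum over all codebook densities (uniform class priors assumed, as on the slide), and the state-specific weights are then applied to the resulting posteriors. Working from log densities and subtracting the largest one before exponentiating is a stabilization step assumed here, not something the slides state; the function names and numbers are also illustrative.

```python
import math

def codebook_posteriors(log_densities):
    """log_densities[k] = ln p(x | omega_k); returns Pr(omega_k | x)
    under the assumption of uniform class priors."""
    shift = max(log_densities)
    # The common shift cancels in the ratio but keeps exp() in range.
    scaled = [math.exp(v - shift) for v in log_densities]
    total = sum(scaled)
    return [s / total for s in scaled]

def semi_continuous_output(weights_j, posteriors):
    """b'_j(x) = sum_k c_jk * Pr(omega_k | x)."""
    return sum(c * p for c, p in zip(weights_j, posteriors))

# Hypothetical log densities that would underflow in the linear domain.
log_densities = [-1000.0, -1001.0, -1005.0]
posteriors = codebook_posteriors(log_densities)
b_prime = semi_continuous_output([0.5, 0.3, 0.2], posteriors)
print(posteriors, b_prime)
```

All posteriors lie between 0 and 1 and sum to 1, so the resulting output scores stay in a narrow numeric range regardless of how small the raw densities are.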
Configuration of HMM
Model Topology
Modularization
Compound Models
Modeling Emissions
Model Topology
Input data of speech and handwriting recognition exhibit a chronological or linear structure
Ergodic model is not necessary
Linear Model
The simplest model that describes chronological sequences
Transitions to the next state and to the current state are allowed
Bakis Model
Skipping of states is allowed
Larger flexibility in the modeling of duration
Widely used in speech and handwriting recognition
Left-to-right Model
An arbitrary number of states may be skipped in forward direction
Jumping back to “past” states is not allowed
Can describe larger variations in the temporal structure; longer parts of the data may be missing
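The three topologies differ only in how far forward a transition may jump, which can be written as a mask over the transition matrix. This is a minimal sketch: the function name `topology_mask` is hypothetical, and limiting the Bakis model to skipping a single state (a jump of at most two) is an assumption, since the slide only says that skipping is allowed.

```python
def topology_mask(n_states, kind):
    """Return an n x n matrix of 0/1 flags; mask[i][j] == 1 means the
    transition from state i to state j is allowed."""
    max_jump = {
        "linear": 1,                   # self-loop or next state
        "bakis": 2,                    # additionally skip one state (assumed)
        "left_to_right": n_states - 1  # any forward jump
    }[kind]
    return [
        [1 if 0 <= j - i <= max_jump else 0 for j in range(n_states)]
        for i in range(n_states)
    ]

for kind in ("linear", "bakis", "left_to_right"):
    print(kind, topology_mask(4, kind)[0])
```

In all three cases the entries below the diagonal are zero, which encodes that jumping back to "past" states is not allowed.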
Modularization
English Word Recognition
Thousands of words: thousands of word models or more; requires a large amount of training data
26 letters: limited number of character models
Modularization: divides a complex model into smaller models of segmentation units
Word -> subword -> character
Variation of Segmentation Units in Different Contexts
Phonetic transcription of the word “speech”: /spitS/
Cannot easily be distinguished from achieve (/@tSiv/), cheese (/tSiz/), or reality (/riEl@ti/)
Triphone [Schwartz, 1984]
Three immediately neighboring phone units taken as a segmentation unit, e.g., p/i/t
Eliminates the dependence of the variability of segmentation units on the context
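Turning a phone sequence into triphone units is a simple sliding-window operation. The sketch below uses the left/center/right notation of the p/i/t example. The function name, the boundary symbol `#`, and the splitting of /spitS/ into the single symbols s, p, i, t, S are assumptions made so that the output contains the slide's p/i/t unit.

```python
def to_triphones(phones, boundary="#"):
    """Map each phone to a left/center/right triphone label."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}/{padded[i]}/{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# Transcription of "speech" from the slide, one symbol per unit (assumed).
units = to_triphones(list("spitS"))
print(units)
```

Each unit now carries its own left and right context, so the /i/ in this word gets a different model from an /i/ with other neighbors.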
Compound Models
Parallel connection of all individual word models
HMM structure for isolated word recognition
Circles: Model States
Squares: Non-emission States
HMM structure for connected word recognition
Grammar Coded into HMM
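One way to picture the compound structures is as a list of transitions between named states. The sketch below is an assumption about the figures rather than a transcription of them: each word is given a linear model, all word models are connected in parallel between non-emitting `START` and `END` states for isolated word recognition, and a loop from `END` back to `START` is added for connected word recognition. All names are hypothetical.

```python
def parallel_word_models(word_sizes):
    """Connect linear word models in parallel between START and END.

    word_sizes maps each word to its number of emitting states.
    Returns a list of (source, target) transitions.
    """
    edges = []
    for word, n_states in word_sizes.items():
        states = [f"{word}_{i}" for i in range(n_states)]
        edges.append(("START", states[0]))             # leave non-emitting entry
        for i, state in enumerate(states):
            edges.append((state, state))               # stay in the state
            if i + 1 < n_states:
                edges.append((state, states[i + 1]))   # advance to next state
        edges.append((states[-1], "END"))              # reach non-emitting exit
    return edges

def connected_word_models(word_sizes):
    """Same structure plus a loop so that words can follow each other."""
    return parallel_word_models(word_sizes) + [("END", "START")]

isolated = parallel_word_models({"a": 2, "b": 1})
connected = connected_word_models({"a": 2, "b": 1})
print(len(isolated), len(connected))
```

A grammar would restrict which word model may follow which; that restriction is not modeled in this sketch.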
Modeling Emissions
Continuous feature vectors in the fields of speech and handwriting recognition are described by mixture models
Size of the codebook and number of component densities per mixture density need to be decided
No general way; a compromise between the precision of the model, its generalization capabilities, and the computation time
Semi-Continuous Model
Size of codebook: a few hundred up to a few thousand densities
Mixture Model: 8 to 64 component densities
References
[1] Schwartz R, Chow Y, Roucos S, Krasner M, Makhoul J, Improved hidden Markov Modelling of phonemes for continuous speech recognition, in International Conference on Acoustics, Speech and Signal Processing, pp 35.6.1-35.6.4, 1984.
Robust Parameter Estimation
Feature Optimization
Tying
Feature Optimization Techniques