Philip Jackson and Martin Russell
Electronic Electrical and Computer Engineering
Models of speech dynamics in a segmental-HMM recognizer using intermediate linear representations
http://web.bham.ac.uk/p.jackson/balthasar/
Speech dynamics into ASR
INTRODUCTION
Conventional model
[Figure: conventional HMM (states 1 to 4) generating acoustic observations via per-state acoustic PDFs.]
Linear-trajectory model
[Figure: segmental HMM (states 1 to 4) with an intermediate layer, linked to the acoustic PDF and the acoustic observations by the articulatory-to-acoustic mapping W.]
Multi-level Segmental HMM
• segmental finite-state process
• intermediate “articulatory” layer
– linear trajectories
• mapping required
– linear transformation
– radial basis function network
Estimation of linear mapping
THEORY
Matched sequences $x_1^T$ and $y_1^T$, collected as $X$ and $Y$:
$$Y \approx W X, \qquad W = \arg\min_W D(WX, Y)$$
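As a rough illustration of estimating a linear mapping from matched sequences, the sketch below fits W by ordinary least squares with NumPy. It assumes the distance D is the summed squared error and that frames are stored as columns; the function name `estimate_linear_mapping` and the synthetic data are hypothetical, not taken from the slides.

```python
import numpy as np

def estimate_linear_mapping(X, Y):
    """Least-squares W such that Y is approximately W @ X.

    X: (n_artic, T) articulatory frames as columns (assumed layout).
    Y: (n_acoustic, T) matched acoustic frames as columns.
    """
    # lstsq solves A @ B ~ C for B; here A = X.T, C = Y.T, so B = W.T.
    W_T, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    return W_T.T

# Synthetic, noiseless check: the true mapping should be recovered.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 2))
X = rng.normal(size=(2, 50))
Y = W_true @ X
W_hat = estimate_linear_mapping(X, Y)
```

With noisy data the same call returns the W that minimises the squared error, which is only one possible choice of D.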
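Linear-trajectory equations
Defined as:
$$f_i(t) = (t - \bar{t})\, m_i + c_i$$
with slope $m_i$ and midpoint $c_i$.
[Figure: trajectory $f_i(t)$ plotted against $t$, with the midpoint $c_i$ marked.]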
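To make the linear-trajectory definition concrete, here is a minimal sketch that evaluates a trajectory over one segment. It assumes frames within the segment are indexed t = 1..T and that the reference time is the segment mid-time (T + 1) / 2; both conventions, and the name `linear_trajectory`, are assumptions for illustration.

```python
import numpy as np

def linear_trajectory(m, c, T):
    """Return f(t) = (t - t_bar) * m + c for t = 1..T, one row per frame."""
    t = np.arange(1, T + 1)
    t_bar = (T + 1) / 2          # assumed: mid-time of the segment
    return (t - t_bar)[:, None] * m[None, :] + c[None, :]

m = np.array([0.5, -1.0])        # slope per feature dimension
c = np.array([2.0, 0.0])         # midpoint per feature dimension
traj = linear_trajectory(m, c, T=5)
```

Under these conventions the middle frame of an odd-length segment equals the midpoint c, and the mean of the trajectory over the segment is also c.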
Training the model parameters
For optimal least-squares estimates (acoustic domain):
midpoint:
$$\hat{c}_i = \frac{1}{T} \sum_{t=t_i}^{t_{i+1}-1} y_t$$
slope:
$$\hat{m}_i = \frac{\sum_{t=t_i}^{t_{i+1}-1} (t - \bar{t})\, y_t}{\sum_{t=t_i}^{t_{i+1}-1} (t - \bar{t})^2}$$
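The least-squares fit of a straight line to the frames of one segment can be sketched as follows. This assumes frames are indexed 1..T within the segment and that the reference time is the mean frame index, in which case the midpoint is the frame average and the slope is the usual regression slope; `fit_linear_trajectory` is a hypothetical name.

```python
import numpy as np

def fit_linear_trajectory(Y):
    """Least-squares slope and midpoint for one segment.

    Y: (T, d) array, one acoustic frame per row.
    """
    T, d = Y.shape
    dt = np.arange(1, T + 1) - (T + 1) / 2   # t - t_bar, sums to zero
    c_hat = Y.mean(axis=0)                   # midpoint: average of frames
    denom = np.sum(dt ** 2)
    # A one-frame segment carries no slope information; use zero there.
    m_hat = (dt @ Y) / denom if denom > 0 else np.zeros(d)
    return m_hat, c_hat

# Frames generated from a known line are recovered exactly.
m_true = np.array([0.5, -1.0])
c_true = np.array([2.0, 0.0])
dt = np.arange(1, 6) - 3.0
Y_seg = dt[:, None] * m_true + c_true
m_hat, c_hat = fit_linear_trajectory(Y_seg)
```

Because the time offsets sum to zero, the midpoint and slope estimates decouple, which is why each can be computed with a single pass over the frames.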
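Training the model parameters
For optimal least-squares estimates (articulatory domain):
[Equations: the midpoint estimate $\hat{c}_i$ is $1/T$ times a sum over the frames $t$ of the segment, and the slope estimate $\hat{m}_i$ is a ratio of two such sums weighted by $(t - \bar{t})$; in both, each observation $y_t$ is multiplied by a matrix derived from the mapping $W_k$.]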
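One way a least-squares fit in the articulatory domain could be computed is sketched below. It rests on assumptions not confirmed by the slides: that each acoustic frame is modelled as W applied to the articulatory trajectory, that W has full column rank, and that the Moore-Penrose pseudo-inverse is used to map back. The name `fit_articulatory_trajectory` is hypothetical.

```python
import numpy as np

def fit_articulatory_trajectory(Y, W):
    """Least-squares articulatory slope and midpoint for one segment.

    Y: (T, d_ac) acoustic frames, T >= 2.
    W: (d_ac, d_art) assumed articulatory-to-acoustic mapping.
    """
    T = Y.shape[0]
    dt = np.arange(1, T + 1) - (T + 1) / 2   # t - t_bar
    W_pinv = np.linalg.pinv(W)               # (d_art, d_ac)
    c_ac = Y.mean(axis=0)                    # acoustic-domain midpoint
    m_ac = (dt @ Y) / np.sum(dt ** 2)        # acoustic-domain slope
    # Minimising sum ||y_t - W f(t)||^2 maps both back through pinv(W).
    return W_pinv @ m_ac, W_pinv @ c_ac

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 2))
m_art = np.array([0.3, -0.2])
c_art = np.array([1.0, 0.5])
dt = np.arange(1, 7) - 3.5
Y_seg = (dt[:, None] * m_art + c_art) @ W.T  # noiseless acoustic frames
m_hat, c_hat = fit_articulatory_trajectory(Y_seg, W)
```

If W were square and invertible, the pseudo-inverse would reduce to the ordinary inverse.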
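Training the model parameters
For optimal maximum-likelihood estimates (articulatory domain):
[Equations: the midpoint estimate $\hat{c}_i$ is $1/T$ times a sum over the frames $t$ of the segment, and the slope estimate $\hat{m}_i$ is a ratio of two such sums weighted by $(t - \bar{t})$; in both, each observation $y_t$ is multiplied by a matrix built from $D_i$ and $W_k$.]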
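A maximum-likelihood fit in the articulatory domain can be sketched under a simple Gaussian assumption: each acoustic frame is W applied to the articulatory trajectory plus zero-mean noise with a fixed covariance. The estimate is then a weighted least-squares solution. The weighting matrix is called `precision` here (inverse noise covariance); that name, the noise model, and any correspondence to the slide's $D_i$ are assumptions.

```python
import numpy as np

def fit_articulatory_trajectory_ml(Y, W, precision):
    """ML articulatory slope and midpoint under Gaussian acoustic noise.

    Y: (T, d_ac) acoustic frames, T >= 2.
    W: (d_ac, d_art) assumed articulatory-to-acoustic mapping.
    precision: (d_ac, d_ac) symmetric positive-definite inverse covariance.
    """
    T = Y.shape[0]
    dt = np.arange(1, T + 1) - (T + 1) / 2
    c_ac = Y.mean(axis=0)
    m_ac = (dt @ Y) / np.sum(dt ** 2)
    A = W.T @ precision                      # (d_art, d_ac)
    # Normal equations of the weighted problem: (W' P W) x = W' P y
    c_hat = np.linalg.solve(A @ W, A @ c_ac)
    m_hat = np.linalg.solve(A @ W, A @ m_ac)
    return m_hat, c_hat

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 2))
B = rng.normal(size=(4, 4))
precision = B @ B.T + 4 * np.eye(4)          # symmetric positive definite
m_art = np.array([0.3, -0.2])
c_art = np.array([1.0, 0.5])
dt = np.arange(1, 7) - 3.5
Y_seg = (dt[:, None] * m_art + c_art) @ W.T
m_hat, c_hat = fit_articulatory_trajectory_ml(Y_seg, W, precision)
```

With an identity `precision` this reduces to the unweighted pseudo-inverse solution; with noisy frames the two generally differ.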
Tests on MOCHA
• S. British English, at 16 kHz (Wrench, 2000)
– MFCC13 acoustic features, incl. zeroth
– articulatory x- & y-coords from 7 EMA coils
– PCA9+Lx: first nine articulatory modes plus