Extended Baum-Welch algorithm

Presented by Shih-Hung Liu, 2006/01/21

NTNU Speech Lab.

References

• A generalization of the Baum algorithm to rational objective functions [Gopalakrishnan et al.], IEEE ICASSP 1989

• An inequality for rational functions with applications to some statistical estimation problems [Gopalakrishnan et al.], IEEE Transactions on Information Theory, 1991

• HMMs, MMIE, and the Speech Recognition Problem [Normandin 1991], PhD dissertation

• Function maximization [Povey 2004], PhD thesis, chapter 4.5

Outline

• Introduction

• Extended Baum-Welch algorithm [Gopalakrishnan et al.]

• EBW from discrete to continuous [Normandin]

• EBW for discrete HMMs [Povey]

• Example of function optimization [Gopalakrishnan et al.]

• Conclusion

Introduction

• The well-known Baum-Eagon inequality provides an effective iterative scheme for finding a local maximum of homogeneous polynomials with positive coefficients over a domain of probability values

• However, we are interested in maximizing a general rational function, so we extend the Baum-Eagon inequality to rational functions

Extended Baum-Welch algorithm (1/6)

• Let $P(X) = P(\{X_{ij}\})$ be an arbitrary homogeneous polynomial of degree $d$ with nonnegative coefficients in the variables $X_{ij}$, $i = 1,\dots,p$, $j = 1,\dots,q_i$

Assuming that this polynomial is defined over a domain of probability values $D: x_{ij} \ge 0,\ \sum_{j=1}^{q_i} x_{ij} = 1$, Baum and Eagon show how to construct a transformation $T: U \to D$ for some $U \subset D$ with the following property:

property A: for any $x \in U$ and $y = T(x)$, $P(y) > P(x)$ unless $y = x$

[Gopalakrishnan 1989]

Extended Baum-Welch algorithm (2/6)

• $R(X) = S_1(X)/S_2(X)$, with $S_1(X), S_2(X) > 0$, is a ratio of two polynomials in the variables $X = \{X_{ij}\}$, $i = 1,\dots,p$, $j = 1,\dots,q_i$, defined over the domain $D: x_{ij} \ge 0,\ \sum_{j=1}^{q_i} x_{ij} = 1$

We are looking for a growth transformation $T: D \to D$ such that for any $x \in D$ and $y = T(x)$, $R(y) > R(x)$ unless $y = x$

• Reduction of the rational case to the polynomial case: the problem of finding a growth transformation for a rational function is reduced to that of finding one for a specially formed polynomial

• Reduce to a non-homogeneous polynomial with nonnegative coefficients

• Extend the Baum-Eagon inequality to non-homogeneous polynomials with nonnegative coefficients

[Gopalakrishnan 1989]

Extended Baum-Welch algorithm (3/6)

• Step 1:

For any $x \in D$ there exists a polynomial $P_x(X)$ such that if $P_x(y) > P_x(x)$ for $y \in D$, then $R(y) > R(x)$

For this it is enough to set $P_x(X) = S_1(X) - R(x)\,S_2(X)$

Indeed, it is easy to see that $P_x(x) = 0$, and therefore if $P_x(y) > P_x(x) = 0$ then $R(y) > R(x)$

Now suppose that for each polynomial $P_x(X)$, $x \in D$, we could construct a growth transformation $T_x$ of $D$ such that $P_x(T_x(y)) > P_x(y)$ for any $y \in D$ unless $T_x(y) = y$. Then we could define a growth transformation $T$ of $D$ as follows:

$T(y) = T_y(y)$

[Gopalakrishnan 1989]
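
The reduction in Step 1 can be checked numerically. The sketch below assumes, purely for illustration, the rational function $R = x^2/(x^2+y^2+z^2)$ that appears in the example slide of this deck; the helper names `S1`, `S2`, `R`, `P_ref`, and `random_simplex_point` are hypothetical and not taken from the references.

```python
import random

def S1(v):
    x, y, z = v
    return x * x

def S2(v):
    return sum(c * c for c in v)

def R(v):
    return S1(v) / S2(v)

def P_ref(v, ref):
    """P_x(X) = S1(X) - R(x) * S2(X), with x fixed to `ref`."""
    return S1(v) - R(ref) * S2(v)

def random_simplex_point(rng):
    """Draw a point with positive coordinates that sum to one."""
    w = [rng.random() + 1e-3 for _ in range(3)]
    s = sum(w)
    return [c / s for c in w]

rng = random.Random(0)
ref = random_simplex_point(rng)
agreements = []
for _ in range(1000):
    y = random_simplex_point(rng)
    # P_x(y) > 0 should hold exactly when R(y) > R(x)
    agreements.append((P_ref(y, ref) > 0) == (R(y) > R(ref)))
print(all(agreements))
```

Because $S_2 > 0$ on the domain, the sign of $P_x(y)$ and the sign of $R(y) - R(x)$ always agree, which is what the loop tests.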

Extended Baum-Welch algorithm (4/6)

• Step 2:

Lemma: Let $P(X) = P(\{X_{ij}\})$ be a polynomial with real coefficients in the variables $X_{ij}$, $i = 1,\dots,p$, $j = 1,\dots,q_i$, and let the domain $D$ be $x_{ij} \ge 0,\ \sum_{j=1}^{q_i} x_{ij} = 1$

(a) There exists a polynomial $C(X)$ such that the polynomial $P'(X) = P(X) + C(X)$ has only nonnegative coefficients and such that the value $C(x)$ at any $x \in D$ is a constant

(b) The set of growth transformations of $D$ for $P'(X)$ coincides with the set of growth transformations of $D$ for $P(X)$

[Gopalakrishnan 1989]
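
Part (a) of the lemma can be illustrated with exact arithmetic. This is a minimal sketch, assuming the polynomial $P = x^2 - k(x^2+y^2+z^2)$ and the choice $C(X) = k(x+y+z)^2$, which mirror the worked example in this deck; the dictionary representation and the helper names `poly_mul`, `poly_add`, and `poly_eval` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

def poly_mul(a, b):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for (ea, ca), (eb, cb) in product(a.items(), b.items()):
        e = tuple(i + j for i, j in zip(ea, eb))
        out[e] = out.get(e, 0) + ca * cb
    return out

def poly_add(a, b):
    out = dict(a)
    for e, c in b.items():
        out[e] = out.get(e, 0) + c
    return out

def poly_eval(p, point):
    total = 0
    for e, c in p.items():
        term = c
        for base, power in zip(point, e):
            term *= base ** power
        total += term
    return total

k = Fraction(1, 3)  # assumed value of R at the current point
# P(X) = x^2 - k (x^2 + y^2 + z^2): has negative coefficients
P = {(2, 0, 0): 1 - k, (0, 2, 0): -k, (0, 0, 2): -k}
# C(X) = k (x + y + z)^2: equals k everywhere on the simplex
linear = {(1, 0, 0): 1, (0, 1, 0): 1, (0, 0, 1): 1}
C = {e: k * c for e, c in poly_mul(linear, linear).items()}
P_prime = poly_add(P, C)

point = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
print(min(P_prime.values()) >= 0, poly_eval(C, point) == k)
```

Adding $C(X)$ shifts the objective by the same constant at every point of $D$, which is why the set of growth transformations is unchanged, as part (b) states.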

Extended Baum-Welch algorithm (5/6)

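• Step 3: finding a growth transformation for a polynomial with nonnegative coefficients can be reduced to the same problem for a homogeneous polynomial with nonnegative coefficients

Consider the homogeneous polynomial $P''(Y) = P''(\{Y_{lm}\}) = Y_{p+1,1}^{d}\, P'(\{Y_{ij}/Y_{p+1,1}\})$ in the variables $Y_{lm}$, $l = 1,\dots,p+1$, $m = 1,\dots,q_l$, where $q_{p+1} = 1$ and $d$ is the degree of $P'$

$D'': y_{ij} \ge 0,\ \sum_{j=1}^{q_i} y_{ij} = 1$, $i = 1,\dots,p+1$, $j = 1,\dots,q_i$

There is a bijection $f: D \to D''$, mapping $\{x_{ij}\}$ into $\{y_{ij}\}$, such that $y_{ij} = x_{ij}$ for $(i,j) \ne (p+1,1)$ and such that $P'(x) = P''(f(x))$ for any $x \in D$, so the problems $(P'(X), D)$ and $(P''(Y), D'')$ are equivalent

[Gopalakrishnan 1989]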
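
The homogenization in Step 3 amounts to padding every monomial with a power of one extra variable. The following is a small sketch under that reading; the example polynomial and the helper names `homogenize` and `evaluate` are hypothetical.

```python
def homogenize(poly):
    """Append one extra variable so that every monomial reaches the top degree d.

    `poly` maps exponent tuples to coefficients; the new last exponent is
    d - (degree of the monomial), i.e. the Y_{p+1,1}^d P'(Y / Y_{p+1,1}) form.
    """
    d = max(sum(e) for e in poly)
    return {e + (d - sum(e),): c for e, c in poly.items()}

def evaluate(poly, point):
    total = 0.0
    for e, c in poly.items():
        term = c
        for base, power in zip(point, e):
            term *= base ** power
        total += term
    return total

# Hypothetical non-homogeneous polynomial: 3 x^2 y + 2 x + 5
P1 = {(2, 1): 3.0, (1, 0): 2.0, (0, 0): 5.0}
P2 = homogenize(P1)

x = (0.25, 0.75)
y = x + (1.0,)  # the extra row has a single entry, so sum-to-one forces it to 1
print(P2)
print(evaluate(P1, x), evaluate(P2, y))
```

Since the added row contains a single variable, the sum-to-one constraint pins it to 1, so the two polynomials take the same value at corresponding points.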

Extended Baum-Welch algorithm (6/6)

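• Baum-Eagon inequality: let $P(X)$ be a homogeneous polynomial with nonnegative coefficients such that

$\sum_{j=1}^{q_i} x_{ij}\,\dfrac{\partial P}{\partial x_{ij}}(x) \ne 0$ for all $i$

Then the transformation $y = T(x)$ defined by

$y_{ij} = \dfrac{x_{ij}\,\dfrac{\partial P}{\partial x_{ij}}(x)}{\sum_{j'=1}^{q_i} x_{ij'}\,\dfrac{\partial P}{\partial x_{ij'}}(x)}$

is a growth transformation for $P$

• Extended form with a constant $C$:

$(T_C(x))_{ij} = \dfrac{x_{ij}\left(\dfrac{\partial P}{\partial x_{ij}}(x) + C\right)}{\sum_{j'=1}^{q_i} x_{ij'}\left(\dfrac{\partial P}{\partial x_{ij'}}(x) + C\right)}$

[Gopalakrishnan 1989]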
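
As a concrete illustration of the growth transformation, here is a minimal sketch in plain Python. The toy polynomial $P = x_1 x_2 + x_2 x_3$ over a single row of three probabilities is an assumption chosen because its maximum on the simplex (0.25 at $x_2 = 0.5$) is easy to verify by hand; `growth_transform`, `P`, and `grad_P` are hypothetical names.

```python
def growth_transform(x, grad, C=0.0):
    """Apply y_ij = x_ij (dP/dx_ij + C) / sum_j x_ij (dP/dx_ij + C) row by row."""
    y = []
    for row, grad_row in zip(x, grad):
        weights = [xij * (gij + C) for xij, gij in zip(row, grad_row)]
        total = sum(weights)
        y.append([w / total for w in weights])
    return y

# Toy homogeneous polynomial with nonnegative coefficients: P = x1*x2 + x2*x3
def P(x):
    (x1, x2, x3), = x
    return x1 * x2 + x2 * x3

def grad_P(x):
    (x1, x2, x3), = x
    return [[x2, x1 + x3, x2]]

x = [[0.2, 0.3, 0.5]]
y = growth_transform(x, grad_P(x))
print(P(x), P(y))
```

With `C=0.0` this is the plain Baum-Eagon update; passing a positive `C` gives the extended form $T_C$, which takes smaller steps.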

EBW for CDHMM – from discrete to continuous (1/3)

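• Discrete case for emission probability update

EBW for $b_j(k)$:

$\hat b_j(k) = \dfrac{\gamma(j,k) + C\,b_j(k)}{\sum_{k'=1}^{K}\{\gamma(j,k') + C\,b_j(k')\}}$

$\gamma(j,k) = \sum_{t=1,\ \text{such that}\ o_t = v_k}^{T} \gamma_j(t)$

$K$: the number of symbols in the codebook

[Normandin 1991]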
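
A minimal sketch of a discrete update of this kind, assuming it has the form "count plus $C$ times the old probability, renormalized over the codebook". The function name `ebw_discrete_update` and the numbers are hypothetical; the counts are taken to be signed (for example numerator minus denominator statistics in MMI training), which is also an assumption here.

```python
def ebw_discrete_update(b, gamma, C):
    """Return (gamma[k] + C * b[k]) normalised over k for one state j."""
    numer = [g + C * bk for g, bk in zip(gamma, b)]
    if min(numer) <= 0:
        raise ValueError("C is too small to keep every probability positive")
    total = sum(numer)
    return [n / total for n in numer]

b = [0.5, 0.3, 0.2]        # current emission probabilities b_j(k)
gamma = [1.5, -0.4, 0.1]   # hypothetical signed counts gamma(j, k)
b_new = ebw_discrete_update(b, gamma, C=4.0)
print(b_new)
```

The constant `C` has to be large enough that every numerator stays positive, which is why the sketch raises an error instead of returning an invalid distribution.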

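EBW for CDHMM – from discrete to continuous (2/3)

The Gaussian $N(x \mid \mu_j, \sigma_j)$ of state $j$ is discretized over $M$ subintervals $I_k$ of width $2\sigma_j/M$, with $x_k$ a point in $I_k$:

$b_j(k) = \dfrac{N(x_k \mid \mu_j, \sigma_j)}{\sum_{k'=1}^{K} N(x_{k'} \mid \mu_j, \sigma_j)}$

[Figure: the density $N(x_k \mid \mu_j, \sigma_j)$ plotted against $x_k$, with the axis divided into subintervals $I_1, I_2, I_3, \dots$]

[Normandin 1991]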

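EBW for CDHMM – from discrete to continuous (3/3)

Start from the discrete update

$\hat b_j(k) = \dfrac{\gamma(j,k) + C\,b_j(k)}{\sum_{k'=1}^{K}\{\gamma(j,k') + C\,b_j(k')\}}$ (EBW)

and let the subinterval width $\Delta \to 0$, using $\mu_j = \lim_{\Delta \to 0}\sum_{k=1}^{K} x_k\,b_j(k)$ and $\sigma_j^2 + \mu_j^2 = \lim_{\Delta \to 0}\sum_{k=1}^{K} x_k^2\,b_j(k)$.

Mean:

$\hat\mu_j = \lim_{\Delta \to 0}\sum_{k=1}^{K} x_k\,\hat b_j(k) = \lim_{\Delta \to 0}\dfrac{\sum_{k=1}^{K} x_k\{\gamma(j,k) + C\,b_j(k)\}}{\sum_{k=1}^{K}\{\gamma(j,k) + C\,b_j(k)\}} = \lim_{\Delta \to 0}\dfrac{\sum_{k=1}^{K}\gamma(j,k)\,x_k + C\sum_{k=1}^{K} x_k\,b_j(k)}{\sum_{k=1}^{K}\gamma(j,k) + C} = \dfrac{\sum_{k}\gamma(j,k)\,x_k + C\,\mu_j}{\sum_{k}\gamma(j,k) + C}$

Variance:

$\hat\sigma_j^2 = \lim_{\Delta \to 0}\sum_{k=1}^{K} x_k^2\,\hat b_j(k) - \hat\mu_j^2 = \lim_{\Delta \to 0}\dfrac{\sum_{k=1}^{K} x_k^2\{\gamma(j,k) + C\,b_j(k)\}}{\sum_{k=1}^{K}\{\gamma(j,k) + C\,b_j(k)\}} - \hat\mu_j^2 = \dfrac{\sum_{k}\gamma(j,k)\,x_k^2 + C(\sigma_j^2 + \mu_j^2)}{\sum_{k}\gamma(j,k) + C} - \hat\mu_j^2$

[Normandin 1991]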
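
The limiting argument can be sanity-checked by discretizing a Gaussian on a fine grid, applying a discrete update of the form "count plus $C$ times old probability", and comparing the resulting mean with the closed-form continuous mean. This is only a sketch: the grid range of eight standard deviations, the observation values, and the function names `closed_form_mean` and `discretised_mean` are all assumptions made for illustration.

```python
import math

def closed_form_mean(obs, weights, mu, C):
    """EBW mean: (sum_t w_t o_t + C mu) / (sum_t w_t + C)."""
    return (sum(w * o for w, o in zip(weights, obs)) + C * mu) / (sum(weights) + C)

def discretised_mean(obs, weights, mu, sigma, C, K):
    """Mean of the discrete EBW update on a K-point grid over mu +/- 8 sigma."""
    lo, width = mu - 8.0 * sigma, 16.0 * sigma / K
    xs = [lo + (k + 0.5) * width for k in range(K)]
    dens = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
    norm = sum(dens)
    b = [d / norm for d in dens]                 # b_j(k) from the Gaussian
    gamma = [0.0] * K
    for o, w in zip(obs, weights):               # accumulate counts per bin
        k = min(K - 1, max(0, int((o - lo) / width)))
        gamma[k] += w
    numer = [g + C * bk for g, bk in zip(gamma, b)]
    return sum(x * n for x, n in zip(xs, numer)) / sum(numer)

obs = [0.3, 1.1, -0.4, 2.0]       # hypothetical observations
weights = [0.9, 0.5, -0.2, 0.7]   # hypothetical signed occupancies
exact = closed_form_mean(obs, weights, mu=0.0, C=5.0)
approx = discretised_mean(obs, weights, mu=0.0, sigma=1.0, C=5.0, K=4000)
print(exact, approx)
```

The two values differ only by the quantization error of placing each observation at its bin center, which shrinks as `K` grows.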

EBW for discrete HMMs (1/6)

• The Baum-Eagon inequality is formulated for the case where there are variables $x_{ij}$ in a matrix $X$ whose rows obey a sum-to-one constraint $\sum_j x_{ij} = 1$, and we are maximizing a sum of polynomial terms in $x_{ij}$ with nonnegative coefficients

• For ML training, we can find an auxiliary function and optimize it

• Finding the maximum of the auxiliary function (e.g. using Lagrange multipliers) leads to the following update, which is a growth transformation for the polynomial:

[Povey 2004]

EBW for discrete HMMs (2/6)

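$\hat x_{ij} = \dfrac{x_{ij}\left.\dfrac{\partial F}{\partial x_{ij}}\right|_{X=X'}}{\sum_k x_{ik}\left.\dfrac{\partial F}{\partial x_{ik}}\right|_{X=X'}}$, where $X'$ denotes the current parameter values

• The Baum-Welch update is an update procedure for HMMs which uses this growth transformation together with the forward-backward algorithm, which finds the relevant differentials efficiently

[Povey 2004]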

EBW for discrete HMMs (3/6)

• An update rule as convenient and provably correct as the Baum-Welch update is not available for discriminative training of HMMs, which is a harder optimization problem

• The Extended Baum-Welch update equation as originally derived is applicable to rational functions of parameters which are subject to sum-to-one constraints

• The MMI objective function for discrete-probability HMMs is an example of such a function:

$F_{\text{MMI}} = \log\dfrac{p(O \mid w)}{p(O)}$

[Povey 2004]
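
For a toy vocabulary the MMI objective can be computed directly. The sketch below assumes $p(O)$ is expanded as a prior-weighted sum over competing word sequences, which the slide does not spell out; the function name `mmi_objective` and all numbers are hypothetical.

```python
import math

def mmi_objective(likelihoods, priors, correct):
    """log p(O|w) - log p(O), with p(O) = sum over w' of p(O|w') P(w')."""
    p_obs = sum(likelihoods[w] * priors[w] for w in likelihoods)
    return math.log(likelihoods[correct]) - math.log(p_obs)

likelihoods = {"a": 0.2, "b": 0.1}   # hypothetical p(O | w)
priors = {"a": 0.5, "b": 0.5}        # hypothetical prior P(w)
F = mmi_objective(likelihoods, priors, correct="a")
print(F)
```

Both the numerator and the denominator are polynomials in the HMM probabilities, so the argument of the logarithm is a rational function of the kind the EBW update was derived for.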

EBW for discrete HMMs (4/6)

Two essential points are used to derive the EBW update for MMI:

1. Instead of maximizing $f(x) = a(x)/b(x)$ for positive $a(x)$ and $b(x)$, we can instead maximize $g(x) = a(x) - k\,b(x)$, where $k = a(x')/b(x')$ and $x'$ are the values from the previous iteration; increasing $g(x)$ will cause $f(x)$ to increase. This is because $g(x)$ is a strong-sense auxiliary function for $f(x)$ around $x'$

2. If some terms in the resulting polynomial are negative, we can add to the expression a constant $C$ times a further polynomial which is constrained to be a constant (e.g. $\sum_i\sum_j x_{ij}$), so as to ensure that no product of terms in the final expression has a negative coefficient

[Povey 2004]
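
The first point follows from a one-line calculation, written out here as a short worked derivation (with $x'$ denoting the previous iterate and $k = a(x')/b(x')$):

```latex
\begin{align*}
g(x') &= a(x') - \frac{a(x')}{b(x')}\,b(x') = 0, \\
f(x) - f(x') &= \frac{a(x)}{b(x)} - k
              = \frac{a(x) - k\,b(x)}{b(x)}
              = \frac{g(x)}{b(x)}.
\end{align*}
```

Since $b(x) > 0$, any $x$ with $g(x) > g(x') = 0$ also satisfies $f(x) > f(x')$.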

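EBW for discrete HMMs (5/6)

The growth transformation can be written with derivatives taken with respect to $\log x_{ij}$:

$\hat x_{ij} = \dfrac{\left.\dfrac{\partial F}{\partial \log x_{ij}}\right|_{X=X'}}{\sum_k \left.\dfrac{\partial F}{\partial \log x_{ik}}\right|_{X=X'}}$

since

$\dfrac{\partial F}{\partial \log x_{ij}} = \dfrac{\partial F}{\partial x_{ij}}\left(\dfrac{\partial \log x_{ij}}{\partial x_{ij}}\right)^{-1} = x_{ij}\,\dfrac{\partial F}{\partial x_{ij}}$

By applying these two ideas:

$\hat x_{ij} = \dfrac{\left.\dfrac{\partial F}{\partial \log x_{ij}}\right|_{X=X'} + C\,x_{ij}}{\sum_k \left(\left.\dfrac{\partial F}{\partial \log x_{ik}}\right|_{X=X'} + C\,x_{ik}\right)}$

[Povey 2004]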

EBW equivalent smooth function (6/6)

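EBW can be regarded as adding a smooth function

$g_{sm}(\mu_j, \sigma_j^2) = -\dfrac{1}{2}\left[D_j\log(2\pi\sigma_j^2) + \dfrac{D_j(\sigma_j'^2 + \mu_j'^2)}{\sigma_j^2} - \dfrac{2D_j\,\mu_j'\mu_j}{\sigma_j^2} + \dfrac{D_j\,\mu_j^2}{\sigma_j^2}\right]$

into the objective function, where $\mu_j'$ and $\sigma_j'^2$ are the current parameter values.

We can check that its derivatives vanish at $\mu_j = \mu_j'$, $\sigma_j^2 = \sigma_j'^2$:

$\dfrac{\partial g_{sm}}{\partial \mu_j} = -\dfrac{1}{2}\left[-\dfrac{2D_j\,\mu_j'}{\sigma_j^2} + \dfrac{2D_j\,\mu_j}{\sigma_j^2}\right] = 0$

$\dfrac{\partial g_{sm}}{\partial \sigma_j^2} = -\dfrac{1}{2}\left[\dfrac{D_j}{\sigma_j^2} - \dfrac{D_j(\sigma_j'^2 + \mu_j'^2)}{\sigma_j^4} + \dfrac{2D_j\,\mu_j'\mu_j}{\sigma_j^4} - \dfrac{D_j\,\mu_j^2}{\sigma_j^4}\right] = 0$

[Povey 2004]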
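
The zero-gradient property of a smoothing function of this type can be confirmed with finite differences. The sketch assumes the one-dimensional form $-\tfrac{D}{2}\left[\log(2\pi\sigma^2) + (\sigma'^2 + \mu'^2 - 2\mu'\mu + \mu^2)/\sigma^2\right]$, with primes marking the current values; `g_sm`, `central_diff`, and the numbers are hypothetical.

```python
import math

def g_sm(mu, var, mu_old, var_old, D):
    """Smoothing function for one Gaussian; (mu_old, var_old) are the current values."""
    return -0.5 * D * (math.log(2 * math.pi * var)
                       + (var_old + mu_old ** 2 - 2 * mu_old * mu + mu ** 2) / var)

def central_diff(f, x, h=1e-5):
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

mu_old, var_old, D = 0.7, 1.8, 3.0   # hypothetical values
d_mu = central_diff(lambda m: g_sm(m, var_old, mu_old, var_old, D), mu_old)
d_var = central_diff(lambda v: g_sm(mu_old, v, mu_old, var_old, D), var_old)
print(d_mu, d_var)
```

Both estimates come out at rounding-error level, so adding the function leaves the gradient of the objective at the current parameters unchanged while penalizing large moves away from them.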

Example

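• Consider $R(x,y,z) = \dfrac{x^2}{x^2 + y^2 + z^2}$, with $x, y, z \ge 0$ and $x + y + z = 1$

1. Start from some $x_0, y_0, z_0$ such that $x_0 \ge 0$, $y_0 \ge 0$, $z_0 \ge 0$, $x_0 + y_0 + z_0 = 1$; set the iteration index $i = 0$

2. $k = R(x_i, y_i, z_i)$

3. Obtain a polynomial with nonnegative coefficients: $P(x,y,z) = x^2 - k(x^2 + y^2 + z^2) + k(x + y + z)^2 = x^2 + 2kxy + 2kxz + 2kyz$

4. Using the update formula, let $D = x\,\partial P(x,y,z)/\partial x + y\,\partial P(x,y,z)/\partial y + z\,\partial P(x,y,z)/\partial z = 2x^2 + 4kxy + 4kxz + 4kyz$ and set

$x_{i+1} = \left.\dfrac{x\,\partial P(x,y,z)/\partial x}{D}\right|_{x_i,y_i,z_i}$, $\quad y_{i+1} = \left.\dfrac{y\,\partial P(x,y,z)/\partial y}{D}\right|_{x_i,y_i,z_i}$, $\quad z_{i+1} = \left.\dfrac{z\,\partial P(x,y,z)/\partial z}{D}\right|_{x_i,y_i,z_i}$

5. Increment the iteration index $i$ by 1 and go to 2.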
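
The five steps translate almost line by line into code. This is a minimal sketch in plain Python; the starting point $(1/3, 1/3, 1/3)$, the iteration count, and the names `R` and `ebw_step` are choices made here for illustration.

```python
def R(x, y, z):
    return x * x / (x * x + y * y + z * z)

def ebw_step(x, y, z):
    """One pass of steps 2-4 for R = x^2 / (x^2 + y^2 + z^2)."""
    k = R(x, y, z)
    # Partial derivatives of P = x^2 + 2k(xy + xz + yz), with k held fixed
    px = 2 * x + 2 * k * (y + z)
    py = 2 * k * (x + z)
    pz = 2 * k * (x + y)
    d = x * px + y * py + z * pz
    return x * px / d, y * py / d, z * pz / d

point = (1 / 3, 1 / 3, 1 / 3)
history = [R(*point)]
for _ in range(100):
    point = ebw_step(*point)
    history.append(R(*point))
print(history[0], history[1], history[-1])
```

Starting from the uniform point, the first step moves to $(5/9, 2/9, 2/9)$, raising $R$ from $1/3$ to $25/33$; later iterates keep increasing $R$ toward its maximum of 1 at $(1, 0, 0)$, although progress slows as the point approaches the corner.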

Conclusion

• Presented an algorithm for maximizing certain rational functions defined over a domain of probability values

• This algorithm is very useful in practice for training HMM parameters

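MPE: Final Auxiliary Function

Weak-sense auxiliary function:

$H_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W_r^{lat}} \left.\dfrac{\partial F_{MPE}}{\partial \log p(O_r \mid q)}\right|_{\lambda = \lambda'} \log p(O_r \mid q)$

Replacing $\log p(O_r \mid q)$ by a strong-sense auxiliary function gives another weak-sense auxiliary function:

$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \sum_m \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\,\log N(o_r(t), \mu_m, \Sigma_m)$

With the smoothing function involved (still a weak-sense auxiliary function):

$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \sum_m \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\,\log N(o_r(t), \mu_m, \Sigma_m) + \sum_m -\dfrac{D_m}{2}\left[\log|\Sigma_m| + (\mu_m - \mu_m')^T \Sigma_m^{-1} (\mu_m - \mu_m') + \mathrm{tr}(\Sigma_m' \Sigma_m^{-1})\right]$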

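EBW derived from auxiliary function

Smoothing function, general covariance:

$g_{sm}(\lambda, \lambda') = \sum_m -\dfrac{D_m}{2}\left[\log|\Sigma_m| + (\mu_m - \mu_m')^T \Sigma_m^{-1} (\mu_m - \mu_m') + \mathrm{tr}(\Sigma_m' \Sigma_m^{-1})\right]$

Smoothing function, one-dimensional Gaussian:

$g_{sm}(\lambda, \lambda') = \sum_m -\dfrac{D_m}{2}\left[\log(2\pi\sigma_m^2) + \dfrac{(\mu_m - \mu_m')^2 + \sigma_m'^2}{\sigma_m^2}\right]$

$\dfrac{\partial g_{sm}(\lambda, \lambda')}{\partial \mu_m} = -\dfrac{D_m}{2}\cdot\dfrac{2(\mu_m - \mu_m')}{\sigma_m^2}$

Auxiliary function with smoothing:

$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \sum_m \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\,\log N(o_r(t), \mu_m, \sigma_m^2) + \sum_m -\dfrac{D_m}{2}\left[\log(2\pi\sigma_m^2) + \dfrac{(\mu_m - \mu_m')^2 + \sigma_m'^2}{\sigma_m^2}\right]$

where

$N(o_r(t), \mu_m, \sigma_m^2) = \dfrac{1}{\sqrt{2\pi\sigma_m^2}}\, e^{-\frac{(o_r(t) - \mu_m)^2}{2\sigma_m^2}}$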

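EBW derived from auxiliary function

$g_{MPE}(\lambda, \lambda') = \sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \sum_m \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\,\log N(o_r(t), \mu_m, \sigma_m^2) + \sum_m -\dfrac{D_m}{2}\left[\log(2\pi\sigma_m^2) + \dfrac{(\mu_m - \mu_m')^2 + \sigma_m'^2}{\sigma_m^2}\right]$

$= \sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \sum_m \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\left[-\dfrac{1}{2}\log(2\pi\sigma_m^2) - \dfrac{(o_r(t) - \mu_m)^2}{2\sigma_m^2}\right] + g_{sm}(\lambda, \lambda')$

Setting the derivative with respect to $\mu_m$ to zero, and using $\dfrac{\partial g_{sm}(\lambda, \lambda')}{\partial \mu_m} = -\dfrac{D_m}{2}\cdot\dfrac{2(\mu_m - \mu_m')}{\sigma_m^2}$:

$\dfrac{\partial g_{MPE}(\lambda, \lambda')}{\partial \mu_m} = 0$

$\Rightarrow \sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\,\dfrac{1}{2}\cdot\dfrac{2(o_r(t) - \mu_m)}{\sigma_m^2} - \dfrac{D_m}{2}\cdot\dfrac{2(\mu_m - \mu_m')}{\sigma_m^2} = 0$

$\Rightarrow \sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\,(o_r(t) - \mu_m) - D_m(\mu_m - \mu_m') = 0$

$\Rightarrow \mu_m = \dfrac{\sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t)\,o_r(t) + D_m\,\mu_m'}{\sum_r \sum_{q \in W_r^{lat}} \sum_{t=s_q}^{e_q} \gamma_q^{r,MPE}\,\gamma_{qm}^{r}(t) + D_m}$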
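
A smoothed mean update of the form "weighted observation sum plus $D$ times the old mean, divided by total weight plus $D$" can be checked against the stationarity condition it is meant to solve. The sketch below flattens the sums over utterances, arcs, and frames into a single list; `ebw_mean`, `grad_mu`, and the numbers are hypothetical.

```python
def ebw_mean(obs, occ, mu_old, D):
    """mu = (sum_t occ_t * o_t + D * mu_old) / (sum_t occ_t + D)."""
    return (sum(g * o for g, o in zip(occ, obs)) + D * mu_old) / (sum(occ) + D)

def grad_mu(mu, obs, occ, mu_old, var, D):
    """Derivative of the smoothed auxiliary function w.r.t. mu (1-D Gaussian)."""
    data_term = sum(g * (o - mu) for g, o in zip(occ, obs)) / var
    return data_term - D * (mu - mu_old) / var

obs = [0.2, 1.4, -0.6]   # hypothetical observations
occ = [0.8, -0.3, 0.5]   # hypothetical signed occupancy weights
mu_new = ebw_mean(obs, occ, mu_old=0.1, D=2.0)
print(mu_new, grad_mu(mu_new, obs, occ, mu_old=0.1, var=1.5, D=2.0))
```

The gradient at the updated mean is zero up to rounding, and a larger `D` pulls the result closer to the old mean.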