1 HMM (I) LING 570 Fei Xia Week 7: 11/5-11/7/07

Dec 20, 2015


HMM (I)

LING 570

Fei Xia

Week 7: 11/5-11/7/07


HMM

• Definition and properties of HMM
  – Two types of HMM

• Three basic questions in HMM


Definition of HMM


Hidden Markov Models

• There are n states s1, …, sn in an HMM, and the states are connected.

• The output symbols are produced by the states or edges in the HMM.

• An observation O=(o1, …, oT) is a sequence of output symbols.

• Given an observation, we want to recover the hidden state sequence.

• An example: POS tagging
  – States are POS tags
  – Output symbols are words
  – Given an observation (i.e., a sentence), we want to discover the tag sequence.


Same observation, different state sequences

N    V     P    DT N
time flies like an arrow

N    N     V    DT N
time flies like an arrow


Two types of HMMs

• State-emission HMM (Moore machine):
  – The output symbol is produced by states:
    • By the from-state
    • By the to-state

• Arc-emission HMM (Mealy machine):
  – The output symbol is produced by the edges; i.e., by the (from-state, to-state) pairs.


PFA recap


Formal definition of PFA

A PFA is a tuple $(Q, \Sigma, I, F, \delta, P)$:
• Q: a finite set of N states
• Σ: a finite set of input symbols
• I: $Q \to R^+$ (initial-state probabilities)
• F: $Q \to R^+$ (final-state probabilities)
• $\delta \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$: the transition relation between states.
• P: $\delta \to R^+$ (transition probabilities)


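Constraints on function:

$$\sum_{q \in Q} I(q) = 1$$

$$\forall q \in Q:\quad F(q) + \sum_{a \in \Sigma \cup \{\epsilon\}} \sum_{q' \in Q} P(q, a, q') = 1$$

Probability of a string:

$$P(w_{1,n}, q_{1,n+1}) = I(q_1) \cdot F(q_{n+1}) \cdot \prod_{i=1}^{n} P(q_i, w_i, q_{i+1})$$

$$P(w_{1,n}) = \sum_{q_{1,n+1}} P(w_{1,n}, q_{1,n+1})$$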


An example of PFA

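(Figure: two states labeled with their final-state probabilities, q0:0 and q1:0.2; an arc a:1.0 from q0 to q1 and a self-loop b:0.8 on q1.)

I(q0)=1.0, I(q1)=0.0
F(q0)=0, F(q1)=0.2

$$P(ab^n) = I(q_0) \cdot P(q_0, ab^n, q_1) \cdot F(q_1) = 1.0 \cdot 1.0 \cdot 0.8^n \cdot 0.2$$

$$\sum_x P(x) = \sum_{n=0}^{\infty} P(ab^n) = 0.2 \sum_{n=0}^{\infty} 0.8^n = 0.2 \cdot \frac{0.8^0}{1 - 0.8} = 1$$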
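The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the name `string_prob` are choices made here, not part of the slides, and ε-transitions are not handled.

```python
# PFA from the example: q0 --a:1.0--> q1 and a self-loop q1 --b:0.8--> q1.
I = {"q0": 1.0, "q1": 0.0}            # initial-state probabilities
F = {"q0": 0.0, "q1": 0.2}            # final-state probabilities
P = {("q0", "a", "q1"): 1.0,          # transition probabilities
     ("q1", "b", "q1"): 0.8}

def string_prob(w):
    """Sum over all state paths of I(q_1) * prod_i P(q_i, w_i, q_{i+1}) * F(q_{n+1})."""
    # reach[q] = total probability of reading the prefix so far and being in state q
    reach = dict(I)
    for sym in w:
        nxt = {q: 0.0 for q in I}
        for (q, a, q2), p in P.items():
            if a == sym:
                nxt[q2] += reach[q] * p
        reach = nxt
    return sum(reach[q] * F[q] for q in I)

print(string_prob("abb"))             # 1.0 * 1.0 * 0.8**2 * 0.2
# Partial sum of P(a b^n) for n < 200; it approaches 1 as more terms are added.
total = sum(string_prob("a" + "b" * n) for n in range(200))
print(total)
```

Strings that the automaton cannot read, such as "b" alone, get probability 0 under this sketch.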


Arc-emission HMM


Definition of arc-emission HMM

• An HMM is a tuple $(S, \Sigma, \Pi, A, B)$:

  – A set of states S={s1, s2, …, sN}.

  – A set of output symbols Σ={w1, …, wM}.

  – Initial state probabilities $\Pi = \{\pi_i\}$.

  – Transition prob: A={aij}.

  – Emission prob: B={bijk}


Constraints in an arc-emission HMM

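$$\sum_{i=1}^{N} \pi_i = 1$$

$$\forall i:\quad \sum_{j=1}^{N} a_{ij} = 1$$

$$\forall i, j:\quad \sum_{k=1}^{M} b_{ijk} = 1$$

For any integer n and any HMM:

$$\sum_{|O| = n} P(O \mid HMM) = 1$$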


An example: HMM structure

(Figure: states s1, s2, …, sN connected by edges; the edges are labeled with output symbols such as w1, …, w5.)

Same kinds of parameters, but the emission probabilities depend on both states: P(wk | si, sj)

# of Parameters: O(N^2 M + N^2).


A path in an arc emission HMM

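(Figure: path X1 → X2 → … → Xn → Xn+1, where the edge from Xi to Xi+1 is labeled oi.)

State sequence: X1,n+1

Output sequence: O1,n

$$P(O_{1,n}, X_{1,n+1}) = \pi(x_1) \prod_{i=1}^{n} P(x_{i+1} \mid x_i)\, P(o_i \mid x_i, x_{i+1})$$

$$P(O_{1,n}) = \sum_{X_{1,n+1}} P(O_{1,n}, X_{1,n+1})$$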
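The joint probability of one path and its output can be computed directly from the product formula. The sketch below assumes dictionary-based tables; the function name `arc_emission_joint` and the toy numbers are invented for illustration.

```python
def arc_emission_joint(pi, a, b, states, outputs):
    """P(O_{1,n}, X_{1,n+1}) = pi(x_1) * prod_i a[x_i][x_{i+1}] * b[(x_i, x_{i+1})][o_i]."""
    assert len(states) == len(outputs) + 1, "an n-symbol output needs n+1 states"
    prob = pi.get(states[0], 0.0)
    for i, o in enumerate(outputs):
        src, dst = states[i], states[i + 1]
        # Missing entries are treated as probability 0.
        prob *= a.get(src, {}).get(dst, 0.0) * b.get((src, dst), {}).get(o, 0.0)
    return prob

# Toy two-state model with made-up numbers.
pi = {"s1": 1.0}
a = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s1": 1.0}}
b = {("s1", "s1"): {"x": 1.0},
     ("s1", "s2"): {"x": 0.25, "y": 0.75},
     ("s2", "s1"): {"y": 1.0}}

# Path s1 -> s2 -> s1 emitting "y" on each edge: 1.0 * (0.5 * 0.75) * (1.0 * 1.0)
p = arc_emission_joint(pi, a, b, ["s1", "s2", "s1"], ["y", "y"])
print(p)
```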


PFA vs. Arc-emission HMM

A PFA is a tuple $(Q, \Sigma, I, F, \delta, P)$:
• Q: a finite set of N states
• Σ: a finite set of input symbols
• I: $Q \to R^+$ (initial-state probabilities)
• F: $Q \to R^+$ (final-state probabilities)
• $\delta \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$: the transition relation between states.
• P: $\delta \to R^+$ (transition probabilities)

An HMM is a tuple $(S, \Sigma, \Pi, A, B)$:
– A set of states S={s1, s2, …, sN}.
– A set of output symbols Σ={w1, …, wM}.
– Initial state probabilities $\Pi = \{\pi_i\}$.
– Transition prob: A={aij}.
– Emission prob: B={bijk}


State-emission HMM


Definition of state-emission HMM

• An HMM is a tuple $(S, \Sigma, \Pi, A, B)$:

  – A set of states S={s1, s2, …, sN}.

  – A set of output symbols Σ={w1, …, wM}.

  – Initial state probabilities $\Pi = \{\pi_i\}$.

  – Transition prob: A={aij}.

  – Emission prob: B={bjk}

• We use si and wk to refer to what is in an HMM structure.

• We use Xi and Oi to refer to what is in a particular HMM path and its output.


Constraints in a state-emission HMM

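$$\sum_{i=1}^{N} \pi_i = 1$$

$$\forall i:\quad \sum_{j=1}^{N} a_{ij} = 1$$

$$\forall i:\quad \sum_{k=1}^{M} b_{ik} = 1$$

For any integer n and any HMM:

$$\sum_{|O| = n} P(O \mid HMM) = 1$$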


An example: the HMM structure

Two kinds of parameters:
• Transition probability: P(sj | si)
• Emission probability: P(wk | si)

# of Parameters: O(NM + N^2)

(Figure: states s1, s2, …, sN, each labeled with output symbols it can emit, such as w1, w2, w3, w5.)


Output symbols are generated by the from-states

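• State sequence: X1,n

• Output sequence: O1,n

$$P(O_{1,n}, X_{1,n}) = \pi(x_1) \Big( \prod_{i=1}^{n-1} P(x_{i+1} \mid x_i) \Big) \Big( \prod_{i=1}^{n} P(o_i \mid x_i) \Big)$$

$$P(O_{1,n}) = \sum_{X_{1,n}} P(O_{1,n}, X_{1,n})$$

(Figure: path X1 → X2 → … → Xn, where each Xi emits oi.)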


Output symbols are generated by the to-states

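• State sequence: X1,n+1

• Output sequence: O1,n

$$P(O_{1,n}, X_{1,n+1}) = \pi(x_1) \prod_{i=1}^{n} \big( P(x_{i+1} \mid x_i)\, P(o_i \mid x_{i+1}) \big)$$

$$P(O_{1,n}) = \sum_{X_{1,n+1}} P(O_{1,n}, X_{1,n+1})$$

(Figure: path X1 → X2 → … → Xn+1, where each Xi+1 emits oi.)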


A path in a state-emission HMM

Output symbols are produced by the from-states:
(Figure: path X1 → X2 → … → Xn, where each Xi emits oi.)

Output symbols are produced by the to-states:
(Figure: path X1 → X2 → … → Xn+1, where each Xi+1 emits oi.)


Arc-emission vs. state-emission

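(Figure, state-emission with to-state output: path X1 → X2 → … → Xn+1, where each Xi+1 emits oi.)

(Figure, arc-emission: path X1 → X2 → … → Xn+1, where the edge from Xi to Xi+1 emits oi.)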


Properties of HMM

• Markov assumption (Limited horizon):

$$P(X_{t+1} \mid X_1, X_2, \ldots, X_t) = P(X_{t+1} \mid X_t)$$

• Stationary distribution (Time invariance): the probabilities do not change over time:

$$P(X_{t+1} \mid X_t) = P(X_{t+m+1} \mid X_{t+m})$$

• The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don’t know which state sequences generate a particular output.


Are the two types of HMMs equivalent?

• For each state-emission HMM1, there is an arc-emission HMM2, such that for any sequence O, P(O|HMM1)=P(O|HMM2).

• The reverse is also true.

• How to prove that?
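One direction can be sketched as follows, assuming the state-emission HMM emits from the to-state. The superscripts "state" and "arc" are notation introduced here, not from the slides. Keep S, Σ, Π and A unchanged, and let every arc emit according to its to-state:

```latex
b^{\text{arc}}_{ijk} = b^{\text{state}}_{jk} \quad \text{for all } i, j, k

P(O_{1,n}, X_{1,n+1} \mid HMM_2)
  = \pi(x_1) \prod_{i=1}^{n} a_{x_i x_{i+1}} \, b^{\text{arc}}_{x_i x_{i+1} o_i}
  = \pi(x_1) \prod_{i=1}^{n} a_{x_i x_{i+1}} \, b^{\text{state}}_{x_{i+1} o_i}
  = P(O_{1,n}, X_{1,n+1} \mid HMM_1)
```

Summing both sides over all state sequences gives P(O|HMM1)=P(O|HMM2). The reverse direction needs a different construction and is not sketched here.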


Applications of HMM

• N-gram POS tagging
  – Bigram tagger: oi is a word, and si is a POS tag.

• Other tagging problems:
  – Word segmentation
  – Chunking
  – NE tagging
  – Punctuation prediction
  – …

• Other applications: ASR, ….


Three HMM questions


Three fundamental questions for HMMs

• Training an HMM: given a set of observation sequences, learn its distribution, i.e. learn the transition and emission probabilities

• HMM as a parser: Finding the best state sequence for a given observation

• HMM as an LM: compute the probability of a given observation


Training an HMM: estimating the probabilities

• Supervised learning:
  – The state sequences in the training data are known
  – ML estimation

• Unsupervised learning:
  – The state sequences in the training data are unknown
  – forward-backward algorithm
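For the supervised case, ML estimation amounts to relative-frequency counting. The sketch below assumes a bigram tagger with a BOS start state and no smoothing; the function name `mle_estimate` and the two-sentence corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def mle_estimate(tagged_sentences, bos="BOS"):
    """Relative-frequency estimates from sentences given as [(word, tag), ...]."""
    trans_counts = defaultdict(Counter)   # trans_counts[prev_tag][tag]
    emit_counts = defaultdict(Counter)    # emit_counts[tag][word]
    for sent in tagged_sentences:
        prev = bos
        for word, tag in sent:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag

    def normalize(counts):
        # Divide each count by its row total so every row sums to 1.
        return {ctx: {x: c / sum(row.values()) for x, c in row.items()}
                for ctx, row in counts.items()}

    return normalize(trans_counts), normalize(emit_counts)

# Tiny invented training set.
corpus = [
    [("time", "N"), ("flies", "V")],
    [("time", "N"), ("flies", "N"), ("like", "V")],
]
a, b = mle_estimate(corpus)
print(a["N"])   # estimated P(next tag | N)
print(b["V"])   # estimated P(word | V)
```

Unseen transitions and words simply have no entry here; a real tagger would need some form of smoothing.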


HMM as a parser


HMM as a parser: Finding the best state sequence

• Given the observation O1,T=o1…oT, find the state sequence X1,T+1=X1 … XT+1 that maximizes P(X1,T+1 | O1,T).

→ Viterbi algorithm

(Figure: path X1 → X2 → … → XT → XT+1 producing the outputs o1, o2, …, oT.)


“time flies like an arrow”

\init
BOS 1.0

\transition
BOS N 0.5
BOS DT 0.4
BOS V 0.1
DT N 1.0
N N 0.2
N V 0.7
N P 0.1
V DT 0.4
V N 0.4
V P 0.1
V V 0.1
P DT 0.6
P N 0.4

\emission
N time 0.1
V time 0.1
N flies 0.1
V flies 0.2
V like 0.2
P like 0.1
DT an 0.3
N arrow 0.1
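To see how these numbers score a tag sequence, the tables can be loaded into dictionaries and multiplied along a path. The sketch assumes that each word is emitted by the to-state and that BOS is the first state; the name `path_prob` is hypothetical.

```python
INIT = {"BOS": 1.0}
TRANS = {
    "BOS": {"N": 0.5, "DT": 0.4, "V": 0.1},
    "DT": {"N": 1.0},
    "N": {"N": 0.2, "V": 0.7, "P": 0.1},
    "V": {"DT": 0.4, "N": 0.4, "P": 0.1, "V": 0.1},
    "P": {"DT": 0.6, "N": 0.4},
}
EMIT = {
    "N": {"time": 0.1, "flies": 0.1, "arrow": 0.1},
    "V": {"time": 0.1, "flies": 0.2, "like": 0.2},
    "P": {"like": 0.1},
    "DT": {"an": 0.3},
}

def path_prob(words, tags, start="BOS"):
    """Joint probability of words and tags; each word is emitted by the to-state."""
    prob, prev = INIT[start], start
    for word, tag in zip(words, tags):
        # Missing table entries count as probability 0.
        prob *= TRANS.get(prev, {}).get(tag, 0.0) * EMIT.get(tag, {}).get(word, 0.0)
        prev = tag
    return prob

words = "time flies like an arrow".split()
p1 = path_prob(words, ["N", "V", "P", "DT", "N"])   # "flies" as verb, "like" as preposition
p2 = path_prob(words, ["N", "N", "V", "DT", "N"])   # "flies" as noun, "like" as verb
print(p1, p2)
```

With these particular parameters the second tagging scores higher than the first.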


Finding all the paths: to build the trellis

(Figure: trellis for "time flies like an arrow". BOS is the start state, followed by one column of candidate states N, V, P, DT for each of the five words.)


Finding all the paths (cont)

(Figure: the same trellis for "time flies like an arrow", with BOS followed by one column of states N, V, P, DT per word.)


Viterbi algorithm

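The probability of the best path that produces O1,t-1 while ending up in state sj:

$$\delta_j(t) \stackrel{\text{def}}{=} \max_{X_{1,t-1}} P(X_{1,t-1}, O_{1,t-1}, X_t = j)$$

Initialization:

$$\delta_j(1) = \pi_j$$

Induction:

$$\delta_j(t+1) = \max_i \delta_i(t)\, a_{ij}\, b_{ij o_t}$$

Modify it to allow ε-emission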


Proof of the recursive function

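$$
\begin{aligned}
\delta_j(t+1) &= \max_{X_{1,t}} P(X_{1,t}, O_{1,t}, X_{t+1}=j) \\
&= \max_{X_{1,t}} P(X_{1,t-1}, O_{1,t-1}, o_t, X_t, X_{t+1}=j) \\
&= \max_i \max_{X_{1,t-1}} P(X_{1,t-1}, O_{1,t-1}, X_t=i)\, P(o_t, X_{t+1}=j \mid X_{1,t-1}, O_{1,t-1}, X_t=i) \\
&= \max_i \max_{X_{1,t-1}} P(X_{1,t-1}, O_{1,t-1}, X_t=i)\, a_{ij}\, b_{ij o_t} \\
&= \max_i \Big( a_{ij}\, b_{ij o_t} \max_{X_{1,t-1}} P(X_{1,t-1}, O_{1,t-1}, X_t=i) \Big) \\
&= \max_i \delta_i(t)\, a_{ij}\, b_{ij o_t}
\end{aligned}
$$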


Viterbi algorithm: calculating δj(t)

# N is the number of states in the HMM structure
# observ is the observation O, and leng is the length of observ.

Initialize viterbi[0..leng] [0..N-1] to 0
for each state j
    viterbi[0] [j] = π[j]
    back-pointer[0] [j] = -1   # dummy

for (t=0; t<leng; t++)
    for (j=0; j<N; j++)
        k = observ[t]   # the symbol at time t
        viterbi[t+1] [j] = max_i viterbi[t] [i] aij bjk
        back-pointer[t+1] [j] = arg max_i viterbi[t] [i] aij bjk


Viterbi algorithm: retrieving the best path

# find the best path
best_final_state = arg max_j viterbi[leng] [j]

# start with the last state in the sequence
j = best_final_state
push(arr, j);

for (t=leng; t>0; t--)
    i = back-pointer[t] [j]
    push(arr, i)
    j = i

return reverse(arr)
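The two pseudocode fragments can be combined into one runnable function. This is a sketch under the same assumptions as the pseudocode (state-emission, symbol emitted by the to-state, dense tables indexed by integers); the function name `viterbi` and the matrix encoding of the "time flies like an arrow" parameters are choices made here.

```python
def viterbi(pi, a, b, observ):
    """Best state path for observ; pi[j], a[i][j], b[j][k] are dense probability tables.

    Returns (probability of the best path, list of len(observ)+1 state indices).
    """
    n_states, leng = len(pi), len(observ)
    vit = [[0.0] * n_states for _ in range(leng + 1)]
    back = [[-1] * n_states for _ in range(leng + 1)]   # -1 is a dummy for t=0
    vit[0] = list(pi)
    for t, k in enumerate(observ):                      # k is the symbol at time t
        for j in range(n_states):
            # b[j][k] does not depend on i, so the arg max can ignore it.
            best_i = max(range(n_states), key=lambda i: vit[t][i] * a[i][j])
            vit[t + 1][j] = vit[t][best_i] * a[best_i][j] * b[j][k]
            back[t + 1][j] = best_i
    # Retrieve the best path by following back-pointers from the best final state.
    j = max(range(n_states), key=lambda s: vit[leng][s])
    best_prob, path = vit[leng][j], [j]
    for t in range(leng, 0, -1):
        j = back[t][j]
        path.append(j)
    return best_prob, path[::-1]

states = ["BOS", "N", "V", "P", "DT"]
symbols = ["time", "flies", "like", "an", "arrow"]
pi = [1.0, 0.0, 0.0, 0.0, 0.0]
a = [  # rows: from-state, columns: to-state (order as in `states`)
    [0.0, 0.5, 0.1, 0.0, 0.4],  # BOS
    [0.0, 0.2, 0.7, 0.1, 0.0],  # N
    [0.0, 0.4, 0.1, 0.1, 0.4],  # V
    [0.0, 0.4, 0.0, 0.0, 0.6],  # P
    [0.0, 1.0, 0.0, 0.0, 0.0],  # DT
]
b = [  # rows: state, columns: symbol (order as in `symbols`)
    [0.0, 0.0, 0.0, 0.0, 0.0],  # BOS
    [0.1, 0.1, 0.0, 0.0, 0.1],  # N
    [0.1, 0.2, 0.2, 0.0, 0.0],  # V
    [0.0, 0.0, 0.1, 0.0, 0.0],  # P
    [0.0, 0.0, 0.0, 0.3, 0.0],  # DT
]
observ = [symbols.index(w) for w in "time flies like an arrow".split()]
prob, path = viterbi(pi, a, b, observ)
tags = [states[s] for s in path]
print(prob, tags)
```

Under these parameters two taggings tie for the best score (they differ only in the tag of "flies"), so which one is returned depends on tie-breaking and floating-point rounding.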


Hw7 and Hw8

• Hw7: write an HMM “class”:
  – Read HMM input file
  – Output HMM

• Hw8: implement the algorithms for two HMM tasks:
  – HMM as parser: Viterbi algorithm
  – HMM as LM: the prob of an observation


Implementation issue: storing HMM

Approach #1:
• πi: pi {state_str}
• aij: a {from_state_str} {to_state_str}
• bjk: b {state_str} {symbol}

Approach #2:
• state2idx{state_str} = state_idx
• symbol2idx{symbol_str} = symbol_idx

• πi: pi [state_idx] = prob
• aij: a [from_state_idx] [to_state_idx] = prob
• bjk: b [state_idx] [symbol_idx] = prob

• idx2state[state_idx] = state_str
• idx2symbol[symbol_idx] = symbol_str


Storing HMM: sparse matrix

• aij: a [i] [j] = prob
• bjk: b [j] [k] = prob

• aij: a[i] = “j1 p1 j2 p2 …”
• aij: a[j] = “i1 p1 i2 p2 …”

• bjk: b[j] = “k1 p1 k2 p2 ….”
• bjk: b[k] = “j1 p1 j2 p2 …”
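One way to combine the index maps of Approach #2 with sparse storage is a dictionary of dictionaries keyed by index. This is a sketch, not the required homework format; `build_index` and `trans_prob` are hypothetical helper names, and only three transitions are loaded to keep it short.

```python
def build_index(items):
    """Return (item2idx, idx2item) for a collection of strings."""
    idx2item = sorted(set(items))
    item2idx = {item: idx for idx, item in enumerate(idx2item)}
    return item2idx, idx2item

state2idx, idx2state = build_index(["BOS", "N", "V", "P", "DT"])
symbol2idx, idx2symbol = build_index(["time", "flies", "like", "an", "arrow"])

# Sparse transition table: a[i] maps each reachable to-state j to its probability,
# so zero-probability transitions take no space.
a = {}
for src, dst, prob in [("BOS", "N", 0.5), ("N", "V", 0.7), ("V", "DT", 0.4)]:
    a.setdefault(state2idx[src], {})[state2idx[dst]] = prob

def trans_prob(i, j):
    """Look up a_ij, treating missing entries as probability 0."""
    return a.get(i, {}).get(j, 0.0)

print(trans_prob(state2idx["BOS"], state2idx["N"]))
print(idx2state[state2idx["DT"]])
```

Indexing by from-state makes it cheap to list the successors of a state; indexing by to-state instead would favor the inner loop of Viterbi, which scans predecessors.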


Other implementation issues

• Index starts from 0 in programming, but often starts from 1 in algorithms

• The sum of logprob is used in practice to replace the product of prob.

• Check constraints and print out warning if the constraints are not met.
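The last two points can be sketched in Python as follows. The helper names `check_rows` and `log_path_prob` are invented here, the tolerance is an arbitrary choice, and the "P" row is deliberately wrong to trigger the warning.

```python
import math
import warnings

def check_rows(name, table, tol=1e-6):
    """Warn for each row of a dict-of-dicts whose probabilities do not sum to 1."""
    bad = []
    for ctx, row in table.items():
        total = sum(row.values())
        if abs(total - 1.0) > tol:
            warnings.warn(f"{name}[{ctx}] sums to {total:.4f}, not 1")
            bad.append(ctx)
    return bad

def log_path_prob(probs):
    """Sum of log-probabilities instead of a product; -inf if any factor is 0."""
    if any(p == 0.0 for p in probs):
        return float("-inf")
    return sum(math.log(p) for p in probs)

# The "P" row sums to 0.9 on purpose, so it should be reported.
trans = {"N": {"N": 0.2, "V": 0.7, "P": 0.1}, "P": {"DT": 0.6, "N": 0.3}}
bad_rows = check_rows("a", trans)
lp = log_path_prob([0.5, 0.1, 0.7, 0.2])
print(bad_rows, lp)
```

Working in log space avoids underflow on long observations; with log-probabilities, Viterbi replaces products by sums and keeps the max unchanged.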


HMM as LM


HMM as an LM: computing P(o1, …, oT)

1st try:
- enumerate all possible paths
- add the probabilities of all paths
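This first try can be written down directly. The sketch assumes an arc-emission HMM stored as dense lists; the name `brute_force_prob` and the two-state model are made up. It visits N^(T+1) state sequences, which is why it is only usable on toy inputs.

```python
from itertools import product

def brute_force_prob(pi, a, b, observ):
    """P(O) as the sum of the joint probability over every state sequence X_1..X_{T+1}.

    Arc-emission convention: b[i][j][k] is the probability of emitting symbol k on arc i -> j.
    """
    n_states = len(pi)
    total = 0.0
    for path in product(range(n_states), repeat=len(observ) + 1):
        prob = pi[path[0]]
        for t, k in enumerate(observ):
            i, j = path[t], path[t + 1]
            prob *= a[i][j] * b[i][j][k]
        total += prob
    return total

# Invented two-state, two-symbol model.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.1, 0.9]],
     [[0.8, 0.2], [0.3, 0.7]]]

# The probabilities of all length-2 observations should add up to 1.
all_obs_total = sum(brute_force_prob(pi, a, b, [x, y]) for x in range(2) for y in range(2))
print(brute_force_prob(pi, a, b, [0]), all_obs_total)
```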


Forward probabilities

• Forward probability: the probability of producing O1,t-1 while ending up in state si:

$$\alpha_i(t) \stackrel{\text{def}}{=} P(O_{1,t-1}, X_t = i)$$

$$P(O_{1,T}) = \sum_{i=1}^{N} \alpha_i(T+1)$$


Calculating forward probability

Initialization:

$$\alpha_j(1) = \pi_j$$

Induction:

$$\alpha_j(t+1) = P(O_{1,t}, X_{t+1} = j) = \sum_i \alpha_i(t)\, a_{ij}\, b_{ij o_t}$$
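The initialization and induction steps translate into a short loop. This is a sketch assuming an arc-emission HMM stored as dense lists; the name `forward_prob` and the two-state model are invented for illustration.

```python
def forward_prob(pi, a, b, observ):
    """P(O_{1,T}) using alpha_j(t+1) = sum_i alpha_i(t) * a[i][j] * b[i][j][o_t]."""
    n_states = len(pi)
    alpha = list(pi)                       # initialization: alpha_j(1) = pi_j
    for k in observ:                       # k plays the role of o_t
        alpha = [sum(alpha[i] * a[i][j] * b[i][j][k] for i in range(n_states))
                 for j in range(n_states)]
    return sum(alpha)                      # P(O_{1,T}) = sum_i alpha_i(T+1)

# Invented two-state, two-symbol model.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.1, 0.9]],
     [[0.8, 0.2], [0.3, 0.7]]]

# The probabilities of all length-3 observations should add up to 1.
all_obs_total = sum(forward_prob(pi, a, b, [x, y, z])
                    for x in range(2) for y in range(2) for z in range(2))
print(forward_prob(pi, a, b, [0]), all_obs_total)
```

Each step costs O(N^2), so the whole computation is O(N^2 T) rather than exponential in T.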


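$$
\begin{aligned}
\alpha_j(t+1) &= P(O_{1,t}, X_{t+1}=j) \\
&= \sum_i P(O_{1,t}, X_t=i, X_{t+1}=j) \\
&= \sum_i P(O_{1,t-1}, X_t=i) \cdot P(o_t, X_{t+1}=j \mid O_{1,t-1}, X_t=i) \\
&= \sum_i P(O_{1,t-1}, X_t=i) \cdot P(o_t, X_{t+1}=j \mid X_t=i) \\
&= \sum_i \alpha_i(t)\, a_{ij}\, b_{ij o_t}
\end{aligned}
$$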


Summary

• Definition: hidden states, output symbols

• Properties: Markov assumption

• Applications: POS-tagging, etc.

• Three basic questions in HMM
  – Find the probability of an observation: forward probability
  – Find the best sequence: Viterbi algorithm
  – Estimate probability: MLE

• Bigram POS tagger: decoding with Viterbi algorithm