Page 1: Natural Language Processing Spring 2007 V. “Juggy” Jagannathan.

Natural Language Processing

Spring 2007

V. “Juggy” Jagannathan

Page 2

Foundations of Statistical Natural Language Processing

By Christopher Manning & Hinrich Schütze

Course Book

Page 3

Chapter 9

Markov Models

March 5, 2007

Page 4

Markov models

• Markov assumption – Suppose X = (X_1, ..., X_T) is a sequence of random variables taking values in some finite set S = {s_1, ..., s_N}. The Markov properties are:

• Limited horizon – P(X_{t+1} = s_k | X_1, ..., X_t) = P(X_{t+1} = s_k | X_t)

– i.e. the value at t+1 depends only on the value at t

• Time invariant (stationary) – the transition probabilities do not change with t

• Stochastic transition matrix A:

– a_ij = P(X_{t+1} = s_j | X_t = s_i), where a_ij ≥ 0 for all i, j and Σ_{j=1}^{N} a_ij = 1 for all i

Page 5

Markov model example

P(X_1, ..., X_T) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) ... P(X_T | X_1, ..., X_{T-1})
                 = P(X_1) P(X_2 | X_1) P(X_3 | X_2) ... P(X_T | X_{T-1})
                 = π_{X_1} Π_{t=1}^{T-1} a_{X_t X_{t+1}}

Example: P(t, i, p) = P(X_1 = t) P(X_2 = i | X_1 = t) P(X_3 = p | X_2 = i)
                    = 1.0 × 0.3 × 0.6 = 0.18
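As a quick sketch, the chain-rule product above can be computed directly. The initial and transition probabilities below are just the slide's example numbers (1.0, 0.3, 0.6) wrapped in a hypothetical dictionary-based model.

```python
# Probability of a state sequence in a (visible) Markov model:
# P(X_1..X_T) = pi[X_1] * prod_t a[X_t -> X_{t+1}].
# The numbers reproduce the slide's P(t, i, p) = 1.0 x 0.3 x 0.6 = 0.18.

pi = {"t": 1.0}                           # assumed initial distribution
a = {("t", "i"): 0.3, ("i", "p"): 0.6}    # assumed transition probabilities

def sequence_probability(states):
    """Chain-rule product: initial prob times one factor per transition."""
    p = pi.get(states[0], 0.0)
    for prev, nxt in zip(states, states[1:]):
        p *= a.get((prev, nxt), 0.0)
    return p

p_tip = sequence_probability(["t", "i", "p"])
print(round(p_tip, 2))  # 0.18
```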

Page 6

Hidden Markov Model Example

Probability of {lem, ice_t} given the machine starts in CP?

0.3 × 0.7 × 0.1 + 0.3 × 0.3 × 0.7 = 0.021 + 0.063 = 0.084
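The sum-over-paths computation can be brute-forced for the "crazy soft drink machine" of the course book. The transition and emission numbers below are the textbook's standard parameters for that example; treat them as assumptions here.

```python
# Brute-force P({lem, ice_t} | start in CP) by summing over all hidden
# state paths. Parameters follow the crazy soft drink machine example
# (assumed, not given on this slide).
from itertools import product

states = ["CP", "IP"]
trans = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
emit = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
        "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def prob_obs(obs, start):
    """Sum P(path, obs) over all hidden state paths of length len(obs)."""
    total = 0.0
    # first symbol is emitted in `start`; later symbols after transitions
    for path in product(states, repeat=len(obs) - 1):
        p, cur = emit[start][obs[0]], start
        for sym, nxt in zip(obs[1:], path):
            p *= trans[cur][nxt] * emit[nxt][sym]
            cur = nxt
        total += p
    return total

p = prob_obs(["lem", "ice_t"], "CP")
print(round(p, 3))  # 0.084
```

The two summands correspond to the two hidden paths (stay in CP, or move to IP), matching the slide's 0.021 + 0.063 arithmetic.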

Page 7

Why use HMMs?

• Underlying (hidden) events generate the surface observable events

• E.g. predicting the weather based on the dampness of seaweed:
http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

• Linear interpolation in n-gram models:

P_li(w_n | w_{n-1}, w_{n-2}) = λ_1 P_1(w_n) + λ_2 P_2(w_n | w_{n-1}) + λ_3 P_3(w_n | w_{n-1}, w_{n-2})
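The interpolation formula is a one-liner in code. The mixture weights and the component probability estimates below are made-up stand-ins, chosen only to illustrate the arithmetic.

```python
# Linearly interpolated trigram estimate:
# P_li = l1*P1(w_n) + l2*P2(w_n|w_{n-1}) + l3*P3(w_n|w_{n-2},w_{n-1}).
# Weights and component estimates are illustrative assumptions.

def interpolate(p_uni, p_bi, p_tri, lambdas=(0.2, 0.3, 0.5)):
    """Mix unigram, bigram and trigram estimates; weights must sum to 1."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

p = interpolate(0.01, 0.1, 0.4)   # hypothetical component estimates
print(round(p, 3))  # 0.232
```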

Page 8

Page 9

Look at notes from David Meir Blei [UC Berkeley]:

http://www-nlp.stanford.edu/fsnlp/hmm-chap/blei-hmm-ch9.ppt (slides 1-13)

Page 10

(Observed states)

Page 11

Forward Procedure

Page 12

Forward Procedure

α_i(t) = P(o_1 ... o_{t-1}, X_t = i | μ)

Initialization: α_i(1) = π_i,  1 ≤ i ≤ N

Induction: α_j(t+1) = Σ_{i=1}^{N} α_i(t) a_ij b_{ij o_t},  1 ≤ t ≤ T, 1 ≤ j ≤ N

Total computation: P(O | μ) = Σ_{i=1}^{N} α_i(T+1)
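The forward recurrence on this slide can be sketched as follows, using the book's arc-emission form b_{ij o_t}. The two-state model (pi, a, b) is an illustrative assumption, not taken from the slides.

```python
# Forward procedure for an arc-emission HMM. All parameter values are
# toy numbers chosen only for illustration.
N = 2                                   # number of states
pi = [0.6, 0.4]                         # initial state probabilities
a = [[0.7, 0.3], [0.4, 0.6]]            # a[i][j] = P(next=j | cur=i)
# b[i][j][k] = P(emit symbol k on the arc i -> j); 2 symbols
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]

def forward(obs):
    """Return alpha[t][i] = P(o_1..o_t, X_{t+1} = i), 0-based t = 0..T."""
    T = len(obs)
    alpha = [[0.0] * N for _ in range(T + 1)]
    alpha[0] = pi[:]                     # initialization: alpha_i(1) = pi_i
    for t in range(T):                   # induction over observations
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                                  for i in range(N))
    return alpha

obs = [0, 1, 0]
alpha = forward(obs)
p_obs = sum(alpha[-1])                   # P(O) = sum_i alpha_i(T+1)
print(0.0 < p_obs < 1.0)  # True
```

The computation is O(N²T), versus O(N^T) for summing over all hidden paths directly.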

Page 13

Backward Procedure

β_i(t) = P(o_t ... o_T | X_t = i, μ)

Initialization: β_i(T+1) = 1,  1 ≤ i ≤ N

Induction: β_i(t) = Σ_{j=1}^{N} a_ij b_{ij o_t} β_j(t+1),  1 ≤ t ≤ T, 1 ≤ i ≤ N

Total computation: P(O | μ) = Σ_{i=1}^{N} π_i β_i(1)
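The backward recurrence is the mirror image of the forward one. The same assumed two-state arc-emission model is used; the numbers are illustrative only.

```python
# Backward procedure for an arc-emission HMM; toy parameters (assumed).
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]

def backward(obs):
    """Return beta with beta[t][i] = P(o_{t+1}..o_T | X_{t+1} = i)
    (0-based t; the slide's beta_i(t) is beta[t-1][i])."""
    T = len(obs)
    beta = [[1.0] * N for _ in range(T + 1)]   # init: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):             # induction, backwards in t
        for i in range(N):
            beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                             for j in range(N))
    return beta

obs = [0, 1, 0]
beta = backward(obs)
p_obs = sum(pi[i] * beta[0][i] for i in range(N))  # P(O) = sum_i pi_i beta_i(1)
print(0.0 < p_obs < 1.0)  # True
```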

Page 14

Combining both – forward and backward

P(O, X_t = i | μ) = P(o_1 ... o_{t-1}, X_t = i | μ) · P(o_t ... o_T | X_t = i, μ)
                  = α_i(t) β_i(t)

P(O | μ) = Σ_{i=1}^{N} α_i(t) β_i(t),  1 ≤ t ≤ T+1
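The identity that P(O) can be read off at any cut point t is easy to verify numerically. The toy model below is an assumption carried over from the earlier sketches.

```python
# Verify P(O) = sum_i alpha_i(t) * beta_i(t) at every t for a toy
# arc-emission HMM (all parameter values are illustrative assumptions).
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]

def forward(obs):
    T = len(obs)
    alpha = [[0.0] * N for _ in range(T + 1)]
    alpha[0] = pi[:]
    for t in range(T):
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                                  for i in range(N))
    return alpha

def backward(obs):
    T = len(obs)
    beta = [[1.0] * N for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                             for j in range(N))
    return beta

obs = [0, 1, 0]
alpha, beta = forward(obs), backward(obs)
# P(O) computed at every cut point t must agree
p_at_t = [sum(alpha[t][i] * beta[t][i] for i in range(N))
          for t in range(len(obs) + 1)]
print(all(abs(p - p_at_t[0]) < 1e-9 for p in p_at_t))  # True
```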

Page 15

Finding the best state sequence

To determine the state sequence that best explains the observations, let:

γ_i(t) = P(X_t = i | O, μ) = α_i(t) β_i(t) / P(O | μ) = α_i(t) β_i(t) / Σ_{j=1}^{N} α_j(t) β_j(t)

Individually the most likely state is:

X̂_t = argmax_{1 ≤ i ≤ N} γ_i(t),  1 ≤ t ≤ T+1

This approach, however, does not correctly estimate the most likely state sequence.
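Computing the state posteriors and picking the individually most likely state at each time can be sketched as below; the two-state model is the same illustrative assumption as before.

```python
# gamma_i(t) = alpha_i(t) * beta_i(t) / P(O): posterior of being in
# state i at time t. Toy arc-emission HMM; numbers are assumptions.
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]
obs = [0, 1, 0]
T = len(obs)

# forward and backward lattices (arc-emission form)
alpha = [[0.0] * N for _ in range(T + 1)]
alpha[0] = pi[:]
for t in range(T):
    for j in range(N):
        alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                              for i in range(N))
beta = [[1.0] * N for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for i in range(N):
        beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                         for j in range(N))

p_obs = sum(alpha[T])
# posterior of each state at each time (the slide's gamma_i(t))
gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
         for t in range(T + 1)]
# individually most likely state at each time
best = [max(range(N), key=lambda i: gamma[t][i]) for t in range(T + 1)]
print(all(abs(sum(row) - 1.0) < 1e-9 for row in gamma))  # True
```

As the slide warns, stringing these individually best states together can yield a sequence with zero joint probability; the Viterbi algorithm on the next slide fixes that.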

Page 16

Finding the best state sequence – Viterbi algorithm

X̂ = argmax_X P(X | O)

Store the most probable path that leads to a given node:

δ_j(t) = max_{X_1 ... X_{t-1}} P(X_1 ... X_{t-1}, o_1 ... o_{t-1}, X_t = j | μ)

Initialization: δ_j(1) = π_j,  1 ≤ j ≤ N

Induction: δ_j(t+1) = max_{1 ≤ i ≤ N} δ_i(t) a_ij b_{ij o_t},  1 ≤ j ≤ N

Store backtrace: ψ_j(t+1) = argmax_{1 ≤ i ≤ N} δ_i(t) a_ij b_{ij o_t},  1 ≤ j ≤ N

Termination:
X̂_{T+1} = argmax_{1 ≤ i ≤ N} δ_i(T+1)
P(X̂) = max_{1 ≤ i ≤ N} δ_i(T+1)

Page 17

Parameter Estimation

Page 18

Parameter Estimation

Probability of traversing an arc i → j at time t given observation sequence O:

p_t(i, j) = P(X_t = i, X_{t+1} = j | O, μ)
          = P(X_t = i, X_{t+1} = j, O | μ) / P(O | μ)
          = α_i(t) a_ij b_{ij o_t} β_j(t+1) / Σ_{m=1}^{N} α_m(t) β_m(t)

Then, with γ_i(t) = Σ_{j=1}^{N} p_t(i, j):

Expected number of transitions from state i in O:  Σ_{t=1}^{T} γ_i(t)

Expected number of transitions from state i to state j in O:  Σ_{t=1}^{T} p_t(i, j)
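The arc posterior p_t(i, j) combines the forward and backward lattices; for each t it must sum to 1 over all (i, j) pairs, which is a useful sanity check. The model parameters below are the same illustrative assumptions as earlier.

```python
# Arc posterior p_t(i,j) = alpha_i(t) a_ij b_{ij o_t} beta_j(t+1) / P(O)
# for a toy arc-emission HMM (all parameter values assumed).
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]
obs = [0, 1, 0]
T = len(obs)

alpha = [[0.0] * N for _ in range(T + 1)]
alpha[0] = pi[:]
for t in range(T):
    for j in range(N):
        alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                              for i in range(N))
beta = [[1.0] * N for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for i in range(N):
        beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                         for j in range(N))
p_obs = sum(alpha[T])

# p_arc[t][i][j]: posterior probability of traversing arc i -> j at time t
p_arc = [[[alpha[t][i] * a[i][j] * b[i][j][obs[t]] * beta[t + 1][j] / p_obs
           for j in range(N)] for i in range(N)] for t in range(T)]
# at each t the arc posteriors form a distribution over (i, j)
row_sums = [sum(p_arc[t][i][j] for i in range(N) for j in range(N))
            for t in range(T)]
print(all(abs(s - 1.0) < 1e-9 for s in row_sums))  # True
```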

Page 19

Parameter Estimation

â_ij = expected number of transitions from state i to state j / expected number of transitions from state i
     = Σ_{t=1}^{T} p_t(i, j) / Σ_{t=1}^{T} γ_i(t)

b̂_ijk = expected number of transitions from i to j with k observed / expected number of transitions from i to j
      = Σ_{t : o_t = k, 1 ≤ t ≤ T} p_t(i, j) / Σ_{t=1}^{T} p_t(i, j)
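One re-estimation step built from these ratios of expected counts can be sketched as follows. The toy model and observation sequence are assumptions; the check at the end confirms that each re-estimated transition row is a proper distribution.

```python
# One Baum-Welch re-estimation step: a_hat and b_hat from the arc
# posteriors p_t(i,j). Toy arc-emission HMM; all numbers assumed.
N, K = 2, 2                     # number of states, observation symbols
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]
obs = [0, 1, 0]
T = len(obs)

alpha = [[0.0] * N for _ in range(T + 1)]
alpha[0] = pi[:]
for t in range(T):
    for j in range(N):
        alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                              for i in range(N))
beta = [[1.0] * N for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for i in range(N):
        beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                         for j in range(N))
p_obs = sum(alpha[T])

p_arc = [[[alpha[t][i] * a[i][j] * b[i][j][obs[t]] * beta[t + 1][j] / p_obs
           for j in range(N)] for i in range(N)] for t in range(T)]
gamma = [[sum(p_arc[t][i][j] for j in range(N)) for i in range(N)]
         for t in range(T)]

# a_hat: expected i->j transitions over expected transitions out of i
a_hat = [[sum(p_arc[t][i][j] for t in range(T)) /
          sum(gamma[t][i] for t in range(T))
          for j in range(N)] for i in range(N)]
# b_hat: expected i->j transitions emitting k over all i->j transitions
b_hat = [[[sum(p_arc[t][i][j] for t in range(T) if obs[t] == k) /
           sum(p_arc[t][i][j] for t in range(T))
           for k in range(K)] for j in range(N)] for i in range(N)]

print(all(abs(sum(row) - 1.0) < 1e-9 for row in a_hat))  # True
```

Iterating this step (recomputing the lattices with a_hat and b_hat) is the Baum-Welch / forward-backward training loop, which climbs to a local maximum of P(O | μ).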