Page 1: Natural Language Processing Spring 2007 V. “Juggy” Jagannathan.

Natural Language Processing

Spring 2007

V. “Juggy” Jagannathan

Page 2

Foundations of Statistical Natural Language Processing

By Christopher Manning & Hinrich Schütze

Course Book

Page 3

Chapter 9

Markov Models

March 5, 2007

Page 4

Markov models

• Markov assumption – Suppose X = (X_1, ..., X_T) is a sequence of random variables taking values in some finite set S = {s_1, ..., s_N}. The Markov properties are:

• Limited horizon – P(X_{t+1} = s_k | X_1, ..., X_t) = P(X_{t+1} = s_k | X_t)

– i.e. the value at t+1 depends only on the value at t

• Time invariant (stationary) – the transition probabilities do not change with t

• Stochastic transition matrix A:

– a_ij = P(X_{t+1} = s_j | X_t = s_i), where a_ij ≥ 0 for all i, j and Σ_{j=1}^{N} a_ij = 1 for all i

Page 5

Markov model example

P(X_1, ..., X_T) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) ... P(X_T | X_1, ..., X_{T-1})
                 = P(X_1) P(X_2 | X_1) P(X_3 | X_2) ... P(X_T | X_{T-1})
                 = π_{X_1} Π_{t=1}^{T-1} a_{X_t X_{t+1}}

Example: P(t, i, p) = P(X_1 = t) P(X_2 = i | X_1 = t) P(X_3 = p | X_2 = i)
                    = 1.0 × 0.3 × 0.6 = 0.18
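As a quick sketch, the chain-rule product above can be computed directly. The initial and transition probabilities below are just the slide's example numbers (1.0, 0.3, 0.6) wrapped in a hypothetical dictionary-based model.

```python
# Probability of a state sequence in a (visible) Markov model:
# P(X_1..X_T) = pi[X_1] * prod_t a[X_t -> X_{t+1}].
# The numbers reproduce the slide's P(t, i, p) = 1.0 x 0.3 x 0.6 = 0.18.

pi = {"t": 1.0}                           # assumed initial distribution
a = {("t", "i"): 0.3, ("i", "p"): 0.6}    # assumed transition probabilities

def sequence_probability(states):
    """Chain-rule product: initial prob times one factor per transition."""
    p = pi.get(states[0], 0.0)
    for prev, nxt in zip(states, states[1:]):
        p *= a.get((prev, nxt), 0.0)
    return p

p_tip = sequence_probability(["t", "i", "p"])
print(round(p_tip, 2))  # 0.18
```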

Page 6

Hidden Markov Model Example

Probability of {lem, ice_t} given the machine starts in CP?

0.3 × 0.7 × 0.1 + 0.3 × 0.3 × 0.7 = 0.021 + 0.063 = 0.084
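The sum-over-paths computation can be brute-forced for the "crazy soft drink machine" of the course book. The transition and emission numbers below are the textbook's standard parameters for that example; treat them as assumptions here.

```python
# Brute-force P({lem, ice_t} | start in CP) by summing over all hidden
# state paths. Parameters follow the crazy soft drink machine example
# (assumed, not given on this slide).
from itertools import product

states = ["CP", "IP"]
trans = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
emit = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
        "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def prob_obs(obs, start):
    """Sum P(path, obs) over all hidden state paths of length len(obs)."""
    total = 0.0
    # first symbol is emitted in `start`; later symbols after transitions
    for path in product(states, repeat=len(obs) - 1):
        p, cur = emit[start][obs[0]], start
        for sym, nxt in zip(obs[1:], path):
            p *= trans[cur][nxt] * emit[nxt][sym]
            cur = nxt
        total += p
    return total

p = prob_obs(["lem", "ice_t"], "CP")
print(round(p, 3))  # 0.084
```

The two summands correspond to the two hidden paths (stay in CP, or move to IP), matching the slide's 0.021 + 0.063 arithmetic.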

Page 7

Why use HMMs?

• Underlying (hidden) events generate the surface observable events

• E.g. predicting the weather based on the dampness of seaweed:
http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

• Linear interpolation in n-gram models:

P_li(w_n | w_{n-1}, w_{n-2}) = λ_1 P_1(w_n) + λ_2 P_2(w_n | w_{n-1}) + λ_3 P_3(w_n | w_{n-1}, w_{n-2})
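The interpolation formula is a one-liner in code. The mixture weights and the component probability estimates below are made-up stand-ins, chosen only to illustrate the arithmetic.

```python
# Linearly interpolated trigram estimate:
# P_li = l1*P1(w_n) + l2*P2(w_n|w_{n-1}) + l3*P3(w_n|w_{n-2},w_{n-1}).
# Weights and component estimates are illustrative assumptions.

def interpolate(p_uni, p_bi, p_tri, lambdas=(0.2, 0.3, 0.5)):
    """Mix unigram, bigram and trigram estimates; weights must sum to 1."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

p = interpolate(0.01, 0.1, 0.4)   # hypothetical component estimates
print(round(p, 3))  # 0.232
```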

Page 8

Page 9

Look at notes from David Meir Blei [UC Berkeley]:

http://www-nlp.stanford.edu/fsnlp/hmm-chap/blei-hmm-ch9.ppt (slides 1-13)

Page 10

(Observed states)

Page 11

Forward Procedure

Page 12

Forward Procedure

α_i(t) = P(o_1 ... o_{t-1}, X_t = i | μ)

Initialization: α_i(1) = π_i,  1 ≤ i ≤ N

Induction: α_j(t+1) = Σ_{i=1}^{N} α_i(t) a_ij b_{ij o_t},  1 ≤ t ≤ T, 1 ≤ j ≤ N

Total computation: P(O | μ) = Σ_{i=1}^{N} α_i(T+1)
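The forward recurrence on this slide can be sketched as follows, using the book's arc-emission form b_{ij o_t}. The two-state model (pi, a, b) is an illustrative assumption, not taken from the slides.

```python
# Forward procedure for an arc-emission HMM. All parameter values are
# toy numbers chosen only for illustration.
N = 2                                   # number of states
pi = [0.6, 0.4]                         # initial state probabilities
a = [[0.7, 0.3], [0.4, 0.6]]            # a[i][j] = P(next=j | cur=i)
# b[i][j][k] = P(emit symbol k on the arc i -> j); 2 symbols
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]

def forward(obs):
    """Return alpha[t][i] = P(o_1..o_t, X_{t+1} = i), 0-based t = 0..T."""
    T = len(obs)
    alpha = [[0.0] * N for _ in range(T + 1)]
    alpha[0] = pi[:]                     # initialization: alpha_i(1) = pi_i
    for t in range(T):                   # induction over observations
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                                  for i in range(N))
    return alpha

obs = [0, 1, 0]
alpha = forward(obs)
p_obs = sum(alpha[-1])                   # P(O) = sum_i alpha_i(T+1)
print(0.0 < p_obs < 1.0)  # True
```

The computation is O(N²T), versus O(N^T) for summing over all hidden paths directly.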

Page 13

Backward Procedure

β_i(t) = P(o_t ... o_T | X_t = i, μ)

Initialization: β_i(T+1) = 1,  1 ≤ i ≤ N

Induction: β_i(t) = Σ_{j=1}^{N} a_ij b_{ij o_t} β_j(t+1),  1 ≤ t ≤ T, 1 ≤ i ≤ N

Total computation: P(O | μ) = Σ_{i=1}^{N} π_i β_i(1)
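The backward recurrence is the mirror image of the forward one. The same assumed two-state arc-emission model is used; the numbers are illustrative only.

```python
# Backward procedure for an arc-emission HMM; toy parameters (assumed).
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]

def backward(obs):
    """Return beta with beta[t][i] = P(o_{t+1}..o_T | X_{t+1} = i)
    (0-based t; the slide's beta_i(t) is beta[t-1][i])."""
    T = len(obs)
    beta = [[1.0] * N for _ in range(T + 1)]   # init: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):             # induction, backwards in t
        for i in range(N):
            beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                             for j in range(N))
    return beta

obs = [0, 1, 0]
beta = backward(obs)
p_obs = sum(pi[i] * beta[0][i] for i in range(N))  # P(O) = sum_i pi_i beta_i(1)
print(0.0 < p_obs < 1.0)  # True
```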

Page 14

Combining both – forward and backward

P(O, X_t = i | μ) = P(o_1 ... o_{t-1}, X_t = i | μ) · P(o_t ... o_T | X_t = i, μ)
                  = α_i(t) β_i(t)

P(O | μ) = Σ_{i=1}^{N} α_i(t) β_i(t),  1 ≤ t ≤ T+1
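The identity that P(O) can be read off at any cut point t is easy to verify numerically. The toy model below is an assumption carried over from the earlier sketches.

```python
# Verify P(O) = sum_i alpha_i(t) * beta_i(t) at every t for a toy
# arc-emission HMM (all parameter values are illustrative assumptions).
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]

def forward(obs):
    T = len(obs)
    alpha = [[0.0] * N for _ in range(T + 1)]
    alpha[0] = pi[:]
    for t in range(T):
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                                  for i in range(N))
    return alpha

def backward(obs):
    T = len(obs)
    beta = [[1.0] * N for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                             for j in range(N))
    return beta

obs = [0, 1, 0]
alpha, beta = forward(obs), backward(obs)
# P(O) computed at every cut point t must agree
p_at_t = [sum(alpha[t][i] * beta[t][i] for i in range(N))
          for t in range(len(obs) + 1)]
print(all(abs(p - p_at_t[0]) < 1e-9 for p in p_at_t))  # True
```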

Page 15

Finding the best state sequence

To determine the state sequence that best explains the observations, let:

γ_i(t) = P(X_t = i | O, μ) = α_i(t) β_i(t) / P(O | μ) = α_i(t) β_i(t) / Σ_{j=1}^{N} α_j(t) β_j(t)

Individually the most likely state is:

X̂_t = argmax_{1 ≤ i ≤ N} γ_i(t),  1 ≤ t ≤ T+1

This approach, however, does not correctly estimate the most likely state sequence.
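Computing the state posteriors and picking the individually most likely state at each time can be sketched as below; the two-state model is the same illustrative assumption as before.

```python
# gamma_i(t) = alpha_i(t) * beta_i(t) / P(O): posterior of being in
# state i at time t. Toy arc-emission HMM; numbers are assumptions.
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]
obs = [0, 1, 0]
T = len(obs)

# forward and backward lattices (arc-emission form)
alpha = [[0.0] * N for _ in range(T + 1)]
alpha[0] = pi[:]
for t in range(T):
    for j in range(N):
        alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                              for i in range(N))
beta = [[1.0] * N for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for i in range(N):
        beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                         for j in range(N))

p_obs = sum(alpha[T])
# posterior of each state at each time (the slide's gamma_i(t))
gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
         for t in range(T + 1)]
# individually most likely state at each time
best = [max(range(N), key=lambda i: gamma[t][i]) for t in range(T + 1)]
print(all(abs(sum(row) - 1.0) < 1e-9 for row in gamma))  # True
```

As the slide warns, stringing these individually best states together can yield a sequence with zero joint probability; the Viterbi algorithm on the next slide fixes that.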

Page 16

Finding the best state sequence – Viterbi algorithm

X̂ = argmax_X P(X | O)

Store the most probable path that leads to a given node:

δ_j(t) = max_{X_1 ... X_{t-1}} P(X_1 ... X_{t-1}, o_1 ... o_{t-1}, X_t = j | μ)

Initialization: δ_j(1) = π_j,  1 ≤ j ≤ N

Induction: δ_j(t+1) = max_{1 ≤ i ≤ N} δ_i(t) a_ij b_{ij o_t},  1 ≤ j ≤ N

Store backtrace: ψ_j(t+1) = argmax_{1 ≤ i ≤ N} δ_i(t) a_ij b_{ij o_t},  1 ≤ j ≤ N

Termination:
X̂_{T+1} = argmax_{1 ≤ i ≤ N} δ_i(T+1)
P(X̂) = max_{1 ≤ i ≤ N} δ_i(T+1)

Page 17

Parameter Estimation

Page 18

Parameter Estimation

Probability of traversing an arc i → j at time t given observation sequence O:

p_t(i, j) = P(X_t = i, X_{t+1} = j | O, μ)
          = P(X_t = i, X_{t+1} = j, O | μ) / P(O | μ)
          = α_i(t) a_ij b_{ij o_t} β_j(t+1) / Σ_{m=1}^{N} α_m(t) β_m(t)

Then, with γ_i(t) = Σ_{j=1}^{N} p_t(i, j):

Expected number of transitions from state i in O:  Σ_{t=1}^{T} γ_i(t)

Expected number of transitions from state i to state j in O:  Σ_{t=1}^{T} p_t(i, j)
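The arc posterior p_t(i, j) combines the forward and backward lattices; for each t it must sum to 1 over all (i, j) pairs, which is a useful sanity check. The model parameters below are the same illustrative assumptions as earlier.

```python
# Arc posterior p_t(i,j) = alpha_i(t) a_ij b_{ij o_t} beta_j(t+1) / P(O)
# for a toy arc-emission HMM (all parameter values assumed).
N = 2
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]
obs = [0, 1, 0]
T = len(obs)

alpha = [[0.0] * N for _ in range(T + 1)]
alpha[0] = pi[:]
for t in range(T):
    for j in range(N):
        alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                              for i in range(N))
beta = [[1.0] * N for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for i in range(N):
        beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                         for j in range(N))
p_obs = sum(alpha[T])

# p_arc[t][i][j]: posterior probability of traversing arc i -> j at time t
p_arc = [[[alpha[t][i] * a[i][j] * b[i][j][obs[t]] * beta[t + 1][j] / p_obs
           for j in range(N)] for i in range(N)] for t in range(T)]
# at each t the arc posteriors form a distribution over (i, j)
row_sums = [sum(p_arc[t][i][j] for i in range(N) for j in range(N))
            for t in range(T)]
print(all(abs(s - 1.0) < 1e-9 for s in row_sums))  # True
```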

Page 19

Parameter Estimation

â_ij = expected number of transitions from state i to state j / expected number of transitions from state i
     = Σ_{t=1}^{T} p_t(i, j) / Σ_{t=1}^{T} γ_i(t)

b̂_ijk = expected number of transitions from i to j with k observed / expected number of transitions from i to j
      = Σ_{t : o_t = k, 1 ≤ t ≤ T} p_t(i, j) / Σ_{t=1}^{T} p_t(i, j)
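One re-estimation step built from these ratios of expected counts can be sketched as follows. The toy model and observation sequence are assumptions; the check at the end confirms that each re-estimated transition row is a proper distribution.

```python
# One Baum-Welch re-estimation step: a_hat and b_hat from the arc
# posteriors p_t(i,j). Toy arc-emission HMM; all numbers assumed.
N, K = 2, 2                     # number of states, observation symbols
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]
obs = [0, 1, 0]
T = len(obs)

alpha = [[0.0] * N for _ in range(T + 1)]
alpha[0] = pi[:]
for t in range(T):
    for j in range(N):
        alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                              for i in range(N))
beta = [[1.0] * N for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for i in range(N):
        beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                         for j in range(N))
p_obs = sum(alpha[T])

p_arc = [[[alpha[t][i] * a[i][j] * b[i][j][obs[t]] * beta[t + 1][j] / p_obs
           for j in range(N)] for i in range(N)] for t in range(T)]
gamma = [[sum(p_arc[t][i][j] for j in range(N)) for i in range(N)]
         for t in range(T)]

# a_hat: expected i->j transitions over expected transitions out of i
a_hat = [[sum(p_arc[t][i][j] for t in range(T)) /
          sum(gamma[t][i] for t in range(T))
          for j in range(N)] for i in range(N)]
# b_hat: expected i->j transitions emitting k over all i->j transitions
b_hat = [[[sum(p_arc[t][i][j] for t in range(T) if obs[t] == k) /
           sum(p_arc[t][i][j] for t in range(T))
           for k in range(K)] for j in range(N)] for i in range(N)]

print(all(abs(sum(row) - 1.0) < 1e-9 for row in a_hat))  # True
```

Iterating this step (recomputing the lattices with a_hat and b_hat) is the Baum-Welch / forward-backward training loop, which climbs to a local maximum of P(O | μ).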