CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 26 – Recap HMM; Probabilistic Parsing contd.) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 15th March, 2011
Transcript
Page 1:

CS460/626 : Natural Language Processing/Speech, NLP and the Web
(Lecture 26 – Recap HMM; Probabilistic Parsing contd.)

Pushpak Bhattacharyya
CSE Dept., IIT Bombay

15th March, 2011

Page 2:

Formal Definition of PCFG

A PCFG consists of:

• A set of terminals {w^k}, k = 1, ..., V
  e.g., {w^k} = {child, teddy, bear, played, ...}
• A set of non-terminals {N^i}, i = 1, ..., n
  e.g., {N^i} = {NP, VP, DT, ...}
• A designated start symbol N^1
• A set of rules {N^i → ζ^j}, where ζ^j is a sequence of terminals and non-terminals
  e.g., NP → DT NN
• A corresponding set of rule probabilities

Page 3:

Rule Probabilities

Rule probabilities are such that, for every non-terminal N^i:

    Σ_j P(N^i → ζ^j) = 1

E.g., P(NP → DT NN) = 0.2, P(NP → NN) = 0.5, P(NP → NP PP) = 0.3

P(NP → DT NN) = 0.2 means that 20% of the NP expansions in the training-data parses use the rule NP → DT NN.

Page 4:

Probabilistic Context Free Grammars

S   → NP VP     1.0
NP  → DT NN     0.5
NP  → NNS       0.3
NP  → NP PP     0.2
PP  → P NP      1.0
VP  → VP PP     0.6
VP  → VBD NP    0.4

DT  → the       1.0
NN  → gunman    0.5
NN  → building  0.5
VBD → sprayed   1.0
NNS → bullets   1.0
P   → with      1.0
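As a quick illustration (not part of the original slides), this grammar can be written down as plain Python data and checked against the constraint from the previous slide that every non-terminal's rule probabilities sum to 1; the dict layout below is just one possible representation:

```python
# Illustrative sketch: the example PCFG as a dict mapping (LHS, RHS) -> probability.
from collections import defaultdict

rules = {
    ("S",   ("NP", "VP")):  1.0,
    ("NP",  ("DT", "NN")):  0.5,
    ("NP",  ("NNS",)):      0.3,
    ("NP",  ("NP", "PP")):  0.2,
    ("PP",  ("P", "NP")):   1.0,
    ("VP",  ("VP", "PP")):  0.6,
    ("VP",  ("VBD", "NP")): 0.4,
    ("DT",  ("the",)):      1.0,
    ("NN",  ("gunman",)):   0.5,
    ("NN",  ("building",)): 0.5,
    ("VBD", ("sprayed",)):  1.0,
    ("NNS", ("bullets",)):  1.0,
    ("P",   ("with",)):     1.0,
}

# Constraint from the previous slide: for every non-terminal N^i, sum_j P(N^i -> zeta^j) = 1.
totals = defaultdict(float)
for (lhs, _), p in rules.items():
    totals[lhs] += p
for lhs, total in sorted(totals.items()):
    print(lhs, total)          # each total should be 1.0
```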

Page 5:

Example Parse t1

The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.6 (VP0.4 (VBD1.0 sprayed)
                    (NP0.5 (DT1.0 the) (NN0.5 building)))
             (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets)))))

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Page 6:

Another Parse t2

The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.4 (VBD1.0 sprayed)
             (NP0.2 (NP0.5 (DT1.0 the) (NN0.5 building))
                    (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets))))))

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
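A small sketch (my own code, not from the slides) of how P(t1) and P(t2) are obtained: the probability of a parse tree is simply the product of the probabilities of all rules used in it.

```python
from math import prod

# Rule probabilities used in t1 (PP attached to the VP).
t1 = [1.0,              # S  -> NP VP
      0.5, 1.0, 0.5,    # NP -> DT NN, DT -> the, NN -> gunman
      0.6,              # VP -> VP PP
      0.4, 1.0,         # VP -> VBD NP, VBD -> sprayed
      0.5, 1.0, 0.5,    # NP -> DT NN, DT -> the, NN -> building
      1.0, 1.0,         # PP -> P NP, P -> with
      0.3, 1.0]         # NP -> NNS, NNS -> bullets

# Rule probabilities used in t2 (PP attached to the object NP).
t2 = [1.0,              # S  -> NP VP
      0.5, 1.0, 0.5,    # NP -> DT NN, DT -> the, NN -> gunman
      0.4, 1.0,         # VP -> VBD NP, VBD -> sprayed
      0.2,              # NP -> NP PP
      0.5, 1.0, 0.5,    # NP -> DT NN, DT -> the, NN -> building
      1.0, 1.0,         # PP -> P NP, P -> with
      0.3, 1.0]         # NP -> NNS, NNS -> bullets

print(prod(t1))   # ~0.0045
print(prod(t2))   # ~0.0015
```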

Page 7:

Probability of a sentence

Notation:
• w_ab – the subsequence wa ... wb
• N^j_ab – N^j dominates wa ... wb, i.e., yield(N^j) = wa ... wb
  (e.g., an NP dominating "the sweet teddy bear")

• Probability of a sentence = P(w_1m)

    P(w_1m) = Σ_t P(w_1m, t)
            = Σ_t P(t) * P(w_1m | t)
            = Σ_{t : yield(t) = w_1m} P(t)

where t is a parse tree of the sentence, and P(w_1m | t) = 1 if t is a parse tree for the sentence w_1m.
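For the running example, where t1 and t2 (pages 5 and 6) are the two parses shown, the sum collapses to two terms; a trivial check, assuming the grammar admits no other parse of this sentence:

```python
p_t1, p_t2 = 0.0045, 0.0015   # parse probabilities from the two trees above
print(p_t1 + p_t2)            # P(w_1m) = 0.006
```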

Page 8:

Assumptions of the PCFG model

• Place invariance: P(NP → DT NN) is the same at locations 1 and 2.
• Context-free: P(NP → DT NN | anything outside "The child") = P(NP → DT NN).
• Ancestor-free: at location 2, P(NP → DT NN | its ancestor is VP) = P(NP → DT NN).

(Reference tree: (S (NP The child)[1] (VP ... (NP The toy)[2])))

Page 9:

Probability of a parse tree

Domination: we say N^j dominates from k to l, symbolized as N^j_{k,l}, if w_{k,l} is derived from N^j.

P(tree | sentence) = P(tree | S_{1,l}), where S_{1,l} means that the start symbol S dominates the word sequence w_{1,l}.

P(t | s) approximately equals the joint probability of the constituent non-terminals dominating the sentence fragments (next slide).

Page 10:

Probability of a parse tree (contd.)

(Tree: (S_{1,l} (NP_{1,2} (DT_1 w1) (N_2 w2))
                (VP_{3,l} (V_{3,3} w3)
                          (PP_{4,l} (P_{4,4} w4) (NP_{5,l} w5 ... wl)))))

P(t | s) = P(t | S_{1,l})
         = P(NP_{1,2}, DT_{1,1}, w1, N_{2,2}, w2, VP_{3,l}, V_{3,3}, w3, PP_{4,l}, P_{4,4}, w4, NP_{5,l}, w5...l | S_{1,l})
         = P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w1 | DT_{1,1}) * P(w2 | N_{2,2})
           * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w4 | P_{4,4})
           * P(w5...l | NP_{5,l})

(using the chain rule, context-freeness and ancestor-freeness)

Page 11:

HMM ↔ PCFG

O   observed sequence  ↔  w_1m  sentence
X   state sequence     ↔  t     parse tree
μ   model              ↔  G     grammar

Three fundamental questions

Page 12:

HMM ↔ PCFG

• How likely is a certain observation given the model? ↔ How likely is a sentence given the grammar?

    P(O | μ) ↔ P(w_1m | G)

• How to choose a state sequence which best explains the observations? ↔ How to choose a parse which best supports the sentence?

    argmax_X P(X | O, μ) ↔ argmax_t P(t | w_1m, G)

Page 13:

HMM ↔ PCFG

• How to choose the model parameters that best explain the observed data? ↔ How to choose rule probabilities which maximize the probabilities of the observed sentences?

    argmax_μ P(O | μ) ↔ argmax_G P(w_1m | G)

Page 14:

Recap of HMM

Page 15:

HMM Definition

• Set of states: S, where |S| = N
• Start state S0 /* P(S0) = 1 */
• Output alphabet: O, where |O| = M
• Transition probabilities: A = {a_ij} /* state i to state j */
• Emission probabilities: B = {b_j(o_k)} /* prob. of emitting or absorbing o_k from state j */
• Initial state probabilities: Π = {p1, p2, p3, ..., pN}, where each p_i = P(o0 = ε, S_i | S0)

Page 16:

Markov Processes

Properties Limited Horizon: Given previous t

states, a state i, is independent of preceding 0 to t-k+1 states.

P(Xt=i|Xt-1, Xt-2 ,… X0) = P(Xt=i|Xt-1, Xt-2… Xt-

k) Order k Markov process

Time invariance: (shown for k=1) P(Xt=i|Xt-1=j) = P(X1=i|X0=j) …= P(Xn=i|

Xn-1=j)

Page 17:

Three basic problems (contd.)

• Problem 1: Likelihood of a sequence – Forward procedure, Backward procedure
• Problem 2: Best state sequence – Viterbi algorithm
• Problem 3: Re-estimation – Baum-Welch (Forward-Backward algorithm)

Page 18:

Probabilistic Inference

O: observation sequence; S: state sequence.

Given O, find S* = argmax_S P(S | O). This is called probabilistic inference.

Infer the "hidden" from the "observed". How is this inference different from logical inference based on propositional or predicate calculus?

Page 19:

Essentials of Hidden Markov Model

1. Markov + Naive Bayes
2. Uses both transition and observation probability
3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probability:

    P(S_k --O_k--> S_{k+1}) = P(O_k | S_k) * P(S_{k+1} | S_k)

Page 20:

Probability of Observation Sequence

Without any restriction, the search space size is |S|^|O|.

    P(O) = Σ_S P(O, S)
         = Σ_S P(S) * P(O | S)

Page 21:

Continuing with the Urn example

Colored ball choosing:

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

Page 22:

Example (contd.)

Given the transition probabilities and the observation/output probabilities:

Transition probability:
         U1    U2    U3
  U1    0.1   0.4   0.5
  U2    0.6   0.2   0.2
  U3    0.3   0.4   0.3

Observation/output probability:
          R     G     B
  U1    0.3   0.5   0.2
  U2    0.1   0.4   0.5
  U3    0.6   0.1   0.3

Observation: RRGGBRGR
What is the corresponding state sequence?
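As a hedged sketch (not from the slides), the urn HMM's parameters can be laid out as plain arrays; the initial probabilities 0.5/0.3/0.2 are an assumption taken from the start arcs drawn on the Viterbi slide (page 28):

```python
states  = ["U1", "U2", "U3"]
symbols = {"R": 0, "G": 1, "B": 2}

# A[i][j] = P(next state = states[j] | current state = states[i])
A = [[0.1, 0.4, 0.5],
     [0.6, 0.2, 0.2],
     [0.3, 0.4, 0.3]]

# B[i][k] = P(emitting symbol k | state = states[i])
B = [[0.3, 0.5, 0.2],
     [0.1, 0.4, 0.5],
     [0.6, 0.1, 0.3]]

# P(S_i | S_0): assumed initial probabilities, read off the start arcs on page 28.
pi = [0.5, 0.3, 0.2]
```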

Page 23:

Diagrammatic representation (1/2)

(Figure: the three states U1, U2, U3 drawn as a state-transition diagram, with the transition probabilities from the table above on the arcs and each state annotated with its R/G/B output probabilities.)

Page 24:

Diagrammatic representation (2/2)

(Figure: the same three states with transition and output probabilities combined – each arc from Ui to Uj is labelled with P(Uj | Ui) * P(o | Ui) for o ∈ {R, G, B}; e.g., the arc U2 → U1, with transition probability 0.6, carries R: 0.06, G: 0.24, B: 0.30.)

Page 25:

Probabilistic FSM

(Figure: a two-state probabilistic FSM over S1 and S2, with each arc labelled by an output symbol and its probability – S1 → S1: (a1:0.1), (a2:0.2); S1 → S2: (a1:0.3), (a2:0.4); S2 → S1: (a1:0.2), (a2:0.3); S2 → S2: (a1:0.3), (a2:0.2).)

The question here is: "What is the most likely state sequence given the output sequence seen?"

Page 26:

Developing the tree

Start with P(S1) = 1.0 and P(S2) = 0.0. For each observed symbol, every current node branches to S1 and S2 using that symbol's arc probabilities. For a1 (arcs 0.1, 0.3 from S1 and 0.2, 0.3 from S2): 1 * 0.1 = 0.1, 0.3, 0.0, 0.0. For a2 (arcs 0.2, 0.4 from S1 and 0.3, 0.2 from S2): 0.1 * 0.2 = 0.02, 0.1 * 0.4 = 0.04, 0.3 * 0.3 = 0.09, 0.3 * 0.2 = 0.06; and so on.

Choose the winning sequence per state per iteration.

Page 27:

Tree structure (contd.)

Continuing from the winners 0.09 (into S1) and 0.06 (into S2): the next a1 step gives 0.09 * 0.1 = 0.009 and 0.09 * 0.3 = 0.027 from S1, and 0.06 * 0.2 = 0.012 and 0.06 * 0.3 = 0.018 from S2, with winners 0.027 and 0.012. The following a2 step then gives 0.027 * 0.3 = 0.0081, 0.027 * 0.2 = 0.0054, 0.012 * 0.4 = 0.0048 and 0.012 * 0.2 = 0.0024, and so on.

The problem being addressed by this tree is

    S* = argmax_S P(S | a1-a2-a1-a2, μ)

where a1-a2-a1-a2 is the output sequence and μ the model (the machine).

Page 28:

Viterbi Algorithm for the Urn problem (first two symbols)

The start state S0 makes an ε-transition to U1, U2, U3 with probabilities 0.5, 0.3, 0.2. On the first R, each of these three nodes branches to U1, U2, U3 again, multiplying its probability by the arc probability for R (0.03, 0.08, 0.15, 0.06, 0.02, 0.02, 0.18, 0.24, 0.18 on the nine arcs), giving the path probabilities 0.015, 0.04, 0.075, 0.018, 0.006, 0.006, 0.036, 0.048, 0.036.

Per destination state, only the best incoming path is kept: the winner sequences (marked * on the slide) have probabilities 0.036, 0.048 and 0.075.
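Below is a compact Viterbi sketch for the urn HMM (my own code, not the lecture's), following the form P(O_k | S_k) * P(S_{k+1} | S_k) from the "Essentials of HMM" slide: each state emits the current symbol and then transitions, and after the last symbol the machine moves to the final state with probability 1. The initial probabilities are the assumed 0.5/0.3/0.2 start arcs, and exact intermediate values depend on this convention, so treat the output as illustrative:

```python
states  = ["U1", "U2", "U3"]
symbols = {"R": 0, "G": 1, "B": 2}
A  = [[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]]   # transition probabilities
B  = [[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]]   # output probabilities
pi = [0.5, 0.3, 0.2]                                       # assumed initial probabilities

def viterbi(obs):
    n = len(states)
    delta = pi[:]                          # best probability of a path ending in state i
    back  = [[i] for i in range(n)]        # that path's state sequence
    for o in obs[:-1]:                     # emit from the current state, then transition
        k = symbols[o]
        new_delta, new_back = [], []
        for j in range(n):
            scores = [delta[i] * B[i][k] * A[i][j] for i in range(n)]
            best = max(range(n), key=lambda i: scores[i])
            new_delta.append(scores[best])
            new_back.append(back[best] + [j])
        delta, back = new_delta, new_back
    k = symbols[obs[-1]]                   # last symbol: emit, then go to the end state (prob 1)
    final = [delta[i] * B[i][k] for i in range(n)]
    best = max(range(n), key=lambda i: final[i])
    return final[best], [states[i] for i in back[best]]

prob, path = viterbi("RRGGBRGR")
print(prob, path)                          # best path probability and state sequence
```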

Page 29:

Markov process of order > 1 (say 2)

The same theory works:

P(S) * P(O|S) = P(O0|S0) * P(S1|S0) *
                [P(O1|S1) * P(S2|S1S0)] * [P(O2|S2) * P(S3|S2S1)] *
                [P(O3|S3) * P(S4|S3S2)] * [P(O4|S4) * P(S5|S4S3)] *
                [P(O5|S5) * P(S6|S5S4)] * [P(O6|S6) * P(S7|S6S5)] *
                [P(O7|S7) * P(S8|S7S6)] * [P(O8|S8) * P(S9|S8S7)]

We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8S7) = 1. O0 is the ε-transition.

Obs:    ε   R   R   G   G   B   R   G   R        (O0 O1 ... O8)
State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

Page 30:

Adjustments

• The transition probability table will have tuples on rows and states on columns.
• The output probability table will remain the same.
• In the Viterbi tree, the (order-2) Markov process will take effect from the 3rd input symbol (εRR).
• There will be 27 leaves, out of which only 9 will remain.
• Sequences ending in the same tuple will be compared: instead of U1, U2 and U3 we now track U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3.

Page 31:

Forward and Backward Probability Calculation

Page 32:

Forward probability F(k,i)

Define F(k,i) = probability of being in state S_i having seen o0 o1 o2 ... ok:

    F(k,i) = P(o0 o1 o2 ... ok, S_i)

With m as the length of the observed sequence:

    P(observed sequence) = P(o0 o1 o2 ... om)
                         = Σ_{p=0..N} P(o0 o1 o2 ... om, S_p)
                         = Σ_{p=0..N} F(m, p)

Page 33:

Forward probability (contd.)

    F(k, q) = P(o0 o1 o2 ... ok, S_q)
            = P(o0 o1 o2 ... ok-1, ok, S_q)
            = Σ_{p=0..N} P(o0 o1 o2 ... ok-1, S_p, ok, S_q)
            = Σ_{p=0..N} P(o0 o1 o2 ... ok-1, S_p) * P(ok, S_q | o0 o1 o2 ... ok-1, S_p)
            = Σ_{p=0..N} F(k-1, p) * P(ok, S_q | S_p)
            = Σ_{p=0..N} F(k-1, p) * P(S_p --ok--> S_q)

(Lattice: observations O0 O1 O2 ... Ok Ok+1 ... Om over states S0 S1 S2 ... Sp Sq ... Sm, Sfinal.)
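The same recurrence written as a short sketch (my own code, not the lecture's), with the symbol read while leaving the source state and o0 = ε folded into the assumed initial probabilities:

```python
states  = ["U1", "U2", "U3"]
symbols = {"R": 0, "G": 1, "B": 2}
A  = [[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]]   # transition probabilities
B  = [[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]]   # output probabilities
pi = [0.5, 0.3, 0.2]                                       # assumed initial probabilities

def forward(obs):
    n = len(states)
    F = [pi[:]]                             # F[0][i] = P(o0 = epsilon, S_i)
    for o in obs:
        k = symbols[o]
        prev = F[-1]
        # F(k, q) = sum_p F(k-1, p) * P(ok, S_q | S_p)
        F.append([sum(prev[p] * B[p][k] * A[p][q] for p in range(n))
                  for q in range(n)])
    return F

F = forward("RRGGBRGR")
print(sum(F[-1]))                           # P(observed sequence) = sum_p F(m, p)
```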

Page 34:

Backward probability B(k,i)

Define B(k,i) = probability of seeing ok ok+1 ok+2 ... om given that the state was S_i:

    B(k,i) = P(ok ok+1 ok+2 ... om | S_i)

With m as the length of the observed sequence:

    P(observed sequence) = P(o0 o1 o2 ... om)
                         = P(o0 o1 o2 ... om | S0)
                         = B(0, 0)

Page 35:

Backward probability (contd.)

    B(k, p) = P(ok ok+1 ok+2 ... om | S_p)
            = P(ok+1 ok+2 ... om, ok | S_p)
            = Σ_{q=0..N} P(ok+1 ok+2 ... om, ok, S_q | S_p)
            = Σ_{q=0..N} P(ok, S_q | S_p) * P(ok+1 ok+2 ... om | ok, S_q, S_p)
            = Σ_{q=0..N} P(ok+1 ok+2 ... om | S_q) * P(ok, S_q | S_p)
            = Σ_{q=0..N} B(k+1, q) * P(S_p --ok--> S_q)

(Same lattice as before: O0 O1 ... Ok Ok+1 ... Om over S0 S1 ... Sp Sq ... Sm, Sfinal.)
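And a mirror-image backward sketch under the same assumptions (the backward probability is called beta in the code to avoid clashing with the emission matrix B); the value printed at the end should agree with the forward total above:

```python
states  = ["U1", "U2", "U3"]
symbols = {"R": 0, "G": 1, "B": 2}
A  = [[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]]   # transition probabilities
B  = [[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]]   # output probabilities
pi = [0.5, 0.3, 0.2]                                       # assumed initial probabilities

def backward(obs):
    n = len(states)
    beta = [1.0] * n                        # beyond the last symbol
    table = [beta]
    for o in reversed(obs):
        k = symbols[o]
        # beta(k, p) = sum_q beta(k+1, q) * P(ok, S_q | S_p)
        beta = [B[p][k] * sum(A[p][q] * beta[q] for q in range(n))
                for p in range(n)]
        table.append(beta)
    table.reverse()                         # table[0]: beta for the first symbol; table[-1]: all ones
    return table

beta = backward("RRGGBRGR")
# P(observed sequence) = B(0,0) = sum_p P(S_p | S_0) * P(o1 ... om | S_p)
print(sum(pi[p] * beta[0][p] for p in range(len(states))))
```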

Page 36:

Back to PCFG

Page 37:

Interesting Probabilities

The gunman sprayed the building with bullets
    1      2       3     4      5      6      7

• β_NP(4,5): What is the probability of having an NP at this position such that it will derive "the building"? – the inside probability.
• α_NP(4,5): What is the probability of starting from N1 and deriving "The gunman sprayed", an NP and "with bullets"? – the outside probability.

Page 38:

Interesting Probabilities

Random variables to be considered:
• The non-terminal being expanded, e.g., NP.
• The word-span covered by the non-terminal, e.g., (4,5) refers to the words "the building".

While calculating probabilities, consider:
• The rule to be used for expansion, e.g., NP → DT NN.
• The probabilities associated with the RHS non-terminals, e.g., the DT subtree's inside/outside probabilities and the NN subtree's inside/outside probabilities.

Page 39:

Outside probability α_j(p,q): the probability of beginning with N1 and generating the non-terminal N^j_pq and all the words outside wp ... wq:

    α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)

(Figure: the sentence w1 ... wp-1  wp ... wq  wq+1 ... wm under N1, with N^j spanning wp ... wq.)

Page 40:

Inside Probabilities

β_j(p,q): the probability of generating the words wp ... wq starting with the non-terminal N^j_pq:

    β_j(p,q) = P(w_pq | N^j_{pq}, G)

(Same figure as above: N^j spans wp ... wq inside the sentence under N1.)

Page 41:

Outside & Inside Probabilities: example

The gunman sprayed the building with bullets
    1      2       3     4      5      6      7

    α_NP(4,5) for "the building" = P(The gunman sprayed, NP_{4,5}, with bullets | G)
    β_NP(4,5) for "the building" = P(the building | NP_{4,5}, G)

Page 42:

Calculating Inside Probabilities β_j(p,q)

Base case:

    β_j(k,k) = P(w_k | N^j_{kk}, G) = P(N^j → w_k | G)

The base case is used for rules which derive the words (terminals) directly. E.g., suppose N^j = NN is being considered and NN → building is one of the rules, with probability 0.5:

    β_NN(5,5) = P(building | NN_{5,5}, G) = P(NN → building | G) = 0.5

Page 43:

Induction Step: Assuming Grammar in Chomsky Normal Form

Induction step:

    β_j(p,q) = P(w_pq | N^j_{pq}, G)
             = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) * β_r(p,d) * β_s(d+1,q)

(Figure: N^j spans wp ... wq and splits into N^r over wp ... wd and N^s over wd+1 ... wq.)

• Consider the different splits of the words, indicated by d – e.g., for "the huge building", split after d = 2 or d = 3.
• Consider the different non-terminals that can be used in the rule: NP → DT NN and NP → DT NNS are available options.
• Sum over all of these.
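A bottom-up (CYK-style) sketch of this inside computation for the running example (my own code, not the lecture's). The example grammar is in CNF apart from the unary rule NP → NNS, which gets one extra unary pass per cell:

```python
from collections import defaultdict

lexical = {                       # preterminal -> word rules
    ("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
    ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0, ("P", "with"): 1.0,
}
binary = {                        # N^j -> N^r N^s rules
    ("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.5, ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0, ("VP", "VP", "PP"): 0.6, ("VP", "VBD", "NP"): 0.4,
}
unary = {("NP", "NNS"): 0.3}      # the one non-CNF rule

sentence = "The gunman sprayed the building with bullets".lower().split()
m = len(sentence)
beta = defaultdict(float)         # beta[(j, p, q)] with 1-based word positions

def apply_unary(p, q):
    for (lhs, rhs), prob in unary.items():
        beta[(lhs, p, q)] += prob * beta[(rhs, p, q)]

# Base case: beta_j(k,k) = P(N^j -> w_k)
for k, word in enumerate(sentence, start=1):
    for (pre, w), prob in lexical.items():
        if w == word:
            beta[(pre, k, k)] = prob
    apply_unary(k, k)

# Induction: beta_j(p,q) = sum_{r,s} sum_d P(N^j -> N^r N^s) * beta_r(p,d) * beta_s(d+1,q)
for span in range(2, m + 1):
    for p in range(1, m - span + 2):
        q = p + span - 1
        for d in range(p, q):
            for (j, r, s), prob in binary.items():
                beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
        apply_unary(p, q)

print(beta[("NP", 1, 2)])   # ~0.25, as in the parse triangle below
print(beta[("S", 1, 7)])    # P(w_1m | G) = P(t1) + P(t2), ~0.006
```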

Page 44:

The Bottom-Up Approach

The idea of induction: consider "the gunman".

Base cases: apply unary (lexical) rules
    DT → the        Prob = 1.0
    NN → gunman     Prob = 0.5

Induction: probability that an NP covers these 2 words
    = P(NP → DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman")
    = 0.5 * 1.0 * 0.5 = 0.25

(Tree: (NP0.5 (DT1.0 The) (NN0.5 gunman)))

Page 45:

Parse Triangle

• A parse triangle is constructed for calculating β_j(p,q).
• Probability of a sentence using β_j(p,q):

    P(w_1m | G) = P(N^1 ⇒ w_1m | G) = P(w_1m | N^1_{1m}, G) = β_1(1, m)

Page 46:

Parse Triangle

Words: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

• Fill the diagonal with β_j(k,k):
  β_DT(1,1) = 1.0, β_NN(2,2) = 0.5, β_VBD(3,3) = 1.0, β_DT(4,4) = 1.0, β_NN(5,5) = 0.5, β_P(6,6) = 1.0, β_NNS(7,7) = 1.0

Page 47:

Parse Triangle (contd.)

Words: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

• Calculate the remaining cells using the induction formula, e.g.:

    β_NP(1,2) = P(the gunman | NP_{1,2}, G)
              = P(NP → DT NN) * β_DT(1,1) * β_NN(2,2)
              = 0.5 * 1.0 * 0.5 = 0.25

The cell (1,2) now holds 0.25 NP, alongside the diagonal entries above.

Page 48:

Example Parse t1

The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.6 (VP0.4 (VBD1.0 sprayed)
                    (NP0.5 (DT1.0 the) (NN0.5 building)))
             (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets)))))

The rule used here (for the top VP) is VP → VP PP.

Page 49:

Another Parse t2

The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.4 (VBD1.0 sprayed)
             (NP0.2 (NP0.5 (DT1.0 the) (NN0.5 building))
                    (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets))))))

The rule used here (for the VP) is VP → VBD NP.

Page 50:

Parse Triangle (contd.)

Words: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Filled cells (besides the diagonal): β_NP(1,2) = 0.25, β_NP(4,5) = 0.25, β_PP(6,7) = 0.3, β_VP(3,5) = 0.1, β_NP(4,7) = 0.015, β_VP(3,7) = 0.024, β_S(1,7) = 0.006.

    β_VP(3,7) = P(sprayed the building with bullets | VP_{3,7}, G)
              = P(VP → VP PP) * β_VP(3,5) * β_PP(6,7) + P(VP → VBD NP) * β_VBD(3,3) * β_NP(4,7)
              = 0.6 * 0.1 * 0.3 + 0.4 * 1.0 * 0.015 = 0.024

Page 51:

Different Parses

Consider:
• different splitting points, e.g., the 5th and the 3rd position;
• different rules for VP expansion, e.g., VP → VP PP and VP → VBD NP.

Different parses for the VP "sprayed the building with bullets" can be constructed this way.

Page 52:

Outside Probabilities α_j(p,q)

Base case:

    α_1(1, m) = 1    (for the start symbol N^1)
    α_j(1, m) = 0    for j ≠ 1

Inductive step for calculating α_j(p,q) (shown for the configuration in which N^j is the left child):

    α_j(p,q) = Σ_{f,g,e} α_f(p,e) * P(N^f → N^j N^g) * β_g(q+1, e)

(Figure: inside the sentence w1 ... wm under N1, a node N^f spans wp ... we and expands into N^j_{pq} over wp ... wq and N^g_{(q+1)e} over wq+1 ... we.)

Page 53:

Probability of a Sentence

• The joint probability of a sentence w_1m and of there being a constituent spanning words wp to wq is:

    P(w_1m, N_pq | G) = Σ_j P(w_1m, N^j_{pq} | G) = Σ_j α_j(p,q) * β_j(p,q)

The gunman sprayed the building with bullets
    1      2       3     4      5      6      7

    P(The gunman ... bullets, N_{4,5} | G) = Σ_j α_j(4,5) * β_j(4,5)
                                           = α_NP(4,5) * β_NP(4,5) + α_VP(4,5) * β_VP(4,5) + ...