Probabilistic and Lexicalized Parsing CS 4705


Dec 15, 2015



Posted by Devan Blackaby
Transcript
Page 1: Probabilistic and Lexicalized Parsing CS 4705. Probabilistic CFGs: PCFGs Weighted CFGs –Attach weights to rules of CFG –Compute weights of derivations.

Probabilistic and Lexicalized Parsing

CS 4705

Page 2

Probabilistic CFGs: PCFGs

• Weighted CFGs
  – Attach weights to rules of the CFG
  – Compute weights of derivations
  – Use weights to choose preferred parses

• Utility: pruning and ordering the search space, disambiguation, and language modeling for ASR

• Parsing with weighted grammars: find the parse T′ that maximizes the weight over all possible parses of S

• T′(S) = argmax_{T∈τ(S)} W(T, S)

• Probabilistic CFGs are one form of weighted CFGs

Page 3

Rule Probability

• Attach probabilities to grammar rules
• Expansions for a given non-terminal sum to 1

R1: VP → V          .55
R2: VP → V NP       .40
R3: VP → V NP NP    .05

• Estimate probabilities from annotated corpora
  – E.g., the Penn Treebank
  – P(R1) = count(R1) / count(VP)
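The estimation step can be sketched in Python; the mini-treebank below is hypothetical, and each rule is represented as an (LHS, RHS) pair:

```python
from collections import Counter

# Hypothetical mini-treebank: every parsed tree is flattened into the list
# of rules it used, written as (LHS, RHS) pairs.
treebank_rules = [
    ("VP", ("V",)), ("VP", ("V", "NP")), ("VP", ("V",)),
    ("VP", ("V", "NP")), ("VP", ("V",)), ("NP", ("Det", "N")),
]

rule_counts = Counter(treebank_rules)
lhs_counts = Counter(lhs for lhs, _ in treebank_rules)

# P(rule) = count(rule) / count(LHS), so expansions of each LHS sum to 1.
rule_prob = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

print(rule_prob[("VP", ("V",))])       # 3 of 5 VP expansions -> 0.6
print(rule_prob[("VP", ("V", "NP"))])  # 2 of 5 VP expansions -> 0.4
```

Dividing by the LHS count rather than the total rule count is what makes each non-terminal's expansions a proper probability distribution.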

Page 4

Derivation Probability

• For a derivation T = {R1 … Rn}:
  – Probability of the derivation: the product of the probabilities of the rules expanded in the tree
  – Most likely parse: the derivation with the highest probability
  – Probability of a sentence: the sum over all possible derivations for the sentence
• Note the independence assumption: a rule's probability does not change based on where in the tree it is expanded.

T* = argmax_{T∈τ(S)} P(T)

P(T) = ∏_{i=1}^{n} P(R_i)

P(S) = Σ_{T∈τ(S)} P(T, S)
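A minimal sketch of these formulas, with made-up rule probabilities:

```python
import math

# Hypothetical rule probabilities (illustrative values, not from the slides).
rule_prob = {"S -> NP VP": 1.0, "NP -> John": 0.2, "VP -> V NP": 0.40,
             "V -> called": 0.5, "NP -> Mary": 0.2}

# P(T) is the product of the probabilities of the rules used in T.
derivation = ["S -> NP VP", "NP -> John", "VP -> V NP",
              "V -> called", "NP -> Mary"]
p_tree = math.prod(rule_prob[r] for r in derivation)
# 1.0 * 0.2 * 0.40 * 0.5 * 0.2 = 0.008

# P(S) sums P(T) over all derivations of the sentence; with a single parse
# it equals p_tree. In practice products are computed in log space to
# avoid underflow on long derivations.
log_p = sum(math.log(rule_prob[r]) for r in derivation)
```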

Page 5

One Approach: CYK Parser

• Bottom-up parsing via dynamic programming
  – Assign probabilities to constituents as they are completed and placed in a table
  – Use the maximum probability for each constituent type going up the tree to S
• The intuition:
  – We know the probabilities of constituents lower in the tree, so as we construct higher-level constituents we do not need to recompute them

Page 6

CYK (Cocke-Younger-Kasami) Parser

• Bottom-up parser with top-down filtering
• Uses dynamic programming to store intermediate results (cf. the Earley algorithm for the top-down case)
• Input: PCFG in Chomsky Normal Form
  – Rules of the form A → w or A → BC; no ε
• Chart: array [i, j, A] holding the probability that non-terminal A spans input positions i through j
  – Start state(s): (i, i+1, A) for each A → w_{i+1}
  – End state: (1, n, S), where n is the input size
  – Next-state rule: (i, k, B), (k, j, C) ⇒ (i, j, A) if A → BC
• Maintain back-pointers to recover the parse
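The procedure above can be sketched as follows. The grammar is the ambiguity grammar from a later slide; the probabilities are illustrative, made-up values:

```python
from collections import defaultdict

# A CNF PCFG; rule probabilities here are assumed for illustration.
binary = {  # (A, B, C): P(A -> B C)
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
lexical = {  # (A, w): P(A -> w)
    ("NP", "John"): 0.3, ("NP", "Mary"): 0.3, ("NP", "Denver"): 0.2,
    ("V", "called"): 1.0, ("P", "from"): 1.0,
}

def cky(words):
    n = len(words)
    # chart[i][j][A] = best probability that A spans words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    back = {}
    for i, w in enumerate(words):                      # base case: A -> w
        for (A, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][A] = p
    for span in range(2, n + 1):                       # recursive case: A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    cand = p * chart[i][k][B] * chart[k][j][C]
                    if cand > chart[i][j][A]:          # keep the max only
                        chart[i][j][A] = cand
                        back[(i, j, A)] = (k, B, C)    # back-pointer
    return chart[0][n]["S"], back

prob, back = cky("John called Mary from Denver".split())
```

With these weights, VP → VP PP outscores the VP → V NP analysis for the span "called Mary from Denver", so the back-pointers recover the VP-attachment parse.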

Page 7

Structural Ambiguity

• S → NP VP
• VP → V NP
• NP → NP PP
• VP → VP PP
• PP → P NP

• NP → John | Mary | Denver
• V → called
• P → from

John called Mary from Denver

[Figure: the two parse trees for "John called Mary from Denver": one attaches the PP to the VP (VP → VP PP), the other attaches it to the NP (NP → NP PP).]

Page 8

Example

John called Mary from Denver

Page 9

Base Case: A → w

[Chart figure: the lexical cells along the diagonal are filled from the grammar: NP(John), V(called), NP(Mary), P(from), NP(Denver).]

Pages 10-22

Recursive Cases: A → BC

[Chart figures: the table is filled span by span. VP(called Mary) and PP(from Denver) are built first, then NP(Mary from Denver) and S(John called Mary). The cell spanning "called Mary from Denver" ends up with two competing analyses, VP1 (VP → VP PP) and VP2 (VP → V NP), and the final S over the whole sentence is built from the higher-probability one; X marks span/category combinations that yield no constituent.]
Page 23

Problems with PCFGs

• The probability model is based only on the rules in the derivation.
• Lexical insensitivity:
  – Doesn't use words in any real way
  – But structural disambiguation is lexically driven
    • PP attachment often depends on the verb, its object, and the preposition
    • I ate pickles with a fork.
    • I ate pickles with relish.
• Context insensitivity of the derivation:
  – Doesn't take into account where in the derivation a rule is used
    • Pronouns are more often subjects than objects:
    • She hates Mary.
    • Mary hates her.
• Solution: lexicalization
  – Add lexical information to each rule
  – I.e., condition the rule probabilities on the actual words

Page 24

An example: Phrasal Heads

• Phrasal heads can 'take the place of' whole phrases, defining the most important characteristics of the phrase
• Phrases are generally identified by their heads
  – The head of an NP is a noun, of a VP the main verb, of a PP the preposition
• Each PCFG rule's LHS shares a lexical item with a non-terminal in its RHS

Page 25

Increase in Size of Rule Set in Lexicalized CFG

• If R is the set of binary branching rules in the CFG and ∑ is the lexicon, the lexicalized rule set is O(2·|∑|·|R|)

• For unary rules: O(|∑|·|R|)
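A quick back-of-the-envelope check under assumed sizes (a 40,000-word lexicon with 500 binary and 200 unary rules; these numbers are not from the slides):

```python
# Illustrative sizes, assumed for this estimate.
lexicon = 40_000
binary_rules, unary_rules = 500, 200

# Each binary rule A -> B C lexicalizes in two ways (the head can come from
# B or from C), for any head word: O(2 * |Sigma| * |R|).
binary_lexicalized = 2 * lexicon * binary_rules

# A unary rule has only one possible head child: O(|Sigma| * |R|).
unary_lexicalized = lexicon * unary_rules

print(binary_lexicalized)  # 40,000,000 lexicalized binary rules
print(unary_lexicalized)   # 8,000,000 lexicalized unary rules
```

The blow-up is why lexicalized probabilities cannot simply be read off treebank counts rule by rule.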

Page 26

Example (correct parse)


Page 27

Example (less preferred)

Page 28

Computing Lexicalized Rule Probabilities

• We started with rule probabilities as before:
  – VP → V NP PP    P(rule | VP)
    • E.g., the count of this rule divided by the number of VPs in a treebank
• Now we want lexicalized probabilities:
  – VP(dumped) → V(dumped) NP(sacks) PP(into)
    • I.e., P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ into is the head of the PP)
  – Not likely to have significant counts in any treebank

Page 29

Exploit the Data You Have

• So, exploit the independence assumption and collect the statistics you can…
• Focus on capturing:
  – Verb subcategorization
    • Particular verbs have affinities for particular VPs
  – Objects' affinities for their predicates
    • Mostly their mothers and grandmothers
    • Some objects fit better with some predicates than others

Page 30

Verb Subcategorization

• Condition particular VP rules on their heads
  – E.g., for a rule r: VP → V NP PP
    • P(r | VP) becomes P(r ∧ V = dumped | VP ∧ dumped)
  – How do you get the probability?
    • The number of times rule r was used with dumped, divided by the total number of VPs that dumped appears in
    • How predictive of r is the verb dumped?
  – Captures the affinity between VP heads (verbs) and VP rules
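This estimate can be sketched with hypothetical head-annotated counts:

```python
from collections import Counter

# Hypothetical observations extracted from a treebank: for every VP, the
# rule that expanded it and the verb that heads it.
vp_observations = [
    ("VP -> V NP PP", "dumped"), ("VP -> V NP PP", "dumped"),
    ("VP -> V NP", "dumped"), ("VP -> V NP", "ate"), ("VP -> V", "ate"),
]

rule_with_head = Counter(vp_observations)
head_counts = Counter(head for _, head in vp_observations)

def p_rule_given_verb(rule, verb):
    # count(r used with verb) / count(VPs headed by verb)
    return rule_with_head[(rule, verb)] / head_counts[verb]

print(p_rule_given_verb("VP -> V NP PP", "dumped"))  # 2 of 3 -> 0.666...
print(p_rule_given_verb("VP -> V NP", "ate"))        # 1 of 2 -> 0.5
```

Conditioning on the head verb rather than on the bare category VP is exactly what lets dumped prefer the V NP PP expansion here.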

Page 31

Example (correct parse)

Page 32

Example (less preferred)

Page 33

Affinity of Phrasal Heads for Other Heads: PP Attachment

• Verbs with preps vs. Nouns with preps• E.g. dumped with into vs. sacks with into

– How often is dumped the head of a VP which includes a PP daughter with into as its head relative to other PP heads or… what’s P(into|PP,dumped is mother VP’s head))

– Vs…how often is sacks the head of an NP with a PP daughter whose head is into relative to other PP heads or… P(into|PP,sacks is mother’s head))
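A sketch of the comparison, with made-up counts of PP heads observed under each mother head:

```python
from collections import Counter

# Hypothetical counts of PP heads seen under VPs headed by "dumped" and
# under NPs headed by "sacks" (illustrative numbers only).
pp_heads_under = {
    ("VP", "dumped"): Counter({"into": 6, "with": 2, "in": 2}),
    ("NP", "sacks"): Counter({"of": 7, "into": 1, "with": 2}),
}

def p_prep(prep, mother_cat, mother_head):
    # P(prep | PP, mother's category and head): relative frequency of this
    # preposition among all PP daughters of that mother.
    counts = pp_heads_under[(mother_cat, mother_head)]
    return counts[prep] / sum(counts.values())

print(p_prep("into", "VP", "dumped"))  # 6 of 10 -> 0.6
print(p_prep("into", "NP", "sacks"))   # 1 of 10 -> 0.1
```

Under these counts the parser would prefer attaching the into-PP to the VP headed by dumped rather than to the NP headed by sacks.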

Page 34

But Other Relationships do Not Involve Heads (Hindle & Rooth ’91)

• Affinity of gusto for eat is greater than for spaghetti; and affinity of marinara for spaghetti is greater than for ate

[Figure: two attachment structures: "Ate spaghetti with gusto" with PP(with) attached under VP(ate), and "Ate spaghetti with marinara" with PP(with) attached under NP(spaghetti).]

Page 35

Log-linear models for Parsing

• Why restrict to the conditioning to the elements of a rule?– Use even larger context…word sequence, word

types, sub-tree context etc.

• Compute P(y|x); where fi(x,y) tests properties of context and i is weight of feature

• Use as scores in CKY algorithm to find best parse

P(y|x) = exp(Σ_i λ_i · f_i(x, y)) / Σ_{y′∈Y} exp(Σ_i λ_i · f_i(x, y′))
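A toy instance of this model; the candidate set, features, and weights below are all hypothetical:

```python
import math

# Hand-set feature weights (lambda_i); in practice these are learned.
weights = [1.5, -0.8]

def features(x, y):
    # Hypothetical indicator features: does candidate y attach the PP to
    # the VP, or to the NP?
    return [1.0 if y == "vp-attach" else 0.0,
            1.0 if y == "np-attach" else 0.0]

def p_y_given_x(x, y, candidates):
    def score(cand):
        # Sum_i lambda_i * f_i(x, cand)
        return sum(w * f for w, f in zip(weights, features(x, cand)))
    z = sum(math.exp(score(c)) for c in candidates)  # normalizer over Y
    return math.exp(score(y)) / z

cands = ["vp-attach", "np-attach"]
p = p_y_given_x("ate spaghetti with gusto", "vp-attach", cands)
print(p)  # exp(1.5) / (exp(1.5) + exp(-0.8))
```

The exponential-and-normalize form guarantees the scores over all candidates form a probability distribution, whatever features are plugged in.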

Page 36

Supertagging: Almost parsing

Poachers now control the underground trade

[Figure: the supertag lattice for the sentence: each word is paired with its set of candidate elementary trees, several noun/NP and sentential trees for "poachers" and "trade", adjunct trees anchoring "now" at S or VP, transitive S trees anchoring "control", a determiner tree for "the", and adjective trees for "underground". Choosing one tree per word is "almost parsing".]

Page 37

Summary

• Parsing context-free grammars
  – Top-down and bottom-up parsers
  – Mixed approaches (CKY, Earley parsers)
• Preferences over parses using probabilities
  – Parsing with PCFGs and the probabilistic CKY algorithm
• Enriching the probability model
  – Lexicalization
  – Log-linear models for parsing
  – Supertagging