Statistical Parsing and CKY Algorithm
Many slides from Ray Mooney and Michael Collins
Instructor: Wei Xu, Ohio State University
TA Office Hours for HW#2
• Dreese 390:
 - 03/28 Tue 10:00AM-12:00 noon
 - 03/30 Thu 10:00AM-12:00 noon
 - 04/04 Tue 10:00AM-12:00 noon
• Readings:
 - textbook: http://ciml.info/dl/v0_99/ciml-v0_99-ch17.pdf
 - slides #28, 29, 30: https://cocoxu.github.io/courses/5525_slides_spring17/15_more_memm.pdf
Wuwei Lan
Syntactic Parsing
Syntax
Parsing
• Given a string of terminals and a CFG, determine if the string can be generated by the CFG:
 - also return a parse tree for the string
 - also return all possible parse trees for the string
• Must search space of derivations for one that derives the given string.
 - Top-Down Parsing
 - Bottom-Up Parsing
Simple CFG for ATIS English
Grammar
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP
Lexicon
Det → the | a | that | this
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Aux → does
Prep → from | to | on | near | through
Parsing Example
book that flight
[S [VP [Verb book] [NP [Det that] [Nominal [Noun flight]]]]]
Top Down Parsing
• Start searching space of derivations for the start symbol.
• Search trace for "book that flight" (X marks a failed prediction):
 - S → NP VP, NP → Pronoun: Pronoun does not match "book" X
 - S → NP VP, NP → ProperNoun: ProperNoun does not match "book" X
 - S → NP VP, NP → Det Nominal: Det does not match "book" X
 - S → Aux NP VP: Aux does not match "book" X
 - S → VP, VP → Verb: Verb matches "book", but "that" is left unconsumed X
 - S → VP, VP → Verb NP, NP → Pronoun: Verb matches "book"; Pronoun does not match "that" X
 - S → VP, VP → Verb NP, NP → ProperNoun: ProperNoun does not match "that" X
 - S → VP, VP → Verb NP, NP → Det Nominal, Nominal → Noun: Det matches "that", Noun matches "flight"; parse found
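The top-down search can be sketched as a small backtracking parser over the ATIS toy grammar. This is a minimal illustration, not the slides' own code: the names `GRAMMAR`, `LEXICON`, `expand`, and `top_down_parses` are hypothetical, and the length-based pruning is an added assumption that keeps the left-recursive rules (Nominal → Nominal Noun, VP → VP PP) from looping forever.

```python
# ATIS toy grammar and lexicon from the slides, as plain dicts.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"], ["VP", "PP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "Det": {"the", "a", "that", "this"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Pronoun": {"I", "he", "she", "me"},
    "Proper-Noun": {"Houston", "NWA"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on", "near", "through"},
}


def expand(symbols, words):
    """Yield one list of trees per way `symbols` can derive exactly `words`."""
    if not symbols:
        if not words:
            yield []
        return
    # Assumption: no rule derives the empty string, so every symbol needs at
    # least one word. A prediction longer than the remaining input is pruned,
    # which also bounds the expansion of left-recursive rules.
    if len(symbols) > len(words):
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        # Part-of-speech symbol: it must match the next input word.
        if words[0] in LEXICON[first]:
            for trees in expand(rest, words[1:]):
                yield [(first, words[0])] + trees
        return
    # Non-terminal: predict each of its rules in turn (backtracking search).
    for rhs in GRAMMAR[first]:
        for trees in expand(rhs + rest, words):
            children, siblings = trees[:len(rhs)], trees[len(rhs):]
            yield [(first, *children)] + siblings


def top_down_parses(sentence):
    """Return all parse trees rooted in S, as nested tuples."""
    return [trees[0] for trees in expand(["S"], sentence.split())]


for tree in top_down_parses("book that flight"):
    print(tree)
```

Trees are nested tuples of the form `(label, child, ...)`. The rules are tried in the order listed in the grammar, so the failed predictions happen in the same order as in the search frames.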
Bottom Up Parsing
• Start searching space of reverse derivations from the terminal symbols in the string.
• Search trace for "book that flight" (X marks a dead end):
 - book → Noun, Noun → Nominal
 - Nominal → Nominal Noun: "that" is not a Noun X
 - Nominal → Nominal PP: needs a PP after "book"; continue with that → Det, flight → Noun, Noun → Nominal, NP → Det Nominal
 - S → NP VP starting from the NP "that flight": no input is left for a VP X
 - Nominal → Nominal PP: "that flight" is an NP, not a PP X
 - backtrack: book → Verb
 - Verb → VP, VP → S: S covers only "book" X
 - VP → VP PP: "that flight" is an NP, not a PP X
 - VP → Verb NP over "book" and "that flight", then S → VP; parse found
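The bottom-up search can be sketched as an exhaustive search over reductions: starting from the words, replace any substring that matches a rule's right-hand side with its left-hand side until only S remains. This is an illustrative recognizer under stated assumptions, not the slides' own procedure: `RULES`, `LEXICON`, and `bottom_up_recognize` are hypothetical names, and the `seen` set is an added assumption to avoid revisiting states.

```python
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Pronoun",)), ("NP", ("Proper-Noun",)), ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)), ("Nominal", ("Nominal", "Noun")),
    ("Nominal", ("Nominal", "PP")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")), ("VP", ("VP", "PP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "Det": "the a that this",
    "Noun": "book flight meal money",
    "Verb": "book include prefer",
    "Pronoun": "I he she me",
    "Proper-Noun": "Houston NWA",
    "Aux": "does",
    "Prep": "from to on near through",
}
# Lexical rules such as Noun -> book are reductions too.
RULES += [(tag, (word,)) for tag, words in LEXICON.items() for word in words.split()]


def bottom_up_recognize(sentence, start="S"):
    """Return True if some sequence of reductions turns the words into `start`."""
    initial = tuple(sentence.split())
    agenda, seen = [initial], {initial}
    while agenda:
        state = agenda.pop()  # depth-first: most recent state first
        if state == (start,):
            return True
        for lhs, rhs in RULES:
            width = len(rhs)
            for i in range(len(state) - width + 1):
                if state[i:i + width] == rhs:
                    # Reduce: replace the matched right-hand side by its left-hand side.
                    reduced = state[:i] + (lhs,) + state[i + width:]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False


print(bottom_up_recognize("book that flight"))
print(bottom_up_recognize("that flight"))
```

States never grow longer than the input, so the search terminates. It still explores reductions that can never lead to a full parse, such as treating "book" as a Noun.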
Top Down vs. Bottom Up
• Top down never explores options that will not lead to a full parse, but can explore many options that never connect to the actual sentence.
• Bottom up never explores options that do not connect to the actual sentence, but can explore options that can never lead to a full parse.
• Relative amounts of wasted search depend on how much the grammar branches in each direction.
CYK Algorithm
Syntax
Dynamic Programming Parsing
• The CKY (Cocke-Kasami-Younger) algorithm is based on bottom-up parsing and requires first normalizing the grammar.
• First, the grammar must be converted to Chomsky normal form (CNF), in which every production has either exactly 2 non-terminal symbols on the RHS or 1 terminal symbol (lexicon rules).
• Parse bottom-up, storing phrases formed from all substrings in a triangular table (chart).
Dynamic Programming
• a general algorithm design technique for solving problems defined by recurrences with overlapping subproblems
• first invented by Richard Bellman in the 1950s
• "programming" here means "planning" or finding an optimal program, as also seen in the term "linear programming"
• Main idea:
 - set up a recurrence of smaller subproblems
 - solve subproblems once and record solutions in a table (avoid any recalculation)
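As a small illustration of the main idea, the sketch below counts how many binary bracketings exist over a string of n words, using a recurrence over split points and a table of solved subproblems. The function name `count_binary_trees` is hypothetical, and the example is chosen here only because it uses the same "split the span in two" recurrence as chart parsing; it is not taken from the slides.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # the table: each subproblem is solved once
def count_binary_trees(n_words):
    """Number of distinct binary bracketings over a span of n_words words."""
    if n_words == 1:
        return 1
    # Recurrence: choose a split point k, then bracket both halves independently.
    return sum(
        count_binary_trees(k) * count_binary_trees(n_words - k)
        for k in range(1, n_words)
    )


for n in (1, 2, 3, 4, 5):
    print(n, count_binary_trees(n))
```

Without the cache the same spans would be recounted many times. With it, each span length is computed once and then looked up.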
ATIS English Grammar Conversion

Original Grammar            Chomsky Normal Form
S → NP VP                   S → NP VP
S → Aux NP VP               S → X1 VP
                            X1 → Aux NP
S → VP                      S → book | include | prefer
                            S → Verb NP
                            S → VP PP
NP → Pronoun                NP → I | he | she | me
NP → Proper-Noun            NP → Houston | NWA
NP → Det Nominal            NP → Det Nominal
Nominal → Noun              Nominal → book | flight | meal | money
Nominal → Nominal Noun      Nominal → Nominal Noun
Nominal → Nominal PP        Nominal → Nominal PP
VP → Verb                   VP → book | include | prefer
VP → Verb NP                VP → Verb NP
VP → VP PP                  VP → VP PP
PP → Prep NP                PP → Prep NP

Note that, although not shown here, the original grammar contains all the lexical entries.
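The conversion above can be sketched in two steps: binarize right-hand sides longer than two symbols, then replace unit productions (such as S → VP) by copies of the rules they lead to. This is a minimal sketch under stated assumptions: `to_cnf`, `ORIGINAL`, and `LEXICON` are hypothetical names, and the code assumes a grammar with no empty rules and no terminals mixed into longer right-hand sides, which holds for the ATIS toy grammar.

```python
ORIGINAL = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Pronoun",)), ("NP", ("Proper-Noun",)), ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)), ("Nominal", ("Nominal", "Noun")),
    ("Nominal", ("Nominal", "PP")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")), ("VP", ("VP", "PP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "Det": "the a that this", "Noun": "book flight meal money",
    "Verb": "book include prefer", "Pronoun": "I he she me",
    "Proper-Noun": "Houston NWA", "Aux": "does",
    "Prep": "from to on near through",
}
ORIGINAL += [(tag, (w,)) for tag, ws in LEXICON.items() for w in ws.split()]


def to_cnf(rules):
    """Return a set of (lhs, rhs) rules in Chomsky normal form."""
    # Step 1: binarize long right-hand sides with fresh symbols X1, X2, ...
    binarized, fresh = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new_symbol = f"X{fresh}"
            binarized.append((new_symbol, rhs[:2]))
            rhs = (new_symbol,) + rhs[2:]
        binarized.append((lhs, rhs))
    nonterminals = {lhs for lhs, _ in binarized}

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    # Step 2: for each A, find every B reachable through unit rules and
    # copy B's non-unit rules up to A.
    cnf = set()
    for a in nonterminals:
        reachable, frontier = {a}, [a]
        while frontier:
            b = frontier.pop()
            for lhs, rhs in binarized:
                if lhs == b and is_unit(rhs) and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    frontier.append(rhs[0])
        cnf |= {(a, rhs) for lhs, rhs in binarized
                if lhs in reachable and not is_unit(rhs)}
    return cnf


for lhs, rhs in sorted(to_cnf(ORIGINAL)):
    if lhs in ("S", "X1"):
        print(lhs, "→", " ".join(rhs))
```

The part-of-speech rules (for example Verb → book) are kept in the output, since they are already in CNF.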
Exercise
CKY Parser
Book the flight through Houston
Cell[i,j] contains all constituents (non-terminals) covering words i+1 through j.
Completed chart:

       j=1                          j=2    j=3             j=4    j=5
i=0    S, VP, Verb, Nominal, Noun   None   VP, S           None   VP, S, VP, S
i=1                                 Det    NP              None   NP
i=2                                        Nominal, Noun   None   Nominal
i=3                                                        Prep   PP
i=4                                                               NP, ProperNoun

• Cell[0,5] holds VP and S twice: once from Verb + NP (split after "Book") and once from VP + PP (split after "flight").
• The two S entries in Cell[0,5] correspond to Parse Tree #1 and Parse Tree #2.
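The chart filling can be sketched in a few lines of Python. This is an illustrative version under stated assumptions: `BINARY`, `LEXICAL`, and `cky_chart` are hypothetical names, the grammar is the CNF grammar from the conversion slide plus the part-of-speech entries, and each cell stores a count of derivations per non-terminal instead of backpointers. The input is lower-cased because the lexicon lists "book", not "Book".

```python
# Binary CNF rules as (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"), ("S", "X1", "VP"), ("X1", "Aux", "NP"),
    ("S", "Verb", "NP"), ("S", "VP", "PP"),
    ("NP", "Det", "Nominal"),
    ("Nominal", "Nominal", "Noun"), ("Nominal", "Nominal", "PP"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"),
    ("PP", "Prep", "NP"),
]
# Lexical CNF rules, listed per non-terminal.
LEXICAL_RULES = {
    "S": "book include prefer", "VP": "book include prefer",
    "Verb": "book include prefer",
    "Nominal": "book flight meal money", "Noun": "book flight meal money",
    "NP": "I he she me Houston NWA",
    "Pronoun": "I he she me", "Proper-Noun": "Houston NWA",
    "Det": "the a that this", "Aux": "does",
    "Prep": "from to on near through",
}
# Invert to word -> set of non-terminals for fast lookup.
LEXICAL = {}
for nonterminal, words in LEXICAL_RULES.items():
    for word in words.split():
        LEXICAL.setdefault(word, set()).add(nonterminal)


def cky_chart(words):
    """Fill chart[i, j] = {non-terminal: number of derivations over words i+1..j}."""
    n = len(words)
    chart = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    for j in range(1, n + 1):                 # fill one column at a time
        chart[j - 1, j] = {a: 1 for a in LEXICAL.get(words[j - 1], ())}
        for i in range(j - 2, -1, -1):        # move up the column
            cell = chart[i, j]
            for k in range(i + 1, j):         # every split point
                left, right = chart[i, k], chart[k, j]
                for a, b, c in BINARY:
                    if b in left and c in right:
                        cell[a] = cell.get(a, 0) + left[b] * right[c]
    return chart


chart = cky_chart("book the flight through Houston".split())
print(sorted(chart[0, 1]))
print(chart[0, 5])
```

Storing counts makes the ambiguity visible: a count of 2 for S in the top cell means two distinct parse trees. A parser that must return the trees would store backpointers (rule and split point) in place of the counts.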
The Problem with Parsing: Ambiguity
INPUT: She announced a program to promote safety in trucks and vans
POSSIBLE OUTPUTS:
[Figure: six alternative parse trees for the input sentence. They differ in where the PP "in trucks ..." attaches and in which NP the conjunct "and vans" is coordinated with.]
And there are more...
Probabilistic Context Free Grammars (PCFG)
Syntax
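• Running time of the dynamic programming (CKY) algorithm: $O(n^3 |N|^3)$
 - $O(n^2)$ for l, i choices (span length and start position)
 - $O(n)$ for the split point
 - $O(|N|^3)$ for the rules X → Y Z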
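A probabilistic version of CKY can be sketched as follows: each chart entry keeps the highest probability of any subtree for that non-terminal and span, maximizing over rules and split points. This is a minimal sketch, not a definitive implementation. The names `BINARY`, `LEXICAL`, and `best_parse_probability` are hypothetical, and all rule probabilities are invented for illustration only; they are not from the slides. Exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction as F

# Hypothetical rule probabilities q(X -> Y Z), invented for illustration.
BINARY = {
    ("S", "NP", "VP"): F(1),
    ("VP", "Vt", "NP"): F(1, 2),
    ("VP", "VP", "PP"): F(1, 2),
    ("NP", "DT", "NN"): F(3, 4),
    ("NP", "NP", "PP"): F(1, 4),
    ("PP", "IN", "NP"): F(1),
}
# Hypothetical lexical probabilities q(X -> word), also invented.
LEXICAL = {
    ("DT", "the"): F(1), ("Vt", "saw"): F(1), ("IN", "with"): F(1),
    ("NN", "man"): F(1, 4), ("NN", "dog"): F(1, 4), ("NN", "telescope"): F(1, 2),
}


def best_parse_probability(words, start="S"):
    """Probability of the most likely parse of `words` rooted in `start` (0 if none)."""
    n = len(words)
    pi = {}  # pi[i, j, X] = best probability of X spanning words i+1..j
    for j in range(1, n + 1):
        for (x, word), q in LEXICAL.items():
            if word == words[j - 1]:
                pi[j - 1, j, x] = q
    for length in range(2, n + 1):            # O(n^2) choices of length and i
        for i in range(0, n - length + 1):
            j = i + length
            for s in range(i + 1, j):         # O(n) split points
                for (x, y, z), q in BINARY.items():
                    if (i, s, y) in pi and (s, j, z) in pi:
                        p = q * pi[i, s, y] * pi[s, j, z]
                        if p > pi.get((i, j, x), 0):
                            pi[i, j, x] = p
    return pi.get((0, n, start), F(0))


print(best_parse_probability("the man saw the dog with the telescope".split()))
```

With these made-up numbers the parse that attaches "with the telescope" to the VP scores 27/8192 and the one that attaches it to "the dog" scores 27/16384, so the function returns 27/8192.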
[S(saw) [NP(man) [DT(the) the] [NN(man) man]] [VP(saw) [VP(saw) [Vt(saw) saw] [NP(dog) [DT(the) the] [NN(dog) dog]]] [PP(with) [IN(with) with] [NP(telescope) [DT(the) the] [NN(telescope) telescope]]]]]

$$p(t) = q(S(saw) \rightarrow_2 NP(man)\ VP(saw)) \times q(NP(man) \rightarrow_2 DT(the)\ NN(man)) \times q(VP(saw) \rightarrow_1 VP(saw)\ PP(with)) \times q(VP(saw) \rightarrow_1 Vt(saw)\ NP(dog)) \times q(PP(with) \rightarrow_1 IN(with)\ NP(telescope)) \times \ldots$$
Parsing with Lexicalized CFGs
• The new form of grammar looks just like a Chomsky normal form CFG, but with potentially $O(|\Sigma|^2 \times |N|^3)$ possible rules.
• Naively, parsing an n word sentence using the dynamic programming algorithm will take $O(n^3 |\Sigma|^2 |N|^3)$ time. But $|\Sigma|$ can be huge!!
• Crucial observation: at most $O(n^2 \times |N|^3)$ rules can be applicable to a given sentence $w_1, w_2, \ldots, w_n$ of length n. This is because any rules which contain a lexical item that is not one of $w_1 \ldots w_n$ can be safely discarded.
• The result: we can parse in $O(n^5 |N|^3)$ time.
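The step from the naive bound to the final one can be written out as a short derivation. It only restates the bullets above: the rule count in the naive bound is replaced by the number of rules that can apply to the sentence.

```latex
\begin{align*}
\text{naive: } & O\big(n^3 \cdot \underbrace{|\Sigma|^2 \, |N|^3}_{\text{all rules}}\big) \\
\text{restricted to } w_1 \ldots w_n\text{: } & O\big(n^3 \cdot \underbrace{n^2 \, |N|^3}_{\text{applicable rules}}\big) = O\big(n^5 \, |N|^3\big)
\end{align*}
```

The $n^3$ factor comes from the chart spans and split points of the dynamic programming algorithm, and the extra $n^2$ comes from the pairs of sentence words that can appear as lexical items in a rule.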
Dependency Parsing
Syntax