Probabilistic Parsing Ling 571 Fei Xia Week 5: 10/25-10/27/05
Dec 22, 2015
Outline
• Lexicalized CFG (Recap)
• Hw5 and Project 2
• Parsing evaluation measures: ParseVal
• Collins’ parser
• TAG
• Parsing summary
Lexicalized CFG recap
Important equations
$$P(A_1, \ldots, A_n) = P(A_1) \prod_{i=2}^{n} P(A_i \mid A_1, \ldots, A_{i-1})$$
$$P(A_1) = \sum_{A_2, \ldots, A_n} P(A_1, \ldots, A_n)$$
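A quick numerical check of both equations, using a made-up joint distribution over three binary variables (the numbers are illustrative only). The chain rule rebuilds each joint probability from conditionals, and each conditional is a ratio of marginals obtained by summing out the remaining variables.

```python
from itertools import product

# Toy joint distribution P(A1, A2, A3) over binary variables; numbers are made up.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.20, (1, 0, 1): 0.05, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def marginal(prefix):
    """P(A_1..A_k = prefix): sum the joint over all values of A_{k+1}..A_n."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)

def chain_rule(x):
    """P(A_1) * prod_{i=2..n} P(A_i | A_1..A_{i-1}), each factor a ratio of marginals."""
    prob = marginal(x[:1])
    for i in range(2, len(x) + 1):
        prob *= marginal(x[:i]) / marginal(x[:i - 1])
    return prob

# The factorization recovers every joint probability.
for x in product([0, 1], repeat=3):
    assert abs(chain_rule(x) - joint[x]) < 1e-12
```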
Lexicalized CFG
• Lexicalized rules:
$$A(w) \to B_1(w_1) \ldots B_n(w_n)$$
$$A(h) \to L_n(l_n) \ldots L_1(l_1)\; H(h)\; R_1(r_1) \ldots R_m(r_m)$$
• Sparse data problem:
  – First generate the head
  – Then generate the unlexicalized rule
Lexicalized models
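$$\begin{aligned}
P(T,S) &= P(lr_1, \ldots, lr_n) \\
&= \prod_{i=1}^{n} P(lr_i \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i, h(r_i) \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lr_1, \ldots, lr_{i-1}) \cdot P(r_i \mid h(r_i), lr_1, \ldots, lr_{i-1}) \\
&\approx \prod_{i=1}^{n} P(h(r_i) \mid \mathrm{lhs}(r_i), h(m(r_i))) \cdot P(r_i \mid \mathrm{lhs}(r_i), h(r_i))
\end{aligned}$$
where $lr_i$ is the $i$-th lexicalized rule in the derivation, $r_i$ is its unlexicalized version, $\mathrm{lhs}(r_i)$ is its left-hand side, $h(r_i)$ is its head word, and $m(r_i)$ is its mother rule (the rule that introduced $\mathrm{lhs}(r_i)$).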
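A minimal Python sketch of the final approximation, assuming each lexicalized rule is given as a (lhs, rhs, head, mother_head) tuple and that both parameter tables are already estimated. The table layout and the numbers in the "he likes her" usage are hypothetical.

```python
from math import prod

def tree_prob(rules, p_head, p_rule):
    """Approximate P(T, S) as
    prod_i P(h(r_i) | lhs(r_i), h(m(r_i))) * P(r_i | lhs(r_i), h(r_i)).

    rules:  one (lhs, rhs, head, mother_head) tuple per lexicalized rule lr_i;
            mother_head is None for the root, which has no mother rule.
    p_head: dict (head, lhs, mother_head) -> P(h(r_i) | lhs(r_i), h(m(r_i)))
    p_rule: dict (lhs, rhs, head) -> P(lhs -> rhs | lhs, h(r_i))
    """
    return prod(
        p_head[(head, lhs, mother)] * p_rule[(lhs, rhs, head)]
        for lhs, rhs, head, mother in rules
    )

# "he likes her" with made-up parameter values.
rules = [
    ("S", ("NP", "VP"), "likes", None),
    ("NP", ("Pron",), "he", "likes"),
    ("VP", ("V", "NP"), "likes", "likes"),
    ("NP", ("Pron",), "her", "likes"),
]
p_head = {("likes", "S", None): 0.01, ("he", "NP", "likes"): 0.2,
          ("likes", "VP", "likes"): 0.9, ("her", "NP", "likes"): 0.1}
p_rule = {("S", ("NP", "VP"), "likes"): 0.8, ("NP", ("Pron",), "he"): 0.9,
          ("VP", ("V", "NP"), "likes"): 0.5, ("NP", ("Pron",), "her"): 0.9}
p = tree_prob(rules, p_head, p_rule)
```

Keeping the two factor types in separate tables mirrors the decomposition: one table for head words given the mother's head, one for unlexicalized rules given their own head.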
An example
• he likes her
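$$\begin{aligned}
P(T,S) = {} & P(\text{likes} \mid \text{S}) \cdot P(\text{S} \to \text{NP VP} \mid \text{S}, \text{likes}) \\
& \cdot P(\text{he} \mid \text{NP}, \text{likes}) \cdot P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{he}) \\
& \cdot P(\text{likes} \mid \text{VP}, \text{likes}) \cdot P(\text{VP} \to \text{V NP} \mid \text{VP}, \text{likes}) \\
& \cdot P(\text{her} \mid \text{NP}, \text{likes}) \cdot P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{her})
\end{aligned}$$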
An example
• he likes her
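$$\begin{aligned}
P(T,S) = {} & P(\text{likes} \mid \text{Top}) \cdot P(\text{Top} \to \text{S} \mid \text{Top}, \text{likes}) \\
& \cdot P(\text{likes} \mid \text{S}, \text{likes}) \cdot P(\text{S} \to \text{NP VP} \mid \text{S}, \text{likes}) \\
& \cdot P(\text{he} \mid \text{NP}, \text{likes}) \cdot P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{he}) \\
& \cdot P(\text{likes} \mid \text{VP}, \text{likes}) \cdot P(\text{VP} \to \text{V NP} \mid \text{VP}, \text{likes}) \\
& \cdot P(\text{her} \mid \text{NP}, \text{likes}) \cdot P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{her}) \\
& \cdot P(\text{he} \mid \text{Pron}, \text{he}) \cdot P(\text{Pron} \to \text{he} \mid \text{Pron}, \text{he}) \\
& \cdot P(\text{likes} \mid \text{V}, \text{likes}) \cdot P(\text{V} \to \text{likes} \mid \text{V}, \text{likes}) \\
& \cdot P(\text{her} \mid \text{Pron}, \text{her}) \cdot P(\text{Pron} \to \text{her} \mid \text{Pron}, \text{her})
\end{aligned}$$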
Head-head probability
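$$P(w_2 \mid A, w_1) = \frac{P(A, w_1, w_2)}{P(A, w_1)} \approx \frac{C(A, w_1, w_2)}{C(A, w_1)} = \frac{C(X(w_1) \to \ldots A(w_2) \ldots)}{\sum_{w} C(X(w_1) \to \ldots A(w) \ldots)}$$
where $w_1$ is the head word of the mother $X$ and $w_2$ is the head word of $A$.
$$P(\text{he} \mid \text{NP}, \text{likes}) \approx \frac{C(X(\text{likes}) \to \ldots \text{NP}(\text{he}) \ldots)}{\sum_{w} C(X(\text{likes}) \to \ldots \text{NP}(w) \ldots)}$$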
Head-rule probability
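$$P(A \to \alpha \mid A, w) = \frac{P(A \to \alpha, A, w)}{P(A, w)} \approx \frac{C(A(w) \to \alpha)}{C(A(w))} = \frac{C(A(w) \to \alpha)}{\sum_{\beta} C(A(w) \to \beta)}$$
$$P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{he}) \approx \frac{C(\text{NP}(\text{he}) \to \text{Pron})}{C(\text{NP}(\text{he}))}$$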
Estimate parameters
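$$P(A \to \alpha \mid A, w) \approx \frac{C(A(w) \to \alpha)}{C(A(w))}$$
$$P(w_2 \mid A, w_1) \approx \frac{C(X(w_1) \to \ldots A(w_2) \ldots)}{\sum_{w} C(X(w_1) \to \ldots A(w) \ldots)}$$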
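Both estimates amount to relative-frequency counting over lexicalized local trees. The sketch below assumes a hypothetical input format (one (A, w, children) tuple per rule) and omits smoothing, so unseen events get probability 0.

```python
from collections import Counter

def estimate(local_trees):
    """Relative-frequency (MLE) estimates, without smoothing.

    local_trees: one (A, w, children) tuple per lexicalized rule A(w) -> B1(w1) ... Bk(wk),
                 where children is a list of (B, head_word) pairs.
    """
    rule_c, lhs_c = Counter(), Counter()   # C(A(w) -> alpha), C(A(w))
    hh_c, hh_tot = Counter(), Counter()    # C(X(w1) -> ... A(w2) ...), summed over w2
    for A, w, children in local_trees:
        alpha = tuple(label for label, _ in children)
        rule_c[(A, w, alpha)] += 1
        lhs_c[(A, w)] += 1
        for B, w2 in children:             # every child B(w2) of a node headed by w
            hh_c[(B, w, w2)] += 1
            hh_tot[(B, w)] += 1

    def p_rule(alpha, A, w):               # P(A -> alpha | A, w)
        return rule_c[(A, w, alpha)] / lhs_c[(A, w)] if lhs_c[(A, w)] else 0.0

    def p_head(w2, A, w1):                 # P(w2 | A, w1)
        return hh_c[(A, w1, w2)] / hh_tot[(A, w1)] if hh_tot[(A, w1)] else 0.0

    return p_rule, p_head

# Two made-up sentences: "he likes her" and "she likes him".
treebank = [
    ("S", "likes", [("NP", "he"), ("VP", "likes")]),
    ("NP", "he", [("Pron", "he")]),
    ("VP", "likes", [("V", "likes"), ("NP", "her")]),
    ("NP", "her", [("Pron", "her")]),
    ("S", "likes", [("NP", "she"), ("VP", "likes")]),
    ("NP", "she", [("Pron", "she")]),
    ("VP", "likes", [("V", "likes"), ("NP", "him")]),
    ("NP", "him", [("Pron", "him")]),
]
p_rule, p_head = estimate(treebank)
```

Here P(he | NP, likes) comes out as 1/4, because four NPs appear under mothers headed by "likes" and one of them is headed by "he".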
Building a statistical tool
• Design a model:
  – Objective function: generative model vs. discriminative model
  – Decomposition: independence assumption
  – The types of parameters and parameter size
• Training: estimate model parameters
  – Supervised vs. unsupervised
  – Smoothing methods
• Decoding:
Team Project 1 (Hw5)
• Form a team: programming language, schedule, expertise, etc.
• Understand the lexicalized model
• Design the training algorithm
• Work out the decoding (parsing) algorithm: augment the CYK algorithm.
• Illustrate the algorithms with a real example.
Team Project 2
• Task: parse real data with a real grammar extracted from a treebank.
• Parser: PCFG or lexicalized PCFG
• Training data: English Penn Treebank, Sections 02-21
• Development data: Section 00
Team Project 2 (cont)
• Hw6: extract PCFG from the treebank
• Hw7: make sure your parser works given real grammar and real sentences; measure parsing performance
• Hw8: improve parsing results
• Hw10: write a report and give a presentation
Parsing evaluation measures
Evaluation of parsers: ParseVal
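• Labeled recall: # of correct constituents (same label and span as a gold constituent) in the parser output / # of constituents in the gold standard
• Labeled precision: # of correct constituents in the parser output / # of constituents in the parser output
• Labeled F-measure: 2 × precision × recall / (precision + recall)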
• Complete match: % of sents where recall and precision are 100%
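• Average crossing: # of crossings per sentence, where a crossing is a constituent in the parser output that overlaps a gold constituent without either one containing the other
• No crossing: % of sentences that have no crossing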
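A sketch of these measures for a single sentence, assuming constituents are (label, start, end) triples with 1-based inclusive word positions, like the spans used in the examples on these slides. Crossing is counted as the number of system constituents that cross at least one gold constituent; EVALB's own conventions (e.g., which labels and punctuation it ignores) come from its parameter file and are not modeled here.

```python
from collections import Counter

def parseval(gold, system):
    """Labeled recall, precision, F-measure, and crossing brackets for one sentence.

    gold, system: lists of labeled constituents (label, start, end), using 1-based
    inclusive word positions. Each gold constituent can be matched at most once.
    """
    matched = sum((Counter(gold) & Counter(system)).values())
    recall = matched / len(gold)
    precision = matched / len(system)
    f = 2 * precision * recall / (precision + recall) if matched else 0.0

    def crosses(s1, e1, s2, e2):
        # Spans cross when they overlap but neither contains the other.
        return s1 < s2 <= e1 < e2 or s2 < s1 <= e2 < e1

    crossing = sum(
        any(crosses(s, e, gs, ge) for _, gs, ge in gold) for _, s, e in system
    )
    return recall, precision, f, crossing

# The PP-attachment example: "saw the man with a telescope".
gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
system = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
print(parseval(gold, system))  # recall 1.0, precision 0.8, crossing 0
```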
An example
Gold standard:
(VP (V saw)
(NP (Det the) (N man))
(PP (P with) (NP (Det a) (N telescope))))
Parser output:
(VP (V saw)
(NP (NP (Det the) (N man))
(PP (P with) (NP (Det a) (N telescope)))))
ParseVal measures
• Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
• System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
• Recall=4/4, Prec=4/5, crossing=0
A different annotation
Gold standard: (VP (V saw) (NP (Det the) (N’ (N man))) (PP (P with) (NP (Det a) (N’ (N telescope)))))
Parser output: (VP (V saw) (NP (Det the) (N’ (N man) (PP (P with) (NP (Det a) (N’ (N telescope)))))))
ParseVal measures (cont)
• Gold standard: (VP, 1, 6), (NP, 2, 3), (N’, 3, 3), (PP, 4, 6), (NP, 5, 6), (N’, 6,6)
• System output: (VP, 1, 6), (NP, 2, 6), (N’, 3, 6), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6)
• Recall=4/6, Prec=4/6, crossing=1
EVALB
• A tool that calculates ParseVal measures
• To run it: evalb -p parameter_file gold_file system_output
• A copy is available in my dropbox
• You will need it for Team Project 2
Summary of Parsing evaluation measures
• ParseVal is widely used; F-measure is the most important of its measures
• The results depend on annotation style
• EVALB is a tool that calculates ParseVal measures
• Other measures are used too: e.g., accuracy of dependency links
History-based models
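• History-based approaches map (T, S) into a decision sequence $d_1, \ldots, d_n$
• Probability of tree T for sentence S is:
$$P(T,S) = P(d_1, \ldots, d_n) = \prod_{i=1}^{n} P(d_i \mid d_1, \ldots, d_{i-1}) \approx \prod_{i=1}^{n} P(d_i \mid f(d_1, \ldots, d_{i-1}))$$
where $f$ maps each history to an equivalence class of histories.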
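A small sketch of a history-based model, assuming decisions are strings and choosing, purely for illustration, an f that keeps only the last k decisions. (A PCFG corresponds to a different f, one that keeps only the nonterminal being expanded.) Parameters are relative frequencies over made-up decision sequences.

```python
from collections import Counter

def last_k(k):
    """A hypothetical history function f: keep only the last k decisions."""
    return lambda history: tuple(history[-k:]) if k > 0 else ()

def train(sequences, f):
    """Relative-frequency estimate of P(d | f(history)) from decision sequences."""
    joint, context = Counter(), Counter()
    for seq in sequences:
        for i, d in enumerate(seq):
            h = f(seq[:i])
            joint[(h, d)] += 1
            context[h] += 1
    return lambda d, h: joint[(h, d)] / context[h] if context[h] else 0.0

def score(seq, p, f):
    """P(T, S) ~= prod_i P(d_i | f(d_1, ..., d_{i-1}))."""
    prob = 1.0
    for i, d in enumerate(seq):
        prob *= p(d, f(seq[:i]))
    return prob

# Made-up decision sequences (leftmost derivations written as rule applications).
data = [
    ["S -> NP VP", "NP -> Pron", "VP -> V NP", "NP -> Pron"],
    ["S -> NP VP", "NP -> N", "VP -> V NP", "NP -> Pron"],
]
f = last_k(1)
p = train(data, f)
print(score(data[0], p, f))  # prints 0.5
```

A coarser f shares statistics across more histories (less sparse data) at the cost of weaker conditioning, which is the same trade-off as the independence assumptions in the lexicalized model.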
History-based models (cont)
• PCFGs can be viewed as a history-based model
• There are other history-based models:
  – Magerman’s parser (1995)
  – Collins’ parsers (1996, 1997, …)
  – Charniak’s parsers (1996, 1997, …)
  – Ratnaparkhi’s parser (1997)
Collins’ models
• Model 1: Generative model of (Collins, 1996)
• Model 2: Add complement/adjunct distinction
• Model 3: Add wh-movement
Model 1
• First generate the head constituent label
• Then generate left and right dependents
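$$\begin{aligned}
& P(L_n(l_n) \ldots L_1(l_1)\, H(h)\, R_1(r_1) \ldots R_m(r_m) \mid A, h) \\
&= P_H(H \mid A, h) \cdot P(L_n(l_n) \ldots L_1(l_1) \mid A, H, h) \cdot P(R_1(r_1) \ldots R_m(r_m) \mid A, H, h, L_n(l_n) \ldots L_1(l_1)) \\
&\approx P_H(H \mid A, h) \cdot P(L_n(l_n) \ldots L_1(l_1) \mid A, H, h) \cdot P(R_1(r_1) \ldots R_m(r_m) \mid A, H, h)
\end{aligned}$$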
Model 1(cont)
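$$P(L_n(l_n) \ldots L_1(l_1) \mid A, H, h) = \prod_{i=1}^{n+1} P_L(L_i(l_i) \mid A, H, h, L_1(l_1) \ldots L_{i-1}(l_{i-1})) \approx \prod_{i=1}^{n+1} P_L(L_i(l_i) \mid A, H, h)$$
$$P(R_1(r_1) \ldots R_m(r_m) \mid A, H, h) = \prod_{i=1}^{m+1} P_R(R_i(r_i) \mid A, H, h, R_1(r_1) \ldots R_{i-1}(r_{i-1})) \approx \prod_{i=1}^{m+1} P_R(R_i(r_i) \mid A, H, h)$$
where $L_{n+1}(l_{n+1}) = R_{m+1}(r_{m+1}) = \text{STOP}$.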
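A sketch of the Model 1 rule probability, assuming P_H, P_L, and P_R are given as Python dicts (a hypothetical layout) and that each side's dependents are listed outward from the head and end with STOP. The usage encodes S(bought) → NP(week) NP(Marks) VP(bought) with made-up numbers.

```python
from math import prod

STOP = "STOP"

def model1_rule_prob(A, h, H, left, right, p_h, p_l, p_r):
    """P(rule | A, h) under Model 1's independence assumptions.

    left, right: dependents as (label, head_word) pairs, ordered outward from
                 the head, i.e. [L_1(l_1), ..., L_n(l_n)] and [R_1(r_1), ..., R_m(r_m)].
    p_h: dict (H, A, h) -> P_H(H | A, h)
    p_l, p_r: dicts (dependent, A, H, h) -> P_L(...) and P_R(...)
    """
    prob = p_h[(H, A, h)]
    # Each side is generated independently and ends with STOP (L_{n+1}, R_{m+1}).
    prob *= prod(p_l[(dep, A, H, h)] for dep in left + [STOP])
    prob *= prod(p_r[(dep, A, H, h)] for dep in right + [STOP])
    return prob

# S(bought) -> NP(week) NP(Marks) VP(bought), with made-up parameter values.
ctx = ("S", "VP", "bought")
p_h = {("VP", "S", "bought"): 0.7}
p_l = {(("NP", "Marks"),) + ctx: 0.2, (("NP", "week"),) + ctx: 0.05, (STOP,) + ctx: 0.6}
p_r = {(STOP,) + ctx: 0.9}
p = model1_rule_prob("S", "bought", "VP",
                     [("NP", "Marks"), ("NP", "week")], [], p_h, p_l, p_r)
```

NP(Marks) comes first in the left list because it is the dependent closest to the head VP.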
An example
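Sentence: Last week Marks bought Brooks.
rule: S(bought) → NP(week) NP(Marks) VP(bought)
$$\begin{aligned}
P(\text{rule} \mid \text{S}, \text{bought}) = {} & P_H(\text{VP} \mid \text{S}, \text{bought}) \\
& \cdot P_L(\text{NP}(\text{Marks}) \mid \text{S}, \text{VP}, \text{bought}) \cdot P_L(\text{NP}(\text{week}) \mid \text{S}, \text{VP}, \text{bought}) \\
& \cdot P_L(\text{STOP} \mid \text{S}, \text{VP}, \text{bought}) \cdot P_R(\text{STOP} \mid \text{S}, \text{VP}, \text{bought})
\end{aligned}$$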
Model 2
• Generate a head label H
• Choose left and right subcat frames
• Generate left and right arguments
• Generate left and right modifiers
An example
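rule: S(bought) → NP(week) NP-C(Marks) VP(bought)
$$\begin{aligned}
P(\text{rule} \mid \text{S}, \text{bought}) = {} & P_H(\text{VP} \mid \text{S}, \text{bought}) \cdot P_{lc}(\{\text{NP-C}\} \mid \text{S}, \text{VP}, \text{bought}) \cdot P_{rc}(\{\} \mid \text{S}, \text{VP}, \text{bought}) \\
& \cdot P_L(\text{NP-C}(\text{Marks}) \mid \text{S}, \text{VP}, \text{bought}, \{\text{NP-C}\}) \cdot P_L(\text{NP}(\text{week}) \mid \text{S}, \text{VP}, \text{bought}, \{\}) \\
& \cdot P_L(\text{STOP} \mid \text{S}, \text{VP}, \text{bought}, \{\}) \cdot P_R(\text{STOP} \mid \text{S}, \text{VP}, \text{bought}, \{\})
\end{aligned}$$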
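A sketch of the Model 2 computation under stated assumptions: subcat frames are Python sets, complements are marked with a "-C" label suffix, and all probabilities live in one dict with a hypothetical key layout. Generating a complement removes it from the frame, so later dependents are conditioned on what is still required; a multiset would be needed for a frame that contains the same complement twice.

```python
STOP = "STOP"

def model2_rule_prob(A, h, H, left, right, lc, rc, p):
    """P(rule | A, h) under Model 2, as a sketch.

    left, right: dependents as (label, head_word) pairs, ordered outward from the
                 head; complements carry a "-C" label suffix (e.g. "NP-C").
    lc, rc: chosen left/right subcat frames, treated as sets (e.g. {"NP-C"}).
    p: dict of made-up probabilities; keys are ("H", H, A, h), ("lc", frame, A, H, h),
       ("rc", frame, A, H, h), and (side, dependent, A, H, h, remaining_frame).
    """
    prob = p[("H", H, A, h)]
    prob *= p[("lc", frozenset(lc), A, H, h)] * p[("rc", frozenset(rc), A, H, h)]
    for side, deps, frame in (("L", left, set(lc)), ("R", right, set(rc))):
        for dep in deps + [STOP]:
            # Each dependent is conditioned on the complements still required.
            prob *= p[(side, dep, A, H, h, frozenset(frame))]
            if dep != STOP and dep[0].endswith("-C"):
                frame.discard(dep[0])  # this complement is now satisfied
    return prob

# S(bought) -> NP(week) NP-C(Marks) VP(bought), with made-up parameter values.
ctx = ("S", "VP", "bought")
p = {
    ("H", "VP", "S", "bought"): 0.7,
    ("lc", frozenset({"NP-C"})) + ctx: 0.4,
    ("rc", frozenset()) + ctx: 0.9,
    ("L", ("NP-C", "Marks")) + ctx + (frozenset({"NP-C"}),): 0.2,
    ("L", ("NP", "week")) + ctx + (frozenset(),): 0.05,
    ("L", STOP) + ctx + (frozenset(),): 0.6,
    ("R", STOP) + ctx + (frozenset(),): 0.9,
}
prob = model2_rule_prob("S", "bought", "VP", [("NP-C", "Marks"), ("NP", "week")],
                        [], {"NP-C"}, set(), p)
```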
Model 3
• Add traces and wh-movement
• Given that the LHS of a rule has a gap, there are three ways to pass down the gap:
  – Head: S(+gap) → NP VP(+gap)
  – Left: S(+gap) → NP(+gap) VP
  – Right: SBAR(that)(+gap) → WHNP(that) S(+gap)
Parsing results
Model      LR (labeled recall)   LP (labeled precision)
Model 1    87.4%                 88.1%
Model 2    88.1%                 88.6%
Model 3    88.1%                 88.6%
Tree Adjoining Grammar (TAG)
TAG
• TAG basics:
• Extensions of TAG:
  – Lexicalized TAG (LTAG)
  – Synchronous TAG (STAG)
  – Multi-component TAG (MCTAG)
  – …
TAG basics
• A tree-rewriting formalism (Joshi et al., 1975)
• It can generate mildly context-sensitive languages.
• The primitive elements of a TAG are elementary trees.
• Elementary trees are combined by two operations: substitution and adjoining.
• TAG has been used in:
  – Parsing, semantics, discourse, etc.
  – Machine translation, summarization, generation, etc.
Two types of elementary trees
Initial tree: (S NP (VP (V draft) NP))
Auxiliary tree: (VP (ADVP (ADV still)) VP*)
Substitution operation
They draft policies
Adjoining operation
[Figure: an auxiliary tree with root Y and foot node Y* is adjoined at a node labeled Y]
They still draft policies
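A minimal sketch of the two operations on the running example, assuming a simple Node class with flags for substitution sites and foot nodes (all names here are hypothetical). Only the "draft" and "still" trees come from the slides; the trees for "They" and "policies" are assumed shapes. Each operation applies at the first matching node, and adjoining constraints are not checked.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution site: a leaf to be filled by an initial tree
    foot: bool = False   # foot node of an auxiliary tree (written X*)

def find(node: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node in pre-order that satisfies pred, or None."""
    if pred(node):
        return node
    for child in node.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None

def substitute(tree: Node, initial: Node) -> Node:
    """Substitution: plug an initial tree into the first open site with its root label."""
    site = find(tree, lambda n: n.subst and n.label == initial.label)
    site.children, site.subst = initial.children, False
    return tree

def adjoin(tree: Node, aux: Node) -> Node:
    """Adjoining: splice an auxiliary tree in at the first node labeled like its root."""
    target = find(tree, lambda n: n.label == aux.label and not n.subst)
    foot = find(aux, lambda n: n.foot)
    # The target's subtree moves under the foot node; the auxiliary tree's root
    # takes the target's place.
    foot.children, foot.foot = target.children, False
    target.children = aux.children
    return tree

def leaves(node: Node) -> List[str]:
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in leaves(child)]

draft = Node("S", [Node("NP", subst=True),
                   Node("VP", [Node("V", [Node("draft")]), Node("NP", subst=True)])])
they = Node("NP", [Node("PN", [Node("They")])])
policies = Node("NP", [Node("N", [Node("policies")])])
still = Node("VP", [Node("ADVP", [Node("ADV", [Node("still")])]), Node("VP", foot=True)])

tree = substitute(substitute(draft, they), policies)
print(" ".join(leaves(tree)))   # They draft policies
tree = adjoin(tree, still)
print(" ".join(leaves(tree)))   # They still draft policies
```

Substitution only fills leaves, while adjoining inserts material in the middle of an existing tree, which is what lets "still" appear between the subject and the verb.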
Derivation tree
[Figure: elementary trees, the derived tree, and the derivation tree]
Derived tree vs. derivation tree
• The mapping is not 1-to-1.
• Finding the best derivation is not the same as finding the best derived tree.
Wh-movement
What do they draft?
[Figure: elementary trees for "what", "do", "they", and "draft", and the derived tree; the fronted NP and its trace are coindexed with i]
Long-distance wh-movement
What does John think they draft?
[Figure: elementary trees, including the auxiliary trees (S (V does) S*) and (S NP (VP (V think) S*)), and the derived tree; the fronted NP and its trace are coindexed with i]
Who did you have dinner with?
[Figure: elementary trees for "who", "have", and "with", and the derived tree; "who" and its trace are coindexed with i]
TAG extension
• Lexicalized TAG (LTAG)
• Synchronous TAG (STAG)
• Multi-component TAG (MCTAG)
• …
STAG
• The primitive elements in STAG are elementary tree pairs.
• Used for MT
Summary of TAG
• A formalism beyond CFG
• Primitive elements are trees, not rules
• Extended domain of locality
• Two operations: substitution and adjoining
• Parsing algorithm: $O(n^6)$
• Statistical parser for TAG
• Algorithms for extracting TAG from treebanks
Parsing summary
Types of parsers
• Phrase structure vs. dependency tree
• Statistical vs. rule-based
• Grammar-based or not
• Supervised vs. unsupervised
Our focus:
  – Phrase structure
  – Mainly statistical
  – Mainly grammar-based: CFG, TAG
  – Supervised
Grammars
• Chomsky hierarchy:
  – Unrestricted grammar (type 0)
  – Context-sensitive grammar
  – Context-free grammar
  – Regular grammar
Human languages are beyond context-free.
• Other formalisms:
  – HPSG, LFG
  – TAG
  – Dependency grammars
Parsing algorithm for CFG
• Top-down
• Bottom-up
• Top-down with bottom-up filter
• Earley algorithm
• CYK algorithm
  – Requires the CFG to be in CNF
  – Can be augmented to deal with PCFG, lexicalized CFG, etc.
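A sketch of CYK augmented for a PCFG (the Viterbi variant, which keeps the best-scoring analysis per cell and label). It assumes the grammar is already in CNF and is given as a lexical table plus a list of binary rules; the toy grammar and its probabilities are made up, and unary rules are not handled.

```python
from collections import defaultdict

def pcyk(words, lexical, binary):
    """Viterbi CYK for a PCFG in CNF.

    lexical: dict word -> list of (A, prob) for rules A -> word
    binary:  list of (A, B, C, prob) for rules A -> B C
    Returns (best probability, best tree) for an S spanning the sentence, or None.
    """
    n = len(words)
    best = defaultdict(dict)   # best[(i, j)][A] = (prob, tree) for words[i:j]
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            best[(i, i + 1)][A] = (p, (A, w))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[(i, j)]
            for k in range(i + 1, j):          # split point
                left, right = best[(i, k)], best[(k, j)]
                for A, B, C, p in binary:
                    if B in left and C in right:
                        prob = p * left[B][0] * right[C][0]
                        if prob > cell.get(A, (0.0, None))[0]:
                            cell[A] = (prob, (A, left[B][1], right[C][1]))
    return best[(0, n)].get("S")

lexical = {"he": [("NP", 0.4)], "her": [("NP", 0.3)], "likes": [("V", 1.0)]}
binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 0.7)]
prob, tree = pcyk("he likes her".split(), lexical, binary)
print(prob, tree)
```

Replacing the max with a sum (and dropping the trees) would give the inside probability of the sentence instead of its best parse.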
Extensions of CFG
• PCFG: find the most likely parse trees
• Lexicalized CFG:
  – Uses weaker independence assumptions
  – Accounts for certain types of lexical and structural dependencies
Beyond CFG
• History-based models
  – Collins’ parsers
• TAG
  – Tree-rewriting
  – Mildly context-sensitive grammar
  – Many extensions: LTAG, STAG, …
Statistical approach
• Modeling
  – Choose the objective function
  – Decompose the function:
    • Common equations: joint, conditional, marginal probabilities
    • Independence assumptions
• Training:
  – Supervised vs. unsupervised
  – Smoothing
• Decoding:
  – Dynamic programming
  – Pruning
Evaluation of parsers
• Accuracy: ParseVal
• Robustness
• Resources needed
• Efficiency
• Richness
Other things
• Converting into CNF:
  – CFG
  – PCFG
  – Lexicalized CFG
• Treebank annotation:
  – Tagset: syntactic labels, POS tag, function tag, empty categories
  – Format: indentation, brackets