Page 1
Discontinuous Parsing with an Efficient and Accurate DOP Model
Andreas van Cranenburgh (Huygens ING, Royal Netherlands Academy of Arts and Sciences)
Rens Bod (Institute for Logic, Language and Computation, University of Amsterdam)
November 27, 2013
IWPT 2013, Nara, Japan
Page 2
This talk
Parsing with . . .
- discontinuous constituents:
  Linear Context-Free Rewriting Systems (LCFRS)
- treebank fragments:
  Data-Oriented Parsing (DOP)
  Tree-Substitution Grammar (TSG)
Page 3
Discontinuous constituents
Example:
- Why did the chicken cross the road?
- The chicken crossed the road to get to the other side.
Page 4
Discontinuous trees

(ROOT (SBARQ (SQ (VP (WHADVP (WRB Why)) (VB cross) (NP (DT the) (NN road)))
             (VBD did) (NP (DT the) (NN chicken)))
      (. ?))

Figure: A discontinuous tree not found in the Penn treebank.
Page 5
Discontinuous constituents
Motivation:
- Flexible word order
- Capture argument structure
- Combine information from constituency & dependency structures
- Information is available in treebanks (German, Dutch, English after conversion)
Page 6
Discontinuous trees

Figure: the same discontinuous tree as above.

Context-Free Grammar (CFG):
NP(ab) → DT(a) NN(b)
Page 7
Discontinuous trees

Figure: the same discontinuous tree as above.

Linear Context-Free Rewriting System (LCFRS):
VP2(a,bc) → WHADVP(a) VB(b) NP(c)
Page 8
Discontinuous trees

Figure: the same discontinuous tree as above.

Linear Context-Free Rewriting System (LCFRS):
VP2(a,bc) → WHADVP(a) VB(b) NP(c)
SQ(abcd) → VBD(b) NP(c) VP2(a,d)
Page 9
Linear Context-Free Rewriting Systems
- Mildly context-sensitive grammar formalism
- Can be parsed with a tabular parsing algorithm
- Agenda-based probabilistic parser for LCFRS (Kallmeyer & Maier 2010);
  extended to produce k-best derivations
- Parsing a binarized LCFRS has polynomial complexity: O(n^{3φ}),
  where φ is the maximum number of components covered by a non-terminal (fan-out)

Kallmeyer & Maier (2010). Data-driven parsing with probabilistic linear context-free rewriting systems.
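As a minimal sketch (not the authors' parser), an LCFRS rule can be read as a function from the yields of its right-hand-side items to the yield of its left-hand side, where a yield with fan-out φ is a tuple of φ spans. The span encoding and function name below are illustrative assumptions; the rule is the VP2 rule from the example slide.

```python
# Sketch: applying VP2(a,bc) -> WHADVP(a) VB(b) NP(c) to item yields.
# A yield is a tuple of (start, end) spans; fan-out = number of spans.

def apply_vp2(whadvp, vb, np):
    """The VP2 yield has two components: the WHADVP span, and the
    concatenation of the adjacent VB and NP spans."""
    (a,) = whadvp            # WHADVP has fan-out 1
    (b,) = vb
    (c,) = np
    if b[1] != c[0]:         # b and c must be adjacent to form bc
        return None
    return (a, (b[0], c[1]))  # fan-out 2: two discontinuous components

# "Why did the chicken cross the road ?" -> word positions 0..7
yield_vp2 = apply_vp2(((0, 1),), ((4, 5),), ((5, 7),))
# -> ((0, 1), (4, 7)): "Why" and "cross the road"
```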
Page 10
But . . .
[Figure: average CPU time (seconds) against sentence length for the PLCFRS parser; Negra dev. set, gold tags.]
Page 11
PCFG approximation of PLCFRS
[Figure: a tree in which the discontinuous non-terminal B, with two components, is split into two continuous nodes B*1 and B*2.]

- Transformation is reversible
- Increased independence assumption:
  ⇒ every component is a new node
- Language is a superset of the original PLCFRS
  ⇒ coarser, overgenerating PCFG ('split-PCFG')
Boyd (2007). Discontinuity revisited.
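A minimal sketch of the splitting step, under an assumed representation (a node's coverage as a sorted list of word indices): each maximal run of contiguous positions becomes one continuous component, and a node with more than one component is renamed per component.

```python
# Sketch: split a discontinuous node into one new node per continuous
# component, as in the split-PCFG approximation (reversible renaming).

def split_components(label, positions):
    """positions: sorted word indices covered by `label`."""
    comps, start = [], positions[0]
    prev = start
    for p in positions[1:]:
        if p != prev + 1:             # gap: close the current component
            comps.append((start, prev + 1))
            start = p
        prev = p
    comps.append((start, prev + 1))
    if len(comps) == 1:               # continuous node: keep its label
        return [(label, comps[0])]
    # B covering two components becomes B*1, B*2
    return [('%s*%d' % (label, n + 1), span)
            for n, span in enumerate(comps)]

split_components('B', [1, 3])   # -> [('B*1', (1, 2)), ('B*2', (3, 4))]
```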
Page 12
Coarse-to-fine pipeline
G0: Split-PCFG (treebank grammar)
⇓
G1: PLCFRS (treebank grammar, mildly context-sensitive)
⇓
G2: a large grammar

Prune parsing with G_{m+1} by only considering items in the k-best G_m derivations.
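The pruning step above can be sketched as follows, under assumed data structures (derivations as lists of (label, span) items; the real chart items differ). Items from the k-best derivations of the coarser grammar form a whitelist; the finer grammar only builds items whose mapped (label, span) is on it.

```python
# Sketch: coarse-to-fine pruning via a whitelist of chart items.

def allowed_items(kbest_derivations, map_label):
    """Translate the items of the coarse k-best derivations to fine labels."""
    whitelist = set()
    for derivation in kbest_derivations:
        for coarse_label, span in derivation:
            whitelist.add((map_label(coarse_label), span))
    return whitelist

def strip_split(label):
    """Hypothetical mapping: split-PCFG 'VP*1', 'VP*2' license original 'VP'."""
    return label.split('*')[0]

coarse_kbest = [[('S', (0, 8)), ('VP*1', (0, 1)), ('VP*2', (4, 7))]]
whitelist = allowed_items(coarse_kbest, strip_split)
```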
Page 13
With coarse-to-fine
[Figure: average CPU time (seconds) against sentence length for PLCFRS with coarse-to-fine pruning (k=10,000), the Split-PCFG stage, and the unpruned PLCFRS; Negra dev. set, gold tags.]
Page 14
Data-Oriented Parsing
Treebank grammar:
trees ⇒ productions + rel. frequencies
⇒ problematic independence assumptions

Data-Oriented Parsing (DOP):
trees ⇒ fragments + rel. frequencies;
fragments are arbitrarily sized chunks from the corpus

Consider all possible fragments from the treebank . . .
and "let the statistics decide"

Scha (1990): Language theory and language technology; competence and performance
Bod (1992): A computational model of language performance
Page 15
DOP fragments

[Figure: all fragments of the discontinuous tree for "is Gatsby rich" (S with children VP2 and NP Gatsby, where VP2 → VB ADJ covers the discontinuous yield "is … rich"); each fragment keeps or omits individual subtrees and lexical items.]

P(f) = count(f) / ∑_{f′ ∈ F} count(f′),  where F = { f′ | root(f′) = root(f) }

Note: discontinuous frontier non-terminals mark the destination of components.
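The relative frequency estimate above can be sketched directly: a fragment's probability is its count divided by the total count of fragments sharing its root label. The fragment strings and counts below are hypothetical illustrations.

```python
# Sketch: the DOP relative frequency estimate P(f) = count(f) / sum over
# fragments f' with the same root label.
from collections import Counter

def fragment_probabilities(counts):
    """counts: Counter mapping (root, fragment) -> frequency."""
    totals = Counter()
    for (root, _frag), n in counts.items():
        totals[root] += n
    return {(root, frag): n / totals[root]
            for (root, frag), n in counts.items()}

counts = Counter({('VP2', '(VP2 (VB is) ADJ)'): 1,
                  ('VP2', '(VP2 VB (ADJ rich))'): 3})
probs = fragment_probabilities(counts)
# probabilities of fragments with the same root sum to 1
```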
Page 16
DOP derivation
[Figure: two derivations of the tree for "is Gatsby rich": substituting VB "is" and NP "Gatsby" into a fragment that already contains ADJ "rich" (P(d) = 0.2), and substituting NP "Gatsby" and ADJ "rich" into a fragment that already contains VB "is" (P(d) = 0.3). Derivations for this tree: P(t) = 0.5.]

P(d) = P(f1 ◦ · · · ◦ fn) = ∏_{f ∈ d} p(f)

P(t) = P(d1) + · · · + P(dn) = ∑_{d ∈ D(t)} ∏_{f ∈ d} p(f)
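The two formulas translate directly into code. The fragment names and probabilities below are hypothetical, chosen to reproduce the numbers on the slide.

```python
# Sketch: P(d) is the product of its fragments' probabilities;
# P(t) sums P(d) over all derivations d of the tree t.
from math import prod

def derivation_prob(fragments, p):
    return prod(p[f] for f in fragments)

def tree_prob(derivations, p):
    return sum(derivation_prob(d, p) for d in derivations)

# hypothetical fragment probabilities for the "is Gatsby rich" example
p = {'f1': 0.5, 'f2': 0.4, 'f3': 0.6, 'f4': 0.5, 'f5': 1.0}
d1 = ['f1', 'f2']          # P(d1) = 0.2
d2 = ['f3', 'f4', 'f5']    # P(d2) = 0.3
total = tree_prob([d1, d2], p)   # -> 0.5
```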
Page 17
DOP implementation issues
Exponential number of fragments due to the all-fragments assumption:

- Can use DOP reduction (Goodman 2003);
  weight of fragments is spread over many productions
- Can restrict the number of fragments by depth, frontier nodes, &c.
  ⇒ but: not data-oriented!
Goodman (2003): Efficient parsing of DOP with PCFG-reductions
Page 18
Double-DOP
- Extract fragments that occur at least twice in the treebank
- For every pair of trees, extract maximal overlapping fragments
- Can be extracted in linear average time
- Number of fragments is small enough to parse with directly
Sangati & Zuidema (2011). Accurate parsing w/compact TSGs: Double-DOP
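A simplified sketch of pairwise fragment extraction (not the linear-average-time algorithm the slide refers to): for two nodes with the same production, grow the shared fragment by recursing only where the child subtrees also match; non-matching children become substitution sites. Trees are encoded as nested tuples, an assumed representation.

```python
# Sketch: maximal fragment shared by two trees at matching nodes.
# Trees are tuples (label, child, ...) with plain strings as terminals.

def production(node):
    return (node[0],) + tuple(
        c if isinstance(c, str) else c[0] for c in node[1:])

def common_fragment(a, b):
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else None
    if production(a) != production(b):
        return None
    children = []
    for ca, cb in zip(a[1:], b[1:]):
        shared = common_fragment(ca, cb)
        # non-matching child subtrees become substitution sites (frontier)
        children.append(shared if shared is not None else
                        (ca if isinstance(ca, str) else ca[0]))
    return (a[0],) + tuple(children)

t1 = ('S', ('NP', 'Gatsby'), ('VP', ('VB', 'is'), ('ADJ', 'rich')))
t2 = ('S', ('NP', 'Daisy'), ('VP', ('VB', 'is'), ('ADJ', 'rich')))
shared = common_fragment(t1, t2)
# -> ('S', 'NP', ('VP', ('VB', 'is'), ('ADJ', 'rich')))
```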
Page 19
From fragments to grammar
- Fragments are mapped to unique rules, with relative frequencies as probabilities
- Remove internal nodes, leaving the root node, substitution sites & terminals:
  X → X1 . . . Xn
- Reconstruct derivations after parsing

Example: the fragment (S (VP2 VB NP (ADJ rich))) becomes the flat rule S → VB NP rich.
Sangati & Zuidema (2011). Accurate parsing w/compact TSGs: Double-DOP
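The flattening step can be sketched as follows, under the same assumed tuple encoding (a node with no children is a substitution site):

```python
# Sketch: flatten a fragment to a rule by removing internal nodes, keeping
# only the root, substitution sites, and terminals.

def flatten(fragment):
    def frontier(node):
        if isinstance(node, str):        # terminal word
            yield node
        elif len(node) == 1:             # substitution site (no children)
            yield node[0]
        else:                            # internal node: descend
            for child in node[1:]:
                yield from frontier(child)
    root = fragment[0]
    return (root, tuple(w for c in fragment[1:] for w in frontier(c)))

frag = ('S', ('VP2', ('VB',), ('NP',), ('ADJ', 'rich')))
flatten(frag)   # -> ('S', ('VB', 'NP', 'rich'))
```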
Page 20
Preprocessing
- Remove function labels
- Binarize w/ markovization (h=1, v=1)
- Simple unknown word model:
  - Rare words replaced by features (model 4 from the Stanford parser)
  - Reserve probability mass for unseen (tag, word) pairs
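A toy sketch of signature-based unknown-word handling (loosely modeled on such models in general, not the exact Stanford model 4): a rare word is replaced by a signature built from surface features, so that unseen words share statistics with seen words of the same shape. The feature set is an illustrative assumption.

```python
# Sketch: replace a rare word by a feature signature.
import re

def signature(word):
    feats = ['UNK']
    if word[0].isupper():
        feats.append('CAP')
    if any(ch.isdigit() for ch in word):
        feats.append('NUM')
    if '-' in word:
        feats.append('DASH')
    m = re.search(r'[a-z]+$', word)
    if m:                                 # last letters as a crude suffix
        feats.append('SUF-' + m.group()[-2:])
    return '-'.join(feats)

signature('Kafkaesque')   # -> 'UNK-CAP-SUF-ue'
```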
Page 21
Results w/Double-DOP
               F1 %
DOP reduction  74.3
Double-DOP     —
(Negra dev set ≤ 40 words, gold tags)
Page 22
Results w/Double-DOP
               F1 %
DOP reduction  74.3
Double-DOP     76.3
(Negra dev set ≤ 40 words, gold tags)
Also: parsing 3× faster, grammar 3× smaller
Page 23
Results w/Double-DOP
               k=50   k=5000
               F1 %   F1 %
DOP reduction  74.3   73.5
Double-DOP     76.3   —
(Negra dev set ≤ 40 words, gold tags)
What if we reduce pruning?
Page 24
Results w/Double-DOP
               k=50   k=5000
               F1 %   F1 %
DOP reduction  74.3   73.5
Double-DOP     76.3   77.7
(Negra dev set ≤ 40 words, gold tags)
What if we reduce pruning?
⇒ For Double-DOP, performance does not deteriorate with an expanded search space.
Page 25
Main Results: test sets
Parser, treebank        |w|    POS    F1    EX

GERMAN
vanCra2012, Negra       ≤ 40   100    72.3  33.2
#KaMa2013, Negra        ≤ 30   100    75.8  —
this paper, Negra       ≤ 40   100    76.8  40.5
this paper, Negra       ≤ 40   96.3   74.8  38.7
HaNi2008, Tiger         ≤ 40   97.0   75.3  32.6
this paper, Tiger       ≤ 40   97.6   78.8  40.8

KaMa: Kallmeyer & Maier (2013) [different test set];
vanCra: van Cranenburgh (2012); HaNi: Hall & Nivre (2008).
Page 26
Main Results: test sets
ENGLISH
#EvKa2011, disc. wsj    < 25   100    79.0  —
this paper, disc. wsj   ≤ 40   96.6   85.6  31.3
SaZu2011, wsj           ≤ 40   —      87.9  33.7

DUTCH
this paper, Alpino      ≤ 40   85.2   65.9  23.1
this paper, Lassy       ≤ 40   94.6   77.0  35.2

EvKa: Evang & Kallmeyer (2011) [different test set];
SaZu: Sangati & Zuidema (2011).
Page 28
Can DOP handle discontinuity without LCFRS?

Pipeline 1: Split-PCFG ⇓ PLCFRS ⇓ PLCFRS Double-DOP
  77.7 % F1, 41.5 % EX

Pipeline 2: Split-PCFG ⇓ Split-Double-DOP
  78.1 % F1, 42.0 % EX

Answer: Yes!
Fragments can capture discontinuous contexts.
Page 30
Conclusions
- Multilingual results for discontinuous parsing, with automatic assignment of tags
- All fragments vs. selected fragments: the explicit representation of recurring fragments with Double-DOP leads to a better sample of derivations than parsing with all fragments
- Not necessary to parse beyond CFG!
  ⇒ increase the amount of context through fragments / labels
- LCFRS could be exploited for other things than discontinuity: adjunction, synchronous parsing, ...
Page 34
THE END
Code: http://github.com/andreasvc/disco-dop
Page 35
Wait . . . there’s more
BACKUP SLIDES
Page 36
Efficiency (Negra dev set)
[Figure: CPU time (seconds) against number of words for the dop, plcfrs, and pcfg stages.]
Page 37
Binarization
- mark heads of constituents
- head-outward binarization (parse head first)
- no parent annotation: v = 1
- horizontal markovization: h = 1

[Figure: the flat rule X → A B C D E F is binarized head-outward into a chain of markovized intermediate nodes (X^A, X^B, …, X^C$).]

Klein & Manning (2003): Accurate unlexicalized parsing.
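As a minimal sketch of binarization with h=1 horizontal markovization, the right-factored variant below (not the head-outward order of the slide) turns an n-ary rule into a chain of binary rules whose intermediate labels record only the most recently introduced sibling:

```python
# Sketch: right-factored binarization with h=1 markovization.

def binarize(label, children):
    """Return binary rules for label -> children."""
    rules = []
    parent = label
    while len(children) > 2:
        # intermediate label remembers only the sibling just emitted (h=1)
        intermediate = '%s|<%s>' % (label, children[0])
        rules.append((parent, (children[0], intermediate)))
        parent, children = intermediate, children[1:]
    rules.append((parent, tuple(children)))
    return rules

rules = binarize('X', ['A', 'B', 'C', 'D', 'E', 'F'])
# first rule: ('X', ('A', 'X|<A>')); last rule: ('X|<D>', ('E', 'F'))
```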
Page 38
Parser setup

traincorpus='wsj02-21.export',
testcorpus='wsj24.export',
corpusdir='../../dptb',
stages=[
    dict(name='pcfg', mode='pcfg',
        split=True, markorigin=True,
    ),
    dict(name='plcfrs', mode='plcfrs',
        prune=True, splitprune=True, k=10000,
    ),
    dict(name='dop', mode='plcfrs',
        prune=True, k=5000,
        dop=True, usedoubledop=True, m=10000,
        estimator='dop1', objective='mpp',
    ),
],
[...]
Page 39
Web-based interface