Parsing Techniques for Lexicalized Context-Free Grammars*
Giorgio Satta, University of Padua
* Joint work with: Jason Eisner, Mark-Jan Nederhof
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
Lexicalized grammars
• each rule specialized for one or more lexical items
• advantages over non-lexicalized formalisms:
– express syntactic preferences that are sensitive to lexical words
– control word selection
Syntactic preferences
• adjuncts:
  Workers [ dumped sacks ] into a bin
  *Workers dumped [ sacks into a bin ]
• N-N compounds:
  [ hydrogen ion ] exchange
  *hydrogen [ ion exchange ]
Word selection
• lexical:
  Nora convened the meeting
  ?Nora convened the party
• semantics:
  Peggy solved two puzzles
  ?Peggy solved two goats
• world knowledge:
  Mary shelved some books
  ?Mary shelved some cooks
Lexicalized CFG
Motivations:
• study computational properties common to the generative formalisms used in state-of-the-art real-world parsers
• develop parsing algorithms that can be directly applied to these formalisms
Lexicalized CFG
Example: parse of "dumped sacks into a bin"

[VP[dump][sack]
  [VP[dump][sack] [V[dump] dumped] [NP[sack] [N[sack] sacks]]]
  [PP[into][bin] [P[into] into] [NP[bin] [Det[a] a] [N[bin] bin]]]]
Lexicalized CFG
Context-free grammars with:
• alphabet VT:
  – dumped, sacks, into, ...
• delexicalized nonterminals VD:
  – NP, VP, ...
• nonterminals VN:
  – NP[sack], VP[dump][sack], ...
Lexicalized CFG
Delexicalized nonterminals encode:
• word sense: N, V, ...
• grammatical features: number, tense, ...
• structural information: bar level, subcategorization state, ...
• other constraints: distribution, contextual features, ...
Lexicalized CFG
• productions have two forms:
  – V[dump] → dumped
  – VP[dump][sack] → VP[dump][sack] PP[into][bin]
• lexical elements in the lhs are inherited from the rhs
Lexicalized CFG
• a production is k-lexical: k occurrences of lexical elements in its rhs
  – NP[bin] → Det[a] N[bin] is 2-lexical
  – VP[dump][sack] → VP[dump][sack] PP[into][bin] is 4-lexical
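As a concrete illustration, lexicalized nonterminals and the k-lexical count can be encoded as follows (a sketch; the NT class and its field names are ours, not from the talk):

```python
# Sketch: a nonterminal in V_N is a delexicalized symbol from V_D plus
# a tuple of lexical heads; k-lexicality counts head occurrences in
# the right-hand side of a production.
from dataclasses import dataclass

@dataclass(frozen=True)
class NT:
    cat: str      # delexicalized symbol, e.g. "VP"
    heads: tuple  # lexical heads, e.g. ("dump", "sack")

def k_lexical(rhs):
    """Number of lexical-element occurrences in the right-hand side."""
    return sum(len(x.heads) for x in rhs if isinstance(x, NT))

# NP[bin] -> Det[a] N[bin] is 2-lexical
assert k_lexical([NT("Det", ("a",)), NT("N", ("bin",))]) == 2
# VP[dump][sack] -> VP[dump][sack] PP[into][bin] is 4-lexical
assert k_lexical([NT("VP", ("dump", "sack")), NT("PP", ("into", "bin"))]) == 4
```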
LCFG at work
• 2-lexical CFG:
  – Alshawi 1996: Head Automata
  – Eisner 1996: Dependency Grammars
  – Charniak 1997: CFG
  – Collins 1997: generative model
LCFG at work
A probabilistic LCFG G is strongly equivalent to a probabilistic grammar G' iff:
• there is a one-to-one mapping between derivations
• each direction of the mapping is a homomorphism
• derivation probabilities are preserved
LCFG at work
From Charniak 1997 to 2-lex CFG:

NP^S[profits] → ADJ^NP[corporate] N^NP[profits]

with probability
Pr1(corporate | ADJ, NP, profits) · Pr1(profits | N, NP, profits) · Pr2(NP → ADJ N | NP, S, profits)
LCFG at work
From Collins 1997 (Model #2) to 2-lex CFG:

⟨VP^S, {NP-C}, left⟩[bought] → NP[IBM] ⟨VP^S, {}, left⟩[bought]

with probability Prleft(NP, IBM | VP, S, bought, left, {NP-C})
LCFG at work
Major limitation: cannot capture relations involving lexical items outside the actual constituent (cf. history-based models)

[figure: V[d0] governing NP[d1], which contains PP[d2][d3]; we cannot look at d0 when computing the PP attachment, since that would require a nonterminal like NP[d1][d0]]
LCFG at work
• lexicalized context-free parsers that are not LCFG:
  – Magerman 1995: Shift-Reduce+
  – Ratnaparkhi 1997: Shift-Reduce+
  – Chelba & Jelinek 1998: Shift-Reduce+
  – Hermjakob & Mooney 1997: LR
Related work
Other frameworks for the study of lexicalized grammars:
• Carroll & Weir 1997: Stochastic Lexicalized Grammars; emphasis on expressiveness
• Goodman 1997: Probabilistic Feature Grammars; emphasis on parameter estimation
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
Standard Parsing
• standard parsing algorithms (CKY, Earley, LC, ...) run on LCFG in time O(|G| · |w|^3)
• for 2-lex CFG (the simplest case), |G| grows with |VD|^3 · |VT|^2 !!

Goal: get rid of the |VT| factors
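The O(|G| · |w|^3) bound can be made concrete with a standard CKY recognizer sketch: three nested loops over positions, with one pass over the productions inside (the toy lexicon and rules below are illustrative):

```python
# Standard CKY recognizer for a CFG in Chomsky normal form.
# lex: dict word -> set of preterminals; rules: list of (A, B, C).
def cky(words, lex, rules, start):
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lex.get(w, ()))
    for span in range(2, n + 1):          # |w| choices
        for i in range(n - span + 1):     # |w| choices
            j = i + span
            for k in range(i + 1, j):     # |w| choices
                for A, B, C in rules:     # |G| choices
                    if B in chart[i][k] and C in chart[k][j]:
                        chart[i][j].add(A)
    return start in chart[0][n]

lex = {"dumped": {"V"}, "sacks": {"NP"}, "into": {"P"},
       "a": {"Det"}, "bin": {"N"}}
rules = [("NP", "Det", "N"), ("PP", "P", "NP"),
         ("VP", "V", "NP"), ("VP", "VP", "PP")]
assert cky("dumped sacks into a bin".split(), lex, rules, "VP")
```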
Standard Parsing: TD
Result (to be refined): algorithms satisfying the correct-prefix property are "unlikely" to run on LCFG in time independent of VT
Correct-prefix property
Earley, Left-Corner, GLR, ...:

[figure: partial derivation from S over w, up to the left-to-right reading position]
On-line parsing
No grammar precompilation (Earley):

[diagram: Parser takes G and w directly, produces Output]
Standard Parsing: TD
Result: on-line parsers with the correct-prefix property cannot run in time O(f(|VD|, |w|)), for any function f
Off-line parsing
Grammar is precompiled (Left-Corner, LR):

[diagram: G is precompiled into C(G); Parser takes C(G) and w, produces Output]
Standard Parsing: TD
Fact: we can simulate a nondeterministic FA M on w in time O(|M| · |w|)

Conjecture: fix a polynomial p. We cannot simulate M on w in time p(|w|) unless we spend exponential time precompiling M
Standard Parsing: TD
Assume our conjecture holds true.

Result: off-line parsers with the correct-prefix property cannot run in time O(p(|VD|, |w|)), for any polynomial p, unless we spend exponential time precompiling G
Standard Parsing: BU
Common practice in lexicalized grammar parsing:
• select the productions that are lexically grounded in w
• parse BU with the selected subset of G

Problem: this removes the |VT| factors but introduces new |w| factors !!
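The selection step can be sketched as follows (the (category, heads) encoding of grammar symbols is our illustrative assumption; heads are taken to be word forms):

```python
# Sketch of the "lexically grounded" selection step: keep only the
# productions all of whose lexical heads occur in the input sentence.
def select_grounded(productions, sentence):
    """productions: list of (lhs, rhs); symbols are (cat, heads) pairs."""
    words = set(sentence)
    def grounded(sym):
        _cat, heads = sym
        return all(h in words for h in heads)
    return [(lhs, rhs) for lhs, rhs in productions
            if grounded(lhs) and all(grounded(x) for x in rhs)]

prods = [
    (("VP", ("dump", "sack")),
     [("V", ("dump",)), ("NP", ("sack",))]),
    (("VP", ("convene", "meeting")),
     [("V", ("convene",)), ("NP", ("meeting",))]),
]
# only the first production is grounded in the input
assert select_grounded(prods, ["dump", "sack"]) == [prods[0]]
```

Note that the surviving subset has size bounded by |w|-dependent factors rather than |VT|-dependent ones, which is exactly where the new |w| factors come from.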
Standard Parsing: BU
Running time is O(|VD|^3 · |w|^5) !!

Basic step: combine B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j).

Time charged:
• i, k, j : |w|^3
• A, B, C : |VD|^3
• d1, d2 : |w|^2
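A minimal recognizer following this naive scheme, with heads represented as word positions, makes the five interacting position-like indices visible (the rule format, a delexicalized rule plus an explicit head direction, is our illustrative assumption):

```python
# Naive bottom-up recognizer for a 2-lexical CFG with heads as word
# positions: items are (A, h, i, j), meaning A[w_h] derives words[i:j].
# The basic step ranges over i, k, j plus two head positions h1, h2,
# matching the O(|VD|^3 |w|^5) analysis.
def naive_bilexical(words, lex, rules, start):
    n = len(words)
    chart = {(A, h, h, h + 1) for h, w in enumerate(words)
             for A in lex.get(w, ())}
    changed = True
    while changed:
        changed = False
        for (B, h1, i, k) in list(chart):
            for (C, h2, k2, j) in list(chart):
                if k2 != k:
                    continue
                for A, Bc, Cc, head in rules:  # head in {"left", "right"}
                    if Bc == B and Cc == C:
                        h = h1 if head == "left" else h2
                        if (A, h, i, j) not in chart:
                            chart.add((A, h, i, j))
                            changed = True
    return any((start, h, 0, n) in chart for h in range(n))

lex = {"dumped": {"V"}, "sacks": {"NP"}, "into": {"P"},
       "a": {"Det"}, "bin": {"N"}}
rules = [("NP", "Det", "N", "right"), ("PP", "P", "NP", "left"),
         ("VP", "V", "NP", "left"), ("VP", "VP", "PP", "left")]
assert naive_bilexical("dumped sacks into a bin".split(), lex, rules, "VP")
```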
Standard BU: Exhaustive

[log-log plot, time vs. sentence length, exhaustive search: BU naive fits y = c · x^5.2019]
Standard BU: Pruning

[log-log plot, time vs. sentence length, with pruning: BU naive fits y = c · x^3.8282]
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
BU enhanced
Result: parsing with 2-lex CFG in time O(|VD|^3 · |w|^4)

Remark: the result transfers to the models in Alshawi 1996, Eisner 1996, Charniak 1997, Collins 1997

Remark: the technique extends to improve parsing of Lexicalized Tree Adjoining Grammars
Algorithm #1
Basic step in naive BU: combine B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j), using production A[d2] → B[d1] C[d2].

Idea: indices d1 and j can be processed independently.
Algorithm #1
• Step 1: combine B[d1] spanning (i, k) with production A[d2] → B[d1] C[d2], discarding d1; this yields a partial item A[d2] → • C[d2] spanning (i, k), touching indices i, k, d1, d2
• Step 2: combine the partial item with C[d2] spanning (k, j), yielding A[d2] spanning (i, j), touching indices i, k, j, d2

Each step involves at most four indices ranging over positions of w.
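The two-step decomposition can be sketched by adding "partial" items that forget the non-head child's head position (the grammar encoding, a delexicalized rule plus a head direction, is our illustrative assumption; both head directions are handled symmetrically):

```python
# Algorithm #1 sketch: the naive bilexical step is cut into two steps,
# each touching at most four position-like indices, giving the
# O(|VD|^3 |w|^4) bound. Complete items are ("c", A, h, i, j); partial
# items ("mR"/"mL", A, X, i, j) have forgotten the non-head child's head.
def two_step(words, lex, rules, start):
    n = len(words)
    chart = {("c", A, h, h, h + 1) for h, w in enumerate(words)
             for A in lex.get(w, ())}
    changed = True
    while changed:
        new = set()
        # Step 1: rule + non-head child, forgetting that child's head h
        for t, X, h, i, j in chart:
            if t != "c":
                continue
            for A, B, C, head in rules:
                if head == "right" and X == B:
                    new.add(("mR", A, C, i, j))  # awaits head child C at (j, .)
                if head == "left" and X == C:
                    new.add(("mL", A, B, i, j))  # awaits head child B at (., i)
        # Step 2: partial item + head child
        for m in chart | new:
            if m[0] == "mR":
                _, A, C, i, k = m
                new |= {("c", A, h, i, j) for (t, X, h, k2, j) in chart
                        if t == "c" and X == C and k2 == k}
            elif m[0] == "mL":
                _, A, B, k, j = m
                new |= {("c", A, h, i, j) for (t, X, h, i, k2) in chart
                        if t == "c" and X == B and k2 == k}
        changed = not new <= chart
        chart |= new
    return any(("c", start, h, 0, n) in chart for h in range(n))

lex = {"dumped": {"V"}, "sacks": {"NP"}, "into": {"P"},
       "a": {"Det"}, "bin": {"N"}}
rules = [("NP", "Det", "N", "right"), ("PP", "P", "NP", "left"),
         ("VP", "V", "NP", "left"), ("VP", "VP", "PP", "left")]
assert two_step("dumped sacks into a bin".split(), lex, rules, "VP")
```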
BU enhanced
Upper bound provided by Algorithm #1: O(|w|^4)

Goal: can we go down to O(|w|^3) ?
Spine

The spine of a parse tree is the path from the root to the root's head.

Example, "IBM bought Lotus last week":

[S[buy]
  [S[buy] [NP[IBM] IBM] [VP[buy] [V[buy] bought] [NP[Lotus] Lotus]]]
  [AdvP[week] last week]]

Spine: S[buy], S[buy], VP[buy], V[buy], bought
Spine projection

The spine projection is the yield of the sub-tree composed of the spine and all its sibling nodes.

For the parse of "IBM bought Lotus last week":

spine projection = NP[IBM] bought NP[Lotus] AdvP[week]
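The two notions can be sketched in code. Trees are (label, head, children) triples with word strings as leaves, an encoding we assume for illustration (heads here are word forms such as "bought" rather than lemmas such as "buy"):

```python
# Sketch: computing the spine and the spine projection of a
# lexicalized parse tree.
def on_spine(child, head):
    """A child is on the spine iff it is, or is headed by, the head word."""
    return child == head if isinstance(child, str) else child[1] == head

def spine(tree):
    """Path from the root down to the root's head word."""
    out, node = [], tree
    while not isinstance(node, str):
        label, head, children = node
        out.append(f"{label}[{head}]")
        node = next(c for c in children if on_spine(c, head))
    return out + [node]

def spine_projection(tree):
    """Yield of the spine together with all of its sibling nodes."""
    label, head, children = tree
    out = []
    for c in children:
        if on_spine(c, head):  # recurse down the spine, emit head word
            out += spine_projection(c) if not isinstance(c, str) else [c]
        else:                  # sibling of the spine: emit its root label
            out.append(c if isinstance(c, str) else f"{c[0]}[{c[1]}]")
    return out

tree = ("S", "bought",
        [("S", "bought",
          [("NP", "IBM", ["IBM"]),
           ("VP", "bought",
            [("V", "bought", ["bought"]),
             ("NP", "Lotus", ["Lotus"])])]),
         ("AdvP", "week", ["last", "week"])])

assert spine(tree) == ["S[bought]", "S[bought]", "VP[bought]",
                       "V[bought]", "bought"]
assert spine_projection(tree) == ["NP[IBM]", "bought",
                                  "NP[Lotus]", "AdvP[week]"]
```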
Split Grammars
Split spine projections at the head.

Problem: how much information do we need to store in order to construct new grammatical spine projections from splits?
Split Grammars
Fact: the set of spine projections is a linear context-free language

Definition: a 2-lex CFG is split if its set of spine projections is a regular language

Remark: for split grammars, we can recombine splits using finite information
Split Grammars
Non-split grammar:
• unbounded # of dependencies between left and right dependents of the head, e.g. via the recursive rules S[d] → AdvP[a] S1[d] and S1[d] → S[d] AdvP[b], which pair each left AdvP[a] with a right AdvP[b]
• linguistically unattested and unlikely
Split Grammars
Split grammar: finite # of dependencies between left and right dependents of the lexical head, e.g. S[d] → AdvP[a] S[d] and S[d] → S[d] AdvP[b]
Split Grammars
Precompile the grammar so that splits are derived separately: in the parse of "IBM bought Lotus last week", the spine nodes S[buy], VP[buy], V[buy] below the root are replaced by spine-automaton states r1[buy], r2[buy], r3[buy], where r3[buy] is a split symbol.
Split Grammars
• t : max # of states per spine automaton
• g : max # of split symbols per spine automaton (g < t)
• m : # of delexicalized nonterminals that are maximal projections
BU enhanced
Result: parsing with split 2-lexical CFG in time O(t^2 · g^2 · m^2 · |w|^3)

Remark: the models in Alshawi 1996, Charniak 1997 and Collins 1997 are not split
Algorithm #2
Idea:
• recognize left and right splits separately
• collect head dependents one split at a time

E.g., for NP[IBM] bought NP[Lotus] AdvP[week], the left split NP[IBM] bought and the right split bought NP[Lotus] AdvP[week] are built separately and recombined through the split symbol.
Algorithm #2

• Step 1: combine an item for B[d1] in spine-automaton state s1, with boundary position k, with the left split of a dependent head d2 in state s2 with split symbol r2, starting at k
• Step 2: attach the dependent's other split through its split symbol r2 alone, so that the boundary positions of d2's analysis can be forgotten

Each step involves at most three indices ranging over positions of w, giving the O(|w|^3) bound.
Algorithm #2: Exhaustive

[log-log plot, time vs. sentence length, exhaustive search: BU naive fits y = c · x^5.2019, BU split fits y = c · x^3.328]
Algorithm #2: Pruning

[log-log plot, time vs. sentence length, with pruning: BU naive fits y = c · x^3.8282, BU split fits y = c · x^2.8179]
Related work
Cubic-time algorithms for lexicalized grammars:
• Sleator & Temperley 1991: Link Grammars
• Eisner 1997: Bilexical Grammars (improved by transfer of Algorithm #2)
TD enhanced
Goal: introduce TD prediction for 2-lexical CFG parsing, without |VT| factors

Remark: must relax left-to-right parsing (because of the previous results)
TD enhanced
Result: TD parsing with 2-lex CFG in time O(|VD|^3 · |w|^4)

Open: O(|w|^3) extension to split grammars
TD enhanced
Strongest version of the correct-prefix property:

[figure: derivation from S covering w up to the reading position]
Data Structures
Productions with lhs A[d]:
• A[d] → X1[d1] X2[d2]
• A[d] → Y1[d3] Y2[d2]
• A[d] → Z1[d2] Z2[d1]

Trie for A[d]: the head sequences of the rhs (d1 d2, d3 d2, d2 d1) stored with shared prefixes, for rightmost subsequence matching.
Data Structures
Rightmost subsequence recognition by precompiling the input w into a deterministic FA:

[figure: DFA over the input symbols a, b, c]
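One way to sketch the precompilation is a per-position table of rightmost occurrences, which plays the role of the deterministic FA (a sketch under our own encoding, not the talk's exact construction):

```python
# Sketch of rightmost-subsequence recognition over the input w.
def precompile(w):
    """prev[i][a] = rightmost position j < i with w[j] == a."""
    prev = [dict()]
    for i, a in enumerate(w):
        d = dict(prev[-1])
        d[a] = i
        prev.append(d)
    return prev

def rightmost_subsequence(prev, pattern, k):
    """Rightmost positions j1 < ... < jm < k with w[jx] == pattern[x]."""
    pos, out = k, []
    for a in reversed(pattern):  # match greedily right-to-left
        if a not in prev[pos]:
            return None          # pattern is not a subsequence before k
        pos = prev[pos][a]
        out.append(pos)
    return out[::-1]

w = list("abacb")
prev = precompile(w)
assert rightmost_subsequence(prev, ["a", "b"], 5) == [2, 4]
assert rightmost_subsequence(prev, ["a", "b"], 4) == [0, 1]
assert rightmost_subsequence(prev, ["c"], 3) is None
```

Matching a head sequence against the table takes time linear in the pattern, independently of |VT|.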
Algorithm #3
Item representation:
• i, j indicate the extension of the A[d] partial analysis
• k indicates the rightmost possible position for the completion of the A[d] analysis

[figure: derivation from S with A[d] spanning (i, j) and completion bounded by k]
Algorithm #3 : Prediction
• Step 1: find the rightmost subsequence before k for some A[d2] production
• Step 2: make the Earley prediction

[figure: prediction of A[d2], with rhs heads d1 and d2 of B[d1] and C[d2], at position k' < k]
Conclusions
• standard parsing techniques are not suitable for processing lexicalized grammars
• novel algorithms have been introduced using enhanced dynamic programming
• work to be done : extension to history-based models
The End
Many thanks for helpful discussion to:
Jason Eisner, Mark-Jan Nederhof