Parsing Techniques for Lexicalized Context-Free Grammars*
Giorgio Satta, University of Padua
* Joint work with: Jason Eisner, Mark-Jan Nederhof
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
Lexicalized grammars
• each rule specialized for one or more lexical items
• advantages over non-lexicalized formalisms:
– express syntactic preferences that are sensitive to lexical words
– control word selection
Syntactic preferences
• adjuncts:
  Workers [ dumped sacks ] into a bin
  *Workers dumped [ sacks into a bin ]
• N-N compounds:
  [ hydrogen ion ] exchange
  *hydrogen [ ion exchange ]
Word selection
• lexical:
  Nora convened the meeting
  ?Nora convened the party
• semantics:
  Peggy solved two puzzles
  ?Peggy solved two goats
• world knowledge:
  Mary shelved some books
  ?Mary shelved some cooks
Lexicalized CFG
Motivations:
• study computational properties common to the generative formalisms used in state-of-the-art real-world parsers
• develop parsing algorithms that can be directly applied to these formalisms
Lexicalized CFG
Example: parse of "dumped sacks into a bin"

[VP[dump][sack]
  [VP[dump][sack] [V[dump] dumped] [NP[sack] [N[sack] sacks]]]
  [PP[into][bin] [P[into] into] [NP[bin] [Det[a] a] [N[bin] bin]]]]
Lexicalized CFG
Context-free grammars with:
• alphabet VT:
  – dumped, sacks, into, ...
• delexicalized nonterminals VD:
  – NP, VP, ...
• nonterminals VN:
  – NP[sack], VP[dump][sack], ...
Lexicalized CFG
Delexicalized nonterminals encode:
• word sense: N, V, ...
• grammatical features: number, tense, ...
• structural information: bar level, subcategorization state, ...
• other constraints: distribution, contextual features, ...
Lexicalized CFG
• productions have two forms:
  – V[dump] → dumped
  – VP[dump][sack] → VP[dump][sack] PP[into][bin]
• lexical elements in the lhs are inherited from the rhs
Lexicalized CFG
• a production is k-lexical: k occurrences of lexical elements in its rhs
  – NP[bin] → Det[a] N[bin] is 2-lexical
  – VP[dump][sack] → VP[dump][sack] PP[into][bin] is 4-lexical
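As a concrete illustration, lexicalized nonterminals and the k-lexical count can be encoded as follows (a sketch; the NT class and its field names are ours, not from the talk):

```python
# Sketch: a nonterminal in V_N is a delexicalized symbol from V_D plus
# a tuple of lexical heads; k-lexicality counts head occurrences in
# the right-hand side of a production.
from dataclasses import dataclass

@dataclass(frozen=True)
class NT:
    cat: str      # delexicalized symbol, e.g. "VP"
    heads: tuple  # lexical heads, e.g. ("dump", "sack")

def k_lexical(rhs):
    """Number of lexical-element occurrences in the right-hand side."""
    return sum(len(x.heads) for x in rhs if isinstance(x, NT))

# NP[bin] -> Det[a] N[bin] is 2-lexical
assert k_lexical([NT("Det", ("a",)), NT("N", ("bin",))]) == 2
# VP[dump][sack] -> VP[dump][sack] PP[into][bin] is 4-lexical
assert k_lexical([NT("VP", ("dump", "sack")), NT("PP", ("into", "bin"))]) == 4
```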
LCFG at work
• 2-lexical CFG:
  – Alshawi 1996: Head Automata
  – Eisner 1996: Dependency Grammars
  – Charniak 1997: CFG
  – Collins 1997: generative model
LCFG at work
A probabilistic LCFG G is strongly equivalent to a probabilistic grammar G' iff:
• there is a one-to-one mapping between derivations
• each direction of the mapping is a homomorphism
• derivation probabilities are preserved
LCFG at work
From Charniak 1997 to 2-lex CFG:

NP^S[profits] → ADJ^NP[corporate] N^NP[profits]

with probability
Pr1(corporate | ADJ, NP, profits) · Pr1(profits | N, NP, profits) · Pr2(NP → ADJ N | NP, S, profits)
LCFG at work
From Collins 1997 (Model #2) to 2-lex CFG:

⟨VP^S, {NP-C}, left⟩[bought] → NP[IBM] ⟨VP^S, {}, left⟩[bought]

with probability Prleft(NP, IBM | VP, S, bought, left, {NP-C})
LCFG at work
Major limitation: cannot capture relations involving lexical items outside the actual constituent (cf. history-based models)

[figure: V[d0] governing NP[d1], which contains PP[d2][d3]; we cannot look at d0 when computing the PP attachment, since that would require a nonterminal like NP[d1][d0]]
LCFG at work
• lexicalized context-free parsers that are not LCFG:
  – Magerman 1995: Shift-Reduce+
  – Ratnaparkhi 1997: Shift-Reduce+
  – Chelba & Jelinek 1998: Shift-Reduce+
  – Hermjakob & Mooney 1997: LR
Related work
Other frameworks for the study of lexicalized grammars:
• Carroll & Weir 1997: Stochastic Lexicalized Grammars; emphasis on expressiveness
• Goodman 1997: Probabilistic Feature Grammars; emphasis on parameter estimation
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
Standard Parsing
• standard parsing algorithms (CKY, Earley, LC, ...) run on LCFG in time O(|G| · |w|^3)
• for 2-lex CFG (the simplest case), |G| grows with |VD|^3 · |VT|^2 !!

Goal: get rid of the |VT| factors
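The O(|G| · |w|^3) bound can be made concrete with a standard CKY recognizer sketch: three nested loops over positions, with one pass over the productions inside (the toy lexicon and rules below are illustrative):

```python
# Standard CKY recognizer for a CFG in Chomsky normal form.
# lex: dict word -> set of preterminals; rules: list of (A, B, C).
def cky(words, lex, rules, start):
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lex.get(w, ()))
    for span in range(2, n + 1):          # |w| choices
        for i in range(n - span + 1):     # |w| choices
            j = i + span
            for k in range(i + 1, j):     # |w| choices
                for A, B, C in rules:     # |G| choices
                    if B in chart[i][k] and C in chart[k][j]:
                        chart[i][j].add(A)
    return start in chart[0][n]

lex = {"dumped": {"V"}, "sacks": {"NP"}, "into": {"P"},
       "a": {"Det"}, "bin": {"N"}}
rules = [("NP", "Det", "N"), ("PP", "P", "NP"),
         ("VP", "V", "NP"), ("VP", "VP", "PP")]
assert cky("dumped sacks into a bin".split(), lex, rules, "VP")
```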
Standard Parsing: TD
Result (to be refined): algorithms satisfying the correct-prefix property are "unlikely" to run on LCFG in time independent of VT
Correct-prefix property
Earley, Left-Corner, GLR, ...:

[figure: partial derivation from S over w, up to the left-to-right reading position]
On-line parsing
No grammar precompilation (Earley):

[diagram: Parser takes G and w directly, produces Output]
Standard Parsing: TD
Result: on-line parsers with the correct-prefix property cannot run in time O(f(|VD|, |w|)), for any function f
Off-line parsing
Grammar is precompiled (Left-Corner, LR):

[diagram: G is precompiled into C(G); Parser takes C(G) and w, produces Output]
Standard Parsing: TD
Fact: we can simulate a nondeterministic FA M on w in time O(|M| · |w|)

Conjecture: fix a polynomial p. We cannot simulate M on w in time p(|w|) unless we spend exponential time precompiling M
Standard Parsing: TD
Assume our conjecture holds true.

Result: off-line parsers with the correct-prefix property cannot run in time O(p(|VD|, |w|)), for any polynomial p, unless we spend exponential time precompiling G
Standard Parsing: BU
Common practice in lexicalized grammar parsing:
• select the productions that are lexically grounded in w
• parse BU with the selected subset of G

Problem: this removes the |VT| factors but introduces new |w| factors !!
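The selection step can be sketched as follows (the (category, heads) encoding of grammar symbols is our illustrative assumption; heads are taken to be word forms):

```python
# Sketch of the "lexically grounded" selection step: keep only the
# productions all of whose lexical heads occur in the input sentence.
def select_grounded(productions, sentence):
    """productions: list of (lhs, rhs); symbols are (cat, heads) pairs."""
    words = set(sentence)
    def grounded(sym):
        _cat, heads = sym
        return all(h in words for h in heads)
    return [(lhs, rhs) for lhs, rhs in productions
            if grounded(lhs) and all(grounded(x) for x in rhs)]

prods = [
    (("VP", ("dump", "sack")),
     [("V", ("dump",)), ("NP", ("sack",))]),
    (("VP", ("convene", "meeting")),
     [("V", ("convene",)), ("NP", ("meeting",))]),
]
# only the first production is grounded in the input
assert select_grounded(prods, ["dump", "sack"]) == [prods[0]]
```

Note that the surviving subset has size bounded by |w|-dependent factors rather than |VT|-dependent ones, which is exactly where the new |w| factors come from.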
Standard Parsing: BU
Running time is O(|VD|^3 · |w|^5) !!

Basic step: combine B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j).

Time charged:
• i, k, j : |w|^3
• A, B, C : |VD|^3
• d1, d2 : |w|^2
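A minimal recognizer following this naive scheme, with heads represented as word positions, makes the five interacting position-like indices visible (the rule format, a delexicalized rule plus an explicit head direction, is our illustrative assumption):

```python
# Naive bottom-up recognizer for a 2-lexical CFG with heads as word
# positions: items are (A, h, i, j), meaning A[w_h] derives words[i:j].
# The basic step ranges over i, k, j plus two head positions h1, h2,
# matching the O(|VD|^3 |w|^5) analysis.
def naive_bilexical(words, lex, rules, start):
    n = len(words)
    chart = {(A, h, h, h + 1) for h, w in enumerate(words)
             for A in lex.get(w, ())}
    changed = True
    while changed:
        changed = False
        for (B, h1, i, k) in list(chart):
            for (C, h2, k2, j) in list(chart):
                if k2 != k:
                    continue
                for A, Bc, Cc, head in rules:  # head in {"left", "right"}
                    if Bc == B and Cc == C:
                        h = h1 if head == "left" else h2
                        if (A, h, i, j) not in chart:
                            chart.add((A, h, i, j))
                            changed = True
    return any((start, h, 0, n) in chart for h in range(n))

lex = {"dumped": {"V"}, "sacks": {"NP"}, "into": {"P"},
       "a": {"Det"}, "bin": {"N"}}
rules = [("NP", "Det", "N", "right"), ("PP", "P", "NP", "left"),
         ("VP", "V", "NP", "left"), ("VP", "VP", "PP", "left")]
assert naive_bilexical("dumped sacks into a bin".split(), lex, rules, "VP")
```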
Standard BU: Exhaustive

[log-log plot, time vs. sentence length, exhaustive search: BU naive fits y = c · x^5.2019]
Standard BU: Pruning

[log-log plot, time vs. sentence length, with pruning: BU naive fits y = c · x^3.8282]
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
BU enhanced
Result: parsing with 2-lex CFG in time O(|VD|^3 · |w|^4)

Remark: the result transfers to the models in Alshawi 1996, Eisner 1996, Charniak 1997, Collins 1997

Remark: the technique extends to improve parsing of Lexicalized Tree Adjoining Grammars
Algorithm #1
Basic step in naive BU: combine B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j), using production A[d2] → B[d1] C[d2].

Idea: indices d1 and j can be processed independently.
Algorithm #1
• Step 1: combine B[d1] spanning (i, k) with production A[d2] → B[d1] C[d2], discarding d1; this yields a partial item A[d2] → • C[d2] spanning (i, k), touching indices i, k, d1, d2
• Step 2: combine the partial item with C[d2] spanning (k, j), yielding A[d2] spanning (i, j), touching indices i, k, j, d2

Each step involves at most four indices ranging over positions of w.
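The two-step decomposition can be sketched by adding "partial" items that forget the non-head child's head position (the grammar encoding, a delexicalized rule plus a head direction, is our illustrative assumption; both head directions are handled symmetrically):

```python
# Algorithm #1 sketch: the naive bilexical step is cut into two steps,
# each touching at most four position-like indices, giving the
# O(|VD|^3 |w|^4) bound. Complete items are ("c", A, h, i, j); partial
# items ("mR"/"mL", A, X, i, j) have forgotten the non-head child's head.
def two_step(words, lex, rules, start):
    n = len(words)
    chart = {("c", A, h, h, h + 1) for h, w in enumerate(words)
             for A in lex.get(w, ())}
    changed = True
    while changed:
        new = set()
        # Step 1: rule + non-head child, forgetting that child's head h
        for t, X, h, i, j in chart:
            if t != "c":
                continue
            for A, B, C, head in rules:
                if head == "right" and X == B:
                    new.add(("mR", A, C, i, j))  # awaits head child C at (j, .)
                if head == "left" and X == C:
                    new.add(("mL", A, B, i, j))  # awaits head child B at (., i)
        # Step 2: partial item + head child
        for m in chart | new:
            if m[0] == "mR":
                _, A, C, i, k = m
                new |= {("c", A, h, i, j) for (t, X, h, k2, j) in chart
                        if t == "c" and X == C and k2 == k}
            elif m[0] == "mL":
                _, A, B, k, j = m
                new |= {("c", A, h, i, j) for (t, X, h, i, k2) in chart
                        if t == "c" and X == B and k2 == k}
        changed = not new <= chart
        chart |= new
    return any(("c", start, h, 0, n) in chart for h in range(n))

lex = {"dumped": {"V"}, "sacks": {"NP"}, "into": {"P"},
       "a": {"Det"}, "bin": {"N"}}
rules = [("NP", "Det", "N", "right"), ("PP", "P", "NP", "left"),
         ("VP", "V", "NP", "left"), ("VP", "VP", "PP", "left")]
assert two_step("dumped sacks into a bin".split(), lex, rules, "VP")
```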
BU enhanced
Upper bound provided by Algorithm #1: O(|w|^4)

Goal: can we go down to O(|w|^3) ?
Spine

The spine of a parse tree is the path from the root to the root's head.

Example, "IBM bought Lotus last week":

[S[buy]
  [S[buy] [NP[IBM] IBM] [VP[buy] [V[buy] bought] [NP[Lotus] Lotus]]]
  [AdvP[week] last week]]

Spine: S[buy], S[buy], VP[buy], V[buy], bought
Spine projection

The spine projection is the yield of the sub-tree composed of the spine and all its sibling nodes.

For the parse of "IBM bought Lotus last week":

spine projection = NP[IBM] bought NP[Lotus] AdvP[week]
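The two notions can be sketched in code. Trees are (label, head, children) triples with word strings as leaves, an encoding we assume for illustration (heads here are word forms such as "bought" rather than lemmas such as "buy"):

```python
# Sketch: computing the spine and the spine projection of a
# lexicalized parse tree.
def on_spine(child, head):
    """A child is on the spine iff it is, or is headed by, the head word."""
    return child == head if isinstance(child, str) else child[1] == head

def spine(tree):
    """Path from the root down to the root's head word."""
    out, node = [], tree
    while not isinstance(node, str):
        label, head, children = node
        out.append(f"{label}[{head}]")
        node = next(c for c in children if on_spine(c, head))
    return out + [node]

def spine_projection(tree):
    """Yield of the spine together with all of its sibling nodes."""
    label, head, children = tree
    out = []
    for c in children:
        if on_spine(c, head):  # recurse down the spine, emit head word
            out += spine_projection(c) if not isinstance(c, str) else [c]
        else:                  # sibling of the spine: emit its root label
            out.append(c if isinstance(c, str) else f"{c[0]}[{c[1]}]")
    return out

tree = ("S", "bought",
        [("S", "bought",
          [("NP", "IBM", ["IBM"]),
           ("VP", "bought",
            [("V", "bought", ["bought"]),
             ("NP", "Lotus", ["Lotus"])])]),
         ("AdvP", "week", ["last", "week"])])

assert spine(tree) == ["S[bought]", "S[bought]", "VP[bought]",
                       "V[bought]", "bought"]
assert spine_projection(tree) == ["NP[IBM]", "bought",
                                  "NP[Lotus]", "AdvP[week]"]
```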
Split Grammars
Split spine projections at the head.

Problem: how much information do we need to store in order to construct new grammatical spine projections from splits?
Split Grammars
Fact: the set of spine projections is a linear context-free language

Definition: a 2-lex CFG is split if its set of spine projections is a regular language

Remark: for split grammars, we can recombine splits using finite information
Split Grammars
Non-split grammar:
• unbounded # of dependencies between left and right dependents of the head, e.g. via the recursive rules S[d] → AdvP[a] S1[d] and S1[d] → S[d] AdvP[b], which pair each left AdvP[a] with a right AdvP[b]
• linguistically unattested and unlikely
Split Grammars
Split grammar: finite # of dependencies between left and right dependents of the lexical head, e.g. S[d] → AdvP[a] S[d] and S[d] → S[d] AdvP[b]
Split Grammars
Precompile the grammar so that splits are derived separately: in the parse of "IBM bought Lotus last week", the spine nodes S[buy], VP[buy], V[buy] below the root are replaced by spine-automaton states r1[buy], r2[buy], r3[buy], where r3[buy] is a split symbol.
Split Grammars
• t : max # of states per spine automaton
• g : max # of split symbols per spine automaton (g < t)
• m : # of delexicalized nonterminals that are maximal projections
BU enhanced
Result: parsing with split 2-lexical CFG in time O(t^2 · g^2 · m^2 · |w|^3)

Remark: the models in Alshawi 1996, Charniak 1997 and Collins 1997 are not split
Algorithm #2
Idea:
• recognize left and right splits separately
• collect head dependents one split at a time

E.g., for NP[IBM] bought NP[Lotus] AdvP[week], the left split NP[IBM] bought and the right split bought NP[Lotus] AdvP[week] are built separately and recombined through the split symbol.
Algorithm #2

• Step 1: combine an item for B[d1] in spine-automaton state s1, with boundary position k, with the left split of a dependent head d2 in state s2 with split symbol r2, starting at k
• Step 2: attach the dependent's other split through its split symbol r2 alone, so that the boundary positions of d2's analysis can be forgotten

Each step involves at most three indices ranging over positions of w, giving the O(|w|^3) bound.
Algorithm #2: Exhaustive

[log-log plot, time vs. sentence length, exhaustive search: BU naive fits y = c · x^5.2019, BU split fits y = c · x^3.328]
Algorithm #2: Pruning

[log-log plot, time vs. sentence length, with pruning: BU naive fits y = c · x^3.8282, BU split fits y = c · x^2.8179]
Related work
Cubic-time algorithms for lexicalized grammars:
• Sleator & Temperley 1991: Link Grammars
• Eisner 1997: Bilexical Grammars (improved by transfer of Algorithm #2)
TD enhanced
Goal: introduce TD prediction for 2-lexical CFG parsing, without |VT| factors

Remark: must relax left-to-right parsing (because of the previous results)
TD enhanced
Result: TD parsing with 2-lex CFG in time O(|VD|^3 · |w|^4)

Open: O(|w|^3) extension to split grammars
TD enhanced
Strongest version of the correct-prefix property:

[figure: derivation from S covering w up to the reading position]
Data Structures
Productions with lhs A[d]:
• A[d] → X1[d1] X2[d2]
• A[d] → Y1[d3] Y2[d2]
• A[d] → Z1[d2] Z2[d1]

Trie for A[d]: the head sequences of the rhs (d1 d2, d3 d2, d2 d1) stored with shared prefixes, for rightmost subsequence matching.
Data Structures
Rightmost subsequence recognition by precompiling the input w into a deterministic FA:

[figure: DFA over the input symbols a, b, c]
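One way to sketch the precompilation is a per-position table of rightmost occurrences, which plays the role of the deterministic FA (a sketch under our own encoding, not the talk's exact construction):

```python
# Sketch of rightmost-subsequence recognition over the input w.
def precompile(w):
    """prev[i][a] = rightmost position j < i with w[j] == a."""
    prev = [dict()]
    for i, a in enumerate(w):
        d = dict(prev[-1])
        d[a] = i
        prev.append(d)
    return prev

def rightmost_subsequence(prev, pattern, k):
    """Rightmost positions j1 < ... < jm < k with w[jx] == pattern[x]."""
    pos, out = k, []
    for a in reversed(pattern):  # match greedily right-to-left
        if a not in prev[pos]:
            return None          # pattern is not a subsequence before k
        pos = prev[pos][a]
        out.append(pos)
    return out[::-1]

w = list("abacb")
prev = precompile(w)
assert rightmost_subsequence(prev, ["a", "b"], 5) == [2, 4]
assert rightmost_subsequence(prev, ["a", "b"], 4) == [0, 1]
assert rightmost_subsequence(prev, ["c"], 3) is None
```

Matching a head sequence against the table takes time linear in the pattern, independently of |VT|.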
Algorithm #3
Item representation:
• i, j indicate the extension of the A[d] partial analysis
• k indicates the rightmost possible position for the completion of the A[d] analysis

[figure: derivation from S with A[d] spanning (i, j) and completion bounded by k]
Algorithm #3 : Prediction
• Step 1: find the rightmost subsequence before k for some A[d2] production
• Step 2: make the Earley prediction

[figure: prediction of A[d2], with rhs heads d1 and d2 of B[d1] and C[d2], at position k' < k]
Conclusions
• standard parsing techniques are not suitable for processing lexicalized grammars
• novel algorithms have been introduced using enhanced dynamic programming
• work to be done : extension to history-based models
The End
Many thanks for helpful discussion to:
Jason Eisner, Mark-Jan Nederhof