Probabilistic Parsing Ling 571 Fei Xia Week 5: 10/25-10/27/05.


Outline

• Lexicalized CFG (Recap)

• Hw5 and Project 2

• Parsing evaluation measures: ParseVal

• Collins’ parser

• TAG

• Parsing summary

Lexicalized CFG recap

Important equations

$$P(A_1,\ldots,A_n) = P(A_1)\prod_{i=2}^{n} P(A_i \mid A_1,\ldots,A_{i-1})$$

$$P(A_1) = \sum_{A_2,\ldots,A_n} P(A_1,\ldots,A_n)$$
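The two equations (chain rule and marginalization) can be checked numerically. The sketch below uses a small, made-up joint distribution over three binary variables; the table `joint` and the helper names are hypothetical and only for illustration. It verifies that the chain-rule product reproduces every joint probability and that summing out A_2 and A_3 gives P(A_1).

```python
from itertools import product

# Hypothetical joint distribution P(A1, A2, A3) over three binary variables.
# The numbers are made up for illustration; they sum to 1.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.20, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}

def marginal(prefix):
    """P(A_1..A_k = prefix): sum the joint over all values of the remaining variables."""
    k = len(prefix)
    return sum(p for assignment, p in joint.items() if assignment[:k] == prefix)

def conditional(value, history):
    """P(A_i = value | A_1..A_{i-1} = history) = P(history, value) / P(history)."""
    return marginal(history + (value,)) / marginal(history)

def chain_rule(assignment):
    """P(A_1) * prod_{i=2..n} P(A_i | A_1..A_{i-1})."""
    prob = marginal(assignment[:1])
    for i in range(1, len(assignment)):
        prob *= conditional(assignment[i], assignment[:i])
    return prob

# The chain-rule product reproduces every entry of the joint table.
for a in product((0, 1), repeat=3):
    assert abs(chain_rule(a) - joint[a]) < 1e-12

print(marginal((1,)))  # P(A_1 = 1), approximately 0.6
```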

Lexicalized CFG

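• Lexicalized rules:

$$A(w) \to B_1(w_1) \ldots B_n(w_n)$$

$$A(h) \to L_n(l_n) \ldots L_1(l_1)\, H(h)\, R_1(r_1) \ldots R_m(r_m)$$

  where the head child H(h) shares the head word h with its parent A(h).

• Sparse data problem: fully lexicalized rules are too rare to estimate directly, so
  – First generate the head
  – Then generate the unlexicalized rule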

Lexicalized models

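$$\begin{aligned}
P(T,S) &= P(lr_1,\ldots,lr_n) = \prod_{i=1}^{n} P(lr_i \mid lr_1,\ldots,lr_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i, h(r_i) \mid lr_1,\ldots,lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lr_1,\ldots,lr_{i-1}) \cdot P(r_i \mid h(r_i), lr_1,\ldots,lr_{i-1}) \\
&\approx \prod_{i=1}^{n} P(h(r_i) \mid lhs(r_i), mh(r_i)) \cdot P(r_i \mid lhs(r_i), h(r_i))
\end{aligned}$$

where lr_i is the i-th lexicalized rule, r_i its unlexicalized version, h(r_i) the head word of lhs(r_i), and mh(r_i) the head word of the mother of lhs(r_i).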

An example

• he likes her

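$$\begin{aligned}
P(T,S) = {} & P(\text{likes} \mid S) \cdot P(S \to NP\ VP \mid S, \text{likes}) \\
& \cdot P(\text{he} \mid NP, \text{likes}) \cdot P(NP \to Pron \mid NP, \text{he}) \\
& \cdot P(\text{likes} \mid VP, \text{likes}) \cdot P(VP \to V\ NP \mid VP, \text{likes}) \\
& \cdot P(\text{her} \mid NP, \text{likes}) \cdot P(NP \to Pron \mid NP, \text{her})
\end{aligned}$$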

An example

• he likes her

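$$\begin{aligned}
P(T,S) = {} & P(\text{likes} \mid Top) \cdot P(Top \to S \mid Top, \text{likes}) \\
& \cdot P(\text{likes} \mid S, \text{likes}) \cdot P(S \to NP\ VP \mid S, \text{likes}) \\
& \cdot P(\text{he} \mid NP, \text{likes}) \cdot P(NP \to Pron \mid NP, \text{he}) \\
& \cdot P(\text{likes} \mid VP, \text{likes}) \cdot P(VP \to V\ NP \mid VP, \text{likes}) \\
& \cdot P(\text{her} \mid NP, \text{likes}) \cdot P(NP \to Pron \mid NP, \text{her}) \\
& \cdot P(\text{he} \mid Pron, \text{he}) \cdot P(Pron \to \text{he} \mid Pron, \text{he}) \\
& \cdot P(\text{likes} \mid V, \text{likes}) \cdot P(V \to \text{likes} \mid V, \text{likes}) \\
& \cdot P(\text{her} \mid Pron, \text{her}) \cdot P(Pron \to \text{her} \mid Pron, \text{her})
\end{aligned}$$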
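The factors in this example can be read off a lexicalized tree mechanically: every node A(h) contributes one head-head factor P(h | A, parent head) and one head-rule factor P(A → rhs | A, h). A minimal Python sketch, assuming a hypothetical `Node` class (label, head word, children) and a Top node at the root as in this example; it lists the factors top-down and left to right, an order that differs from the slide but gives the same product.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class Node:
    label: str                                   # nonterminal, e.g. "NP"
    head: str                                    # lexical head word, e.g. "he"
    children: List[Union["Node", str]] = field(default_factory=list)  # subtrees or words

def factors(node: Node, parent_head: Optional[str] = None) -> List[str]:
    """List the head-head and head-rule factors contributed by every node, top-down."""
    # Head-head factor P(h | A, parent head); the root has no parent head.
    cond = node.label if parent_head is None else f"{node.label}, {parent_head}"
    out = [f"P({node.head} | {cond})"]
    # Head-rule factor P(A -> rhs | A, h) with the unlexicalized right-hand side.
    rhs = " ".join(c.label if isinstance(c, Node) else c for c in node.children)
    out.append(f"P({node.label} -> {rhs} | {node.label}, {node.head})")
    for child in node.children:
        if isinstance(child, Node):
            out.extend(factors(child, node.head))   # children are conditioned on this head
    return out

# "he likes her" with a Top node.
tree = Node("Top", "likes", [
    Node("S", "likes", [
        Node("NP", "he", [Node("Pron", "he", ["he"])]),
        Node("VP", "likes", [
            Node("V", "likes", ["likes"]),
            Node("NP", "her", [Node("Pron", "her", ["her"])]),
        ]),
    ]),
])

for f in factors(tree):
    print(f)
```

Multiplying the probabilities of these 16 factors gives P(T, S) under the model's independence assumptions.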

Head-head probability

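$$P(w_2 \mid A, w_1) = \frac{P(A, w_1, w_2)}{P(A, w_1)} \approx \frac{C(A, w_1, w_2)}{C(A, w_1)} = \frac{C(X(w_1) \to \ldots A(w_2) \ldots)}{\sum_{w} C(X(w_1) \to \ldots A(w) \ldots)}$$

where w_1 is the head word of the parent X and w_2 is the head word of A.

$$P(\text{he} \mid NP, \text{likes}) = \frac{C(X(\text{likes}) \to \ldots NP(\text{he}) \ldots)}{\sum_{w} C(X(\text{likes}) \to \ldots NP(w) \ldots)}$$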
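The estimate is a ratio of counts over lexicalized local trees. A minimal Python sketch of the maximum-likelihood version, assuming the counts have already been reduced to hypothetical (parent head, child label, child head) triples; it does no smoothing, so an unseen context simply gets probability 0.

```python
from collections import Counter

# Hypothetical (parent head w1, child label A, child head w) triples, as they might be
# read off lexicalized local trees X(w1) -> ... A(w) ... in a treebank.
triples = [
    ("likes", "NP", "he"), ("likes", "NP", "her"), ("likes", "NP", "he"),
    ("likes", "VP", "likes"), ("sees", "NP", "he"),
]

triple_counts = Counter(triples)                           # C(X(w1) -> ... A(w) ...)
context_counts = Counter((w1, a) for w1, a, _ in triples)  # sum over w of the above

def p_head_head(w, a, w1):
    """MLE of P(w | A, w1); returns 0.0 for an unseen (w1, A) context (no smoothing)."""
    denom = context_counts[(w1, a)]
    return triple_counts[(w1, a, w)] / denom if denom else 0.0

print(p_head_head("he", "NP", "likes"))  # 2 of the 3 NP children of "likes" are headed by "he"
```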

Head-rule probability

$$P(A \to \alpha \mid A, w) = \frac{P(A \to \alpha, A, w)}{P(A, w)} \approx \frac{C(A(w) \to \alpha)}{C(A(w))}$$

$$P(NP \to Pron \mid NP, \text{he}) = \frac{C(NP(\text{he}) \to Pron)}{C(NP(\text{he}))}$$
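The head-rule estimate counts how a constituent with a given head word expands. A minimal Python sketch, assuming hypothetical local trees stored as (label, head word, unlexicalized right-hand side); as above, it is unsmoothed maximum likelihood.

```python
from collections import Counter

# Hypothetical lexicalized local trees, each stored as (A, head word w, unlexicalized rhs).
local_trees = [
    ("NP", "he", ("Pron",)), ("NP", "he", ("Pron",)), ("NP", "he", ("NP", "PP")),
    ("NP", "her", ("Pron",)), ("VP", "likes", ("V", "NP")),
]

rule_counts = Counter(local_trees)                          # C(A(w) -> alpha)
head_counts = Counter((a, w) for a, w, _ in local_trees)    # C(A(w))

def p_head_rule(rhs, a, w):
    """MLE of P(A -> rhs | A, w); returns 0.0 if A(w) was never seen (no smoothing)."""
    denom = head_counts[(a, w)]
    return rule_counts[(a, w, tuple(rhs))] / denom if denom else 0.0

print(p_head_rule(["Pron"], "NP", "he"))  # 2 of the 3 NP(he) nodes expand as NP -> Pron
```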

Estimate parameters

$$P(A \to \alpha \mid A, w) = \frac{C(A(w) \to \alpha)}{C(A(w))}$$

$$P(w_2 \mid A, w_1) = \frac{C(X(w_1) \to \ldots A(w_2) \ldots)}{\sum_{w} C(X(w_1) \to \ldots A(w) \ldots)}$$

Building a statistical tool

• Design a model:
  – Objective function: generative model vs. discriminative model
  – Decomposition: independence assumptions
  – The types of parameters and parameter size

• Training: estimate model parameters
  – Supervised vs. unsupervised
  – Smoothing methods

• Decoding:

Team Project 1 (Hw5)

• Form a team: programming language, schedule, expertise, etc.

• Understand the lexicalized model

• Design the training algorithm

• Work out the decoding (parsing) algorithm: augment the CYK algorithm.

• Illustrate the algorithms with a real example.

Team Project 2

• Task: parse real data with a real grammar extracted from a treebank.

• Parser: PCFG or lexicalized PCFG

• Training data: English Penn Treebank, Sections 02-21

• Development data: Section 00

Team Project 2 (cont)

• Hw6: extract PCFG from the treebank

• Hw7: make sure your parser works given real grammar and real sentences; measure parsing performance

• Hw8: improve parsing results

• Hw10: write a report and give a presentation
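For Hw6, the core of PCFG extraction is counting rules and normalizing by left-hand side, P(A → α) = C(A → α) / C(A). A minimal Python sketch under simplifying assumptions: trees are single-rooted bracketed strings without the Penn Treebank's outer empty bracket, and function tags and empty categories (-NONE-) have already been removed. The function names and the two-tree treebank are hypothetical.

```python
import re
from collections import Counter

def parse(s):
    """Parse a bracketed tree like "(S (NP (Pron he)) (VP (V sleeps)))" into (label, children)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]          # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                          # consume ")"
        return (label, children)

    return node()

def rules(tree):
    """Yield every rule (lhs, rhs) in the tree, including lexical rules POS -> word."""
    label, children = tree
    yield label, tuple(c[0] if isinstance(c, tuple) else c for c in children)
    for c in children:
        if isinstance(c, tuple):
            yield from rules(c)

def extract_pcfg(treebank):
    """MLE estimate P(A -> alpha) = C(A -> alpha) / C(A) over all trees."""
    counts = Counter(r for s in treebank for r in rules(parse(s)))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

treebank = [
    "(S (NP (Pron he)) (VP (V likes) (NP (Pron her))))",
    "(S (NP (Pron she)) (VP (V sleeps)))",
]
pcfg = extract_pcfg(treebank)
print(pcfg[("VP", ("V", "NP"))])   # 1 of the 2 VPs
```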

Parsing evaluation measures

Evaluation of parsers: ParseVal

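• Labeled recall: (# of correct constituents in the parser output) / (# of constituents in the gold standard)

• Labeled precision: (# of correct constituents in the parser output) / (# of constituents in the parser output)

• Labeled F-measure: 2 × precision × recall / (precision + recall)

  A constituent is correct if a gold constituent has the same label and the same span.

• Complete match: % of sentences where recall and precision are both 100%

• Average crossing: # of crossing brackets per sentence, where a parser constituent crosses a gold constituent if their spans overlap but neither contains the other

• No crossing: % of sentences that have no crossing brackets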

An example

Gold standard:

(VP (V saw)

(NP (Det the) (N man))

(PP (P with) (NP (Det a) (N telescope))))

Parser output:

(VP (V saw)

(NP (NP (Det the) (N man))

(PP (P with) (NP (Det a) (N telescope)))))

ParseVal measures

• Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)

• System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)

• Recall=4/4, Prec=4/5, crossing=0

A different annotation

Gold standard: (VP (V saw) (NP (Det the) (N’ (N man))) (PP (P with) (NP (Det a) (N’ (N telescope)))))

Parser output: (VP (V saw) (NP (Det the) (N’ (N man) (PP (P with) (NP (Det a) (N’ (N telescope)))))))

ParseVal measures (cont)

• Gold standard: (VP, 1, 6), (NP, 2, 3), (N’, 3, 3), (PP, 4, 6), (NP, 5, 6), (N’, 6,6)

• System output: (VP, 1, 6), (NP, 2, 6), (N’, 3, 6), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6)

• Recall=4/6, Prec=4/6, crossing=1
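The counts on the last two slides can be reproduced with a few lines of code. A minimal Python sketch, assuming constituents are given as (label, start, end) triples with inclusive word positions as in the slides; it does not replicate EVALB's parameter-file options (e.g., which labels or punctuation to ignore).

```python
def crosses(a, b):
    """Spans (i, j) and (k, l) cross if they overlap but neither contains the other."""
    (i, j), (k, l) = a, b
    return i < k <= j < l or k < i <= l < j

def parseval(gold, test):
    """Labeled precision, recall, F-measure, and crossing count for one sentence.
    Constituents are (label, start, end) with inclusive word positions."""
    unmatched = list(gold)
    correct = 0
    for c in test:
        if c in unmatched:            # each gold constituent can be matched only once
            unmatched.remove(c)
            correct += 1
    precision = correct / len(test)
    recall = correct / len(gold)
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    # Number of system constituents that cross at least one gold constituent.
    crossing = sum(any(crosses(t[1:], g[1:]) for g in gold) for t in test)
    return precision, recall, f, crossing

gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
test = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
print(parseval(gold, test))    # precision 4/5, recall 4/4, crossing 0

gold2 = [("VP", 1, 6), ("NP", 2, 3), ("N'", 3, 3), ("PP", 4, 6), ("NP", 5, 6), ("N'", 6, 6)]
test2 = [("VP", 1, 6), ("NP", 2, 6), ("N'", 3, 6), ("PP", 4, 6), ("NP", 5, 6), ("N'", 6, 6)]
print(parseval(gold2, test2))  # precision 4/6, recall 4/6, crossing 1
```

In the second case (N’, 3, 6) crosses the gold (NP, 2, 3), which is the single crossing bracket.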

EVALB

• A tool that calculates ParseVal measures

• To run it: evalb -p parameter_file gold_file system_output

• A copy is available in my dropbox

• You will need it for Team Project 2

Summary of Parsing evaluation measures

• ParseVal is the most widely used set of measures; F-measure is the most important one

• The results depend on annotation style

• EVALB is a tool that calculates ParseVal measures

• Other measures are used too: e.g., accuracy of dependency links

History-based models

History-based models

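• History-based approaches map (T, S) into a decision sequence d_1, ..., d_n

• Probability of tree T for sentence S is:

$$P(T,S) = P(d_1,\ldots,d_n) = \prod_{i=1}^{n} P(d_i \mid d_1,\ldots,d_{i-1}) \approx \prod_{i=1}^{n} P(d_i \mid f(d_1,\ldots,d_{i-1}))$$

  where f maps each history to an equivalence class, so only the relevant parts of the history are conditioned on.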

History-based models (cont)

• PCFGs can be viewed as a history-based model

• There are other history-based models:
  – Magerman’s parser (1995)
  – Collins’ parsers (1996, 1997, ….)
  – Charniak’s parsers (1996, 1997, ….)
  – Ratnaparkhi’s parser (1997)

Collins’ models

• Model 1: a generative version of the model in (Collins, 1996)

• Model 2: Add complement/adjunct distinction

• Model 3: Add wh-movement

Model 1

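• First generate the head constituent label

• Then generate left and right dependents

$$\begin{aligned}
& P(A(h) \to L_n(l_n) \ldots L_1(l_1)\, H(h)\, R_1(r_1) \ldots R_m(r_m) \mid A, h) \\
&= P_H(H \mid A, h) \cdot P_L(L_1(l_1) \ldots L_n(l_n) \mid A, H, h) \cdot P_R(R_1(r_1) \ldots R_m(r_m) \mid A, H, h, L_1(l_1) \ldots L_n(l_n)) \\
&\approx P_H(H \mid A, h) \cdot P_L(L_1(l_1) \ldots L_n(l_n) \mid A, H, h) \cdot P_R(R_1(r_1) \ldots R_m(r_m) \mid A, H, h)
\end{aligned}$$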

Model 1(cont)

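$$P_L(L_1(l_1) \ldots L_n(l_n) \mid A, H, h) = \prod_{i=1}^{n+1} P_L(L_i(l_i) \mid A, H, h, L_1(l_1) \ldots L_{i-1}(l_{i-1})) \approx \prod_{i=1}^{n+1} P_L(L_i(l_i) \mid A, H, h)$$

$$P_R(R_1(r_1) \ldots R_m(r_m) \mid A, H, h) = \prod_{i=1}^{m+1} P_R(R_i(r_i) \mid A, H, h, R_1(r_1) \ldots R_{i-1}(r_{i-1})) \approx \prod_{i=1}^{m+1} P_R(R_i(r_i) \mid A, H, h)$$

where L_{n+1}(l_{n+1}) = R_{m+1}(r_{m+1}) = STOP marks the end of each dependent sequence.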

An example

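Sentence: Last week Marks bought Brooks.

rule: S(bought) → NP(week) NP(Marks) VP(bought)

$$\begin{aligned}
P(\text{rule} \mid S, \text{bought}) = {} & P_H(VP \mid S, \text{bought}) \\
& \cdot P_L(NP(\text{Marks}) \mid S, VP, \text{bought}) \cdot P_L(NP(\text{week}) \mid S, VP, \text{bought}) \cdot P_L(STOP \mid S, VP, \text{bought}) \\
& \cdot P_R(STOP \mid S, VP, \text{bought})
\end{aligned}$$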
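Once P_H, P_L and P_R are available, the rule probability is a straightforward product. A minimal Python sketch; the parameter tables and their values are hypothetical (not estimates from any treebank), and dependents are passed from the head outward, so L_1 = NP(Marks) comes before L_2 = NP(week).

```python
from typing import Callable, List, Optional, Tuple

Dep = Tuple[str, Optional[str]]      # (nonterminal label, head word)
STOP: Dep = ("STOP", None)

def model1_rule_prob(parent: str, head_word: str, head_label: str,
                     left: List[Dep], right: List[Dep],
                     p_h: Callable[..., float], p_l: Callable[..., float],
                     p_r: Callable[..., float]) -> float:
    """P(rule | A, h) = P_H(H | A, h) * prod_i P_L(L_i(l_i) | A, H, h) * prod_i P_R(R_i(r_i) | A, H, h).
    `left` and `right` list dependents from the head outward; STOP is appended to each side."""
    prob = p_h(head_label, parent, head_word)
    for dep in left + [STOP]:
        prob *= p_l(dep, parent, head_label, head_word)
    for dep in right + [STOP]:
        prob *= p_r(dep, parent, head_label, head_word)
    return prob

# Hypothetical parameter values for S(bought) -> NP(week) NP(Marks) VP(bought).
P_H = {("VP", "S", "bought"): 0.7}
P_L = {(("NP", "Marks"), "S", "VP", "bought"): 0.2,
       (("NP", "week"), "S", "VP", "bought"): 0.1,
       (STOP, "S", "VP", "bought"): 0.5}
P_R = {(STOP, "S", "VP", "bought"): 0.9}

p = model1_rule_prob("S", "bought", "VP",
                     left=[("NP", "Marks"), ("NP", "week")],   # L_1 is nearest the head
                     right=[],
                     p_h=lambda *k: P_H.get(k, 0.0),
                     p_l=lambda *k: P_L.get(k, 0.0),
                     p_r=lambda *k: P_R.get(k, 0.0))
print(p)  # 0.7 * 0.2 * 0.1 * 0.5 * 0.9
```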

Model 2

• Generate a head label H

• Choose left and right subcat frames

• Generate left and right arguments

• Generate left and right modifiers

An example

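rule: S(bought) → NP(week) NP_c(Marks) VP(bought)

$$\begin{aligned}
P(\text{rule} \mid S, \text{bought}) = {} & P_H(VP \mid S, \text{bought}) \cdot P_{lc}(\{NP_c\} \mid S, VP, \text{bought}) \cdot P_{rc}(\{\} \mid S, VP, \text{bought}) \\
& \cdot P_L(NP_c(\text{Marks}) \mid S, VP, \text{bought}, \{NP_c\}) \cdot P_L(NP(\text{week}) \mid S, VP, \text{bought}, \{\}) \\
& \cdot P_L(STOP \mid S, VP, \text{bought}, \{\}) \cdot P_R(STOP \mid S, VP, \text{bought}, \{\})
\end{aligned}$$

The subscript c marks a complement: NP_c(Marks) is the subject argument, while NP(week) is an adjunct.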
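The subcat frame acts as a bag of required complements: generating a complement removes it from the bag, STOP has probability 0 while the bag is non-empty, and a complement not in the bag has probability 0. A simplified Python sketch for one side of a rule, assuming complements are labeled with an `_c` suffix (the subscript c above); the parameter tables are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Dep = Tuple[str, Optional[str]]      # (label, head word); complements carry the "_c" suffix

def model2_side_prob(parent: str, head_label: str, head_word: str,
                     deps: List[Dep], subcat: List[str],
                     p_subcat: Callable[..., float], p_dep: Callable[..., float]) -> float:
    """One side (left or right) of a rule under a simplified Model 2:
    P_c(subcat | A, H, h) * prod_i P(D_i(d_i) | A, H, h, remaining subcat), ending with STOP."""
    remaining = sorted(subcat)                     # a multiset, kept as a sorted list
    prob = p_subcat(tuple(remaining), parent, head_label, head_word)
    for label, word in deps + [("STOP", None)]:
        frame = tuple(remaining)                   # complements still required at this point
        if label == "STOP" and remaining:
            return 0.0                             # cannot stop before all complements appear
        if label.endswith("_c"):
            if label not in remaining:
                return 0.0                         # complement not licensed by the frame
            remaining.remove(label)
        prob *= p_dep((label, word), parent, head_label, head_word, frame)
    return prob

# Hypothetical parameters for the left side of S(bought) -> NP(week) NP_c(Marks) VP(bought).
P_LC = {(("NP_c",), "S", "VP", "bought"): 0.6}
P_L = {(("NP_c", "Marks"), "S", "VP", "bought", ("NP_c",)): 0.2,
       (("NP", "week"), "S", "VP", "bought", ()): 0.1,
       (("STOP", None), "S", "VP", "bought", ()): 0.5}

left = model2_side_prob("S", "VP", "bought", [("NP_c", "Marks"), ("NP", "week")], ["NP_c"],
                        lambda *k: P_LC.get(k, 0.0), lambda *k: P_L.get(k, 0.0))
print(left)  # 0.6 * 0.2 * 0.1 * 0.5
```

The full rule probability multiplies P_H by this left-side value and the analogous right-side value.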

Model 3

• Add traces and wh-movement

• Given that the LHS of a rule has a gap, there are three ways to pass down the gap:
  – Head: S(+gap) → NP VP(+gap)
  – Left: S(+gap) → NP(+gap) VP
  – Right: SBAR(that)(+gap) → WHNP(that) S(+gap)

Parsing results

LR LP

Model 1 87.4% 88.1%

Model 2 88.1% 88.6%

Model 3 88.1% 88.6%

Tree Adjoining Grammar (TAG)

TAG

• TAG basics

• Extensions of TAG:
  – Lexicalized TAG (LTAG)
  – Synchronous TAG (STAG)
  – Multi-component TAG (MCTAG)
  – ….

TAG basics

• A tree-rewriting formalism (Joshi et al., 1975)

• It can generate mildly context-sensitive languages.

• The primitive elements of a TAG are elementary trees.

• Elementary trees are combined by two operations: substitution and adjoining.

• TAG has been used in:
  – Parsing, semantics, discourse, etc.
  – Machine translation, summarization, generation, etc.

Two types of elementary trees

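Initial tree (anchored by draft): (S NP (VP (V draft) NP)), where the frontier NP nodes are substitution sites

Auxiliary tree (anchored by still): (VP (ADVP (ADV still)) VP*), where VP* is the foot node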

Substitution operation

They draft policies

Adjoining operation

[Figure: an auxiliary tree with root Y and foot node Y* is adjoined at an internal node labeled Y]

They still draft policies
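Both operations are easy to state on trees. A minimal Python sketch, assuming trees are nested lists and marking substitution sites with ↓ and foot nodes with *; the NP trees for They and policies and all function names are hypothetical. It derives "They draft policies" by substitution, then "They still draft policies" by adjoining the auxiliary tree for still at the VP node.

```python
import copy

# Trees are nested lists [label, child, ...]; words are strings.
# ["NP↓"] is a substitution site and ["VP*"] is a foot node (one-element lists).

def substitute(tree, initial):
    """Replace the leftmost substitution site X↓ whose X matches the root of `initial`."""
    target = [initial[0] + "↓"]
    for i, child in enumerate(tree[1:], start=1):
        if isinstance(child, list):
            if child == target:
                tree[i] = copy.deepcopy(initial)
                return True
            if substitute(child, initial):
                return True
    return False

def find_foot(node):
    """Return (parent, index) of the foot node X* in an auxiliary tree, or None."""
    for i, child in enumerate(node[1:], start=1):
        if isinstance(child, list):
            if len(child) == 1 and child[0].endswith("*"):
                return node, i
            found = find_foot(child)
            if found:
                return found
    return None

def adjoin(tree, aux, path):
    """Adjoin `aux` at the node reached by following child indices in `path`."""
    parent, idx, node = None, None, tree
    for step in path:
        parent, idx, node = node, step, node[step]
    assert node[0] == aux[0], "auxiliary root must match the adjunction site"
    new = copy.deepcopy(aux)
    foot_parent, foot_idx = find_foot(new)
    foot_parent[foot_idx] = node           # the excised subtree hangs from the foot
    if parent is None:
        return new                         # adjoined at the root
    parent[idx] = new
    return tree

def words(node):
    """Left-to-right yield of a (derived) tree."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]

draft = ["S", ["NP↓"], ["VP", ["V", "draft"], ["NP↓"]]]
still = ["VP", ["ADVP", ["ADV", "still"]], ["VP*"]]
they = ["NP", ["PN", "They"]]              # hypothetical NP tree
policies = ["NP", ["N", "policies"]]       # hypothetical NP tree

derived = copy.deepcopy(draft)
substitute(derived, they)
substitute(derived, policies)
print(" ".join(words(derived)))            # They draft policies
derived = adjoin(derived, still, [2])      # adjoin at the VP node (child 2 of S)
print(" ".join(words(derived)))            # They still draft policies
```

The elementary trees are copied rather than modified, so the same tree can be reused in other derivations.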

Derivation tree

[Figure panels: elementary trees, derived tree, derivation tree]

Derived tree vs. derivation tree

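• The mapping from derivation trees to derived trees is not one-to-one: different derivation trees can yield the same derived tree.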

• Finding the best derivation is not the same as finding the best derived tree.

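Wh-movement

What do they draft ?

[Figure: elementary trees (S NP_i (S NP (VP (V draft) NP_i))) for draft, whose object NP_i is the trace; (NP (N what)); (NP (PN they)); and the auxiliary tree (S (V do) S*). Derived tree: (S (NP_i (N what)) (S (V do) (S (NP (PN they)) (VP (V draft) NP_i))))]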

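Long-distance wh-movement

What does John think they draft ?

[Figure: the auxiliary trees (S (V does) S*) and (S NP (VP (V think) S*)) adjoin at the inner S of the wh-extracted tree for draft, (S NP_i (S NP (VP (V draft) NP_i))), giving the derived tree (S (NP_i (N what)) (S (V does) (S (NP John) (VP (V think) (S (NP they) (VP (V draft) NP_i))))))]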

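Who did you have dinner with?

[Figure: who and the stranded preposition with come from a pair of trees sharing the index i, (S (NP_i (PN who)) S*) and (VP VP* (PP (P with) NP_i)), which adjoin into the tree for have, (S NP (VP (V have) NP)); derived tree: (S (NP_i (PN who)) (S NP (VP (VP (V have) NP) (PP (P with) NP_i))))]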

TAG extensions

• Lexicalized TAG (LTAG)

• Synchronous TAG (STAG)

• Multi-component TAG (MCTAG)

• ….

STAG

• The primitive elements in STAG are elementary tree pairs.

• Used for MT

Summary of TAG

• A formalism beyond CFG

• Primitive elements are trees, not rules

• Extended domain of locality

• Two operations: substitution and adjoining

• Parsing algorithm: O(n^6)

• Statistical parser for TAG

• Algorithms for extracting TAG from treebanks

Parsing summary

Types of parsers

• Phrase structure vs. dependency tree

• Statistical vs. rule-based

• Grammar-based or not

• Supervised vs. unsupervised

Our focus: phrase structure; mainly statistical; mainly grammar-based (CFG, TAG); supervised

Grammars

• Chomsky hierarchy:
  – Unrestricted grammar (type 0)
  – Context-sensitive grammar (type 1)
  – Context-free grammar (type 2)
  – Regular grammar (type 3)
  Some constructions in human languages are beyond context-free.

• Other formalisms:
  – HPSG, LFG
  – TAG
  – Dependency grammars

Parsing algorithm for CFG

• Top-down

• Bottom-up

• Top-down with bottom-up filter

• Earley algorithm

• CYK algorithm
  – Requires the CFG to be in CNF
  – Can be augmented to deal with PCFG, lexicalized CFG, etc.
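The standard PCFG augmentation of CYK is its Viterbi variant: fill a chart bottom-up over span lengths, keep only the highest-probability analysis for each (span, nonterminal), and recover the tree from backpointers. A minimal Python sketch; the toy grammar and the dict-based rule format are assumptions for illustration, and only binary and lexical rules are handled, which is what CNF provides.

```python
from collections import defaultdict

def viterbi_cyk(words, lexical, binary, start="S"):
    """Most probable parse of `words` under a PCFG in CNF.
    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns (probability, tree), or (0.0, None) if there is no parse."""
    n = len(words)
    best = defaultdict(float)   # (i, j, A) -> best probability of A covering words[i:j]
    back = {}                   # (i, j, A) -> a word, or (split point k, B, C)
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > best[(i, i + 1, a)]:
                best[(i, i + 1, a)] = p
                back[(i, i + 1, a)] = w
    for length in range(2, n + 1):            # build longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):         # split point
                for (a, b, c), p in binary.items():
                    score = p * best[(i, k, b)] * best[(k, j, c)]
                    if score > best[(i, j, a)]:   # keep only the max (Viterbi)
                        best[(i, j, a)] = score
                        back[(i, j, a)] = (k, b, c)

    def build(i, j, a):
        bp = back[(i, j, a)]
        if isinstance(bp, str):
            return (a, bp)
        k, b, c = bp
        return (a, build(i, k, b), build(k, j, c))

    if best[(0, n, start)] == 0.0:
        return 0.0, None
    return best[(0, n, start)], build(0, n, start)

lexical = {("NP", "he"): 0.4, ("NP", "her"): 0.6, ("V", "likes"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
print(viterbi_cyk("he likes her".split(), lexical, binary))
```

A lexicalized CFG can be handled the same way by adding the head word to each chart item, at the cost of a much larger chart.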

Extensions of CFG

• PCFG: find the most likely parse trees

• Lexicalized CFG:
  – Uses weaker independence assumptions
  – Accounts for certain types of lexical and structural dependencies

Beyond CFG

• History-based models
  – Collins’ parsers

• TAG
  – Tree-rewriting
  – Mildly context-sensitive grammar
  – Many extensions: LTAG, STAG, …

Statistical approach

• Modeling
  – Choose the objective function
  – Decompose the function:
    • Common equations: joint, conditional, marginal probabilities
    • Independence assumptions

• Training
  – Supervised vs. unsupervised
  – Smoothing

• Decoding
  – Dynamic programming
  – Pruning

Evaluation of parsers

• Accuracy: ParseVal

• Robustness

• Resources needed

• Efficiency

• Richness

Other things

• Converting into CNF:
  – CFG
  – PCFG
  – Lexicalized CFG

• Treebank annotation
  – Tagset: syntactic labels, POS tags, function tags, empty categories
  – Format: indentation, brackets
