Top Banner
Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011
28

Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Dec 26, 2015

Download

Documents

Arleen Short
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Beam-Width Prediction for Efficient Context-Free Parsing

Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark

June 2011

Page 2: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

OHSU Beam-Search Parser (BUBS)

2

• Standard bottom-up CYK

• Beam-search per chart cell

• Only “best” are retained

Page 3: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Ranking, Prioritization, and FOMs

• f() = g() + h()

• Figure of Merit– Caraballo and Charniak (1997)

• A* search– Klein and Manning (2003)

– Pauls and Klein (2010)

• Other– Turrian (2007)

– Huang (2008)

• Apply to beam-search

3

Page 4: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Beam-Width Prediction

• Traditional beam-search uses constant beam-width

• Two definitions of beam-width:– Number of local competitors to retain (n-best)– Score difference from best entry

• Advantages– Heavy pruning compared to CYK– Minimal sorting compared to global agenda

• Disadvantages– No global pruning – all chart cells treated equal– Conservative to keep outliers within beam

4

Page 5: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

5

Beam-Width Prediction

• How often is gold edge ranked in top N per chart cell– Exhaustively parse section 22 + Berkeley latent variable grammar

Gold rank <= N

Cum

ulat

ive

Gol

d E

dges

Page 6: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

6

Beam-Width Prediction

• How often is gold edge ranked in top N per chart cell– Exhaustively parse section 22 + Berkeley latent variable grammar

Gold rank <= N

Cum

ulat

ive

Gol

d E

dges

Page 7: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

7

Beam-Width Prediction

• Beam-search + C&C Boundary ranking: – How often is gold edge ranked in top N per chart cell:

Gold rank <= N

Cum

ulat

ive

Gol

d E

dges

To maintain baseline accuracy, beam-width must be set to 15 with C&C Boundary ranking (and 50 using only inside score)

To maintain baseline accuracy, beam-width must be set to 15 with C&C Boundary ranking (and 50 using only inside score)

Page 8: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

8

Beam-Width Prediction

• Beam-search + C&C Boundary ranking: – How often is gold edge ranked in top N per chart cell:

Gold rank <= N

Cum

ulat

ive

Gol

d E

dges

To maintain baseline accuracy, beam-width must be set to 15 with C&C Boundary ranking (and 50 using only inside score)

To maintain baseline accuracy, beam-width must be set to 15 with C&C Boundary ranking (and 50 using only inside score)

• Over 70% of gold edges are already ranked first in the local agenda

• 14 of 15 edges in these cells are unnecessary

• We can do much better than a constant beam-width

• Over 70% of gold edges are already ranked first in the local agenda

• 14 of 15 edges in these cells are unnecessary

• We can do much better than a constant beam-width

Page 9: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Beam-Width Prediction

• Method: Train an averaged perceptron (Collins, 2002) to predict the optimal beam-width per chart cell

• Map each chart cell in sentence S spanning words wi … wj to a feature vector representation:

• x: Lexical and POS unigrams and bigrams, relative and absolute span• y:1 if gold rank > k, 0 otherwise (no gold edge has rank of -1)

• Minimize the loss:

• H is the unit step function

9

k

k

Page 10: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Beam-Width Prediction

• Method: Use a discriminative classifier to predict the optimal beam-width per chart cell• Minimize the loss:

• L is the asymmetric loss function:

• If beam-width is too large, tolerable efficiency loss• If beam-width is too small, high risk to accuracy• Lambda set to 102 in all experiments

10

k

Page 11: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

11

Beam-Width Prediction

Special case: Predict if chart cell is open or closed to multi-word constituents

Page 12: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

12

Beam-Width Prediction

• A “closed” chart cell may need to be partially open• Binarized or dotted-rule parsing creates new “factored”

productions:

Page 13: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

13

Beam-Width Prediction

Method 1: Constituent Closure

Page 14: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

14

Beam-Width Prediction

• Constituent Closure is a per-cell generalization of Roark & Hollingshead (2008)– O(n2) classifications instead of O(n)

Page 15: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

15

Beam-Width Prediction

Method 2: Complete Closure

Page 16: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

16

Beam-Width Prediction

Method 3: Beam-Width Prediction

Page 17: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

17

Beam-Width Prediction

Method 3: Beam-Width Prediction

• Use multiple binary classifiers instead of regression (better performance)

• Local beam-width taken from classifier with smallest beam-width prediction

• Best performance with four binary classifiers: 0, 1, 2, 4– 97% of positive examples have beam-width <= 4– Don’t need a classifier for every possible beam-

width value between 0 and global maximum (15 in our case)

Page 18: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

18

Beam-Width Prediction

Page 19: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

19

Beam-Width Prediction

1.0

0.8

0.6

0.4

0.2

0.0

Page 20: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

20

Beam-Width Prediction

• Section 22 development set results

• Decoding time is seconds per sentence averaged over all sentences in Section 22

• Parsing with Berkeley latent variable grammar (4.3 million productions)

Parser Secs/Sent Speedup F1

CYK 70.383 89.4

CYK + Constituent Closure 47.870 1.5x 89.3

CYK + Complete Closure 32.619 2.2x 89.3

Page 21: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

21

Beam-Width Prediction

Parser Secs/Sent Speedup F1

CYK 70.383 89.4

CYK + Constituent Closure 47.870 1.5x 89.3

CYK + Complete Closure 32.619 2.2x 89.3

Beam + Inside FOM (BI) 3.977 89.2

BI + Constituent Closer 2.033 2.0x 89.2

BI + Complete Closure 1.575 2.5x 89.3

BI + Beam-Predict 1.180 3.4x 89.3

Page 22: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

22

Beam-Width Prediction

Parser Secs/Sent Speedup F1

CYK 70.383 89.4

CYK + Constituent Closure 47.870 1.5x 89.3

CYK + Complete Closure 32.619 2.2x 89.3

Beam + Inside FOM (BI) 3.977 89.2

BI + Constituent Closer 2.033 2.0x 89.2

BI + Complete Closure 1.575 2.5x 89.3

BI + Beam-Predict 1.180 3.4x 89.3

Beam + Boundary FOM (BB) 0.326 89.2

BB + Constituent Closure 0.279 1.2x 89.2

BB + Complete Closure 0.199 1.6x 89.3

BB + Beam-Predict 0.143 2.3x 89.3

Page 23: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Beam-Width Prediction

23

Parser Secs/Sent Speedup F1

CYK 70.383 89.4

CYK + Constituent Closure 47.870 1.5x 89.3

CYK + Complete Closure 32.619 2.2x 89.3

Beam + Inside FOM (BI) 3.977 89.2

BI + Constituent Closer 2.033 2.0x 89.2

BI + Complete Closure 1.575 2.5x 89.3

BI + Beam-Predict 1.180 3.4x 89.3

Beam + Boundary FOM (BB) 0.326 89.2

BB + Constituent Closure 0.279 1.2x 89.2

BB + Complete Closure 0.199 1.6x 89.3

BB + Beam-Predict 0.143 2.3x 89.3

Most recent numbers 0.053 6.2x 89.x

Page 24: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

24

Beam-Width Prediction

• Section 23 test results• Only MaxRule is marginalizing over latent variables and

performing non-Viterbi decoding

Parser Secs/Sent F1

CYK 64.610 88.7

Berkeley CTF MaxRule Petrov and Klein (2007)

0.213 90.2

Berkeley CTF Viterbi 0.208 88.8

Beam + Boundary FOM (BB) Caraballo and Charniak (1998)

0.334 88.6

BB + Chart Constraints Roark and Hollingshead (2008; 2009)

0.244 88.7

BB + Beam-Prediction 0.125 88.7

Page 25: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

Thanks.

25

Page 26: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

26

Beam-Width Prediction

Page 27: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

27

FOM Details

• C&C FOM Details– FOM(NT) = Outsideleft * Inside * Outsideright

– Inside = Accumulated grammar score

– Outsideleft = MaxPOS [ POS forward prob * POS-to-NT transition prob ]

– Outsideright = MaxPOS [ NT-to-POS transition prob * POS bkwd prob ]

Page 28: Beam-Width Prediction for Efficient Context-Free Parsing Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark June 2011.

28

FOM Details

• C&C FOM Details