SI485i : NLP Set 8 PCFGs and the CKY Algorithm

Transcript
Page 1:

SI485i : NLP

Set 8

PCFGs and the CKY Algorithm

Page 2:

PCFGs

• We saw how CFGs can model English (sort of)
• Probabilistic CFGs put weights on the production rules
• NP -> DET NN with probability 0.34
• NP -> NN NN with probability 0.16
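To make the idea concrete, here is a minimal sketch in Python of how weighted productions might be stored. Only the two NP rules come from the slide; the dict-of-lists layout itself is just one reasonable assumption:

```python
# A PCFG maps each left-hand side to its weighted right-hand sides.
# Only the two NP rules below are from the slide; the representation
# is a placeholder, not the course's actual data structure.
pcfg = {
    "NP": [
        (("DET", "NN"), 0.34),  # NP -> DET NN with probability 0.34
        (("NN", "NN"), 0.16),   # NP -> NN NN  with probability 0.16
        # ... the remaining NP rules would carry the other 0.50 of mass
    ],
}
```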

Page 3:

PCFGs

• We still parse sentences and come up with a syntactic derivation tree

• But now we can talk about how confident we are in the tree
• P(tree)!

Page 4:

Buffalo Example

• What is the probability of this tree?
• It's the product of the probabilities of all the rules used inside it, e.g., P(S -> NP VP); see the sketch below.
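As a sketch of that computation, assuming trees are nested tuples like ("S", ("NP", ...), ("VP", ...)) and a rule_prob table mapping (lhs, rhs) to a probability (both hypothetical representations, not the course's code):

```python
def tree_prob(tree, rule_prob):
    """P(tree) = the product of the probabilities of every rule used in it.

    `tree` is a nested tuple (label, child1, child2, ...); leaves are
    plain strings (words). `rule_prob` maps (lhs, (rhs1, ..., rhsk)) -> prob.
    """
    if isinstance(tree, str):        # a word: no rule applied at a leaf
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]      # e.g. P(S -> NP VP)
    for child in children:
        p *= tree_prob(child, rule_prob)
    return p
```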

Page 5:

PCFG Formalized

• G = (T, N, S, R, P)
• T is a set of terminals
• N is a set of nonterminals
• For NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals
• S is the start symbol (one of the nonterminals)
• R is the set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals
• P(R) gives the probability of each rule

• A grammar G generates a language model L

Some slides adapted from Chris Manning

Page 6:

Some notation

• w_1n = w_1 … w_n = the word sequence from position 1 to n
• w_ab = the subsequence w_a … w_b

• We'll write P(N^i → ζ^j) to mean P(N^i → ζ^j | N^i)
• Take note: this is a conditional probability. For instance, the probabilities of all rules headed by NP must sum to 1! (A sanity check for this is sketched below.)

• We'll want to calculate the best tree T: max_T P(T ⇒* w_ab)
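Since each rule's probability is conditioned on its left-hand side, a quick sanity check (a sketch, reusing the hypothetical (lhs, rhs) -> prob table from above) is that the mass under every nonterminal sums to 1:

```python
from collections import defaultdict

def check_normalized(rule_prob, tol=1e-9):
    """Verify that the rules headed by each nonterminal sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rule_prob.items():
        totals[lhs] += p
    for lhs, total in totals.items():
        assert abs(total - 1.0) < tol, f"rules for {lhs} sum to {total}"
```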

Page 7:

Trees and Probabilities

• P(t) -- the probability of a tree is the product of the probabilities of the rules used to generate it

• P(w_1n) -- the probability of a string is the sum of the probabilities of all possible trees that have the string as their yield
• P(w_1n) = Σ_j P(w_1n, t_j), where t_j is a parse of w_1n
•         = Σ_j P(t_j)
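Continuing the earlier sketch: if we could enumerate every parse of the string, the string probability would just be the sum of the tree probabilities (tree_prob is the hypothetical helper from above):

```python
def string_prob(parses, rule_prob):
    """P(w_1n) = sum of P(t_j) over all parses t_j whose yield is w_1n."""
    return sum(tree_prob(t, rule_prob) for t in parses)
```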

Page 8:

Example PCFG


Page 11:

P(tree) computation

Page 12:

Time to Parse

• Let's parse!!
• Almost ready…
• The grammar must be in Chomsky Normal Form first.

Page 13:

Chomsky Normal Form

• All rules have the form Z -> X Y or Z -> w
• Transforming a grammar to CNF does not change its weak generative capacity
• Remove all unary rules and empties
• Transform n-ary rules: VP -> V NP PP becomes VP -> V @VP-V and @VP-V -> NP PP

• Why do we do this? Parsing is easier now. (A sketch of the n-ary transform follows this list.)
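Here is a minimal sketch of just the n-ary binarization step, following the slide's @VP-V naming convention. Rules are assumed to be (lhs, rhs_tuple) pairs; unary/empty removal and the rule probabilities are left out:

```python
def binarize(rules):
    """Turn n-ary rules like VP -> V NP PP into binary rules:
    VP -> V @VP-V and @VP-V -> NP PP."""
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        # Peel symbols off the front until only two children remain.
        while len(rhs) > 2:
            head, rest = rhs[0], rhs[1:]
            new_sym = f"@{lhs}-{head}"        # e.g. @VP-V, as on the slide
            out.append((lhs, (head, new_sym)))
            lhs, rhs = new_sym, rest
        out.append((lhs, rhs))
    return out

# binarize([("VP", ("V", "NP", "PP"))])
# -> [("VP", ("V", "@VP-V")), ("@VP-V", ("NP", "PP"))]
```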

Page 14:

Converting into CNF

Page 15:

The CKY Algorithm

• Cocke-Kasami-Younger (CKY)

Dynamic programming is back!

Page 16:

The CKY Algorithm

• NP -> NN NNS (0.13): p = 0.13 x .0023 x .0014, p = 1.87 x 10^-7
• NP -> NNP NNS (0.056): p = 0.056 x .001 x .0014, p = 7.84 x 10^-8
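Below is a compact sketch of probabilistic (Viterbi) CKY matching those cell computations, under assumed representations: lexical rules as (A, word) -> prob and binary rules as (A, (B, C)) -> prob. It tracks only the best probability per label and span; backpointers would be needed to recover the tree itself:

```python
from collections import defaultdict

def cky(words, lex_prob, bin_prob):
    """Probabilistic (Viterbi) CKY over a grammar in CNF -- a sketch.

    lex_prob: (A, word)   -> P(A -> word)
    bin_prob: (A, (B, C)) -> P(A -> B C)
    chart[i][j] maps label -> best probability for the span words[i:j].
    """
    n = len(words)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n)]
    # Width-1 spans: apply the lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lex_prob.items():
            if word == w and p > chart[i][i + 1][A]:
                chart[i][i + 1][A] = p
    # Wider spans, bottom-up: try every split point and binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):              # split point
                for (A, (B, C)), p in bin_prob.items():
                    if chart[i][k][B] and chart[k][j][C]:
                        cand = p * chart[i][k][B] * chart[k][j][C]
                        if cand > chart[i][j][A]:
                            chart[i][j][A] = cand
    return chart       # chart[0][n]["S"] is the best parse probability
```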

Page 17:

The CKY Algorithm

• What is the runtime? O( ?? )
• Note that each cell must check all pairs of children below it: an n-word sentence has O(n^2) cells, and each cell tries O(n) split points and all the grammar rules at each split, so the chart costs O(n^3) times a grammar-size factor.

• Binarizing the CFG rules is a must. The complexity explodes if you do not.


Page 23:

Evaluating CKY

• How do we know if our parser works?

• Count the number of correct labels in your table... the label and the span it dominates: [ label, start, finish ]

• Most trees have an error or two!

• Count how many spans are correct and how many are wrong, and compute precision and recall; a small evaluation sketch follows below.
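A minimal sketch of that evaluation, assuming the gold and predicted constituents are given as (label, start, finish) triples as on the slide:

```python
def labeled_prf(gold_spans, pred_spans):
    """Labeled precision, recall, and F1 over (label, start, finish) triples."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)          # spans with the right label and extent
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```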

Page 24:

Probabilities?

• Where do the probabilities come from?
• P( NP -> DT NN ) = ???

• Penn Treebank: a bunch of newspaper articles whose sentences have been manually annotated with full parse trees

• P( NP -> DT NN ) = C( NP -> DT NN ) / C( NP )
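The count-based estimate on the slide is plain maximum likelihood, and it can be read straight off the treebank counts. A sketch, reusing the hypothetical nested-tuple tree format from earlier (the Penn Treebank's actual file format differs):

```python
from collections import Counter

def estimate_rule_probs(trees):
    """P( X -> gamma ) = C( X -> gamma ) / C( X ), counted over a treebank."""
    rule_counts, lhs_counts = Counter(), Counter()

    def count(tree):
        if isinstance(tree, str):       # a word: nothing to count at a leaf
            return
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for child in children:
            count(child)

    for t in trees:
        count(t)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
```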