Grammar Induction With ADIOS (“Automatic DIstillation Of Structure”)

Mar 31, 2015

Dylon Grim
Page 1: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Grammar Induction With ADIOS (“Automatic DIstillation Of Structure”)

Page 2: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Previous work

Probabilistic Context Free Grammars
‘Supervised’ induction methods
Little work on raw data
  Mostly work on artificial CFGs
Clustering

Page 3: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Our goal

Given a corpus of raw text separated into sentences, we want to derive a specification of the underlying grammar

This means we want to be able to:
  Create new, unseen, grammatically correct sentences
  Accept new, unseen, grammatically correct sentences and reject ungrammatical ones

Page 4: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

What do we need to do?

G is given by the rewrite rules:
  S → NP VP
  NP → the N | a N
  N → man | boy | dog
  VP → V NP
  V → saw | heard | sensed | sniffed
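
The two goals (create and accept sentences) can be tried out on this toy grammar. The following is a minimal sketch, not part of ADIOS: it assumes the five rewrite rules above (S, NP, N, VP, V), and the names `GRAMMAR`, `expand`, `LANGUAGE` and `accepts` are illustrative. Enumerating the whole language only works because this grammar is finite (no recursion).

```python
import itertools

# Rewrite rules of the toy grammar G; keys are nonterminals.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "N": [["man"], ["boy"], ["dog"]],
    "VP": [["V", "NP"]],
    "V": [["saw"], ["heard"], ["sensed"], ["sniffed"]],
}


def expand(symbol):
    """Yield every word tuple derivable from symbol."""
    if symbol not in GRAMMAR:  # a terminal word expands to itself
        yield (symbol,)
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        options = [list(expand(s)) for s in rhs]
        for parts in itertools.product(*options):
            yield tuple(word for part in parts for word in part)


# "Create": the full set of sentences G can produce.
LANGUAGE = set(expand("S"))


def accepts(sentence):
    """"Accept": True if the sentence is in the language of G."""
    return tuple(sentence.split()) in LANGUAGE
```

Grammar induction runs in the opposite direction: it starts from sentences like those in `LANGUAGE` and tries to recover something equivalent to `GRAMMAR`.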

Page 5: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

ADIOS in outline

Composed of three main elements:
  A representational data structure
  A segmentation criterion (MEX)
  A generalization ability

We will consider each of these in turn

Page 6: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

[Figure: graph built from four sentences: "Is that a cat?", "Is that a dog?", "Where is the dog?" and "And is that a horse?". The vertices (nodes) are BEGIN, END and the words; the edges are annotated with path labels (101 to 104) and step numbers in parentheses.]

The Model: Graph representation with words as vertices and sentences as paths.
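
A minimal sketch of this representation, assuming lower-cased whitespace tokenization with "?" as its own token. The function name `build_graph` and the edge bookkeeping are illustrative choices, not taken from the slides.

```python
from collections import defaultdict


def build_graph(sentences):
    """Return (paths, edges) for a corpus of sentences.

    paths: one vertex list per sentence, wrapped in BEGIN ... END.
    edges: maps (u, v) to the list of (path index, step) that use it,
           so parallel edges from different sentences are kept apart.
    """
    paths = []
    edges = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        words = sentence.lower().replace("?", " ?").split()
        path = ["BEGIN"] + words + ["END"]
        paths.append(path)
        for step, (u, v) in enumerate(zip(path, path[1:])):
            edges[(u, v)].append((idx, step))
    return paths, edges


SENTENCES = [
    "Is that a cat?",
    "Is that a dog?",
    "Where is the dog?",
    "And is that a horse?",
]
paths, edges = build_graph(SENTENCES)
```

Here `edges[("is", "that")]` lists the three sentences that share that transition, which is the sense in which sub-paths are aligned by the graph.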

Page 7: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

ADIOS in outline

Composed of three main elements:
  A representational data structure
  A segmentation criterion (MEX)
  A generalization ability

Page 8: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Toy problem – Alice in Wonderland

a l i c e w a s b e g i n n i n g t o g e t v e r y t i r e d o f s i t t i n g b y h e r s i s t e r o n t h e b a n k a n d o f h a v i n g n o t h i n g t o d o o n c e o r t w i c e s h e h a d p e e p e d i n t o t h e b o o k h e r s i s t e r w a s r e a d i n g b u t i t h a d n o p i c t u r e s o r c o n v e r s a t i o n s i n i t a n d w h a t i s t h e u s e o f a b o o k t h o u g h t a l i c e w i t h o u t p i c t u r e s o r c o n v e r s a t i o n

Page 9: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Detecting significant patterns

Identifying patterns becomes easier on a graph
  Sub-paths are automatically aligned

[Figure A, initialization: a search path running from begin through the vertices e1, e2, e3, e4, e5, e6 to end, crossed by other numbered paths (1 to 9) that share some of its vertices.]

Page 10: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

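Motif EXtraction

[Figure: plot of p (0 to 1) along the search path begin, e1, e2, e3, e4, e5, end, showing the P_R and P_L curves, the points marked S_L and S_R, and the significant pattern between them.]

P_R(e1) = 4/41
P_R(e2 | e1) = 3/4
P_R(e3 | e1 e2) = 1
P_R(e4 | e1 e2 e3) = 1
P_R(e5 | e1 e2 e3 e4) = 1/3

P_L(e4) = 6/41
P_L(e3 | e4) = 5/6
P_L(e2 | e3 e4) = 1
P_L(e1 | e2 e3 e4) = 3/5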
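
Probabilities of this kind can be estimated by counting sub-paths. The sketch below assumes that a conditional probability is the number of occurrences of the longer sub-path divided by the number of occurrences of the shorter one, and that the probability of a single vertex is its share of all tokens in the corpus. Both are assumptions about how the slide's numbers were obtained, and the function names are illustrative.

```python
from fractions import Fraction


def count(paths, seq):
    """Number of occurrences of the contiguous sequence seq in the corpus."""
    n = len(seq)
    return sum(
        1
        for path in paths
        for i in range(len(path) - n + 1)
        if path[i:i + n] == seq
    )


def p_right(paths, seq):
    """P_R(seq[-1] | seq[:-1]): how often the prefix continues rightwards."""
    if len(seq) == 1:
        return Fraction(count(paths, seq), sum(len(p) for p in paths))
    return Fraction(count(paths, seq), count(paths, seq[:-1]))


def p_left(paths, seq):
    """P_L(seq[0] | seq[1:]): how often the suffix continues leftwards."""
    if len(seq) == 1:
        return Fraction(count(paths, seq), sum(len(p) for p in paths))
    return Fraction(count(paths, seq), count(paths, seq[1:]))


# Hypothetical four-path corpus over single-letter vertices.
demo_paths = [list("abcd"), list("abce"), list("abx"), list("ay")]
demo_pr = p_right(demo_paths, ["a", "b"])  # 3 of the 4 "a" continue with "b"
```

Both functions expect `seq` to occur in the corpus; a sub-path of the current search path always does.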

Page 11: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The Markov Matrix

The top right triangle defines the PL probabilities, bottom left triangle the PR probabilities

Matrix is path-dependent
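
One way to lay out such a matrix is sketched below. The index convention is an assumption: entry (i, j) with i > j holds P_R(e_i | e_j ... e_{i-1}), entry (i, j) with i < j holds P_L(e_i | e_{i+1} ... e_j), and the diagonal holds the plain frequency of e_i. The matrix is computed for one search path, which is why it is path-dependent.

```python
from fractions import Fraction


def count(paths, seq):
    """Number of occurrences of the contiguous sequence seq in the corpus."""
    n = len(seq)
    return sum(
        1
        for path in paths
        for i in range(len(path) - n + 1)
        if path[i:i + n] == seq
    )


def markov_matrix(paths, search_path):
    """Matrix of P_R (bottom left) and P_L (top right) for one search path."""
    n = len(search_path)
    total = sum(len(p) for p in paths)
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lo, hi = min(i, j), max(i, j)
            whole = count(paths, search_path[lo:hi + 1])
            if i == j:
                matrix[i][j] = Fraction(whole, total)
            elif i > j:
                # Bottom left: extend e_j ... e_{i-1} to the right by e_i.
                matrix[i][j] = Fraction(whole, count(paths, search_path[j:i]))
            else:
                # Top right: extend e_{i+1} ... e_j to the left by e_i.
                matrix[i][j] = Fraction(
                    whole, count(paths, search_path[i + 1:j + 1])
                )
    return matrix


# Hypothetical corpus; the search path is its first sentence.
demo_paths = [list("abcd"), list("abce"), list("abx"), list("ay")]
demo_matrix = markov_matrix(demo_paths, demo_paths[0])
```

The search path must itself be in the corpus, so every denominator is at least 1.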

Page 12: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

[Figure B: the Markov matrix for the search path, with rows and columns labelled BEGIN, e1, e2, e3, e4, e5, e6, END and entries given as fractions (for example 4/41, 3/4, 5/6, 6/41, 3/5, 1/3), shown next to the P_R / P_L plot of the significant pattern from the Motif EXtraction slide.]

Page 13: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Example of a probability matrix

Page 14: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

[Figure C, rewiring: on the search path from begin to end, the significant pattern P1 = e2 e3 e4 is replaced by a new vertex, and the numbered paths that ran through e2, e3, e4 are rewired through it.]

Rewiring the graph

Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.
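
A minimal sketch of the rewiring step, assuming paths are stored as plain vertex lists and a pattern is a contiguous sequence of vertices. The function name `rewire` and the left-to-right, non-overlapping replacement are illustrative choices.

```python
def rewire(paths, pattern, new_vertex):
    """Replace every occurrence of pattern in every path by new_vertex."""
    n = len(pattern)
    rewired = []
    for path in paths:
        out = []
        i = 0
        while i < len(path):
            if path[i:i + n] == pattern:
                out.append(new_vertex)  # the whole sub-path collapses
                i += n
            else:
                out.append(path[i])
                i += 1
        rewired.append(out)
    return rewired


# Hypothetical paths sharing the sub-path e2 e3 e4.
demo = rewire(
    [
        ["begin", "e1", "e2", "e3", "e4", "e5", "end"],
        ["x", "e2", "e3", "e4", "y"],
        ["e2", "e3", "z"],
    ],
    ["e2", "e3", "e4"],
    "P1",
)
```

Because the new vertex is an ordinary vertex, a later pass can find patterns that contain `P1`, which gives the hierarchy.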

Page 15: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

MEX at work

Page 16: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

ALICE motifs

Motif         Weight  Occurrences  Length
conversation  0.98    11           11
whiterabbit   1.00    22           10
caterpillar   1.00    28           10
interrupted   0.94    7            10
procession    0.93    6            9
mockturtle    0.91    56           9
beautiful     1.00    16           8
important     0.99    11           8
continued     0.98    9            8
different     0.98    9            8
atanyrate     0.94    7            8
difficult     0.94    7            8
surprise      0.99    10           7
appeared      0.97    10           7
mushroom      0.97    8            7
thistime      0.95    19           7
suddenly      0.94    13           7
business      0.94    7            7
nonsense      0.94    7            7
morethan      0.94    6            7
remember      0.92    20           7
consider      0.91    10           7
curious       1.00    19           6
hadbeen       1.00    17           6
however       1.00    20           6
perhaps       1.00    16           6
hastily       1.00    16           6
herself       1.00    78           6
footman       1.00    14           6
suppose       1.00    12           6
silence       0.99    14           6
witness       0.99    10           6
gryphon       0.97    54           6
serpent       0.97    11           6
angrily       0.97    8            6
croquet       0.97    8            6
venture       0.95    12           6
forsome       0.95    12           6
timidly       0.95    9            6
whisper       0.95    9            6
rabbit        1.00    27           5
course        1.00    25           5
eplied        1.00    22           5
seemed        1.00    26           5
remark        1.00    28           5

Page 17: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

ADIOS in outline

Composed of three main elements:
  A representational data structure
  A segmentation criterion (MEX)
  A generalization ability

Page 18: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Generalization – defining an equivalence class

show me flights from philadelphia to san francisco on wednesdays

list all flights from boston to san francisco with the maximum number of stops

show flights from dallas to san francisco

may i see the flights from denver to san francisco please

Generalized search path:

show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays

Page 19: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Generalization

show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays

i need to fly from boston to baltimore please give me…

which airlines fly from dallas to denver

please give me a flight from philadelphia to atlanta before ten a m in the morning

list all flights going from boston to atlanta on wednesday…

P1: from _E1 to

_E1 = {boston, philadelphia, denver, dallas}

Page 20: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Context-sensitive generalization

Slide a context window of size L across the current search path
For each 1≤i≤L
  Look at all paths that are identical with the search path for 1≤k≤L, except for k=i
  Define an equivalence class containing the nodes at index i for these paths
  Replace the i’th node with the equivalence class
  Find significant patterns using the MEX criterion
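
The slot-by-slot comparison can be sketched as follows for a single window position. This is an illustration under assumptions: paths are word lists, the window is taken from the search path at a given start index, and the function name `equivalence_classes` is made up here. The MEX step that follows in the slide is not included.

```python
def equivalence_classes(paths, search_path, start, L):
    """For each slot i of the window, collect the words that other paths
    put in that slot while matching the window everywhere else."""
    window = search_path[start:start + L]
    classes = {}
    for i in range(L):
        members = set()
        for path in paths:
            for s in range(len(path) - L + 1):
                candidate = path[s:s + L]
                if all(candidate[k] == window[k] for k in range(L) if k != i):
                    members.add(candidate[i])
        classes[i] = members
    return classes


CORPUS = [
    "show me flights from philadelphia to san francisco on wednesdays",
    "list all flights from boston to san francisco with the maximum number of stops",
    "show flights from dallas to san francisco",
    "may i see the flights from denver to san francisco please",
]
demo_paths = [s.split() for s in CORPUS]
# Window "from philadelphia to": starts at index 3 of the first path, L = 3.
demo_classes = equivalence_classes(demo_paths, demo_paths[0], 3, 3)
```

With this corpus the middle slot collects the four city names and the two outer slots stay singletons, matching P1 with _E1 on the earlier slide.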

Page 21: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Determining L

Involves a tradeoff
Larger L will demand more context sensitivity in the inference
  Will hamper generalization
Smaller L will detect more patterns
  But many might be spurious

Page 22: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The effects of context window width

[Figure: precision (0 to 1) plotted against recall (0 to 1) for context windows L = 3, 4, 5 and 6 and for training sets of 10,000, 40,000 and 120,000 sentences; regions of the plot are labelled "over-generalization" and "low productivity".]

Page 23: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Over-generalization

john believes that to please is easy
john thinks that to please is fun
jack and john believe that to please is hard

Generalized search path:

john {believes, thinks, believe} that to please is easy

(This path also admits the ungrammatical "john believe that to please is easy".)

Page 24: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Bootstrapping

what are the cheapest flights from denver to boston that stop in atlanta

A pre-existing equivalence class:

{boston, philadelphia, denver, dallas}

Generalized search path I:

what are the cheapest flights from {boston, philadelphia, denver, dallas} to {boston, philadelphia, denver, dallas} that stop in atlanta

Page 25: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Bootstrapping

what are the cheapest flights from {boston, philadelphia, denver, dallas} to {boston, philadelphia, denver, dallas} that stop in atlanta

what is the cheapest fare from denver to philadelphia and from pittsburgh to atlanta

i would… like the cheapest airfare from boston to denver december twenty sixth

show me the cheapest flight from philadelphia to dallas which arrives…

Page 26: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Bootstrapping

Generalized search path II:

what are the cheapest {flight, flights, airfare, fare} from {boston, philadelphia, denver} to {denver, philadelphia, dallas} that stop in atlanta

_P2: the cheapest _E2 from _E3 to _E4

_E2 = {flight, flights, airfare, fare}
_E3 = {boston, philadelphia, denver}
_E4 = {denver, philadelphia, dallas}

Page 27: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Bootstrapping

Slide a context window of length L along the current search path
Consider all sub-paths of length L that begin in a1 and end in aL
  These are the candidate paths
For each 1≤i≤L
  For each 1≤k≤L, k≠i
    Replace node k with the EC that contains node k and maximally overlaps the set of nodes at index k of the candidate paths
  Continue as before
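
The choice of equivalence class in the inner step can be sketched as below. It assumes that "maximally overlaps" means the largest number of shared members, and that a node with no containing class is left as a singleton; both are illustrative assumptions, as is the name `best_class`.

```python
def best_class(node, candidate_nodes, known_classes):
    """Pick the known equivalence class that contains node and shares the
    most members with the nodes seen at this index in the candidate paths."""
    candidates = set(candidate_nodes)
    containing = [ec for ec in known_classes if node in ec]
    if not containing:
        return {node}  # assumption: keep the node unchanged
    return max(containing, key=lambda ec: len(ec & candidates))


# Hypothetical classes: the city class from the slides plus a made-up one.
KNOWN = [
    {"boston", "philadelphia", "denver", "dallas"},
    {"denver", "march"},
]
demo_choice = best_class("denver", ["denver", "boston", "philadelphia"], KNOWN)
```

In the flight example this step is what turns "from denver to boston" into "from {cities} to {cities}" before MEX is run again.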

Page 28: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The ADIOS algorithm

Initialization – load all data into a pseudograph

Until no more patterns are found
  For each path P
    Create generalized search paths from P
    Detect significant patterns using MEX
    If found, add best new pattern and equivalence classes and rewire the graph

Page 29: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The training process

[Figure: tree of numbered vertices (for example 1205, 1204, 1203, 2000, 2001, 567, 321, 120, 132, 234, 621, 987) at a stage of training.]

Page 30: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The training process

[Figure: the same tree of numbered vertices (1205, 1204, 1203, 2000, 2001, ...) at a later stage of training.]

Page 31: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The result

[Figure: the resulting tree rooted at vertex 1205, with vertices 1204, 1203, 2000 and 2001 above the leaves 567, 321, 120, 132, 234, 621 and 987.]

Page 32: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Example

[Figure (a): learned patterns drawn as trees over numbered vertices (1404, 1405, 1457, 1458, 1678, 1679, 1903, 1904), some annotated with values such as (1), (0.56) and (0.15). Leaf words include "to go", "do you", "have", "like", "want", "that", "the", "in", words such as "back", "home", "outside", "bedroom", "garage" and "refrigerator", and a long list of nouns ("kid", "knife", "ladder", ..., "window", "wood").]

(b) Sentences rephrased by ADIOS:
  I'll play with the toys and you play with your bib.
  there's another bar+b+que.
  there's a chicken!
  play with the dolls and the roof?
  oh ; the peanut butter can go up there .
  you better finish it.
  we better hold that ; then.
  uh ; that's another little girl!
  should we put this stuff in in another chick?

Original sentences from CHILDES:
  I'll play with the eggs and you play with your Mom.
  there's another chicken.
  there's a square!
  play with the cars and the people?
  should we put this chair back in the bedroom?
  oh ; the peanut butter can sit right there.
  you better eat it.
  we better finish it ; then.
  yeah ; that's a good one!

Page 33: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

More Patterns

Page 34: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Evaluating performance

In principle, we would like to compare ADIOS-generated parse-trees with the true parse-trees for given sentences

Alas, the ‘true parse-trees’ are subject to opinion
  Some approaches don’t even suppose parse trees

Page 35: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Evaluating performance

Define
  Recall – the probability of ADIOS recognizing an unseen grammatical sentence
  Precision – the proportion of grammatical ADIOS productions
Recall can be assessed by leaving out some of the training corpus
Precision is trickier
  Unless we’re learning a known CFG
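
Both measures reduce to simple fractions once an acceptance test and a grammaticality judge are available. The sketch below assumes both are passed in as functions; the judge stands in for human raters or a known CFG, and all names are illustrative.

```python
def recall(accepts, held_out):
    """Fraction of held-out grammatical sentences the learner accepts."""
    return sum(1 for s in held_out if accepts(s)) / len(held_out)


def precision(generated, is_grammatical):
    """Fraction of learner-generated sentences judged grammatical."""
    return sum(1 for s in generated if is_grammatical(s)) / len(generated)


# Toy stand-ins: a learner that only accepts "show ..." sentences,
# and a judge that rejects one of four generated sentences.
demo_recall = recall(
    lambda s: s.startswith("show"),
    ["show flights", "list flights", "show fares", "what is the fare"],
)
demo_precision = precision(
    ["show flights", "show fares", "show me", "show show"],
    lambda s: s != "show show",
)
```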

Page 36: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The ATIS experiments

ATIS-NL is a 13,043 sentence corpus of natural language
  Transcribed phone calls to an airline reservation service
ADIOS was trained on 12,700 sentences of ATIS-NL
  The remaining 343 sentences were used to assess recall
Precision was determined with the help of 8 graduate students from Cornell University

Page 37: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

An ADIOS drawback

ADIOS is inherently a heuristic and greedy algorithm
  Once a pattern is created it remains forever – errors accumulate
  Sentence ordering affects outcome
Running ADIOS with different orderings gives patterns that ‘cover’ different parts of the grammar

Page 38: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

An ad-hoc solution

Train multiple learners on the corpus
  Each on a different sentence ordering
  Create a ‘forest’ of learners
To create a new sentence
  Pick one learner at random
  Use it to produce sentence
To check grammaticality of given sentence
  If any learner accepts sentence, declare as grammatical
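
The two forest rules can be sketched as a small wrapper. `ToyLearner` is a hypothetical stand-in for a trained ADIOS learner with made-up `generate` and `accepts` methods; only the `Forest` logic (random pick for generation, any-learner vote for acceptance) follows the slide.

```python
import random


class ToyLearner:
    """Hypothetical stand-in for one learner trained on one ordering."""

    def __init__(self, sentences):
        self.sentences = sorted(sentences)

    def generate(self, rng):
        return rng.choice(self.sentences)

    def accepts(self, sentence):
        return sentence in self.sentences


class Forest:
    """A set of learners trained on different sentence orderings."""

    def __init__(self, learners, seed=0):
        self.learners = list(learners)
        self.rng = random.Random(seed)

    def generate(self):
        # Pick one learner at random and let it produce the sentence.
        learner = self.rng.choice(self.learners)
        return learner.generate(self.rng)

    def accepts(self, sentence):
        # Grammatical if any single learner accepts the sentence.
        return any(l.accepts(sentence) for l in self.learners)


forest = Forest([ToyLearner({"show flights"}), ToyLearner({"list fares"})])
```

Accepting on any single vote raises recall at the cost of precision, since every learner's errors are accepted too.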

Page 39: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

The ATIS experiments

ADIOS’ performance scores:
  Recall – 40%
  Precision – 70%
For comparison, ATIS-CFG reached:
  Recall – 45%
  Precision – <1% (!)

Page 40: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

ADIOS/ATIS-N comparison

[Figure: bar chart of precision (0.00 to 1.00) for ADIOS and ATIS-N.]

Page 41: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Source: H = Human, C = Computer

Source  No.  Unacceptable  Acceptable  Sentence
H       1    1                         i would like a flight from washington to boston flight three twenty four on august twentieth
C       2    1                         round trip flight from boston to baltimore leaving boston less than a thousand dollars
C       3                  1           well what offers the ground transportation available in fort worth
C       4                  1           does continental fly from san francisco to atlanta
H       5                  1           does american airlines fly from philadelphia to dallas
H       6                  1           please describe to me the classes of service that are available
H       7                  1           i'd like to fly from philadelphia to dallas next week
C       8    1                         which airline offers the most flights from san francisco washington
H       9                  1           is it possible for me to fly from baltimore to san francisco
C       10                 1           i want to fly from boston to pittsburgh to san francisco
C       11   1                         would like to arrange a round trip flight from atlanta to boston to pittsburgh to san francisco tuesday the
H       12   1                         between eleven and twelve o'clock in the morning
C       13                 1           what offers the cheapest fare from boston to pittsburgh to atlanta
C       14                 1           what is the airfare from boston to pittsburgh to atlanta

Page 42: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

English as Second Language test

A single instance of ADIOS was trained on the CHILDES corpus
  120,000 sentences of transcribed child-directed speech
Subjected to the Göteborg multiple choice ESL test
  100 sentences, each with an open slot
  Pick the correct word out of three
ADIOS got 60% of the answers correct
  An average ninth-grader performance

Page 43: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

ADIOS

Page 44: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Meta-analysis of ADIOS results

Define a pattern spectrum as the histogram of pattern types for an individual learner
  A pattern type is determined by its contents
  E.g. TT, TET, EE, PE…

A single ADIOS learner was trained with each of 6 translations of the Bible
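
A pattern spectrum can be sketched as a normalized histogram. The code assumes that T, E and P stand for terminal, equivalence class and pattern (the slide does not spell this out) and that classes and patterns are named with the `_E` and `_P` prefixes used on the bootstrapping slides. The function names are illustrative.

```python
from collections import Counter


def pattern_type(pattern):
    """Map a pattern's elements to a string such as 'TET'."""
    kinds = []
    for element in pattern:
        if element.startswith("_E"):
            kinds.append("E")  # equivalence class (assumed meaning)
        elif element.startswith("_P"):
            kinds.append("P")  # nested pattern (assumed meaning)
        else:
            kinds.append("T")  # terminal word (assumed meaning)
    return "".join(kinds)


def pattern_spectrum(patterns):
    """Relative frequency of each pattern type for one learner."""
    counts = Counter(pattern_type(p) for p in patterns)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


# Hypothetical learner output.
demo_spectrum = pattern_spectrum(
    [
        ["from", "_E1", "to"],
        ["show", "me"],
        ["the", "_P1"],
        ["to", "_E2", "on"],
    ]
)
```

Spectra built this way for different corpora can be compared as ordinary vectors over a shared list of pattern types.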

Page 45: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Pattern spectra

[Figure: pattern spectra for English, Spanish, Swedish, Chinese, Danish and French: relative frequency (0 to 0.35) of the pattern types TT, TE, TP, ET, EE, EP, PT, PE, PP, TTT, TTE, TTP, TET, TEE, TEP, TPT. Further panels plot values from 0.60 to 1.00 against 0 to 800.]

Page 46: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Language dendrogram

[Figure: dendrogram grouping Chinese, Spanish, French, English, Swedish and Danish, shown with the pattern-spectrum histogram (pattern types TT, TE, TP, ..., relative frequency 0 to 0.35).]

Page 47: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

In case there’s time…

Page 48: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Pattern significance

Say we found a potential pattern-edge from nodes 1 to n. Define
  m – the number of paths from 1 to n
  r – the number of paths from 1 to n+1
Because it’s a pattern edge, we know that
  $\frac{P_{n+1}}{P_n} < \eta$, or $\frac{r}{m} < \eta P_n$
Let’s suppose that the true probability for n+1 given 1 through n is $P^*_{n+1}$
  r/m is our best estimate, but just an estimate
  What are the odds of getting r and m but still have $P^*_{n+1} \ge \eta P_n$?

Page 49: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

Pattern significance

Assume $P^*_{n+1} = \eta P_n$
The odds of getting result r and m or better are then given by
  $B(r, m, \eta P_n) = \sum_{i=0}^{r} \binom{m}{i} (\eta P_n)^i (1 - \eta P_n)^{m-i}$
If this is smaller than a predetermined α, we say the pattern-edge candidate is significant
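
A minimal numeric sketch of this test, assuming the garbled formula is the lower tail of a binomial distribution: the probability of seeing r or fewer continuations out of m paths when the true continuation probability is η·P_n, where η denotes the drop threshold. The values of η and α in the example are hypothetical.

```python
from math import comb


def binomial_tail(r, m, p):
    """P(X <= r) for X ~ Binomial(m, p)."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(r + 1))


def is_significant(r, m, p_n, eta, alpha):
    """True if r of m continuations is unlikely under probability eta * p_n."""
    return binomial_tail(r, m, eta * p_n) < alpha


# Hypothetical numbers: eta = 0.65, alpha = 0.01, P_n = 1.
strong_drop = is_significant(1, 30, 1.0, 0.65, 0.01)   # 1 of 30 paths go on
weak_evidence = is_significant(6, 10, 1.0, 0.65, 0.01)  # 6 of 10 paths go on
```

With few paths (small m) even a low r/m may fail the test, which guards against declaring pattern edges on sparse evidence.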

Page 50: Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

To be continued…