Grammar Induction With ADIOS (“Automatic DIstillation Of Structure”)
Mar 31, 2015
Previous work
Probabilistic Context Free Grammars
‘Supervised’ induction methods
Little work on raw data
  Mostly work on artificial CFGs
Clustering
Our goal
Given a corpus of raw text separated into sentences, we want to derive a specification of the underlying grammar
This means we want to be able to
  Create new, unseen, grammatically correct sentences
  Accept new, unseen, grammatically correct sentences and reject ungrammatical ones
What do we need to do?
G is given by the rewrite rules
  S → NP VP
  NP → the N | a N
  N → man | boy | dog
  VP → V NP
  V → saw | heard | sensed | sniffed
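As a rough illustration of what "create" and "accept" mean for this toy grammar, the sketch below enumerates every sentence G derives and uses that set as a recognizer. The names GRAMMAR, expand, LANGUAGE and accepts are illustrative assumptions, and full enumeration only works because G has no recursion.

```python
import itertools

# Toy grammar G from the slide: nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "N": [["man"], ["boy"], ["dog"]],
    "VP": [["V", "NP"]],
    "V": [["saw"], ["heard"], ["sensed"], ["sniffed"]],
}

def expand(symbol):
    """Yield every word sequence (as a tuple) derivable from `symbol`."""
    if symbol not in GRAMMAR:  # a terminal word
        yield (symbol,)
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in itertools.product(*(list(expand(s)) for s in rhs)):
            yield tuple(word for part in parts for word in part)

# The language is finite because G has no recursive rule.
LANGUAGE = set(expand("S"))

def accepts(sentence):
    """True if the whitespace-separated sentence is derivable from S."""
    return tuple(sentence.split()) in LANGUAGE

print(len(LANGUAGE))                     # 6 noun phrases * 4 verbs * 6 noun phrases = 144
print(accepts("the dog sniffed a boy"))  # True
print(accepts("dog the sniffed"))        # False
```

Grammar induction runs in the opposite direction: only sentences such as those in LANGUAGE are observed, and the rules have to be recovered.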
ADIOS in outline
Composed of three main elements
  A representational data structure
  A segmentation criterion (MEX)
  A generalization ability
We will consider each of these in turn
[Figure: the example corpus drawn as a graph. The vertices are BEGIN, is, that, a, the, where, and, dog, cat, horse, ? and END. Each sentence is a path (numbered 101 to 104) from BEGIN to END whose edges are numbered in order. Sentences: "Is that a dog?", "Is that a cat?", "Where is the dog?", "And is that a horse?". Legend: node, edge.]
The Model: Graph representation with words as vertices and sentences as paths.
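A minimal sketch of this representation, assuming whitespace tokenization and hypothetical names (build_graph, paths, edges): each sentence becomes a path from BEGIN to END, and parallel edges between the same two words are kept apart by labelling each one with its sentence and its position in that sentence.

```python
from collections import defaultdict

def build_graph(sentences):
    """Return (paths, edges) for a list of tokenized-by-space sentences.

    paths: one vertex list per sentence, wrapped in BEGIN/END.
    edges: (u, v) -> list of (sentence_id, position) labels, so several
           sentences may contribute parallel edges between the same words.
    """
    paths = []
    edges = defaultdict(list)
    for sid, sentence in enumerate(sentences):
        path = ["BEGIN"] + sentence.lower().split() + ["END"]
        paths.append(path)
        for pos, (u, v) in enumerate(zip(path, path[1:]), start=1):
            edges[(u, v)].append((sid, pos))
    return paths, edges

SENTENCES = [
    "is that a dog ?",
    "is that a cat ?",
    "where is the dog ?",
    "and is that a horse ?",
]
paths, edges = build_graph(SENTENCES)
print(edges[("is", "that")])  # [(0, 2), (1, 2), (3, 3)]
```

Shared vertices are what make sub-paths line up: three of the four sentences run through the edge from "is" to "that".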
Toy problem – Alice in Wonderland
a l i c e w a s b e g i n n i n g t o g e t v e r y t i r e d o f s i t t i n g b y h e r s i s t e r o n t h e b a n k a n d o f h a v i n g n o t h i n g t o d o o n c e o r t w i c e s h e h a d p e e p e d i n t o t h e b o o k h e r s i s t e r w a s r e a d i n g b u t i t h a d n o p i c t u r e s o r c o n v e r s a t i o n s i n i t a n d w h a t i s t h e u s e o f a b o o k t h o u g h t a l i c e w i t h o u t p i c t u r e s o r c o n v e r s a t i o n
Detecting significant patterns
Identifying patterns becomes easier on a graph
  Sub-paths are automatically aligned
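[Figure, panel A (initialization): a search path begin → e1 → e2 → e3 → e4 → e5 → e6 → end through a graph of numbered vertices, together with the other paths that cross it.]
[Figure: P_R and P_L plotted along the search path e1 to e5 (vertical axis p, from 0 to 1), with a bracket marking the significant pattern. Plot labels: P_L, S_L, S_R, P_R.]
  Right-moving probabilities
    P_R(e1) = 4/41
    P_R(e2|e1) = 3/4
    P_R(e3|e1e2) = 1
    P_R(e4|e1e2e3) = 1
    P_R(e5|e1e2e3e4) = 1/3
  Left-moving probabilities
    P_L(e4) = 6/41
    P_L(e3|e4) = 5/6
    P_L(e2|e3e4) = 1
    P_L(e1|e2e3e4) = 3/5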
Motif EXtraction
The Markov Matrix
The top right triangle defines the PL probabilities, bottom left triangle the PR probabilities
Matrix is path-dependent
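One way to read the P_R and P_L entries is as ratios of sub-path counts. The sketch below is written under that assumption: the helper names and the toy paths are hypothetical, and it does not reproduce the numbers of the example graph.

```python
from fractions import Fraction

def count(paths, seq):
    """Count occurrences of the contiguous vertex sequence `seq` over all paths."""
    seq = list(seq)
    n = len(seq)
    return sum(
        1
        for path in paths
        for start in range(len(path) - n + 1)
        if list(path[start:start + n]) == seq
    )

def p_right(paths, seq):
    """P_R(seq[-1] | seq[:-1]): how often the prefix is continued by the last vertex."""
    if len(seq) == 1:  # unconditional case: share of all vertex occurrences
        return Fraction(count(paths, seq), sum(len(p) for p in paths))
    return Fraction(count(paths, seq), count(paths, seq[:-1]))

def p_left(paths, seq):
    """P_L(seq[0] | seq[1:]): how often the suffix is preceded by the first vertex."""
    if len(seq) == 1:
        return Fraction(count(paths, seq), sum(len(p) for p in paths))
    return Fraction(count(paths, seq), count(paths, seq[1:]))

# Toy corpus (an assumption): four short paths over single-letter vertices.
toy = [list("abcd"), list("abce"), list("xbcd"), list("abz")]
print(p_right(toy, "ab"))    # 1    (every a is followed by b)
print(p_right(toy, "abc"))   # 2/3  (ab continues to c in two of three cases)
print(p_right(toy, "abcd"))  # 1/2
print(p_left(toy, "abc"))    # 2/3  (bc is preceded by a in two of three cases)
```

Each conditional depends on the whole prefix or suffix along the search path, which is why the matrix is path-dependent. A sequence whose prefix never occurs would divide by zero here, so callers are assumed to query only sub-paths of observed paths.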
[Figure: Markov matrix for the example search path; rows and columns are BEGIN, e1, e2, e3, e4, e5, e6, END.]
[Figure, panel B: the example graph annotated with the probabilities along the search path, next to the same P_R and P_L plot over e1 to e5.]
Example of a probability matrix
[Figure, panel C (rewiring): the sub-path e2 → e3 → e4 on the search path is replaced by a new vertex P1, and the paths that ran through it are routed through P1.]
Rewiring the graph
Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.
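The rewiring step can be sketched on the path representation alone. This is a simplified illustration with hypothetical names (rewire, P1, P2): every occurrence of the pattern's vertex sequence is collapsed into one new vertex.

```python
def rewire(paths, pattern, name):
    """Replace every occurrence of the vertex sequence `pattern` by the vertex `name`."""
    pattern = list(pattern)
    n = len(pattern)
    new_paths = []
    for path in paths:
        out, i = [], 0
        while i < len(path):
            if path[i:i + n] == pattern:
                out.append(name)  # the whole sub-path becomes one vertex
                i += n
            else:
                out.append(path[i])
                i += 1
        new_paths.append(out)
    return new_paths

paths = [
    ["e1", "e2", "e3", "e4", "e5"],
    ["x", "e2", "e3", "e4"],
    ["e2", "e3"],
]
rewired = rewire(paths, ["e2", "e3", "e4"], "P1")
print(rewired)  # [['e1', 'P1', 'e5'], ['x', 'P1'], ['e2', 'e3']]

# A later pattern may contain P1 itself, which is how hierarchy arises.
print(rewire(rewired, ["e1", "P1"], "P2"))  # [['P2', 'e5'], ['x', 'P1'], ['e2', 'e3']]
```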
MEX at work
ALICE motifs
Motif         Weight  Occurrences  Length
conversation  0.98    11           11
whiterabbit   1.00    22           10
caterpillar   1.00    28           10
interrupted   0.94    7            10
procession    0.93    6            9
mockturtle    0.91    56           9
beautiful     1.00    16           8
important     0.99    11           8
continued     0.98    9            8
different     0.98    9            8
atanyrate     0.94    7            8
difficult     0.94    7            8
surprise      0.99    10           7
appeared      0.97    10           7
mushroom      0.97    8            7
thistime      0.95    19           7
suddenly      0.94    13           7
business      0.94    7            7
nonsense      0.94    7            7
morethan      0.94    6            7
remember      0.92    20           7
consider      0.91    10           7
curious       1.00    19           6
hadbeen       1.00    17           6
however       1.00    20           6
perhaps       1.00    16           6
hastily       1.00    16           6
herself       1.00    78           6
footman       1.00    14           6
suppose       1.00    12           6
silence       0.99    14           6
witness       0.99    10           6
gryphon       0.97    54           6
serpent       0.97    11           6
angrily       0.97    8            6
croquet       0.97    8            6
venture       0.95    12           6
forsome       0.95    12           6
timidly       0.95    9            6
whisper       0.95    9            6
rabbit        1.00    27           5
course        1.00    25           5
eplied        1.00    22           5
seemed        1.00    26           5
remark        1.00    28           5
Generalization – defining an equivalence class
show me flights from philadelphia to san francisco on wednesdays
list all flights from boston to san francisco with the maximum number of stops
show flights from dallas to san francisco
may i see the flights from denver to san francisco please
Generalized search path:
  show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays
Generalization
Generalized search path:
  show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays
i need to fly from boston to baltimore please give me…
which airlines fly from dallas to denver
please give me a flight from philadelphia to atlanta before ten a m in the morning
list all flights going from boston to atlanta on wednesday…
P1: from _E1 to
_E1 = {boston, philadelphia, denver, dallas}
Context-sensitive generalization
  Slide a context window of size L across the current search path
  For each 1≤i≤L
    Look at all paths that are identical to the search path at every position 1≤k≤L except k=i
    Define an equivalence class containing the nodes at index i of these paths
    Replace the i’th node with the equivalence class
  Find significant patterns using the MEX criterion
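The equivalence-class step can be sketched as follows, using the flight sentences from the earlier slide. The function name and the 0-based slot index are assumptions made for the example.

```python
def equivalence_class(paths, window, i):
    """Vertices found at slot i of sub-paths that match `window` at every other slot.

    `window` is a list of L vertices taken from the search path; `i` is 0-based.
    """
    L = len(window)
    members = set()
    for path in paths:
        for start in range(len(path) - L + 1):
            sub = path[start:start + L]
            if all(sub[k] == window[k] for k in range(L) if k != i):
                members.add(sub[i])
    return members

corpus = [
    "show me flights from philadelphia to san francisco on wednesdays",
    "list all flights from boston to san francisco with the maximum number of stops",
    "show flights from dallas to san francisco",
    "may i see the flights from denver to san francisco please",
]
paths = [s.split() for s in corpus]

window = ["flights", "from", "philadelphia", "to", "san"]  # L = 5
print(sorted(equivalence_class(paths, window, 2)))
# ['boston', 'dallas', 'denver', 'philadelphia']
```

With a window this wide the class is held together by four words of context; a narrower window would admit any word that happens to occur between "from" and "to".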
Determining L
Involves a tradeoff
  Larger L will demand more context sensitivity in the inference
    Will hamper generalization
  Smaller L will detect more patterns
    But many might be spurious
The effects of context window width
[Figure: precision (vertical axis, 0 to 1) against recall (horizontal axis, 0 to 1) for context window widths L=3, L=4, L=5 and L=6 and for training sets of 10,000, 40,000 and 120,000 sentences. Data points are labelled A to G; two regions of the plot are marked "over-generalization" and "low productivity".]
Over-generalization
john believes that to please is easy
john thinks that to please is fun
jack and john believe that to please is hard
Generalized search path:
  john {believes, thinks, believe} that to please is easy
The class also licenses the ungrammatical “john believe that to please is easy”.
Bootstrapping
what are the cheapest flights from denver to boston that stop in atlanta
A pre-existing equivalence class:
  {boston, philadelphia, denver, dallas}
Generalized search path I:
  what are the cheapest flights from {boston, philadelphia, denver, dallas} to {boston, philadelphia, denver, dallas} that stop in atlanta
Bootstrapping
what are the cheapest flights from {boston, philadelphia, denver, dallas} to {boston, philadelphia, denver, dallas} that stop in atlanta
what is the cheapest fare from denver to philadelphia and from pittsburgh to atlanta
i would… like the cheapest airfare from boston to denver december twenty sixth
show me the cheapest flight from philadelphia to dallas which arrives…
Bootstrapping
Generalized search path II:
  what are the cheapest {flight, flights, airfare, fare} from {boston, philadelphia, denver} to {denver, philadelphia, dallas} that stop in atlanta
_P2: the cheapest _E2 from _E3 to _E4
  _E2 = {flight, flights, airfare, fare}
  _E3 = {boston, philadelphia, denver}
  _E4 = {denver, philadelphia, dallas}
Bootstrapping
  Slide a context window of length L along the current search path
  Consider all sub-paths of length L that begin in a1 and end in aL
    These are the candidate paths
  For each 1≤i≤L
    For each 1≤k≤L, k≠i
      Replace node k with the EC that contains node k and maximally overlaps the set of nodes at index k of the candidate paths
  Continue as before
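The replacement step can be sketched as a selection among existing equivalence classes. Here "maximally overlaps" is taken to mean the largest number of shared members, which is an assumption (the slide does not say how overlap is scored), and best_class is a hypothetical helper name.

```python
def best_class(node, candidates_at_k, classes):
    """Pick the existing class that contains `node` and overlaps `candidates_at_k` most.

    Returns None when no existing class contains the node, in which case the
    node would simply stay as it is on the search path.
    """
    containing = [ec for ec in classes if node in ec]
    if not containing:
        return None
    # Overlap is scored as the raw count of shared members (an assumption).
    return max(containing, key=lambda ec: len(ec & candidates_at_k))

classes = [
    frozenset({"boston", "philadelphia", "denver", "dallas"}),
    frozenset({"denver", "monday"}),  # a made-up competing class
]
seen_at_k = {"denver", "boston", "philadelphia"}  # nodes at index k of the candidate paths

print(sorted(best_class("denver", seen_at_k, classes)))
# ['boston', 'dallas', 'denver', 'philadelphia']
print(best_class("atlanta", seen_at_k, classes))  # None
```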
The ADIOS algorithm
Initialization – load all data into a pseudograph
Until no more patterns are found
  For each path P
    Create generalized search paths from P
    Detect significant patterns using MEX
    If found, add the best new pattern and its equivalence classes, and rewire the graph
The training process
[Figure: a hierarchy of numbered pattern and equivalence-class nodes (120, 132, 234, 321, 567, 621, 987, 1203, 1204, 1205, 2000, 2001) built up step by step during training.]
The result
[Figure: the resulting hierarchy over the same nodes.]
Example
[Figure: patterns and equivalence classes learned from CHILDES, drawn as trees with node numbers (1404, 1405, 1457, 1458, 1678, 1679, 1903, 1904) and weights (0.56, 0.15, 1). Visible word classes include a long list of nouns (kid, knife, ladder, lid, matter, milk, …, truck, try, window, wood), a list of places and objects (bedroom, bench, box, car, chair, circle, closet, cup, garage, house, oven, refrigerator, snow, square, truck), the words "to go" with back, home, outside, potty, up, and the words "do you" with have, like, want.]
Rephrased sentences by ADIOS
  I'll play with the toys and you play with your bib.
  there's another bar+b+que.
  there's a chicken!
  play with the dolls and the roof?
  oh ; the peanut butter can go up there .
  you better finish it.
  we better hold that ; then.
  uh ; that's another little girl!
  should we put this stuff in in another chick?
Original sentences from CHILDES
  I'll play with the eggs and you play with your Mom.
  there's another chicken.
  there's a square!
  play with the cars and the people?
  should we put this chair back in the bedroom?
  oh ; the peanut butter can sit right there.
  you better eat it.
  we better finish it ; then.
  yeah ; that's a good one!
More Patterns
Evaluating performance
In principle, we would like to compare ADIOS-generated parse-trees with the true parse-trees for given sentences
Alas, the ‘true parse-trees’ are subject to opinion
  Some approaches don’t even suppose parse trees
Evaluating performance
Define
  Recall – the probability of ADIOS recognizing an unseen grammatical sentence
  Precision – the proportion of grammatical ADIOS productions
Recall can be assessed by leaving out some of the training corpus
Precision is trickier
  Unless we’re learning a known CFG
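The two measures can be written down directly. In this sketch the learner's recognizer, its generated sentences and the grammaticality judge are hypothetical stand-ins; in practice the judge is either a known CFG or human raters.

```python
def recall(accepts, held_out):
    """Fraction of held-out grammatical sentences that the learner accepts."""
    return sum(1 for s in held_out if accepts(s)) / len(held_out)

def precision(generated, is_grammatical):
    """Fraction of learner-generated sentences that are judged grammatical."""
    return sum(1 for s in generated if is_grammatical(s)) / len(generated)

# Toy stand-ins (assumptions): a learner that knows three sentences, and a
# reference language that plays the role of the 'known CFG'.
learner_language = {"the dog saw a man", "a boy heard the dog", "dog the saw"}
true_language = {"the dog saw a man", "a boy heard the dog",
                 "a man sensed the boy", "the boy sniffed a dog"}

held_out = ["the dog saw a man", "a man sensed the boy",
            "a boy heard the dog", "the boy sniffed a dog"]
print(recall(lambda s: s in learner_language, held_out))               # 0.5
print(precision(sorted(learner_language), lambda s: s in true_language))
```

Recall only needs sentences withheld from training, while precision needs a judgment on every generated sentence, which is why it is the harder of the two to measure on natural corpora.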
The ATIS experiments
  ATIS-NL is a 13,043-sentence corpus of natural language
    Transcribed phone calls to an airline reservation service
  ADIOS was trained on 12,700 sentences of ATIS-NL
    The remaining 343 sentences were used to assess recall
    Precision was determined with the help of 8 graduate students from Cornell University
An ADIOS drawback
ADIOS is inherently a heuristic and greedy algorithm
  Once a pattern is created it remains forever – errors compound
  Sentence ordering affects the outcome
Running ADIOS with different orderings gives patterns that ‘cover’ different parts of the grammar
An ad-hoc solution
  Train multiple learners on the corpus
    Each on a different sentence ordering
    Create a ‘forest’ of learners
  To create a new sentence
    Pick one learner at random
    Use it to produce a sentence
  To check the grammaticality of a given sentence
    If any learner accepts the sentence, declare it grammatical
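The forest idea can be sketched with a stand-in learner. SetLearner is a hypothetical placeholder for a trained ADIOS instance (it just stores a set of sentences); only the Forest logic mirrors the generation and acceptance rules described here.

```python
import random

class SetLearner:
    """Hypothetical stand-in for one trained learner: it knows a fixed set of sentences."""

    def __init__(self, sentences):
        self.sentences = sorted(sentences)

    def generate(self, rng):
        return rng.choice(self.sentences)

    def accepts(self, sentence):
        return sentence in self.sentences

class Forest:
    """Several learners, each assumed to be trained on a different sentence ordering."""

    def __init__(self, learners, seed=None):
        self.learners = list(learners)
        self.rng = random.Random(seed)

    def generate(self):
        # Pick one learner at random and let it produce the sentence.
        return self.rng.choice(self.learners).generate(self.rng)

    def accepts(self, sentence):
        # Grammatical if any single learner accepts it.
        return any(learner.accepts(sentence) for learner in self.learners)

forest = Forest(
    [SetLearner({"show me flights"}), SetLearner({"list all flights"})],
    seed=0,
)
print(forest.accepts("list all flights"))  # True
print(forest.accepts("flights me show"))   # False
print(forest.generate())                   # one of the two known sentences
```

Accepting on any learner widens coverage, so it tends to help recall; generation draws from one learner at a time, so its precision is roughly that of the individual learners.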
The ATIS experiments
ADIOS’ performance scores
  Recall – 40%
  Precision – 70%
For comparison, ATIS-CFG reached
  Recall – 45%
  Precision – <1% (!)
ADIOS/ATIS-N comparison
[Figure: bar chart of precision (vertical axis, 0.00 to 1.00) for ADIOS and ATIS-N. Legend: unacceptable, acceptable.]
Source  No.  Sentence
H       1    i would like a flight from washington to boston flight three twenty four on august twentieth
C       2    round trip flight from boston to baltimore leaving boston less than a thousand dollars
C       3    well what offers the ground transportation available in fort worth
C       4    does continental fly from san francisco to atlanta
H       5    does american airlines fly from philadelphia to dallas
H       6    please describe to me the classes of service that are available
H       7    i'd like to fly from philadelphia to dallas next week
C       8    which airline offers the most flights from san francisco washington
H       9    is it possible for me to fly from baltimore to san francisco
C       10   i want to fly from boston to pittsburgh to san francisco
C       11   would like to arrange a round trip flight from atlanta to boston to pittsburgh to san francisco tuesday the
H       12   between eleven and twelve o'clock in the morning
C       13   what offers the cheapest fare from boston to pittsburgh to atlanta
C       14   what is the airfare from boston to pittsburgh to atlanta
H = Human, C = Computer
English as Second Language test
  A single instance of ADIOS was trained on the CHILDES corpus
    120,000 sentences of transcribed child-directed speech
  Subjected to the Göteborg multiple-choice ESL test
    100 sentences, each with an open slot
    Pick the correct word out of three
  ADIOS answered 60% of the questions correctly
    The performance of an average ninth-grader
Meta-analysis of ADIOS results
Define a pattern spectrum as the histogram of pattern types for an individual learner
  A pattern type is determined by its contents
    E.g. TT, TET, EE, PE…
A single ADIOS learner was trained with each of 6 translations of the Bible
Pattern spectra
[Figure: histogram of pattern-type frequencies (vertical axis, 0 to 0.35) over pattern types such as TT, TE, TP, ET, EE, EP, PT, PE, PP, TTT, TTE, TTP, TET, TEE, TEP, TPT, with one series each for English, Spanish, Swedish, Chinese, Danish and French. Two further panels have a vertical axis from 0.60 to 1.00 and a horizontal axis from 0 to 800.]
Language dendrogram
[Figure: dendrogram over Chinese, Spanish, French, English, Swedish and Danish, shown alongside the pattern spectra; panels are labelled A to E.]
In case there’s time…
Pattern significance
  Say we found a potential pattern-edge from nodes 1 to n.
  Define
    m – the number of paths from 1 to n
    r – the number of paths from 1 to n+1
  Because it’s a pattern edge, we know that, for the cutoff η,
    $$\frac{P_{n+1}}{P_n} < \eta \quad\text{or}\quad \frac{r}{m} < \eta P_n$$
  Let’s suppose that the true probability for n+1 given 1 through n is $P^*_{n+1}$
    r/m is our best estimate, but just an estimate
    What are the odds of getting r and m but still having $P^*_{n+1} \ge \eta P_n$?
Pattern significance
  Assume $P^*_{n+1} = \eta P_n$
  The odds of getting the result r and m or better are then given by
    $$B(r, m, \eta P_n) = \sum_{i=0}^{r} \binom{m}{i} (\eta P_n)^i (1 - \eta P_n)^{m-i}$$
  If this is smaller than a predetermined α, we say the pattern-edge candidate is significant
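The significance test amounts to a lower binomial tail. In this sketch the null probability is taken to be η·P_n, where η is the cutoff on the probability ratio; that reading, the function names and the numbers in the usage lines are assumptions made for illustration.

```python
from math import comb

def binomial_tail(r, m, p):
    """P(X <= r) for X ~ Binomial(m, p): the chance of seeing r or fewer continuations."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(r + 1))

def is_significant(r, m, p_n, eta, alpha):
    """True if r continuations out of m paths are unlikely under the null p = eta * p_n."""
    return binomial_tail(r, m, eta * p_n) < alpha

print(binomial_tail(1, 2, 0.5))                               # 0.75
print(is_significant(1, 20, p_n=1.0, eta=0.65, alpha=0.01))   # True: 1 of 20 is far below 0.65
print(is_significant(12, 20, p_n=1.0, eta=0.65, alpha=0.01))  # False: 12 of 20 is close to 0.65
```

The test guards against declaring a drop on thin evidence: with small m, even a low ratio r/m leaves a tail probability above α.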
To be continued…