Page 1
Knowledge and Tree-Edits in Learnable Entailment Proofs
Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido Dagan
TAC, November 2011, NIST, Gaithersburg, Maryland, USA
Download at: http://www.cs.biu.ac.il/~nlp/downloads/biutee
BIUTEE
Page 2
RTE
• Classify a (T,H) pair as ENTAILING or NON-ENTAILING
Example
T: The boy was located by the police.
H: Eventually, the police found the child.
Page 3
Matching vs. Transformations
• Matching
• Sequence of transformations (a proof)
– Tree-Edits
  • Complete proofs
  • Estimate confidence
– Knowledge-based Entailment Rules
  • Linguistically motivated
  • Formalize many types of knowledge
T = T0 → T1 → T2 → ... → Tn = H
Page 4
Transformation based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Hypothesis: Eventually, the police found the child.
Page 5
Transformation based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
The police located the boy.
The police found the boy.
The police found the child.
Hypothesis: Eventually, the police found the child.
Page 7
BIUTEE Goals
• Tree Edits
  1. Complete proofs
  2. Estimate confidence
• Entailment Rules
  3. Linguistically motivated
  4. Formalize many types of knowledge
• BIUTEE
  • Integrates the benefits of both worlds
Page 8
Challenges / System Components
How to
1. generate linguistically motivated complete proofs?
2. estimate proof confidence?
3. find the best proof?
4. learn the model parameters?
Page 9
1. Generate linguistically motivated complete proofs
Page 10
Entailment Rules
• Generic Syntactic
• Lexical Syntactic
• Lexical (e.g. boy → child)
Bar-Haim et al. 2007. Semantic inference at the lexical-syntactic level.
Page 11
Extended Tree Edits (On The Fly Operations)
• Predefined custom tree edits
– Insert node on the fly
– Move node / move sub-tree on the fly
– Flip part of speech
– …
• Heuristically capture linguistic phenomena
– Operation definition
– Features definition
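A minimal sketch of the three on-the-fly edits named above, assuming a toy tree node with a word, a part-of-speech tag and children. The Node class and the function names are hypothetical and are not BIUTEE's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical parse-tree node: a word, its part of speech, its children."""
    word: str
    pos: str
    children: List["Node"] = field(default_factory=list)


def insert_node(parent: Node, word: str, pos: str) -> Node:
    """Insert a new leaf under `parent` (insert node on the fly)."""
    child = Node(word, pos)
    parent.children.append(child)
    return child


def move_subtree(node: Node, old_parent: Node, new_parent: Node) -> None:
    """Detach `node` (with its whole sub-tree) and re-attach it elsewhere."""
    old_parent.children = [c for c in old_parent.children if c is not node]
    new_parent.children.append(node)


def flip_pos(node: Node, new_pos: str) -> None:
    """Change the part-of-speech tag of a node."""
    node.pos = new_pos


# "The police found the child." as a tiny tree, then insert "eventually".
found = Node("found", "VERB", [Node("police", "NOUN"), Node("child", "NOUN")])
insert_node(found, "eventually", "ADV")
```

In a full system each such edit would also emit a feature (for example an Insert-Verb count) so that the cost model can price it.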
Page 12
Proof over Parse Trees - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
  Passive to active
The police located the boy.
  X locate Y → X find Y
The police found the boy.
  Boy → child
The police found the child.
  Insertion on the fly
Hypothesis: Eventually, the police found the child.
Page 13
2. Estimate proof confidence
Page 14
Cost based Model
• Define operation cost
– Assesses operation’s validity
– Represent each operation as a feature vector
– Cost is linear combination of feature values
• Define proof cost as the sum of the operations’ costs
• Classify: entailment if and only if proof cost is smaller than a threshold
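The three bullets above can be sketched as three small functions. The feature order, weights and threshold below are made-up illustration values, not BIUTEE's learned parameters.

```python
from typing import Sequence


def operation_cost(weights: Sequence[float], features: Sequence[float]) -> float:
    """Cost of one operation: linear combination of its feature values."""
    return sum(w * f for w, f in zip(weights, features))


def proof_cost(weights: Sequence[float], operations: Sequence[Sequence[float]]) -> float:
    """Cost of a proof: sum of the costs of its operations."""
    return sum(operation_cost(weights, features) for features in operations)


def classify(weights, operations, threshold: float) -> str:
    """Entailing if and only if the proof cost is smaller than the threshold."""
    if proof_cost(weights, operations) < threshold:
        return "ENTAILING"
    return "NON-ENTAILING"


# Hypothetical feature order: (insert-verb, wordnet, dirt)
weights = [2.0, 0.5, 1.0]
proof = [
    [0.0, 0.0, 0.457],  # one DIRT rule application
    [0.0, 0.5, 0.0],    # one WordNet rule application
]
print(classify(weights, proof, threshold=1.0))
```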
Page 15
Feature vector representation
• Define operation cost
– Represent each operation as a feature vector
Features (Insert-Named-Entity, Insert-Verb, … , WordNet, Lin, DIRT, …)
An operation:
The police located the boy.
DIRT: X locate Y → X find Y (score = 0.9)
The police found the boy.
Feature vector that represents the operation:
(0,0,…,0.457,…,0)
(0,0,…,0,…,0)
The feature value (0.457) is a downward function of the score
Page 16
Cost based Model
• Define operation cost
– Cost is linear combination of feature values
Cost = weight-vector * feature-vector
$C_w(o) = w^T f(o)$
• Weight-vector is learned automatically
Page 17
Confidence Model
• Define operation cost
– Represent each operation as a feature vector
• Define proof cost as the sum of the operations’ costs
$C_w(P) = \sum_{i=1}^{n} C_w(o_i) = \sum_{i=1}^{n} w^T f(o_i) = w^T f(P)$
where $C_w(P)$ is the cost of the proof and $w$ is the weight vector.
Define the vector that represents the proof:
$f(P) = \sum_{i=1}^{n} f(o_i)$
Page 18
Feature vector representation - example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
  Passive to active: (0,0,……………….………..,1,0)
The police located the boy.
  X locate Y → X find Y: (0,0,………..……0.457,..,0,0)
The police found the boy.
  Boy → child: (0,0,..…0.5,.……….……..,0,0)
The police found the child.
  Insertion on the fly: (0,0,1,……..…….…..…....,0,0)
Hypothesis: Eventually, the police found the child.
Sum of the four vectors: (0,0,1..0.5..…0.457,....…1,0)
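The summation above can be checked on a reduced feature space. This sketch assumes a hypothetical five-feature layout (the slide elides the real positions) and made-up weights. It shows that summing per-operation costs gives the same number as one dot product with the proof vector.

```python
# Hypothetical feature order; the real BIUTEE feature positions are not given.
FEATURES = ["insert-node", "wordnet", "dirt", "passive-to-active", "other"]

operations = {
    "Passive to active":      [0.0, 0.0, 0.0,   1.0, 0.0],
    "X locate Y -> X find Y": [0.0, 0.0, 0.457, 0.0, 0.0],
    "Boy -> child":           [0.0, 0.5, 0.0,   0.0, 0.0],
    "Insertion on the fly":   [1.0, 0.0, 0.0,   0.0, 0.0],
}


def proof_vector(op_vectors):
    """f(P): element-wise sum of the operations' feature vectors."""
    return [sum(column) for column in zip(*op_vectors)]


def dot(w, f):
    """w^T f for two equal-length sequences."""
    return sum(wi * fi for wi, fi in zip(w, f))


weights = [1.5, 0.8, 1.0, 0.1, 0.0]  # made-up weight vector

f_P = proof_vector(operations.values())
cost_by_operation = sum(dot(weights, f) for f in operations.values())
cost_by_proof_vector = dot(weights, f_P)
```

Because the cost is linear, a learner only ever needs the single vector f(P) per proof, not the individual operations.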
Page 19
Cost based Model
• Define operation cost
– Represent each operation as a feature vector
• Define proof cost as the sum of the operations’ costs
• Classify: “entailing” if and only if proof cost is smaller than a threshold
$w^T f(P) < b$
Learn $w$ and $b$
Page 20
3. Find the best proof
Page 21
Search the best proof
[Figure: candidate proofs (Proof #1 to Proof #4) leading from T to H]
Page 22
Search the best proof
• Need to find the “best” proof
• “Best Proof” = proof with lowest cost
‒ Assuming a weight vector is given
• Search space is exponential
‒ AI style search algorithm
[Figure: candidate proofs (Proof #1 to Proof #4) leading from T to H]
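The slides do not say which search algorithm is used. As one simple instance of an "AI style" search, the sketch below runs uniform-cost search and returns the lowest-cost proof. It is a toy under loud assumptions: states are plain strings rather than parse trees, and the four rewrite operations and their costs are invented for the running example.

```python
import heapq

# Hypothetical operations: (name, substring to match, replacement, cost)
OPERATIONS = [
    ("passive to active", "The boy was located by the police.",
     "The police located the boy.", 0.1),
    ("locate -> find", "located", "found", 0.457),
    ("boy -> child", "boy", "child", 0.5),
    ("insert 'Eventually'", "The police", "Eventually, the police", 1.0),
]


def successors(state):
    """Yield (operation name, next state, cost) for every applicable operation."""
    for name, old, new, cost in OPERATIONS:
        if old in state:
            yield name, state.replace(old, new, 1), cost


def best_proof(text, hypothesis, max_expansions=1000):
    """Uniform-cost search from text to hypothesis; returns (cost, steps) or None."""
    frontier = [(0.0, text, [])]
    cheapest = {}  # state -> lowest cost at which it was expanded
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, state, steps = heapq.heappop(frontier)
        if state == hypothesis:
            return cost, steps
        if state in cheapest and cheapest[state] <= cost:
            continue
        cheapest[state] = cost
        expansions += 1
        for name, nxt, step_cost in successors(state):
            heapq.heappush(frontier, (cost + step_cost, nxt, steps + [name]))
    return None


result = best_proof("The boy was located by the police.",
                    "Eventually, the police found the child.")
```

The max_expansions cap is a crude stand-in for the pruning a real system needs when the search space is exponential.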
Page 23
4. Learn model parameters
Page 24
Learning
• Goal: Learn parameters (w, b)
• Use a linear learning algorithm
– logistic regression, SVM, etc.
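A minimal sketch of learning (w, b) with logistic regression, one of the linear learners named above. It assumes each training pair is already represented by one proof vector f(P) and models P(entailing) as sigmoid(b - w·f(P)), so that "entailing" coincides with w·f(P) < b. The plain stochastic-gradient training loop and the toy data are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def train(vectors, labels, epochs=2000, lr=0.1):
    """Fit w and b so that P(entailing) = sigmoid(b - w.f)."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(vectors, labels):
            err = y - sigmoid(b - dot(w, f))  # gradient of the log-likelihood
            b += lr * err
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w, b


def is_entailing(w, b, f):
    """Classification rule: proof cost below the threshold."""
    return dot(w, f) < b


# Toy proof vectors: cheap proofs are entailing (1), expensive ones are not (0).
vectors = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.3], [1.0, 0.8], [0.9, 1.2], [1.5, 0.7]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train(vectors, labels)
```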
Page 25
Inference vs. Learning
[Diagram: Training samples; Best Proofs; Feature extraction; Vector representation; Learning algorithm; w,b]
Page 27
Iterative Learning Scheme
[Diagram: Training samples; Best Proofs; Vector representation; Learning algorithm; w,b]
1. w = reasonable guess
2. Find the best proofs
3. Learn new w and b
4. Repeat from step 2
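The four steps can be sketched as a short loop. This is a toy under stated assumptions: each training pair comes with a small, pre-enumerated list of candidate proof vectors (a real system would search for them), the initial guess is an all-ones weight vector, and the learner is a simple logistic regression. All names and data are hypothetical.

```python
import math


def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def fit_logistic(vectors, labels, epochs=500, lr=0.1):
    """Learn (w, b) with P(entailing) = sigmoid(b - w.f)."""
    w, b = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(vectors, labels):
            err = y - 1.0 / (1.0 + math.exp(-(b - dot(w, f))))
            b += lr * err
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w, b


def iterative_learning(candidates_per_pair, labels, n_features, iterations=3):
    w, b = [1.0] * n_features, 0.0                      # step 1: initial guess
    best = []
    for _ in range(iterations):                         # step 4: repeat
        best = [min(cands, key=lambda f: dot(w, f))     # step 2: best proofs
                for cands in candidates_per_pair]
        w, b = fit_logistic(best, labels)               # step 3: learn w and b
    return w, b, best


# Toy data: entailing pairs (label 1) have at least one cheap candidate proof.
candidates_per_pair = [
    [[0.2, 0.1], [1.0, 0.9]],
    [[0.1, 0.3], [0.8, 1.5]],
    [[1.2, 0.9], [1.0, 1.4]],
    [[0.9, 1.1], [1.6, 0.8]],
]
labels = [1, 1, 0, 0]
w, b, best = iterative_learning(candidates_per_pair, labels, n_features=2)
```

The loop is needed because the best proof depends on w, while w is learned from the best proofs.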
Page 28
Summary - System Components
How to
1. Generate syntactically motivated complete proofs?
– Entailment rules
– On the fly operations (Extended Tree Edit Operations)
2. Estimate proof validity?
– Confidence Model
3. Find the best proof?
– Search Algorithm
4. Learn the model parameters?
– Iterative Learning Scheme
Page 29
Results RTE7

ID    Knowledge Resources                                                          Precision %  Recall %  F1 %
BIU1  WordNet, Directional Similarity                                              38.97        47.40     42.77
BIU2  WordNet, Directional Similarity, Wikipedia                                   41.81        44.11     42.93
BIU3  WordNet, Directional Similarity, Wikipedia, FrameNet, Geographical database  39.26        45.95     42.34

BIUTEE 2011 on RTE 6 (F1 %)

Base line (Use IR top-5 relevance)  34.63
Median (September 2010)             36.14
Best (September 2010)               48.01
Our system                          49.54
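As a quick consistency check on the RTE7 table, and assuming F1 here is the usual harmonic mean of the reported precision and recall, the F1 column can be recomputed from the other two columns:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) as reported for the three RTE7 runs
runs = {"BIU1": (38.97, 47.40), "BIU2": (41.81, 44.11), "BIU3": (39.26, 45.95)}

for run_id, (p, r) in runs.items():
    print(f"{run_id}: F1 = {f1(p, r):.2f}")
```

The recomputed values agree with the reported 42.77, 42.93 and 42.34 to two decimals.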
Page 30
Conclusions
• Inference via sequence of transformations
– Knowledge
– Extended Tree Edits
• Proof confidence estimation
• Results
– Better than median on RTE7
– Best on RTE6
• Open Source
http://www.cs.biu.ac.il/~nlp/downloads/biutee
Page 31
Thank You
http://www.cs.biu.ac.il/~nlp/downloads/biutee