Robust local textual inference Marie-Catherine de Marneffe, Bill MacCartney, Teg Grenager, Daniel Cer, Anna Rafferty, Christopher D. Manning NLP Group.

Robust local textual inference

Marie-Catherine de Marneffe, Bill MacCartney,

Teg Grenager, Daniel Cer, Anna Rafferty,

Christopher D. Manning

NLP Group - Stanford University

3 approaches to RTE

• Graph matching approaches

• Logical approaches

∃e lose(e) subj(e, x) soldier(x) …

∃k kill(k) subj(k, y) troop(y) …

• Word overlap models

Graph matching

• Represent sentences as typed dependency trees

• Find low-cost alignment (using lexical & structural match costs)

T: Thirteen soldiers lost their lives in today’s ambush

H: Several troops were killed in the ambush

We need sloppy matching!

T: Today's best estimate of giant panda numbers in the wild is about 1,100 individuals living in up to 32 separate populations mostly in China's Sichuan Province, but also in Shaanxi and Gansu provinces.

H: There are about 1,100 pandas in the wild in China. (TRUE)

best estimate of giant panda numbers in the wild is about 1,100

➜ there are about 1,100 pandas in the wild

The problem with tree matching

T: DeLay bought Enron stock and Clinton sold Enron stock.

H: DeLay sold Enron stock.

Yes / No / Probably, yes

Solution: align, then evaluate

T: DeLay bought Enron stock and Clinton sold Enron stock.

H: DeLay sold Enron stock.

Things we aimed to fix

1. Confounding of alignment and entailment

2. Assumption of monotonicity

• Matching/embedding methods assume upward monotonicity

T: Sue saw Les Misérables in London

H: Sue saw Les Misérables (TRUE)

• But

T: The largest missile base in Asia is the Jiupeng missile testing ground in Taiwan

H: The largest missile base is the Jiupeng missile testing ground in Taiwan (FALSE)

3. Assumption/requirement of locality

[MacCartney et al. 2006]

Whether an alignment is good depends on non-local factors

(3) T: It is not the case that Bin Laden was seen in Tora Bora.

Q: Was Bin Laden seen in Tora Bora? (No)

It’s difficult to see non-factive context when aligning “seen” → “seen”

(1) T: Some students came to school by car.

Q: Did any students come to school? (Yes)

(2) T: No students came to school by car.

Q: Did any students come to school? (Don’t know)

Context of monotonicity: Whether it is okay to delete “by car” in the hypothesis depends on subject quantifier in the text
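The dependence on the subject quantifier in examples (1) and (2) can be sketched as a small lookup. This is only an illustration: the table `QUANTIFIER_MONOTONICITY` and the function `can_drop_modifier` are hypothetical names, not part of the system described here, and the table covers just the two quantifiers from the examples.

```python
# Hypothetical table: direction of monotonicity that the subject quantifier
# imposes on the predicate. Only the quantifiers from examples (1) and (2).
QUANTIFIER_MONOTONICITY = {
    "some": "up",
    "no": "down",
}


def can_drop_modifier(quantifier: str) -> str:
    """Judge whether deleting a predicate modifier such as "by car" is safe.

    Returns "yes" in an upward-monotone context and "dontknow" otherwise.
    Quantifiers missing from the table are treated conservatively.
    """
    direction = QUANTIFIER_MONOTONICITY.get(quantifier.lower())
    return "yes" if direction == "up" else "dontknow"


print(can_drop_modifier("Some"))  # example (1)
print(can_drop_modifier("No"))    # example (2)
```

The lookup is keyed on the quantifier alone, which is the non-local information a purely word-by-word alignment score would not see.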

Three-stage architecture

Inference problem ➔ annotated semantic graphs ➔ aligner ➔ entailment model ➔ score ➔ {yes, no}

Three-step approach (1)

Step 1: linguistic annotation

• Named entity recognition [Finkel et al. 2005]

• Canonicalization of quantity, date, and money expressions: normalized dates and relational expressions of amount (“>200”)

T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.

H: Kessler's team interviewed more than 60,000 adults in 14 countries.

• NER & collocation collapsing (George_Bush, carried_out)

• Phrase structure and typed dependency parsing

• Coreference resolution

• TF-IDF scores
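As a rough illustration of the quantity canonicalization in Step 1, the sketch below maps a phrase to a (relation, value) pair and compares the Kessler text and hypothesis quantities. The function names, the handled phrases ("more than", "less than"), and the comparison rule are assumptions made for this example, not the system's actual normalizer.

```python
import re


def canonicalize_quantity(phrase: str):
    """Map a phrase to (relation, value), e.g. 'more than 60,000' -> ('>', 60000).

    Returns None when the phrase contains no digits.
    """
    match = re.search(r"\d[\d,]*", phrase)
    if match is None:
        return None
    value = int(match.group().replace(",", ""))
    lowered = phrase.lower()
    if "more than" in lowered:
        return (">", value)
    if "less than" in lowered:
        return ("<", value)
    return ("=", value)


def text_quantity_entails(text_q, hyp_q) -> bool:
    """Check whether an exact text quantity satisfies the hypothesis quantity."""
    (t_rel, t_val), (h_rel, h_val) = text_q, hyp_q
    if t_rel != "=":
        return False  # this sketch only handles exact quantities in the text
    if h_rel == ">":
        return t_val > h_val
    if h_rel == "<":
        return t_val < h_val
    return t_val == h_val


t = canonicalize_quantity("60,643 face-to-face interviews")
h = canonicalize_quantity("more than 60,000 adults")
print(t, h, text_quantity_entails(t, h))
```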


Step 2: align the hypothesis graph into the text graph

Lexical resources: WordNet, InfoMap, string overlap, gazetteers, distributional similarity scores (Lin, Ravichandran)
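A heavily simplified sketch of the alignment step follows, using the soldiers/troops pair from the graph matching slide. It is an assumption-laden toy: the scores in `TOY_SIMILARITY` are invented stand-ins for what lexical resources would supply, structural match costs are ignored, and the greedy word-by-word choice is not the search procedure of the actual aligner.

```python
# Hypothetical lexical scores (higher is better), keyed by
# (hypothesis word, text word). Invented values for illustration.
TOY_SIMILARITY = {
    ("troops", "soldiers"): 0.9,
    ("killed", "lost"): 0.6,
    ("several", "thirteen"): 0.5,
}
NO_MATCH = 0.0


def lexical_score(h_word: str, t_word: str) -> float:
    """Score 1.0 for identical strings, else the toy table value or NO_MATCH."""
    if h_word == t_word:
        return 1.0
    return TOY_SIMILARITY.get((h_word, t_word), NO_MATCH)


def align(hyp_words, text_words):
    """Greedily map each hypothesis word to its best-scoring text word.

    Words with no positive score are aligned to None. Returns the mapping
    and the summed score of the chosen matches.
    """
    alignment = {}
    total = 0.0
    for h in hyp_words:
        best = max(text_words, key=lambda t: lexical_score(h, t))
        score = lexical_score(h, best)
        alignment[h] = best if score > NO_MATCH else None
        total += score
    return alignment, total


text = "thirteen soldiers lost their lives in today's ambush".split()
hyp = "several troops were killed in the ambush".split()
alignment, total = align(hyp, text)
print(alignment)
```

Because each hypothesis word is scored independently, the sketch is as local as possible; that is exactly the property that makes alignment alone insufficient for the entailment decision.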


Step 3: make an inference decision conditioned on graphs and the alignment

• Compute specialized features:

  • Antonymy/negation

  • Quantification, numeric mismatch

• This is where linguistics comes in

• Can use machine learning techniques to learn weights on the semantic features! Here, logistic regression.
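To make the decision step concrete, here is a minimal sketch of how a logistic regression model turns feature values into a yes/no answer. The feature names, weights, bias, and threshold are all invented for illustration; they are not the hand-set or learned weights of the system, and training (fitting the weights to labeled text-hypothesis pairs) is not shown.

```python
import math

# Hypothetical feature weights. Negative weights push toward "no",
# positive weights toward "yes".
WEIGHTS = {
    "antonym_aligned": -3.0,
    "main_predicate_match": 1.5,
    "numeric_match": 1.0,
    "date_deleted": -0.5,
}
BIAS = 0.0


def entailment_probability(features: dict) -> float:
    """Logistic function applied to the weighted sum of feature values."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(features: dict, threshold: float = 0.5) -> str:
    """Map the probability to a yes/no entailment decision."""
    return "yes" if entailment_probability(features) >= threshold else "no"


# An aligned antonym pair outweighs the positive evidence in this toy setup.
print(decide({"antonym_aligned": 1, "main_predicate_match": 1,
              "numeric_match": 1, "date_deleted": 1}))
```

A single strongly negative feature can dominate several positive ones, which matches the intuition that knock-out features make non-entailment easier to detect.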

Representation/alignment example

T: Mitsubishi Motors Corp.’s new vehicle sales in the US fell 46 percent in June.

H: Mitsubishi sales rose 46 percent. (FALSE)

Alignment (hypothesis word → text word):

rose → fell

sales → sales

Mitsubishi → Mitsubishi_Motors_Corp

percent → percent

46 → 46

Alignment score: –0.89

Features:

– Aligned antonyms in pos/pos context

+ Structure: main predicate good match

+ Numeric quantity match

– Date: text date deleted in hypothesis

+ Alignment: good

Inference score: –5.42

FALSE

Structural (mis-)match features

T: Ahmadinejad attacked the “threat” to bring the issue of Iran’s nuclear activity to the UN Security Council by the US, France, Britain and Germany.

H: Ahmadinejad attacked the UN Security Council. (FALSE)

We check particularly the main predicate of the hypothesis and its match in the text to try and assess compatibility using syntactic grammatical relations:

Object of attack in hypothesis is not related to object of attack in text

Modality features

T: The Scud C has a range of 500 kilometers and is manufactured in Syria with know-how from North Korea.

H: A Scud C can fly 500 kilometers. (TRUE)

Map text & hypothesis to 6 canonical modalities:

POSSIBLE, NOT_POSSIBLE, ACTUAL, NOT_ACTUAL, NECESSARY, NOT_NECESSARY

e.g. can, perhaps, might → POSSIBLE

Map modality pair into judgment feature:

(POSSIBLE, ACTUAL) → dontknow

(NECESSARY, NOT_ACTUAL) → no

(ACTUAL, POSSIBLE) → yes
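The two mappings can be sketched as plain lookups. The marker words and the three (text, hypothesis) pairs come from the slide; the function names, the token-based detection, and the default of ACTUAL for sentences without a marker are assumptions made for this example.

```python
# Marker words listed on the slide, all mapping to POSSIBLE.
MODALITY_MARKERS = {"can": "POSSIBLE", "perhaps": "POSSIBLE", "might": "POSSIBLE"}

# (text modality, hypothesis modality) -> judgment feature.
# Only the three pairs given on the slide are filled in.
MODALITY_JUDGMENT = {
    ("POSSIBLE", "ACTUAL"): "dontknow",
    ("NECESSARY", "NOT_ACTUAL"): "no",
    ("ACTUAL", "POSSIBLE"): "yes",
}


def modality_of(tokens, default="ACTUAL"):
    """Return the modality of the first marker found, else the default.

    Treating unmarked sentences as ACTUAL is an assumption of this sketch.
    """
    for token in tokens:
        if token.lower() in MODALITY_MARKERS:
            return MODALITY_MARKERS[token.lower()]
    return default


def modality_feature(text_tokens, hyp_tokens):
    """Look up the judgment for the pair; None if the pair is not listed."""
    pair = (modality_of(text_tokens), modality_of(hyp_tokens))
    return MODALITY_JUDGMENT.get(pair)


text = "The Scud C has a range of 500 kilometers".split()
hyp = "A Scud C can fly 500 kilometers".split()
print(modality_feature(text, hyp))  # yes
```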

Restrictive adjuncts

T: In all, Zerich bought $422 million worth of oil from Iraq, according to the Volcker committee.

H: Zerich bought oil from Iraq during the embargo. (FALSE)

T: Zerich didn’t buy any oil from Iraq, according to the Volcker committee.

H: Zerich didn’t buy oil from Iraq during the embargo. (TRUE)

We can check whether adding/dropping restrictive adjuncts is licensed relative to upward and downward entailing contexts.
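The licensing check can be written as a two-by-two rule. This is a sketch under the standard assumption that dropping a restrictive adjunct is safe in upward entailing contexts and adding one is safe in downward entailing contexts; the function name and the string labels are hypothetical.

```python
def adjunct_edit_licensed(edit: str, context: str) -> bool:
    """Decide whether an adjunct edit preserves entailment.

    edit:    "add" or "drop", describing the hypothesis relative to the text.
    context: "upward" or "downward", the entailment direction of the context.
    """
    if edit not in ("add", "drop"):
        raise ValueError(f"unknown edit: {edit!r}")
    if context not in ("upward", "downward"):
        raise ValueError(f"unknown context: {context!r}")
    return (edit == "drop" and context == "upward") or \
           (edit == "add" and context == "downward")


# "during the embargo" is added in the hypothesis of both Zerich examples.
print(adjunct_edit_licensed("add", "upward"))    # positive text: not licensed
print(adjunct_edit_licensed("add", "downward"))  # negated text: licensed
```

Under this rule the first Zerich pair comes out as not licensed and the second as licensed, in line with the FALSE and TRUE labels above.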

Factive & implicative features

T: Scientists have discovered that drinking tea protects against heart disease by improving the function of the artery walls.

H: Tea protects from some disease. (TRUE)

Evaluate governing verbs for implicativity class

• Unknown: say, tell, suspect, try, …

• Fact: know, acknowledge, ignore, …

• True: manage to, …

• False: fail to, forget to, …

Need to check for negative context here too
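A sketch of the class lookup combined with the negation check is shown below. The verb lists are the ones on the slide; the behavior under negation (factives keep their complement true, the True and False classes flip) is a standard assumption about implicatives added for this example, not something the slide spells out, and the names are hypothetical.

```python
# Verb -> implicativity class, using only the verbs listed on the slide.
IMPLICATIVITY_CLASS = {
    "say": "unknown", "tell": "unknown", "suspect": "unknown", "try": "unknown",
    "know": "fact", "acknowledge": "fact", "ignore": "fact",
    "manage to": "true",
    "fail to": "false", "forget to": "false",
}


def complement_truth(verb: str, negated: bool = False) -> str:
    """Return "true", "false", or "unknown" for the embedded clause.

    Assumptions of this sketch: factive verbs presuppose their complement
    even when negated, while the True and False classes flip under negation.
    Verbs missing from the table are treated as "unknown".
    """
    verb_class = IMPLICATIVITY_CLASS.get(verb, "unknown")
    if verb_class == "fact":
        return "true"
    if verb_class == "true":
        return "false" if negated else "true"
    if verb_class == "false":
        return "true" if negated else "false"
    return "unknown"


print(complement_truth("manage to"))                # true
print(complement_truth("manage to", negated=True))  # false
print(complement_truth("fail to", negated=True))    # true
```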

Our RTE2 Results

Weights          Accuracy   Average precision

RTE 2 Dev Set
Hand-set         67.0       72.5
Learned          66.9       74.8

RTE 2 Test Set
Hand-set         58.3       61.4
Learned          60.5       58.4

Problems we can fix

• Identification of some structure mismatches:

T: Nguyen’s lawyer, Lex Lasry told [...]

H: Nguyen is a lawyer.

• Handling of computation:

T: This is good news for Gaelic translators, as the EU will have to churn out official documents in this language, in addition to the 20 other official EU languages.

H: There are 21 official EU languages.

Problems we can fix

• Alignment of numbers with respect to the sentence structure:

T: Some 420 people have been hanged in Singapore since 1991. That gives the country of 4.4 million people the highest execution rate in the world relative to population.

H: 4.4 million people were executed in Singapore.

More challenging problems

Non-entailment is easier than entailment

• Good at finding knock-out features

• Hard to be certain that we’ve considered everything

Dealing with dropping/adding modifiers vs. upward/downward entailing contexts is hard

• Need to know which are restrictive/not/discourse items

Maurice was subsequently killed in Angola.

Multiword “lexical” semantics/world knowledge

• Good at synonyms, hyponyms, antonyms

• Cannot resolve multi-word equivalences

T: David McCool took the money and decided to start Muzzy Lane in 2002

H: David McCool is the founder of Muzzy Lane
