Page 1

RTE @ Stanford

Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova,
Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng

PASCAL Challenges Workshop, April 12, 2005

Page 2

Our approach

• Represent using syntactic dependencies
  • But also use semantic annotations.
  • Try to handle language variability.

• Perform semantic inference over this representation
  • Use linguistic knowledge sources.
  • Compute a “cost” for inferring hypothesis from text.

Low cost ⇒ Hypothesis is entailed.

Page 3

Outline of this talk

• Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g., semantic roles)

• Inference by graph matching
• Inference by abductive theorem proving

• A combined system
• Results and error analysis

Page 4

Sentence processing

• Parse with a standard PCFG parser. [Klein & Manning, 2003]
  • Al Qaeda: [Aa]l[ -]Qa’?[ie]da (see the sketch below)
  • Train on some extra sentences from recent news.

• Used a high-performing Named Entity Recognizer (next slide).
  • Force parse tree to be consistent with certain NE tags.

Example: American Ministry of Foreign Affairs announced that Russia called the United States...

(S (NP (NNP American_Ministry_of_Foreign_Affairs))
   (VP (VBD announced)
       (…)))
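A minimal illustrative sketch (not the system's code) of how a surface pattern like the Al Qaeda regex above could be used to canonicalize entity spellings before parsing; the canonical token "Al_Qaeda" and the function name are placeholders:

import re

# Collapse spelling variants of a named entity to one canonical token before
# parsing, using the surface pattern from the slide (straight apostrophe
# substituted for the typographic one).
AL_QAEDA = re.compile(r"[Aa]l[ -]Qa'?[ie]da")

def normalize_entities(sentence: str) -> str:
    return AL_QAEDA.sub("Al_Qaeda", sentence)

print(normalize_entities("Officials linked the plot to al Qaida."))
# -> Officials linked the plot to Al_Qaeda.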

Page 5

Named Entity Recognizer

• Trained a robust conditional random field model. [Finkel et al., 2003]

• Interpretation of numeric quantity statements
  Example:
  T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  H: Kessler's team interviewed more than 60,000 adults in 14 countries. TRUE

  Annotate numerical values implied by:
  • “6.2 bn”, “more than 60000”, “around 10”, …
  • MONEY/DATE named entities
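A hedged sketch of the numeric-quantity interpretation: map strings like "more than 60,000" or "6.2 bn" to (comparator, value) pairs and compare them. The scale table, the 10% tolerance, and the function names are assumptions, not the system's actual rules.

import re

SCALES = {"bn": 1e9, "billion": 1e9, "m": 1e6, "million": 1e6, "thousand": 1e3}

def parse_quantity(text: str):
    """Return (comparator, value) for a surface numeric expression."""
    text = text.lower().replace(",", "")
    comparator = "="
    if text.startswith(("more than", "over")):
        comparator = ">"
    elif text.startswith(("around", "about")):
        comparator = "~"
    match = re.search(r"(\d+(?:\.\d+)?)\s*([a-z]+)?", text)
    value = float(match.group(1))
    unit = match.group(2)
    if unit in SCALES:
        value *= SCALES[unit]
    return comparator, value

def quantity_entails(text_expr: str, hyp_expr: str) -> bool:
    """Does the text's exact value satisfy the hypothesis' constraint?"""
    _, tv = parse_quantity(text_expr)
    hc, hv = parse_quantity(hyp_expr)
    if hc == ">":
        return tv > hv
    if hc == "~":
        return abs(tv - hv) <= 0.1 * hv  # 10% tolerance (an assumption)
    return tv == hv

print(quantity_entails("60,643", "more than 60,000"))  # True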

Page 6

Parse tree post-processing

• Recognize collocations using WordNet

Example: Shrek 2 rang up $92 million.

(S (NP (NNP Shrek) (CD 2))
   (VP (VBD rang_up)
       (NP (QP ($ $) (CD 92) (CD million))))
   (. .))

Annotation on the QP: MONEY, 92000000
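A hedged sketch (assuming NLTK's WordNet interface and data) of the collocation step above: adjacent tokens are joined if their lemmatized, underscore-joined form has a WordNet entry, as with "rang up" → "rang_up".

from nltk.corpus import wordnet as wn  # assumes the WordNet data is installed

def is_collocation(w1: str, w2: str, pos=wn.VERB) -> bool:
    """Treat adjacent tokens as a collocation if WordNet lists the joined lemma."""
    lemma = wn.morphy(w1, pos) or w1          # "rang" -> "ring"
    return len(wn.synsets(f"{lemma}_{w2}", pos)) > 0

print(is_collocation("rang", "up"))  # True if WordNet lists the phrasal verb "ring up"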

Page 7

Parse tree Dependencies

• Find syntactic dependencies
  • Transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization

Example: Bill’s mother walked to the grocery store.
  subj(walked, mother)
  poss(mother, Bill)
  to(walked, store)
  nn(store, grocery)

• Dependencies can also be written as a logical formula
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)

Basic representations
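A hypothetical sketch of writing dependencies as a logical formula: each word becomes a unary predicate over a fresh variable and each typed dependency a binary predicate. (The slide's version additionally folds subject and prepositional dependents into the verb's argument slots, e.g. walked(E, A, C); that collapsing is omitted here.)

from itertools import count

def deps_to_logic(deps):
    """deps: (relation, head, dependent) triples -> conjunction of literals."""
    variables, fresh = {}, (chr(c) for c in count(ord("A")))
    literals = []
    for rel, head, dep in deps:
        for word in (head, dep):
            if word not in variables:
                variables[word] = next(fresh)
                literals.append(f"{word}({variables[word]})")
        literals.append(f"{rel}({variables[head]}, {variables[dep]})")
    return " ∧ ".join(literals)

deps = [("subj", "walked", "mother"), ("poss", "mother", "Bill"),
        ("to", "walked", "store"), ("nn", "store", "grocery")]
print(deps_to_logic(deps))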

Page 8

Representations

Dependency graph and logical formula for the example:

mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)

[Dependency graph: walked —subj→ mother —poss→ Bill; walked —to→ store —nn→ grocery]

• Can make representation richer
  • “walked” is a verb (VBD)
  • “Bill” is a PERSON (named entity).
  • “store” is the location/destination of “walked” (ARGM-LOC).
  • …

Page 9

Annotations

• Parts-of-speech, named entities
  • Already computed.

• Semantic roles
  Example:
  T: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  H1: C and D Technologies acquired Datel Inc. TRUE
  H2: Datel acquired C and D Technologies. FALSE

• Use a state-of-the-art semantic role classifier to label verb arguments. [Toutanova et al. 2005]

Page 10

More annotations

• Coreference
  Example:
  T: Since its formation in 1948, Israel …
  H: Israel was established in 1948. TRUE

• Use a conditional random field model for coreference detection.

  Note: Appositive “references” were previously detected.
  T: Bush, the President of USA, went to Florida.
  H: Bush is the President of USA. TRUE

• Other annotations
  • Word stems (very useful)
  • Word senses (no performance gain in our system)

Page 11

Event nouns

• Use a heuristic to find event nouns
• Augment text representation using WordNet derivational links (see the sketch below).

Example:
T: … witnessed the murder of police commander ...
H: Police officer killed. TRUE

Text logical formula (NOUN): murder(M) ∧ police_commander(P) ∧ of(M, P)
Augmented (VERB): murder(E, M, P)
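A hedged sketch (again assuming NLTK's WordNet interface and data) of following derivational links from an event noun to its verb, which licenses the augmented verb predicate above:

from nltk.corpus import wordnet as wn

def event_noun_to_verbs(noun: str):
    """Verb lemmas derivationally related to the noun, e.g. "murder" -> {"murder"}."""
    verbs = set()
    for synset in wn.synsets(noun, wn.NOUN):
        for lemma in synset.lemmas():
            for related in lemma.derivationally_related_forms():
                if related.synset().pos() == "v":
                    verbs.add(related.name())
    return verbs

print(event_noun_to_verbs("murder"))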

Page 12

Outline of this talk

• Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g., semantic roles)

• Inference by graph matching
• Inference by abductive theorem proving

• A combined system
• Results and error analysis

Page 13

Graph Matching Approach

• Why Graph Matching?
  • Dependency tree has a natural graphical interpretation
  • Successful in other domains: e.g., lossy image matching

• Input: Hypothesis (H) and Text (T) graphs
  • Vertices are words and phrases
  • Edges are labeled dependencies

• Output: Cost of matching H to T (next slide)

Toy example:
T: John bought a BMW.  H: John purchased a car. TRUE

[T graph: bought —ARG0(Agent)→ John (PERSON), bought —ARG1(Theme)→ BMW]
[H graph: purchased —ARG0(Agent)→ John (PERSON), purchased —ARG1(Theme)→ car]

Page 14

Graph Matching: Idea

• Idea: Align H to T so that vertices are similar and preserve relations (as in machine translation)

• A matching M is a mapping from vertices of H to vertices of T.
  Thus, for each vertex v in H, M(v) is a vertex in T.

[Figure: the H graph (purchased —ARG0(Agent)→ John, —ARG1(Theme)→ car) aligned by the matching to the T graph (bought —ARG0(Agent)→ John, —ARG1(Theme)→ BMW)]

Page 15

Graph Matching: Costs

• The cost of a matching, MatchCost(M), measures the “quality” of a matching M
  • VertexCost(M) – compare vertices in H with matched vertices in T
  • RelationCost(M) – compare edges (relations) in H with corresponding edges (relations) in T

• MatchCost(M) = (1 − β)·VertexCost(M) + β·RelationCost(M)

Page 16

Graph Matching: Costs

• VertexCost(M): for each vertex v in H, and vertex M(v) in T:
  • Do the vertex heads share the same stem and/or POS?
  • Is the T vertex head a hypernym of the H vertex head?
  • Are the vertex heads “similar” phrases? (next slide)

• RelationCost(M): for each edge (v, v') in H, and edge (M(v), M(v')) in T:
  • Are parent/child pairs in H parent/child in T?
  • Are parent/child pairs in H ancestor/descendant in T?
  • Do parent/child pairs in H share a common ancestor in T?

Page 17

Digression: Phrase similarity

• Measures based on WordNet (Resnik/Lesk).

• Distributional similarity
  • Example: “run” and “marathon” are related.
  • Latent Semantic Analysis to discover words that are distributionally similar (i.e., have common neighbors).

• Used a web-search based measure (see the sketch below)
  • Query google.com for all pages with:
    • “run”
    • “marathon”
    • both “run” and “marathon”

• Learning paraphrases. [Similar to DIRT: Lin and Pantel, 2001]

• “World knowledge” (labor intensive)
  • CEO = Chief_Executive_Officer
  • Philippines → Filipino
  • [Can add common facts: “Paris is the capital of France”, …]
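A hypothetical sketch of the web-count measure: given page counts for each word alone and for both together (however they were obtained from the search engine), compute a PMI-style association score. The counts in the example are made-up placeholders.

import math

def web_similarity(hits_w1: float, hits_w2: float, hits_both: float,
                   total_pages: float = 1e10) -> float:
    """Pointwise mutual information estimated from page counts."""
    p1, p2 = hits_w1 / total_pages, hits_w2 / total_pages
    p12 = hits_both / total_pages
    if p12 == 0:
        return float("-inf")
    return math.log(p12 / (p1 * p2))

print(web_similarity(hits_w1=5e8, hits_w2=2e7, hits_both=4e6))  # ~1.39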

Page 18

Graph Matching: Costs

• VertexCost(M): for each vertex v in H, and vertex M(v) in T:
  • Do the vertex heads share the same stem and/or POS?
  • Is the T vertex head a hypernym of the H vertex head?
  • Are the vertex heads “similar” phrases? (previous slide)

• RelationCost(M): for each edge (v, v') in H, and edge (M(v), M(v')) in T:
  • Are parent/child pairs in H parent/child in T?
  • Are parent/child pairs in H ancestor/descendant in T?
  • Do parent/child pairs in H share a common ancestor in T?

Page 19

Graph Matching: Example

VertexCost: (0.0 + 0.2 + 0.4)/3 = 0.2
RelationCost: 0 (graphs isomorphic)

β = 0.45 (say)
MatchCost: 0.55 · 0.2 + 0.45 · 0.0 = 0.11

[Figure: H vertices matched to T vertices — John → John (exact match, cost 0.0), purchased → bought (synonym match, cost 0.2), car → BMW (hypernym match, cost 0.4)]
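A small worked sketch reproducing the numbers above; the per-vertex costs (exact 0.0, synonym 0.2, hypernym 0.4) and β = 0.45 are taken from the slide.

def match_cost(vertex_costs, relation_cost, beta=0.45):
    """MatchCost = (1 - beta) * VertexCost + beta * RelationCost."""
    vertex_cost = sum(vertex_costs.values()) / len(vertex_costs)
    return (1 - beta) * vertex_cost + beta * relation_cost

costs = {"John": 0.0, "purchased": 0.2, "car": 0.4}  # H-vertex match costs
print(match_cost(costs, relation_cost=0.0))  # ≈ 0.11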

Page 20

Outline of this talk

• Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g., semantic roles)

• Inference by graph matching
• Inference by abductive theorem proving

• A combined system
• Results and error analysis

Page 21

Abductive inference

• Idea:
  • Represent text and hypothesis as logical formulae.
  • A hypothesis can be inferred from the text if and only if the hypothesis logical formula can be proved from the text logical formula.

• Toy example:
T: John bought a BMW.  H: John purchased a car. TRUE

T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
H: John(x) ∧ car(y) ∧ purchased(z, x, y)

Prove H from T? Allow assumptions at various “costs”:
  BMW(t) + $2 ⇒ car(t)
  bought(p, q, r) + $1 ⇒ purchased(p, q, r)

Page 22

Abductive assumptions

• Assign costs to all assumptions of the form:

P(p1, p2, …, pm) ⇒ Q(q1, q2, …, qn)

• Build an assumption cost model

Predicate match cost
  • Phrase similarity cost?
  • Same named entity type?
  • …

Argument match cost
  • Same semantic role?
  • Constant similarity cost?
  • …

Page 23

Abductive theorem proving

• Each assumption provides a potential proof step.

• Find the proof with the minimum total cost
  • Uniform cost search
  • If there is a low-cost proof, the hypothesis is entailed.

• Example:
  T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  H: John(x) ∧ car(y) ∧ purchased(z, x, y)

Here is a possible proof by resolution refutation (for the earlier costs):
  $0  ¬John(x) ∨ ¬car(y) ∨ ¬purchased(z, x, y)   [Given: negation of hypothesis]
  $0  ¬car(y) ∨ ¬purchased(z, A, y)              [Unify with John(A)]
  $2  ¬purchased(z, A, B)                        [Unify with BMW(B), using the BMW ⇒ car assumption]
  $3  NULL                                       [Unify with bought(E, A, B), using the bought ⇒ purchased assumption]
Proof cost = 3
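A deliberately simplified, hypothetical sketch of the cost computation (not a full resolution prover): each hypothesis predicate is charged the cheapest assumption that derives it from some text predicate, using the costs from the toy example.

assumption_cost = {
    ("John", "John"): 0,         # identical predicates cost nothing
    ("BMW", "car"): 2,           # BMW(t) + $2 => car(t)
    ("bought", "purchased"): 1,  # bought(p,q,r) + $1 => purchased(p,q,r)
}

def proof_cost(text_preds, hyp_preds):
    """Sum, over hypothesis predicates, of the cheapest matching assumption."""
    total = 0
    for h in hyp_preds:
        total += min(assumption_cost.get((t, h), float("inf")) for t in text_preds)
    return total

print(proof_cost(["John", "BMW", "bought"], ["John", "car", "purchased"]))  # 3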

Page 24

Abductive theorem proving

• Can automatically learn good assumption costs
  • Start from a labeled dataset (e.g., the PASCAL development set)
  • Intuition: Find assumptions that are used in the proofs for TRUE examples, and lower their costs (by framing a log-linear model). Iterate. [Details: Raina et al., in submission]

Page 25

Some interesting features

• Examples of handling “complex” constructions in graph matching/abductive inference.

• Antonyms/Negation: High cost for matching verbs if they are antonyms, or if one is negated and the other is not.

• T: Stocks fell. H: Stocks rose. FALSE

• T: Clinton’s book was not a hit. H: Clinton’s book was a hit. FALSE

• Non-factive verbs:

• T: John was charged for doing X. H: John did X. FALSE
  Can detect this because “doing” in the text has the non-factive “charged” as a parent, but “did” does not have such a parent.
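A hypothetical sketch of that non-factive check; the list of non-factive verbs here is an illustrative assumption, not the system's actual resource.

NON_FACTIVE = {"charge", "accuse", "allege", "claim", "deny"}  # illustrative only

def blocked_by_nonfactive(text_verb_parents, hyp_verb_parents) -> bool:
    """True if the text verb sits under a non-factive governor but the
    matched hypothesis verb does not, so the match should be penalized."""
    return bool(set(text_verb_parents) & NON_FACTIVE) and \
           not (set(hyp_verb_parents) & NON_FACTIVE)

print(blocked_by_nonfactive({"charge"}, set()))  # True -> penalize the match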

Page 26

Some interesting features

• “Superlative check”
  • T: This is the tallest tower in western Japan.
    H: This is the tallest tower in Japan. FALSE

Page 27

Outline of this talk

• Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g., semantic roles)

• Inference by graph matching
• Inference by abductive theorem proving

• A combined system
• Results and error analysis

Page 28

Results

• Combine inference methods
  • Each system produces a score.
  • Separately normalize each system's score variance. Suppose the normalized scores are s1 and s2.

• Final score S = w1·s1 + w2·s2

• Learn classifier weights w1 and w2 on the development set using logistic regression (see the sketch below). Two submissions:
  • Train one set of classifier weights for all RTE tasks. (General)
  • Train different classifier weights for each RTE task. (ByTask)
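A hedged sketch of the combination step, using scikit-learn as a stand-in for whatever the authors actually used: normalize each system's score variance on the development set, then fit logistic regression to obtain the weights. Array shapes and names are placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression

def combine_systems(dev_scores, dev_labels, test_scores):
    """dev_scores, test_scores: arrays of shape (n_examples, 2), one column per system."""
    stds = dev_scores.std(axis=0)                       # separate variance normalization
    clf = LogisticRegression().fit(dev_scores / stds, dev_labels)
    return clf.decision_function(test_scores / stds)    # S = w1*s1 + w2*s2 (+ intercept)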

Page 29

Results

                      General              ByTask
                      Accuracy   CWS       Accuracy   CWS
DevSet 1              64.8%      0.778     65.5%      0.805
DevSet 2              52.1%      0.578     55.7%      0.661
DevSet 1 + DevSet 2   58.5%      0.679     60.8%      0.743
Test Set              56.3%      0.620     55.3%      0.686

• Balanced predictions. [55.4%, 51.2% predicted TRUE on the test set.]

• Best other results: Accuracy = 58.6%, CWS = 0.617

Page 30

Results by task

       General              ByTask
       Accuracy   CWS       Accuracy   CWS
CD     79.3%      0.903     84.7%      0.926
IE     47.5%      0.493     55.0%      0.590
IR     56.7%      0.590     55.6%      0.604
MT     46.7%      0.480     47.5%      0.479
PP     58.0%      0.623     54.0%      0.535
QA     48.5%      0.478     43.9%      0.466
RC     52.9%      0.523     50.7%      0.480

Page 31

Partial coverage results

[Figure: accuracy vs. coverage curves for the General and ByTask systems]

• Can also draw coverage-CWS curves. For example:
  • at 50% coverage, CWS = 0.781
  • at 25% coverage, CWS = 0.873

Task-specific optimization seems better!

Page 32

Some interesting issues

• Phrase similarity
  • away from the coast → farther inland
  • won victory in presidential election → became President
  • stocks get a lift → stocks rise
  • life threatening → fatal

• Dictionary definitions
  • believe there is only one God → are monotheistic

• “World knowledge”
  • K Club, venue of the Ryder Cup, … → K Club will host the Ryder Cup

Page 33

Future directions

• Need more NLP components in there:
  • Better treatment of frequent nominalizations, parenthesized material, etc.

• Need much more ability to do inference
  • Fine distinctions between meanings, and fine similarities, e.g., “reach a higher level” and “rise”. We need a high-recall, reasonable-precision similarity measure!
  • Other resources (e.g., antonyms) are also very sparse.

• More task-specific optimization.

Page 34

Thanks!