FATE: a FrameNet Annotated corpus for Textual Entailment
Marco Pennacchiotti, Aljoscha Burchardt
Computerlinguistik, Saarland University, Germany
LREC 2008, Marrakech, 28 May 2008
SALSA II - The Saarbrücken Lexical Semantics Acquisition Project


Summary

• FrameNet and Textual Entailment

• FATE annotation schema

• Annotation examples and statistics

• Conclusions

28/05/2008 2 / 17 FATE - Marco Pennacchiotti


Frame Semantics
• Frame: conceptual structure modeling a prototypical situation
• Frame Elements (FE): participants of the situation
• Frame Evoking Elements (FEE): predicates evoking the situation

[Fillmore 1976, 2003]


Predicate-argument level normalizations

• FrameNet Berkeley Project (1)
  – Database of frames for the core lexicon of English
  – 800 frames, 10,000 lemmas, 135,000 annotated sentences

(1) http://framenet.icsi.berkeley.edu

“Evelyn spoke about her past”
“Evelyn’s statement about her past”

STATEMENT(SPEAKER: Evelyn; TOPIC: her past)
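The normalization idea can be illustrated with a small Python sketch. It is hypothetical throughout: the two-entry `LEXICON` stands in for FrameNet's lexical units, `normalize` and `FrameInstance` are names introduced here for illustration, and the role fillers are supplied by hand, as an annotator would supply them.

```python
from dataclasses import dataclass

# Hypothetical mini-lexicon: lexical unit (FEE) -> evoked frame.
# It mirrors the slide's example, not the actual FrameNet database.
LEXICON = {"speak.v": "STATEMENT", "statement.n": "STATEMENT"}


@dataclass(frozen=True)
class FrameInstance:
    frame: str
    roles: frozenset  # (frame element, filler) pairs; order-independent


def normalize(lexical_unit: str, roles: dict[str, str]) -> FrameInstance:
    """Map a predicate plus its hand-annotated role fillers to a frame instance.

    The predicate itself is dropped, so a verb and a noun that evoke the
    same frame with the same fillers yield equal instances.
    """
    return FrameInstance(LEXICON[lexical_unit], frozenset(roles.items()))


# "Evelyn spoke about her past" vs. "Evelyn's statement about her past"
verbal = normalize("speak.v", {"SPEAKER": "Evelyn", "TOPIC": "her past"})
nominal = normalize("statement.n", {"SPEAKER": "Evelyn", "TOPIC": "her past"})
print(verbal == nominal)  # True
```

Because the predicate is not stored in `FrameInstance`, the verbal and nominal phrasings compare equal, which is the predicate-argument normalization the slide describes.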


Textual Entailment (TE)
Given two text fragments, the Text T and the Hypothesis H, T entails H if the meaning of H can be inferred from the meaning of T, as it would typically be interpreted by people

[Dagan 2005]

T: “Yahoo has recently acquired Overture”
H: “Yahoo owns Overture”
T entails H

• Recognizing Textual Entailment (RTE)
  – recognize whether entailment holds for a given (T, H) pair
  – models core inferences of many NLP applications (QA, IE, MT, …)

• RTE Challenges [Dagan et al., 2005; Giampiccolo et al., 2007]
  – compare systems for RTE
  – corpus: 800 training pairs and 800 test pairs, evenly split into positive and negative pairs
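To make the task concrete, here is a minimal sketch of a lexical-overlap baseline, the kind of system that frame-based approaches are compared against. The tokenization, the scoring (share of H words found in T) and the 0.6 threshold are illustrative assumptions, not the setup of any RTE participant; the second T is an invented counterexample.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; punctuation is ignored."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(t: str, h: str) -> float:
    """Share of distinct H tokens that also occur in T."""
    h_tokens = tokens(h)
    if not h_tokens:
        return 0.0
    return len(h_tokens & tokens(t)) / len(h_tokens)


def predict_entailment(t: str, h: str, threshold: float = 0.6) -> bool:
    # The threshold is an arbitrary illustrative value, not tuned on RTE data.
    return overlap_score(t, h) >= threshold


h = "Yahoo owns Overture"
print(predict_entailment("Yahoo has recently acquired Overture", h))  # True
print(predict_entailment("Yahoo has recently sold Overture", h))      # True, wrongly
```

Both pairs score 2/3, so the baseline cannot tell acquiring from selling; separating them requires information about the predicates themselves.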


Predicate-argument and RTE
• Predicate-level inference plays a significant role in TE (it is involved in 20% of the positive examples in RTE-2 [Garoufi, 2007])

T: An avalanche has struck a popular skiing resort in Austria, killing at least 11 people.

H: Humans died in an avalanche.

• Implementation gap: RTE systems do not (yet) benefit from predicate-argument information
  – [Burchardt et al., 2007]: a FrameNet-based system performs comparably to lexical overlap
  – [Hickl et al., 2006]: PropBank-based features are not effective
  – [Rana et al., 2005]: the DIRT paraphrase repository does not help


DEATH(PROTAGONIST: 11 people / humans; CAUSE: avalanche / avalanche)
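The DEATH annotation shows how frames can bridge the lexical gap between "killing at least 11 people" and "Humans died". A minimal sketch, assuming gold frame annotations and a hypothetical hand-written table of compatible fillers (a stand-in for real lexical knowledge): H counts as entailed if T contains the same frame with compatible fillers for every role of H. This illustrates the idea only; it is not the matching model of any cited system.

```python
# Hypothetical lexical knowledge: (H filler, T filler) pairs treated as compatible.
COMPATIBLE = {("humans", "11 people")}

Frame = tuple[str, dict[str, str]]  # (frame name, {frame element: filler})


def filler_match(h_filler: str, t_filler: str) -> bool:
    return h_filler.lower() == t_filler.lower() or (h_filler, t_filler) in COMPATIBLE


def frame_entails(t_frames: list[Frame], h_frame: Frame) -> bool:
    """True if some frame in T has H's frame name and compatible fillers for all of H's roles."""
    h_name, h_roles = h_frame
    return any(
        t_name == h_name
        and all(role in t_roles and filler_match(filler, t_roles[role])
                for role, filler in h_roles.items())
        for t_name, t_roles in t_frames
    )


t_frames = [("DEATH", {"PROTAGONIST": "11 people", "CAUSE": "avalanche"})]
h_frame = ("DEATH", {"PROTAGONIST": "humans", "CAUSE": "avalanche"})
print(frame_entails(t_frames, h_frame))  # True
```

Word overlap between the two PROTAGONIST fillers is zero, yet the match succeeds because the comparison happens at the level of frames and roles, given the (assumed) filler compatibility.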


FATE corpus

• Reference corpus: RTE-2 test set, 800 pairs, 29,000 tokens
• Frame resource: FrameNet version 1.3
• Corpus format: SALSA/TIGER XML [Burchardt et al., 2006]

• Pre-processing: annotation on top of the syntactic analysis produced by the Collins parser
  – T and H are randomly reordered to avoid biases

• Annotation: performed by one highly experienced annotator
  – inter-annotator agreement, measured on 5% of the corpus:
    – FEE agreement: 82%
    – Frame agreement: 88%
    – Role agreement: 91%
  – annotation carried out using the SALTO tool (1)

(1) http://www.coli.uni-saarland.de/projects/salsa/salto/doc


FATE: a manually frame-annotated Textual Entailment corpus, to study the role of frame semantics in RTE
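The slides do not state how agreement was computed. Assuming plain observed agreement (the share of items on which two annotators made the same decision), the computation looks like the sketch below; `percent_agreement` and the labels are hypothetical. Chance-corrected measures such as Cohen's kappa are a common alternative.

```python
def percent_agreement(a: list[str], b: list[str]) -> float:
    """Observed agreement: share of items on which both annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical frame labels assigned by two annotators to the same five FEEs.
annotator_1 = ["DEATH", "STATEMENT", "KIDNAPPING", "LEADERSHIP", "PEOPLE"]
annotator_2 = ["DEATH", "STATEMENT", "DETAIN", "LEADERSHIP", "PEOPLE"]
print(percent_agreement(annotator_1, annotator_2))  # 0.8
```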


FATE annotation process: an example

[Figure: T and H of an example pair with their Collins syntactic analyses]

Full-text annotation (all words considered) [Ruppenhofer, 2007]


FATE annotation process: an example

[Figure: frames and FEEs marked on the two Collins syntactic analyses]


FATE annotation process: an example

[Figure: frames, FEEs, FEs and FE fillers marked on the two Collins syntactic analyses]

Maximization principle: choose the largest possible constituent when annotating
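One way to operationalize the maximization principle, under assumptions of our own: represent the parse as labeled token spans and, for a role whose head token is known, pick the widest constituent that contains the head but not the FEE. The constituents below are written by hand for the T of the avalanche pair; they are not actual Collins parser output, and excluding the FEE is an assumed constraint, not a stated FATE rule.

```python
from typing import NamedTuple, Optional


class Constituent(NamedTuple):
    label: str
    start: int  # index of the first token
    end: int    # index one past the last token


def maximal_filler(constituents: list[Constituent], head: int, fee: int) -> Optional[Constituent]:
    """Widest constituent that contains the role's head token but not the FEE."""
    candidates = [
        c for c in constituents
        if c.start <= head < c.end and not (c.start <= fee < c.end)
    ]
    return max(candidates, key=lambda c: c.end - c.start, default=None)


tokens = "An avalanche has struck a popular skiing resort in Austria".split()
# Hand-written constituents (an assumed parse, not real Collins parser output).
parse = [
    Constituent("NP", 0, 2), Constituent("VP", 2, 10), Constituent("VP", 3, 10),
    Constituent("NP", 4, 8), Constituent("PP", 8, 10), Constituent("NP", 4, 10),
    Constituent("S", 0, 10),
]
best = maximal_filler(parse, head=tokens.index("resort"), fee=tokens.index("struck"))
print(" ".join(tokens[best.start:best.end]))  # a popular skiing resort in Austria
```

Both "a popular skiing resort" and "a popular skiing resort in Austria" contain the head; the principle selects the larger one.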


Annotation Schema

Relevance Principle

• Intuition: annotate as FEEs only those words that evoke a relevant situation (frame) in the sentence at hand
  – The criterion is intuitive rather than formal, yet agreement is high: 83% on a pilot set of 15 sentences

“Authorities in Brazil hold 200 people as hostage”

Frames: LEADERSHIP, DETAIN, PEOPLE, KIDNAPPING
Roles: VICTIM, PLACE, PERPETRATOR


Annotation Schema

Span Annotation

• In the T of positive pairs, annotate only the fragments (spans) that contribute to the inference
  – Spans are obtained from the ARTE annotation [Garoufi, 2007]
  – For negative pairs, spans cannot be derived straightforwardly, hence these pairs are fully annotated

T: “Soon after the EZLN had returned to Chiapas, Congress approved a different version of the COCOPA Law, which did not include the autonomy clauses, claiming they were in contradiction with some constitutional rights (private property and secret voting); this was seen as a betrayal by the EZLN and other political groups.”

H: “EZLN is a political group.”
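The span rule amounts to a filter over token indices. A minimal sketch, assuming spans are given as end-exclusive token offsets; `tokens_to_annotate` is a hypothetical helper, and the offsets pick out a plausible span for this H ("the EZLN and other political groups"), not the actual ARTE span, which the slide does not show.

```python
def tokens_to_annotate(n_tokens: int, is_positive: bool,
                       spans: list[tuple[int, int]]) -> list[int]:
    """Indices of T tokens that are in scope for frame annotation.

    spans: end-exclusive (start, end) token offsets (an assumed format).
    """
    if not is_positive:
        return list(range(n_tokens))  # negative pair: annotate the full text
    in_scope: set[int] = set()
    for start, end in spans:
        in_scope.update(range(start, end))
    return sorted(in_scope)


# Last clause of the EZLN example T; the span is hypothetical, not taken from ARTE.
t_tail = "this was seen as a betrayal by the EZLN and other political groups".split()
scope = tokens_to_annotate(len(t_tail), is_positive=True, spans=[(7, 13)])
print(" ".join(t_tail[i] for i in scope))  # the EZLN and other political groups
```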


Annotation Schema

• Unknown frames: use an UNKNOWN frame for words evoking situations not present in the FrameNet database

Other guidelines
• Anaphora
• Copula and support verbs
• Modal expressions
• Metaphors
• Existential constructions
• …


Corpus statistics

• Annotated pairs: 800 (400 positive, 400 negative)

• Annotated frames: 4,500
  – avg. 5.6 frames per pair
  – 1,600 frames in positive pairs
  – 2,800 frames in negative pairs

• Annotated roles: 9,500
  – avg. 2.1 roles per frame

• Annotation time: 230 hours
  – 90 h for positive pairs (13 min/pair)
  – 140 h for negative pairs (21 min/pair)
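The averages follow from the totals. A quick check in Python, using only the numbers on this slide (`derived_stats` is just a helper name for this illustration):

```python
def derived_stats(pairs=800, pos_pairs=400, neg_pairs=400,
                  frames=4500, pos_frames=1600, neg_frames=2800,
                  roles=9500, pos_hours=90, neg_hours=140) -> dict[str, float]:
    """Recompute per-pair and per-frame averages from the slide's totals."""
    return {
        "frames per pair": frames / pairs,
        "roles per frame": roles / frames,
        "frames per positive pair": pos_frames / pos_pairs,
        "frames per negative pair": neg_frames / neg_pairs,
        "minutes per positive pair": pos_hours * 60 / pos_pairs,
        "minutes per negative pair": neg_hours * 60 / neg_pairs,
    }


for name, value in derived_stats().items():
    print(f"{name}: {value:.1f}")
```

This reproduces 5.6 frames per pair, 2.1 roles per frame and 21 min per negative pair; the positive-pair figure comes out at 13.5 min rather than 13, presumably because the totals are rounded (likewise, 1,600 + 2,800 = 4,400 rather than 4,500). Negative pairs carry about 7 frames each versus about 4 for positive pairs, consistent with positive pairs being annotated only on the relevant spans.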


FrameNet and RTE (simple case)


• Syntactic normalization: active / passive

EDUCATIONAL_TEACHING(STUDENT: ground soldiers / soldiers; MATERIAL: virtual reality / virtual reality)


Implementation gap insights

Candidate explanations for the implementation gap:
(1) Resource coverage is too low
(2) Models for predicate-argument inference are weak
(3) Automatic annotation models (SRL) are not good enough to be safely used in RTE


• FrameNet coverage is good:
  – 373 unknown frames (8% of all annotated frames)
  – unknown roles: 1% of all annotated roles

• Coverage is unlikely to be a limiting factor for using FrameNet in applications


Why should you use FATE?

• To better study predicate-argument inference in RTE
• To experiment with frame-based RTE models on a gold-standard corpus
• To learn better SRL models by training on FATE

The corpus is freely available online.


Thank you! Questions?


FATE download: http://www.coli.uni-saarland.de/projects/salsa/fate


www.coli.uni-saarland.de/~pennacchiotti