FATE: a FrameNet Annotated corpus for Textual Entailment
Marco Pennacchiotti, Aljoscha Burchardt
Computerlinguistik, Saarland University, Germany
LREC 2008, Marrakech, 28 May 2008
SALSA II - The Saarbrücken Lexical Semantics Acquisition Project
Summary
• FrameNet and Textual Entailment
• FATE annotation schema
• Annotation examples and statistics
• Conclusions
28/05/2008 2 / 17 FATE - Marco Pennacchiotti
Frame Semantics [Fillmore 1976, 2003]
• Frame: conceptual structure modeling a prototypical situation
• Frame Elements (FE): participants of the situation
• Frame Evoking Elements (FEE): predicates evoking the situation
Predicate-argument level normalizations
• Berkeley FrameNet Project 1
  – Database of frames for the core lexicon of English
  – 800 frames, 10,000 lemmas, 135,000 annotated sentences
(1) http://framenet.icsi.berkeley.edu
Both surface forms normalize to the same predicate-argument structure:
“Evelyn spoke about her past” / “Evelyn’s statement about her past”
STATEMENT(SPEAKER: Evelyn; TOPIC: her past)
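A frame instance like the one above can be represented as a tiny data structure. This is a hypothetical sketch for illustration only, not the SALSA/TIGER XML format actually used by FATE:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A frame instance: the evoked frame plus its role fillers."""
    name: str
    elements: dict  # Frame Element name -> text span filling it

# Both surface forms ("spoke" / "statement") evoke the same frame.
spoke = Frame("STATEMENT", {"SPEAKER": "Evelyn", "TOPIC": "her past"})
statement = Frame("STATEMENT", {"SPEAKER": "Evelyn", "TOPIC": "her past"})
assert spoke == statement  # identical at the predicate-argument level
```

The point of the normalization is exactly this equality: a verb and its nominalization become indistinguishable once mapped to frames.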
Textual Entailment (TE)
Given two text fragments, the Text T and the Hypothesis H, T entails H if the meaning of H can be inferred from the meaning of T, as it would typically be interpreted by people [Dagan 2005]
T: “Yahoo has recently acquired Overture”
H: “Yahoo owns Overture”
(T entails H)
• Recognizing Textual Entailment (RTE)
  – recognize whether entailment holds for a given (T, H) pair
  – models core inferences of many NLP applications (QA, IE, MT, …)
• RTE Challenges [Dagan et al., 2005; Giampiccolo et al., 2007]
  – compare systems for RTE
  – corpus: 800 training pairs, 800 test pairs, evenly split into positive and negative pairs
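The shallow baseline that frame-based systems are later compared against can be sketched as a word-overlap score over a (T, H) pair. This is a generic illustration of the idea, not any specific challenge system:

```python
def lexical_overlap(text: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the text,
    a common shallow baseline for RTE."""
    t_tokens = set(text.lower().split())
    h_tokens = hypothesis.lower().split()
    matched = sum(1 for tok in h_tokens if tok in t_tokens)
    return matched / len(h_tokens)

t = "Yahoo has recently acquired Overture"
h = "Yahoo owns Overture"
score = lexical_overlap(t, h)  # 2 of 3 hypothesis tokens overlap
```

Note that the baseline scores this positive pair below 1.0 precisely because "owns" and "acquired" do not match at the word level; that lexical gap is what predicate-level resources are meant to bridge.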
Predicate-argument structure and RTE
• Predicate-level inference plays a relevant role in TE (20% of positive examples in RTE-2 [Garoufi, 2007])

T: An avalanche has struck a popular skiing resort in Austria, killing at least 11 people.
H: Humans died in an avalanche.
DEATH(PROTAGONIST: 11 people / humans; CAUSE: avalanche / avalanche)

• Implementation gap:
  – [Burchardt et al., 2007]: a FrameNet-based system performs only comparably to lexical overlap
  – [Hickl et al., 2006]: PropBank-based features are not effective
  – [Raina et al., 2005]: the DIRT paraphrase repository does not help
FATE corpus
FATE: a manually frame-annotated Textual Entailment corpus, built to study the role of frame semantics in RTE.
• Reference corpus: RTE-2 test set, 800 pairs, 29,000 tokens
• Frame resource: FrameNet version 1.3
• Corpus format: SALSA/TIGER XML [Burchardt et al., 2006]
• Pre-processing: annotation on top of the Collins parser syntactic analysis; T and H are randomly reordered to avoid biases
• Annotation: performed by one highly experienced annotator, using the SALTO tool 1; inter-annotator agreement measured on 5% of the corpus:
  – FEE agreement: 82%
  – Frame agreement: 88%
  – Role agreement: 91%
(1) http://www.coli.uni-saarland.de/projects/salsa/salto/doc
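Assuming the figures above are plain percent agreement, they can be computed as the share of identical decisions between the two annotators. A minimal sketch with toy labels (not the actual FATE annotations):

```python
def percent_agreement(ann_a: list, ann_b: list) -> float:
    """Share of items on which two annotators made the same decision."""
    assert len(ann_a) == len(ann_b)
    matches = sum(1 for a, b in zip(ann_a, ann_b) if a == b)
    return matches / len(ann_a)

# Toy example: frame labels assigned by two annotators to five FEEs.
a = ["DEATH", "KIDNAPPING", "STATEMENT", "DETAIN", "PEOPLE"]
b = ["DEATH", "KIDNAPPING", "STATEMENT", "LEADERSHIP", "PEOPLE"]
percent_agreement(a, b)  # 0.8
```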
FATE annotation process: an example
(Three slides step through the same pair: starting from the Collins syntactic analysis of T and H, the annotator first marks each FEE and its frame, then the FEs and their fillers. Annotation is full-text: all words are considered [Ruppenhofer, 2007].)

Maximization principle: choose the largest constituent possible when annotating.
Annotation Schema
Relevance Principle
• Intuition: annotate as FEE only those words evoking a relevant situation (frame) in the sentence at hand
  – Very intuitive flavor, but high agreement: 83% on a pilot set of 15 sentences

“Authorities in Brazil hold 200 people as hostage”
Candidate frames include LEADERSHIP, DETAIN, PEOPLE and KIDNAPPING; only the relevant KIDNAPPING frame is annotated, with the roles PERPETRATOR, VICTIM and PLACE.
Annotation Schema
Span Annotation
• On T of positive pairs, annotate only the fragments (spans) contributing to the inferential process
  – Spans are obtained from the ARTE annotation [Garoufi, 2007]
  – For negative pairs it is not straightforward to derive spans, hence we do full annotation

T: “Soon after the EZLN had returned to Chiapas, Congress approved a different version of the COCOPA Law, which did not include the autonomy clauses, claiming they were in contradiction with some constitutional rights (private property and secret voting); this was seen as a betrayal by the EZLN and other political groups.”
H: “EZLN is a political group.”
Annotation Schema
Other guidelines
• Unknown frames: use an UNKNOWN frame for words evoking situations not present in the FrameNet database
• Anaphora
• Copula and support verbs
• Modal expressions
• Metaphors
• Existential constructions
• …
Corpus statistics
• Annotated pairs: 800 (400 positive, 400 negative)
• Annotated frames: 4,500
  – avg. 5.6 frames per pair
  – 1,600 frames in positive pairs, 2,800 in negative pairs
• Annotated roles: 9,500
  – avg. 2.1 roles per frame
• Annotation time: 230 hours
  – 90 h for positive pairs (13 min/pair)
  – 140 h for negative pairs (21 min/pair)
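The per-pair times follow directly from the totals (the positive figure comes out as 13.5 minutes, which the slide rounds to 13):

```python
# Deriving the per-pair annotation times from the totals above.
pos_hours, neg_hours, pairs_each = 90, 140, 400

pos_min_per_pair = pos_hours * 60 / pairs_each  # 13.5 min/pair (slide: 13)
neg_min_per_pair = neg_hours * 60 / pairs_each  # 21.0 min/pair
total_hours = pos_hours + neg_hours             # 230 hours
```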
FrameNet and RTE (simple case)
• Syntactic normalization
  – Active / Passive
EDUCATIONAL_TEACHING(STUDENT: ground soldiers / soldiers; MATERIAL: virtual reality / virtual reality)
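The normalization above suggests a simple entailment check: H holds if each of its frames is matched in T by a frame with the same name and compatible role fillers. The following is a hypothetical sketch of that idea (with crude token overlap as the filler test), not the system evaluated in the cited work:

```python
def frame_match(t_frames, h_frames):
    """Predict entailment if every H frame appears in T with the same
    frame name and overlapping role fillers."""
    def fillers_compatible(t_filler, h_filler):
        # Crude filler test: any shared token counts as compatible.
        return bool(set(t_filler.lower().split()) & set(h_filler.lower().split()))

    for h_name, h_roles in h_frames:
        matched = any(
            t_name == h_name
            and all(role in t_roles and fillers_compatible(t_roles[role], filler)
                    for role, filler in h_roles.items())
            for t_name, t_roles in t_frames
        )
        if not matched:
            return False
    return True

t = [("EDUCATIONAL_TEACHING",
      {"STUDENT": "ground soldiers", "MATERIAL": "virtual reality"})]
h = [("EDUCATIONAL_TEACHING",
      {"STUDENT": "soldiers", "MATERIAL": "virtual reality"})]
frame_match(t, h)  # True: H's frame and fillers are covered by T
```

On this active/passive pair the frame representation abstracts away the voice alternation, so the match succeeds where pure surface comparison would struggle.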
Implementation gap insights
(1) Resource coverage is too low
(2) Models for predicate-argument inference are weak
(3) Automatic annotation models (SRL) are not good enough to be safely used in RTE
• FrameNet coverage is good:
  – 373 UNKNOWN frames (8% of total frames)
  – unknown roles: 1% of total roles
• Coverage is unlikely to be a limiting factor for using FrameNet in applications
• This speaks against explanation (1); the remaining candidates are weak predicate-argument inference models (2) and insufficient SRL quality (3)
Why should you use FATE?
• To better study predicate-argument inference in RTE
• To experiment with frame-based RTE models on a gold-standard corpus
• To learn better SRL models, by training on FATE

The corpus is freely available on-line.

Thank you! Questions?
FATE download: http://www.coli.uni-saarland.de/projects/salsa/fate
pennacchiotti@coli.uni-sb.de
www.coli.uni-saarland.de/~pennacchiotti