TDT 2004 Evaluation Workshop, NIST, December 2-3, 2004
Creating the TDT5 Corpus and 2004 Evaluation Topics at LDC
Stephanie Strassel, Meghan Glenn, Junbo Kong
Linguistic Data Consortium
{strassel, mlglenn, junbok}@ldc.upenn.edu
www.ldc.upenn.edu/Projects/TDT5
What’s new in TDT5?
‒ Same fundamental concepts
  • Story, event, topic
‒ New multilingual corpus
  • Much larger than previous corpora
  • Newswire only
‒ New topic selection strategy, more topics
  • 250 topics; ~25% multilingual
‒ New topic labeling strategy
  • Search-guided, but time-limited
‒ New annotation toolkit, infrastructure
  • Multilingual, multiplatform, database
  • Highly customized for TDT task
Basic Concepts
STORY
‒ In TDT2, story is “a section containing at least two independent declarative clauses on same topic”
‒ In TDT3, definition modified to capture annotators’ intuitions about what constitutes a story
  • Distinguish “preview/teaser” and complete news story
‒ TDT4 preserves this content-based story definition
‒ In TDT5, no manual story segmentation
  • Newswire comes with story boundaries; all documents are stories
EVENT
‒ A specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences
TOPIC
‒ An event or activity, along with all directly related events and activities
Candidate topics reviewed for suitability as final topics
‒ Exclude same-language exact duplicates, but
‒ No avoidance of hierarchical or overlapping topics
  • But no extra effort to include them
‒ Select range of topic types, sizes
  • No avoidance of “singletons”
‒ Also consider annotator preferences
Later feeds into topic definition
2004 Evaluation Topics
250 final topics selected from candidates
‒ Equal balance across languages
[Figure: Topics by Language. Bar chart with an axis from 0 to 70 and bars for English, Chinese, Arabic, and Multi; legend entries: English-Arabic, English-Chinese, English-Arabic-Chinese.]
Topic Research
Completed for each evaluation topic
‒ Annotator spends up to 1 hour/topic web searching for information
‒ Fills in missing details
‒ Provides context, scope
‒ Annotators specialize in particular topics (of their choosing)
‒ Create topic profile that includes brief narrative plus information like
  ‒ timelines
  ‒ maps
  ‒ keywords
  ‒ named entities
  ‒ links to other online resources
‒ Feeds directly into later annotation queries
Topic Explication
After topic research, annotator provides topic explication
‒ Apply rule of interpretation to convert event to topic
‒ 13 rules state, for each type of seminal event, what other types of events are related
4. Natural Disasters, e.g., 30002: Hurricane Mitch
Seminal events include: weather events (El Nino, tornadoes, hurricanes, floods, droughts), other natural events like volcanic eruptions, wildfires, famines and the like, rescue efforts, coverage of economic or human impact of the disaster.
Topic includes: the causal (weather/natural) activity including predictions thereof, the disaster itself, victims and other losses, evacuations and rescue/relief efforts.
Topic Definition
After topic research and topic explication are complete, annotator creates final topic definition
‒ Fixed format to enhance consistency
  ‒ Seminal event
    ‒ who/what/when/where
  ‒ Topic explication
  ‒ Rule of interpretation link
  ‒ Topic research link
  ‒ Seed story link
‒ Feeds directly into topic annotation
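One way to picture the fixed format listed above is as a small record type with a completeness check. This is only an illustrative sketch, not LDC's actual toolkit or database schema: the class names, field names, and the `empty_fields` helper are all hypothetical, and the placeholder values stand in for real topic content.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class SeminalEvent:
    # The who/what/when/where of the seminal event (field names are assumed).
    who: str
    what: str
    when: str
    where: str


@dataclass(frozen=True)
class TopicDefinition:
    # One field per item in the slide's fixed format (names are assumed).
    seminal_event: SeminalEvent
    explication: str
    rule_of_interpretation_link: str
    topic_research_link: str
    seed_story_link: str


def empty_fields(defn: TopicDefinition) -> List[str]:
    """Return the names of fields that are still blank, in declaration order."""
    blank = []
    for f in fields(defn):
        value = getattr(defn, f.name)
        if isinstance(value, SeminalEvent):
            for g in fields(value):
                if not getattr(value, g.name).strip():
                    blank.append(f"seminal_event.{g.name}")
        elif not value.strip():
            blank.append(f.name)
    return blank


# Placeholder content only; a real definition would carry annotator text.
draft = TopicDefinition(
    seminal_event=SeminalEvent(who="<who>", what="<what>", when="<when>", where=""),
    explication="<topic explication>",
    rule_of_interpretation_link="<link>",
    topic_research_link="",
    seed_story_link="<link>",
)
print(empty_fields(draft))
```

A fixed record like this makes it easy to spot an unfinished definition before it feeds into topic annotation, which is the consistency benefit the slide points to.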
Annotation Strategy Overview
Search-guided annotation
‒ One topic at a time
‒ Multiple stages for each topic
‒ Two-way topic labeling decision
‒ Time-limited: no more than 3 hours per topic
  • Annotation may be incomplete for a given topic
Relevance Labels
‒ YES: story discusses the topic in a substantial way
‒ NO: story does not discuss the topic at all, or only mentions the topic in passing without giving any information about the topic
‒ No BRIEF in TDT4 or TDT5
‒ “Difficult Decision” label for tricky decisions
Completeness Judgment
‒ Each topic also marked “complete” or “incomplete” at conclusion
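The two-way labeling decision and the per-topic completeness judgment can be sketched as a few small types. This is a hypothetical illustration rather than the actual TDT5 annotation format: the `Coverage` enum, the class and field names, and the choice to model "Difficult Decision" as a flag alongside YES/NO are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Label(Enum):
    YES = "YES"
    NO = "NO"


class Coverage(Enum):
    # Hypothetical description of how much a story says about the topic.
    SUBSTANTIAL = "substantial"
    PASSING_MENTION = "passing mention"
    NONE = "none"


def two_way_label(coverage: Coverage) -> Label:
    """With no BRIEF label, a passing mention collapses to NO."""
    return Label.YES if coverage is Coverage.SUBSTANTIAL else Label.NO


@dataclass
class Judgment:
    story_id: str
    label: Label
    difficult_decision: bool = False  # assumed to be a flag, not a third label


@dataclass
class TopicAnnotation:
    topic_id: str
    complete: bool  # the "complete"/"incomplete" mark made at conclusion
    judgments: List[Judgment] = field(default_factory=list)

    def on_topic_story_ids(self) -> List[str]:
        return [j.story_id for j in self.judgments if j.label is Label.YES]
```

Keeping `complete` on the topic rather than on each story mirrors the slide: annotation is time-limited, so the set of YES stories for a topic may be partial.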
Submit seed story as query to search engine
‒ Read through resulting relevance-ranked list of 200 documents
‒ Label each story as YES/NO
‒ Stop after finding 5-10 on-topic stories, or
‒ After reaching “off-topic threshold”
  • At least 2 off-topic stories for every 1 OT read, AND
  • The last 10 consecutive stories are off-topic
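The stopping rule above can be written down as a short check. This is a minimal sketch under stated assumptions, not the annotation tool's actual logic: the function names are hypothetical, each story is reduced to a boolean (True for YES, False for NO), and the "5-10 on-topic stories" target is exposed as a parameter that defaults to 10.

```python
from typing import Sequence


def reached_off_topic_threshold(labels: Sequence[bool]) -> bool:
    """labels holds the stories read so far, in ranked order (True = on-topic)."""
    on_topic = sum(labels)
    off_topic = len(labels) - on_topic
    # Condition 1: at least 2 off-topic stories for every 1 on-topic story read.
    ratio_met = off_topic >= 2 * on_topic
    # Condition 2: the last 10 consecutive stories are all off-topic.
    tail = labels[-10:]
    tail_met = len(tail) == 10 and not any(tail)
    return ratio_met and tail_met


def stories_read(ranked_labels: Sequence[bool],
                 on_topic_target: int = 10,
                 list_size: int = 200) -> int:
    """Count how many stories are read before either stopping condition fires."""
    read = []
    for is_on_topic in ranked_labels[:list_size]:
        read.append(is_on_topic)
        if sum(read) >= on_topic_target or reached_off_topic_threshold(read):
            break
    return len(read)


# Six on-topic stories followed by a long off-topic run: the ratio condition
# is only satisfied once 12 off-topic stories have been read.
print(stories_read([True] * 6 + [False] * 20))  # 18
```

Both threshold conditions must hold at once, so a run of 10 off-topic stories alone does not end the pass if many on-topic stories were found earlier in the list.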
Stage 2: Topic profile-based queries (45 minutes)
‒ Issue new query drawn from text within topic research & topic definition
‒ Read and annotate stories in resulting relevance-ranked list until reaching off-topic threshold
Stage 3: Improved query using stories from Stage 1-2 (45 minutes)
‒ Issue new query using concatenation of all or some known OT stories
‒ Read and annotate stories in resulting relevance-ranked list until reaching