CLEF 2009 - Kerkyra. Robust – Word Sense Disambiguation exercise. UBC: Eneko Agirre, Arantxa Otegi. UNIPD: Giorgio Di Nunzio. UH: Thomas Mandl.
CLEF 2009 - Kerkyra
Robust – Word Sense Disambiguation exercise
UBC: Eneko Agirre, Arantxa Otegi; UNIPD: Giorgio Di Nunzio; UH: Thomas Mandl
Introduction
- Robust: emphasize difficult topics using a non-linear combination of topic results (GMAP)
- WSD: also automatic word sense annotation:
  - English documents and topics (English WordNet)
  - Spanish topics (Spanish WordNet, closely linked to the English WordNet)
- Participants explore how the word senses (plus the semantic information in wordnets) can be used in IR and CLIR
- This is the second edition of Robust-WSD
Documents
- News collection: LA Times 94, Glasgow Herald 95
- Sense information added to all content words:
  - Lemma
  - Part of speech
  - Weight of each sense in WordNet 1.6
- XML with DTD provided
- Two leading WSD systems:
  - National University of Singapore
  - University of the Basque Country
- Significant effort (100M-word corpus)
- Special thanks to Hwee Tou Ng and colleagues from NUS and Oier Lopez de Lacalle from UBC
Documents: example XML
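The example itself did not survive extraction, so here is a hypothetical sketch of what an annotated term could look like, based on the format described on the previous slide (lemma, part of speech, weighted senses). Element names, attributes, and sense codes below are invented for illustration; the track distributed its own DTD.

```xml
<!-- Illustrative only: tag names, attributes, and sense codes are
     invented; the actual format is defined by the track's DTD. -->
<TERM ID="t42" LEMA="bank" POS="NN">
  <WF>banks</WF>
  <SYNSET CODE="02227056-n" SCORE="0.78"/>
  <SYNSET CODE="06227059-n" SCORE="0.22"/>
</TERM>
```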
Topics
We used existing CLEF topics in English and Spanish:
  - 2001; topics 41-90; LA 94
  - 2002; topics 91-140; LA 94
  - 2003; topics 141-200; LA 94, GH 95
  - 2004; topics 201-250; GH 95
  - 2005; topics 251-300; LA 94, GH 95
  - 2006; topics 301-350; LA 94, GH 95
First three years as training (plus relevance judgements)
Last three years for testing
Topics: WSD
- English topics were disambiguated by both the NUS and UBC systems
- Spanish topics: no large-scale WSD system available, so we used the first-sense heuristic
- Word sense codes are shared between the Spanish and English wordnets
- Sense information added to all content words:
  - Lemma
  - Part of speech
  - Weight of each sense in WordNet 1.6
- XML with DTD provided
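The first-sense heuristic used for the Spanish topics can be sketched as: for each content word, pick the sense with the highest weight. The mini sense inventory and the sense codes below are invented for illustration, not the real WordNet 1.6 data.

```python
# Illustrative sketch of the first-sense heuristic: assign each content
# word its highest-weighted sense. The inventory below is hypothetical.

# Maps (lemma, pos) to a list of (sense_code, weight) pairs, mirroring
# the weighted-sense annotation attached to topic words.
SENSE_INVENTORY = {
    ("bank", "n"): [("02227056-n", 0.78), ("06227059-n", 0.22)],
    ("run", "v"): [("01525295-v", 0.65), ("01429455-v", 0.35)],
}

def first_sense(lemma, pos):
    """Return the sense code with the highest weight, or None."""
    senses = SENSE_INVENTORY.get((lemma, pos))
    if not senses:
        return None
    return max(senses, key=lambda s: s[1])[0]
```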
Topics: WSD example
Evaluation
- Reused relevance assessments from previous years
- Relevance assessments for the training topics were provided alongside the training topics
- Measures: MAP and GMAP
- Participants had to send at least one run which did not use WSD and one run which used WSD
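The two measures can be sketched as follows: MAP is the arithmetic mean of per-topic average precision, while GMAP is the geometric mean, which is dominated by the lowest-scoring topics and so emphasizes the difficult ones. The small epsilon floor is an assumption (a common convention, e.g. in trec_eval) to keep a single zero-AP topic from collapsing the score.

```python
import math

def mean_average_precision(ap_scores):
    """MAP: arithmetic mean of per-topic average precision."""
    return sum(ap_scores) / len(ap_scores)

def gmap(ap_scores, eps=1e-5):
    """GMAP: geometric mean of per-topic average precision.
    The geometric mean is pulled down by low-scoring topics, which is
    why GMAP rewards improvements on difficult topics. The eps floor
    keeps an AP of 0 from zeroing out the whole score."""
    return math.exp(sum(math.log(max(a, eps)) for a in ap_scores)
                    / len(ap_scores))
```

For equal per-topic scores MAP and GMAP coincide; the more uneven the scores, the further GMAP falls below MAP.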
Participation
- 10 official participants
- 58 monolingual runs
- 31 bilingual runs

Participant    Monolingual  Bilingual
Alicante       X
Darmstadt      X
Geneva         X            X
Ixa            X            X
Jaen           X
Know-center    X            X
Reina          X            X
Ufrgs          X            X
Uniba          X            X
Valencia       X
Monolingual results
- MAP: non-WSD best; 2 participants improve using WSD
- GMAP: non-WSD best; 3 participants improve using WSD
Track Participant MAP GMAP ΔMAP ΔGMAP
English
1 darmstadt 45.09 20.42 - -
2 reina 44.52 21.18 - -
3 uniba 42.50 17.93 - -
4 geneva 41.71 17.88 - -
5 know-center 41.70 18.64 - -
English WSD
1 darmstadt 45.00 20.49 -0.09 +0.07
2 uniba 43.46 19.60 +0.96 +1.67
3 know-center 42.22 19.47 +0.52 +0.87
4 reina 41.23 18.38 -1.27 -2.80
5 geneva 38.11 16.26 -3.59 -2.38
Monolingual: using WSD
- Darmstadt: combination of several indexes, including a monolingual translation model. No improvement using WSD.
- Reina: UNINE: synset indexes, combined with results from other indexes. Improvement in GMAP.
- UCM: query expansion using structured queries. Improvement in MAP and GMAP.
- IXA: use semantic relatedness to expand documents. No improvement using WSD.
- GENEVA: synset indexes, expanding to synonyms and hypernyms. No improvement, except for some topics.
- UFRGS: only use lemmas (plus multiwords). Improvement in MAP and GMAP.
Monolingual: using WSD
- UNIBA: combine synset indexes (best sense). Improvements in MAP.
- Univ. of Alicante: expand to all synonyms of best sense. Improvement on train / decrease on test.
- Univ. of Jaen: combine synset indexes (best sense). No improvement, except for some topics.
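The synset-index idea tried by several groups can be sketched as follows: documents are indexed by the sense code of each disambiguated word rather than (or alongside) the surface form, so synonyms that share a synset retrieve each other. The sense codes and the word-to-sense mapping below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical word -> sense-code mapping; in the track this comes from
# the WSD annotation, not a static table.
SENSE_OF = {"car": "02958343-n", "automobile": "02958343-n",
            "bank": "08420278-n"}

def build_synset_index(docs):
    """Map each sense code to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            code = SENSE_OF.get(w)
            if code:
                index[code].add(doc_id)
    return index

docs = {1: ["car", "bank"], 2: ["automobile"]}
index = build_synset_index(docs)
# A query disambiguated to the "car" synset now matches doc 2 as well,
# because "car" and "automobile" share a sense code.
```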
Bilingual results
- MAP and GMAP: best results for non-WSD runs
- 2 participants increase GMAP using WSD, 2 increase MAP; the improvements are rather small
Track Participant MAP GMAP ΔMAP ΔGMAP
Es-En
1 reina 38.42 15.11 - -
2 uniba 38.09 13.11 - -
3 know-center 28.98 06.79 - -
4 ufrgs 27.65 07.37 - -
5 Ixa 18.05 01.90 - -
Es-En WSD
1 uniba 37.53 13.82 -0.56 +0.71
2 geneva 36.63 16.02 - -
3 reina 30.32 09.38 -8.10 -5.73
4 know-center 29.64 07.05 +0.66 +0.26
5 ixa 18.38 01.98 +0.33 +0.08
Bilingual: using WSD
- IXA: wordnets as the sole source for translation. Improvement in MAP.
- UNIGE: translation of the topic for the baseline. No improvement.
- UFRGS: association rules from parallel corpora, plus use of lemmas (no WSD). No improvement.
- UNIBA: wordnets as the sole source for translation. Improvement in both MAP and GMAP.
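Using wordnets as the sole translation source rests on the shared sense codes mentioned earlier: a disambiguated Spanish topic word can be mapped to the English lemmas attached to the same sense in the English wordnet. A minimal sketch, with invented sense codes and dictionary entries:

```python
# Hypothetical data: Spanish and English wordnets share sense codes,
# so translation goes lemma -> sense code -> English lemmas.
ES_SENSE = {("coche", "n"): "02958343-n"}          # Spanish lemma -> sense
EN_LEMMAS = {"02958343-n": ["car", "automobile"]}  # sense -> English lemmas

def translate(es_lemma, pos):
    """Translate a disambiguated Spanish lemma via its shared sense code."""
    return EN_LEMMAS.get(ES_SENSE.get((es_lemma, pos)), [])
```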
Conclusions and future
- Successful participation: 10 participants
- Use of word senses allows small improvements on some top-scoring systems
- Further analysis ongoing:
  - Manual analysis of topics which get significant improvement with WSD
  - Significance tests (WSD vs. non-WSD)
- No need for another round: all necessary material is freely available at http://ixa2.si.ehu.es/clirwsd
  - Topics, documents (no word order, Lucene indexes), relevance assessments, WSD tags
Thank you!