Page 1: NEEL2015 challenge summary

Making Sense of Microposts (#Microposts2015) @ WWW2015

Named Entity rEcognition and Linking Challenge

http://www.scc.lancs.ac.uk/microposts2015/challenge/

Page 2: NEEL2015 challenge summary

NEEL challenge overview

➢ Challenging to make sense of Microposts
○ they are very short text messages
○ they contain abbreviations and typos
○ they are “grammar free”

➢ The NEEL challenge aims to foster research into novel, more accurate entity recognition and linking approaches tailored to Microposts

Page 3: NEEL2015 challenge summary

➢ 2013: Information Extraction (IE)
○ named entity recognition (4 types)

➢ 2014: Named Entity Extraction and Linking (NEEL)
○ named entity extraction and linking to DBpedia 3.9 entries

➢ 2015: Named Entity rEcognition and Linking (NEEL)
○ named entity recognition (7 types) and linking to DBpedia 2014 entries

Page 4: NEEL2015 challenge summary

Highlights of the submitted approaches over the 3-year challenge

➢ normalization
○ linguistic pre-processing and expansion of tweets (a minimal sketch follows below)

➢ entity recognition and linking
○ sequential and semi-joint tasks
○ large Knowledge Bases (such as DBpedia and Yago) as lexical dictionaries and sources of already existing relations among entities
○ supervised learning approaches to predict both the type of the entity, given the linguistic and contextual similarity, and the link, given the semantic similarity
○ unsupervised learning approaches for grouping similar lexical entities, affecting the entity resolution
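
As a rough illustration of the normalization step, here is a minimal sketch in Python; the abbreviation table and function name are hypothetical and not taken from any submitted system:

    import re

    # hypothetical abbreviation lexicon; real systems used far larger resources
    ABBREVIATIONS = {"u": "you", "b4": "before", "gr8": "great"}

    def normalize_tweet(text):
        """Lightly clean and expand a tweet before entity recognition."""
        text = re.sub(r"http\S+", "", text)  # drop URLs
        tokens = []
        for token in text.split():
            key = token.lower().strip("#@!?.,")
            tokens.append(ABBREVIATIONS.get(key, token))  # expand known abbreviations
        return " ".join(tokens)

    print(normalize_tweet("b4 the match u should read http://t.co/xyz"))
    # -> "before the match you should read"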

Page 5: NEEL2015 challenge summary

Sponsorship

➢ Successfully obtained sponsorship each year
○ highlights importance of this practical research
○ importance extends BEYOND academia

➢ Sponsor has early access to results as senior PC member
○ opportunity to liaise with participants to extend work

➢ Workshop and participants obtain greater exposure

Page 6: NEEL2015 challenge summary

➢ Italian company operating in the business of knowledge extraction and representation

➢ Successfully participated in the 2014 NEEL challenge, ranking 3rd overall

Page 7: NEEL2015 challenge summary

29 teams expressed intent to take part in the challenge

Page 8: NEEL2015 challenge summary

21 teams eventually got involved and signed the agreement to access the NEEL challenge corpus

Page 9: NEEL2015 challenge summary

NEEL corpus

             no. of tweets      %
Training          3498      58.06
Development        500       8.30
Test              2027      33.64

Page 10: NEEL2015 challenge summary

NEEL Corpus details

➢ 6025 tweets
○ events from 2011 and 2013, such as the London Riots and the Oslo bombing (cf. event-annotated tweets provided by the Redites project)
○ events in 2014, such as the UCI Cyclo-cross World Cup

➢ Corpus available after signing the NEEL Agreement Form (it remains available by contacting [email protected])

Page 11: NEEL2015 challenge summary

Manual creation of the Gold Standard

3-step annotation:
1. unsupervised annotation, intended to extract candidate links that were used as input to the second stage; NERD-ML was used as the off-the-shelf system
2. three human annotators analyzed and complemented the annotations; GATE was used as the workbench
3. one domain expert reviewed and resolved problematic cases

Page 12: NEEL2015 challenge summary

Evaluation protocol

Participants were asked to wrap their prototypes as publicly accessible web services following a REST-based protocol.

This widens dissemination and ensures the reproducibility, reuse, and correctness of the results.
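
For illustration, a minimal sketch of what such a web service could look like, assuming Flask; the endpoint name, payload format, and annotate() stub are hypothetical, since the actual request/response format was fixed by the challenge protocol:

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def annotate(text):
        # hypothetical stand-in for a participant's recognition + linking pipeline
        return [{"mention": "London", "type": "Location",
                 "link": "http://dbpedia.org/resource/London"}]

    # REST endpoint: accepts a tweet as JSON, returns annotations as JSON
    @app.route("/annotate", methods=["POST"])
    def annotate_endpoint():
        tweet = request.get_json()["text"]
        return jsonify(annotations=annotate(tweet))

    if __name__ == "__main__":
        app.run(port=8080)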

Page 13: NEEL2015 challenge summary

Evaluation periods

D-Time: testing of the contending entries (REST APIs) submitted by the participants

T-Time: final evaluation and metric computation

Page 14: NEEL2015 challenge summary

Submissions and Runs

➢ Paper submission
○ describing the approach taken
○ identifying and detailing any limitations or dependencies of the approach

➢ Up to 10 contending entries
○ best of 3 used for the final ranking

Page 15: NEEL2015 challenge summary

Evaluation scorer

TAC KBP official scorer https://github.com/wikilinks/neleval

Page 16: NEEL2015 challenge summary

Evaluation metrics

tagging     strong_typed_mention_match (checks entity mention boundary and type)

linking     strong_link_match

clustering  mention_ceaf (NIL clustering over the exact match of the entities)

latency     computation time
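
For intuition, a minimal sketch of how a strong (exact) match F1 can be computed; the official scores were produced by the neleval scorer above, and the tuple layout here is only illustrative:

    def strong_match_f1(gold, system):
        """F1 over exact matches: for strong_typed_mention_match each item
        would be a (tweet_id, start, end, type) tuple, for strong_link_match
        a (tweet_id, start, end, link) tuple."""
        gold, system = set(gold), set(system)
        tp = len(gold & system)                   # true positives
        p = tp / len(system) if system else 0.0   # precision
        r = tp / len(gold) if gold else 0.0       # recall
        return 2 * p * r / (p + r) if p + r else 0.0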

Page 17: NEEL2015 challenge summary

Ranking strategy

rs = 0.4 * clusteringF1 + 0.3 * taggingF1 + 0.3 * linkingF1

Latency is used to break ties.
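
A minimal sketch of the ranking rule in Python (the field names are assumed, mirroring the result tables below): entries are sorted by the weighted score, with latency breaking ties:

    def final_ranking(entries):
        """entries: dicts with taggingF1, clusteringF1, linkingF1, latency."""
        def rs(e):
            return 0.4 * e["clusteringF1"] + 0.3 * e["taggingF1"] + 0.3 * e["linkingF1"]
        # higher score first; lower latency wins when scores draw
        return sorted(entries, key=lambda e: (-rs(e), e["latency"]))

For example, the winning run (taggingF1 0.807, clusteringF1 0.84, linkingF1 0.762) gets rs = 0.4*0.84 + 0.3*0.807 + 0.3*0.762 = 0.8067, matching the final ranking table.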

Page 18: NEEL2015 challenge summary

7 teams participated in T-Time

Page 19: NEEL2015 challenge summary

Drop of 14 participants, due to i) the complexity of the challenge protocol, which required broad expertise across domains such as Information Extraction, Data Semantics, and the Web, and ii) generally low results.

Page 20: NEEL2015 challenge summary

And the winner is ...

Page 21: NEEL2015 challenge summary

Ikuya Yamada, Hideaki Takeda and Yoshiyasu Takefuji

An End-to-End Entity Linking Approach for Tweets

Team Ousia

Page 22: NEEL2015 challenge summary

NEEL Final Ranking

rank  runid       team name   rs
1     9           ousia       0.8067
2     7           acubelab    0.4757
3     guru        uva         0.4756
4     UNIBA-SUP   uniba       0.4329
5     ualberta    ualberta    0.3808
6     CEN_NEEL_1  cen_neel    0.0004
7     run2        tcs-iitkgp  NCA*

NCA = annotations not compliant with the NEEL specs

Page 23: NEEL2015 challenge summary

NEEL Final Ranking: breakdown per clusteringF1

rank  runid       team name   clusteringF1
1     9           ousia       0.84
2     guru        uva         0.643
3     7           acubelab    0.506
4     UNIBA-SUP   uniba       0.459
5     ualberta    ualberta    0.394
6     CEN_NEEL_1  cen_neel    0.001
7     run2        tcs-iitkgp  NCA

Page 24: NEEL2015 challenge summary

NEEL Final Ranking: breakdown per taggingF1

rank  runid       team name   taggingF1
1     9           ousia       0.807
2     guru        uva         0.412
3     7           acubelab    0.388
4     UNIBA-SUP   uniba       0.367
5     ualberta    ualberta    0.329
6     CEN_NEEL_1  cen_neel    0
7     run2        tcs-iitkgp  NCA

Page 25: NEEL2015 challenge summary

NEEL Final Ranking: breakdown per linkingF1

rank  runid       team name   linkingF1
1     9           ousia       0.762
2     7           acubelab    0.523
4     UNIBA-SUP   uniba       0.464
5     ualberta    ualberta    0.415
3     guru        uva         0.316
6     CEN_NEEL_1  cen_neel    0
7     run2        tcs-iitkgp  NCA

(rows sorted by linkingF1; the rank column keeps the overall final ranking)

Page 26: NEEL2015 challenge summary

NEEL Final Ranking: breakdown per submission

Page 27: NEEL2015 challenge summary

rank  team name   runID        taggingF1  clusteringF1  linkingF1  latency [ms]           score

1     ousia       9            0.807      0.84          0.762      8500.99 +/- 3619.12    0.8067
2     ousia       5            0.68       0.843         0.762      8477.88 +/- 3596.47    0.7698
3     ousia       10           0.679      0.842         0.762      8493.38 +/- 3562.96    0.7691
4     acubelab    7            0.388      0.506         0.523      127.97 +/- 21.84       0.4757
5     uva         guru         0.412      0.643         0.316      186.95 +/- 88.53       0.4756
6     acubelab    6            0.385      0.506         0.524      126.55 +/- 20.31       0.4751
7     acubelab    9            0.386      0.504         0.52       126.54 +/- 19.16       0.4734
8     uva         wiz          0.404      0.642         0.285      187.83 +/- 99.78       0.4635
9     uva         qtip         0.383      0.595         0.318      1731.16 +/- 857.98     0.4483
10    uniba       UNIBA-SUP    0.367      0.459         0.464      2034.75 +/- 2346.23    0.4329
11    ualberta    ualberta     0.329      0.394         0.415      3406.43 +/- 7625.28    0.3808
12    uniba       UNIBA-UNSUP  0.283      0.37          0.348      761.88 +/- 631.59      0.3373
13    cen_neel    CEN_NEEL_1   0          0.001         0          12366.61 +/- 27598.28  0.0004
14    tcs-iitkgp  run2         NCA        NCA           NCA        12888.27 +/- 11654.02  NaN
15    tcs-iitkgp  run4         NCA        NCA           NCA        12909.65 +/- 11593.13  NaN
16    tcs-iitkgp  run10        NCA        NCA           NCA        12831.80 +/- 11538.43  NaN

Page 28: NEEL2015 challenge summary

Acknowledgements

The research leading to this work was partially supported by the European Union’s 7th Framework Programme via the projects LinkedTV