Entity Typing Using Distributional Semantics and DBpedia Marieke van Erp and Piek Vossen

Entity Typing Using Distributional Semantics and DBpedia

Apr 16, 2017


Marieke van Erp
Transcript
Page 1: Entity Typing Using Distributional Semantics and DBpedia

Entity Typing Using Distributional Semantics and DBpedia
Marieke van Erp and Piek Vossen

Page 2: Entity Typing Using Distributional Semantics and DBpedia

Conclusions

• Fine-grained entity typing is necessary for semantic queries over text

• The search space for Word2Vec is large; restricting comparisons by topic helps

• Combining Distributional Semantics with DBpedia can help overcome NIL and Dark Entities

https://github.com/MvanErp/entity-typing/

Page 3: Entity Typing Using Distributional Semantics and DBpedia

Dark entities: entities for which little or no information is available in the knowledge base (KB)



Page 5: Entity Typing Using Distributional Semantics and DBpedia

Distributional Semantics

• Similar concepts (denoted by words) occur in similar contexts

• Word2Vec (Mikolov et al., 2013) is a popular implementation of this notion

[Figure: example words clustered by context: Sushi, Teriyaki, Udon, Okonomiyaki, Soba, Sashimi; Kimono, Yukata, Nemaki; KFC, Steak, Hamburger, McDonald’s; Jeans, T-shirt, Skirt]
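The idea that similar concepts occur in similar contexts can be illustrated with plain co-occurrence counts. The sketch below is not Word2Vec and is not taken from the authors' repository; it is a count-based stand-in on an invented four-sentence corpus, and the names `SENTENCES`, `context_vectors` and `cosine` are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
SENTENCES = [
    "we ate sushi at the restaurant",
    "we ate udon at the restaurant",
    "she wore a kimono at the festival",
    "she wore a yukata at the festival",
]


def context_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = context_vectors(SENTENCES)
sim_food = cosine(vectors["sushi"], vectors["udon"])
sim_mixed = cosine(vectors["sushi"], vectors["kimono"])
print(sim_food, sim_mixed)
```

In this toy corpus, "sushi" and "udon" share all four of their context words, while "sushi" and "kimono" share only two, so the first similarity comes out higher than the second.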

Page 6: Entity Typing Using Distributional Semantics and DBpedia

Research Question:

• Can we predict the type of the concept ‘Sushi’ by modelling it in a distributional semantics space and comparing its vector to the vectors of concepts for which we do know the type?

[Figure: the example word clusters from the previous slide, repeated]
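The research question can be sketched as a nearest-neighbour lookup: compare the vector of an untyped concept with the vectors of concepts whose type is known, and rank the types by similarity. The code below is a minimal sketch under stated assumptions, not the authors' implementation. The three-dimensional vectors, the type labels "Food" and "Clothing", and the helper `rank_types` are all hypothetical; real Word2Vec vectors would come from a trained model and real types from DBpedia.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration.
VECTORS = {
    "Sushi": [0.9, 0.1, 0.0],
    "Udon": [0.8, 0.2, 0.1],
    "Sashimi": [0.9, 0.0, 0.1],
    "Kimono": [0.1, 0.9, 0.1],
    "Yukata": [0.0, 0.8, 0.2],
    "Jeans": [0.1, 0.7, 0.6],
}

# Hypothetical type labels standing in for types taken from a knowledge base.
KNOWN_TYPES = {
    "Udon": "Food",
    "Sashimi": "Food",
    "Kimono": "Clothing",
    "Yukata": "Clothing",
    "Jeans": "Clothing",
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_types(entity, vectors, known_types):
    """Return (type, score) pairs, best first.

    A type's score is the highest similarity between `entity` and any
    typed entity carrying that type (an assumption of this sketch).
    """
    best = {}
    for other, other_type in known_types.items():
        if other == entity or other not in vectors:
            continue
        score = cosine(vectors[entity], vectors[other])
        if score > best.get(other_type, -1.0):
            best[other_type] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


ranking = rank_types("Sushi", VECTORS, KNOWN_TYPES)
predicted_type = ranking[0][0] if ranking else None
print(predicted_type, ranking)
```

With these invented vectors, "Sushi" lies closest to the food items, so the top-ranked type is "Food". Scoring a type by its single best neighbour is only one possible choice; averaging over all neighbours of a type would be another.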

Page 7: Entity Typing Using Distributional Semantics and DBpedia

Setup

• 7 Named Entity Linking Benchmark datasets (AIDA-YAGO, 2014 NEEL, 2015 NEEL, OKE2015, RSS500, WES2015, Wikinews)

• 3 Word2Vec models: GoogleNews, English Wikipedia, Reuters RCV1*

• Compare all entities within a dataset to each other and return the highest-ranking type (as taken from DBpedia)

* AIDA-YAGO is part of Reuters RCV1


Page 8: Entity Typing Using Distributional Semantics and DBpedia

Initial results

• Not so great?


Page 9: Entity Typing Using Distributional Semantics and DBpedia

Initial results (some footnotes)

• Ranking approach favours fine-grained entity types

• The Word2Vec corpus matters! The 2014 and 2015 NEEL datasets are derived from tweets, which typically have low coverage in models trained on news

• Smaller datasets (Wikinews, WES2015, OKE2015) do better?
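The coverage point above can be made concrete by counting how many entity mentions have a vector in a given model at all. The following is a minimal sketch with invented data: `NEWS_VOCAB` stands in for the vocabulary of a model trained on news, and the mention lists and the `coverage` helper are hypothetical.

```python
def coverage(mentions, vocabulary):
    """Fraction of entity mentions that have an entry in the model vocabulary."""
    if not mentions:
        return 0.0
    found = sum(1 for mention in mentions if mention in vocabulary)
    return found / len(mentions)


# Hypothetical vocabulary of a model trained on news text.
NEWS_VOCAB = {"Reuters", "London", "Barack_Obama", "Microsoft"}

# Hypothetical mentions as they might appear in news text and in tweets.
news_mentions = ["London", "Microsoft", "Reuters", "Barack_Obama"]
tweet_mentions = ["London", "@BarackObama", "#MSFT", "obama"]

news_coverage = coverage(news_mentions, NEWS_VOCAB)
tweet_coverage = coverage(tweet_mentions, NEWS_VOCAB)
print(news_coverage, tweet_coverage)
```

An entity without a vector cannot be compared to anything, so low coverage puts a ceiling on how many entities can be typed, whatever the ranking method.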


Page 10: Entity Typing Using Distributional Semantics and DBpedia

Let’s zoom in on topics

• Initially, all entities within a benchmark dataset were compared to all other entities.

• What if we only compare entities from sports documents to other entities from sports documents?
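Restricting the comparison to entities from documents with the same topic can be sketched as a filter on the candidate set. The example below is an illustration under stated assumptions, not the authors' code: the entities, topics, two-dimensional vectors, type labels and the `predict_type` helper are all invented, and the vectors were chosen so that the filter changes the outcome.

```python
import math

# Each entry: (name, document topic, known type or None, hypothetical vector).
ENTITIES = [
    ("Chelsea", "sports", None, [0.6, 0.6]),
    ("Arsenal", "sports", "SoccerClub", [0.9, 0.3]),
    ("Kensington", "travel", "Place", [0.5, 0.7]),
]


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def predict_type(target, entities, same_topic_only):
    """Return the type of the most similar typed entity.

    When `same_topic_only` is True, candidates from documents with a
    different topic than the target are skipped.
    """
    name, topic, _, vector = target
    best_type, best_score = None, -1.0
    for other_name, other_topic, other_type, other_vector in entities:
        if other_name == name or other_type is None:
            continue
        if same_topic_only and other_topic != topic:
            continue
        score = cosine(vector, other_vector)
        if score > best_score:
            best_type, best_score = other_type, score
    return best_type


target = ENTITIES[0]
type_all = predict_type(target, ENTITIES, same_topic_only=False)
type_topic = predict_type(target, ENTITIES, same_topic_only=True)
print(type_all, type_topic)
```

With these invented vectors, comparing "Chelsea" against all entities picks the type of "Kensington", while comparing only within sports documents picks the type of "Arsenal". The filter also shrinks the search space, since fewer pairs need to be compared.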

[Figure: six result plots for AIDA-YAGO, one per combination of topic category granularity and Word2Vec model. Panel titles: "AIDA-YAGO Coarsegrained Categories GoogleNews Fine", "AIDA-YAGO Coarsegrained Categories RCV1 Fine", "AIDA-YAGO Coarsegrained Categories Wikipedia Fine", "AIDA-YAGO Finegrained Categories GoogleNews Fine", "AIDA-YAGO Finegrained Categories RCV1 Fine", "AIDA-YAGO Finegrained Categories Wikipedia Fine". The x-axis runs over categories 1 to 22 (coarse-grained) or 1 to 69 (fine-grained); the y-axis ticks run from 20 to 100.]


Page 11: Entity Typing Using Distributional Semantics and DBpedia

Conclusions and Future Work

• Difficult task, but topics help

• Ranking needs to be improved

• Multi-class classification (KFC: Food & Organisation; Arnold Schwarzenegger: Actor & Politician)

• What else can we discover beyond type?


Page 12: Entity Typing Using Distributional Semantics and DBpedia

Thank you!


Page 13: Entity Typing Using Distributional Semantics and DBpedia

This research was made possible by the CLARIAH-CORE project financed by NWO.

http://www.clariah.nl