Dynamic Collective Entity Representations for Entity Ranking David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
Apr 15, 2017

David Graus
Page 1: Dynamic Collective Entity Representations for Entity Ranking

Dynamic Collective Entity Representations for Entity Ranking
David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke

Page 2: Dynamic Collective Entity Representations for Entity Ranking

Page 3: Dynamic Collective Entity Representations for Entity Ranking

Page 4: Dynamic Collective Entity Representations for Entity Ranking

Entity search?

- Index = Knowledge Base (= Wikipedia)
- Documents = Entities
- “Real world entities” have a single representation (in KB)

Page 5: Dynamic Collective Entity Representations for Entity Ranking

Representation is not static

- Associations between words and entities change over time
  - “ferguson shooting” -> Ferguson, Missouri
- People talk about entities all the time

Page 6: Dynamic Collective Entity Representations for Entity Ranking

Page 7: Dynamic Collective Entity Representations for Entity Ranking

Dynamic Collective Entity Representations

- Use “collective intelligence” to mine entity descriptions and enrich the entity representation.
- Like document expansion: terms are added that were found through explicit links.
- Unlike query expansion, where terms are found through predicted links.
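As a minimal sketch of the document-expansion idea above, the entity can be modelled as a fielded document that collects descriptions per source. The class name, the field names, and the toy KB text and tweet are hypothetical and only serve the illustration.

```python
from collections import defaultdict


class EntityRepresentation:
    """A fielded entity document: one list of descriptions per source."""

    def __init__(self, entity_id, kb_text):
        self.entity_id = entity_id
        self.fields = defaultdict(list)
        self.fields["kb"].append(kb_text)

    def add_description(self, source, text):
        # Document expansion: the description carries an explicit link to
        # this entity, so it is appended to the entity's own representation.
        self.fields[source].append(text)

    def terms(self):
        # All lower-cased tokens across all fields.
        return {
            token
            for texts in self.fields.values()
            for text in texts
            for token in text.lower().split()
        }


# Toy data (hypothetical): a short KB text and one linked tweet.
entity = EntityRepresentation("Ferguson,_Missouri", "Ferguson is a city in Missouri")
terms_before = entity.terms()
entity.add_description("tweets", "ferguson shooting protests continue")
new_terms = entity.terms() - terms_before
print(sorted(new_terms))
```

The terms in `new_terms` are the out-of-document vocabulary that the expansion contributes; the query side is left untouched.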

Page 8: Dynamic Collective Entity Representations for Entity Ranking

Advantages

- Cheap: change the document in the index and leverage tried & tested retrieval algorithms
- Free “smoothing”: external descriptions (e.g., tweets) may capture ‘newly evolving’ word associations (Ferguson shooting) and incorporate out-of-document terms
- “Move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index)
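One rough way to picture "closing the gap" is to measure how many query terms the entity representation covers before and after expansion. The function name and the term sets below are hypothetical toy values, and plain term coverage is only a stand-in for a real retrieval model.

```python
def query_coverage(query, entity_terms):
    """Fraction of query tokens that occur in the entity representation."""
    tokens = query.lower().split()
    if not tokens:
        return 0.0
    return sum(token in entity_terms for token in tokens) / len(tokens)


# Toy term sets (hypothetical): KB text vs. terms contributed by tweets.
kb_terms = {"ferguson", "city", "missouri"}
tweet_terms = {"ferguson", "shooting", "protests"}

coverage_before = query_coverage("ferguson shooting", kb_terms)
coverage_after = query_coverage("ferguson shooting", kb_terms | tweet_terms)
print(coverage_before, coverage_after)
```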

Page 9: Dynamic Collective Entity Representations for Entity Ranking

Haven’t we seen this before?

- Anchors & queries in particular have been shown to improve retrieval [1]
- Tweets have been shown to be similar to anchors [2]
- Social tags, same [3]
- But: in batch (i.e., add data, see if/how it improves retrieval)

[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs and anchors. TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13.

Page 10: Dynamic Collective Entity Representations for Entity Ranking

Description sources

Static sources
- KB: Wikipedia dump (Aug ’14). 57M descriptions for 4.8M entities.
- Web anchors: anchors from the Google WikiLinks corpus. 9.8M descriptions for 876,063 entities.

Dynamic sources
- Tweets: tweets w/ links to Wikipedia pages (2011-2014). 52,631 descriptions for 38,269 entities.
- Queries: queries from MSN query logs that yield Wikipedia clicks. 47,002 descriptions for 18,724 entities.
- Social tags: Delicious tags for Wiki pages, from the SocialBM0311 corpus. 4.4M descriptions for 289,015 entities.
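To get a feel for how dense each source is, the counts listed for the description sources can be turned into an average number of descriptions per covered entity. This is simple arithmetic on the slide's figures; the 57M, 4.8M, 9.8M and 4.4M values are rounded on the slide, so the ratios are approximate.

```python
# (descriptions, entities) per source, as listed on the slide.
sources = {
    "KB": (57_000_000, 4_800_000),
    "Web anchors": (9_800_000, 876_063),
    "Tweets": (52_631, 38_269),
    "Queries": (47_002, 18_724),
    "Social tags": (4_400_000, 289_015),
}

# Average number of descriptions per entity that the source covers.
per_entity = {name: d / e for name, (d, e) in sources.items()}

for name, ratio in per_entity.items():
    print(f"{name}: {ratio:.1f} descriptions per entity")
```

On these figures, tweets and queries contribute fewer than three descriptions per covered entity, while social tags, the KB, and web anchors each contribute more than ten.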

Page 11: Dynamic Collective Entity Representations for Entity Ranking

Original entity representation

Entity: Tupac Shakur

Original entity description: Tupac Amaru Shakur (previously known as Lesane Parish Crooks) (too-pahk shə-koor;[1] June 16, 1971 – September 13, 1996), also known by his stage names 2Pac and (briefly) Makaveli, was an American rapper, author, actor, and poet.[2] As of 2007, Shakur has sold over 75 million records worldwide, making him one of the best-selling music artists of all time.[3] His double disc albums All Eyez on Me and his Greatest Hits are among the [...]

Page 12: Dynamic Collective Entity Representations for Entity Ranking

Static description sources

- KB anchors: 2Pac; Tupac; Makaveli
- KB linked entities: The Notorious B.I.G.; Black Panther Party; Muammar Gaddafi
- KB redirects: 2pac Shakur; Thug Immortal
- KB categories: Murdered Rappers; Death Row Record Artists; American deists
- Web anchors:
  - What job did Tupac have before he was a rapper
  - Tupac
  - Tupac is arguably more influential
  - Tupac Amaru Shakur
  - Tupac Shakur-style drive-by shooting
  - Tupac Shakur
  - Tupac Shakur reciting Shakespeare at art school

Page 13: Dynamic Collective Entity Representations for Entity Ranking

Dynamic description sources

Dynamic expansions (tweets, tags, and queries):
- tupac and the law
- hiphop/icons
- dead rappers
- people influenced by tupac
- awesomeartist rapd
- Happy Birthday Tupac!!! 2Pac Gemini
- RT: Las cenizas de Tupac, el mejor rapero de la historia, fueron mezcladas con marihuana y fumadas por miembros de Outlawz [English: The ashes of Tupac, the best rapper in history, were mixed with marijuana and smoked by members of Outlawz]
- Even more crazy that this was announced just one day before what would have been Pac’s 40th birthday.

Page 14: Dynamic Collective Entity Representations for Entity Ranking

Challenge

- Heterogeneity
  1. Description sources
  2. Entities
- Dynamic nature
  - Content changes over time

Page 15: Dynamic Collective Entity Representations for Entity Ranking

Adaptive ranking

- Supervised single-field weighting model
- Features:
  - field similarity: retrieval score per field.
  - field “importance”: length, novel terms, etc.
  - entity “importance”: time since last update.
- Learn optimal field weights from clicks
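The three feature groups listed above could be extracted roughly as follows. This is a sketch under assumptions: the `Entity` class, the feature names, the match-count stand-in for a per-field retrieval score, and the definition of "novel terms" as tokens absent from the KB field are all hypothetical choices, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    fields: dict = field(default_factory=dict)  # field name -> list of tokens
    last_update: float = 0.0                    # time of the latest expansion


def field_similarity(query_tokens, field_tokens):
    # Stand-in retrieval score: number of field tokens that match the query.
    query = set(query_tokens)
    return float(sum(token in query for token in field_tokens))


def extract_features(query_tokens, entity, now):
    features = {}
    kb_vocab = set(entity.fields.get("kb", []))
    for name, tokens in entity.fields.items():
        # Field similarity: retrieval score of the query against this field.
        features[f"{name}_similarity"] = field_similarity(query_tokens, tokens)
        # Field importance: length and number of terms not in the KB text.
        features[f"{name}_length"] = float(len(tokens))
        features[f"{name}_novel_terms"] = float(len(set(tokens) - kb_vocab))
    # Entity importance: time since the entity was last expanded.
    features["time_since_update"] = now - entity.last_update
    return features


# Toy entity (hypothetical tokens and timestamps).
tupac = Entity(
    fields={
        "kb": "tupac shakur american rapper".split(),
        "tags": "dead rappers hiphop".split(),
    },
    last_update=100.0,
)
features = extract_features(["dead", "rappers"], tupac, now=160.0)
print(features)
```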

Supervised single-field weighting model: each field’s contribution towards the final score is individually weighted, with the weights learned from clicks at set intervals.
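A minimal sketch of per-field weighting: the final score is a weighted sum of per-field retrieval scores, and the weights are adjusted from a click. The perceptron-style pairwise update is an illustrative substitute for whatever learner the authors use, and the entity ids, field scores, and learning rate are hypothetical toy values.

```python
def score(weights, field_scores):
    """Weighted sum of per-field retrieval scores."""
    return sum(weights.get(name, 0.0) * value for name, value in field_scores.items())


def rank(weights, candidates):
    """candidates: entity id -> per-field scores. Returns ids, best first."""
    return sorted(candidates, key=lambda e: score(weights, candidates[e]), reverse=True)


def update_from_click(weights, candidates, clicked, learning_rate=0.2):
    """Perceptron-style pairwise update (an illustrative choice of learner)."""
    updated = dict(weights)
    clicked_scores = candidates[clicked]
    for entity_id, other_scores in candidates.items():
        if entity_id == clicked:
            continue
        # Only update when a non-clicked entity scores at least as high.
        if score(updated, other_scores) >= score(updated, clicked_scores):
            for name in set(clicked_scores) | set(other_scores):
                delta = clicked_scores.get(name, 0.0) - other_scores.get(name, 0.0)
                updated[name] = updated.get(name, 0.0) + learning_rate * delta
    return updated


# Toy per-field scores for the query "ferguson shooting" (hypothetical numbers).
weights = {"kb": 1.0, "tweets": 0.0}
candidates = {
    "Ferguson,_Missouri": {"kb": 1.0, "tweets": 3.0},
    "Alex_Ferguson": {"kb": 2.0, "tweets": 0.0},
}
ranking_before = rank(weights, candidates)
weights = update_from_click(weights, candidates, clicked="Ferguson,_Missouri")
ranking_after = rank(weights, candidates)
print(ranking_before, ranking_after)
```

Here a single click shifts weight towards the tweets field, so the clicked entity moves to the top. In the setup described on the slide, such updates would be applied at set intervals rather than after every click.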

Page 16: Dynamic Collective Entity Representations for Entity Ranking

Experimental setup

1. Data:
   - MSN query log (62,841 queries that yield entity clicks)
   - For each query:
     - Produce ranking
     - Observe click
     - Evaluate ranking (MAP/P@1)
     - Expand entities (w/ descriptions from dynamic sources)
     - [re-train ranker]
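The per-query loop above can be sketched as a small simulation. Several parts are assumptions made for illustration: term overlap stands in for the real ranker, the only dynamic source used for expansion is the query itself, re-training is reduced to recording the step at which it would happen, and the two-entity index is toy data. With one clicked entity per query, average precision reduces to 1 / rank of the clicked entity.

```python
def average_precision(ranking, clicked):
    # With a single clicked (relevant) entity, AP equals 1 / rank.
    return 1.0 / (ranking.index(clicked) + 1) if clicked in ranking else 0.0


def precision_at_1(ranking, clicked):
    return 1.0 if ranking and ranking[0] == clicked else 0.0


def run_log(query_log, index, retrain_every=2):
    """query_log: list of (query, clicked id); index: id -> set of terms.

    Note: the index is modified in place as entities are expanded.
    """
    ap_values, p1_values, retrain_points = [], [], []
    for step, (query, clicked) in enumerate(query_log, start=1):
        tokens = set(query.lower().split())
        # Produce ranking: more overlapping terms first, ties broken by id.
        ranking = sorted(index, key=lambda e: (-len(tokens & index[e]), e))
        # Observe click and evaluate the ranking.
        ap_values.append(average_precision(ranking, clicked))
        p1_values.append(precision_at_1(ranking, clicked))
        # Expand the clicked entity with the query terms.
        index[clicked] |= tokens
        # Re-train the ranker at set intervals (placeholder only).
        if step % retrain_every == 0:
            retrain_points.append(step)
    n = len(query_log)
    if n == 0:
        return 0.0, 0.0, retrain_points
    return sum(ap_values) / n, sum(p1_values) / n, retrain_points


# Toy index and log (hypothetical).
toy_index = {
    "Ferguson,_Missouri": {"ferguson", "missouri", "city"},
    "Alex_Ferguson": {"ferguson", "football", "manager"},
}
toy_log = [("ferguson shooting", "Ferguson,_Missouri")] * 2
map_score, p_at_1, retrain_points = run_log(toy_log, toy_index)
print(map_score, p_at_1, retrain_points)
```

In this toy run the first query ranks the clicked entity second; after expansion the repeated query ranks it first.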

Page 17: Dynamic Collective Entity Representations for Entity Ranking

Results

- Comparing the effectiveness of different description sources
- Comparing adaptive vs. non-adaptive ranker performance

Page 18: Dynamic Collective Entity Representations for Entity Ranking

Description sources

[Figure: plot; y-axis ticks from 0.50 to 0.60, x-axis ticks from 0 to 30000]

Page 19: Dynamic Collective Entity Representations for Entity Ranking

Feature weights over time

Page 20: Dynamic Collective Entity Representations for Entity Ranking

Adaptive vs. non-adaptive ranking

[Figure: plot; y-axis ticks from 0.50 to 0.60, x-axis ticks from 0 to 30000]

Page 21: Dynamic Collective Entity Representations for Entity Ranking

In summary

- Expanding entity representations with different sources enables better matching of queries to entities
- As new content comes in, it is beneficial to retrain the ranker
- Informing the ranker of the “expansion state” further improves performance

Page 22: Dynamic Collective Entity Representations for Entity Ranking

Thank you