RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Elizabeth L. Murnane, Bernhard Haslhofer, Carl Lagoze

Jul 09, 2015

These are the presentation slides for the paper "RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text", which was named the Best Paper at the Web of Linked Entities (WoLE'13) workshop at the 22nd International World Wide Web Conference (WWW'13). The paper's abstract is below, along with a link to the full paper.

Abstract:
We address the Named Entity Disambiguation (NED) problem for short, user-generated texts on the social Web. In such settings, the lack of linguistic features and sparse lexical context result in a high degree of ambiguity and sharp performance drops of nearly 50% in the accuracy of conventional NED systems. We handle these challenges by developing a model of user-interest with respect to a personal knowledge context; and Wikipedia, a particularly well-established and reliable knowledge base, is used to instantiate the procedure. We conduct systematic evaluations using individuals' posts from Twitter, YouTube, and Flickr and demonstrate that our novel technique is able to achieve substantial performance gains beyond state-of-the-art NED methods.

Full Paper: http://arxiv.org/abs/1304.2401
Transcript
Page 1: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Elizabeth L. Murnane, Bernhard Haslhofer, Carl Lagoze

Page 2: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

A Personalized Approach to Entity Resolution

Background
• Task Definitions
• Challenges & Examples
• Attempted Solutions

Approach
• Motivations
• Modeling a Knowledge Context
• Implementation: The RESLVE System

Evaluation
• Experiments
• Results
• Future Work

Page 5: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Social Web

10 million pages per day

Page 6: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Social Web

800 million visitors per month

Page 7: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Social Web

7 billion images (twice as many as 4 years ago)

Page 10: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Task Definition

Named Entity Recognition (NER)
• Systematically identifying mentions of entities (e.g., people, places, concepts, ideas)

Named Entity Disambiguation (NED)
• Resolving the intended meaning of ambiguous entities from multiple candidate meanings

Page 11: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Ambiguous Entities

aaahh one more day until finn!!! #cantwait

office holiday party

Beetle

Page 20: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Episode 4

office holiday party

office, december 3

Footage:
• Workplace?
• TV Show?
  • US Version?
  • UK Version?

Page 25: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Challenges & Focus

• Short Length
• Sparse Lexical Context
• Noisy
• Highly personal in nature

Page 31: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Limitations of Extant Research

Tweets severely degrade traditional techniques
• Stanford NER: F1 drops 90% → 46%
• DBPedia Spotlight & Wikipedia Miner: P@1 < 40%

Recent strategies
• Crowd-sourcing
  • Limitation: Dependent on reliable human workers
• Automated attempts
  • Limitation: Focus on NER, not NED
  • Limitation: Generalizability beyond Twitter?
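The P@1 figure above is precision at rank 1: the share of mentions for which a system's top-ranked candidate is the correct sense. A minimal sketch of that metric follows; the function name and the three sample mentions are hypothetical and are not taken from the slides or the evaluated systems.

```python
from typing import Dict, List


def precision_at_1(ranked: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Fraction of gold mentions whose top-ranked candidate is the gold sense."""
    if not gold:
        return 0.0
    hits = 0
    for mention, correct_sense in gold.items():
        candidates = ranked.get(mention, [])
        # Only the first (highest-ranked) candidate counts for P@1.
        if candidates and candidates[0] == correct_sense:
            hits += 1
    return hits / len(gold)


# Hypothetical system output: candidate senses per mention, best first.
ranked_example = {
    "office": ["Office", "The Office (U.S. TV series)"],
    "Beetle": ["Volkswagen Beetle", "Beetle"],
    "Java": ["Java (programming language)", "Java"],
}
# Hypothetical gold annotations for the same mentions.
gold_example = {
    "office": "The Office (U.S. TV series)",
    "Beetle": "Volkswagen Beetle",
    "Java": "Java (programming language)",
}
score = precision_at_1(ranked_example, gold_example)
```

In this toy data the system gets two of three mentions right at rank 1, so a missed "office" sense alone lowers the score by a third.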

Page 33: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Challenges & Focus

• Short Length
• Sparse Lexical Context
• Noisy
• Highly personal in nature
• A user's past content on the same platform is not a feasible background corpus

Page 34: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Task Definition

Named Entity Recognition (NER)
• Systematically identifying mentions of entities (e.g., people, places, concepts, ideas)

Named Entity Disambiguation (NED)
• Resolving the intended meaning of ambiguous entities from multiple candidate meanings

Our focus: disambiguating any entity detected in users' text-based utterances on the social Web

Page 39: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Exploring a Personalized Solution
• Individual-centric approach to NED
• Incorporates external, user-specific semantic data: the Personal Context
• Model personal interests with respect to this information
• Determine user's likely intended meaning of ambiguous entity based on similarity between potential meanings and interests

RESLVE: Resolving Entity Sense by LeVeraging Edits

Page 45: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Underlying Assumptions
• User has core interests
  • User more likely to mention an entity about a topic relevant to personal interests than mention a topic of non-interest
• User expresses these interests consistently in content she posts online in multiple communities
• Can use a semantic knowledge base to formally represent these topics of interest

→ Bridge user identity between social Web and knowledge base, K
→ Model interests using K's organizational scheme
→ Rank entity senses according to relevance to interests

Page 49: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Qualitative Analysis: Stable Interests
User's topics of contribution are similar across the Web:

Same categories
• On average, 52.4% of entities a user mentions on the social Web (e.g., "Java") have at least 1 candidate sense in the same parent category as a Wikipedia article the same user edited (e.g., "Programming language")
• Extending to just 4 parents up the category hierarchy raises this to 100%

Same topics
• Ambiguous YouTube post: office, december 3
• Same user's recent Wikipedia edit: <item userid="xxxx" user="xxxx" pageid="31841130" title="The Office (U.S. season 8)"/>
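The category comparison described on this slide can be sketched as a bounded walk up a category hierarchy: collect each article's categories plus their ancestors up to a fixed number of parent steps, then test whether a candidate sense and an edited article overlap. This is only an illustration under assumptions; the function names, the `parents` mapping, and the toy categories are hypothetical and do not reproduce the authors' analysis code or the real Wikipedia category graph.

```python
from typing import Dict, Iterable, List, Set


def categories_within(start: Iterable[str],
                      parents: Dict[str, List[str]],
                      levels: int) -> Set[str]:
    """Start categories plus ancestors reachable in at most `levels` parent steps."""
    seen = set(start)
    frontier = set(start)
    for _ in range(levels):
        # Move one step up the hierarchy, skipping categories already visited.
        frontier = {p for c in frontier for p in parents.get(c, [])} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


def shares_category(candidate_cats: Iterable[str],
                    edited_cats: Iterable[str],
                    parents: Dict[str, List[str]],
                    levels: int) -> bool:
    """True if the two category sets overlap within `levels` parent steps."""
    a = categories_within(candidate_cats, parents, levels)
    b = categories_within(edited_cats, parents, levels)
    return bool(a & b)


# Toy hierarchy (hypothetical): child category -> parent categories.
toy_parents = {
    "Programming languages": ["Computer languages"],
    "Java platform": ["Computing platforms"],
    "Computer languages": ["Computing"],
    "Computing platforms": ["Computing"],
}
candidate = ["Programming languages"]  # categories of one candidate sense
edited = ["Java platform"]             # categories of an article the user edited

direct_match = shares_category(candidate, edited, toy_parents, levels=0)
extended_match = shares_category(candidate, edited, toy_parents, levels=2)
```

With `levels=0` only the articles' own categories are compared; raising `levels` mirrors the slide's step of extending the comparison several parents up the hierarchy.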

Page 52: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Theoretical Motivations
• Online Contribution:
  • Users produce online content about a key set of personally interesting topics because it is fulfilling and seen as having better cost-benefit
  • (Harper et al., 2007; Lakhani & von Hippel, 2003; Lerner & Tirole, 2000; Ling et al., 2006; Maslow, 1970)
• Modeling Interests:
  • Effective to model these topic interests from lexical features of these text-based contributions
  • (Chen et al., 2010; Cosley et al., 2007; Pennacchiotti & Popescu, 2011)

 

Page 53: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Modeling a Knowledge Context

• Knowledge base, K
• K = (N, E)
• 2 node types:
  • Categories
  • Topics

[Figure: example knowledge graph with nodes c1 to c4, t1 to t3, and d1 to d3]

Page 71: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

The Knowledge Graph

• Category nodes: N_Category ⊂ N
  • Unique identifier
  • Semantic relationships with other nodes
• Topic nodes: N_Topic ⊂ N
  • Unique identifier
  • Belongs to one or more categories
  • Associated with text-based description
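One way to picture the two node types is as a small container holding category links, topic memberships, and topic descriptions. The sketch below is an assumption-laden illustration: the class name, field names, and sample identifiers are hypothetical, and it assumes topic and category identifiers never collide.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class KnowledgeGraph:
    """K = (N, E) with two node types: categories and topics."""
    # category id -> ids of semantically related categories
    category_links: Dict[str, Set[str]] = field(default_factory=dict)
    # topic id -> ids of the categories the topic belongs to
    topic_categories: Dict[str, Set[str]] = field(default_factory=dict)
    # topic id -> text-based description
    topic_descriptions: Dict[str, str] = field(default_factory=dict)

    def add_category(self, cat_id: str, related: Iterable[str] = ()) -> None:
        related = set(related)
        self.category_links.setdefault(cat_id, set()).update(related)
        for other in related:
            # Make sure every referenced category exists as a node.
            self.category_links.setdefault(other, set())

    def add_topic(self, topic_id: str, categories: Iterable[str], description: str) -> None:
        categories = set(categories)
        if not categories:
            raise ValueError("a topic must belong to at least one category")
        for cat_id in categories:
            self.add_category(cat_id)
        self.topic_categories[topic_id] = categories
        self.topic_descriptions[topic_id] = description

    @property
    def nodes(self) -> Set[str]:
        """N: the union of category nodes and topic nodes."""
        return set(self.category_links) | set(self.topic_categories)


# Hypothetical example using the node labels from the slide's figure.
kg = KnowledgeGraph()
kg.add_category("c1", related={"c2"})
kg.add_topic("t1", {"c1"}, "description text d1")
```

Keeping descriptions on topic nodes only reflects the slide's distinction that topics, not categories, carry a text-based description.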

 

Page 86: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

User Interest Model
• Editing a description signals interest in associated topic
• Topic nodes: all topics whose description the user edited
• Category nodes: categories reachable in knowledge graph from those topics
• Edge weight = inverse of shortest path length

     c1   c2   c3   c4
t1   …    1    …    0
t2   …    1    …    1
t3   0    0    …    1

(… marks a fractional weight whose value did not survive extraction)

• Same representation for candidates
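The weighting rule above, edge weight equal to the inverse of the shortest path length, can be sketched with a breadth-first search from each edited topic. This is a minimal illustration, assuming that a topic's own categories sit at distance 1 and that unreachable categories get weight 0; the function name and the toy graph are hypothetical and do not reproduce the matrix on the slide.

```python
from collections import deque
from typing import Dict, List


def interest_weights(topic_cats: Dict[str, List[str]],
                     cat_parents: Dict[str, List[str]],
                     categories: List[str]) -> Dict[str, List[float]]:
    """Per topic, a row of weights 1/d over `categories` (0.0 if unreachable)."""
    rows = {}
    for topic, direct in topic_cats.items():
        # Assumption: a topic's own categories are one edge away.
        dist = {c: 1 for c in direct}
        queue = deque(direct)
        while queue:
            current = queue.popleft()
            for parent in cat_parents.get(current, []):
                if parent not in dist:
                    dist[parent] = dist[current] + 1
                    queue.append(parent)
        rows[topic] = [1.0 / dist[c] if c in dist else 0.0 for c in categories]
    return rows


# Toy graph (hypothetical): topics edited by the user and category parents.
toy_topic_cats = {"t1": ["c2"], "t3": ["c4"]}
toy_cat_parents = {"c2": ["c1"], "c4": ["c3"]}
toy_matrix = interest_weights(toy_topic_cats, toy_cat_parents, ["c1", "c2", "c3", "c4"])
```

Because candidate meanings use the same representation, the same function could in principle build a row for a candidate article, which makes the two directly comparable.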

Page 87: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Instantiating the Model
• Wikipedia
• DBPedia
• Freebase

Page 92: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Instantiating on Wikipedia
• Articles, categories effectively represent topics (Syed, 2008)
• Good coverage of even rare entity concepts (Zesch, 2007)
• Compatible with NER toolkits
  • DBPedia Spotlight, Wikipedia Miner
• Article editing behavior effective for modeling interests (Cosley, 2007; Lieberman & Lin, 2009; Wattenberg et al., 2007)

Page 93: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Article editing signals topic interest

Editing behaviors indicative of user interest:

Editing Behavior | Intuition
Number of times user edits article | Repeatedly editing an article implies greater commitment and interest
Article's overall edit activity and total number of editors | Generally popular and actively edited articles are less discriminative of individual interest and personal relevance
Time period user edits article | Long-term interests are stronger than fleeting, short-term interests
Type of edit according to revision tag | Trivial edits such as vandalism reversion or typo correction less indicative of interest than thoughtful, effortful edits
Complexity, completeness, informativeness of edit according to metrics of Information Quality | Type, substantiveness, and overall quality of care user gives to an edit indicates concern and interest in topic
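To make the intuitions in this table concrete, the sketch below combines three of them into a single score: repeated edits raise the score, widely edited articles lower it, and trivial edits contribute nothing. The formula is purely an assumption for illustration (a logarithmic, TF-IDF-like weighting); the slides do not state how RESLVE combines these signals, and the class, function, and parameter names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EditStats:
    user_edits: int         # number of edits this user made to the article
    article_editors: int    # distinct editors of the article overall
    trivial: bool = False   # e.g. typo correction or vandalism reversion


def edit_interest_score(stats: EditStats, total_editors: int) -> float:
    """Illustrative score: commitment (repeat edits) times rarity (few editors)."""
    if stats.trivial or stats.user_edits <= 0:
        return 0.0
    # Repeated editing implies greater commitment; log dampens large counts.
    commitment = math.log1p(stats.user_edits)
    # Popular articles discriminate less, so they receive a smaller factor.
    rarity = math.log(total_editors / max(stats.article_editors, 1))
    return commitment * max(rarity, 0.0)


# Hypothetical numbers: the same user activity on a niche and a popular article.
niche = edit_interest_score(EditStats(user_edits=5, article_editors=10), total_editors=1000)
popular = edit_interest_score(EditStats(user_edits=5, article_editors=900), total_editors=1000)
```

The time-period and Information Quality signals from the table are left out of this sketch; they would need revision timestamps and quality metrics that are not specified here.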

Page 95: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Less Meaningful Edits

Ignore Irrelevant Edits:
• Articles with fewer than 100 non-stopwords
• Trivial edits, i.e., typo correction, vandalism reversion
• List pages merely containing widely diverse sets of topics that are not all necessarily indicative of the piece personally relevant to the user

Clean Article Text:
• Stem, tokenize, lowercase; remove stopwords, punctuation, non-printable characters
• Parse Wiki Markup to remove article maintenance information
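The text-cleaning steps and the 100 non-stopword threshold can be sketched in a few lines. Everything beyond those two facts is an assumption: the stopword list is a tiny stand-in, the suffix-stripping stemmer is a crude placeholder for a real stemmer, Wiki Markup parsing is omitted, and the function names are hypothetical.

```python
import string
from typing import List, Optional

# Tiny stand-in stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}
MIN_NON_STOPWORDS = 100  # threshold named on the slide

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def naive_stem(token: str) -> str:
    """Crude suffix stripping, used here only as a placeholder stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_article(text: str, min_tokens: int = MIN_NON_STOPWORDS) -> Optional[List[str]]:
    """Return stemmed tokens, or None when the article is too short to keep."""
    # Drop non-printable characters but keep whitespace as token separators.
    printable = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    lowered = printable.lower().translate(_PUNCT_TO_SPACE)
    tokens = [t for t in lowered.split() if t not in STOPWORDS]
    if len(tokens) < min_tokens:
        return None
    return [naive_stem(t) for t in tokens]


# The threshold is lowered here only so the short sample is not discarded.
sample_tokens = clean_article("The editors are editing the Office!", min_tokens=1)
skipped = clean_article("The editors are editing the Office!")
```

Counting tokens after stopword removal matches the slide's wording of "non-stopwords", so very short articles are ignored before any stemming work is done.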

Page 99: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Implementation: The RESLVE System
RESLVE (Resolving Entity Sense by LeVeraging Edits) addresses NED by:
I. Connecting social Web + Wikipedia editor identity
II. Modeling topics of interest using article edits
III. Ranking entity candidates by personal relevance

[Figure: RESLVE system architecture. A pre-processor passes user utterances (unstructured short texts) to Wikipedia Miner and DBPedia Spotlight, which yield detected entities and candidate meanings ("m"). Phase I, bridging user identity, links the username to user-contributed structured documents; Phase II, modeling user interest, builds the user interest model; Phase III, ranking candidates by personal relevance, outputs the top-ranked personally relevant candidates.]

Page 102: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Phase 1: Bridging Web Identities
• Connect identity of social media user with Wikipedia editor
• Simple string matching
• Iofciu, 2011; Perito, 2011
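As a rough illustration of the "simple string matching" step, the sketch below pairs social media usernames with Wikipedia usernames that are identical after a light normalization. The normalization (trimming and case-folding) and the function names are assumptions; the slides do not say how strictly names were compared.

```python
def normalize(username):
    """Assumed normalization: trim whitespace and ignore case."""
    return username.strip().lower()


def bridge_accounts(social_usernames, wikipedia_usernames):
    """Return {social username: Wikipedia username} for names that are
    identical after normalization. Matches are only candidates."""
    wiki_index = {normalize(name): name for name in wikipedia_usernames}
    return {
        name: wiki_index[normalize(name)]
        for name in social_usernames
        if normalize(name) in wiki_index
    }


matches = bridge_accounts(["Alice_W", "bob"], ["alice_w", "Carol"])
```

A string match alone does not show that two accounts belong to the same person, so any pairs found this way would still need confirmation.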

Page 105: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Phase 2: Representing Users and Entities
• Models user’s topics of interest using bridged Wiki account’s editing history
• Compares similarity of those topics to topic associated with candidate sense
• Content-based & knowledge-graph based similarity
• Weighted vectors used to represent user and candidate sense

Page 107: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Content-based similarity
• Bag-of-words
  • Titles of articles user edited
  • Candidate’s article title
  • Words from those articles’ pages & category titles
• TF-IDF weighted
• User, u: $V_{content,u}$
• Candidate meaning, m: $V_{content,m}$

$sim_{content}(u, m) = cossim(V_{content,u}, V_{content,m})$
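A minimal sketch of the content-based similarity, assuming each user and each candidate meaning has already been reduced to a list of tokens. The exact TF-IDF variant is not given on the slide; the smoothed idf used here, the toy documents, and the function names are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Map each document name to a sparse {term: tf * idf} vector.

    documents: {name: list of tokens}. Uses raw term counts and the smoothed
    idf = ln((1 + N) / (1 + df)) + 1 (an assumed weighting variant).
    """
    n_docs = len(documents)
    doc_freq = Counter(t for tokens in documents.values() for t in set(tokens))
    vectors = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cossim(v1, v2):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v2.get(term, 0.0) for term, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Toy bags of words: one user and two candidate meanings (hypothetical data).
docs = {
    "user": ["television", "series", "drama", "television"],
    "m_show": ["television", "series", "episode"],
    "m_fruit": ["fruit", "tree", "orchard"],
}
vectors = tf_idf_vectors(docs)
sim_show = cossim(vectors["user"], vectors["m_show"])
sim_fruit = cossim(vectors["user"], vectors["m_fruit"])
```

In this toy case the candidate sharing vocabulary with the user's edited articles scores higher, while the candidate with no shared terms scores zero.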

Page 109: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Knowledge-context based similarity
• Vectors of articles’ category IDs
• Weight is distance between the article (topic) and category in knowledge graph
  • E.g., “American Television Series” > “Broadcasting”
• User, u: $V_{category,u}$
• Candidate meaning, m: $V_{category,m}$

$sim_{category}(u, m) = cossim(V_{category,u}, V_{category,m})$
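One way to picture the category vectors is a breadth-first walk up the category graph from an article's direct categories, recording the distance to each ancestor. The slide says only that the weight is the distance in the knowledge graph; how distance maps to a weight is not shown, so the sketch takes the mapping as a parameter and defaults to 1/d, an assumption that makes nearer categories count more. The toy graph and all names are hypothetical.

```python
import math
from collections import deque


def category_vector(start_categories, parents, weight=lambda d: 1.0 / d):
    """Breadth-first walk up a category graph.

    parents: {category: list of parent categories}. Direct categories of the
    article are at distance 1; each step up adds 1.
    """
    distances = {}
    queue = deque((cat, 1) for cat in start_categories)
    while queue:
        category, dist = queue.popleft()
        if category in distances:
            continue  # already reached by a path at least as short
        distances[category] = dist
        for parent in parents.get(category, []):
            queue.append((parent, dist + 1))
    return {category: weight(dist) for category, dist in distances.items()}


def cossim(v1, v2):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v2.get(k, 0.0) for k, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Hypothetical toy category graph.
parents = {
    "American television series": ["Television series"],
    "Television series": ["Broadcasting"],
    "Apple cultivars": ["Fruit"],
}
v_user = category_vector(["American television series"], parents)
v_show = category_vector(["Television series"], parents)
v_fruit = category_vector(["Apple cultivars"], parents)
sim_show = cossim(v_user, v_show)
sim_fruit = cossim(v_user, v_fruit)
```

Because the user and the television candidate share ancestor categories, their vectors overlap even though their direct categories differ, which is the point of using the graph rather than exact category matches.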

 

Page 110: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Phase 3: Ranking by Personal Relevance
Output highest scoring candidate as intended meaning by measuring:

$sim(u, m) = \alpha \cdot sim_{content}(u, m) + (1 - \alpha) \cdot sim_{category}(u, m)$
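The ranking formula can be sketched directly, assuming the two similarity scores for each candidate have already been computed. The value of α is not stated on the slide, so the default of 0.5, the example scores, and the function names below are assumptions.

```python
def combined_similarity(sim_content, sim_category, alpha=0.5):
    """sim(u, m) = alpha * sim_content + (1 - alpha) * sim_category."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * sim_content + (1.0 - alpha) * sim_category


def rank_candidates(candidate_scores, alpha=0.5):
    """candidate_scores: {candidate: (sim_content, sim_category)}.
    Returns the candidates sorted best-first by combined similarity."""
    return sorted(
        candidate_scores,
        key=lambda m: combined_similarity(*candidate_scores[m], alpha=alpha),
        reverse=True,
    )


# Hypothetical scores for two candidate meanings of "Apple".
scores = {"Apple Inc.": (0.6, 0.2), "Apple (fruit)": (0.1, 0.5)}
balanced = rank_candidates(scores, alpha=0.5)
category_only = rank_candidates(scores, alpha=0.0)
```

With α = 0.5 the first candidate wins (0.4 against 0.3); with α = 0 only the category score counts and the order flips, which shows how α trades the two signals off.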


Page 111: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Pre-processing & preparation modules

Page 115: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Agenda

Background
• Task Definitions
• Challenges & Examples
• Attempted Solutions

Approach
• Motivations
• Modeling a Knowledge Context
• Implementation: The RESLVE System

Evaluation
• Experiments
• Results
• Future Work

Page 118: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Experiment
Data Sample
• Twitter: tweets
• YouTube: video titles, descriptions
• Flickr: photo tags, titles, descriptions
• String-matched usernames of posters to Wikipedia accounts
• Mechanical Turk used to confirm accounts were same person

For confirmed matches:
• Collected 100 most recent utterances
• ID, title, page content, categories of edited articles

Page 119: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Experiment
Labeling correct entity meaning
• 1545 valid ambiguous entities
• Mechanical Turk Categorization Masters
• Averaged observed agreement across all coders and items = 0.866
• Average Fleiss Kappa = 0.803
• 918 unanimously labeled ambiguous entities
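For readers unfamiliar with the agreement statistic, Fleiss' kappa can be computed from a table of per-item rating counts as sketched below. The example data is hypothetical and is not the study's annotation data; returning 1.0 in the degenerate case where every rater picks the same single category is a convention chosen here, since the statistic is undefined there.

```python
def fleiss_kappa(ratings):
    """ratings: one row per item; each row holds the number of raters who
    chose each category. Every row must sum to the same rater count n >= 2."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(ratings[0])

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        return 1.0  # undefined case; treated as full agreement (assumption)
    return (p_bar - p_e) / (1.0 - p_e)


# Two items, three raters, two categories, with unanimous labels on each item.
kappa = fleiss_kappa([[3, 0], [0, 3]])
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it is reported alongside the raw agreement figure.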

Page 120: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Dataset  Characteristics  

Page 121: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Text Length
Longest utterances still shorter than even shortest texts from NER task corpora like Reuters-21578, Brown Corpus

[Figure: chart comparing text lengths of Twitter, YouTube, Flickr, Reuters, and Brown texts.]

Page 123: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

High Ambiguity
• NER services have low confidence
• Many potential candidates (2 to 163, avg. 5-6, median 4)

[Figure: chart on a 0 to 1 scale for Wikipedia Miner and DBPedia Spotlight.]

Page 124: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

High Ambiguity
• 91% of utterances contain at least 1 ambiguous entity
• 2/3 of entities detected are ambiguous
• Almost no entities without at least 2 senses to disambiguate

Page 126: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Performance Metric
• Precision at rank 1 (P@1)

Methods of comparison
• Human annotated gold standard
• RC: Randomly sorted candidates
• PF: Prior frequency
• RU: RESLVE given a random Wikipedia user's interest model
• DS: DBPedia Spotlight
• WM: Wikipedia Miner
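Precision at rank 1 is the fraction of ambiguous entities for which a method's top-ranked candidate equals the gold-standard meaning. A small sketch, with hypothetical data structures and names:

```python
def precision_at_1(ranked_candidates, gold):
    """ranked_candidates: {entity: list of candidates, best first}.
    gold: {entity: correct candidate}. Entities with no ranking count as misses."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    hits = sum(
        1
        for entity, correct in gold.items()
        if ranked_candidates.get(entity) and ranked_candidates[entity][0] == correct
    )
    return hits / len(gold)


# Hypothetical rankings for three entities; only the first is ranked correctly.
rankings = {"e1": ["A", "B"], "e2": ["C", "D"], "e3": []}
gold = {"e1": "A", "e2": "D", "e3": "X"}
p_at_1 = precision_at_1(rankings, gold)
```

The same function can score each comparison method by passing that method's rankings against the shared gold standard.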

Page 127: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Results

          Flickr   Twitter   YouTube
RESLVE    0.63     0.76      0.84
RC        0.21     0.32      0.31
PF        0.74     0.69      0.66
RU        0.51     0.71      0.78
WM        0.78     0.58      0.80
DS        0.53     0.67      0.63

Page 130: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Discussion
• Best performance on YouTube texts (longest) due to content-based sim
• Outperforms on more personal text (e.g., tweets)
• Random user model less effective
• Less effective on impersonal text (e.g., photo geo-tags)
  • High prior frequency so standard methods suffice
  • Personally-unfamiliar topics so not likely to make Wiki edits about them
  • Stable interests assumption breaks down here

Page 133: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Error Cases
• Automated messages
  • “I uploaded a video on @youtube” → 1945 European Films
• Entities not in knowledge base
  • “Peter on the dock”
• Less prolific contributors

Page 136: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Future Work
• Computability
  • Wikipedia has 5M articles, 700K categories → Vector pruning
• User identity & modeling interests
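The slides name vector pruning as future work for the computability concern without describing a method. One simple possibility, shown purely as an assumption, is to keep only the k highest-weighted dimensions of each sparse vector:

```python
import heapq


def prune_vector(vector, k):
    """Keep only the k largest-weight entries of a sparse {dimension: weight} vector."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return dict(heapq.nlargest(k, vector.items(), key=lambda item: item[1]))


# Hypothetical weights for three dimensions.
pruned = prune_vector({"a": 0.9, "b": 0.1, "c": 0.5}, 2)
```

Whether top-k truncation preserves ranking quality on vectors over millions of articles and hundreds of thousands of categories would have to be tested empirically.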

     

Page 138: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Bridging User Accounts

          # Usernames   Exist on Wikipedia   Matches are same person
Twitter   479           46.1%                47%
YouTube   454           19.6%                48%
Flickr    226           21.7%                71%

Page 148: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Bridging User Accounts
a. True negative (no identity in knowledge base)
b. False negative (same person, different usernames)
c. False positives (string match, but different people)

• Use other knowledge base besides Wikipedia
• Model user interest from additional kinds of participation (e.g., page visits, bookmarking, favoriting)
• Interest drift & time-frame of postings

✓ Collaborative filtering techniques to approximate user's own interests with contributions of social connections
✓ Consider more profile attributes than username

Page 149: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Summary & Conclusion
• Social Web texts: short & highly personal
• User posts about same topics across communities (but not always)
• Models user interest as personal context with respect to a knowledge base’s categorical organization scheme
• Ranking technique compares entity’s potential meanings to user’s interests to determine intended meaning
  • Language and context independent
• Promising performance gains
• Going forward: such a strategy becomes increasingly necessary, feasible, and effective

Page 150: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text

Thank You!

Acknowledgements
• Claire Cardie, Dan Cosley, Lillian Lee, Sean Allen, Wenceslaus Lee
• National Science Foundation Graduate Research Fellowship under Grant No. DGE 1144153
• Marie Curie International Outgoing Fellowship within the 7th European Community Framework Programme (PIOF-GA-2009-252206)

Questions?

Elizabeth L. Murnane [email protected]
Bernhard Haslhofer bernhard.haslhofer@univie.ac.at
Carl Lagoze [email protected]