Entity Search: Building Bridges between Two Worlds
Krisztian Balog, Edgar Meij, and Maarten de Rijke
ISLA, University of Amsterdam
http://ilps.science.uva.nl
Entity search
• Information organized around entities
• Instead of finding documents about the entity, find the entity itself
• Problem looked at by both the Information Retrieval (IR) and the Semantic Web (SW) communities
Entity search tasks
• Entity ranking
• List completion
• Related entity finding
Motivation
• To what extent are IR and SW methods capable of answering information needs related to entity finding?
Where are we now?
• Information Retrieval
  • Identifying and ranking entities in large volumes of data
  • Mostly based on co-occurrences between terms and entities
  • Generated models are not always meaningful for human consumption
Where are we now?
• Semantic Web
  • Structured data, naturally organized around entities
  • Entity retrieval is as simple as running SPARQL queries?
  • Free-text querying is more appealing to (naive) end users
Related entity finding
• Given
  • Input entity E (name plus homepage)
  • Type T of the target entity (person, organization, or product)
  • Narrative R (describes nature of relation)
• Return homepages of related entities
Example topics
• (E) Source entity name: Medimmune, Inc.
• (E) Source entity URL: clueweb09-en0008-26-39300
• (T) Target type: Product
• (R) Narrative: Products of Medimmune, Inc.

• (E) Source entity name: Boeing 747
• (E) Source entity URL: clueweb09-en0005-75-02292
• (T) Target type: Organisation
• (R) Narrative: Airlines that currently use Boeing 747 planes.
Aim
• Compare IR and SW approaches on the related entity finding task
• Focusing on finding all relevant entities, but not on actually ranking them
Related entity finding: our variation
• TREC Entity 2009 topics (20)
• Map source entity to a Wikipedia page (17)
• Map target category to the most specific class within the DBPedia ontology
• Ground truth: Wikipedia pages from relevance assessments
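Because the aim is to find all relevant entities rather than to rank them, one way to score a method against the ground truth is a set-based comparison. The sketch below is only an illustration under that assumption: the slides do not state which measure was used, and the function names and page identifiers are hypothetical placeholders.

```python
def set_recall(returned, relevant):
    """Fraction of the relevant pages that appear among the returned pages."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(returned) & relevant) / len(relevant)


def set_precision(returned, relevant):
    """Fraction of the returned pages that are relevant."""
    returned = set(returned)
    if not returned:
        return 0.0
    return len(returned & set(relevant)) / len(returned)


# Hypothetical placeholders standing in for Wikipedia page identifiers.
ground_truth = ["Page_A", "Page_B", "Page_C", "Page_D"]
system_output = ["Page_A", "Page_B", "Page_X"]

recall = set_recall(system_output, ground_truth)
precision = set_precision(system_output, ground_truth)
```

Order is ignored on purpose: a method that returns every relevant page in any order gets full recall.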
Example topic
• (E) Source entity name: Boeing 747
• (E) Source entity URL: clueweb09-en0005-75-02292
• (T) Target type: Organisation
• (R) Narrative: Airlines that currently use Boeing 747 planes.

• Source entity: Boeing_747
• DBPedia-owl: Organisation/Company/Airline
• Relation: Airlines that currently use Boeing 747 planes.
IR approaches
• Aggregation of approaches employed at the TREC Entity track
• Various ways of recognizing and ranking entities
• Common to all is a mechanism for capturing the co-occurrence between source and target entities
A typical IR approach
• Query (input entity, relation)
• Document/snippet retrieval
• Answer candidate extraction
• Answer candidate (type) filtering
• Answer candidate ranking
• Output (related entities)
Two SW approaches
• SPARQL query
• Exhaustive graph search
  • Find all paths between E and T in a knowledge base
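The exhaustive graph search can be illustrated with a small path enumeration over a set of triples. The sketch reads "paths between E and T" as paths from the source entity E to any entity whose type is T, which is an assumption, as are the hop limit, the `ex:` predicates, and the toy entities. Edges are followed in both directions, with `^` marking an inverse step.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (subject, predicate, object) triples.
triples = [
    ("Airline_A", "ex:operates", "Widget_X"),
    ("Airline_B", "ex:operates", "Widget_X"),
    ("Person_C", "ex:designed", "Widget_X"),
    ("Airline_D", "ex:partnerOf", "Airline_A"),
]
# Hypothetical type assignments, kept apart from the relation triples.
types = {
    "Airline_A": "Airline",
    "Airline_B": "Airline",
    "Airline_D": "Airline",
    "Person_C": "Person",
}


def build_graph(kb):
    """Index triples by node, adding an inverse edge for every triple."""
    graph = defaultdict(list)
    for subj, pred, obj in kb:
        graph[subj].append((pred, obj))
        graph[obj].append((f"^{pred}", subj))
    return graph


def find_paths(graph, start, is_target, max_hops=2):
    """Enumerate all simple paths from start to target nodes within max_hops."""
    paths = []
    stack = [(start, [], {start})]
    while stack:
        node, path, seen = stack.pop()
        if path and is_target(node):
            paths.append(path)
        if len(path) < max_hops:
            for pred, nxt in graph.get(node, []):
                if nxt not in seen:  # never revisit a node on the same path
                    stack.append((nxt, path + [(pred, nxt)], seen | {nxt}))
    return paths


graph = build_graph(triples)
paths = find_paths(graph, "Widget_X", lambda n: types.get(n) == "Airline")
related = {path[-1][1] for path in paths}
```

Without a hop limit the number of paths grows quickly on a real knowledge base, so `max_hops` is the main knob in this sketch.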