Motivation - Krisztian Balog

Entity Search: Building Bridges between Two Worlds

Krisztian Balog, Edgar Meij, and Maarten de Rijke
ISLA, University of Amsterdam
http://ilps.science.uva.nl

Entity search

• Information organized around entities

• Instead of finding documents about the entity, find the entity itself

• Problem looked at by both the Information Retrieval (IR) and the Semantic Web (SW) communities

Entity search tasks

• Entity ranking

• List completion

• Related entity finding

Motivation

• To what extent are IR and SW methods capable of answering information needs related to entity finding?

Where are we now?

• Information Retrieval

  • Identifying and ranking entities in large volumes of data

  • Mostly based on co-occurrences between terms and entities

  • Generated models are not always meaningful for human consumption

Where are we now?

• Semantic Web

  • Structured data, naturally organized around entities

  • Entity retrieval is as simple as running SPARQL queries?

  • Free-text querying is more appealing to (naive) end users

Related entity finding

• Given

  • Input entity E (name plus homepage)

  • Type T of the target entity (person, organization, or product)

  • Narrative R (describes nature of relation)

• Return homepages of related entities
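
As a rough illustration, a topic for this task can be held in a small record with one field per input (E, T, R). This is a minimal sketch only: the class name `Topic`, its field names, and the `is_valid` helper are hypothetical and not part of the TREC topic format. The sample values are those of the Medimmune example topic in this deck.

```python
from dataclasses import dataclass

# Hypothetical container for one related-entity-finding topic.
# Field names are illustrative; they are not the official TREC format.
@dataclass(frozen=True)
class Topic:
    entity_name: str   # (E) source entity name
    entity_url: str    # (E) source entity homepage / document id
    target_type: str   # (T) type of the entities to return
    narrative: str     # (R) free-text description of the relation

# The three target types named in the task definition.
ALLOWED_TYPES = {"person", "organization", "product"}

def is_valid(topic: Topic) -> bool:
    """Return True if the target type is one of the three allowed types."""
    return topic.target_type.lower() in ALLOWED_TYPES

topic = Topic(
    entity_name="Medimmune, Inc.",
    entity_url="clueweb09-en0008-26-39300",
    target_type="Product",
    narrative="Products of Medimmune, Inc.",
)
```

The expected output of a system is then a list of homepages, one per related entity of the requested type.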

Example topics

(E) Source entity name: Medimmune, Inc.
(E) Source entity URL: clueweb09-en0008-26-39300
(T) Target type: Product
(R) Narrative: Products of Medimmune, Inc.

(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative: Airlines that currently use Boeing 747 planes.

Aim

• Compare IR and SW approaches on the related entity finding task

• Focusing on finding all relevant entities, but not on actually ranking them

Related entity finding: Our variation

• TREC Entity 2009 topics (20)

• Map source entity to a Wikipedia page (17)

• Map target category to the most specific class within the DBPedia ontology

• Ground truth: Wikipedia pages from relevance assessments
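
Because the aim is to find all relevant entities rather than to rank them, a set-based comparison against the ground truth is one natural way to score a run. The sketch below computes recall over sets of Wikipedia page identifiers. It is an assumption for illustration: the slides do not state which measure was used, and the page names are placeholders.

```python
def recall(found, relevant):
    """Fraction of ground-truth pages that appear among the found pages."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # no ground truth for this topic
    return len(set(found) & relevant) / len(relevant)

# Placeholder page identifiers, not real assessment data.
found_pages = {"Page_A", "Page_B", "Page_X"}
ground_truth = {"Page_A", "Page_B", "Page_C", "Page_D"}
score = recall(found_pages, ground_truth)
```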

Example topic

(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative:

Source entity: Boeing_747
DBPedia-owl: Organisation/Company/Airline
Relation:

IR approaches

• Aggregation of approaches employed at the TREC Entity track

• Various ways of recognizing and ranking entities

• Common to all is a mechanism for capturing the co-occurrence between source and target entities

A typical IR approach

Query (input entity, relation)

Document/snippet retrieval

Answer candidate extraction

Answer candidate (type) filtering

Answer candidate ranking

Output (related entities)
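
The stages above can be sketched in a few lines. This is a toy illustration under stated assumptions, not any TREC participant's system: retrieval is plain substring matching, candidates come from a given entity dictionary (`entity_types`, hypothetical), and ranking is by the number of retrieved snippets in which a candidate co-occurs with the source entity. The documents and the names "Airline A", "Airline B", and "City C" are invented placeholders.

```python
from collections import Counter

def retrieve(query_terms, documents):
    """Document/snippet retrieval: keep documents containing every query term."""
    return [d for d in documents
            if all(t.lower() in d.lower() for t in query_terms)]

def extract_candidates(snippets, known_entities):
    """Candidate extraction: count the snippets that mention each known entity."""
    counts = Counter()
    for snippet in snippets:
        for entity in known_entities:
            if entity.lower() in snippet.lower():
                counts[entity] += 1
    return counts

def related_entities(source, relation_terms, target_type, documents, entity_types):
    """Run retrieval, extraction, type filtering, and co-occurrence ranking."""
    snippets = retrieve([source] + relation_terms, documents)
    counts = extract_candidates(snippets, entity_types)
    # Type filtering: keep candidates of the target type, excluding the source.
    filtered = {e: c for e, c in counts.items()
                if entity_types[e] == target_type and e != source}
    # Ranking: most co-occurrences first, ties broken alphabetically.
    return sorted(filtered, key=lambda e: (-filtered[e], e))

# Invented toy collection and entity dictionary.
documents = [
    "Airline A flies the Boeing 747.",
    "Airline B flies the Boeing 747 on routes shared with Airline A.",
    "City C airport: Airline B flies here.",
    "City C welcomes every airline that flies the Boeing 747.",
]
entity_types = {
    "Airline A": "organization",
    "Airline B": "organization",
    "City C": "location",
    "Boeing 747": "product",
}
ranked = related_entities("Boeing 747", ["flies"], "organization",
                          documents, entity_types)
```

Note that the output says which entities co-occur with the source, but nothing about why they are related.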

Two SW approaches

• SPARQL query

• Exhaustive graph search

  • Find all paths between E and T in a knowledge base

  • The depth of search is limited

SELECT DISTINCT ?m ?r
WHERE {
  ?m rdf:type dbpedia-owl:Drug .
  { ?m ?r dbpedia:MedImmune }
  UNION
  { dbpedia:MedImmune ?r ?m }
}
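
The exhaustive graph search listed above can be illustrated with a depth-limited path enumeration over a set of triples. This is a minimal sketch, assuming an in-memory triple list and that edges may be followed in both directions; the function name `find_paths`, the `ex:` identifiers, and the toy triples are all hypothetical, and the slides do not specify how the actual search was implemented.

```python
def find_paths(triples, source, targets, max_depth=2):
    """Enumerate simple paths from source to any target, up to max_depth hops.

    Each path is a list of (predicate, node) steps; a predicate prefixed
    with "^" means the triple was traversed from object to subject.
    """
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append(("^" + p, s))

    paths = []

    def walk(node, path, visited):
        if path and node in targets:
            paths.append(list(path))
        if len(path) == max_depth:
            return  # depth limit reached
        for pred, nxt in neighbours.get(node, []):
            if nxt not in visited:  # avoid cycles
                path.append((pred, nxt))
                walk(nxt, path, visited | {nxt})
                path.pop()

    walk(source, [], {source})
    return paths

# Invented toy knowledge base; targets play the role of entities of type T.
triples = [
    ("ex:MedImmune", "ex:develops", "ex:DrugA"),
    ("ex:DrugB", "ex:developedBy", "ex:MedImmune"),
    ("ex:MedImmune", "ex:locatedIn", "ex:CityX"),
    ("ex:CityX", "ex:hosts", "ex:DrugC"),
]
drugs = {"ex:DrugA", "ex:DrugB", "ex:DrugC"}
direct = find_paths(triples, "ex:MedImmune", drugs, max_depth=1)
two_hop = find_paths(triples, "ex:MedImmune", drugs, max_depth=2)
```

With a depth of 1 this finds the same direct links as the SPARQL query; raising the depth adds entities reached through intermediate nodes, which may be only loosely related.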

SPARQL on DBPedia

Query: Products of Medimmune, Inc.

?m                      ?r
dbpedia:Amifostine      dbp-prop:wikilink
dbpedia:Blinatumomab    dbp-prop:wikilink
dbpedia:Motavizumab     dbp-prop:wikilink
dbpedia:Palivizumab     dbp-prop:wikilink

SPARQL on DBPedia

Query: Airlines that Air Canada has code share flights with.

?m                           ?r
dbpedia:Air_Canada           dbp-prop:wikilink
dbpedia:Austrian_Airlines    dbp-prop:wikilink
dbpedia:Japan_Airlines       dbp-prop:wikilink
dbpedia:Lufthansa            dbp-prop:wikilink
dbpedia:Turkish_Airlines     dbp-prop:wikilink
......
dbpedia:Air_Ontario          dbp-ontology:Company/parentCompany
dbpedia:Air_Canada_Tango     dbp-ontology:Company/parentCompany
dbpedia:Canadian_Airlines    dbp-ontology:foundationPerson

SPARQL on DBPedia

Query: Members of the band Jefferson Airplane.

?m                        ?r
dbpedia:Jim_Morrison      dbp-prop:wikilink
dbpedia:Jimi_Hendrix      dbp-prop:wikilink
......
dbpedia:Jack_Casady       dbp-ontology:associatedMusicalArtist
dbpedia:Paul_Kantner      dbp-ontology:associatedMusicalArtist
dbpedia:Joey_Covington    dbp-ontology:associatedMusicalArtist
dbpedia:Marty_Balin       dbp-ontology:associatedMusicalArtist
......
dbpedia:Grace_Slick       dbp-prop:pastMembers
dbpedia:Jorma_Kaukonen    dbp-prop:pastMembers
......

Findings

• IR and SW methods find basically the same set of entities

• Most relations returned by SW methods are of type wikilink
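
One simple way to check a claim like the second finding is to tally the relation types in a result set. The sketch below does this for the rows shown on the Air Canada slide. It covers only the rows that are visible in this deck (elided rows are not included), so the counts are illustrative and not the actual distribution.

```python
from collections import Counter

# (?m, ?r) rows as shown on the Air Canada slide; elided rows are omitted.
rows = [
    ("dbpedia:Air_Canada", "dbp-prop:wikilink"),
    ("dbpedia:Austrian_Airlines", "dbp-prop:wikilink"),
    ("dbpedia:Japan_Airlines", "dbp-prop:wikilink"),
    ("dbpedia:Lufthansa", "dbp-prop:wikilink"),
    ("dbpedia:Turkish_Airlines", "dbp-prop:wikilink"),
    ("dbpedia:Air_Ontario", "dbp-ontology:Company/parentCompany"),
    ("dbpedia:Air_Canada_Tango", "dbp-ontology:Company/parentCompany"),
    ("dbpedia:Canadian_Airlines", "dbp-ontology:foundationPerson"),
]

# Count how often each relation type occurs.
relation_counts = Counter(relation for _, relation in rows)
top_relation, top_count = relation_counts.most_common(1)[0]
```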

Next

• Extend search to Linked Open Data (LOD)

• We use the Linked Data Semantic Repository (LDSR)

SPARQL on LOD

Query: Products of Medimmune, Inc.

?m                      ?r
dbpedia:Amifostine      dbp-prop:wikilink
dbpedia:Blinatumomab    dbp-prop:wikilink
dbpedia:Motavizumab     dbp-prop:wikilink
dbpedia:Palivizumab     dbp-prop:wikilink
dbpedia:Motavizumab     fb:base.bioventurist.product.developed_by
dbpedia:Palivizumab     fb:base.bioventurist.science_or_technology_company.products

Graph search on LOD Findings

• More entities as well as more diverse relations

• Having more data does not automatically improve results

• Some of the identified entities are now too general

Summarizing findings

• Information Retrieval

  • Excellent ways of finding associations between topics and entities

  • Tend to perform better for less popular entities (not represented in LOD)

  • Missing: semantics of the found associations

Summarizing findings

• Semantic Web

  • Has the potential of generating a large number of candidate entities and relations

  • Could be as simple as instantiating a SPARQL query

  • For many queries LOD is very sparse w.r.t. semantically meaningful links between entities
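
"Instantiating a SPARQL query" can be read as filling a fixed template with the source entity and the target class. The sketch below follows the query pattern shown earlier in this deck. The function name `build_query` is hypothetical, and the prefixes (`rdf:`, `dbpedia:`, `dbpedia-owl:`) are assumed to be predefined by the endpoint.

```python
def build_query(source_uri: str, target_class: str) -> str:
    """Fill the two-direction relation template for one topic."""
    return (
        "SELECT DISTINCT ?m ?r\n"
        "WHERE {\n"
        f"  ?m rdf:type {target_class} .\n"
        # Match the source entity as either object or subject of ?r.
        f"  {{ ?m ?r {source_uri} }} UNION {{ {source_uri} ?r ?m }}\n"
        "}"
    )

query = build_query("dbpedia:MedImmune", "dbpedia-owl:Drug")
```

The template only needs the mapped source entity and target class, which is why the mapping step matters: a wrong class yields no candidates at all.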

Zooming out

• Enhance text-based models with semantic information from LOD

• Use IR models to discover and label links between entities in LOD

TREC Entity 2010

• Main task: Related entity finding

• Pilot task: List completion

  • Given URIs of related entities, complete the list with additional entities from LOD

Questions?

Krisztian Balog
http://staff.science.uva.nl/~kbalog
