Page 1: Link Sets And Why They Are Important (EDF2012)

Link Sets and Why They ARE Important

Anja Jentzsch, Freie Universität Berlin

6 June 2012

Realising and Exploiting the EU data cloud

European Data Forum, Copenhagen, Denmark

Page 2: Link Sets And Why They Are Important (EDF2012)

Outline

1. Motivation

2. Link Creation Process

3. LATC Platform

Page 3: Link Sets And Why They Are Important (EDF2012)

Links

• 4th Linked Data principle: set RDF links to other data sources on the Web

•  fundamental to the Web of Data

• connect data islands into a global, interconnected data space

• enable discovery of additional data sources

Page 4: Link Sets And Why They Are Important (EDF2012)

Links

• Definition: An external RDF link is an RDF triple in which the subject of the triple is a URI reference in the namespace of one data set, while the predicate and/or object of the triple are URI references pointing into the namespaces of other data sets.
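
The definition above can be turned into a small check. The following is a minimal sketch, assuming that a data set's namespace can be approximated by a URI prefix and that the object is a URI rather than a literal. The function name `is_external_link` and the example URIs are illustrative and not taken from the slides.

```python
def is_external_link(subject: str, predicate: str, obj: str, local_ns: str) -> bool:
    """Return True if the triple is an external RDF link for the data set at local_ns.

    Assumption: a namespace is approximated by a simple URI prefix.
    """
    subject_is_local = subject.startswith(local_ns)
    # The predicate and/or the object must point into another namespace.
    points_elsewhere = (not predicate.startswith(local_ns)
                        or not obj.startswith(local_ns))
    return subject_is_local and points_elsewhere


# Illustrative triples, seen from a data set published under http://dbpedia.org/
external = is_external_link(
    "http://dbpedia.org/resource/Berlin",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://sws.geonames.org/2950159/",
    "http://dbpedia.org/",
)
internal = is_external_link(
    "http://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/ontology/country",
    "http://dbpedia.org/resource/Germany",
    "http://dbpedia.org/",
)
```

Under this reading, a triple whose subject and object are both local still counts as external when its predicate comes from a vocabulary hosted elsewhere.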

Page 5: Link Sets And Why They Are Important (EDF2012)

Link Types

1.  Relationship Links point at related things in other data sources, for instance, other people, places or genes.

2.  Identity Links point at URI aliases used by other data sources to identify the same real-world object or abstract concept.

3.  Vocabulary Links point from data to the definitions of the vocabulary terms that are used to represent the data, as well as from these definitions to the definitions of related terms in other vocabularies.
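
As a rough illustration of the three link types, the sketch below classifies a link by its predicate. The mapping is an assumption made for this example: `owl:sameAs` is treated as the identity predicate, a handful of RDF, RDFS and OWL terms as vocabulary predicates, and everything else as a relationship link. Real data sets may use other predicates for each role.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed predicate sets, chosen for illustration only.
IDENTITY_PREDICATES = {OWL + "sameAs"}
VOCABULARY_PREDICATES = {
    RDF + "type",
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    OWL + "equivalentClass",
    OWL + "equivalentProperty",
}


def link_type(predicate: str) -> str:
    """Classify a link as 'identity', 'vocabulary' or 'relationship' by its predicate."""
    if predicate in IDENTITY_PREDICATES:
        return "identity"
    if predicate in VOCABULARY_PREDICATES:
        return "vocabulary"
    return "relationship"
```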

Page 6: Link Sets And Why They Are Important (EDF2012)

Motivation

•  Web of Data is a single global data space because data sources are connected by links

•  Over 30 billion triples published as Linked Open Data (as of 19 September 2011)

•  But:

•  Less than 500 million links

•  Most publishers only link to one other dataset

Figure: LOD data sets by the number of other data sources that are targets of their outgoing RDF links.

Page 7: Link Sets And Why They Are Important (EDF2012)

State of the LOD Cloud

http://lod-cloud.net/state

Page 8: Link Sets And Why They Are Important (EDF2012)

Challenges for Link Discovery

•  Large range of domains

•  277 data sources in the LOD cloud from a variety of domains

Link distribution by topical domain

Page 9: Link Sets And Why They Are Important (EDF2012)

Link Discovery Tools

•  Tools enable data publishers to set links

•  Most tools generate links based on user-defined linkage rules

•  A linkage rule specifies the conditions data items must fulfill in order to be interlinked

•  Popular Link Discovery Tools:

•  Silk Link Discovery Framework

•  LIMES

•  Others: http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
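
To make the idea of a linkage rule concrete, here is a minimal sketch in plain Python. It is not the rule language of Silk or LIMES. The entity fields `label` and `year`, the threshold of 0.9 and the use of `difflib` as the string similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def linkage_rule(source: dict, target: dict, threshold: float = 0.9) -> bool:
    """Two movies are interlinked if the release years match and the labels are similar."""
    same_year = source["year"] == target["year"]
    return same_year and label_similarity(source["label"], target["label"]) >= threshold


# Hypothetical entities from two data sources.
movie_a = {"label": "The Matrix", "year": 1999}
movie_b = {"label": "the matrix", "year": 1999}
movie_c = {"label": "The Matrix", "year": 2003}
```

Here `linkage_rule(movie_a, movie_b)` holds because both conditions are met, while `linkage_rule(movie_a, movie_c)` fails on the year condition.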

Page 10: Link Sets And Why They Are Important (EDF2012)

(Simplified) Linking Workflow

Select Datasets

•  Select two data sources

•  Select the entity types to be interlinked

Write Linkage Rule

•  Specifies how two entities are compared

•  Can be written manually or learned

Generate Links

•  Locally or on a Hadoop cluster

•  Write links to a file or a triple store
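
The three workflow steps can be sketched end to end in a few lines. This is a simplified local illustration under stated assumptions, not the behaviour of any particular tool: the data sets are in-memory lists, the URIs under example.org are hypothetical, the rule is an exact match on normalised labels, and links are emitted as `owl:sameAs` statements in N-Triples syntax.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Step 1: select two data sources and the entity type to interlink.
dataset_a = [
    {"uri": "http://example.org/a/matrix", "type": "Movie", "label": "The Matrix"},
    {"uri": "http://example.org/a/berlin", "type": "City", "label": "Berlin"},
]
dataset_b = [
    {"uri": "http://example.org/b/film/1", "type": "Movie", "label": "the matrix "},
    {"uri": "http://example.org/b/film/2", "type": "Movie", "label": "Alien"},
]


def select_entities(dataset, entity_type):
    """Keep only the entities of the requested type."""
    return [entity for entity in dataset if entity["type"] == entity_type]


# Step 2: write a linkage rule (here: normalised labels must be equal).
def rule(a, b):
    return a["label"].strip().lower() == b["label"].strip().lower()


# Step 3: generate links by comparing every source entity with every target entity.
def generate_links(sources, targets, rule):
    return [
        f'<{s["uri"]}> <{OWL_SAME_AS}> <{t["uri"]}> .'
        for s in sources
        for t in targets
        if rule(s, t)
    ]


links = generate_links(
    select_entities(dataset_a, "Movie"),
    select_entities(dataset_b, "Movie"),
    rule,
)
ntriples = "\n".join(links)  # could be written to a file or loaded into a triple store
```

Comparing every pair is quadratic in the number of entities, which is one reason large link generation jobs are run on a cluster rather than locally.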

Page 11: Link Sets And Why They Are Important (EDF2012)

Silk Workbench

• Web application which guides the user through the process of interlinking different data sources

• Enables the user to manage different sets of data sources and linking tasks

• Offers a graphical editor which enables the user to easily create and edit linkage rules

• Offers tools to evaluate the current linkage rule

•  Includes support for learning linkage rules

Page 12: Link Sets And Why They Are Important (EDF2012)

LATC Platform

Page 13: Link Sets And Why They Are Important (EDF2012)

LATC Workbench

•  Project in Workspace consists of:

•  Data Sources

•  Each holds all information needed to retrieve entities from the source

•  E.g. a file dump or a SPARQL endpoint

•  Linking Tasks

•  Each interlinks one type of entity between two data sources

•  E.g. interlinking movies in DBpedia and LinkedMDB
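
The structure of a project can be pictured as a small data model. The classes and field names below are hypothetical and do not reflect the workbench's actual storage format. The file name `linkedmdb-dump.nt` is likewise a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    name: str
    kind: str      # assumed values: "sparql" for an endpoint, "file" for a dump
    location: str  # endpoint URL or file path


@dataclass
class LinkingTask:
    name: str
    source: DataSource
    target: DataSource
    entity_type: str  # the type of entity to interlink


@dataclass
class Project:
    name: str
    data_sources: List[DataSource] = field(default_factory=list)
    linking_tasks: List[LinkingTask] = field(default_factory=list)


dbpedia = DataSource("DBpedia", "sparql", "http://dbpedia.org/sparql")
linkedmdb = DataSource("LinkedMDB", "file", "linkedmdb-dump.nt")  # placeholder path

project = Project(
    name="movies",
    data_sources=[dbpedia, linkedmdb],
    linking_tasks=[LinkingTask("movies", dbpedia, linkedmdb, "Movie")],
)
```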

Page 14: Link Sets And Why They Are Important (EDF2012)

LATC Linkage Rule Editor

•  Allows viewing and editing of linkage rules

•  Linkage Rules are shown as a tree

•  Editing using drag & drop
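
A linkage rule shown as a tree can be modelled with two node kinds: leaves that compare one property of both entities, and inner nodes that combine the scores of their children. The sketch below is an assumption about how such a tree might be represented. The class names, the `equality` measure and the use of `min` as the combining function are chosen for illustration and are not the editor's internal model.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Comparison:
    """Leaf node: compares one property of two entities, yielding a score in [0, 1]."""
    prop: str
    measure: Callable[[str, str], float]

    def score(self, a: dict, b: dict) -> float:
        return self.measure(a[self.prop], b[self.prop])


@dataclass
class Aggregation:
    """Inner node: combines the scores of its child nodes."""
    combine: Callable[[List[float]], float]
    children: List[Union["Aggregation", Comparison]]

    def score(self, a: dict, b: dict) -> float:
        return self.combine([child.score(a, b) for child in self.children])


def equality(x: str, y: str) -> float:
    return 1.0 if x == y else 0.0


# Tree: min( equality(label), equality(year) ), so both comparisons must succeed.
rule = Aggregation(min, [Comparison("label", equality), Comparison("year", equality)])

same = rule.score({"label": "Alien", "year": "1979"}, {"label": "Alien", "year": "1979"})
different = rule.score({"label": "Alien", "year": "1979"}, {"label": "Alien", "year": "1986"})
```

Because every node exposes the same `score` method, subtrees can be moved or nested freely, which is what makes a drag and drop editor a natural fit for this representation.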

Page 15: Link Sets And Why They Are Important (EDF2012)

Learning Linkage Rules

•  Linkage Rules can be learned interactively

•  Can be used to generate new linkage rules or to improve existing rules

•  Learned linkage rules can be viewed and edited by the user
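
The slides do not describe the learning algorithm itself, so the following is only a deliberately simple stand-in that shows the general idea of learning from confirmed and rejected links: given similarity scores for pairs a user has labelled, pick the threshold that classifies the most pairs correctly. The function name, the candidate thresholds and the sample data are all assumptions.

```python
def learn_threshold(labelled_pairs, candidates=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Pick the candidate threshold that classifies the most labelled pairs correctly.

    labelled_pairs: list of (similarity_score, is_match) tuples, where is_match
    is the user's judgement on whether the pair should be linked.
    """
    def correct(threshold):
        return sum((score >= threshold) == is_match
                   for score, is_match in labelled_pairs)

    # max() returns the first candidate with the highest count in case of ties.
    return max(candidates, key=correct)


# Hypothetical user feedback: two confirmed links and two rejected ones.
feedback = [(0.95, True), (0.85, True), (0.75, False), (0.40, False)]
best = learn_threshold(feedback)
```

With this feedback only the threshold 0.8 separates the confirmed links from the rejected ones, so it is the one selected. The result remains an ordinary rule parameter that a user could inspect and adjust by hand.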

Page 16: Link Sets And Why They Are Important (EDF2012)

Example Linking with LATC Workbench

Page 24: Link Sets And Why They Are Important (EDF2012)

LATC Console

Page 25: Link Sets And Why They Are Important (EDF2012)

LATC Quality Assurance Module

Page 26: Link Sets And Why They Are Important (EDF2012)

References

LATC Project: http://latc-project.eu/

LATC Platform: http://latc-project.eu/platform/

Silk Link Discovery Framework: http://www4.wiwiss.fu-berlin.de/silk/