An Introduction to Linked (Open) Data
Ali Khalili, PhD
Knowledge Representation & Reasoning Group
VU University Amsterdam
Why do we need the Data Web?
Linked Open Data 2Ali Khalili
Linked Open Data
Problem: try searching the current Web for the following:
• Apartments near Dutch-English bilingual childcare in Amsterdam
[Figure: two Web servers, each with its own database (DB) but publishing only HTML: funda.nl, which knows about real estate offers in the Netherlands, and kindergarden.nl, which has data about childcare in Amsterdam. The final version of the figure adds RDF alongside HTML.]
Why do we need the Data Web? (cont.)
• Researchers working on Semantic Web topics in the Middle East.
• Side effects of drugs with a specific chemical compound that are prescribed for certain diseases.
• Who are the mayors of central European towns located above 1,000 m?
• Which movies star both Brad Pitt and Angelina Jolie?
• All soccer players who played as goalkeeper for a club with a stadium of more than 40,000 seats and who were born in a country with more than 10 million inhabitants.
• …
Information is available on the Web, but opaque to current search.
Evolution of the Web
https://mcgratha.wordpress.com/
1998: Semantic Web Road map
2006: emergence of the Data Web
Linked Data
• A set of best practices for publishing data on the Web.
• Follows four simple principles:
1. Use URIs as names (identifiers) for conceptual things.
2. Use HTTP URIs so that users can look up (dereference) those names.
3. When someone looks up a URI, provide useful information using open standards such as RDF and SPARQL.
4. Include links to other URIs, so that users can discover more things.
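A minimal Python sketch of principles 2 and 3, using only the standard library: it checks that a name is an HTTP(S) URI and builds (but does not send) a request that asks for RDF through content negotiation. The helper names and the choice of `text/turtle` are assumptions for illustration; whether a server honours the `Accept` header depends on that server.

```python
from urllib.parse import urlparse
from urllib.request import Request


def is_dereferenceable(uri: str) -> bool:
    """Principle 2: only HTTP(S) URIs can be looked up on the Web."""
    return urlparse(uri).scheme in ("http", "https")


def make_lookup_request(uri: str, rdf_format: str = "text/turtle") -> Request:
    """Principle 3: ask for machine-readable RDF through content negotiation.

    The request is only built here; urllib.request.urlopen(req) would send it.
    """
    if not is_dereferenceable(uri):
        raise ValueError(f"not an HTTP(S) URI: {uri}")
    return Request(uri, headers={"Accept": rdf_format})


req = make_lookup_request("http://dbpedia.org/resource/VU_University_Amsterdam")
print(req.full_url, req.get_header("Accept"))
```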
Linked Data Principles
[Figure: WWW vs. World]
5 ★ Open Data
★ Make your stuff available on the Web (whatever format) under an open license.
★★ Make it available as structured data (e.g., Excel instead of an image scan of a table).
★★★ Make it available in a non-proprietary open format (e.g., CSV as well as Excel).
★★★★ Use Linked Data formats (URIs to identify things, RDF to represent data).
★★★★★ Link your data to other people’s data to provide context.
http://5stardata.info/
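The five steps are cumulative, so a rating can be sketched as counting how many steps are met in order. The `Dataset` fields below are hypothetical and only mirror the wording of the list.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical description of a published dataset, one flag per star.
    open_license: bool   # on the Web under an open license
    structured: bool     # structured data, e.g. a spreadsheet
    open_format: bool    # non-proprietary format, e.g. CSV
    uris_and_rdf: bool   # URIs identify things, RDF represents data
    linked: bool         # links to other people's data


def star_rating(ds: Dataset) -> int:
    """Count the stars earned; each star requires all previous ones."""
    stars = 0
    for met in (ds.open_license, ds.structured, ds.open_format,
                ds.uris_and_rdf, ds.linked):
        if not met:
            break
        stars += 1
    return stars


csv_dump = Dataset(open_license=True, structured=True, open_format=True,
                   uris_and_rdf=False, linked=False)
print(star_rating(csv_dump))  # 3
```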
Linked Open Data Cloud
http://lod-cloud.net/
Statistics
http://lodlaundromat.org/
http://stats.lod2.eu/
more than 3426 datasets
Hands-on
[Figure: ranking of universities? Amount of funding acquired by universities?]
Make a sorted list of all universities in the Netherlands, showing their recent rankings (e.g. CWTS Leiden Ranking or Times World University Rankings) together with the amount of EC contributions they received for H2020 projects.
Linked (Open) Data Lifecycle
[Figure: a cycle of lifecycle stages: Exploration, Extraction, Storage/Querying, Authoring, Interlinking, Enrichment, Quality Analysis, Evolution]
http://stack.linkeddata.org/
Exploration
• Search
• Browse
• Visualize
Search for Linked Data
http://swoogle.umbc.edu/
Search in Linked Data
http://www.wolframalpha.com/
Question Answering Systems
Search for Linked Data
http://lov.okfn.org/
http://lotus.lodlaundromat.org
http://demo.ld-r.org/lotus
DataHub http://datahub.io: search for data, register published datasets, create and manage groups of datasets…
Search for/in Linked Data
https://www.google.com/cse/
http://schema.org/
http://bl.ocks.org/danbri/1c121ea8bd2189cf411c
Browsing Linked Data
Loupe http://loupe.linkeddata.es/: discover the type of data contained in a dataset, its structure, and the vocabularies used…
Linked Data Reactor (LD-R) http://ld-r.org: a component-based framework to view, browse and edit Linked Data
Exhibit http://www.simile-widgets.org/exhibit/
Visualizing Linked Data
Using existing visualizations for structured data, e.g. Sgvizler http://dev.data2000.no/sgvizler/
Using graph-based visualizations, e.g. RelFinder http://www.visualdataweb.org/relfinder/relfinder.php
Hands-on
Explore the Web of Data for the following data sources:
1. List of all universities in the Netherlands
2. University rankings (e.g. CWTS Leiden Ranking or Times World University Rankings)
3. H2020 EU projects repository
Extraction
from Unstructured sources
…After leaving Apple, Jobs took a few of its members with him to found NeXT, a computer platform development company based in Redwood City, specializing in state-of-the-art computers for higher-education and business markets. In addition, Jobs helped to initiate the development of the visual effects industry when he funded the spinout of the computer graphics division of George Lucas's company Lucasfilm in 1986. The new company, Pixar, would eventually produce the first fully computer-animated film, Toy Story…
[Figure: the text is processed with NLP, text mining and annotation techniques. Named Entity Recognition marks the entities mentioned in it, and Relation Extraction derives typed relations between them, e.g. foundedBy.]
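Real NER and relation-extraction systems rely on statistical models; the Python sketch below swaps in the simplest possible stand-ins (a hand-written gazetteer for NER and a single lexical trigger, "to found", for relation extraction) to show how the text ends up as a foundedBy triple. The gazetteer entries and the use of `dbo:foundedBy` are illustrative assumptions.

```python
import re

DBR = "http://dbpedia.org/resource/"
FOUNDED_BY = "http://dbpedia.org/ontology/foundedBy"

# Hypothetical gazetteer: surface form -> (entity URI, entity type).
GAZETTEER = {
    "Apple": (DBR + "Apple_Inc.", "Organisation"),
    "Jobs": (DBR + "Steve_Jobs", "Person"),
    "NeXT": (DBR + "NeXT", "Organisation"),
    "Pixar": (DBR + "Pixar", "Organisation"),
}


def recognize(text):
    """Dictionary-based NER: (offset, surface form, URI, type), sorted by offset."""
    mentions = []
    for surface, (uri, etype) in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            mentions.append((m.start(), surface, uri, etype))
    return sorted(mentions)


def extract_founded_by(text):
    """Pattern-based relation extraction: <Person> ... 'to found' <Organisation>."""
    mentions = recognize(text)
    triples = []
    for trigger in re.finditer(r"\bto found\b", text):
        persons = [e for e in mentions if e[3] == "Person" and e[0] < trigger.start()]
        orgs = [e for e in mentions if e[3] == "Organisation" and e[0] >= trigger.end()]
        if persons and orgs:
            # Nearest person before the trigger, nearest organisation after it.
            triples.append((orgs[0][2], FOUNDED_BY, persons[-1][2]))
    return triples


text = ("After leaving Apple, Jobs took a few of its members with him "
        "to found NeXT, a computer platform development company")
print(extract_founded_by(text))
```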
Named Entity Recognition
http://spotlight.dbpedia.org
http://bioportal.bioontology.org/annotator
Visualizing Linked Data
Contextual visualizations, e.g. Zemanta https://wordpress.org/plugins/zemanta
NLP Interchange Format (NIF) http://nlp2rdf.org
Interoperability between NLP tools, language resources and annotations
Named Entity Recognition & Disambiguation (NERD) http://nerd.eurecom.fr/
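NIF identifies each annotated string by its character offsets in the source text. The sketch below is loosely modelled on the NIF 2.0 core vocabulary and its `#char=begin,end` fragment scheme; treat the exact property set as an assumption and check it against the NIF specification before relying on it.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD_NNI = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def n3_literal(value: str) -> str:
    """Quote a plain string literal for N-Triples (escapes backslash, quote, newline)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def nif_annotation(doc_uri, text, surface, entity_uri):
    """N-Triples linking the first occurrence of `surface` in `text` to an entity."""
    begin = text.index(surface)          # raises ValueError if absent
    end = begin + len(surface)
    context = f"<{doc_uri}#char=0,{len(text)}>"
    mention = f"<{doc_uri}#char={begin},{end}>"
    return [
        f"{context} <{NIF}isString> {n3_literal(text)} .",
        f"{mention} <{NIF}referenceContext> {context} .",
        f"{mention} <{NIF}anchorOf> {n3_literal(surface)} .",
        f'{mention} <{NIF}beginIndex> "{begin}"^^<{XSD_NNI}> .',
        f'{mention} <{NIF}endIndex> "{end}"^^<{XSD_NNI}> .',
        f"{mention} <{ITSRDF}taIdentRef> <{entity_uri}> .",
    ]


for line in nif_annotation("http://example.org/doc", "Jobs founded NeXT.",
                           "NeXT", "http://dbpedia.org/resource/NeXT"):
    print(line)
```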
from Semi-structured sources
[Figure: a Wikipedia infobox mapped to Resource, Property and Value]
DBpedia
• Multilingual: 38.3M things in 125 languages; 23.8M of them are localized descriptions of things that are also described in the English DBpedia; the English DBpedia describes 4.58M things (68,091,260 statements).
• Multi-domain: 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.
Extraction from semi-structured sources:
• Ad hoc: DBpedia extraction framework
• Generic: OpenRefine
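Ad hoc extractors such as the DBpedia extraction framework turn infobox entries into triples. This is a much-simplified Python sketch, assuming raw `| key = value` infobox lines and mapping keys to the raw `http://dbpedia.org/property/` namespace. The real framework also handles nested templates, units and ontology mappings, all omitted here, and the sample infobox is hypothetical.

```python
import re

DBR = "http://dbpedia.org/resource/"
DBP = "http://dbpedia.org/property/"    # raw infobox property namespace
FIELD = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
LINK = re.compile(r"^\[\[([^|\]]+)(?:\|[^\]]*)?\]\]$")

# Hypothetical sample infobox in MediaWiki markup.
INFOBOX = """{{Infobox university
| name        = VU University Amsterdam
| city        = [[Amsterdam]]
| established = 1880
}}"""


def resource(title):
    return DBR + title.strip().replace(" ", "_")


def extract_infobox(page_title, wikitext):
    """Turn each non-empty `| key = value` line into a (resource, property, value) triple."""
    subject = resource(page_title)
    triples = []
    for line in wikitext.splitlines():
        field = FIELD.match(line)
        if not field or not field.group(2):
            continue
        key, value = field.groups()
        link = LINK.match(value)
        obj = resource(link.group(1)) if link else value   # wiki link becomes a resource URI
        triples.append((subject, DBP + key.replace(" ", "_"), obj))
    return triples


for triple in extract_infobox("VU University Amsterdam", INFOBOX):
    print(triple)
```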
from Structured sources
Triplification by Materialization
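With materialization, all source data is converted to RDF up front and the resulting triples are loaded into a triple store. The sketch below is a minimal example that turns CSV rows into N-Triples; the `http://example.org/` namespace and the column-to-property choices are hypothetical.

```python
import csv
import io

EX = "http://example.org/"                       # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

CSV_DATA = """id,name,city
vu,VU University Amsterdam,Amsterdam
tud,Delft University of Technology,Delft
"""


def literal(value):
    """N-Triples string literal with backslashes and quotes escaped."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def materialize(csv_text):
    """Convert every row to N-Triples in one pass, ready to bulk-load into a triple store."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}university/{row['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}University> .")
        lines.append(f"{subject} <{EX}name> {literal(row['name'])} .")
        lines.append(f"{subject} <{EX}city> {literal(row['city'])} .")
    return "\n".join(lines)


print(materialize(CSV_DATA))
```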
Triplification by SPARQL-to-SQL Rewriting
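With SPARQL-to-SQL rewriting, the data stays in the relational database and each query is translated to SQL on demand. The sketch below handles only a single triple pattern `?s <predicate> ?o` over an in-memory SQLite table; the mapping dictionary and IRIs are hypothetical, and real rewriters (e.g. Sparqlify, D2R Server) also translate joins, filters and much more.

```python
import sqlite3

EX = "http://example.org/"                      # hypothetical namespace
# Hypothetical mapping: predicate IRI -> (table, key column, value column).
MAPPING = {
    EX + "name": ("university", "id", "name"),
    EX + "city": ("university", "id", "city"),
}


def rewrite(predicate):
    """Translate the triple pattern `?s <predicate> ?o` into SQL.

    Table and column names come from the trusted MAPPING, never from user input.
    """
    table, key, column = MAPPING[predicate]
    return f"SELECT {key}, {column} FROM {table} WHERE {column} IS NOT NULL ORDER BY {key}"


def answer(conn, predicate):
    """Evaluate the pattern on demand; rows become triples only in the result."""
    table = MAPPING[predicate][0]
    return [(f"{EX}{table}/{key}", predicate, value)
            for key, value in conn.execute(rewrite(predicate))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE university (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO university VALUES (?, ?, ?)", [
    ("vu", "VU University Amsterdam", "Amsterdam"),
    ("tud", "Delft University of Technology", "Delft"),
])
print(rewrite(EX + "city"))
print(answer(conn, EX + "city"))
```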
Triplification
• Relational database to RDF: R2RML, the RDB to RDF Mapping Language http://www.w3.org/TR/r2rml/
• D2R Server: accessing databases with SPARQL and as Linked Data http://d2rq.org/
• Sparqlify: defining RDF views on relational databases http://sparqlify.org/
Hands-on
Triplify the following data sources:
1. University rankings (e.g. CWTS Leiden Ranking or Times World University Rankings)
2. H2020 EU projects repository
Storage & Querying
Relational Databases vs. Triple Stores
• A relational database’s (e.g. MySQL, PostgreSQL, Oracle) natural representation is a collection of interlinked tables.
• A triple store’s (e.g. Sesame, AllegroGraph) natural representation is a multi-relational network, or graph; graph databases such as Neo4j share this graph-shaped model.
• Relational databases tend not to offer public access points (some provide Web APIs).
• Triple stores commonly expose public access points called SPARQL endpoints.
• Relational database users tend not to publish their schemas.
• Triple store users tend to reuse and extend public schemas called ontologies.
Existing Triple Stores
• Native triple stores: 4Store, AllegroGraph, BigData, Jena TDB, Sesame, Stardog, OWLIM and uRiKa
• RDBMS-backed triple stores: Jena SDB, IBM DB2 and OpenLink Virtuoso
• NoSQL triple stores: CumulusRDF
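At their core, triple stores keep triples and answer triple patterns in which some positions are variables. The following toy in-memory version in Python keeps one index per triple position; production stores add persistence, more elaborate index orders and query optimization.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory triple store: a set of triples plus one index per position."""

    def __init__(self):
        self.triples = set()
        # index[0] by subject, index[1] by predicate, index[2] by object
        self.index = [defaultdict(set) for _ in range(3)]

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        for position, term in enumerate(triple):
            self.index[position][term].add(triple)

    def match(self, s=None, p=None, o=None):
        """Triples matching a pattern, sorted; None acts as a variable."""
        candidates = self.triples
        for position, term in enumerate((s, p, o)):
            if term is not None:
                candidates = candidates & self.index[position].get(term, set())
        return sorted(candidates)


store = TripleStore()
store.add("ex:vu", "rdf:type", "ex:University")
store.add("ex:vu", "ex:city", "ex:Amsterdam")
store.add("ex:uva", "ex:city", "ex:Amsterdam")
print(store.match(p="ex:city", o="ex:Amsterdam"))
```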
SPARQL – SQL for Linked Data
What can be done with SPARQL that can't be done with SQL?
• SPARQL queries are considerably better aligned with users’ mental models of a domain.
• SPARQL allows the conceptual data model to be explored through queries, because the schema is stored as triples alongside the data, e.g.:
- example:workPhone rdfs:subPropertyOf example:phone
- example:cellPhone rdfs:subPropertyOf example:phone
- example:homePhone rdfs:subPropertyOf example:phone
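In SPARQL 1.1 the phone example can be queried with a property path, `?p rdfs:subPropertyOf* example:phone`, so new kinds of phone numbers are picked up without changing the query. The Python sketch below mimics that evaluation on a small hypothetical graph (the person and the numbers are placeholders).

```python
SUB = "rdfs:subPropertyOf"

# Hypothetical data: the schema triples above plus two placeholder numbers.
TRIPLES = [
    ("example:workPhone", SUB, "example:phone"),
    ("example:cellPhone", SUB, "example:phone"),
    ("example:homePhone", SUB, "example:phone"),
    ("example:alice", "example:workPhone", "555-0101"),
    ("example:alice", "example:cellPhone", "555-0102"),
]


def subproperties(prop, triples):
    """Properties p with p rdfs:subPropertyOf* prop (reflexive and transitive)."""
    found, frontier = {prop}, [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUB and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def phones_of(person, triples):
    # For person = example:alice this mirrors:
    # SELECT ?p ?number WHERE { ?p rdfs:subPropertyOf* example:phone .
    #                           example:alice ?p ?number }
    props = subproperties("example:phone", triples)
    return sorted((p, o) for s, p, o in triples if s == person and p in props)


print(phones_of("example:alice", TRIPLES))
# [('example:cellPhone', '555-0102'), ('example:workPhone', '555-0101')]
```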
• Queries that have to traverse a chain of connections are particularly complex in SQL, where each step typically needs another join, but simple in SPARQL.
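A chain such as player, club, stadium, city is a single SPARQL sequence path (`?player ex:team/ex:ground/ex:city ?city`). The sketch below evaluates such a path over a hypothetical toy graph.

```python
# Hypothetical toy graph linking a player to a club, a stadium and a city.
GRAPH = {
    ("ex:playerA", "ex:team", "ex:clubA"),
    ("ex:clubA", "ex:ground", "ex:stadiumA"),
    ("ex:stadiumA", "ex:city", "ex:cityA"),
    ("ex:playerB", "ex:team", "ex:clubB"),    # clubB has no ground recorded
}


def follow_path(graph, start, path):
    """Nodes reachable from `start` via the sequence path p1/p2/.../pn."""
    nodes = {start}
    for predicate in path:
        nodes = {o for s, p, o in graph if s in nodes and p == predicate}
    return nodes


# SPARQL equivalent: SELECT ?city WHERE { ex:playerA ex:team/ex:ground/ex:city ?city }
print(follow_path(GRAPH, "ex:playerA", ["ex:team", "ex:ground", "ex:city"]))  # {'ex:cityA'}
```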
• In addition to SELECT (and, with SPARQL 1.1 Update, INSERT and DELETE), SPARQL supports ASK queries, which return true or false.
• SPARQL 1.1 includes syntax (the SERVICE keyword) to query two or more data sources within a single query.
• …
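An ASK query only reports whether a graph pattern has at least one solution. The sketch below is a basic-graph-pattern matcher with an `ask` wrapper, applied to the Brad Pitt and Angelina Jolie question from earlier. Terms starting with `?` are variables, and the `ex:` names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def solutions(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                if extended.setdefault(term, value) != value:
                    break          # variable already bound to something else
            elif term != value:
                break              # constant does not match
        else:
            yield from solutions(graph, rest, extended)


def ask(graph, patterns):
    """ASK: true as soon as one solution is found."""
    return next(solutions(graph, patterns), None) is not None


GRAPH = [
    ("ex:MrAndMrsSmith", "ex:starring", "ex:BradPitt"),
    ("ex:MrAndMrsSmith", "ex:starring", "ex:AngelinaJolie"),
    ("ex:Troy", "ex:starring", "ex:BradPitt"),
]
# ASK { ?movie ex:starring ex:BradPitt . ?movie ex:starring ex:AngelinaJolie }
print(ask(GRAPH, [("?movie", "ex:starring", "ex:BradPitt"),
                  ("?movie", "ex:starring", "ex:AngelinaJolie")]))  # True
```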
SPARQL Query Interface
http://yasgui.org/
Hands-on
Write a SPARQL query to return a list of all universities in the Netherlands sorted by name.
DBpedia endpoints:
• http://dbpedia.org/sparql
• http://live.dbpedia.org/sparql
• http://dbpedia-live.openlinksw.com/sparql
• http://lod.openlinksw.com/sparql
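Here is one possible query for the hands-on task, wrapped in a Python sketch that encodes it as an HTTP GET request following the SPARQL protocol (the query travels in the `query` parameter). It assumes Dutch universities in DBpedia are typed `dbo:University` and carry `dbo:country dbr:Netherlands`; entries that lack these triples will be missing from the result. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?university ?name WHERE {
  ?university a dbo:University ;
              dbo:country dbr:Netherlands ;
              rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
ORDER BY ?name"""


def sparql_get_request(endpoint, query):
    """SPARQL protocol: the query travels URL-encoded in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
print(req.full_url)
# Sending it needs network access: urllib.request.urlopen(req).read()
```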
Authoring
Semantic Wikis
• Semantic (text) wikis: authoring of semantically annotated text, e.g. Semantic MediaWiki
• Semantic data wikis: direct authoring of structured information (i.e. RDF, RDF Schema, OWL), e.g. OntoWiki
http://semantic-mediawiki.org/
http://aksw.org/Projects/OntoWiki
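Semantic MediaWiki embeds triples in wiki text with the `[[Property::Value]]` annotation syntax. The sketch below extracts them with a regular expression; it ignores SMW features such as subobjects and inline queries, and the property names in the sample are hypothetical.

```python
import re

# [[Property::Value]] or [[Property::Value|display text]]; plain [[links]] do not match.
ANNOTATION = re.compile(r"\[\[([^:\]|]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")


def smw_triples(page, wikitext):
    """(page, property, value) for every inline Semantic MediaWiki annotation."""
    return [(page, prop.strip(), value.strip())
            for prop, value in ANNOTATION.findall(wikitext)]


# Hypothetical page text; "Located in" and "Founded" are made-up property names.
TEXT = ("VU University Amsterdam is a university in [[Located in::Amsterdam]], "
        "founded in [[Founded::1880|1880]]. See also [[University of Amsterdam]].")
print(smw_triples("VU University Amsterdam", TEXT))
```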
OntoWiki use case: Catalogus Professorum
The Catalogus Professorum Lipsiensis – Semantics-based Collaboration and Exploration for Historians
Semantic Content Annotation
WYSIWYG - What You See Is What You Get
WYSIWYM - What You See Is What You Mean
http://rdface.aksw.org
LD-R Framework
http://ld-r.org
Interlinking
• The degree to which entities that represent the same concepts are linked to each other.
• “Connecting things that are somehow related”
• Methods:
- Automatic, semi-automatic, manual
- Universal, domain-specific
<http://dbpedia.org/resource/VU_University_Amsterdam> owl:sameAs <https://www.wikidata.org/entity/Q1065414>
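Because `owl:sameAs` is symmetric and transitive, sameAs links group URIs into clusters that denote the same entity. The sketch below computes those clusters with union-find; the third URI (`http://example.org/vu`) and the `ex:a`/`ex:b` pair are hypothetical.

```python
def same_as_clusters(links):
    """Group URIs connected by owl:sameAs links using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)           # union: sameAs is symmetric and transitive

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(c) for c in clusters.values())


DBPEDIA_VU = "http://dbpedia.org/resource/VU_University_Amsterdam"
WIKIDATA_VU = "https://www.wikidata.org/entity/Q1065414"
EXAMPLE_VU = "http://example.org/vu"        # hypothetical third source
links = [(DBPEDIA_VU, WIKIDATA_VU), (EXAMPLE_VU, WIKIDATA_VU), ("ex:a", "ex:b")]
print(same_as_clusters(links))
```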
Interlinking Methods
• Ontology Matching: establish links between the ontologies underlying two data sources.
• Instance Matching (Link Discovery): discover links between instances contained in two data sources.
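Link discovery tools such as SILK and LIMES compare instances with configurable similarity measures and thresholds. As a stand-in, the sketch below compares labels with Python's `difflib.SequenceMatcher` ratio and emits `owl:sameAs` candidates above a threshold. The labels and the `ex:` URIs are hypothetical, and the naive all-pairs loop is exactly what those frameworks avoid through smarter filtering.

```python
from difflib import SequenceMatcher


def normalize(label):
    return " ".join(label.lower().replace("-", " ").split())


def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(source, target, threshold=0.85):
    """Naive link discovery: compare every source label with every target label."""
    return [(s_uri, "owl:sameAs", t_uri)
            for s_uri, s_label in source.items()
            for t_uri, t_label in target.items()
            if similarity(s_label, t_label) >= threshold]


# Hypothetical labels, e.g. from a rankings table and from DBpedia.
rankings = {"ex:rank/1": "Vrije Universiteit Amsterdam",
            "ex:rank/2": "Delft University of Technology"}
dbpedia = {"dbr:VU_University_Amsterdam": "VU University Amsterdam",
           "dbr:Delft_University_of_Technology": "Delft University of Technology"}
print(discover_links(rankings, dbpedia))
```

Here "Vrije Universiteit Amsterdam" and "VU University Amsterdam" score about 0.82, so the default threshold of 0.85 misses the pair while 0.8 catches it; tuning such thresholds and reviewing candidate links is one reason interlinking is often semi-automatic.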
SILK Framework
http://silk-framework.com/
Semi-automatic interlinking
LIMES Framework
http://aksw.org/Projects/LIMES
Semi-automatic interlinking
Hands-on
Reconcile the following resources against DBpedia:
1. Universities in CWTS Leiden Ranking or Times World University Rankings
2. Universities in H2020 EU projects
Enrichment
ORE (Ontology Repair and Enrichment)
http://ore.aksw.org/
Quality Analysis
Linked Data Quality Dimensions
Quality Assessment for Linked Data: A Survey
Quality Analysis Tools
Luzzu http://eis.iai.uni-bonn.de/Projects/Luzzu
RDFUnit http://aksw.org/Projects/RDFUnit
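RDFUnit expresses data quality test cases as SPARQL queries that return violating resources, e.g. `SELECT ?s WHERE { ?s a ex:University . FILTER NOT EXISTS { ?s rdfs:label ?l } }`. The Python sketch below mirrors two such tests on a hypothetical graph; the test functions are illustrative and not taken from RDFUnit's own test library.

```python
# Hypothetical graph: two universities, one without a label and with a malformed year.
GRAPH = {
    ("ex:vu", "rdf:type", "ex:University"),
    ("ex:vu", "rdfs:label", "VU University Amsterdam"),
    ("ex:vu", "ex:founded", "1880"),
    ("ex:uni2", "rdf:type", "ex:University"),
    ("ex:uni2", "ex:founded", "unknown"),
}


def missing_label(graph, cls):
    """Test case: instances of `cls` without an rdfs:label."""
    instances = {s for s, p, o in graph if p == "rdf:type" and o == cls}
    labelled = {s for s, p, o in graph if p == "rdfs:label"}
    return sorted(instances - labelled)


def not_a_year(graph, prop):
    """Test case: values of `prop` that are not four-digit years."""
    return sorted((s, o) for s, p, o in graph
                  if p == prop and not (len(o) == 4 and o.isdigit()))


report = {
    "missing label": missing_label(GRAPH, "ex:University"),
    "malformed founding year": not_a_year(GRAPH, "ex:founded"),
}
for test, violations in report.items():
    print(f"{test}: {len(violations)} violation(s) {violations}")
```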
Evolution
EvoPat – Pattern based KB Evolution
• Inspired by software refactoring
• Agile knowledge engineering
• Basic and compound evolution patterns
http://link.springer.com/chapter/10.1007%2F978-3-642-17746-0_41
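In the spirit of basic and compound evolution patterns (this is an analogous sketch, not EvoPat's own definition), a basic pattern can be modelled as a function from a graph to a new graph, and a compound pattern as a composition of basic ones. Graphs are sets of triples, and the `ex:` names are hypothetical.

```python
# Graphs are sets of (subject, predicate, object); all ex: names are hypothetical.

def rename_property(graph, old, new):
    """Basic pattern: replace every use of property `old` by `new`."""
    return {(s, new if p == old else p, o) for s, p, o in graph}


def replace_object(graph, old, new):
    """Basic pattern: replace every object reference to `old` by `new`."""
    return {(s, p, new if o == old else o) for s, p, o in graph}


def drop_subject(graph, subject):
    """Basic pattern: delete every triple describing `subject`."""
    return {(s, p, o) for s, p, o in graph if s != subject}


def merge_classes(graph, source, target):
    """Compound pattern: retype instances of `source` as `target`, then drop `source`."""
    return drop_subject(replace_object(graph, source, target), source)


GRAPH = {
    ("ex:Univ", "rdf:type", "owl:Class"),
    ("ex:vu", "rdf:type", "ex:Univ"),
    ("ex:vu", "ex:hasName", "VU University Amsterdam"),
}
evolved = rename_property(merge_classes(GRAPH, "ex:Univ", "ex:University"),
                          "ex:hasName", "rdfs:label")
print(sorted(evolved))
```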
Hands-on Result