Faculty of Mathematics and Physics Charles University in Prague Linked Data Tutorial Tomáš Knap, Jindřich Mynarz, Martin Nečaský, Jakub Stárka February 16, 2012 (Partially based on slides of Chris Bizer [9])
Feb 25, 2016
16th February 2012 | Linked Data Tutorial 2
Motivation
Motivational Scenario
[Figure: data sources. WWW page of the institution (basic data, employees, departments, public contracts, budget, expenses), Business Register, ÚFIS, Buyer's Profile, ISVZUS, gov.cz]
Data Consumer: Show me the suppliers of public contracts for the Ministry of Finance (MF) in the Liberec Region. Show me the data on Google Maps on an iPhone. For every public contract, I also want the aggregation of all payments made by MF, and links to the MF budget and the responsible person.
• Where can I get the data about public contracts, responsible persons, expenses, and the budget of MF?
• How should I aggregate and link the data?
• How can I view the data on a map?
Current Common Practice
[Figure: the same data sources; the consumer finds only some of them]
1 – MF public contracts
2 – MF public contracts + employees
3 – Expenses
? – Sources the consumer did not discover
Information integration is very time-consuming, tedious, and ineffective!
Linked Data - Basics
Linked Data
• A set of best practices for publishing structured data on the Web, in accordance with the general architecture of the Web, using Semantic Web technologies and standards
• The Semantic Web is the goal; Linked Data provides the means to reach it
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
[Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006]
Architecture of the Classic Web
• Single global information space
• Small set of simple standards:
  ‒ HTTP URI
    • globally unique ID
    • retrieval mechanism
  ‒ HTML as document format
  ‒ Hyperlinks to connect everything
• Applications work on top of the complete information space
Web 2.0 APIs and Mashups
• No single global dataspace
• Shortcomings:
  ‒ APIs have proprietary interfaces
  ‒ No hyperlinks between data items within different APIs
  ‒ Mashups are based on a fixed set of data sources
• Web APIs slice the Web into walled gardens!
Linked Data
• Extend the Web with a single global dataspace
  ‒ By using RDF to publish structured data on the Web
  ‒ By setting links between data items within different data sources
• Physically distributed, but behaves like a single dataspace
RDF Data Model
• Flexible graph-based data model [2]
• HTTP URIs take the role of global primary keys.
  pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
  dbpedia:Berlin = http://dbpedia.org/resource/Berlin
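To make the idea of URIs as global keys concrete, the sketch below expands the two prefixed names from this slide into full HTTP URIs and combines them into a single statement. The slide gives only the two prefix expansions; the foaf:based_near predicate and the Python names (PREFIXES, expand) are assumptions chosen for illustration.

```python
# Prefix table: "pd" and "dbpedia" come from the slide, "foaf" is added for the example.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbpedia:Berlin' into a full HTTP URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# One RDF statement is a (subject, predicate, object) triple.
# Assumption: foaf:based_near is used here only as an illustrative predicate.
triple = tuple(expand(t) for t in ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"))
print(triple)
```

Because every position in the triple is a globally unique URI, two datasets that mention dbpedia:Berlin refer to the same node of the graph without any prior coordination.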
Resolving URIs over the Web
• The HTTP protocol brings together identification and retrieval
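A minimal sketch of what "looking up" a URI can mean for a client: the same HTTP URI that names the thing is requested with an Accept header asking for RDF. The helper name rdf_request and the particular media types are assumptions for this example; the request is only built, not sent, so the sketch runs without network access.

```python
from urllib.request import Request


def rdf_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF rather than HTML."""
    # Assumption: the server supports content negotiation for these media types.
    return Request(uri, headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"})


req = rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
# To actually retrieve the description: urllib.request.urlopen(req).read()
```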
Following Links deeper into the Web
Pubby – Linked Data Browser
http://dbpedia.org/page/Český_Krumlov
Properties of the Web of Linked Data
• Global, distributed data space built on a simple set of standards
  ‒ RDF, URIs, HTTP
• Entities are connected by links
  ‒ creating a global data graph that spans data sources
  ‒ enabling the discovery of new data sources
• Data coexistence
  ‒ Everyone can publish data to the Web of Linked Data
  ‒ Everyone can express their personal view on things
Linked Data Deployment on the Web
Is it real?
W3C Linking Open Data Project
• Grassroots community effort to
  ‒ Publish existing open-license datasets as Linked Data on the Web
  ‒ Interlink things between different data sources
Linked Data Cloud 2007
Linked Data Cloud 2009
Linked Data Cloud 2011
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.pdf
http://thedatahub.org/
More Statistics
http://stats.lod2.eu/stats
Uptake in Governmental Domain
• The EU is publishing Linked Data
  ‒ Eurostat: http://estatwrap.ontologycentral.com/
• National efforts
  ‒ The UK Government is releasing public data: http://data.gov.uk/
  ‒ Lots of initiatives in Great Britain
  ‒ Budget in Germany: http://bund.offenerhaushalt.de/
  ‒ Open Data in Catalonia: http://opendata.gencat.cat/en/dades-obertes.html
Data.gov.uk
http://data.gov.uk/organogram/cabinet-office
Linked Data Applications
[Figure: application categories; only "Linked Data Browsers" is labeled, the remaining ones are shown as "?"]
Search Engines - Sig.ma
http://sig.ma
Mashups – Public Contracts On the Map
http://gd.projekty.ms.mff.cuni.cz:2021/new/map.html
Mashups – Crime, Transport, Education
http://apps.seme4.com/see-uk/
Other Applications
• Browsers
  ‒ Disco Hyperdata Browser: http://www4.wiwiss.fu-berlin.de/rdf_browser/
  ‒ OpenLink RDF Browser: http://ode.openlinksw.com/
• Search Engines
  ‒ Falcons: http://ws.nju.edu.cn/falcons/
  ‒ Watson: http://watson.kmi.open.ac.uk/WatsonWUI/
• Mashups
Linked Data Applications - Summary
• Linked Data Browsers
• Search Engines
• Linked Data Mashups
Publishing Linked Data
Publishing Tasks – Bizer 38
1. Make data available as RDF via HTTP
  ‒ Requires ways to serialize the RDF data model
2. Set RDF links pointing at other data sources
3. Make your data self-descriptive
RDF/XML
• W3C Recommendation, 2004 [3]
Turtle Syntax
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dataModel: <http://www.w3.org/2000/10/swap/pim/contact#> .
@prefix myContact: <http://www.w3.org/People/EM/contact#> .

myContact:me rdf:type dataModel:Person ;
    dataModel:fullName "Eric Miller" ;
    dataModel:mailbox <mailto:[email protected]> ;
    dataModel:personalTitle "Dr." .
• W3C Team Submission, 2011 [4]
RDFa
• A way to add RDF directly to XHTML pages
  ‒ Provides new attributes to carry the additional markup
• W3C Recommendation, 2008 [5]
• HTML is not extensible
  ‒ but most RDFa parsers will recognize RDFa attributes in any version of HTML
RDFa
• Provides new attributes to carry the additional markup and reuses existing ones
  ‒ New: about, resource, …
  ‒ Reused: href, src, …
• Can be used with any supported element; preferred:
  ‒ span, div (in the body)
  ‒ a (linking element)
  ‒ meta, link (in the header)
RDFa Example
• XHTML page http://example.com/alice/posts/42
• Original XHTML code
  All content on this site is licensed under <a href="http://cc.org/licenses/by/3.0/"> a Creative Commons License </a>.
• XHTML + RDFa
  All content on this site is licensed under <a rel="cc:license" href="http://cc.org/licenses/by/3.0/"> a Creative Commons License </a>.
• RDF triples distilled from XHTML+RDFa
  <http://example.com/alice/posts/42> cc:license <http://cc.org/licenses/by/3.0/> .
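The distilling step for this example can be sketched with Python's standard HTML parser. This is a deliberately tiny illustration, not a conforming RDFa processor: it only handles @rel with @href, treats the page URI as the subject, and leaves the cc: prefix unexpanded. The class name RelLinkExtractor is hypothetical.

```python
from html.parser import HTMLParser


class RelLinkExtractor(HTMLParser):
    """Collect (subject, predicate, object) for every element with @rel and @href."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri  # default subject: the page itself
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "rel" in a and "href" in a:
            self.triples.append((self.page_uri, a["rel"], a["href"]))


html = ('All content on this site is licensed under '
        '<a rel="cc:license" href="http://cc.org/licenses/by/3.0/">'
        'a Creative Commons License</a>.')
extractor = RelLinkExtractor("http://example.com/alice/posts/42")
extractor.feed(html)
print(extractor.triples)
```

A real distiller would additionally resolve prefixes to full URIs and honour @about, @resource, and the other RDFa attributes.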
RDF store + Linked Data Interface
• Virtuoso + Pubby
D2R Server
• A way to publish data from relational databases as Linked Data
• Requests from the Web are rewritten into SQL queries via the mapping
  ‒ This on-the-fly translation eliminates the need to replicate the data into a dedicated RDF triple store
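The on-the-fly idea can be sketched as follows: a request for a resource URI is turned into an SQL query, and the resulting row is returned as triples, so no separate triple store is filled. This is not D2R's mapping language or API; the table, the base URI http://example.org/, and the choice of dcterms:title are all assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical relational source: one table with one row.
BASE = "http://example.org/"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO contract VALUES (1, 'Road repair')")


def describe(resource_uri):
    """Rewrite a request for a resource URI into SQL and return RDF-like triples."""
    if not resource_uri.startswith(BASE):
        return []
    table, _, key = resource_uri[len(BASE):].partition("/")
    if table != "contract" or not key.isdigit():  # only one mapped table in this sketch
        return []
    row = conn.execute("SELECT title FROM contract WHERE id = ?", (int(key),)).fetchone()
    if row is None:
        return []
    # Assumption: the title column is mapped to dcterms:title.
    return [(resource_uri, "http://purl.org/dc/terms/title", row[0])]


print(describe("http://example.org/contract/1"))
```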
2. Set RDF links
<http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159> .
• There are tools to help you generate links, e.g. Silk [6]
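As a rough idea of what link generation involves, the sketch below emits owl:sameAs links between two toy datasets whenever their normalised labels are equal. This is a much simpler technique than Silk, which evaluates configurable linkage rules with similarity measures; the labels, the example.org URI, and the function name generate_links are assumptions for illustration only.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Toy datasets (URI -> label). The labels and the example.org entry are made up.
source = {"http://dbpedia.org/resource/Berlin": "Berlin"}
target = {
    "http://sws.geonames.org/2950159": " berlin",
    "http://example.org/place/liberec": "Liberec",
}


def normalise(label):
    """Trim whitespace and ignore case before comparing labels."""
    return label.strip().lower()


def generate_links(source, target):
    """Emit owl:sameAs triples for entities whose normalised labels match exactly."""
    index = {normalise(label): uri for uri, label in target.items()}
    return [(uri, OWL_SAME_AS, index[normalise(label)])
            for uri, label in source.items()
            if normalise(label) in index]


links = generate_links(source, target)
print(links)
```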
3. Make your data self-descriptive
• Increase the usefulness of your data and ease data integration
• Aspects of self-descriptiveness
  1. Reuse terms from common vocabularies
  2. Enable clients to retrieve the schema
  3. Publish schema mappings for proprietary terms
  4. Metadata
    ‒ Provide provenance metadata
    ‒ Provide licensing metadata
    ‒ Provide data-set-level metadata using voiD
About Vocabularies
• We have to be able to define the meaning of subjects and properties
  ‒ Vocabularies, e.g. the Public Contracts Ontology
Public Contracts Ontology
http://purl.org/procurement/public-contracts#
RDFS
• RDFS = RDF Schema
  ‒ W3C Recommendation: http://www.w3.org/TR/rdf-schema/
• Vocabulary for RDF
  ‒ Definition of classes
    • is:Student rdf:type rdfs:Class
  ‒ Definition of properties
    • is:name rdf:type rdf:Property
  ‒ Domains and ranges of properties
    • is:name rdfs:domain is:Student
    • is:name rdfs:range xsd:string
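Domains are not validation constraints; they license inferences. The sketch below applies the RDFS domain rule (rdfs2 in the RDF Semantics specification) to the is:name example from this slide. The resource is:alice and the function name infer_types are hypothetical, and prefixed names are kept as plain strings for brevity.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

# Schema statement from the slide, plus one made-up data statement.
schema = {("is:name", RDFS_DOMAIN, "is:Student")}
data = {("is:alice", "is:name", "Alice")}  # is:alice is a hypothetical resource


def infer_types(schema, data):
    """Rule rdfs2: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = {(s, o) for s, p, o in schema if p == RDFS_DOMAIN}
    return {(s, RDF_TYPE, cls)
            for s, p, _ in data
            for prop, cls in domains
            if p == prop}


inferred = infer_types(schema, data)
print(inferred)
```

So a client that has retrieved the schema can conclude that is:alice is a student even though the data never says so explicitly.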
OWL
• OWL = Web Ontology Language
  ‒ W3C Recommendation: http://www.w3.org/TR/owl2-overview/
• Ontologies
  ‒ More complex constructs
    • Class or property equivalences
    • Cardinality restrictions
    • …
3.1 Reuse Terms from Common Vocabularies
• Common vocabularies
  ‒ Friend-of-a-Friend (FOAF) for describing people and their social networks
  ‒ SIOC for describing forums and blogs
  ‒ SKOS for representing topic taxonomies
  ‒ Organization Ontology for describing the structure of organizations
  ‒ GoodRelations for describing products and business entities
  ‒ Music Ontology for describing artists, albums, and performances
  ‒ Review Vocabulary for representing reviews
• Common sources of identifiers (URIs) for real-world objects
  ‒ LinkedGeoData and Geonames: locations
  ‒ GeneID and UniProt: life science identifiers
  ‒ DBpedia: wide range of things
3.2 Enable Clients to Retrieve the Schema
• Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.
• If we find this statement in the data:
  <http://opendata.cz/data/p6/contract/ocz_art_5161> <http://purl.org/procurement/public-contracts#awardDate> "2011-11-11"^^<http://www.w3.org/2001/XMLSchema#date> ;
• we can resolve the predicate URI and get its definition:
  [Figure: RDFS or OWL definition]
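The predicate above is a hash URI, so a client resolves it by requesting the part before the "#" and then looking for the term inside the returned vocabulary document. A minimal sketch of that first step, using only the standard library (the variable names are assumptions):

```python
from urllib.parse import urldefrag

# Predicate URI taken from the statement on this slide.
term_uri = "http://purl.org/procurement/public-contracts#awardDate"

# The fragment is never sent to the server; only the document URI is requested.
document_uri, fragment = urldefrag(term_uri)
print(document_uri)
print(fragment)
```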
3.3 Publish Schema Mappings
pc:Tender a owl:Class ;
    rdfs:subClassOf gr:Offering .
pc:AwardCriterion a owl:Class ;
    owl:equivalentClass loted:AwardCriteria .
• Simple mappings
  ‒ rdfs:subClassOf, rdfs:subPropertyOf
  ‒ owl:equivalentClass, owl:equivalentProperty
• Complex mappings – R2R [7]
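What a consumer gains from the simple mappings can be sketched as follows: data typed with the proprietary pc: terms also becomes visible under the mapped terms. The sketch applies the two mappings from this slide in a single step (no transitive closure); ex:tender1 and the function name apply_mappings are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
EQUIV = "owl:equivalentClass"
TYPE = "rdf:type"

# The two class mappings from the slide.
mappings = [
    ("pc:Tender", SUBCLASS, "gr:Offering"),
    ("pc:AwardCriterion", EQUIV, "loted:AwardCriteria"),
]
data = [("ex:tender1", TYPE, "pc:Tender")]  # ex:tender1 is a made-up resource


def apply_mappings(data, mappings):
    """Add the rdf:type statements implied by simple class mappings (one step only)."""
    implied = {}
    for a, rel, b in mappings:
        implied.setdefault(a, set()).add(b)
        if rel == EQUIV:  # equivalence holds in both directions
            implied.setdefault(b, set()).add(a)
    extra = {(s, TYPE, c)
             for s, p, o in data if p == TYPE
             for c in implied.get(o, ())}
    return set(data) | extra


result = apply_mappings(data, mappings)
print(sorted(result))
```

An application that only understands GoodRelations can now find ex:tender1 by asking for instances of gr:Offering.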
3.4 Metadata
• Licenses
• Data provenance
• Dataset description
Consuming Linked Data
Overview
• URI -> Description
  ‒ Pubby
• Keyword -> Description
  ‒ Sig.ma
• SPARQL query language [8]
  ‒ SQL for RDF databases
SPARQL Example
• Contracts of the given supplier
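The query shown on the original slide did not survive as text. As a stand-in, the sketch below mimics in plain Python what a single SPARQL triple pattern does: terms starting with "?" are variables, everything else must match exactly. It corresponds roughly to SELECT ?contract WHERE { ?contract ex:supplier ex:acme }. The ex: names and the function name match are hypothetical and are not taken from the Public Contracts Ontology.

```python
def match(pattern, triples):
    """Return one binding dict per triple that matches the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # A variable must bind to the same value everywhere in the pattern.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            results.append(binding)
    return results


# Made-up data: three contracts and their suppliers.
triples = [
    ("ex:contract1", "ex:supplier", "ex:acme"),
    ("ex:contract2", "ex:supplier", "ex:other"),
    ("ex:contract3", "ex:supplier", "ex:acme"),
]
rows = match(("?contract", "ex:supplier", "ex:acme"), triples)
print(rows)
```

A real SPARQL engine joins several such patterns and adds filters, ordering, and aggregation on top.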
SPARQL Example - Result
Issues of the Simple Consuming Scenarios
• How to aggregate the data if links are missing or the data models (ontologies) differ?
• How to deal with data quality?
  ‒ Everybody can say whatever they want!
• Solution: we are developing an infrastructure for cleaning, linking, and aggregating Linked Data
  ‒ Reusing existing technologies, such as Silk
ODCleanStore
• Cleaning the data
  ‒ Custom cleaners
• Linking the data
  ‒ Silk
• Graphical user interface
• Smart data consumption
  ‒ Data aggregation (thanks to links and ontology mappings)
  ‒ Conflict resolution
  ‒ Data provenance
Motivational Scenario - Recall
[Figure: data sources. WWW page of the institution (basic data, employees, departments, public contracts, budget, expenses), Business Register, ÚFIS, Buyer's Profile, ISVZUS, gov.cz]
Data Consumer: Show me the suppliers of public contracts for the Ministry of Finance (MF) in the Liberec Region. Show the data on Google Maps on an iPhone. For every public contract, I want the aggregation of all payments made by MF, and links to the MF budget and the responsible person.
• Where can I get the data about public contracts, responsible persons, expenses, and the budget of MF?
• How should I aggregate and link the data?
• How can I view the data on a map?
Goal
[Figure: ODCleanStore and the data sources: WWW page of the institution (basic data, employees, departments, public contracts, budget, expenses), Business Register, ÚFIS, Buyer's Profile, ISVZUS, gov.cz]
Conclusions
Linked Data vs. Open Data
• Open data are raw data that are freely available on the Web
  ‒ to everyone
  ‒ at any time
  ‒ for any purpose
• Open data – 3 stars!
  ‒ 4th star: a single, flexible data model (RDF) is missing
  ‒ 5th star: links are missing
Conclusions and Take-Away Message
The Power of Linked Data (5-star data)
• Web-scale data publishing with web-based discovery mechanisms
• Distributed annotation: make comments about observations, data series, points on the map
• Easy to reuse
  ‒ Huge potential when connecting to the cloud and linking the data; the benefits grow as the amount of data published as Linked Data increases
• Integration at the data level
• Easy to extend (new data properties can be added as required, with no need to plan them up front)
• Easy to merge: no name clashes!
Future Steps
• If you have obtained interesting data, try to publish them as Linked Data!
  ‒ We can help you with the whole lifecycle: creating, publishing, and maintaining the data
  ‒ Just create RDF data, we will publish it for you
  ‒ Just let us know (send us the data), we can publish it
  ‒ Publish data in the same way, but use global identifiers according to the Linked Data principles
• When the infrastructure (ODCleanStore) is ready, you can just send us the RDF data through a web service and we will do the rest: clean, link, and provide aggregated views.
Thank You!
References
• Textbook: Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. http://linkeddatabook.com/
• [2] http://www.w3.org/TR/rdf-primer/
• [3] http://www.w3.org/TR/REC-rdf-syntax/
• [4] http://www.w3.org/TeamSubmission/turtle/
• [5] http://www.w3.org/TR/rdfa-syntax/
• [6] http://www4.wiwiss.fu-berlin.de/bizer/silk/
• [7] http://www.w3.org/TR/rdf-sparql-query/
Motivational Scenario (to recall)
[Figure: data sources. WWW page of the institution (basic data, employees, departments, public contracts, budget, expenses), Business Register, ÚFIS, Buyer's Profile, ISVZUS, gov.cz]
User: Suppliers of MF public contracts from the Liberec Region, shown on Google Maps in an iPhone application. For each contract, an aggregation or listing of payments, a link to the budget, and the responsible person.
• Where do I get the data about contracts, responsible persons, expenses, and the budget of MF?
• How should I merge and link the data?
• How do I display the data on a map on the iPhone?
TODO How Can Linked Data Help Us in the Motivational Scenarios?
• Searching
  ‒ I specify that I am looking for a city
  ‒ Then, as I start typing "London", a combo box appears:
    • http://dbpedia.org/resource/London
    • http://dbpedia.org/resource/London_Ontario
    • http://dbpedia.org/resource/London_Ohio
• Information integration
  ‒ Faceted browsing: filter all restaurants based on the type of food, distance from my place, opening hours, …
• Data mining
  ‒ The application can understand the meaning of the Web pages
    • No re-coding for new pages, no re-coding if the layout changes; a generic piece of software
Example – Default subject
• XHTML page http://example.com/alice/posts/42
• Original XHTML code
  All content on this site is licensed under <a href="http://cc.org/licenses/by/3.0/"> a Creative Commons License </a>.
• XHTML + RDFa
  All content on this site is licensed under <a rel="cc:license" href="http://cc.org/licenses/by/3.0/"> a Creative Commons License </a>.
• RDF triple
  <http://example.com/alice/posts/42> cc:license <http://cc.org/licenses/by/3.0/> .
Example 2 – Attribute Property
• XHTML page http://example.com/alice/posts/42
<div>
  <h2>The trouble with Bob</h2>
  <h3>Alice</h3>
  ...
</div>
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
  <h2 property="dc:title">The trouble with Bob</h2>
  <h3 property="dc:creator">Alice</h3>
  ...
</div>
Example 3 – Adding attribute about
• XHTML page http://example.com/alice/posts/42
<base href="http://example.com/alice"></base>
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
  <div about="/posts/trouble_with_bob">
    <h2 property="dc:title">The trouble with Bob</h2>
    <h3 property="dc:creator">Alice</h3>
    ...
  </div>
  <div about="/posts/jos_barbecue">
    ...
  </div>
  ...
</div>
Example 4 – Redefining about
• XHTML page http://example.com/alice/posts/42
<base href="http://example.com/alice"></base>
<div about="/posts/trouble_with_bob">
  <h2 property="dc:title">The trouble with Bob</h2>
  Bob takes much better photos than I do:
  <div about="http://example.com/bob/photos/sunset.jpg">
    <img src="http://example.com/bob/photos/x.jpg" />
    <span property="dc:title">Beautiful Sunset</span>
    by <span property="dc:creator">Bob</span>.
  </div>
</div>
Example 5 – Blank Node
• XHTML page http://example.com/alice/posts/42
<div typeof="foaf:Person" xmlns:foaf="">
  <p property="foaf:name">Alice Birpemswick</p>
  <p>Email: <a rel="foaf:mbox" href="mailto:[email protected]">[email protected]</a></p>
  <p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>
Example 6 - Chaining
<div about="http://dbpedia.org/resource/Albert_Einstein">
  <span property="foaf:name">Albert Einstein</span>
  <span property="dbp:dateOfBirth" datatype="xsd:date">1879-03-14</span>
  <div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/Germany">
    <span property="dbp:conventionalLongName">Federal Republic of Germany</span>
  </div>
</div>
Establishing Context, Completing
• @about, @src, @typeof
  ‒ @about and @src explicitly create a new context for statements; @typeof does so implicitly
  ‒ If none is present, the context is inherited
    • The subject can be the whole document
    • @about, @rel, @resource … @rel, @resource -> the subject of the second triple is deduced
    • @about, @rel, … @property, @content -> the object and subject of the triples are deduced as a blank node
Objects
• A literal object is set by using @property to express the predicate and taking the value from either @content or the inline text of the element that carries @property.
• A URI resource object is set by using @rel or @rev to express the predicate and then either providing the object explicitly with @href, @resource, or @src, or using chaining to obtain the object from a nested subject or from a blank node.
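The interplay of inherited subjects and literal objects can be sketched with a very small parser. It covers only a fraction of RDFa: @about sets the subject for an element and its descendants, and @property yields a literal taken from @content or from the inline text. Chaining, blank nodes, @rel, prefix expansion, and relative URI resolution are left out. The class name MiniRDFa is hypothetical, and the markup uses absolute @about values as an assumption to avoid resolving against a base URI.

```python
from html.parser import HTMLParser


class MiniRDFa(HTMLParser):
    """Tiny RDFa subset: @about sets the subject, @property yields a literal."""

    def __init__(self, page_uri):
        super().__init__()
        self.subjects = [page_uri]  # stack; the page is the default subject
        self.pending = None         # (subject, predicate) waiting for inline text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # A nested @about overrides the inherited subject for this subtree.
        self.subjects.append(a.get("about", self.subjects[-1]))
        if "property" in a:
            if "content" in a:
                self.triples.append((self.subjects[-1], a["property"], a["content"]))
            else:
                self.pending = (self.subjects[-1], a["property"])

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((*self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None
        if len(self.subjects) > 1:
            self.subjects.pop()


doc = ('<div about="http://example.com/alice/posts/trouble_with_bob">'
       '<h2 property="dc:title">The trouble with Bob</h2>'
       '<div about="http://example.com/bob/photos/sunset.jpg">'
       '<span property="dc:title">Beautiful Sunset</span> by '
       '<span property="dc:creator">Bob</span>.'
       '</div></div>')
rdfa = MiniRDFa("http://example.com/alice/posts/42")
rdfa.feed(doc)
for triple in rdfa.triples:
    print(triple)
```

The title of the post and the title of the photo end up attached to different subjects, which is exactly the effect of redefining @about in a nested element.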
Other Ways
• Microformats
  ‒ Fixed vocabulary
  ‒ Use the class attribute
• Example
<span class="vcard">
  <span class="fn">Jeremy Keith</span>,
  <span class="org">Clearleft</span>
</span>
GRDDL
• W3C Recommendation (2007)
• Obtains RDF from XHTML pages with RDFa
• Obtains RDF from microformats
  ‒ Requires different templates for different microformats
Assignment 1
• Look at the public contract described at http://zakazky.praha.eu/detailZakazky.jsp?zakazkaId=136343
• Download the page
• Add RDF annotations (in your favorite editor) to specify:
  ‒ Publisher of the document (“Jan Novak”)
  ‒ Contract title („Název“) in Czech and English
  ‒ Publication date („Datum vyvěšení“)
  ‒ Awarded tender together with the winner of the tender („Dodavatel“) and the winner's price („Nabídková cena“)
    • The winner should be accompanied by a name and ICO.
    • The price should be accompanied by currency information
• Distill RDF from the XHTML file
  ‒ http://www.w3.org/2007/08/pyRdfa/
• Notes:
  ‒ You will need to look at:
    • Opendata.cz – ontology for public contracts
    • Any other ontology referenced