LD4SC Summer School, 7th - 12th June, Cercedilla, Spain
1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)
Linked Data Generation Process
Raúl García-Castro, Filip Radulovic, Oscar Corcho, María Poveda, Víctor Rodríguez-Doncel, Asunción Gómez-Pérez, Daniel Vila-Suero
Presenter: Raúl García-Castro

Index
• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description

Open data… for what?
• For example, (re)using open transport data
  – Provide travel information to persons
  – Allow better multimodal route planning
  – Facilitate public transport management
  – …
  – Accessibility
    • Which metro accesses are accessible for wheelchair users?
    • At which bus stops is it safer and more convenient for a wheelchair user to wait?
    • Is there any accessible parking space near a bus stop?
    • etc.

Legal framework and open data initiatives
• Aarhus Convention (1998)
  – Right to participation and access; 41 countries and the EU
• Open Access Initiative (2001)
  – Scientific information on the Web; > 510 organisations
• PSI Directive
  – PSI Reuse (2003/98/EC)
• Convention for the access to official documents (2009)
  – Signed by 12 countries
  – Belgium, Finland, Norway, Sweden, Hungary, Estonia, Lithuania, Slovenia, Georgia, Montenegro, Serbia and Macedonia
• Law 37/2007. PSI Reuse
• Law 11/2007. Citizen access to public services and right to the quality of services
• RD 4/2010. National Interoperability Scheme
  – Open standards
  – Technology neutral
  – Open source software
• RD 1495/2011. It develops Law 37/2007
• Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013)
Adapted from Antonio Rodríguez Pascual (IGN)

The problem: lack of interoperability
[Figure: several data providers publish data and an application developer has to extract it from each of them]
• Data providers: "I use GTFS"; "I use my own CSV structure"; "I provide a web service"
• "I want to publish data in an interoperable structure and format"
• "Build an app that is available all over the world"

Scenario: open transport data
• Is there any open transport data already?
• We are surrounded by them

Open data and how they are published
1) In notice boards
  – For those who have a lot of free time
  – Or those who are there at the right moment in time
[Figure label: DATA]
Adapted from Antonio Rodríguez Pascual (IGN)

Open data and how they are published
2) In web pages and mobile apps
  – For people
[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]
Adapted from Antonio Rodríguez Pascual (IGN)

Open data and how they are published
3) As web files
  – So that they can be loaded by humans in their information systems (XML, HTML, CSV, etc.)
  – Hopefully it is not a scanned PDF
[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]
Adapted from Antonio Rodríguez Pascual (IGN)

Open data and how they are published
4) Via web services
  – For humans and machines
  – It allows generating added-value services
  – And can be integrated in the application business logic
[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]
Adapted from Antonio Rodríguez Pascual (IGN)

What is open data?
• Open data are data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and sharealike.
• The most important aspects to consider:
  – Availability and Access: data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. Data must also be available in a convenient and modifiable form.
  – Reuse and Redistribution: data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
  – Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ or ‘only in education’ restrictions are not allowed.
Source: Open Data Handbook

Scenario: open transport data
• Is there any open transport data already?
• Can we do it better?

Going into 4-star and 5-star Linked Data
• 1 star: Make your stuff available on the Web (whatever format) under an open license
• 2 stars: Make it available as structured data (e.g., Excel instead of an image scan of a table)
• 3 stars: Use non-proprietary formats (e.g., CSV instead of Excel)
• 4 stars: Use URIs to identify things, so that people can point at your stuff
• 5 stars: Link your data to other data to provide context

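The star levels are cumulative: a dataset only reaches a level if it also meets every level below it. A minimal sketch in Python, assuming a hypothetical `Dataset` record whose boolean fields mirror the five criteria listed on this slide (the class, field and function names are illustrative and not part of any standard tool):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields only)."""
    on_web_open_license: bool = False     # 1 star
    structured: bool = False              # 2 stars
    non_proprietary_format: bool = False  # 3 stars
    uses_uris: bool = False               # 4 stars
    linked_to_other_data: bool = False    # 5 stars


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        ds.on_web_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file on the Web under an open license: structured and non-proprietary,
# but without URIs or links to other data.
csv_on_web = Dataset(on_web_open_license=True, structured=True,
                     non_proprietary_format=True)
print(star_rating(csv_on_web))
```

Under this reading, a dataset that uses URIs but has no open license still gets no stars, because the first criterion is not met.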
Use URIs + RDF (RDF standards)
[Figure: data from four sources (an API, two CSV files and an HTML page) are each converted to RDF]
• API: José; Mobility impairment; Boardgames
  – RDF: José hasImpairment Mobility Impairment; Mobility Impairment requires WheelchairAccessibility; José likes Boardgame
• CSV: Mirasierra; Ventisquero de la Condesa; Yes
  – RDF: Mirasierra address Ventisquero de la Condesa; Mirasierra hasAccessibility WheelchairAccessibility
• CSV: Mega Games; Ventisquero de la Condesa; Yes
  – RDF: Mega Games address Ventisquero de la Condesa; Mega Games hasAccessibility WheelchairAccessibility
• HTML: Mega Games; Conquer & Smash!; 29,95
  – RDF: Mega Games sells Conquer & Smash!; Conquer & Smash! is a Boardgame

Link your data (Linked RDF)
[Figure: the same four sources (API, CSV, CSV, HTML) and their RDF graphs, now linked through the resources they share: WheelchairAccessibility, Ventisquero de la Condesa, Boardgame and Mega Games]

Make complex queries
• Where can I buy the Conquer & Smash! game?
• Which are the most accessible routes for Christmas shopping?
• Expansion pack for Conquer & Smash! Take metro line 9 and in 35 minutes we can demo it to you!
• Or better take bus 231 because it is sunny and you can take a glance at the outdoor art exhibition in Plaza de Castilla

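The linking step and the first question on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: each source is modelled as a set of (subject, predicate, object) triples, all URIs live under a made-up `http://example.org/` namespace, and the helper functions are hypothetical. A real deployment would use an RDF store and SPARQL instead.

```python
# Hypothetical namespace; none of these URIs come from the slides.
EX = "http://example.org/"

# One small graph per source, as sets of (subject, predicate, object) triples.
api_graph = {
    (EX + "Jose", EX + "hasImpairment", EX + "MobilityImpairment"),
    (EX + "MobilityImpairment", EX + "requires", EX + "WheelchairAccessibility"),
    (EX + "Jose", EX + "likes", EX + "Boardgame"),
}
csv_graph_1 = {
    (EX + "Mirasierra", EX + "address", EX + "VentisqueroDeLaCondesa"),
    (EX + "Mirasierra", EX + "hasAccessibility", EX + "WheelchairAccessibility"),
}
csv_graph_2 = {
    (EX + "MegaGames", EX + "address", EX + "VentisqueroDeLaCondesa"),
    (EX + "MegaGames", EX + "hasAccessibility", EX + "WheelchairAccessibility"),
}
html_graph = {
    (EX + "MegaGames", EX + "sells", EX + "ConquerAndSmash"),
    (EX + "ConquerAndSmash", EX + "isA", EX + "Boardgame"),
}

# Linking: shared things carry the same URI, so a set union merges the graphs.
linked = api_graph | csv_graph_1 | csv_graph_2 | html_graph


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def accessible_shops_selling(graph, product):
    """Resources that sell `product` and are wheelchair accessible."""
    sellers = subjects(graph, EX + "sells", product)
    accessible = subjects(graph, EX + "hasAccessibility",
                          EX + "WheelchairAccessibility")
    return sellers & accessible


def same_address(graph, place):
    """Other resources that share an address with `place`."""
    result = set()
    for addr in objects(graph, place, EX + "address"):
        result |= subjects(graph, EX + "address", addr)
    return result - {place}


print(accessible_shops_selling(linked, EX + "ConquerAndSmash"))
print(same_address(linked, EX + "MegaGames"))
```

Neither question can be answered from a single source: the HTML graph knows what is sold but not where or how accessible the shop is, which is the point of linking.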
Using Linked Open Transport Data
• Calculate accessible routes
  – Combined with geographical data (IGN)
  – Which stop should I use if I have mobility problems?
• Commercial routes by bus
  – Combined with Madrid’s shop census (from Ayto. Madrid)
• Geomarketing decisions for entrepreneurs
  – Where should I open my shop? Based on the combination of the number of travellers per stop, demographic data, data about other businesses and shops around, etc.
• Personalised offers to travellers
  – With real-time data and data about consumption patterns (e.g., credit card transactions)
• …

Index
• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description

Linked Data life cycle
[Figure: life cycle diagram with the activities Specification, Modelling, Generation, Publication, Exploitation and Linking]

Requirements (smart cities domain)
1. Tabular formats (i.e., SQL, XLS or CSV)
  – Other data structures (e.g., XML) are less important in practice or are unstructured and would require much more work
2. Changing data (dynamic or streaming data), versioning, (automatic) data quality assurance and reliability
3. Data access through web services, proprietary APIs and data files
4. Legal aspects (e.g., licensing, data ownership)
5. Access rights management or mechanisms for extracting public data (plenty of confidential data)

Linked Data generation process
[Figure: process diagram]
• Tasks: Select data source; Obtain access to data source; Analyse data source; Analyse licensing of the data source; Define resource naming strategy; Develop ontology; Transform data source; Link with other datasets
• Inputs and outputs: Data source; Access, data; License; Schema, data; Resource naming strategy; Ontology; RDF data; Linked dataset
F. Radulovic, M. Poveda-Villalón, D. Vila-Suero, V. Rodríguez-Doncel, R. García-Castro and A. Gómez-Pérez, Guidelines for Linked Data generation and publication: An example in building energy consumption, Automation in Construction, Special Issue on Linked Data in Architecture and Construction. Available online April 2015.

Linked Data generation process
[Figure: the same process diagram, highlighting DATA PREPARATION]

Select data source
• Select the data source that will be transformed into Linked Data
• Steps:
  – To define the requirements for selection
  – To select one or several data sources
• The data set may be:
  – Owned by your organization…
  – … or not (external data sources)

Select data source – LCC example
• Requirements
  – Real-world scenario in the smart city domain
  – Available for use
  – Available in machine-processable format (the more structured the data are, the better)
  – Can be linked with generic entities (e.g., location)
• Leeds City Council – energy consumption

Obtain access to data source
• Data access means
  – Technical means to retrieve the data
  – Legal rights to use the data
• If the data is not accessible:
  – To identify the person to contact
  – To request the access
  – To obtain access and to retrieve the data
• Access alternatives:
  – file,
  – programming interface,
  – database,
  – data stream,
  – etc.

Obtain access to data source – LCC example
• Data set already available as a CSV file

Analysing licensing of the data source
• Licenses specify the legal terms under which a data set can be used and exploited
• There are neither legal prescriptions on how to declare licenses nor common standard practices for doing so
• Steps (not automatable):
  – To identify the rightsholder and the authoritative publisher
    • Rightsholder vs. authorized distributor
  – To find the applicable license
    • Web page, data set metadata, data themselves
    • Contact the publisher
  – To read the license and analyse legal terms
• Tips
  – Analysis should be performed upon all copies and formats of the data
  – Ensure license compatibility when integrating several data sources

Linked Data resources can be protected
• Ontologies are intellectual works; they can be protected by copyright
• RDF datasets can be considered as databases, which are also legally protected in the EU

Create, consume, aggregate, derive and publish Linked Data in a lawful environment
0. Always license your data
[Figure labels: Data shops; Government; Individuals; …]

Licensed Linked Data
[Figure: Non-licensed Linked Data + License = Licensed Linked Data]
• Unless a license allows it, the resource cannot be copied, modified or published. In practice, non-licensed resources are useless in industrial settings.
• Licensed Linked Data can be used

Licensed Linked Data in practice
• Linked Open Data: published, open license
• (Published) Linked Data: published, no open license
• Linked Data: not published, no open license

Guidelines for licensing linked data
1. Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
2. Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
3. Standard license available? (3a, 3b)
  – Yes: use the URI of the standard license, e.g., CC0
  – No: use a rights declaration language, e.g., ODRL
Abbreviations: ODRL, Open Digital Rights Language; DCAT, Data Catalog Vocabulary

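Guidelines 1 to 3 can be illustrated for the simple case in which a standard license exists. The sketch below, in Python, only builds a Turtle string: it describes a dataset as a `void:Dataset` and attaches a `dct:license` statement pointing at the CC0 URI. The dataset URI and title are made up for the example, and the function name is hypothetical.

```python
def dataset_license_turtle(dataset_uri: str, title: str, license_uri: str) -> str:
    """Return a Turtle description of a dataset with a dct:license statement."""
    # Escape backslashes and double quotes so the title stays a valid Turtle literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f'    dct:title "{safe_title}" ;\n'
        f"    dct:license <{license_uri}> .\n"
    )


# URI of the CC0 1.0 public domain dedication (a standard license URI).
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"

# Hypothetical dataset URI and title, used only for illustration.
turtle = dataset_license_turtle(
    "http://example.org/dataset/energy",
    "Example energy consumption dataset",
    CC0,
)
print(turtle)
```

When no standard license fits, the `dct:license` object would instead point to a policy written in a rights declaration language such as ODRL; that case is not covered by this sketch.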
Licensing Linked Data is Simple…
• The British National Bibliography (BNB) lists the books and new journal titles published or distributed in the United Kingdom and Ireland since 1950.

… or complex, depending on your needs
• Policies can be expressed with ODRL 2.0 to govern access to Linked Data
• Example of access to Linked Data for a price (15 EUR for the dataset or 0.01 EUR for a triple thereof)

Linked Data generation process
[Figure: the same process diagram, highlighting DEFINE RESOURCE NAMING STRATEGY]

Hash and slash URIs
• Hash URIs (#)
  – http://www.energycompany.com/about#energyCompany
  – The fragment part is stripped off before the URI is requested from the server (i.e., the resource cannot be retrieved directly)
  – Hash URIs can be used to identify non-document resources
• Slash URIs (/)
  – http://www.energycompany.com/about/energyCompany
  – Imply a 303 redirection to the location of a document that represents the resource (+ content negotiation)
    • E.g., http://www.energycompany.com/about/energyCompany.rdf
  – Drawbacks: HTTP round-trip, redirects, web server configuration

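The difference between the two forms shows up in what is actually sent to the server. A small sketch using Python's standard `urllib.parse.urldefrag` with the two example URIs from this slide (the helper name `document_requested` is illustrative):

```python
from urllib.parse import urldefrag

hash_uri = "http://www.energycompany.com/about#energyCompany"
slash_uri = "http://www.energycompany.com/about/energyCompany"


def document_requested(uri: str) -> str:
    """Return the URI that is sent to the server, i.e. without the fragment."""
    url, _fragment = urldefrag(uri)
    return url


# The hash URI resolves to the whole "about" document; the fragment never
# reaches the server.
print(document_requested(hash_uri))

# The slash URI is requested as is, so the server can answer per resource
# (for Linked Data, typically with a 303 redirect to a describing document).
print(document_requested(slash_uri))
```

Every hash URI in the same namespace leads to the same document request, which is why hash namespaces suit small data sets that are retrieved as a whole.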
Hash or slash?
• Depends on the data and on their expected use
• Small data:
  – Hash namespace
  – Access all the data as a whole
  – HTTP GET would return a single information resource with everything
• Large / frequently-updated / modular data:
  – Slash namespace
  – Access resources individually or in groups
  – Resource descriptions may be divided among many information resources or may be managed via a query service (e.g., SPARQL)
  – Progressively greater detail about resources may be retrieved through multiple accesses

Define resource naming strategy
• Steps:
  – To choose a URI form (hash or slash)
  – To choose a domain for the URIs
  – To choose a path for the URIs
  – To choose a pattern for classes and properties in the ontology, as well as for individuals
• Tips:
  – One URI must identify only one item (e.g., avoid mixing up web pages and real-world objects)
  – URIs should be persistent and should not change over time (e.g., avoid including state information); PURL may support this
  – Use a domain that is under your control (or a service such as PURL)
  – Separate the ontology model from its instances
  – Define meaningful URIs

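A naming strategy is easiest to keep consistent when URIs are minted by code instead of by hand. The following Python sketch assumes a made-up domain (`http://example.org/`) and made-up path segments (`ontology` and `resource`); only the general shape follows the steps above: hash URIs for ontology terms, slash URIs for individuals, and the model kept separate from its instances.

```python
import re

# Hypothetical domain; in practice use a domain under your control.
BASE = "http://example.org/"


def to_local_name(label: str) -> str:
    """Turn a human-readable label into a CamelCase local name.

    Only ASCII letters and digits are kept; accented characters would need
    extra handling that is outside this sketch.
    """
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "".join(word[:1].upper() + word[1:] for word in words)


def term_uri(term: str) -> str:
    """Hash URI for an ontology class or property (the path is an assumption)."""
    return f"{BASE}ontology#{term}"


def individual_uri(class_name: str, label: str) -> str:
    """Slash URI for an individual: <base>resource/<Class>/<LocalName>."""
    return f"{BASE}resource/{class_name}/{to_local_name(label)}"


print(term_uri("CivicStructure"))
print(individual_uri("CivicStructure", "Council Offices Belgrave House"))
```

Because the URI depends only on the class and the label, minting it again for the same item gives the same result, which supports the persistence tip as long as labels do not change.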
Resource naming strategy – LCC
• Hash URIs for ontological terms, slash URIs for individuals
• Domain: http://smartcity.linkeddata.es/
• Ontological terms path:
Export → RDF as Turtle
<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CouncilOfficesBelgraveHouse>
    a schema:CivicStructure ;
    rdfs:label "Belgrave House" .
<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CommunityCentreTunstallRoad>
    a schema:CivicStructure ;
    rdfs:label "Tunstall Road" .

Evaluating the exported data
• Manual inspection
• Syntax evaluation (with syntax validator)
• Consistency with the ontologies (with reasoner)
• Usage evaluation (e.g., by running SPARQL queries)
  – Show all electricity consumptions and the related time periods for all council sites related to culture
  – Show all energy consumptions and the related time periods of council sites from the Wakefield district

Index
• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description