Page 1
Linked (Open) Data
But what does it buy me?
Rinke Hoekstra
VU University Amsterdam/University of Amsterdam
[email protected]
Linked (Open) Data - But what does it buy me? by Rinke Hoekstra
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Monday 11 March 2013
Page 3
http://www.youtube.com/watch?v=ga1aSJXCFe0
Page 5
http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide.html
Page 7
Linked Open Data
Texts taken from http://5stardata.info
Page 11
Why people go “Meh”
• Data needs to be converted to RDF
• Data needs to be published on the Web
• An open license is required even for a single ★
What if people draw incorrect conclusions from my data?
What if journalists draw incorrect conclusions from my data?
What if combining data results in privacy infringement?
Pacific Barreleye, http://imgur.com/gallery/Mzyb5 (can rotate its eyes forwards or upwards to look through the transparent head to prey above)
Page 12
... but LOD is just asking for more!
Page 13
... how can I sell this internally?
Page 15
Linked Open Data
Page 16
Linked Data
Six Ingredients
The missing ★
Mix ‘n Mash
Contextualize!
Choose your Grain Size
Lower the Threshold
Repeatable Transformation
Page 19
1 The missing ★
http://give.everything/a/URI
HTTPs URIs only please! (or resolver + URN)
Version information
Version agnostic
Guessable
Page 20
Messy Data
http://wetten.overheid.nl/BWBIdService/BWBIdList.xml.zip
NB: The problem with the XML processing instruction was reported and fixed, but returned some weeks later
Page 21
Example: Juriconnect
• Existing identification standard: Juriconnect
• URN-like... but no naming server (cf. Digital Object Identifiers)
• Named elements do not carry an identifier
• No explicit version information, only contextual
1.0:c:BWBR0005416&artikel=6
vs
http://wetten.overheid.nl/cgi-bin/deeplink/law1/bwbid=BWBR0005416/article=6/date=2005-01-14
vs
http://wetten.overheid.nl/BWBR0005416/TitelII698946/HoofdstukII/Artikel16/geldigheidsdatum_14-01-2005
Page 22
Levels of Identification
• IFLA FRBR levels
• Work
• Expression
• Manifestation
Bibliographic Entity:
Work: Regulation
Expression: Version of regulation (realizes the Work)
Manifestation: XML version of regulation (embodies the Expression)
Item: XML version of regulation on my harddisk (exemplifies the Manifestation)
Page 23
• Hierarchical information (work)
• Version and language (expression)
• Format information (manifestation)
http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1
http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01
http://doc.metalex.eu/doc/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01/data.xml
http://doc.metalex.eu/id/BWBR0011823/artikel/1
Transparent = Guessable
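The URI pattern above can be sketched in a few lines of Python. This is only an illustration of the naming scheme as it appears in the example URIs, not the actual MetaLex implementation; the class name `LegalReference` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BASE = "http://doc.metalex.eu"  # host as shown in the example URIs


@dataclass(frozen=True)
class LegalReference:
    """Hypothetical helper that builds FRBR-level URIs for one legal element."""
    bwb_id: str                          # e.g. "BWBR0011823"
    path: Tuple[str, ...]                # hierarchy, e.g. ("hoofdstuk", "1", "artikel", "1")
    language: Optional[str] = None       # e.g. "nl"
    version_date: Optional[str] = None   # e.g. "2010-09-01"

    def work_uri(self) -> str:
        # Work level: hierarchical information only, version agnostic.
        return "/".join([BASE, "id", self.bwb_id, *self.path])

    def expression_uri(self) -> str:
        # Expression level: work URI plus language and version date.
        if not (self.language and self.version_date):
            raise ValueError("an expression needs a language and a version date")
        return "/".join([self.work_uri(), self.language, self.version_date])

    def manifestation_uri(self, filename: str = "data.xml") -> str:
        # Manifestation level: same path under /doc/ plus a format-specific file.
        expression = self.expression_uri()
        return expression.replace(BASE + "/id/", BASE + "/doc/", 1) + "/" + filename


ref = LegalReference("BWBR0011823", ("hoofdstuk", "1", "artikel", "1"), "nl", "2010-09-01")
print(ref.work_uri())
print(ref.expression_uri())
print(ref.manifestation_uri())
```

Because every level is derived from the one above it, a user who knows one URI can guess the others.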
Page 24
Versioning Issues
• URIs don’t carry semantics...
• Detect changes:
• which element versions are the same
• ... and which versions are different?
Art. 44, lid 4 (2011-03-26)
Art. 44, lid 4 (2011-04-05)
From: Besluit prudentiële regels Wft, BWBR0020420
Page 28
Opaque Identifiers
• Content information
• Unique SHA1 Hash of text
http://doc.metalex.eu/BWBR0011823/hoofdstuk/1/artikel/34b0cee26ee5138c74aa2c62caf2c117d3c616e9
Figure: annotation graph for two versions of SW Hoofdstuk I, Artikel 10 (2011-01-01 and 2011-10-12)
Nodes: "vermogen van de erflater" (the estate of the testator), SHA1 8738ef273ea4dbc73, SHA1 a433f53273c78a56f2
Edges: dcterms:subject, owl:sameAs
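A minimal sketch of the hash-based identifier idea, assuming whitespace normalization before hashing (the slides do not say how the text is prepared). The function names, the base URI, and the example texts are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    # Assumption: collapse whitespace so layout-only changes keep the same hash.
    return " ".join(text.split())


def content_hash(text: str) -> str:
    # SHA1 hex digest (40 characters) of the normalized text.
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()


def opaque_uri(base_uri: str, text: str) -> str:
    # Opaque identifier: the base URI followed by the content hash.
    return base_uri.rstrip("/") + "/" + content_hash(text)


def same_as_links(base_uri: str, versions: Dict[str, str]) -> List[Tuple[str, str, str]]:
    # One (version URI, owl:sameAs, hash URI) statement per version.
    return [(uri, "owl:sameAs", opaque_uri(base_uri, text)) for uri, text in versions.items()]


def unchanged_groups(base_uri: str, versions: Dict[str, str]) -> List[List[str]]:
    # Versions that point at the same hash URI carry identical text.
    groups = defaultdict(list)
    for version_uri, _, hash_uri in same_as_links(base_uri, versions):
        groups[hash_uri].append(version_uri)
    return [sorted(uris) for uris in groups.values() if len(uris) > 1]


# Hypothetical version URIs and texts, for illustration only.
versions = {
    "http://example.org/id/artikel/10/nl/2011-01-01": "Example  text of the article.",
    "http://example.org/id/artikel/10/nl/2011-10-12": "Example text of the article.",
    "http://example.org/id/artikel/10/nl/2012-01-01": "Amended text of the article.",
}
print(unchanged_groups("http://example.org/id/artikel", versions))
```

Two versions with the same hash URI can be treated as the same text, which answers the "which versions are the same" question without comparing documents pairwise.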
Page 29
Network Analysis
Page 31
2 Repeatable Transformation
Transformation should be part of routine ...
... manageable and scalable ...
... repeatable ...
Linked Data will not be the official source anytime soon
http://www.w3.org/TR/prov-overview/
Provenance is key
Page 33
LODStats
http://stats.lod2.eu
Page 35
40.745.554.078 Triples! (1.6 Billion)
(I tried to check the latest figures, but http://stats.lod2.eu was down)
Page 36
3 Choose your Grain Size
• The document is the traditional grain size (Dublin Core)
• Linked data allows for deep links into data
• Cost versus usefulness
• Are you the right party to provide detailed descriptions?
http://creatingandeducating.blogspot.nl/2011/11/blog-post.html
Page 37
RDF Report Card
Report Card Categories: Metadata, Scope, Structure, Internals
Axis: Low Detail to High Detail
RDF Report Card by Leigh Dodds, talk at Semtech Biz London, 2011, http://slideshare.net/ldodds
Page 39
4 Mix ‘n Mash
• Multiple vocabularies won’t bite
• Multiple identifiers won’t bite
• Choose what’s useful for you...
• ... then map to others!
Image © David Sykes 2009 All rights reserved
Good News: the bulk has already been done for you!
Page 40
Semantically-Interlinked Online Communities
Page 42
Example: Provenance
http://doc.metalex.eu/id/BWBR0017869/2009-10-23
The expression (version) URI of a regulation
http://doc.metalex.eu/id/process/BWBR0017869/2009-10-23
The process that generated the expression
http://doc.metalex.eu/id/date/2009-10-23
The date at which the expression was created
http://doc.metalex.eu/id/event/BWBR0017869/2009-10-23
The creation event of the regulation
Figure: provenance graph connecting these four resources
Properties shown: opmv:wasGeneratedBy, opmv:wasGeneratedAt, ml:resultOf, ml:date, time:inXSDDateTime, time:hasEnd, sem:eventType, sem:hasTime, sem:timeType, sem:hasTimeStamp, rdf:value, rdf:type
Classes shown: ml:BibliographicExpression, ml:LegislativeModification, ml:Date, opmv:Artifact, opmv:Process, time:Instant, sem:Event, sem:Time
Literal shown: "2009-10-23"^^xsd:date
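One possible reading of the provenance example, sketched in plain Python. The URI patterns follow the example above; which property connects which pair of resources is inferred from the captions and is an assumption, and the function name `provenance_triples` is hypothetical. Prefixed names are kept as written, without namespace declarations.

```python
from typing import List, Tuple

BASE = "http://doc.metalex.eu/id"  # as in the example URIs

Triple = Tuple[str, str, str]


def provenance_triples(bwb_id: str, date: str) -> List[Triple]:
    """Build the core provenance statements for one expression (assumed edge directions)."""
    expression = f"<{BASE}/{bwb_id}/{date}>"
    process = f"<{BASE}/process/{bwb_id}/{date}>"
    event = f"<{BASE}/event/{bwb_id}/{date}>"
    day = f"<{BASE}/date/{date}>"
    return [
        (expression, "rdf:type", "ml:BibliographicExpression"),
        (expression, "rdf:type", "opmv:Artifact"),
        # Assumed: the expression is linked to the process that generated it.
        (expression, "opmv:wasGeneratedBy", process),
        (process, "rdf:type", "opmv:Process"),
        # Assumed: the expression is linked to the date at which it was created.
        (expression, "opmv:wasGeneratedAt", day),
        # Assumed: the expression is linked to its creation event.
        (expression, "ml:resultOf", event),
    ]


for subject, predicate, obj in provenance_triples("BWBR0017869", "2009-10-23"):
    print(subject, predicate, obj, ".")
```

The same description is given in several vocabularies at once (OPMV, MetaLex, and others), so consumers can pick the one they already understand.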
Page 43
5 Contextualize!
• Information is not always compatible
• Make explicit in which context the information holds ...
• ... and who stated the information, why and how.
Flat Earth and Square Earth idea courtesy of Szymon Klarman
Page 44
• Namespaces don’t mean anything
• Use named graphs to compartmentalize metadata
• Add provenance information about groups of statements
Figure: two named graphs, <http://example.com/workbook1/sheet1/corrected> and <http://example.com/workbook1/sheet1>, with provenance attached at graph level
Provenance terms shown: :curation20120126, :RinkeHoekstra, provo:wasGeneratedBy, provo:Activity, provo:hadAgent, provo:startedAt, provo:endedAt, time:inXSDDateTime, rdf:type, "20120126T08:30:00", "20120126T09:00:00"
Data terms shown: _:x, :14--15_1875--1874, :14-15, :1875--1874, :Assendelft, d2s:dimension, d2s:populationSize, d2s:ageGroup, d2s:birthYears, d2s:censusYear, d2s:gemeente, "11"^^xsd:int, "1"^^xsd:int, "1889"^^xsd:int
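A minimal in-memory sketch of the named-graph idea: statements are grouped per graph, and provenance is stated about the graph name rather than about each statement. The `Dataset` class, the separate provenance graph name, and the exact pairing of subjects and properties are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Set


class Dataset:
    """Tiny quad store: graph name -> set of (subject, predicate, object)."""

    def __init__(self) -> None:
        self.graphs = defaultdict(set)

    def add(self, graph: str, s: str, p: str, o: str) -> None:
        self.graphs[graph].add((s, p, o))

    def objects(self, graph: str, s: str, p: str) -> Set[str]:
        # All objects of statements (s, p, ?) in the given graph.
        return {o2 for (s2, p2, o2) in self.graphs[graph] if s2 == s and p2 == p}

    def graphs_attributed_to(self, agent: str, prov_graph: str) -> Set[str]:
        # Graph names generated by an activity in which the agent took part.
        result = set()
        for (s, p, o) in self.graphs[prov_graph]:
            if p == "provo:wasGeneratedBy" and agent in self.objects(prov_graph, o, "provo:hadAgent"):
                result.add(s)
        return result


CORRECTED = "<http://example.com/workbook1/sheet1/corrected>"
PROV = "<http://example.com/provenance>"  # hypothetical graph holding the provenance

ds = Dataset()
# Data statement inside the named graph (subject and property pairing assumed).
ds.add(CORRECTED, "_:x", "d2s:censusYear", '"1889"^^xsd:int')
# Provenance about the graph as a whole.
ds.add(PROV, CORRECTED, "provo:wasGeneratedBy", ":curation20120126")
ds.add(PROV, ":curation20120126", "rdf:type", "provo:Activity")
ds.add(PROV, ":curation20120126", "provo:hadAgent", ":RinkeHoekstra")

print(ds.graphs_attributed_to(":RinkeHoekstra", PROV))
```

Describing the group once keeps the data statements themselves free of bookkeeping, and lets a consumer select or discard whole graphs based on who produced them.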
Page 51
Compliance
Figure: state diagram (start; State Name with entry/action, do/activity, exit/action, event/action(arguments); State with action; end) annotated with the regulation versions below
Regulation A (01-01-2011)
Art 12 (04-02-2011)
Art 14, lid 3, 2e volzin (11-06-2008)
Art 14, lid 3, 2e volzin (01-07-2011)
Page 52
Contextual Annotation
Figure: the concept "vermogen van de erflater" (the estate of the testator) linked via dcterms:subject at increasingly specific levels:
• Successiewet
• SW Hoofdstuk I
• SW Hoofdstuk I, Artikel 10
• SW Hoofdstuk I, Artikel 10, Zin 1
No nice background because Google Image search only returned boring images
Page 54
6 Lower the Threshold
• Integrate Linked Data production into everyday tools
• Allow tools to do the work for you
• Use a built-in reward model
Image courtesy of http://themaisonette.net
Linked Data allows you to trace usage!
Page 55
Wrap Legacy Systems
http://www.w3.org/TR/r2rml/
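The wrapping idea can be illustrated with a small sketch that turns rows of a relational table into triples. This is not R2RML itself, only a hand-written mapping in its spirit; the table, column names, and base URI are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def rows_to_triples(conn: sqlite3.Connection, table: str, key_column: str,
                    base_uri: str) -> List[Tuple[str, str, str]]:
    """Map each row to a subject URI and each non-key column to a property."""
    # The table name is interpolated directly, so it must come from trusted configuration.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{base_uri}/{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the subject URI; NULLs yield no triple
            # Values are emitted as plain literals without escaping (sketch only).
            triples.append((subject, f"<{base_uri}/def/{column}>", f'"{value}"'))
    return triples


# Hypothetical legacy table, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regulation (id TEXT PRIMARY KEY, title TEXT, valid_from TEXT)")
conn.execute("INSERT INTO regulation VALUES ('BWBR0011823', 'Example title', '2010-09-01')")

for triple in rows_to_triples(conn, "regulation", "id", "http://example.org"):
    print(*triple, ".")
```

The legacy system stays the official source; the mapping can be rerun whenever the data changes, which keeps the transformation repeatable.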
Page 57
Idea: use reward mechanisms of Web 2.0
Page 58
• Lightweight Web Application
• Interface to API of existing data repositories
• Enrich metadata by linking to Linked Data resources
• Provide annotation services for data files
• Plugin based architecture
• Publish RDF metadata as new data publication
http://linkitup.data2semantics.org
Page 59
recoprov
Reconstruct provenance using Dropbox file edit history
Figure: graph with nodes numbered 0 to 24
Sara Magliacane and Paul Groth
Page 60
plsheet
How are results calculated (1)? Automatic analysis of workflow in spreadsheets
Analyse dependencies between cells in complex spreadsheets
Martine de Vos, Jan Wielemaker and Willem van Hage
Page 61
plsheet
Reconstruct and explain the workflow of computations
Martine de Vos, Jan Wielemaker and Willem van Hage
Page 62
Albert Merono-Penuela, Rinke Hoekstra, Laurens Rietveld, Christophe Gueret
TabLinker
http://www.cedar-project.nl
Semi-automatic RDF converter for eccentric spreadsheets
Page 65
Linked Open Data
The missing ★
Mix ‘n Mash
Contextualize!
Choose your Grain Size
Lower the Threshold
Repeatable Transformation
... be sure to use it internally too!