Linked Open Data for Digital Humanities What is Linked Open Data and why is it relevant for you ? Christophe Guéret (@cgueret)

Jan 20, 2015

This presentation was given to Digital Humanities students on March 7. The goal is to introduce LOD and showcase what can be done with it.
Transcript
Page 1: Linked Open Data for Digital Humanities

Linked Open Data for Digital Humanities

What is Linked Open Data and why is it relevant for you?

Christophe Guéret (@cgueret)

Page 2: Linked Open Data for Digital Humanities

Open Data

“A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”

http://opendefinition.org/

Page 3: Linked Open Data for Digital Humanities

Linked Data

"a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

http://linkeddata.org/

Page 4: Linked Open Data for Digital Humanities

Linked Open Data

● Linked Open Data = Open Data + Linked Data

● Interconnected data sets that are on the Web and free to use

● 5-star scheme http://5stardata.info/

Page 5: Linked Open Data for Digital Humanities

Why does it matter for DH?

● Digital Humanities use a lot of data and study relations between things

● Data acquisition & curation represent a LOT of effort for data consumers

● Linked Open Data is a good way to
  ○ Facilitate your own work (as a data consumer)
  ○ Facilitate others' work (as a data publisher)

Page 6: Linked Open Data for Digital Humanities

Data found on the Web

● You get the following table as a CSV file

Kennis      Stad
Christophe  Amsterdam
David       Parijs

● And that Excel table from somewhere else

Ville      Pays
Paris      France
Amsterdam  Pays-Bas

Page 7: Linked Open Data for Digital Humanities

And you want to integrate it

● Data integration issues
  ○ Kennis, Stad, Ville, Pays?
  ○ Parijs = Paris?
  ○ Amsterdam = Amsterdam?

● A lot of work for the (uninformed) consumer!

Kennis      Stad
Christophe  Amsterdam
David       Parijs

+

Ville      Pays
Paris      France
Amsterdam  Pays-Bas

= ?

Page 8: Linked Open Data for Digital Humanities

Linked Data approach

● Assign unique identifiers (URIs) to concepts and things

● Create a "triple": connect the identifiers with labelled, directed edges

dbpedia:Amsterdam --dbo:country--> dbpedia:Netherlands
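
As a rough illustration, a triple can be modelled as a plain (subject, predicate, object) tuple of URIs. The sketch below renders the example above in Python. The `PREFIXES` table and the `expand` helper are hypothetical names introduced here, and the namespace URIs are assumed to be the ones DBpedia uses for `dbpedia:` and `dbo:`.

```python
# Prefix table: short names for the namespaces used on the slide.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'dbpedia:Amsterdam' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# One triple: subject --predicate--> object (a labelled, directed edge).
triple = (
    expand("dbpedia:Amsterdam"),
    expand("dbo:country"),
    expand("dbpedia:Netherlands"),
)
print(triple)
```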

Page 9: Linked Open Data for Digital Humanities

Why does it solve the issue?

● Shift some of the data integration load to the provider side
  ○ Clarify the semantics of the data
  ○ Refer to identifiers rather than names

● There is only one "dbpedia:Amsterdam" at http://dbpedia.org/resource/Amsterdam

● Labels used for the edges are published by an external authority
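
To make the benefit concrete, here is a minimal sketch of the two tables from the earlier slides once each provider refers to cities by identifier rather than by name. The name-to-URI tables are illustrative assumptions (no such published data set is implied), and `city_of` / `country_of` are hypothetical variable names.

```python
# Each provider maps its local names to shared identifiers once, at
# publication time. These mappings are illustrative, not real published data.
DUTCH_CITIES = {
    "Amsterdam": "http://dbpedia.org/resource/Amsterdam",
    "Parijs": "http://dbpedia.org/resource/Paris",
}
FRENCH_CITIES = {
    "Amsterdam": "http://dbpedia.org/resource/Amsterdam",
    "Paris": "http://dbpedia.org/resource/Paris",
}

# Table 1 (Kennis, Stad) and table 2 (Ville, Pays), keyed by URI, not by name.
city_of = {"Christophe": DUTCH_CITIES["Amsterdam"], "David": DUTCH_CITIES["Parijs"]}
country_of = {FRENCH_CITIES["Paris"]: "France", FRENCH_CITIES["Amsterdam"]: "Pays-Bas"}

# The consumer joins on the identifier without knowing Dutch or French.
person_country = {person: country_of[city] for person, city in city_of.items()}
print(person_country)  # {'Christophe': 'Pays-Bas', 'David': 'France'}
```

The "Parijs = Paris?" question is answered by whoever publishes the data, once, instead of by every consumer.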

Page 10: Linked Open Data for Digital Humanities

Some vocabulary publishers

Page 11: Linked Open Data for Digital Humanities
Page 12: Linked Open Data for Digital Humanities

From triples to the Web of Data

● Every triple is a bit of factual information

● Because nodes are re-used across triples, the union of all the triples is a graph

● The "Web of Data" is a pre-integrated, semantically clear, data set ready to be used!
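
The step from triples to a graph can be sketched in a few lines. The example below is only an illustration under stated assumptions: the three triples are hand-written sample data using prefixed names for brevity, and `reachable` is a hypothetical helper, not part of any Linked Data toolkit.

```python
from collections import defaultdict

# A handful of triples (illustrative sample data, prefixed names for brevity).
triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Rotterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Netherlands", "dbo:officialLanguage", "dbpedia:Dutch_language"),
]

# Adjacency list: node -> list of (edge label, neighbour).
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def reachable(start: str) -> set:
    """Collect every node reachable from `start` by following directed edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


print(reachable("dbpedia:Amsterdam"))
```

Because `dbpedia:Netherlands` is re-used as the object of one triple and the subject of another, Amsterdam ends up connected to the Dutch language even though no single triple says so.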

Page 13: Linked Open Data for Digital Humanities

Exploring relations in the graph

Page 14: Linked Open Data for Digital Humanities

Let's make a social network!

● The network
  ○ A node per European country
  ○ An edge means a shared official language
  ○ Label the edges with the languages
  ○ Label the nodes with the country names

● Data source
  ○ DBpedia SPARQL http://dbpedia.org/sparql

● Visualisation tool
  ○ Gephi https://gephi.org/

Page 15: Linked Open Data for Digital Humanities

SPARQL?

● Query language for Linked Open Data

● Describe part of the graph and use variables

dbpedia:Amsterdam --dbo:country--> ?Country
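
The idea of describing part of the graph with variables can be mimicked in plain Python. This is a toy sketch, not how a SPARQL engine is implemented: `match` is a hypothetical helper, the two triples are illustrative sample data, and only a single triple pattern is handled.

```python
def match(pattern, triples):
    """Yield one {variable: value} binding per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break  # a fixed term does not match this triple
        else:
            yield binding


triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
]

results = list(match(("dbpedia:Amsterdam", "dbo:country", "?Country"), triples))
print(results)  # [{'?Country': 'dbpedia:Netherlands'}]
```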

Suggested book to read

Page 16: Linked Open Data for Digital Humanities

The query in SPARQL

SELECT DISTINCT ?Source ?Target ?Label WHERE {
  ?country1 a <http://dbpedia.org/class/yago/EuropeanCountries>.
  ?country1 <http://dbpedia.org/ontology/officialLanguage> ?language.
  ?country2 a <http://dbpedia.org/class/yago/EuropeanCountries>.
  ?country2 <http://dbpedia.org/ontology/officialLanguage> ?language.
  FILTER (?country1 != ?country2)

  ?country1 <http://www.w3.org/2000/01/rdf-schema#label> ?Source.
  ?country2 <http://www.w3.org/2000/01/rdf-schema#label> ?Target.
  ?language <http://www.w3.org/2000/01/rdf-schema#label> ?Label.
  FILTER ((LANG(?Source) = "en") && (LANG(?Target) = "en") && (LANG(?Label) = "en"))
}
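
For readers new to SPARQL, the logic of the query (pair up distinct countries that share an official language) can be sketched over a tiny in-memory data set. The `official_languages` table below is a hand-picked sample standing in for DBpedia, not actual query output.

```python
from itertools import permutations

# Hand-picked sample standing in for DBpedia (assumption: not query results).
official_languages = {
    "Belgium": {"Dutch", "French", "German"},
    "France": {"French"},
    "Netherlands": {"Dutch"},
}

# Same shape as the query result: one (Source, Target, Label) row per ordered
# pair of distinct countries and per official language they share.
rows = sorted(
    (source, target, language)
    for source, target in permutations(official_languages, 2)
    for language in official_languages[source] & official_languages[target]
)
for row in rows:
    print(row)
```

Like the FILTER on `?country1 != ?country2`, this only rules out self-pairs, so each shared language yields two rows, one per direction.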

Page 17: Linked Open Data for Digital Humanities

Making the network

● Get the query from
  ○ https://gist.github.com/cgueret/5098706

● Copy & paste it into
  ○ http://dbpedia.org/sparql

● Change the result format to "CSV"

● Press "Run Query" and save the result

● Open Gephi

● Start a new project

● Import the CSV file in the "Data Laboratory"
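
Before importing, it can help to sanity-check the saved file. The sketch below assumes the CSV has the three columns named in the query (Source, Target, Label). The `saved_csv` string is an illustrative stand-in for the downloaded file, not real endpoint output.

```python
import csv
import io

# Stand-in for the saved result file (illustrative rows, not real query output).
saved_csv = """Source,Target,Label
Belgium,France,French
France,Belgium,French
Belgium,Netherlands,Dutch
Netherlands,Belgium,Dutch
"""

# The query returns every edge in both directions; keep one copy per pair.
edges = set()
for row in csv.DictReader(io.StringIO(saved_csv)):
    pair = tuple(sorted((row["Source"], row["Target"])))
    edges.add((pair[0], pair[1], row["Label"]))

nodes = {name for source, target, _ in edges for name in (source, target)}
print(len(nodes), "nodes,", len(edges), "edges")  # 3 nodes, 2 edges
```

To read a real file, replace `io.StringIO(saved_csv)` with an `open(...)` call on the saved path. Importing the raw file as an undirected graph in Gephi is likely to have a similar de-duplicating effect, though that depends on the import settings.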

Page 18: Linked Open Data for Digital Humanities
Page 19: Linked Open Data for Digital Humanities

There is more than just DBpedia ...

Page 20: Linked Open Data for Digital Humanities

Last words

● Look for data sources published as Linked Open Data (RDF); this can save you time

● Consider publishing your own data as Linked Open Data

● There is much more to say...
  ○ Using SPARQL within R (very easily)
    ■ http://linkedscience.org/tools/sparql-package-for-r/
  ○ Reasoning capabilities of triple stores
  ○ Creating and extending vocabularies