Linked Open Data for the Humanities and Social Sciences Use cases: linking government data to news data in the PoliMedia and Talk of Europe projects Laura Hollink Centrum Wiskunde & Informatica (CWI) KU Leuven Guest lecture November 10, 2016

Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Jan 18, 2017


Laura Hollink
Page 1: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linked Open Data for the Humanities and Social Sciences

Use cases: linking government data to news data in the PoliMedia and Talk of Europe projects

Laura Hollink
Centrum Wiskunde & Informatica (CWI)

KU Leuven
Guest lecture, November 10, 2016

Page 2: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linked Open Data in the SSH?

Example question:

How did the debate about the financial crisis in Greece develop?

Page 3: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Searching the proceedings of the European Parliament

"Greece" in the plenary meetings of the European Parliament

[Figure: bar chart. X-axis: year (1999 to 2013); y-axis: number of mentions (0 to 200).]

Page 4: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Searching through newspaper archives

Mentions of “Griekenland” in the Dutch newspaper De Telegraaf

Page 6: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Search volumes of a search engine

Frequency of the query “Greece” on Google

http://www.google.com/trends

We need:

✦ open access to data
✦ to combine sources
✦ more complex queries

Page 7: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linked Open Data in the SSH?

Example question:

Which political debate in the post-war period has attracted most media attention?

Page 10: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

“De Indonesische Quaestie"

To answer this question we need to go through all newspaper articles about all political debates…

We need:

✦ open access to data
✦ to combine sources
✦ more complex queries

Page 11: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linked Open Data in the SSH?

Example question:

What are the differences between different media?

Example question:

Has the coverage changed over time?

Page 12: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linked Open Data

A very brief introduction…

A method of publishing structured data on the Web in such a way that it can be linked and queried by computers as well as people.

✦ open access to data
✦ to combine sources
✦ more complex queries

Page 14: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Structured data

Comparable to the data one may find in a database table:

Thing      Type  Population  Airport
Amsterdam  City  1364422     Schiphol
…          …     …           …

Represented as RDF triples:

ex:Amsterdam a ex:City .
ex:Amsterdam dbo:populationUrban "1330235"^^xsd:integer .
ex:Amsterdam dbp:cityServed ex:Schiphol .
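The step from a table row to triples can be sketched in a few lines of Python. This is only an illustration: the helper `row_to_triples` and the column-to-predicate mapping are assumptions made for this example, not part of any RDF library.

```python
# Illustrative sketch: one table row becomes one triple per non-key column.
# The column-to-predicate mapping below is an assumption for this example.

def row_to_triples(row, subject_column, predicates, prefix="ex:"):
    """Return (subject, predicate, object) tuples for one table row."""
    subject = prefix + row[subject_column]
    return [(subject, predicate, row[column])
            for column, predicate in predicates.items()]

row = {"Thing": "Amsterdam", "Type": "ex:City",
       "Population": 1364422, "Airport": "ex:Schiphol"}
predicates = {"Type": "rdf:type",
              "Population": "dbo:populationUrban",
              "Airport": "dbp:cityServed"}

triples = row_to_triples(row, "Thing", predicates)
for triple in triples:
    print(triple)
```

Each column of the table turns into a predicate, and the row identifier turns into the shared subject of all triples for that row.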

Page 17: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

On the Web

Everything is identified by URIs (documents, concepts, instances, links):

http://example.org/cities#Amsterdam
http://example.org/City
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://dbpedia.org/ontology/population

Triples can be distributed over the Web:

<http://example.org/cities#Amsterdam> a ex:City .
<http://example.org/cities#Amsterdam> dbo:populationUrban "1364422" .
<http://example.org/cities#Amsterdam> dbp:cityServed ex:Schiphol .

Forming a graph:

Amsterdam --is a--> City
Amsterdam --has population--> "1364422"
Amsterdam --has airport--> Schiphol

Page 20: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

The Web of Data vs. the Web of Documents

Note the differences between the Web of Data and a database:
• Non-unique naming assumption
• Open World assumption
• Everyone can say anything about anything
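The non-unique naming assumption can be made concrete with a small Python sketch. It counts how many distinct things a set of URIs denotes once some of them are declared to name the same thing. The function name, the `dbr:` prefix and the example links are assumptions for illustration only.

```python
# Sketch of the non-unique naming assumption: two different URIs may name
# the same thing, so the number of URIs is only an upper bound on the
# number of things. Known "same as" statements are merged with union-find.

def count_distinct_things(uris, same_as_pairs):
    parent = {uri: uri for uri in uris}

    def find(uri):
        # Follow parent pointers to the representative, compressing the path.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for left, right in same_as_pairs:
        parent[find(left)] = find(right)

    return len({find(uri) for uri in uris})

uris = ["ex:Amsterdam", "dbr:Amsterdam", "ex:Schiphol"]   # hypothetical URIs
no_links = count_distinct_things(uris, [])
with_link = count_distinct_things(uris, [("ex:Amsterdam", "dbr:Amsterdam")])
print(no_links, with_link)
```

Without any equivalence statements the count is three, but that is only what is known so far: under the Open World assumption, further statements published elsewhere could still merge names or add new things.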

Page 21: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 22: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Querying Linked Open Data

• SPARQL ("SPARQL Protocol and RDF Query Language") is a W3C recommendation for querying RDF graphs.

• See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/sparql11-query/

Data:

:JamesDean :playedIn :Giant .

Query                           Result
:JamesDean :playedIn ?what .    :Giant
?who :playedIn :Giant .         :JamesDean
:JamesDean ?what :Giant .       :playedIn
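The idea behind these queries is pattern matching: a query is a triple in which some positions are variables, and the result is every binding of those variables that makes the pattern equal to a triple in the data. A minimal Python sketch of that idea follows. It is not a SPARQL engine, and the `match` function is written only for this illustration.

```python
# Minimal triple-pattern matcher. Terms starting with "?" are variables;
# every other term must match the data exactly.

def match(pattern, triples):
    """Return one dict of variable bindings per matching triple."""
    results = []
    for triple in triples:
        bindings = {}
        for pattern_term, data_term in zip(pattern, triple):
            if pattern_term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.get(pattern_term, data_term) != data_term:
                    break
                bindings[pattern_term] = data_term
            elif pattern_term != data_term:
                break
        else:
            results.append(bindings)
    return results

data = [(":JamesDean", ":playedIn", ":Giant")]

print(match((":JamesDean", ":playedIn", "?what"), data))
print(match(("?who", ":playedIn", ":Giant"), data))
print(match((":JamesDean", "?what", ":Giant"), data))
```

A real SPARQL query combines several such patterns and joins them on shared variables.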

Page 23: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Two example projects of Linked Open Data in SSH: data modelling and linking in the PoliMedia and Talk of Europe projects

Page 24: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linking government data to news data

Page 26: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches.

Roughly 1.8 million news bulletins from between 1937 and 1984.

(We only use 1945-1995)

Archives of hundreds of newspapers, containing tens of millions of articles published between 1618 and 1995.

(We only use 1945-1995)

Page 28: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Links in PoliMedia

is about

• 3 Million links

Page 29: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF

[Figure: RDF graph of one debate from the Dutch parliamentary proceedings. Recoverable content:]

Classes: Debate, PartOfDebate, DebateContext, Speech, Politician, Party.

Debate nl.proc.sgd.d.194519460000002
• Properties shown: rdf:type, dc:id, dc:source (twice), dc:publisher, dc:language, dc:date, hasPart
• Values shown: nl.proc.sgd.d.19720000002, "Handelingen Verenigde Vergadering...", Dutch, 1945-11-20, http://statengeneraaldigitaal.nl/, http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002, http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf

Parts of the debate nl.proc.sgd.d.194519460000002.1 and nl.proc.sgd.d.194519460000002.2
• Properties shown: rdf:type PartOfDebate, hasPart, hasSubsequentPartOfDebate

Items within a part: nl.proc.sgd.d.194519460000002.1.1, nl.proc.sgd.d.194519460000002.1.2, nl.proc.sgd.d.194519460000002.1.3
• rdf:type DebateContext, with hasText "De voorzitter opent de vergadering…"
• rdf:type Speech, with hasSpokenText "Mijnheer de Voorzitter, de Commissie van …", hasSubsequentSpeech, hasSpeaker, sem:hasActor and coveredIn http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr

Speaker Speaker_00064
• Properties shown: rdf:type Politician, hasRole member_of_parliament, hasParty Party_kvp, foaf:firstName "Joannes Antonius James", foaf:lastName "Barge", rdfs:label "Barge", dc:source http://resolver.politicalmashup.nl/nl.m.00064

Party Party_kvp
• Properties shown: rdf:type Party, hasAcronym "KVP", hasFullName "Katholieke Volkspartij"

Page 32: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Step 2: Discovering links between politics and news

Pipeline (input: debates):

1. Detect topics in speeches (output: topics).
2. Detect named entities in speeches (output: named entities).
3. Create queries from the topics, the named entities, the name of the speaker and the date of the debate (output: queries).
4. Search the newspaper archive (output: candidate articles).
5. Rank the candidate articles (output: links between speeches and articles).

Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate.

Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
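The two intuitions can be sketched in Python as a filter followed by a ranking. This is a simplified illustration, not the PoliMedia implementation: the record layout, the example data and the choice of Jaccard overlap as the ranking score are all assumptions made for this sketch.

```python
from datetime import date

def candidate_articles(speech, articles, window_days=7):
    """Intuition 1: speaker name occurs in the article, dates within a week."""
    return [a for a in articles
            if speech["speaker"].lower() in a["text"].lower()
            and abs((a["date"] - speech["date"]).days) <= window_days]

def overlap_score(speech, article):
    """Intuition 2: Jaccard overlap of topics and named entities (assumed metric)."""
    s, a = set(speech["terms"]), set(article["terms"])
    return len(s & a) / len(s | a) if s and a else 0.0

def rank(speech, articles):
    """Filter by intuition 1, then order by intuition 2 (best first)."""
    candidates = candidate_articles(speech, articles)
    return sorted(candidates, key=lambda a: overlap_score(speech, a), reverse=True)

# Hypothetical example data.
speech = {"speaker": "Barge", "date": date(1945, 11, 20),
          "terms": {"Indonesia", "KVP", "budget"}}
articles = [
    {"id": "a1", "date": date(1945, 11, 22), "text": "Mr Barge spoke ...",
     "terms": {"Indonesia", "budget"}},
    {"id": "a2", "date": date(1945, 11, 21), "text": "Barge (KVP) said ...",
     "terms": {"Indonesia", "KVP", "budget"}},
    {"id": "a3", "date": date(1946, 1, 1), "text": "Barge returned ...",
     "terms": {"Indonesia"}},
    {"id": "a4", "date": date(1945, 11, 21), "text": "The chamber met ...",
     "terms": {"Indonesia", "KVP"}},
]

ranked_ids = [a["id"] for a in rank(speech, articles)]
print(ranked_ids)
```

Article a3 is dropped because it falls outside the one-week window, and a4 because the speaker is not named in it. The remaining candidates are ordered by how much they share with the speech.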

Page 35: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Representation of links

As a single triple:

architecten skos:exactMatch architects .

With the link as a resource of its own:

Link 001
• concept1: architecten
• concept2: architects
• link type: skos:exactMatch
• link method: manual
• author: L. Hollink

• This is an example of the "design pattern" referred to as n-ary relations or relations as classes.

• It allows us to save provenance information about the statements we create.
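The pattern can be illustrated with a small Python data structure in which the link is an object of its own rather than a bare triple. The class name and the property names `linkType` and `linkMethod` are assumptions chosen for this sketch; the slide only shows the labels "link type" and "link method".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """A link modelled as a resource, so provenance can be attached to it."""
    link_id: str
    concept1: str
    concept2: str
    link_type: str
    method: str
    author: str

    def as_plain_triple(self):
        # The simple form: one triple, no room for provenance.
        return (self.concept1, self.link_type, self.concept2)

    def as_triples(self):
        # The n-ary form: every aspect of the link is a statement about it.
        return [
            (self.link_id, "concept1", self.concept1),
            (self.link_id, "concept2", self.concept2),
            (self.link_id, "linkType", self.link_type),
            (self.link_id, "linkMethod", self.method),
            (self.link_id, "author", self.author),
        ]

link = Link("Link_001", "architecten", "architects",
            "skos:exactMatch", "manual", "L. Hollink")
print(link.as_plain_triple())
for triple in link.as_triples():
    print(triple)
```

The cost of the pattern is visible here too: one statement becomes five, and queries have to traverse the link node to get from one concept to the other.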

Page 39: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Evaluation of Links

Recall that we aim to use the links to answer a research question.

Can we still do that if there are errors in the links?

How many errors are acceptable?

We need to know the quality!

How would you determine the quality of the links?

1. Manually rating (a sample of) mappings
• relatively cheap and easy to interpret
• only precision, no recall

2. Comparison to manually found links
• precision and recall
• more expensive! (but: crowdsourcing?)
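The difference between the two evaluation methods can be made explicit in Python. Rating a sample of produced links only tells us which fraction of them is correct (precision). Comparing against an independently created set of links also tells us which fraction of the true links was found (recall). The function names and example link identifiers below are made up for illustration.

```python
def precision_from_ratings(ratings):
    """Method 1: ratings is a list of booleans, one per sampled link."""
    return sum(ratings) / len(ratings) if ratings else 0.0

def precision_recall(found_links, reference_links):
    """Method 2: compare found links to manually created reference links."""
    found, reference = set(found_links), set(reference_links)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical example: four links produced, three in the manual reference set.
found = {"s1-a1", "s1-a2", "s2-a3", "s2-a4"}
reference = {"s1-a1", "s2-a3", "s3-a5"}

p, r = precision_recall(found, reference)
print(p, r)
print(precision_from_ratings([True, True, False, True]))
```

In this example the link s3-a5 was never produced. A rater who only looks at the produced links cannot notice that it is missing, which is why method 1 yields no recall.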

Page 42: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Evaluation of links in PoliMedia

How good are the links?

• We ask 2 raters to manually score pairs of newspaper articles and speeches.

• A pilot study showed that we needed more than a 2-point scale.

• Inter-rater agreement: 0.5 -> acceptable, but not high.

• Score: 80%

Score                               Setting 1  Setting 2  Setting 3
I don’t know                        0.14       0.15       0.08
0 - unrelated                       0.38       0.23       0.12
1 - related                         0.29       0.36       0.36
2 - explicit mention of the debate  0.19       0.26       0.44
1+2                                 0.48       0.62       0.80

How many links did we miss?

• We ask the raters to manually search the archives of the National Library for related articles.

• Score: 62%

Page 43: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Results

• An open data set of Dutch parliamentary debates,

• with almost 3 million links between 450,000 speeches and 1.5 million newspaper articles and radio bulletins at the National Library.

• accessible through a Web demonstrator and through a SPARQL endpoint

Page 44: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Demo

Page 51: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Online database: SPARQL endpoint

• A service to query a knowledge base using the SPARQL query language.

“All speeches with more than 60 associated news items.”
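A request like this amounts to grouping links by speech and keeping the groups above a threshold. The sketch below shows the general shape such a query could take as a Python string, followed by the same logic over an in-memory list of links. The `pm:` prefix, its namespace URI and the use of `pm:coveredIn` as the linking property are assumptions for illustration; the actual PoliMedia vocabulary and endpoint may differ.

```python
from collections import Counter

# Possible shape of the SPARQL query (prefix and property are assumptions).
QUERY_SHAPE = """
PREFIX pm: <http://example.org/polimedia/>
SELECT ?speech (COUNT(?item) AS ?n)
WHERE { ?speech pm:coveredIn ?item . }
GROUP BY ?speech
HAVING (COUNT(?item) > 60)
"""

def speeches_with_many_items(links, threshold=60):
    """links: iterable of (speech, news_item) pairs.
    Returns the speeches linked to more than `threshold` distinct items."""
    counts = Counter(speech for speech, _item in set(links))
    return sorted(speech for speech, n in counts.items() if n > threshold)

# Hypothetical data: s1 has 61 linked items, s2 has exactly 60.
links = [("s1", f"a{i}") for i in range(61)] + [("s2", f"b{i}") for i in range(60)]
busy = speeches_with_many_items(links)
print(busy)
```

The grouping and the threshold correspond to the GROUP BY and HAVING clauses of the query string.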

Page 56: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

The European Parliament as Linked Open Data

Laura Hollink, Centrum Wiskunde & Informatica, Amsterdam
Astrid van Aggelen, VU University Amsterdam
Martijn Kleppe, Erasmus University Rotterdam
Henri Beunders, Erasmus University Rotterdam
Jill Briggeman, Erasmus University Rotterdam
Max Kemman, University of Luxembourg

Page 57: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Talk of Europe goals

• To publish the entire plenary debates of the European Parliament as Linked Open Data

• To improve access to the data

• To enable large scale analysis across time spans.

‣ To residents of the European Union, access to the proceedings of the European Parliament is a formal right.

Page 60: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Step 1: Translate the European parliamentary debates to Linked Open Data

14M RDF statements about the 30K speeches in 23 languages by 3K speakers in 1K session days that were held in the European Parliament between 1999 and 2014

Page 61: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Modelling debates as events, not documents

[Figure: RDF graph of the event hierarchy for one session. Recoverable content:]

• lp:eu/plenary/Session/2013-11: rdf:type lpv:eu/plenary/Session; lpv:month "11"^^xsd:gMonth; lpv:year "2013"^^xsd:gYear
• lp:eu/plenary/SessionDay/2013-11-20: rdf:type lpv:eu/plenary/SessionDay; dc:date "2013-11-20"^^xsd:date
• lp:eu/plenary/2013-11-20/AgendaItem_6: rdf:type lpv:eu/plenary/AgendaItem; lpv:number 6 (xsd:integer); dc:date; lpv:hasSubsequent lp:eu/plenary/2013-11-20/AgendaItem_7
• lp:eu/plenary/2013-11-20/Speech_103: rdf:type lpv:eu/plenary/Speech; lpv:number 103 (xsd:integer); dc:date; lpv:hasSubsequent lp:eu/plenary/2013-11-20/Speech_104
• Each level is connected to the next by dc:hasPart and dc:isPartOf (session, session day, agenda item, speech).

PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
PREFIX lp: <http://purl.org/linkedpolitics/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

Page 65: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

How to relate a speech to the party of the speaker?

A first attempt:

lp:eu/plenary/2009-10-21/Speech_140 lpv:speaker lp:EUmember_1023 .
lp:EUmember_1023 lpv:hasParty lp:EUParty/SomeParty .

Why is this not a good solution?

1. A person might be a member of more than one party (at different times)

2. Since there is no link between a speech and a party, queries for all speeches spoken by the members of a certain party become very complicated.

Page 69: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

How to relate a speech to the party of the speaker?

[Figure: RDF graph relating a speech to the political functions of its speaker. Recoverable content:]

• lp:eu/plenary/2009-10-21/Speech_140 lpv:speaker lp:EUmember_1023
• lp:EUmember_1023 lpv:politicalFunction lp:political-Function101 and lp:political-Function102 (both rdf:type lpv:PoliticalFunction)
• lp:political-Function101: lpv:role lp:Role/member; lpv:institution lp:EUParty/NI; lpv:beginning "20071114"^^xsd:date; lpv:end "20111126"^^xsd:date
• lp:political-Function102: lpv:role lp:Role/substitute; lpv:institution lp:EUCommittee/Committee_on_Legal_Affairs; lpv:beginning "20090716"^^xsd:date; lpv:end "20111126"^^xsd:date
• lp:eu/plenary/2009-10-21/Speech_140 lpv:spokenAs lp:political-Function101 and lp:political-Function102
• Below the graph, the same speech and speaker are shown again with an lpv:party link to lp:EUparty_NI.

Note: this is another example of the design pattern called n-ary relations or relations as classes.
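Because each political function carries its own begin and end date, the functions that apply to a speech can be selected by date. The Python sketch below illustrates that lookup. The two functions with dates come from the slide; the third function, the class and function names, and the convention that party URIs start with `lp:EUParty/` are assumptions for this example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PoliticalFunction:
    role: str
    institution: str
    beginning: date
    end: date

def spoken_as(speech_date, functions):
    """Functions the speaker held on the day of the speech."""
    return [f for f in functions if f.beginning <= speech_date <= f.end]

def party_at(speech_date, functions):
    """Party institutions among the functions held on that day."""
    return [f.institution for f in spoken_as(speech_date, functions)
            if f.institution.startswith("lp:EUParty/")]

functions = [
    PoliticalFunction("lp:Role/member", "lp:EUParty/NI",
                      date(2007, 11, 14), date(2011, 11, 26)),
    PoliticalFunction("lp:Role/substitute",
                      "lp:EUCommittee/Committee_on_Legal_Affairs",
                      date(2009, 7, 16), date(2011, 11, 26)),
    # Hypothetical later membership, added to show a change of party.
    PoliticalFunction("lp:Role/member", "lp:EUParty/SomeParty",
                      date(2011, 11, 27), date(2014, 6, 30)),
]

print(party_at(date(2009, 10, 21), functions))
print(party_at(date(2012, 1, 1), functions))
```

Storing the result of this lookup as a direct link from the speech to the function is what keeps queries by party simple, since no date comparison is needed at query time.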

Page 72: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Step 2: create links to external data sources

(links made by the EC)

Page 73: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linking Members of Parliament to Wikipedia / DBpedia

how?

Page 75: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Linking Members of Parliament to Wikipedia / DBpedia

• String matching is the most important feature in the linking process.

• “nearly all [alignment systems] use a string similarity metric” [12]

• Stop-word removal and stemming are not helpful! Nor is using WordNet synonyms. [12]

[12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.

http://www.dbpedia.org/page/Judith_Sargentini
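As a rough illustration of string matching for this task, the sketch below compares the name of a Member of Parliament with the local names of candidate DBpedia pages. It uses `difflib.SequenceMatcher` from the Python standard library, which is just one possible similarity measure and not necessarily one of those evaluated in [12]. The candidate list and the threshold of 0.9 are assumptions for this example.

```python
from difflib import SequenceMatcher

def normalise(name):
    """Lower-case, turn underscores into spaces, collapse whitespace."""
    return " ".join(name.lower().replace("_", " ").split())

def similarity(a, b):
    """Similarity in [0, 1] between two names after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def best_match(name, candidates, threshold=0.9):
    """Return the most similar candidate, or None if none is similar enough."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(name, c))
    return best if similarity(name, best) >= threshold else None

# Hypothetical candidate page names.
candidates = ["Judith_Sargentini", "Judith_Merkies", "Catherine_Stihler"]

print(best_match("Judith Sargentini", candidates))
print(best_match("Unknown Person", candidates))
```

A threshold trades precision against recall: a high value avoids wrong links but misses spelling variants, so in practice it would need to be tuned on rated examples.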

Page 76: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Example query 1: speeches that contain a certain keyword

Query: all speeches that contain the phrase “open data”

…. So let us go for open data, let us go for utilisation of all the instruments available to that end! …..

…. but there too governments are encouraging the use of open data to increase transparency, accountability and citizen participation ….

…. We already have many open data projects in the Member States and local authorities…..
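The logic of such a keyword query can be sketched in Python over a small in-memory list of speeches. As a small extension, the same filter is used to count matching speeches per year. The record layout, the texts and the dates are made up for illustration; against the real data this would be a SPARQL query with a text filter.

```python
from collections import Counter

def speeches_containing(speeches, phrase):
    """Case-insensitive substring search over the speech texts."""
    needle = phrase.lower()
    return [s for s in speeches if needle in s["text"].lower()]

def mentions_per_year(speeches, phrase):
    """Number of matching speeches per year (dates are ISO strings)."""
    return Counter(s["date"][:4] for s in speeches_containing(speeches, phrase))

# Hypothetical speeches.
speeches = [
    {"date": "2012-03-14", "text": "So let us go for open data ..."},
    {"date": "2012-10-25", "text": "Governments encourage the use of Open Data ..."},
    {"date": "2013-06-11", "text": "We already have many open data projects ..."},
    {"date": "2013-06-11", "text": "The budget must be open to scrutiny ..."},
]

hits = speeches_containing(speeches, "open data")
per_year = mentions_per_year(speeches, "open data")
print(len(hits), dict(per_year))
```

Note that this counts speeches containing the phrase, not individual occurrences of the phrase within a speech.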

Page 77: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Example 2: speeches that contain a certain keyword by date

"Slovenia" in the plenary meetings of the European Parliament

[Figure: bar chart. X-axis: year (1999 to 2013); y-axis: number of mentions (0 to 100).]

Page 79: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Example 2: speeches that contain a certain keyword by date

[Figure: mentions of 'human rights' over time. X-axis: dates (1999 to 2013); y-axis: frequency (0 to 800).]

Page 80: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Example 3: speeches that contain a certain keyword by country

[Figure: mentions of 'human rights' by country. X-axis: country codes AT BE BG CY CZ DE DK EE ES FI FR GB GR HR HU IE IT LT LU LV MT NL PL PT RO SE SI SK; y-axis: number of mentions (0 to 7000).]

Page 81: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Example 4: the number of speeches per EU country

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?c (COUNT(?c) AS ?count)
WHERE {
  ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech> .
  ?x <http://purl.org/linkedpolitics/vocabulary#speaker> ?p .
  ?p <http://purl.org/linkedpolitics/vocabulary#countryOfRepresentation> ?c
}
GROUP BY ?c
LIMIT 50

Page 82: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Example 5: include data from an external source

Query: MEPs that were born outside Europe.

Members of Parliament

(DBpedia contains info on birthplace, birth date, schools, careers, residence, family, etc. )

Page 84: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Intermezzo: one-question Quiz Reasoning on the Web of Data

Question: What can we conclude from this graph?
A. Stihler is a member of exactly 3 parties
B. Stihler is a member of at least 3 parties
C. Stihler is a member of at most 3 parties
D. None of the above
E. All of the above
F. Other, namely ….

Graph:

<http://purl.org/linkedpolitics/EUmember_4545> foaf:name "Catherine Stihler" .
<http://purl.org/linkedpolitics/EUmember_4545> :memberOf <http://purl.org/linkedpolitics/EUParty/PES> .
<http://purl.org/linkedpolitics/EUmember_4545> :memberOf <http://dbpedia.org/resource/Party_of_European_Socialists> .
<http://purl.org/linkedpolitics/EUmember_4545> :memberOf <http://dbpedia.org/resource/Progressive_Alliance_of_Socialists_and_Democrats> .

Page 85: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Results

• An open data set of EU parliamentary debates,

• with links to other sources on the Web of Data

• accessible through a SPARQL endpoint

Page 87: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Reflection: to what extent can we now answer these questions?

How did the debate about the financial crisis in Greece develop?

Which political event has attracted most media attention?

What are the differences between different media?

Has the coverage changed over time?

We can, but:
• What is the influence of the selection of newspapers available at the National Library?
• What was the quality of the digitisation process (OCR)?
• How good is our linking approach (based on automatically detected entities and topics)?
• How much can we trust the quality of external sources?

➡ How to handle these uncertainties is one of our research questions. We call this Tool Criticism.

Page 89: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Research directions at CWI

Transparent, reproducible analysis of large volumes of connected, heterogeneous, multimodal data.

1. How do we automatically link heterogeneous datasets?

2. How do we interpret links between datasets of different quality and certainty?

3. How do we handle the fact that knowledge evolves?

4. How do we design interfaces that allow scholars to study the datasets

• including the links between them?

• while assessing the reliability of the findings?

Data Science - Big Data - Web of Data

Page 90: Guest Lecture: Linked Open Data for the Humanities and Social Sciences

PoliMedia demo: http://polimedia.nl/
PoliMedia project video: https://youtu.be/u24oRCj7xrQ

Talk of Europe project: http://talkofeurope.eu/
Talk of Europe data: purl.org/linkedpolitics
Talk of Europe project video: https://youtu.be/GxA53gkCe0o

My website: http://homepages.cwi.nl/~hollink/

A. van Aggelen, L. Hollink, M. Kemman, M. Kleppe & H. Beunders. The debates of the European Parliament as Linked Open Data. Semantic Web Journal. In press, 2016.

M. Kleppe, L. Hollink, J. Oomen, M. Kemman, D. Juric, J. Blom, H. Beunders. PoliMedia - Improving the Analyses of Radio & Newspaper coverage of Political Debates. First prize winner of the LinkedUp Veni Competition, presented at the Open Knowledge Conference (OKCon), Geneva, September 2013.

I’d be happy to answer any questions!