Using open datasets for research purposes Erasmus Studio Tuesday 20 January 2015 Martijn Kleppe, Erasmus Universiteit Rotterdam Astrid van Aggelen, Vrije Universiteit Laura Hollink, Vrije Universiteit
Program
• I. Introduction: PoliMedia (Martijn)
• II. Talk of Europe (Astrid)
• III. Concluding: Research with open datasets (Martijn)
Issues with current approach
Limited material and different systems (no images + selection of programs)
PoliMedia approach
PoliMedia Portal: search debate and person
• Newspapers (KB)
• Television (Sound and Vision)
• Radio (KB)
• Staten Generaal Digitaal (KB)
Results
• Yeah! It works (but no television)
• Not perfect
• But still OK (recall: 62%; precision: 80%)
• It is open for everyone: www.polimedia.nl
• We won a prize with it
• People actually use it (!)
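For readers less familiar with the two figures above: recall is the share of relevant items that the system found, and precision is the share of found items that are correct. A minimal sketch in Python follows. The counts in it are hypothetical, chosen only because they reproduce the reported 62% and 80%; the actual evaluation counts are not given in these slides.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of the items the system returned that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of the relevant items that the system managed to return."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts (assumption, not taken from the PoliMedia evaluation).
TRUE_POS = 124   # correct items found
FALSE_POS = 31   # items found that are wrong
FALSE_NEG = 76   # relevant items that were missed

p = precision(TRUE_POS, FALSE_POS)
r = recall(TRUE_POS, FALSE_NEG)
f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall

print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.80 recall=0.62 F1=0.70
```

Read this way, the system misses roughly four in ten relevant items, but eight out of ten items it does return are right.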
Yeah! An article in NRC Handelsblad!
“Another digital source I often use is PoliMedia.nl”
“PoliMedia is mainly interesting because of the advanced search & filter options”
NRC Handelsblad, Ewoud Sanders, Voor al haar mantelzorgen, 14 April 2014
Oh no, he does not use PoliMedia for what it was made for…
Results
• Do people understand it?
• Ewoud Sanders is not the only one who does not use PoliMedia to its full potential. Neither do I …
• Which topic received most press coverage?
• This can be answered via the SPARQL endpoint. Result: the “Indonesische Kwestie” (the Indonesian question).
• But I do not know how to work with a SPARQL endpoint
Talk of Europe
• Goal: publish the plenary debates of the European Parliament as Linked Data
• Linked Data: a format for publishing data on the Web, with URIs as permanent identifiers, designed for connecting pieces of data.
• Why is this important?
To allow large-scale analysis across time spans by social scientists interested in voting behavior, partisanship, lobbies, differences between countries, etc.
For residents of the European Union, i.e. the electorate, access to the proceedings of the European Parliament is a formal right.
Data
14M triples about the 30K speeches by 3K speakers (and their affiliations) in 1K session days that were held in the EU parliament so far (1999-2014)
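A triple is a single statement of the form subject, predicate, object, in which things are named by URIs. The sketch below illustrates the idea in plain Python. Every URI and value in it is a made-up placeholder under `http://example.org/`, not an identifier from the actual Talk of Europe data.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # placeholder namespace (assumption)

# Four toy statements: two speeches, one speaker, one country.
triples = [
    Triple(EX + "speech/1", EX + "speaker", EX + "person/A"),
    Triple(EX + "speech/1", EX + "text", "Madam President, ..."),
    Triple(EX + "speech/2", EX + "speaker", EX + "person/A"),
    Triple(EX + "person/A", EX + "country", EX + "country/NL"),
]


def subjects(predicate: str, obj: str) -> List[str]:
    """All subjects that have the given predicate pointing at the given object."""
    return [t.subject for t in triples if t.predicate == predicate and t.obj == obj]


# Following links: who represents country/NL, and which speeches did they give?
people = subjects(EX + "country", EX + "country/NL")
speeches = sorted(s for person in people for s in subjects(EX + "speaker", person))
print(speeches)
```

Because the speaker and the country are identified by URIs rather than by free text, statements from different sources about the same URI can be connected in the same way.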
Links to external datasets
• Country names
• Members of Parliament
• Members of Parliament + parties
Access to the data
We provide access in four ways:
1. Through a SPARQL endpoint at http://linkedpolitics.ops.few.vu.nl/sparql/
2. Using the browse and search options of ClioPatria.
3. By downloading the data in Turtle or RDF/XML.
4. As Triple Pattern Fragments at http://data.linkeddatafragments.org/linkedpolitics (thanks to Ruben Verborgh).
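For those who have not worked with a SPARQL endpoint before, the first access route can be sketched in a few lines of Python using only the standard library. The sketch assumes the endpoint follows the standard SPARQL protocol (query sent as a `query` parameter, results returned in the SPARQL JSON results format); whether the address above is still reachable is not something the code checks. The function names are hypothetical helpers, and the query is a generic one that does not depend on the dataset's vocabulary.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

ENDPOINT = "http://linkedpolitics.ops.few.vu.nl/sparql/"

# Generic query: fetch any five triples, whatever vocabulary the data uses.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def rows_from_json(payload: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> List[Dict[str, str]]:
    """Send the query and return the rows (requires network access)."""
    request = urllib.request.Request(
        build_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows_from_json(response.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` would then return a list of rows, each with the keys `s`, `p` and `o`, provided the endpoint is online.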
Example queries on the Talk-of-Europe data
• How do members differ in the terms they mention?
• How do EU parties differ in the terms they mention?
• Which new member was discussed most when they joined?
• For each EU country, get the number of speeches held by its representatives that contain the word “agriculture”.
• …
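The “agriculture” question above could look roughly as follows. This is a sketch under stated assumptions: the `ex:` property names in the SPARQL text are hypothetical placeholders and would need to be replaced by the properties the dataset actually uses. The Python part reproduces the same counting logic on a handful of invented speeches, so the idea can be checked without access to the endpoint.

```python
from collections import Counter
from typing import Dict, List

# SPARQL text for the question. The ex: vocabulary is a placeholder (assumption).
QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?country (COUNT(DISTINCT ?speech) AS ?n)
WHERE {
  ?speech  ex:speaker ?speaker ;
           ex:text    ?text .
  ?speaker ex:country ?country .
  FILTER(CONTAINS(LCASE(STR(?text)), "agriculture"))
}
GROUP BY ?country
ORDER BY DESC(?n)
"""

# Invented toy data standing in for speeches and the speaker's country.
toy_speeches = [
    {"country": "NL", "text": "Agriculture policy needs reform."},
    {"country": "NL", "text": "A remark on fisheries."},
    {"country": "FR", "text": "The agriculture budget is too small."},
    {"country": "FR", "text": "Support for agriculture must continue."},
]


def count_speeches_mentioning(word: str, rows: List[Dict[str, str]]) -> Counter:
    """Count, per country, the speeches whose text contains the word.

    Like CONTAINS in the query, this is a case-insensitive substring match,
    so it would also match longer words that include the search term.
    """
    needle = word.lower()
    return Counter(row["country"] for row in rows if needle in row["text"].lower())


counts = count_speeches_mentioning("agriculture", toy_speeches)
print(counts.most_common())
```

The `GROUP BY ?country` clause in the query plays the role of the `Counter` here: it collects matching speeches per country before counting them.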
Creative Camps
• 3 events of one week each, where people are invited to work with our data on-site.
• Outcomes of Creative Camp 1 @ Hilversum:
  • Links to the Italian parliament.
  • Detection of people who speak about an unusual mix of topics.
  • Sentiment analysis
Check out our current Call for Participation! Deadline: 30 January 2015
http://www.talkofeurope.eu/creativecamp2/call-for-participation/
Our Experiences
https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:51895
http://youtu.be/HJbo-OAaJ1I?list=PLvIjtWk34TjWqbHG5Z9vqyKR7oPO8Z62u
Our Experiences
• There are some really nice and interesting datasets
• How do you find an open dataset that matches your research question?
• Which datasets are really open? And which are not?
• Do you need to collaborate with computer scientists?
• Is an open dataset sufficient as it is, or is it a semi-finished product (‘half-fabrikaat’)? What was the goal of creating the dataset?
• What is the aim of using open datasets? Answering research questions or finding research questions?
Questions?
www.polimedia.nl
www.talkofeurope.eu
Martijn Kleppe
www.martijnkleppe.nl
Astrid van Aggelen
https://www.linkedin.com/pub/astrid-van-aggelen/7/125/4b8
Laura Hollink
www.cs.vu.nl/~laurah/