Using open datasets for research purposes. Erasmus Studio, Tuesday 20 January 2015. Martijn Kleppe (Erasmus Universiteit Rotterdam), Astrid van Aggelen (Vrije Universiteit), Laura Hollink (Vrije Universiteit).
Using open datasets for research purposes

Jul 31, 2015

Martijn Kleppe
Transcript
Page 1: Using open datasets for research purposes

Using open datasets for research purposes

Erasmus Studio

Tuesday 20 January 2015

Martijn Kleppe, Erasmus Universiteit Rotterdam

Astrid van Aggelen, Vrije Universiteit

Laura Hollink, Vrije Universiteit

Page 2: Using open datasets for research purposes

Program

• I. Introduction: PoliMedia (Martijn)

• II. Talk of Europe (Astrid)

• III. Concluding: Research with open datasets (Martijn)

Page 3: Using open datasets for research purposes

www.polimedia.nl

Page 4: Using open datasets for research purposes

Current approach of research

How do media cover debates in the Dutch Parliament?

Page 5: Using open datasets for research purposes

Issues with current approach

Too much work (travel and manual work)

Page 6: Using open datasets for research purposes

Issues with current approach

Limited material and different systems (no images, and only a selection of programs)

Page 7: Using open datasets for research purposes

PoliMedia approach

PoliMedia Portal: search by debate and person

• Newspapers (KB)

• Television (Sound and Vision)

• Radio (KB)

• Staten Generaal Digitaal (KB)

Page 8: Using open datasets for research purposes
Page 9: Using open datasets for research purposes
Page 10: Using open datasets for research purposes
Page 11: Using open datasets for research purposes
Page 12: Using open datasets for research purposes

Results

• Yeah! It works (but no television)

• Not perfect

• But still OK (recall: 62%; precision: 80%)

• It is open for everyone: www.polimedia.nl

• We won a prize with it
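For readers unfamiliar with the two measures: recall is the share of relevant items that the system finds, and precision is the share of returned items that are actually relevant. A minimal sketch of both definitions follows. The counts are made up purely so that the ratios match the percentages on the slide; they are not the PoliMedia evaluation data.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of returned items that are relevant."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of relevant items that were returned."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts, chosen only to reproduce the reported ratios.
tp, fp, fn = 248, 62, 152

print(f"precision: {precision(tp, fp):.0%}")
print(f"recall:    {recall(tp, fn):.0%}")
```

Read this way, the reported figures mean that most of what the portal returns is correct, while a sizeable share of relevant material is still missed.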

Page 13: Using open datasets for research purposes
Page 14: Using open datasets for research purposes

Results

• People actually use it (!)

Page 15: Using open datasets for research purposes

“Another digital source I often use is PoliMedia.nl”

NRC Handelsblad, Ewoud Sanders, Voor al haar mantelzorgen, 14 April 2014

Yeah! An article in NRC HANDELSBLAD!

Page 16: Using open datasets for research purposes

“PoliMedia is mainly interesting because of the advanced search & filter options”

NRC Handelsblad, Ewoud Sanders, Voor al haar mantelzorgen, 14 April 2014

Oh no, he does not use PoliMedia for what it was made for…

Page 17: Using open datasets for research purposes

Results

• Do people understand it?

• Ewoud Sanders is not the only one who does not use PoliMedia to its full potential. Neither do I …

• Which topic received the most press coverage?

• This can be answered via the SPARQL endpoint. Result: the “Indonesische Kwestie” (the Indonesian Question).

• But I do not know how to work with a SPARQL endpoint

Page 18: Using open datasets for research purposes

• Not really open data

• Only Dutch

• Follow-up: Talk of Europe

Results

Page 19: Using open datasets for research purposes

LinkedPolitics: Linked Open Data of political events, actors, media.

Page 20: Using open datasets for research purposes

Talk of Europe

• Goal: publish the plenary debates of the European Parliament as Linked Data

• Linked Data: a format for publishing data on the Web, with URIs as permanent identifiers, designed for connecting pieces of data.

• Why is this important?

To allow large-scale analysis across time spans by social scientists interested in voting behavior, partisanship, lobbies, differences between countries, etc.

For residents of the European Union, that is, the electorate, access to the proceedings of the European Parliament is a formal right.
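The idea of connecting pieces of data through shared URIs can be sketched with plain (subject, predicate, object) tuples. This is only an illustration: every URI and property name below is an invented placeholder under example.org and does not come from the Talk of Europe vocabulary.

```python
# Each fact is a (subject, predicate, object) triple; URIs act as identifiers.
# Every URI below is a made-up placeholder, not part of the real dataset.
EX = "http://example.org/"

speeches = [
    (EX + "speech/1", EX + "vocab/speaker", EX + "person/42"),
    (EX + "speech/2", EX + "vocab/speaker", EX + "person/7"),
]
people = [
    (EX + "person/42", EX + "vocab/country", EX + "country/NL"),
    (EX + "person/7", EX + "vocab/country", EX + "country/IT"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def countries_for_speech(speech_uri):
    """Follow speech -> speaker -> country across the two datasets."""
    found = []
    for speaker in objects(speeches, speech_uri, EX + "vocab/speaker"):
        found.extend(objects(people, speaker, EX + "vocab/country"))
    return found


print(countries_for_speech(EX + "speech/1"))
```

Because both lists identify the speaker by the same URI, the two datasets can be joined without any prior agreement beyond that identifier.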

Page 21: Using open datasets for research purposes

Data

Page 22: Using open datasets for research purposes

Data

Page 23: Using open datasets for research purposes

Data

14M triples describing 30K speeches by 3K speakers (and their affiliations) on 1K session days held in the European Parliament so far (1999-2014)

Page 24: Using open datasets for research purposes

Links to external datasets

• Country names

• Members of Parliament

• Members of Parliament + Parties

• Members of Parliament

Page 25: Using open datasets for research purposes
Page 26: Using open datasets for research purposes

Access to the data

We provide access in four ways:

1. Through a SPARQL endpoint at http://linkedpolitics.ops.few.vu.nl/sparql/

2. Using the browse and search options of ClioPatria.

3. By downloading the data in Turtle or RDF/XML.

4. As Triple Pattern Fragments at http://data.linkeddatafragments.org/linkedpolitics (thanks to Ruben Verborgh).
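For those who have not worked with a SPARQL endpoint before, the first access route can be sketched with the Python standard library alone. This is a sketch under assumptions: it assumes the endpoint listed above follows the standard SPARQL protocol (a GET request with a `query` parameter) and can return JSON results; whether the endpoint is currently reachable is not checked here. The sample query is deliberately schema-agnostic.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given on the slide.
ENDPOINT = "http://linkedpolitics.ops.few.vu.nl/sparql/"

# A schema-agnostic query: it assumes nothing about the vocabulary used.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the query and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as response:
        return json.load(response)
```

Calling `run_query(SAMPLE_QUERY)` would return a few arbitrary triples, which is a quick way to see what the data looks like before writing more specific queries.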

Page 27: Using open datasets for research purposes

Searching the proceedings of the EU Parliament

Page 28: Using open datasets for research purposes

Searching the proceedings of the EU Parliament

Page 29: Using open datasets for research purposes

Example queries on the Talk-of-Europe data

• How do individual members differ in the terms they mention?

• How do EU parties differ in the terms they mention?

• Which new member was discussed most when they joined?

• For each EU country, get the number of speeches held by its representatives that contain the word “agriculture”.

• …
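The last concrete query in the list above could look roughly like the sketch below. The property names (`ex:speaker`, `ex:text`, `ex:country`) and the `ex:` namespace are hypothetical placeholders, not the real Talk of Europe vocabulary, so the query has to be adapted to the actual schema before it returns anything. The helper that flattens the response assumes the standard SPARQL JSON results layout.

```python
# Hypothetical vocabulary: replace the ex: terms with the dataset's real ones.
QUERY_TEMPLATE = """\
PREFIX ex: <http://example.org/vocab/>
SELECT ?country (COUNT(?speech) AS ?speeches)
WHERE {
  ?speech  ex:speaker ?speaker ;
           ex:text    ?text .
  ?speaker ex:country ?country .
  FILTER(CONTAINS(LCASE(STR(?text)), "%s"))
}
GROUP BY ?country
ORDER BY DESC(?speeches)
"""


def speeches_per_country_query(word: str) -> str:
    """Fill in the search word; note that CONTAINS is a substring match."""
    if not word.isalpha():
        raise ValueError("expected a single alphabetic word")
    return QUERY_TEMPLATE % word.lower()


def bindings_to_rows(result: dict) -> list:
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


print(speeches_per_country_query("agriculture"))
```

Grouping by country and counting speeches keeps the aggregation on the server, so only one row per country travels back to the researcher.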

Page 30: Using open datasets for research purposes

Creative Camps

• 3 events of one week each, where people are invited to work with our data on-site.

• Outcome of CC 1 @ Hilversum:

  • Links to the Italian parliament.

  • Detection of people who speak about an unusual mix of topics.

  • Sentiment analysis

Check out our current Call for Participation! Deadline: 30 January 2015

http://www.talkofeurope.eu/creativecamp2/call-for-participation/

Page 31: Using open datasets for research purposes

Research with open datasets

Page 32: Using open datasets for research purposes

Our experiences

• There are some really nice and interesting datasets

• How do you find an open dataset that matches your research question?

Page 34: Using open datasets for research purposes

Our experiences

• What are really open datasets? And what is not open?

• Do you need to collaborate with computer scientists?

• Is an open dataset sufficient as it is, or is it a semi-finished product (‘halffabrikaat’)? What was the goal of creating the dataset?

Page 36: Using open datasets for research purposes

Our experiences

• What is the aim of using open datasets? Answering research questions or finding research questions?