validating external data sets: what social scholars and data journalists can learn from each other

Etmaal

Jan 25, 2015

Transcript
Page 1: Etmaal

validating external data sets

what social scholars and data journalists can learn from each other

Page 2: Etmaal

    Hille van der Kaa @Hillevanderkaa  

Page 3: Etmaal
Page 4: Etmaal

missing data, no value stored “I need to solve this”

Page 5: Etmaal

missing data, no value stored “I need to solve this”

missing data, no value stored “I need to write a story about this”

Page 6: Etmaal

forreporters.com/andrew-lehren/

Page 7: Etmaal
Page 8: Etmaal

“Trustworthiness and data management are vital to the success of

qualitative studies … There is a lack of scientific literature regarding the

structures and processes for managing large qualitative data sets.”

(White, Oelken, Friesen, 2012)

 

Page 9: Etmaal

“A simple answer to objective reporting is the kind of reporting that uses relevant and reliable sources which is not bias or

slanted to a certain party.”

Ibrahim, Pawanteh, Kee (2011)

Page 10: Etmaal

can I trust and use this dataset?

Page 11: Etmaal

check the data source

what are his/her/its intentions?

Page 12: Etmaal

what is the citation index of the data owner?

do other journalists cite the data owner?

 

Page 13: Etmaal

benefit do I really need this?

do I really need to use it?

 

Page 14: Etmaal

check data gathering? is this correct?

clarification of the data? do I understand?

 

Page 15: Etmaal

missing data what is wrong? I need to solve

what is the story?

I need to write  

Page 16: Etmaal

internal validation TEST!

CALL!  

Page 17: Etmaal

I need more sources! (do I?) give me data check consistency

give me humans

check my story  

Page 18: Etmaal

scientists

check the source

(citation)

check the data

check benefit

check data gathering

TEST!

more data sources

data journalists

check the source

(citation)

check the data

check benefit

check clarification

CALL!

more human sources

Page 19: Etmaal

scientist to journalist: “You twist everything”

Page 20: Etmaal

“Dear data journalist,

Please take a look at the research method yourself and act a bit more like a

scientist.”

Page 21: Etmaal

journalist to scientist: “Your articles are useless”

Page 22: Etmaal

“Dear scientist,

Try to avoid intellectual arrogance. There are

other people who are just as smart.”

 

Page 23: Etmaal

journalistic data mining

The process of finding correlations or

patterns in large relational databases.

It is the process of analyzing data from different perspectives and summarizing it

into useful and reliable information.  

Page 24: Etmaal
Page 25: Etmaal
Page 26: Etmaal
Page 27: Etmaal
Page 28: Etmaal
Page 29: Etmaal
Page 30: Etmaal

Gross Time Ranking versus Net Time Ranking

 ‘The net time is the measured time from starting line to finish line and the gross time is the measured time

from the starting shot until the finish line.

In photos of the starting line of marathons one can see thousands of runners who are eager to start.

However, a runner standing in the last starting pen cannot immediately run at full speed.

A kind of human traffic jam arises when the

marathon starts. On the internet people complain about this difference in time results, because the

ranking is based on gross times.’
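
The distinction between the two rankings can be sketched in a few lines of Python. This is only an illustration: the runner names and times below are made up, and `rank_by` is a hypothetical helper, not part of the original analysis.

```python
# Hypothetical runners: (name, gross_seconds, net_seconds).
# The values are illustrative only and do not come from the marathon data.
runners = [
    ("A", 10800, 10790),  # started near the front: small gross/net gap
    ("B", 10900, 10700),  # started in a late pen: large gross/net gap
    ("C", 11000, 10950),
]


def rank_by(rows, index):
    """Return {name: position}; position 1 is the fastest time in column `index`."""
    ordered = sorted(rows, key=lambda row: row[index])
    return {row[0]: position for position, row in enumerate(ordered, start=1)}


gross_rank = rank_by(runners, 1)  # ranking on time since the starting shot
net_rank = rank_by(runners, 2)    # ranking on time since crossing the start line

for name in gross_rank:
    print(name, "gross:", gross_rank[name], "net:", net_rank[name])
```

In this toy example runner B is second on gross time but first on net time, which is the kind of shift the complaints are about.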

Page 31: Etmaal
Page 32: Etmaal
Page 33: Etmaal
Page 34: Etmaal

missing values - solve

‘We discovered that the data of 100 runners was missing. Apparently one scraped page had been added twice. We removed

the 100 duplicates.’  
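
Removing rows that were scraped twice could look roughly like the sketch below. It assumes each runner is identified by a starting number stored under a hypothetical `bib` field; the actual structure of the scraped data is not shown in the slides.

```python
def drop_duplicates(rows, key):
    """Keep the first occurrence of each key and preserve the original order."""
    seen = set()
    unique = []
    for row in rows:
        identifier = key(row)
        if identifier not in seen:
            seen.add(identifier)
            unique.append(row)
    return unique


# Hypothetical scraped rows; the third row repeats the first one,
# as would happen if one results page were scraped twice.
scraped = [
    {"bib": 101, "net": 10790},
    {"bib": 102, "net": 10700},
    {"bib": 101, "net": 10790},
]

cleaned = drop_duplicates(scraped, key=lambda row: row["bib"])
print(len(scraped) - len(cleaned), "duplicate row(s) removed")
```

Comparing the number of rows before and after cleaning with the officially reported number of finishers is a simple internal check on the scrape.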

Page 35: Etmaal

missing values - story

‘Still, nineteen runners were missing from the Amsterdam data set.

Perhaps these are runners who were disqualified.’

Or…

Page 36: Etmaal

‘To calculate the average position changes, caused by net ranking, we converted the difference scores to

absolute figures.

The average position change in the Amsterdam Marathon was 281.6

places.’
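
Why the difference scores are converted to absolute figures can be shown with a small sketch. The rankings below are invented, and `mean_absolute_change` is a hypothetical helper name: when the same runners appear in both rankings, the signed changes cancel out, so only the absolute values say something about how far runners move on average.

```python
def mean_absolute_change(gross_rank, net_rank):
    """Average number of places a runner moves between the two rankings."""
    changes = [abs(gross_rank[name] - net_rank[name]) for name in gross_rank]
    return sum(changes) / len(changes)


# Hypothetical positions for three runners (not the Amsterdam data).
gross_positions = {"A": 1, "B": 2, "C": 3}
net_positions = {"A": 2, "B": 1, "C": 3}

# Signed differences sum to zero: every place gained is a place lost by someone else.
signed_sum = sum(gross_positions[n] - net_positions[n] for n in gross_positions)

average_change = mean_absolute_change(gross_positions, net_positions)
print(signed_sum, round(average_change, 2))
```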

Page 37: Etmaal

scientific outcome

‘We calculated Kendall's tau rank correlation coefficient for the net and

gross ranking of the Amsterdam Marathon.

This coefficient shows that, despite the average differences between the

rankings, the net and gross time rankings are almost equal to each other.’
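
A minimal sketch of a Kendall's tau calculation follows. The slides do not say which variant or software was used; this version assumes rankings without ties (often called tau-a) and uses invented positions. It compares every pair of runners and counts whether the two rankings order that pair the same way.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for two equally long rankings without ties (tau-a)."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1  # both rankings order this pair the same way
        elif direction < 0:
            discordant += 1  # the rankings disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical positions of five runners; only the first two swap places.
gross = [1, 2, 3, 4, 5]
net = [2, 1, 3, 4, 5]

tau = kendall_tau(gross, net)
print(tau)
```

A value close to 1 means the two rankings order almost all pairs of runners identically, even if individual runners shift several places. The pairwise loop is quadratic in the number of runners, so a library routine would be the practical choice for a full marathon field.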

Page 38: Etmaal

journalistic outcome

‘We spoke on the phone with Patrick Schuerman from Tilburg. Patrick had starting number 11797 in the Amsterdam Marathon of 2013

and had a gross time versus net time difference of over 21 minutes.

In his opinion, the marathon ranking should be based on net times, since these

are the ‘real’ times people ran.’

Page 39: Etmaal
Page 40: Etmaal

we are both right

Page 41: Etmaal

we can learn from each other

Page 42: Etmaal

current topic:

a citizen view on the credibility of machine

written news  

http://tinyurl.com/research-uvt

Part of PhD research

Human Component in Machine Written Narratives

    Hille van der Kaa @Hillevanderkaa