On the Coverage of Science in the Media: A Big Data Study on the Impact of the Fukushima Disaster
Thomas Lansdall-Welfare and Nello Cristianini
Department of Computer Science, University of Bristol
BigData '14
Presented by Chang Wei-Yuan @ MakeLab Lab Meeting, 2015/8/3 (Mon.)
Keywords: Data analysis; Text mining; Knowledge discovery; Computational linguistics
Introduction
• This work analyzes online news to explore the impact of the Fukushima disaster on media representations of nuclear power.
Introduction
• A corpus of news articles is used to detect the impact on the media by comparing coverage before and after the event, in terms of:
– the evolution of attention to, and sentiment about, nuclear power
– the networks of actors and actions linked to nuclear power
– the network of topics
Introduction
• The key finding is that media attitudes towards nuclear power changed significantly in the wake of the Fukushima disaster.
Data Description
• The analysis focuses only on science news articles, in order to monitor how the reporting of science has changed.
– number: over 5 million science articles
– period: from 1st May 2008 to 31st December 2013 (over five and a half years)
NOAM: News Outlets Analysis and Monitoring System, SIGMOD 2011.
Data Description
• News articles are labeled as science articles in one of two ways:
– all news articles coming from an online news feed that was explicitly hand-annotated as a science feed
– news articles automatically classified into the science category by a classifier that assigns articles to generic news categories
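The two labeling routes can be combined into a single rule: an article counts as science if its feed was hand-annotated as science, or if the classifier puts it in the science category. The sketch below assumes exactly this OR-combination; the feed identifiers in `SCIENCE_FEEDS` and the keyword-based `toy_classifier` are hypothetical placeholders, not the trained classifier used in the study.

```python
from typing import Callable

# Hypothetical identifiers of feeds hand-annotated as science feeds.
SCIENCE_FEEDS = {"feed:science-desk", "feed:tech-and-science"}

def toy_classifier(text: str) -> str:
    """Placeholder for a trained multi-category news classifier (hypothetical)."""
    keywords = {
        "science": {"research", "scientists", "study"},
        "sport": {"match", "goal", "league"},
    }
    words = set(text.lower().split())
    # Pick the category sharing the most keywords; "other" if none match.
    best = max(keywords, key=lambda c: len(words & keywords[c]))
    return best if words & keywords[best] else "other"

def is_science(article: dict, classify: Callable[[str], str]) -> bool:
    """Route 1: hand-annotated science feed. Route 2: automatic classification."""
    if article.get("feed") in SCIENCE_FEEDS:
        return True
    return classify(article["text"]) == "science"

articles = [
    {"feed": "feed:science-desk", "text": "Volcano erupts"},
    {"feed": "feed:world", "text": "Scientists publish new research"},
    {"feed": "feed:world", "text": "Late goal wins the match"},
]
science = [a for a in articles if is_science(a, toy_classifier)]
print(len(science))  # 2
```

Passing the classifier in as a function keeps the feed rule independent of whichever categorizer is plugged in.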
Methodology
• This work focuses on analyzing the context in which different scientific concepts and their associated actors are reported, using:
– the Apache Hadoop framework
– MongoDB
– Elasticsearch
Methodology
• Extracting References
– References are extracted from the corpus of science articles by matching against lists of the items we wish to detect, drawn from:
– scientific topics: Wikipedia
– universities: QS World University Rankings
– diseases: Wikipedia
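A minimal sketch of list-based reference extraction, assuming case-insensitive whole-phrase matching against each item list. The short lists in `ITEMS` are hypothetical excerpts standing in for the full Wikipedia- and QS-derived lists, and `build_patterns` / `extract_references` are illustrative names, not part of the original system.

```python
import re

# Hypothetical excerpts; the study drew topics and diseases from Wikipedia
# and universities from the QS World University Rankings.
ITEMS = {
    "topic": ["Nuclear Power", "Climate Change"],
    "university": ["University of Bristol", "University of Oxford"],
    "disease": ["Malaria", "Cancer"],
}

def build_patterns(items):
    """Compile one case-insensitive, whole-phrase regex per (kind, item) pair."""
    return {
        (kind, name): re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for kind, names in items.items()
        for name in names
    }

def extract_references(text, patterns):
    """Return the set of (kind, item) pairs mentioned in the text."""
    return {key for key, pattern in patterns.items() if pattern.search(text)}

patterns = build_patterns(ITEMS)
refs = extract_references(
    "Scientists at the University of Bristol discussed nuclear power.", patterns
)
print(sorted(refs))  # [('topic', 'Nuclear Power'), ('university', 'University of Bristol')]
```

At corpus scale, one regex per item would be slow; a multi-pattern matcher (for example an Aho-Corasick automaton) is the usual replacement, but the matching semantics stay the same.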
Methodology
• Generating Time Series
– Two types of time series are generated:
– 1. the amount of attention a given item receives over time
– 2. the sentiment surrounding a given item over time
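The two time series can be sketched as follows, assuming attention is the daily share of science articles that mention an item and sentiment is the mean score of those articles. The tiny `POSITIVE` / `NEGATIVE` word lists and the scoring formula are hypothetical stand-ins for whatever sentiment measure the study actually used, and items are matched by plain substring search.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical toy lexicon standing in for the study's sentiment measure.
POSITIVE = {"safe", "clean", "benefit"}
NEGATIVE = {"disaster", "danger", "leak"}

def sentiment(text):
    """Score in [-1, 1]: (pos - neg) / (pos + neg), or 0 if no lexicon words."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

def time_series(articles, item):
    """Return per-day attention (share of articles mentioning item) and mean sentiment."""
    totals = defaultdict(int)
    scores = defaultdict(list)
    for day, text in articles:
        totals[day] += 1
        if item.lower() in text.lower():
            scores[day].append(sentiment(text))
    # .get avoids inserting empty lists for days with no mentions.
    attention = {d: len(scores.get(d, [])) / totals[d] for d in sorted(totals)}
    mean_sentiment = {d: sum(s) / len(s) for d, s in sorted(scores.items())}
    return attention, mean_sentiment

articles = [
    (date(2011, 3, 1), "Nuclear power is safe and clean"),
    (date(2011, 3, 1), "Football results"),
    (date(2011, 3, 12), "Nuclear power disaster: danger of a leak"),
]
attention, mood = time_series(articles, "nuclear power")
print(attention[date(2011, 3, 1)], mood[date(2011, 3, 12)])  # 0.5 -1.0
```

Normalizing attention by the day's total article count keeps the series comparable when the volume of collected news changes over time.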
Methodology
• Mining Associations
– Associations between items were obtained by performing association rule mining with the FP-Growth algorithm.
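FP-Growth finds frequent itemsets without generating candidates: it compresses the transactions into a prefix tree (FP-tree) and recursively mines conditional trees built from each item's prefix paths. The sketch below is a compact textbook version, not the authors' implementation; it assumes each article becomes one transaction holding the items it mentions, with `min_support` as an absolute count. Rules are then derived with confidence = support(itemset) / support(antecedent). The transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

class Node:
    """FP-tree node: an item, its count, a parent link and child nodes."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and header links."""
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, in descending support order (ties by name).
        path = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return freq, header

def fp_growth(transactions, min_support, suffix=()):
    """Yield (sorted itemset tuple, support) for every frequent itemset."""
    freq, header = build_tree(transactions, min_support)
    for item in sorted(freq, key=lambda i: freq[i]):
        new_suffix = suffix + (item,)
        yield tuple(sorted(new_suffix)), freq[item]
        # Conditional pattern base: the prefix path above each node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, new_suffix)

def association_rules(transactions, min_support, min_confidence):
    """Yield (antecedent, consequent, confidence) rules from frequent itemsets."""
    support = dict(fp_growth([(t, 1) for t in transactions], min_support))
    for itemset, sup in support.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(itemset, k):
                confidence = sup / support[lhs]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    rhs = tuple(i for i in itemset if i not in lhs)
                    yield lhs, rhs, confidence

# Hypothetical transactions: the items mentioned in each of four articles.
articles = [
    ["nuclear", "japan", "tsunami"],
    ["nuclear", "japan"],
    ["nuclear", "energy"],
    ["japan", "tsunami"],
]
itemsets = dict(fp_growth([(a, 1) for a in articles], min_support=2))
rules = list(association_rules(articles, min_support=2, min_confidence=0.9))
print(rules)  # [(('tsunami',), ('japan',), 1.0)]
```

Because each conditional base contains only items ranked above the current one, every frequent itemset is produced exactly once.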
Methodology
• Extracting Triplets and Action Clouds
– Triplets of the form Subject-Verb-Object are extracted from the articles.
– An action cloud is built by aggregating all the verbs from the triplets in which a particular item is the subject or object.
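Once triplets have been extracted, building an action cloud reduces to counting the verbs of every triplet in which the item appears as subject or object. A minimal sketch, assuming exact case-insensitive matching of the item against the subject and object strings; the parsing step is not shown, and the triplets below are hypothetical output of an upstream parser.

```python
from collections import Counter

def action_cloud(triplets, item):
    """Count the verbs of all triplets whose subject or object is `item`."""
    target = item.lower()
    return Counter(
        verb.lower()
        for subject, verb, obj in triplets
        if target in (subject.lower(), obj.lower())
    )

# Hypothetical Subject-Verb-Object triplets from an upstream parser.
triplets = [
    ("Japan", "shut", "nuclear power"),
    ("Germany", "abandon", "nuclear power"),
    ("Nuclear power", "supply", "electricity"),
    ("Germany", "shut", "nuclear power"),
    ("Japan", "rebuild", "coast"),
]
cloud = action_cloud(triplets, "nuclear power")
print(cloud.most_common(1))  # [('shut', 2)]
```

The verb counts can be rendered directly as word-cloud weights.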
Result
• Attention on the topic of “Nuclear Power”
– shows how a big data approach to corpus analysis can reveal changes in media coverage.
Result
• Evolution of Attention
Result
• Evolution of Sentiment
• Associations
[Figure: before vs. after the Fukushima disaster]
[Figure: before vs. after the Fukushima disaster]
Conclusion
• The findings provide insight into how global media reporting on nuclear power changed following the nuclear disaster in Fukushima.
Conclusion
• The methodology offers a comprehensive way to monitor critical events and their media coverage.
• The innovative character of these techniques opens up new possibilities for social science research.