AFFECT ANALYSIS OF DUTCH SOCIAL MEDIA AND RANKING OF QUERY RESULTS OVER LINKED DATA Laurens Rietveld
Jan 19, 2015
Master Project Background
Affect analysis of Dutch social media
  Finished July 2010
  VU (Stefan)
  GfK Daphne
    Marketing Research
    Online dashboard
    Not involved yet in web mining
  Business case: National Railway Company (NS)
Data collection
Data processing
Analysis
Project Background: Affect Analysis
Affect: experience of feeling or emotion[1]
Multiple measurements:
  Physiological
  Behavioral
  Vocal
  Linguistic
[1] W. Huitt, The Affective System
[2] W. Parrott, Emotions in Social Psychology
Project Background: Affect Analysis
What is online affect analysis?
  Detect emotions on web pages
Types of emotions[2]: Love, Joy, Surprise, Anger, Sadness, Fear
Project Background: Affect Analysis
Main problems:
  Unstructured data
    Internet (html)
    Text
  Domain dependencies
    “Go read the book” is positive in book reviews, negative in movie reviews
  Ambivalence
[Diagram: Text, Emotion]
Project Background: Dutch Social Media
Used social media types:
Blogs (www.blogspot.com)
Online news item reactions (www.fok.nl)
Micro-blogs (www.twitter.com)
Project Background: Crowd Sourcing
Problems:
  Affect analysis needs training data
  Annotating data is time-consuming
  Annotate every domain
  Normally done by researcher
Solution: Crowd Sourcing
  Mechanical Turk
  Outsourcing simple tasks to a large community
+             -
Many tasks    English only
Quick         Risk of lower quality
Cheap         Unethical (debatably)
Affect Analysis Approach
Research Questions
Is it possible to apply crowd-sourcing to affect analysis of Dutch social media?
Are there differences between social media types in affect analysis?
Results
Inter-annotator agreement: low
Neutral outvotes emotion
Possible causes:
  Missing sentence context
  Too few annotators
  Noise introduced by translation
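The two observations above (low agreement, neutral winning the vote) can be illustrated with a small sketch. The slides do not say which agreement measure or voting rule was used, so the raw pairwise agreement and the simple majority vote below are assumptions, and the annotations are made-up data.

```python
from collections import Counter
from itertools import combinations


def majority_label(votes):
    """Return the most frequent label, or None when the top two labels tie."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def pairwise_agreement(votes):
    """Fraction of annotator pairs that chose the same label for one sentence."""
    pairs = list(combinations(votes, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical annotations: three crowd workers per sentence.
annotations = {
    "s1": ["anger", "neutral", "neutral"],
    "s2": ["joy", "joy", "neutral"],
    "s3": ["sadness", "neutral", "anger"],
}

labels = {sid: majority_label(v) for sid, v in annotations.items()}
mean_agreement = sum(pairwise_agreement(v) for v in annotations.values()) / len(annotations)
```

In this toy data, sentence s1 ends up neutral even though one annotator saw anger, and s3 gets no label at all. With only three annotators per sentence a single neutral vote often decides the outcome, which fits the "too few annotators" cause listed above.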
[Figure: All social media. Percentage of all documents per emotion (Joy, Surprise, Anger, Sadness) by period, January 2007 to March 2010.]
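A minimal sketch of how a "percentage of all documents per period" series could be computed from classified documents. The slides do not describe the actual computation; the input shape (one `(month, emotion)` pair per document, with `None` for documents without a detected emotion), the function name, and the sample data are assumptions.

```python
from collections import Counter, defaultdict


def emotion_share_by_month(documents):
    """Map each month to {emotion: percentage of all documents in that month}."""
    totals = Counter()
    per_emotion = defaultdict(Counter)
    for month, emotion in documents:
        totals[month] += 1
        if emotion is not None:
            per_emotion[month][emotion] += 1
    return {
        month: {e: 100.0 * n / totals[month] for e, n in per_emotion[month].items()}
        for month in totals
    }


# Hypothetical classified documents.
docs = [
    ("2009-12", "anger"), ("2009-12", None), ("2009-12", "anger"), ("2009-12", "sadness"),
    ("2010-01", None), ("2010-01", "joy"),
]
shares = emotion_share_by_month(docs)
```

Dividing by all documents in the month, not only the emotional ones, keeps months with different volumes comparable.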
Results
Period          Event
July 2007       Problems in the payment system of ticket machines
January 2009    Required chip card payment method for students
December 2009   Train and railway malfunctions due to snow
February 2010   Filthy train stations due to cleaning crew strikes
Future work
Other list of emotions
Improve annotation process
  More voting
  Use other strategies for annotation tasks
    Not sentence annotation but paragraph/document
Different social media types, different feature-extraction/classifier/annotation strategies
AFFECT ANALYSIS OF DUTCH SOCIAL MEDIA AND
RANKING OF QUERY RESULTS OVER LINKED DATA
Laurens Rietveld
Data2Semantics
Wicherts JM, Bakker M, Molenaar D (2011). Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11).
Data2Semantics
Provide semantic infrastructure for e-Science
How to share, publish, access, analyze, interpret and reuse data?
  Querying
  Ranking
  Information utility
  Enriched publications
  Provenance
  Annotation/interpretation
[Figure: Clinical Decision Support. Diagram labels: Census, CDS tools, Patient Profile, EMR, LIS, Linked Data, Elsevier-published Clinical Guideline, Clinical evidence (e.g. CT report), AERS, Hospital.]
My Research
http://dbpedia.org/fct/ http://google.com
Ranking
My Research
Ranking
1. Relevance
   No proper ‘PageRank’ equivalent for semantic web
   Heterogeneous and imprecise data
2. Ordering
   Performance
Relevance
What query results are most relevant?
Semantic web comes with implicit orderings. Possible indicators:
  Which ontologies are used more often?
  What can we say about these ontologies?
  Which query results are semantically similar?
  Which query results can I trust?
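As an illustration of the first indicator (which ontologies are used more often), the sketch below counts how often each vocabulary namespace appears in predicate position and orders candidate results by that count. This is only one possible reading of the indicator. The namespace-splitting rule, the function names, and the sample triples are assumptions, not a method described in the slides.

```python
from collections import Counter


def namespace_of(iri):
    """Return the part of an IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def rank_by_vocabulary_usage(results, usage):
    """Order IRIs by how often their namespace is used, most used first."""
    return sorted(results, key=lambda iri: usage[namespace_of(iri)], reverse=True)


# Hypothetical dataset: (subject, predicate, object) triples.
triples = [
    ("ex:a", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("ex:b", "http://xmlns.com/foaf/0.1/name", "Bob"),
    ("ex:a", "http://xmlns.com/foaf/0.1/knows", "ex:b"),
    ("ex:a", "http://example.org/vocab#nick", "Al"),
]
usage = Counter(namespace_of(p) for _, p, _ in triples)

ranked = rank_by_vocabulary_usage(
    ["http://example.org/vocab#nick", "http://xmlns.com/foaf/0.1/name"], usage
)
```

Raw usage counts are a crude signal. They say nothing about trust or semantic similarity, which the other indicators would have to cover.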
Ordering
SELECT ?price ?offer ?product ?vendor ((?rating + ?popularity) AS ?score)
{
  ?product :hasRating ?rating .
  ?product :producer ?producer .
  ?producer :hasPopularity ?popularity .
  ?offer :product ?product .
  ?offer :price ?price .
}
ORDER BY DESC(?score)
LIMIT 1
Berlin SPARQL Benchmark
[Figure: two execution plans for the query above.
SPARQL-Rank plan: Rank operators over the BGPs (?product ?rating) and (?producer ?popularity), a RankJoin, a Join with the BGP (?product ?producer ?offer ?price), then Slice 1. Numbers shown in the plan: 646, 679, 30, 29, 195, 1, 1.
Traditional plan: Joins over the same three BGPs, then Order and Slice 1. Numbers shown in the plan: 646, 679, 438634, 13205, 13205, 13205, 1.]
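The contrast between a traditional plan (join everything, order, slice) and a ranking-aware join can be sketched in a few lines. The slides name a RankJoin operator but do not give its algorithm, so the code below is a generic hash rank join in the style of HRJN, written as an assumption and not as the SPARQL-Rank implementation. Tuple layout `(join_key, score, item)`, function names, and data are hypothetical. Both inputs must already be sorted by score, highest first.

```python
import heapq
from collections import defaultdict


def join_sort_slice(left, right, k):
    """Traditional plan: join everything, order by summed score, keep k."""
    joined = [(ls + rs, li, ri)
              for lk, ls, li in left
              for rk, rs, ri in right
              if lk == rk]
    joined.sort(reverse=True)
    return joined[:k]


def rank_join_top_k(left, right, k):
    """Read both sorted inputs alternately and stop once the top k are certain."""
    inputs = (left, right)
    pos = [0, 0]                                    # tuples read per input
    seen = (defaultdict(list), defaultdict(list))   # join_key -> [(score, item)]
    candidates = []                                 # max-heap via negated scores
    results = []
    side = 0

    def threshold():
        # Upper bound on the score of any join result not generated yet.
        if pos[0] == 0 or pos[1] == 0:
            return float("inf")
        top_l, top_r = left[0][1], right[0][1]
        last_l, last_r = left[pos[0] - 1][1], right[pos[1] - 1][1]
        return max(top_l + last_r, last_l + top_r)

    while len(results) < k:
        if pos[side] == len(inputs[side]):
            side = 1 - side                         # this input is exhausted
        if pos[side] < len(inputs[side]):
            key, score, item = inputs[side][pos[side]]
            pos[side] += 1
            for other_score, other_item in seen[1 - side][key]:
                pair = (item, other_item) if side == 0 else (other_item, item)
                heapq.heappush(candidates, (-(score + other_score), pair))
            seen[side][key].append((score, item))
            bound = threshold()
        else:
            bound = float("-inf")                   # both inputs exhausted
        while candidates and len(results) < k and -candidates[0][0] >= bound:
            neg_score, (left_item, right_item) = heapq.heappop(candidates)
            results.append((-neg_score, left_item, right_item))
        if bound == float("-inf"):
            break
        side = 1 - side
    return results, pos[0] + pos[1]


# Hypothetical data: products keyed by producer with a rating,
# producers with a popularity; both sorted by score descending.
products = [("acme", 9, "p1"), ("bolt", 8, "p2"), ("acme", 5, "p3"), ("core", 2, "p4")]
producers = [("bolt", 7, "bolt"), ("acme", 4, "acme"), ("core", 1, "core")]

top, tuples_read = rank_join_top_k(products, producers, 1)
```

For this toy input the rank join returns the same top result as `join_sort_slice` after reading 4 of the 7 input tuples. The saving comes from the threshold: once the best candidate scores at least as high as anything still unread could, the remaining input does not need to be touched.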
Ordering
Related work: Sara Magliacane
Current Question
What if reasoning is required to materialize information?
Top-k Closure (Stefan Schlobach)
  Avoid full materialization while still being complete
Example: materialization
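To make "materialization" concrete, here is a minimal sketch of full materialization under two RDFS entailment rules: subclass transitivity (rdfs11) and type inheritance (rdfs9). It shows the exhaustive approach that a top-k closure would try to avoid and is not the top-k closure algorithm itself. The choice of rules, the prefixed-name strings, and the sample triples are assumptions.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Apply rdfs9 and rdfs11 until no new triples appear; return the closure."""
    closed = set(triples)
    while True:
        subclass_edges = [(s, o) for s, p, o in closed if p == SUBCLASS]
        inferred = set()
        for s, p, o in closed:
            if p in (SUBCLASS, TYPE):
                # (s p o) and (o subClassOf sup) entail (s p sup)
                for sub, sup in subclass_edges:
                    if sub == o:
                        inferred.add((s, p, sup))
        if inferred <= closed:
            return closed
        closed |= inferred


# Hypothetical data.
triples = {
    ("ex:Intercity", SUBCLASS, "ex:Train"),
    ("ex:Train", SUBCLASS, "ex:Vehicle"),
    ("ex:ic1", TYPE, "ex:Intercity"),
}
closure = materialize(triples)
```

Three asserted triples grow to six here. On large datasets this growth is the cost that makes full materialization unattractive when only the top k answers are needed.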
Thank You