Sentiment Analysis for Italian-language microblogging: development and evaluation of an automatic system
Corrado Monti, 22-04-2013
Outline: Introduction; Development of the classification system; Application and experimental results; Conclusions
Sentiment Analysis and Political Disaffection in Italy
Slideshow for my master's thesis. I built a classification system for politics-related concepts on Italian microblogging and applied it to political disaffection, measuring the correlation between Twitter and h polls.
Transcript
Twitter
- Main microblogging platform
  - Instant publication of short textual contents
  - Tweet → 140 characters
- Widespread in Italy too
Italian internet users          28.6 M people   100%
Hearsay knowledge of Twitter    25.3 M people   88.6% of internet users
Weekly Twitter users            4.7 M people    16.5% of internet users
Daily Twitter users             1.24 M people   4.4% of internet users
(December 2012)
Textual classification
- We would like to classify millions of texts
- Rule-based approach: rules (keywords, regular expressions) defined together with field experts
  - Inaccurate, hard to define
- Supervised machine learning
  - The algorithms need a training set to learn how to classify new texts
  - Every text is transformed into a sequence of numerical features; the algorithm learns what these numbers mean
- Recent problem: Sentiment Analysis → recognize the sentiment expressed by the author of the text
  - Many applications, both industrial and academic
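A minimal sketch of what the rule-based approach above might look like. The patterns and the function name `is_political_by_rules` are hypothetical illustrations, not rules from the thesis.

```python
import re

# Hypothetical hand-written rules (keywords and regular expressions).
# These patterns are illustrative assumptions, not the rules used in the thesis.
POLITICS_RULES = [
    re.compile(r"\bgoverno\b", re.IGNORECASE),      # keyword: "government"
    re.compile(r"\bparlament\w*", re.IGNORECASE),   # parlamento, parlamentari, ...
    re.compile(r"\belezion[ei]\b", re.IGNORECASE),  # elezione / elezioni
]

def is_political_by_rules(text: str) -> bool:
    """Return True if at least one hand-written rule matches the text."""
    return any(rule.search(text) for rule in POLITICS_RULES)
```

Every new slang term, hashtag, or misspelling needs a new rule, which is one way to read the "inaccurate, hard to define" remark.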
Measuring collective feelings
- One of the first works is O’Connor et al. (2010), who measured economic trust through Sentiment Analysis on Twitter
[Figure: Sentiment Ratio (axis 1.5 to 4.0) and Gallup Economic Confidence (axis −60 to −20), plotted by date from 2008-01 to 2009-11]
- For which phenomena is this possible?
- How valid are these measures? Do they work in Italy too?
Goals of this work
- Hypothesis: can political disaffection be measured through massive tweet classification?
  - It is a relevant phenomenon, especially in Italy
  - Lots of interest, both academic (sociology) and non-academic
- We’d also like to build reusable classifiers
Political disaffection
- How to define a “disaffected” tweet?
- According to domain experts, it must
  1. have a politics-related topic → topic detection
  2. have negative sentiment → sentiment analysis
  3. be directed at all politicians
[Flowchart: Tweet → Is political? (No: discarded) → Yes → Is negative? (No: discarded) → Yes → Is general? (No: discarded) → Yes → Politically disaffected tweet]
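The three-step filter on this slide can be sketched as a cascade of binary classifiers. The function name `is_disaffected` and the idea of passing the classifiers in as plain callables are assumptions made for illustration; the slides do not show how the stages were wired together.

```python
from typing import Callable

# Each stage is any function mapping a tweet to True/False (hypothetical interface).
Classifier = Callable[[str], bool]

def is_disaffected(tweet: str,
                   is_political: Classifier,
                   is_negative: Classifier,
                   is_general: Classifier) -> bool:
    """Apply the three filters in order; a tweet failing any of them is discarded."""
    if not is_political(tweet):
        return False  # discarded: not about politics
    if not is_negative(tweet):
        return False  # discarded: sentiment is not negative
    return is_general(tweet)  # kept only if directed at politicians in general
```

Running the cheap topic filter first means the later stages only see the political tweets.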
Training Set
- 28′340 tweets labelled by 40 political science students
  - Between April and June 2012
  - 3 labellers for every text
- We keep in the dataset only tweets with a unanimous “politic” label
- For the other labels we measure agreement with Krippendorff’s α:
  - “negative” → 0.78 → reliable labels ✓
  - “generic” → 0.41 → much noise in the labels ✗
⇓
              negative   non-negative
politic       7′965      4′544
not politic   15′831
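For reference, a minimal sketch of how Krippendorff's α can be computed for nominal labels, using the standard form α = 1 − D_o/D_e (observed over expected disagreement). The slides do not say which variant or tool was used, so the nominal metric and the function name `krippendorff_alpha_nominal` are assumptions.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """units: list of lists, each inner list holding the labels given to one item."""
    coincidences = Counter()  # coincidence matrix o[(c, k)]
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label carries no agreement information
        # Every ordered pair of labels from different labellers, weighted by 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()  # n_c: marginal total of each category
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # undefined when at most one category is ever used
    return 1.0 - (n - 1) * observed / expected
```

With three labellers per tweet, each inner list would hold three labels, for example `[True, True, False]` for the “negative” question.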
1 – Topic Detection: politics
- Time-robust classifier
- Training set extension: 17′388 news titles (January-October 2012)
- Best feature extraction:
  - Character 5-grams → uniformity across different spacings
  - tf-idf
  - Discard terms with fewer than 4 occurrences
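The feature extraction listed above can be sketched in plain Python as follows. Several details are assumptions, since the slide does not specify them: lowercasing, reading "4 occurrences" as a corpus-wide count, and the idf variant log(N/df) + 1. The names `char_ngrams` and `fit_tfidf` are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=5):
    """Overlapping character n-grams of a lowercased text (lowercasing is an assumption)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def fit_tfidf(texts, n=5, min_count=4):
    """Return (vocabulary, sparse tf-idf vectors) for a list of texts."""
    docs = [Counter(char_ngrams(t, n)) for t in texts]
    # Total occurrences of each n-gram over the whole corpus.
    totals = Counter()
    for doc in docs:
        totals.update(doc)
    # Discard n-grams seen fewer than min_count times (assumed corpus-wide count).
    vocab = sorted(g for g, c in totals.items() if c >= min_count)
    # Document frequency: in how many texts each kept n-gram appears.
    doc_freq = Counter(g for doc in docs for g in doc if totals[g] >= min_count)
    # One common idf variant; the exact weighting used in the thesis is not stated.
    idf = {g: math.log(len(docs) / doc_freq[g]) + 1.0 for g in vocab}
    vectors = [{g: tf * idf[g] for g, tf in doc.items() if g in idf} for doc in docs]
    return vocab, vectors
```

With character n-grams, a hashtag such as `#governomonti` and the spaced phrase `governo monti` still share features like `gover` and `monti`, which is one reading of the "different spacings" point.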
- 45′728 points with 78′642 features
  - SMO SVM solver, k-Nearest Neighbor, Random Forest and kernel methods are too resource-hungry
- We preferred online algorithms
  - ALMA, Passive-Aggressive, PEGASOS and OIPCAC gave good results
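To illustrate why online algorithms suit this scale, here is a minimal sketch of one of the methods named above, Passive-Aggressive, in its PA-I form for binary labels. The slides give no details, so the PA-I variant, the aggressiveness parameter `C`, and the sparse-dictionary representation are assumptions; `pa_update` and `predict` are hypothetical names.

```python
def pa_update(weights, features, label, C=1.0):
    """One Passive-Aggressive (PA-I) step; label is +1 or -1, features is a sparse dict.

    Updates `weights` in place and returns the hinge loss measured before the update.
    """
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    loss = max(0.0, 1.0 - label * score)
    norm_sq = sum(v * v for v in features.values())
    if loss > 0.0 and norm_sq > 0.0:
        # Step size: just enough to fix the margin violation, capped at C.
        tau = min(C, loss / norm_sq)
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) + tau * label * v
    return loss

def predict(weights, features):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score >= 0 else -1
```

Each example is seen once and touches only its own non-zero features, so memory and time per update do not grow with the number of training points.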