NSLab, RIIT, Tsinghua Univ
A New Approach to Bot Detection: Striking the Balance Between Precision and Recall
Fred Morstatter et al.
Presented by Jun Yang
2017.3.29
What is a bot?
• Social media accounts that are controlled by software.
• Self-declared bots.
• Spambots.
• Socialbots.
What is a bot?
• Innocuous bots:
  - Post up-to-date weather, news, historical events, etc.
• Malicious bots:
  - Infiltrate social networks.
  - Influence trending topics.
  - Repost or follow specific users.
How many?
• Over half of the accounts on Twitter are not human.
• Bots make up 5-9% of Twitter accounts but produce 24% of its tweets.
• Twitter has suspended 28% of the accounts created in 2008 and half of the accounts created in 2014.
Influence
• Harvest users' private data.
• Sway discussions.
• Manipulate trending hashtags and user statistics.
• Erode user experience and trust.
• Bias social media research.
Bot detection
• Formulated as a classification task.
Bot detection
• Formulated as a classification task.
• Content features: bots' tweets differ from those of normal users in
  - URLs.
  - Sentiment.
  - Length.
  - Similarity.
  - Original tweets vs. retweets.
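As a rough illustration of the similarity feature, here is a minimal Python sketch. It assumes each account is given as a list of tweet strings tokenized by lowercased whitespace splitting; `mean_pairwise_similarity` is a hypothetical helper, and Jaccard similarity is our choice of measure, not necessarily the one used in the surveyed work.

```python
from itertools import combinations


def tokens(text):
    # Lowercased whitespace tokenization (a simplifying assumption).
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_similarity(tweets):
    """Average Jaccard similarity over all pairs of one account's tweets.

    Values near 1 indicate near-duplicate, template-like posting.
    Accounts with fewer than two tweets get 0.0.
    """
    sets = [tokens(t) for t in tweets]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy accounts, invented for illustration only.
bot_like = ["Buy cheap followers now", "Buy cheap followers now", "Buy cheap likes now"]
human_like = ["Great game last night", "Anyone tried the new cafe downtown?", "Rainy Monday again"]
print(mean_pairwise_similarity(bot_like), mean_pairwise_similarity(human_like))
```

The number of pairs grows quadratically with the number of tweets, so a real pipeline would probably sample tweets for prolific accounts.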
Bot detection
• Formulated as a classification task.
• User profile features: automatically generated accounts show detectable patterns in
  - E-mail addresses.
  - Creation times.
  - Account lifetime.
  - Screen name and verified name.
  - Human typing behavior.
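Several of these profile signals can be turned into numeric features. The sketch below is illustrative only: it assumes a hypothetical `Profile` record (field names are ours, not Twitter's API), treats the "verified name" as the display name (an assumption), and compares names with `difflib.SequenceMatcher` from the Python standard library. Typing-based signals are left out because they require keystroke data.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher


@dataclass
class Profile:
    # Hypothetical profile record; field names are illustrative, not a real API.
    screen_name: str
    display_name: str  # assumed stand-in for the "verified name"
    email: str
    created_at: datetime
    last_seen: datetime


def digit_ratio(s):
    """Fraction of characters in s that are digits."""
    return sum(ch.isdigit() for ch in s) / max(len(s), 1)


def profile_features(p):
    local_part = p.email.split("@", 1)[0]
    return {
        # Auto-generated names often end in long digit runs, e.g. "news83920471".
        "screen_name_digit_ratio": digit_ratio(p.screen_name),
        # Similarity between screen name and display name, in [0, 1].
        "name_similarity": SequenceMatcher(
            None, p.screen_name.lower(), p.display_name.lower()
        ).ratio(),
        "email_digit_ratio": digit_ratio(local_part),
        # Days between account creation and last observed activity.
        "lifetime_days": (p.last_seen - p.created_at).days,
        # Creation hour; bulk-created accounts may cluster in time.
        "created_hour": p.created_at.hour,
    }


p = Profile("news83920471", "Daily News", "abc123456@example.com",
            datetime(2014, 5, 1, 3, 15), datetime(2014, 5, 20, 3, 15))
print(profile_features(p))
```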
Bot detection
• Formulated as a classification task.
• Activity features:
  - Request frequency.
  - IP addresses.
  - Multiple login locations.
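Activity features like these are usually only visible to the platform operator. The following minimal sketch assumes a hypothetical server-side log of `(timestamp_seconds, ip, login_location)` tuples per account. It measures request frequency as the peak number of requests inside a sliding time window and counts distinct IPs and login locations. The 60-second window is an arbitrary illustrative choice.

```python
from collections import deque


def activity_features(events, window=60.0):
    """events: list of (timestamp_seconds, ip, login_location) tuples (hypothetical log format)."""
    times = sorted(t for t, _, _ in events)
    recent = deque()
    peak = 0
    for t in times:
        recent.append(t)
        # Drop requests older than `window` seconds, keeping those in (t - window, t].
        while t - recent[0] >= window:
            recent.popleft()
        peak = max(peak, len(recent))
    return {
        "peak_requests_per_window": peak,
        "distinct_ips": len({ip for _, ip, _ in events}),
        "distinct_locations": len({loc for _, _, loc in events}),
    }


# Toy log: a burst of 10 requests in 10 seconds, then a login from elsewhere.
events = [(float(i), "10.0.0.1", "Beijing") for i in range(10)]
events.append((120.0, "10.0.0.2", "Cairo"))
print(activity_features(events))
```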
Bot detection
• Formulated as a classification task.
• Network structure and connection features:
  - Mass following and unfollowing behavior.
  - Statistical and structural features.
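The following is a minimal sketch of how follow/unfollow churn and a few simple graph statistics could be computed. It assumes a hypothetical chronological log of `("follow" | "unfollow", target)` actions plus the account's current follower and following sets. `churn_ratio` (the share of follows later undone) and `reciprocity` are illustrative definitions, not taken from the slides.

```python
def network_features(follow_log, followers, following):
    """follow_log: chronological (action, target) pairs, action in {"follow", "unfollow"}.
    followers / following: current sets of account ids. All formats are hypothetical."""
    followed = set()
    follows = unfollows = churned = 0
    for action, target in follow_log:
        if action == "follow":
            follows += 1
            followed.add(target)
        elif action == "unfollow":
            unfollows += 1
            if target in followed:
                # Followed and later dropped: the mass follow/unfollow pattern.
                churned += 1
                followed.discard(target)
    return {
        "follows": follows,
        "unfollows": unfollows,
        "churn_ratio": churned / follows if follows else 0.0,
        "follower_following_ratio": len(followers) / max(len(following), 1),
        # Share of followed accounts that follow back.
        "reciprocity": len(followers & following) / max(len(following), 1),
    }


# Toy account that follows 100 accounts and then unfollows 90 of them.
log = [("follow", f"acct{i}") for i in range(100)] + [("unfollow", f"acct{i}") for i in range(90)]
feats = network_features(log, followers={"a", "b", "c"},
                         following={f"acct{i}" for i in range(90, 100)})
print(feats)
```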
Ground Truth Acquisition
• Manual annotation.
• Lists of suspended users.
• Honeypots.
Precision vs. Recall
• Which is worse: an undetected bot, or an angry human user wrongly flagged as a bot?
Contributions
• Two labeled datasets, built with different labeling techniques.
• Textual features derived with LDA.
• A modified approach that achieves higher recall and F1.
Dataset
• Libya
  - Collected by querying keywords related to the Arab Spring.
  - Accounts collected from 2011.2 to 2013.2.
  - Checked in 2015.2 whether each account had been suspended or removed.
  - 7.5% of the accounts are labeled as bots.
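The labeling rule above can be stated in a few lines of Python. This is a sketch under the assumption that each collected account's status at re-check time is known. `Status`, `label_accounts`, and `bot_fraction` are hypothetical names. The toy counts are chosen only to reproduce a 7.5% rate and are not the actual dataset. Note the rule's built-in assumption that suspension or removal implies automation, which makes the labels noisy.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical account states observed when the collected accounts are re-checked.
    ACTIVE = "active"
    SUSPENDED = "suspended"
    REMOVED = "removed"


def label_accounts(statuses):
    """Label 1 (bot) if the account was suspended or removed at re-check time, else 0."""
    return {acct: int(s in (Status.SUSPENDED, Status.REMOVED)) for acct, s in statuses.items()}


def bot_fraction(labels):
    return sum(labels.values()) / len(labels) if labels else 0.0


# Toy data: 40 accounts, 3 of which disappeared by the re-check date.
statuses = {f"user{i}": Status.ACTIVE for i in range(37)}
statuses.update({"user37": Status.SUSPENDED, "user38": Status.REMOVED, "user39": Status.SUSPENDED})
labels = label_accounts(statuses)
print(f"{bot_fraction(labels):.1%} labeled as bots")
```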
Dataset
• Arabic Honeypot
  - Honeypot accounts randomly tweet or retweet Arabic phrases.
  - Measures are taken to keep the honeypots from being suspended.
  - Human users who tweet the same phrases are collected.
  - Balanced dataset.
Baselines
• Heuristics:
  - Retweet fraction.
  - Average tweet length.
  - URL fraction.
  - Average time interval between tweets.
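The four baseline heuristics can be computed per account roughly as follows. This is a minimal sketch assuming each tweet is a hypothetical `(created_at, text, is_retweet)` tuple. The URL regex is a simplification, and the slides do not say how the heuristics are thresholded or combined into a classifier.

```python
import re
from datetime import datetime

URL_RE = re.compile(r"https?://\S+")  # simplified URL pattern (assumption)


def heuristic_features(tweets):
    """tweets: list of (created_at: datetime, text: str, is_retweet: bool), hypothetical format."""
    if not tweets:
        return None
    tweets = sorted(tweets, key=lambda t: t[0])
    n = len(tweets)
    # Seconds between consecutive tweets.
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(tweets, tweets[1:])]
    return {
        "retweet_fraction": sum(rt for _, _, rt in tweets) / n,
        "avg_tweet_length": sum(len(text) for _, text, _ in tweets) / n,
        "url_fraction": sum(bool(URL_RE.search(text)) for _, text, _ in tweets) / n,
        "avg_interval_seconds": sum(gaps) / len(gaps) if gaps else 0.0,
    }


sample = [
    (datetime(2013, 1, 1, 12, 0, 0), "RT @news: breaking http://t.co/x", True),
    (datetime(2013, 1, 1, 12, 0, 30), "hello world", False),
    (datetime(2013, 1, 1, 12, 1, 30), "see http://t.co/y", False),
]
print(heuristic_features(sample))
```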
LDA
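This slide gives no details, but the contributions list textual features derived with LDA. A common way to do this is to fit LDA on users' text and use each user's topic proportions as a feature vector. The sketch below implements LDA with collapsed Gibbs sampling in plain Python. It assumes one document per user (all of a user's tweets concatenated), and the hyperparameters `alpha`, `beta`, and `iters` are illustrative choices. In practice a library implementation such as scikit-learn's `LatentDirichletAllocation` would likely be used instead.

```python
import random


def lda_topic_features(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (assumed: one document per user).
    Returns one topic-proportion vector (length num_topics) per document.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
        z.append(zd)
    topics = range(K)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed per-document topic proportions (theta).
    return [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in topics]
            for d, doc in enumerate(docs)]


docs = [["vote", "election", "protest"] * 5,
        ["goal", "match", "league"] * 5,
        ["election", "vote", "rally"] * 5]
theta = lda_topic_features(docs, num_topics=2)
print(theta)
```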
AdaBoost
• Different weak classifiers focus on different kinds of bots.
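To make this concrete, here is a sketch of standard discrete AdaBoost with decision stumps as weak classifiers, in plain Python. It is not BoostOR. The two features (think of them as a hypothetical URL fraction and retweet fraction) and the toy data are invented. The data contain two kinds of bots, and on this data the first stump separates the URL-heavy bots while the second targets the retweet-heavy ones. Labels are +1 for bot and -1 for human.

```python
import math


def train_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w; labels in {-1, +1}."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(x, j, thr, sign):
    return sign if x[j] >= thr else -sign


def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Up-weight misclassified samples so the next stump focuses on them.
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, thr, sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(a * stump_predict(x, j, thr, sign) for a, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1


# Toy data: [url_fraction, retweet_fraction] (hypothetical features).
X = [[0.9, 0.1], [0.8, 0.1],   # URL-heavy bots
     [0.1, 0.9], [0.1, 0.8],   # retweet-heavy bots
     [0.1, 0.1], [0.2, 0.1], [0.1, 0.2], [0.2, 0.2]]  # humans
y = [1, 1, 1, 1, -1, -1, -1, -1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

Neither stump alone covers both kinds of bots. The weighted vote combines them into a rule that does.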
BoostOR
• A modified AdaBoost algorithm to improve recall.
Discussion of F1-score
• Balanced test set:
  Classifier  Precision  Recall  F1
  C1          90.00%     70.00%  78.75%
  C2          70.00%     90.00%  78.75%
• Positive:Negative = 3:1
  Classifier  Precision  Recall  F1
  C1          96.43%     70.00%  81.12%
  C2          87.50%     90.00%  88.73%
• Positive:Negative = 1:3
  Classifier  Precision  Recall  F1
  C1          75.00%     70.00%  72.41%
  C2          43.75%     90.00%  58.88%
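These figures follow from one assumption: each classifier keeps the same true-positive rate (recall) and false-positive rate whatever the class mix. On a balanced set, precision = TPR / (TPR + FPR), so the FPR can be recovered from the first table and re-applied to the skewed ratios. The short Python sketch below does this (function names are ours). It reproduces the table's numbers, showing that recall stays fixed while precision, and therefore F1, moves with the positive:negative ratio.

```python
def fpr_from_balanced(precision, recall):
    """False-positive rate implied by precision and recall on a balanced test set,
    using precision = TPR / (TPR + FPR) when positives equal negatives."""
    return recall * (1 - precision) / precision


def metrics(tpr, fpr, pos, neg):
    """Precision, recall and F1 when TPR and FPR stay fixed but the class mix changes."""
    tp, fp = tpr * pos, fpr * neg
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return precision, tpr, f1


classifiers = {"C1": (0.90, 0.70), "C2": (0.70, 0.90)}  # balanced-set precision, recall
for pos, neg in [(1, 1), (3, 1), (1, 3)]:
    for name, (p, r) in classifiers.items():
        prec, rec, f1 = metrics(r, fpr_from_balanced(p, r), pos, neg)
        print(f"{pos}:{neg} {name} P={prec:.2%} R={rec:.2%} F1={f1:.2%}")
```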
Number of topics
Thank you!
Questions?