When socialbots attack: Modeling susceptibility of users in online social networks Claudia Wagner, Silvia Mitter, Christian Körner, Markus Strohmaier Lyon, 16.4.2012
Transcript
Page 1: Socialbots

When socialbots attack:

Modeling susceptibility of users in online social networks

Claudia Wagner, Silvia Mitter, Christian Körner, Markus Strohmaier

Lyon, 16.4.2012

Page 2: Socialbots

What are socialbots?

A socialbot is a piece of software that controls a user account in an online social network and passes itself off as a human being

Page 3: Socialbots

Danger of socialbots

Social Engineering

Gaining access to secure objects by exploiting human psychology rather than using hacking techniques

Harvest private user data such as email addresses, phone numbers, and other personal data that have monetary value

Spread Misinformation

Ratkiewicz et al. describe the use of Twitter bots to run smear campaigns during the 2010 U.S. midterm elections.

J. Ratkiewicz, M. Conover, M. Meiss, B. Goncalves, S. Patil, A. Flammini, and F. Menczer. Truthy: mapping the spread of astroturf in microblog streams. In Proceedings of the 20th international conference companion on World wide web, WWW '11, pages

Page 4: Socialbots

Danger of socialbots

Snowball effects

Boshmaf et al. show that Facebook can be infiltrated by social bots sending friend requests: 102 socialbots, 6 weeks, 3,517 friend requests, and 2,079 infections

Average reported acceptance rate: 59.1%, and up to 80% depending on how many mutual friends the social bots had with the infiltrated users

Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network. In Proceedings of the 27th Annual Computer Security Applications Conference, page 93. ACM Press, Dec 2011.
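As a quick check of the arithmetic on this slide, the average acceptance rate can be recomputed from the reported counts (3,517 friend requests, 2,079 of them accepted). The variable names are illustrative only.

```python
# Counts reported on the slide for the Boshmaf et al. experiment.
friend_requests = 3517
accepted_requests = 2079

# Average acceptance rate = accepted requests / sent requests.
acceptance_rate = accepted_requests / friend_requests
print(f"{acceptance_rate:.1%}")
```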

Page 5: Socialbots

Experimental Setup

How likely is she to be infected by a bot?

Is she a bot?

Whom shall we protect to avoid large scale infiltration due to snowball effects?

src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/

Who is a bot? Whom shall we eliminate?

Page 6: Socialbots

Experimental Setup

Two-stage approach

Predict Infections (binary classification task)

Who is susceptible to bot attacks, i.e., who gets infected?

Predict Infection level (regression task)

How susceptible is a user – i.e. how often does a user interact with bots?

Dataset: Social Bot Challenge 2011
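A minimal sketch of how the two prediction targets of the two-stage approach could be derived from an interaction log. The log format, user names, and function names are hypothetical; the slides do not describe the actual data layout.

```python
from collections import Counter

# Hypothetical log: one (target_user, lead_bot) pair per reply, mention,
# retweet, or follow that a target directed at a lead bot.
interactions = [
    ("alice", "bot_a"),
    ("alice", "bot_b"),
    ("bob", "bot_a"),
]
targets = ["alice", "bob", "carol"]


def susceptibility_scores(targets, interactions):
    """Stage 2 target: number of interactions each user had with bots."""
    counts = Counter(user for user, _bot in interactions)
    return {user: counts.get(user, 0) for user in targets}


def infection_labels(scores):
    """Stage 1 target: a user counts as infected after at least one interaction."""
    return {user: score > 0 for user, score in scores.items()}


scores = susceptibility_scores(targets, interactions)
labels = infection_labels(scores)
```

The binary labels feed the classification task; the counts of the infected users feed the regression task.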

Page 7: Socialbots

Social Bot Challenge 2011

Competition organized by Tim Hwang

Aim was to develop socialbots that persuade 500 randomly selected Twitter users (targets) to interact with them

Targets have a topic in common: cats

Teams got points if targets replied to, mentioned, retweeted or followed their lead bot

14 days during which teams were allowed to develop their social bots.

The game started on January 23rd, 2011 (day 1) and ended on February 5th, 2011 (day 14)

On January 30th (day 8), the teams were allowed to update their codebase

Page 8: Socialbots

Dataset

Page 9: Socialbots

Feature Engineering

How likely will this user become infected?

Behavior

User Network

Content

Page 10: Socialbots

Network Features

Three directed networks: follow, retweet, and interaction (retweet, reply, mention, and follow) networks

Hub and Authority Score (HITS)

A node with a high authority score has many incoming edges from nodes with a high hub score

A node with a high hub score has many outgoing edges to nodes with a high authority score

In-degree and Out-degree

Clustering Coefficient

Number of actual links between the neighbors of a node divided by the number of possible links between them
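The network features on this slide can be sketched in plain Python on a toy directed edge list. This is an illustration, not the authors' implementation: the edge list is hypothetical, HITS is approximated by a fixed number of power iterations, and the clustering coefficient assumes that a node's neighbors are its predecessors plus its successors, with k(k-1) possible directed links among k neighbors.

```python
from math import sqrt

# Hypothetical directed edges (source, target), e.g. "a follows b".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]


def build(edges):
    """Return successor and predecessor sets for every node."""
    succ, pred = {}, {}
    for u, v in edges:
        for node in (u, v):
            succ.setdefault(node, set())
            pred.setdefault(node, set())
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def normalize(scores):
    norm = sqrt(sum(s * s for s in scores.values())) or 1.0
    return {n: s / norm for n, s in scores.items()}


def hits(succ, pred, iterations=50):
    """Authority = sum of hub scores of predecessors; hub = sum of
    authority scores of successors; repeated and L2-normalized."""
    hub = {n: 1.0 for n in succ}
    auth = {n: 1.0 for n in succ}
    for _ in range(iterations):
        auth = normalize({n: sum(hub[p] for p in pred[n]) for n in succ})
        hub = normalize({n: sum(auth[s] for s in succ[n]) for n in succ})
    return hub, auth


def clustering(node, succ, pred):
    """Actual links among a node's neighbors / possible links among them."""
    neighbors = (succ[node] | pred[node]) - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbors for v in succ[u]
                if v in neighbors and v != u)
    return links / (k * (k - 1))


succ, pred = build(edges)
hub, auth = hits(succ, pred)
in_degree = {n: len(pred[n]) for n in succ}
out_degree = {n: len(succ[n]) for n in succ}
```

In the toy graph, node "c" receives the most edges and ends up with the highest authority score, while "a" points at both "b" and "c" and ends up with the highest hub score.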

Page 11: Socialbots

Behavioral Features

Informational Coverage

Conversational Coverage

Question Coverage

Social Diversity

Informational Diversity

Temporal Diversity

Lexical Diversity

Topical Diversity

C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual structures from social awareness streams. In Proc. of the Semantic Search 2010 Workshop, April 2010.
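The slide only names the behavioral measures, so the definitions below are assumptions made for illustration: a coverage measure is taken to be the share of a user's tweets with some property (link, leading @mention, question mark), and a diversity measure is taken to be the Shannon entropy of the corresponding items. The tweets are invented.

```python
from collections import Counter
from math import log2


def coverage(tweets, predicate):
    """Share of tweets for which the predicate holds (assumed definition)."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if predicate(t)) / len(tweets)


def entropy(items):
    """Shannon entropy in bits (assumed definition of 'diversity')."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Invented example tweets of one user.
tweets = [
    "@bob did you see this? http://example.org/cat",
    "my cat is asleep again",
    "@carol is your cat ok?",
    "new cat video http://example.org/video",
]

informational_coverage = coverage(tweets, lambda t: "http://" in t or "https://" in t)
conversational_coverage = coverage(tweets, lambda t: t.startswith("@"))
question_coverage = coverage(tweets, lambda t: "?" in t)
lexical_diversity = entropy(w for t in tweets for w in t.lower().split())
social_diversity = entropy(w for t in tweets for w in t.split() if w.startswith("@"))
```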

Page 12: Socialbots

Linguistic Features

LIWC uses a word count strategy searching for over 2300 words

Words have previously been categorized into over 70 linguistic dimensions.

standard language categories (e.g., articles, prepositions, pronouns including first person singular, first person plural, etc.)

psychological processes (e.g., positive and negative emotion categories, cognitive processes such as use of causation words, self-discrepancies),

relativity-related words (e.g., time, verb tense, motion, space)

traditional content dimensions (e.g., sex, death, home, occupation).

J. Pennebaker, M. Mehl, and K. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual review of psychology, 54(1):547-577, 2003.
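The word count strategy described on this slide can be sketched as a dictionary lookup. The mini-lexicon below is hypothetical and stands in for the real LIWC dictionary, which is not reproduced here; reporting each category as a share of all words is also an assumption.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for the LIWC dictionary.
LEXICON = {
    "negation": {"not", "never", "no"},
    "death": {"bury", "coffin", "kill"},
    "first_person_singular": {"i", "me", "my"},
}


def category_shares(text):
    """Count words per category and divide by the total number of words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in LEXICON.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words)
    return {c: (counts[c] / total if total else 0.0) for c in LEXICON}


shares = category_shares("I never let my cat out, no way")
```

Each category share then becomes one linguistic feature of the user who wrote the text.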

Page 13: Socialbots

Feature Computation

For all targets, we computed the features using all tweets they authored during the challenge (up to the point in time at which they became infected) and a snapshot of the follow network recorded on January 26th (day 4)

We only used targets that became susceptible on day 7 or later

Features do not contain any future information (such as tweets or social relations which were created after a user became infected)
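The constraint that features contain no future information amounts to a time filter on each user's tweets. A minimal sketch, assuming tweets are dictionaries with a `created_at` timestamp (a hypothetical layout) and that the infection time is the moment of the first interaction with a bot:

```python
from datetime import datetime


def tweets_before_infection(tweets, infected_at):
    """Keep only tweets authored strictly before the infection time.

    Users who never became infected (infected_at is None) keep all tweets.
    """
    if infected_at is None:
        return list(tweets)
    return [t for t in tweets if t["created_at"] < infected_at]


# Invented example: the third tweet is the first interaction with a bot.
tweets = [
    {"text": "morning, cats", "created_at": datetime(2011, 1, 24, 9, 0)},
    {"text": "look at this", "created_at": datetime(2011, 1, 29, 18, 30)},
    {"text": "@lead_bot thanks!", "created_at": datetime(2011, 1, 31, 12, 0)},
]
infected_at = datetime(2011, 1, 31, 12, 0)
usable = tweets_before_infection(tweets, infected_at)
```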

Page 14: Socialbots

Predict Infections

Binary Classification of users into susceptible and non-susceptible

Train 6 classifiers

97 Features

Compare classifiers via 10-fold cross-validation

Balanced dataset
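A sketch of the evaluation protocol on this slide. The slides do not say how the dataset was balanced, so random undersampling of the larger class is an assumption here, and the user list is invented. The folds are plain round-robin splits rather than stratified ones.

```python
import random


def balance(users, labels, seed=0):
    """Undersample the larger class so both classes have equal size."""
    rng = random.Random(seed)
    positives = [u for u, y in zip(users, labels) if y]
    negatives = [u for u, y in zip(users, labels) if not y]
    n = min(len(positives), len(negatives))
    sample = [(u, True) for u in rng.sample(positives, n)]
    sample += [(u, False) for u in rng.sample(negatives, n)]
    rng.shuffle(sample)
    return sample


def k_folds(items, k=10):
    """Yield (train, test) pairs; every item is tested exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# Invented data: 30 infected and 70 non-infected users.
users = [f"user{i}" for i in range(100)]
labels = [i < 30 for i in range(100)]
balanced = balance(users, labels)
splits = list(k_folds(balanced, k=10))
```

Each classifier would be trained on the `train` part and scored on the `test` part of every split, and the scores averaged over the ten folds.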

Page 15: Socialbots

Feature Ranking

AUC value as ranking criterion
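Ranking features by AUC can be sketched by treating each single feature as a score and computing the probability that a randomly chosen susceptible user has a higher value than a randomly chosen non-susceptible one. The feature names and values below are invented, and the pairwise computation is one standard way to obtain the AUC, not necessarily the one used by the authors.

```python
def auc(values, labels):
    """AUC of a single feature; ties between a positive and a negative count 0.5."""
    positives = [v for v, y in zip(values, labels) if y]
    negatives = [v for v, y in zip(values, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_features(features, labels):
    """Sort feature names by their AUC, best first."""
    scored = {name: auc(values, labels) for name, values in features.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Invented toy data: two susceptible and two non-susceptible users.
labels = [True, True, False, False]
features = {
    "tweet_length": [10, 20, 10, 20],
    "out_degree": [5, 3, 2, 1],
}
ranking = rank_features(features, labels)
```

An AUC near 0.5 means the feature alone does not separate the two classes.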

Page 16: Socialbots

Top 10 Features

Social and active

Meformer

Communicative and open

Emotional

Page 17: Socialbots

Predict Level of Infection

Which factors are correlated with users’ susceptibility score?

Susceptibility score counts number of interactions between a target and any lead bot

Method: Regression Trees

Can handle strongly nonlinear relationships with high order interactions and different variable types

Fit the model to 75% of the susceptible users
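To make the method concrete, here is a deliberately small regression tree that greedily picks the split with the lowest squared error and predicts the mean of each leaf. It is a sketch under stated assumptions, not the authors' model: the two features, their values, and the susceptibility levels are invented, and the first 75% of the rows serve as training data.

```python
def mean(ys):
    return sum(ys) / len(ys)


def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys):
    """Return (cost, feature index, threshold) of the best binary split."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[f] <= threshold]
            right = [y for r, y in zip(rows, ys) if r[f] > threshold]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, threshold)
    return best


def fit(rows, ys, depth=2):
    """Return a leaf value or a (feature, threshold, left, right) tuple."""
    if depth == 0 or sse(ys) == 0:
        return mean(ys)
    split = best_split(rows, ys)
    if split is None:
        return mean(ys)
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return (f, t,
            fit([r for r, _ in left], [y for _, y in left], depth - 1),
            fit([r for r, _ in right], [y for _, y in right], depth - 1))


def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Invented rows: [share of negation words, temporal balance].
rows = [[0.00, 0.2], [0.01, 0.3], [0.02, 0.2], [0.05, 0.8],
        [0.06, 0.9], [0.07, 0.7], [0.01, 0.9], [0.06, 0.1]]
ys = [1, 1, 1, 3, 3, 3, 1, 2]  # invented susceptibility levels

cut = int(0.75 * len(rows))  # fit on 75%, hold out the rest
tree = fit(rows[:cut], ys[:cut], depth=2)
held_out_predictions = [predict(tree, r) for r in rows[cut:]]
```

The held-out predictions can then be compared with the true levels of the remaining 25% of users.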

Page 18: Socialbots

Predicting Levels of Susceptibility

Users who

• use more negation words (e.g., not, never, no),

• tweet more regularly (i.e., have a high temporal balance),

• use more words related to the topic of death (e.g., bury, coffin, kill)

tend to interact more often with bots

Page 19: Socialbots

Predicting Levels of Susceptibility

Rank correlation of hold-out users given their real susceptibility level and their predicted susceptibility level (Kendall τ up to 0.45)

Goodness of fit (R² up to 0.3)

Potential Reasons:

Dataset is too small (we only had 81 susceptible users and 61% of them had level 1, 17% had level 2, 10% had level 3, very few users had more than 3 interactions)
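The two evaluation measures on this slide can be computed as follows. The slide does not state which variant of Kendall's τ was used; this sketch assumes τ-a, which ignores ties, and uses invented real and predicted levels.

```python
def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs / all pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    m = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented hold-out data: real vs. predicted susceptibility levels.
actual = [1, 2, 3, 4]
predicted = [1.5, 1.0, 3.0, 3.5]

tau = kendall_tau(actual, predicted)
r2 = r_squared(actual, predicted)
```

With many users sharing the same level, as in this dataset, a tie-corrected variant such as τ-b would give different values.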

Page 20: Socialbots

Summary & Conclusions

Approach to identify susceptible users

Features of all three types contributed to the identification

Users are more likely to be susceptible if

they are emotional Meformers

they use Twitter mainly for communicating

their communications are not focused on a small circle of friends

they are social and active (i.e., interact with many others)

Page 21: Socialbots

Summary & Conclusions

Active Twitter users are more susceptible

They are more likely to see the messages/requests of social bots

But we expected that they would develop some skill in distinguishing social bots from humans by using Twitter frequently

Predicting users’ susceptibility score is difficult

More data and further experiments are required

Page 22: Socialbots

Future Work

Repeating experiments on larger datasets

Taxonomy of social bot strategies

Massive numbers of con-messages (brute force)

Manipulation of messages through false retweets (changing pro-messages to con-messages)

Diverting attention by adding con-hashtags to pro-hashtags

Susceptibility of users to different strategies

Page 23: Socialbots

Experimental Setup

THANK YOU

[email protected]

http://claudiawagner.info

src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/

Emotional Meformers who are active, communicative, and social are more likely to be infected