Misinformation on Twitter CS315 – Web Search and Data Mining

Jan 16, 2016

Arron Todd
Transcript
Page 1: Misinformation on Twitter CS315 – Web Search and Data Mining.

Misinformation on Twitter

CS315 – Web Search and Data Mining

Page 2: Misinformation on Twitter CS315 – Web Search and Data Mining.

Twitter Primer

Twitter is a short message service. It allows you to:

Tweet = send a message to those who follow what you say

Re-tweet = send to your followers something you received

Reply = send a specific user a message, also seen by your followers (@user)

Direct-message = send a specific user a message; no one else sees it

#hashtag = marks terms for easy search

Page 3: Misinformation on Twitter CS315 – Web Search and Data Mining.

Social Media in Search Results

October 24, 2009: Bing introduces real-time results inside search results

Dec. 7, 2009: Google adopts real-time results in search results (usually third position)

Twitter’s visibility grows dramatically (reaching 6% of population, yet small compared to Google etc.)

Information Web and Social Web start coming together

Page 4: Misinformation on Twitter CS315 – Web Search and Data Mining.

Interaction of Networks

Late December 2009: The major search engines use real-time data in their search results

Jan 19, 2010: The MA Special Election

Martha Coakley (D) vs Scott Brown (R)

As the election nears, people google the candidates' names

Page 5: Misinformation on Twitter CS315 – Web Search and Data Mining.

Searching for Political Candidates

When searching for Scott Brown vs Martha Coakley

Page 6: Misinformation on Twitter CS315 – Web Search and Data Mining.

The Twitter Corpus

Over 200,000 Tweets from January 13-20, 2010

“Coakley” and “Scott Brown”

Surprisingly large number (41%) of retweets (RT): Why?

Significant number (7%) of replies: Why?

One in 3 tweets (32%) are repeating tweets: Why?
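As a rough illustration of how shares like these could be measured, the sketch below labels each tweet from its text alone and counts repeats. The rules are assumptions, not the authors' method: a retweet is taken to start with "RT @", a reply to start with "@", and a "repeating tweet" is assumed to mean the same user posting the same text again. The sample data is made up.

```python
from collections import Counter


def classify(text):
    """Label a tweet as 'retweet', 'reply', or 'original' from its text alone."""
    stripped = text.lstrip()
    if stripped.startswith("RT @"):
        return "retweet"
    if stripped.startswith("@"):
        return "reply"
    return "original"


def corpus_shares(tweets):
    """tweets: list of (user, text) pairs. Returns the share of each behavior."""
    total = len(tweets)
    if total == 0:
        return {"retweet": 0.0, "reply": 0.0, "repeat": 0.0}
    kinds = Counter(classify(text) for _, text in tweets)
    seen = set()
    repeats = 0
    for user, text in tweets:
        # Assumption: a repeat is the same user posting identical text again.
        key = (user, text.strip().lower())
        if key in seen:
            repeats += 1
        seen.add(key)
    return {
        "retweet": kinds["retweet"] / total,
        "reply": kinds["reply"] / total,
        "repeat": repeats / total,
    }


# Made-up sample: one retweet posted twice, one reply, one original tweet.
sample = [
    ("alice", "RT @bob: Vote on Tuesday #masen"),
    ("alice", "RT @bob: Vote on Tuesday #masen"),
    ("carol", "@dave have you seen this?"),
    ("erin", "Polls open at 7am #masen"),
]
shares = corpus_shares(sample)
print(shares)
```

Note that a repeated retweet counts toward both the retweet share and the repeat share, so the shares need not sum to 1.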

Page 7: Misinformation on Twitter CS315 – Web Search and Data Mining.

Huge number of retweets (RT). Why?

Hypothesis: You retweet a message if you agree with it.

Indeed: Retweets reveal communities.

Behavioral patterns provide another way to determine the political affiliation of users.

Top200 group retweeting behavior
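One simple way to see how retweets can reveal communities is to link every retweeter to the author they retweeted and then look for groups that never retweet each other. The sketch below uses connected components as a stand-in for a real community-detection method; that choice, the function names, and the account names are all assumptions made for illustration.

```python
from collections import defaultdict


def retweet_graph(retweets):
    """retweets: list of (retweeter, original_author) pairs."""
    graph = defaultdict(set)
    for retweeter, author in retweets:
        graph[retweeter].add(author)
        graph[author].add(retweeter)
    return graph


def communities(graph):
    """Group accounts into connected components of the retweet graph."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Made-up accounts: the "a" users and the "b" users never retweet each other.
retweets = [("a1", "a2"), ("a3", "a2"), ("b1", "b2"), ("b3", "b1")]
groups = communities(retweet_graph(retweets))
print(groups)
```

On real data a few cross-group retweets would merge everything into one component, so a modularity-based or label-propagation method would be the more realistic choice.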

Page 8: Misinformation on Twitter CS315 – Web Search and Data Mining.

32% are repeating tweets: Why?

Your followers, with whom you greatly agree, have already seen your tweet, so why repeat it?

You assume a leadership role in the campaign: Top200 group members were 70 times more likely to repeat a tweet.

You repeat to influence Google search:

Twitter-enabled Google bomb

Top200 group repeating behavior

Page 9: Misinformation on Twitter CS315 – Web Search and Data Mining.

Significant number of replies. Why?

Hypothesis: You reply to engage in a dialog or fight with others.

Not always.

But top200 users were far less likely to reply, even though they spent a lot more time tweeting.

Except…

Top200 group replying behavior

Page 11: Misinformation on Twitter CS315 – Web Search and Data Mining.

The first Twitter-bomb

Account creation and tweet bombs: signature of spamming

9 accounts sent 929 reply-tweets to 573 users in 138 min.
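The figures above work out to roughly 6.7 replies per minute overall (929 / 138) and about 103 replies per account (929 / 9). A detector for this kind of signature could look at how many distinct users an account replies to and how fast. The sketch below is only an illustration: the thresholds `min_targets` and `min_per_minute`, the function names, and the synthetic data are assumptions, not values from the study.

```python
from collections import defaultdict


def reply_rates(replies):
    """replies: list of (account, target_user, minute). Returns per-account stats."""
    by_account = defaultdict(list)
    for account, target, minute in replies:
        by_account[account].append((minute, target))
    stats = {}
    for account, items in by_account.items():
        minutes = [m for m, _ in items]
        span = max(minutes) - min(minutes) + 1  # active window in minutes
        stats[account] = {
            "replies": len(items),
            "targets": len({t for _, t in items}),
            "per_minute": len(items) / span,
        }
    return stats


def flag_bombers(replies, min_targets=20, min_per_minute=0.5):
    """Flag accounts that reply to many distinct users at a high rate.

    Both thresholds are hypothetical and would need tuning on real data.
    """
    return sorted(
        account
        for account, s in reply_rates(replies).items()
        if s["targets"] >= min_targets and s["per_minute"] >= min_per_minute
    )


# Synthetic data: "bot1" replies to 40 different users within 20 minutes,
# while "human" replies to one friend twice, half an hour apart.
replies = [("bot1", f"user{i}", i // 2) for i in range(40)]
replies += [("human", "friend", 0), ("human", "friend", 30)]
flagged = flag_bombers(replies)
print(flagged)
```

Account age could be added as a third signal, since the slide pairs tweet bombs with recent account creation.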

Page 12: Misinformation on Twitter CS315 – Web Search and Data Mining.

Where were the URLs linking?

Page 13: Misinformation on Twitter CS315 – Web Search and Data Mining.

Who was behind the Twitter-bomb?

Page 14: Misinformation on Twitter CS315 – Web Search and Data Mining.
Page 15: Misinformation on Twitter CS315 – Web Search and Data Mining.

Pre-Fabricated Tweet Factory targets News Media & Reporters

30 lists with tweets

2758 tweets

180 media accounts targeted

DO YOUR JOB SHINE THE LIGHT ON ACORN http://bit.ly/DoYourJob @ACORN_Nat @SEIU @GlobeSenateRace @wwlp #masen

WE THE PEOPLE WANT A FAIR ELECTION http://bit.ly/acRNFraud @ACORN_Nat @SEIU @GlobeSenateRace @wwlp #masen

Is THIS http://bit.ly/CoakleyTHUGS Why YOU”RE AFRAID to Investigate ACORN? @CBSEveningNews @katiecouric @ACORN_Nat #ACORN

Page 16: Misinformation on Twitter CS315 – Web Search and Data Mining.
Page 17: Misinformation on Twitter CS315 – Web Search and Data Mining.

Is there defense against false rumors?

You hear a rumor on Twitter.

“A plane is spotted on the sea!” Should you retweet it?

“Two policemen are shot in Ferguson.” How do people react?

“Terrorist warnings of attacks on London Tubes.” Is it true?

Page 18: Misinformation on Twitter CS315 – Web Search and Data Mining.

What can TwitterTrails.com do for you?

You hear a rumor on Twitter.

“A plane is spotted on the sea!” Should you retweet it?

“Two policemen are shot in Ferguson.” How do people react?

“Terrorist warnings of attacks on London Tubes.” Is it true?

What questions can TwitterTrails.com answer?

Rumor origin, spreading, crowd skepticism, polarization

How does TwitterTrails.com work?

Collects data on demand: bit.ly/TTrequest

Provides cool visualizations and ML to respond within minutes

Based on “Retweeting indicates interest, agreement, trust”

Harnesses the power of crowdsourcing

Page 19: Misinformation on Twitter CS315 – Web Search and Data Mining.

Stories Investigated (200+ so far)

Page 20: Misinformation on Twitter CS315 – Web Search and Data Mining.

Questions that TwitterTrails.com can answer

ORIGINATOR, FIRST POSTER: Who made the rumor known? Who posted the rumor first?

TIMELINE OF SPREADING: When and how did the story break? Is the story still spreading?

PROPAGATORS: Who has been spreading the story through retweeting? Do they form a dense group or are there disconnected networks?

RUMOR NEGATION: Are there any denials of the story?

MAIN ACTORS: Who are the most visible actors in the spreading, according to the audience?

Page 21: Misinformation on Twitter CS315 – Web Search and Data Mining.

Story Summary (Overview)

Page 22: Misinformation on Twitter CS315 – Web Search and Data Mining.

Visualization of Initial Spreading

PROPAGATION GRAPH

Page 23: Misinformation on Twitter CS315 – Web Search and Data Mining.

Timeline of Spreading

TIME SERIES OF RELEVANT TWEETS

Page 24: Misinformation on Twitter CS315 – Web Search and Data Mining.

Retweet Network reveals PROPAGATORS

RETWEET NETWORK

Page 25: Misinformation on Twitter CS315 – Web Search and Data Mining.

Most Visible Actors and Polarization

Co-RETWEETED NETWORK
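A co-retweeted network can be sketched as follows, assuming the usual definition: two accounts are linked when the same user has retweeted both, and the edge weight counts how many users did so. The function name and the account names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_retweeted(retweets):
    """retweets: list of (retweeter, original_author) pairs.

    Returns {(author_a, author_b): number of users who retweeted both}.
    """
    authors_by_user = defaultdict(set)
    for retweeter, author in retweets:
        authors_by_user[retweeter].add(author)
    weights = defaultdict(int)
    for authors in authors_by_user.values():
        # Every pair of authors retweeted by the same user gains one unit of weight.
        for pair in combinations(sorted(authors), 2):
            weights[pair] += 1
    return dict(weights)


# Made-up data: two users retweet both news_a and news_b; one retweets news_a and news_c.
retweets = [
    ("u1", "news_a"), ("u1", "news_b"),
    ("u2", "news_a"), ("u2", "news_b"),
    ("u3", "news_a"), ("u3", "news_c"),
]
edges = co_retweeted(retweets)
print(edges)
```

Heavily weighted edges mark accounts the audience treats as similar, which is why clusters in this network can hint at polarization.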

Page 26: Misinformation on Twitter CS315 – Web Search and Data Mining.

Features: Color indicates text similarity

Page 27: Misinformation on Twitter CS315 – Web Search and Data Mining.

Feature: Negation and keyword occurrence

Page 28: Misinformation on Twitter CS315 – Web Search and Data Mining.

blogs.wellesley.edu/twittertrails

Page 29: Misinformation on Twitter CS315 – Web Search and Data Mining.

Is it true? Is it false? Ask the crowd

SPREAD: Rate of all RTs

SKEPTICISM: negating RTs / promoting RTs
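The two measures can be written down directly from these definitions. The sketch below assumes the spread rate is measured in retweets per hour and that each retweet has already been labeled as negating or promoting the claim; the time unit, function names, and numbers are illustrative assumptions.

```python
def spread(retweet_hours):
    """Retweets per hour over the observed span (the hour unit is an assumption)."""
    if not retweet_hours:
        return 0.0
    span = max(retweet_hours) - min(retweet_hours)
    # Treat very short bursts as lasting at least one hour to avoid division by zero.
    return len(retweet_hours) / max(span, 1.0)


def skepticism(negating, promoting):
    """Ratio of retweets that negate the claim to retweets that promote it."""
    if promoting == 0:
        return float("inf") if negating else 0.0
    return negating / promoting


# Made-up example: 4 retweets over 4 hours; 30 negating vs 120 promoting retweets.
print(spread([0, 1, 2, 4]))   # 1.0
print(skepticism(30, 120))    # 0.25
```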

Page 30: Misinformation on Twitter CS315 – Web Search and Data Mining.

Request an investigation: bit.ly/TTrequest

Page 31: Misinformation on Twitter CS315 – Web Search and Data Mining.

bit.ly/3SocialTheorems

Page 32: Misinformation on Twitter CS315 – Web Search and Data Mining.

Why does TwitterTrails.com work?

Page 33: Misinformation on Twitter CS315 – Web Search and Data Mining.

SocThm 1: Retweeting a message indicates interest, trust, agreement

Page 34: Misinformation on Twitter CS315 – Web Search and Data Mining.

The sender matters less than the message

Page 35: Misinformation on Twitter CS315 – Web Search and Data Mining.

Some Reporters Want to Differ

Retrieved all 2,585 profiles containing “RT” and (“endorsement” or “agreement”)

53% belong to media people

13% belong to politicians

Page 36: Misinformation on Twitter CS315 – Web Search and Data Mining.

SocThm 2: Propagation vs Skepticism

Conjecture: On Twitter,

claims with higher skepticism and lower propagation scores are more likely to be false;

claims with lower skepticism and higher propagation scores are more likely to be true.
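Read as a decision rule, the conjecture separates two clear cases and leaves the rest open. The sketch below assumes both scores are normalized to the range 0 to 1 and uses hypothetical cut-offs of 0.5; neither the normalization nor the cut-offs come from the slides.

```python
def conjecture_label(skepticism, propagation, skepticism_cut=0.5, propagation_cut=0.5):
    """Apply the conjecture as a rule of thumb.

    Assumes both scores lie in [0, 1]; the cut-offs are hypothetical.
    """
    high_skepticism = skepticism >= skepticism_cut
    high_propagation = propagation >= propagation_cut
    if high_skepticism and not high_propagation:
        return "more likely false"
    if high_propagation and not high_skepticism:
        return "more likely true"
    # High/high and low/low are cases the conjecture does not cover.
    return "undetermined"


print(conjecture_label(skepticism=0.8, propagation=0.2))
print(conjecture_label(skepticism=0.1, propagation=0.9))
print(conjecture_label(skepticism=0.7, propagation=0.9))
```

The rule gives a likelihood hint only; the conjecture does not claim that any individual story is true or false.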

Page 37: Misinformation on Twitter CS315 – Web Search and Data Mining.

On Facebook the Conjecture may not be true

The “Rumor Cascades” paper finds that rumors on Facebook never die… Why?

Page 38: Misinformation on Twitter CS315 – Web Search and Data Mining.

bit.ly/3SocialTheorems

Page 39: Misinformation on Twitter CS315 – Web Search and Data Mining.

How do you know what you know?

Extrinsic reasons:

Trust in the entity supporting the information

The majority of people use it extensively

Technology can help here

Intrinsic reasons:

Own experience

Own ability to think critically, which means: understanding the Scientific Method and applying it habitually to important matters

But this is tough and requires Education

Page 40: Misinformation on Twitter CS315 – Web Search and Data Mining.

We also “know”…

What we learned as children

What we remember incorrectly: There is no database in our brain; we recreate memories every time we remember them

What we misunderstood: Our brain is a pattern-matching machine; we find similarities even where there are none

What we think under the influence of substances, of voices in sleep, of lack of sleep

What we think under fear, anger, passion, personal interest, using illogical processes

____________ (Add your own examples)

Page 41: Misinformation on Twitter CS315 – Web Search and Data Mining.

Are we so stupid?

Our brain is impressively complicated, but it is not perfect.

It is influenced by construction limitations and errors, our feelings, our senses, our environment.

It was created through an ongoing evolutionary process. Some parts are old and are activated immediately; others are newer and demand great energy to get activated.

We need to feel that we are in control of our environment. We do not easily accept randomness in phenomena; we want to “discover” reasons explaining randomness.

Critical thinking uses the neocortex, which is large and requires lots of energy to operate. We try to avoid using so much energy by creating heuristics, stereotypes, personal ways of “thinking”.

Page 42: Misinformation on Twitter CS315 – Web Search and Data Mining.

Conclusion and Future Work

TwitterTrails.com: Use it to monitor your stories!

Blog your findings: blogs.wellesley.edu/twittertrails

Email [email protected]

Which metrics are more likely to signify a true rumor? a false rumor?

Better methods to detect negations of rumors?

“The Internet is full of lies.” Is it so? How “full”?

Page 43: Misinformation on Twitter CS315 – Web Search and Data Mining.

On Google Bombs

Page 44: Misinformation on Twitter CS315 – Web Search and Data Mining.

Online Political Spam: A Short History

The 2006 elections show potential for spam

Activists openly collaborating to Google-bomb search results of political opponents in 2006

Page 45: Misinformation on Twitter CS315 – Web Search and Data Mining.

Online Political Spam: A Short History

Search results for Senatorial candidate John N. Kennedy, 2008 USA Elections

In 2008 Google takes things into its own hands

Page 46: Misinformation on Twitter CS315 – Web Search and Data Mining.

A more sophisticated effort

Page 47: Misinformation on Twitter CS315 – Web Search and Data Mining.

Will it work?

Page 48: Misinformation on Twitter CS315 – Web Search and Data Mining.

Did it work?

2008

2010