Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes
Srijan Kumar (Univ. of Maryland), Robert West (Stanford Univ.), Jure Leskovec (Stanford Univ.)
Originally presented at the 25th International World Wide Web Conference, Montreal, Canada, April 2016
Web: Source of information
62% of adults in the U.S. rely on social media for news
28% of 18-24-year-olds use social media as their primary news source
Web: Source of false information
Types of false information
• Misinformation: honest mistake
• Disinformation: deliberate lie to mislead
• Hoax: “deliberately fabricated falsehood made to masquerade as truth” (Wikipedia)
Why Wikipedia?
The free encyclopedia that anyone can edit
• Easy to add (false) information
• Freely accessible
• Large reach
• Major source of information for many
Hoaxes on Wikipedia
Data: Wikipedia Hoaxes
Hoax articles vs. hoax facts
21,218 hoax articles
Hoax lifecycle
Wikipedia hoaxes
• Impact of hoaxes: Can we quantify their impact?
• Characteristics of hoaxes: What are the hoaxes like?
• Detection of hoaxes: Can we find them?
Impact of hoaxes
“The worst hoaxes are those which (a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.” (Jimmy Wales on Quora)
Impact of hoaxes
“The worst hoaxes are those which (a) last for a long time”
[Figure: time t between patrolling and flagging, with the 0.90 and 0.99 levels marked]
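One way to summarize how long hoaxes last is to read quantiles off the distribution of delays, e.g. the time by which 90% or 99% of hoaxes have been flagged. The following is a minimal sketch, assuming we already have one patrol-to-flag delay per hoax; the function name `empirical_quantile`, the unit (hours), and the sample delays are all hypothetical and are not data from the study.

```python
import math
from fractions import Fraction

def empirical_quantile(samples, q):
    """Smallest observed value v such that at least a fraction q of samples are <= v."""
    if not samples:
        raise ValueError("need at least one sample")
    q = Fraction(str(q))  # exact arithmetic avoids float rounding in q * n
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(samples)
    k = math.ceil(q * len(ordered))  # 1-based rank where the empirical CDF first reaches q
    return ordered[k - 1]

# Hypothetical patrol-to-flag delays in hours, one per hoax (invented for illustration).
delays = [0.1, 0.2, 0.5, 1, 1, 2, 3, 5, 24, 9600]
for q in (0.90, 0.99):
    print(f"{q:.0%} of hoaxes flagged within {empirical_quantile(delays, q)} hours")
```

Using the nearest-rank definition keeps the result an actually observed delay, which matches how such levels are usually read off a step-shaped empirical CDF.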
Impact of hoaxes
“The worst hoaxes are those which (b) receive significant traffic”
[Figure: number n of pageviews per day, axis marked at 10, 100 and 500]
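Traffic can be summarized by asking what fraction of hoaxes reach at least n pageviews per day, for a few thresholds n (a point on the complementary CDF). A minimal sketch under that reading; the helper `fraction_at_least` and the per-article daily averages are hypothetical.

```python
def fraction_at_least(values, threshold):
    """Fraction of items whose value is >= threshold (one point of the complementary CDF)."""
    if not values:
        raise ValueError("need at least one value")
    return sum(1 for v in values if v >= threshold) / len(values)

# Hypothetical average daily pageviews, one entry per hoax article.
daily_views = [0.5, 2, 3, 8, 12, 40, 150, 600]
for n in (10, 100, 500):
    print(f"{fraction_at_least(daily_views, n):.1%} of hoaxes viewed at least {n} times/day")
```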
Impact of hoaxes
“The worst hoaxes are those which (c) are relied upon by credible news media”
On average, a hoax article has 1.08 active inlinks.
7% of hoax articles have at least 5 active inlinks.
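Both statistics above are simple aggregates over per-article inlink counts: a mean, and the share of articles at or above a threshold. A minimal sketch, assuming the number of active inlinks for each hoax article is already known (how a link is judged "active" is not reproduced here); the function name and counts are hypothetical.

```python
from statistics import mean

def inlink_summary(inlink_counts, min_links=5):
    """Return (mean active inlinks per article, fraction of articles with >= min_links)."""
    if not inlink_counts:
        raise ValueError("need at least one article")
    share = sum(1 for c in inlink_counts if c >= min_links) / len(inlink_counts)
    return mean(inlink_counts), share

# Hypothetical active-inlink counts for ten hoax articles (invented for illustration).
counts = [0, 0, 0, 0, 1, 1, 2, 0, 0, 7]
avg, share = inlink_summary(counts)
print(f"{avg:.2f} active inlinks per article; {share:.0%} have at least 5")
```

Note how a single well-linked article pulls the mean up while most articles have none, which is why the threshold share is a useful companion to the average.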
Wikipedia hoaxes
• Impact of hoaxes: Most hoaxes are caught soon, but some are impactful.
• Characteristics of hoaxes: What are the hoaxes like?
• Detection of hoaxes: Can we find them?
Hoax
• Successful hoax: passes patrol, survives for a month, viewed 100+ times per day
• Failed hoax: flagged and deleted during patrol
Non-hoax
• Wrongly flagged: temporarily flagged
• Legitimate article: never flagged
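The four categories can be read as a labeling rule over a few per-article attributes. A minimal sketch, assuming hypothetical fields (the slides give no data schema), "a month" taken as 30 days, and the 100 pageviews per day stated above; hoaxes that meet neither definition fall into an extra "other hoax" bucket that is our own addition.

```python
from dataclasses import dataclass

@dataclass
class ArticleRecord:
    # Hypothetical fields, chosen only to express the slide's definitions.
    is_hoax: bool          # ground truth: was the article a deliberate hoax?
    passed_patrol: bool    # did it get through new-page patrol without being flagged?
    days_survived: float   # days from creation until deletion (or until now)
    views_per_day: float   # average daily pageviews
    ever_flagged: bool     # was it ever tagged as a possible hoax?

def category(a: ArticleRecord) -> str:
    """Assign one of the slide's categories (plus a hypothetical catch-all for hoaxes)."""
    if a.is_hoax:
        if a.passed_patrol and a.days_survived >= 30 and a.views_per_day >= 100:
            return "successful hoax"
        if not a.passed_patrol:
            return "failed hoax"
        return "other hoax"  # passed patrol but not long-lived or popular enough
    return "wrongly flagged" if a.ever_flagged else "legitimate"

print(category(ArticleRecord(True, True, 45, 250, True)))
```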
Characteristics of hoaxes
• Appearance: how the article looks
• Link network: how the article connects
• Support: how other articles refer to it
• Editor: what the article's creator looks like
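For detection, the four feature families can be gathered into one record per article and flattened into a fixed-order vector for a classifier. A minimal sketch: only plain-text length is named in these slides; the other three fields are hypothetical placeholders, one per family, and the class name is our own.

```python
from dataclasses import dataclass, asdict

@dataclass
class ArticleFeatures:
    # Appearance: how the article looks
    plain_text_length: int
    # Link network: how the article connects (hypothetical feature)
    outlink_clustering: float
    # Support: how other articles refer to it (hypothetical feature)
    prior_mentions: int
    # Editor: what the article's creator looks like (hypothetical feature)
    creator_prior_edits: int

    def to_vector(self) -> list:
        """Flatten into a numeric vector in field-declaration order."""
        return [float(v) for v in asdict(self).values()]

print(ArticleFeatures(1200, 0.25, 3, 0).to_vector())
```

Keeping the families as named fields makes it easy to train on one family at a time and compare how much each contributes.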
Characteristics of hoaxes
Appearance features: plain-text length
Surprisingly, hoax articles are longer than non-hoax articles, but they consist mostly of plain text and have fewer web and wiki links.
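The contrast above (long, but mostly plain text with few links) can be measured directly from an article's wikitext. This is a rough sketch, not the study's feature definitions: the regexes handle only simple, non-nested markup, and "share of plain text" and "links per 100 words" are our own assumed normalizations. A real pipeline should use a dedicated wikitext parser.

```python
import re

# Simplified wikitext patterns (assumption: no nested templates, tables, or tricky refs).
REF = re.compile(r"<ref[^>]*?/>|<ref[^>]*>.*?</ref>", re.DOTALL)  # footnotes
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                           # non-nested {{...}}
WIKI_LINK = re.compile(r"\[\[([^\]|]*)(?:\|([^\]]*))?\]\]")        # [[Target]] or [[Target|label]]
WEB_LINK = re.compile(r"\[https?://[^\s\]]+(?:\s([^\]]*))?\]")     # [http://url label]
HTML_TAG = re.compile(r"<[^>]+>")

def to_plain_text(wikitext: str) -> str:
    """Approximate the reader-visible text by stripping markup."""
    text = REF.sub("", wikitext)
    text = TEMPLATE.sub("", text)
    text = WIKI_LINK.sub(lambda m: m.group(2) or m.group(1), text)  # keep the shown label
    text = WEB_LINK.sub(lambda m: m.group(1) or "", text)
    text = HTML_TAG.sub("", text)
    text = re.sub(r"'{2,}", "", text)  # bold/italic quote markup
    return " ".join(text.split())

def appearance_features(wikitext: str) -> dict:
    """Plain-text length, share of plain text, and link densities (links counted in raw wikitext)."""
    plain = to_plain_text(wikitext)
    words = max(len(plain.split()), 1)
    return {
        "plain_text_length": len(plain),
        "plain_text_share": len(plain) / max(len(wikitext), 1),
        "wiki_links_per_100_words": 100 * len(WIKI_LINK.findall(wikitext)) / words,
        "web_links_per_100_words": 100 * len(WEB_LINK.findall(wikitext)) / words,
    }

sample = ("'''Foo''' is a town in [[France]] near [[Paris|the capital]]."
          "<ref>[http://example.org Source]</ref>{{Infobox}}")
print(appearance_features(sample))
```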