Page 1
Making Sense of Millions of Thoughts
Finding patterns in the Tweets
Krist Wongsuphasawat
“Knowing comes from learning, finding from seeking.” (50 characters)
“What we call chaos is just patterns we haven't recognized.” (58 characters)
“I am looking for a needle in the haystack.” (42 characters)
“140-character text messages, called Tweets” (42 characters)
Page 3
Prof. X
Ability: Telepathy (mind reading)
Page 4
Cerebro
Enhance telepathy
Prof. X
Page 7
What are you thinking?
Page 8
What are people thinking about x?
Product, Event, Person, etc.
Page 12
Platform
thought
thought
thought
thought
thought
crowdsourcing social networks
Data
Page 13
Twitter
tweet
tweet
tweet
tweet
tweet
Tweets
Page 14
Tweets
• 140 characters
• text + media
• geo
• time
Page 16
What can we learn from these Tweets?
Page 17
visual-insights@twitter
@miguelrios @philogb @trebor @kristw
Page 18
World Cup
Election
Oscars
Pure Curiosity
Grammy
TV Shows
New Year
Breaking news
Earthquake
Page 19
DATA (Tweets) => Insights, Stories
with limited time
Audience: general public
Page 20
Tools
• Hadoop
• Apache Pig
• Vertica
• node.js, python
• d3 & co.
Page 23
DATA (Tweets) => Filter => Insights, Stories
Page 25
Having all Tweets
How people think I feel.
How I really feel.
Page 26
Filter data
Want only relevant Tweets
• Good news: Have all Tweets
• Bad news: Too many Tweets
Page 27
Filter data (2)
• #hashtags, e.g. #world-cup
  • easy to filter
  • hashtag must be present in the Tweet
  • typos?
• keywords, e.g. goal
  • broader
  • can be ambiguous
Page 29
Filter data (3)
• Combine with other attributes
  • Time
    • during the first half of World Cup final
  • Location
    • Tweets from Brazil
    • Not every Tweet is geotagged.
Page 31
Filter data (4)
• Languages
• Sometimes use only English Tweets
• Future
• Translation?
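The filters on the slides above (hashtag, keyword, time window, location, language) could be combined roughly as follows. This is only a sketch: the `Tweet` record, its field names, and the `matches` helper are hypothetical and are not the tooling used by the team.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class Tweet:
    """Hypothetical minimal Tweet record; the field names are assumptions."""
    text: str
    created_at: datetime
    lang: str = "en"
    coords: Optional[Tuple[float, float]] = None  # (lat, lon); None if not geotagged


def matches(tweet, hashtags=(), keywords=(), start=None, end=None,
            bbox=None, lang=None):
    """Return True if the Tweet passes every filter that was supplied.

    bbox is (south, west, north, east) in degrees. Filters left at their
    default are skipped; supplied filters are combined with AND.
    """
    words = tweet.text.lower().split()
    tags = {w.lstrip("#").rstrip(".,!?") for w in words if w.startswith("#")}
    tokens = {w.strip("#.,!?") for w in words}

    if hashtags and not tags & {h.lower() for h in hashtags}:
        return False  # easy to filter, but the hashtag must be present
    if keywords and not tokens & {k.lower() for k in keywords}:
        return False  # broader, but can be ambiguous
    if start is not None and tweet.created_at < start:
        return False
    if end is not None and tweet.created_at >= end:
        return False
    if lang is not None and tweet.lang != lang:
        return False
    if bbox is not None:
        if tweet.coords is None:
            return False  # not every Tweet is geotagged
        lat, lon = tweet.coords
        south, west, north, east = bbox
        if not (south <= lat <= north and west <= lon <= east):
            return False
    return True
```

A keyword filter alone will still admit unrelated Tweets that happen to contain the word, which is why the slides suggest combining it with time or location.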
Page 32
DATA (Tweets) => Filter => Clean => Insights, Stories
Page 33
Clean data
• Typo (Mobile input)
• Abbreviation (due to 140-character limit)
• Exaggeration (e.g. GOOOOALLLL)
• Twitter-specific conventions, e.g. old-style retweet “RT …”
• Inappropriate content
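Two of the cleaning steps listed above, stripping an old-style retweet prefix and collapsing exaggerated spellings, can be sketched with regular expressions. The function name `clean_tweet` is hypothetical, and collapsing every run of three or more repeated letters down to one is a crude assumption: words with a legitimate double letter would need a dictionary check afterwards.

```python
import re

# Old-style retweet prefix such as "RT @user: ..."
_RT_PREFIX = re.compile(r"^RT\s+@\w+:?\s*")
# A letter repeated three or more times in a row, e.g. the O's in "GOOOOAL"
_ELONGATED = re.compile(r"([A-Za-z])\1{2,}")


def clean_tweet(text: str) -> str:
    """Strip an old-style "RT @user:" prefix and collapse elongated words."""
    text = _RT_PREFIX.sub("", text.strip())
    cleaned = []
    for token in text.split():
        # Leave URLs, mentions, and hashtags untouched.
        if token.startswith(("http", "www.", "@", "#")):
            cleaned.append(token)
        else:
            cleaned.append(_ELONGATED.sub(r"\1", token))
    return " ".join(cleaned)
```

Typos, abbreviations, and inappropriate content are not handled here; they usually need word lists or models rather than a regular expression.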
Page 34
DATA (Tweets) => Filter => Clean => Visualize => Insights, Stories
Page 35
DATA
• TEXT (+ media: photos, videos): What?
• GEO: Where?
• TIME: When?
Page 36
Visualize Data
• TEXT: What?
• GEO: Where?
• TIME: When?
Page 38
TIME Tweets/second
Page 40
TIME Tweets/second + Annotation
http://www.flickr.com/photos/twitteroffice/5681263084/
Page 41
TIME Tweets/second + Annotation
• Manual
• To automate: top Tweets (most Retweets, Favs)
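One way to read the automation idea is: count Tweets per second, find the busiest seconds, and label each peak with its most-retweeted Tweet. The sketch below assumes each Tweet is a `(timestamp, text, retweets)` tuple; that layout and both function names are hypothetical.

```python
from collections import Counter


def tweets_per_second(tweets):
    """Count Tweets per second. `tweets` is a list of (timestamp, text, retweets)."""
    return Counter(ts.replace(microsecond=0) for ts, _text, _retweets in tweets)


def annotate_peaks(tweets, top_n=3):
    """Return (second, tweet_count, top_tweet_text) for the busiest seconds."""
    counts = tweets_per_second(tweets)
    best = {}  # second -> (retweets, text) of the most-retweeted Tweet
    for ts, text, retweets in tweets:
        second = ts.replace(microsecond=0)
        if second not in best or retweets > best[second][0]:
            best[second] = (retweets, text)
    return [(second, n, best[second][1]) for second, n in counts.most_common(top_n)]
```

The most-retweeted Tweet is only a candidate annotation; a person would likely still review it before it is shown to a general audience.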
Page 43
GEO Heatmap
Low density
High density
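A density heatmap of this kind can be approximated by binning geotagged Tweets into a latitude/longitude grid and mapping each cell count to a color intensity. The cell size, the log scaling, and the function names below are assumptions made for illustration, not the method behind the maps shown in the talk.

```python
import math
from collections import Counter


def density_grid(points, cell_deg=0.01):
    """Count geotagged Tweets per grid cell. `points` is an iterable of (lat, lon)."""
    grid = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[cell] += 1
    return grid


def to_intensity(grid):
    """Scale counts to [0, 1] on a log scale (low density near 0, high density at 1)."""
    if not grid:
        return {}
    peak = math.log1p(max(grid.values()))
    return {cell: math.log1p(n) / peak for cell, n in grid.items()}
```

The log scale is a design choice: Tweet volume is heavily skewed toward city centers, so a linear scale would leave most cells nearly invisible.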
Page 44
GEO New York City
flickr.com/photos/twitteroffice/8798020541
Page 45
GEO San Francisco
flickr.com/photos/twitteroffice/8798020541
Page 46
GEO San Francisco
Rebuild the world based on Tweet volumes
twitter.github.io/interactive/andes/
Page 48
TIME + GEO Japan Earthquake 2011
blog.twitter.com/2011/global-pulse
youtu.be/SybWjN9pKQk
Page 50
TIME + GEO Tweet pattern [Rios & Lin 2012]
[Figure legend: Night, Late night, Daytime]
Page 53
TEXT
www.wordle.net
Some samples from World Cup
Page 54
TEXT Word cloud of Tweets right after the 1st goal
www.wordle.net
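A word cloud is driven by term frequencies, so the input to a tool such as Wordle can be prepared with a simple count. The stopword list, the tokenizer pattern, and the `word_weights` name below are illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it",
             "for", "on", "rt", "that", "this", "what"}

_URL = re.compile(r"https?://\S+")
_TOKEN = re.compile(r"[#@]?\w+")


def word_weights(texts, top_n=50):
    """Return the top_n (term, count) pairs across the given Tweet texts."""
    counts = Counter()
    for text in texts:
        text = _URL.sub("", text.lower())
        for token in _TOKEN.findall(text):
            if len(token) > 1 and token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)
```

To reproduce the "right after the 1st goal" view, the texts would first be restricted to a short time window after the event.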
Page 56
TEXT WordTree [Wattenberg & Viégas 2008]
www.jasondavies.com/wordtree
Page 57
TEXT
• Now
  • Derived information: sentiment, topic
  • Combine with other information (geo & time) + context
• Future
  • Better techniques that involve more NLP, e.g. key phrases
Page 58
TEXT Descriptive Keyphrases [Chuang et al. 2012]
Page 59
TEXT
• Challenge
  • Scale
Page 61
GEO + TEXT Real-time Tweet map
Page 63
GEO + TEXT Real-time Tweet map
most frequent term
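The "most frequent term" label on such a map could be computed per grid cell, as sketched below. It assumes Tweets arrive as `(lat, lon, text)` tuples and uses whitespace tokenization; the function name and parameters are hypothetical, and a real-time version would also need a sliding time window.

```python
import math
from collections import Counter, defaultdict


def top_term_per_cell(tweets, cell_deg=1.0, stopwords=frozenset()):
    """Return {cell: (term, count)} with the most frequent term in each map cell.

    `tweets` is an iterable of (lat, lon, text).
    """
    cells = defaultdict(Counter)
    for lat, lon, text in tweets:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for token in text.lower().split():
            token = token.strip(".,!?\"'")
            if token and token not in stopwords:
                cells[cell][token] += 1
    return {cell: terms.most_common(1)[0] for cell, terms in cells.items() if terms}
```

Whitespace splitting only works for languages that separate words with spaces, which is one reason more NLP is needed for this kind of map.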
Page 64
GEO + TEXT Real-time Tweet map
Gmail went down Jan 24, 2014
Page 65
GEO + TEXT Real-time Tweet map
Nelson Mandela passed away Dec 5, 2013
Page 66
GEO + TEXT Real-time Tweet map
• Next:
  • Involve more NLP
  • Tokenization for languages without spaces between words
  • etc.
• Challenge:
  • Real-time
Page 67
GEO + TEXT
www.yelp.com/wordmap
Yelp Wordmap
Page 69
TIME + TEXT
http://www.babynamewizard.com/voyager
Baby Name Voyager
Page 71
TIME + TEXT
UEFA Champions League
Biggest Tournament for European soccer clubs
Many Tweets during the matches
Page 72
TIME + TEXT UEFA Champions League
Dortmund (Team 1) vs Bayern Munich (Team 2)
Count Tweets mentioning each team every minute
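Counting mentions per team per minute can be sketched as below. The `(timestamp, text)` tuple layout, the `mentions_per_minute` name, and the idea of passing a set of search terms for each team are assumptions for illustration; substring matching is deliberately naive.

```python
from collections import Counter, defaultdict


def mentions_per_minute(tweets, team_terms):
    """Return {team: Counter({minute: count})}.

    `tweets` is an iterable of (timestamp, text); `team_terms` maps a team
    name to a set of lowercase search terms (hypothetical input format).
    """
    series = defaultdict(Counter)
    for ts, text in tweets:
        minute = ts.replace(second=0, microsecond=0)
        lowered = text.lower()
        for team, terms in team_terms.items():
            if any(term in lowered for term in terms):
                # A Tweet that mentions both teams is counted once for each.
                series[team][minute] += 1
    return series
```

The two resulting series can then be drawn against each other along a shared time axis.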
Page 73
TIME + TEXT UEFA Champions League
Page 74
TIME + TEXT UEFA Champions League
+ “goal” count + context
Page 75
TIME + TEXT UEFA Champions League
+ “offside”
Page 76
TIME + TEXT UEFA Champions League
+ players
Page 77
Competition Tree
[Figure: tournament bracket. A vs B and C vs D in the first round; A vs C in the final; C wins.]
Competition Tree + [figure] = uclfinal.twitter.com
Page 80
TIME + TEXT UEFA Champions League
• Challenges
• Filter relevant Tweets
• Multiple matches at the same time
• Ambiguous words: “goal”, “red”, “yellow”
• Tweets mentioning both teams e.g. “#GER 2-2 #GHA”
Page 83
TIME + GEO + TEXT State of the Union
1) Timeline + topics from Tweets
2) Context (speech)
3) Volume of Tweets by topic during the selected part of the SOTU
4) Density map of Tweets about the selected topic
twitter.github.io/interactive/sotu2014
Page 84
TIME + GEO + TEXT New Year 2014
twitter.github.io/interactive/newyear2014/
Page 88
What can we learn from these Tweets?
Many, many things.
Page 89
better
the examples in this talk
imagine…
DATA (Tweets)
Page 90
DATA (Tweets) => Filter => Clean => Visualize => Insights, Stories
Page 91
DATA (Tweets) => Filter => Clean => Process & Visualize => Insights, Stories
Page 92
DATA (Tweets) => Filter => Clean => Process & Visualize => Insights, Stories
NLP
Page 93
Visualize data
• TEXT: What?
• GEO: Where?
• TIME: When?
Page 94
DATA (Tweets) => Filter => Clean => Process & Visualize => Insights, Stories
Research
Page 101
Working together
Raw data
Computer (one machine, cloud, MapReduce, etc.)
Processed information / Ignored information
Aggregated information
Human
NLP: Make computers think more like humans.
VIS: Help people consume information.
HCI: Bridge the gap. Connect human & computer.
User interactions or provide feedback
Page 102
Advanced techniques vs. Scalability
Page 103
LifeFlow (Research) => Flying Sessions (System at Twitter)
Page 104
Summary
• Thoughts are captured in the Tweets: what, where, when
• Finding patterns from text + geo + time
• Opportunities for NLP + HCI + VIS collaboration
• Better techniques vs. scalability + real-time
@kristw / interactive.twitter.com