Applications of Twitter Data Extraction For Business and Research John Conroy [email protected]

John Conroy

Nov 02, 2014

Page 1: John Conroy

Applications of Twitter Data Extraction

For Business and Research

John [email protected]

Page 2: John Conroy

This talk…

• Twitter 101

• Twitter data: Open, Plentiful, Real-time

• Twitter Size, Growth & User Profile

• Acquiring Twitter Data (the easy & the hard way)

• Applications of Twitter data analysis

– Business

– Research

• Limitations of Twitter data – demographics and spam

Page 3: John Conroy

Twitter 101

• Post short messages

• Messages (tweets) can contain hyperlinks

• Follow other users

– i.e. subscribe to see their tweets when you log in

Page 4: John Conroy

Twitter 101

Page 5: John Conroy

Twitter 101

Retweets, at/replies…

Page 6: John Conroy

Twitter Data is Open, Plentiful, Real-time

• Open attitude to data: most users’ tweets are public (>90%)

– Channels: API, Twitter Search

• Data is plentiful: ~100m tweets per day by Nov. ’10

• Data is real-time:

– 140 char posts + retweets = wildfire dissemination of news & viral content

Page 7: John Conroy

• Iran protests ’09:

– retweets

Page 8: John Conroy

Twitter Size, Growth

• Size: 105m users by April ’10

• 2.1m new users per week

• 600m search queries/day

» Williams (CEO), Chirp, April ’10

• User growth: 155% p.a.

• Daily tweets growing at 550% p.a.

• ~100m tweets per day by Nov. ’10

» Conroy/Griffith June ’10 (6 months data)

Page 9: John Conroy

Twitter User-Profile

• Even male-female split

• Brand knowledge now ubiquitous

• 1/6 as many users as Facebook

• Age: 1/3 between 25-34 years old

• Better educated, earn more

http://www.edisonresearch.com/twitter_usage_2010.php

Edison research (U.S.-oriented research)

Page 10: John Conroy

Twitter users: Age

http://www.edisonresearch.com/twitter_usage_2010.php

Edison research (U.S.-oriented research)

Twitter Users Profile

Page 11: John Conroy

Acquiring Twitter Data

• The Easy Way

– Twitter Search: http://search.twitter.com

– For anybody

• The Hard Way

– Twitter APIs: REST, Search, Streaming APIs

– Code (Python/PHP/Java etc…)

Page 12: John Conroy

Acquiring Twitter Data- Twitter Search

Page 13: John Conroy

Acquiring Twitter Data- Twitter Search

Page 14: John Conroy

Acquiring Twitter Data- Twitter Search

• Things to do with Twitter Search

– Find business opportunities

– Intel on competitors

– Community-building: Answer “Does anyone know…?” queries in your segment

– Find gripes/compliments on your service

– Find anything else people are saying about you

– …etc…

Page 15: John Conroy

Acquiring Data from Twitter APIs

• REST API – find out about users: how many friends they have, how often they tweet, their last N tweets, whether they are active, etc.

• Search API – programmatic access to Twitter search

• Streaming API – ‘firehose’ of tweets from everyone

Page 16: John Conroy

What can we do with this data?

• Model the social graph of sub-groups: find most-influential users (retweets, replies, follower/friend quotient)

• E.g. Modelling the Irish Twittersphere (Conroy, Griffith, 2010)

• Find the ‘true’ social graph described by conversations, find authoritative users, broadcasters

• User engagement metrics (how often they tweet etc.)

• Find similar users based on graph theory

• Study viral news propagation through this sub-group

• Find super-users (with a view to engaging them)
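Two of the metrics listed above, the follower/friend quotient and a simple engagement rate, could be sketched as follows. This is only an illustration under stated assumptions: the user records, their field names and the numbers are hypothetical and do not come from the Irish Twittersphere study or from any Twitter API schema.

```python
from datetime import date

# Hypothetical user records; field names are illustrative, not an API schema.
users = [
    {"name": "alice", "followers": 1200, "friends": 300, "tweets": 900, "joined": date(2009, 1, 1)},
    {"name": "bob", "followers": 150, "friends": 600, "tweets": 30, "joined": date(2010, 1, 1)},
    {"name": "carol", "followers": 5000, "friends": 50, "tweets": 4500, "joined": date(2008, 4, 6)},
]


def follower_friend_quotient(user):
    """Followers divided by friends; friends is floored at 1 to avoid dividing by zero."""
    return user["followers"] / max(user["friends"], 1)


def tweets_per_day(user, today):
    """Average tweets per day since the account was created (at least one day)."""
    days = max((today - user["joined"]).days, 1)
    return user["tweets"] / days


today = date(2010, 4, 1)

# Rank users from highest to lowest follower/friend quotient.
ranked = sorted(users, key=follower_friend_quotient, reverse=True)
for user in ranked:
    print(user["name"], round(follower_friend_quotient(user), 2), round(tweets_per_day(user, today), 2))
```

A high quotient suggests a broadcaster (many followers, few friends), while tweets per day is one crude engagement measure; neither is presented here as the ranking used in the cited work.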


Page 17: John Conroy

• Irish users: time since last tweet (c.23k users)

[Bar chart: Time Since Last Tweet. Bar values as extracted: 1351, 19, 491, 3192, 4077, 2745, 2974, 2948, 4012, 252, 57. Category labels not recoverable. Y-axis: 0 to 4500.]
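A histogram of time since last tweet can be built by bucketing each user's most recent tweet timestamp. The sketch below is a guess at the general approach, not the code behind the chart: the bucket boundaries, labels and sample timestamps are all assumptions, since the chart's own categories did not survive extraction.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed bucket edges in days; the original chart's categories are unknown.
BUCKETS = [(1, "< 1 day"), (7, "< 1 week"), (30, "< 1 month"), (365, "< 1 year")]


def bucket_for(age_days):
    """Return the label of the first bucket whose upper limit exceeds age_days."""
    for limit, label in BUCKETS:
        if age_days < limit:
            return label
    return ">= 1 year"


def inactivity_histogram(last_tweet_times, now):
    """Count users per inactivity bucket, given each user's last tweet time."""
    ages = ((now - t).total_seconds() / 86400 for t in last_tweet_times)
    return Counter(bucket_for(age) for age in ages)


# Hypothetical sample: one last-tweet timestamp per user.
now = datetime(2010, 3, 31, 12, 0)
last_tweets = [
    now - timedelta(hours=3),
    now - timedelta(days=2),
    now - timedelta(days=3),
    now - timedelta(days=20),
    now - timedelta(days=400),
]
histogram = inactivity_histogram(last_tweets, now)
print(histogram)
```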

Page 18: John Conroy

• Most replied-to by Irish users

• Feb-March ’10

• 93k replies from 23k users

• Also c.7k retweets (?)
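A "most replied-to" ranking can be approximated from tweet text alone, assuming a reply is any tweet that begins with @username. The sketch below makes that assumption explicit; the sample tweets and the `most_replied_to` helper are hypothetical, and tweets carrying explicit reply metadata would be a more reliable source.

```python
import re
from collections import Counter

# Assumption: a reply starts with "@username"; mid-tweet mentions and "RT @user" are ignored.
REPLY_RE = re.compile(r"^\s*@(\w+)")


def most_replied_to(tweet_texts, top_n=3):
    """Return the top_n (username, reply_count) pairs, with usernames lower-cased."""
    counts = Counter()
    for text in tweet_texts:
        match = REPLY_RE.match(text)
        if match:
            counts[match.group(1).lower()] += 1
    return counts.most_common(top_n)


# Hypothetical sample tweets.
sample = [
    "@alice thanks for the link!",
    "@Alice agreed",
    "Great talk today, cc @bob",
    "@bob see you there",
    "RT @carol: big news",
]
top = most_replied_to(sample)
print(top)
```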


Page 19: John Conroy

Predictive Modelling

• Business Intelligence

• Non-Twitter example: satellite images of Wal-Mart car-parks to predict earnings – smart but expensive!


Page 20: John Conroy

Using Twitter for Predictive Modelling

• Eg 1: holiday destinations

[Chart: Tweets mentioning [~holiday] plus [DESTINATION]. X-axis: Jan-Mar of each year, 2010 to 2015. Y-axis: #Tweets, 0 to 3,000,000. Series: Dublin, Prague, Dubrovnik.]

[Chart: Actual Tourists. X-axis: Year, 2010 to 2015. Y-axis: #Tourists, 0 to 7,000,000. Series: Dublin, Prague, Dubrovnik.]
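Counting tweets that mention a holiday term together with a destination could look like the sketch below. The term list standing in for [~holiday], the tokenisation and the sample tweets are all assumptions made for illustration; real data would come from the Search or Streaming API.

```python
from collections import Counter

# Assumed expansion of [~holiday]; the talk does not list the actual terms.
HOLIDAY_TERMS = {"holiday", "holidays", "vacation", "trip"}
DESTINATIONS = {"dublin", "prague", "dubrovnik"}


def count_destination_mentions(tweets):
    """tweets: iterable of (year, text). Returns a Counter keyed by (year, destination)."""
    counts = Counter()
    for year, text in tweets:
        # Crude tokenisation: split on whitespace, strip common punctuation, lower-case.
        words = {w.strip(".,!?#@:;\"'()").lower() for w in text.split()}
        if words & HOLIDAY_TERMS:
            for destination in words & DESTINATIONS:
                counts[(year, destination)] += 1
    return counts


# Hypothetical sample tweets.
sample = [
    (2010, "Booked our holiday in Prague!"),
    (2010, "Prague traffic is terrible today"),
    (2010, "Anyone been on vacation to Dubrovnik or Dublin?"),
    (2011, "Holiday plans: #Dublin in March"),
]
counts = count_destination_mentions(sample)
print(counts)
```

The yearly counts per destination could then be compared with actual tourist numbers to see whether tweet volume tracks visits.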

Page 21: John Conroy

• Eg 2: Movie pre-launch “Buzz” & marketing budget (not real figures!)

Acquiring Data from Twitter APIs – Predictive Modelling

[Chart: #tweets two weeks before launch vs. opening weekend take, for Inception, The A-Team, The Expendables and Our New Movie. Axis: 0 to 4,000,000.]
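One simple way to turn pre-launch buzz into a forecast is a straight-line fit of opening take against tweet volume for past releases. The sketch below uses ordinary least squares; the numbers are invented purely for illustration (like the slide's own figures) and a linear relationship is an assumption, not a finding of the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented figures for three past releases (not real data).
past_tweets = [100_000, 200_000, 300_000]      # tweets in the two weeks before launch
past_take = [1_000_000, 2_000_000, 3_000_000]  # opening weekend take

slope, intercept = fit_line(past_tweets, past_take)

# Forecast for a new release with a hypothetical 250,000 pre-launch tweets.
predicted_take = slope * 250_000 + intercept
print(predicted_take)
```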

Page 22: John Conroy

Brand Sentiment Analysis

• Sentiment analysis of Super Bowl commercials 2010 (Conroy and Griffith, 2010)

– 300k tweets collected during the game

– Probabilistic classification models & machine learning

– Naïve Bayes, Maximum Entropy, (S.V.M.)

– Try to find out which were the most popular commercials

– Hard!! Human language is complex…
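To make the Naïve Bayes option concrete, here is a minimal multinomial Naïve Bayes text classifier with add-one smoothing. It is a generic textbook sketch, not the model used in the Super Bowl study; the class name, the whitespace tokenisation and the four training tweets are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated, lower-cased words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled training tweets.
model = NaiveBayes()
model.train([
    ("loved that ad", "pos"),
    ("great funny commercial", "pos"),
    ("hated that ad", "neg"),
    ("boring awful commercial", "neg"),
])
print(model.classify("funny ad"), model.classify("awful boring"))
```

Even this toy shows why the task is hard: sarcasm, negation and slang all defeat a bag-of-words model.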


Page 23: John Conroy


Sentiment Analysis of Superbowl Commercials: Results

Note: initial manual verification of these results was disappointing… the research continues

Page 24: John Conroy

What else can we do with this data?


Page 25: John Conroy

Limitations of Twitter Data

• Twitter < Facebook for “knowing your customer”

– Facebook has demographics: age, sex, etc.

• Demographic skewed towards 25-34 year olds & the tech-savvy; Twitter use is not ubiquitous

• Spam: The game can be rigged