Aug 10, 2015
Our Data, Ourselves
Research on Online Digital Cultures — Community Extraction from Twitter Networks by Markov Clustering
Department of Digital Humanities
Giles Greenway
Tobias Blanke
Jenifer Pybus
Mark Cote
A “mobile-data commons”?
• Can we write an app to capture the data-trails that smartphones transmit to third parties and make them available?
• NO.
• This would require rooting the 'phones. An Android phone is a Linux system, where the end user typically doesn't have admin rights.
• If the app reaches a mass audience, we cannot expect users to root their phones. Some rooting software contains malware, and we cannot ensure that users root their devices safely: http://tinyurl.com/weidmandroid
A “mobile-data commons”?
20 young coders from Young RewiredState (YRS) were issued with Android smartphones.
'Phones were pre-loaded with our “MobileMiner” app, which logs app network traffic, GSM cells, app notifications and Wi-Fi network connections.
The data is logged by a CKAN server, and also made available to users on their devices.
Twitter accounts were also scraped.
Is net activity a proxy for app usage?
Sometimes not...
Some apps use analytics / ad services continually.
This provoked a workshop on app reversal and network traffic capture: http://kingsbsd.github.io/DroidDestructionKit
Notifications as a proxy for social network usage.
[Figure: Twitter Network Degree vs Notifications. Series: Friends, Followers. x-axis: Number of Notifications (0 to 1200); y-axis: friends / followers count (0 to 1200).]
Twitter sends notifications based on the people you follow. The more notifications, the more friends.
Questions to ask of Twitter
● How many different “tribes” does the average teenage hacker have?
● What do they Tweet about?
● Do they use it conversationally? What's the distribution of lengths of chains of tweets and replies?
Need a community-detection algorithm:
● Easy to implement.
● Can be explained to non-technical cultural-studies academics in three slides!
● Returns realistic communities.
Markov Clustering -MCL
● There are clusters of Twitter users with densely connected networks of friend/follower relationships.
• If you take a random walk around the network, you are likely to stay within the cluster you started in.
http://www.micans.org/mcl/
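The random-walk intuition can be checked with a small simulation. This is only a sketch: the toy graph (two triangles joined by one edge), the walk length and the function name are illustrative assumptions, not taken from the slides.

```python
import random

# Toy graph (hypothetical): two triangles, {0, 1, 2} and {3, 4, 5},
# joined by the single edge 2-3.
NEIGHBOURS = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
HOME_CLUSTER = {0, 1, 2}


def fraction_ending_at_home(start=0, steps=3, trials=10_000, seed=0):
    """Estimate how often a short random walk ends in its starting cluster."""
    rng = random.Random(seed)
    at_home = 0
    for _ in range(trials):
        node = start
        for _ in range(steps):
            # Move to a uniformly chosen neighbour of the current node.
            node = rng.choice(NEIGHBOURS[node])
        if node in HOME_CLUSTER:
            at_home += 1
    return at_home / trials


print(fraction_ending_at_home())
```

For this toy graph the exact probability of ending in the starting triangle after three steps is 29/36, so the estimate should land near 0.8.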
MCL -A Trivial Example
1: Build an adjacency matrix for the graph.
2: Normalize the columns to produce transition probabilities.
MCL -A Trivial Example
4: Element-wise square the matrix and re-normalize.
5: Rinse and repeat until convergence.
The matrix entries will be 0 or 1. Interpret rows as: “If I'm in this row node, which column nodes are credible start-points?”
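The steps listed above can be sketched in NumPy, the library the conclusions name for MCL. The slide text skips from step 2 to step 4; this sketch assumes the missing step is the usual MCL expansion (multiplying the matrix by itself). The added self-loops, the toy graph, the thresholds and all function names are likewise assumptions, so treat this as an illustration and not as the authors' implementation.

```python
import numpy as np


def normalize_columns(m):
    """Scale each column to sum to 1 (transition probabilities)."""
    return m / m.sum(axis=0, keepdims=True)


def mcl(adjacency, inflation=2, max_iter=100, tol=1e-9):
    """Minimal Markov Clustering sketch on a dense adjacency matrix."""
    # Adding self-loops is a common MCL convention (an assumption here).
    m = normalize_columns(adjacency.astype(float) + np.eye(len(adjacency)))
    for _ in range(max_iter):
        previous = m
        m = m @ m                               # expansion (assumed step 3)
        m = normalize_columns(m ** inflation)   # element-wise power, re-normalize
        if np.allclose(m, previous, atol=tol):  # stop once nothing changes
            break
    return m


def clusters_from(m, threshold=1e-6):
    """Read clusters off the rows that still hold non-zero entries."""
    found = {tuple(int(i) for i in np.flatnonzero(row > threshold)) for row in m}
    return sorted(c for c in found if c)


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adjacency = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
print(clusters_from(mcl(adjacency)))
```

Rows that end up all zero belong to nodes that are not attractors; each remaining row lists the nodes of one cluster, which matches the row-wise reading given above. The `inflation` parameter controls how aggressively weak links are pruned.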
MCL -Does it work?
MCL was applied to two Twitter accounts of digital culture researchers with ~7000 once-removed friend-follower relationships.
Gephi's “OpenOrd” layout is meant to emphasise clusters. Are nodes in the same cluster close together?
Compare with Gephi's own “modularity algorithm”, the Louvain method.
MCL -Does it work?
Louvain: Twitter accounts in the same cluster are placed close together.
MCL: Accounts in the same cluster are scattered.
This suggests that Louvain performed better than MCL.
WRONG!
MCL -Does it work?
Why did Gephi/Louvain put these two in the same modularity class / cluster?
Researchers rated clusters for both methods.
Cluster is identifiable and relevant: MCL 20%, Louvain 0%!
Cluster is not identifiable, but possibly relevant: 37%
Cluster is neither identifiable nor relevant: 43%
MCL -Does it work?
Why did the Louvain method perform poorly?
- The Louvain method works by combining smaller clusters to maximize modularity. Does the very high degree of Twitter networks harm its performance? One wrongly-placed Twitter account pulls in many others.
Why was the OpenOrd layout misleading?
- Both OpenOrd and Louvain work by combining smaller clusters. Both are vulnerable to the same problems.
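For readers unfamiliar with the quantity Louvain maximizes, here is a minimal sketch of the standard modularity score for an undirected, unweighted graph. The toy graph and the function name are illustrative assumptions; Gephi's own implementation may differ in detail (for example, a resolution parameter).

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity Q of a partition: sum over clusters of
    (internal edges / m) - (cluster degree / 2m) ** 2."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the cluster
    degree = defaultdict(int)    # summed node degrees per cluster
    for a, b in edges:
        degree[cluster_of[a]] += 1
        degree[cluster_of[b]] += 1
        if cluster_of[a] == cluster_of[b]:
            internal[cluster_of[a]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_big_cluster = {node: "a" for node in range(6)}

print(modularity(edges, two_triangles))
print(modularity(edges, one_big_cluster))
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of gain a merge-based method looks for at each step.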
MCL -Does it work?
MCL can suggest plausible Twitter communities.
Can it find pre-existing ones?
Repeat for the YRS volunteers:
[Figure panels: MCL, Louvain]
MCL -Does it work?
Do the Twitter accounts of 9 YRS volunteers end up in the same cluster?
MCL: Mostly...
Cluster Size 20 26 6 6 5 45 5 319 6 5 14 14 5
YRS accounts 0 0 0 0 0 1 0 8 0 0 0 0 0
Louvain: Not so much...
Cluster Size 15 78 7 43 168 67 55 230 24
YRS accounts 0 1 0 0 0 3 2 3 0
[ ~4% probability of allocating 8 Twitter users to the largest MCL cluster by chance. ]
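The slides do not say how this figure was calculated. One simple calculation that gives a value of this size, assuming each account lands in a cluster independently with probability proportional to cluster size, uses the MCL cluster sizes listed above:

```python
# MCL cluster sizes as listed on the slide.
mcl_cluster_sizes = [20, 26, 6, 6, 5, 45, 5, 319, 6, 5, 14, 14, 5]

total_accounts = sum(mcl_cluster_sizes)
p_largest = max(mcl_cluster_sizes) / total_accounts

# Chance that 8 given accounts all fall in the largest cluster,
# assuming independent placement proportional to cluster size.
p_eight = p_largest ** 8
print(f"{p_eight:.1%}")  # roughly 4%
```

The independence assumption is ours; a calculation that samples accounts without replacement gives a slightly smaller value of similar size.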
Is inferring from layouts always problematic?
-Of course not!
Theban scribes with common contracting parties
Source: Silke Vanbeselaere http://tinyurl.com/thebanscribes
What do the clusters tweet about?
Top tags for the MCL clusters (tag, count):
Cluster size 6, YRS accounts 0: dotnetnotts, 18; TechNott, 10; NottsTest, 8; JavaScript, 2; hack24, 2; ukbestworkplace, 2
Cluster size 45, YRS accounts 1: GE2015, 78; Eurovision2015, 58; leadersdebate, 33; bdw2015, 24; BattleForNumber10, 21; BBCQT, 18; GBR, 15; bbcqt, 14; eurovision, 14; NHTG15, 13; FoC2015, 12; YRSAmbassadors, 11; depop, 11; BBCFreeSpeech, 10; VoteConverative, 9; YRS2014, 9; DimblebyLecture, 9; endpointcon, 9
Cluster size 319, YRS accounts 8: GE2015, 275; tech, 214; jobs, 207; YRS2014, 185; Haunted, 183; ghosts, 183; YRSFoc, 181; hackmcr, 167; yrs2014, 156; Arduino, 149; FoC2015, 141; Norwich, 133; gamedev, 132; TG, 130; BigData, 112; linux, 111; YRSHyperlocal, 105; design, 99
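A tag ranking of this kind can be produced with a per-cluster hashtag count. The sketch below is an assumption about the approach, not the authors' code; the tweets are made up, and tags are counted case-sensitively because the slide lists "BBCQT" and "bbcqt" separately.

```python
from collections import Counter
import re

HASHTAG = re.compile(r"#(\w+)")


def top_tags(tweets_by_cluster, n=3):
    """Return the n most common hashtags for each cluster."""
    result = {}
    for cluster, tweets in tweets_by_cluster.items():
        counts = Counter(tag for text in tweets for tag in HASHTAG.findall(text))
        result[cluster] = counts.most_common(n)
    return result


# Hypothetical tweets, keyed by cluster size as on the slide.
example = {
    6: ["Talk tonight #dotnetnotts", "#dotnetnotts then #TechNott"],
    319: ["#GE2015 results", "Hiring! #tech #jobs", "#GE2015 #tech", "#GE2015 debate"],
}
print(top_tags(example))
```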
Conclusions:
● Acquire Twitter data with Twython/Celery/Redis/RabbitMQ.
● Store Twitter data with Neo4j/Py2Neo.
● Perform MCL with NumPy.
● Export to Gephi with NetworkX.
● Gephi and the Louvain method are fine tools; use them carefully!
● MCL is very effective (if slow) at extracting Twitter communities.
● Numerical techniques should be easy to justify and validate.
● Visualizations are powerful, persuasive, and sometimes misleading! (“Beware of geeks bearing .gifs!”)
The tools:
Download our app: http://kingsbsd.github.io/MobileMiner
Follow us on Twitter: @KingsBSD
Read our blog: http://big-social-data.net/
Read about our data: http://tinyurl.com/miningmobileyouth
Slideshare: http://www.slideshare.net/kingsBSD/