Design and Prototyping of a Social Media Observatory
Karissa McKelvey and Filippo Menczer
Center for Complex Networks and Systems Research
School of Informatics and Computing
Indiana University, Bloomington
Jan 27, 2015
truthy.indiana.edu
Can we use social media as laboratories for social science?
Political Polarization on Twitter
Michael Conover, Jacob Ratkiewicz, Bruno Gonçalves, Alessandro Flammini & Filippo Menczer
International Conference on Weblogs and Social Media, 2011
Data “mine” ing
Data “ours” ing
Design and Prototyping
Social Media
• streaming
• sensitive
• very large
• structured
• multiple sources
Design Considerations
• Reliability
• Reproducibility
• Topic Filtering
• Visualization
• Open Access
• Legal Compliance
Reliability
• Spam and misinformation
[Figure labels: celebrities, spam, astroturf, politics]
• Cleansing and tagging by social, algorithmic, or other means
• Sampling bias
Data Collection
• Twitter Streaming API, random sample
• August 2010 – present
• 5 TB compressed
• Real-time access to data from the last 9 months related to 3 themes: US Politics, Social Movements, News
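
One way to picture the real-time window described above is a filter that keeps only recent posts matching at least one theme. The sketch below is illustrative only: the keyword sets, the 270-day approximation of nine months, and the names `THEMES`, `RETENTION`, `themes_for`, and `keep` are assumptions made for this example, not the observatory's actual filtering rules.

```python
from datetime import datetime, timedelta

# Hypothetical keyword sets; the real theme definitions are not given here.
THEMES = {
    "US Politics": {"#election", "#congress"},
    "Social Movements": {"#occupy"},
    "News": {"#breaking"},
}

# Assumed approximation of "the last 9 months".
RETENTION = timedelta(days=270)


def themes_for(text):
    """Return the sorted names of all themes whose keywords appear in the text."""
    tokens = {token.lower() for token in text.split()}
    return sorted(name for name, keywords in THEMES.items() if tokens & keywords)


def keep(tweet, now):
    """Keep a tweet only if it is recent enough and matches at least one theme."""
    recent = now - tweet["created_at"] <= RETENTION
    return recent and bool(themes_for(tweet["text"]))
```

Matching on whitespace-separated tokens is deliberately naive (a hashtag followed by punctuation would be missed); a production filter would need proper tokenization.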
Reproducibility
• Standard ontology for the storage, reference, and transfer of these datasets between users.
Model
• Events
  – Post on a social media site
• Users
  – Actors in the post
  – Sender, receiver, forwarder, etc.
• Meme
  – Discernible unit of information transfer
  – E.g., hashtag, URL, user, phrase…
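
The three parts of this model can be sketched as a small data structure plus a meme extractor. This is a minimal illustration under stated assumptions: the `Event` class, its field names, the regular expressions, and `extract_memes` are hypothetical and are not the observatory's actual schema. Phrase memes are left out because they need more than pattern matching.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Simple patterns for three of the meme types named above (assumed, not official).
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


@dataclass
class Event:
    """One post on a social media site (hypothetical schema)."""
    event_id: str
    text: str
    # Maps a role to user names, e.g. {"sender": ["alice"], "receiver": ["bob"]}.
    users: Dict[str, List[str]] = field(default_factory=dict)


def extract_memes(event: Event) -> List[str]:
    """Return the hashtags, mentioned users, and URLs found in the post text."""
    urls = URL.findall(event.text)
    # Remove URLs first so '#' fragments inside them are not read as hashtags.
    stripped = URL.sub(" ", event.text)
    hashtags = [tag.lower() for tag in HASHTAG.findall(stripped)]
    mentions = [name.lower() for name in MENTION.findall(stripped)]
    return hashtags + mentions + urls
```

Hashtags and mentions are lowercased so that variants of the same meme collapse to one key; URLs are kept verbatim.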
Meme Diffusion Networks
• Retweet (forward)
• Mention (conversation)
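
The two edge types above can be sketched as weighted, directed edge lists grouped by meme. The tuple layout `(meme, kind, actor, other)`, the function name `build_networks`, and the edge directions are assumptions for illustration: a retweet edge points from the original author to the retweeter (the direction the information travels), and a mention edge points from the author to the user mentioned.

```python
from collections import Counter, defaultdict


def build_networks(events):
    """Count directed edges per (meme, kind) from an iterable of events.

    Each event is a tuple (meme, kind, actor, other), where `actor` is the
    user who posted and `other` is the user retweeted or mentioned.
    """
    networks = defaultdict(Counter)
    for meme, kind, actor, other in events:
        if kind == "retweet":
            # Information flows from the original author to the retweeter.
            networks[(meme, "retweet")][(other, actor)] += 1
        elif kind == "mention":
            # Conversation edge from the author to the mentioned user.
            networks[(meme, "mention")][(actor, other)] += 1
        # Events of any other kind are ignored in this sketch.
    return networks
```

Each `Counter` value is the edge weight, that is, how many times that interaction occurred for the given meme.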
Topic Filtering
Visualization
Open Access
• A public social observatory that is free or low-cost enables access to large-scale social media data analytics for non-profit endeavors.
API & Website
• Filterable and searchable interface to find memes of interest
• Endpoints for users to access data programmatically
• Visualizations
truthy.indiana.edu/apidoc
Legal Compliance
• Terms of Service
  – Tweet text
  – Derived data
Thanks!