Research frontiers in online social media studies
Understanding content, user behaviors and information diffusion
Emilio Ferrara
Center for Complex Networks and Systems Research
School of Informatics and Computing, Indiana University Bloomington
August 8, 2013
Summer Workshop on Algorithms and Cyberinfrastructure for large-scale optimization/AI
Data Collection
• Twitter Streaming API (10% sample of total traffic)
• August 2010 – present
• ~5 TB compressed
• Real-time access to the last 9 months of data related to 3 themes: US Politics, Social Movements, News
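The slides do not describe how collected tweets are routed to the three themes. The following is a minimal sketch under stated assumptions. It assumes the stored stream holds one JSON object per line with a `text` field, as in Twitter's v1.1 tweet payloads. It also assumes themes are matched by keyword sets; the `THEMES` dictionary and the `route_tweets` function are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical keyword sets: the slides do not say how tweets are matched to themes.
THEMES: Dict[str, Set[str]] = {
    "us_politics": {"obama", "congress", "#election"},
    "social_movements": {"#occupy", "protest"},
    "news": {"breaking", "#news"},
}

def route_tweets(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (theme, text) for every theme whose keywords appear in a tweet's text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. keep-alive newlines in the stream
        message = json.loads(line)
        text = message.get("text", "")  # non-tweet messages (e.g. deletions) carry no text
        # Lowercase, split on whitespace, and trim trailing punctuation from each token.
        tokens = {tok.strip(".,:;!?") for tok in text.lower().split()}
        for theme, keywords in THEMES.items():
            if tokens & keywords:
                yield theme, text
```

A tweet can land in more than one theme. Keyword matching is only the simplest possible filter. It stands in here for whatever selection criteria the actual collection used.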
Detecting early signatures of persuasion in information cascades
Scope of the project
• Data acquisition from social media (Twitter, Facebook) in a streaming scenario
• Extraction of information tokens, so-called memes
• Clustering of memes
• Classification of meme clusters
Architecture
Problem statement
• Goal: in a streaming scenario, cluster a large volume of tweets into topics based on their similarity.
• Challenge: tweet text alone is too sparse for classification, so further features must be exploited:
  • Network structure
  • Temporal signature
  • Metadata
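The following is a minimal sketch of how these feature families could be combined into a single pairwise similarity. It assumes each meme is summarized by three things: word counts (content), the set of users who posted it (network), and a histogram of tweets per time bin (temporal). The `MemeProfile` class, the use of cosine and Jaccard measures, and the weights are all hypothetical illustrations, not the method presented in the talk. Metadata features could be added as further weighted terms.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Set

@dataclass
class MemeProfile:
    words: Counter = field(default_factory=Counter)     # content: word counts
    users: Set[str] = field(default_factory=set)        # network: users who posted the meme
    timeline: Counter = field(default_factory=Counter)  # temporal: tweets per time bin

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items())  # missing keys in a Counter count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap |a & b| / |a | b|, 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(a: MemeProfile, b: MemeProfile,
               w_content: float = 0.4, w_network: float = 0.3, w_time: float = 0.3) -> float:
    """Weighted sum of content, network and temporal similarity (hypothetical weights)."""
    return (w_content * cosine(a.words, b.words)
            + w_network * jaccard(a.users, b.users)
            + w_time * cosine(a.timeline, b.timeline))
```

Because each term lies in [0, 1] and the default weights sum to 1, the combined score also stays in [0, 1]. This makes a single clustering threshold easy to reason about.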
Meme definition
• @Mention: the user addresses another user by mentioning their username (Twitter syntax: @)
• #Hashtag: the user tags the message with a “concept” (syntax: #)
• URL: a message can include one or more URLs, in full or shortened form
• Phrase: the text that remains after removing mentions, hashtags and URLs, stemming verbs/nouns, and removing stop-words and punctuation
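The four meme types above can be extracted with a few regular expressions. The following is a minimal sketch, assuming simplified patterns for mentions, hashtags and URLs and a tiny illustrative stop-word list. The function name `extract_memes` and the patterns are hypothetical, not the talk's implementation. Stemming is left out to keep the sketch dependency-free.

```python
import re
from typing import Dict, List

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
# Tiny illustrative stop-word list ("rt" marks retweets); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "for", "rt"}

def extract_memes(text: str) -> Dict[str, List[str]]:
    """Split a tweet into the four meme types: mentions, hashtags, URLs and a phrase."""
    urls = URL_RE.findall(text)
    no_urls = URL_RE.sub(" ", text)  # drop URLs first so URL fragments are not parsed as tags
    mentions = [m.lower() for m in MENTION_RE.findall(no_urls)]
    hashtags = [h.lower() for h in HASHTAG_RE.findall(no_urls)]
    # Phrase: remove mentions and hashtags, then punctuation and stop-words (no stemming here).
    rest = HASHTAG_RE.sub(" ", MENTION_RE.sub(" ", no_urls)).lower()
    rest = re.sub(r"[^\w\s]", " ", rest)
    words = [w for w in rest.split() if w not in STOP_WORDS]
    return {
        "mentions": mentions,
        "hashtags": hashtags,
        "urls": urls,
        "phrases": [" ".join(words)] if words else [],
    }
```

For example, `extract_memes("RT @Alice: Vote today! #Election2012 http://t.co/abc123")` yields the mention `alice`, the hashtag `election2012`, the URL `http://t.co/abc123` and the phrase `vote today`. Applying a stemmer to each word would complete the phrase definition above; one option is NLTK's `PorterStemmer().stem`.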
Advantages of using memes
• More granularity: each tweet is assigned to one or more memes
• Efficiency in a real-time scenario: each incoming tweet is assigned directly to its meme(s) without additional overhead
• Memes can be aggregated with each other, forming clusters of topics related by content/structure similarity
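The following is a minimal sketch of both properties, under hypothetical assumptions. Each meme is keyed by a string such as `#election` or `@alice`. Adding a tweet only updates the tweet sets of its own memes, so the cost grows with the number of memes in that tweet. Memes are then aggregated by merging, via union-find, any pair whose tweet sets have a Jaccard overlap at or above a threshold. This co-occurrence criterion stands in for the content/structure similarity on the slide. The `MemeIndex` class and the 0.5 threshold are illustrative, not taken from the talk.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set

class MemeIndex:
    """Hypothetical streaming index from memes to the IDs of tweets containing them."""

    def __init__(self) -> None:
        self.tweets_by_meme: Dict[str, Set[str]] = defaultdict(set)

    def add(self, tweet_id: str, memes: Iterable[str]) -> None:
        """Assign an incoming tweet to each of its memes; no other memes are touched."""
        for meme in memes:
            self.tweets_by_meme[meme].add(tweet_id)

    def clusters(self, threshold: float = 0.5) -> List[Set[str]]:
        """Group memes whose tweet sets overlap (Jaccard >= threshold) using union-find."""
        parent = {m: m for m in self.tweets_by_meme}

        def find(m: str) -> str:
            while parent[m] != m:
                parent[m] = parent[parent[m]]  # path halving
                m = parent[m]
            return m

        for a, b in combinations(list(self.tweets_by_meme), 2):
            ta, tb = self.tweets_by_meme[a], self.tweets_by_meme[b]
            if len(ta & tb) / len(ta | tb) >= threshold:  # sets are non-empty by construction
                parent[find(a)] = find(b)

        groups: Dict[str, Set[str]] = defaultdict(set)
        for m in parent:
            groups[find(m)].add(m)
        return list(groups.values())
```

The per-tweet `add` step is cheap, but the pairwise `clusters` pass is quadratic in the number of memes. In a streaming setting, it would more plausibly run periodically over a time window than on every tweet.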