EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams

Jan 27, 2015

Paper presentation at the PCI 2013 conference.
Abstract: Social media platforms such as Twitter and Facebook have seen increasing adoption by people worldwide. Coupled with people's habit of using social media to share their daily activities and experiences, it is not surprising that a substantial part of real-world events is well described by the online streams of status updates, posts and media content. In fact, in the case of large events, such as festivals, the number of online messages and shared content can be so high that it is very hard to get an objective view of the event. To this end, this paper presents EventSense, a social media sensing framework that can help event organizers and enthusiasts capture the pulse of large events and gain valuable insights into their impact on visitors. More specifically, EventSense enables the automatic association of online messages to entities of interest (e.g. films in the case of a film festival), the automatic discovery of topics discussed online, and the detection of sentiment (positive/negative/neutral) both at an entity level (e.g. per film) and on aggregate. In addition, the framework produces an informative social media summary of the event of interest by automatically selecting and putting together its highlights, e.g. the most discussed entities and topics, the most influential users, the evolution of the discussions' sentiment, and the most shared media and news content. A real-world case study is presented by applying EventSense to a rich dataset collected around the 53rd Thessaloniki International Film Festival.
Transcript
Page 1: EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams

EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams
Case study: Thessaloniki International Film Festival

E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris, Y. Mass, J. Herzig, L. Boudakidis

Page 2

Capturing & mining large-scale events

• Large-scale events attended by thousands of people are captured by mobile devices in the form of status updates, photos, ratings, etc.
• SXSW Music, Film and Interactive Conferences and Festivals
– 30,000+ attendees
– ~300,000 tweets between Mar 3 and 7
– 40,247 tweets even in the last month
• Sundance Film Festival
– 200 films, 10 days, 50,000+ attendees
– 200,000+ tweets during the festival
– 20,438 tweets even in the last month

A search for #tiff53 on Twitter returns an unstructured list of tweets

Page 3

Capturing & mining large-scale events

• The online representation of an event as a sequential list of posts and status updates is ineffective

• A more effective means of event representation would employ facets, such as entities, sub-events and sentiment.

• Challenge:
– Organize information around entities of interest
– Extract meaningful insights, obtain informative summaries

• EventSense framework

Page 4

Entity Detection (1/3)

• Entities are defined as lists of properties:
– a film consists of a title, description, names of director(s)/actors
• Matching status updates (tweets) to entities relies on representing both as tf * idf vectors

m: message (tweet), f: feature (term), M: set of all event messages
boost(f): boosting factor when f is a named entity
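
The weighting described above can be sketched as follows. This is only a minimal illustration: it assumes that the weight of a feature is the product tf * idf * boost, which is not confirmed by the slide, and the smoothed idf, the default boost value, and the names `tfidf_vector` and `named_entities` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(message_terms, all_messages, named_entities=frozenset(), boost=2.0):
    """Return {term: weight} for one message m.

    message_terms : list of terms (features f) of the message
    all_messages  : list of term lists, standing in for the set M
    named_entities: terms that receive the (assumed) multiplicative boost
    """
    tf = Counter(message_terms)
    n_docs = len(all_messages)
    weights = {}
    for term, count in tf.items():
        # Document frequency: number of messages in M containing the term
        df = sum(1 for doc in all_messages if term in doc)
        # Smoothed idf (an assumption) so that df = 0 cannot divide by zero
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        factor = boost if term in named_entities else 1.0
        weights[term] = count * idf * factor
    return weights


# Toy corpus of three already-tokenized messages (made-up data)
messages = [["mold", "film", "great"], ["film", "festival"], ["festival", "opening"]]
vec = tfidf_vector(messages[0], messages, named_entities={"mold"})
```

In this toy corpus "mold" and "great" are equally rare, so the boosted term ends up with twice the weight, while the more common "film" gets the lowest weight.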

Page 5

Entity Detection (2/3)

Unigrams            Bigrams
αργυρ : 0.348       αργυρ αλεξανδρ : 0.348
αλεξανδρ : 0.289    αλεξανδρ τουρκ : 0.233
τουρκ : 0.231       τουρκ ταιν : 0.201
ταιν : 0.191        ταιν μουχλ : 0.418
μουχλ : 0.616       -

1. Language detection
2. Tokenization (using the appropriate tokenizer)
3. Stemming
4. tf * idf weighting
5. Boost film's name
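
The first three steps can be sketched as below. The language detector is a crude Unicode-range heuristic and the stemmer is a placeholder that only strips a few English suffixes; both are assumptions made for illustration, not the components used by EventSense (the stems in the table would require a real Greek stemmer). All function names are hypothetical.

```python
import re


def detect_language(text):
    """Heuristic: 'el' if most letters fall in the Greek Unicode blocks, else 'en'."""
    letters = [ch for ch in text if ch.isalpha()]
    greek = sum(
        1 for ch in letters
        if "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"
    )
    return "el" if letters and greek > len(letters) / 2 else "en"


def tokenize(text):
    """Lowercase, drop URLs, and split into word tokens (hashtags lose their '#')."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"\w+", text)


def stem(token, language):
    """Placeholder stemmer: strips a few English suffixes, leaves other languages as-is."""
    if language == "en":
        for suffix in ("ing", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
    return token


def ngrams(text):
    """Return (unigrams, bigrams) of stemmed tokens for one message."""
    language = detect_language(text)
    stems = [stem(t, language) for t in tokenize(text)]
    bigrams = [" ".join(pair) for pair in zip(stems, stems[1:])]
    return stems, bigrams


unigrams, bigrams = ngrams("Watching films at #tiff53 http://t.co/x")
```

The resulting unigrams and bigrams would then be weighted (step 4) and the terms of the film's name boosted (step 5).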

Page 6

Entity Detection (3/3)

Entity of interest

1. Select a combination of properties, e.g. title, director and actors

2. Aggregate the selected properties into a single string, e.g. «Μούχλα Αλί Αιντίν»

3. Calculate the tf * idf vector of n-grams using the same vocabulary as the tweets

4. Calculate the cosine similarity between an incoming message and each entity of interest.

5. Assign the message to the entities whose similarity exceeds a predefined threshold.
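
Steps 4 and 5 can be sketched as follows, assuming that messages and entities are already available as sparse `{term: weight}` dictionaries. The function names, the toy weights, and the default threshold of 0.3 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_entities(message_vec, entity_vecs, threshold=0.3):
    """Return (entity_id, similarity) pairs above the threshold, best match first."""
    matches = []
    for entity_id, entity_vec in entity_vecs.items():
        sim = cosine(message_vec, entity_vec)
        if sim > threshold:
            matches.append((entity_id, sim))
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Made-up weights for two entities and one incoming message
entities = {
    "mold": {"μουχλ": 0.6, "αλι": 0.4},
    "other_film": {"dance": 1.0},
}
message = {"μουχλ": 0.5, "ταιν": 0.2}
matched = assign_entities(message, entities)
```

Because a message is assigned to every entity above the threshold, a single tweet can be linked to several films, or to none.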

Page 7

Topic analysis

• 1-NN clustering algorithm to create clusters/topics: assign an incoming message to the nearest topic if the cosine similarity exceeds a predefined threshold; otherwise create a new topic.

• Similarity threshold sensitivity analysis similar to entity extraction

• LSH approximation to scale up (Petrovic et al., NAACL 2010): hash the input items so that similar items are mapped to the same bucket with high probability, then restrict the search to that bucket.

• Title extraction per topic: for the set of items of a topic, we find the largest sequence of words with the highest frequency.
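
The clustering and bucketing steps above might look roughly like the sketch below. It assumes random-hyperplane hashing as the LSH family (a common choice for cosine similarity, not confirmed by the slide) and compares a new message only with earlier messages in the same bucket. The class and function names, the number of bits, and the threshold value are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def lsh_signature(vec, n_bits=8, seed=0):
    """Random-hyperplane signature: one bit per hyperplane (sign of the projection)."""
    bits = []
    for b in range(n_bits):
        proj = 0.0
        for term, w in vec.items():
            # Deterministic pseudo-random Gaussian component for (hyperplane, term)
            rng = random.Random(f"{seed}:{b}:{term}")
            proj += w * rng.gauss(0.0, 1.0)
        bits.append(proj >= 0)
    return tuple(bits)


class OnlineTopics:
    """Incremental 1-NN clustering restricted to the message's LSH bucket."""

    def __init__(self, threshold=0.5, n_bits=8):
        self.threshold = threshold
        self.n_bits = n_bits
        self.topics = []                   # topic id -> list of member vectors
        self.buckets = defaultdict(list)   # signature -> [(vector, topic id)]

    def add(self, vec):
        sig = lsh_signature(vec, self.n_bits)
        best_sim, best_topic = 0.0, None
        for other, topic_id in self.buckets[sig]:
            sim = cosine(vec, other)
            if sim > best_sim:
                best_sim, best_topic = sim, topic_id
        # No neighbour above the threshold: open a new topic
        if best_topic is None or best_sim <= self.threshold:
            best_topic = len(self.topics)
            self.topics.append([])
        self.topics[best_topic].append(vec)
        self.buckets[sig].append((vec, best_topic))
        return best_topic


clusterer = OnlineTopics(threshold=0.5)
ids = [clusterer.add(v) for v in (
    {"film": 1.0, "award": 1.0},
    {"film": 1.0, "award": 1.0},
    {"party": 1.0},
)]
```

A single hash table can miss near neighbours that fall into a different bucket; using several tables with different seeds is the usual way to trade memory for recall.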

Page 8

Sentiment Analysis

• Training using tweets with emoticons as labels (distant supervision), e.g. positive, negative (A. Go, R. Bhayani, and L. Huang)

• For each message we extract two types of features. The first is n-grams. The second covers the presence of user mentions and URLs, punctuation, and repeated letters.

• Naive Bayes (NB) classifier for positive and negative data. Assuming a uniform prior for all classes, independence between features, and using the Bayes rule we get:
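
Under these assumptions the posterior of a class c for a message m is proportional to the product of the per-feature likelihoods, P(c | m) ∝ ∏ P(f | c). A minimal sketch of such a classifier follows; the add-one smoothing, the exact feature flags, the class name `NaiveBayes`, and the toy training tweets are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Unigrams and bigrams plus binary flags for mentions, URLs, punctuation, repeats."""
    cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"\w+", cleaned)
    feats = list(tokens)
    feats += [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    if re.search(r"@\w+", text):
        feats.append("HAS_MENTION")
    if re.search(r"https?://\S+", text):
        feats.append("HAS_URL")
    if re.search(r"[!?]", text):
        feats.append("HAS_PUNCT")
    if re.search(r"(\w)\1\1", text):          # same character three times in a row
        feats.append("HAS_REPEATED_LETTERS")
    return feats


class NaiveBayes:
    def __init__(self):
        self.counts = defaultdict(Counter)    # class -> feature counts
        self.vocab = set()

    def train(self, labelled):
        for text, label in labelled:
            feats = extract_features(text)
            self.counts[label].update(feats)
            self.vocab.update(feats)

    def posterior(self, text):
        feats = [f for f in extract_features(text) if f in self.vocab]
        log_scores = {}
        for label, counts in self.counts.items():
            total = sum(counts.values())
            # Uniform prior: only the (add-one smoothed) likelihoods contribute
            log_scores[label] = sum(
                math.log((counts[f] + 1) / (total + len(self.vocab))) for f in feats
            )
        # Normalize in log space to avoid underflow
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


nb = NaiveBayes()
nb.train([
    ("great film, loved it", "pos"),
    ("amazing wonderful movie", "pos"),
    ("boring film, hated it", "neg"),
    ("terrible awful movie", "neg"),
])
scores = nb.posterior("loved this wonderful film")
```

Since the classifier is trained on positive and negative data only, a neutral label has to come from somewhere else, for example from a threshold on the posterior.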

Page 9

Aggregation & summarization

• For each entity we retrieve the set of associated messages and calculate the mean value of sentiment, Polarity and Subjectivity

• Calculate the same sentiment measures per topic and per user

• Several other statistics: top shared messages, URLs and images, top active & influential users
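
A possible per-entity aggregation is sketched below. The slide does not define Polarity or Subjectivity, so the definitions used here are assumptions: mean sentiment is (pos - neg) / total, polarity is (pos - neg) / (pos + neg), and subjectivity is the share of non-neutral messages. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate(messages):
    """messages: iterable of (key, label) pairs with label in {'pos', 'neg', 'neut'}.

    The key can be an entity, a topic, or a user id, so the same function
    covers the per-entity, per-topic and per-user measures.
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0, "neut": 0})
    for key, label in messages:
        counts[key][label] += 1

    summary = {}
    for key, c in counts.items():
        total = c["pos"] + c["neg"] + c["neut"]
        opinionated = c["pos"] + c["neg"]
        summary[key] = {
            # Assumed definitions, see the note above
            "mean_sentiment": (c["pos"] - c["neg"]) / total,
            "polarity": (c["pos"] - c["neg"]) / opinionated if opinionated else 0.0,
            "subjectivity": opinionated / total,
        }
    return summary


stats = aggregate([
    ("film_a", "pos"), ("film_a", "pos"), ("film_a", "neg"), ("film_a", "neut"),
    ("film_b", "neut"),
])
```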

Page 10

Dataset: 53rd Thessaloniki International Film Festival

Three sources of data:
1. A detailed set of the 168 films included in the official festival program of tiff53
2. 3,974 tweets that contain the official hashtag of the festival (#tiff53) for the period between November 1st and 13th
3. Film rating and bookmarking data created by the ThessFest mobile app (available both for iPhone* and Android**).

* https://itunes.apple.com/gr/app/thessfest/id504913309?mt=8
** https://play.google.com/store/apps/details?id=com.mk4droid.FF_pack&hl=el

10-day event, 2-11 November 2012

Page 11

Tweet-film matching results

• film = <title, description, directors, actors>
• Multiple entity representations using Greek/English/both, uni-/bi-grams
• Similarity threshold sensitivity analysis

Pooling multiple representations, threshold (0.1, 0.3)
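
A threshold sensitivity analysis of this kind can be sketched as a sweep over candidate thresholds, scoring the resulting tweet-film assignments against a manually labelled gold set. The function names, the similarity scores, and the gold pairs below are made up for illustration and are not the festival data.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of a set of predicted (tweet, film) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def sweep(scores, gold, thresholds):
    """scores: {(tweet_id, film_id): similarity}; returns {threshold: (P, R, F1)}."""
    results = {}
    for threshold in thresholds:
        predicted = {pair for pair, sim in scores.items() if sim > threshold}
        results[threshold] = precision_recall_f1(predicted, gold)
    return results


# Toy similarity scores and gold assignments
scores = {("t1", "f1"): 0.60, ("t1", "f2"): 0.20, ("t2", "f2"): 0.35}
gold = {("t1", "f1"), ("t2", "f2")}
results = sweep(scores, gold, thresholds=[0.1, 0.3])
```

In this toy case the lower threshold admits a wrong pair and lowers precision, which is the trade-off such an analysis is meant to expose.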

Page 12

Topic analysis results

• 834 topics (clusters)
• Manual inspection of topics:
– 53.8% of topic titles considered informative
– 98.5% of topics were found to be “clean”

Figures: Topics in time; Top-10

Page 13

Sentiment analysis results

• Training
– 800K positive & negative tweets for English
– 12K positive & negative tweets for Greek
• Tuning (for threshold)
– Manually annotated dataset from Thessaloniki Documentary Festival (similar event)
– 325/73/553 (pos/neg/neut) in English and 781/216/781 in Greek
• Testing
– 324/33/724 (pos/neg/neut) in English and 901/315/1667 in Greek
– Best accuracy (English) ~ 0.75
– Performance in Greek much poorer than in English: need for a richer training corpus

Page 14

Aggregation & summarization results (1/2)

#T: number of tweets
Pol: polarity of film tweets
Subj: subjectivity of film tweets
R: average rating
#R: number of ratings
#F: number of times the film was bookmarked

• Films with positive polarity are rated higher.
• Films that are tweeted a lot are also more likely to be rated.
• Films that are tweeted a lot are also more likely to be added to the users' bookmarks.

Pearson correlation across film statistics
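
For reference, the Pearson correlation between two per-film statistics can be computed as in the sketch below. The function name and the sample values are hypothetical and are not taken from the festival dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant sequence has no variance; report 0.0 instead of dividing by zero
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


# Made-up per-film values: number of tweets (#T) and number of ratings (#R)
tweets_per_film = [12, 40, 7, 25, 3]
ratings_per_film = [5, 18, 2, 11, 1]
r = pearson(tweets_per_film, ratings_per_film)
```

A coefficient close to 1 for such a pair of columns would correspond to a statement like "films that are tweeted a lot are also more likely to be rated".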

Page 15

Aggregation & summarization results (2/2)

Most active & influential Twitter accounts (+sentiment per user)

Most shared photos (+number of retweets)

Page 16

Summary

• Extract entities of interest from messages: F1 = 0.737 (precision = 0.774, recall = 0.697)

• Detect topics in event-related messages: 834 topics, 98.5% considered “clean”

• Sentiment analysis per message, entity & topic: accuracy 0.75 for English, 0.62 for Greek

• Aggregation & statistics: valuable insights and overview information

Page 17

Future Work

• Apply the proposed framework to larger-scale events of different nature (e.g. music festivals, sports events).

• Monitor and process more OSN sources (e.g. Facebook, Instagram).

• Refine the proposed methods with the goal of improving accuracy and robustness over different datasets.

• Experiment with techniques for automatically creating visual informative summaries based on the results of the automatic analysis.

Page 18

Thank You

Questions?

Page 19

References

1. S. Petrovic, M. Osborne, and V. Lavrenko. Streaming first story detection with application to Twitter. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL (NAACL). 2010.

2. A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. 2009.