Towards a Social Media Analytics Platform: Event Detection and User Profiling for Microblogs

Manish Gupta, Kevin Chang, Rui Li

[email protected], [email protected], [email protected]

April 8, 2014


Characteristics of Twitter Data

• 140 characters – short documents
• SMS kind of language
• Code mixing (mix of multiple languages)
• Tweets, Retweets, Mentions, Hashtags
• Very fresh news from human sensors
• Large amount of data with huge data rate
  – Many irrelevant messages
  – Many redundant messages
• Self-contained
• Simple discourse structure


Noisy Twitter Text: Challenges

• Lexical Variation (misspellings, abbreviations)
  – e.g., variants of “tomorrow”: `2m', `2ma', `2mar', `2mara', `2maro', `2marrow', `2mor', `2mora', `2moro', `2morow', `2morr', `2morro', `2morrow', `2moz', `2mr', `2mro', `2mrrw', `2mrw', `2mw', `tmmrw', `tmo', `tmoro', `tmorrow', `tmoz', `tmr', `tmro', `tmrow', `tmrrow', `tmrrw', `tmrw', `tmrww', `tmw', `tomaro', `tomarow', `tomarro', `tomarrow', `tomm', `tommarow', `tommarrow', `tommoro', `tommorow', `tommorrow', `tommorw', `tommrow', `tomo', `tomolo', `tomoro', `tomorow', `tomorro', `tomorrw', `tomoz', `tomrw', `tomz'
• Unreliable Capitalization
  – “The Hobbit has FINALLY started filming! I cannot wait!”
• Unique Grammar
  – “watchng american dad.”


NLP in News vs. Twitter: Thought Experiment

• Task 1
  – Read each sentence from today’s New York Times
  – Except, first randomly permute the sentences
  – Answer basic questions about today’s news
• Task 2
  – Read a random sample of tweets
    • From high-quality sources
  – Order is picked randomly
  – Answer basic questions about today’s news
• Claim:
  – Task 2 is easier than task 1.


Tutorial Overview
• Event Detection for Twitter (80 min)
  – Event Detection using Tweet Content
  – Event Detection using Other External Sources
  – Applications of Event Detection
• Break (10 min)
• Event Description for Twitter (25 min)
  – Finding Best Phrase to Summarize an Event
  – Finding Event Types
  – Finding Event Timespans
  – Finding Event Credibility
  – Finding Event Locations
• User Profiling in Social Media (55 min)
  – Content-based Profiling
  – Network-based Profiling
  – Hybrid Approach
  – Co-profiling Attributes and Relationships
• Summary and Discussions (10 min)


Why Detect Events from Twitter?

• Twitter is a great news source
  – Human sensors report very quickly
  – Tweet waves travel faster than earthquake waves!
• Overload of information
  – Show only ranked important events

• Showing 10 relevant tweets is not a great idea, since very few real information needs can be satisfied by a single short piece of text

• Will look at applications of event detection later


Manual Event Detection

• Twitter partnered with the third-party website WhatTheTrend to provide definitions of trending topics

• WhatTheTrend allows users to manually enter descriptions of why a topic is trending

• Problems
  – Spam
  – Manual (significant efforts)
  – Time lag


Twitter Monitor
• Michael Mathioudakis and Nick Koudas. TwitterMonitor: trend detection over the twitter stream. SIGMOD '10
• Identifies ‘bursty’ keywords, i.e., keywords that suddenly appear in tweets at an unusually high rate.
• Groups bursty keywords into trends based on their co-occurrences.
• Extracts additional information from the tweets that belong to the trend, aiming to discover interesting aspects of it.


Twitter Monitor: Trend Analysis

• Find bursty keywords
• Group bursty keywords based on their co-occurrences to get trends (keyword clusters)
• For every trend
  – Identify more keywords which may not be bursty but provide the context of the event
    • Using SVD
  – Identify frequently mentioned entities in tweets containing the trend keywords
  – Links in related tweets
  – Frequent geographical origins of related tweets
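
The first two steps of the trend analysis can be sketched as follows. This is a minimal illustration and not TwitterMonitor's actual algorithm: the burst test (current count at least `ratio` times a smoothed historical average), the connected-components grouping, and all function and parameter names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def bursty_keywords(history, current, ratio=3.0, min_count=2):
    """Return keywords whose count in `current` is unusually high.

    history: list of past windows, each a list of tokenized tweets.
    current: the current window, a list of tokenized tweets.
    The ratio test is an assumed stand-in for a real burst model.
    """
    past = Counter()
    for window in history:
        for tweet in window:
            past.update(set(tweet))
    now = Counter()
    for tweet in current:
        now.update(set(tweet))
    n_windows = max(len(history), 1)
    bursty = set()
    for word, count in now.items():
        expected = past[word] / n_windows
        # +1 smoothing: a previously unseen word needs `ratio` mentions.
        if count >= min_count and count >= ratio * (expected + 1):
            bursty.add(word)
    return bursty


def group_into_trends(bursty, current, min_cooccur=2):
    """Group bursty keywords that co-occur in at least `min_cooccur` tweets."""
    pair_counts = Counter()
    for tweet in current:
        present = sorted(set(tweet) & bursty)
        pair_counts.update(combinations(present, 2))

    # Union-find over keywords linked by frequent co-occurrence.
    parent = {w: w for w in bursty}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), c in pair_counts.items():
        if c >= min_cooccur:
            parent[find(a)] = find(b)

    trends = {}
    for w in bursty:
        trends.setdefault(find(w), set()).add(w)
    return sorted(sorted(t) for t in trends.values())
```

Each returned trend is a keyword cluster; the context keywords, entities, links, and geographical origins listed above would be extracted in later steps that this sketch does not cover.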


Detecting Events using Graph Community Analysis


Detecting Events using LSH (1)
• Saša Petrović, Miles Osborne, and Victor Lavrenko. Streaming first story detection with application to Twitter. HLT '10.
• Locality sensitive hashing can be used with the cosine similarity based distance function to compute nearest neighbor documents given a document d.
  – Cosine similarity based nearest neighbor finding can be done using hash functions which project the document vector onto a random hyperplane
  – Increasing the number of such hyperplanes (k) decreases the probability of chance collisions.
  – But that also decreases the probability of colliding with the nearest neighbor.
  – Hence maintain multiple (L) hash tables.
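
A minimal sketch of the hashing scheme described above, assuming dense document vectors of a fixed dimension. The class name `CosineLSH`, its methods, and the default values of k and L are illustrative assumptions, not taken from Petrović et al.

```python
import random


class CosineLSH:
    """L hash tables, each keyed by k random-hyperplane sign bits."""

    def __init__(self, dim, k=8, L=4, seed=0):
        rng = random.Random(seed)
        # planes[t][b] is the normal vector of the b-th hyperplane of table t.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [dict() for _ in range(L)]

    def _key(self, table_idx, vec):
        bits = []
        for plane in self.planes[table_idx]:
            dot = sum(p * x for p, x in zip(plane, vec))
            # One bit per hyperplane: which side of it the vector falls on.
            bits.append(1 if dot >= 0 else 0)
        return tuple(bits)

    def add(self, doc_id, vec):
        for t, table in enumerate(self.tables):
            table.setdefault(self._key(t, vec), []).append(doc_id)

    def candidates(self, vec):
        """Union of documents sharing a bucket with `vec` in any table."""
        found = set()
        for t, table in enumerate(self.tables):
            found.update(table.get(self._key(t, vec), []))
        return found
```

A larger k makes each bucket key longer, so fewer unrelated documents collide; using L independent tables gives a true near neighbor several chances to share a bucket with the query.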


Detecting Events using LSH (2)
• LSH has a problem
  – If the nearest neighbor is far away, LSH does not work
  – So, if the minimum distance of new document d with all documents in colliding bucket (as per LSH) is higher than a threshold, check distance of d with most recent 1000 documents and update min distance if needed.
• Two improvements
  – In each hash table, maintain a constant number of documents per bucket. Remove old documents
  – On collision with buckets in L hash tables, don’t compare with all documents in all L hash tables. Instead compare to the 3L documents that collide most frequently with the new document.
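
The fallback to the most recent documents can be sketched as below. This is an illustration under stated assumptions: the function names and the threshold value of 0.5 are hypothetical, the LSH lookup is represented only by its result (`bucket_docs`), and the two improvements (bounded buckets, 3L comparisons) are not shown.

```python
import math
from collections import deque


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def novelty_score(doc, bucket_docs, recent_docs, threshold=0.5):
    """Distance from `doc` to its nearest neighbour.

    bucket_docs: vectors that collided with `doc` in the LSH tables.
    recent_docs: the most recent documents, scanned only when every
                 LSH candidate is farther away than `threshold`.
    """
    best = min((cosine_distance(doc, d) for d in bucket_docs), default=1.0)
    if best > threshold:
        for d in recent_docs:
            best = min(best, cosine_distance(doc, d))
    return best


# A bounded deque drops the oldest document automatically;
# 1000 is the window size quoted above.
recent = deque(maxlen=1000)
```

A document whose score stays above the threshold after both checks would be reported as a first story.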


Detecting Events using LSH (3)


Detecting Events using CRFs (1)

• Given a repository of tweets, first find named entities using a CRF-based NER on tweets
• Next, find event-referring phrases.
• Useful to display in connection with events
  – E.g. “Steve Jobs” + “died” + “October 6”
• Helpful in categorizing Events into Types
• Examples
  – Apple to Announce iPhone 5 on October 4th! YES!
  – iPhone 5 announcement coming Oct 4th
  – WOOOHOO NEW IPHONE TODAY! CAN’T WAIT!


Detecting Events using CRFs

• Using CRF to identify event-referring phrases
  – Contextual features
    • POS tags
    • Adjacent words
  – Dictionary Features
    • Event words gathered from WordNet
    • Brown Clusters
  – Orthographic Features
    • Prefixes, suffixes


Detecting Structured Events using CRFs (1)

• Arpit Khurdiya, Lipika Dey, Diwakar Mahajan, and Ishan Verma. Extraction and Compilation of Events and Sub-events from Twitter. WI-IAT '12
• Two levels of CRFs
  – First level identifies a sub-event comprising actor, action, object, context, date, and location
  – Second level combines various sub-events to get the event title


Detecting Structured Events using CRFs (2)


Detecting Structured Events using CRFs (3)

• Features for the CRF predictor
  – Word features – each word in its lemmatized form
  – Orthographic features – this set of features includes capitalization, numeric features, etc.
  – Twitter-specific features – hashtags, user mentions, retweets, etc.
  – Part-of-speech tags for a 5-word window
  – Named entity tags – these include tags like Names of People, Location, Date, etc. assigned to words or sets of words
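
A per-token feature function along these lines might look like the sketch below. It is an assumption-laden illustration, not the authors' feature extractor: lowercasing stands in for lemmatization, POS tags are taken as input rather than computed, named entity tags are omitted, and the function name and feature keys are hypothetical.

```python
def token_features(tokens, pos_tags, i):
    """Build a feature dict for token i, in the style a CRF toolkit expects."""
    word = tokens[i]
    feats = {
        # Word feature (lowercasing is a stand-in for lemmatization).
        "word": word.lower(),
        # Orthographic features.
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Twitter-specific features.
        "is_hashtag": word.startswith("#"),
        "is_mention": word.startswith("@"),
        "is_retweet_marker": word == "RT",
    }
    # POS tags in a 5-word window centred on token i.
    for offset in range(-2, 3):
        j = i + offset
        feats[f"pos[{offset}]"] = pos_tags[j] if 0 <= j < len(tokens) else "PAD"
    return feats
```

A sequence of such dicts, one per token, is the usual input shape for a linear-chain CRF trainer.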


Detecting Breaking Events using Hashtags (1)
• Anqi Cui, Min Zhang, Yiqun Liu, Shaoping Ma, and Kuo Zhang. Discover breaking events with popular hashtags in twitter. CIKM '12.
• Issues with hashtags
  – A specific hashtag may refer to different objects, i.e., it is ambiguous.
    • #tea may refer to either the beverage or the Tea Party Movement
  – Different hashtags may describe the same event
    • #taiwanfloods and #morakot both refer to the typhoon Morakot that struck Taiwan in 2009
  – Hashtags are sensitive to wide topics, so the topics they indicate may not be real-world events, i.e., they lack a specific time period, location, or people involved
    • Twitter memes (conversational topics that attract users to share their own personal feelings) like #iaintafraidtosay and #foramilliondollars are ephemeral but less valuable than the real events


Detecting Breaking Events using Hashtags (2)


Detecting Breaking Events using Hashtags (3)


Detecting Events using Tag Correlations (1)
• Foteini Alvanaki, Sebastian Michel, Krithi Ramamritham, and Gerhard Weikum. EnBlogue: emergent topic detection in web 2.0 streams. SIGMOD '11.
• Compared to Mathioudakis and Koudas’s Twitter Monitor system, this work considers shifts in tag correlations


Detecting Events using Tag Correlations (2)

• Framework
  – Seed tag selection
    • Based on popularity
  – Correlation tracking
    • For each tag pair that contains at least one seed tag, track correlations
  – Shift detection
    • Sudden (but significant) increases in the correlation of tag pairs
    • If current correlation is significantly different from the prediction based on the previous correlation values
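
Correlation tracking and shift detection for one tag pair can be sketched as follows. The correlation measure (Jaccard overlap of the tweets carrying each tag), the predictor (mean of earlier windows), the `min_jump` threshold, and the function names are all assumptions chosen for illustration; they are not claimed to be enBlogue's actual choices.

```python
def jaccard(tweets, tag_a, tag_b):
    """Jaccard overlap of the tweets carrying tag_a and those carrying tag_b.

    tweets: one time window, given as a list of tag sets (one per tweet).
    """
    with_a = {i for i, tags in enumerate(tweets) if tag_a in tags}
    with_b = {i for i, tags in enumerate(tweets) if tag_b in tags}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0


def detect_shift(windows, tag_a, tag_b, min_jump=0.3):
    """Compare the latest window's correlation with a prediction from history.

    Returns (shifted, current, predicted).
    """
    history = [jaccard(w, tag_a, tag_b) for w in windows[:-1]]
    current = jaccard(windows[-1], tag_a, tag_b)
    predicted = sum(history) / len(history) if history else 0.0
    return current - predicted >= min_jump, current, predicted
```

In a full system this check would run only for pairs that include at least one seed tag, as the framework above specifies.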


Detecting Events using Segments (1)
• Chenliang Li, Aixin Sun, and Anwitaman Datta. Twevent: segment-based event detection from tweets. CIKM '12.
• Each tweet is split into non-overlapping segments (i.e., phrases that possibly refer to named entities or semantically meaningful information units).
• The bursty segments are identified within a fixed time window based on their frequency patterns, and each bursty segment is described by the set of tweets containing the segment published within that time window.
• The similarity between a pair of bursty segments is computed using their associated tweets.
• After clustering bursty segments into candidate events, Wikipedia is exploited to identify the realistic events and to derive the most newsworthy segments to describe the identified events.


Detecting Events using Segments (2)


Detecting Events using Segments (3)


Detecting Events using Segments (4)


Detecting Events using Segments (5): Examples


Detecting Events by Label Propagation from News (1)

• Ting Hua, Feng Chen, Liang Zhao, Chang-Tien Lu, and Naren Ramakrishnan. STED: semi-supervised targeted-interest event detection in Twitter. KDD '13


Detecting Events by Label Propagation from News (2)

• From news articles, extract named entities and action words
• Tweets containing at least 1 named entity and 1 action word are labeled as positive
• Label propagation
  – Identify social ties terms from labeled tweets: Mentions (@), Hashtags (#)
  – Remove infrequent terms
  – Get more tweets from database which contain these terms
  – Label new tweets for a term t as positive if #newly discovered tweets < #already labeled tweets for term t
  – Iterate label propagation until no new tweets are found
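
The propagation loop can be sketched as below, assuming tweets are plain strings and the tweet database is an in-memory list. The function name, the regular expression for social-tie terms, and the `min_term_count` cutoff for "infrequent" are hypothetical choices for the example.

```python
import re
from collections import Counter

# Social-tie terms: mentions (@) and hashtags (#).
TERM = re.compile(r"[@#]\w+")


def propagate_labels(seed_positive, database, min_term_count=2):
    """Grow the positive set through shared mentions and hashtags."""
    labeled = set(seed_positive)
    while True:
        term_counts = Counter()
        for tweet in labeled:
            term_counts.update(set(TERM.findall(tweet.lower())))
        added = set()
        for term, n_labeled in term_counts.items():
            if n_labeled < min_term_count:
                continue  # remove infrequent terms
            new = {
                t for t in database
                if t not in labeled and term in TERM.findall(t.lower())
            }
            # Accept a term's new tweets only if they are fewer than the
            # labeled tweets that already contain the term.
            if new and len(new) < n_labeled:
                added |= new
        if not added:
            return labeled
        labeled |= added
```

The loop terminates because the labeled set can only grow and the database is finite.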


Detecting Events by using Information from Knowledge Bases

• Heather S. Packer, Sina Samangooei, Jonathon S. Hare, Nicholas Gibbins, and Paul H. Lewis. Event detection using Twitter and structured semantic query expansion. CrowdSens '12
• Given an event query, extract tweets containing the query
• From these tweets extract entities
• Find related entities from knowledge bases, thereby extending the query
• Use these new entities to retrieve more tweets relevant to the event, thereby summarizing the event in a more comprehensive manner.


Detecting Forest-fires

• Bertrand De Longueville, Robin S. Smith, and Gianluca Luraschi. "OMG, from here, I can see the flames!": a use case of mining location based social networks to acquire spatio-temporal data on forest fires. LBSN '09


Detecting Sporting Events (1)
• Jeffrey Nichols, Jalal Mahmud, and Clemens Drews. Summarizing sporting events using twitter. IUI '12
• Generate a journalistic summary of events from tweets
• Spikes are used to identify important moments to describe an event
• Sporting events consist of a sequence of moments, each of which may contain actions by players, the referee, the fans, etc.


Detecting Sporting Events (2)


Detecting Sporting Events (3)

7 point Likert scale


Detecting Local Festivals (1)

• Ryong Lee and Kazutoshi Sumiya. Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. LBSN '10

• To detect such unusual geo-social events, they depend on geographical regularities deduced from the usual behavior patterns of crowds with geo-tagged microblogs.

• By comparing these regularities with the estimated ones, they decide whether there are any unusual events happening in the monitored geographical area.


Detecting Local Festivals (2)

RoIs = Regions of Interest. RoIs (sub-regions) are computed from the region by running K-Means on geographically distributed points.


Detecting Local Festivals (3)
• Measuring Geographical Regularity
  – #Tweets: the number of tweets that were written in an RoI within a specific period of time.
  – #Crowd: the number of Twitter users found in an RoI within a specific time period.
  – #MovCrowd: three kinds of crowd movement are distinguished. 1) Inner: a crowd in an RoI moves only inside the region without going outside. 2) Incoming: some people come in from outside. 3) Outgoing: conversely, some people move outside the RoI. For simplification, they consider #MovCrowd as inner + incoming.

[Plot: axes labeled “Number of tweets” and “#MovCrowd”]
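
The three regularity measures can be computed from geo-tagged posts roughly as follows. This sketch assumes each post has already been assigned to an RoI, and it treats a user as "moving" when two of their consecutive posts in the window end inside the RoI; that reading of inner and incoming movement, along with the function name and record layout, is an assumption made for the example.

```python
def roi_statistics(posts, roi):
    """Compute #Tweets, #Crowd and #MovCrowd for one RoI.

    posts: list of (user, timestamp, roi_id) tuples within one time window.
    """
    tweets = [p for p in posts if p[2] == roi]
    crowd = {user for user, _, _ in tweets}

    # Order each user's posts by time to recover their movements.
    by_user = {}
    for user, _, where in sorted(posts, key=lambda p: (p[0], p[1])):
        by_user.setdefault(user, []).append(where)

    inner, incoming = set(), set()
    for user, places in by_user.items():
        for prev, cur in zip(places, places[1:]):
            if cur == roi:
                # Previous post inside the RoI: inner; elsewhere: incoming.
                (inner if prev == roi else incoming).add(user)

    return {
        "tweets": len(tweets),
        "crowd": len(crowd),
        "mov_crowd": len(inner | incoming),  # inner + incoming, outgoing ignored
    }
```

Comparing these per-window values with their usual levels for the same RoI is what flags an unusual geo-social event.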


Detecting Local Festivals (4)


Detecting Local Festivals (5)

The algorithm could detect 13 of the 15 festivals


Detecting Drug Related Adverse Events (1)

• Jiang Bian, Umit Topaloglu, and Fan Yu. Towards large-scale twitter mining for drug-related adverse events. SHB '12

• Given the high frequency of user updates, mining Twitter messages can lead us to real-time pharmacovigilance.

• To mine Twitter messages for adverse events (AEs), the process can be separated into two parts: 1) identifying potential users of the drug; 2) finding possible side effects mentioned in the users’ Twitter timelines that might be caused by the use of the drug concerned.


Detecting Drug Related Adverse Events (2)

Twitter users detected by mentions of drugs of interest in their tweets

Positive: “No more tamoxifen for me -finished 5 yrs of post cancer drug therapy”
Negative: “Please visit us at www.genglob.com For generic anti cancer drugs medicines alkeran, iressa, gefitinib, erlotinib, temonat, revlimid, Velcade”


Detecting Drug Related Adverse Events (3)
• Features for identifying drug users
  – Textual features that construct a specific meaning in the text:
    • Bag-of-words features that indicate an action or a state showing that the user has taken the drug
    • Number of hashtags occurring in the document
    • Number of reply-tags occurring in the document
    • Number of words that indicate negation
    • Number of URLs
    • Number of pronouns
    • Number of occurrences of the drug name or its synonyms
  – Semantic features that express the existence of semantic properties (i.e., based on Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) extracted from the tweets)
    • Number of CUIs in each Semantic Type
    • Number of CUIs in each Semantic Group
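
The count-based textual features can be sketched as below. The word lists, the tokenizing regular expressions, and the function name are illustrative assumptions; the bag-of-words and UMLS-based semantic features are not covered.

```python
import re

# Small illustrative word lists, not the ones used in the paper.
NEGATIONS = {"no", "not", "never", "none"}
PRONOUNS = {"i", "me", "my", "we", "you", "he", "she", "they", "it"}


def textual_features(tweet, drug_names):
    """Count-based features for deciding whether the author took the drug.

    drug_names: lowercase drug names and synonyms to look for.
    """
    text = tweet.lower()
    tokens = re.findall(r"[#@]?\w+", text)
    return {
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "n_reply_tags": sum(t.startswith("@") for t in tokens),
        "n_negations": sum(t in NEGATIONS for t in tokens),
        "n_urls": len(re.findall(r"(?:https?://|www\.)\S+", text)),
        "n_pronouns": sum(t in PRONOUNS for t in tokens),
        "n_drug_mentions": sum(t in drug_names for t in tokens),
    }
```

Applied to the positive example quoted earlier, the personal pronoun and the single drug mention are the kind of signal that separates a drug user from an advertisement listing many drug names.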


Detecting Traffic Events
• Sílvio S. Ribeiro, Jr., Clodoveu A. Davis, Jr., Diogo Rennó R. Oliveira, Wagner Meira, Jr., Tatiana S. Gonçalves, and Gisele L. Pappa. Traffic Observatory: A System to Detect and Locate Traffic Events and Conditions using Twitter. LBSN '12
• Four phases
  – Preprocessing of the messages’ content
  – Traffic event identification/detection
    • Using manually generated list of terms
  – Detection of locations using exact string matching
  – Enhancement of the location information using approximate string matching
    • To handle typos, shortened place names, nicknames, historical names
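
The two location phases (exact match first, approximate match as a fallback) can be sketched with Python's standard `difflib`. The gazetteer entries, the `cutoff` value, and the function name are hypothetical; the system's actual matcher may differ.

```python
import difflib

# Hypothetical gazetteer of street names, for illustration only.
GAZETTEER = ["Avenida Amazonas", "Avenida Afonso Pena", "Rua da Bahia"]


def locate(mention, gazetteer=GAZETTEER, cutoff=0.8):
    """Return the gazetteer entry matching `mention`, or None."""
    lowered = {name.lower(): name for name in gazetteer}
    key = mention.lower()
    # Phase 3: exact string matching.
    if key in lowered:
        return lowered[key]
    # Phase 4: approximate matching to tolerate typos and shortened names.
    close = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None
```

Nicknames and historical names generally need an alias table rather than string similarity, so a lookup like this would cover only the typo and truncation cases.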


Detecting Epidemics (1)


Detecting Epidemics (2)


Detecting Earthquakes (1)
• Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. WWW '10
• Search tweets including keywords related to a target event
  – “shaking”, “earthquake”
• Classify tweets (SVM) into a positive class or a negative class
  – “Earthquake right now!!” --- positive
  – “Someone is shaking hands with my boss” --- negative
  – Features
    • Statistical features: # words in a tweet message and the position of the query within a tweet
    • Keyword features: the words in a tweet
    • Word context features: the words before and after the query word
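
The three feature groups can be extracted as in the sketch below; the SVM training step itself is not shown. Whitespace tokenization, the punctuation stripping, and the function and key names are assumptions made for the example.

```python
def tweet_features(text, query):
    """Extract statistical, keyword, and word-context features for one tweet."""
    words = [w.strip("!?.,\"'") for w in text.lower().split()]
    words = [w for w in words if w]
    pos = words.index(query) if query in words else -1
    return {
        # Statistical features.
        "n_words": len(words),
        "query_position": pos,
        # Keyword features.
        "keywords": set(words),
        # Word context features.
        "before": words[pos - 1] if pos > 0 else None,
        "after": words[pos + 1] if 0 <= pos < len(words) - 1 else None,
    }
```

For the two example tweets above, the context words differ sharply ("right" after "earthquake" versus "hands" after "shaking"), which is the kind of cue the classifier can learn from.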


Detecting Earthquakes (2)


Detecting Earthquakes (3): Location Estimation Example

[Map: Tokyo, Osaka, and Kyoto, marking the actual earthquake center, the estimation by median, and the estimation by particle filter]

• Particle filters perform better than other methods

• If the center of a target event is in an oceanic area, it is more difficult to locate it precisely from tweets

• It becomes more difficult to make a good estimation in less populated areas

• A person has about 20~30 sec before the earthquake's arrival at a point that is 100 km distant from the actual center
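As a rough illustration of the simpler of the two estimators mentioned on this slide, the median-based estimate can be sketched as the coordinate-wise median of the locations of positively classified tweets. The function name and the sample coordinates are hypothetical, and the particle filter variant is not shown.

```python
from statistics import median


def estimate_center_by_median(tweet_locations):
    """Estimate an event center as the coordinate-wise median of tweet locations.

    tweet_locations: list of (latitude, longitude) pairs, one per positive tweet.
    """
    if not tweet_locations:
        raise ValueError("need at least one located tweet")
    lats = [lat for lat, _ in tweet_locations]
    lons = [lon for _, lon in tweet_locations]
    return median(lats), median(lons)


# Hypothetical tweet coordinates, roughly around Tokyo, Kyoto, and Osaka.
reports = [(35.68, 139.69), (35.01, 135.77), (34.69, 135.50),
           (34.70, 135.49), (35.02, 135.75)]
center = estimate_center_by_median(reports)
```

The median ignores the single far-away report, which is one reason it is a reasonable baseline for noisy social sensors.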

Detecting Emerging Controversial Events (1)

• Ana-Maria Popescu and Marco Pennacchiotti. Detecting controversial events from twitter. CIKM '10
• Controversial event: An event is controversial if it provokes a public discussion in which audience members express opposing opinions or disbelief
• Twitter snapshot = target entity + time period + set of tweets
• Goal: Rank Twitter snapshots by controversy score
• Regression model to detect event snapshots and to compute controversy scores
• Features
  – Twitter-based features: snapshots' linguistic properties, structural and social graph information, the intensity of the discussion about the entity, the distribution of sentiment words in the snapshot, and the level of controversy
  – News buzz features: if an entity is buzzy in news articles at the same time it is buzzy in a Twitter snapshot, then the snapshot is likely to refer to a real-world event.
  – Web and news controversy features: assess the past and present levels of controversy surrounding the target entity in the snapshot.
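To make the snapshot definition and the ranking goal concrete, here is a toy sketch. The `Snapshot` class mirrors the three parts named on the slide; the lexicon and `toy_controversy_score` are hypothetical stand-ins for the regression model of the cited work, which is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    entity: str                      # target entity
    period: str                      # time period, e.g. a date string
    tweets: list = field(default_factory=list)


# Hypothetical lexicon; the cited work learns a regression model instead.
CONTROVERSY_WORDS = {"scandal", "outrage", "lie", "fake"}


def toy_controversy_score(snapshot):
    """Fraction of tweets that contain at least one lexicon word."""
    if not snapshot.tweets:
        return 0.0
    hits = sum(
        any(w in t.lower().split() for w in CONTROVERSY_WORDS)
        for t in snapshot.tweets
    )
    return hits / len(snapshot.tweets)


def rank_snapshots(snapshots, score=toy_controversy_score):
    """Return snapshots ordered from most to least controversial."""
    return sorted(snapshots, key=score, reverse=True)


s1 = Snapshot("EntityA", "2010-05-01", ["what a scandal", "total outrage here", "nice game"])
s2 = Snapshot("EntityB", "2010-05-01", ["great concert", "loved it"])
ranking = rank_snapshots([s2, s1])
```

Any scoring function with the same signature can be passed as `score`, so a trained model could replace the toy one.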

Detecting Emerging Controversial Events (2)

Detecting Emerging Controversial Events (3)

Summary for Event Detection

• Detecting events from Twitter is difficult because of the unique characteristics of the microblogs

• We saw various ways of detecting interesting events from Twitter
  – Using Twitter content itself
  – Using external knowledge sources

• Finally, we discussed various applications of event detection on Twitter

Further Reading (1)

• Michael Mathioudakis and Nick Koudas. TwitterMonitor: trend detection over the twitter stream. SIGMOD '10
• Sayyadi, H., Hurst, M., and Maykov, A. (2009). Event Detection and Tracking in Social Streams. ICWSM.
• Saša Petrovic, Miles Osborne, and Victor Lavrenko. Streaming first story detection with application to Twitter. HLT '10.
• Arpit Khurdiya, Lipika Dey, Diwakar Mahajan, and Ishan Verma. Extraction and Compilation of Events and Sub-events from Twitter. WI-IAT '12
• Anqi Cui, Min Zhang, Yiqun Liu, Shaoping Ma, and Kuo Zhang. Discover breaking events with popular hashtags in twitter. CIKM '12.
• Foteini Alvanaki, Michel Sebastian, Krithi Ramamritham, and Gerhard Weikum. EnBlogue: emergent topic detection in web 2.0 streams. SIGMOD '11.
• Chenliang Li, Aixin Sun, and Anwitaman Datta. Twevent: segment-based event detection from tweets. CIKM '12.
• Ting Hua, Feng Chen, Liang Zhao, Chang-Tien Lu, and Naren Ramakrishnan. STED: semi-supervised targeted-interest event detection in twitter. KDD '13
• Heather S. Packer, Sina Samangooei, Jonathon S. Hare, Nicholas Gibbins, and Paul H. Lewis. Event detection using Twitter and structured semantic query expansion. CrowdSens '12
• Beaux Sharifi, Mark-Anthony Hutton, and Jugal Kalita. Summarizing microblogs automatically. HLT '10
• Alan Ritter, Mausam, Oren Etzioni, and Sam Clark. Open domain event extraction from twitter. KDD '12

Further Reading (2)

• Donald Metzler, Congxing Cai, and Eduard Hovy. Structured event retrieval over microblog archives. HLT '12.

• Chung-Hong Lee, Hsin-Chang Yang, Tzan-Feng Chien, and Wei-Shiang Wen. A Novel Approach for Event Detection by Mining Spatio-temporal Information on Microblogs. ASONAM '11

• Detecting Forest-fires: Bertrand De Longueville, Robin S. Smith, and Gianluca Luraschi. "OMG, from here, I can see the flames!": a use case of mining location based social networks to acquire spatio-temporal data on forest fires. LBSN '09

• Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. Information credibility on twitter. WWW '11.

• Jeffrey Nichols, Jalal Mahmud, and Clemens Drews. Summarizing sporting events using twitter. IUI '12

• Ryong Lee and Kazutoshi Sumiya. Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. LBSN '10

• Jiang Bian, Umit Topaloglu, and Fan Yu. Towards large-scale twitter mining for drug-related adverse events. SHB '12

• Sílvio S. Ribeiro, Jr., Clodoveu A. Davis, Jr., Diogo Rennó R. Oliveira, Wagner Meira, Jr., Tatiana S. Gonçalves, and Gisele L. Pappa. Traffic Observatory: A System to Detect and Locate Traffic Events and Conditions using Twitter. LBSN '12

• Vasileios Lampos, Tijl De Bie, and Nello Cristianini. Flu detector: tracking epidemics on twitter. ECML PKDD'10

• Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. WWW '10

• Ana-Maria Popescu and Marco Pennacchiotti. Detecting controversial events from twitter. CIKM '10

Tutorial Overview

• Event Detection for Twitter (80 min)
  – Event Detection using Tweet Content
  – Event Detection using Other External Sources
  – Applications of Event Detection
• Break (10 min)
• Event Description for Twitter (25 min)
  – Finding Best Phrase to Summarize an Event
  – Finding Event Types
  – Finding Event Timespans
  – Finding Event Credibility
  – Finding Event Locations
• User Profiling in Social Media (55 min)
  – Content-based Profiling
  – Network-based Profiling
  – Hybrid Approach
  – Co-profiling Attributes and Relationships
• Summary and Discussions (10 min)

Finding the Best Phrase to Describe an Event

• Beaux Sharifi, Mark-Anthony Hutton, and Jugal Kalita. Summarizing microblogs automatically. HLT '10

• “Ted Kennedy died today”

Event p=“Ted Kennedy”

Finding the Best Phrase to Describe an Event

• How to describe these events corresponding to a phrase p?
  – Phrase Reinforcement Algorithm
    • Get all tweets containing p
    • Remove spam and non-English tweets
    • Get the longest sentence from each post which contains p
    • Build a graph representing common sequences of words that occur both before and after p
    • Partial sentence = path with max total weight beginning from root and ending at a non-root node and containing nodes that occur > T times
    • Build graph again by setting p as the partial path
    • Full sentence = path with max total weight beginning from root and ending at a non-root node and containing nodes that occur > T times
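The two-pass structure of the Phrase Reinforcement steps above can be sketched in a few lines. This is a simplified sketch under stated assumptions, not the authors' implementation: the graph is approximated by a prefix tree whose node weight is a raw occurrence count, the first pass extends p to the left and the second pass extends the resulting partial sentence to the right, and spam filtering, language filtering, and longest-sentence selection are assumed to have happened already. All function names are hypothetical.

```python
from collections import defaultdict


def find_phrase(words, phrase):
    """Return the index where phrase (a word list) starts in words, or -1."""
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            return i
    return -1


def best_path(sequences, threshold):
    """Pick the max-total-weight path in a prefix tree built from sequences.

    A node's weight is the number of sequences passing through it, and every
    node on the chosen path must occur more than `threshold` times.
    """
    counts = defaultdict(int)
    for seq in sequences:
        for i in range(1, len(seq) + 1):
            counts[tuple(seq[:i])] += 1
    best, best_weight = (), 0
    for path in counts:
        weights = [counts[path[:i]] for i in range(1, len(path) + 1)]
        if min(weights) > threshold and sum(weights) > best_weight:
            best, best_weight = path, sum(weights)
    return list(best)


def phrase_reinforcement(sentences, phrase, threshold=1):
    p = phrase.lower().split()
    tokenized = [s.lower().split() for s in sentences]
    # Pass 1: words before p, read right to left starting from the root p.
    before = []
    for words in tokenized:
        i = find_phrase(words, p)
        if i >= 0:
            before.append(words[:i][::-1])
    partial = best_path(before, threshold)[::-1] + p
    # Pass 2: rebuild with the partial sentence as root, reading left to right.
    after = []
    for words in tokenized:
        i = find_phrase(words, partial)
        if i >= 0:
            after.append(words[i + len(partial):])
    return " ".join(partial + best_path(after, threshold))


tweets = [
    "ted kennedy died today",
    "sad that ted kennedy died today",
    "ted kennedy died of cancer",
    "rip ted kennedy",
]
summary = phrase_reinforcement(tweets, "ted kennedy", threshold=1)
```

With these four toy tweets no left context occurs more than once, so the partial sentence stays "ted kennedy" and the second pass appends the most reinforced continuation.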

Finding Event Types (1)

• Alan Ritter, Mausam, Oren Etzioni, and Sam Clark. Open domain event extraction from twitter. KDD '12.

• Would like to categorize events into types, for example:
  – Sports
  – Politics
  – Product releases
  – …

• Benefits:
  – Allow more customized Twitter event calendars
  – Could be useful in upstream tasks

Finding Event Types (2)

• Challenges
  – Many different types
  – Not sure what is the right set of types
  – Set of types might change
    • Might start talking about different things
    • Might want to focus on different groups of users
• Solution: Unsupervised Event Type Induction
  – Latent Variable Models
    • Generative Probabilistic Models
  – Advantages:
    • Discovers types which match the data
    • No need to annotate individual events
    • Don’t need to commit to a specific set of types
    • Modular, can integrate into various applications

Finding Event Types (3)

• Each event phrase is modeled as a mixture of types

• Each event type is associated with a distribution over entities and dates

• Example: P(SPORTS|cheered) = 0.6, P(POLITICS|cheered) = 0.4
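A small numeric sketch may help show how the mixture over types combines with per-type entity distributions. Only the two probabilities for "cheered" come from the slide; the entity distributions and function names below are hypothetical, and the sketch does not attempt to reproduce the inference procedure of the cited latent variable model.

```python
# From the slide: the event phrase "cheered" as a mixture of types.
TYPE_GIVEN_PHRASE = {"cheered": {"SPORTS": 0.6, "POLITICS": 0.4}}

# Hypothetical per-type distributions over entities (each sums to 1).
ENTITY_GIVEN_TYPE = {
    "SPORTS": {"Lakers": 0.7, "Yankees": 0.2, "Obama": 0.1},
    "POLITICS": {"Lakers": 0.05, "Yankees": 0.05, "Obama": 0.9},
}


def entity_prob(entity, phrase):
    """P(entity | phrase) = sum over types of P(type | phrase) * P(entity | type)."""
    return sum(
        p_type * ENTITY_GIVEN_TYPE[t].get(entity, 0.0)
        for t, p_type in TYPE_GIVEN_PHRASE[phrase].items()
    )


def type_posterior(phrase, entity):
    """P(type | phrase, entity), proportional to P(type | phrase) * P(entity | type)."""
    joint = {
        t: p_type * ENTITY_GIVEN_TYPE[t].get(entity, 0.0)
        for t, p_type in TYPE_GIVEN_PHRASE[phrase].items()
    }
    total = sum(joint.values())
    return {t: v / total for t, v in joint.items()}


p_obama = entity_prob("Obama", "cheered")        # 0.6 * 0.1 + 0.4 * 0.9
posterior = type_posterior("cheered", "Obama")
```

Under these made-up numbers, seeing the entity "Obama" next to "cheered" shifts the type from mostly SPORTS to mostly POLITICS.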

Finding Event Types (4)

Finding Event TimeSpans (1)

• A list of timespans during which an instance of the event occurred and was actively discussed within the microblog stream.

• For each timespan, a small set of relevant messages is retrieved to provide a high-level summary of the event that occurred during the timespan

Donald Metzler, Congxing Cai, and Eduard Hovy. Structured event retrieval over microblog archives. HLT '12.

Finding Event TimeSpans (2)

• Framework
  – Timespan retrieval
  – Summarization

• Query expansion is needed because
  – Microblog messages that are highly related to the query might not contain any of the query keywords
  – Vocabulary mismatch: Keyword might be expressed in another form: possibly shortened or slang. E.g., earthquake may be written as quake or #eq
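The vocabulary mismatch problem can be illustrated with a very simple co-occurrence based expansion. This is a hedged sketch of the general idea only: the cited work performs temporal query expansion, which is not reproduced here, and the function names, stopword list, and sample messages are all hypothetical.

```python
from collections import Counter

STOPWORDS = frozenset({"the", "a", "in", "just", "is"})  # tiny illustrative list


def expand_query(query_terms, messages, k=2):
    """Return up to k terms that co-occur most often with the query terms."""
    query = {t.lower() for t in query_terms}
    cooccur = Counter()
    for msg in messages:
        words = set(msg.lower().split())
        if words & query:
            cooccur.update(words - query - STOPWORDS)
    return [term for term, _ in cooccur.most_common(k)]


def matches(message, terms):
    """True if the message contains at least one of the given terms."""
    return bool(set(message.lower().split()) & set(terms))


messages = [
    "earthquake in tokyo #eq",
    "big earthquake just now #eq quake",
    "earthquake quake #eq",
    "quake felt here",
    "lunch time",
]
expansion = expand_query(["earthquake"], messages, k=2)
```

After expansion, the message "quake felt here" is retrieved even though it never mentions the original keyword.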

Finding Event TimeSpans (3)

Finding Event TimeSpans (4)

Finding Event TimeSpans (5)

Finding Event TimeSpans (6): Examples

Finding Credibility of Events (1)

• Classifier (User Features)
  – User has many friends and followers.
  – User has linked his Twitter profile to his Facebook profile.
  – User is a verified user.
  – User registered on Twitter long ago.
  – User has made a lot of posts.
  – A description, URL, profile image, location is attached to user’s profile.

Finding Credibility of Events (2)

• Classifier (Tweet Features)
  – It is complete. A more complete tweet gives a more complete picture of the truth.
  – A professionally written tweet with no slang words, question marks, exclamation marks, full uppercase words, or smileys is more credible.
  – Number of words with first, second, third person pronouns.
  – Presence of supportive evidence in the form of external URLs.
  – A tweet may be regarded as more credible if it is from the most frequent location related to the event.
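Several of the tweet-level signals above are simple surface counts, which can be sketched as follows. The pronoun lists, the smiley pattern, and the feature names are illustrative assumptions, not the feature definitions used in the cited credibility papers; completeness, slang, and location features are not covered.

```python
import re

FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}
SECOND_PERSON = {"you", "your", "yours"}
THIRD_PERSON = {"he", "she", "it", "they", "him", "her", "them", "his", "their", "its"}
URL = re.compile(r"https?://\S+")
SMILEY = re.compile(r"[:;]-?[)(DP]")  # a few common ASCII smileys only


def tweet_credibility_features(text):
    """Count surface signals of a tweet; URLs are removed before counting."""
    no_url = URL.sub(" ", text)
    tokens = re.findall(r"[A-Za-z']+", no_url)
    lower = [t.lower() for t in tokens]
    return {
        "question_marks": no_url.count("?"),
        "exclamation_marks": no_url.count("!"),
        "uppercase_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
        "smileys": len(SMILEY.findall(no_url)),
        "first_person": sum(t in FIRST_PERSON for t in lower),
        "second_person": sum(t in SECOND_PERSON for t in lower),
        "third_person": sum(t in THIRD_PERSON for t in lower),
        "has_url": bool(URL.search(text)),
    }


features = tweet_credibility_features(
    "OMG I can see the flames!! Are you safe? http://example.com/a :)"
)
```

These counts would be combined with user and event features before training a classifier.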

Finding Credibility of Events (3)

• Classifier (Event Features)
  – Number of tweets and retweets related to the event.
  – Number of distinct URLs, domains, hashtags, user mentions, users, locations related to the event.
  – Number of hours for which the event has been popular.
  – Percentage of tweets related to the event on the day when the event reached its peak popularity.

Finding Locations of Events (1)

• GPS tags from Users’ Tweets from GPS-enabled mobile devices
  – Current Tweets
  – Users’ historical tweets
• Location field from User Profiles
  – City Names
  – GPS coordinates
• Other profile information that can be utilized to infer users’ current location
  – UTC (Coordinated Universal Time) offset in the timezone field of tweets
  – URL domain names (e.g. .com for US, .jp for Japan, .de for Germany and .uk for UK) in profile “URL” field
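The URL-domain signal can be sketched as a lookup on the top-level domain of the profile URL. The mapping below contains only the four examples from the slide, and the function name is hypothetical; as the slide's own ".com for US" example suggests, this is a weak hint rather than a reliable location.

```python
from urllib.parse import urlparse

# Only the examples given on the slide; a real table would be much larger.
TLD_TO_COUNTRY = {"com": "US", "jp": "Japan", "de": "Germany", "uk": "UK"}


def country_hint_from_url(profile_url):
    """Return a country hint from the URL's top-level domain, or None."""
    if "://" not in profile_url:
        profile_url = "http://" + profile_url  # let urlparse find the host
    host = urlparse(profile_url).hostname
    if not host:
        return None
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_TO_COUNTRY.get(tld)


hint = country_hint_from_url("http://blog.example.jp/me")
```

A similar lookup table could map UTC offsets to candidate regions, but an offset usually covers many countries at once.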

Finding Locations of Events (2)

• However,
  – Less than one percent of tweets have GPS tags (Cheng et al. 2010)
  – Around 16% of users provide locations in their profiles

• So, we need to profile users’ locations (as well as their other attributes) from social media, such as users’ tweets and their social network.

Summary for Event Description

• For best phrase detection, we discussed the phrase reinforcement algorithm.

• For event type detection, we discussed a latent variable model.

• For finding event timespans, we discussed a method that performs temporal query expansion.

• For event credibility computation, we discussed a classifier with interesting event, tweet and user features.

• For event location prediction, we will discuss ways to predict location of users.

Further Reading

• Beaux Sharifi, Mark-Anthony Hutton, and Jugal Kalita. Summarizing microblogs automatically. HLT '10

• Alan Ritter, Mausam, Oren Etzioni, and Sam Clark. Open domain event extraction from twitter. KDD '12.

• Donald Metzler, Congxing Cai, and Eduard Hovy. Structured event retrieval over microblog archives. HLT '12.

• C. Castillo, M. Mendoza, and B. Poblete. Information Credibility on Twitter. In Proc. of the 20th Intl. Conf. on World Wide Web (WWW), pages 675–684, 2011.

• M. Gupta, P. Zhao, and J. Han. Evaluating Event Credibility on Twitter. In Proc. of the 2012 SIAM Intl. Conf. on Data Mining (SDM), pages 153–164, 2012.

User Profiling is Difficult

• Content is noisy
  – Twitter users often rely on shorthand and non-standard vocabulary for informal communication.
  – Users may be interested in some big cities/events (such as New York, Japan earthquake).
• Social network is noisy
  – Users may connect to their friends, who live in different cities.
  – Users are more likely to follow only a few celebrities.
• A user may have more than one associated location
  – E.g., a user studies at Illinois and works in California

Four Aspects of User Profiling with a focus on Location Prediction

• Content based profiling profiles users’ attributes based on their tweets.

• Network based profiling profiles users’ attributes based on the network (friends).

• Hybrid based profiling profiles users’ attributes based on both their tweets and the network.

• Co-profiling integrates user attribute profiling with other tasks to gain mutual enhancement.

We will mainly focus on profiling locations

Content based Profiling

• Intuition
  – Users at a specific location (e.g., Houston) may tweet some local words in their tweets.
• Generally, we can take a text based classification approach
  – View attributes as labels.
  – Use words and other signals (lexicon, topics) as features.
  – Train classification models with users whose attributes are known.
• Many interesting extensions are proposed.
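The generic text classification recipe above (attributes as labels, words as features, training on users with known attributes) can be sketched with a tiny multinomial Naive Bayes model. Naive Bayes is chosen here only as a convenient illustration; the slide does not prescribe a specific classifier, and the training users and words below are made up.

```python
import math
from collections import Counter, defaultdict


def train(users):
    """users: list of (list_of_words, location_label) for users with known location."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for words, label in users:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict(model, words):
    """Return the label with the highest log-probability under add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_users = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_users in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_users / total_users)
        for w in words:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled users.
model = train([
    (["rockets", "howdy", "rodeo"], "Houston"),
    (["rodeo", "traffic"], "Houston"),
    (["subway", "broadway", "traffic"], "New York"),
])
guess = predict(model, ["rodeo", "howdy"])
```

Words such as "traffic" that appear everywhere carry little signal, which motivates the local word selection discussed on the following slides.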

A Simple Probabilistic Model for Profiling Users’ Countries and States

A Probabilistic Model for Profiling Users’ Cities (Overview)

• Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: a content-based approach to geo-locating twitter users,” in ACM CIKM 2010.

• Two Important Improvements
  – A feature selection component for automatically identifying words in tweets with a strong local geo-scope
  – A lattice-based neighborhood smoothing model for refining a user's location estimate
• Results
  – On average the location estimates converge quickly (needing just 100s of tweets), placing 51% of Twitter users within 100 miles of their actual location.

A Probabilistic Model for Profiling Users’ Cities (Basic Model)

Questions to ask
1. Is there a subset of words which have a more compact geographical scope compared to other words in the dataset? And can these "local" words be discovered from the content of tweets?
2. In what way can we overcome the location sparsity of words in tweets? Smoothing?

A Probabilistic Model for Profiling Users’ Cities (Local Word Selection)

A Probabilistic Model for Profiling Users’ Cities (Smoothing)

Further Improvement: Selecting Location Indicative Words with Information Gain

Further Improvement: GMMs to Handle Local Words with Multiple Peaks

Network Based Profiling

• Intuition
  – A user is likely to connect to friends in real life, who are likely to live close to the user.

• Generally, we can take a network (collective) classification approach.
  – Propagate friends’ location labels to users in an iterative or non-iterative way.

• Many interesting ideas to extend the basic collective classification approach.

A Simple Propagation Approach for Profiling Users’ Locations (Overview)

• Clodoveu A. Davis Jr., Gisele L. Pappa, Diogo Rennó Rocha de Oliveira, Filipe de L. Arcanjo. Inferring the Location of Twitter Messages Based on User Relationships. TGIS 2011.
• Propagate locations via relationships
  – Count the most popular locations among the friends of a user, using a simple majority voting scheme
  – The most popular location among friends is set as the location of a user
  – Some rules used in the approach
    • Minimum and maximum number of friends a user should have in order to have his or her location correctly inferred
    • Minimum number of votes a location needs to be considered as the correct one
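The voting scheme and its rules can be sketched as follows. The threshold defaults (`min_friends`, `max_friends`, `min_votes`) are hypothetical placeholders; the slide names the rules but does not give their values.

```python
from collections import Counter


def infer_location_by_vote(friend_locations, min_friends=1, max_friends=1000, min_votes=2):
    """Return the most popular known location among friends, or None.

    friend_locations: one entry per friend; None marks a friend whose
    location is unknown. All threshold defaults are illustrative only.
    """
    if not (min_friends <= len(friend_locations) <= max_friends):
        return None
    known = [loc for loc in friend_locations if loc is not None]
    if not known:
        return None
    location, votes = Counter(known).most_common(1)[0]
    return location if votes >= min_votes else None


inferred = infer_location_by_vote(["Belo Horizonte", "Belo Horizonte", "Rio", None])
```

Returning None when a rule fails leaves the user unlabeled instead of forcing a low-confidence guess.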

A Probabilistic Propagation Approach for Profiling Users’ Locations (Overview)

• Lars Backstrom et al. Find me if you can: improving geographical prediction with social and spatial proximity. In Proceedings of WWW '10. 61-70.
• Propagate locations among friends probabilistically
  – Define the probability of being friends based on users’ locations
    • Consider distances of locations
    • Is robust to noisy data
• Results
  – Dataset: 2.9 million Facebook users whose locations are known.
  – Achieves 67.5% accuracy within 50 miles and improves on the IP-based baseline by 10% for users who have more than ten labeled friends
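The idea of defining friendship probability as a function of distance and then picking the most likely location can be sketched as below. This is a simplified sketch under stated assumptions: the decaying curve `a * (b + d) ** -c` and its constants are hypothetical placeholders, candidate locations are restricted to the friends' own locations, and only existing friendships contribute to the likelihood, so it should not be read as the model of the cited paper.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def friendship_prob(distance, a=0.002, b=0.2, c=1.0):
    """Hypothetical probability of friendship at a given distance (placeholder constants)."""
    return a * (b + distance) ** -c


def most_likely_location(friend_locations):
    """Pick the friend location that maximizes the log-likelihood of all friendships."""
    best, best_ll = None, -math.inf
    for candidate in friend_locations:
        ll = sum(
            math.log(friendship_prob(haversine_miles(candidate, f)))
            for f in friend_locations
        )
        if ll > best_ll:
            best, best_ll = candidate, ll
    return best


# Three hypothetical friends near Champaign, Illinois and one in Los Angeles.
friends = [(40.11, -88.21), (40.12, -88.24), (40.10, -88.23), (34.05, -118.24)]
predicted = most_likely_location(friends)
```

Because the probability decays with distance, a single far-away friend costs little likelihood, which is one way such a model tolerates noisy links.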


A Probabilistic Propagation Approach for Profiling Users’ Locations (Model)


A Probabilistic Propagation Approach for Profiling Users’ Locations (Algorithm)


A Dynamic Bayesian Network For Profiling Users’ Locations (Overview)

• A. Sadilek, H. Kautz, and J. P. Bigham. Finding your friends and following them to where you are. WSDM ’12.

• Best paper at WSDM 2012.
• It studies two separate problems:
– Location profiling
• GPS and time information
• Fine-grained profiling
– Friendship prediction
• Results
– Achieved 57% accuracy using friends’ location information.


A Dynamic Bayesian Network For Profiling Users’ Locations (Model)


A Dynamic Bayesian Network For Profiling Users’ Locations (Algorithm)


Tutorial Overview
• Event Detection for Twitter (80 min)
– Event Detection using Tweet Content
– Event Detection using Other External Sources
– Applications of Event Detection
• Break (10 min)
• Event Description for Twitter (25 min)
– Finding Best Phrase to Summarize an Event
– Finding Event Types
– Finding Event Timespans
– Finding Event Credibility
– Finding Event Locations
• User Profiling in Social Media (55 min)
– Content-based Profiling
– Network-based Profiling
– Hybrid Approach
– Co-profiling Attributes and Relationships
• Summary and Discussions (10 min)


Hybrid Approach: Profiling Users’ Locations based on both Content and Network

• Integrate both content and network
• Capture additional insights
• We will focus on two works


Unified and Discriminative Influence Model for Inferring Home Locations (Overview)

• Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin Chen-Chuan Chang: Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations. KDD 2012.

• Ideas
– Integrate content and network of users and locations as a directed graph
– Propose an influence model to capture nodes’ influences
• E.g., the location of a local friend is more useful than the location of a celebrity
• Results
– 160K Twitter users
– Improves on the previous algorithm by 7% with only network data
– Improves on the previous algorithm by about 12% overall


Unified and Discriminative Influence Model for Inferring Home Locations (Representation)

∗ We model it as a directed graph.
∗ We aim to infer the locations of unlabeled nodes from the locations of labeled nodes.

[Figure: example directed graph with head and tail nodes. Nodes u1–u6 and v1, v2; labeled nodes carry locations (New York, Champaign, Beijing, San Francisco) and unlabeled nodes are marked "?".]

We unify two types of resources (local words and friends) as a directed heterogeneous graph.


Unified and Discriminative Influence Model for Inferring Home Locations (Model)

Observation 1: The probability of an edge between two nodes decreases as the distance between them increases.

Observation 2: At the same distance, different head nodes (e.g., Chicago, Champaign) have different probabilities of attracting tail nodes.

[Figure: Spread of Word "Champaign"; axes: latitude, longitude, count.]

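How likely is it that a tail node n_j at L(n_j) builds an edge e<n_i, n_j> to a head node n_i at L(n_i)?

$$P\big(e\langle n_i, n_j\rangle \mid \theta_{n_i}, L(n_j)\big) = \frac{1}{2\pi\sigma_{n_i}^{2}} \exp\!\left(-\frac{(x_{n_j}-x_{n_i})^{2} + (y_{n_j}-y_{n_i})^{2}}{2\sigma_{n_i}^{2}}\right)$$

Here (x_n, y_n) are the coordinates of L(n) and σ_{n_i} is the influence scope of the head node n_i.

An influence model for each node at L(n_i), with different influence scopes, captures these probabilities.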
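As a small numeric illustration, the influence model can be read as a two-dimensional Gaussian centered on the head node, whose standard deviation plays the role of the influence scope. The sketch below assumes planar (x, y) coordinates in arbitrary units; the function name is hypothetical.

```python
import math

def edge_probability(head_xy, sigma, tail_xy):
    """Gaussian influence of a head node at head_xy with scope sigma,
    evaluated at the tail node's location tail_xy."""
    dx = tail_xy[0] - head_xy[0]
    dy = tail_xy[1] - head_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

# Observation 1: for a fixed head node, the value drops with distance.
near = edge_probability((0, 0), 10.0, (5, 0))
far = edge_probability((0, 0), 10.0, (20, 0))

# Observation 2: at the same distance, a wide-scope node (sigma=100)
# and a narrow-scope node (sigma=10) give different values.
wide = edge_probability((0, 0), 100.0, (50, 0))
narrow = edge_probability((0, 0), 10.0, (50, 0))
```

Under this reading, a celebrity-like node has a large scope and so says little about where its followers are, while a local node has a small scope and is far more informative.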


Unified and Discriminative Influence Model for Inferring Home Locations (Local Algorithm)

• Simple but efficient
• Closed-form solution

[Figure: an unlabeled node "?" connected to labeled nodes u1, u2, u4, u5, v1, v2 (Champaign, New York, Beijing, San Francisco).]

Influence Scope: Average Distance of a User’s Followers

User Location: Weighted Average of Different Resources
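One way to see why a closed form exists: if each labeled neighbor contributes a Gaussian term with its own scope σ, the location that maximizes the product of those terms is the average of the neighbors' locations weighted by 1/σ². The sketch below follows that reasoning under the assumption of planar coordinates; it is an illustration of the two quantities named on the slide, and the paper's exact estimator may differ.

```python
import math

def influence_scope(user_xy, follower_xys):
    """Influence scope estimated as the average distance to a user's followers."""
    dists = [math.hypot(fx - user_xy[0], fy - user_xy[1]) for fx, fy in follower_xys]
    return sum(dists) / len(dists)

def estimate_location(neighbors):
    """neighbors: list of ((x, y), sigma) for labeled neighbors.

    Returns the 1/sigma^2-weighted average of their locations, so that
    neighbors with a small scope (local friends) count more than neighbors
    with a large scope (celebrities).
    """
    weights = [1.0 / sigma ** 2 for _, sigma in neighbors]
    total = sum(weights)
    x = sum(w * xy[0] for w, (xy, _) in zip(weights, neighbors)) / total
    y = sum(w * xy[1] for w, (xy, _) in zip(weights, neighbors)) / total
    return (x, y)
```

With two neighbors at (0, 0) and (10, 0), equal scopes give the midpoint, while scopes of 1 and 3 move the estimate to x = 1, close to the more local neighbor.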


Unified and Discriminative Influence Model for Inferring Home Locations (Global Algorithm)

• Complex but accurate
• Iterative algorithm

[Figure: the full example graph with nodes u1–u6 and v1, v2; labeled nodes (New York, Champaign, Beijing, San Francisco) and unlabeled nodes marked "?".]

The local algorithm only uses limited information.

The global algorithm aims to use all information.
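The contrast between the two algorithms can be illustrated with a simple iterative scheme: repeatedly re-estimate every unlabeled node from all neighbors that currently have an estimate, so information from labeled nodes eventually reaches nodes with no labeled neighbor. This is a simplified stand-in, not the paper's algorithm: it treats edges as undirected, assumes planar coordinates, and the names `propagate_locations` and `scope` are hypothetical.

```python
def propagate_locations(edges, labeled, scope, iters=50):
    """edges: list of (a, b) pairs; labeled: {node: (x, y)}; scope: {node: sigma}.

    Returns location estimates for every node reachable from a labeled node.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    est = dict(labeled)  # labeled nodes keep their known locations
    unlabeled = sorted(n for n in neighbors if n not in labeled)
    for _ in range(iters):
        for n in unlabeled:
            known = [m for m in neighbors[n] if m in est]
            if not known:
                continue
            # Weight each neighbor by 1/sigma^2, as in the local estimate.
            weights = [1.0 / scope[m] ** 2 for m in known]
            total = sum(weights)
            est[n] = (
                sum(w * est[m][0] for w, m in zip(weights, known)) / total,
                sum(w * est[m][1] for w, m in zip(weights, known)) / total,
            )
    return est
```

On a chain a, u, v, b with a at (0, 0) and b at (30, 0), a purely local step can place u only from a, whereas the iteration settles on u near (10, 0) and v near (20, 0).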


Multiple Location Profiling based on Content and Social Network (Overview)

• Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang. Multiple Location Profiling for Users and Relationships from Social Network and Content. VLDB 2012.

• Ideas
– Use two probabilistic models to connect locations with content and friendship
– Introduce a mixture model to capture that a user may be related to multiple locations
• E.g., a user may live in California and study in Illinois
• Results
– Improves on the baseline by 10%
– Discovers users’ multiple locations completely


Multiple Location Profiling based on Content and Social Network (Following and Tweeting Model)

• The location-based following model

• The location-based tweeting model


Multiple Location Profiling based on Content and Social Network (Mixture Model)

• Location profile as a multinomial distribution over locations.

• Each observation is based on one particular location from her profile.

Example: Carol, with location profile {Los Angeles 0.1, Austin 0.1, … }

Location-based relationships
– Carol follows Lucy: both Carol and Lucy studied at Austin
– Carol tweets “Hollywood”: Carol lives in Los Angeles
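The mixture idea can be made concrete with one small step: given a user's location profile and a per-location model of what gets tweeted, compute which profile location most plausibly explains a single observation. All numbers below are made up for illustration and are not from the slides; `assignment_posterior` is a hypothetical helper, and a sampler would draw from this distribution rather than just inspect it.

```python
def assignment_posterior(profile, word_given_location, word):
    """P(location | word) proportional to P(location) * P(word | location).

    profile: {location: probability}, the user's multinomial location profile.
    word_given_location: {location: {word: probability}}.
    """
    scores = {
        loc: p * word_given_location.get(loc, {}).get(word, 0.0)
        for loc, p in profile.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every location")
    return {loc: s / total for loc, s in scores.items()}

# Hypothetical profile and tweeting model.
profile = {"Los Angeles": 0.6, "Austin": 0.4}
tweeting = {
    "Los Angeles": {"hollywood": 0.05, "longhorns": 0.001},
    "Austin": {"hollywood": 0.002, "longhorns": 0.04},
}
posterior = assignment_posterior(profile, tweeting, "hollywood")
```

Here the tweet "hollywood" is attributed almost entirely to Los Angeles, even though Austin is also in the profile, which mirrors the idea that each observation is explained by one particular location.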


Multiple Location Profiling based on Content and Social Network (Complete Model)


Multiple Location Profiling based on Content and Social Network (Gibbs Sampling Algorithm)





Co-profiling: Integrate User Profiling with Other Tasks

• Integrate user profiling and other related tasks to achieve mutual enhancement

• We will focus on two studies
– Integrate user profiling with entity matching
– Integrate user profiling with relationship profiling


Co-profiling: Profiling User Location and Matching Entities in their Tweets (Overview)

• Nilesh Dalvi, Ravi Kumar, and Bo Pang. 2012. Object matching in tweets with spatial models. In Proceedings of WSDM '12. 43-52.

• Ideas
– Entity Matching (Focus): Users’ locations can help entity (restaurant) matching in tweets.
• E.g., “Bombay Grill” should be a restaurant in Sunnyvale rather than Champaign if the user lives in the Bay Area.
– Location Profiling: Entities mentioned in tweets should help predict users’ locations.
• E.g., if a user mentions “Bombay Grill” in Champaign, he likely lives near UIUC.
• Results
– Entity matching performance gains over geography-less models
– Infers the locations of users accurately in practice


Co-profiling: Profiling User Location and Matching Entities in their Tweets (Model)


Co-profiling: Profiling User Location and Matching Entities in their Tweets (Algorithm)

• EM Algorithm
– E-step estimates the expectation of hidden variables (e.g., entities that match tokens in tweets)
– M-step estimates the parameters (e.g., language model, distance model, and users’ locations)
– Some assumptions are used to simplify the estimation
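A toy version of this alternation, under strong simplifying assumptions: the only parameter is one user's location, the distance model is a fixed Gaussian with hypothetical width `sigma`, coordinates are planar, and each mention comes with a list of candidate entities and prior weights. It is a sketch of the E-step/M-step pattern, not the model from the paper.

```python
import math

def em_locate(mentions, init_xy, sigma=300.0, iters=20):
    """mentions: one list per mention of (prior, (x, y)) candidate entities.

    Returns (user_xy, responsibilities), where responsibilities[i][k] is the
    posterior probability that mention i refers to its k-th candidate.
    """
    ux, uy = init_xy
    resp = []
    if not mentions:
        return (ux, uy), resp
    for _ in range(iters):
        # E-step: posterior over candidate entities for each mention,
        # given the current estimate of the user's location.
        resp = []
        for cands in mentions:
            scores = [
                p * math.exp(-((x - ux) ** 2 + (y - uy) ** 2) / (2 * sigma ** 2))
                for p, (x, y) in cands
            ]
            z = sum(scores)
            if z == 0.0:  # every score underflowed: fall back to the priors
                scores = [p for p, _ in cands]
                z = sum(scores)
            resp.append([s / z for s in scores])
        # M-step: the location maximizing the expected log-likelihood is the
        # responsibility-weighted mean of the candidate entity locations.
        total = sum(r for rs in resp for r in rs)
        ux = sum(r * xy[0] for rs, cs in zip(resp, mentions)
                 for r, (_, xy) in zip(rs, cs)) / total
        uy = sum(r * xy[1] for rs, cs in zip(resp, mentions)
                 for r, (_, xy) in zip(rs, cs)) / total
    return (ux, uy), resp
```

For instance, with an ambiguous mention whose candidates sit at (0, 0) and (2000, 0) and an unambiguous mention at (10, 5), starting from (1000, 0) the estimate moves to roughly (5, 2.5) and the ambiguous mention is resolved to the nearby candidate. The two tasks reinforce each other: a better location sharpens the matching, and sharper matching improves the location.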


Co-profiling: Profiling User Attributes and Relationship Types in the Ego Network (Overview)

• Rui Li, Chi Wang, and Kevin Chang. 2014. Co-profiling User Attributes and Relationship Types. WWW '14. 43-52.

• Ideas
– Attribute Profiling (Focus): Different types of relationships propagate different attributes
• E.g., colleges should propagate from college mates, while occupations should propagate from colleagues
– Relationship Type Profiling: Relationship types can be identified by their shared attributes and network structure
• E.g., a set of friends who are strongly connected and share an occupation might be colleagues.

• Results
– Dataset: Ego network from LinkedIn
– Profiles attributes more accurately than the collective classification method
– Profiles relationships more accurately than the clustering method

Please come to our talk in this conference to see details of our work.


Summary for User Profiling in Social Media

• Both content of tweets and social network connections are useful for location prediction on Twitter

• Location of users, tweets and events is crucial for a large number of applications based on tweet feeds

• We discussed algorithms for using tweet content and the network for user profiling. We also discussed hybrid approaches along with co-profiling techniques.


Further Reading (1)
• B. Hecht, L. Hong, B. Suh, and E. H. Chi. Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. CHI 2011.
• J. Eisenstein, B. O'Connor, N. A. Smith, and E. Xing. A latent variable model for geographic lexical variation. In Proceedings of EMNLP, 2010.
• Z. Cheng, J. Caverlee, and K. Lee. You are where you tweet: a content-based approach to geo-locating Twitter users. ACM CIKM 2010.
• S. Kinsella, V. Murdock, and N. O’Hare. “I’m eating a sandwich in Glasgow”: Modeling locations with tweets. SMUC 2011.
• W. Li, P. Serdyukov, A. P. de Vries, C. Eickhoff, and M. Larson. The where in the tweet. CIKM 2011.
• Bo Han, Paul Cook, Timothy Baldwin. Geolocation Prediction in Social Media Data by Finding Location Indicative Words. COLING 2012.
• Hau-Wen Chang, Dongwon Lee, Mohammed Eltaher, and Jeongkyu Lee. @Phillies Tweeting from Philly? Predicting Twitter User Locations with Spatial Word Usage. ASONAM 2012.
• N. Dalvi, R. Kumar, and B. Pang. Object matching in tweets with spatial models. WSDM 2012.
• Jalal Mahmud, Jeffrey Nichols, Clemens Drews. Where Is This Tweet From? Inferring Home Locations of Twitter Users. ICWSM 2012.
• John Krumm, Rich Caruana, Scott Counts. Learning Likely Locations. UMAP 2013.
• Clodoveu A. Davis Jr., Gisele L. Pappa, Diogo Rennó Rocha de Oliveira, Filipe de L. Arcanjo. Inferring the Location of Twitter Messages Based on User Relationships. TGIS 2011.


Further Reading (2)
• A. Sadilek, H. Kautz, and J. P. Bigham. Finding your friends and following them to where you are. WSDM ’12.
• Rui Li, Kin Hou Lei, Ravi Khadiwala, Kevin Chen-Chuan Chang. TEDAS: a Twitter Based Event Detection and Analysis System. ICDE 2012.
• Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin Chen-Chuan Chang. Towards social user profiling: unified and discriminative influence model for inferring home locations. KDD 2012.
• Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang. Multiple Location Profiling for Users and Relationships from Social Network and Content. VLDB 2012.
• Rui Li, Chi Wang, and Kevin Chang. 2014. Co-profiling User Attributes and Relationship Types. WWW '14. 43-52.
• S. Yardi and D. Boyd. Tweeting from the town square: Measuring geographic local networks. ICWSM 2010.
• Liangjie Hong, Amr Ahmed, Siva Gurumurthy, Alex Smola, Kostas Tsioutsiouliklis. Discovering Geographical Topics In The Twitter Stream. WWW 2012.
• Zhiyuan Cheng, James Caverlee, Kyumin Lee, Daniel Z. Sui. Exploring Millions of Footprints in Location Sharing Services. ICWSM 2011.
• http://mashable.com/2009/06/08/twitter-local-2/
• http://www.slideshare.net/pkitano/the-local-business-owners-guide-to-twitter
• Detecting forest fires: Bertrand De Longueville, Robin S. Smith, and Gianluca Luraschi. "OMG, from here, I can see the flames!": a use case of mining location based social networks to acquire spatio-temporal data on forest fires. LBSN '09.
• Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. WWW '10.
• Sílvio S. Ribeiro, Jr., Clodoveu A. Davis, Jr., Diogo Rennó R. Oliveira, Wagner Meira, Jr., Tatiana S. Gonçalves, and Gisele L. Pappa. Traffic Observatory: A System to Detect and Locate Traffic Events and Conditions using Twitter. LBSN '12.




Summary
• We discussed these key components of an analytics platform for microblogging systems
– Event Detection
– Event Description
– User Profiling
• Many other components are critical for such a platform but were not covered in this tutorial
– Structured entity extraction
– Sentiment analysis
– Predictive analysis
– Correlations with other media types like news and blogs
– Influence Analysis
– Visualization
– Temporal Analysis