+ Effective Event Identification in Social Media 2014/4/27(Mon.) Chang Wei-Yuan @ MakeLab Lab Meeting Fotis Psallidas, Hila Becker DEB’13
Jul 24, 2015
+Outline
- Introduction
- Method
  - Known-Event Identification
  - Unknown-Event Identification
  - Improving Identification Effectiveness
- Experimental Evaluation
- Conclusion
- Thought
+ Introduction
- Online social media extensively distribute content related to real-world events.
  - Event: something that occurs at a certain time in a certain place
Goal: Identifying events and associated social media documents
+ Introduction
- General approach: group similar documents via clustering
  - Each cluster corresponds to one event and its associated social media documents
+ Introduction
- Challenges
  - Uneven data quality
  - Highly heterogeneous
  - Dynamic data stream of event information
  - Number of events unknown
+Event Identification
- Known-Event Identification
- Unknown-Event Identification
- Improving Identification Effectiveness
+Known-Event Identification
- Social media content related to known events
  - resides on multiple social media sites, each contributing different information
- Retrieving cross-site social media documents for the same event
  - tends to miss many relevant event documents
+Known-Event Identification
- In the first step, the known event properties are used to achieve high-precision results.
- In the second step, term extraction and frequency analysis are used to improve recall.
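The two steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event fields (`title`, `venue`), the stopword list, and the function names `precision_query` and `recall_terms` are hypothetical, and whitespace tokenization stands in for real term extraction.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "an", "at", "in", "of", "and", "to", "for", "on"}

def precision_query(event):
    """Step 1: a strict query built only from known event properties."""
    return f'"{event["title"]}" "{event["venue"]}"'

def recall_terms(seed_docs, event, top_k=3):
    """Step 2: frequent terms in the step-1 results that the known
    event properties did not already contain."""
    known = set((event["title"] + " " + event["venue"]).lower().split())
    counts = Counter(
        tok
        for doc in seed_docs
        for tok in set(doc.lower().split())  # count each term once per document
        if tok not in STOPWORDS and tok not in known
    )
    return [term for term, _ in counts.most_common(top_k)]

event = {"title": "Celtics Lakers", "venue": "TD Garden"}
seed_docs = [
    "celtics lakers game7 tonight",
    "game7 at td garden nbafinals",
    "nbafinals game7 celtics",
]
print(precision_query(event))
print(recall_terms(seed_docs, event, top_k=2))
```

The terms returned by the second function could then be issued as additional, looser queries, which is where the recall gain would come from in this sketch.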
+Unknown-Event Identification
- A Twitter stream may contain many tweets related to an event, interleaved
  - with messages related to other events
  - with messages unrelated to events
+Unknown-Event Identification
- The proposed online clustering framework
  - leverages multiple features to decide when two social media documents correspond to the same event
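As a rough illustration of online clustering, the sketch below processes documents one at a time and either adds each to its most similar cluster or opens a new one. It is a simplification under stated assumptions: a single text feature, term-frequency cosine similarity, and a fixed `threshold` are all stand-ins chosen for the example, whereas the framework in the talk combines multiple features.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster_stream(docs, threshold=0.3):
    """Single-pass clustering: each document is seen once, in stream order."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc index]}
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["members"].append(i)
        else:
            clusters.append({"centroid": vec, "members": [i]})
    return [c["members"] for c in clusters]

stream = [
    "obama inauguration washington",
    "inauguration washington crowd",
    "coachella music festival",
    "coachella festival lineup",
]
print(cluster_stream(stream))
```

Because no cluster count is fixed in advance, this style of clustering fits the "number of events unknown" challenge listed earlier.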
[Figure: Social Media Document Clustering Framework. Labels: social media documents, document feature representation, event clusters.]
+Ensemble Algorithm
- The proposed online clustering framework
  - deploys ensemble learning methods to learn and associate each feature with a weight and a threshold that capture the importance of the features
+Ensemble Algorithm
[Figure: Consensus function that combines ensemble similarities. Per-feature clusterers C_title, C_tags, C_time and weights W_title, W_tags, W_time (learned in a training step) are combined by f(C, W) into the ensemble clustering solution.]
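One way to read the consensus function f(C, W) is as a weighted vote over per-feature decisions. The sketch below assumes exactly that: each feature votes "same event" when its similarity meets its own threshold, and the votes are combined by weight. The slides do not give the exact form of f, so the voting rule, the function name `consensus`, and all numeric values are illustrative assumptions.

```python
def consensus(similarities, weights, thresholds):
    """Weighted vote in [0, 1]: the share of total weight held by
    features whose similarity meets their own threshold."""
    total = sum(weights.values())
    votes = sum(
        weights[f] for f, sim in similarities.items() if sim >= thresholds[f]
    )
    return votes / total

# Hypothetical learned parameters for the three features on the slide.
weights = {"title": 5, "tags": 3, "time": 2}
thresholds = {"title": 0.4, "tags": 0.3, "time": 0.8}

# Per-feature similarities between one document and one cluster.
sims = {"title": 0.6, "tags": 0.1, "time": 0.9}
score = consensus(sims, weights, thresholds)
print(score)
```

In this example the title and time features vote for a match and the tags feature does not, so the score reflects the combined weight of title and time only.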
+Event Classification
- The proposed online clustering framework
  - deploys event classification to distinguish between event-related clusters and non-event ones
+Event Classification
[Figure: Event classification takes the ensemble clustering solution and labels each cluster as related to an event or unrelated to events.]
+ Improving Identification Effectiveness
- How events behave over time has a significant impact on the effectiveness of the document clustering procedure.
- Refining the clustering procedure to benefit from these factors is a challenging task.
+URLs
- URLs in event-related social streams are ubiquitous. Individuals use them to share meaningful event-related external content.
+Bursty Vocabulary
- The social media content related to an event tends to revolve around a central topic.
  - This central topic is expressed by a set of terms that are significantly more frequent than usual (bursty terms).
  - Events that span a wide time range exhibit a different set of these bursty terms at different points of their lifetime.
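The idea of bursty terms can be sketched by comparing a term's frequency in a recent window against its frequency in background content. The ratio test, the add-one smoothing, and the parameters `min_ratio` and `min_count` below are assumptions made for illustration; the slides do not specify how burstiness is measured.

```python
from collections import Counter

def bursty_terms(window_docs, background_docs, min_ratio=3.0, min_count=2):
    """Return terms whose relative frequency in the window is at least
    min_ratio times their smoothed background frequency."""
    win = Counter(t for d in window_docs for t in d.lower().split())
    bg = Counter(t for d in background_docs for t in d.lower().split())
    win_total = sum(win.values()) or 1
    bg_total = sum(bg.values()) or 1
    vocab = len(set(win) | set(bg))
    bursty = set()
    for term, count in win.items():
        if count < min_count:
            continue  # ignore terms seen too rarely to judge
        p_win = count / win_total
        p_bg = (bg[term] + 1) / (bg_total + vocab)  # add-one smoothing
        if p_win / p_bg >= min_ratio:
            bursty.add(term)
    return bursty

background = ["the game is on tonight", "the weather is nice", "the traffic is bad"]
window = ["the earthquake hit the city", "earthquake in the city tonight"]
print(sorted(bursty_terms(window, background)))
```

Recomputing the set over successive windows would let a long-running event carry different bursty terms at different points of its lifetime.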
+Time Decay
- Add a time decay function to the clustering framework that
  - penalizes clusters that have been inactive for a long time.
  - re-triggers events that have been inactive for some time if the similarity score without the time-decay factor is strong enough.
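Both behaviors of the time decay can be shown in a short sketch. The exponential half-life form and the values of `threshold`, `retrigger`, and `half_life_hours` are assumptions for illustration; the slides only state that inactive clusters are penalized and that a strong undecayed similarity can re-trigger an event.

```python
def decayed_similarity(sim, hours_inactive, half_life_hours=24.0):
    """Halve the similarity for every half-life the cluster has been inactive."""
    return sim * 0.5 ** (hours_inactive / half_life_hours)

def should_assign(sim, hours_inactive, threshold=0.5, retrigger=0.8,
                  half_life_hours=24.0):
    """Assign a document to a cluster if the decayed score passes the
    threshold, or if the undecayed score alone is strong enough."""
    if decayed_similarity(sim, hours_inactive, half_life_hours) >= threshold:
        return True
    return sim >= retrigger  # re-trigger an inactive event

print(should_assign(0.6, hours_inactive=0))   # active cluster, moderate match
print(should_assign(0.6, hours_inactive=48))  # stale cluster, moderate match
print(should_assign(0.9, hours_inactive=48))  # stale cluster, strong match
```

The second threshold is what separates penalizing a stale cluster from blocking it outright: a moderate match is rejected once the cluster has gone quiet, while a strong one still gets through.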
+Experimental Evaluation
- Data
  - Upcoming dataset
    - 273,842 multi-featured Flickr photos that correspond to 9,613 real-world events from the Upcoming event database.
- The BurstyV + TimeDec technique obtained the highest-quality results.
+Conclusion
- This article discussed the event identification task under two different scenarios, known-event and unknown-event identification.
- We showed how to identify event content effectively by
  - exploiting rich features of the social media documents
  - revealing temporal patterns of the relevant content
+Thanks for listening.
2014/4/27 (Mon.) @ MakeLab Group