D-Sieve: A Novel Data Processing Engine for Crises Related Social Messages. Soudip Roy Chowdhury (INRIA & UPSUD, France), Hemant Purohit (Wright State Uni., USA), Muhammad Imran (QCRI, Doha). SWDM 2015, May 18, 2015.

D-sieve : A Novel Data Processing Engine for Crises Related Social Messages

Aug 07, 2015

Transcript
Page 1: D-sieve : A Novel Data Processing Engine for Crises Related Social Messages

D-Sieve: A Novel Data Processing Engine for Crises Related Social Messages

Soudip Roy Chowdhury (INRIA & UPSUD, France)
Hemant Purohit (Wright State Uni., USA)
Muhammad Imran (QCRI, Doha)

SWDM 2015, May 18, 2015

Page 2

Social Media During Crises

Courtesy: http://worldmap.harvard.edu/maps/nepalquake

Page 3

And it contains…

Timely Information

Page 4

And it contains…

Actionable Information

Page 5

And it contains…

Tactical Information

Page 6

Social Data Processing

• Real-time analysis of crises related social data is necessary
– Manual analysis (not scalable)
  • Standby task force [N. Morrow et al. 2011]
– System mediated analysis
  • Unsupervised learning [T. Sakaki et al. 2010] (noisy / ambiguous data)
  • Supervised learning [S. Roy Chowdhury et al. 2013] (scarce training data / non-uniform class distribution)

Page 7

D-Sieve

• Post-classification data processing engine in a supervised classification setting

[Figure: Raw Data → Supervised Classifier (Class A, Class B, Noise) → D-Sieve (Class A, Class B, Noise) → Correctly Classified Data]

Page 8

Data Model

• A message contains
– TEXT
– Date

Page 9

Data Model

• A message contains
– TEXT: hashtags, content, URL
– Date
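The data model above can be sketched as a small record type. This is only an illustration: the `Message` class, the `parse_message` helper, and the two regular expressions are hypothetical names and choices, not taken from the slides, and real tweets would need a more careful tokenizer.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# Assumed, simplified patterns for hashtags and URLs (not from the slides).
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Message:
    """A message has TEXT (hashtags, content, URL) and a Date."""
    text: str
    date: datetime
    hashtags: list = field(default_factory=list)
    urls: list = field(default_factory=list)
    content: str = ""


def parse_message(text, date):
    """Split raw text into URLs, lower-cased hashtags, and remaining content."""
    urls = URL_RE.findall(text)
    without_urls = URL_RE.sub(" ", text)
    hashtags = [h.lower() for h in HASHTAG_RE.findall(without_urls)]
    content = " ".join(HASHTAG_RE.sub(" ", without_urls).split())
    return Message(text=text, date=date, hashtags=hashtags,
                   urls=urls, content=content)


msg = parse_message(
    "Flooding near the bridge #Sandy #NYC http://example.com/a",
    datetime(2012, 10, 29),
)
print(msg.hashtags, msg.urls, msg.content)
```

Keeping hashtags, URLs, and plain content in separate fields makes it straightforward to compute hashtag-based and entity-based features independently.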

Page 10

Our Approach

• D-Sieve extracts two content features to classify messages correctly
– Stable hashtag association
– Stable named entity association

Page 11

Stable Hashtag Association

Relative frequencies of stable hashtags for an event remain constant over time…

Page 12

Stable Hashtag Association

• A hashtag is stable if the moving average score of its frequency distribution (after k posts) becomes steady [X.S. Yang et al. 2013]

• Stable hashtag set contains all stable hashtags for an incident

• Stable hashtag association metric for an input message (formula not preserved in this transcript)

• The metric uses the sets of stable hashtags for the true positive and true negative classes (set symbols not preserved in this transcript)
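One possible reading of the stability criterion is sketched below. The slides do not give the exact definition, so everything concrete here is an assumption: relative frequency is taken as the fraction of posts so far that contain the tag, and "steady" is taken to mean that consecutive moving averages differ by at most `tol` once the first `k` posts are skipped. The function names and the default values of `k`, `window`, and `tol` are hypothetical.

```python
def relative_frequencies(posts, tag):
    """Fraction of posts seen so far that contain `tag`, after each post."""
    freqs, hits = [], 0
    for n, hashtags in enumerate(posts, start=1):
        if tag in hashtags:
            hits += 1
        freqs.append(hits / n)
    return freqs


def moving_average(values, window):
    """Simple moving average over a fixed-size window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def is_stable(posts, tag, k=5, window=3, tol=0.02):
    """Assumed criterion: after k posts, the moving average barely changes."""
    freqs = relative_frequencies(posts, tag)[k:]
    averages = moving_average(freqs, window)
    if len(averages) < 2:
        return False  # not enough data to judge stability
    return all(abs(b - a) <= tol for a, b in zip(averages, averages[1:]))


def stable_hashtag_set(posts, k=5, window=3, tol=0.02):
    """All hashtags in the stream that satisfy the assumed criterion."""
    tags = {h for hashtags in posts for h in hashtags}
    return {t for t in tags if is_stable(posts, t, k, window, tol)}


# "sandy" appears in every post; "blackout" only bursts in the last six.
posts = [["sandy"]] * 14 + [["sandy", "blackout"]] * 6
print(stable_hashtag_set(posts))  # {'sandy'}
```

In this toy stream the late burst of "blackout" keeps shifting its moving average, so only "sandy" ends up in the stable set.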

Page 13

Stable Named Entity Association

• Named entities are extracted from the message content + URL content

• Stable named entity association metric for an input message (formula not preserved in this transcript)

• We calculate the final feature association metric (formula not preserved in this transcript)

• Feature weights are chosen such that the final metric ranges over [-1, 1]

• As a rule of thumb, weights are set to 0.5
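The statement that the weights keep the final metric within [-1, 1], with 0.5 as the rule-of-thumb value, is consistent with a weighted sum of the two association scores. The sketch below assumes exactly that form; the combining formula itself is not preserved in the transcript, and the function and parameter names are hypothetical. Each input score is assumed to lie in [-1, 1].

```python
def final_association(hashtag_score, entity_score,
                      w_hashtag=0.5, w_entity=0.5):
    """Assumed weighted sum of the two feature association scores.

    With both scores in [-1, 1] and non-negative weights summing to at
    most 1, the result is guaranteed to stay in [-1, 1].
    """
    for name, score in (("hashtag_score", hashtag_score),
                        ("entity_score", entity_score)):
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"{name} must lie in [-1, 1], got {score}")
    if w_hashtag < 0 or w_entity < 0 or w_hashtag + w_entity > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return w_hashtag * hashtag_score + w_entity * entity_score


print(final_association(1.0, 1.0))    # 1.0
print(final_association(-1.0, -1.0))  # -1.0
```

Under this assumption, equal weights of 0.5 treat hashtag and named entity evidence symmetrically, and the extreme values 1 and -1 are reached only when both features agree completely.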

Page 14

Functional Architecture

Page 15

In our experiments

• Dataset: CrisisLexT6 [A. Olteanu et al. 2014]: Hurricane Sandy (10k), Queensland Flood (10k)

• Classifier: Random Forest [A. Liaw et al. 2002]

• UB = 0.8, LB = 0.7

• 50% of the data used during model creation and the rest used for testing

Page 16

Few Observations

• Stable hashtag set contained
– 24 and 21 tags for the TP class of the Sandy and Queensland datasets, respectively
– 0 and 7 tags for the TN class of the Sandy and Queensland datasets, respectively

• 68% and 70% of the whole data for the Sandy and Queensland datasets, respectively, satisfy the UB and LB constraints

Page 17

Experimental Results

Sandy 2012 (percentage)

         precision   recall
OC       96.09       81.42
OC+HT    96.23       84.51
OC+NE    96.31       86.49
OC++     96.35       87.55

Queensland 2013 (percentage)

         precision   recall
OC       99.53       36.75
OC+HT    99.80       58.47
OC+NE    99.75       68.90
OC++     99.80       74.24

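D-Sieve improves recall substantially: from OC to OC++, recall rises by about 6 percentage points on Sandy (81.42 to 87.55) and about 37 percentage points on Queensland (36.75 to 74.24), while precision stays essentially unchanged.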
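The quoted improvements can be checked directly from the reported figures. The snippet below simply stores the precision and recall values from the two result tables and computes the recall difference between the OC and OC++ rows; the dictionary layout and the `recall_gain` helper are illustrative names only.

```python
# (precision, recall) in percent, as reported in the results.
results = {
    "Sandy 2012": {
        "OC": (96.09, 81.42),
        "OC+HT": (96.23, 84.51),
        "OC+NE": (96.31, 86.49),
        "OC++": (96.35, 87.55),
    },
    "Queensland 2013": {
        "OC": (99.53, 36.75),
        "OC+HT": (99.80, 58.47),
        "OC+NE": (99.75, 68.90),
        "OC++": (99.80, 74.24),
    },
}


def recall_gain(dataset, base="OC", improved="OC++"):
    """Absolute recall gain, in percentage points, between two rows."""
    rows = results[dataset]
    return round(rows[improved][1] - rows[base][1], 2)


for name in results:
    print(name, recall_gain(name))
# Sandy 2012 6.13
# Queensland 2013 37.49
```

The "6% and 37%" figures are therefore absolute differences in recall (percentage points), not relative increases.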

Page 18

Future Work

• Investigate the efficiency of D-Sieve in both online and offline classification settings

• Address the latency of feature extraction (URL content crawling and entity spotting) using parallel computing

• Extend our experiments with Facebook and G+ data

Page 19

References

• A. Liaw and M. Wiener. “Classification and regression by randomForest”. R News, 2(3):18–22, 2002.

• A. Olteanu, C. Castillo, F. Diaz, and S. Vieweg. “CrisisLex: A lexicon for collecting and filtering microblogged communications in crises”. In ICWSM 2014.

• N. Morrow, N. Mock, A. Papendieck, and N. Kocmich. “Independent evaluation of the Ushahidi Haiti project”. Development Information Systems International, 2011.

• S. Roy Chowdhury, M. Imran, M. R. Asghar, S. Amer-Yahia, and C. Castillo. “Tweet4act: Using incident-specific profiles for classifying crisis-related messages”. In ISCRAM 2013.

• T. Sakaki, M. Okazaki, and Y. Matsuo. “Earthquake shakes Twitter users: real-time event detection by social sensors”. In WWW 2010.

• X. S. Yang, D. W. Cheung, L. Mo, R. Cheng, and B. Kao. “On incentive-based tagging”. In ICDE 2013.
