On the Ground Validation of Online Diagnosis with Twitter and Medical Records
Todd Bodnar, Victoria Barclay
WWW ’14
2015/6/8 (Mon.) Chang Wei-Yuan @ MakeLab Lab Meeting
Jul 30, 2015
Outline
- Introduction
- Data Collection
- Feature Signals
- Meta Classifier
- Conclusion
- Thought
Introduction
- Social media has been considered as a data source for tracking disease.
  - However, most analyses are based on models of population-level disease rates.
- This paper develops a novel system for social-media-based disease detection at the individual level.
Introduction
- How do you track a disease?
  - Traditional systems
    - The doctor reports it
  - Self-reporting systems
    - The patient reports it
  - Data mining systems
    - Tweets report it
- Goal: extend data mining systems with high accuracy
Data Collection
- 104 students with Twitter accounts
- 35 diagnosed with influenza
- Study period: August 2012 - May 2013
Data Collection
- 52,301 tweets, 1,609 when sick
- 194,835 friends and followers
- 31,103,713 tweets from friends or followers
- 17/35 users who used Twitter while sick explicitly mentioned illness
Feature Signals
- Automated text classification
  - Expert keyword selection
  - Machine keyword selection
- Anomaly detection
- Network classification
Automated text classification
- We consider diagnosis based on the content of a user’s tweets.
- Expert keyword selection
  - A set of keywords is defined that are possible signals of influenza.
  - { flu, influenza, sick, cough, cold, medicine, fever, … }
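A minimal sketch of how an expert keyword list could be matched against a user's tweets. The keyword set contains only the terms named above (the original list is truncated), and the `flags_illness` decision rule with its `min_hits` parameter is a hypothetical illustration, not the classifier used in the paper.

```python
import re

# Keywords named on the slide; the slide's list is truncated, so this set is incomplete.
EXPERT_KEYWORDS = {"flu", "influenza", "sick", "cough", "cold", "medicine", "fever"}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_counts(tweets, keywords=EXPERT_KEYWORDS):
    """Count how many tweets mention each keyword at least once."""
    counts = {k: 0 for k in keywords}
    for tweet in tweets:
        for word in set(tokenize(tweet)) & keywords:
            counts[word] += 1
    return counts


def flags_illness(tweets, keywords=EXPERT_KEYWORDS, min_hits=1):
    """Hypothetical rule: flag a user when at least min_hits tweets contain a keyword."""
    hits = sum(1 for t in tweets if set(tokenize(t)) & keywords)
    return hits >= min_hits
```

Matching on whole tokens rather than substrings avoids false hits such as "scold" for "cold".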
Automated text classification
- Machine keyword selection
  - We try selecting keywords algorithmically by first finding the 12,393 most common keywords in the data set,
  - then ranking them by information gain for predicting influenza.
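Ranking by information gain can be sketched as follows, assuming each document is reduced to a set of tokens and each label is 1 (influenza) or 0 (healthy). The function names and the data layout are assumptions made for illustration.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(has_keyword, labels):
    """H(Y) minus the expected entropy of Y after splitting on keyword presence."""
    n = len(labels)
    with_kw = [y for x, y in zip(has_keyword, labels) if x]
    without_kw = [y for x, y in zip(has_keyword, labels) if not x]
    conditional = (len(with_kw) / n) * entropy(with_kw) \
        + (len(without_kw) / n) * entropy(without_kw)
    return entropy(labels) - conditional


def rank_keywords(docs, labels, vocabulary):
    """Sort candidate keywords by information gain, highest first.

    docs is a list of token sets, one per labeled document.
    """
    scores = {w: information_gain([w in d for d in docs], labels) for w in vocabulary}
    return sorted(vocabulary, key=lambda w: scores[w], reverse=True)
```

A keyword that appears in exactly the sick users' documents gets the maximum gain; one that is equally common in both groups gets zero.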
Automated text classification
Anomaly Detection
- In addition to illness affecting the content of individuals’ tweets, it is likely that illness also affects the rate at which individuals tweet.
- To detect this, we perform one-dimensional anomaly detection on each user’s monthly tweeting rate as follows.
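The exact procedure is not reproduced in these notes, so the following is only one plausible sketch of one-dimensional anomaly detection: a z-score test of each month against the user's own mean and standard deviation. The z-score method and the `threshold` value are assumptions, not the paper's stated method.

```python
from statistics import mean, stdev


def anomalous_months(monthly_counts, threshold=1.5):
    """Return indices of months whose tweet count lies more than
    `threshold` sample standard deviations from the user's own mean.

    Assumption: a simple z-score test; the threshold is hypothetical.
    """
    if len(monthly_counts) < 2:
        return []  # not enough history to estimate a spread
    mu = mean(monthly_counts)
    sigma = stdev(monthly_counts)
    if sigma == 0:
        return []  # perfectly constant rate, nothing stands out
    return [i for i, c in enumerate(monthly_counts)
            if abs(c - mu) / sigma > threshold]
```

For example, a user who tweets about 50 times a month and drops to 5 tweets in one month would have that month flagged.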
Anomaly Detection
Network Classification
- Even if a user is not currently active on Twitter, users in her social network may give clues to her health status.
- Accounts that follow a user are referred to as her ‘followers,’ and accounts that a user follows are referred to as her ‘friends.’
- We consider all text that a user’s friends or followers tweeted and perform keyword analysis.
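Keyword analysis over a user's network could look like the sketch below, which pools the tweets of all friends and followers and measures how often they mention an illness keyword. The keyword set, the `network_tweets` layout (account name mapped to a list of tweets), and the use of a simple rate as the feature are all assumptions.

```python
import re

# Illustrative keyword set; an assumption, not the paper's final list.
KEYWORDS = {"flu", "influenza", "sick", "cough", "cold", "medicine", "fever"}


def mentions_keyword(tweet, keywords=KEYWORDS):
    """True when the tweet contains at least one keyword as a whole word."""
    return bool(set(re.findall(r"[a-z]+", tweet.lower())) & keywords)


def network_keyword_rate(network_tweets, keywords=KEYWORDS):
    """Fraction of all tweets by a user's friends and followers that mention a keyword.

    network_tweets maps an account name to that account's list of tweets.
    """
    all_tweets = [t for tweets in network_tweets.values() for t in tweets]
    if not all_tweets:
        return 0.0
    return sum(mentions_keyword(t, keywords) for t in all_tweets) / len(all_tweets)
```

The resulting rate can be compared across time windows or across users, for instance to see whether a user's network talks about illness more around the time she was diagnosed.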
Network Classification
Meta-classifiers
- So far we have considered five separate methods for detecting illness based on a user’s Twitter activity:
  - hand-chosen keyword analysis
  - data-mined keyword analysis
  - anomaly detection
  - network analysis
Meta-classifiers
- Aggregating multiple classifiers with a ‘meta classifier’ has been shown to be an effective method for increasing classification accuracy.
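One simple way to aggregate base classifiers is a weighted vote, sketched below. This is a generic illustration of the idea; the classifier names, weights, and threshold are hypothetical, and the paper's own meta classifier may be a different model.

```python
def weighted_vote(predictions, weights=None, threshold=0.5):
    """Combine binary predictions from several base classifiers.

    predictions maps a classifier name to True (sick) or False (healthy).
    weights maps the same names to non-negative weights (default: all equal).
    Returns True when the weighted share of 'sick' votes exceeds the threshold.
    """
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights[name] for name in predictions)
    if total == 0:
        raise ValueError("weights must not all be zero")
    sick_share = sum(weights[name] for name, p in predictions.items() if p) / total
    return sick_share > threshold
```

With equal weights a two-against-two split is not enough to flag a user; giving a more reliable classifier a larger weight, for example one derived from its validation accuracy, can break such ties.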
Meta-classifiers
Conclusion
- In this paper, we have shown that it is possible to diagnose an individual from her social media data with high accuracy.
  - Combined long-term Twitter data with medical records
  - Able to find a signal of disease in most Twitter users who were sick
Thought
Thanks for listening.
2015/6/8 (Mon.) @ MakeLab Lab Meeting