Dr. Ming-Hsiang Tsou
Twitter @mingtsou [email protected], Director of the Center for Human Dynamics in the Mobile Age
Professor, Department of Geography, San Diego State University
Tracking Disease Outbreaks using Geotargeted Social Media and Big Data
2016 ESRI User Conference Paper #: 298, Session Title: GIS in Social Media
Date: Thursday, June 30, 2016, Time: 8:30 AM - 9:45 AM, Room: Room 28 B
Geography (place and time) is the KEY for Understanding and Integrating Big Data
KDC (Knowledge Discovery in Cyberspace) framework (Tsou and Leitner, 2013)
Tsou, M. H. and Leitner, M. (2013). Editorial: Visualization of Social Media: Seeing a Mirage or a Message? In Special Content Issue: "Mapping Cyberspace and Social Media". Cartography and Geographic Information Science. 40(2), pp. 55-60. DOI: 10.1080/15230406.2013.776754
Geo-Targeted Social Media (Twitter) Analytics for Tracking Flu Outbreaks in U.S.
[Workflow diagram labels: Geo-Targeting; Data Collection (Twitter APIs); Analysis; Visualization; Filter; Machine Learning; Trend Analysis; Spatial Analysis; SMART Dashboard; Application Programming Interfaces (API); Data Filtering, Mining, and Visualization]
Example: Use the Twitter Search API to search for the keyword “HIV test” or “HIV testing”.
• Only 1% - 7% of tweets have X, Y geo-coordinates (from GPS or geo-tagging).
• But 50% - 60% of tweets have city-level locations provided by their user profiles.
• 90% of tweets have a time zone (limited spatial meaning).
What can we get from Twitter data? Where can we find geospatial information?
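The three location sources listed above can be checked in order of precision. The sketch below is only an illustration: it assumes tweets arrive as dictionaries shaped like the Twitter API v1.1 tweet JSON (`coordinates`, `user.location`, `user.time_zone`), and the function name `location_source` and the sample records are hypothetical.

```python
def location_source(tweet: dict) -> str:
    """Return the most precise location source present in a tweet dict.

    Assumed layout (Twitter API v1.1 style): tweet["coordinates"] is a GeoJSON
    point or None; tweet["user"] may carry "location" and "time_zone".
    """
    coords = tweet.get("coordinates")
    if coords and coords.get("coordinates"):
        return "gps"        # exact X, Y point (GeoJSON order is lon, lat)
    user = tweet.get("user") or {}
    if (user.get("location") or "").strip():
        return "profile"    # free-text, usually city-level location
    if user.get("time_zone"):
        return "time_zone"  # limited spatial meaning
    return "none"


# Hypothetical sample records, one per case.
sample = [
    {"coordinates": {"type": "Point", "coordinates": [-117.16, 32.72]},
     "user": {"location": "San Diego, CA"}},
    {"coordinates": None,
     "user": {"location": "San Diego, CA", "time_zone": "Pacific Time (US & Canada)"}},
    {"coordinates": None,
     "user": {"location": "", "time_zone": "Pacific Time (US & Canada)"}},
    {"coordinates": None, "user": {}},
]

sources = [location_source(t) for t in sample]
```

Counting the values in `sources` over a real collection would give the kind of percentages quoted on this slide.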
The HDMA Center has built its own internal GeoCoder Engine for user location profiles, using GeoNames.org gazetteers (Creative Commons data) plus user-defined rules.
Enable Flexible or Self-defined Geo-Target Boundaries (California, Santa Barbara, Los Angeles, San Diego – bounding boxes, or State boundaries)
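A gazetteer lookup with user-defined rules and a bounding-box test might look like the sketch below. It is not the HDMA engine: the three-entry gazetteer, the alias rules, the approximate coordinates, and the `SAN_DIEGO_BOX` extent are all hypothetical stand-ins for GeoNames data and real geo-target boundaries.

```python
import re

# Hypothetical mini-gazetteer: normalized name -> approximate (lat, lon).
GAZETTEER = {
    "san diego": (32.7157, -117.1611),
    "los angeles": (34.0522, -118.2437),
    "santa barbara": (34.4208, -119.6982),
}

# Hypothetical user-defined rules mapping common shorthand to gazetteer names.
ALIASES = {"sd": "san diego", "la": "los angeles"}

# Hypothetical geo-target box: (min_lat, min_lon, max_lat, max_lon).
SAN_DIEGO_BOX = (32.5, -117.3, 33.1, -116.9)


def geocode_profile(location_text):
    """Map a free-text profile location to (lat, lon), or None if unknown."""
    # Keep only the part before the first comma, slash, or pipe ("San Diego, CA").
    city = re.split(r"[,/|]", location_text.lower())[0].strip()
    city = ALIASES.get(city, city)
    return GAZETTEER.get(city)


def in_box(point, box):
    """True if a (lat, lon) point falls inside the bounding box."""
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
```

For example, `in_box(geocode_profile("SD"), SAN_DIEGO_BOX)` is true, while a profile reading "Los Angeles, CA" geocodes to a point outside the box.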
GeoCoding Engine for Social Media
Human Dynamics in the Mobile Age (HDMA)
Collect Tweets from the Top 31 U.S. Cities (17-mile radius)
31 different cities across the United States (chosen based on their population sizes): Atlanta, Austin, Baltimore, Boston, Chicago, Cleveland, Columbus, Dallas, Denver, Detroit, El Paso, Fort Worth, Houston, Indianapolis, Jacksonville, Los Angeles, Memphis, Milwaukee, Nashville-Davidson, New Orleans, New York, Oklahoma City, Philadelphia, Phoenix, Portland, San Antonio, San Diego, San Francisco, San Jose, Seattle, and Washington, D.C.
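Whether a geo-tagged tweet falls inside a city's 17-mile radius can be checked with the haversine great-circle distance. This is a minimal sketch: the slides do not say how the radius is enforced, and the Earth radius constant and the function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_city(lat, lon, center, radius_miles=17.0):
    """True if the point lies within radius_miles of the city center (lat, lon)."""
    return haversine_miles(lat, lon, center[0], center[1]) <= radius_miles


# Approximate city centers, used only to exercise the functions.
SAN_DIEGO = (32.7157, -117.1611)
LOS_ANGELES = (34.0522, -118.2437)
```

A point 0.1 degrees of latitude north of the San Diego center (about 7 miles) passes the check; the Los Angeles center, more than 100 miles away, does not.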
RED Line: National ILI data (Influenza-like illness) (provided by CDC)
CDC Influenza Positive Tests, National Data Summary, through Weeks 40-3, 2014-2015 Season
# of Filtered ILI Tweets, Top 30 US Cities, as of February 9, 2015 (from SMART dashboard)
Only 1% - 4% of tweets have geo-tagged coordinates.
Problems!!! Twitter broke its Search APIs on 11/20/2014 and returned geo-tagged tweets only (reducing the tweets collected by 90% - 95%).
Tracking Flu Outbreaks in 2014/2015 Flu Season
2014-2015 Comparison between ILI and Geo-tagged-only Tweets (4%) among 30 U.S. Cities
2016 Flu Tweets (31 cities) vs CDC ILI data
The comparison between the National ILI Rate and the 32 Cities Tweeting Rate, with prediction up to Week 15. Red: National ILI; Purple: Tweet Rate for 2015-2016.
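One common way to quantify how closely a weekly tweet rate tracks the ILI rate is the Pearson correlation coefficient. The slides do not state which statistic was used, so the sketch below is an assumption, and the two short series are made-up values for illustration only, not data from the charts.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Made-up weekly values (NOT from the slides), one entry per week.
ili_rate = [1.0, 2.0, 3.0, 4.0]
tweet_rate = [2.0, 4.0, 6.0, 8.0]

r = pearson(ili_rate, tweet_rate)
```

Shifting one series by a week or two before correlating would be a simple way to test whether tweets lead or lag the CDC numbers.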
Next Step: Syndromic Surveillance (under development) (tracking multiple symptoms: fever, cold, cough, vomiting, etc.) http://vision.sdsu.edu/hdma/smart/syndromic
Designed for Early Detection of “unknown” disease outbreaks, such as Swine Flu and SARS
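Tracking several symptoms at once can start with simple keyword tagging. The sketch below uses only the four symptoms named on the slide; the prefix-matching rule and the function names are assumptions, and a real system would need longer keyword lists and noise filtering.

```python
import re
from collections import Counter

# Symptoms named on the slide; a production list would be much longer.
SYMPTOMS = ("fever", "cold", "cough", "vomiting")

# Match each keyword at the start of a word, so "coughing" counts as "cough"
# but "scold" does not count as "cold".
_PATTERNS = {s: re.compile(rf"\b{re.escape(s)}\w*", re.IGNORECASE) for s in SYMPTOMS}


def tag_symptoms(text):
    """Return the set of symptom keywords mentioned in a tweet's text."""
    return {s for s, pattern in _PATTERNS.items() if pattern.search(text)}


def count_symptoms(texts):
    """Count how many tweets mention each symptom."""
    counts = Counter()
    for text in texts:
        counts.update(tag_symptoms(text))
    return counts
```

Running `count_symptoms` per day and per city would give one time series per symptom, which is the input an early-detection rule would watch.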
The Limitations and Challenges of Social Media and Big Data Research
Social Media User Profiles
Social media messages can NOT represent the whole population,
but they can provide warning signals and real-time updates.
Twitter users are:
• Young (60% are between 16 and 34 years old).
• More often urban residents than rural.
• Higher adoption rate among African Americans.
• Many journalists and mass media staff.
• 20% are not real “human beings” (robots): many advertisement and marketing activities.
Using different keywords can reach different demographic groups:
• #Healthcare: includes more senior people (very few teenagers will tweet about “healthcare”). (We need more background study.)
• “Keywords” could be used as a sampling tool for social media users.
2014 Survey (Business Insider)
Are They “Mummies and Ghosts (Zombies)”?
Who are they? How do they post the messages?
Who are these accounts? Why do they say exactly the same words?
Use SMART dashboard to track “E-cigarette” topics
High Peak on Feb 11, 2016 (Why?)
11,114 - 9,561 = 1,553 (Mummy or Ghost Twitter accounts?) For advertisement?
1,553 Twitter accounts said the exact same sentence in one day (2/11/2016)!
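Spotting a sentence repeated by many accounts comes down to grouping tweets by text and counting distinct users. The sketch below is one possible approach, not the SMART dashboard's method: the normalization step (lowercasing, dropping URLs, collapsing spaces) is an assumption, and the sample tweets are invented.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop URLs, and collapse whitespace so near-identical copies match."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return " ".join(text.split())


def repeated_messages(tweets, min_accounts=2):
    """Map each normalized text to its number of distinct posting accounts.

    `tweets` is an iterable of (user, text) pairs; only texts posted by at
    least `min_accounts` different users are returned.
    """
    accounts = defaultdict(set)
    for user, text in tweets:
        accounts[normalize(text)].add(user)
    return {text: len(users) for text, users in accounts.items()
            if len(users) >= min_accounts}


# Invented sample tweets for illustration.
sample_tweets = [
    ("a", "Try our new e-cig! http://x.co/1"),
    ("b", "try our new e-cig!  http://x.co/2"),
    ("c", "I quit smoking"),
    ("a", "Try our new e-cig!"),
]

suspicious = repeated_messages(sample_tweets)
```

Counting distinct accounts rather than raw tweets keeps one user's repeated posts from inflating the total.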
User Privacy Issue
• Concerns about “Big Brother”.
• Although all the tweets collected from APIs are “public tweets” (everyone can search and retrieve them), some tweets may contain personal private information (real names, locations of homes, offices, private conversations, medical situations, etc.).
* The HDMA center conceals tweet locations by randomly selecting a coordinate within a 100 m radius of the original location to protect Twitter users' privacy.
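Picking a random coordinate within 100 m of the original point can be sketched as below. This is an illustration under stated assumptions, not the HDMA implementation: it samples uniformly over the disc, uses a flat-Earth approximation with roughly 111,320 m per degree of latitude, and the function name `jitter_location` is hypothetical.

```python
import math
import random

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def jitter_location(lat, lon, radius_m=100.0, rng=random):
    """Return a random (lat, lon) within radius_m meters of the input point.

    Taking sqrt() of the uniform draw makes points uniform over the disc's
    area instead of clustering near the center. The flat-Earth approximation
    is adequate at this scale but breaks down close to the poles.
    """
    r = radius_m * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dlat = (r * math.cos(theta)) / METERS_PER_DEG_LAT
    dlon = (r * math.sin(theta)) / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Passing a seeded `random.Random` instance as `rng` makes the jitter reproducible for testing; in production an unseeded source would be preferable so the offset cannot be reversed.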