Assimilating HealthMap Data to Nowcast Epidemics
J. Ray jairay [at] sandia [dot] gov
Sandia National Laboratories, Livermore, CA
Acknowledgements: The work was funded by DoD/NCMI
SAND2012-9575P
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
The Problem
• Public health reports of a disease’s progression are usually delayed
– Lab confirmations take time; sentinel physicians’ reports have to be collated
– 2-week delay (CDC); longer in poorer countries
• Unconventional, open-source reports of morbidity are a lot more timely
– Media & social media reports appear on the Web and are searched & curated by companies like HealthMap (HM)
– Also, if it appears in media, the outbreak must be somewhat anomalous
• Question: Can the timely HM data be used to cover the 2-week lag in public health reports?
– Called “nowcasting”
Outlines of a Solution
• Make a correlative model between morbidity and HM data
– Data/dependent variable (CDC): flu activity in US [weekly, time-series]
  • Flu isolates and sentinel physician reports collected by CDC
– Independent variable (HM): # of media reports concerning flu in the region from HM [weekly, time-series]
– Will exploit the correlation between CDC & HM and the autoregressive structure of the CDC time-series
• Will check for
– How small a region can we apply this model to?
– How well could we do if we did not have HM data?
– Under what conditions does HM data help?
Making the Model
Comparing Isolates and News Reports
[Figure: CDC positive isolates and HM news reports; x-axis: weeks, starting 2008-01-01]
• CDC positive isolates plotted by date of collection
• News reports seem to treat the early 2009 flu activity as “business-as-usual”
– Huge jump around Week 70 (April 2009)
• But once primed, upsurges in media reports correspond to upsurges in CDC isolates
– But no proportionality here
• HM data much more jagged than CDC
How Much Correlation between CDC & HM?
• Modest correlation between log10(CDC) and log10(HM)
• A linear model will give a good trend, but not accuracy
• HM data will need smoothing
– The spectral content of log-CDC and log-HM should be similar, if using a linear model
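The correlation check above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `log_correlation` is hypothetical, the weekly counts are synthetic, and the sketch assumes all counts are strictly positive so that the logarithm is defined.

```python
import numpy as np

def log_correlation(cdc, hm):
    """Pearson correlation between log10(CDC) and log10(HM) weekly counts.

    Assumes both series are strictly positive (log10 of zero is undefined).
    """
    x = np.log10(np.asarray(hm, dtype=float))
    y = np.log10(np.asarray(cdc, dtype=float))
    return np.corrcoef(x, y)[0, 1]

# Synthetic weekly counts, made up for illustration (not HealthMap or CDC data).
rng = np.random.default_rng(0)
weeks = np.arange(100)
cdc = 50 + 400 * np.exp(-0.5 * ((weeks - 60) / 8.0) ** 2)
hm = np.sqrt(cdc) * rng.lognormal(0.0, 0.4, size=weeks.size)  # jagged proxy
r = log_correlation(cdc, hm)
print(f"correlation of log10 series: {r:.2f}")
```

A modest value of r is consistent with a usable trend but a poor point-by-point fit, which is what motivates the smoothing step.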
Smoothing log-HM Data
• Fourier decompose log-CDC and log-HM data
– Plot A^2 versus mode frequency
– About 5-6 modes in log-CDC data
• 5-point smoothing stencil applied repeatedly to log-HM
• After 3 applications, similar spectral content
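A rough sketch of the smoothing and spectral comparison follows. The slides do not give the stencil weights or the boundary treatment, so the uniform 5-point average and the edge padding used here are assumptions; the names `smooth5` and `power_spectrum` are hypothetical.

```python
import numpy as np

def smooth5(x, passes=3):
    """Apply a 5-point moving-average stencil `passes` times.

    Uniform weights (1/5 each) and edge padding are assumptions; the slides
    only say that a 5-point stencil was applied repeatedly.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.full(5, 0.2)
    for _ in range(passes):
        padded = np.pad(x, 2, mode="edge")  # repeat end values to keep the length
        x = np.convolve(padded, kernel, mode="valid")
    return x

def power_spectrum(x):
    """Squared Fourier amplitudes (A^2) of the mean-removed series, one per mode."""
    x = np.asarray(x, dtype=float)
    amp = np.abs(np.fft.rfft(x - x.mean())) / x.size
    return amp ** 2

# Synthetic jagged series (illustrative only): slow wave plus week-to-week noise.
rng = np.random.default_rng(1)
t = np.arange(104)
log_hm = 1.5 + 0.5 * np.sin(2 * np.pi * t / 52) + rng.normal(0.0, 0.3, t.size)
raw_power = power_spectrum(log_hm)
smooth_power = power_spectrum(smooth5(log_hm, passes=3))
```

Comparing `raw_power` and `smooth_power` mode by mode against the spectrum of log-CDC is one way to decide how many smoothing passes are enough.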
A Linear Model for the Trend
• Propose:
– Log-CDC = a * log-HM + b
• Simple regression
• Comes close, but is only an approximation
• CDC-HM discrepancy does NOT look like noise; rather, it is correlated
• Model discrepancy as a multivariate Gaussian – exploit smoothness / structure of CDC data
• New Model = Linear model + discrepancy (modeled as a multivariate Gaussian)
• Such a model is constructed using Regression or Universal Kriging (RK/UK)
• The linear model gives its worst errors in Weeks 20-40 and 90-110
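The "linear model + correlated discrepancy" idea can be sketched as a two-step regression kriging. This is an illustration under stated assumptions, not the authors' implementation: the squared-exponential covariance, the correlation length of 4 weeks, and the nugget are guesses chosen for the example, and the names `sq_exp_cov` and `rk_nowcast` are hypothetical. Universal kriging would estimate the trend and covariance jointly rather than in sequence.

```python
import numpy as np

def sq_exp_cov(t1, t2, variance, length):
    """Squared-exponential covariance between two sets of week indices (assumed form)."""
    d = np.subtract.outer(t1, t2)
    return variance * np.exp(-0.5 * (d / length) ** 2)

def rk_nowcast(t_obs, x_obs, y_obs, t_new, x_new, length=4.0, nugget=1e-3):
    """Regression-kriging nowcast: linear trend in x plus a kriged residual.

    x is (smoothed) log-HM and y is log-CDC. `length` (weeks) and `nugget`
    are illustrative guesses, not values from the slides.
    """
    t_obs, x_obs, y_obs = (np.asarray(v, dtype=float) for v in (t_obs, x_obs, y_obs))
    t_new = np.asarray(t_new, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Step 1: simple regression, y = a * x + b.
    a, b = np.polyfit(x_obs, y_obs, 1)
    resid = y_obs - (a * x_obs + b)
    # Step 2: treat the discrepancy as a zero-mean Gaussian process in time.
    var = resid.var() + 1e-12  # small floor avoids a singular matrix
    K = sq_exp_cov(t_obs, t_obs, var, length) + nugget * var * np.eye(t_obs.size)
    k_star = sq_exp_cov(t_new, t_obs, var, length)
    resid_new = k_star @ np.linalg.solve(K, resid)
    # Step 3: nowcast = trend at the new HM values + predicted discrepancy.
    return a * x_new + b + resid_new
```

In a nowcasting setting, `t_obs` would cover the weeks for which CDC data are already published, and `t_new` the most recent weeks, for which only HM data exist.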
• Two ways of breaking kriging
– Have a short time-series, so that we can’t make a good covariance model
– Have a rough, non-smooth time-series, so that Gaussian-process assumptions don’t hold
• So, the method should break
– If applied early in the season, OR
– If the counts are small, e.g., mild outbreak or small region
• Test how small a region one can get away with
– The mild 2011-2012 season
Smallest Region – New England
• A very modest correlation exists for the 2011-2012 season
• Outcome: Incorporating HM data gives a good nowcast at 35 weeks
• Go smaller – try to break the model at NYC
Even smaller – NYC, 2009 Swine Flu
• Just works – with 10 weeks of data too!
What Happens If We Had No HM Data?
Nowcasting without HealthMap Data
• Fit CDC (ILINet) data with a typical time-series model
– Literature says autoregressive models work
• Test with AR, ARMA and ARIMA models
– Found that AR models, of order 4, work
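As a sketch of the HM-free baseline, an AR model can be fit by ordinary least squares and iterated forward to bridge the reporting lag. The slides do not say how the AR(4) model was estimated, so least squares is an assumption here, and the names `fit_ar` and `forecast_ar` are hypothetical.

```python
import numpy as np

def fit_ar(y, order=4):
    """Least-squares fit of y[t] = c + sum_k phi[k] * y[t-k], k = 1..order."""
    y = np.asarray(y, dtype=float)
    # Column k holds the lag-(k+1) values aligned with the targets y[order:].
    lags = np.column_stack([y[order - k - 1 : y.size - k - 1] for k in range(order)])
    design = np.column_stack([np.ones(lags.shape[0]), lags])
    coef, *_ = np.linalg.lstsq(design, y[order:], rcond=None)
    return coef  # [c, phi_1, ..., phi_order]

def forecast_ar(y, coef, steps=2):
    """Iterate the fitted AR model `steps` weeks past the end of y."""
    order = coef.size - 1
    history = list(np.asarray(y, dtype=float)[-order:])
    out = []
    for _ in range(steps):
        recent = history[::-1][:order]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], recent))
        out.append(nxt)
        history.append(nxt)  # feed the forecast back in for the next step
    return np.array(out)

# Synthetic log-activity series (illustrative only, not ILINet data).
rng = np.random.default_rng(2)
t = np.arange(120)
log_ili = 2.0 + 0.6 * np.sin(2 * np.pi * t / 52) + rng.normal(0.0, 0.05, t.size)
coef4 = fit_ar(log_ili, order=4)
two_week_nowcast = forecast_ar(log_ili, coef4, steps=2)
```

With a 2-week reporting lag, `steps=2` carries the last published value forward to the current week; forecasts beyond the first step reuse earlier forecasts, so errors compound.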