UNIVERSITY OF WASHINGTON
Managing and Analyzing Global Health Data
Seattle, August 30, 2011
Peter Speyer, Director of Data Development
Jan 27, 2015
IHME Background
• Global institute dedicated to providing independent, rigorous, and scientific measurements and evaluations to accelerate progress on global health
• Part of the Department of Global Health at the University of Washington
• Funded by the Bill & Melinda Gates Foundation and the State of Washington (‘core funding’), and other funders through specific research grants
• Created in 2007
• 70 researchers, 30 staff
IHME Mission
Our goal is to improve the health of the world’s populations
by providing the best information on population health
Health-related data
• Social determinants
• Risk factors
Health Data
Population-based data
• Household / facility surveys
• Census
• Vital registration
• Registries (provider, disease)
Facility-based data
• Health records
• Administrative data (financial, operational)
• Research data (DSS, clinical trials, etc.)
Individual-based data
• Personal health records
• “Quantified self”
• Disease-based social networks
Health Data Innovation
Patient engagement
Open data
Health apps
Key Health Data Challenges
Find & access data
Disseminate data
Use data
Key Health Data Challenges
• Lack of transparency
• Timeliness of data
• Lack of documentation
• Access vs. privacy
Key Health Data Challenges
• Sheer quantity of data files (30TB, 20K+ source datasets, 40M files)
• Diverse source data types and formats (pdf, csv, SPSS, CSPro, …)
• Data quality issues
Key Health Data Challenges
• Make results data engaging
• Accountability: share results, code, source data
• Accommodate diverse audiences (expertise, geographies)
Example: Global Burden of Disease
Mortality & causes of death
• Sources: census, surveys, vital registration, verbal autopsy
• Estimates: covariate models, spatial-temporal regressions; weighted combination of models
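The "weighted combination of models" step can be illustrated with a minimal Python sketch. The slide does not say how the weights are derived or how the models are combined, so the simple weighted average below, along with the function name and input layout, is an assumption for illustration only.

```python
def combine_estimates(predictions, weights):
    """Return the weighted average of several models' estimates.

    predictions: dict mapping a model name to a list of estimates
                 (one value per country-year, same order in every model).
    weights:     dict mapping a model name to a non-negative weight.
    Both structures are hypothetical; the source does not describe them.
    """
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n = len(next(iter(predictions.values())))
    combined = [0.0] * n
    for name, values in predictions.items():
        if len(values) != n:
            raise ValueError(f"model {name!r} has a different number of estimates")
        share = weights[name] / total  # normalize so the shares sum to 1
        for i, value in enumerate(values):
            combined[i] += share * value
    return combined


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    models = {"covariate": [10.0, 20.0], "spatial_temporal": [20.0, 40.0]}
    print(combine_estimates(models, {"covariate": 1, "spatial_temporal": 3}))
```

Normalizing the weights inside the function means callers can pass any non-negative scores, for example out-of-sample performance measures, without rescaling them first.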
Morbidity
• Sources: literature reviews, surveys, registries, hospital data
• Disease modeling: compartmental Bayesian model
• Health severity weights
Burden of disease
• DALYnator
300 diseases
40 risk factors
21 regions
1990, 2005, 2010
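The slide names the "DALYnator" without describing it. As a rough illustration of the quantity it produces, the sketch below uses the standard definition DALY = YLL + YLD (years of life lost plus years lived with disability). The simplified formulas, the class, and the field names are assumptions, not the tool's actual implementation; age weighting, discounting, and uncertainty are left out.

```python
from dataclasses import dataclass


@dataclass
class CauseInput:
    """Hypothetical inputs for one cause in one population group."""
    deaths: float                      # number of deaths from the cause
    remaining_life_expectancy: float   # standard years of life left at age of death
    prevalent_cases: float             # number of people living with the condition
    disability_weight: float           # severity weight between 0 (healthy) and 1 (death)


def years_of_life_lost(c: CauseInput) -> float:
    # YLL: deaths multiplied by the standard remaining life expectancy.
    return c.deaths * c.remaining_life_expectancy


def years_lived_with_disability(c: CauseInput) -> float:
    # YLD (prevalence-based simplification): cases multiplied by the severity weight.
    return c.prevalent_cases * c.disability_weight


def dalys(c: CauseInput) -> float:
    # DALY = YLL + YLD
    return years_of_life_lost(c) + years_lived_with_disability(c)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = CauseInput(deaths=100, remaining_life_expectancy=30,
                         prevalent_cases=2000, disability_weight=0.25)
    print(dalys(example))
```

The mortality estimates feed the YLL term, and the morbidity estimates together with the health severity weights feed the YLD term.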
GBD Country Years, Causes of Death 1950-2009
Data source            Countries   Site-years   # of Deaths
VR                     128         4,190        722,267,710
Household Surveys      136         2,827        10,132,976
Surveillance Systems   12          126          717,698
National VA            21          71           301,855
Subnational VA         59          442          2,606,815
Mortuary Registries    6           25           54,316
TOTAL                              7,680        735,564,116
Solutions: Computing Infrastructure
• Analysis with statistical packages
– Projects with 100K+ lines of code
• File system
– 60TB disc space
– Redundant backup
• Cluster with 63 nodes (+300% in 2011), ~2000 cores
– Runs 24x7, very little downtime
• Virtual environments to test new applications, serve them to collaborators, etc.
Solutions: Global Health Data Exchange
Objectives
• Transparency => data catalog
• Access => data repository
• Information => data community (future)
Approach
• One record per dataset
• Standardized metadata
• Internal users (10K records): files on file server
• External users (5K records): files for download
Implementation
• CMS: Drupal
• Search: SOLR
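To make the "standardized metadata" plus "Search: SOLR" combination concrete, here is a minimal Python sketch that builds a query URL for Solr's standard `/select` handler. The parameters `q`, `fq`, `rows`, and `wt` are standard Solr query parameters. The base URL, the core name `catalog`, and the metadata field `country` are hypothetical, since the slides do not describe the GHDx schema.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, filters=None, rows=10):
    """Build a Solr /select URL for a free-text search with field filters.

    base_url: URL of a Solr core (hypothetical in the example below).
    text:     free-text query sent as the `q` parameter.
    filters:  dict of metadata field -> exact value, sent as `fq` filter queries.
    rows:     maximum number of catalog records to return.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # One fq parameter per metadata field; sorted for a stable URL.
    for field, value in sorted((filters or {}).items()):
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed core and field names, for illustration only.
    print(build_solr_query("http://localhost:8983/solr/catalog",
                           "mortality survey",
                           filters={"country": "Kenya"}))
```

Keeping one catalog record per dataset with standardized fields is what lets filter queries like this narrow results by country, year, or data type.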
UNIVERSITY OF WASHINGTON
Thank you!
[email protected]
@peterspeyer
www.ghdx.org
Peter Speyer
Director of Data Development