UNIVERSITY OF WASHINGTON Managing and Analyzing Global Health Data Seattle, August 30, 2011 Peter Speyer, Director of Data Development
Page 1: Managing and Analyzing Health Data (VLDB Conference)

UNIVERSITY OF WASHINGTON

Managing and Analyzing Global Health Data

Seattle, August 30, 2011

Peter Speyer, Director of Data Development

Page 2

IHME Background

• Global institute dedicated to providing independent, rigorous, and scientific measurements and evaluations to accelerate progress on global health

• Part of the Department of Global Health at the University of Washington

• Funded by the Bill & Melinda Gates Foundation and the State of Washington (‘core funding’), and other funders through specific research grants

• Created in 2007

• 70 researchers, 30 staff


Page 3

IHME Mission

Our goal is to improve the health of the world’s populations by providing the best information on population health.


Page 4


Page 5

Health-related data

• Social determinants

• Risk factors

Health Data

Population-based data

• Household / facility surveys

• Census

• Vital registration

• Registries (provider, disease)

Facility-based data

• Health records

• Administrative data (financial, operational)

• Research data (DSS, clinical trials, etc.)

Individual-based data

• Personal health records

• “Quantified self”

• Disease-based social networks

Health Data Innovation

• Patient engagement

• Open data

• Health apps

Page 6

Key Health Data Challenges

• Find & access data

• Use data

• Disseminate data

Page 7

Key Health Data Challenges: Find & access data

• Lack of transparency

• Timeliness of data

• Lack of documentation

• Access vs. privacy

Page 8

Key Health Data Challenges: Use data

• Sheer quantity of data files (30TB, 20K+ source datasets, 40M files)

• Diverse source data types and formats (pdf, csv, SPSS, CSPro, …)

• Data quality issues

Page 9

Key Health Data Challenges: Disseminate data

• Make results data engaging

• Accountability: share results, code, source data

• Accommodate diverse audiences (expertise, geographies)

Page 10

Example: Global Burden of Disease

Mortality & causes of death

• Sources: census, surveys, vital registration, verbal autopsy

• Estimates: covariate models, spatial-temporal regressions; weighted combination of models
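The slide does not say how the component models are weighted. The hypothetical `weighted_ensemble` function below is a minimal sketch. It assumes that each candidate model (for example, a covariate model and a spatial-temporal regression) has already produced predictions for the same country-years, along with an out-of-sample error. Each model is then weighted by the inverse of its out-of-sample RMSE. The weighting rule, the function name, and all numbers are illustrative assumptions, not the method used for the GBD estimates.

```python
from __future__ import annotations


def weighted_ensemble(predictions: dict[str, list[float]],
                      rmse: dict[str, float]) -> list[float]:
    """Combine per-model predictions with weights proportional to 1 / RMSE.

    Assumption (not from the slide): models with lower out-of-sample error
    get larger weights, and the weights are normalized to sum to 1.
    """
    inverse = {name: 1.0 / rmse[name] for name in predictions}
    total = sum(inverse.values())
    weights = {name: w / total for name, w in inverse.items()}

    # All models must predict the same country-years, in the same order.
    n = len(next(iter(predictions.values())))
    combined = [0.0] * n
    for name, preds in predictions.items():
        if len(preds) != n:
            raise ValueError(f"model {name!r} has {len(preds)} predictions, expected {n}")
        for i, value in enumerate(preds):
            combined[i] += weights[name] * value
    return combined


# Two hypothetical models predicting death rates for three country-years;
# the second has half the out-of-sample error, so it gets twice the weight.
preds = {
    "covariate_model": [10.0, 12.0, 14.0],
    "spatiotemporal_model": [12.0, 12.0, 11.0],
}
errors = {"covariate_model": 2.0, "spatiotemporal_model": 1.0}
print(weighted_ensemble(preds, errors))
```

Inverse-error weighting is only one simple choice. Any scheme that gives better-validated models more weight would match the slide's description.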

Morbidity

• Sources: Literature reviews, surveys, registries, hospital data

• Disease modeling: compartmental Bayesian model

• Health severity weights

Burden of disease

• DALYnator

Scope: 300 diseases, 40 risk factors, 21 regions; 1990, 2005, 2010
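The slide names the DALYnator but does not describe how it works. For orientation, a disability-adjusted life year (DALY) adds two quantities: years of life lost to premature death (YLL) and years lived with disability (YLD). The health severity weights listed under Morbidity enter through YLD. The sketch below uses a simplified, prevalence-based YLD with no discounting or age weighting. The function names and numbers are hypothetical, and this is not the DALYnator's implementation.

```python
from __future__ import annotations


def years_of_life_lost(deaths: float, life_expectancy_at_death: float) -> float:
    """YLL = deaths x remaining standard life expectancy at the age of death."""
    return deaths * life_expectancy_at_death


def years_lived_with_disability(prevalent_cases: float, disability_weight: float) -> float:
    """Simplified prevalence-based YLD = prevalent cases x severity weight.

    The weight runs from 0 (full health) to 1 (equivalent to death).
    """
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie in [0, 1]")
    return prevalent_cases * disability_weight


def dalys(deaths: float, life_expectancy_at_death: float,
          prevalent_cases: float, disability_weight: float) -> float:
    """DALY = YLL + YLD for one cause, age group, sex, location, and year."""
    return (years_of_life_lost(deaths, life_expectancy_at_death)
            + years_lived_with_disability(prevalent_cases, disability_weight))


# Hypothetical cause: 100 deaths with 30 years of remaining life expectancy,
# plus 2,000 prevalent cases with a severity weight of 0.2.
print(dalys(100, 30.0, 2000, 0.2))
```

In this example, 3,000 YLL and 400 YLD combine into 3,400 DALYs.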

Page 11

GBD Country Years, Causes of Death 1950-2009


Page 12

GBD Country Years, Causes of Death 1950-2009

Data source             Countries  Site-years   # of Deaths
VR                            128       4,190   722,267,710
Household Surveys             136       2,827    10,132,976
Surveillance Systems           12         126       717,698
National VA                    21          71       301,855
Subnational VA                 59         442     2,606,815
Mortuary Registries             6          25        54,316
TOTAL                                   7,680   735,564,116

Page 13

Solutions: Computing Infrastructure

• Analysis with statistical packages

– Projects with 100K+ lines of code

• File system

– 60TB disk space

– Redundant backup

• Cluster with 63 nodes (+300% in 2011), ~2000 cores

– Runs 24x7, very little downtime

• Virtual environments to test new applications, serve them to collaborators, etc.

Page 14

Solutions: Global Health Data Exchange

Objectives

• Transparency => data catalog

• Access => data repository

• Information => data community (future)

Approach

• One record per dataset

• Standardized metadata

• Internal users (10K records): files on file server

• External users (5K records): files for download

Implementation

• CMS: Drupal

• Search: SOLR
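The sketch below makes “one record per dataset” with standardized metadata concrete. It models a catalog record and one way to get an internal/external split, where internal users see more records (10K) than external users (5K). The field names, the `/download/` path, and the rule that external users see only records flagged `public` are hypothetical. In the GHDx, storage and presentation run on Drupal and search on SOLR; neither is modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One catalog record per dataset, with hypothetical standardized fields."""
    record_id: str
    title: str
    country: str
    start_year: int
    end_year: int
    data_type: str      # e.g. "household survey", "census", "vital registration"
    file_format: str    # e.g. "csv", "SPSS", "CSPro", "pdf"
    server_path: str    # location of the files on the internal file server
    public: bool        # assumption: True if files may be offered for download


def visible_records(records: list[DatasetRecord], internal_user: bool) -> list[DatasetRecord]:
    """Internal users see every record; external users see only public ones."""
    return [r for r in records if internal_user or r.public]


def file_location(record: DatasetRecord, internal_user: bool) -> str | None:
    """Internal users get the file-server path; external users get a download link."""
    if internal_user:
        return record.server_path
    if record.public:
        return f"/download/{record.record_id}"  # hypothetical URL scheme
    return None


# Two hypothetical records: one public survey, one internal-only extract.
catalog = [
    DatasetRecord("r1", "National household survey", "Country A", 2005, 2006,
                  "household survey", "SPSS", "/data/surveys/r1", public=True),
    DatasetRecord("r2", "Hospital discharge extract", "Country B", 2008, 2008,
                  "administrative data", "csv", "/data/admin/r2", public=False),
]
print(len(visible_records(catalog, internal_user=True)))   # 2
print(len(visible_records(catalog, internal_user=False)))  # 1
```

Keeping one metadata schema for every record lets the same record drive both views. Only the access rule differs between internal and external users.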

Page 15


Page 16

UNIVERSITY OF WASHINGTON

Thank you!

@peterspeyer

www.ghdx.org

Peter Speyer

Director of Data Development