Data Availability and Benchmarking in Observational Health Data Sciences and Informatics (OHDSI) George Hripcsak, MD, MS Columbia University Irving Medical Center NewYork-Presbyterian Hospital International Telecommunication Union & World Health Organization Artificial Intelligence for Health

Data Availability and Benchmarking in Observational Health Data … · 2018-11-14 · Data Availability and Benchmarking in Observational Health Data Sciences and Informatics (OHDSI)

Mar 20, 2020


Data Availability and Benchmarking in Observational Health Data Sciences and Informatics (OHDSI)

George Hripcsak, MD, MS
Columbia University Irving Medical Center
NewYork-Presbyterian Hospital

International Telecommunication Union & World Health Organization
Artificial Intelligence for Health

Page 2

Observational Health Data Sciences and Informatics (OHDSI, pronounced “Odyssey”)

Mission: To improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care

A multi-stakeholder, interdisciplinary, international collaborative with a coordinating center at Columbia University

http://ohdsi.org

Page 3

OHDSI’s global research community

• >200 collaborators from 25 different countries

• Experts in informatics, statistics, epidemiology, clinical sciences

• Active participation from academia, government, industry, providers

• Currently holds records on about 500 million unique patients in >100 databases

http://ohdsi.org/who-we-are/collaborators/

Page 4

Evidence OHDSI seeks to generate from observational data

• Clinical characterization - tally
  – Natural history: Who has diabetes, and who takes metformin?
  – Quality improvement: What proportion of patients with diabetes experience complications?

• Population-level estimation - cause
  – Safety surveillance: Does metformin cause lactic acidosis?
  – Comparative effectiveness: Does metformin cause lactic acidosis more than glyburide?

• Patient-level prediction - predict
  – Precision medicine: Given everything you know about me, if I take metformin, what is the chance I will get lactic acidosis?
  – Disease interception: Given everything you know about me, what is the chance I will develop diabetes?

Page 5

Open Science

[Figure: open science diagram. Labels: Data + Analytics + Domain expertise; Open source software; Enable users to do something; Standardized, transparent workflows; Generate evidence. Workflow steps: Database summary; Cohort definition; Cohort summary; Compare cohorts; Exposure-outcome summary; Effect estimation & calibration; Compare databases.]

Page 6

How OHDSI Works

[Figure: data flow between the OHDSI data partners and the OHDSI coordinating center.]

OHDSI Data Partners:
• Source data warehouse, with identifiable patient-level data
• ETL
• Standardized, de-identified patient-level database (OMOP CDM v5)
• Standardized large-scale analytics
• Analysis results

OHDSI Coordinating Center:
• Analytics development and testing
• Research and education
• Data network support
• Summary statistics results repository (OHDSI.org)

Other labels in the figure: Consistency, Temporality, Strength, Plausibility, Experiment, Coherence, Biological gradient, Specificity, Analogy, Comparative effectiveness, Predictive modeling

Page 7

Deep information model: OMOP CDM Version 6

[Figure: table diagram of the OMOP CDM, grouped by area.]

• Standardized clinical data: Person, Observation_period, Specimen, Visit_occurrence, Visit_detail, Procedure_occurrence, Drug_exposure, Device_exposure, Condition_occurrence, Measurement, Note, Note_NLP, Survey_conduct, Observation, Fact_relationship
• Standardized health system data: Location, Location_history, Care_site, Provider
• Standardized health economics: Cost, Payer_plan_period
• Standardized derived elements: Drug_era, Dose_era, Condition_era
• Results Schema: Cohort, Cohort_definition
• Standardized metadata: CDM_source, Metadata
• Standardized vocabularies: Concept, Vocabulary, Domain, Concept_class, Concept_relationship, Relationship, Concept_synonym, Concept_ancestor, Source_to_concept_map, Drug_strength
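
As a minimal sketch of how the standardized clinical tables can answer a characterization question such as "who has diabetes, and who takes metformin?", the Python below builds a toy in-memory database with three CDM tables cut down to a few columns. The rows are made up and the two concept IDs are placeholders, not real vocabulary entries; a real CDM has many more columns, and a real query would normally also resolve descendant concepts.

```python
import sqlite3

# Placeholder concept IDs for illustration only (assumption, not real vocabulary IDs).
DIABETES_CONCEPT_ID = 1001
METFORMIN_CONCEPT_ID = 2001


def build_toy_cdm():
    """Create an in-memory database with a tiny subset of CDM tables and fake rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
        CREATE TABLE condition_occurrence (
            person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT);
        CREATE TABLE drug_exposure (
            person_id INTEGER, drug_concept_id INTEGER, drug_exposure_start_date TEXT);
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, 1950), (2, 1960), (3, 1970)])
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", [
        (1, DIABETES_CONCEPT_ID, "2015-01-10"),
        (2, DIABETES_CONCEPT_ID, "2016-03-05"),
    ])
    conn.executemany("INSERT INTO drug_exposure VALUES (?, ?, ?)", [
        (1, METFORMIN_CONCEPT_ID, "2015-02-01"),
    ])
    return conn


def count_diabetes_and_metformin(conn):
    """Return (persons with the condition, of those how many also have the drug)."""
    n_diabetes = conn.execute(
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (DIABETES_CONCEPT_ID,),
    ).fetchone()[0]
    n_treated = conn.execute(
        """
        SELECT COUNT(DISTINCT c.person_id)
        FROM condition_occurrence c
        JOIN drug_exposure d ON d.person_id = c.person_id
        WHERE c.condition_concept_id = ? AND d.drug_concept_id = ?
        """,
        (DIABETES_CONCEPT_ID, METFORMIN_CONCEPT_ID),
    ).fetchone()[0]
    return n_diabetes, n_treated
```

Because every site stores its data in the same tables and concept IDs, the same query text could in principle run unchanged at each data partner, which is the point of the common data model.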

Page 8

Extensive vocabularies

Page 9

Standardized conventions

Conventions developed by the THEMIS Workgroup

[Figure: chart of numeric counts, including a "Shared" category; the values cannot be reliably matched to labels in the extracted text.]

Page 10

Preparing your data for analysis

Patient-level data in source system/schema → ETL design → ETL implement → ETL test → Patient-level data in OMOP CDM

OHDSI tools built to help:

• WhiteRabbit: profile your source data
• RabbitInAHat: map your source structure to CDM tables and fields
• Usagi: map your source codes to CDM vocabulary
• ATHENA: standardized vocabularies for all CDM domains
• CDM: DDL, index, constraints for Oracle, SQL Server, PostgreSQL; vocabulary tables with loading scripts
• ACHILLES: profile your CDM data; review data quality assessment; explore population-level summaries
• OHDSI Forums: public discussions for OMOP CDM implementers/developers

http://github.com/OHDSI
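
The "map your source codes to CDM vocabulary" step can be pictured with a minimal sketch. The Python below is not the OHDSI tooling; it only illustrates the idea of translating local codes into standard concept IDs while writing Condition_occurrence rows. The source field names (`patient_id`, `dx_code`, `dx_date`), the `LOCAL_DX` vocabulary, the mapping entries, and the use of 0 for unmapped codes are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical mapping from (source vocabulary, source code) to a standard concept ID.
SOURCE_TO_CONCEPT = {
    ("LOCAL_DX", "DM2"): 1001,
    ("LOCAL_DX", "HTN"): 1002,
}

# Assumption for this sketch: 0 marks a source code with no standard concept.
UNMAPPED_CONCEPT_ID = 0


def to_condition_occurrence(source_rows, vocabulary_id="LOCAL_DX"):
    """Map source diagnosis rows to CDM-style rows and tally unmapped codes."""
    cdm_rows = []
    unmapped = Counter()
    for row in source_rows:
        key = (vocabulary_id, row["dx_code"])
        concept_id = SOURCE_TO_CONCEPT.get(key, UNMAPPED_CONCEPT_ID)
        if concept_id == UNMAPPED_CONCEPT_ID:
            unmapped[row["dx_code"]] += 1
        cdm_rows.append({
            "person_id": row["patient_id"],
            "condition_concept_id": concept_id,
            "condition_start_date": row["dx_date"],
            # Keep the original code so the mapping can be audited later.
            "condition_source_value": row["dx_code"],
        })
    return cdm_rows, unmapped
```

Returning the tally of unmapped codes alongside the rows makes the ETL test step easier: the most frequent unmapped codes are the first candidates for new mappings.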

Page 11

ACHILLES Heel Data Curation

Page 12

ATLAS to build, visualize, and analyze cohorts

Page 13

Characterize the cohorts of interest

Page 14

OHDSI in Action

• Characterization

Page 15

Treatment Pathways

[Figure: how evidence reaches a treatment decision.]

• Evidence: RCT, Obs
• Global stakeholders: Public, Industry, Regulator, Academics
• Conduits: Literature, Lay press, Social media, Guidelines, Formulary, Labels, Advertising
• Local stakeholders: Clinician, Patient, Family, Consultant
• Inputs: Indication, Feasibility, Cost, Preference

Page 16

OHDSI in action: Chronic disease treatment pathways

• Conceived at AMIA (15 Nov 2014)

• Protocol written, code written and tested at 2 sites (30 Nov 2014)

• Analysis submitted to OHDSI network (2 Dec 2014)

• Results submitted for 7 databases (5 Dec 2014)

Page 17

OHDSI participating data partners

Abbreviation | Name | Description | Population, millions
AUSOM | Ajou University School of Medicine | South Korea; inpatient hospital EHR | 2
CCAE | MarketScan Commercial Claims and Encounters | US; private-payer claims | 119
CPRD | UK Clinical Practice Research Datalink | UK; EHR from general practice | 11
CUMC | Columbia University Medical Center | US; inpatient EHR | 4
GE | GE Centricity | US; outpatient EHR | 33
INPC | Regenstrief Institute, Indiana Network for Patient Care | US; integrated health exchange | 15
JMDC | Japan Medical Data Center | Japan; private-payer claims | 3
MDCD | MarketScan Medicaid Multi-State | US; public-payer claims | 17
MDCR | MarketScan Medicare Supplemental and Coordination of Benefits | US; private and public-payer claims | 9
OPTUM | Optum ClinFormatics | US; private-payer claims | 40
STRIDE | Stanford Translational Research Integrated Database Environment | US; inpatient EHR | 2
HKU | Hong Kong University | Hong Kong; EHR | 1

Page 18

Proceedings of the National Academy of Sciences, 2016

Page 19

[Figure: one column per condition (Type 2 Diabetes Mellitus, Hypertension, Depression) and one row per database (OPTUM, GE, MDCD, CUMC, INPC, MDCR, CPRD, JMDC, CCAE).]

Population-level heterogeneity across systems, and patient-level heterogeneity within systems

Page 20

Conclusions: Network research

• It is feasible to encode the world population in a single data model

• Generating evidence is feasible

• Stakeholders willing to share results

• Able to accommodate vast differences in privacy and research regulation

Page 21

howoften.org

• Incidence of side effects

• Any drug on the world market

• Any condition

• Absolute risk

• Not causal (Characterization)

• On the Internet

Page 22

OHDSI in Action

• Population-level estimation

Page 23

What is the quality of the current evidence from observational analyses?

August 2010: “Among patients in the UK General Practice Research Database, the use of oral bisphosphonates was not significantly associated with incident esophageal or gastric cancer”

Sept 2010: “In this large nested case-control study within a UK cohort [General Practice Research Database], we found a significantly increased risk of oesophageal cancer in people with previous prescriptions for oral bisphosphonates”

Page 24

Standard error vs effect size

Statistically significant

Page 25

Observational research results in literature

85% of exposure-outcome pairs have p < 0.05

29,982 estimates

11,758 papers

Page 26

Addressing reproducibility

1. Propensity stratification with systematic variable selection: measured confounding

2. Confidence interval calibration using negative controls: unmeasured confounding

Page 27

Addressing reproducibility

3. Multiple databases, locations, practice types

4. Publish all hypotheses, code, parameters, runs

Page 28

Addressing reproducibility

5. Carry out on aligned hypotheses at scale

Page 29

Estimates are in line with expectations

11% of exposure-outcome pairs have calibrated p < 0.05
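
The idea behind a calibrated p-value can be sketched in a few lines. This is a deliberately simplified illustration, not the method used to produce the figures in these slides: it assumes the estimates for negative controls (exposure-outcome pairs believed to have no causal effect) follow a normal distribution on the log scale, fits that distribution from their plain mean and standard deviation, and then tests a new estimate against it. A fuller treatment would also account for each negative control's own standard error when fitting the null. All numbers in the usage lines are made up.

```python
from math import log, sqrt
from statistics import NormalDist, mean, pstdev


def traditional_p(log_rr, se):
    """Two-sided p-value against the theoretical null (no effect, no bias)."""
    z = abs(log_rr) / se
    return 2 * (1 - NormalDist().cdf(z))


def fit_empirical_null(negative_control_log_rrs):
    """Crude empirical null: mean and SD of the negative-control estimates."""
    return mean(negative_control_log_rrs), pstdev(negative_control_log_rrs)


def calibrated_p(log_rr, se, null_mean, null_sd):
    """Two-sided p-value against the empirical null, adding its spread to the sampling error."""
    z = abs(log_rr - null_mean) / sqrt(null_sd ** 2 + se ** 2)
    return 2 * (1 - NormalDist().cdf(z))


# Made-up log relative risks for five negative controls, biased away from zero.
negative_controls = [0.1, 0.3, 0.2, 0.4, 0.0]
null_mean, null_sd = fit_empirical_null(negative_controls)

# A hypothetical estimate of interest: relative risk 1.5 with standard error 0.1.
print(traditional_p(log(1.5), 0.1))
print(calibrated_p(log(1.5), 0.1, null_mean, null_sd))
```

When the negative controls themselves drift away from a relative risk of 1, an estimate that looks significant against the theoretical null may no longer be significant once that residual bias is taken into account.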

Page 30

OHDSI LEGEND Hypertension Study

Page 31

Whelton et al., Hypertension 2018

Page 32

Evidence to support the guideline

• 40 randomized trials

• Most decisions are “expert opinion”

Page 33

Comparisons of hypertension treatments

Item | Theoretical | Observed (n > 2,500)
Single ingredients | 58 | 39
Single ingredient comparisons | 58 * 57 = 3,306 | 1,296
Single drug classes | 15 | 13
Single class comparisons | 15 * 14 = 210 | 156
Dual ingredients | 58 * 57 / 2 = 1,653 | 58
Single vs duo drug comparisons | 58 * 1,653 = 95,874 | 3,810
Dual classes | 15 * 14 / 2 = 105 | 32
Single vs duo class comparisons | 15 * 105 = 1,575 | 832
Duo vs duo drug comparisons | 1,653 * 1,652 = 2,730,756 | 2,784
Duo vs duo class comparisons | 105 * 104 = 10,920 | 992
… | … | …
Total comparisons | 2,843,250 | 10,278

Item | Theoretical | Observed (n > 2,500)
Outcomes of interest | 58 | 58
Target-comparator-outcomes | 2,843,250 * 58 = 164,908,500 | 587,020
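
The theoretical column of the comparison table follows from simple counting, which the short Python check below reproduces. It assumes, as the table's formulas suggest, that comparisons are ordered (target, comparator) pairs while dual therapies are unordered pairs. The total of 2,843,250 is copied from the table rather than recomputed, because it also covers the rows elided as "…".

```python
def ordered_pairs(n):
    """Number of ordered (target, comparator) pairs among n distinct items."""
    return n * (n - 1)


def unordered_pairs(n):
    """Number of unordered two-item combinations among n distinct items."""
    return n * (n - 1) // 2


INGREDIENTS = 58
CLASSES = 15
OUTCOMES = 58

dual_ingredients = unordered_pairs(INGREDIENTS)  # 58 * 57 / 2
dual_classes = unordered_pairs(CLASSES)          # 15 * 14 / 2

theoretical = {
    "Single ingredient comparisons": ordered_pairs(INGREDIENTS),
    "Single class comparisons": ordered_pairs(CLASSES),
    "Single vs duo drug comparisons": INGREDIENTS * dual_ingredients,
    "Single vs duo class comparisons": CLASSES * dual_classes,
    "Duo vs duo drug comparisons": ordered_pairs(dual_ingredients),
    "Duo vs duo class comparisons": ordered_pairs(dual_classes),
}

# Copied from the table; it includes rows that the slide elides.
TOTAL_COMPARISONS = 2_843_250
target_comparator_outcomes = TOTAL_COMPARISONS * OUTCOMES

for name, count in theoretical.items():
    print(f"{name}: {count:,}")
print(f"Target-comparator-outcomes: {target_comparator_outcomes:,}")
```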

Page 34

Page 35

Page 36

OHDSI in Action

• Patient-level prediction

Page 37

Patient-level prediction: Stroke risk in atrial fibrillation

[Figure: scatter plot of model variables. Horizontal axis: Prevalence in patients without the outcome. Vertical axis: Prevalence in patients with the outcome. Size: value; Red: positive; Green: negative.]

The OHDSI approach lets the model choose from all conditions and drugs

247 variables out of 16900 including:
1. all the CHADS2 (afib stroke risk) markers
2. plus some other variables that make clinical sense (ex: brain cancer, smoking)
3. plus some other variables that warrant further exploration (ex: antiepileptic, COPD)

Page 38

Model Discrimination

[Figure: AUC (axis from 0.50 to 1.00) by database (CCAE, MDCD, MDCR, OPTUM) for five outcomes (AMI, Hypothyroidism, Stroke, Diarrhea, Nausea) and three model types (Gradient boosting, Random forest, Regularized regression).]

Page 39

Transportability Assessment: Stroke

[Figure: AUC (axis from 0.50 to 1.00) for Random Forest, Gradient Boosting, and Regularized Regression models across the databases CCAE, MDCD, MDCR, and OPTUM.]

Transportability to MDCR is low
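
For readers unfamiliar with the metric, the sketch below shows one way AUC and a transportability table could be computed. It is an illustration under stated assumptions, not the code behind the figures: `auc` uses the pairwise definition (the chance that a randomly chosen patient with the outcome is scored above one without it), and `transportability_matrix` assumes hypothetical `models` (scoring functions keyed by the database they were trained on) and `datasets` (features and 0/1 labels keyed by database).

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def transportability_matrix(models, datasets):
    """AUC of every model (keyed by training database) on every evaluation database."""
    return {
        (train_db, test_db): auc(labels, [model(x) for x in features])
        for train_db, model in models.items()
        for test_db, (features, labels) in datasets.items()
    }
```

The pairwise loop is quadratic in the number of patients, so it suits small examples only; a rank-based computation gives the same value more efficiently on real data.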

Page 40

Conclusions

• It is feasible to create an enormous international research network

• Sites will volunteer to run studies

• Completely open

– Data model, methods, tools

• Concrete approach to address the credibility crisis

• Prediction

– It’s not ROC area

– New, useful information

Page 41

Join the journey

http://ohdsi.org