Southeastern Diabetes Initiative and Electronic Health Records April 28, 2015 Michael Pencina, PhD Director of Biostatistics, Duke Clinical Research Institute.

Center for Predictive Medicine (CPM)

Mission Statement:

To advance personalized medicine through development and application of novel approaches to quantify and communicate risk.

Composition:

Faculty and staff statisticians, clinicians, and informaticists from DCRI; scientists from Electrical and Computer Engineering and the Information Initiative at Duke

Southeastern Diabetes Initiative

Supporting a platform for real-time monitoring and evaluation of population health through spatially enabled data architecture and analytics

CENTRAL OBJECTIVES

• Improve healthcare delivery at the individual and population levels

• Improve health outcomes and quality of life

• Deploy multiple levels of intervention, stratified by risk factors of individual patients

• Reduce overall healthcare costs for populations by reducing hospital and ED admissions and major procedures

THEMES

• Use of electronic health record and publicly available data to monitor the health of a population

• Use of analytics to target interventions where they are most needed

• Longitudinal monitoring through data sharing with multiple health provider partners

Intervention Spectrum

[Figure: Intervention Spectrum, running from lower intensity to higher intensity. Labels: missing data; monthly data extract + risk algorithms.]

Data Driven Approach to Implementation

Risk levels (columns): LOW RISK, MODERATE RISK, HIGH RISK

Individual

• Access to community resources

• Educational materials

• Access to evidence-based primary and specialty care (provider education)

• Telephone-based interventions

• Mobile technology interventions

• Diabetes self-management classes

• Clinical practice QI

• In-home team-based care

Neighborhood

• Grocery store tours

• Faith-based access to education and screening

• Faith-based and neighborhood interventions

• Neighborhood canvassing

County

• Public awareness (communication, social media)

• Integration of resources

• Evaluate access to healthcare in community

Region

• Dissemination of best practices

Training, Research, Education

Data Driven Approach to Risk Prediction

Use data from 2010 to predict serious outcomes in 2011

Validate with data from other years (2007, 2008, 2009, 2012)

Defining “serious outcome”

• Inpatient with dx of MI, cardiac arrest, heart failure, ventricular fibrillation, ischemic cardiomyopathy, CAD, revascularization procedure, stroke, vascular disease, PVD, DM renal complications, kidney disease, dialysis, amputation, foot infection/ulcer

• Death

Data categories

• Clinical data

• Laboratory values

• Demographics

• Utilization
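The predict-next-year design above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the patient IDs, and the function name `label_cohort` are hypothetical, and pairing each predictor year with the year that follows it is an assumption generalized from the 2010 to 2011 example on the slide.

```python
from datetime import date

# Hypothetical, simplified records: (patient_id, encounter_date, serious_outcome_flag).
encounters = [
    ("p1", date(2010, 3, 1), False),
    ("p1", date(2011, 6, 9), True),
    ("p2", date(2010, 7, 15), False),
    ("p2", date(2011, 2, 2), False),
    ("p3", date(2008, 5, 5), False),
    ("p3", date(2009, 4, 20), True),
    ("p3", date(2010, 11, 30), False),
]

def label_cohort(records, predictor_year):
    """Label every patient seen in predictor_year by whether a serious
    outcome was recorded in the following year (assumed pairing)."""
    outcome_year = predictor_year + 1
    cohort = sorted({pid for pid, d, _ in records if d.year == predictor_year})
    events = {pid for pid, d, serious in records
              if d.year == outcome_year and serious}
    return {pid: pid in events for pid in cohort}

# Development window: 2010 predictors, 2011 outcomes.
development = label_cohort(encounters, 2010)
# A validation window built the same way from other years.
validation = label_cohort(encounters, 2008)
```

Keeping the window logic in one function means the validation cohorts are built with exactly the same rules as the development cohort.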

SEDI Intervention Flow Chart

EHR data and provider referrals → Risk Algorithm → High risk / Moderate risk / Low risk

Telephone Module Intervention

• 12 monthly calls by interventionist

• Mailers reinforcing/supplementing information covered during calls

Electronic data capture:

• Medication adherence

• Side effects

• Routine exam/test compliance

• Weight/Tobacco/Exercise goals

High-risk Intervention

• 5 sets of home visits by team, occurring every 6 months

• Assistance also provided between visits

• Teams include nurses, pharmacists, social workers, dietitians, patient navigators, community health workers

Electronic data capture:

• Demographics

• Vital signs/physical exam

• Laboratory results

• ED/Inpatient visits

• Medication adherence

• Patient activation

• Health literacy

[Map legend: ≥90%, 65-89%, ≤65%]

Quitman County, MS

Durham County, NC

Cabarrus County, NC

Mingo County, WV

Community-based events, diabetes self-management classes, quality improvement with current sources of care, media activities, cooking demonstrations, etc.

Data collected: Participation and demographics
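The routing step of the flow chart, from a risk score to an intervention, might look like the sketch below. The slides do not state the cutoffs that separate the tiers, so both cutoffs are required arguments and the values in the usage lines are purely illustrative. The mapping of tiers to interventions (home visits for high risk, the telephone module for moderate risk, community-based activities for low risk) is a reading of the slides, not a documented rule, and `assign_tier` and `INTERVENTION` are hypothetical names.

```python
def assign_tier(predicted_risk, moderate_cutoff, high_cutoff):
    """Map a predicted probability to a risk tier.
    Both cutoffs are hypothetical inputs; the source does not give them."""
    if not 0.0 <= moderate_cutoff < high_cutoff <= 1.0:
        raise ValueError("need 0 <= moderate_cutoff < high_cutoff <= 1")
    if predicted_risk >= high_cutoff:
        return "high"
    if predicted_risk >= moderate_cutoff:
        return "moderate"
    return "low"

# Assumed tier-to-intervention mapping, inferred from the slide layout.
INTERVENTION = {
    "high": "home visits by a multidisciplinary team",
    "moderate": "telephone module (monthly calls and mailers)",
    "low": "community-based events and classes",
}

# Illustrative cutoffs only.
tier = assign_tier(0.42, moderate_cutoff=0.10, high_cutoff=0.30)
plan = INTERVENTION[tier]
```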

Ensuring our computable phenotypes are accurate and valid

Challenges to valid EHR-based computable phenotypes

Our goal is to implement algorithms within the EHR that accurately identify patients with a given phenotype and those without. But –

• Limitations of EHR data

• Variation in published authoritative source computable phenotype definitions/algorithms

Limitations of EHR Data Can Result in Inaccurate/biased Computable Phenotypes

• Frequent missingness (often non-random)

-- Data are encounter-based: only recorded during healthcare episodes (bias: healthiest excluded)

-- Patients move around and/or go elsewhere for care (lost to follow-up)

• Inaccurate or uninterpretable

-- Errors are common (most data entered by busy providers during visit, or from recall)

-- Based on coding that is influenced by billing – systematic biases

-- Uninterpretable data (e.g., units of measurement missing, qualitative assessments)

• Complexity and inconsistency

-- Clinical definitions, coding rules, data collection systems vary over time

-- Data collection can vary by providers at different locations

-- Much information is non-coded and stored in narrative notes

Variation in Published (Authoritative Source) Computable Phenotype Definitions

The CPM-SEDI Phenotype Development Process

The Data

Source Data for the Platform

Electronic Health Record Data

Core Clinical Domains

1. Patient Demographics

2. Encounters

3. Diagnoses

4. Procedures

5. Labs

6. Vital Signs

7. Medications

8. Social History

Research data stored in electronic data capture systems

Publicly available census data – community resources, social data, resource and environmental data

Duke’s Decision Support Repository

Medicare and Medicaid Claims Data

NATIONAL CENTER FOR GEOSPATIAL MEDICINE

datamart

A Continuum of Informatics Services to Support the Analysis Pipeline

Phenotyping

Known Diabetic Patients >35 in 2010 = 14,330

With Any Encounter in 2011 = 11,548

With An Inpatient Encounter in 2011 = 2,488

With A Serious Outcome in 2011 = 1,742

80% Modeling Set

12% of 2010 patients; 15% of modeling set
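The percentages can be checked against the counts with a few lines of arithmetic. One assumption is involved: the slide does not say which group forms the "modeling set", but the numbers are consistent with it being the patients with any encounter in 2011. The variable names are hypothetical.

```python
# Counts copied from the slide.
known_diabetics_2010 = 14_330
any_encounter_2011 = 11_548
inpatient_2011 = 2_488
serious_outcome_2011 = 1_742

def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

# Assumption: "modeling set" = patients with any 2011 encounter.
modeling_share = pct(any_encounter_2011, known_diabetics_2010)       # about 80%
outcome_vs_2010 = pct(serious_outcome_2011, known_diabetics_2010)    # about 12%
outcome_vs_modeling = pct(serious_outcome_2011, any_encounter_2011)  # about 15%
```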

Outcome Definition

Serious Outcome = EITHER: at least 1 inpatient stay in 2011 with a diagnosis of at least one of:

• MI, cardiac arrest, ventricular fibrillation, ischemic cardiomyopathy, CAD, revascularization procedure

• stroke

• vascular disease, PVD

• diabetic renal complications, kidney disease, dialysis

• amputation, foot infection/ulcer

OR: death
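As a rough sketch, the outcome definition translates into a single boolean rule. The diagnosis labels are copied from the slide. The input layout (a list of inpatient stays with diagnosis labels already attached) and the function name `serious_outcome` are assumptions, and the mapping from raw diagnosis or procedure codes to these labels is not shown in the source and is not attempted here.

```python
# Diagnosis categories as listed in the outcome definition.
SERIOUS_DX = {
    "MI", "cardiac arrest", "ventricular fibrillation",
    "ischemic cardiomyopathy", "CAD", "revascularization procedure",
    "stroke", "vascular disease", "PVD",
    "diabetic renal complications", "kidney disease", "dialysis",
    "amputation", "foot infection/ulcer",
}

def serious_outcome(inpatient_stays, died, year=2011):
    """Return True if the patient died, or had at least one inpatient stay
    in `year` carrying at least one of the listed diagnoses.

    inpatient_stays: list of (stay_year, set_of_diagnosis_labels)  (assumed layout)
    died: True if death was recorded in the outcome year
    """
    if died:
        return True
    return any(stay_year == year and bool(dx & SERIOUS_DX)
               for stay_year, dx in inpatient_stays)
```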

Preliminary Logistic Regression Model: Factors

• Potential factors were identified by their predictive power in the absence of other factors

• Factors included the following categories:

-- Clinical diagnoses

-- Demographics: age, sex

-- Insurance status

-- Hospital / clinic utilization

-- Diabetes medications

-- Labs: max random glucose, min random glucose, max HbA1c, max systolic b/p, max HDL, min LDL

• Lab values had a significant number of missing values. In this preliminary analysis, we simply added an N/A indicator variable to the model. E.g.

-- Glu.max was the maximum random glucose if present, 0 if not

-- Glu.max.na was 0 if a random glucose value was present, 1 if not

• We also allowed interaction of the lab values and N/A indicator with other (fully present) variables. (Interaction terms implicitly create a non-constant imputation formula.)

• 52 distinct betas were estimated, including those for missing values
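The N/A-indicator encoding described for the lab values can be sketched as follows. The column names follow the slide's `glu.max` / `glu.max.na` pattern; the helper names `encode_labs` and `add_interaction`, the dictionary layout, and the example values are assumptions made for illustration.

```python
def encode_labs(labs, lab_names):
    """For each lab, emit the value (0 if missing) plus a .na flag
    (1 if missing, 0 if present)."""
    row = {}
    for name in lab_names:
        value = labs.get(name)
        missing = value is None
        row[name] = 0.0 if missing else float(value)
        row[name + ".na"] = 1 if missing else 0
    return row

def add_interaction(row, left, right):
    """Add a product term between two columns already in the row."""
    row[f"{left} * {right}"] = row[left] * row[right]
    return row

# Hypothetical patient: glucose observed, HbA1c never measured.
row = encode_labs({"glu.max": 212, "a1cmax": None}, ["glu.max", "a1cmax"])
row["sex.m"] = 1  # a fully observed covariate
add_interaction(row, "a1cmax.na", "sex.m")
```

Because a missing lab contributes 0 through its value column, the `.na` flag and its interactions carry the whole effect for patients without that measurement.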

Logistic Regression Model: Betas Grouped By Factor Type

Factor                          Estimate   z value   Pr(>|z|)   Signif.
kidney_disease                   1.0E+00       9.0    2.0E-16   ***
heart_failure                    6.7E-01       7.6    2.3E-14   ***
(Intercept)                     -1.3E+01      -7.3    4.1E-13   ***
cad                              6.8E-01       6.1    9.0E-10   ***
age1.5                           1.6E-02       5.4    5.5E-08   ***
metformin                       -4.7E-01      -5.2    2.1E-07   ***
sex.m                            1.4E+00       4.2    2.3E-05   ***
stroke                           5.9E-01       3.9    7.9E-05   ***
obesity                         -3.4E-01      -3.8     0.0001   ***
copd                             3.8E-01       3.8     0.0002   ***
alcohol                         -5.8E-01      -2.6     0.0083   **
depression                       1.6E-01       1.6     0.1012
dialysis                         3.3E-01       1.8     0.0742   .
foot                             4.2E-01       2.4     0.0165   *
pvd | vascular_disease           3.0E-01       3.2     0.0016   **
revasc_proc                      1.6E-01       1.5     0.1422
smoker                           2.3E-01       2.2     0.0247   *
substance                        2.5E-01       1.3     0.1833

medicaid                         5.0E-01       3.4     0.0008   ***
medicare & age < 63              5.7E-01       4.6    4.4E-06   ***
medicare                         5.9E-02       0.6     0.5335

cad:stroke                      -4.7E-01      -2.4     0.0159   *
cad:metformin                    5.0E-01       3.3     0.0011   **
cad * kidney_disease            -5.8E-01      -3.7     0.0002   ***
kidney_disease * stroke         -5.8E-01      -2.9     0.0037   **
kidney_disease * medicaid        8.0E-01       2.7     0.0079   **

log(1 + nEI)                     1.8E-01       1.6     0.1061
log(1 + nEonly)                  1.4E-01       2.3     0.0225   *

insulin                          1.8E-01       2.3     0.0231   *
pioglitazone | rosiglitazone    -2.4E-01      -1.8     0.0676   .

glu.max                          1.1E+00       4.2    2.2E-05   ***
glu.max * age1.5                -1.5E-03      -3.3     0.0008   ***
glu.max.na                       4.6E+00       3.1     0.0017   **
glu.max.na * age1.5             -6.4E-03      -2.6     0.0099   **
glu.max.na * hypertension       -3.9E-01      -1.7     0.0828   .

glu.min                         -8.2E-03      -3.9    9.1E-05   ***
glu.min * nI                     2.4E-03       4.4    1.3E-05   ***
glu.min^2                        2.2E-05       3.5     0.0004   ***
glu.min.na * kidney_disease      1.4E+00       2.7     0.0076   **

a1cmax                           5.9E-02       2.0     0.0445   *
a1cmax * sex.m                  -1.4E-01      -3.6     0.0004   ***
a1cmax.na                        6.2E-01       2.5     0.0127   *
a1cmax.na * sex.m               -1.0E+00      -2.9     0.0037   **

sys.max                          2.8E-02       3.8     0.0001   ***
sys.max * age1.5                -3.3E-05      -2.6     0.0086   **
sys.max.na                       4.4E+00       3.8     0.0001   ***
sys.max.na * age1.5             -5.0E-03      -2.5     0.0139   *

hdl.max                         -4.0E-01      -3.1     0.0019   **
hdl.max * sex.m                 -9.4E-02      -2.2     0.0304   *
hdl.max.na                      -2.0E+00      -3.3     0.0010   ***

(ldl.min - 100)^2                4.5E-05       2.5     0.0143   *
ldl.min.na                       1.7E-01       0.4     0.6949
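To make concrete how interaction terms on an N/A indicator act as a non-constant imputation, the derivation below works through the glucose terms of the table (glu.max, glu.max * age1.5, glu.max.na, glu.max.na * age1.5, glu.max.na * hypertension). The symbols are introduced here for illustration and are not the authors' notation: g is glu.max, m is glu.max.na, a is the age1.5 term, and h is the hypertension indicator.

```latex
% Logistic model: probability of a serious outcome given covariates x
\[
\Pr(Y = 1 \mid x) = \frac{1}{1 + e^{-\eta}}, \qquad
\eta = \beta_0 + \sum_j \beta_j x_j
\]

% Contribution of the glucose terms to the linear predictor
\[
\eta_{\mathrm{glu}} = \beta_g\, g + \beta_{ga}\, g\, a
  + \beta_m\, m + \beta_{ma}\, m\, a + \beta_{mh}\, m\, h
\]

% Glucose observed (m = 0) versus missing (g = 0, m = 1)
\[
\eta_{\mathrm{glu}}^{\mathrm{obs}} = (\beta_g + \beta_{ga}\, a)\, g, \qquad
\eta_{\mathrm{glu}}^{\mathrm{miss}} = \beta_m + \beta_{ma}\, a + \beta_{mh}\, h
\]

% Equating the two gives the glucose value the model implicitly imputes
\[
g^{*}(a, h) = \frac{\beta_m + \beta_{ma}\, a + \beta_{mh}\, h}{\beta_g + \beta_{ga}\, a},
\qquad \beta_g + \beta_{ga}\, a \neq 0
\]
```

Without the interaction terms the implied value would be the constant \(\beta_m / \beta_g\). With them, the implied value depends on the patient's age term and hypertension status.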

Logistic Regression Model: ROC and Calibration

[Figure: Calibration Plot. x-axis: Predicted Fraction (quantile=20); y-axis: Observed Fraction.]

[Figure: ROC. x-axis: 1-Specificity; y-axis: Sensitivity. AUC = 0.836]
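For readers who want to reproduce this kind of evaluation, the sketch below computes the two summaries in plain Python. It is not the authors' code. It assumes that "quantile=20" means patients are sorted by predicted risk and cut into 20 groups of near-equal size. The function names and the toy data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_bins(scores, labels, n_bins=20):
    """Sort by predicted risk, split into n_bins near-equal groups, and
    return (mean predicted risk, observed event fraction) for each group."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:  # skip empty groups when n < n_bins
            mean_pred = sum(s for s, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            points.append((mean_pred, observed))
    return points

# Toy data, not from the study.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
toy_bins = calibration_bins(toy_scores, toy_labels, n_bins=2)
```

On a calibration plot, points near the diagonal mean that the predicted and observed fractions agree within each group.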

Next Steps

• Incorporate additional labs, particularly creatinine and triglycerides

• But also look at hematocrit, hemoglobin and other standard lab values

• Add non-diabetes meds

Summary: Opportunities for Collaboration

• Data are still being curated and updated

• New phenotypes are being developed and validated

• Unlimited possibilities to expand

• Strong analytic team eager to take on cutting-edge questions and hypotheses
