Southeastern Diabetes Initiative and Electronic Health Records
April 28, 2015
Michael Pencina, PhD, Director of Biostatistics, Duke Clinical Research Institute
Center for Predictive Medicine (CPM)
Mission Statement:
To advance personalized medicine through development and application of novel approaches to quantify and communicate risk.
Composition:
Faculty and Staff Statisticians, Clinicians and Informaticists from DCRI, Scientists from Electrical and Computer Engineering and Information Initiative at Duke
Southeastern Diabetes Initiative
Supporting a platform for real-time monitoring and evaluation of population health through spatially enabled data architecture and analytics
CENTRAL OBJECTIVES
• Improve healthcare delivery at the individual and population levels
• Improve health outcomes and quality of life
• Deploy multiple levels of intervention, stratified by risk factors of individual patients
• Reduce overall healthcare costs for populations by reducing hospital and ED admissions and major procedures
THEMES
• Use of electronic health record and publicly available data to monitor the health of a population
• Use of analytics to target interventions where they are most needed
• Longitudinal monitoring through data sharing with multiple health provider partners
Data Driven Approach to Implementation
[Figure: Intervention Spectrum from lower intensity to higher intensity, driven by a monthly data extract + risk algorithms; diagram label: Missing data]
Spectrum: LOW RISK, MODERATE RISK, HIGH RISK
Individual
• Access to community resources
• Educational materials
• Access to evidence-based primary and specialty care (provider education)
• Telephone-based interventions
• Mobile technology interventions
• Diabetes self-management classes
• Clinical practice QI
• In-home team-based care
Neighborhood
• Grocery store tours
• Faith-based access to education and screening
• Faith-based and neighborhood interventions
• Neighborhood canvassing
County
• Public awareness (communication, social media)
• Integration of resources
• Evaluate access to healthcare in community
Region
• Dissemination of best practices
Training, Research, Education
Data Driven Approach to Risk Prediction
• Use data from 2010 to predict serious outcomes in 2011
• Validate with data from other years (2007, 2008, 2009, 2012)
Defining “serious outcome”:
• inpatient stay with a diagnosis of MI, cardiac arrest, heart failure, ventricular fibrillation, ischemic cardiomyopathy, CAD, revascularization procedure, stroke, vascular disease, PVD, DM renal complications, kidney disease, dialysis, amputation, foot infection/ulcer
• death
Data used as predictors:
• Clinical data
• Laboratory values
• Demographics
• Utilization
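The design above pairs one baseline year of predictors with the following year's outcomes, then repeats the exercise on other year pairs. A minimal sketch of that windowing, assuming a hypothetical `Record` type (patient ID, calendar year, record kind) that is not the platform's actual schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical EHR event: one row per patient per coded item."""
    patient_id: str
    year: int
    kind: str  # illustrative labels such as "dx", "lab", "death"


def window_split(records, baseline_year):
    """Return (predictor records, outcome records): predictors come from the
    baseline year, outcomes from the following calendar year."""
    predictors = [r for r in records if r.year == baseline_year]
    outcomes = [r for r in records if r.year == baseline_year + 1]
    return predictors, outcomes


def year_pairs(years):
    """All consecutive (baseline year, outcome year) pairs present in `years`."""
    ys = sorted(set(years))
    return [(a, b) for a, b in zip(ys, ys[1:]) if b == a + 1]


if __name__ == "__main__":
    pairs = year_pairs(range(2007, 2013))
    development = (2010, 2011)
    validation = [p for p in pairs if p != development]
    print("development:", development)
    print("validation:", validation)
```

Keeping the predictor and outcome windows disjoint in time mirrors how the model would be used prospectively: only information available at the end of the baseline year feeds the prediction.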
SEDI Intervention Flow Chart
EHR data, Provider Referrals → Risk Algorithm → High risk / Moderate risk / Low risk
Telephone Module Intervention
• 12 monthly calls by interventionist
• Mailers reinforcing/supplementing information covered during calls
Electronic data capture:
• Medication adherence
• Side effects
• Routine exam/test compliance
• Weight/Tobacco/Exercise goals
High-risk Intervention
• 5 sets of home visits by team, occurring every 6 months
• Assistance also provided between visits
• Teams include nurses, pharmacists, social workers, dietitians, patient navigators, community health workers
Electronic data capture:
• Demographics
• Vital signs/physical exam
• Laboratory results
• ED/Inpatient visits
• Medication adherence
• Patient activation
• Health literacy
≥90% 65-89% ≤65%
Sites: Quitman County, MS; Durham County, NC; Cabarrus County, NC; Mingo County, WV
Community-based events, diabetes self-management classes, quality improvement with current sources of care, media activities, cooking demonstrations, etc.
Data collected: participation and demographics
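The flow chart routes each patient to an intervention tier from the risk algorithm's output. The chart shows the values ≥90%, 65-89%, and ≤65% without defining them; the sketch below assumes, as a hypothetical reading, that they are percentile cutoffs of predicted risk (top 10% high, above the 65th percentile moderate, the rest low). The function names and the handling of the boundary at exactly 65 are illustrative choices, not the SEDI implementation.

```python
from bisect import bisect_left


def percentile_ranks(scores):
    """For each score, the percent of patients with a strictly lower score."""
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_left(ordered, s) / n for s in scores]


def assign_tier(pct, high_cut=90.0, low_cut=65.0):
    """Map a risk percentile to a tier (hypothetical cutoffs).

    >= high_cut -> "high"; <= low_cut -> "low"; otherwise "moderate".
    """
    if pct >= high_cut:
        return "high"
    if pct <= low_cut:
        return "low"
    return "moderate"


if __name__ == "__main__":
    scores = [i / 100 for i in range(100)]  # toy predicted risks
    tiers = [assign_tier(p) for p in percentile_ranks(scores)]
    print({t: tiers.count(t) for t in ("low", "moderate", "high")})
```

Ranking by percentile rather than by a fixed probability keeps the size of each tier stable from one monthly extract to the next, which matters when intervention capacity (calls, home visits) is fixed.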
Ensuring our computable phenotypes are accurate and valid
Challenges to valid EHR-based computable phenotypes
Our goal is to implement algorithms within the EHR that accurately identify patients with a given phenotype and those without. But –
• Limitations of EHR data
• Variation in published authoritative source computable phenotype definitions/algorithms
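Whether an algorithm "accurately identifies patients with a given phenotype and those without" is usually judged against a reference standard such as manual chart review. A minimal sketch of the resulting 2x2 summary; the function name and inputs are hypothetical:

```python
def validation_metrics(algorithm_flags, gold_flags):
    """Compare a computable phenotype's 0/1 flags with a reference standard
    (e.g. chart review) and return sensitivity, specificity, PPV, and NPV."""
    if len(algorithm_flags) != len(gold_flags):
        raise ValueError("inputs must be the same length")
    pairs = list(zip(algorithm_flags, gold_flags))
    tp = sum(1 for a, g in pairs if a and g)
    fp = sum(1 for a, g in pairs if a and not g)
    fn = sum(1 for a, g in pairs if not a and g)
    tn = sum(1 for a, g in pairs if not a and not g)

    def ratio(num, den):
        # NaN when the denominator is empty (e.g. no gold-standard positives)
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # phenotype-positive patients found
        "specificity": ratio(tn, tn + fp),  # phenotype-negative patients excluded
        "ppv": ratio(tp, tp + fp),          # flagged patients who truly qualify
        "npv": ratio(tn, tn + fn),          # unflagged patients who truly do not
    }


if __name__ == "__main__":
    print(validation_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```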
Limitations of EHR Data Can Result in Inaccurate or Biased Computable Phenotypes
• Frequent missingness (often non-random)
-- Data are encounter-based:
recorded only during healthcare episodes (bias: the healthiest patients are largely excluded)
-- Patients move around and/or go elsewhere for care (lost to follow-up)
• Inaccurate or uninterpretable
-- Errors are common (most data are entered by busy providers during the visit, or from recall)
-- Coding is influenced by billing, introducing systematic biases
-- Uninterpretable data (e.g., missing units of measurement, qualitative assessments)
• Complexity and inconsistency
-- Clinical definitions, coding rules, data collection systems vary over time
-- Data collection can vary by providers at different locations
-- Much information is non-coded and stored in narrative notes
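Encounter-based recording can bias any summary computed from observed values when sicker patients are seen more often. The toy simulation below makes that concrete; the HbA1c distribution and the visit-probability rule are invented for illustration and are not estimated from SEDI data:

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Return (mean HbA1c in the whole population, mean among patients whose
    value was recorded, fraction recorded) for a toy population in which
    sicker patients are more likely to have an encounter."""
    rng = random.Random(seed)
    all_values, recorded = [], []
    for _ in range(n):
        a1c = rng.gauss(7.0, 1.2)  # invented HbA1c distribution (%)
        # invented rule: visit probability rises with HbA1c, floored at 5%
        p_visit = min(1.0, max(0.05, 0.1 + 0.15 * (a1c - 6.0)))
        all_values.append(a1c)
        if rng.random() < p_visit:  # a lab is recorded only at an encounter
            recorded.append(a1c)
    return (statistics.mean(all_values), statistics.mean(recorded),
            len(recorded) / n)


if __name__ == "__main__":
    true_mean, observed_mean, frac = simulate()
    print(f"population mean {true_mean:.2f}, recorded mean {observed_mean:.2f}, "
          f"recorded {frac:.0%}")
```

The recorded mean overstates the population mean because missingness depends on the value itself, which is the "often non-random" case listed above.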
Variation in Published (Authoritative Source) Computable Phenotype Definitions
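To see why variation between published definitions matters, the sketch below applies two hypothetical diabetes definitions to the same toy patients: one based only on diagnosis codes, and one that also accepts an HbA1c of at least 6.5% (the ADA diagnostic threshold) or a diabetes medication. Neither is the CPM-SEDI definition; the field names and patients are invented.

```python
def dx_only(p):
    # Hypothetical definition A: at least two diabetes diagnosis codes
    return p["n_dm_dx"] >= 2


def dx_lab_or_med(p):
    # Hypothetical definition B: any diabetes diagnosis code, HbA1c >= 6.5%,
    # or an active diabetes medication
    a1c = p["max_a1c"]
    return bool(p["n_dm_dx"] >= 1
                or (a1c is not None and a1c >= 6.5)
                or p["on_dm_med"])


def disagreements(patients, rule_a, rule_b):
    """IDs of patients classified differently by the two definitions."""
    return [p["id"] for p in patients if rule_a(p) != rule_b(p)]


patients = [
    {"id": 1, "n_dm_dx": 3, "max_a1c": 8.1, "on_dm_med": True},   # both: yes
    {"id": 2, "n_dm_dx": 1, "max_a1c": None, "on_dm_med": False},  # B only
    {"id": 3, "n_dm_dx": 0, "max_a1c": 6.9, "on_dm_med": False},   # B only
    {"id": 4, "n_dm_dx": 0, "max_a1c": 5.4, "on_dm_med": False},   # both: no
]

if __name__ == "__main__":
    print(disagreements(patients, dx_only, dx_lab_or_med))
```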
The CPM-SEDI Phenotype Development Process
The Data
Source Data for the Platform
Electronic Health Record Data
Core Clinical Domains
1. Patient Demographics
2. Encounters
3. Diagnoses
4. Procedures
5. Labs
6. Vital Signs
7. Medications
8. Social History
Research Data stored in electronic data capture systems
Publicly available census data – community resources, social data, resource and environmental data
Duke’s Decision Support Repository
Medicare and Medicaid Claims Data
NATIONAL CENTER FOR GEOSPATIAL MEDICINE
datamart
A Continuum of Informatics Services to Support the Analysis Pipeline
Phenotyping
Known Diabetic Patients >35 in 2010 = 14,330
With Any Encounter in 2011 = 11,548 (80%: the modeling set)
With An Inpatient Encounter in 2011 = 2,488
With A Serious Outcome in 2011 = 1,742 (12% of 2010 patients; 15% of modeling set)
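The percentages in the cohort funnel can be reproduced from the counts. The sketch assumes, as the arithmetic implies but the slide does not state outright, that the 80% modeling set is the group with any encounter in 2011; the dictionary keys are illustrative labels.

```python
cohort = {
    "known diabetic, 2010": 14_330,
    "any encounter, 2011": 11_548,
    "inpatient encounter, 2011": 2_488,
    "serious outcome, 2011": 1_742,
}


def pct(num, den):
    """Percentage rounded to one decimal place."""
    return round(100 * num / den, 1)


base = cohort["known diabetic, 2010"]
modeling = cohort["any encounter, 2011"]
events = cohort["serious outcome, 2011"]

if __name__ == "__main__":
    print(f"any encounter / all 2010 patients: {pct(modeling, base)}%")
    print(f"serious outcome / all 2010 patients: {pct(events, base)}%")
    print(f"serious outcome / modeling set: {pct(events, modeling)}%")
```

The three ratios round to the 80%, 12%, and 15% shown on the slide.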
Serious Outcome = EITHER: at least 1 inpatient stay in 2011 with a diagnosis of at least one of:
• MI, cardiac arrest, ventricular fibrillation, ischemic cardiomyopathy, CAD, revascularization procedure
• stroke
• vascular disease, PVD
• diabetic renal complications, kidney disease, dialysis
• amputation, foot infection/ulcer
OR: death
Outcome Definition
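The outcome definition reduces to a per-patient flag. A minimal sketch, assuming each 2011 inpatient stay has already been mapped from raw diagnosis codes to the category labels on this slide (that mapping, and the function and constant names, are hypothetical). The category list follows this slide; the earlier "serious outcome" slide also lists heart failure.

```python
SERIOUS_DX_CATEGORIES = frozenset({
    "MI", "cardiac arrest", "ventricular fibrillation", "ischemic cardiomyopathy",
    "CAD", "revascularization procedure", "stroke", "vascular disease", "PVD",
    "diabetic renal complications", "kidney disease", "dialysis",
    "amputation", "foot infection/ulcer",
})


def serious_outcome(inpatient_stays_2011, died_2011):
    """True if the patient died in 2011, or had at least one 2011 inpatient
    stay carrying a diagnosis in any serious-outcome category.

    Each stay is an iterable of category labels (mapping from ICD codes to
    these labels is assumed to happen upstream).
    """
    if died_2011:
        return True
    return any(SERIOUS_DX_CATEGORIES & set(stay) for stay in inpatient_stays_2011)


if __name__ == "__main__":
    print(serious_outcome([{"pneumonia"}, {"stroke", "hypertension"}], False))
```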
Preliminary Logistic Regression Model: Factors
• Potential factors were identified by their predictive power in the absence of other factors
• Factors included the following categories:
-- Clinical diagnoses
-- Demographics: age, sex
-- Insurance status
-- Hospital / clinic utilization
-- Diabetes medications
-- Labs: max random glucose, min random glucose, max HbA1c, max systolic BP, max HDL, min LDL
• Lab values had a large number of missing values. In this preliminary analysis, we simply added an N/A indicator variable to the model. For example:
-- glu.max was the maximum random glucose if present, 0 if not
-- glu.max.na was 0 if a random glucose value was present, 1 if not
• We also allowed the lab values and N/A indicators to interact with other (fully observed) variables. (Interaction terms implicitly create a non-constant imputation formula.)
• 52 distinct betas were estimated, including those for missing values
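The missing-indicator encoding, and why interactions make the implicit imputation non-constant, can be sketched as follows. For a patient with a missing value, the lab terms contribute b_na + b_na_age · age^1.5 to the linear predictor; dividing by the slope an observed value would receive, b_x + b_x_age · age^1.5, gives the value the model effectively imputes, and that value changes with age. This assumes "age1.5" in the table means age raised to the power 1.5, omits other interactions (e.g. glu.max.na * hypertension), and uses hypothetical coefficients rather than the fitted ones.

```python
import math


def lab_features(name, value, age):
    """Missing-indicator encoding for one lab summary (e.g. "glu.max").

    name        -> value if observed, else 0
    name + .na  -> 0 if observed, 1 if missing
    plus interactions of both with age**1.5 (assumed meaning of "age1.5").
    """
    observed = value is not None and not math.isnan(value)
    x = float(value) if observed else 0.0
    na = 0.0 if observed else 1.0
    a15 = age ** 1.5
    return {
        name: x,
        f"{name}.na": na,
        f"{name} * age1.5": x * a15,
        f"{name}.na * age1.5": na * a15,
    }


def implied_imputed_value(b_x, b_x_age, b_na, b_na_age, age):
    """Lab value whose contribution to the linear predictor equals the one a
    patient with a missing value receives. Requires a nonzero slope."""
    a15 = age ** 1.5
    return (b_na + b_na_age * a15) / (b_x + b_x_age * a15)


if __name__ == "__main__":
    print(lab_features("glu.max", None, 64))
    # hypothetical coefficients, NOT the fitted model's
    for age in (40, 70):
        print(age, round(implied_imputed_value(0.01, 0.0, 2.0, -0.002, age), 1))
```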
Logistic Regression Model: Betas Grouped By Factor Type

Non-laboratory terms
Factor                          Estimate   z value   Pr(>|z|)   Signif.
kidney_disease                  1.0E+00    9.0       <2.0E-16   ***
heart_failure                   6.7E-01    7.6       2.3E-14    ***
(Intercept)                     -1.3E+01   -7.3      4.1E-13    ***
cad                             6.8E-01    6.1       9.0E-10    ***
age1.5                          1.6E-02    5.4       5.5E-08    ***
metformin                       -4.7E-01   -5.2      2.1E-07    ***
sex.m                           1.4E+00    4.2       2.3E-05    ***
stroke                          5.9E-01    3.9       7.9E-05    ***
obesity                         -3.4E-01   -3.8      0.0001     ***
copd                            3.8E-01    3.8       0.0002     ***
alcohol                         -5.8E-01   -2.6      0.0083     **
depression                      1.6E-01    1.6       0.1012
dialysis                        3.3E-01    1.8       0.0742     .
foot                            4.2E-01    2.4       0.0165     *
pvd | vascular_disease          3.0E-01    3.2       0.0016     **
revasc_proc                     1.6E-01    1.5       0.1422
smoker                          2.3E-01    2.2       0.0247     *
substance                       2.5E-01    1.3       0.1833
medicaid                        5.0E-01    3.4       0.0008     ***
medicare & age < 63             5.7E-01    4.6       4.4E-06    ***
medicare                        5.9E-02    0.6       0.5335
cad:stroke                      -4.7E-01   -2.4      0.0159     *
cad:metformin                   5.0E-01    3.3       0.0011     **
cad * kidney_disease            -5.8E-01   -3.7      0.0002     ***
kidney_disease * stroke         -5.8E-01   -2.9      0.0037     **
kidney_disease * medicaid       8.0E-01    2.7       0.0079     **
log(1 + nEI)                    1.8E-01    1.6       0.1061
log(1 + nEonly)                 1.4E-01    2.3       0.0225     *
insulin                         1.8E-01    2.3       0.0231     *
pioglitazone | rosiglitazone    -2.4E-01   -1.8      0.0676     .

Laboratory terms (value, N/A indicator, and interactions)
Factor                          Estimate   z value   Pr(>|z|)   Signif.
glu.max                         1.1E+00    4.2       2.2E-05    ***
glu.max * age1.5                -1.5E-03   -3.3      0.0008     ***
glu.max.na                      4.6E+00    3.1       0.0017     **
glu.max.na * age1.5             -6.4E-03   -2.6      0.0099     **
glu.max.na * hypertension       -3.9E-01   -1.7      0.0828     .
glu.min                         -8.2E-03   -3.9      9.1E-05    ***
glu.min * nI                    2.4E-03    4.4       1.3E-05    ***
glu.min^2                       2.2E-05    3.5       0.0004     ***
glu.min.na * kidney_disease     1.4E+00    2.7       0.0076     **
a1cmax                          5.9E-02    2.0       0.0445     *
a1cmax * sex.m                  -1.4E-01   -3.6      0.0004     ***
a1cmax.na                       6.2E-01    2.5       0.0127     *
a1cmax.na * sex.m               -1.0E+00   -2.9      0.0037     **
sys.max                         2.8E-02    3.8       0.0001     ***
sys.max * age1.5                -3.3E-05   -2.6      0.0086     **
sys.max.na                      4.4E+00    3.8       0.0001     ***
sys.max.na * age1.5             -5.0E-03   -2.5      0.0139     *
hdl.max                         -4.0E-01   -3.1      0.0019     **
hdl.max * sex.m                 -9.4E-02   -2.2      0.0304     *
hdl.max.na                      -2.0E+00   -3.3      0.0010     ***
(ldl.min - 100)^2               4.5E-05    2.5       0.0143     *
ldl.min.na                      1.7E-01    0.4       0.6949

Signif. codes: *** p < 0.001; ** p < 0.01; * p < 0.05; . p < 0.1
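The columns of the coefficient table are related by standard logistic-regression formulas: exp(Estimate) is an odds ratio, the two-sided p-value is 2(1 - Φ(|z|)), and a patient's predicted risk is the inverse logit of the summed beta · x terms. A small sketch of those relations (helper names are illustrative). For example, the metformin main effect of -0.47 corresponds to an odds ratio of about 0.63 only for patients without CAD, because of the cad:metformin interaction.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in the odds per one-unit increase in a term.
    For a main effect involved in interactions, this holds only where the
    interacting variables equal 0."""
    return math.exp(beta)


def wald_p_two_sided(z):
    """Two-sided p-value for a Wald z statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def predicted_risk(linear_predictor):
    """Inverse logit, written to avoid overflow for large |linear_predictor|."""
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    e = math.exp(linear_predictor)
    return e / (1.0 + e)


if __name__ == "__main__":
    print(round(odds_ratio(-0.47), 2))  # metformin main effect (cad = 0)
    # R's summary() prints p-values below about 2.2e-16 as "<2e-16"
    print(wald_p_two_sided(9.0) < 2e-16)
```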
[Figure: Calibration Plot; x-axis: Predicted Fraction (quantile=20); y-axis: Observed Fraction]
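A calibration plot of this kind sorts patients by predicted risk, splits them into groups (the axis label indicates 20 quantile groups), and compares each group's mean predicted probability with its observed event fraction; a well-calibrated model puts the points near the diagonal. A minimal sketch with equal-size groups; `calibration_table` is a hypothetical helper, not the code behind the figure:

```python
def calibration_table(predicted, observed, n_groups=20):
    """Sort patients by predicted risk, cut them into `n_groups` groups of
    (nearly) equal size, and return (mean predicted, observed fraction) pairs."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must be the same length")
    pairs = sorted(zip(predicted, observed), key=lambda t: t[0])
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_frac = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_pred, obs_frac))
    return rows


if __name__ == "__main__":
    pred = [i / 100 for i in range(100)]
    obs = [1 if i >= 50 else 0 for i in range(100)]
    for mean_pred, obs_frac in calibration_table(pred, obs):
        print(f"{mean_pred:.3f}  {obs_frac:.2f}")
```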
[Figure: ROC curve; x-axis: 1-Specificity; y-axis: Sensitivity; AUC = 0.836]
Logistic Regression Model: ROC and Calibration
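The AUC of 0.836 can be read as the probability that a randomly chosen patient with a serious outcome receives a higher predicted risk than a randomly chosen patient without one. A sketch of the rank-based (Mann-Whitney) computation, with tied predictions counted as one half; this is a standard way to compute AUC, not necessarily the tool used for the figure:

```python
def auc(predicted, observed):
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic.

    Tied predictions receive their average rank, which counts each tied
    positive/negative pair as one half. `observed` holds 0/1 outcomes.
    """
    n = len(predicted)
    order = sorted(range(n), key=lambda i: predicted[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a block of tied predictions
        while j + 1 < n and predicted[order[j + 1]] == predicted[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for o in observed if o == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    rank_sum_pos = sum(r for r, o in zip(ranks, observed) if o == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Sorting once makes this O(n log n), which matters at the scale of the 11,548-patient modeling set; the naive all-pairs comparison would be O(n^2).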
Next Steps
• Incorporate additional labs, particularly creatinine and triglycerides
-- But also look at hematocrit, hemoglobin, and other standard lab values
• Add non-diabetes medications
Summary: Opportunities for Collaboration
• Data continue to be curated and updated
• New phenotypes developed/validated
• Unlimited possibilities to expand
• Strong analytic team eager to take on cutting-edge questions and hypotheses