High-Throughput Phenotyping and Cohort Identification from Electronic Health Records for Clinical and Translational Research

Jyotishman Pathak, PhD, Assistant Professor of Biomedical Informatics

Health Sciences Research Grand Rounds, April 23, 2012

Page 2

High-Throughput Phenotyping from EHRs

Background – The Problem

• Patient recruitment is a major bottleneck in conducting successful clinical research studies
  • 50% of study time is spent on recruitment
• Low participation rates (~5%); studies are underpowered
• Clinicians lack resources to help patients find appropriate studies and trials
• Patients have difficulty finding appropriate studies that are locally available

©2012 MFMER | slide-2

Page 3

Background – Use Cases

• Large-scale genomics research
  • Linking biospecimens and genetic data to personal health data via biorepositories
  • Need large sample sizes for study design
• Population-based epidemiological studies for understanding disease etiology
  • Often limited in scope or population diversity
• Quality metrics and the HITECH Act
  • Pay-for-performance and quality-based incentives
  • Population management and cohort identification are non-trivial

Page 4

Electronic health record (EHR)-driven phenotyping – The Proposed Solution

• EHRs are becoming increasingly prevalent within the U.S. healthcare system
  • Meaningful Use is one of the major drivers
• Overarching goal
  • To develop techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings

Page 5

Advantages: EHR-derived phenotyping

• There is a LOT of information about subjects
  • Demographics, labs, meds, procedures, clinical notes…
• Identification of otherwise latent population differences
• Minimal costs for case ascertainment; no study-specific recruitment
• Records are “retrospectively longitudinal”
• Records reflect the real world and contain many different phenotypes
• Transportability and reuse of phenotype definitions across EHR-enabled sites = power for clinical and research studies

Page 6

Challenges: EHR-derived phenotyping

• There is a LOT of information about subjects…
  • Non-standardized, heterogeneous, unstructured data (compared to protocol-based structured data collection)
• Measured (e.g., demographics) vs. unmeasured (e.g., socio-economic status) population differences
• Hospital specialization and coding practices
• Population/regional market landscape

Page 7

The challenges can be addressed if we:

• Develop techniques for standardization and normalization of clinical data and phenotypes
• Develop techniques for transforming unstructured clinical text into structured representations, and for managing those representations
• Develop techniques for transportability of EHR-driven phenotyping algorithms
• Develop a scalable, robust, and flexible framework for demonstrating all of the above in a “real-world setting”

Page 8

http://gwas.org

Page 9

• Funded by the NHGRI/NIGMS
• Goal: to assess the utility of EHRs as resources for genome science
• Each site includes a biorepository linked to EHRs
• Each project includes informatics, biostatistics, community engagement, ELSI, and genetics experts
• Initial proposals included identifying a primary phenotype of interest in 3,000 subjects and conducting a genome-wide association study at each center: Σ = 18,000
• eMERGE Phase II has a target of developing ~40 phenotype algorithms by the end of 2014
• Algorithm transportability is an integral component

Page 10

EHR-based Phenotyping Algorithms

• Typical components
  • Billing and diagnosis codes
  • Procedure codes
  • Labs
  • Medications
  • Phenotype-specific covariates (e.g., demographics, vitals, smoking status, CASI scores)
  • Pathology
  • Imaging?
• Organized into inclusion and exclusion criteria
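Organizing an algorithm into inclusion and exclusion criteria can be pictured as a set of predicates over a patient record. This is a minimal illustration under stated assumptions, not an eMERGE algorithm: the `Patient` fields, the `PhenotypeAlgorithm` class, and the codes `DX_A`, `MED_X`, and `DX_EXCLUDE` are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Patient:
    # Hypothetical, heavily simplified patient record.
    patient_id: str
    diagnosis_codes: Set[str] = field(default_factory=set)
    medications: Set[str] = field(default_factory=set)


Criterion = Callable[[Patient], bool]


@dataclass
class PhenotypeAlgorithm:
    name: str
    inclusion: List[Criterion]
    exclusion: List[Criterion]

    def matches(self, p: Patient) -> bool:
        # Every inclusion criterion must hold and no exclusion criterion may hold.
        return all(c(p) for c in self.inclusion) and not any(c(p) for c in self.exclusion)

    def cohort(self, patients: List[Patient]) -> List[str]:
        return [p.patient_id for p in patients if self.matches(p)]


# Placeholder codes only; this is not a validated phenotype definition.
example = PhenotypeAlgorithm(
    name="example_phenotype",
    inclusion=[
        lambda p: "DX_A" in p.diagnosis_codes,
        lambda p: "MED_X" in p.medications,
    ],
    exclusion=[lambda p: "DX_EXCLUDE" in p.diagnosis_codes],
)

patients = [
    Patient("p1", {"DX_A"}, {"MED_X"}),
    Patient("p2", {"DX_A", "DX_EXCLUDE"}, {"MED_X"}),
    Patient("p3", {"DX_A"}),
]
print(example.cohort(patients))  # ['p1']
```

Keeping each criterion as its own small predicate lets it be reviewed and refined independently.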


Page 11

EHR-based Phenotyping Algorithms

• Iteratively refine case definitions through partial manual chart review to achieve a PPV of about 95% or higher
• For controls, exclude all potentially overlapping syndromes and possible matches; iteratively refine until the NPV is about 98% or higher
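The refinement targets are simple ratios over the chart-reviewed samples:

- PPV = TP / (TP + FP), computed over algorithm-identified cases.
- NPV = TN / (TN + FN), computed over algorithm-identified controls.

The helper names and review counts below are made up for illustration. The 0.95 and 0.98 defaults mirror the targets stated above.

```python
from typing import Tuple


def ppv(true_positives: int, false_positives: int) -> float:
    """Fraction of algorithm-identified cases confirmed on chart review."""
    return true_positives / (true_positives + false_positives)


def npv(true_negatives: int, false_negatives: int) -> float:
    """Fraction of algorithm-identified controls confirmed on chart review."""
    return true_negatives / (true_negatives + false_negatives)


def meets_targets(case_review: Tuple[int, int], control_review: Tuple[int, int],
                  ppv_target: float = 0.95, npv_target: float = 0.98):
    # case_review = (confirmed cases, rejected cases) among sampled cases
    # control_review = (confirmed controls, missed cases) among sampled controls
    case_ppv = ppv(*case_review)
    control_npv = npv(*control_review)
    return case_ppv >= ppv_target and control_npv >= npv_target, case_ppv, control_npv


ok, p, n = meets_targets((48, 2), (50, 0))
print(ok, p, n)  # True 0.96 1.0
```

In this toy run, 48 of 50 reviewed cases confirmed gives PPV = 0.96, so the case definition would pass. With 46 of 50 confirmed (0.92), it would go back for another refinement round.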


Page 12

Algorithm Development Process

[Diagram: Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; labels: NLP, SQL; Rules; Mappings]

[eMERGE Network]

Page 13

Hypothyroidism: Initial Algorithm

[Flowchart]

Leading to Case 1 / Case 2:
• ICD-9s for hypothyroidism
• Abnormal TSH/FT4
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroid peroxidase)
• Thyroid replacement meds

Leading to Control:
• No ICD-9s for hypothyroidism
• No abnormal TSH/FT4
• No antibodies for TTG/TPO
• No thyroid replacement meds

Additional criteria:
• No secondary causes (e.g., pregnancy, ablation)
• No thyroid-altering medications (e.g., phenytoin, lithium)
• 2+ non-acute visits in 3 yrs
• No hx of myasthenia gravis

[Denny et al., 2012]
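The control arm of this flowchart is the easiest part to express in code: a control must show none of the case evidence. The sketch below is a simplified reading of the slide, not the published eMERGE algorithm. Its assumptions are:

- The record fields are hypothetical.
- The secondary-cause, thyroid-altering medication, visit, and myasthenia gravis criteria are all applied to controls.
- The Case 1 / Case 2 branching is left out.

```python
from dataclasses import dataclass


@dataclass
class ThyroidRecord:
    # Hypothetical per-patient flags a site would derive from its own EHR data.
    has_hypothyroid_icd9: bool
    has_abnormal_tsh_ft4: bool
    has_tg_or_tpo_antibodies: bool
    on_thyroid_replacement: bool
    has_secondary_cause: bool        # e.g., pregnancy, ablation
    on_thyroid_altering_med: bool    # e.g., phenytoin, lithium
    non_acute_visits_3y: int
    has_myasthenia_gravis: bool


def is_control(r: ThyroidRecord) -> bool:
    """Control arm: none of the case evidence, plus the other flowchart criteria."""
    any_thyroid_evidence = (
        r.has_hypothyroid_icd9
        or r.has_abnormal_tsh_ft4
        or r.has_tg_or_tpo_antibodies
        or r.on_thyroid_replacement
    )
    # Assumption: every remaining criterion on the slide applies to controls.
    return (
        not any_thyroid_evidence
        and not r.has_secondary_cause
        and not r.on_thyroid_altering_med
        and r.non_acute_visits_3y >= 2
        and not r.has_myasthenia_gravis
    )


clean = ThyroidRecord(False, False, False, False, False, False, 3, False)
on_lithium = ThyroidRecord(False, False, False, False, False, True, 3, False)
print(is_control(clean), is_control(on_lithium))  # True False
```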

Page 14

Hypothyroidism: Initial Algorithm

[Conway et al. 2011]

Page 15

Hypothyroidism: Algorithm Refinement

[Flowchart]

Leading to Case 1 / Case 2:
• ICD-9s for hypothyroidism
• Abnormal TSH/FT4
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroid peroxidase)
• Thyroid replacement meds

Leading to Control:
• No ICD-9s for hypothyroidism
• No abnormal TSH/FT4
• No antibodies for TTG/TPO
• No thyroid replacement meds

Additional criteria:
• No secondary causes (e.g., pregnancy, ablation)
• No thyroid-altering medications (e.g., phenytoin, lithium)
• 2+ non-acute visits in 3 yrs
• No hx of myasthenia gravis

[Denny et al., 2012]

Page 16

New Hypothyroidism Algorithm: Validation

Positive Predictive Values (PPV) Based on Chart Review – All Sites

| Site | EHR-based cases/controls | Sampled for chart review (cases/controls) | Old case PPV (%) | New case PPV (%) |
|---|---|---|---|---|
| Group Health | 430/1,188 | 50/50 | 92 | 98 |
| Marshfield | 509/1,193 | 50/50 | 88 | 91 |
| Mayo Clinic | 250/2,145 | 100/100 | 76 | 97 |
| Northwestern | 103/516 | 50/50 | 88 | 98 |
| Vanderbilt | 184/1,344 | 50/50 | 90 | 98 |
| All sites | 1,421/6,362 | – | 87 | 96 |

[Denny et al., 2012]
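The "All sites" PPVs of 87% and 96% can be checked against the per-site rows. The table does not state how the pooled figures were computed, so the arithmetic below compares two aggregations:

- A plain, unweighted mean of the five site PPVs, which rounds to the reported values.
- A mean weighted by the number of chart-reviewed cases, which gives 85.0% for the old algorithm.

```python
# Case PPVs (%) per site from the table above, as (old algorithm, new algorithm).
site_ppv = {
    "Group Health": (92, 98),
    "Marshfield": (88, 91),
    "Mayo Clinic": (76, 97),
    "Northwestern": (88, 98),
    "Vanderbilt": (90, 98),
}
# Cases sampled for chart review at each site (first number in that column).
sampled_cases = {
    "Group Health": 50,
    "Marshfield": 50,
    "Mayo Clinic": 100,
    "Northwestern": 50,
    "Vanderbilt": 50,
}


def unweighted_mean(col: int) -> float:
    """Plain average of the site PPVs (col 0 = old, col 1 = new)."""
    values = [ppvs[col] for ppvs in site_ppv.values()]
    return sum(values) / len(values)


def sample_weighted_mean(col: int) -> float:
    """Average of the site PPVs weighted by the number of reviewed cases."""
    total = sum(sampled_cases.values())
    return sum(site_ppv[site][col] * n for site, n in sampled_cases.items()) / total


print(round(unweighted_mean(0), 1), round(unweighted_mean(1), 1))            # 86.8 96.4
print(round(sample_weighted_mean(0), 1), round(sample_weighted_mean(1), 1))  # 85.0 96.5
```

Both aggregations put the new algorithm at about 96%. For the old algorithm, the reported 87% is consistent with the unweighted mean rather than the sample-weighted 85%.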

Page 17

Data Categories used to define the EHR-driven Phenotyping Algorithms

| Phenotype | Clinical gold standard | EHR-derived phenotype | Phenotype definitions | Validation (PPV/NPV) |
|---|---|---|---|---|
| Alzheimer’s Dementia | Demographics, clinical examination of mental status, histopathologic examination | Diagnoses, medications | Demographics, laboratory tests, radiology reports | 73% |
| Cataracts | Clinical exam finding (ophthalmologic examination) | Diagnoses, procedure codes | Demographics, medications | 98%/98% |
| Peripheral Arterial Disease | Radiology test results (ankle-brachial index or arteriography) | Diagnoses, procedure codes, medications, radiology test results | Demographics | 94%/99% |
| Type 2 Diabetes | Laboratory tests | Diagnoses, laboratory tests, medications | Demographics, height, weight, family history | 98%/100% |
| Cardiac Conduction | ECG measurements | ECG report results | Demographics, diagnoses, procedure codes, medications, laboratory tests | 97% |

[eMERGE Network]

Page 18

The Linked Clinical Data (LCD) Project

Genotype-Phenotype Association Results

[Figure: observed vs. published odds ratios, by disease, gene/region, and marker, for atrial fibrillation, Crohn's disease, multiple sclerosis, rheumatoid arthritis, and type 2 diabetes] [Ritchie et al. 2010]

Page 19

Key lessons learned from eMERGE

• Algorithm design and transportability
  • Non-trivial; requires significant expert involvement
  • Highly iterative process
  • Time-consuming manual chart reviews
  • Representation of “phenotype logic” for transportability is critical
• Standardized data access and representation
  • Importance of unified vocabularies, data elements, and value sets
  • Questionable reliability of ICD & CPT codes (e.g., billing the wrong code because it is easier to find)
  • Natural Language Processing (NLP) needs

Page 20

Algorithm Development Process - Modified

[Diagram: Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; labels: NLP, SQL; Rules; Mappings; Semi-Automatic Execution]

[eMERGE Network]

Page 21

• Mission: To enable the use of EHR data for secondary purposes, such as clinical research and public health. Leveraging clinical and health informatics to:

  • generate new knowledge
  • improve care
  • address population needs

http://sharpn.org

Strategic Health IT Advanced Research Projects (SHARPn): Secondary Use of EHR Data

[Chute et al. 2011]

Page 22

SHARPn: Secondary Use of EHR Data

A $15M National Consortium

• Harvard University
• Intermountain Healthcare
• Mayo Clinic
• Mirth Corporation, Inc.
• MIT
• MITRE Corp.
• Regenstrief Institute, Inc.
• SUNY, Buffalo
• University of Colorado

• Agilex Technologies
• CDISC (Clinical Data Interchange Standards Consortium)
• Centerphase Solutions
• Deloitte
• Group Health, Seattle
• IBM Watson Research Labs
• University of Utah
• University of Pittsburgh

Page 23

Cross-integrated suite of projects


Page 24

Algorithm Development Process - Modified

[Diagram: Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; labels: NLP, SQL; Rules; Mappings; Semi-Automatic Execution]

• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)

[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]

Page 25

The SHARPn “phenotyping funnel”

[Diagram: Intermountain EHR and Mayo Clinic EHR → CEMs → QDMs → DRLs → Phenotype-specific patient cohorts]

[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]

Page 26

Clinical data normalization

• Data normalization
  • Clinical data comes in all different forms, even for the same kind of information
  • Comparable and consistent data is foundational to secondary use
• Clinical Element Models (CEMs)
  • Basis for retaining computable meaning when data is exchanged between heterogeneous computer systems
  • Basis for shared computable meaning when clinical data is referenced in decision support logic
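To make the normalization idea concrete, consider two sources that report the same kind of information, a glucose result, in different shapes and units. Both are mapped into one common structure before any phenotype logic runs. The `GlucoseObservation` class and both source formats are hypothetical, and actual CEMs carry far more structure than this. The conversion factor of 18 mg/dL per mmol/L for glucose is an approximation.

```python
from dataclasses import dataclass

# Approximate factor: glucose 1 mmol/L is about 18 mg/dL.
MG_DL_PER_MMOL_L = 18.0


@dataclass(frozen=True)
class GlucoseObservation:
    """Hypothetical, highly simplified target model for one kind of result."""
    patient_id: str
    value_mg_dl: float
    source_system: str


def from_site_a(row: dict) -> GlucoseObservation:
    # Hypothetical site A: glucose in mg/dL under the column "GLU".
    return GlucoseObservation(row["mrn"], float(row["GLU"]), "site_a")


def from_site_b(row: dict) -> GlucoseObservation:
    # Hypothetical site B: value plus an explicit unit, which may be mmol/L.
    value = float(row["result"])
    if row["unit"] == "mmol/L":
        value *= MG_DL_PER_MMOL_L
    elif row["unit"] != "mg/dL":
        raise ValueError(f"unsupported unit: {row['unit']}")
    return GlucoseObservation(row["patient"], value, "site_b")


normalized = [
    from_site_a({"mrn": "A-001", "GLU": "126"}),
    from_site_b({"patient": "B-042", "result": "7.0", "unit": "mmol/L"}),
]
print([(o.patient_id, o.value_mg_dl) for o in normalized])
# [('A-001', 126.0), ('B-042', 126.0)]
```

Once both records share one representation, a single rule such as "glucose ≥ 126 mg/dL" applies to either source without site-specific branches.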


Page 27

Clinical Element Models: Higher-Order Structured Representations

[Stan Huff, IHC]

Page 28

Pre- and Post-Coordination


[Stan Huff, IHC]

Page 29

[Stan Huff, IHC]

Page 30

Data element harmonization

• Stan Huff (Intermountain Healthcare)
  • Clinical Information Model Initiative (CIMI)
• NHS Clinical Statement
• CEN TC251/OpenEHR Archetypes
• HL7 Templates
• ISO TC215 Detailed Clinical Models
• CDISC Common Clinical Elements
• Intermountain/GE CEMs

Page 31

SHARPn data normalization flow - I


Page 32

SHARPn data normalization flow - II

CEM MySQL database with normalized patient information

[Welch et al. 2012]

Page 33

Algorithm Development Process - Modified

[Diagram: Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; labels: NLP, SQL; Rules; Mappings; Semi-Automatic Execution]

• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)

[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]

Page 34

NQF Quality Data Model (QDM) - I

• Standard of the National Quality Forum (NQF)
• A standard structure and grammar to represent quality measures precisely and accurately in a standardized format that can be used across electronic patient care systems
• First (and only) standard for “eMeasures”
  • e.g., “All patients 65 years of age or older with at least two provider visits during the measurement period receiving influenza vaccine subcutaneously”
• Implemented as a set of XML schemas
• Links to standard terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm, etc.)
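The influenza eMeasure sentence combines three checks: an age criterion, a count of encounters within a time window, and an event with a specific route. A minimal Python reading of it follows. Its assumptions are:

- The record classes and string labels are hypothetical.
- The vaccination must fall within the measurement period.
- Age is evaluated at the start of the period.

An actual QDM-based eMeasure is expressed in XML with coded value sets rather than string labels.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Encounter:
    kind: str   # hypothetical label, e.g. "provider_visit"
    on: date


@dataclass
class Immunization:
    vaccine: str  # hypothetical label, e.g. "influenza"
    route: str    # e.g. "subcutaneous"
    on: date


@dataclass
class PatientData:
    birth_date: date
    encounters: List[Encounter] = field(default_factory=list)
    immunizations: List[Immunization] = field(default_factory=list)


def age_on(birth: date, when: date) -> int:
    """Age in whole years on a given date."""
    return when.year - birth.year - ((when.month, when.day) < (birth.month, birth.day))


def meets_flu_measure(p: PatientData, start: date, end: date) -> bool:
    """All three clauses of the example sentence, evaluated over [start, end]."""
    def in_period(d: date) -> bool:
        return start <= d <= end

    visits = sum(1 for e in p.encounters if e.kind == "provider_visit" and in_period(e.on))
    vaccinated = any(
        i.vaccine == "influenza" and i.route == "subcutaneous" and in_period(i.on)
        for i in p.immunizations
    )
    # Assumption: age is evaluated at the start of the measurement period.
    return age_on(p.birth_date, start) >= 65 and visits >= 2 and vaccinated


patient = PatientData(
    birth_date=date(1940, 5, 1),
    encounters=[Encounter("provider_visit", date(2011, 2, 1)),
                Encounter("provider_visit", date(2011, 9, 15))],
    immunizations=[Immunization("influenza", "subcutaneous", date(2011, 10, 3))],
)
print(meets_flu_measure(patient, date(2011, 1, 1), date(2011, 12, 31)))  # True
```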


Page 35

NQF Quality Data Model (QDM) - II

• Supports temporality & sequences
  • e.g., AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Groups of codes in a code set (ICD-9, etc.)
  • Can group groups
  • Represented by OIDs; requires lookup
  • e.g., "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Focus on structured data
  • Would require extensions for NLP
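Resolving a value set by OID, including groups of groups, can be sketched as a recursive lookup. Only the OID `2.16.840.1.113883.3.464.0001.113` comes from the slide; the registry, child OIDs, and codes are hypothetical placeholders. In practice, the lookup would go to a value-set service or file rather than an in-memory dict.

```python
from typing import Dict, Optional, Set

# Hypothetical value-set registry keyed by OID. Only the top-level OID comes from the
# slide; the child OIDs and the codes are placeholders, not real value-set content.
VALUE_SETS: Dict[str, dict] = {
    "2.16.840.1.113883.3.464.0001.113": {
        "type": "grouping",
        "members": ["example.child.1", "example.child.2"],
    },
    "example.child.1": {"type": "codes", "members": ["CODE_A", "CODE_B"]},
    "example.child.2": {"type": "codes", "members": ["CODE_C"]},
}


def expand(oid: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Resolve an OID to a flat set of codes, following nested groupings."""
    seen = set() if seen is None else seen
    if oid in seen:  # skip already-visited OIDs (protects against cycles)
        return set()
    seen.add(oid)
    entry = VALUE_SETS[oid]  # the lookup step: an OID is only a pointer
    if entry["type"] == "codes":
        return set(entry["members"])
    codes: Set[str] = set()
    for child in entry["members"]:  # "groups of groups"
        codes |= expand(child, seen)
    return codes


steroid_induced_diabetes = expand("2.16.840.1.113883.3.464.0001.113")
print(sorted(steroid_induced_diabetes))  # ['CODE_A', 'CODE_B', 'CODE_C']

# A "Diagnosis, Active" check then reduces to set membership.
patient_active_diagnoses = {"CODE_B", "CODE_Z"}
print(bool(patient_active_diagnoses & steroid_induced_diabetes))  # True
```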


Page 36

116 Meaningful Use Phase I Quality Measures

Page 37

Example: Diabetes & Lipid Mgmt. - I


Page 38

Example: Diabetes & Lipid Mgmt. - II


Page 39

NQF Measure Authoring Tool (MAT)


Page 40

Our task: human-readable to machine-computable

[Thompson et al., submitted 2012]

Page 41

Algorithm Development Process - Modified

[Diagram: Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; labels: NLP, SQL; Rules; Mappings; Semi-Automatic Execution]

• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)

[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]

Page 42

JBoss® open-source Drools environment

• Represents knowledge with declarative production rules
• Origins in artificial intelligence expert systems
• Simple when <pattern> then <action> rules specified in text files
• Separation of data and logic into separate components
• Forward chaining inference model (Rete algorithm)
• Domain-specific languages (DSL)
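The `when <pattern> then <action>` idea and forward chaining can be illustrated without Drools. The sketch below is a naive fixpoint loop and deliberately not the Rete algorithm. It re-matches every rule against every fact on every pass, which is exactly the cost Rete is designed to avoid. Rule contents and fact tuples are toy placeholders and do not represent the eMERGE T2DM algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Fact = Tuple[str, ...]


@dataclass
class Rule:
    name: str
    when: Callable[[Set[Fact]], Iterable[Dict[str, str]]]  # pattern -> variable bindings
    then: Callable[[Dict[str, str]], Iterable[Fact]]        # action -> facts to assert


def forward_chain(rules: List[Rule], initial: Set[Fact]) -> Set[Fact]:
    """Naive forward chaining: re-run all rules until none adds a new fact.
    Unlike Rete, this re-matches every rule against all facts on every pass."""
    facts = set(initial)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Materialize matches first so the set is not modified while iterating.
            for binding in list(rule.when(facts)):
                for fact in rule.then(binding):
                    if fact not in facts:
                        facts.add(fact)
                        changed = True
    return facts


# Toy rules with placeholder semantics (not the eMERGE T2DM algorithm).
rules = [
    Rule(
        "diagnosis makes a candidate",
        when=lambda fs: ({"pid": f[1]} for f in fs if f[0] == "dx" and f[2] == "T2DM"),
        then=lambda b: [("candidate", b["pid"])],
    ),
    Rule(
        "candidate on medication is a case",
        when=lambda fs: ({"pid": f[1]} for f in fs
                         if f[0] == "candidate" and ("med", f[1], "metformin") in fs),
        then=lambda b: [("case", b["pid"])],
    ),
]

facts = {("dx", "p1", "T2DM"), ("med", "p1", "metformin"), ("dx", "p2", "T2DM")}
derived = forward_chain(rules, facts)
print(sorted(f for f in derived if f[0] == "case"))  # [('case', 'p1')]
```

The second rule only fires on facts produced by the first, which is the essence of forward chaining: derived facts feed further matches until nothing new appears.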


Page 43

Drools inference architecture

Inference Execution Model

• Define a Knowledge Base
  • Compiled rules
  • Produces production memory
• Extract a Knowledge Session from the Knowledge Base
• Insert facts (data) into the Knowledge Session's working memory; matching rule activations are queued on the “Agenda”
• Fire rules (watch for race conditions and infinite loops)
• Retrieve end results
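The execution model listed above can be mimicked in a few classes: a knowledge base, a session, inserted facts, an agenda, rule firing, and results. This is not the Drools API. The class and method names are hypothetical, although `fire_all_rules` deliberately echoes Drools' `fireAllRules`. The `max_firings` guard illustrates the infinite-loop risk noted on the slide: a rule whose action keeps inserting facts that match its own condition never stops firing.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class SimpleRule:
    name: str
    condition: Callable[[dict], bool]           # pattern over a single fact
    action: Callable[[dict, "Session"], None]   # consequence


@dataclass
class KnowledgeBase:
    rules: List[SimpleRule]  # stands in for the compiled rules

    def new_session(self) -> "Session":
        return Session(self)


@dataclass
class Session:
    kb: KnowledgeBase
    working_memory: List[dict] = field(default_factory=list)
    results: List[str] = field(default_factory=list)

    def insert(self, fact: dict) -> None:
        self.working_memory.append(fact)

    def fire_all_rules(self, max_firings: int = 1000) -> int:
        fired: Set[Tuple[str, int]] = set()  # each (rule, fact) pair fires at most once
        count = 0
        while True:
            # The agenda: every rule/fact match that has not fired yet.
            agenda = [
                (rule, idx, fact)
                for rule in self.kb.rules
                for idx, fact in enumerate(self.working_memory)
                if (rule.name, idx) not in fired and rule.condition(fact)
            ]
            if not agenda:
                return count
            if count >= max_firings:
                raise RuntimeError("firing limit reached; possible infinite loop")
            rule, idx, fact = agenda[0]
            fired.add((rule.name, idx))
            rule.action(fact, self)
            count += 1


kb = KnowledgeBase([
    SimpleRule(
        "low glucose raises alert",
        lambda f: f.get("type") == "glucose" and f["value"] <= 40,
        lambda f, s: s.insert({"type": "alert", "patient": f["patient"]}),
    ),
    SimpleRule(
        "alert is reported",
        lambda f: f.get("type") == "alert",
        lambda f, s: s.results.append(f"notify {f['patient']}"),
    ),
])

session = kb.new_session()
session.insert({"type": "glucose", "patient": "p1", "value": 38})
session.insert({"type": "glucose", "patient": "p2", "value": 110})
print(session.fire_all_rules(), session.results)  # 2 ['notify p1']
```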

Page 44

Example Drools rule

rule "Glucose <= 40, Insulin On"
when
    $msg : GlucoseMsg( glucoseFinding <= 40, currentInsulinDrip > 0 )
then
    glucoseProtocolResult.setInstruction( GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG );
end

Annotated parts of the rule:
• {Rule Name}: "Glucose <= 40, Insulin On"
• {binding}: $msg
• {Java Class}: GlucoseMsg, GlucoseInstructions
• {Class Getter Method}: glucoseFinding, currentInsulinDrip
• Parameter: glucoseProtocolResult
• {Class Setter Method}: setInstruction
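For readers more familiar with general-purpose code, the glucose rule reduces to a guarded assignment. The Python names below are hypothetical translations of the Java identifiers in the rule. The essential difference is that in Drools the engine decides when to evaluate the condition, whereas here the function is called directly.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical Python stand-ins for the Java types named in the Drools rule.
GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG = "GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG"


@dataclass
class GlucoseMsg:
    glucose_finding: float        # plays the role of glucoseFinding
    current_insulin_drip: float   # plays the role of currentInsulinDrip


@dataclass
class GlucoseProtocolResult:
    instruction: Optional[str] = None


def glucose_le_40_insulin_on(msg: GlucoseMsg, result: GlucoseProtocolResult) -> bool:
    """when: glucoseFinding <= 40 and currentInsulinDrip > 0
    then: set the low-glucose/insulin-on instruction. Returns True if the rule fired."""
    if msg.glucose_finding <= 40 and msg.current_insulin_drip > 0:
        result.instruction = GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG
        return True
    return False


result = GlucoseProtocolResult()
print(glucose_le_40_insulin_on(GlucoseMsg(35, 2.0), result), result.instruction)
# True GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG
```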

Page 45

The “obvious” slide - T2DM Drools flow


Page 46

Automatic translation from NQF QDM criteria to Drools

From non-executable to executable:
• Measure Authoring Toolkit
  • Data types: XML-based structured representation
  • Value sets: saved in XLS files
  • Measures: XML-based structured representation
• Mapping data types and value sets → Fact models
• Converting measures to Drools scripts → Drools scripts
• Drools Engine

[Li et al., submitted 2012]
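The translation step can be illustrated with a toy generator that turns structured criteria into Drools rule text. This is a sketch under heavy simplifying assumptions, not the SHARPn translator described by Li et al. The following are all hypothetical:

- The `Criterion` class.
- The fact-model names (`Diagnosis`, `Medication`, `patientId`, `code`).
- The `cohort` global.

The output also omits the package, import, and `global` declarations a real .drl file would need.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """Hypothetical, minimal structured criterion (much simpler than a QDM measure)."""
    fact_type: str        # fact-model class, e.g. "Diagnosis"
    field_name: str       # property on that class, e.g. "code"
    value_set: List[str]  # codes already expanded from a value set


def to_drl(rule_name: str, criteria: List[Criterion], result_global: str = "cohort") -> str:
    """Render a conjunction of criteria for one patient as Drools rule text."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    lines = [f'rule "{rule_name}"', "when"]
    for i, c in enumerate(criteria):
        codes = ", ".join(f'"{code}"' for code in c.value_set)
        # The first pattern binds $pid; later patterns join on the same patient.
        patient = "$pid : patientId" if i == 0 else "patientId == $pid"
        lines.append(f"    {c.fact_type}( {patient}, {c.field_name} in ( {codes} ) )")
    lines += ["then", f"    {result_global}.add( $pid );", "end"]
    return "\n".join(lines)


drl = to_drl("example phenotype", [
    Criterion("Diagnosis", "code", ["DX_A", "DX_B"]),
    Criterion("Medication", "code", ["MED_X"]),
])
print(drl)
```

Generating rule text from structured criteria, rather than writing DRL by hand, is what lets the same definition be re-run at another site once its data is mapped to the same fact models.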

Page 47

SHARPn phenotyping architecture using CEMs, QDMs, and DRLs


[Welch et al. 2012]

Page 48

The SHARPn “phenotyping funnel”

[Diagram: Intermountain EHR and Mayo Clinic EHR → CEMs → QDMs → DRLs → Phenotype-specific patient cohorts]

[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]

Page 49

Phenotype library and workbench - I

http://phenotypeportal.org

Page 50

Phenotype library and workbench - I

1. Converts QDM to Drools
2. Executes rules by querying the CEM database
3. Generates summary reports

http://phenotypeportal.org
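Steps 2 and 3 can be sketched with Python's built-in `sqlite3` module standing in for the CEM database (an earlier slide describes that database as MySQL). The table layout and codes are hypothetical simplifications, as is reducing a converted rule to a set intersection. Step 1, the QDM-to-Drools conversion, is not shown.

```python
import sqlite3
from collections import Counter

# Stand-in for the normalized (CEM) database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (patient_id TEXT, kind TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("p1", "diagnosis", "DX_A"),
        ("p1", "medication", "MED_X"),
        ("p2", "diagnosis", "DX_A"),
        ("p3", "medication", "MED_X"),
    ],
)


def patients_with(kind: str, code: str) -> set:
    """Query the database for patients having a given kind of observation and code."""
    rows = conn.execute(
        "SELECT DISTINCT patient_id FROM observation WHERE kind = ? AND code = ?",
        (kind, code),
    )
    return {row[0] for row in rows}


# Step 2 (rule execution): a converted rule, reduced here to a set intersection.
cases = patients_with("diagnosis", "DX_A") & patients_with("medication", "MED_X")

# Step 3 (summary report): counts over everyone in the database.
everyone = {row[0] for row in conn.execute("SELECT DISTINCT patient_id FROM observation")}
report = Counter("case" if pid in cases else "non-case" for pid in everyone)
print(sorted(cases), report["case"], report["non-case"])  # ['p1'] 1 2
```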

Page 51

Phenotype library and workbench - II

http://phenotypeportal.org

Page 52

Phenotype library and workbench - III

http://phenotypeportal.org

Page 53

Additional ongoing research efforts

• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the “hard work”
  • Validate against expert-developed ones

[Carroll et al. 2011]
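Association rule mining scores candidate relationships between EHR data elements using two measures:

- Support: how often the items co-occur across patients.
- Confidence: how often the consequent appears given the antecedent.

The brute-force, pairs-only sketch below is neither Apriori nor the method of the cited work. The per-patient code sets are made-up placeholders.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Brute-force rules X -> Y between single items.
    support = P(X and Y); confidence = support(X and Y) / support(X)."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Each set holds one patient's (placeholder) EHR data elements.
patients = [
    {"DX_A", "MED_X"},
    {"DX_A", "MED_X", "LAB_H"},
    {"DX_A"},
    {"DX_A", "MED_X"},
    {"LAB_H"},
]
for rule in pair_rules(patients, min_support=0.4, min_confidence=0.7):
    print(rule)
# ('DX_A', 'MED_X', 0.6, 0.75)
# ('MED_X', 'DX_A', 0.6, 1.0)
```

Mined rules like these are candidates only; as the slide notes, they still need validation against expert-developed algorithms.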

Page 54

Additional ongoing research efforts

• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the “hard work”
  • Validate against expert-developed ones
• Just-in-time phenotyping
  • Current approach: retrospective, longitudinal, and offline data processing for phenotypes
  • Future: online, real-time phenotyping by implementing “phenotype sniffers”
  • Applications in active syndrome surveillance for transfusion medicine [Kor et al. 2012]
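A “phenotype sniffer” differs from the retrospective approach mainly in that criteria are re-evaluated as each new event arrives, instead of in an offline batch. The sketch keeps per-patient state and alerts once per patient. The `Event` fields, event labels, and criterion are hypothetical and are not taken from the transfusion-medicine work cited above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Item = Tuple[str, str]


@dataclass
class Event:
    patient_id: str
    kind: str    # hypothetical event type, e.g. "transfusion" or "lab_flag"
    value: str


class PhenotypeSniffer:
    """Online sketch: update per-patient state on every event and alert the
    first time that patient's accumulated data satisfies the criterion."""

    def __init__(self, criterion: Callable[[Set[Item]], bool]):
        self.criterion = criterion
        self.state: Dict[str, Set[Item]] = defaultdict(set)
        self.alerted: Set[str] = set()

    def observe(self, event: Event) -> bool:
        items = self.state[event.patient_id]
        items.add((event.kind, event.value))
        if event.patient_id not in self.alerted and self.criterion(items):
            self.alerted.add(event.patient_id)
            return True
        return False


def criterion(items: Set[Item]) -> bool:
    # Placeholder criterion: any transfusion plus a specific flagged lab.
    return any(kind == "transfusion" for kind, _ in items) and ("lab_flag", "LAB_H") in items


sniffer = PhenotypeSniffer(criterion)
stream = [
    Event("p1", "transfusion", "unit"),
    Event("p2", "lab_flag", "LAB_H"),
    Event("p1", "lab_flag", "LAB_H"),
    Event("p1", "lab_flag", "LAB_H"),
]
alerts = [e.patient_id for e in stream if sniffer.observe(e)]
print(alerts)  # ['p1']
```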


Page 55

What does this R&D mean to HSR?

• Common, agreed-upon, and well-validated phenotype definitions and criteria
• Standardized clinical data retrieval and management
• “One-stop place” for visualization, execution, and report generation of phenotyping algorithms
• Implications for (to name a few):
  • Center for Science of Healthcare Delivery (SHCD)
  • Data Management Services (DMS/BSI)
  • Epidemiology & Health Care and Policy Research (Epi./HCPR/Rochester Epi. Project)
  • Mayo Clinic Biobank/Genome Consortia (MayoGC)

Page 56

Summary

• EHRs contain a wealth of phenotypes for clinical and translational research
• EHRs represent real-world data and hence have challenges with interpretation, wrong diagnoses, and medication compliance
  • Handling referral patients is even more challenging
• Standardization and normalization of clinical data and phenotype definitions are critical
• Phenotyping algorithms are often transportable between multiple EHR settings
  • Validation is an important component

Page 57

Acknowledgements: eMERGE collaborators

• NHGRI: Rongling Li, Teri Manolio
• Group Health: Eric Larson, Gail Jarvik, Chris Carlson, Wylie Burke, Gene Hart, David Carrell, Malia Fullerton, Walter Kukull, Paul Crane, Noah Weston
• Marshfield: Cathy McCarty, Peggy Peissig, Marilyn Ritchie, Russ Wilke
• Northwestern: Rex Chisholm, Bill Lowe, Phil Greenland, Luke Rasmussen, Justin Starren, Maureen Smith, Jen Allen-Pacheco, Will Thompson
• Mayo Clinic: Christopher G. Chute, Iftikhar J. Kullo, Suzette Bielinski, Mariza de Andrade, John Heit, Jyoti Pathak, Matt Durski, Sean Murphy, Kevin Bruce
• Vanderbilt: Dan Roden, Josh Denny, Brad Malin, Ellen Wright Clayton, Dana Crawford, Melissa Basford

Page 58

Acknowledgement: SHARPn collaborators

• Harvard University
• Intermountain Healthcare
• Mayo Clinic
• Mirth Corporation, Inc.
• MIT
• MITRE Corp.
• Regenstrief Institute, Inc.
• SUNY, Buffalo
• University of Colorado

• Agilex Technologies
• CDISC (Clinical Data Interchange Standards Consortium)
• Centerphase Solutions
• Deloitte
• Group Health, Seattle
• IBM Watson Research Labs
• University of Utah
• University of Pittsburgh

Page 59

Thank You!


http://jyotishman.info