Computable Semantics and Probabilistic Graphical Models
Where Probabilistic Systems and Semantics Rub Elbows
Peter Haug, MD
Homer Warner Center for Informatics Research
Intermountain Healthcare
Dec 17, 2015
First of all: Thanks
This work has many contributors:
Dominik Aronsky, MD, PhD
Jeffrey Ferraro, PhD
Stan Huff, MD
Scott Evans, PhD
Robert Hausam, MD
Lee Pierce
Xinzu Wu, PhD
Matthew Ebert
Kumar Mynam
And many more!
Please ask questions …
Agenda
• Why Decision Support?
• Introduction: Bayesian Diagnostic Networks
• Bayesian Systems
• A Framework for Computable Models
• A Few Bayesian Tools
• Diagnostic Systems
• Representing the Semantics of Diagnosis
• Diagnostic Modeling with Ontologies
• Ontologies -> Bayesian Network
• Clinical Data
• A Brief Look at Medical Data Forms
‘... man is not perfectible. There are limits to man’s capabilities as an information processor that assure the occurrence of random errors in his activities.’~ Clement J. McDonald, MD (1976)
‘The complexity of modern medicine exceeds the inherent limitations of the unaided human mind.’~ David M. Eddy, MD, Ph.D. (1990)
Computerized Decision Support: Core Assumptions
Patient
Underlying principle: We are designing the system so that the computer is an active part of patient care, not just a way of getting data to people to read.
The Reverend Thomas Bayes
(1702 to 1761)
Bayes's theory of probability was published posthumously in 1764. After his death, Richard Price, a friend of Bayes, discovered two unpublished essays among Bayes's papers, which he forwarded to the Royal Society.
A Way to Think about Probabilistic Systems (and an introduction to some terminology)
Learning from Data
• The data comes from Health Care Encounters
• It is captured in Electronic Health Records (EHRs)
• It is aggregated and organized in Enterprise Data Warehouses (EDW)
• It includes the diagnoses and the data that support them
Bayesian Networks
• Model the joint probability distribution of the data and diagnoses
• Use directed graphs to structure these models
Medical Information System
Episodes of Care
Enterprise Data Warehouse
• Medical Decision Support
• Clinical Research
• Quality Improvement
• Measures of Care
Re-Using Healthcare Data
Example: Patients with Symptoms of Heart Disease
Patient Population
Data Collected in a Care Setting
Original Data

Patient ID   Myocardial Infarction   Chest Pain   ST Segment
1            Present                 Present      Elevated
2            Absent                  Absent       Normal
3            Present                 Absent       Depressed
4            Absent                  Absent       Normal
5            Absent                  Absent       Normal
6            Absent                  Absent       Normal
7            Absent                  Absent       Normal
….           ….                      ….           ….
Summarizing the Data: The Numbers
A Condensed Look at 1000 Cases

                 MI    No MI    Total
Chest Pain       15       80       95
No Chest Pain     5      900      905
Total            20      980     1000

Column totals as a share of all cases: MI 2%, No MI 98%, Total 100%
Another Summary: The Joint Probability Distribution (and the "Marginal" Probabilities)

                 MI      No MI    Marginal
Chest Pain       1.5%     8.0%      9.5%
No Chest Pain    0.5%    90.0%     90.5%
Marginal         2%      98%      100%
Another View of the 2x2 Table: Dividing by the Column Marginals

                 MI     No MI
Chest Pain       75%      8%
No Chest Pain    25%     92%
Total           100%    100%

Sensitivity: P(F|D) = 75%
False Positive Rate: P(F|no D) = 8%
False Negative Rate: P(no F|D) = 25%
Specificity: P(no F|no D) = 92%
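The tables above can be reproduced directly from the four cell counts. The following is a minimal sketch in plain Python; the dictionary layout and the names `counts`, `joint`, and `marginal` are illustrative choices, not something taken from the slides.

```python
# Cell counts from the 1000-case table: key = (finding state, disease state).
counts = {
    ("chest_pain", "mi"): 15,
    ("chest_pain", "no_mi"): 80,
    ("no_chest_pain", "mi"): 5,
    ("no_chest_pain", "no_mi"): 900,
}
total = sum(counts.values())

# Joint probability distribution: each cell divided by the grand total.
joint = {key: n / total for key, n in counts.items()}


def marginal(axis, value):
    """Sum the joint over the other variable (axis 0 = finding, 1 = disease)."""
    return sum(p for key, p in joint.items() if key[axis] == value)


p_mi = marginal(1, "mi")                  # prevalence, P(D)
p_chest_pain = marginal(0, "chest_pain")  # P(F)

# Dividing by the column marginals gives the conditional probabilities.
sensitivity = joint[("chest_pain", "mi")] / p_mi                 # P(F|D)
false_positive_rate = joint[("chest_pain", "no_mi")] / marginal(1, "no_mi")
false_negative_rate = 1 - sensitivity                            # P(no F|D)
specificity = 1 - false_positive_rate                            # P(no F|no D)

print(f"P(MI) = {p_mi:.3f}, P(Chest Pain) = {p_chest_pain:.3f}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

The same `joint` dictionary supports the row-wise view (predictive values) by dividing by the row marginals instead.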
Bayes Equation
Inferring the probability of a Disease (D) from a Finding (F)
$$P(D \mid F) = \frac{P(D)\,P(F \mid D)}{P(F)}$$
• P(D|F): Posterior Disease Probability
• P(D): Prior Disease Probability
• P(F|D): Sensitivity
• P(F): Probability of Finding
Probability Updating
The Disease is Myocardial Infarction
The Finding is Chest Pain
P(MI) = 2.0% (0.02)
P(Chest Pain|MI) = 75% (0.75)
P(Chest Pain) = ?
The Question of P(F)
Simple Bayes
• Patient has One and Only One Disease
Multi-Membership Bayes
• Patient has Any Group of Diseases
• Each Disease is Evaluated Independently
Bayesian Networks
• Patient has Any Group of Diseases
• Diseases are Evaluated According to Their Collective (Joint) Behavior
Simple Bayes: Add All of the Probabilities of Having Both the Finding and Disease
$$P(F) = \sum_i P(F \text{ and } D_i) = \sum_i P(D_i)\,P(F \mid D_i)$$
The Question of P(F): Multi-Membership Bayes
Two States Apply for Each Disease: With and Without the Disease
$$P(F) = P(D_i)\,P(F \mid D_i) + P(\overline{D_i})\,P(F \mid \overline{D_i})$$
The Question of P(F): Bayesian Networks
[Diagram: nodes labeled Disease, Intermediate Concept (two nodes), and Finding 1 through Finding 4]
P(F) is Determined from the Joint Effect of Child Nodes on Their Parents
Probability Updating: Using the Multi-Membership Model
The Disease is Myocardial Infarction
The Finding is Chest Pain
P(MI) = 2.0% (0.02)
P(Chest Pain|MI) = 75% (0.75)
P(Chest Pain) = 0.02 x 0.75 + 0.98 x 0.08
$$P(\text{MI} \mid \text{Chest Pain}) = \frac{0.02 \times 0.75}{0.02 \times 0.75 + 0.98 \times 0.08} = 0.16$$
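The update above is short enough to check by hand, but a small function makes the roles of the three inputs explicit. This is a sketch only; the function name `posterior` and its parameter names are illustrative, and the numbers are the ones given on the slide.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes update for one disease and one finding.

    P(F) is built from the two disease states (with and without),
    as in the multi-membership model.
    """
    p_finding = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / p_finding, p_finding


# Values from the slide: P(MI) = 0.02, P(CP|MI) = 0.75, P(CP|no MI) = 0.08.
p_mi_given_cp, p_cp = posterior(0.02, 0.75, 0.08)

print(f"P(Chest Pain) = {p_cp:.4f}")            # 0.0934
print(f"P(MI | Chest Pain) = {p_mi_given_cp:.2f}")  # 0.16
```

Chest pain raises the probability of MI from 2% to about 16%: a large relative change, but the diagnosis remains unlikely because the finding is far more common than the disease.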
Diagnostic Bayesian Networks (Demonstrating Different Characteristics)
Simple Bayes
• Patient has one Disease
• All findings are Conditionally Independent
Multi-Membership Bayes
• Patient can have multiple Diseases
• All Diseases are evaluated independently
Bayesian Networks
• Any relationship among diseases and findings
• Can represent any of the other models
• Multilayered models
• Graphical/probabilistic representation of knowledge
Using a Bayesian Network
Examples of Bayesian Diagnostics
In Netica (www.Norsys.com)
A Simple Bayesian Network (One Finding)
[Figure: Netica network; node beliefs in percent]
Myocardial Infarction: Present 2.00, Absent 98.0
Chest Pain: Present 9.34, Absent 90.7
A Simple Bayesian Network (Several Findings)
[Figure: Netica network; node beliefs in percent]
Myocardial Infarction: Present 2.00, Absent 98.0
Chest Pain: Present 9.34, Absent 90.7
ST Elevation: Present 13.6, Absent 86.4
Troponin Increase: Present 4.74, Absent 95.3
More Diagnostic Examples
(Myocardial Infarction)
Using Pulmonary Diseases
• Pneumonia
• Asthma
• COPD
• Pulmonary Embolism
With Increasingly Complex Models
• Simple Bayes
• Multi-Membership Bayes
• Complex Relationships
Bayesian Diagnostic Models (Naïve Bayes)
[Figure: Netica network with one Disease node and five finding nodes; node beliefs in percent]
Disease: Pneumonia 100, Asthma 0, Chronic Bronchitis 0, Other 0
Fever: Present 90.0, Absent 10.0
Elevated_WBC: Present 92.0, Absent 8.00
Wheezing: Present 10.0, Absent 90.0
Dyspnea: Present 15.0, Absent 85.0
Cough: Present 85.0, Absent 15.0
Bayesian Diagnostic Models (Multi-Membership Bayes)
[Figure: Netica networks, one per disease; node beliefs in percent, listed in the order they appear]
Wheezing: Present 10.3, Absent 89.7
Cough: Present 8.88, Absent 91.1
Fever: Present 15.1, Absent 84.9
Cough: Present 14.5, Absent 85.5
Dyspnea: Present 15.3, Absent 84.7
Elevated WBC: Present 14.9, Absent 85.1
Dyspnea: Present 15.3, Absent 84.7
Pneumonia: Present 6.00, Absent 94.0
Asthma: Present 4.00, Absent 96.0
Bayesian Diagnostic Models (Bayesian Network: Two-Layer)
[Figure: Netica network; node beliefs in percent]
Pneumonia: Present 6.00, Absent 94.0
Asthma: Present 4.00, Absent 96.0
Elevated WBC: Present 15.1, Absent 84.9
Fever: Present 15.1, Absent 84.9
Cough: Present 12.8, Absent 87.2
Dyspnea: Present 15.8, Absent 84.2
Wheezing: Present 11.3, Absent 88.7
Bayesian Diagnostic Models (Multi-Layer Bayesian Network)
[Figure: Netica network; node beliefs in percent]
Pneumonia: Present 6.00, Absent 94.0
Asthma: Present 4.00, Absent 96.0
Systemic Inflammation: Present 15.1, Absent 84.9
Fever: Present 20.1, Absent 79.9
Elevated WBC: Present 17.7, Absent 82.3
Wheezing: Present 11.3, Absent 88.7
Dyspnea: Present 15.8, Absent 84.2
Cough: Present 12.8, Absent 87.2
Bayesian Diagnostic Models (Multi-Layer with Continuous Variables)
[Figure: Netica network; node beliefs in percent, continuous nodes discretized into bins and summarized as mean ± SD]
Pneumonia: Present 6.00, Absent 94.0
Asthma: Present 4.00, Absent 96.0
Systemic Inflammation: Present 15.1, Absent 84.9
Elevated WBC (bins from 0 to 40 in steps of 5): 8.26 ± 2.3
Temperature (bins from 35 to 45 in steps of 0.5): 37.9 ± 1.2
Wheezing: Present 11.3, Absent 88.7
Dyspnea: Present 15.8, Absent 84.2
Cough: Present 12.8, Absent 87.2
Bayesian Diagnostic Models (Multi-Layer with Added Associations)
[Figure: Netica network; node beliefs in percent, continuous nodes discretized into bins and summarized as mean ± SD]
Pneumonia: Present 2.02, Absent 98.0
Asthma: Present 4.00, Absent 96.0
Chronic Bronchitis: Present 0, Absent 100
Pulmonary Embolus: Present 2.00, Absent 98.0
Chest Pain: Present 5.91, Absent 94.1
Dyspnea: Present 14.1, Absent 85.9
Cough: Present 9.64, Absent 90.4
Wheezing: Present 8.32, Absent 91.7
Temperature (bins from 35 to 45 in steps of 0.5): 37.8 ± 1.2
WBC (bins from 0 to 40 in steps of 2.5): 9.37 ± 2.6
Using Bayesian Diagnostic Systems in Care
Example: Diagnosing Pneumonia?
Protocols: Computers Intervene in the Workflow (an example from the ED)
Goal:
• Rapidly Screen for Pneumonia Patients in the ED
• Assess Risk of Death
• Apply a Pneumonia Care Protocol
Approach:
• Use Probabilistic System to Identify Patients
• Diagnostic Bayesian Networks
• Supported with Natural Language Processing*
• Suggest Enrollment in Pneumonia Protocol
• Provide Therapeutic Suggestions
*Extracts Data from the X-ray Report
[Diagram: components of the pneumonia decision support system]
• Clinical Data Repository: Data Supporting Pneumonia Assessment
• Chest X-ray Reports
• Chest X-ray Report Processing (Structured Data Extraction)
• Pneumonia Screening Tool
• Pneumonia Protocol Enrollment
• Pneumonia Treatment Protocol
• Computable Medical Knowledge Repository
• Advanced CDS (Diagnostic Models)
Example: Community-Acquired Pneumonia
Does the patient have pneumonia?
Should we use the protocol?
Apply Pneumonia Care Protocol.
The Emergency Department Workflow
Imbed logic, orders into process of care
Alerting for Pneumonia in the Patient Tracking System
System Watches the Data Flow in the ED
Identifies Possible Pneumonia Patients
Treatment Protocol
Uses Data from the EHR Combined with Manually Input Data
Implemented Using:
Diagnostic System
• Bayesian Network
• Model Trained Using EDW Data
NLP System
• Random Forests-Based Concept Identification
• Trained with Documents in the EDW
[Figure: Netica Bayesian network for pneumonia screening; node beliefs in percent, continuous nodes discretized into bins and summarized as mean ± SD]
PNEUMONIA: Absent 94.9, Present 5.09
NLP_FINDING: Positive 25.9, Negative 74.1
Age: 42 ± 21
TempC: 36.79 ± 0.63
RespRate: 20.8 ± 3.5
HeartRate: 92.1 ± 15
BPSystolic: 134 ± 22
BPDiastolic: 76.9 ± 11
MeanBP: 95.1 ± 12
SpO2: 96.1 ± 3
WBC: 9.46 ± 3.4
Sodium: 139.2 ± 2.4
Chloride: 104.3 ± 1.8
BUN: < 13.5 45.1, >= 13.5 54.9
Creatinine: < 0.405 3.90, >= 0.405 96.1
Breath sound nodes (Yes/No): BS_CONGESTION, BS_RHONCHI, BS_ABNORMAL, BS_DECREASED, BS_COURSE, BS_WHEEZES, BS_NO_COUGH, BS_STRIDOR, BS_CLEAR, BS_CRACKLES, BS_RALES, BS_ABSENT, BS_INSPIRATION, BS_TUBULAR, BS_INFREQUENT, BS_STRONG, BS_FINE_CRACK..., BS_EXPIRATION, BS_NOT_CLEARING_SECREA..., BS_FREQUENT, BS_WEAK, BS_NON_PRODUCTIVE_CO..., BS_PRODUCTIVE_CO..., BS_MODERATE, BS_CLEARING_SECREA...
ChiefComplaint: RESPIRATORY COMPLAINT 32.4, FEVER 6.96, ABD PAIN 6.05, ORTHO INJURY 4.26, CHEST PAIN 4.12, with each remaining category below 4
The Process of Data-Based Research (finding the right data)
• Identify Research Problem: Clinical Researcher
• Determine Subject Availability (Query Database): Clinical Researcher + Data Analyst + Terminologist
• Determine Data Availability (Query Database): Clinical Researcher + Data Analyst + Terminologist
• Collect/Analyze Data (Query Database): Clinical Researcher + Data Analyst + Terminologist + Statistician
• Data Review/Analysis
• Review Results: Clinical Researcher
Data discovery and extraction take 80-90% of the time.
Building a System to Automate Predictive Modeling
• Build a System That Can:
  • Identify the Target Patients
  • Identify Relevant Data Elements
  • Extract Patients and Data from the EDW/AHR
  • Provide Initial Analyses
  • Support Refinement
• The Key is Teaching the System a Certain Amount of Medical Knowledge
  • Ontologies: Tools For Capturing Complex Medical Knowledge
Ontology-Driven Model Discovery
• Can we use knowledge embedded in ontologies to drive research?
• The Ontology would:
  • Help select research patients
  • Identify and extract relevant data
  • Provide preliminary analysis of the data
  • Allow visualization of this data
  • Return Data and results to the user for further study
• A tool to support Medical Data Mining
[Diagram: architecture of the ontology-driven analysis tool; labels listed below]
• Disease Ontology
• Analytic Health Repository
• Concept Retrieval (from Ontology)
• Concept Translation to EDW Representation
• Structural Knowledge Retrieval from the Ontology
• Data Retrieval from the Analytic Health Repository
• Relevant Ontologic Concepts
• Analytic Data
• Natural Language Processing Subsystem
• Analysis Design Utility
• Prediction Algorithm:
$$P(d \mid f) = \frac{P(d)\,P(f \mid d)}{\sum_i P(d_i)\,P(f \mid d_i)}$$
• Analysis Results
• Analytic Workbench: Screening Models, Model Comparisons, Model Explanation (by reference to the Ontology)
• Output (pie chart with five 20% segments)
Ontologies Describe How Diseases Are Related (according to ICD9)
[Diagram: pneumonia hierarchy; each node shows a short label, the ICD9 description, and the code]
Pneumonia
• Viral Pneumonia: Viral pneumonia (ICD9: 480)
• Pneumococcal Pneumonia: Pneumococcal pneumonia (ICD9: 481)
• Other Bacterial Pneumonia: Other bacterial pneumonia (ICD9: 482)
• Pseudomonas Pneumonia: Pneumonia due to Pseudomonas (ICD9: 482.1)
• Hemophilus Pneumonia: Pneumonia due to Hemophilus influenzae (ICD9: 482.2)
• Streptococcal Pneumonia: Pneumonia due to Other Streptococcus (ICD9: 482.3)
• Staphylococcal Pneumonia: Pneumonia due to Staphylococcus (ICD9: 482.4)
• Staph Aureus Pneumonia: Pneumonia due to Staphylococcus, unspecified (ICD9: 482.40)
• MSSA Staph Pneumonia: Methicillin Susceptible Staph Aureus (MSSA) Pneumonia (ICD9: 482.41)
• MRSA Staph Pneumonia: Methicillin Resistant Staph Aureus (MRSA) Pneumonia (ICD9: 482.42)
• Other Staph Pneumonia: Other Staphylococcus pneumonia (ICD9: 482.49)
• Bronchopneumonia: Bronchopneumonia, organism unspecified (ICD9: 485)
Grouping nodes: Bacterial Pneumonia, More Bacterial Pneumonias, More Pneumonias
Ontologies Describe How Clinical Data are Related to Diseases
[Diagram: the pneumonia hierarchy linked to clinical data concepts by named relationships]
Disease nodes: Pneumonia; Bacterial Pneumonia; Pneumococcal pneumonia (ICD9: 481); Other bacterial pneumonia (ICD9: 482); Pneumonia, Organism unspecified (ICD9: 486); More Bacterial Pneumonias; More Pneumonias
• has_X-ray_Manifestation: Localized Infiltrate (X-ray Finding: Localized Infiltrate, SNOMED: 128309002)
• has_Sign: Pulmonary Rales (Signs: Chest Auscultation-Rales, PTXT: 28.1.3.22.34.2.1.32)
• has_Altered_Lab_Value: White Blood Count (Hematology: White Blood Count, LOINC: 62239-9)
• has_Altered_VS: Temperature (Vital Signs: Temperature, LOINC: 8310-5)
• has_Micro_Manifestation: + Sputum Culture (Sputum Culture: Positive, SNOMED: 442773002)
• has_??_Manifestation: More Manifestations
Visualizing the Results
Comparing Two Models Using the ROC Curves
Inspecting the Tradeoffs in Accuracy
Extensions of Diagnostic Modeling
• Large Models
• Redundant Data
• Equations and Logic
• Temporal Models
• Following Disease Over Time
• Summarized Data as Features
Simple Temporal Model
[Figure: Netica temporal network with Time Slice 1, Time Slice 2, and Time Slice 3; node beliefs in percent, continuous nodes discretized into bins and summarized as mean ± SD]
PNEUMONIA: Absent 95.3, Present 4.71
PNEUMONIA1: Present 5.03, Absent 95.0
PNEUMONIA2: Absent 94.4, Present 5.61
PNEUMONIA3: Present 5.61, Absent 94.4
PNEUMONIA4: Present 6.98, Absent 93.0
Admit Dx: Pneumonia: Present 4.72, Absent 95.3
AGE: 42.8 ± 21
CC: RESPIRATORY COMPLAINT 54.5, ABD PAIN 5.09, ORTHO INJURY 3.34, NEURO COMPLAINT 3.14, FALL 3.11, CHEST PRESSURE 2.73, CHEST PAIN 2.33, ABD PROBLEMS 2.23, WEAKNESS 2.02, TRAFFIC INJURY 1.92, other 19.6
TEMP: 36.63 ± 0.38; WBC: 11.1 ± 2.1; NLP_FINDING: Negative 67.1, Positive 32.9
TEMP1: 36.61 ± 0.39; WBC1: 10.2 ± 0.95; NLP_FINDING1: Negative 65.9, Positive 34.1
TEMP2: 36.62 ± 0.39; WBC2: 11.1 ± 2.2; NLP_FINDING2: Negative 65.7, Positive 34.3
TEMP3: 36.63 ± 0.41; WBC3: 10.9 ± 2; NLP_FINDING3: Negative 66.8, Positive 33.2
A diagram of a simple clinical model (A Data Object)
Clinical Element Model for White Blood Count
White Blood Count (WBCLabObs)
  data: 9.6 x 10^3
  Units: Cells per CC
  quals:
    Specimen Type (SpecimenType), data: Whole Blood
    Comment (Comment), data: Specimen Hemolyzed
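The data object in the diagram (a typed observation with a value, units, and a list of qualifiers that are themselves typed elements) can be sketched as a small recursive structure. This is an illustration only, not the actual Clinical Element Model format; the class name `ClinicalElement` and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ClinicalElement:
    """Hypothetical, simplified stand-in for a clinical element instance."""
    type_name: str                      # model type, e.g. "WBCLabObs"
    label: str                          # human-readable name
    data: Union[float, str]             # the observed value
    units: Optional[str] = None
    quals: List["ClinicalElement"] = field(default_factory=list)

    def qual(self, type_name: str) -> Optional["ClinicalElement"]:
        """Return the first qualifier of the given type, or None."""
        for q in self.quals:
            if q.type_name == type_name:
                return q
        return None


# The White Blood Count example from the diagram.
wbc = ClinicalElement(
    type_name="WBCLabObs",
    label="White Blood Count",
    data=9.6e3,
    units="Cells per CC",
    quals=[
        ClinicalElement("SpecimenType", "Specimen Type", "Whole Blood"),
        ClinicalElement("Comment", "Comment", "Specimen Hemolyzed"),
    ],
)

print(wbc.label, wbc.data, wbc.units)
print(wbc.qual("SpecimenType").data)
```

Keeping qualifiers as nested elements of the same type means a consumer can ask for "the specimen type of this result" without knowing how a particular source system named the field.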
What Does a Medical Concept Look Like (in probability space)
Concepts vary based on source, goals, and usage.
Simple Concept: Pneumonia
• Present
• Absent
Numeric Object: White Blood Count
• Specimen Type
• Units
• Value
Human Reported Concept: Cough
• Present
• Absent
• Unknown
Human Reported Concept (extended value set): Pulmonary Infiltrate (Chest X-ray Report)
• Present
• Possible
• Absent
• Unknown
What Does a Concept Look Like
Some concepts have subconcepts.
Numeric Concept: White Blood Count
Subconcepts:
• Specimen Type: Blood, Pleural Fluid, Ascitic Fluid, …
• Units: Mg per Deciliter, Grams, Cells per CC, …
• Value: Real Number
Categorical Concept: Pulmonary Infiltrate (Chest X-ray Report)
Concept values:
• Present
• Possible
• Absent
• Unknown
What Does a Concept Look Like
Concepts can be Modeled Probabilistically
[Figure: Netica nodes; node beliefs in percent]
Simple Concept: Pneumonia: Present 1.50, Absent 98.5
Human Reported Concept: Cough: Present 4.26, Absent 53.2, Unknown 42.6
Human Reported Concept (extended value set): Pulmonary Infiltrate (Chest X-Ray Report): Present 6.22, Possible 3.38, Absent 19.2, Unknown 71.2
Numeric Concept: CBC_White_Blood_Count: Unavailable 95.4, with the remainder spread over bins from 0 to >= 13000 in steps of 1000 (displayed as -203 ± 1500)
• White_Blood_Count_Units: mg per deciliter 16.7, kilograms 11.1, grams 33.3, cells per cc 5.56, etc 33.3
• White_Blood_Count_Value: bins from 0 to >= 13000 in steps of 1000; 6010 ± 2000
• White_Blood_Count_Specimen: Blood 82.0, Pleural Fluid 4.00, Ascitic Fluid 2.00, Urine 12.0
What Does a Concept Look Like
Concepts are (in part) defined by their relationships.
Pneumonia (Present, Absent)
• Causes: Pulmonary Infiltrate (Present, Absent)
  • Reported As: Pulmonary Infiltrate (Chest X-ray Report) (Present, Possible, Absent, Unknown)
• Causes: White Blood Count (Elevated, Normal, Reduced, Unavailable)
  • Reported As: White Blood Count (Specimen Type, Units, Value)
  • Value Thresholds: High-9,000 Low-2,000
  • Specimen: Blood
  • Units: Cells/CC
[Figure: Netica network; node beliefs in percent]
Pneumonia: Present 1.50, Absent 98.5
Pulmonary Infiltrate: Present 5.41, Absent 94.6
Pulmonary Infiltrate (Chest X-Ray Report): Present 6.22, Possible 3.38, Absent 19.2, Unknown 71.2
White_Blood_Count: Elevated 0.30, Normal 4.15, Reduced .098, Unavailable 95.4
CBC_White_Blood_Count: Unavailable 95.4, with the remainder spread over bins from 0 to >= 13000 in steps of 1000 (displayed as -203 ± 1500)
White_Blood_Count_Units: mg per deciliter 16.7, kilograms 11.1, grams 33.3, cells per cc 5.56, etc 33.3
White_Blood_Count_Value: bins from 0 to >= 13000 in steps of 1000; 6010 ± 2000
White_Blood_Count_Specimen: Blood 82.0, Pleural Fluid 4.00, Ascitic Fluid 2.00, Urine 12.0
What Does a Concept Look Like
And there are a number of ways to compute Concepts.
Relationship labels: Causes, Reported As
Value Thresholds: High-9,000 Low-2,000
Specimen: Blood
Units: Cells/CC
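One way to compute the categorical White Blood Count concept (Elevated, Normal, Reduced, Unavailable) from the reported numeric object is a simple rule over the value, specimen, and units. The sketch below uses the thresholds shown on the slide; the function name and the treatment of values exactly at a threshold are assumptions made for illustration.

```python
from typing import Optional

HIGH_THRESHOLD = 9000   # from the slide: High-9,000
LOW_THRESHOLD = 2000    # from the slide: Low-2,000


def wbc_category(value: Optional[float],
                 specimen: Optional[str],
                 units: Optional[str]) -> str:
    """Map a reported WBC result to the derived categorical concept.

    Results with a missing value, a non-blood specimen, or other units
    are treated as Unavailable. Values exactly at a threshold are
    treated as Normal (an assumption, not stated in the source).
    """
    if value is None or specimen != "Blood" or units != "Cells/CC":
        return "Unavailable"
    if value > HIGH_THRESHOLD:
        return "Elevated"
    if value < LOW_THRESHOLD:
        return "Reduced"
    return "Normal"


print(wbc_category(9600, "Blood", "Cells/CC"))          # Elevated
print(wbc_category(6000, "Blood", "Cells/CC"))          # Normal
print(wbc_category(6000, "Pleural Fluid", "Cells/CC"))  # Unavailable
```

In the Bayesian network the same mapping is expressed as a conditional probability table, which lets the derived concept stay uncertain when the specimen or units are themselves uncertain.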
Conclusion
• Graphical Probabilistic Models can capture the Semantics of Medical Diagnosis.
• These models can be manufactured using data collected during the course of care.
• Probabilistic models can participate in clinical care.
• Medical terminologies, embedded in Ontologies, can help to develop these models.
Questions???
Comments and Questions
Probability and Semantics
One way to think of semantics: a set of relationships between concepts
Relationship types:
• Disease -> Finding
• Concept -> Word
• Whole -> Part
Examples:
• Pneumonia -> Cough
• Mammal -> Mouse
• Hand -> Thumb
In probabilistic terms: P(A) -> P(B|A)
The arrows provide links across which we can reason
A diagram of a simple clinical model
Clinical Element Model for Systolic Blood Pressure
SystolicBP (SystolicBPObs)
  data: 138 mmHg
  quals:
    BodyLocation (BodyLocation), data: Right Arm
    PatientPosition (PatientPosition), data: Sitting
What if there is no model?
Site #1: Dry Weight: 70 kg
Site #2: Weight: 70 kg, with a weight type chosen from Dry, Wet, Ideal
Too many ways to say the same thing
A single name/code and value
• Dry Weight is 70 kg
Combination of two names/codes and values
• Weight is 70 kg
• Weight type is dry
Terminology
• Probability
• P(D) – Probability of Disease
• Implies a Ratio or Rate
• Names: Prevalence, Prior Probability
• Location Specific
$$P(D) = \frac{\text{Number with Disease}}{\text{Number in Population}}$$
Population from a Specific Setting
More Terminology
Conditional Probability
• Probability of a Finding in a patient with a Disease
$$P(F \mid D) = \frac{\text{Number with Disease and Finding}}{\text{Number with Disease}}$$
• Probability of a Disease in a Patient with a Finding
$$P(D \mid F) = \frac{\text{Number with Disease and Finding}}{\text{Number with Finding}}$$
• Probability of Disease in a patient with Finding 1, Finding 2, neg Finding 3, Finding 4, no Finding 5, etc.
$$P(D \mid \text{Findings}) = \frac{\text{Number with Disease and a Group of Findings}}{\text{Number with the Group of Findings}}$$
Names for the Numbers
MI No MI
2% 98% 100%
Prevalence
Prior Probability
P(D)
Yet Another View: Dividing by the Row Marginals

                 MI      No MI    Total
Chest Pain       16%      84%     100%
No Chest Pain    0.6%     99%     100%

Positive Predictive Value: P(D|F) = 16%
Negative Predictive Value: P(no D|no F) = 99%
From Data to Probabilities
[Diagram: Data -> Bayesian Calculation -> diagnosis probabilities]

DIAGNOSIS             PROB
Pneumonia             92%
Asthma                14%
Chronic Bronchitis    12%
Acute Bronchitis       8%
Bayes Equation
$$P(D \mid F) = \frac{P(F \text{ and } D)}{P(F)}$$
• P(D|F): Probability of Disease When the Finding is Present
• P(F and D): Probability of Both the Disease and Finding
• P(F): Probability of Finding
From probability theory: P(F and D) = P(D) * P(F|D)
$$P(D \mid F) = \frac{P(D)\,P(F \mid D)}{P(F)}$$
• P(D|F): Posterior Disease Probability
• P(D): Prior Disease Probability
• P(F|D): Sensitivity
• P(F): Probability of Finding
Probability Updating
The Disease is Myocardial Infarction
The Finding is Chest Pain
$$P(\text{MI} \mid \text{Chest Pain}) = \frac{P(\text{MI})\,P(\text{Chest Pain} \mid \text{MI})}{P(\text{Chest Pain})} = \frac{0.02 \times 0.75}{?}$$
P(MI) = 2.0% (0.02)
P(Chest Pain|MI) = 75% (0.75)
P(Chest Pain) = ?
What about more findings?
• The joy of recursion!
$$P(D \mid F_1) = \frac{P(D)\,P(F_1 \mid D)}{P(F_1)}$$
F1 = Chest Pain
F2 = ST Elevation
F3 = CK Increased
…. etc.
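The recursion can be sketched as a loop in which each posterior becomes the prior for the next finding. This assumes the findings are conditionally independent given the disease state (the Simple Bayes assumption). The chest pain numbers come from the earlier slides; the ST elevation likelihoods below are hypothetical placeholders, not values from the presentation.

```python
def update(prior, p_f_given_d, p_f_given_not_d):
    """One Bayes update for a finding that is present."""
    numerator = prior * p_f_given_d
    return numerator / (numerator + (1 - prior) * p_f_given_not_d)


# (name, P(F|MI), P(F|no MI))
findings = [
    ("Chest Pain", 0.75, 0.08),     # values from the slides
    ("ST Elevation", 0.60, 0.05),   # hypothetical, for illustration only
]

p = 0.02  # prior P(MI)
history = []
for name, p_f_d, p_f_not_d in findings:
    p = update(p, p_f_d, p_f_not_d)   # the posterior becomes the next prior
    history.append(p)
    print(f"after {name}: P(MI) = {p:.3f}")
```

When findings are not conditionally independent, this chained update overstates the evidence, which is one motivation for the richer Bayesian network structures.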
Supports Accurate and Complete Ordering Process
Modeling Medical Phenomena
Examples of Some of the Things that can be Modeled
Noise
The Effect of Noise on the Diagnosis of Pneumonia
• Noisy Lab (continuous) Data
• Noisy Physical Exam (categorical) Data
Types of Noise
• Bias
• Imprecision
[Figure: Netica network; node beliefs in percent, continuous nodes discretized into bins and summarized as mean ± SD]
Pneumonia: Present 2.00, Absent 98.0
Real White Blood Count: 8.15 ± 2.4
Measured White Blood Count: 8.17 ± 3.5
Source of Result: Small SD 25.0, Big SD 25.0, Bias High 25.0, Bias Low 25.0
Rales Really There!: Present 6.70, Absent 93.3
Auscultated Rales: Present 18.6, Absent 81.4
Reported By: Medical Student 20.0, Resident 20.0, Attending 20.0, Pulmonologist 20.0, Over Sensitive Med Student 20.0
Figure annotations: Normal and LogNormal Distributions; Different Types of Normal Noise/Bias; Noise/Bias modeled with Simple Discrete Distributions
Boolean Logic
Probabilistic Logic
• If A and B then C
  • P(C) = P(A and B) = P(A) * P(B|A)
• If A or B then C
  • P(C) = P(A or B) = P(A) + P(B) – P(A and B)
• In a Bayesian Network, the resolution of Linked Rules Occurs Automatically
[Figure: Netica network with Five Interconnected Rules and Four Variables; node beliefs in percent]
A: Present 1.0, Absent 99.0
B: Present 20.0, Absent 80.0
E: Present 5.00, Absent 95.0
H: High 20.0, Medium 60.0, Low 20.0
C: If A and B then C: Present 0.20, Absent 99.8
D: If A or B then D: Present 20.8, Absent 79.2
F: If A or B or E then F: Present 24.8, Absent 75.2
G: If (C and D) or (E and F) then G: Present 5.19, Absent 94.8
I: If B and F and H = High Then I: Present 4.00, Absent 96.0
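The rule probabilities in the figure can be reproduced by brute-force enumeration over the four input variables. The sketch below assumes A, B, E, and H are mutually independent root nodes (the figure's numbers are consistent with that) and collapses H to High versus not High; variable names are illustrative.

```python
from itertools import product

# Priors from the figure (as probabilities).
p_a, p_b, p_e, p_h_high = 0.01, 0.20, 0.05, 0.20


def weight(is_true, p):
    """Probability of one variable taking the given truth value."""
    return p if is_true else 1 - p


totals = dict.fromkeys("CDFGI", 0.0)
for a, b, e, h in product([True, False], repeat=4):
    w = weight(a, p_a) * weight(b, p_b) * weight(e, p_e) * weight(h, p_h_high)
    c = a and b                  # C: If A and B then C
    d = a or b                   # D: If A or B then D
    f = a or b or e              # F: If A or B or E then F
    g = (c and d) or (e and f)   # G: If (C and D) or (E and F) then G
    i = b and f and h            # I: If B and F and H = High then I
    for name, value in zip("CDFGI", (c, d, f, g, i)):
        if value:
            totals[name] += w

for name, p in totals.items():
    print(f"P({name}) = {100 * p:.2f}%")
```

Enumeration grows exponentially with the number of variables, which is why network inference algorithms are used in practice, but for four inputs it confirms the displayed values directly.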
Temporal Phenomena
Several Approaches to Temporal Modeling have been Proposed
Markov and Hidden Markov Models are Most Common
• Called Dynamic or Temporal Bayesian Networks
• Can Model Complex Disease Behavior
• Trained from Data Organized in “Time Slices”
• Can be Extended to Include Decisions and Utilities
  • (become “Partially Observable Markov Decision Processes”)
The Dynamic Bayesian Network
Can Model Changing Medical Phenomena
• Changes in the State or Status of a Disease
• Findings Caused by the Disease in its Various States
• Can be Used for Diagnosis, Prediction and Explanation
[Figure: Netica dynamic network shown across time slices (First Time Slice, Second Time Slice); node beliefs in percent]
Disease_Status: Absent 90.0, Mild 4.00, Moderate 3.00, Severe 2.00, Dead 1.00
Disease_Status1: Absent 81.6, Mild 7.54, Moderate 4.60, Severe 3.12, Dead 3.13
Disease_Status2: Absent 74.5, Mild 10.4, Moderate 5.44, Severe 3.71, Dead 5.89
Test: Normal 85.7, Mildly Abnormal 8.40, Severely Abnormal 4.87, Patient Deceased 1.00
Test1: Normal 78.0, Mildly Abnormal 11.1, Severely Abnormal 7.87, Patient Deceased 3.13
Test2: Normal 71.4, Mildly Abnormal 13.0, Severely Abnormal 9.70, Patient Deceased 5.89
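With no evidence entered, the belief in each later time slice is the previous slice's belief pushed through a state transition table. The sketch below shows that forward step. The initial distribution is the one shown for Disease_Status; the transition matrix is hypothetical (the slides do not give it), so the resulting numbers will not match the figure.

```python
states = ["Absent", "Mild", "Moderate", "Severe", "Dead"]

# Initial belief from the figure (as probabilities).
prior = [0.90, 0.04, 0.03, 0.02, 0.01]

# Hypothetical transition table: row = state now, column = state in the
# next time slice. Each row sums to 1; Dead is an absorbing state.
transition = [
    [0.90, 0.06, 0.02, 0.01, 0.01],
    [0.20, 0.50, 0.20, 0.07, 0.03],
    [0.05, 0.20, 0.45, 0.20, 0.10],
    [0.02, 0.05, 0.20, 0.43, 0.30],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]


def step(belief, transition):
    """Propagate a belief vector forward by one time slice."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


slice2 = step(prior, transition)
slice3 = step(slice2, transition)

for name, p1, p2, p3 in zip(states, prior, slice2, slice3):
    print(f"{name:9s} {p1:.3f} {p2:.3f} {p3:.3f}")
```

A full dynamic Bayesian network also conditions each slice on its observed findings (the Test nodes), which updates the belief before it is propagated forward.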
Pancreatitis Over Time
[Figure: Netica dynamic network with three copies of each node across time slices (First Time Slice, Second Time Slice); node beliefs in percent, continuous nodes summarized as mean ± SD; values are listed in the order the three copies appear]
Pancreatitis (Acute, Recovering, Dischargeable): 63.0, 8.23, 28.8; 39.7, 28.9, 31.4; 25.0, 49.0, 25.9
Amylase: 981 ± 2000; 816 ± 1800; 761 ± 1700
Lipase: 17900 ± 34000; 12700 ± 30000; 9830 ± 26000
WBC: 9.28 ± 3.4; 8.9 ± 3.3; 8.78 ± 3.3
Abdominal Pain (Present, Absent): 57.9, 42.1; 53.5, 46.5; 51.8, 48.2
Pain (Present, Absent): 60.8, 39.2; 57.3, 42.7; 56.7, 43.3
Glucose: 139 ± 81; 130 ± 74; 122 ± 68
Bayesian Networks and Diagnosis
Re-Purposing Clinical Data
Strategic Goals
Minimum goal: Be able to share applications, reports, alerts, protocols, and decision support with ALL customers of our same vendor
Maximum goal: Be able to share applications, reports, alerts, protocols, and decision support with anyone in the WORLD
Why do we need detailed clinical models?
How are the models used in an EMR?
Data entry screens, flow sheets, reports, ad hoc queries
• Basis for application access to clinical data
Computer-to-Computer Interfaces
• Creation of maps from departmental/external system models to the standard database model
Core data storage services
• Validation of data as it is stored in the database
Decision logic
• Basis for referencing data in decision support logic
Does NOT dictate physical storage strategy
Ontologies, Concepts, and Probabilities
The way from medical concepts to diagnostic models
Relational database implications
How would you calculate the desired weight loss during the hospital stay?

Patient Identifier   Date and Time   Observation Type   Observation Value   Units
123456789            7/4/2005        Dry Weight         70                  kg
123456789            7/19/2005       Current Weight     73                  kg

Patient Identifier   Date and Time   Observation Type   Weight type   Observation Value   Units
123456789            7/4/2005        Weight             Dry           70                  kg
123456789            7/19/2005       Weight             Current       73                  kg
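The two tables hold the same facts, but a query written for one layout does not work on the other. The sketch below makes that concrete with in-memory rows; the dictionary keys and function names are illustrative and do not correspond to any real schema.

```python
# Layout 1: a single name/code carries both the measurement and its type.
single_code_rows = [
    {"patient": "123456789", "date": "7/4/2005",
     "type": "Dry Weight", "value": 70, "units": "kg"},
    {"patient": "123456789", "date": "7/19/2005",
     "type": "Current Weight", "value": 73, "units": "kg"},
]

# Layout 2: the measurement and the weight type are separate columns.
two_code_rows = [
    {"patient": "123456789", "date": "7/4/2005",
     "type": "Weight", "weight_type": "Dry", "value": 70, "units": "kg"},
    {"patient": "123456789", "date": "7/19/2005",
     "type": "Weight", "weight_type": "Current", "value": 73, "units": "kg"},
]


def desired_loss_single(rows):
    """Current weight minus dry weight, filtering on the observation type."""
    dry = next(r["value"] for r in rows if r["type"] == "Dry Weight")
    current = next(r["value"] for r in rows if r["type"] == "Current Weight")
    return current - dry


def desired_loss_two(rows):
    """Same calculation, but the filter must use the weight type column."""
    weights = [r for r in rows if r["type"] == "Weight"]
    dry = next(r["value"] for r in weights if r["weight_type"] == "Dry")
    current = next(r["value"] for r in weights if r["weight_type"] == "Current")
    return current - dry


print(desired_loss_single(single_code_rows), "kg")  # 3 kg
print(desired_loss_two(two_code_rows), "kg")        # 3 kg
```

A shared detailed clinical model fixes one of these representations (or defines the mapping between them), so decision logic can be written once and shared across sites.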