Semantic technology empowering real world outcomes in biomedical research and clinical practices
Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio
http://knoesis.org
http://knoesis.org/amit/hcls
Special thanks: Sujan Parera
Talk presented at Case Western Reserve University on Nov 26, 2012
Talk at Case Western Reserve University: http://engineering.case.edu/eecs/node/392
• Identify Relationships
  • Textual pattern-based extraction for known relationships
  • Facts available in background knowledge
  • Find evidence for such facts
  • Combined evidence from many different patterns increases the certainty of a relationship between the entities
Beyond Hierarchy
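The idea that evidence from several textual patterns raises the certainty of a relationship can be sketched as follows. This is a minimal illustration, not the extraction method used in the talk: the templates, their weights, the noisy-OR style combination, and the second corpus sentence are all assumptions made for the example.

```python
import re

# Assumed textual templates for the relationship "modulates".
# {s} and {o} are filled with entity names; the weights are made-up reliabilities.
TEMPLATES = {
    "{s} modulates {o}": 0.8,
    "{o} is modulated by {s}": 0.7,
    "modulation of {o} by {s}": 0.6,
}


def certainty(subject, obj, sentences):
    """Combine evidence from every template that matches at least one sentence."""
    miss = 1.0  # probability that all matching patterns are wrong
    for template, weight in TEMPLATES.items():
        phrase = template.format(s=re.escape(subject), o=re.escape(obj))
        if any(re.search(phrase, text, re.IGNORECASE) for text in sentences):
            miss *= 1.0 - weight
    return 1.0 - miss


corpus = [
    "calcium-binding protein S100B modulates long-term synaptic plasticity",
    "Long-term synaptic plasticity is modulated by S100B in this toy sentence.",
]
one_pattern = certainty("S100B", "long-term synaptic plasticity", corpus[:1])
two_patterns = certainty("S100B", "long-term synaptic plasticity", corpus)
print(one_pattern, two_patterns)
```

With one matching pattern the score equals that pattern's weight; a second, independent pattern pushes it higher, which is the behaviour the slide describes.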
• Evaluating acquired knowledge
  • Explicit
    • User can vote for facts
    • Facts presented based on user interests
  • Implicit
    • User’s browsing history used as an indication of which propositions are correct and interesting
• Validated knowledge is then added back to the community
Validating Knowledge
Building Human Performance & Cognition Ontology (HPCO)
[Figure: HPCO construction pipeline. Labels: Base Hierarchy from Wikipedia; SenseLab Neuroscience Ontologies; Meta Knowledgebase; PubMed Abstracts; HPC Keywords; Focused pattern based extraction; Initial KB creation; Kno.e.sis: NLP based triples; NLM: Rule based BKR triples; Merge; Enriched Knowledgebase]
Use Case for HPCO
• Number of Entities – 2 million
• Number of non-trivial facts – 3 million
• NLP Based*: calcium-binding protein S100B modulates long-term synaptic plasticity
• Pattern Based**: Olfactory Bulb has physical part of anatomic structure Mitral cell
* Joint Extraction of Compound Entities and Relationships from Biomedical Literature, Web Intel. 2008
* A Framework for Schema-Driven Relationship Discovery from Unstructured Text, ISWC 2006
** On Demand Creation of Focused Domain Models using Top-down and Bottom-up Information Extraction, Technical Report
An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains, IHI 2012: 2nd ACM SIGHIT International Health Informatics Symposium
• Data Sources
  • Internal Lab Data
  • External Database
• Ontological Infrastructure
  • Parasite Lifecycle
  • Parasite Experiment
• Query Processing
  • Cuebee
• Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB
• Developed semantic provenance framework and influenced W3C community
• SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
  • Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
  • Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
SPSE
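To make the first example query concrete, here is a minimal sketch of its logic over in-memory records. It is only an illustration in plain Python, not how SPSE or Cuebee evaluates queries; the `Protein` class, the protein names, and all values are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    name: str
    regulation: dict = field(default_factory=dict)  # life-cycle stage -> "up" or "down"
    pathways: set = field(default_factory=set)      # metabolic pathways it takes part in


# Placeholder records; names and values are invented for illustration.
PROTEINS = [
    Protein("protein_A", {"epimastigote": "down"}, {"glycolysis"}),
    Protein("protein_B", {"epimastigote": "down"}, {"glycolysis", "purine salvage"}),
    Protein("protein_C", {"epimastigote": "up"}, {"glycolysis"}),
]


def downregulated_in_single_pathway(proteins, stage="epimastigote"):
    """Proteins downregulated in `stage` that belong to exactly one pathway."""
    return [
        p.name
        for p in proteins
        if p.regulation.get(stage) == "down" and len(p.pathways) == 1
    ]


print(downregulated_in_single_pathway(PROTEINS))
```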
Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit
SPSE
• So many ontologies
• Rich in number of concepts
• Mostly concentrated on taxonomical
Is edema symptom of atrial fibrillation?
Is edema symptom of hypertension?
Is edema symptom of diabetes?
Domains
Cardiology
Orthopedics
Oncology
Neurology
Etc…
No. of concepts                                       1008161
  Problems (diseases, symptoms)                       125778
  Procedures                                          262360
  Medicines                                           298993
  Medical Devices                                     33124
Relationships                                         77261
  is treated with (disease -> medication)             41182
  is relevant procedure (procedure -> disease)        3352
  is symptom of (symptom -> disease)                  8299
  contraindicated drug (medication -> disease)        24428
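Questions such as "Is edema symptom of hypertension?" need typed, non-taxonomic relationships like the four listed above. The following is a small sketch of how such relationships could be stored and checked. The relationship signatures come from the table; the entity typing and the single stored fact are placeholders assumed for illustration, not clinical claims or the actual knowledge base.

```python
# Relationship types with their (subject type, object type) signatures, from the table.
SCHEMA = {
    "is treated with": ("disease", "medication"),
    "is relevant procedure": ("procedure", "disease"),
    "is symptom of": ("symptom", "disease"),
    "contraindicated drug": ("medication", "disease"),
}

# Placeholder entity types, assumed only for this sketch.
TYPES = {
    "edema": "symptom",
    "atrial fibrillation": "disease",
    "hypertension": "disease",
    "diabetes": "disease",
}

FACTS = set()


def add_fact(subject, relation, obj):
    """Store a fact only if it respects the relationship's type signature."""
    expected = SCHEMA[relation]
    actual = (TYPES[subject], TYPES[obj])
    if actual != expected:
        raise ValueError(f"{relation!r} expects {expected}, got {actual}")
    FACTS.add((subject, relation, obj))


def holds(subject, relation, obj):
    """Answer a yes/no question by direct lookup."""
    return (subject, relation, obj) in FACTS


# A placeholder fact so the lookup has something to find.
add_fact("edema", "is symptom of", "hypertension")
print(holds("edema", "is symptom of", "hypertension"))
print(holds("edema", "is symptom of", "diabetes"))
```

A lookup that returns False here only means the fact is absent from the toy store, which is the gap that knowledge enrichment is meant to close.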
Knowledge Enrichment from Data
with the above method
+UMLS
healthline.com
druglib.com
• 80% of healthcare data is unstructured
• Poses challenges in
  • Searching
  • Understanding
  • Mining
  • Knowledge discovery
  • Decision support
• Evidence based medicine
• Federal policies promote meaningful use
Healthcare Challenge
Coding Complexity ICD-9 ICD-10
Diagnostic Codes 14,000 69,000
Procedure Codes 3,800 72,000
Example: 821.01 is the ICD-9 code for “closed” Fractured Femur, or thigh bone. It translates to 36 codes in ICD-10, with details regarding the precise nature of the fracture, which thigh was fractured, whether a delay in healing occurred, etc.
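The growth in coding complexity follows directly from the counts in the table. A quick arithmetic sketch, using only the rounded figures given above:

```python
# Code counts as given on the slide (rounded figures).
ICD9 = {"diagnostic": 14_000, "procedure": 3_800}
ICD10 = {"diagnostic": 69_000, "procedure": 72_000}

# How many times larger each ICD-10 code set is than its ICD-9 counterpart.
growth = {kind: ICD10[kind] / ICD9[kind] for kind in ICD9}

for kind, factor in growth.items():
    print(f"{kind} codes: about {factor:.1f}x more in ICD-10")
```

Diagnostic codes grow roughly fivefold and procedure codes roughly nineteenfold.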
Healthcare Challenge
• Traditional methods don’t work
• Understanding the context is crucial
• Advanced search
  • All hypertension patients with ejection fraction <40
  • All MI patients who are taking either beta-blockers or ACE Inhibitors
  • Patients diagnosed with Atrial Fibrillation on Coumadin or Lovenox
• Support core-measure initiative
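Once problems, medications, and measurements have been extracted from clinical text into structured form, the advanced searches listed above reduce to simple filters. A minimal sketch follows; the `Patient` class, the patient records, and the short drug-class lists are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Small illustrative drug-class lists, far from complete.
BETA_BLOCKERS = {"atenolol", "metoprolol"}
ACE_INHIBITORS = {"lisinopril", "enalapril"}


@dataclass
class Patient:
    pid: str
    problems: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    ejection_fraction: Optional[float] = None  # percent, if measured


# Invented records standing in for annotated EMR data.
PATIENTS = [
    Patient("p1", {"hypertension"}, {"lisinopril"}, 35.0),
    Patient("p2", {"hypertension"}, set(), 55.0),
    Patient("p3", {"myocardial infarction"}, {"atenolol"}),
    Patient("p4", {"myocardial infarction"}, {"aspirin"}),
]


def hypertension_with_low_ef(patients, threshold=40.0):
    """All hypertension patients with ejection fraction below the threshold."""
    return [
        p.pid
        for p in patients
        if "hypertension" in p.problems
        and p.ejection_fraction is not None
        and p.ejection_fraction < threshold
    ]


def mi_on_beta_blocker_or_ace(patients):
    """All MI patients taking a beta-blocker or an ACE inhibitor."""
    wanted = BETA_BLOCKERS | ACE_INHIBITORS
    return [
        p.pid
        for p in patients
        if "myocardial infarction" in p.problems and p.medications & wanted
    ]


print(hypertension_with_low_ef(PATIENTS))
print(mi_on_beta_blocker_or_ace(PATIENTS))
```

The hard part in practice is producing these structured fields from unstructured notes, which is why understanding context matters.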
Error Detection
EMR:
1. "Sepsis due to urinary tract infection…."
2. "Her prognosis is poor both short term and long term, however, we will do everything possible to keep her alive and battle this infection."
A syntax based NLP extractor (such as Medlee) can extract this term and annotate as SNM:40733004_infection
By utilizing IntellegO and cardiology background knowledge, we can more accurately annotate the term as SNM:68566005_infection_urinary_tract
Without IntellegO: Problem
With usage of IntellegO: Problem
EMR: "The patient is to receive 2 fluid boluses."
A syntax based NLP extractor (such as Medlee) can extract this term and annotate as SNM:32457005_body_fluid
Without IntellegO: Problem
By utilizing IntellegO and cardiology background knowledge, we can determine that this is an incorrect annotation. Fluid is part of the bolus treatment, not a problem.
With IntellegO: Treatment
Error Detection
"The balance of evidence would suggest that his episode of atrial fibrillation seems to be an isolated event" → NLP → Patient has atrial fibrillation
"He has had no documented atrial fibrillation since that time" → NLP → Patient does not have atrial fibrillation
Background knowledge:
• Syncope Is_symptom_of Atrial Fibrillation
• Warfarin, Atenolol, Aspirin Is_medication_for Atrial Fibrillation
Resolve Inconsistency
"She denies any chest pain but is not really function due to leg stiffness, swelling an shortness of breath" → NLP → Patient does not have shortness of breath
"Regarding the shortness of breath, we will send for a dobutamine stress echocardiogram" → NLP → Patient has shortness of breath
Background knowledge:
• Shortness of Breath Is_symptom_of Obesity, Hypertension, Sleep Apnea Obstructive
Resolve Inconsistency
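One simple way to picture inconsistency resolution is to look for supporting evidence in the rest of the record when NLP produces conflicting assertions about a condition. The sketch below is an assumed heuristic for illustration, not the approach implemented in the system described in the talk. The knowledge entries mirror the atrial fibrillation example; the function name and the findings passed to it are hypothetical.

```python
# Background knowledge mirroring the atrial fibrillation example.
KNOWLEDGE = {
    "atrial fibrillation": {
        "symptoms": {"syncope"},
        "medications": {"warfarin", "atenolol", "aspirin"},
    }
}


def resolve(condition, assertions, findings):
    """Return the agreed value, or use related findings to settle a conflict.

    assertions: list of booleans produced by NLP for `condition`
    findings:   symptoms and medications found elsewhere in the record
    """
    values = set(assertions)
    if len(values) == 1:
        return values.pop()  # no conflict to resolve
    related = KNOWLEDGE.get(condition, {})
    support = findings & (
        related.get("symptoms", set()) | related.get("medications", set())
    )
    return True if support else None  # None means still unresolved


print(resolve("atrial fibrillation", [True, False], {"warfarin"}))
```

Here a patient on Warfarin with conflicting atrial fibrillation statements is resolved in favour of having the condition; without any related finding the conflict is left open rather than guessed.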
PREscription Drug abuse Online Surveillance and Epidemiology - PREDOSE
• Non-Medical Use of Prescription Drugs
  • Fastest growing drug issue in US
  • Escalating accidental overdose deaths
• Epidemiological Data Systems
  • Data collection practices
  • Data analysis limitations
  • Poor Scalability
  • Limited Reusability
  • Interoperability is challenging
  • Small sample size
Of course, junkie that I am, I decided to repeat the experiment. Today, after waiting 48 hours after my last bunk 4 mg injection, I injected 2 mg. There wasn't really any rush to speak of, but after 5 minutes I started to feel pretty damn good. So I injected another 1 mg. That was about half an hour ago. I feel great now.
Health information is now available from multiple sources
• medical records
• background knowledge
• social networks
• personal observations
• sensors
• etc.
Foursquare is an online application which integrates a person's physical location and social network.
Community of enthusiasts that share experiences of self-tracking and measurement.
FitBit Community allows the automated collection and sharing of health-related data, goals, and achievements
kHealth
Sensors, actuators, and mobile computing are playing an increasingly important role in providing data for early phases of the health-care life-cycle
This represents a fundamental shift:
• people are now empowered to monitor and manage their own health;
• and doctors are given access to more data about their patients
kHealth
kHealth
Personal Health Dashboard
1. Continuous Monitoring
2. Personal Assessment
3. Medical Service
Auxiliary Information – background knowledge, social/community support, personal context, personal medical history
Explanation: is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
Discrimination: is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
kHealth - Technology
Explanatory Feature: a feature that explains the set of observed properties
ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
[Figure: properties (elevated blood pressure, clammy skin, palpitations) linked to features (Hypertension, Hyperthyroidism, Pulmonary Edema); columns labeled Observed Property and Explanatory Feature]
Explanation
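Read as set operations, the definition says that an explanatory feature is one whose properties include every observed property. A minimal sketch under that reading follows; the links between features and properties are assumed for illustration, since the figure's edges did not survive in the transcript.

```python
# Illustrative feature -> properties links (assumed; not taken from the slide).
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(observed):
    """Features that account for every observed property."""
    return {
        feature
        for feature, properties in PROPERTIES_OF.items()
        if observed <= properties
    }


print(sorted(explanatory_features({"elevated blood pressure", "palpitations"})))
```

Each additional observed property can only shrink the set of explanatory features, which mirrors the intersection in the definition.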
kHealth - Technology
Discrimination
Expected Property: would be explained by every explanatory feature
ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}
[Figure: properties (elevated blood pressure, clammy skin, palpitations) linked to features (Hypertension, Hyperthyroidism, Pulmonary Edema); columns labeled Expected Property and Explanatory Feature]
kHealth - Technology
Discrimination
Not Applicable Property: would not be explained by any explanatory feature
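The discrimination step can be sketched with plain sets: expected properties belong to every explanatory feature, not-applicable properties belong to none, and whatever remains can help tell the features apart. This is a sketch under assumed feature-property links, not the OWL or bit-vector machinery behind kHealth; the function names are hypothetical.

```python
# Illustrative feature -> properties links (assumed; not taken from the slide).
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}
ALL_PROPERTIES = set().union(*PROPERTIES_OF.values())


def expected(features):
    """Properties explained by every given explanatory feature."""
    sets = [PROPERTIES_OF[f] for f in features]
    return set.intersection(*sets) if sets else set()


def not_applicable(features):
    """Properties explained by none of the given explanatory features."""
    explained = set().union(*(PROPERTIES_OF[f] for f in features))
    return ALL_PROPERTIES - explained


def discriminating(features):
    """Properties that are neither expected nor not applicable."""
    return ALL_PROPERTIES - expected(features) - not_applicable(features)


candidates = {"Hyperthyroidism", "Pulmonary Edema"}
print(sorted(expected(candidates)))
print(sorted(discriminating(candidates)))
```

Under these assumed links, checking for clammy skin is the observation worth making next, because it separates the two remaining candidates.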