Ontologies & Terminologies in Biomedicine
Methodological Approaches to the Population, Selection, and Evaluation of Conceptual Knowledge Collections
Philip R.O. Payne, Ph.D.
The Ohio State University Medical Center, Department of Biomedical Informatics
• Ontologies and terminologies
– Differentiation
– Uses
– Challenges and opportunities
Grand Challenge for Translational Research: Integration, Modeling and Analysis
[Figure labels: DNA Sequences; Gene Expression; Pathways; Literature; Protein Structure; Proteomics; Genomic Databases; Lead Compounds; Databases (Structured Data, Text, Images); Biomedical Knowledge]
Defining Knowledge
• Procedural knowledge
– Rules, algorithms, and procedures
• Conceptual knowledge
– Interconnection of concepts by a network of relationships
– Concepts are “the knowledge possessed by an individual about an object or event”
• Strategic knowledge
– Relates procedural and conceptual knowledge
Importance of Conceptual Knowledge
• Conceptual knowledge structures or collections
– Allow for translation of domain knowledge into computational forms amenable to generalization and inference
– Enable efficient and effective development, maintenance, and dissemination of knowledge-based systems
Knowledge Engineering
• “…integrating knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise…”
• Four major steps:
– Knowledge Acquisition (KA)
– Knowledge Representation (KR)
– System Implementation and Refinement
– Verification and Validation
Essential Theories of Knowledge Engineering
T: Description
T1
• Rejects declarative representations
• Focuses on frame-based representations
T2
• No explicit representation of meta-knowledge
• Focuses on axioms
• Applies rigid controls as to how axioms may be asserted
T3
• Active focus on ontology creation
• Ontologies not always used for execution, but rather for domain analysis
T4
• Strong commitment to customizable inference procedures
T5
• Hybrid of T3 and T5 approach in which PSMs are used to structure the analysis discussions, but are converted to T5 during implementation
T6
• Catalogues libraries of PSMs or explores a single PSM within such a library
• Extensive use of ontologies
• At runtime, a general inference engine may be employed
Knowledge Engineering & Expertise Transfer
Conceptual Knowledge Acquisition Techniques
Formal Concept Analysis (FCA)
• Formal Objects + Formal Attributes = Formal Context
• Closed Relation: can no longer enlarge the attribute or object set
• Formal Concept: Pairing of a Formal Object and Attributes in a Closed Relation
Objects / Attributes    Cartoon   Real   Tortoise   Dog   Cat   Mammal
Garfield                X                                 X     X
Snoopy                  X                           X           X
Socks                             X                       X     X
Greyfriar’s Bobby                 X                 X           X
Harriet                           X      X
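The definitions above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not a production FCA algorithm: the object and attribute names follow the slide's example, while the individual cell values of the cross-table and all function names are assumptions made for this sketch.

```python
from itertools import combinations

# Toy formal context. Object and attribute names follow the slide's example;
# the cell values (which object has which attribute) are an assumption.
CONTEXT = {
    "Garfield": {"cartoon", "cat", "mammal"},
    "Snoopy": {"cartoon", "dog", "mammal"},
    "Socks": {"real", "cat", "mammal"},
    "Greyfriar's Bobby": {"real", "dog", "mammal"},
    "Harriet": {"real", "tortoise"},
}
ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def intent_of(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def extent_of(attributes):
    """Objects that carry every attribute in `attributes`."""
    return frozenset(obj for obj, attrs in CONTEXT.items() if attributes <= attrs)


def formal_concepts():
    """Enumerate all (extent, intent) pairs that are closed in both directions.

    Brute force over every subset of objects: fine for five objects,
    exponential in general.
    """
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = intent_of(subset)
            concepts.add((extent_of(intent), intent))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), "<->", sorted(intent))
```

Each printed pair is "closed" in the slide's sense: no further object shares all of the listed attributes, and no further attribute is shared by all of the listed objects.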
Concept Lattice
• Used to visualize the connection between “formal concepts”
• Allows for the application of graph theoretic evaluation metrics to the resulting conceptual knowledge structure
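As a rough illustration of how a lattice becomes a graph, the sketch below orders concepts by inclusion of their extents, keeps only the covering pairs (the edges one would draw), and computes one simple graph metric, the height. The four extents and both function names are hypothetical and chosen only for this example.

```python
def covering_pairs(extents):
    """Edges of the lattice diagram: (lower, upper) with nothing strictly between."""
    pairs = []
    for lower in extents:
        for upper in extents:
            if lower < upper and not any(lower < mid < upper for mid in extents):
                pairs.append((lower, upper))
    return pairs


def height(extents):
    """Number of edges on the longest upward chain in the diagram."""
    above = {}
    for lower, upper in covering_pairs(extents):
        above.setdefault(lower, []).append(upper)

    def depth(node):
        # A node with nothing above it has depth 0.
        return 1 + max((depth(up) for up in above.get(node, [])), default=-1)

    return max(depth(extent) for extent in extents)


# Hypothetical extents of four formal concepts over two objects.
EXTENTS = [
    frozenset(),
    frozenset({"Garfield"}),
    frozenset({"Socks"}),
    frozenset({"Garfield", "Socks"}),
]

if __name__ == "__main__":
    for lower, upper in covering_pairs(EXTENTS):
        print(sorted(lower), "->", sorted(upper))
    print("height:", height(EXTENTS))
```

Other graph theoretic measures (width, node degree, path counts) could be computed from the same edge list.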
FCA in Multidimensional Data Sets
• Use “situations”
– Decomposition of the n-dimensional matrix into 2-dimensional matrices
– Associate with set of remaining attributes (e.g., axes) that comprise a “situation” for resulting “formal context”
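One way to read the decomposition above, sketched for the three-dimensional case: each value on the third axis is treated as a "situation", and the remaining two axes form an ordinary two-dimensional formal context for that situation. The patient, finding, and visit names are hypothetical, as is the function name.

```python
from collections import defaultdict


def slice_by_situation(triples):
    """Split (object, attribute, situation) triples into one 2-D context per situation.

    Returns {situation: {object: set_of_attributes}}.
    """
    contexts = defaultdict(lambda: defaultdict(set))
    for obj, attribute, situation in triples:
        contexts[situation][obj].add(attribute)
    return {situation: dict(rows) for situation, rows in contexts.items()}


# Hypothetical 3-D data: which finding was observed for which patient at which visit.
OBSERVATIONS = [
    ("patient_1", "fever", "visit_1"),
    ("patient_1", "cough", "visit_1"),
    ("patient_2", "fever", "visit_1"),
    ("patient_1", "cough", "visit_2"),
]

if __name__ == "__main__":
    for situation, context in slice_by_situation(OBSERVATIONS).items():
        print(situation, context)
```

Each resulting two-dimensional context can then be analyzed with ordinary FCA; higher-dimensional data would use a tuple of the remaining axis values as the situation key.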
Conceptual Knowledge Discovery and Data Analysis (CKDD)
• Derivative of the field of conceptual knowledge processing
• Based on the mathematical theory of FCA
• Operationalization of FCA that elicits expert knowledge from existent sources rather than from experts directly
– In databases, this is known as Constructive Induction
• Often requires human intervention to arbitrate conflicts or ambiguity in resulting conceptual knowledge structures.
• Software tools:
– TOSCANA: comprehensive CKDD environment
– CHIANTI: in-line CKDD algorithm that can be integrated with other data mining applications
Constructive Induction
Ontology-anchored Knowledge Discovery in Databases
Knowledge Representation (KR)
“The process and the result of formalization of knowledge in such a way, that it can be used automatically for problem solving.”
• Five defining principles:
– Medium for human expression
– Set of ontological commitments
– Surrogate
– Fragmentary theory of intelligent reasoning
– Medium for pragmatically efficient computation
KR in Biomedicine - Terminologies
• Segregation of collections into terminology and assertions
• Surface-form representation(s) versus conceptual knowledge

SNOMED-CT Code                                        Textual Description
D5-46210 01                                           Acute Appendicitis, NOS
D5-46100 01; G-A231 01                                Appendicitis, NOS; Acute
M-41000 01; G-C006 01; T-59200 01                     Acute Inflammation, NOS; In; Appendix, NOS
G-A231 01; M-40000 01; G-C006 01; T-59200 01          Acute; Inflammation, NOS; In; Appendix, NOS
KR in Biomedicine
• Critical representation issues:
– Anatomic location and temporal relationships
– “Meaning”, which can take multiple forms
Conceptual Knowledge Representation Structures
• Logical Models
• Ontologies
– Mark-up languages
– Sharing of knowledge
• Terminologies
• Database Schemas
Logical Model for Clinical Data (1)
• Formal (Predicate) logic
– Topic neutral
– Allows for representation of formal relationships between concepts that is computationally tractable for inference.
∀ x PLEURAL-EFFUSION (x) ≡ EFFUSION (x) ∧ ∃ y PLEURAL-CAVITY(y) ∧ ∃ z DISEASE (z) ∧ LOCATED-IN (x,y) ∧ CAUSED-BY (x,z)
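To make the definition above concrete, the following sketch checks it against a small finite model. It assumes that each existential quantifier scopes over the relation that mentions its variable (a pleural cavity y with LOCATED-IN(x, y), and a disease z with CAUSED-BY(x, z)); the model contents and identifiers such as "e1" are hypothetical.

```python
# Hypothetical finite model: unary predicates are sets of individuals,
# binary relations are sets of (subject, object) pairs.
MODEL = {
    "EFFUSION": {"e1", "e2"},
    "PLEURAL_CAVITY": {"c1"},
    "DISEASE": {"d1"},
    "LOCATED_IN": {("e1", "c1"), ("e2", "knee_joint")},
    "CAUSED_BY": {("e1", "d1"), ("e2", "d1")},
}


def is_pleural_effusion(x, model):
    """Evaluate the right-hand side of the definition for individual x.

    Assumed reading: x is an effusion, located in some pleural cavity,
    and caused by some disease.
    """
    in_pleural_cavity = any(
        (x, y) in model["LOCATED_IN"] for y in model["PLEURAL_CAVITY"]
    )
    caused_by_disease = any((x, z) in model["CAUSED_BY"] for z in model["DISEASE"])
    return x in model["EFFUSION"] and in_pleural_cavity and caused_by_disease


if __name__ == "__main__":
    for individual in sorted(MODEL["EFFUSION"]):
        print(individual, is_pleural_effusion(individual, MODEL))
```

Because the formula is an equivalence, the same function can be read in both directions: it classifies an individual as a pleural effusion exactly when the three conditions hold in the model.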
Logical Model for Clinical Data (2)
• Conceptual graph notation
– Ability to represent complete first order, modal, or higher-order logics in a human and computer readable format.
Semantic Web Component “Stack”
(data models comprised of objects)
– OWL: Web Ontology Language (adds semantics/meta-data to RDF, as well as ability to define logical assertions)
– SPARQL: Semantic web query language
– RIF: Rule interchange format
– N-stores: Repositories for semantic web components (e.g., RDF/OWL)
Conceptual Knowledge Mark-up Language (CKML)
• OML: Representation language for ontological and schematic structure which allows for the assertion of constraints
• Simple OML:
– Representation language for functions, reification,
SNOMED
• Systematized Nomenclature of Medicine and Veterinary Medicine
• A Semantic Network between granular phenotypes and diseases
• > 500,000 clinical medicine concepts
• Licenses
– Free for perpetual use in USA for any field of use
– Included in free international UMLS license for research
SNOMED Information Model: Compositional, Multiaxial, Multiple Hierarchies
Axes: T M F C D P G L
H. pylori associated haemorrhagic Gastric Ulcer = (4) D5-32220 Gastric (1) Ulcer (2) with haemorrhage (3) G-C002 associated with (5) L-13551 H. pylori (6)
Terminologies
• Represent elemental concepts and relationships among them.
• Knowledge representation closely related to that of ontologies
• Terminological and relational knowledge can be partitioned into a terminology construct and a semantic network
• Complexities in designing terminology representation models:
– Representation of surface-level and conceptual entities
– Poly-hierarchies
– Inheritance
– Maintenance and growth
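The poly-hierarchy and inheritance issues listed above can be illustrated with a small directed acyclic graph in which a concept may have more than one parent. All concept names, properties, and function names below are hypothetical and are not taken from any real terminology.

```python
# Hypothetical poly-hierarchy: each concept lists its direct parents (is-a links).
PARENTS = {
    "Viral pneumonia": ["Pneumonia", "Viral infection"],
    "Pneumonia": ["Lung disease"],
    "Viral infection": ["Infectious disease"],
    "Lung disease": ["Disease"],
    "Infectious disease": ["Disease"],
    "Disease": [],
}

# Hypothetical properties asserted directly on some concepts.
PROPERTIES = {
    "Lung disease": {"finding_site": "lung"},
    "Viral infection": {"causative_agent": "virus"},
}


def ancestors(concept):
    """All concepts reachable by following is-a links upward."""
    seen = set()
    stack = list(PARENTS[concept])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def inherited_properties(concept):
    """Merge properties from the concept and every ancestor.

    Conflicting values along different parent paths are not resolved here;
    a real model needs an explicit policy for that case.
    """
    merged = {}
    for node in ancestors(concept) | {concept}:
        merged.update(PROPERTIES.get(node, {}))
    return merged


if __name__ == "__main__":
    print(sorted(ancestors("Viral pneumonia")))
    print(inherited_properties("Viral pneumonia"))
```

The example concept inherits one property through each of its two parents, which is exactly the situation a strict single-parent tree cannot represent.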
• LOINC laboratory codes are composed of six attributes:
1. Component or analyte
2. Property
3. Time
4. System or specimen or sample
5. Scale of precision
6. Method

Example LOINC Laboratory Code
LOINC Code: 2159-2
LOINC Name: CREATININE:MCNC:PT:AMN:QN:
Component / Analyte: Creatinine
Property: Mass Concentration
Time: Point In Time
System/Specimen: Amniotic Fluid
Scale: Quantitative
Method: <NULL>
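The six-part structure can be made explicit with a small parser for the colon-separated name shown above. This is a sketch for illustration only: the class and field names are assumptions, and it does not consult the actual LOINC database or expand abbreviations such as MCNC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoincName:
    """The six attributes of a LOINC name (field names are illustrative)."""
    component: str
    kind_of_property: str
    time_aspect: str
    system: str
    scale: str
    method: Optional[str]


def parse_loinc_name(name: str) -> LoincName:
    """Split a colon-separated LOINC name into its six parts.

    An empty final part (as in the slide's example) is treated as "no method".
    """
    parts = name.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated parts, got {len(parts)}")
    component, kind_of_property, time_aspect, system, scale, method = parts
    return LoincName(
        component, kind_of_property, time_aspect, system, scale, method or None
    )


if __name__ == "__main__":
    print(parse_loinc_name("CREATININE:MCNC:PT:AMN:QN:"))
```

Keeping the method as an optional field mirrors the `<NULL>` entry in the example, where no method is specified.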
Example Controlled Terminology: MED
• MED = Medical Entities Dictionary (CUMC)
• Concept-oriented
• Directed acyclic graph
• Frame-based
• Incorporates UMLS, ICD9-CM, and LOINC codes
• Currently contains over 69,000 conceptual entities
• Hierarchical and semantic relationships
Statistical Metrics
• System performance can be compared to the reference standard using some combination of the following measures:
– Simple accuracy/agreement
– Sensitivity and Specificity
– Area under the ROC-curve (C-statistic)
– Recall and Precision
– Specific agreement
– Chance corrected agreement
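Most of the measures listed above follow directly from a 2x2 table of system output against the reference standard. The sketch below uses the standard textbook formulas, with Cohen's kappa standing in for chance corrected agreement and positive specific agreement for specific agreement; the function name and the example counts are made up. The area under the ROC curve is omitted because it needs ranked scores rather than a single 2x2 table.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Compute common agreement measures from 2x2 counts.

    tp/fp/fn/tn: true positives, false positives, false negatives, true negatives,
    with the reference standard defining the "true" label.
    No guard against empty rows or columns (division by zero) is included.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "positive_specific_agreement": 2 * tp / (2 * tp + fp + fn),
        "kappa": (accuracy - expected) / (1 - expected),
    }


if __name__ == "__main__":
    # Made-up counts for illustration.
    for name, value in agreement_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.3f}")
```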
Challenges
• Philosophical approach
• Resources
– Curation (magnitude of domain concepts ≥ 10^6)
• Scalability of KE methods
• Content/domain coverage of available knowledge sources
– Available expertise
• Tools to apply knowledge collections to practical problems
Philosophical Approach
• Realism: There exists a singular, factual truth that can be used to characterize all phenomena, which can be ascertained given sufficient time and effort
• Instrumentalism: It is not necessarily possible to ascertain or measure all the characteristics of a phenomenon such that a singular truth can be defined; therefore, all knowledge is an approximation (“snapshot”) of the truth
[del(17p13)]-may_be_cytogenetic_abnormality_of_disease-[Chronic Lymphocytic Leukemia with Unmutated Immunoglobulin Heavy Chain Variable-Region Gene]-disease_has_molecular_abnormality-[Clonal Immunoglobulin Heavy Chain Gene Rearrangement]-may_be_molecular_abnormality_of_disease-[Pyothorax-Associated Lymphoma]-disease_may_have_finding-[Lactic acid dehydrogenase raised]
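A path written in this bracketed notation can be broken into (subject, relationship, object) triples for further processing. The sketch below is a minimal parser that assumes concepts appear in square brackets and relationship names are lowercase words joined by underscores, as in the path above; the function name is hypothetical and only the first two hops of the path are used.

```python
import re

# First two hops of the path shown above.
PATH = (
    "[del(17p13)]-may_be_cytogenetic_abnormality_of_disease-"
    "[Chronic Lymphocytic Leukemia with Unmutated Immunoglobulin Heavy Chain "
    "Variable-Region Gene]-disease_has_molecular_abnormality-"
    "[Clonal Immunoglobulin Heavy Chain Gene Rearrangement]"
)


def path_to_triples(path):
    """Turn '[A]-rel-[B]-rel2-[C]' into [(A, rel, B), (B, rel2, C)].

    Bracketed spans are matched first, so hyphens inside concept names
    are not mistaken for relationship delimiters.
    """
    tokens = re.findall(r"\[([^\]]*)\]|-([a-z_]+)-", path)
    concepts = [concept for concept, relation in tokens if concept]
    relations = [relation for concept, relation in tokens if relation]
    return list(zip(concepts, relations, concepts[1:]))


if __name__ == "__main__":
    for subject, relation, obj in path_to_triples(PATH):
        print(f"{subject} --{relation}--> {obj}")
```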