Post on 21-Dec-2015
http://ncor.us 1
Ontologies in Biomedicine:
The Good, The Bad and The Ugly
Barry Smith
http://ontology.buffalo.edu/smith
http://ncor.us 2
The Good
Foundational Model of Anatomy (FMA)
Pro
Very clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule
Powerful treatment of definitions, from which the entire FMA hierarchy is generated; it can serve as a basis for formal reasoning
Con
Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)
http://ncor.us 3
Intermediate
GALEN
Pro
Allows formal representation of clinical information
Allows multiple views of relevant detail as needed
Uses powerful Description Logic (DL)-based formal structure
Con
Remains only partially developed
Contains errors: Vomitus contains carrot – which DLs did not prevent
http://ncor.us 4
Intermediate
The Gene Ontology
Con
Poor formal architecture
Full of errors
menopause part_of death
Poor support for automatic reasoning and error-checking
Poor treatment of definitions
Not trans-granular
No relation to time or instances
http://ncor.us 5
The Gene Ontology
Pro
Open Source
Cross-Species
... has recognized the need for reform, including explicit representation of granular levels
http://ncor.us 6
Problem of Circularity
GO:0042270:
Protection from natural killer cell mediated cytolysis
Definition: The process of protecting a cell from cytolysis by natural killer cells.
http://ncor.us 7
GO:0019836 hemolysis
Definition: The processes that cause hemolysis
X =def. the Y of X
this is worse than circular
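The crudest form of this problem, a term reappearing verbatim in its own definition, can be caught mechanically. The sketch below is only an illustration: the function name `is_circular` and the second dictionary entry are assumptions added for contrast, and a textual match of this kind would not catch paraphrased circularity such as the GO:0042270 example.

```python
import re

def is_circular(term: str, definition: str) -> bool:
    """Return True if the defined term reappears verbatim in its own definition.

    This is a crude textual check (case-insensitive, whole-word match),
    not a semantic test of circularity.
    """
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None

definitions = {
    # Taken from the slide above.
    "hemolysis": "The processes that cause hemolysis",
    # Illustrative non-circular entry, added here only for contrast.
    "cytolysis": "The rupture of cell membranes and the loss of cytoplasm",
}

circular_terms = [t for t, d in definitions.items() if is_circular(t, d)]
print(circular_terms)
```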
http://ncor.us 8
The Bad
Reactome
Pro
Rich catalogue of biological processes
Con
Incoherent treatment of categories: ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). Similarly CatalystActivity is a sibling of Event.
http://ncor.us 9
The Bad
National Cancer Institute Thesaurus
Pro
Open source; ambitiously broad coverage; DL-based
Con
Poor realization of DL formalism
Full of mistakes (many inherited from its UMLS sources):
– three disjoint classes of plants: Vascular Plant, Non-vascular Plant, Other Plant
– three disjoint kinds of cells: Cell, Normal Cell, Abnormal Cell
– Normal Cell is_a Microanatomy
See http://ontology.buffalo.edu/medo/NCIT_Smith.html
http://ncor.us 10
National Cancer Institute Thesaurus
Duratec, Lactobutyrin and Stilbene Aldehyde classified as: Unclassified Drugs and Chemicals
Pro
NCIT, too, has recognized the need for reform
(NCIT is part of the OBO library)
http://ncor.us 11
The Ugly
UMLS Semantic Network
Pros
Broad coverage; no multiple inheritance
Cons
Incoherent use of ‘conceptual entities’
(e.g. the digestive system as a conceptual part of the organism)
Full of errors
http://ncor.us 12
UMLS Semantic Network
Edges in the graph represent merely “possible significant relations”:
– Bacterium causes Experimental Model of Disease
– Experimental Model of Disease affects Fungus
– Experimental Model of Disease is_a Pathologic Function
http://ncor.us 13
UMLS Semantic Network
Unclear what the nodes of the graph are:
Drug Delivery Device contains Clinical Drug
Drug Delivery Device narrower_in_meaning_than Manufactured Object
The use-mention confusion:
“Swimming is healthy and has 8 letters”
http://ncor.us 14
The Ugly
Clinical Terms Version 2 (The Read Codes)
Classifies chemicals into:
chemicals whose name begins with ‘A’,
chemicals whose name begins with ‘B’,
chemicals whose name begins with ‘C’, ...
http://ncor.us 15
The Astonishingly (Criminally?) Ugly
Health Level 7
HL7 is a UML-based standard for exchange of information between clinical information systems
It has proved very crumbly as a standard
The HL7 Reference Information Model (RIM) is supposed to overcome this problem by defining the universe of healthcare data in a rigorous way
http://ncor.us 16
HL7-RIM
Animal
Definition: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.
Person
Definition: A subtype of Living Subject representing single human being [sic] who, in the context of the Personnel Management domain, must also be uniquely identifiable through one or more legal documents.
LivingSubject
Definition: A subtype of Entity representing an organism or complex animal, alive or not.
http://ncor.us 17
HL7 RIM: The Problem of Circularity
Person = Person with documents
has the form: ‘An A is an A which is B’
Definitions of this form:
– are useless in practical terms since neither we nor the machine can use them to find out what ‘A’ means
– incorporate a vicious infinite regress
– have the effect of making it impossible to refer to As which are not Bs, for example to an undocumented person
http://ncor.us 18
HL7 Logically Incoherent
act = the record of an act
This has the form: An X is the Y of an X
again worse than circular
http://ncor.us 19
HL7-RIM: Logically Contradictory Definitions
Definition of Act: An Act is an action of interest that has happened, can happen, is happening, is intended to happen, or is requested/demanded to happen.
Definition of Act: An Act is the record of something that is being done, has been done, can be done, or is intended or requested to be done.
http://ncor.us 20
HL7 RIM Ontologically Incoherent
The truth about the real world is constructed through a combination and arbitration of attributed statements ...
As such, there is no distinction between an activity and its documentation.
http://ncor.us 21
HL7 Incredibly Successful
• embraced as US federal standard;
• central part of $15 billion program to integrate all UK hospital information systems
• made mandatory by Canada Health Infoway
• adopted by Oracle as basis for its EHR support programs
http://ncor.us 23
From molecules to diseases
A good ontology should enable us to organize our information resources in such a way that we can bridge the granularity gap between genomics and proteomics data and phenotype (clinical, pharmacological, patient-centered) data
http://ncor.us 24
good ontologies require:
Coherent upper level taxonomy distinguishing
• continuants (cells, molecules, organisms ...)
• occurrents (events, processes)
• dependent entities (qualities, functions ...)
• independent entities (their bearers)
• universals (types, kinds)
• instances (tokens, individuals)
Coherent relation ontology supporting inference both within and between ontologies.
http://ncor.us 25
good ontologies require:
Consistent use of terms, supported by logically coherent (non-circular) definitions, in both human-readable and computable formats
http://ncor.us 26
Open Biomedical Ontologies (OBO)
Upper Biomedical Ontology (UBO)
root UBO:0000001:top
subclass BFO:continuant:continuant
– subclass BFO:dependent_entity:dependent_entity
• subclass UBO:0000023:quality
– subclass UBO:0000026:phenotype
» subclass UBO:0000025:state
– subclass UBO:0000027:disease
» subclass UBO:0000005:function
– subclass GO:0003674:molecular_function
• subclass BFO:disposition:disposition
– subclass BFO:independent_entity:independent_entity
• subclass UBO:0000002:substance
– subclass UBO:0000019:protein
– subclass GO:0005575:cellular_component
– subclass UBO:0000006:anatomical_entity
» subclass UBO:0000008:gross_anatomical_entity
– subclass UBO:0000007:organism
» subclass UBO:0000015:microbe
» subclass UBO:0000014:plant
» subclass UBO:0000017:animal
• subclass BFO:fiat_part_of_substance:fiat_part_of_substance
• subclass BFO:boundary_of_substance:boundary_of_substance
• subclass BFO:aggregate_of_substances:aggregate_of_substances
subclass BFO:occurrent:occurrent
– subclass BFO:dependent_occurrent:dependent_occurrent
• subclass UBO:0000004:process
– subclass GO:0008150:biological_process
• subclass BFO:fiat_part_of_process:fiat_part_of_process
– subclass UBO:0000029:life_cycle_stage
• subclass BFO:aggregate_of_processes:aggregate_of_processes
– subclass EO:0007359:environment ontology
• subclass BFO:temporal_boundary_of_process:temporal_boundary_of_process
– subclass BFO:independent_occurrent:independent_occurrent
http://ncor.us 27
OBO Relation Ontology (RO)
• Clear distinction between universals (classes, kinds, types) and instances (individuals, tokens)
• Precise formal definitions of relations
• Automatic applicability to time-indexed instance-data, e.g. in Electronic Health Record
• Consistency with the Relation Ontology now a criterion for admission to the OBO ontology library
see Genome Biology, Apr. 2006
http://ncor.us 28
Three types of relations
between instances:
Mary’s heart part_of Mary
between an instance and a universal:
Mary instance_of homo sapiens
between universals:
gastrulation part_of embryonic development
http://ncor.us 29
A suite of primitive instance-level relations
identical_to
part_of
located_in
adjacent_to
earlier
derives_from
...
http://ncor.us 30
A suite of defined relations between universals
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
http://ncor.us 31
GALEN: Vomitus contains carrot
All portions of vomit contain all portions of carrot
All portions of vomit contain some portion of carrot
Some portions of vomit contain some portion of carrot
Some portions of vomit contain all portions of carrot
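The four readings differ only in how the two quantifiers are combined, which a few lines of Python can make concrete. This is a minimal sketch over made-up instance data: the identifiers `v1`, `v2`, `c1`, `c2`, the `contains` set and the function name `quantifier_readings` are all hypothetical and not taken from GALEN.

```python
def quantifier_readings(vomits, carrots, contains):
    """Evaluate the four possible readings of 'Vomitus contains carrot'.

    `contains` is a set of (vomit_portion, carrot_portion) pairs that hold
    at the instance level.
    """
    def holds(v, c):
        return (v, c) in contains

    return {
        "all-all": all(all(holds(v, c) for c in carrots) for v in vomits),
        "all-some": all(any(holds(v, c) for c in carrots) for v in vomits),
        "some-some": any(any(holds(v, c) for c in carrots) for v in vomits),
        "some-all": any(all(holds(v, c) for c in carrots) for v in vomits),
    }

# Hypothetical data: two portions of vomit, two portions of carrot,
# and a single instance-level containment fact.
vomits = ["v1", "v2"]
carrots = ["c1", "c2"]
contains = {("v1", "c1")}

result = quantifier_readings(vomits, carrots, contains)
print(result)
```

On this data only the some-some reading comes out true, which is the only one of the four that the everyday observation behind the GALEN assertion could plausibly support.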
http://ncor.us 32
all-some structure
A part_of B =def. given any instance a of A there is some instance b of B such that a part_of b on the instance level
Allows automatic ontology integration via cascading reasoning:
A R1 B
B R2 C
A R3 C
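The cascading pattern above (from A R1 B and B R2 C, infer A R3 C) can be sketched as a lookup in a composition table applied until nothing new follows. The table below covers only is_a and part_of, the class names A, B, C, D are placeholders, and the function name `cascade` is an assumption for this illustration rather than part of any OBO tool.

```python
from itertools import product

# Composition table: (R1, R2) -> R3. Only is_a and part_of are covered here.
COMPOSITION = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}

def cascade(assertions):
    """Close a set of (A, R, B) class-level assertions under COMPOSITION."""
    inferred = set(assertions)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that new facts can be added safely.
        for (a, r1, b), (b2, r2, c) in list(product(inferred, repeat=2)):
            r3 = COMPOSITION.get((r1, r2))
            if b == b2 and r3 is not None and (a, r3, c) not in inferred:
                inferred.add((a, r3, c))
                changed = True
    return inferred

# Placeholder assertions, possibly drawn from different ontologies.
asserted = {("A", "is_a", "B"), ("B", "part_of", "C"), ("C", "part_of", "D")}
closure = cascade(asserted)
print(sorted(closure - asserted))
```

Inferences of this kind are sound only because each class-level relation has the same all-some reading; with the ambiguous readings of the GALEN example the table could not be trusted.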
http://ncor.us 33
adjacent_to
cell wall adjacent_to cytoplasm
intron adjacent_to exon
Golgi apparatus adjacent_to endoplasmic reticulum
periplasm adjacent_to plasma membrane
presynaptic membrane adjacent_to synaptic cleft
http://ncor.us 34
A adjacent_to B
every instance of A stands in the instance-level adjacent_to relation to some instance of B
http://ncor.us 35
adjacent_to as a relation between universals is not symmetric
nucleus adjacent_to cytoplasm
Not: cytoplasm adjacent_to nucleus
seminal vesicle adjacent_to urinary bladder
Not: urinary bladder adjacent_to seminal vesicle
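The asymmetry follows directly from the all-some reading, even when the instance-level relation is itself symmetric. The sketch below assumes hypothetical instance data: `n1` is a nucleus, `cy1` and `cy2` are portions of cytoplasm, and `cy2` is imagined to belong to a cell without a nucleus. The function name `all_some` is likewise an assumption.

```python
def all_some(rel, instances_of_a, instances_of_b):
    """Class-level A R B: every instance of A stands in R to some instance of B."""
    return all(
        any((a, b) in rel for b in instances_of_b)
        for a in instances_of_a
    )

# Hypothetical instances.
nuclei = ["n1"]
cytoplasms = ["cy1", "cy2"]  # cy2: cytoplasm of a cell that has no nucleus

# Instance-level adjacent_to, recorded symmetrically in both directions.
adjacent_to = {("n1", "cy1"), ("cy1", "n1")}

nucleus_adjacent_to_cytoplasm = all_some(adjacent_to, nuclei, cytoplasms)
cytoplasm_adjacent_to_nucleus = all_some(adjacent_to, cytoplasms, nuclei)
print(nucleus_adjacent_to_cytoplasm, cytoplasm_adjacent_to_nucleus)
```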
http://ncor.us 36
The Granularity Gulf
most existing data-sources are of fixed, single granularity
many (all?) clinical phenomena cross granularities
http://ncor.us 37
Main obstacle to integrating genetic and EHR data
No facility for dealing with time and instances (particulars, individuals) in current ontologies
http://ncor.us 38
Key idea
To define ontological relations like
part_of, develops_from
it is not enough to look just at universals / classes / types / ‘concepts’ :
we need also to take account of instances and time
http://ncor.us 39
transformation_of
A transformation_of B =def. any instance of A was at some earlier time an instance of B
http://ncor.us 40
transformation_of
[Figure: the same instance c shown at times t1 and t along a time axis, with the universals C1 and C]
mature RNA transformation_of pre-RNA
adult transformation_of child
carcinomatous colon transformation_of colon
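Because the definition quantifies over instances and times, it can be checked against time-indexed instance data of the kind an Electronic Health Record might hold. The following is a sketch under stated assumptions: the fact format (instance, class, time), the individuals, the years and the function name `transformation_of` are all made up for illustration.

```python
def transformation_of(a_class, b_class, facts):
    """Check A transformation_of B against time-indexed instantiation facts.

    `facts` is a set of (instance, class, time) triples. The relation holds
    if every instance of A was, at some strictly earlier time, an instance
    of B. It holds vacuously when there are no instances of A.
    """
    a_facts = [(i, t) for (i, c, t) in facts if c == a_class]
    return all(
        any(j == i and c == b_class and t1 < t for (j, c, t1) in facts)
        for (i, t) in a_facts
    )

# Hypothetical time-indexed data (times given as years).
facts = {
    ("mary", "child", 1990),
    ("mary", "adult", 2010),
    ("tom", "child", 2005),
}

adult_from_child = transformation_of("adult", "child", facts)
child_from_adult = transformation_of("child", "adult", facts)
print(adult_from_child, child_from_adult)
```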