The Unbearable Lightness of Biomedical Informatics
Post on 30-Jan-2016
1
The Unbearable Lightness of Biomedical Informatics
Barry Smith
Saarbrücken/Buffalo
http://ontologist.com
2
if Medical WordNet* is the solution
what is the problem?
*Coling Proceedings, Vol. 1, pp. 371-380
3
4
Cerebellar tumor
5
DNA
Protein
Organelle
Cell
Tissue
Organ
Organism
10^-5 m
10^-1 m
10^-9 m
6
The quantity-quality divide
30,000 genes in human
200,000 proteins
100s of cell types
100,000s of disease types
1,000,000s of biochemical pathways (including disease pathways)
… legacy of Human Genome Project
… and of attempts to institute the electronic health record
7
DNA
Protein
Organelle
Cell
Tissue
Organ
Organism
10^-5 m
10^-1 m
10^-9 m
8
FUNCTIONAL GENOMICS
proteomics,
reactomics,
metabonomics,
toxicopharmacogenomics
phenomics,
behaviouromics,
…
9
DNA
Protein
Organelle
Cell
Tissue
Organ
Organism
10^-5 m
10^-1 m
10^-9 m
The method of annotations
10
DNA
Protein
Organelle
Cell
Tissue
Organ
Organism
10^-5 m
10^-1 m
10^-9 m
The method of indexing
11
The Gene Ontology
menopause
sensitivity to blue light
heptolysis
12
13
How to overcome incompatibilities between different scientific index terms?
immunology
genetics
cell biology
14
One answer: (statistical) computational linguistics
Pattern recognition based on string searches
15
String searches need constraints
we can’t leave it to luck to overcome terminological incompatibilities
16
Remember: different disciplines use different terminologies to refer to the same objects, processes and features in reality
immunology
genetics
cell biology
17
An alternative answer:
“Ontology”
18
Ontology, roughly:
Overcome terminological incompatibilities by creating a standardized framework into which diverse vocabularies can be mapped
19
Kinds of Ontologies
[Figure: spectrum of kinds of ontologies, running from Terms to General Logic]
Entries shown:
‘ordinary’ Glossaries
ad hoc Hierarchies (Yahoo!)
Data Dictionaries (EDI)
Thesauri
structured Glossaries
XML DTDs
Principled, informal hierarchies
DB Schema
XML Schema
Data Models (UML, STEP)
formal Taxonomies
Frames (OKBC)
Description Logics (DAML+OIL)
Groupings shown:
Glossaries & Data Dictionaries
Thesauri, Taxonomies
MetaData, XML Schemas, & Data Models
Formal Ontologies & Inference
Michael Gruninger
20
Kinds of Ontologies
A shared vocabulary plus a specification of its intended meaning
meaning specified explicitly in a logically rigorous way
Two extremes
21
Kinds of Ontologies
[Figure: spectrum of kinds of ontologies, from Terms to General Logic, repeated from slide 19]
22
Kinds of Ontologies
A shared vocabulary plus a specification of its intended meaning
meaning specified explicitly in a logically rigorous way
Too expensive
23
Kinds of Ontologies
A shared vocabulary plus a specification of its intended meaning
meaning specified informally via natural language
Two extremes
24
Work on biomedical ontologies grew out of work on medical thesauri and nomenclatures
25
Kinds of Ontologies
[Figure: spectrum of kinds of ontologies, from Terms to General Logic, repeated from slide 19]
26
Orange NarrowerTerm Fruit
Fruit similarTo Vegetable
Orange synonymWith Apfelsine
Graph with labelled edges (similarTo, NarrowerTerm, synonymWith)
Fixed set of edge labels (a.k.a. relations)
Goble & Shadbolt
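A thesaurus of this kind can be sketched as a small data structure: a set of (term, relation, term) triples in which only a fixed set of edge labels is accepted. The class names `Relation` and `Thesaurus` are hypothetical, and the edge directions are an assumption read off the slide, not taken from Goble & Shadbolt.

```python
from enum import Enum


class Relation(Enum):
    """The fixed set of edge labels allowed in the thesaurus graph."""
    NARROWER_TERM = "NarrowerTerm"
    SIMILAR_TO = "similarTo"
    SYNONYM_WITH = "synonymWith"


class Thesaurus:
    """A graph whose edges are (source term, relation, target term) triples."""

    def __init__(self):
        self.edges = set()

    def add(self, source, relation, target):
        # Only labels from the fixed set are accepted.
        if not isinstance(relation, Relation):
            raise ValueError(f"unknown edge label: {relation!r}")
        self.edges.add((source, relation, target))

    def related(self, source, relation):
        """Return, sorted, all terms reachable from `source` via `relation`."""
        return sorted(t for s, r, t in self.edges if s == source and r is relation)


# Edge directions below are an assumption about the slide's diagram.
thesaurus = Thesaurus()
thesaurus.add("Orange", Relation.NARROWER_TERM, "Fruit")
thesaurus.add("Orange", Relation.SYNONYM_WITH, "Apfelsine")
thesaurus.add("Fruit", Relation.SIMILAR_TO, "Vegetable")

print(thesaurus.related("Orange", Relation.NARROWER_TERM))  # ['Fruit']
```

Nothing in such a structure says what the labels mean; that is left entirely to the reader of the graph.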
27
Unified Medical Language System (UMLS)
UMLS Metathesaurus:
1 million biomedical concepts
2.8 million concept names
from more than 100 controlled vocabularies and classifications
built by US National Library of Medicine
28
UMLS Source Vocabularies
MeSH – Medical Subject Headings
…
ICD – International Classification of Diseases
…
GO – Gene Ontology
…
FMA – Foundational Model of Anatomy
…
29
To reap the benefits of standardization
we need to make ONE SYSTEM out of many different terminologies
= UMLS “Semantic Network”
nearest thing to an “ontology” in the UMLS
30
UMLS SN
Alexa McCray, “An Upper Level Ontology for the Biomedical Domain”, Comparative and Functional Genomics, 4 (2003), 80-84.
31
UMLS SN
134 Semantic Types
54 types of edges (relations)
yielding a graph containing more than 6,000 edges
32
Fragment of UMLS SN
33
34
35
UMLS SN Top Level
entity
  physical object
    organism
  conceptual entity
event
36
conceptual entity
Organism Attribute
Finding
Idea or Concept
Occupation or Discipline
Organization
Group
Group Attribute
Intellectual Product
Language
37
conceptual entity
Organism Attribute
Finding
Idea or Concept
Occupation or Discipline
Organization
Group
Group Attribute
Intellectual Product
Language
38
Idea or Concept
  Functional Concept
  Qualitative Concept
  Quantitative Concept
  Spatial Concept
    Body Location or Region
    Body Space or Junction
    Geographic Area
    Molecular Sequence
      Amino Acid Sequence
      Carbohydrate Sequence
      Nucleotide Sequence
39
Idea or Concept
  Functional Concept
  Qualitative Concept
  Quantitative Concept
  Spatial Concept
    Body Location or Region
    Body Space or Junction
    Geographic Area
    Molecular Sequence
      Amino Acid Sequence
      Carbohydrate Sequence
      Nucleotide Sequence
40
Idea or Concept
  Functional Concept
  Qualitative Concept
  Quantitative Concept
  Spatial Concept
    Body Location or Region
    Body Space or Junction
    Geographic Area
    Molecular Sequence
      Amino Acid Sequence
      Carbohydrate Sequence
      Nucleotide Sequence
41
Idea or Concept
  Functional Concept
  Qualitative Concept
  Quantitative Concept
  Spatial Concept
    Body Location or Region
    Body Space or Junction
    Geographic Area
    Molecular Sequence
      Amino Acid Sequence
      Carbohydrate Sequence
      Nucleotide Sequence
42
Lake Geneva
is an Idea or Concept
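The inference behind this slide can be sketched by walking the hierarchy upwards. The parent links below are transcribed from the hierarchy slides; the assignment of Lake Geneva to the type Geographic Area is an assumption made for illustration, and the names `PARENT` and `ancestors` are hypothetical.

```python
# Child -> parent links, transcribed from the Idea or Concept hierarchy.
PARENT = {
    "Functional Concept": "Idea or Concept",
    "Qualitative Concept": "Idea or Concept",
    "Quantitative Concept": "Idea or Concept",
    "Spatial Concept": "Idea or Concept",
    "Body Location or Region": "Spatial Concept",
    "Body Space or Junction": "Spatial Concept",
    "Geographic Area": "Spatial Concept",
    "Molecular Sequence": "Spatial Concept",
    "Amino Acid Sequence": "Molecular Sequence",
    "Carbohydrate Sequence": "Molecular Sequence",
    "Nucleotide Sequence": "Molecular Sequence",
}


def ancestors(semantic_type):
    """Follow parent links upwards and return every type passed on the way."""
    chain = []
    while semantic_type in PARENT:
        semantic_type = PARENT[semantic_type]
        chain.append(semantic_type)
    return chain


# Assumption: Lake Geneva is assigned the semantic type Geographic Area.
lake_geneva_types = ["Geographic Area"] + ancestors("Geographic Area")
print(lake_geneva_types)
# ['Geographic Area', 'Spatial Concept', 'Idea or Concept']
```

If the links are read as is_a and is_a is transitive, whatever falls under Geographic Area also falls under Idea or Concept.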
43
Idea or Concept
  Functional Concept
  Qualitative Concept
  Quantitative Concept
  Spatial Concept
    Body Location or Region
    Body Space or Junction
    Geographic Area
    Molecular Sequence
      Amino Acid Sequence
      Carbohydrate Sequence
      Nucleotide Sequence
44
UMLS
Fingers is_a Body Location or Region
Hand is_a Body Part, Organ, or Organ Component
hand part_of body
BUT NOT
fingers part_of hand
45
Problem: Running together of concepts and entities in reality
bioinformatics à la UMLS SN (like many “knowledge engineering” disciplines) floats free from reality in a conceptual world of its own creation
46
Blood Pressure Ontology
The hydraulic equation:
BP = CO*PVR
Arterial blood pressure (BP) is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR).
47
UMLS SN
blood pressure is an Organism Function
cardiac output is a Laboratory or Test Result or Diagnostic Procedure
48
BP = CO*PVR thus asserts that
blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure
49
Problem: Confusion of reality with our (ways of gaining) knowledge
about reality
50
UMLS Semantic Network
entity
  physical object
  conceptual entity
51
Physical Object
  Substance
    Food
    Chemical
    Body
52
Chemical
  Chemical Viewed Structurally
  Chemical Viewed Functionally
53
Problem: Confusion of objects with our ways of referring to
objects
54
Chemical
  Chemical Viewed Structurally
  Chemical Viewed Functionally
Lower-level types shown: Inorganic Chemical, Organic Chemical, Enzyme, Biomedical or Dental Material
55
This multiple inheritance leads to errors in coding
Gene Ontology will eliminate multiple inheritance
56
UMLS Semantic Network
entity
  physical object
    organism
  conceptual entity
(edges labelled is_a)
57
UMLS SN
is_a =def.
If one item ‘is_a’ another item then the first item is more specific in meaning than the second item. (Italics added)
58
fish is_a vertebrate
copulation is_a biological process
both testes is_a testis
Nazi is_a Nazism
plant parts is_a plant
59
60
What are the nodes in this graph?
Almost all nodes are linked to other nodes by a multiplicity of different types of edges
Compare: swimming is healthy
swimming has 8 letters
61
Semantic Network Definition:
Concept =def. An abstract concept, such as a
social, religious, or philosophical concept
UMLS Definition:
Concept =def. A class of synonymous terms
62
63
How can concepts figure as relata of these relations?
part_of =def. Composes, with one or more other physical units, some larger whole.
causes =def. Brings about a condition or an effect.
contains =def. Holds or is the receptacle for fluids or other substances.
64
How can a set of synonymous terms serve as a receptacle for fluids or other substances?
How can sets of synonymous terms stand in relations such as affects or causes?
65
connected_to =def. Directly attached to another physical unit as tendons are connected to muscles.
How can a concept be directly attached to another physical unit?
66
What are the relata which are linked by the edges in the SN graph?
67
To answer this question
we need to distinguish clearly between concepts and classes:
concepts are creatures of cognition
classes are invariants (types, kinds, universals) out there in reality
68
If ontologies are about meanings / concepts
it becomes impossible to deal coherently with those relations between entities in reality which involve appeal to both classes and their instances.
69
Illustration re: part_of
heart part_of human
human heart part_of human
testis part_of human
human testis part_of human
70
For instances:
part_of = instance-level parthood (for example between Mary and her heart)
For classes:
A part_of B =def. given any instance a of A there is some instance b of B such that a part_of b
This is an assertion about As.
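The class-level definition can be checked mechanically against a set of instances. The following is a minimal sketch with made-up individuals (Mary, a dog called Rex, and their hearts); the names `INSTANCE_OF`, `PART_OF`, `instances` and `class_part_of` are hypothetical. It offers one reading of why the illustration on the previous slide contrasts "heart" with "human heart".

```python
# Hypothetical instance data: each individual with the classes it instantiates.
INSTANCE_OF = {
    "mary": {"human"},
    "rex": {"dog"},
    "marys_heart": {"heart", "human heart"},
    "rexs_heart": {"heart"},
}

# Instance-level parthood, as (part, whole) pairs.
PART_OF = {("marys_heart", "mary"), ("rexs_heart", "rex")}


def instances(cls):
    """Return all individuals that instantiate the class `cls`."""
    return {x for x, classes in INSTANCE_OF.items() if cls in classes}


def class_part_of(class_a, class_b):
    """A part_of B: every instance of A is part of some instance of B."""
    return all(
        any((a, b) in PART_OF for b in instances(class_b))
        for a in instances(class_a)
    )


print(class_part_of("human heart", "human"))  # True
print(class_part_of("heart", "human"))        # False: Rex's heart is part of no human
```

Note that the check quantifies over the instances of A only, which is why the definition is an assertion about As. With no instances of A at all, it holds vacuously.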
71
a adjacent_to b
(instance-level adjacency, for example between Mary’s head and Mary’s neck)
For classes:
A adjacent_to B =def. given any instance a of A there is some instance b of B which is such that a adjacent_to b
72
A adjacent_to B
as an assertion about classes
is never an assertion about As exclusively
73
A adjacent_to B =def.
given any instance a of A there is some instance b of B which is such that a adjacent_to b
and
given any instance b of B there is some instance a of A which is such that a adjacent_to b
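The difference between the one-directional and the two-directional definition can be seen on a toy example. This is a sketch under stated assumptions: the classes A and B and their instances are invented, and `all_some` and `adjacent_to` are hypothetical helper names.

```python
def all_some(xs, ys, related):
    """True if every x in xs stands in `related` to at least one y in ys."""
    return all(any(related(x, y) for y in ys) for x in xs)


# Hypothetical instances of two classes A and B.
A = {"a1", "a2"}
B = {"b1", "b2", "b3"}

# Instance-level adjacency is symmetric, so each pair is stored unordered.
ADJACENT = {frozenset(("a1", "b1")), frozenset(("a2", "b2"))}


def adjacent_to(x, y):
    """Instance-level adjacency between two individuals."""
    return frozenset((x, y)) in ADJACENT


# First conjunct only: every a is adjacent to some b.
one_directional = all_some(A, B, adjacent_to)
# Both conjuncts: additionally, every b is adjacent to some a.
two_directional = one_directional and all_some(B, A, adjacent_to)

print(one_directional)  # True
print(two_directional)  # False: b3 is adjacent to no instance of A
```

The second conjunct is what makes the class-level assertion a claim about Bs as well as As.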
74
Almost all of the 54 types of edges in SN are dealt with incoherently
part_of HAS INVERSE has_part
nucleus part_of cell
cell has_part nucleus
75
76
Acquired Abnormality affects Fish
Experimental Model of Disease affects Fungus
Food causes Experimental Model of Disease
Bacterium causes Experimental Model of Disease
Biomedical or Dental Material causes Mental or Behavioral Dysfunction
Manufactured Object causes Disease or Syndrome
Vitamin causes Injury or Poisoning
77
How to do better?
78
How to do better?
How to create a network of biomedically relevant terms/classes, with coherently defined relations between them, to which expert terms of the UMLS can be assigned in a maximally intelligible way?
79
What linguistic framework
is shared in common by immunologists, geneticists and cell biologists,
by phenobehavioromists and by toxicopharmacogenomists?
80
Answer:
the natural language they all use to talk about biological (biomedical) phenomena
81
BioWordNet
joint work with
Christiane Fellbaum
(see paper in Proceedings)
82
BioWordNet
use WordNet’s biomedical vocabulary, to create a better alternative to UMLS SN
83
Strengths of WordNet 2.0
Open source
Very broad coverage
Is-a / part-of architecture
Tool for automatic sense disambiguation
84
Weaknesses of WordNet 2.0
Problems with relations
Mixes up expert and non-expert vocabulary
Errors
Gaps
Noise
all prevent WordNet’s being used in scientific contexts as a substitute for UMLS SN
85
Fix WordNet’s relations by using the methodology outlined above
already applied to:
Foundational Model of Anatomy
Gene Ontology
Open Biological Ontologies
86
Institute for Formal Ontology and Medical Information Science
Saarbrücken
http://ifomis.org
87
WordNet mixes up expert and non-expert vocabulary,
both current and medieval:
suppuration#2 {pus, purulence, suppuration, ichor, sanies, festering}
88
WordNet contains biomedically relevant errors
snore-sleep
WordNet: if someone snores, then he necessarily also sleeps
snoring = the respiratory induced vibration of glottal tissues
associated not only with sleep but also with relaxation or obesity
89
WordNet has too much noise for purposes of scientific applications
90
13 senses for ‘feel’ as a verb
experience – She felt resentful
find – I feel that he doesn't like me
feel – She felt small and insignificant
feel – We felt the effects of inflation
feel – The sheets feel soft
grope – He felt for his wallet
finger – Feel this soft cloth!
explore – He felt his way around the dark room
feel – It feels nice to be home again
feel – He felt the girl in the movie theater
91
Medical senses of ‘feel’
palpate – examine a body part by palpation:
The runner felt her pulse.
sense – perceive by a physical sensation, e.g. coming from the skin or muscles:
He felt his flesh crawl
feel – seem with respect to a given sensation:
My cold is gone – I feel fine today
92
WordNet has gaps even in its coverage of biomedical natural language
93
WordNet senses of ‘regulation’
1. regulation (ordinance, rule)
2. rule, regulation -- (a principle that customarily governs behavior; "short haircuts were the regulation")
3. regulation -- (the state of being controlled or governed)
4. regulation -- (the ability of an early embryo to continue normal development after its structure has been somehow damaged)
5. regulation, regularization, regularisation -- (the act of bringing to uniformity)
6. regulation, regulating -- (the act of controlling according to rule; "fiscal regulations are in the hands of politicians")
94
Biological sense of ‘regulation’:
A process that modulates the frequency, rate or extent of behavior
(Gene Ontology)
95
WordNet senses of ‘inhibition’
1. inhibition, suppression -- ((psychology) the conscious exclusion of unacceptable thoughts or desires)
2. inhibition -- (the quality of being inhibited)
3. inhibition -- (the process whereby nerves can retard or prevent the functioning of an organ or part; "the inhibition of the heart by the vagus nerve")
4. prohibition, inhibition, forbiddance -- (the action of prohibiting or forbidding)
96
Biological senses of ‘inhibition’ much broader
inhibition = negative regulation
enzymes can be inhibited
reactions can be inhibited
… and not only by nerves
97
WordNet senses of ‘binding’
1. binding -- (the capacity to attract and hold something)
2. binding -- (a strip sewn over or along an edge for reinforcement or decoration)
3. dressing, bandaging -- (the act of applying a bandage)
4. binding, book binding ("the book had a leather binding")
98
biological sense of ‘binding’
interacting selectively with
(Gene Ontology)
99
Remove errors, noise and gaps in a two-stage process
1. select biomedically relevant natural-language terms from WordNet 2.0 extended by standard biomedical information sources
2. validate these terms and the relations between them
100
Validation
each arc in BWN is converted into a natural-language sentence
e.g. ‘mumps is an inflammation’
via controlled human-subjects experiments, these sentences are accredited
1. as intelligible by non-experts
2. as true by experts
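The conversion of arcs into sentences could be sketched with simple templates. The template wording, the relation names `is_a` and `part_of` as dictionary keys, the article heuristic and the function names are all assumptions made for illustration. They are not the procedure actually used for BWN.

```python
def indefinite_article(noun):
    """Crude heuristic (an assumption): 'an' before a vowel letter, else 'a'."""
    return "an" if noun[0].lower() in "aeiou" else "a"


# Hypothetical sentence templates, one per relation.
TEMPLATES = {
    "is_a": "{source} is {article} {target}",
    "part_of": "{source} is part of {article} {target}",
}


def arc_to_sentence(source, relation, target):
    """Render one arc (source, relation, target) as a natural-language sentence."""
    template = TEMPLATES[relation]
    return template.format(
        source=source,
        article=indefinite_article(target),
        target=target,
    )


print(arc_to_sentence("mumps", "is_a", "inflammation"))
# mumps is an inflammation
```

Each generated sentence would then be shown to human subjects. In this sketch only the generation step is modelled, not the accreditation by non-experts and experts.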
101
we use logical methods to ensure a coherent treatment of BWN’s upper-level classes and relations
and thereby also bring logical rigor in a practical fashion to the whole of the UMLS Metathesaurus
102
Bring ontological rigour to BWN
[Figure: spectrum of kinds of ontologies, from Terms to General Logic, repeated from slide 19]
103
The long-term goal
BWN should serve as scaffolding/indexing system for the much larger and denser net of expert biomedical terminology which is the UMLS Metathesaurus
104
The End