Page 1: Ontologies in Biomedicine Mark A. Musen Stanford University.

Ontologies in Biomedicine

Mark A. Musen

Stanford University


Page 2: Ontologies in Biomedicine Mark A. Musen Stanford University.

What Is An Ontology?

• The study of being

• A discipline co-opted by Computer Science to enable the “explicit specification of the conceptualization” of application domains:
– Entities
– Properties and attributes of entities
– Constraints on properties and attributes
– Individuals (often, but not always)

• A theory that provides
– a common vocabulary
– a shared understanding of the entities in an application area

Page 3: Ontologies in Biomedicine Mark A. Musen Stanford University.

Why Develop an Ontology?

• To share common understanding of the structure of descriptive information
– among people
– among software agents
– between people and software

• To enable reuse of domain knowledge
– to avoid “re-inventing the wheel”
– to introduce standards to allow interoperability

Page 4: Ontologies in Biomedicine Mark A. Musen Stanford University.

Ontologies are just the beginning

[Diagram: Ontologies at the center, linked to software agents, problem-solving methods, annotated data, databases, and knowledge bases. Link labels: declare structure; provide domain descriptions; enumerate domain terms.]

Page 5: Ontologies in Biomedicine Mark A. Musen Stanford University.

Supreme genus: SUBSTANCE

Differentiae: material / immaterial

Subordinate genera: BODY / SPIRIT

Differentiae: animate / inanimate

Subordinate genera: LIVING / MINERAL

Differentiae: sensitive / insensitive

Proximate genera: ANIMAL / PLANT

Differentiae: rational / irrational

Species: HUMAN / BEAST

Individuals: Socrates, Plato, Aristotle, …

Porphyry’s depiction of Aristotle’s Categories

Page 6: Ontologies in Biomedicine Mark A. Musen Stanford University.


Page 7: Ontologies in Biomedicine Mark A. Musen Stanford University.

Foundational Model of Anatomy

• Long-term project at University of Washington to create a comprehensive ontology of human anatomy

• 72K concepts, 1.9M relationships

• One of the largest and best developed ontologies in biomedicine


Page 8: Ontologies in Biomedicine Mark A. Musen Stanford University.

[Diagram: Top level of the Foundational Model of Anatomy. Nodes: Physical Anatomical Entity; Anatomical Spatial Entity; Anatomical Structure; Body Substance; Body Part; Organ System; Organism; The Body; Organ Part; Organ; Cell; Organ Subdivision; Organ Component; Tissue.]

Page 9: Ontologies in Biomedicine Mark A. Musen Stanford University.

[Diagram: Parts of the heart (part-of links) and classes of anatomical structures (is-a links).

Parts of the heart: Heart; Cavity of Heart; Wall of Heart; Right Atrium; Cavity of Right Atrium; Wall of Right Atrium; Fossa Ovalis; Myocardium; Sinus Venarum; SA Node; Myocardium of Right Atrium.

Classes of anatomical structures: Cardiac Chamber; Hollow Viscus; Internal Feature; Organ Cavity; Organ Cavity Subdivision; Anatomical Spatial Entity; Anatomical Feature; Body Space; Organ Component; Organ Subdivision; Viscus; Organ Part; Organ; Anatomical Structure.]

Page 10: Ontologies in Biomedicine Mark A. Musen Stanford University.

But we really want ontologies in electronic form

• Ontology contents can be processed and interpreted by computers

• Interactive tools can assist developers in ontology authoring

Page 11: Ontologies in Biomedicine Mark A. Musen Stanford University.

The FMA demonstrates that distinctions are not universal

• Blood is not a tissue, but rather a body substance (like saliva or sweat)

• The pericardium is not part of the heart, but rather an organ in and of itself

• Each joint, each tendon, each piece of fascia is a separate organ

These views are not shared by many anatomists!

Page 12: Ontologies in Biomedicine Mark A. Musen Stanford University.

Ontologies are cropping up everywhere!

• Indexing of online information for access by humans or search engines

• Product catalogs for e-commerce

• Reference terminologies for machine translation and data interchange

• Standard terms for describing experimental data

• Frameworks for structuring knowledge for decision support

Page 13: Ontologies in Biomedicine Mark A. Musen Stanford University.

The New Philosophers

• Categorizing “what exists” in machine-understandable form

• Providing a structure that enables
– Developers to locate and update relevant descriptions
– Computers to infer relationships and properties

• Creating new abstractions about the world to facilitate the creation of this structure

Page 14: Ontologies in Biomedicine Mark A. Musen Stanford University.

Lots of ontology builders are not very good philosophers

• Nearly always, ontologies are created to address pressing professional needs

• The people who have the most insight into professional knowledge may have little appreciation for metaphysics, principles of knowledge representation, or computational logic

• There simply aren’t enough good philosophers to go around

Page 15: Ontologies in Biomedicine Mark A. Musen Stanford University.

A case in point: The International Classification of Diseases

• An enumeration of diseases that forms the basis for all medical claims and reimbursements in the United States

• A “legacy” terminology that has its roots in 19th century epidemiology

• Created initially by biostatisticians with a pressing need to compare death statistics in different European countries

• A system that won’t go away—and yet we would never create anything like it again

Page 16: Ontologies in Biomedicine Mark A. Musen Stanford University.

A Small Portion of ICD9-CM

724 Unspecified disorders of the back
  724.0 Spinal stenosis, other than cervical
    724.00 Spinal stenosis, unspecified region
    724.01 Spinal stenosis, thoracic region
    724.02 Spinal stenosis, lumbar region
    724.09 Spinal stenosis, other
  724.1 Pain in thoracic spine
  724.2 Lumbago
  724.3 Sciatica
  724.4 Thoracic or lumbosacral neuritis
  724.5 Backache, unspecified
  724.6 Disorders of sacrum
  724.7 Disorders of coccyx
    724.70 Unspecified disorder of coccyx
    724.71 Hypermobility of coccyx
    724.79 Coccygodynia
  724.8 Other symptoms referable to back
  724.9 Other unspecified back disorders
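
The hierarchy in a listing like this is carried only by the code strings: dropping the last digit after the decimal point appears to give the parent (724.01 under 724.0 under 724). A minimal Python sketch of that reading, assuming the convention holds for the codes shown; the function names `icd9_parent` and `icd9_ancestors` are hypothetical and not part of any ICD tooling:

```python
from typing import List, Optional


def icd9_parent(code: str) -> Optional[str]:
    """Return the parent of an ICD-9-CM style code, or None for a 3-digit category."""
    if "." not in code:
        return None                       # e.g. "724" has no parent in this sketch
    category, detail = code.split(".")
    if len(detail) == 1:
        return category                   # "724.0" -> "724"
    return f"{category}.{detail[:-1]}"    # "724.01" -> "724.0"


def icd9_ancestors(code: str) -> List[str]:
    """Walk upward from a code to its 3-digit category."""
    chain = []
    parent = icd9_parent(code)
    while parent is not None:
        chain.append(parent)
        parent = icd9_parent(parent)
    return chain


print(icd9_ancestors("724.01"))  # ['724.0', '724']
```

Note what the sketch cannot recover: the code string gives a position in the list, but nothing says what kind of relationship holds between a code and its parent.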

Page 17: Ontologies in Biomedicine Mark A. Musen Stanford University.

ICD9 (1977): A Handful of Codes for Traffic Accidents

Page 18: Ontologies in Biomedicine Mark A. Musen Stanford University.

ICD10 (1999): 587 codes for such accidents

• V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income

• W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity

• X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

Page 19: Ontologies in Biomedicine Mark A. Musen Stanford University.

ICD is used for lots of (too many?) things!

• ICD is used to code all patient encounters with the health-care system for purposes of
– Billing and reimbursement
– Institutional planning
– Disease surveillance and public health
– Quality assurance
– Economic modeling by third-party payors

• ICD was never intended to make the distinctions relevant to all these tasks!

• When patient encounters are encoded with ICD, it is impossible to keep all these uses in mind

Page 20: Ontologies in Biomedicine Mark A. Musen Stanford University.

If real ontologists could build the ICD from scratch …

• Diseases would be organized with well-defined relationships

• Diseases would be associated with computer-understandable definitions

• There would be well-defined rules to enable aggregation of primitive concepts into complex descriptions—and for ensuring that those descriptions were sensible

• There would be well-defined mechanisms for creating use-specific views of the ICD

Page 21: Ontologies in Biomedicine Mark A. Musen Stanford University.

The components of ontologies

• Classes: The primary entities in the world being modeled (e.g., “organ”)

• Attributes: The properties of classes (e.g., “shape”, “location”)

• Relations: Statements regarding how one class may relate to others (e.g., “the heart” is-a “organ”)

• Axioms: More complex logical statements (e.g., “only paired organs can be left-sided or right-sided”)
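
The four components can be made concrete with the slide's own axiom, “only paired organs can be left-sided or right-sided”. The following is a minimal Python sketch, not the FMA's actual representation: the class names `OrganClass` and `OrganMention`, the `paired` attribute, and the function `satisfies_pairing_axiom` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrganClass:
    """A class with one attribute: whether the organ comes as a left/right pair."""
    name: str
    paired: bool


@dataclass(frozen=True)
class OrganMention:
    """A description that refers to an organ, optionally with a side."""
    organ: OrganClass
    side: Optional[str] = None   # "left", "right", or None


def satisfies_pairing_axiom(mention: OrganMention) -> bool:
    """Axiom: only paired organs may be described as left-sided or right-sided."""
    return mention.side is None or mention.organ.paired


kidney = OrganClass("kidney", paired=True)
heart = OrganClass("heart", paired=False)

print(satisfies_pairing_axiom(OrganMention(kidney, "left")))  # True
print(satisfies_pairing_axiom(OrganMention(heart, "left")))   # False
```

An axiom of this kind is what lets software reject a description such as “left heart” as not sensible, rather than storing it silently.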

Page 22: Ontologies in Biomedicine Mark A. Musen Stanford University.

Classes and attributes in the FMA

Page 23: Ontologies in Biomedicine Mark A. Musen Stanford University.

Attributes of a class (e.g., “Esophagus”)

Page 24: Ontologies in Biomedicine Mark A. Musen Stanford University.

“is-a” is a special relation

If a sub-class is-a member of a super-class, then

– every instance of the sub-class is also an instance of the super-class (e.g., every member of the set aorta is necessarily a member of the set artery)

– values of attributes of the super-class are inherited by every instance of the sub-class (e.g., if arteries have cylindrical shape, then aorta has cylindrical shape)
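
Both properties of is-a can be sketched in a few lines of Python. This is an illustration under simple assumptions (single inheritance, attribute values stored on the class where they are asserted); the class `OntClass` and its methods are hypothetical names, not the API of any ontology tool.

```python
class OntClass:
    """A class with an optional superclass and locally asserted attribute values."""

    def __init__(self, name, parent=None, **attributes):
        self.name = name
        self.parent = parent
        self.attributes = attributes

    def is_a(self, other):
        """True if self is `other` or a (transitive) subclass of it."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def get(self, attribute):
        """Look up an attribute value, inheriting from superclasses if needed."""
        node = self
        while node is not None:
            if attribute in node.attributes:
                return node.attributes[attribute]
            node = node.parent
        raise KeyError(attribute)


artery = OntClass("artery", shape="cylindrical")
aorta = OntClass("aorta", parent=artery)

print(aorta.is_a(artery))   # True
print(aorta.get("shape"))   # cylindrical
```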

Page 25: Ontologies in Biomedicine Mark A. Musen Stanford University.

“Frame-based” knowledge-representation systems

• Allow developers to encode
– Taxonomic hierarchies of classes
– Other relations among classes (e.g., “part-of”) in addition to the is-a hierarchy
– Attributes of classes that take on particular values to define instances of the classes

• Support inheritance of attributes and values along taxonomic relations

Page 26: Ontologies in Biomedicine Mark A. Musen Stanford University.

Distinctions about ontologies

• “Light” versus “heavy”: Is the ontology a simple taxonomy or does the ontology provide additional detail regarding the nature of classes?

• “Upper-level” versus “domain-oriented”: Does the ontology try to describe general, abstract concepts or concepts tied to a particular application area?

Page 27: Ontologies in Biomedicine Mark A. Musen Stanford University.


Suggested Upper Merged Ontology (SUMO)

Page 28: Ontologies in Biomedicine Mark A. Musen Stanford University.

Part of the CYC Upper Ontology

Page 29: Ontologies in Biomedicine Mark A. Musen Stanford University.

The story so far …

• Ontologies define the entities—and relationships among entities—in some application area

• The authors’ point of view determines which distinctions are appropriate in a particular ontology

• Ontologies often use frame-based representations (including classes, attributes, relationships, and axioms) to encode knowledge

• People are building ontologies for nearly every niche of biomedicine

Page 30: Ontologies in Biomedicine Mark A. Musen Stanford University.

The pressing need to standardize the names of human genes

Page 31: Ontologies in Biomedicine Mark A. Musen Stanford University.

But the human genome is only part of the problem …

• Scientists maintain huge databases of gene sequences and gene expression for a wide range of “model organisms” (e.g., mouse, rat, yeast, fruit fly, round worm, slime mold)

• Database entries are annotated with entries such as the name of a gene, the function of the gene, and so on

• How do you ensure uniformity in the nature of these annotations?

Page 32: Ontologies in Biomedicine Mark A. Musen Stanford University.

Gene Ontology Consortium

• Founded in 1998 as a collaboration among scientists responsible for developing different databases of genomic data for model organisms (fruit fly, yeast, mouse)

• Now, essentially all developers of all model-organism databases participate

• Goal: To produce a dynamic, controlled vocabulary that can be applied to all organism databases even as knowledge of gene and protein roles in cells is accumulating and changing


Page 33: Ontologies in Biomedicine Mark A. Musen Stanford University.

Gene Ontology (GO)

• Comprises three independent “ontologies”
– molecular function of gene products
– cellular component of gene products
– biological process representing the gene product’s higher-order role

• Uses these terms as attributes of gene products in the collaborating databases (gene product associations)

• Allows queries across databases using GO terms, providing linkage of biological information across species
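
The cross-database query described above amounts to indexing annotations by a shared term. A minimal Python sketch, assuming annotations are available as (database, gene product, term) triples; the database names, gene-product names, and the choice of terms are made up for illustration and are not real GO annotation records:

```python
from collections import defaultdict

# (database, gene product, GO term) triples; every value here is invented.
ANNOTATIONS = [
    ("fly_db", "fly_gene_1", "DNA repair"),
    ("yeast_db", "yeast_gene_7", "DNA repair"),
    ("mouse_db", "mouse_gene_3", "DNA repair"),
    ("mouse_db", "mouse_gene_9", "protein folding"),
]


def index_by_term(annotations):
    """Group gene products from all databases under the term they share."""
    index = defaultdict(list)
    for database, product, term in annotations:
        index[term].append((database, product))
    return index


index = index_by_term(ANNOTATIONS)
for database, product in index["DNA repair"]:
    print(database, product)
```

Because every database uses the same controlled term, one lookup returns gene products from three species without any per-database translation step.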


Page 34: Ontologies in Biomedicine Mark A. Musen Stanford University.
Page 35: Ontologies in Biomedicine Mark A. Musen Stanford University.

GO has been wildly successful!!

• Dozens of biologists around the world contribute to GO on a regular basis

• The ontology is updated every 30 minutes!

• It’s now impossible to work in most areas of computational biology without making use of GO terms

Page 36: Ontologies in Biomedicine Mark A. Musen Stanford University.

But GO has had real problems …

• The ontologies initially were represented in an idiosyncratic format (DAG-Edit) that was not compatible with standard knowledge-representation systems

• The format was based on directed acyclic graphs of concepts, without the general ability to specify machine-interpretable properties of entities or definitions of entities

• Because of the informal knowledge-representation system, lots of errors crept into GO
– Terms that were duplicated in different places
– Terms with no superclasses
– Uncertain relationships between terms
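
Two of the error types listed above are easy to detect mechanically once the graph is in machine-readable form. A minimal Python sketch, assuming each term is stored with a label and a list of parent identifiers; the term fragment and the function names `duplicated_labels` and `orphans` are hypothetical and do not reflect real GO content or the checks the GO consortium actually runs:

```python
from collections import Counter

# term id -> (label, parent ids); a made-up fragment, not real GO content.
TERMS = {
    "T1": ("root process", []),
    "T2": ("transport", ["T1"]),
    "T3": ("ion transport", ["T2"]),
    "T4": ("ion transport", ["T1"]),   # same label filed in a second place
    "T5": ("orphan term", []),         # no superclass, yet not the root
}
ROOTS = {"T1"}


def duplicated_labels(terms):
    """Labels that appear on more than one term."""
    counts = Counter(label for label, _ in terms.values())
    return sorted(label for label, n in counts.items() if n > 1)


def orphans(terms, roots):
    """Terms with no parent that are not declared roots."""
    return sorted(t for t, (_, parents) in terms.items()
                  if not parents and t not in roots)


print(duplicated_labels(TERMS))  # ['ion transport']
print(orphans(TERMS, ROOTS))     # ['T5']
```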

• The GO consortium is working hard to rectify these problems by means of a new representation (OBO-Edit) and enhanced quality control

Page 37: Ontologies in Biomedicine Mark A. Musen Stanford University.

Creating ontologies has become a widespread cottage industry

• Professional Societies
– HL7: Reference Information Model
– MGED: Microarray Gene Expression Data Society Ontology
– HUPO: Human Proteome Organization Ontology

• Government
– NCI Thesaurus
– NIST: Process Specification Language

• Open Biological Ontologies
– GO
– Three dozen (and growing) other ontologies
– Mostly in DAG-Edit, some in Protégé format

Page 38: Ontologies in Biomedicine Mark A. Musen Stanford University.


A Portion of the OBO Library

Page 39: Ontologies in Biomedicine Mark A. Musen Stanford University.

HL-7 Reference Information Model (RIM)

Page 40: Ontologies in Biomedicine Mark A. Musen Stanford University.

HL7 RIM

• Provides a uniform framework for specification of information required by health-care information systems

• Based on six top-level, very general classes: Act, Entity, Role, Participation, Act_relationship, and Role_link

• Designed to facilitate information exchange among distributed elements of clinical information systems

• Has the same limitations that all “upper level” ontologies share:
– Abstract entities are hard to define
– It’s hard to know what should be “in” and what should be “out”


Page 41: Ontologies in Biomedicine Mark A. Musen Stanford University.

Description Logic (DL)

• A subset of logic designed to focus on categories and their definitions in terms of existing relations

• More expressive than frame-based representation systems (as in FMA) but less expressive than first-order logic (as in CYC)

• Major inference tasks:
– Subsumption: Is category C1 a subset of C2?
– Classification: Does Object O belong to C?

Page 42: Ontologies in Biomedicine Mark A. Musen Stanford University.

Kinds of classes

• Defined
– Have explicit necessary and sufficient properties (roles)
– Often are specializations of primitive concepts

• Primitive
– Have no sufficient properties
– May have other, necessary properties
– Correspond to natural kinds

Page 43: Ontologies in Biomedicine Mark A. Musen Stanford University.

A simple network of Generic Concepts

[Diagram: is-a network over the concepts THING, MINERAL, PLANT, ANIMAL, FISH, MAMMAL, HORSE, HUMAN, FEMALE-ANIMAL, MALE-ANIMAL, WOMAN, and MAN.]

Defined concepts are in yellow; primitive concepts are in green.

Page 44: Ontologies in Biomedicine Mark A. Musen Stanford University.

A classifier is a program that can use DL to conclude:

• All WOMEN are FEMALE ANIMALS

• A HORSE may not also be a PLANT

• HUMAN subsumes MAN and WOMAN

• A MAN may not also be a WOMAN

Page 45: Ontologies in Biomedicine Mark A. Musen Stanford University.

The Primitive Concept MESSAGE

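[Diagram: THING at the top, with DATE, MESSAGE, PERSON, and TEXT below it. MESSAGE has five roles, each with a value restriction (v/r) pointing to PERSON, TEXT, or DATE.]

A MESSAGE is, among other things, a THING with at least one Sender, all of which are PERSONs, at least one Recipient, all of which are PERSONs, a Body, which is a TEXT, a SendDate, which is a DATE, and a ReceivedDate, which is a DATE.

Roles of MESSAGE, with (minimum, maximum) number of fillers; NIL means no upper bound:

Sender(1,NIL)

Recipient(1,NIL)

Body(1,1)

SendDate(1,1)

ReceivedDate(1,1)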
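
The number restrictions and value restrictions on MESSAGE can be read as checkable conditions on an object. The following Python sketch tests an object against them; the dictionary layout, the function name `violations`, and the sample object are hypothetical. Because MESSAGE is a primitive concept, these conditions are necessary but not sufficient: passing the check does not by itself make an object a MESSAGE.

```python
import math

# role -> (min fillers, max fillers, required filler type), as described on the slide.
MESSAGE_ROLES = {
    "Sender": (1, math.inf, "PERSON"),
    "Recipient": (1, math.inf, "PERSON"),
    "Body": (1, 1, "TEXT"),
    "SendDate": (1, 1, "DATE"),
    "ReceivedDate": (1, 1, "DATE"),
}


def violations(obj, roles=MESSAGE_ROLES):
    """List the necessary conditions of MESSAGE that `obj` fails.

    `obj` maps each role name to a list of (filler, type) pairs.
    """
    problems = []
    for role, (low, high, required) in roles.items():
        fillers = obj.get(role, [])
        if not low <= len(fillers) <= high:
            problems.append(f"{role}: {len(fillers)} fillers, expected {low}..{high}")
        for filler, filler_type in fillers:
            if filler_type != required:
                problems.append(f"{role}: {filler} is a {filler_type}, not a {required}")
    return problems


# A made-up object that lacks a ReceivedDate.
memo = {
    "Sender": [("kirk", "PERSON")],
    "Recipient": [("spock", "PERSON"), ("uhura", "PERSON")],
    "Body": [("status report", "TEXT")],
    "SendDate": [("d1", "DATE")],
    "ReceivedDate": [],
}

print(violations(memo))  # ['ReceivedDate: 0 fillers, expected 1..1']
```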

Page 46: Ontologies in Biomedicine Mark A. Musen Stanford University.

Defined concepts are derived from primitive concepts

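[Diagram: the MESSAGE network from the previous slide (roles Sender(1,NIL), Recipient(1,NIL), Body(1,1), SendDate(1,1), ReceivedDate(1,1), with value restrictions on DATE, PERSON, and TEXT), extended with the defined concept STARFLEET-MESSAGE. STARFLEET-MESSAGE restricts the Sender role with the value restriction (v/r) STARFLEET-COMMANDER.]

A STARFLEET-MESSAGE is a MESSAGE, all of whose Senders are STARFLEET-COMMANDERS.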

Page 47: Ontologies in Biomedicine Mark A. Musen Stanford University.

A DL Classifier

• Takes a new Concept and automatically determines all subsumption relations between it and all other Concepts in the network

• Adds new links when new subsumption relations are discovered

• Automates the placement of new Concepts in the taxonomy

Page 48: Ontologies in Biomedicine Mark A. Musen Stanford University.

Before Classifying the Concept X

[Diagram: the same network with a new concept X, defined as a specialization of MESSAGE. X restricts the Recipient role to (1,1) and restricts the Sender role with the value restriction (v/r) STARFLEET-COMMANDER. STARFLEET-MESSAGE, as before, restricts the Sender role with the value restriction STARFLEET-COMMANDER.]

X: A MESSAGE with exactly one Recipient, and all of whose Senders are STARFLEET-COMMANDERs.

Page 49: Ontologies in Biomedicine Mark A. Musen Stanford University.

After Classifying the Concept X

[Diagram: the same network after classification. X now sits below STARFLEET-MESSAGE and inherits its restriction on Sender; the only restriction X still states itself is Recipient (1,1).]

X IS-A STARFLEET MESSAGE!
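
The inference that X is-a STARFLEET-MESSAGE can be reproduced with a small structural subsumption check. This Python sketch handles only number restrictions and value restrictions, which is far less than a real DL classifier supports. The dictionary encoding, the function names, and the placement of STARFLEET-COMMANDER under PERSON are assumptions made for illustration.

```python
import math

INF = math.inf

# Told is-a links among primitive concepts.
# STARFLEET-COMMANDER under PERSON is an assumption, not stated on the slides.
PRIMITIVE_PARENT = {
    "STARFLEET-COMMANDER": "PERSON",
    "PERSON": "THING", "MESSAGE": "THING", "TEXT": "THING", "DATE": "THING",
}


def prim_is_a(a, b):
    """True if primitive concept `a` equals `b` or lies below it."""
    while a is not None:
        if a == b:
            return True
        a = PRIMITIVE_PARENT.get(a)
    return False


# A concept: its primitive base plus role -> (min, max, value restriction).
MESSAGE = {"base": "MESSAGE", "roles": {
    "Sender": (1, INF, "PERSON"), "Recipient": (1, INF, "PERSON"),
    "Body": (1, 1, "TEXT"), "SendDate": (1, 1, "DATE"),
    "ReceivedDate": (1, 1, "DATE")}}


def specialize(parent, **overrides):
    """Copy the parent's restrictions and tighten the named roles."""
    roles = dict(parent["roles"])
    roles.update(overrides)
    return {"base": parent["base"], "roles": roles}


STARFLEET_MESSAGE = specialize(MESSAGE, Sender=(1, INF, "STARFLEET-COMMANDER"))
X = specialize(MESSAGE, Recipient=(1, 1, "PERSON"),
               Sender=(1, INF, "STARFLEET-COMMANDER"))


def subsumes(general, specific):
    """True if every restriction of `general` is at least as tight in `specific`."""
    if not prim_is_a(specific["base"], general["base"]):
        return False
    for role, (gmin, gmax, gvr) in general["roles"].items():
        if role not in specific["roles"]:
            return False
        smin, smax, svr = specific["roles"][role]
        if smin < gmin or smax > gmax or not prim_is_a(svr, gvr):
            return False
    return True


print(subsumes(STARFLEET_MESSAGE, X))  # True: X is-a STARFLEET-MESSAGE
print(subsumes(X, STARFLEET_MESSAGE))  # False: X also limits Recipient to one
```

X was declared only as a kind of MESSAGE; its place under STARFLEET-MESSAGE follows from comparing restrictions, which is the step a classifier automates.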

Page 50: Ontologies in Biomedicine Mark A. Musen Stanford University.

The Beauty of Classification for Ontologies

• The classifier takes care of where to place a new concept in the hierarchy

• All inheritance relationships are automatically propagated to the new concept

• Relationships among a new concept and other entities are automatically simplified by classifying the new concept as a specialization of existing concepts

Page 51: Ontologies in Biomedicine Mark A. Musen Stanford University.

Classification generates a new, inferred hierarchy

Page 52: Ontologies in Biomedicine Mark A. Musen Stanford University.

The Web Ontology Language (OWL)

• Comes in three flavors:
– OWL Lite (frame-based)
– OWL DL (description logic)
– OWL Full (first-order logic and then some)

• Rapidly being adopted for use in biomedical ontologies, including:
– NCI Thesaurus (cancer biology and oncology)
– MGED Ontology (DNA micro-array experiments)
– BioPAX (metabolic pathways)

• The new editor and representation system for OBO ontologies (OBO-Edit) uses a subset of OWL


Page 53: Ontologies in Biomedicine Mark A. Musen Stanford University.

DL and Ontologies

• There is not just one “description logic”; DLs come in different varieties with different expressivity

• DLs are of value primarily to ontology developers, to see the implications of modeling decisions

• DLs also can be used by end users, when reasoning about systems that ontologies model

Page 54: Ontologies in Biomedicine Mark A. Musen Stanford University.

A thousand flowers are blooming!

• Ontologies are being developed by interested groups from every sector of academia, industry, and government

• Many of these ontologies have proven to be extraordinarily useful to wide communities

• We finally have tools and representation languages that can enable us to create durable and maintainable ontologies with rich semantic content