-Ontologies: Bio-Ontologies: Their Creation and Design Dr. Peter Karp SRI, http://www.ai.sri.com/~pkarp/ Dr. Robert Stevens & Professor Carole Goble University of Manchester, UK http://img.cs.man.ac.uk/

Dec 26, 2015


Phillip Jordan

Bio-Ontologies: Their Creation and Design

Dr. Peter Karp
SRI, http://www.ai.sri.com/~pkarp/

Dr. Robert Stevens & Professor Carole Goble
University of Manchester, UK
http://img.cs.man.ac.uk/tambis

Advertisement

The Fourth Annual Bio-Ontologies Meeting
“Sharing Experiences and Spreading Best Practice”
Sponsored by GlaxoSmithKline Pharmaceuticals

Tivoli Gardens, Copenhagen, Denmark, 26th July 2001

Organised by: Richard Chen, Carole Goble, Robert Stevens, Peter Karp, Pat Hayes, Robin McEntire and Eric Neumann.

http://img.cs.man.ac.uk/stevens/workshop01

Outline

• What is an ontology?
  – Motivation for ontologies in bioinformatics
  – Definition of an ontology
  – Naming the parts & comparing the types
  – Knowledge representation
• Building an ontology
  – Methodologies, principles and pitfalls
  – Running example: a macromolecule fragment
  – Ontology tools
  – Development tools

Ontologies: Definitions, Components, Subtypes


Outline

• Motivations for ontologies in bioinformatics

• Definition of ontology

• Principles and pitfalls of ontology design

• GKB Editor ontology development tool

Definition of an Ontology

• Conceptualization of a domain of interest
  – Concepts, relations, attributes, constraints, objects, values
• An ontology is a specification of a conceptualization
  – Formal notation
  – Documentation
• A variety of forms, but includes:
  – A vocabulary of terms
  – Some specification of the meaning of the terms

• Ontologies are defined for reuse

Roles of Ontologies in Bioinformatics

• Success of many biological DBs depends on
  – High-fidelity ontologies
  – Clearly communicating their ontologies
• Prevent errors on data entry and interpretation
• Common framework for multidatabase queries
• Controlled vocabularies for genome annotation
  – Riley ontology, GO
  – EC numbers

Roles of Ontologies in Bioinformatics

• Information-extraction applications
• Reuse is a core aspect of ontologies
  – Reuse of existing ontologies is faster than designing new ones
  – Reuse decreases semantic heterogeneity of DBs
• Schema-driven software
  – Knowledge-acquisition tools
  – Query tools

Definitions

• Data Model:
  – Primitive data structuring mechanism in which an ontology is expressed
  – Relational data model, object-oriented data model, frame data model
• Ontology:
  – Domain-specific conceptualization expressed within some data model

Components of an Ontology

• Concepts
  – AKA: Class, Set, Type, Predicate
  – Gene, Reaction, Macromolecule
• Taxonomy of concepts
  – Generalization ordering among concepts
  – Concept A is a parent of concept B iff every instance of B is also an instance of A
  – Superset / subset
  – “A kind of” vs “a part of”
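The parent rule on this slide ("A is a parent of B iff every instance of B is also an instance of A") can be read as a subset test. The sketch below is only an illustration: the instance names and the `instances` table are hypothetical, and a finite sample can only approximate "every instance".

```python
# Hypothetical instance sets; the individual names are illustrative only.
instances = {
    "Macromolecule": {"trpA-protein", "trpA-mRNA", "glycogen-1"},
    "Protein": {"trpA-protein"},
    "RNA": {"trpA-mRNA"},
}

def is_parent(parent: str, child: str) -> bool:
    """A is a parent of B iff every instance of B is also an instance of A."""
    return instances[child] <= instances[parent]

print(is_parent("Macromolecule", "Protein"))  # True
print(is_parent("Protein", "RNA"))            # False
```

Note that this captures "a kind of" only. A part-of link (for example, a residue being part of a protein) would not pass the subset test, which is the distinction the last bullet draws.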


Taxonomy of Concepts

Components of an Ontology

– Objects
  • AKA: Instances, members of the set
  • trpA Gene, Reaction 1.1.2.4
  • Strictly speaking, an ontology with instances is a knowledge base
– Relations and Attributes
  • AKA: Slots, properties
  • Product of Gene, Map-Position of Gene
  • Reactants of Reaction, Keq of Reaction
– Values
  • The Product of the trpA Gene is tryptophan-synthetase
  • trpA.Product = tryptophan-synthetase

Components of an Ontology

• Constraints and other meta information about relations
  – Slot Product:
    – Value type: Polypeptide or RNA
    – Domain: Genes
  – Slot Map-Position:
    – Value type: Number
    – Domain: Genes
    – Cardinality: At-Most 1
    – Range: 0 <= X <= 100
• General Axioms
  – Nucleic acids < 20 residues are oligonucleotides
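To make the slot constraints concrete, here is a minimal sketch of how a frame system might check values against them. The dictionary layout, the `type_of` rule and the map position 28.3 are assumptions made for illustration; only the constraint values themselves come from the slide.

```python
# Slot metadata transcribed from the slide; the dictionary layout is an assumption.
SLOTS = {
    "Product": {"domain": "Genes", "value_type": {"Polypeptide", "RNA"}},
    "Map-Position": {"domain": "Genes", "value_type": {"Number"},
                     "at_most": 1, "range": (0, 100)},
}

def type_of(value):
    """Hypothetical typing rule: numbers are 'Number'; (class, name) tuples carry their class."""
    if isinstance(value, (int, float)):
        return "Number"
    return value[0]

def check(frame_class, slot, values):
    """Return a list of constraint violations for one slot of one frame."""
    meta = SLOTS[slot]
    errors = []
    if frame_class != meta["domain"]:
        errors.append(f"{slot}: domain is {meta['domain']}, not {frame_class}")
    if "at_most" in meta and len(values) > meta["at_most"]:
        errors.append(f"{slot}: at most {meta['at_most']} value(s) allowed")
    for v in values:
        if type_of(v) not in meta["value_type"]:
            errors.append(f"{slot}: {v!r} has the wrong value type")
        elif "range" in meta and not meta["range"][0] <= v <= meta["range"][1]:
            errors.append(f"{slot}: {v!r} is outside {meta['range']}")
    return errors

# trpA.Product = tryptophan-synthetase (from the slides); 28.3 is a made-up position.
print(check("Genes", "Product", [("Polypeptide", "tryptophan-synthetase")]))  # []
print(check("Genes", "Map-Position", [28.3]))                                 # []
print(check("Genes", "Map-Position", [28.3, 120]))  # cardinality and range violations
```

A general axiom such as the oligonucleotide rule would need a separate rule mechanism; it is not a constraint on a single slot.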

More on Concepts

• Primitive: properties are necessary
  – A globular protein must have a hydrophobic core, but a protein with a hydrophobic core need not be a globular protein
• Defined: properties are necessary + sufficient
  – Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be eukaryotic.
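The practical difference between the two kinds of concept is what a system may infer. The sketch below assumes a simple dictionary representation of individuals (the field names are hypothetical): a defined concept lets membership be inferred, while a primitive concept can only be used to reject an assertion.

```python
def classify(individual: dict) -> set:
    """Return the asserted classes plus any defined classes whose conditions hold."""
    classes = set(individual["asserted"])
    # Defined concept: conditions are necessary AND sufficient, so membership is inferred.
    if "cell" in classes and individual.get("has_nucleus"):
        classes.add("eukaryotic-cell")
    # Primitive concept: the condition is only necessary, so it can reject but never infer.
    if "globular-protein" in classes and not individual.get("has_hydrophobic_core"):
        raise ValueError("globular-protein asserted without a hydrophobic core")
    return classes

yeast_cell = {"asserted": ["cell"], "has_nucleus": True}
some_protein = {"asserted": ["protein"], "has_hydrophobic_core": True}

print(sorted(classify(yeast_cell)))    # ['cell', 'eukaryotic-cell']
print(sorted(classify(some_protein)))  # ['protein']
```

The second individual has a hydrophobic core but is not classified as a globular protein, because that property is necessary and not sufficient.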

Ontology Subtypes: Expressiveness

• Controlled vocabulary
  – List of terms
• Taxonomy
  – Terms in a generalization hierarchy
• DB schemas (relational and object-oriented)
  – More implementation specific
  – No instance information
  – Limited constraints
• Frame knowledge bases
• Description Logics

Ontology Subtypes

• Database schema
  – Concepts, relations, constraints
  – Perhaps no taxonomy
  – At most hundreds of concepts
• Taxonomy
  – Concepts, taxonomy, perhaps a few relations
  – Thousands of concepts
• Knowledge base
  – Concepts, relations, constraints, objects, values
  – Hundreds to hundreds of thousands of concepts and objects

Ontology Subtypes

• Generic (a.k.a. upper, core or reference)
  – common high level concepts
  – “Physical”, “Abstract”, “Structure”, “Substance”
  – useful for ontology re-use
  – important when generating or analysing natural language expressions
• Domain-oriented
  – domain specific (e.g. E. coli)
  – domain generalisations (e.g. gene function)
• Task-oriented
  – task specific (e.g. annotation analysis)
  – task generalisations (e.g. problem solving)

Knowledge Representation

• Ontologies are best delivered in some computable representation
• Variety of choices with different:
  – Expressiveness
    • The range of constructs that can be used to formally, flexibly, explicitly and accurately describe the ontology
  – Ease of use
  – Computational complexity
    • Is the language computable in real time?
  – Rigour
    • Satisfiability and consistency of the representation
    • Systematic enforcement mechanisms
  – Unambiguous, clear and well defined semantics
    • A subclassOf B: don’t be fooled by syntax!

Languages

• Vocabularies using natural language
  – Hand crafted, flexible but difficult to evolve, maintain and keep consistent, with poor semantics
  – Gene Ontology
• Object-based KR: frames
  – Extensively used, good structuring, intuitive. Semantics defined by OKBC standard
  – EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua)
• Logic-based: Description Logics
  – Very expressive, model is a set of theories, well defined semantics
  – Automatically derived classification taxonomies
  – Concepts are defined and primitive
  – Expressivity vs. computational complexity balance
  – TAMBIS Ontology (uses FaCT)

Vocabularies: Gene Ontology

• Hand crafted with simple tree-like structures
• Position of each concept and its relationships wholly determined by a person
• Flexible but…
• Maintenance and consistency preservation difficult and arduous
• Poor semantics
• Single hierarchies are limiting

Description Logics

• Describe knowledge in terms of concepts and relations
• Concept defined in terms of other roles and concepts
  – Enzyme = protein which catalyses reaction
  – Reason that enzyme is a kind of protein
• Model built up incrementally and descriptively
• Uses logical reasoning to figure out:
  – Automatically derived (and evolved) classifications
  – Consistency -- concept satisfiability

Frames and Logics

• Frames
  – Rich set of language constructs
  – Impose restrictive constraints on how they are combined or used to define a class
  – Only support primitive concepts
  – Taxonomy hand-crafted
• Description logics
  – Limited set of language constructs
  – Primitives combined to create defined concepts
  – Taxonomy for defined concepts established through logical reasoning
  – Expressivity vs. computational complexity
  – Less intuitive
• Ideal: both! Current OIL activity uses a mixture. Logics provide reasoning services for frame schemes.

Ontology Exchange

• To reuse an ontology we need to share it with others in the community
• Exchanging ontologies requires a language with:
  – common syntax
  – clear and explicit shared meaning
• Tools for parsing, delivery, visualising etc.
• Exchanging the structure, semantics or conceptualisation?

Ontology Exchange Languages

• XOL: eXtensible Ontology Language
  – XML markup
  – Frame based
  – Rooted in OKBC
  – http://www.ai.sri.com/pkarp/xol/
• OIL: Ontology Interface Layer / Ontology Inference Layer
  – Gives a semantics to RDF-Schema
  – http://www.ontoknowledge.org/oil

[Diagram: OIL drawing on Frames (modelling primitives, OKBC), Description Logics (formal semantics & reasoning support) and Web languages (XML & RDF based syntax)]

OIL: Ontology Metadata (Dublin Core)

Ontology-container
  title "macromolecule fragment"
  creator "robert stevens"
  subject "macromolecule generic ontology"
  description "example for a tutorial"
  description.release "2.0"
  publisher "R Stevens"
  type "ontology"
  format "pseudo-xml"
  identifier "http://www.ontoknowledge.org/oil/oil.pdf"
  source "http://img.cs.man.ac.uk/stevens/tambis-oil.html"
  language "OIL"
  language "en-uk"
  relation.haspart "http://www.ontoRus.com/bio/mmole.onto"

The Three Roots of OIL

[Diagram: three roots feeding into OIL]
• Frame-based Systems: Epistemological Modelling Primitives
• Web Languages: XML- and RDF-based syntax
• Description Logics: Formal Semantics & Reasoning Support

OIL primitive ontology definitions

slot-def has-backbone
  inverse is-backbone-of
slot-def has-component
  inverse is-component-of
  properties transitive

class-def nucleic-acid
class-def rna
  subclass-of nucleic-acid
  slot-constraint has-backbone
    value-type ribophosphate

class-def ribophosphate
class-def deoxyribophosphate
  subclass-of NOT ribophosphate

OIL defined ontology definitions

class-def defined dna
  subclass-of nucleic-acid AND NOT rna
  slot-constraint has-backbone
    value-type deoxyribophosphate

class-def defined enzyme
  subclass-of protein
  slot-constraint catalyse
    has-value reaction

class-def defined kinase
  subclass-of protein
  slot-constraint catalyse
    has-value phosphorylation-reaction
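Because enzyme and kinase are both defined classes, a reasoner can place kinase under enzyme without anyone asserting it. The sketch below is a toy structural check written for this one pattern (superclass plus a single has-value constraint); it is not how FaCT works, and it assumes, beyond what the slide states, that phosphorylation-reaction is a subclass of reaction.

```python
# Asserted hierarchy: child -> parents.
# ASSUMPTION: phosphorylation-reaction is a kind of reaction (not stated on the slide).
PRIMITIVE_PARENTS = {
    "phosphorylation-reaction": {"reaction"},
    "reaction": set(),
    "protein": set(),
}

# Defined classes as (superclass, slot, filler),
# read as "superclass AND has some value for slot drawn from filler".
DEFINED = {
    "enzyme": ("protein", "catalyse", "reaction"),
    "kinase": ("protein", "catalyse", "phosphorylation-reaction"),
}

def is_a(child, parent):
    """True if child equals parent or reaches it through asserted subclass-of links."""
    if child == parent:
        return True
    return any(is_a(p, parent) for p in PRIMITIVE_PARENTS.get(child, ()))

def subsumes(general, specific):
    """general subsumes specific if its superclass and slot filler are both at least as general."""
    g_super, g_slot, g_filler = DEFINED[general]
    s_super, s_slot, s_filler = DEFINED[specific]
    return g_slot == s_slot and is_a(s_super, g_super) and is_a(s_filler, g_filler)

print(subsumes("enzyme", "kinase"))  # True: every kinase is an enzyme
print(subsumes("kinase", "enzyme"))  # False
```

The inferred link depends entirely on the definitions: if kinase were declared primitive, or its filler were unrelated to reaction, no such classification would follow.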

OIL in XML

• OIL has a DTD, an XML Schema and a mapping to RDF-Schema. See web site for details.

<slot-def>
  <slot-name = "has-component"/>
  <inverse> <slot-name = "is-component-of"/> </inverse>
  <properties> <transitive/> </properties>
</slot-def>
<class-def>
  <class-name= "nucleic-acid"/>
</class-def>
<class-def>
  <class-name= "rna"/>
  <subclass-of> <class name = "nucleic-acid"/> </subclass-of>
  <slot-constraint>
    <slot-name = "has-backbone"/>
    <value-type> <class name= "ribophosphate"/> </value-type>
  </slot-constraint>
</class-def>

OIL Remarks

• Tools:
  – Protégé II editor
  – FaCT reasoner
• Other projects:
  – Semantic Web projects (http://www.semanticweb.org)
  – Agents for the web projects (e.g. DAML)

A knowledge representation language and inference mechanism for the web

OIL Features

• Based on standard frame languages
• Extends expressive power with DL style logical constructs
  – Still has frame look and feel
  – Can still function as a basic frame language
• OIL core language restricted in some respects so as to allow for reasoning support
  – No constructs with ill defined semantics
  – No constructs that compromise decidability
• Has both XML and RDF(S) based syntax

OIL Features

• Semantics clearly defined by mapping to very expressive Description Logic, e.g.:
  – slot-constraint reverse-transcribe-from has-value mRNA or (part-of has-value mRNA)
  – ∃eats.meat ⊓ ∃eats.fish
• Note the importance of clear semantics: ∃eats.(meat ⊓ fish) is inconsistent (assuming meat and fish are disjoint)
• Mapping can also be used to provide reasoning support from a Description Logic system (e.g., FaCT)
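The eats example turns on two readings of "eats meat and fish": one item of each kind, or a single item that is both. A small finite illustration follows; the food items are hypothetical, and `some` is a helper written for this sketch rather than part of any DL system.

```python
# Hypothetical, disjoint sets of individuals.
meat = {"steak", "ham"}
fish = {"cod", "tuna"}
assert meat.isdisjoint(fish)

def some(eaten, filler):
    """Existential restriction on 'eats': at least one eaten item belongs to filler."""
    return any(item in filler for item in eaten)

diet = {"steak", "cod"}

# Reading 1: eats some meat AND eats some fish.
print(some(diet, meat) and some(diet, fish))  # True

# Reading 2: eats something that is both meat and fish.
# With disjoint classes the filler is empty, so no diet can ever satisfy it.
print(some(diet, meat & fish))                # False
```

Two syntactically similar constraints therefore have very different meanings, which is why the slide stresses a formally defined semantics.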

Why Reasoning Support?

• Key feature of OIL core language is availability of reasoning support
• Reasoning intended as design support tool
  – Check logical consistency of classes
  – Compute implicit class hierarchy
• May be less important in small local ontologies
  – Can still be useful tool for design and maintenance
  – More important with larger ontologies/multiple authors
• Valuable tool for integrating and sharing ontologies
  – Use definitions/axioms to establish inter-ontology relationships
  – Check for consistency and (unexpected) implied relationships
  – Already shown to be useful technique for DB schema integration


Classifying by Reasoning


Finding Inconsistencies


Changing Classifications

DAML+OIL

• OIL merged with DAML
• Originally retained frame syntax
• DAML more concerned with deployment rather than building and managing
• OIL mapped to DAML+OIL, but not reliably reversed
• Frame look and feel may return
• Web ontology language


Building Ontologies

Building Ontologies

• No field of Ontological Engineering equivalent to Knowledge or Software Engineering;
• No standard methodologies for building ontologies;
• Such a methodology would include:
  – a set of stages that occur when building ontologies;
  – guidelines and principles to assist in the different stages;
  – an ontology life-cycle which indicates the relationships among stages.
• Gruber's guidelines for constructing ontologies are well known.

The Development Lifecycle

• Two kinds of complementary methodologies emerged:
  – Stage-based, e.g. TOVE [Uschold96]
  – Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94].
• Most have TWO stages:
  1. Informal stage
     • ontology is sketched out using either natural language descriptions or some diagram technique
  2. Formal stage
     • ontology is encoded in a formal knowledge representation language, that is machine computable
• An ontology should ideally be communicated to people and unambiguously interpreted by software
  – the informal representation helps the former
  – the formal representation helps the latter.

A Provisional Methodology

• A skeletal methodology and life-cycle for building ontologies;
• Inspired by the software engineering V-process model;
• The overall process moves through a life-cycle.

The left side charts the processes in building an ontology

The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology

The V-model Methodology

[Diagram: V-model of ontology building]
• Left side (processes): Identify purpose and scope; Knowledge acquisition; Conceptualisation; Integrating existing ontologies; Encoding; Representation
• Right side (quality assurance):
  – Evaluation: coverage, verification, granularity
  – Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency
  – Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation
• Models: User Model; Conceptualisation Model; Implementation Model
• Ontology in Use

The ontology building life-cycle

[Diagram: life-cycle stages; labels only]
• Identify purpose and scope
• Knowledge acquisition
• Evaluation
• Language and representation
• Available development tools
• Conceptualisation
• Integrating existing ontologies
• Encoding
• Building

User Model: Identify purpose and scope

• Decide what applications the ontology will support
• EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source

• TAMBIS: retrieval across a broad range of bioinformatics resources

• The use to which an ontology is put affects its content and style

• Impacts re-usability of the ontology

User Model: Knowledge Acquisition

• Specialist biologists; standard text books; research papers and other ontologies and database schemas.
• Motivating scenarios and informal competency questions – informal questions the ontology must be able to answer
• Evaluation:
  – Fitness for purpose
  – Coverage and competency

Ontology Scenario

• A molecule ontology;
• Describes the molecules stored in bioinformatics databases and annotated therein;
• It should cover the molecules and other chemicals described in the resources;
• The ontology will be used for querying and annotating information in bioinformatics resources.

Competency Questions

• Cover the macromolecules found in molecular biology resources and courses;
• Should accommodate various views on the macromolecules;
• Should cover the queries people want to ask of macromolecules;
• In reality, need more detail on these questions – “give me tRNA genes with anticodon x, from aardvark”.

Acquiring Knowledge

• Find your knowledge!
• An important source is your head, but…
• Use text books, glossaries (many of which are on the web) and domain experts;
• Use other ontologies – what did they include and how did they do it?
• Record your sources of knowledge.
• Use your competency questions.

Starting Concept List

• Chemicals – atom, ion, molecule, compound, element;
• Molecular-compound, ionic-compound, ionic-molecular-compound, …;
• Ionic-macromolecular-compound and ionic-small-macromolecular-compound;
• Protein, peptide, polyprotein, enzyme, holo-protein, apo-protein, …
• Nucleic acid – DNA, RNA, tRNA, mRNA, snRNA, …

Conceptualisation Model: Conceptualisation

• Identify the key concepts, their properties and the relationships that hold between them;
  – Which ones are essential?
  – What information will be required by the applications?

• Structure domain knowledge into explicit conceptual models.

• Identify natural language terms to refer to such concepts, relations and attributes;

Conceptualisation Sketch

[Diagram: sketch of the Chemical hierarchy; node labels only]
Chemical
Atom, Element, Compound, Molecule, Ion
Metal, Non-Metal, Metalloid
Molecular Compound, Molecular Element, Ionic Compound, Ionic Molecule, Ionic Molecular Compound

Molecule Conceptualisation Sketch

[Diagram: sketch of the Molecule hierarchy; node labels only]
Macromolecule, SmallMolecule, Ionic Macromolecular Compound
NucleicAcid, Protein, Polysaccharide
DNA, RNA, Enzyme, Peptide
Starch, Glycogen
mRNA, tRNA, rRNA, snRNA

Conceptualisation Model: Naming

• Determine naming conventions
  – Consistent naming for classes and slots
  – EcoCyc:
    • Classes are capitalized, hyphenated, plural
    • Slot names are uppercase

A quality ontology captures relevant biological distinctions with high fidelity

Conceptualisation Model: Pitfalls

• Pitfall: Missing ontological elements
  – Missing classes: Swiss-Prot Protein complexes
  – Lack of Lipid and Cofactor in example ontology
  – Missing attributes: Genetic code identifier
  – Confuse 1:1 with 1:Many, or 1:Many with Many:Many
    • Cofactor as an attribute of reaction as well as protein
  – Important data is stored within text/comment fields
• Pitfall: Extra ontological elements
• Pitfall: Over-elaborating – when do I stop?
• Pitfall: Relevance – do I really need all this detail?

Conceptualisation: Partonomy

• Part-of relationships very important
• Several kinds of part-of: component-of, region-of, mixture-of
• Alpha-helix is a region of a protein, but a protein is a component of a complex
• Care in placing transitivity
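One way to exercise "care in placing transitivity" is to compute the transitive closure separately for each kind of part-of, so that region-of and component-of links are never chained together. The facts below extend the slide's example with a hypothetical "organelle" level; the data layout is an assumption for illustration.

```python
# Part-of facts tagged with their kind. The 'organelle' entry is hypothetical.
PART_OF = [
    ("alpha-helix", "region-of", "protein"),
    ("protein", "component-of", "complex"),
    ("complex", "component-of", "organelle"),
]

def parts_closure(kind):
    """Transitive closure computed within a single kind of part-of only."""
    pairs = {(a, b) for a, k, b in PART_OF if k == kind}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs

components = parts_closure("component-of")
print(("protein", "organelle") in components)   # True: chained within component-of
print(("alpha-helix", "complex") in components) # False: would mix region-of with component-of
```

Whether mixed chains should ever be allowed is a modelling decision for the ontology author; this sketch simply takes the most conservative option.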

Integrating Existing Ontologies

• Reuse or adapt existing ontologies when possible
  – Save time
  – Correctness
  – Facilitate interoperation
  – Reuse GO to give example ontology Function, Process and Location
• Integration of ontologies
  – Ontologies have to be aligned
  – Hindered by poor documentation and argumentation
  – Hindered by implicit assumptions
  – Shared generic upper level ontologies should make integration easier

Encoding: Implementation Toolkit

• Construct ontology using an ontology-development system
  – Does the data model have the right expressivity?
    • Is it just a taxonomy or are relationships needed?
    • Is multiple parentage needed? Inverse relationships?
    • What types of constraints are needed?
  – Are reasoning services needed?
  – What are the authoring features of the development tool?
  – Can the ontology be exported to a DBMS schema?
  – Can the ontology be exported to an ontology exchange language?
  – Is simultaneous updating by multiple authors needed?
  – Size limitations of development tool?

Encoding

Encode sketch in a knowledge representation language (KRL);

• Use OIL – a frame syntax with reasoning support if we want it;
• Wide range of expressivity (see cofactor example later);
• Hand craft a hierarchy – implement the sketch made earlier;
• This hand-crafted version can be migrated to a more descriptive form later.

Initial Encoding

class-def chemical
  subclass-of substance

class-def molecule
  subclass-of chemical

class-def compound
  subclass-of chemical

class-def molecular-compound
  subclass-of molecule and compound

Encoding: Ontology Implementation Pitfalls

• Pitfall: Semantic ambiguity
  – Multiple ways to encode the same knowledge
  – Meaning of class definitions unclear
• Pitfall: Encoding Bias
  – Encoding the ontology changes the ontology

Encoding: Ontology Implementation Pitfalls

• Pitfall: Redundancy (lack of normalization)
  – Exact same information repeated
  – Presence of computationally derivable information
    • Date of birth and age
    • Sequence length
    • DNA sequence and reverse complement
  – More effort required for entry and update
  – In KB partial updates lead to inconsistency
  – OK if redundant information is maintained automatically
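The "maintained automatically" escape hatch can be sketched by storing only the sequence and deriving the rest on demand, so a partial update cannot leave the record inconsistent. The `DnaRecord` class is a hypothetical illustration, not a structure from any of the systems mentioned.

```python
from dataclasses import dataclass

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

@dataclass
class DnaRecord:
    sequence: str  # the only stored fact

    @property
    def length(self) -> int:
        """Derived on demand, so it can never disagree with the sequence."""
        return len(self.sequence)

    @property
    def reverse_complement(self) -> str:
        """Complement each base, then reverse the strand."""
        return self.sequence.translate(_COMPLEMENT)[::-1]

rec = DnaRecord("ATGCCG")
print(rec.length, rec.reverse_complement)  # 6 CGGCAT

rec.sequence = "ATG"                        # a single update...
print(rec.length, rec.reverse_complement)  # 3 CAT  ...and the derived values follow
```

Had length and reverse complement been stored as separate slots, the update to `sequence` would have needed two further edits to keep the record consistent.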

Encoding: The Interaction Problem

• Task influences what knowledge is represented and how it's represented
  – Molecular biology: chemical and physical properties of proteins
  – Bioinformatics: accession number, function, gene
  – Underlying perspectives mean they may not be reconcilable

• If an ontology has too many conflicting tasks it can end up compromised – TaO experience

Evaluate it - A guide for reusability

• Conciseness
  – No redundancy
  – Appropriateness – protein molecules at the atomic resolution when amino acid level would do
• Clarity
• Consistency
• Satisfiability – it doesn’t contradict itself
  – Molecule and Compound disjoint, but molecular-compound is (molecule and compound)
• Commitment
  – Do I have to buy into a load of stuff I don’t really need or want just to get the bit I do?
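The satisfiability example can be checked mechanically: a class is unsatisfiable if its lineage contains two classes that have been declared disjoint. The sketch below assumes the hierarchy from the initial encoding plus a disjointness axiom between molecule and compound; the table layout and function names are hypothetical.

```python
# Subclass-of links from the initial encoding (child -> parents).
PARENTS = {
    "chemical": set(),
    "molecule": {"chemical"},
    "compound": {"chemical"},
    "molecular-compound": {"molecule", "compound"},
}
# The problematic axiom from the slide: molecule and compound declared disjoint.
DISJOINT = [("molecule", "compound")]

def ancestors(cls):
    """All classes reachable from cls through subclass-of links."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def unsatisfiable_classes():
    """Classes that inherit from both members of some disjoint pair."""
    bad = []
    for cls in PARENTS:
        lineage = ancestors(cls) | {cls}
        if any(a in lineage and b in lineage for a, b in DISJOINT):
            bad.append(cls)
    return bad

print(unsatisfiable_classes())  # ['molecular-compound']
```

A description logic reasoner performs a far more general version of this test; the point here is only that the contradiction is detectable without inspecting any instances.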

Documentation: Make Ontology Understandable!

• Produce clear informal and formal documentation
  – An ontology that cannot be understood will not be reused
  – GenBank feature table
  – NCBI ASN.1 definitions
• There exists a space of alternative ontology design decisions
  – Semantics / Granularity
  – Terminology

• Pitfall: Neglecting to record design rationale

Molecules Revisited

[Diagram: revised sketch of the Molecule hierarchy; node labels only]
Macromolecule, SmallMolecule
Ionic Macromolecular Compound, Non-Ionic Macromolecular Compound
NucleicAcid, Protein, Polysaccharide
DNA, RNA, Enzyme, Peptide
Starch, Glycogen
mRNA, tRNA, rRNA, snRNA

More Encoding

class-def chemical
  subclass-of substance

class-def defined molecule
  subclass-of chemical
  slot-constraint contains-bond min-cardinality 1 has-value covalent-bond

class-def defined compound
  subclass-of chemical
  slot-constraint has-atom-types greater-than 1

class-def defined molecular-compound
  subclass-of molecule and compound


Cofactor Knowledge

• Gather knowledge about cofactors, coenzymes and prosthetic groups from glossaries and dictionaries etc.

• Note that definitions are inconsistent and even contradictory.

• Synthesise knowledge and make judgements.


Encoding Cofactor

class-def defined cofactor
  subclass-of metal-ion or small-organic-molecule
  slot-constraint binds-to has-value protein

class-def defined coenzyme
  subclass-of cofactor
  slot-constraint binds-loosely-to has-value protein

class-def defined prosthetic-group
  subclass-of cofactor and (not metal-ion)
  slot-constraint binds-strongly-to has-value protein


Cofactor Discussion

• Classifies as a kind of chemical;
• Taken from IUPAC definition – document – not a child of organic-molecule and metal-ion;
• Can express both disjunction and negation in OIL;
• Uses a slot hierarchy in describing binds-to.
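The points about disjunction, negation and the slot hierarchy can be sketched as follows. This assumes that binds-loosely-to and binds-strongly-to are sub-slots of binds-to, as the last bullet suggests. `SLOT_PARENT`, `slot_subsumes` and `classify_cofactor` are hypothetical names for this illustration, not part of OIL.

```python
# Hypothetical slot hierarchy: each slot maps to its parent slot.
SLOT_PARENT = {
    "binds-loosely-to": "binds-to",
    "binds-strongly-to": "binds-to",
}


def slot_subsumes(general, specific):
    """True if `specific` is `general` or one of its sub-slots."""
    while specific is not None:
        if specific == general:
            return True
        specific = SLOT_PARENT.get(specific)
    return False


def classify_cofactor(kind, slot, target):
    """Classify a substance from its kind and a single binding assertion."""
    classes = set()
    is_cofactor = (
        kind in {"metal-ion", "small-organic-molecule"}  # disjunction
        and target == "protein"
        and slot_subsumes("binds-to", slot)              # any sub-slot counts
    )
    if is_cofactor:
        classes.add("cofactor")
        if slot_subsumes("binds-loosely-to", slot):
            classes.add("coenzyme")
        # negation: a prosthetic group is a cofactor that is not a metal ion
        if kind != "metal-ion" and slot_subsumes("binds-strongly-to", slot):
            classes.add("prosthetic-group")
    return classes


print(sorted(classify_cofactor("small-organic-molecule", "binds-strongly-to", "protein")))
print(sorted(classify_cofactor("metal-ion", "binds-strongly-to", "protein")))
```

The slot hierarchy is what lets an assertion made with the more specific slot still satisfy the cofactor definition, which is stated only in terms of binds-to.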


More Discussion

• Can we define sufficient conditions for peptide?
• Mass and length are not easy to use in a definition
  – A protein is > 100 kDa;
• What about a 99 kDa protein?
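The difficulty can be made concrete with a small sketch that takes the slide's 100 kDa cut-off at face value. The function name and the peptide/protein split are assumptions made for illustration, not a proposed definition.

```python
PROTEIN_MIN_KDA = 100.0  # cut-off quoted on the slide, taken at face value


def classify_by_mass(mass_kda, threshold=PROTEIN_MIN_KDA):
    """Crisp rule: strictly above the threshold is a protein, otherwise a peptide."""
    return "protein" if mass_kda > threshold else "peptide"


for mass in (99.0, 100.0, 101.0):
    print(mass, classify_by_mass(mass))
```

A 99 kDa chain falls on the peptide side of the rule although a biologist would call it a protein, which is why a mass threshold makes a poor sufficient condition.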


Publish the Ontology

• Formal and informal specifications
• Intended domain of application
• Design rationale
• Limitations
• See EcoCyc paper in ISMB-93/Bioinformatics 00
• See TAMBIS paper in Bioinformatics 99


Ontological Pitfalls

• Stop-over – when do I stop over-elaborating?
  – Proteins, amino acid residues, side chains, physical chemical properties, …
• Relevance
  – Do we need to mention all the types of nucleic acid?


Ontology-Development Tools


Ontology Development Tools

• Development environments

• Ontology Libraries

• Ontology publishing and exchange
  – Across all representational forms (logic, frame, etc.)
  – Web compliant
• Ontology delivery
  – Ontology servers


Development Environments
– Considerations depend on ontology subtype!
• Expressiveness of data model
• Authoring features
• DBMS export capabilities
• Ontology-exchange language export capabilities
• Distributed authoring
• Size limitations
– WebOnto
– Ontosaurus
– GKB Editor
– Protégé II
– Ontolingua
– GRAIL toolkit, etc.
– Wondertools


GKB Editor Ontology Development Toolkit
• Graphical editor for KBs and ontologies
• Ontologies stored in Ocelot object-oriented knowledge base
  – Expressive, scalable, distributed
  – EcoCyc ontology contains 1K classes, 15K instances
• Knowledge is graphically portrayed in 3 viewers
• All operations are schema driven

• See http://www.ai.sri.com/~gkb/user-man.html


Ocelot Capabilities
• Frame data model
• KBs and ontologies stored in files or Oracle
• Oracle KBs and ontologies:
  – Better scalability: frame faulting on demand and in background
  – Concurrency control system coordinates changes by multiple users
  – Transaction logging (recall operation history)

• GFP API provides programmatic interface
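The idea of faulting frames in on demand and keeping an operation history can be sketched in a few lines. This is an illustrative toy, not Ocelot's implementation or the GFP API. The class `LazyFrameStore`, its methods, and the example frames are all invented for the sketch, and a plain dictionary stands in for the database.

```python
class LazyFrameStore:
    """Toy frame store that loads frames from a backing store on first access."""

    def __init__(self, backing):
        self._backing = backing  # stands in for the database
        self._cache = {}         # frames already faulted into memory
        self.faults = 0          # how many frames were fetched from the backing store
        self.log = []            # operation history: (frame, slot, old, new)

    def get(self, frame_id):
        if frame_id not in self._cache:
            # Fault the frame in: copy it so edits stay local until saved.
            self._cache[frame_id] = dict(self._backing[frame_id])
            self.faults += 1
        return self._cache[frame_id]

    def put_slot(self, frame_id, slot, value):
        frame = self.get(frame_id)
        self.log.append((frame_id, slot, frame.get(slot), value))
        frame[slot] = value


# Invented example frames.
backing = {
    "frame-a": {"instance-of": "Gene"},
    "frame-b": {"instance-of": "Protein"},
}
store = LazyFrameStore(backing)
store.get("frame-a")
store.get("frame-a")  # served from the cache, no second fault
store.put_slot("frame-a", "comment", "example")
print(store.faults, store.log)
```

Only the frames that are actually touched are loaded, which is what lets a large knowledge base open quickly. The logged old and new values are the minimum needed to replay or undo an operation.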


Distributed Ontology Development

[Figure: diagram with the labels Oracle Server, Internet, and User 1 through User 4.]


GKB Editor
• Taxonomy viewer
  – Create/delete classes and instances
  – Browse class taxonomy
  – Alter class/subclass links
• Frame editor
  – Add/remove slots to/from classes
  – Create/delete/edit slot values for instances
• Frame relationships viewer
  – View and update a network of relationships among instances


Summary
• A definition of ontology as a characterisation of conceptualisation: capturing the things we know about a domain;
• The knowledge within an ontology can be applied to a variety of tasks;
• Building an ontology: process and life-cycle;
• Influences on the choice of encoding language;
• The desirability of tools for the building, management and exchange of ontologies.


Final remarks

• The use of ontologies is growing within the bio-molecular world

• They are a high-cost, but high-benefit solution to a variety of problems confronting the bioinformatics community.


Some References (1)

Review

• Stevens R., Goble C.A. and Bechhofer, S. Ontology-based Knowledge Representation for Bioinformatics. Accepted for Briefings in Bioinformatics.

Bio-ontologies & Systems

• Karp P. D. An ontology for biological function based on molecular interactions. Bioinformatics, 16:269-285, 2000.

• Ashburner et al. Gene Ontology: Tool for the Unification of Biology. Nature Genetics, Vol 25, pages 25-29.

• R. Altman, M. Bada, X.J. Chai, M. Whirl Carillo, R.O. Chen, and N.F. Abernethy. RiboWeb: An Ontology-Based System for Collaborative Molecular Biology. IEEE Intelligent Systems, 14(5):68-76, 1999.

• P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton, R. Stevens, and A. Brass. An Ontology for Bioinformatics Applications. Bioinformatics, 15(6):510-520, 1999.

• R.O. Chen, R. Felciano, and R.B. Altman. RiboWeb: Linking Structural Computations to a Knowledge Base of Published Experimental Data. In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pages 84-87. AAAI Press, 1997.

• Guarino, N. 1992. Concepts, Attributes and Arbitrary Relations: Some Linguistic and Ontological Criteria for Structuring Knowledge Bases. Data & Knowledge Engineering, 8: 249-261.

• Guarino, N., Carrara, M., and Giaretta, P. 1994a. An Ontology of Meta-Level Categories. In J. Doyle, E. Sandewall and P. Torasso (eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Fourth International Conference (KR94). Morgan Kaufmann, San Mateo, CA: 270-280.

• P. Karp and S. Paley Integrated Access to Metabolic and Genomic Data Journal of Computational Biology, 3(1):191--212, 1996.

• P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker. EcoCyc: Electronic Encyclopedia of E. coli Genes and Metabolism. Nucleic Acids Research, 27(1):55-58, 1999.

• S. Schulze-Kremer. Ontologies for Molecular Biology. In Proceedings of the Third Pacific Symposium on Biocomputing, pages 693-704. AAAI Press, 1998.

• P.G. Baker, A. Brass, S. Bechhofer, C. Goble, N. Paton, and R. Stevens. TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources. An Overview. In Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 25--34. AAAI Press, June 28-July 1, 1998.


Some References (2)

Ontology development and exchange

• T.R. Gruber. Towards Principles for the Design of Ontologies Used for Knowledge Sharing. In Nicola Guarino and Roberto Poli, editors, International Workshop on Formal Ontology, Padova, Italy, 1993. Available as technical report KSL-93-04, Knowledge Systems Laboratory, Stanford University: ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-93-04.ps.


More References (3)

• I. Horrocks, D. Fensel, J. Broekstra, M. Crubezy, S. Decker, M. Erdmann, W. Grosso, C. Goble, F. Van Harmelen, M. Klein, M. Musen, S. Staab, and R. Studer. The ontology interchange language OIL: The grease between ontologies. http://www.cs.vu.nl/~dieter/oil.

• R. Jasper and M. Uschold A Framework for Understanding and Classifying Ontology Applications. In Twelfth Workshop on Knowledge Acquisition Modeling and Management KAW'99, 1999.

• M. Uschold and M. Gruninger. Ontologies: Principles, Methods and Applications. Knowledge Engineering Review, 11(2), June

• Guarino, N. and Welty, C. Identity, Unity, and Individuality: Towards a Formal Toolkit for Ontological Analysis. In H. Werner (Ed), Proceedings of ECAI-2000: The European Conference on Artificial Intelligence, IOS Press, Amsterdam, August 2000, 219--223.