GO terms implicitly refer to other terms

Post on 05-Jan-2016


GO terms implicitly refer to other terms

• cysteine biosynthesis
• myoblast fusion
• hydrogen ion transporter activity
• snoRNA catabolism
• wing disc pattern formation
• epidermal cell differentiation
• regulation of flower development
• interleukin-18 receptor complex
• B-cell differentiation
• dorsal ectoderm

biosynthesis is_a metabolism

cysteine is_a serine family amino acid is_a amino acid is_a amine

Composed terms currently cause problems

– No link to external ontology term
– Redundancy
– Inconsistency
– Extra work
– Annotation bottleneck
– Tangled DAGs and confusing displays
  • we have no way to disentangle
• Solution so far:
  – fix errors based on results of term name parsing (Obol)
    • reactive, not proactive

Solution: actively manage composed terms

• Explicit pre-coordination
  – Composed terms should now/soon be coordinated using oboedit plugin
    • building block terms are recorded in ontology along with composite term
• Benefits:
  – Correct DAG structure can be inferred from external ontologies
    • e.g. make sure GO + CHEBI “align”
  – placement & consistency checking automated
  – additional work can be automated
    • synonyms, text definitions
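As a rough illustration of how DAG placement could be inferred from an external ontology, the sketch below checks whether one logical definition subsumes another by walking is_a links. The `IS_A` table, the triple representation `(genus, relation, filler)`, and the function names are assumptions made for this example; they are not oboedit or Obol code.

```python
# Toy is_a links, copied from the chains shown on the earlier slide.
IS_A = {
    "cysteine": {"serine family amino acid"},
    "serine family amino acid": {"amino acid"},
    "amino acid": {"amine"},
    "biosynthesis": {"metabolism"},
}


def ancestors(term):
    """Return the term itself plus everything reachable from it via is_a."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if `general` is an inferred is_a parent of (or equal to) `specific`.

    Both arguments are hypothetical (genus, relation, filler) triples.
    """
    g_genus, g_rel, g_filler = general
    s_genus, s_rel, s_filler = specific
    return (
        g_rel == s_rel
        and g_genus in ancestors(s_genus)
        and g_filler in ancestors(s_filler)
    )


cys_biosyn = ("biosynthesis", "outputs", "cysteine")
ser_family_biosyn = ("biosynthesis", "outputs", "serine family amino acid")

# The chemical hierarchy alone places cysteine biosynthesis under
# serine family amino acid biosynthesis, and not the other way round.
print(subsumes(ser_family_biosyn, cys_biosyn))
print(subsumes(cys_biosyn, ser_family_biosyn))
```

A real reasoner handles several differentiae and more relations; this single-differentia version only shows why aligning GO with CHEBI lets the process hierarchy follow from the chemical one.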

How will terms be pre-coordinated by oboedit?

• How do we record a definition for a composite term?
  – using a logical definition (computational essence)
• A logical definition consists of:
  – a generic term (aka genus)
  – relationships to other terms which serve to discriminate this specific term from other is_a children of the generic term (aka differentiae)
• Can be written in natural language as:
  – A <generic term> which <discriminating characteristics>

Example of pre-coordination

• cysteine biosynthesis
• generic term:
  – biosynthesis
• discriminating characteristics:
  – outputs cysteine
• natural language (Aristotelian style):
  – a biosynthesis process which outputs cysteine

Example in Obo format

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
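To make the genus/differentia reading of this stanza concrete, here is a minimal Python sketch that splits the intersection_of lines into a generic term and its discriminating characteristics, then renders the "A <generic term> which <discriminating characteristics>" sentence. The function names are invented for this example, and the parsing covers only the tag layout shown on this slide, not the full OBO format.

```python
STANZA = """\
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
"""

TAG = "intersection_of:"


def logical_definition(stanza):
    """Split intersection_of lines into a genus and a list of differentiae."""
    genus, differentiae = None, []
    for line in stanza.splitlines():
        if not line.startswith(TAG):
            continue
        value, _, label = line[len(TAG):].partition("!")
        parts = value.split()
        if len(parts) == 1:
            # A bare ID is the generic term.
            genus = (parts[0], label.strip())
        else:
            # "relation ID" is a discriminating characteristic.
            differentiae.append((parts[0], parts[1], label.strip()))
    return genus, differentiae


def as_sentence(genus, differentiae):
    """Render the Aristotelian-style natural language form."""
    clauses = " and ".join(f"{rel} {label}" for rel, _id, label in differentiae)
    return f"a {genus[1]} which {clauses}"


genus, diffs = logical_definition(STANZA)
print(as_sentence(genus, diffs))  # a biosynthesis which outputs cysteine
```

The is_a lines are deliberately ignored here: they state necessary conditions, while the intersection_of lines carry the definition.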

Alternate syntax

• used in pheno-syntax
• more compact
• similar to OWL abstract syntax
• I use Obo1.2 format or natural language in the rest of this presentation

GO:cysteine_biosynthesis == GO:biosynthesis ∩ outputs(CHEBI:cysteine)

This allows us to dynamically untangle

• Process axis view (primary is_as, via generic term):
  – biological_process
    • metabolism
      – biosynthesis
        » cysteine biosynthesis
• Process participant axis view:
  – amine
    • amino acid
      – serine family amino acid
        » cysteine
• Combined view
  – (same as current tangled diamond lattice)

Obol demo

• http://yuri.lbl.gov/amigo/obol

Recording the relationship is important

• Why not just a simple cross-product?
  – e.g. biosynthesis x cysteine
• Relationships are important for reasoning and querying
  – Consider:
    • cysteine biosynthesis from serine
    • mRNA export from nucleus during heat stress
• Without the relations, the logical definition is not specific enough
  – the essence is not captured
• Relations should come from RO
  – more required

Multiple discriminating characteristics are allowed

• Cysteine biosynthesis from serine
  – Generic term:
    • biosynthesis
  – Discriminating characteristics:
    • output cysteine
    • input serine

[Term]
name: cysteine biosynthesis from serine
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
intersection_of: input CHEBI:17822 ! serine
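The point about relations can be sketched in a few lines of Python. Here a composed class is a genus plus a set of (relation, filler) pairs; "serine biosynthesis from cysteine" is a hypothetical contrast term invented for this example, and the representation is an assumption, not the oboedit data model.

```python
# Composed classes as (genus, frozenset of (relation, filler)) pairs.
cys_from_ser = (
    "biosynthesis",
    frozenset({("outputs", "cysteine"), ("input", "serine")}),
)
# Hypothetical contrast term: the same building blocks with the roles swapped.
ser_from_cys = (
    "biosynthesis",
    frozenset({("outputs", "serine"), ("input", "cysteine")}),
)


def drop_relations(composed):
    """Reduce a composed class to a plain cross-product of its building blocks."""
    genus, differentiae = composed
    return (genus, frozenset(filler for _relation, filler in differentiae))


# With relations, the two classes are distinct.
print(cys_from_ser == ser_from_cys)
# As a bare cross-product (biosynthesis x cysteine x serine) they collapse.
print(drop_relations(cys_from_ser) == drop_relations(ser_from_cys))
```

Without the relation names there is no way to tell which chemical is the input and which the output, which is the sense in which the cross-product fails to capture the essence.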

Composite terms can be nested

[Term]
id: GO:xxxxxxx
name: regulation of cysteine biosynthesis
intersection_of: GO:0050789 ! regulation of biological process
intersection_of: regulates GO:0019344 ! cysteine biosynthesis

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine

YES: regulation^regulates(biosynthesis^outputs(cysteine))
NO: regulation^regulates(biosynthesis)^outputs(cysteine)
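The difference between the nested and the flat reading can be made explicit with a small data structure. The `Expr` class below is an assumption made for illustration (it is not part of Obol or oboedit); a filler may be either a plain term name or another `Expr`, which is what allows nesting.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Expr:
    """A genus plus zero or more (relation, filler) differentiae."""

    genus: str
    differentiae: Tuple[Tuple[str, Union[str, "Expr"]], ...] = ()

    def __str__(self):
        # Render in the compact genus^relation(filler) syntax.
        return self.genus + "".join(
            f"^{relation}({filler})" for relation, filler in self.differentiae
        )


cys_biosyn = Expr("biosynthesis", (("outputs", "cysteine"),))

# Nested: the thing being regulated is itself a composed term.
nested = Expr("regulation", (("regulates", cys_biosyn),))

# Flat: reads as a regulation process that itself outputs cysteine.
flat = Expr("regulation", (("regulates", "biosynthesis"), ("outputs", "cysteine")))

print(nested)  # regulation^regulates(biosynthesis^outputs(cysteine))
print(flat)    # regulation^regulates(biosynthesis)^outputs(cysteine)
```

In the nested form the `outputs` differentia belongs to the inner biosynthesis term; in the flat form it attaches to the regulation process, which is not what the term name means.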

Composite terms can optionally be manufactured in bulk

• Generic term: {metabolism, biosynthesis}
• Differentia: has_output {serine, cysteine, …}
• With caution…
  – Sparse vs dense matrices
  – not all combinations are types
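A minimal sketch of bulk manufacture with a guard against the dense-matrix problem might look like this. The `APPROVED` whitelist is a hypothetical stand-in for whatever curation step decides which combinations are real types, and the output dictionaries are an invented shape, not an oboedit structure.

```python
from itertools import product

GENERIC_TERMS = ["metabolism", "biosynthesis"]
OUTPUTS = ["serine", "cysteine"]

# Hypothetical curator-approved cells of the matrix; the rest are skipped
# because not every combination corresponds to a real type.
APPROVED = {
    ("metabolism", "cysteine"),
    ("biosynthesis", "serine"),
    ("biosynthesis", "cysteine"),
}


def manufacture(generic_terms, outputs, approved):
    """Yield one composite term per approved (genus, chemical) combination."""
    for genus, chemical in product(generic_terms, outputs):
        if (genus, chemical) not in approved:
            continue
        yield {
            "name": f"{chemical} {genus}",
            "genus": genus,
            "differentia": ("has_output", chemical),
        }


terms = list(manufacture(GENERIC_TERMS, OUTPUTS, APPROVED))
for term in terms:
    print(term["name"], term["differentia"])
```

Filtering against an explicit list keeps the matrix sparse; generating every cell of the product would be the dense case the slide warns about.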

On the importance of necessary and sufficient conditions

• Why intersection_of?
• Why not just make normal links in the GO DAG?
  – normal relationships are for necessary conditions only
  – we want both necessary and sufficient conditions
    • captures the essence of the term

Normal DAG links only capture necessary conditions, not essence

[Diagram: macrophage activation, immune cell activation and inflammatory response, connected by is_a and part_of links]

text def (macrophage activation): A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

Indistinguishable by DAG

[Diagram: monocyte activation, immune cell activation and inflammatory response, connected by is_a and part_of links]

text def (monocyte activation): A change in morphology and behavior of a monocyte resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

essence captured by genus-differentia

[Diagram: macrophage activation, immune cell activation and inflammatory response, connected by is_a and part_of links; the logical definition adds cell activation as the genus (is_a) and an activates link to CL:macrophage]

id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage

Current status of pre-coordinated terms

• SO already contains composite terms
  – 46 pre-coordinated terms
  – A silenced gene is a gene which has the quality of being silenced
• GO-BP/CL integration underway
  – retrospectively pre-coordinated terms
• Obol page has pre-coordinated terms from automatic parsing
  – http://www.fruitfly.org/~cjm/obol

Pre- vs post- coordinated

• Pre-coordination
  – terms are in ontology with IDs and computable definitions
  – increases complexity of ontology
  – complexity can be managed by tools
    • e.g. new oboedit features
• Post-coordination
  – terms are combined in the database
  – forces more complexity in database schema and database applications

Pre-coordination is useful in moderation

• Commonly used terms should be pre-coordinated
  – e.g. cysteine biosynthesis; oocyte differentiation; pectoral fin
• Avoid taking to extremes
  – cf. ICD-9
• Where do we draw the line?
  – ontologies should be built around one or a few axes of classification
    • term ‘explosion’ typically gets large when multiple axes are combined
  – we can change our minds later
    • pre- and post-coordination are commensurable

Commensurability

• Annotator annotates to
  – nucleus^part_of(astrocyte)
• Anatomy editor creates new term
  – uses oboedit cross-product plugin
  – astrocyte_nucleus = nucleus^part_of(astrocyte)
• Annotation can be dynamically ‘promoted’ to new term in answer to queries
  – various software techniques for achieving this
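One of the simpler techniques for this kind of promotion is a lookup keyed on the logical definition. The sketch below is an assumption about how it could work, not a description of any existing tool: `DEFINITIONS`, `define` and `promote` are invented names, and matching is by exact equality of the expression.

```python
# Maps a logical definition, written as (genus, frozenset of (relation, filler)),
# to the name of the pre-coordinated term that carries it.
DEFINITIONS = {}


def define(name, genus, *differentiae):
    """Register a pre-coordinated term together with its logical definition."""
    DEFINITIONS[(genus, frozenset(differentiae))] = name


def promote(genus, *differentiae):
    """Return the matching pre-coordinated term, else the expression itself."""
    key = (genus, frozenset(differentiae))
    return DEFINITIONS.get(key, key)


# The annotator's post-coordinated annotation: nucleus^part_of(astrocyte).
annotation = ("nucleus", ("part_of", "astrocyte"))

before = promote(*annotation)  # no such term yet: the expression comes back

# The anatomy editor later creates the term with the same definition.
define("astrocyte_nucleus", "nucleus", ("part_of", "astrocyte"))

after = promote(*annotation)   # the stored annotation now resolves to the term
print(before)
print(after)
```

The stored annotation is never rewritten; it resolves to the new term at query time, which is what makes the two styles interchangeable.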

Post-coordination in GO annotations

• Pre- and post- coordination are compatible and commensurable
• We should extend the annotation format to allow denoting more specific classes
  – e.g.
    • cholesterol transport in liver
  – advanced applications can query this
  – standard applications suffer no loss
  – extended annotations can be used to help seed new terms in the ontology
• This is already being done (MGI, Dicty)
  – we just want to capture this in an interoperable way

Post-composition in gene association files

• New column in GA file format

Gene Product   Term ID                                …   Properties
AABC1          GO:0030301 (cholesterol transport)         located_in(MA:liver)
AABC2          GO:0048663 (neuron fate development)       has_participant(FBbt:Y_neuron)
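A reader for the proposed Properties column could be as small as the sketch below. It assumes each property is written as relation(ID), as in the two rows above; how several properties in one cell would be separated is not specified on the slide, so the regular expression simply picks up every relation(ID) group it finds. The function names are invented for this example.

```python
import re

# One property looks like relation(ID), e.g. located_in(MA:liver).
PROPERTY = re.compile(r"(\w+)\(([^()]+)\)")


def parse_properties(column):
    """Return the (relation, filler) pairs found in a Properties cell."""
    return PROPERTY.findall(column)


def post_composed(term_id, column):
    """Combine the annotated term with its properties into one expression."""
    return term_id + "".join(
        f"^{relation}({filler})" for relation, filler in parse_properties(column)
    )


print(parse_properties("located_in(MA:liver)"))
print(post_composed("GO:0030301", "located_in(MA:liver)"))
print(post_composed("GO:0048663", "has_participant(FBbt:Y_neuron)"))
```

An application that ignores the extra column still sees an ordinary GO ID, which matches the claim that standard applications suffer no loss.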

Database issues

• Chado and GO DB can handle pre- and post- coordination
  – in theory anyway
    • not yet fully tested
• How does it work?
  – ‘anonymous term’ created for coordinated term
  – documentation in chado cvs
    • chado/modules/cv/doc/
