
GO terms implicitly refer to other term cysteine biosynthesis myoblast fusion hydrogen ion transporter activity snoRNA catabolism wing disc pattern formation.

Jan 05, 2016


Thomas Lyons

GO terms implicitly refer to other terms

• cysteine biosynthesis
• myoblast fusion
• hydrogen ion transporter activity
• snoRNA catabolism
• wing disc pattern formation
• epidermal cell differentiation
• regulation of flower development
• interleukin-18 receptor complex
• B-cell differentiation
• dorsal ectoderm


biosynthesis is_a metabolism


cysteine is_a serine family amino acid is_a amino acid is_a amine


cysteine is_a serine family amino acid is_a amino acid is_a serine


Composed terms currently cause problems

– No link to external ontology term
– Redundancy
– Inconsistency
– Extra work
– Annotation bottleneck
– Tangled DAGs and confusing displays
  • we have no way to disentangle
• Solution so far:
  – fix errors based on results of term name parsing (Obol)
    • reactive, not proactive


Solution: actively manage composed terms

• Explicit pre-coordination
  – Composed terms should now/soon be coordinated using oboedit plugin
    • building block terms are recorded in ontology along with composite term
• Benefits:
  – Correct DAG structure can be inferred from external ontologies
    • e.g. make sure GO + CHEBI “align”
  – placement & consistency checking automated
  – additional work can be automated
    • synonyms, text definitions


How will terms be pre-coordinated by oboedit?

• How do we record a definition for a composite term?
  – using a logical definition (computational essence)
• A logical definition consists of:
  – a generic term (aka genus)
  – relationships to other terms which serve to discriminate this specific term from other is_a children of the generic term (aka differentiae)
• Can be written in natural language as:
  – A <generic term> which <discriminating characteristics>


Example of pre-coordination

• cysteine biosynthesis
• generic term:
  – biosynthesis
• discriminating characteristics:
  – outputs cysteine
  – natural language (Aristotelian style):
    • a biosynthesis process which outputs cysteine


Example in Obo format

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
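To make the genus/differentia reading of this stanza concrete, here is a minimal Python sketch that splits the intersection_of lines into a generic term and its discriminating relationships. The parse_stanza helper and the LogicalDefinition class are hypothetical names for illustration only; they are not part of oboedit or Obol, and the parser handles only the tags shown on this slide.

```python
from dataclasses import dataclass, field

STANZA = """\
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
"""


@dataclass
class LogicalDefinition:
    term_id: str = ""
    name: str = ""
    genus: str = ""                                    # the generic term
    differentiae: list = field(default_factory=list)   # (relation, filler) pairs
    is_a: list = field(default_factory=list)           # asserted parents


def parse_stanza(text):
    """Parse one [Term] stanza (hypothetical helper, illustration only)."""
    d = LogicalDefinition()
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()   # drop the trailing "! comment"
        if not line or line.startswith("["):
            continue
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            d.term_id = value
        elif tag == "name":
            d.name = value
        elif tag == "is_a":
            d.is_a.append(value)
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                d.genus = parts[0]                          # bare ID: the genus
            else:
                d.differentiae.append((parts[0], parts[1]))  # relation + filler
    return d


definition = parse_stanza(STANZA)
print(definition.genus, definition.differentiae)
```

A bare intersection_of ID is read as the genus; a relation followed by an ID is read as one differentia.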


Alternate syntax

• used in pheno-syntax
• more compact
• similar to OWL abstract syntax
• I use Obo1.2 format or natural language in the rest of this presentation

GO:cysteine_biosynthesis == GO:biosynthesis ∏ outputs(CHEBI:cysteine)


This allows us to dynamically untangle

• Process axis view (primary is_as, via generic term):
  – biological_process
    • metabolism
      – biosynthesis
        » cysteine biosynthesis
• Process participant axis view:
  – amine
    • amino acid
      – serine family amino acid
        » cysteine
• Combined view
  – (same as current tangled diamond lattice)
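The two untangled views can be sketched in Python as follows. This is an illustration under stated assumptions: the dictionaries hold only the toy is_a links shown on these slides, and axis_views and path_to_root are hypothetical helper names, not oboedit features.

```python
# Toy is_a links copied from the slides; real ontologies have many more terms.
PROCESS_IS_A = {
    "biosynthesis": "metabolism",
    "metabolism": "biological_process",
}
CHEMICAL_IS_A = {
    "cysteine": "serine family amino acid",
    "serine family amino acid": "amino acid",
    "amino acid": "amine",
}
# Logical definition: term -> (genus, (relation, filler))
LOGICAL_DEF = {
    "cysteine biosynthesis": ("biosynthesis", ("outputs", "cysteine")),
}


def path_to_root(term, is_a):
    """Follow single is_a parents upward and return the path root-first."""
    path = [term]
    while path[-1] in is_a:
        path.append(is_a[path[-1]])
    return list(reversed(path))


def axis_views(term):
    """Split a composite term into its process axis and participant axis."""
    genus, (_relation, filler) = LOGICAL_DEF[term]
    return {
        "process axis": path_to_root(genus, PROCESS_IS_A) + [term],
        "participant axis": path_to_root(filler, CHEMICAL_IS_A),
    }


views = axis_views("cysteine biosynthesis")
for axis, path in views.items():
    print(axis + ":", " > ".join(path))
```

The process axis follows only the genus, and the participant axis follows only the filler of the differentia, so neither view contains the diamond that appears in the combined lattice.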


Obol demo

• http://yuri.lbl.gov/amigo/obol


Recording the relationship is important

• Why not just a simple cross-product?
  – e.g. biosynthesis x cysteine
• Relationships are important for reasoning and querying
  – Consider:
    • cysteine biosynthesis from serine
    • mRNA export from nucleus during heat stress
• Without the relations, the logical definition is not specific enough
  – the essence is not captured
• Relations should come from RO
  – more required


Multiple discriminating characteristics are allowed

• Cysteine biosynthesis from serine
  – Generic term:
    • biosynthesis
  – Discriminating characteristics:
    • output cysteine
    • input serine

[Term]
name: cysteine biosynthesis from serine
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
intersection_of: input CHEBI:17822 ! serine


Composite terms can be nested

[Term]
id: GO:xxxxxxx
name: regulation of cysteine biosynthesis
intersection_of: GO:0050789 ! regulation of biological process
intersection_of: regulates GO:0019344 ! cysteine biosynthesis

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine

YES: regulation^regulates(biosynthesis^outputs(cysteine))
NO:  regulation^regulates(biosynthesis)^outputs(cysteine)
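The difference between the YES and NO forms is one of nesting, which a small recursive structure makes explicit. The sketch below is only an illustration: the tuple representation and the render function are assumptions made for this example, not Obol or oboedit code.

```python
def render(expr):
    """Render a term name or a (genus, [(relation, filler), ...]) tuple
    in the compact genus^relation(filler) syntax."""
    if isinstance(expr, str):
        return expr
    genus, differentiae = expr
    parts = [render(genus)]
    for relation, filler in differentiae:
        parts.append(f"{relation}({render(filler)})")
    return "^".join(parts)


# The differentia of the outer term points at a whole composite term.
cysteine_biosynthesis = ("biosynthesis", [("outputs", "cysteine")])
nested = ("regulation", [("regulates", cysteine_biosynthesis)])

# Flattening both differentiae onto "regulation" says the regulation
# process itself outputs cysteine, which is not what is meant.
flattened = ("regulation", [("regulates", "biosynthesis"), ("outputs", "cysteine")])

print("YES:", render(nested))
print("NO: ", render(flattened))
```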


Composite terms can optionally be manufactured in bulk

• Generic term: {metabolism, biosynthesis}
• Differentia: has_output {serine, cysteine, …}
• With caution…
  – Sparse vs dense matrices
  – not all combinations are types
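Bulk manufacture amounts to a filtered cross-product. The Python sketch below assumes a curator-supplied set of rejected combinations; the manufacture function and the example rejection are hypothetical and chosen only to show that the resulting matrix can be sparse.

```python
from itertools import product

GENERIC_TERMS = ["metabolism", "biosynthesis"]
OUTPUTS = ["serine", "cysteine"]


def manufacture(generic_terms, fillers, relation="has_output", rejected=frozenset()):
    """Build one candidate composite term per (genus, filler) pair,
    skipping pairs a curator has marked as not being real types."""
    terms = []
    for genus, filler in product(generic_terms, fillers):
        if (genus, filler) in rejected:
            continue
        terms.append({
            "name": f"{filler} {genus}",
            "genus": genus,
            "differentia": (relation, filler),
        })
    return terms


dense = manufacture(GENERIC_TERMS, OUTPUTS)
# Hypothetical rejection, for illustration only.
sparse = manufacture(GENERIC_TERMS, OUTPUTS, rejected={("metabolism", "serine")})
print([t["name"] for t in dense])
print([t["name"] for t in sparse])
```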


On the importance of necessary and sufficient conditions

• Why intersection_of?
• Why not just make normal links in the GO DAG?
  – normal relationships are for necessary conditions only
  – we want both necessary and sufficient conditions
    • captures the essence of the term


Normal DAG links only capture necessary conditions, not essence

macrophage activation
  is_a immune cell activation
  part_of inflammatory response
  text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor


Indistinguishable by DAG

monocyte activation
  is_a immune cell activation
  part_of inflammatory response
  text def: A change in morphology and behavior of a monocyte resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor


essence captured by genus-differentia

macrophage activation
  is_a immune cell activation
  part_of inflammatory response

id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage


essence captured by genus-differentia

macrophage activation
  is_a immune cell activation
  part_of inflammatory response
  is_a cell activation (genus)
  activates CL:macrophage

id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
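The point of the last four slides can be checked mechanically: the asserted DAG links of macrophage activation and monocyte activation are identical, while their logical definitions differ. The following Python sketch assumes a monocyte definition analogous to the macrophage one (the CL:monocyte identifier is written in the same informal style as the slide and is an assumption), and classify is a hypothetical helper, not a real reasoner.

```python
# Asserted (necessary-condition) links, as drawn on the slides.
DAG_LINKS = {
    "macrophage activation": {
        ("is_a", "immune cell activation"),
        ("part_of", "inflammatory response"),
    },
    "monocyte activation": {
        ("is_a", "immune cell activation"),
        ("part_of", "inflammatory response"),
    },
}

# Logical (necessary and sufficient) definitions: genus plus differentia.
# The monocyte entry is an assumption made by analogy with the slide.
LOGICAL_DEFS = {
    "macrophage activation": ("cell activation", ("activates", "CL:macrophage")),
    "monocyte activation": ("cell activation", ("activates", "CL:monocyte")),
}


def classify(genus, differentia):
    """Return the terms whose logical definition matches exactly."""
    return [term for term, definition in LOGICAL_DEFS.items()
            if definition == (genus, differentia)]


same_links = DAG_LINKS["macrophage activation"] == DAG_LINKS["monocyte activation"]
match = classify("cell activation", ("activates", "CL:macrophage"))
print("indistinguishable by DAG:", same_links)
print("identified by logical definition:", match)
```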


Current status of pre-coordinated terms

• SO already contains composite terms
  – 46 pre-coordinated terms
  – A silenced gene is a gene which has the quality of being silenced
• GO-BP/CL integration underway
  – retrospectively pre-coordinated terms
• Obol page has pre-coordinated terms from automatic parsing
  – http://www.fruitfly.org/~cjm/obol


Pre- vs post- coordinated

• Pre-coordination
  – terms are in ontology with IDs and computable definitions
  – increases complexity of ontology
  – complexity can be managed by tools
    • e.g. new oboedit features
• Post-coordination
  – terms are combined in the database
  – forces more complexity in database schema and database applications


Pre-coordination is useful in moderation

• Commonly used terms should be pre-coordinated
  • e.g. cysteine biosynthesis; oocyte differentiation; pectoral fin
• Avoid taking to extremes
  • cf. ICD-9
• Where do we draw the line?
  – ontologies should be built around one or a few axes of classification
    • term ‘explosion’ typically gets large when multiple axes are combined
  – we can change our minds later
    • pre- and post- coordination are commensurable


Commensurability

• Annotator annotates to
  – nucleus^part_of(astrocyte)
• Anatomy editor creates new term
  – uses oboedit cross-product plugin
  – astrocyte_nucleus = nucleus^part_of(astrocyte)
• Annotation can be dynamically ‘promoted’ to new term in answer to queries
  – various software techniques for achieving this
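One simple way such promotion could work is a lookup from the post-coordinated expression to any named term with the same logical definition. The Python sketch below is an assumption about one possible technique, not a description of Chado, GO DB, or oboedit; the promote function and the NAMED_TERMS table are hypothetical, and the regular expression handles only the single-differentia form shown on this slide.

```python
import re

# Matches genus^relation(filler), e.g. nucleus^part_of(astrocyte).
EXPRESSION = re.compile(r"^(\w+)\^(\w+)\((\w+)\)$")

# Named (pre-coordinated) terms keyed by their logical definition.
NAMED_TERMS = {
    ("nucleus", "part_of", "astrocyte"): "astrocyte_nucleus",
}


def promote(expression, named_terms):
    """Return the named term equivalent to a post-coordinated expression,
    or the expression unchanged if no such term exists yet."""
    match = EXPRESSION.match(expression)
    if match is None:
        return expression
    return named_terms.get(match.groups(), expression)


print(promote("nucleus^part_of(astrocyte)", NAMED_TERMS))
print(promote("nucleus^part_of(neuron)", NAMED_TERMS))
```

Annotations made before the new term existed stay valid: once the editor adds the term, the same lookup starts returning it.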


Post-coordination in GO annotations

• Pre- and post- coordination are compatible and commensurable
• We should extend the annotation format to allow denoting more specific classes
  – e.g.
    • cholesterol transport in liver
  – advanced applications can query this
  – standard applications suffer no loss
  – extended annotations can be used to help seed new terms in the ontology
• This is already being done (MGI, Dicty)
  – we just want to capture this in an interoperable way


Post-composition in gene association files

• New column in GA file format

Gene Product   Term ID                                 …   Properties
AABC1          GO:0030301 (cholesterol transport)      …   located_in(MA:liver)
AABC2          GO:0048663 (neuron fate development)    …   has_participant(FBbt:Y_neuron)
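A consumer of the proposed Properties column could turn each row into a post-coordinated expression along the following lines. This is a sketch under stated assumptions: only the three columns shown on the slide are modelled, the parse_properties and to_expression helpers are hypothetical, and since the slide does not say how multiple properties would be separated, the pattern simply picks up every relation(filler) pair it finds.

```python
import re

# One relation(filler) pair, e.g. located_in(MA:liver).
PROPERTY = re.compile(r"(\w+)\(([^()]+)\)")

# (gene product, term ID, properties column) taken from the slide.
ROWS = [
    ("AABC1", "GO:0030301", "located_in(MA:liver)"),
    ("AABC2", "GO:0048663", "has_participant(FBbt:Y_neuron)"),
]


def parse_properties(column):
    """Return the (relation, filler) pairs found in a Properties cell."""
    return PROPERTY.findall(column)


def to_expression(term_id, column):
    """Combine the annotated term with its properties: term^rel(filler)."""
    parts = [term_id]
    parts += [f"{rel}({filler})" for rel, filler in parse_properties(column)]
    return "^".join(parts)


expressions = {gene: to_expression(term, props) for gene, term, props in ROWS}
for gene, expression in expressions.items():
    print(gene, expression)
```

An application that ignores the extra column still sees the plain term ID, which is the sense in which standard applications suffer no loss.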


Database issues

• Chado and GO DB can handle pre- and post- coordination
  – in theory anyway
    • not yet fully tested
• How does it work?
  – ‘anonymous term’ created for coordinated term
  – documentation in chado cvs
    • chado/modules/cv/doc/