Gene Ontology Part 1: what is the GO? http://www.geneontology.org Harold J Drabkin, Senior Scientific Curator, The Jackson Laboratory, Mouse Genome Informatics, Bar Harbor, ME

Jan 04, 2016

Chester Wilcox
Transcript
Page 1:

Gene Ontology Part 1: what is the GO?

http://www.geneontology.org

Harold J Drabkin
Senior Scientific Curator
The Jackson Laboratory

Mouse Genome Informatics
Bar Harbor, ME

Page 2:

What is the GO
The scope of the GO
The GO Relationships

Using the GO for annotation
Anatomy of an annotation
Evidence codes
qualifiers
gene association files

Page 3:

What IS the GO

The Gene Ontology is a dictionary of concepts used to describe the normal properties of a gene product

It has concepts describing molecular functions
It has concepts describing biological processes
It has concepts describing cellular locations that the gene products are found in

Page 4:

Gene Ontology

Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases”
Built to be applicable to any organism
Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge.

Page 5:

The GO

is NOT a list of genes or proteins
  although you might find a synonym as a gene or protein name
does NOT track diseases
  although certain disease phenotypes might suggest the function of a gene product or a process that it may participate in
  you will not find “tumor suppressor activity/tumor suppression” as GO terms

Page 6:

The Gene Ontology Consortium Started Small

Original GO created in 2000
Three databases involved:
  FlyBase (Drosophila)
  MGI (Mouse)
  SGD (S. cerevisiae)

Used immediately

www.geneontology.org

Page 7:

More quickly joined...

Later databases:
  TAIR (Arabidopsis)
  TIGR (microbes including prokaryotes)
  SWISS-PROT (several thousand species inc. human)
  PSU (P. falciparum)
  ZFIN (zebrafish)
  PAMGO (plant pathogens)

Page 8:

Gene Ontology widely adopted

AgBase

Page 9:

Why do we need this?

Page 10:

Tactition
Tactile sense
Taction

perception of touch ; GO:0050975

Often the same term is referred to differently
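
The point of this slide can be sketched in a few lines of Python: several spellings resolve to one identifier. Only the three synonyms, the term name, and the ID come from the slide; the lookup table and the function name `resolve` are illustrative assumptions, not part of any GO tool.

```python
# Illustrative synonym table. The entries come from the slide;
# the dictionary structure itself is an assumption.
SYNONYMS = {
    "tactition": "GO:0050975",
    "tactile sense": "GO:0050975",
    "taction": "GO:0050975",
    "perception of touch": "GO:0050975",
}

def resolve(label):
    """Return the GO ID for a label (case-insensitive), or None if unknown."""
    return SYNONYMS.get(label.strip().lower())

print(resolve("Taction"))
print(resolve("Tactile sense"))
```

Whatever a community calls the concept, an annotation that stores the ID rather than the label stays comparable across databases.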

Page 11:

Bud initiation?

Often the same term is used by different communities to mean different things...

Page 12:

More specifically

The GO is not just a flat list of terms

transcription factor activity
DNA binding
transcription regulator activity
membrane
mitochondrial membrane
glycolysis
nucleus
cytoplasm
ion transport
.....

Page 13:

is_a

And the terms can have more than one parent!

is_a
DNA binding is a type of nucleic acid binding.

Nucleic acid binding is a type of binding.

There are also relationships between them.
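
Because is_a chains, anything that is a DNA binding is also a binding. A minimal Python sketch of walking those links upward follows; the three terms are from the slide, while the `IS_A` table and the `ancestors` helper are assumptions made for illustration.

```python
# Each term maps to its is_a parents. The three terms come from the slide.
IS_A = {
    "DNA binding": ["nucleic acid binding"],
    "nucleic acid binding": ["binding"],
    "binding": [],
}

def ancestors(term, parents=IS_A):
    """Collect every term reachable by following is_a links upward."""
    found = []
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(parents.get(current, []))
    return found

print(ancestors("DNA binding"))
```

The lists of parents are what allow a term to have more than one parent; the walk simply visits each of them.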

Page 14:

Ontology Structure

The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)

Terms can have more than one parent and zero, one or more children

Terms are linked by three relationships
  is_a
  part_of
  regulates (new)
    negatively regulates
    positively regulates

[Figure legend: is_a, part_of]

Page 15:

Ontology Structure

[Figure: graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, with edges labeled is_a and part_of]
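
A small Python sketch of such a graph, with typed edges, a term that has two parents, and an acyclicity check. The node names come from the slide; the exact edges are an assumed reading of the figure, and the helper names are hypothetical. It uses `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

# (child, relationship, parent) triples. The node names come from the slide;
# the exact edges are an assumed reading of the lost figure.
EDGES = [
    ("membrane", "part_of", "cell"),
    ("chloroplast", "part_of", "cell"),
    ("mitochondrial membrane", "is_a", "membrane"),
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
]

def parents_of(term, edges=EDGES):
    """Return {parent: relationship} for one term."""
    return {parent: rel for child, rel, parent in edges if child == term}

def is_acyclic(edges):
    """True if the child -> parent graph contains no cycle."""
    graph = {}
    for child, _rel, parent in edges:
        graph.setdefault(child, set()).add(parent)
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

print(parents_of("chloroplast membrane"))
print(is_acyclic(EDGES))
```

Here "chloroplast membrane" has two parents reached by two different relationships, which is what separates a DAG from a simple tree.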

Page 16:

http://www.ebi.ac.uk/ego

It gets complicated quickly

Page 17:

Molecular Function = elemental activity/task
the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

Biological Process = biological goal or objective
broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Cellular Component = location or complex
subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

The 3 Gene Ontologies

Page 18:

Cellular Component where a gene product acts

Page 19:

Molecular Function
activities or “jobs” of a gene product

glucose-6-phosphate isomerase activity
insulin binding
insulin receptor activity

A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product.

Sets of functions make up a biological process.

Page 20:

Biological Process

gluconeogenesis

cell division

limb development

a commonly recognized series of events

Page 21:

Mitochondrial P450 (CC24 PR01238; MITP450CC24)

An example…

Page 22:

Anatomy of a GO term

A GO term obo format stanza begins with [Term] and minimally has
  id:
  name:
  namespace:
  def:
  one or more relationships
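
A stanza of this shape is easy to read programmatically. The sketch below parses one stanza into a tag-to-values mapping; the stanza text is a trimmed copy of the GO:0000019 example shown later in this talk, and `parse_stanza` is a hypothetical helper, not a function from any OBO library. It ignores details of the full format (escapes, trailing modifiers, a "!" inside a quoted definition).

```python
STANZA = """\
[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
is_a: GO:0000018 ! regulation of DNA recombination
relationship: regulates GO:0006312 ! mitotic recombination
"""

def parse_stanza(text):
    """Parse one [Term] stanza into {tag: [values]}; tags may repeat."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "[Term]":
        raise ValueError("stanza must begin with [Term]")
    tags = {}
    for line in lines[1:]:
        # Split on the first colon only, so IDs such as GO:0000019 survive.
        tag, _, value = line.partition(":")
        # Drop the trailing "! human-readable comment", if any.
        value = value.split(" ! ")[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return tags

term = parse_stanza(STANZA)
print(term["id"], term["name"], term["namespace"])
print(term["is_a"], term["relationship"])
```

Values are kept in lists because tags such as is_a and relationship can occur more than once in a stanza.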

Page 23:

More GO Term Stanzas

Page 24:

The Regulates Relationship

Page 25:

In the Beginning There Were Two Relationships

Is_a: denotes a subtype of its parent.
Part_of: denotes a portion of a parent

Is_part: If it exists, it is always a part of its parent (this is the relationship we use).

Has_part: If there is a parent, then it has this as a part of it.

Page 26:

We made the regulation of something a part_of the something

But it’s not really part_of

Page 27:

So, what’s the issue with regulates?

Regulation is not always an inherent part of the process that it regulates

A speed-bump regulates the velocity of my car

50 mph 5 mph

Page 28:

We needed a better way to express ‘regulates’

We defined regulation as “any process that modulates the frequency, rate or extent of something.”

Something can be:

• A Biological Process
• A Molecular Function
• A Biological Quality

Page 29:

A ‘decomposed’ Term

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
synonym: "regulation of recombination within rDNA repeats" NARROW []
is_a: GO:0000018 ! regulation of DNA recombination
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0006312 ! mitotic recombination
relationship: regulates GO:0006312 ! mitotic recombination

The intersection tags make up the logical definition. This places the ‘regulation’ term in the context of mitotic recombination.
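
Read as data, the two intersection_of tags say: a biological regulation (the genus) that regulates mitotic recombination (the differentia). The Python sketch below splits the tags that way and runs one hypothetical consistency check; the tag values are copied from the stanza above, while both helper functions are assumptions for illustration, not the consortium's actual tooling.

```python
# Tag values copied from the stanza on this slide (comments after "!" dropped).
intersection_of = ["GO:0065007", "regulates GO:0006312"]
relationships = ["regulates GO:0006312"]

def split_logical_definition(tags):
    """Separate the genus (a bare ID) from the differentia (relation, ID) pairs."""
    genus = [t for t in tags if len(t.split()) == 1]
    differentia = [tuple(t.split()) for t in tags if len(t.split()) == 2]
    return genus, differentia

def relationships_agree(tags, asserted):
    """Hypothetical check: every differentia is also an asserted relationship."""
    _genus, differentia = split_logical_definition(tags)
    return all(" ".join(pair) in asserted for pair in differentia)

genus, differentia = split_logical_definition(intersection_of)
print(genus)
print(differentia)
print(relationships_agree(intersection_of, relationships))
```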

Page 30:

The context of mitotic recombination

Page 31:

Old: ‘regulation of mitotic recombination’ part of the graph on top of ‘mitotic recombination’

Page 32:

Now

[Figure: graph with three edges labeled regulates]

Page 33:

What does this buy us?

The new relationship portrays the biology more accurately than part_of
  Regulates
  Positively regulates
  Negatively regulates

The new logical definitions allow automated consistency checks as the ontology is developed.
The first implementation of cross-products in GO
Sets the stage for:
  Molecular function -> biological process
  Cell type -> biological process
  ChEBI -> biological process

Page 34:
Page 35:

On March 18th 2008

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
narrow_synonym: "regulation of recombination within rDNA repeats" []
is_a: GO:0000018 ! regulation of DNA recombination
relationship: part_of GO:0006312 ! mitotic recombination

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
synonym: "regulation of recombination within rDNA repeats" NARROW []
is_a: GO:0000018 ! regulation of DNA recombination
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0006312 ! mitotic recombination
relationship: regulates GO:0006312 ! mitotic recombination
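
The difference between the two stanzas above is easier to see as a line-level comparison. This is a minimal Python sketch: the lines are copied from the two stanzas, with the identical id, name, namespace and def lines left out for brevity, and `diff` is a hypothetical helper.

```python
# Lines from the first stanza (part_of version); identical header lines omitted.
OLD = [
    'narrow_synonym: "regulation of recombination within rDNA repeats" []',
    "is_a: GO:0000018 ! regulation of DNA recombination",
    "relationship: part_of GO:0006312 ! mitotic recombination",
]
# Lines from the second stanza (regulates version).
NEW = [
    'synonym: "regulation of recombination within rDNA repeats" NARROW []',
    "is_a: GO:0000018 ! regulation of DNA recombination",
    "intersection_of: GO:0065007 ! biological regulation",
    "intersection_of: regulates GO:0006312 ! mitotic recombination",
    "relationship: regulates GO:0006312 ! mitotic recombination",
]

def diff(old, new):
    """Return (removed, added) lines, preserving their original order."""
    removed = [line for line in old if line not in new]
    added = [line for line in new if line not in old]
    return removed, added

removed, added = diff(OLD, NEW)
for line in removed:
    print("-", line)
for line in added:
    print("+", line)
```

The is_a parent is untouched; what changes is the synonym syntax, the part_of relationship becoming regulates, and the two new intersection_of tags.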

Page 36:

Evolution of GO

GO term development was annotation-driven

Development directed by use:
  Terms added as new species annotated
  Terms added on an as-needed basis

Developed by an international consortium of biologists and computer scientists
  members from individual databases
  central office at EBI

Development involves collaboration with domain experts from different biological fields

also formal ontologists

Page 37:

Important Consideration for Users

The GO changes daily
  new terms added
  additional relationships added
  terms removed: obsolete terms
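
One practical consequence: annotations made against an older copy of the ontology may point at terms that have since been made obsolete. In OBO files an obsolete term carries the tag `is_obsolete: true`. The Python sketch below flags such IDs; the snapshot dictionary and the helper are assumptions for illustration, and GO:9999999 and GO:8888888 are made-up placeholder IDs, not real terms.

```python
# Hypothetical ontology snapshot: term ID -> whether it is marked obsolete.
# GO:0000019 appears elsewhere in this talk; GO:9999999 is a made-up placeholder.
ONTOLOGY = {
    "GO:0000019": False,
    "GO:9999999": True,
}

def stale_annotations(annotated_ids, ontology):
    """Return the IDs that are obsolete or missing in this snapshot, sorted."""
    return sorted(
        term_id for term_id in annotated_ids
        # An ID absent from the snapshot is treated as stale too.
        if ontology.get(term_id, True)
    )

# GO:8888888 is another placeholder, standing in for an ID the snapshot lacks.
print(stale_annotations(["GO:0000019", "GO:9999999", "GO:8888888"], ONTOLOGY))
```

Recording which ontology release an analysis used makes this kind of check repeatable.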

Page 38:

GO Slims

Page 39:

What is a GO Slim

A GO Slim is a smaller slice of the GO that can be used to “bin” data into categories relevant to the user's experiment

Why use this?

you want to group several sections of the GO into a single broader category

you want to remove sections that are totally irrelevant for your assay (e.g., photosynthetic processes are irrelevant for birds).
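
Binning can be sketched as: for each annotated term, walk up the graph and keep whichever ancestors belong to the slim. In the Python sketch below, the terms appear earlier in this talk, but the parent links for the two membrane terms, the slim itself, the gene names and the helper are all assumptions for illustration. Real tools such as GO Slimmer may count differently.

```python
from collections import Counter

# Child -> parents, using terms that appear earlier in the talk.
# The membrane edges are an assumed reading of the structure figure.
PARENTS = {
    "DNA binding": ["nucleic acid binding"],
    "nucleic acid binding": ["binding"],
    "mitochondrial membrane": ["membrane"],
    "chloroplast membrane": ["membrane"],
}
SLIM = {"binding", "membrane"}  # hypothetical slim

def slim_bins(term):
    """Return the slim terms that are the term itself or one of its ancestors."""
    bins, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in SLIM:
            bins.add(current)
        stack.extend(PARENTS.get(current, []))
    return bins

# Hypothetical annotations: gene product -> annotated term.
ANNOTATIONS = {
    "geneA": "DNA binding",
    "geneB": "mitochondrial membrane",
    "geneC": "nucleic acid binding",
}

counts = Counter(b for term in ANNOTATIONS.values() for b in slim_bins(term))
print(counts)
```

Two specific binding terms collapse into the single broader category "binding", which is the grouping the slide describes.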

Page 40:

Several GO Slims are referenced in the gene_ontology.obo file

Section of OboEdit showing GO slims built into the ontology

Page 41:

But you can build your own

In OboEdit, select the Category Manager (under Metadata)

Use “add” to add a new one; I am adding one for translation

Page 42:

Now I browse through the GO, selecting terms and checking them in the categories box. Make sure you “commit” (save) each selected term. Note, the children of a term are not automatically selected. You need to decide.

After saving in the category manager, the new slim appears in the category list

Page 43:

Checking the “filter terms” box during save will allow you to save just your slim to a new file

Page 44:

Now you can use THIS obo in various binning tools such as GO term finder, Vlad, GO Slimmer, rather than the entire GO

Page 45:

GO Slimmer tool is part of AmiGO

Page 46:

You can specify your genes in a number of ways

You can filter on species and evidence code

You can input or choose a GO slim

Page 47:

You can also select various output options

The gene product counts and a tab-delimited file are great for making pie or bar charts in Excel!
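
For readers who compute their own counts, a tab-delimited table that a spreadsheet can open takes only a few lines of Python. The counts and column names below are hypothetical and do not reproduce GO Slimmer's actual output format.

```python
import csv
import io

# Hypothetical gene product counts per slim term.
counts = {"binding": 2, "membrane": 1}

def counts_to_tsv(counts):
    """Render {slim term: count} as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["slim_term", "gene_products"])  # assumed column names
    for term, n in sorted(counts.items()):
        writer.writerow([term, n])
    return buffer.getvalue()

print(counts_to_tsv(counts))
```

Writing the returned text to a file with a .tsv or .txt extension gives a table ready for charting.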

Page 48:

Visit

http://www.geneontology.org

and

http://www.godatabase.org

for more GO Slim help