Gene Ontology Part 1: What is the GO? http://www.geneontology.org Harold J Drabkin, Senior Scientific Curator, The Jackson Laboratory, Mouse Genome Informatics, Bar Harbor, ME
Jan 04, 2016
What is the GO
  The scope of the GO
  The GO Relationships
Using the GO for annotation
  Anatomy of an annotation
  Evidence codes
  qualifiers
  gene association files
What IS the GO
The Gene Ontology is a dictionary of concepts used to describe the normal properties of a gene product
It has concepts describing molecular functions
It has concepts describing biological processes
It has concepts describing cellular locations that the gene products are found in
Gene Ontology
Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases”
Built to be applicable to any organism
Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge.
The GO
is NOT a list of genes or proteins
  although you might find a gene or protein name as a synonym
does NOT track diseases
although certain disease phenotypes might suggest the function of a gene product or a process that it may participate in
you will not find “tumor suppressor activity/tumor suppression” as GO terms
The Gene Ontology Consortium Started Small
Original GO created in 2000
Three databases involved:
  FlyBase (Drosophila)
  MGI (Mouse)
  SGD (S. cerevisiae)
Used immediately
More quickly joined...
Later databases:
  TAIR (Arabidopsis)
  TIGR (microbes including prokaryotes)
  SWISS-PROT (several thousand species inc. human)
  PSU (P. falciparum)
  ZFIN (zebrafish)
  PAMGO (plant pathogens)
Gene Ontology widely adopted
AgBase
Why do we need this?
Tactition
Tactile sense
Taction
perception of touch; GO:0050975
Often the same concept is referred to by different names
Bud initiation?
Often the same term is used by different communities to mean different things...
More specifically
The GO is not just a flat list of terms
transcription factor activity
DNA binding
transcription regulator activity
membrane
mitochondrial membrane
glycolysis
nucleus
cytoplasm
ion transport
...
is_a
And the terms can have more than one parent!
is_a
DNA binding is a type of nucleic acid binding.
Nucleic acid binding is a type of binding.
There are also relationships between them.
Ontology Structure
The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)
Terms can have more than one parent and zero, one or more children
Terms are linked by three relationships:
  is_a
  part_of
  regulates (new)
    negatively regulates
    positively regulates
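The "more than one parent" property can be sketched in a few lines of Python. This is only a toy illustration: the four terms and their edges below are assumptions chosen for the example, not a copy of the real ontology, and `PARENTS` and `ancestors` are hypothetical names.

```python
from collections import deque

# Toy graph: child -> list of (relationship, parent).
# The edges are illustrative assumptions, not taken from the GO files.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    # One term, two parents, reached through two different relationships.
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}

def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(ancestors("chloroplast membrane")))
```

Because the graph is acyclic, the walk always terminates; the `seen` set just avoids visiting a shared ancestor such as "cell" twice.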
Ontology Structure
[Figure: example graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is_a and part_of edges; source: http://www.ebi.ac.uk/ego]
It gets complicated quickly
Molecular Function = elemental activity/task
the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
Biological Process = biological goal or objective
broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
Cellular Component = location or complex
subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
The 3 Gene Ontologies
Cellular Component where a gene product acts
Molecular Function
activities or “jobs” of a gene product
glucose-6-phosphate isomerase activity
insulin binding
insulin receptor activity
A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product.
Sets of functions make up a biological process.
Biological Process
gluconeogenesis
cell division
limb development
a commonly recognized series of events
Mitochondrial P450 (CC24 PR01238; MITP450CC24)
An example…
Anatomy of a GO term
A GO term OBO-format stanza
begins with [Term] and minimally has
id:
name:
namespace:
def:
one or more relationships
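To make the stanza layout concrete, here is a minimal Python sketch that reads one [Term] stanza into a dictionary. It is not a full OBO parser: it assumes one `tag: value` pair per line and ignores escapes and trailing modifiers. The function name `parse_term_stanza` is hypothetical; the stanza text is the GO:0000019 example used in these slides.

```python
STANZA = """
[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
is_a: GO:0000018 ! regulation of DNA recombination
relationship: regulates GO:0006312 ! mitotic recombination
"""

def parse_term_stanza(text):
    """Parse one OBO [Term] stanza into a dict of tag -> list of values."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0] != "[Term]":
        raise ValueError("stanza must begin with [Term]")
    term = {}
    for line in lines[1:]:
        # Split on the first colon only, so "GO:0000019" stays intact.
        tag, _, value = line.partition(":")
        value = value.strip()
        # Drop the trailing "! human-readable name" comment on unquoted values.
        if not value.startswith('"'):
            value = value.split(" ! ")[0].strip()
        # Tags such as is_a or relationship may repeat, so keep a list.
        term.setdefault(tag.strip(), []).append(value)
    return term

term = parse_term_stanza(STANZA)
print(term["id"], term["is_a"], term["relationship"])
```

Storing every tag as a list keeps repeated tags (several `is_a` lines, several relationships) without special cases.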
More GO Term Stanzas
The Regulates Relationship
In the Beginning There Were Two Relationships
Is_a: denotes a subtype of its parent.
Part_of: denotes a portion of a parent
Is_part: If it exists, it is always a part of its parent (this is the relationship we use).
Has_part: If there is a parent, then it has this as a part of it.
We made the regulation of something a part_of the something
But it’s not really part_of
So, what’s the issue with regulates?
Regulation is not always an inherent part of the process that it regulates
A speed-bump regulates the velocity of my car
50 mph 5 mph
We needed a better way to express ‘regulates’
We defined regulation as “any process that modulates the frequency, rate or extent of something.”
Something can be:
• A Biological Process
• A Molecular Function
• A Biological Quality
A ‘decomposed’ Term
[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
synonym: "regulation of recombination within rDNA repeats" NARROW []
is_a: GO:0000018 ! regulation of DNA recombination
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0006312 ! mitotic recombination
relationship: regulates GO:0006312 ! mitotic recombination
The intersection_of tags make up the logical definition. This places the ‘regulation’ term in the context of mitotic recombination.
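One way to read the logical definition programmatically is to split the intersection_of values into a general class (the bare ID) and relation/target pairs. The sketch below does that for GO:0000019 and adds a toy check that each pair is also asserted as a relationship. The function names and the check itself are assumptions made for illustration; this is not the consortium's actual tooling.

```python
# Tag values for GO:0000019 as shown in the stanza, with "! name" comments removed.
TERM = {
    "id": "GO:0000019",
    "is_a": ["GO:0000018"],
    "intersection_of": ["GO:0065007", "regulates GO:0006312"],
    "relationship": ["regulates GO:0006312"],
}

def logical_definition(term):
    """Split intersection_of values into a genus ID and (relation, target) pairs."""
    genus = None
    differentia = []
    for value in term.get("intersection_of", []):
        parts = value.split()
        if len(parts) == 1:
            genus = parts[0]  # bare ID: the general class
        else:
            differentia.append((parts[0], parts[1]))  # e.g. regulates + target
    return genus, differentia

def missing_relationships(term):
    """Toy check: pairs in the logical definition that lack a relationship tag."""
    _genus, differentia = logical_definition(term)
    asserted = {tuple(v.split()[:2]) for v in term.get("relationship", [])}
    return [pair for pair in differentia if pair not in asserted]

print(logical_definition(TERM))
print(missing_relationships(TERM))
```

Read aloud, the result says: a biological regulation (GO:0065007) that regulates mitotic recombination (GO:0006312).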
The context of mitotic recombination
Old: ‘regulation of mitotic recombination’ part of the graph on top of ‘mitotic recombination’
Now
[Figure: graph with three edges labeled regulates]
What does this buy us?
The new relationship portrays the biology more accurately than part_of
  Regulates
  Positively regulates
  Negatively regulates
The new logical definitions allow automated consistency checks as the ontology is developed.
The first implementation of cross-products in GO
Sets the stage for:
  Molecular function -> biological process
  Cell type -> biological process
  ChEBI -> biological process
(On March 18th 2008)

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
narrow_synonym: "regulation of recombination within rDNA repeats" []
is_a: GO:0000018 ! regulation of DNA recombination
relationship: part_of GO:0006312 ! mitotic recombination

[Term]
id: GO:0000019
name: regulation of mitotic recombination
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of DNA recombination during mitosis." [GOC:go_curators]
synonym: "regulation of recombination within rDNA repeats" NARROW []
is_a: GO:0000018 ! regulation of DNA recombination
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0006312 ! mitotic recombination
relationship: regulates GO:0006312 ! mitotic recombination
Evolution of GO
GO term development was annotation-driven
Development directed by use:
  Terms added as new species annotated
  Terms added on an as-needed basis
Developed by an international consortium of biologists and computer scientists
  members from individual databases
  central office at EBI
Development involves collaboration with domain experts from different biological fields
also formal ontologists
Important Consideration for Users
The GO changes daily
  new terms added
  additional relationships added
  terms removed: obsolete terms
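Because terms can become obsolete, stored annotations may point at IDs that are no longer current. The sketch below shows one possible way to resolve an ID, assuming each term record carries the OBO tags `is_obsolete` and, where a successor exists, `replaced_by`. The term IDs and names are made up for the example, and `resolve` is a hypothetical helper.

```python
# Hypothetical term records; the EX: identifiers are invented for illustration.
TERMS = {
    "EX:0000001": {"name": "old term", "is_obsolete": True, "replaced_by": "EX:0000002"},
    "EX:0000002": {"name": "current term"},
    "EX:0000003": {"name": "retired term", "is_obsolete": True},
}

def resolve(term_id, terms=TERMS):
    """Follow replaced_by links until a non-obsolete term is reached.

    Returns None when the term is unknown, or obsolete with no replacement.
    """
    seen = set()
    while term_id in terms and term_id not in seen:
        seen.add(term_id)  # guards against an accidental replacement loop
        record = terms[term_id]
        if not record.get("is_obsolete"):
            return term_id
        term_id = record.get("replaced_by")
    return None

for term_id in ("EX:0000001", "EX:0000002", "EX:0000003"):
    print(term_id, "->", resolve(term_id))
```

An annotation that resolves to None needs a curator's decision rather than an automatic fix.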
GO Slims
What is a GO Slim
A GO Slim is a smaller slice of the GO that can be used to “bin” data into categories relevant to the user's experiment
Why use this?
you want to group several sections of the GO into a single broader category
you want to remove sections that are totally irrelevant for your assay (eg, photosynthetic processes irrelevant for birds).
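The binning idea can be sketched as follows: each annotated term is walked up to its ancestors, and any ancestor that belongs to the slim becomes a bin for that gene. Everything here is a toy assumption for illustration (the small graph, the slim, the gene names and the helper functions); real tools work from the full ontology file.

```python
from collections import Counter

# Toy parent links by term name; edges are illustrative only.
PARENTS = {
    "biological_process": [],
    "metabolic process": ["biological_process"],
    "translation": ["metabolic process"],
    "translational initiation": ["translation"],
    "photosynthesis": ["metabolic process"],
    "transport": ["biological_process"],
    "ion transport": ["transport"],
}

# A hypothetical two-term slim.
SLIM = {"translation", "transport"}

# Hypothetical genes and their annotated terms.
ANNOTATIONS = {
    "geneA": ["translational initiation"],
    "geneB": ["ion transport", "translation"],
    "geneC": ["photosynthesis"],
}

def slim_bins(term, slim=SLIM, parents=PARENTS):
    """Return the slim terms that `term` is, or descends from."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
        stack.extend(parents.get(current, []))
    return found

def count_genes_per_bin(annotations):
    """Count each gene once per slim bin it falls into."""
    counts = Counter()
    for _gene, terms in annotations.items():
        bins = set()
        for term in terms:
            bins |= slim_bins(term)
        counts.update(bins or {"(not in slim)"})
    return counts

print(count_genes_per_bin(ANNOTATIONS))
```

Here photosynthesis falls outside the slim, which mirrors the case of dropping sections that are irrelevant to the assay.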
Several GO Slims are referenced in the gene_ontology.obo file
Section of OboEdit showing GO slims built into the ontology
But you can build your own
In OboEdit, select the Category Manager (under Metadata)
Use “add” to add a new one; I am adding one for translation
Now I browse through the GO, selecting terms and checking them in the categories box. Make sure you “commit” (save) each selected term. Note, the children of a term are not automatically selected. You need to decide.
After saving in the category manager, the new slim appears in the category list
Checking the “filter terms” box during save will allow you to save just your slim to a new file
Now you can use THIS obo in various binning tools such as GO term finder, Vlad, GO Slimmer, rather than the entire GO
GO Slimmer tool is part of AmiGO
You can specify your genes in a number of ways
You can filter on species and evidence code
You can input or choose a GO slim
You can also select various output options
The gene product counts and a tab-delimited file are great for making pie or bar charts in Excel!
Visit
http://www.geneontology.org
and
http://www.godatabase.org
for more GO Slim help