25th June 2007 Jane Lomax Using the Gene Ontology (GO) for analysis of expression data Jane Lomax EMBL-EBI
Dec 14, 2015
What is the Gene Ontology?
• Set of standard biological phrases (terms) which are applied to genes/proteins:
– protein kinase
– apoptosis
– membrane
What is the Gene Ontology?
• Genes are linked, or associated, with GO terms by trained curators at genome databases
– known as ‘gene associations’ or GO annotations
• Some GO annotations are created automatically
[Diagram labels: gene -> GO term; associated genes; GO annotations; GO database; genome and protein databases]
What is the Gene Ontology?
• Allows biologists to make queries across large numbers of genes without researching each one individually
Copyright ©1998 by the National Academy of Sciences
Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868
GO structure
• GO isn’t just a flat list of biological terms
• terms are related within a hierarchy
GO structure
• This means genes can be grouped according to user-defined levels
• Allows broad overview of gene set or genome
How does GO work?
• GO is species independent
– some terms, especially lower-level, detailed terms, may be specific to a certain group
  • e.g. photosynthesis
– but when collapsed up to the higher levels, terms are not dependent on species
How does GO work?
What information might we want to capture about a gene product?
• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?
GO structure
• GO terms divided into three parts:
– cellular component
– molecular function
– biological process
Cellular Component
• where a gene product acts
Cellular Component
• Enzyme complexes in the component ontology refer to places, not activities.
Molecular Function
• activities or “jobs” of a gene product
glucose-6-phosphate isomerase activity
Molecular Function
insulin binding
insulin receptor activity
Molecular Function
drug transporter activity
Molecular Function
• A gene product may have several functions
• Sets of functions make up a biological process.
Biological Process
a commonly recognized series of events
cell division
Biological Process
transcription
Biological Process
regulation of gluconeogenesis
Biological Process
limb development
Biological Process
courtship behavior
Ontology Structure
• Terms are linked by two relationships
– is-a
– part-of
Ontology Structure
[Diagram: example hierarchy of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]
Ontology Structure
• Ontologies are structured as a hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent and zero, one or more children
Ontology Structure
[Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane drawn as a Directed Acyclic Graph (DAG); multiple parentage allowed]
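Multiple parentage can be illustrated with a small graph in code. The following is a minimal sketch, assuming a hypothetical `PARENTS` dictionary; the particular edges and their is-a/part-of labels are an illustrative reading of the diagram, not data taken from the GO files.

```python
# Hypothetical parent links for the terms in the diagram:
# child term -> list of (relationship, parent term).
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents[current]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("chloroplast membrane")))
```

Collecting ancestors in this way is one route to grouping genes at a broader, user-chosen level: a gene annotated to a detailed term can also be counted under any of that term's ancestors.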
Anatomy of a GO term
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
Slide labels: id = unique GO ID; name = term name; namespace = ontology; def = definition; exact_synonym = synonym; xref_analog = database ref; is_a = parentage
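A term record of this kind is easy to read programmatically. The following is a minimal sketch, assuming each field sits on its own `tag: value` line; `parse_stanza` is a hypothetical helper written for this example and not part of any GO software.

```python
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Collect 'tag: value' lines into a dict of lists (tags such as is_a can repeat)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ': ' only, so values containing colons stay intact.
        tag, _, value = line.partition(": ")
        fields.setdefault(tag, []).append(value.strip())
    return fields


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], term["is_a"])
```

Storing every tag as a list keeps the two `is_a` parents together, which matters because a term may have more than one parent.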
GO terms
• Where do GO terms come from?
– GO terms are added by editors at EBI and annotating databases
– new terms are usually only added when they are asked for by annotators
– GO editors work with experts to make major ontology developments
  • metabolism
  • pathogenesis
  • cell cycle
GO stats
• over 23,000 GO terms:
– 13593 biological_process
– 1980 cellular_component
– 7700 molecular_function
GO annotations
• Where do the links between genes and GO terms come from?
GO annotations
• Contributing databases:
– Berkeley Drosophila Genome Project (BDGP)
– dictyBase (Dictyostelium discoideum)
– FlyBase (Drosophila melanogaster)
– GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
– UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
– Gramene (grains, including rice, Oryza)
– Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
– Rat Genome Database (RGD) (Rattus norvegicus)
– Reactome
– Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
– The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
– The Institute for Genomic Research (TIGR): databases on several bacterial species
– WormBase (Caenorhabditis elegans)
– Zebrafish Information Network (ZFIN) (Danio rerio)
Species coverage
• All major eukaryotic model organism species
• Human via GOA group at UniProt
• Several bacterial and parasite species through TIGR and GeneDB at Sanger
– many more in pipeline
Annotation coverage
Anatomy of a GO annotation
• Three key parts:
– gene name/id
– GO term(s)
– evidence for association
Example annotation
• Breast cancer type 1 susceptibility protein gene in humans
Types of GO annotation:
Electronic Annotation
Manual Annotation
Manual annotation
• Created by scientific curators
• High quality
• Small number
Manual annotation
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…
Electronic Annotation
• Annotation derived without human validation
– mappings file e.g. interpro2go, ec2go
– Blast search ‘hits’
• Lower ‘quality’ than manual codes
Mappings files
Fatty acid biosynthesis (Swiss-Prot Keyword) -> GO:Fatty acid biosynthesis (GO:0006633)
EC:6.4.1.2 (EC number) -> GO:acetyl-CoA carboxylase activity (GO:0003989)
IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) -> GO:acetyl-CoA carboxylase activity (GO:0003989)
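A mappings file can be treated as a lookup from an external identifier to GO IDs. The following is a minimal sketch, assuming each line has the layout `external_id [description] > GO:term name ; GO:id`; this layout is an assumption about the mapping files, and `parse_mapping_line` is a hypothetical helper, so check both against the actual ec2go and interpro2go files before relying on them.

```python
# Example lines written in the assumed layout, using the entries shown on the slide.
MAPPING_LINES = [
    "EC:6.4.1.2 > GO:acetyl-CoA carboxylase activity ; GO:0003989",
    "InterPro:IPR000438 Acetyl-CoA carboxylase carboxyl transferase beta subunit"
    " > GO:acetyl-CoA carboxylase activity ; GO:0003989",
]


def parse_mapping_line(line):
    """Split one mapping line into (external id, GO id, GO term name)."""
    source, _, target = line.partition(" > ")
    term_name, _, go_id = target.rpartition(" ; ")
    external_id = source.split()[0]  # drop any free-text description
    if term_name.startswith("GO:"):
        term_name = term_name[len("GO:"):]
    return external_id, go_id.strip(), term_name.strip()


def build_lookup(lines):
    """Map each external id to the set of GO ids it implies."""
    lookup = {}
    for line in lines:
        external_id, go_id, _name = parse_mapping_line(line)
        lookup.setdefault(external_id, set()).add(go_id)
    return lookup


lookup = build_lookup(MAPPING_LINES)
print(lookup["EC:6.4.1.2"])
```

A gene product that carries the EC number or the InterPro entry would then receive the mapped GO term as an electronic annotation.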
Evidence types
• ISS: Inferred from Sequence/structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available
• IEA: Inferred from electronic annotation
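Evidence codes make it possible to separate manual from electronic annotations. The following is a minimal sketch, assuming a hypothetical `Annotation` record with the three key parts of an annotation; the example annotations are invented placeholders, not real gene associations.

```python
from collections import namedtuple

# Hypothetical record: gene name/id, GO term, evidence for the association.
Annotation = namedtuple("Annotation", ["gene", "go_id", "evidence"])

EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from electronic annotation",
}

# Invented placeholder annotations, for illustration only.
EXAMPLES = [
    Annotation("geneA", "GO:0006915", "IDA"),
    Annotation("geneB", "GO:0006094", "IEA"),
    Annotation("geneC", "GO:0006633", "TAS"),
]


def without_electronic(annotations):
    """Keep only annotations whose evidence code is not IEA."""
    return [a for a in annotations if a.evidence != "IEA"]


for annotation in without_electronic(EXAMPLES):
    print(annotation.gene, annotation.go_id, EVIDENCE_CODES[annotation.evidence])
```

Whether to drop IEA annotations is an analysis choice: doing so trades coverage for a set of annotations that a curator has reviewed.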
GO tools
• GO resources are freely available to anyone to use without restriction
– Includes the ontologies, gene associations and tools developed by GO
• Other groups have used GO to create tools for many purposes:
http://www.geneontology.org/GO.tools
GO tools
• Affymetrix also provide a Gene Ontology Mining Tool as part of their NetAffx™ Analysis Center which returns GO terms for probe sets
GO tools
• Many tools exist that use GO to find common biological functions from a list of genes:
http://www.geneontology.org/GO.tools.microarray.shtml
GO tools
• Most of these tools work in a similar way:
– input a gene list and a subset of ‘interesting’ genes
– tool shows which GO categories have most interesting genes associated with them i.e. which categories are ‘enriched’ for interesting genes
– tool provides a statistical measure to determine whether enrichment is significant
Microarray process
• Treat samples
• Collect mRNA
• Label
• Hybridize
• Scan
• Normalize
• Select differentially regulated genes
• Understand the biological phenomena involved
Traditional analysis
Gene 1: Apoptosis; Cell-cell signaling; Protein phosphorylation; Mitosis; …
Gene 2: Growth control; Mitosis; Oncogenesis; Protein phosphorylation; …
Gene 3: Growth control; Mitosis; Oncogenesis; Protein phosphorylation; …
Gene 4: Nervous system; Pregnancy; Oncogenesis; Mitosis; …
Gene 100: Positive ctrl. of cell prolif; Mitosis; Oncogenesis; Glucose transport; …
Traditional analysis
• gene by gene basis
• requires literature searching
• time-consuming
Using GO annotations
• But by using GO annotations, this work has already been done for you!
GO:0006915 : apoptosis
Grouping by process
Apoptosis: Gene 1; Gene 53
Mitosis: Gene 2; Gene 5; Gene 45; Gene 7; Gene 35; …
Positive ctrl. of cell prolif.: Gene 7; Gene 3; Gene 12; …
Growth: Gene 5; Gene 2; Gene 6; …
Glucose transport: Gene 7; Gene 3; Gene 6; …
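Moving from the gene-by-gene view to a grouping by process amounts to inverting a mapping. The following is a minimal sketch, assuming a hypothetical `GENE_TO_TERMS` dictionary filled with the first four gene lists from the Traditional analysis slide; those lists are truncated on the slide, so the data here is partial.

```python
from collections import defaultdict

# Gene -> processes, as listed (in part) on the Traditional analysis slide.
GENE_TO_TERMS = {
    "Gene 1": ["Apoptosis", "Cell-cell signaling", "Protein phosphorylation", "Mitosis"],
    "Gene 2": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 3": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 4": ["Nervous system", "Pregnancy", "Oncogenesis", "Mitosis"],
}


def group_by_term(gene_to_terms):
    """Invert gene -> terms into term -> genes."""
    groups = defaultdict(list)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            groups[term].append(gene)
    return dict(groups)


groups = group_by_term(GENE_TO_TERMS)
for term in sorted(groups):
    print(term, groups[term])
```

With real data the gene-to-term mapping would come from a GO annotation file rather than a hand-written dictionary.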
GO for microarray analysis
• Annotations give ‘function’ label to genes
• Ask meaningful questions of microarray data e.g.
– genes involved in the same process, same/different expression patterns?
Using GO in practice
• statistical measure
– how likely your differentially regulated genes fall into that category by chance
microarray experiment: 1000 genes
100 genes differentially regulated
mitosis – 80/100
apoptosis – 40/100
p. ctrl. cell prol. – 30/100
glucose transp. – 20/100
[Bar chart: number of differentially regulated genes per process (mitosis, apoptosis, positive control of cell proliferation, glucose transport); axis 0 to 80]
Using GO in practice
• However, when you look at the distribution of all genes on the microarray:
Process               Genes on array   # genes expected in 100 random genes   # genes occurred
mitosis               800/1000         80                                     80
apoptosis             400/1000         40                                     40
p. ctrl. cell prol.   100/1000         10                                     30
glucose transp.       50/1000          5                                      20
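The comparison of expected and observed counts can be turned into a p-value. The following is a minimal sketch using the hypergeometric distribution, which is one common choice for this kind of test; the slides do not state which statistic any particular tool uses, and the function names here are hypothetical.

```python
from math import comb

ARRAY_SIZE = 1000  # genes on the array
SELECTED = 100     # differentially regulated genes

# process: (genes on the array with this term, selected genes with this term)
COUNTS = {
    "mitosis": (800, 80),
    "apoptosis": (400, 40),
    "positive control of cell proliferation": (100, 30),
    "glucose transport": (50, 20),
}


def expected_hits(on_array):
    """Number of hits expected if the selected genes were a random sample."""
    return SELECTED * on_array / ARRAY_SIZE


def enrichment_p_value(on_array, observed):
    """Chance of at least `observed` hits when SELECTED genes are drawn
    at random, without replacement, from the array."""
    upper = min(on_array, SELECTED)
    favourable = sum(
        comb(on_array, k) * comb(ARRAY_SIZE - on_array, SELECTED - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(ARRAY_SIZE, SELECTED)


for process, (on_array, observed) in COUNTS.items():
    p = enrichment_p_value(on_array, observed)
    print(f"{process}: expected {expected_hits(on_array):.0f}, "
          f"observed {observed}, p = {p:.3g}")
```

Mitosis has the most differentially regulated genes, but its count matches what chance alone would give, whereas positive control of cell proliferation and glucose transport occur far more often than expected. When many categories are tested at once, a multiple-testing correction is usually applied as well.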
Enrichment tools
• GO is developing its own enrichment tool as part of the GO browser AmiGO
• Currently in testing phase, should be released next month
Onto-Express walkthrough
http://vortex.cs.wayne.edu/projects.htm#Onto-Express