24th Feb 2006 Jane Lomax
GO Further
GO annotations
• Where do the links between genes and GO terms come from?
GO annotations
• Contributing databases:
  – Berkeley Drosophila Genome Project (BDGP)
  – dictyBase (Dictyostelium discoideum)
  – FlyBase (Drosophila melanogaster)
  – GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
  – UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
  – Gramene (grains, including rice, Oryza)
  – Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
  – Rat Genome Database (RGD) (Rattus norvegicus)
  – Reactome
  – Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
  – The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
  – The Institute for Genomic Research (TIGR): databases on several bacterial species
  – WormBase (Caenorhabditis elegans)
  – Zebrafish Information Network (ZFIN) (Danio rerio)
Species coverage
• All major eukaryotic model organism species
• Human, via the GOA group at UniProt
• Several bacterial and parasite species, through TIGR and GeneDB at the Sanger Institute
  – many more in the pipeline
Annotation coverage
Anatomy of a GO annotation
• Three key parts:
  – gene name/id
  – GO term(s)
  – evidence for the association
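The three parts of an annotation map naturally onto a small record type. The Python sketch below is only an illustration, not the file format any GO database actually uses: the class name `GOAnnotation`, its field names, and the check that GO IDs look like `GO:` followed by seven digits are assumptions. The example values reuse the acetyl-CoA carboxylase term and the IEA evidence code from later in this talk, attached to a hypothetical gene ID.

```python
import re
from dataclasses import dataclass

# Assumed pattern: GO identifiers are "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOAnnotation:
    """One link between a gene and a GO term (hypothetical record type)."""
    gene_id: str        # gene name/id
    go_id: str          # GO term identifier
    evidence_code: str  # evidence for the association, e.g. "IDA" or "IEA"

    def __post_init__(self) -> None:
        # Reject values that do not look like GO identifiers.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")


# "EXAMPLE_GENE" is a hypothetical gene ID used only for illustration.
ann = GOAnnotation("EXAMPLE_GENE", "GO:0003989", "IEA")
print(ann)
```

Making the record frozen means an annotation cannot be silently edited after its evidence has been assigned; a corrected annotation is a new record.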
Example annotation
• Human breast cancer type 1 susceptibility protein gene (BRCA1)
Types of GO annotation:
• Electronic annotation
• Manual annotation
Manual annotation
• Created by scientific curators
• High quality
• Relatively small number of annotations
Manual annotation
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein … these kinases have been implicated in early stages of wound response …
Manual annotation
Electronic Annotation
• Annotation derived without human validation
  – mappings files, e.g. interpro2go, ec2go
  – BLAST search ‘hits’
• Lower ‘quality’ than annotations supported by experimental evidence codes
Mappings files
• Fatty acid biosynthesis (Swiss-Prot keyword) → GO:fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number) → GO:acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO:0003989)
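Electronic annotation from a mappings file is essentially a lookup: each line pairs an external identifier with a GO term, and every gene carrying that identifier inherits the term with the IEA evidence code. The Python sketch below assumes the line layout `EXTERNAL_ID [description] > GO:term name ; GO:id` used by GO's external2go files such as interpro2go and ec2go; the gene names and their InterPro/EC assignments are hypothetical.

```python
from __future__ import annotations


def parse_mapping_line(line: str) -> tuple[str, str, str] | None:
    """Parse one mapping line into (external_id, go_name, go_id).

    Assumes the layout 'EXTERNAL_ID [description] > GO:name ; GO:id';
    blank lines and comment lines starting with '!' return None.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    left, right = line.split(" > ", 1)
    external_id = left.split()[0]
    go_name, go_id = (part.strip() for part in right.split(" ; ", 1))
    if go_name.startswith("GO:"):
        go_name = go_name[len("GO:"):]
    return external_id, go_name, go_id


def electronic_annotations(mapping_lines, gene_to_external_ids):
    """Return sorted (gene_id, go_id, 'IEA') tuples derived from the mappings."""
    mapping: dict[str, list[str]] = {}
    for line in mapping_lines:
        parsed = parse_mapping_line(line)
        if parsed:
            external_id, _go_name, go_id = parsed
            mapping.setdefault(external_id, []).append(go_id)
    # A set removes duplicates when several mappings point to the same term.
    results = set()
    for gene_id, external_ids in gene_to_external_ids.items():
        for external_id in external_ids:
            for go_id in mapping.get(external_id, []):
                results.add((gene_id, go_id, "IEA"))
    return sorted(results)


MAPPING_LINES = [
    "! comment lines start with '!'",
    "InterPro:IPR000438 Acetyl-CoA carboxylase carboxyl transferase beta subunit"
    " > GO:acetyl-CoA carboxylase activity ; GO:0003989",
    "EC:6.4.1.2 > GO:acetyl-CoA carboxylase activity ; GO:0003989",
]
# Hypothetical genes and the external identifiers assigned to them.
GENES = {
    "geneA": ["InterPro:IPR000438", "EC:6.4.1.2"],
    "geneB": ["EC:6.4.1.2"],
    "geneC": ["InterPro:IPR_HYPOTHETICAL"],  # no mapping, so no annotation
}
result = electronic_annotations(MAPPING_LINES, GENES)
print(result)
# [('geneA', 'GO:0003989', 'IEA'), ('geneB', 'GO:0003989', 'IEA')]
```

As on the slide, the InterPro entry and the EC number both lead to acetyl-CoA carboxylase activity, so geneA receives that term once rather than twice.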
Evidence types
• ISS: Inferred from Sequence/structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available
• IEA: Inferred from Electronic Annotation
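The evidence code is what separates manual from electronic annotations. The Python sketch below makes that split explicit, in the spirit of the ‘show only manual’ option used in the annotation exercise; it is a simplification that treats every code except IEA as manual, and the function names and example annotations are hypothetical (this is not how QuickGO itself is implemented).

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

# Simplifying assumption: IEA is the only code assigned without human validation.
ELECTRONIC_CODES = frozenset({"IEA"})


def is_manual(evidence_code: str) -> bool:
    """Return True if, under the assumption above, a curator made the annotation."""
    if evidence_code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {evidence_code!r}")
    return evidence_code not in ELECTRONIC_CODES


def show_only_manual(annotations):
    """Keep only (gene_id, go_id, evidence_code) tuples with manual evidence."""
    return [ann for ann in annotations if is_manual(ann[2])]


# Hypothetical annotations for illustration only.
ANNOTATIONS = [
    ("geneA", "GO:0003989", "IEA"),
    ("geneA", "GO:0006633", "IDA"),
    ("geneB", "GO:0003989", "ISS"),
]
manual = show_only_manual(ANNOTATIONS)
print(manual)
# [('geneA', 'GO:0006633', 'IDA'), ('geneB', 'GO:0003989', 'ISS')]
```

Raising an error on an unrecognised code, rather than quietly treating it as manual or electronic, keeps a typo in an evidence column from changing which annotations are shown.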
GO terms
• Where do GO terms come from?
  – most GO terms are added by the GO editorial office at EBI
  – new terms are usually only added when annotators ask for them
  – GO editors work with experts on major ontology developments, e.g.:
    • metabolism
    • pathogenesis
    • cell cycle
GO stats
• almost 20,000 GO terms (19,532 in total)
  – 10,452 biological_process
  – 1,687 cellular_component
  – 7,393 molecular_function
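A quick check that the three ontology counts on this slide add up to ‘almost 20,000’, and how the terms split between the ontologies. The counts are the slide's own; the variable names are arbitrary.

```python
# Term counts per ontology, as given on the slide (February 2006).
TERM_COUNTS = {
    "biological_process": 10452,
    "cellular_component": 1687,
    "molecular_function": 7393,
}

total = sum(TERM_COUNTS.values())
print(f"total: {total}")  # total: 19532
for ontology, count in TERM_COUNTS.items():
    # Share of all terms, rounded to one decimal place.
    print(f"{ontology}: {count / total:.1%}")
# biological_process: 53.5%
# cellular_component: 8.6%
# molecular_function: 37.9%
```

Biological process terms make up just over half of the ontology, while cellular component terms account for fewer than one in ten.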
Growth of GO
No GO Areas
• GO covers ‘normal’ functions and processes
  – no pathological processes
  – no experimental conditions
• NO evolutionary relationships
• NO gene products
• NOT a system of nomenclature
Open Biomedical Ontologies (OBO)
• A repository of well-structured controlled vocabularies for shared use across different biological and medical domains:
  http://obo.sourceforge.net/
Open Biomedical Ontologies (OBO)
• Requirements for inclusion:
  http://obo.sourceforge.net/crit.html
AmiGO exercise
Annotation exercise
• We have provided a Nature paper (PMID: 14961121) for you to annotate with GO terms
  – This will help you to understand how curators extract information from papers and apply GO terms
  – It will also give you the opportunity to use QuickGO, another GO browser developed at EBI
Annotation exercise
• The gene you are annotating is VG5Q
  – To make it easier, we’ve highlighted some of the most relevant passages in the text
• Use the GO browser QuickGO to look for the most appropriate GO terms:
  – http://www.ebi.ac.uk/ego/
Annotation exercise
• In QuickGO, you can search for GO terms by name:
  http://www.ebi.ac.uk/ego/
Annotation exercise
• Remember: as well as the GO term, you also need to assign an evidence code
  – to remind you, we’ve included a list of the evidence codes at the back of the paper
Annotation exercise
• To see how your annotations compare with those made by the GO curator, search QuickGO for Q8N302
  – This is the UniProt ID for the VG5Q gene
• Click ‘show only manual’ to see the annotations the curator made