TAIR: A Sustainable Community Resource for Arabidopsis Research
International Conference on Arabidopsis Research (ICAR 2016), GyeongJu, Korea

ICAR2016 TAIR talk

Apr 12, 2017


Donghui Li
Transcript
Page 1: ICAR2016 TAIR talk

TAIR: A Sustainable Community Resource for Arabidopsis Research

International Conference on Arabidopsis Research (ICAR 2016), GyeongJu, Korea

Page 2: ICAR2016 TAIR talk

1.  TAIR: a sustainable community resource for Arabidopsis research (Eva Huala)

2.  Using biological ontologies to accelerate progress in plant biology research (Donghui Li)

3.  Community annotation: making your data and publication more discoverable (Donghui Li)

Page 3: ICAR2016 TAIR talk

Using biological ontologies to accelerate progress in plant biology research

Donghui Li

TAIR/Phoenix Bioinformatics

Page 4: ICAR2016 TAIR talk

Every year, on average:
•  Over 3,000 Arabidopsis research articles are added
•  Over 2,000 papers are associated with genes
•  Over 400 articles have gene function, expression or phenotype data extracted
•  Over 5,000 experiment-based annotations are added using controlled vocabularies (Gene Ontology (GO) and Plant Ontology (PO))

Producing a ‘gold standard’ annotated reference plant genome

Highly structured, searchable, computable functional annotations

Page 7: ICAR2016 TAIR talk

•  How do we use biological ontologies to annotate Arabidopsis gene function?

•  How do we read and interpret annotations?

•  What can you do with these annotations?

Outline

Page 8: ICAR2016 TAIR talk

Why do we need ontologies?

Inconsistency in free text:
•  Different names for the same concept: translation, protein synthesis
•  Same name for different concepts: bud initiation?

Page 9: ICAR2016 TAIR talk

A Gene Ontology (GO) term

Accession: GO:0006412
Name: translation
Ontology: biological_process
Synonyms: protein anabolism, protein biosynthesis, protein biosynthetic process, protein formation, protein synthesis, protein translation
Definition: The cellular metabolic process in which a protein is formed, using the sequence of a mature mRNA molecule to specify the sequence of amino acids in a polypeptide chain. Translation is mediated by the ribosome, and begins with the formation of a ternary complex between aminoacylated initiator methionine tRNA, GTP, and initiation factor 2, which subsequently associates with the small subunit of the ribosome and an mRNA. Translation ends with the release of a polypeptide chain from the ribosome.
Source: GOC:go_curators
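Synonyms are what let different free-text names for the same concept resolve to one accession. A minimal sketch, assuming a hand-built in-memory record that copies only the fields shown on this slide; the `GoTerm` class and the `build_index`/`resolve` helpers are hypothetical, not part of any GO library:

```python
from dataclasses import dataclass, field


@dataclass
class GoTerm:
    """Fields copied from the GO:0006412 record on this slide."""
    accession: str
    name: str
    ontology: str
    synonyms: list[str] = field(default_factory=list)


TRANSLATION = GoTerm(
    accession="GO:0006412",
    name="translation",
    ontology="biological_process",
    synonyms=[
        "protein anabolism",
        "protein biosynthesis",
        "protein biosynthetic process",
        "protein formation",
        "protein synthesis",
        "protein translation",
    ],
)


def build_index(terms):
    """Map every lower-cased name and synonym to its term's accession."""
    index = {}
    for term in terms:
        for label in [term.name, *term.synonyms]:
            index[label.lower()] = term.accession
    return index


INDEX = build_index([TRANSLATION])


def resolve(free_text):
    """Return the accession for a free-text label, or None if it is not known."""
    return INDEX.get(free_text.strip().lower())


print(resolve("Protein synthesis"))  # GO:0006412
print(resolve("translation"))        # GO:0006412
```

Both spellings land on the same accession, which is exactly the consistency free text lacks.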

Page 10: ICAR2016 TAIR talk

•  molecular function: catalytic or binding activities (e.g. kinase activity, DNA binding activity)
•  biological process: biological goal or objective (e.g. protein translation, mitosis)
•  cellular component: location or complex (e.g. nucleus, ribosome, proteasome)

More info at www.geneontology.org

Gene Ontology (GO)

Page 11: ICAR2016 TAIR talk

Terms in an ontology are connected

is_a

part_of
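Because terms are linked by is_a and part_of edges, every more general term above a given term can be found by walking those edges upward. A minimal sketch over a small, simplified fragment; the edges are illustrative and may not match the current GO release exactly, and the `ancestors` helper is hypothetical:

```python
from collections import deque

# Simplified fragment (child -> [(relation, parent)]); illustrative, not a verbatim GO extract.
EDGES = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "membrane-bounded organelle")],
    "membrane-bounded organelle": [("is_a", "organelle")],
}


def ancestors(term, edges, relations=("is_a", "part_of")):
    """All terms reachable upward from `term` using only the given relation types."""
    found, queue = set(), deque([term])
    while queue:
        for relation, parent in edges.get(queue.popleft(), []):
            if relation in relations and parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


print(sorted(ancestors("nucleolus", EDGES)))
# ['membrane-bounded organelle', 'nucleus', 'organelle']
print(sorted(ancestors("nucleolus", EDGES, relations=("is_a",))))
# []
```

Following only is_a from nucleolus finds nothing in this fragment, because its single link is part_of; that is why the sketch follows both relation types by default.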

Page 12: ICAR2016 TAIR talk

Annotations at different depths of the ontology

is_a

part_of

Page 13: ICAR2016 TAIR talk

Retrieval at higher nodes in the ontology

is_a

part_of
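Retrieval at a higher node works because an annotation to a specific term also implies every term above it. A minimal sketch, assuming the same simplified, illustrative fragment and three hypothetical genes annotated at different depths (`GENE_*` names and both helpers are placeholders):

```python
from collections import deque

# Simplified fragment (child -> [(relation, parent)]); illustrative, not a verbatim GO extract.
EDGES = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "membrane-bounded organelle")],
    "membrane-bounded organelle": [("is_a", "organelle")],
}

# Hypothetical direct annotations, each at a different depth.
DIRECT = {
    "GENE_A": {"nucleolus"},
    "GENE_B": {"nucleus"},
    "GENE_C": {"organelle"},
}


def with_ancestors(term, edges):
    """The term itself plus every term reachable through is_a or part_of edges."""
    closure, queue = {term}, deque([term])
    while queue:
        for _relation, parent in edges.get(queue.popleft(), []):
            if parent not in closure:
                closure.add(parent)
                queue.append(parent)
    return closure


def genes_for_term(query, direct, edges):
    """Genes annotated directly to `query` or to any more specific term below it."""
    return {
        gene
        for gene, terms in direct.items()
        if any(query in with_ancestors(t, edges) for t in terms)
    }


print(sorted(genes_for_term("nucleus", DIRECT, EDGES)))    # ['GENE_A', 'GENE_B']
print(sorted(genes_for_term("organelle", DIRECT, EDGES)))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

Querying the broader term returns more genes, even though none of GENE_A or GENE_B was annotated to it directly.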

Page 14: ICAR2016 TAIR talk

Manual literature annotation

Page 15: ICAR2016 TAIR talk

Gene product GO term

Evidence code

Anatomy of a GO annotation

Reference

Page 16: ICAR2016 TAIR talk

Experimental evidence codes (EXP)
•  IDA  Inferred from Direct Assay (e.g. enzyme assays, in situ hybridization)
•  IMP  Inferred from Mutant Phenotype (e.g. analysis of a visible trait)
•  IPI  Inferred from Physical Interaction (e.g. yeast two-hybrid)
•  IEP  Inferred from Expression Pattern (e.g. RT-PCR, Western blot)
•  IGI  Inferred from Genetic Interaction (e.g. double mutant analysis)

http://geneontology.org/page/guide-go-evidence-codes

Commonly used evidence codes

Page 17: ICAR2016 TAIR talk

Experimental evidence codes (EXP)
•  IDA  Inferred from Direct Assay (e.g. enzyme assays, in situ hybridization)
•  IMP  Inferred from Mutant Phenotype (e.g. analysis of a visible trait)
•  IPI  Inferred from Physical Interaction (e.g. yeast two-hybrid)
•  IEP  Inferred from Expression Pattern (e.g. RT-PCR, Western blot)
•  IGI  Inferred from Genetic Interaction (e.g. double mutant analysis)

Computational analysis evidence codes (non-EXP)
•  ISS  Inferred from Sequence or Structural Similarity (e.g. based on a published sequence alignment)
•  IEA  Inferred from Electronic Annotation (e.g. InterPro2GO)

http://geneontology.org/page/guide-go-evidence-codes

Commonly used evidence codes
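Each annotation carries four parts (gene product, GO term, evidence code, reference), and the evidence code is what lets you keep only experimentally supported annotations. A minimal sketch, assuming the EXP group is the set of codes listed above plus EXP itself (also a GO evidence code); the records, gene names and references are hypothetical placeholders:

```python
from dataclasses import dataclass

# Experimental codes listed on the slide, plus the generic EXP code.
# Anything else (e.g. ISS, IEA) is treated as non-EXP here.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IMP", "IPI", "IEP", "IGI"}


@dataclass(frozen=True)
class GoAnnotation:
    """The four parts of a GO annotation."""
    gene_product: str
    go_term: str
    evidence_code: str
    reference: str


def is_experimental(annotation):
    return annotation.evidence_code in EXPERIMENTAL_CODES


# Hypothetical annotations; identifiers and references are placeholders.
annotations = [
    GoAnnotation("GENE_A", "GO:0006412", "IDA", "PMID:placeholder1"),
    GoAnnotation("GENE_A", "GO:0006412", "IEA", "InterPro2GO"),
    GoAnnotation("GENE_B", "GO:0005634", "ISS", "PMID:placeholder2"),
]

experimental = [a for a in annotations if is_experimental(a)]
print(len(experimental))  # 1
```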

Page 18: ICAR2016 TAIR talk

Evidence code   Annotation count       %
EXP                       95,435    34.7
  IDA                     56,271    20.4
  IEP                      6,651     2.4
  IGI                      4,286     1.6
  IMP                     19,441     7.1
  IPI                      8,786     3.2
Non-EXP                  179,801    65.3
Total                    275,236   100.0

Summary of Arabidopsis GO annotations in TAIR

Notes: 9,186 unique publications were used in EXP annotations. Based on TAIR ATH_GO_GOSLIM.txt, 2016-06-05.
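The rows of this table can be checked against each other: the five experimental codes sum to the EXP subtotal, EXP plus non-EXP gives the total, and the two shares add up to 100%. A short arithmetic check using only the counts in the table:

```python
# Annotation counts from the TAIR summary table (ATH_GO_GOSLIM.txt, 2016-06-05).
counts = {
    "IDA": 56_271,
    "IEP": 6_651,
    "IGI": 4_286,
    "IMP": 19_441,
    "IPI": 8_786,
}
exp_total = sum(counts.values())  # experimental (EXP) subtotal
non_exp = 179_801
total = exp_total + non_exp


def percent(n):
    """Share of all annotations, rounded to one decimal place."""
    return round(100 * n / total, 1)


print(exp_total, total)                      # 95435 275236
print(percent(exp_total), percent(non_exp))  # 34.7 65.3
for code, n in counts.items():
    print(code, percent(n))
```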

Page 19: ICAR2016 TAIR talk

Based on annotation data as of May 24, 2016

Summary of Arabidopsis GO annotations in TAIR

Page 20: ICAR2016 TAIR talk

-  Query gene function information
-  GO annotation projection
-  Functional categorization
-  Term enrichment

Application: What can you do with TAIR GO/PO annotations?

Page 22: ICAR2016 TAIR talk

Get annotations for individual genes from the TAIR locus page

Gene Ontology annotations

Plant Ontology annotations

Page 23: ICAR2016 TAIR talk

Get annotations for individual genes from the TAIR locus page

Other functional information:
•  Gene summary
•  Polymorphism
•  Phenotype
•  Publications
•  Gene symbols

Page 24: ICAR2016 TAIR talk

Get annotations for a list of genes

Page 25: ICAR2016 TAIR talk

Get annotations for a list of genes

Page 26: ICAR2016 TAIR talk

Get annotations for a list of genes

Page 27: ICAR2016 TAIR talk

Find genes annotated to a GO/PO term

Page 30: ICAR2016 TAIR talk

Download all GO/PO annotations
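With the bulk file downloaded, the same two lookups the web tools offer (annotations for a list of genes, genes annotated to a term) can be built locally. A minimal sketch, assuming the annotations are in the tab-separated GAF 2.x format (comment lines start with `!`; column 2 is the object ID, column 4 the qualifier, column 5 the GO ID, column 7 the evidence code). TAIR's ATH_GO_GOSLIM.txt uses its own column layout, so check which column carries the AGI code in the file you actually download. The sample rows, `GENE_*` identifiers and helper names are hypothetical:

```python
from collections import defaultdict

# GAF 2.x column positions (0-based) used below.
OBJECT_ID, QUALIFIER, GO_ID, EVIDENCE, ASPECT = 1, 3, 4, 6, 8


def parse_gaf(lines):
    """Yield (gene, go_id, evidence) tuples, skipping comments, blanks and NOT annotations."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if "NOT" in cols[QUALIFIER].split("|"):
            continue
        yield cols[OBJECT_ID], cols[GO_ID], cols[EVIDENCE]


def index_annotations(lines):
    """Build gene -> GO IDs and GO ID -> genes lookup tables."""
    by_gene, by_term = defaultdict(set), defaultdict(set)
    for gene, go_id, _evidence in parse_gaf(lines):
        by_gene[gene].add(go_id)
        by_term[go_id].add(gene)
    return by_gene, by_term


def gaf_row(gene, qualifier, go_id, evidence, aspect):
    """Hypothetical 17-column GAF row; columns not used here are left empty."""
    cols = [""] * 17
    cols[0], cols[OBJECT_ID], cols[QUALIFIER] = "TAIR", gene, qualifier
    cols[GO_ID], cols[EVIDENCE], cols[ASPECT] = go_id, evidence, aspect
    return "\t".join(cols)


sample = [
    "!gaf-version: 2.1",
    gaf_row("GENE_A", "", "GO:0006412", "IDA", "P"),
    gaf_row("GENE_B", "", "GO:0006412", "IMP", "P"),
    gaf_row("GENE_B", "NOT", "GO:0005634", "IDA", "C"),
]
by_gene, by_term = index_annotations(sample)

my_genes = ["GENE_B", "GENE_C"]  # annotations for a list of genes
print({g: sorted(by_gene.get(g, ())) for g in my_genes})
# {'GENE_B': ['GO:0006412'], 'GENE_C': []}
print(sorted(by_term["GO:0006412"]))  # genes annotated to a term
# ['GENE_A', 'GENE_B']
```

For a real file, pass the open file handle to `index_annotations` instead of the sample list.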

Page 31: ICAR2016 TAIR talk

-  Query gene function information
-  GO annotation projection
-  Functional categorization
-  Term enrichment

Application: What can you do with TAIR GO/PO annotations?

Page 32: ICAR2016 TAIR talk

GO annotations by species

Figure: GO annotations for Rat, Human, Mouse, Arabidopsis, Zebrafish, Worm, Chicken, Fly, Yeast, Rice and E. coli

Source: http://geneontology.org/page/current-go-statistics 2016-06-03

Page 33: ICAR2016 TAIR talk

Annotating new plant genomes by projecting GO terms from Arabidopsis onto other non-model plant species based on gene orthology

Ensembl Plants Compara

•  Use the Compara pipeline to infer orthologs
•  Automatically transfer GO annotations to plant orthologs

Rules:
•  at least 40% peptide identity to each other
•  only GO annotations with an evidence type of IDA, IEP, IGI, IMP or IPI are projected
•  no annotations with a 'NOT' qualifier are projected
•  annotations to the GO:0005515 protein binding term are not projected
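The four rules act as a filter on each Arabidopsis annotation before it is copied to an ortholog. A minimal sketch of that filter, under stated assumptions: the identity cutoff is checked in both directions, projected annotations are recorded with the IEA code, and the data structures and gene names are hypothetical. This is not the Ensembl Compara code, only an illustration of the rules:

```python
from dataclasses import dataclass

PROJECTABLE_EVIDENCE = {"IDA", "IEP", "IGI", "IMP", "IPI"}
PROTEIN_BINDING = "GO:0005515"
MIN_IDENTITY = 40.0  # percent peptide identity, required in both directions (assumption)


@dataclass(frozen=True)
class Annotation:
    gene: str
    go_id: str
    evidence: str
    qualifier: str = ""  # e.g. "NOT"


def project(annotations, orthologs):
    """Transfer eligible annotations across ortholog pairs.

    orthologs: (source_gene, target_gene, source_identity, target_identity) tuples,
    identities in percent. Projected annotations are marked IEA (an assumption).
    """
    projected = []
    for source, target, id_source, id_target in orthologs:
        if min(id_source, id_target) < MIN_IDENTITY:
            continue
        for ann in annotations:
            if (
                ann.gene == source
                and ann.evidence in PROJECTABLE_EVIDENCE
                and ann.qualifier != "NOT"
                and ann.go_id != PROTEIN_BINDING
            ):
                projected.append(Annotation(target, ann.go_id, "IEA"))
    return projected


# Hypothetical data: one Arabidopsis gene, two candidate orthologs.
annotations = [
    Annotation("AT_GENE", "GO:0006412", "IDA"),         # eligible
    Annotation("AT_GENE", "GO:0005634", "ISS"),         # evidence type not projected
    Annotation("AT_GENE", "GO:0005515", "IPI"),         # protein binding term
    Annotation("AT_GENE", "GO:0005634", "IDA", "NOT"),  # NOT qualifier
]
orthologs = [("AT_GENE", "CROP_1", 62.0, 58.5), ("AT_GENE", "CROP_2", 71.0, 35.0)]
print(project(annotations, orthologs))
# [Annotation(gene='CROP_1', go_id='GO:0006412', evidence='IEA', qualifier='')]
```

CROP_2 receives nothing because one direction falls below the 40% cutoff.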

Page 34: ICAR2016 TAIR talk

-  Query gene function information
-  GO annotation projection
-  Functional categorization
-  Term enrichment

Application: What can you do with TAIR GO/PO annotations?

Page 36: ICAR2016 TAIR talk

TAIR’s functional categorization tool

Page 37: ICAR2016 TAIR talk

Cellular component

Molecular function

Biological process
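Functional categorization amounts to counting how many genes in a list fall into each broad (GO slim) category, separately for each aspect. A minimal sketch, assuming a hypothetical gene-to-category mapping; in practice the mapping would come from TAIR's GO slim annotation file, and the gene and category names below are placeholders:

```python
from collections import Counter

# Hypothetical gene -> {aspect: set of GO slim categories}.
SLIM = {
    "GENE_A": {"biological_process": {"response to stress", "developmental processes"},
               "cellular_component": {"nucleus"}},
    "GENE_B": {"biological_process": {"response to stress"},
               "cellular_component": {"chloroplast"}},
    "GENE_C": {"cellular_component": {"nucleus"}},
}


def categorize(genes, slim, aspect):
    """Count genes per GO slim category for one aspect (a gene may count in several)."""
    counts = Counter()
    for gene in genes:
        counts.update(slim.get(gene, {}).get(aspect, set()))
    return counts


my_genes = ["GENE_A", "GENE_B", "GENE_C"]
bp = categorize(my_genes, SLIM, "biological_process")
cc = categorize(my_genes, SLIM, "cellular_component")
print(bp["response to stress"], cc["nucleus"])  # 2 2
```

Because one gene can sit in several categories, the counts for an aspect can add up to more than the list length.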

Page 39: ICAR2016 TAIR talk

Biological process

Functional category Gene count

Overrepresentation statistical test: in my list of genes, are any functional classes (for example, a GO process) found more often than expected when compared with the reference list?

Term enrichment analysis

Page 40: ICAR2016 TAIR talk

The GO Consortium (GOC) provides a term enrichment tool powered by PANTHER

pantherdb.org geneontology.org

Page 41: ICAR2016 TAIR talk

Input 1

Input 2

ID Mapping

Use up-to-date annotations

Page 42: ICAR2016 TAIR talk

Output: 168/26684 = 0.63% of the reference list is in this category; 0.63% × 442 genes in my list = 2.78 genes expected
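The expected count is the category's share of the reference list times the size of your list; enrichment asks how unlikely the observed count is given that expectation. A minimal sketch using a one-sided hypergeometric tail (equivalent to a one-sided Fisher's exact test) without multiple-testing correction. PANTHER's own test and correction settings may differ, so treat this as an illustration only. The sizes come from the output above; the observed count of 10 is hypothetical:

```python
from math import comb


def enrichment(k, n, K, N):
    """One-sided hypergeometric test for overrepresentation of one term.

    k: genes in my list annotated to the term
    n: size of my list
    K: genes in the reference list annotated to the term
    N: size of the reference list
    Returns (expected count, fold enrichment, P(X >= k)).
    """
    expected = K / N * n
    fold = k / expected if expected else float("inf")
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    p_value = tail / comb(N, n)
    return expected, fold, p_value


# 168 of 26,684 reference genes; 442 genes in my list; 10 observed (hypothetical).
expected, fold, p = enrichment(k=10, n=442, K=168, N=26684)
print(round(expected, 2))  # 2.78
print(round(fold, 2), p)
```

Exact integer arithmetic from `math.comb` keeps the tail sum precise even for large reference lists.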

Page 43: ICAR2016 TAIR talk

Model for the regulation of long-term drought responses in Q. suber root

Model for ABA-dependent drought response in cork oak

Page 44: ICAR2016 TAIR talk

1  The main activity of TAIR curators is producing a ‘gold standard’ annotated reference genome dataset by integrating experimental data from the research literature. New annotations are added constantly.

2  One common use of TAIR is to infer the function of genes in agriculturally important species based on their orthology to Arabidopsis genes.

3  TAIR’s annotations are used in applications such as functional categorization and term enrichment, so it is important to use the latest annotation file from TAIR.

Summary

Page 45: ICAR2016 TAIR talk

Community annotation: making your data and publication more discoverable

Donghui Li

Page 46: ICAR2016 TAIR talk

Community annotation on TAIR

Page 47: ICAR2016 TAIR talk

Why should everyone participate? Increased exposure of your work

Page 49: ICAR2016 TAIR talk

Community annotation on TAIR

Page 50: ICAR2016 TAIR talk

1. Pre-publication: register your gene symbol to minimize accidental duplications in gene nomenclature

2. Preparing your manuscript: include AGI locus identifiers

3. Post-publication: submit your annotation to us (any journal)

Tips to make your research more discoverable

Page 51: ICAR2016 TAIR talk

AT1G56650  PAP1  PRODUCTION OF ANTHOCYANIN PIGMENT 1
AT2G01180  PAP1  PHOSPHATIDIC ACID PHOSPHATASE 1
AT2G27190  PAP1  PURPLE ACID PHOSPHATASE 1
AT3G16500  PAP1  PHYTOCHROME-ASSOCIATED PROTEIN 1

Gene name duplication makes it harder to find the right gene
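Checking a proposed symbol against existing ones exposes collisions like the PAP1 case before they reach print. A minimal sketch over only the four rows on this slide; a real check would run against TAIR's full gene symbol list, whose file format is not described here, and the `shared_symbols` helper is hypothetical:

```python
from collections import defaultdict

# Rows copied from the slide: (AGI locus, symbol, full name).
ROWS = [
    ("AT1G56650", "PAP1", "PRODUCTION OF ANTHOCYANIN PIGMENT 1"),
    ("AT2G01180", "PAP1", "PHOSPHATIDIC ACID PHOSPHATASE 1"),
    ("AT2G27190", "PAP1", "PURPLE ACID PHOSPHATASE 1"),
    ("AT3G16500", "PAP1", "PHYTOCHROME-ASSOCIATED PROTEIN 1"),
]


def shared_symbols(rows):
    """Return {symbol: [(locus, full name), ...]} for symbols used by more than one locus."""
    loci_by_symbol = defaultdict(list)
    for locus, symbol, full_name in rows:
        loci_by_symbol[symbol.upper()].append((locus, full_name))
    return {s: entries for s, entries in loci_by_symbol.items() if len(entries) > 1}


for symbol, entries in shared_symbols(ROWS).items():
    print(symbol, "->", ", ".join(locus for locus, _ in entries))
# PAP1 -> AT1G56650, AT2G01180, AT2G27190, AT3G16500
```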

Page 52: ICAR2016 TAIR talk

Plant Cell Physiol. 2010 Jun;51(6):866-76

Plant Cell Physiol. 2010 Jun;51(6):877-83

Conflicting nomenclature and errors in publications are not uncommon

Page 53: ICAR2016 TAIR talk

PMID:21447788

Mandatory requirement for publishing in some journals

Always include AGI codes
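Unlike symbols such as PAP1, an AGI code pins a name to exactly one locus, and its fixed format makes it easy to check a manuscript for them. A minimal sketch, assuming the common AGI pattern: "AT", a chromosome (1 to 5, C for chloroplast, M for mitochondrion), "G", five digits, and an optional gene-model suffix such as ".1". The `find_agi_codes` helper and the sample sentence are hypothetical:

```python
import re

# Assumed AGI locus pattern, with an optional gene-model suffix (".1", ".2", ...).
AGI_PATTERN = re.compile(r"\bAT[1-5CM]G\d{5}(?:\.\d+)?\b", re.IGNORECASE)


def find_agi_codes(text):
    """Return AGI identifiers in order of first appearance, upper-cased, without duplicates."""
    seen = []
    for match in AGI_PATTERN.finditer(text):
        code = match.group(0).upper()
        if code not in seen:
            seen.append(code)
    return seen


text = "PAP1 (At1g56650) and PAP1 (AT2G27190.1) are distinct genes; PAP1 alone is ambiguous."
print(find_agi_codes(text))  # ['AT1G56650', 'AT2G27190.1']
```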

Page 54: ICAR2016 TAIR talk

How to submit

Page 55: ICAR2016 TAIR talk

Requires a login so we can credit the submitter; no subscription is required

Video tutorial

Page 57: ICAR2016 TAIR talk

Provide ‘evidence with’ as comments

Page 58: ICAR2016 TAIR talk

Multiple genes?

Page 59: ICAR2016 TAIR talk

•  “I do profit a lot from the data on TAIR, thus this submission is a small contribution to extend the data present on TAIR.”

•  “I gratefully did it [data submission] because I already benefit from similar information for other genes.”

Community feedback

Page 60: ICAR2016 TAIR talk

Q&A