EMBRACE Gene Ontology Workshop, 7th – 9th November 2007, Bari. High throughput functional annotation and analysis with the Blast2GO suite. Ana Conesa, Bioinformatics Department, Centro de Investigaciones Príncipe Felipe, [email protected]

Dec 17, 2015

EMBRACE Gene Ontology Workshop, 7th – 9th November 2007 Bari

High throughput functional annotation and analysis with the Blast2GO suite

Ana Conesa, Bioinformatics Department

Centro de Investigaciones Príncipe Felipe

[email protected]

EMBRACE Workshop, 7th – 9th November 2007, Bari

Credits

Blast2GO Development:
Biomedical Informatics, UPV, Valencia: Juan Miguel Gómez, Montserrat Robles
Bioinformatics Department, CIPF, Valencia: Ana Conesa, Stefan Goetz
Centro de Genómica, IVIA, Valencia: Javier Terol, Manuel Talón

Blast2GO special thanks to:
ANNEX: Simen Myhre, Henrik Tveit (NTNU)
GOSSIP: Nils Blüthgen (MicroDiscovery GmbH)
ZVTM: Emmanuel Pietriga (INRIA)
goslim.tair.obo: Suparna Mundodi (TAIR)

Motivation

Numerous EST/genome projects
Large amounts of NEW sequence data
Functional genomics studies
Need for functional annotation

Which kind of tool?

Easy to set up and run
Versatile and universal
High-throughput and interactive
Combines annotation and functional analysis

www.blast2go.org

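Gene Ontology based annotation

Three ontologies: Molecular Function, Biological Process, Cellular Component
GO terms range from more general to more specific
Related mappings: IP2GO (InterPro to GO), GO2EC (GO to Enzyme Commission codes)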

Concepts of automatic annotation

Similarity between sequences
Quality of existing annotation
Precision vs. “recall”
Resolution: level in the GO hierarchy
Selection of recovered annotation data: B2G Annotation Rule
Consistency of assigned annotation

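Blast2GO Annotation Rule

$$\text{Annotation Rule} = \text{lowest.node}\left[\max\left(\text{sim} \times \text{EC}_w\right) + \left(\#\text{GO} \times \text{GO}_w\right)\right] \geq \text{threshold}$$

lowest.node: the lowest term satisfying the requirements
sim: similarity requirement, computed over the HSPs of a hit as

$$\text{sim} = \frac{\sum_{\text{hsp}} \text{positives}}{\sum_{\text{hsp}} \text{alignment length}}$$

EC_w: EC weight, reflecting the quality of the source annotation (Evidence Codes):
IC, TAS, IDA: 1
IMP, IGI, IPI: 0.9
ISS, IEP: 0.8
NAS, IEA: 0.7
ND, NR, RCA: 0.5
#GO × GO_w: possibility of abstraction
threshold: recall vs. precision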
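The annotation rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Blast2GO's own code: sim is expressed as a percentage, #GO is taken to be the number of distinct hit-derived GO terms lying below a candidate node, and the GO weight of 5 and threshold of 55 are assumed illustrative defaults. The term IDs used with it are hypothetical.

```python
from dataclasses import dataclass

# Evidence-code weights (EC_w) as listed on the slide.
EC_WEIGHT = {
    "IC": 1.0, "TAS": 1.0, "IDA": 1.0,
    "IMP": 0.9, "IGI": 0.9, "IPI": 0.9,
    "ISS": 0.8, "IEP": 0.8,
    "NAS": 0.7, "IEA": 0.7,
    "ND": 0.5, "NR": 0.5, "RCA": 0.5,
}


@dataclass
class Hsp:
    positives: int
    align_len: int


def similarity(hsps):
    """sim = sum of positives / sum of alignment lengths over all HSPs, as a percentage."""
    total_len = sum(h.align_len for h in hsps)
    if total_len == 0:
        return 0.0
    return 100.0 * sum(h.positives for h in hsps) / total_len


def ancestors(term, parents):
    """All ancestors of `term` in a GO-like DAG given as {child: [parent, ...]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def annotate(hits, parents, go_weight=5.0, threshold=55.0):
    """hits: list of (sim_percent, [(go_id, evidence_code), ...]), one entry per BLAST hit.

    Returns the lowest GO nodes whose score reaches the threshold (assumed defaults).
    """
    # Direct term: max(sim * EC_w) over all hits annotated with that GO term.
    direct = {}
    for sim, terms in hits:
        for go, ec in terms:
            direct[go] = max(direct.get(go, 0.0), sim * EC_WEIGHT[ec])
    # Candidate nodes: every hit-derived term plus all of its ancestors.
    anc = {go: ancestors(go, parents) for go in direct}
    candidates = set(direct).union(*anc.values())
    scores = {}
    for node in candidates:
        # Abstraction term: number of hit-derived terms below this node, times GO_w.
        n_below = sum(1 for go in direct if node in anc[go])
        scores[node] = direct.get(node, 0.0) + n_below * go_weight
    passing = {n for n, s in scores.items() if s >= threshold}
    # lowest.node: drop any passing node that has a passing descendant.
    return {n for n in passing
            if not any(n in ancestors(m, parents) for m in passing if m != n)}
```

With two weak IEA hits of 50% similarity on sibling terms, neither child reaches 55 on its own; raising `go_weight` to 30 lets their shared parent pass instead, which is the abstraction effect the #GO × GO_w term is meant to capture.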

Main functions within Blast2GO

BLAST, mapping, annotation rule, manual curation
Annotation (GO, IPR, EC)
InterProScan, GO-Slim, GO second layer
Graph visualization, enrichment, statistics, KEGG maps, validation, compare

Additional features: local B2G DB, pipeline/batch mode, custom DB, GeneIDs

Blast2GO Schema

Blast2GO use

Species: Citrus, Nicotiana, maize, soybean, tomato, grape…; Streptococcus, Trichoderma, Schistosoma, Cyanobacteria…; European flounder, pig, fiddler crab, rat, honeybee, human…; metagenome projects…

Where to find Blast2GO

More info: Bioinformatics 2005 21: 3674-3676
Blast2GO tutorial: http://www.blast2go.org

Web:
http://www.blast2go.org
http://blast2go.bioinfo.cipf.es
http://www.geneontology.org
http://groups.google.com/group/Blast2GO

EMBRACE Gene Ontology Workshop, 7th – 9th November 2007 Bari

Blast2GO Guided Tour

Ana Conesa, Bioinformatics Department

Centro de Investigaciones Príncipe Felipe

[email protected]

Start Blast2GO

www.blast2go.org

Desktop application

Java Web Start technology

Requires an Internet connection

Load Sequences

Run BLAST search

BLAST results

Blast Distribution Charts

Exercise 1

Launch Blast2GO
Open FASTA file (unzip examples.zip)
Browse number of sequences and sequence length
Unselect all sequences
Select 5 sequences
Run BLAST against NCBI nr (change parameters if desired)

Exercise 2

Open blast_example.dat
Examine distribution charts

Mapping

Mapping Resources

Hit ACC/GI → GO terms, EC, sim %

GO mapping resources:
• Full Gene Ontology DB
• NCBI flat files: gene2accession (4 079 414 entries), gene_info (1 635 614 entries)
• PIR Non-Redundant Reference Protein Database, including PSD, UniProt, Swiss-Prot, TrEMBL, RefSeq, GenPept and PDB
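The mapping step can be pictured as a chain of lookups from a BLAST hit accession to GO terms. The sketch below follows the gene2accession and gene_info route named above, then looks up the gene symbol in a GO gene-product table. The column positions (0-based: protein accession in column 5 and GeneID in column 1 of gene2accession; GeneID in column 1 and Symbol in column 2 of gene_info) are assumptions to check against the NCBI README, and the symbol-to-GO table and all IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_map(text, key_col, value_col):
    """Build {key: set(values)} from tab-separated text, skipping '#' header lines."""
    mapping = defaultdict(set)
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    for row in csv.reader(rows, delimiter="\t"):
        # NCBI flat files use '-' for missing values; skip short or empty rows.
        if len(row) > max(key_col, value_col) and row[key_col] != "-":
            mapping[row[key_col]].add(row[value_col])
    return mapping


def map_hit(accession, acc2gene, gene2symbol, symbol2go):
    """Accession -> GeneID(s) -> gene symbol(s) -> GO terms; returns a set of GO IDs."""
    go_terms = set()
    for gene_id in acc2gene.get(accession, ()):
        for symbol in gene2symbol.get(gene_id, ()):
            go_terms |= symbol2go.get(symbol, set())
    return go_terms
```

A real pipeline would read the full NCBI files and restrict the symbol lookup to the hit's species; the sketch keeps only the chaining logic.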

Annotation

GO, DAG validation, Annex, GO-Slim, EC/KEGG, InterPro

Gene Ontology annotation

Annotation Charts

Exercise 3

Select the first 10 sequences
Run mapping and annotation
Select non-annotated sequences and re-annotate with milder parameters
Load annotation_example.dat file
Visualize results on mapping/annotation charts

Sequence menu

Modulation of annotation

Change annotation manually: EC codes, sequence description, GO term accession
Summarize annotation by “GoSlim”: OBO GO-Slim file
Extend annotation by the GO “Second Layer”: relations such as “acts in” and “is involved in” link Molecular Function, Biological Process and Cellular Component terms (Myhre et al, Bioinformatics 2006)
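Both GO-Slim summarization and second-layer extension reduce to simple graph operations. The sketch below is illustrative only: it keeps every slim term that is the annotated term itself or one of its ancestors (whether Blast2GO keeps all of them or only the most specific is not stated here), and the second-layer links are a hypothetical table of (source, relation, target) triples, not Annex's actual data.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {child: [parent, ...]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def go_slim(annotation, parents, slim_terms):
    """Replace each GO term by the slim terms that are the term itself or its ancestors."""
    summary = set()
    for term in annotation:
        summary |= ({term} | ancestors(term, parents)) & slim_terms
    return summary


def second_layer(annotation, links):
    """Add terms reached through (source, relation, target) links from annotated terms."""
    extra = {target for source, _relation, target in links if source in annotation}
    return set(annotation) | extra
```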

Exercise 4

Browse BLAST results (select one sequence and use the sequence menu) to:
  Draw annotation graph
  View annotations
  Edit/change annotation
Select a few sequences to run GoSlim
Run Annex

Enzyme annotation and KEGG maps

GO → Enzyme Codes → KEGG maps
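The GO to enzyme code to KEGG map route can be sketched as two lookups followed by a count of sequences per pathway map. The GO-to-EC and EC-to-map tables below are hypothetical inputs standing in for a GO2EC mapping and KEGG data, and the map IDs in the usage are placeholders.

```python
from collections import defaultdict


def ec_codes(seq_go, go2ec):
    """{seq: set(GO ids)} -> {seq: set(EC codes)} via a GO-to-EC table."""
    return {seq: {ec for go in terms for ec in go2ec.get(go, ())}
            for seq, terms in seq_go.items()}


def pathway_counts(seq_ec, ec2maps):
    """Count distinct sequences with at least one enzyme on each pathway map."""
    seqs_per_map = defaultdict(set)
    for seq, ecs in seq_ec.items():
        for ec in ecs:
            for kegg_map in ec2maps.get(ec, ()):
                seqs_per_map[kegg_map].add(seq)
    return {m: len(s) for m, s in seqs_per_map.items()}
```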

InterProScan

Exercise 5

Select a few sequences to run InterProScan
Change terms view (GO ID/term, IPS/GO)
Merge IPS results with BLAST annotations

Load annot_interpro_annex_example.dat
Export GO annotations
Export IPS annotations
Save project and sequence table
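Merging InterProScan-derived GO terms with BLAST-based annotation can be sketched as a union followed by removal of any term that is an ancestor of another term in the merged set, so that only the most specific terms remain. Whether Blast2GO prunes ancestors in exactly this way is an assumption of this sketch; the DAG and term IDs are hypothetical.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {child: [parent, ...]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def merge_annotations(blast_terms, ips_terms, parents):
    """Union of both term sets, minus terms implied by a more specific term in the union."""
    merged = set(blast_terms) | set(ips_terms)
    redundant = set()
    for term in merged:
        redundant |= ancestors(term, parents)
    return merged - redundant
```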

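Visualization

GO graph visualization as a tool to explore data
Interactive and “zoomable” graphs
Color graphs highlighting areas of interest

Node score of annotation content:

$$\text{score} = \sum \#\text{seq} \cdot \alpha^{d}$$

where #seq is the number of sequences annotated to a term at or below the node, d is that term's distance from the node, and α (between 0 and 1) down-weights more distant terms.

(Figure: example GO graph colored by node score.)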
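The node score can be sketched in Python, assuming score(node) = Σ #seq_t · α^d(node, t) over the node and every term t below it, where #seq_t counts sequences annotated directly to t and d is the shortest number of edges from the node down to t. The attenuation factor α = 0.6 is an assumed default, and term IDs are hypothetical.

```python
from collections import deque


def children_of(parents):
    """Invert a {child: [parent, ...]} DAG into {parent: [child, ...]}."""
    kids = {}
    for child, ps in parents.items():
        for p in ps:
            kids.setdefault(p, []).append(child)
    return kids


def node_scores(direct_counts, parents, alpha=0.6):
    """direct_counts: {go_id: number of sequences annotated directly to that term}."""
    kids = children_of(parents)
    nodes = set(parents) | set(kids) | set(direct_counts)
    scores = {}
    for node in nodes:
        # Breadth-first search downwards gives the shortest distance to each descendant.
        dist = {node: 0}
        queue = deque([node])
        while queue:
            t = queue.popleft()
            for c in kids.get(t, ()):
                if c not in dist:
                    dist[c] = dist[t] + 1
                    queue.append(c)
        scores[node] = sum(direct_counts.get(t, 0) * alpha ** d for t, d in dist.items())
    return scores
```

Using the shortest distance is a design choice for DAGs, where a term can be reached from a node along several paths.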

Visualization: Pies

Level and multilevel charts
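The data behind a level pie can be sketched as a count of sequences per GO term at a fixed depth. Two conventions here are assumptions and Blast2GO may define them differently: a term's level is its shortest distance from the root plus one (root = level 1), and a sequence counts toward a level-k term if that term is one of its annotations or their ancestors. Term IDs are hypothetical.

```python
from collections import Counter, deque


def levels(parents, root):
    """Level of each term: 1 for the root, otherwise 1 + shortest path length from the root."""
    kids = {}
    for child, ps in parents.items():
        for p in ps:
            kids.setdefault(p, []).append(child)
    level = {root: 1}
    queue = deque([root])
    while queue:
        t = queue.popleft()
        for c in kids.get(t, ()):
            if c not in level:
                level[c] = level[t] + 1
                queue.append(c)
    return level


def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {child: [parent, ...]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def level_distribution(seq_go, parents, root, k):
    """Number of sequences per term at level k (the slices of a level-k pie)."""
    level = levels(parents, root)
    counts = Counter()
    for terms in seq_go.values():
        expanded = set()
        for t in terms:
            expanded |= {t} | ancestors(t, parents)
        counts.update(t for t in expanded if level.get(t) == k)
    return counts
```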

Exercise 6

Select some sequences using the select-by-names function (use test.example.txt)
Create a GO combined graph
Create pies at level 4 and a multilevel pie at score 3
Play with filters to simplify the graph (set score filter to 3)
Export GO graph data as a table and visualize

Functional analysis with B2G

Enrichment analysis (Fisher's exact test)
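Enrichment with Fisher's exact test can be sketched in pure Python. For each GO term, a one-sided test asks whether the term is over-represented in the test set relative to the reference set, and p-values are then adjusted with the Benjamini-Hochberg false discovery rate. The one-sided alternative and the Benjamini-Hochberg correction are assumptions for illustration; this is not the GOSSIP implementation credited above.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    a/b: test sequences with/without the term; c/d: reference sequences with/without.
    Returns P(X >= a) under the hypergeometric null.
    """
    n_test = a + b
    n_term = a + c
    n_all = a + b + c + d
    tail = sum(comb(n_term, x) * comb(n_all - n_term, n_test - x)
               for x in range(a, min(n_test, n_term) + 1))
    return tail / comb(n_all, n_test)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(test, reference):
    """test/reference: {seq_id: set of GO ids}. Returns {go_id: (p_value, fdr)}."""
    terms = sorted(set().union(*test.values(), *reference.values()))
    pvals = []
    for t in terms:
        a = sum(t in gos for gos in test.values())
        c = sum(t in gos for gos in reference.values())
        pvals.append(fisher_greater(a, len(test) - a, c, len(reference) - c))
    fdr = benjamini_hochberg(pvals)
    return {t: (p, q) for t, p, q in zip(terms, pvals, fdr)}
```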

Exercise 7

Run enrichment analysis using test and reference set files
Create bar chart
Create enriched graph and modulate number of nodes
Export results