Annotating Metagenomes Using the NMPDR Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory ASM General Meeting, Boston. www.nmpdr.org www.theseed.org See also poster: B-179 (126B) Aziz et al

Dec 20, 2015

Page 1

Annotating Metagenomes Using the NMPDR

Rob Edwards

Department of Computer Sciences, San Diego State University

Mathematics and Computer Sciences Division, Argonne National Laboratory

ASM General Meeting, Boston.

www.nmpdr.org www.theseed.org

See also poster: B-179 (126B), Aziz et al.

Page 2

How much has been sequenced?

[Figure: number of known sequences by year, with labeled milestones: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]

Page 3

How much will be sequenced?

[Figure labels: everybody in Boston; everybody in USA; all cultured Bacteria; 100 people; one genome from every species; most major microbial environments]

Page 4

The Problem

How do you generate consistent and accurate annotations for metagenomes?

Page 5

The SEED Family

Page 6

Annotations using subsystems

FIG has developed the notion of a subsystem – a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex.

Subsystems have been extended into FIGfams – protein families whose members perform the same function.
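The subsystem idea above (a named collection of functional roles, populated with genes from individual genomes) can be pictured as a small data structure. This is only an illustrative sketch, not the SEED's actual data model: the class name, method names, role names, genome names and gene identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subsystem:
    """Hypothetical model: a named set of functional roles."""
    name: str
    roles: List[str]
    # genome -> role -> gene identifiers assigned to that role (made-up IDs)
    assignments: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def assign(self, genome: str, role: str, gene_id: str) -> None:
        """Record that a gene in a genome performs one of the subsystem's roles."""
        if role not in self.roles:
            raise ValueError(f"{role!r} is not a role in {self.name!r}")
        self.assignments.setdefault(genome, {}).setdefault(role, []).append(gene_id)

    def missing_roles(self, genome: str) -> List[str]:
        """Roles with no gene assigned yet in the given genome."""
        present = self.assignments.get(genome, {})
        return [role for role in self.roles if role not in present]


# Usage with made-up data
example = Subsystem("Example subsystem", ["role A", "role B", "role C"])
example.assign("genome 1", "role A", "gene_1")
example.assign("genome 1", "role C", "gene_7")
gaps = example.missing_roles("genome 1")
```

Listing the roles a genome lacks is one way to read a populated subsystem: an empty cell is either a genuinely absent function or a gene that has not been annotated yet.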

Page 7

Subsystems make up metabolism

[Figure: metabolism overview. Source: Wikipedia Metabolism, http://en.wikipedia.org/wiki/Portal:Metabolism]

Page 8

SEED Viewer

Page 9

Populated Subsystem

Page 10

Subsystems Are Not Just Pathways

• predicted or measured co-regulation

• genome context (virulence islands, prophages, conserved gene clusters)

• virulence mechanism

• cellular localization

• enzymatic activity

• common phenotype

• combinations of criteria

Page 11

Automated Annotations of Complete Genomes

• Automated, user-originated processing

• Takes 1-7 hours depending on size and complexity of the genome

• ~1,500 external submissions, including 150 genomes not yet publicly released.

• Reannotation of >500 genomes complete

• 789 users, 160 organizations, 25 countries.

http://rast.nmpdr.org/

Page 12

Automated Annotations of Complete Metagenomes

MG-RAST Server

Accurate and consistent annotations in a few days

Automatic metabolic reconstruction

Freely available after registration

http://metagenomics.theseed.org/

Page 13

Metagenome Annotation

Automated pipeline

– upload sequences in FASTA, with or without Q-scores

– removes exact duplicates (454 artefact)

– renumbers sequences (mapping provided)

– BLAST against SEED nr, 16S rDNA

– Annotations and metabolic reenactment

– Taxonomic summary
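The duplicate-removal and renumbering steps of the pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MG-RAST implementation: the function names are hypothetical, "exact duplicate" is assumed to mean an identical sequence string (compared case-insensitively), the first copy is the one kept, and new identifiers are simply consecutive integers.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_and_renumber(records):
    """Drop exact duplicate sequences, then renumber the survivors.

    Returns (renumbered_records, mapping), where mapping goes from each
    new identifier back to the original header.
    """
    seen = set()
    renumbered = []
    mapping = {}
    for header, seq in records:
        key = seq.upper()  # assumption: duplicates are compared ignoring case
        if key in seen:
            continue  # keep only the first copy of each sequence
        seen.add(key)
        new_id = str(len(renumbered) + 1)
        renumbered.append((new_id, seq))
        mapping[new_id] = header
    return renumbered, mapping


# Usage with made-up reads
fasta_lines = [">readA", "ACGT", ">readB", "acgt", ">readC", "GGCC", "TT"]
records, id_map = dedupe_and_renumber(read_fasta(fasta_lines))
```

Returning the mapping alongside the renumbered records mirrors the "mapping provided" note on the slide: results reported against the new identifiers can still be traced back to the reads as uploaded.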

Page 14

Metagenome Metabolic Reenactment

Page 15

Phylogenomics

Page 16

Comparing Metagenomes to Genomes (or other metagenomes!)

Page 17

Metabolic potential in environments

Page 18

MG-RAST computation

~19 hours of compute per input megabyte

[Figure: hours of compute time plotted against input size (MB)]

Page 19

How much so far

676 metagenomes

10,012,793,995 bp (10 Gbp)

Average: ~15 Mbp per metagenome

Compute time (on a single CPU):

190,243 hours = 7,926 days = 21 years

[~200 GS20, ~200 FLX, ~200 Sanger]
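The totals on this slide appear consistent with the rate of ~19 compute hours per input megabyte. The sketch below reproduces the arithmetic; it assumes one base pair occupies roughly one byte of input and that a megabyte is 10^6 bytes, neither of which is stated on the slides, and the function name is hypothetical.

```python
HOURS_PER_MB = 19  # approximate rate quoted for MG-RAST computation


def compute_hours(total_bp, hours_per_mb=HOURS_PER_MB):
    """Estimate single-CPU compute hours for a given number of base pairs."""
    megabytes = total_bp / 1_000_000  # assumption: 1 bp ~ 1 byte, 1 MB = 10**6 bytes
    return megabytes * hours_per_mb


total_bp = 10_012_793_995
n_metagenomes = 676

hours = compute_hours(total_bp)
days = hours / 24
years = days / 365
average_bp = total_bp / n_metagenomes

print(f"{hours:,.0f} hours, {days:,.0f} days, {years:.1f} years")
print(f"average {average_bp / 1e6:.1f} Mbp per metagenome")
```

Under these assumptions the estimate lands on the slide's 190,243 hours, which suggests the compute total was derived from the per-megabyte rate rather than measured job by job.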

Page 20

Lots of sequences, all pyrosequencing

Page 21

From Sequences To Environments

[Figure: plot with axes CDA 60.2% and CDA 21.7%. Labels: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA. Environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]

Dinsdale et al., Nature 2008

Page 22

Upcoming Features

• More user options (removing sequences, E-values, percent identities, etc.)

• More databases (ACLAME, human, etc.)

• More user-generated content (mash-ups) via web services and a published API

Page 23

Accessing Data via Web Services

Thanks: Bahador Nosrat, SDSU

Page 24

Workshops

Free workshops on NMPDR, RAST, mg-RAST, SEED

Upcoming workshops: Greece, Argonne, Urbana-Champaign, San Diego

Contact Leslie McNeil [email protected]

or visit http://www.nmpdr.org/

Page 25

Acknowledgements

Environmental Genomics: Forest Rohwer and the labs that provided sequence

Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen, Mark D'Souza

Statistics & Web services: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat

FIG: Ross Overbeek, Veronika Vonstein, Annotators
