The Metagenomics RAST server: Annotation, Analysis, and Comparisons Perfect for Pyrosequencing Rob Edwards Department of Computer Science, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory Roche Life Sciences Workshop, Sept 2008 www.nmpdr.org www.theseed.org

Jan 02, 2016

denise-carson

Transcript
Page 1: The Metagenomics RAST server: Annotation, Analysis, and Comparisons Perfect for Pyrosequencing

The Metagenomics RAST server: Annotation, Analysis, and Comparisons

Perfect for Pyrosequencing

Rob Edwards

Department of Computer Science, San Diego State University

Mathematics and Computer Sciences Division, Argonne National Laboratory

Roche Life Sciences Workshop, Sept 2008

www.nmpdr.org www.theseed.org

Page 2

Outline

• Metagenomics

• Tools for analyzing sequences

• Computational Challenges

• Does it work?

Page 3

How much has been sequenced?

[Figure: number of known sequences by year. Labeled milestones: first bacterial genome; 100 bacterial genomes; 1,000 bacterial genomes; environmental sequencing]

Page 4

How much will be sequenced?

[Figure labels: everybody in San Diego; everybody in USA; all cultured Bacteria; 100 people; one genome from every species; most major microbial environments]

Page 5

Metagenomics (Just sequence it)

200 liters water; 5-500 g fresh fecal matter; 50 g soil

Sequence

Epifluorescent Microscopy

Concentrate and purify bacteria, viruses, etc

Extract nucleic acids

Publish papers

Page 6

Modern Metagenomics

Marine: Near-shore water (~100 samples); Off-shore water (~50 samples); Near- and off-shore sediments

Metazoan associated: Corals; Fish; Human blood; Human stool

Terrestrial/Soil: Terragenomics; Amazon rainforest; Konza prairie; Joshua Tree desert; Air

Freshwater: Aquifer; Glacial lake

Extreme: Hot springs (84 °C; 78 °C); Soda lake (pH 13); Solar saltern (>35% salt)

Page 7

The Problem

How do you generate consistent and accurate annotations for metagenomes?

Page 8

The SEED Family

Page 9

Annotations using subsystems

FIG developed the notion of Subsystem – a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex.

Extended subsystems into FIGfams – protein families that perform the same functions.
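
One way to picture the subsystem idea is as a named set of functional roles, against which annotated reads can be tallied. The following is a minimal sketch under stated assumptions: the `Subsystem` class, the `count_subsystem_hits` function, and the example role names are all hypothetical illustrations, not the SEED data model or API, and FIGfams are not modelled here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """Hypothetical container: a named collection of functional roles."""
    name: str
    roles: set = field(default_factory=set)


def count_subsystem_hits(read_roles, subsystems):
    """Tally annotated reads per subsystem.

    read_roles: iterable of functional-role strings, one per annotated read.
    A role shared by several subsystems counts once for each of them.
    """
    # Build a reverse index from role name to the subsystems that contain it.
    role_to_subsystems = {}
    for subsystem in subsystems:
        for role in subsystem.roles:
            role_to_subsystems.setdefault(role, []).append(subsystem.name)

    counts = Counter()
    for role in read_roles:
        for name in role_to_subsystems.get(role, []):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Illustrative role names only; not taken from the SEED.
    glycolysis = Subsystem("Glycolysis", {"Hexokinase", "Pyruvate kinase"})
    reads = ["Hexokinase", "Hexokinase", "Unassigned role"]
    print(count_subsystem_hits(reads, [glycolysis]))
```

Reads whose role belongs to no subsystem are simply ignored in this sketch; a real pipeline would presumably report them separately.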

Page 10

Annotation of Complete Genomes

• Automated user originated processing

• Takes 1-7 hours depending on size and complexity of the genome

• ~2,000 external submissions, including hundreds of genomes not yet publicly released.

• Reannotation of >500 genomes complete

• 1,000 users, 200 organizations, 25 countries.

http://rast.nmpdr.org/

Page 11

The metagenomics RAST server

Page 12

Automated Processing

Page 13

Summary View

Page 14

Metagenomics Tools: Annotation & Subsystems

Page 15

Metagenomics Tools: Annotation & KEGG maps

Page 16

Metagenomics Tools: Recruitment Plots

Page 17

Metagenomics Tools: Phylogenetic Reconstruction

Page 18

Metagenomics Tools: Comparative Tools

Page 19

Computational Requirements

~19 hours of compute per input megabyte

[Figure: hours of compute time versus input size (MB)]
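
The rule of thumb above lends itself to a quick back-of-envelope estimate. The sketch below assumes compute time scales linearly with input size at the quoted ~19 hours per megabyte, and that the work divides evenly across CPUs. Both are simplifying assumptions, and the function name and `n_cpus` parameter are illustrative rather than part of the server.

```python
HOURS_PER_MB = 19.0  # approximate figure quoted on the slide


def estimate_compute_hours(input_mb, n_cpus=1, hours_per_mb=HOURS_PER_MB):
    """Return (total CPU hours, wall-clock hours) for input_mb megabytes of sequence."""
    if input_mb < 0:
        raise ValueError("input_mb must be non-negative")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    cpu_hours = input_mb * hours_per_mb
    wall_hours = cpu_hours / n_cpus  # assumes perfect parallel scaling
    return cpu_hours, wall_hours


if __name__ == "__main__":
    cpu, wall = estimate_compute_hours(20, n_cpus=100)
    print(f"{cpu:.0f} CPU hours, about {wall:.1f} wall-clock hours on 100 CPUs")
```

Under these assumptions a 20 MB data set needs roughly 380 CPU hours, which is why the work has to be spread over many processors.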

Page 20

How much so far

986 metagenomes

79,417,238 sequences

17,306,834,870 bp (17 Gbp)

Average: ~15-20 Mbp per metagenome

Compute time (on a single CPU):

328,814 hours = 13,700 days = 38 years

~300 GS20; ~300 FLX; ~300 Sanger
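
The totals on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the function and key names are illustrative, and treating one base pair as roughly one byte of input (so that Mbp and MB are comparable) is an assumption.

```python
METAGENOMES = 986
SEQUENCES = 79_417_238
BASE_PAIRS = 17_306_834_870
CPU_HOURS = 328_814


def summarize(metagenomes, sequences, base_pairs, cpu_hours):
    """Derive per-metagenome and per-megabase figures from the slide totals."""
    return {
        "mean_bp_per_metagenome": base_pairs / metagenomes,
        "mean_read_length_bp": base_pairs / sequences,
        "cpu_days": cpu_hours / 24,
        "cpu_years": cpu_hours / 24 / 365,
        # Assumes ~1 byte per base pair, so Mbp is comparable to input MB.
        "cpu_hours_per_mbp": cpu_hours / (base_pairs / 1e6),
    }


if __name__ == "__main__":
    for key, value in summarize(METAGENOMES, SEQUENCES, BASE_PAIRS, CPU_HOURS).items():
        print(f"{key}: {value:,.2f}")
```

The derived values come out at about 17.6 Mbp per metagenome, about 13,700 CPU days (a little under 38 years), and about 19 CPU hours per Mbp, consistent with the figures quoted on the slides.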

Page 21

Lots of sequences: all pyrosequencing

Page 22

Metagenomics Tools: Functional Heat Maps

Page 23

From Sequences To Environments

[Figure: plot with axes labeled CDA 60.2% and CDA 21.7%. Functional labels: Sulfur; Respiration; Capsule; Motility; Membrane transport; Stress; Signaling; Phosphorus; RNA. Environments: Mine; Saltern; Marine; Microbialites; Coral; Fish; Animals; Freshwater]

Dinsdale et al., Nature 2008

Page 24

Workshops

Free workshops on NMPDR, RAST, mg-RAST, SEED

Contact Leslie McNeil [email protected]

or visit http://www.nmpdr.org/

Page 25

Acknowledgements

Environmental Genomics: Forest Rohwer; all the labs that provided sequence

Metagenomics Annotation Server: Rick Stevens; Folker Meyer; Bob Olson; Daniel Paarman; Mark D'Souza; Jared Wilkening; Andreas Wilke

Statistics & Web services: Liz Dinsdale; Robert Schmieder; Dana Hall; Beltran Rodriguez-Brito; Bahador Nosrat

FIG: Ross Overbeek; Veronika Vonstein; Annotators

Artist: Paula Morris

Argonne Sequencing: Marc Domanus; Areej Ammar

Page 26

Artist's impression: not all machines are known to explode

Page 27

Terragenomics

Page 28

Differences between soil samples