How We Annotated Genomes for Free: Fast and Accurate Functional Analysis Using Subsystems Technology. Rob Edwards, Depts of Computer Science And Biology, San Diego State University; Mathematics and Computer Sciences Division, Argonne National Laboratory. ASM Philadelphia, May 2009. http://rast.nmpdr.org/?page=Conference

Jan 06, 2016


Transcript
Page 1: Rob Edwards Depts of Computer Science And Biology,  San Diego State University

How We Annotated Genomes for Free: Fast and Accurate Functional Analysis Using Subsystems Technology

Rob Edwards, Depts of Computer Science And Biology,

San Diego State University

Mathematics and Computer Sciences Division, Argonne National Laboratory

ASM Philadelphia, May 2009

http://rast.nmpdr.org/?page=Conference

Page 2

Pigeons

If it’s good enough for Google – it’s good enough for me

Page 3

Annotation Servers

• Metagenomes: http://metagenomics.theseed.org

• Complete genomes: http://rast.nmpdr.org

Page 4

How much has been sequenced?

[Figure: number of known sequences by year, with milestones marked for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]

Page 5

How much will be sequenced?

[Figure: projected sequencing milestones: 100 people; everybody at an ASM meeting; everybody in the USA; all cultured bacteria; one genome from every species; most major microbial environments]

Page 6

The SEED Family


Page 7

Subsystem Spreadsheet

Organism                  Chaperone  Subunit  Usher  Adhesin
S. enterica Enteritidis   2389       2388     2387   2386
E. coli HS                3068       3067     3066   3065
B. cenocepacia J2315      2604       2603     2602   2601
S. maltophilia            1085       1088     1087   1086
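A subsystem spreadsheet is essentially a two-dimensional lookup: each row is an organism, each column a functional role, and each cell the gene identifier shown on the slide. The Python sketch below assumes a plain nested dictionary for that layout; it is an illustration only, not the SEED's storage format, and the helper names are hypothetical.

```python
# Subsystem spreadsheet as a nested mapping: organism -> functional role -> gene ID.
# The IDs are the ones shown on the slide; the dict layout is an illustrative
# assumption, not how the SEED actually stores spreadsheets.
ROLES = ["Chaperone", "Subunit", "Usher", "Adhesin"]

spreadsheet = {
    "S. enterica Enteritidis": dict(zip(ROLES, [2389, 2388, 2387, 2386])),
    "E. coli HS": dict(zip(ROLES, [3068, 3067, 3066, 3065])),
    "B. cenocepacia J2315": dict(zip(ROLES, [2604, 2603, 2602, 2601])),
    "S. maltophilia": dict(zip(ROLES, [1085, 1088, 1087, 1086])),
}


def genes_for_role(sheet, role):
    """Return {organism: gene ID} for one functional role (one column)."""
    return {org: cells[role] for org, cells in sheet.items() if role in cells}


def genes_for_organism(sheet, organism):
    """Return a copy of {role: gene ID} for one organism (one row)."""
    return dict(sheet[organism])


if __name__ == "__main__":
    for organism, gene_id in genes_for_role(spreadsheet, "Usher").items():
        print(f"{organism}: {gene_id}")
```

Reading down a column answers "which gene plays this role in each organism?", while reading across a row gives one organism's complete set of subsystem genes.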

Page 8

Over 1,000 Subsystems

Three-level “hierarchy”

• Amino Acids and Derivatives
  – Alanine, serine, and glycine
    • Serine Biosynthesis

• Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis

Make your own subsystems!
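The three-level classification can be modeled as a nested mapping from class to subclass to subsystem names. With that representation, adding your own subsystem is a single call. The Python sketch below uses the two examples from this slide; the function names are hypothetical, and this is not the SEED's own data model.

```python
# Hypothetical model of the three-level subsystem classification:
# class -> subclass -> list of subsystem names (examples taken from the slide).
hierarchy = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def paths(tree):
    """Yield one (class, subclass, subsystem) triple per subsystem."""
    for top, subclasses in tree.items():
        for sub, subsystems in subclasses.items():
            for subsystem in subsystems:
                yield (top, sub, subsystem)


def add_subsystem(tree, top, sub, name):
    """Register a user-defined subsystem, creating its class and subclass if needed."""
    tree.setdefault(top, {}).setdefault(sub, []).append(name)
    return tree


if __name__ == "__main__":
    for triple in paths(hierarchy):
        print(" > ".join(triple))
```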


Page 9

Class                                   Number of subsystems
Amino Acids and Derivatives             56
Carbohydrates                           97
Cell Division and Cell Cycle            10
Cell Wall and Capsule                   50
Clustering-based subsystems             193
Cofactors, Vitamins, Pigments           43
DNA Metabolism                          30
Fatty Acids, Lipids, and Isoprenoids    22
Membrane Transport                      41
Metabolism of Aromatic Compounds        30
Motility and Chemotaxis                 8
Nitrogen Metabolism                     11
Nucleosides and Nucleotides             14
Phosphorus Metabolism                   6
Photosynthesis                          9
Potassium metabolism                    3
Protein Metabolism                      52
RNA Metabolism                          39
Regulation and Cell signaling           23
Respiration                             44
Secondary Metabolism                    24
Stress Response                         37
Sulfur Metabolism                       12
Virulence                               116
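As a quick consistency check, the per-class counts can be summed. The Python snippet below only transcribes the table above and adds it up.

```python
# Subsystem counts per class, transcribed from the table on this slide.
SUBSYSTEMS_PER_CLASS = {
    "Amino Acids and Derivatives": 56,
    "Carbohydrates": 97,
    "Cell Division and Cell Cycle": 10,
    "Cell Wall and Capsule": 50,
    "Clustering-based subsystems": 193,
    "Cofactors, Vitamins, Pigments": 43,
    "DNA Metabolism": 30,
    "Fatty Acids, Lipids, and Isoprenoids": 22,
    "Membrane Transport": 41,
    "Metabolism of Aromatic Compounds": 30,
    "Motility and Chemotaxis": 8,
    "Nitrogen Metabolism": 11,
    "Nucleosides and Nucleotides": 14,
    "Phosphorus Metabolism": 6,
    "Photosynthesis": 9,
    "Potassium metabolism": 3,
    "Protein Metabolism": 52,
    "RNA Metabolism": 39,
    "Regulation and Cell signaling": 23,
    "Respiration": 44,
    "Secondary Metabolism": 24,
    "Stress Response": 37,
    "Sulfur Metabolism": 12,
    "Virulence": 116,
}

total = sum(SUBSYSTEMS_PER_CLASS.values())
largest = max(SUBSYSTEMS_PER_CLASS, key=SUBSYSTEMS_PER_CLASS.get)

if __name__ == "__main__":
    print(f"{len(SUBSYSTEMS_PER_CLASS)} classes, {total} subsystems in total")
    print(f"Largest class: {largest} ({SUBSYSTEMS_PER_CLASS[largest]})")
```

The 24 listed classes add up to 970 subsystems, and clustering-based subsystems form the largest class. The table does not say whether these classes cover every subsystem behind the "over 1,000" figure.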

Page 10

The Annotation Process

• Find the phylogenetic neighborhood of your genome

• Look for proteins that related organisms have
  – Core proteins
  – Subset of all subsystems

• Use those calls as a training set for CRITICA/Glimmer
  – Intrinsic training set!
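The three steps above can be sketched in Python, but only as an illustration under simplifying assumptions. Shared protein k-mers stand in for whatever similarity search RAST actually uses to find the phylogenetic neighborhood. The "core" roles are supplied as a plain set. The selected genes would then be passed to a gene caller such as Glimmer or CRITICA, which is not shown. All function names are hypothetical.

```python
# Hypothetical sketch of the annotation steps on this slide. Shared k-mers stand
# in for RAST's real similarity search; names and data layout are illustrative.


def kmers(seq, k):
    """Set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def proteome_kmers(proteins, k):
    """Union of k-mers over an iterable of protein sequences."""
    result = set()
    for seq in proteins:
        result |= kmers(seq, k)
    return result


def closest_neighbors(query, references, k=8, n=2):
    """Step 1: rank reference genomes by k-mers shared with the query proteome.

    query: {gene_id: protein sequence}
    references: {genome name: {role: protein sequence}}
    """
    q = proteome_kmers(query.values(), k)
    scores = {name: len(q & proteome_kmers(roles.values(), k))
              for name, roles in references.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


def training_set(query, references, neighbors, core_roles, k=8, min_shared=3):
    """Steps 2-3: keep query genes that resemble a neighbor's core-role protein.

    The returned {gene_id: role} calls are what would be handed to a gene
    caller as an intrinsic training set.
    """
    calls = {}
    for gene_id, seq in query.items():
        gene_kmers = kmers(seq, k)
        for name in neighbors:
            for role, ref_seq in references[name].items():
                if role in core_roles and len(gene_kmers & kmers(ref_seq, k)) >= min_shared:
                    calls.setdefault(gene_id, role)
    return calls
```

The training set is "intrinsic" because it is drawn from the genome being annotated, not from an external model organism. The k-mer length and `min_shared` threshold are arbitrary choices for the sketch.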

Page 11

This one’s for Gary

Page 12

Automatic Metabolic Reconstruction

• Subsystem, GO, and KEGG connections
  – KEGG EC numbers
  – KEGG reaction numbers
  – SEED reaction numbers (Chris Henry)

• Metabolic flux models
  – Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)
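Flux balance analysis (FBA) starts from a stoichiometric matrix S, with one row per metabolite and one column per reaction. It then looks for flux vectors v that satisfy S v = 0 (steady state) while optimizing an objective. The minimal Python sketch below only builds S from a reaction list and checks the steady-state condition. It uses a made-up three-reaction toy pathway instead of SEED reaction data, the function names are hypothetical, and the linear-programming step is not shown.

```python
# Sketch: build the stoichiometric matrix S used by flux balance analysis.
# Rows are metabolites, columns are reactions; S[i][j] is the coefficient of
# metabolite i in reaction j (negative = consumed, positive = produced).


def stoichiometric_matrix(reactions):
    """reactions: {reaction_id: {metabolite: coefficient}} -> (metabolites, reaction_ids, S)."""
    reaction_ids = list(reactions)
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    row_of = {m: i for i, m in enumerate(metabolites)}
    S = [[0] * len(reaction_ids) for _ in metabolites]
    for j, rid in enumerate(reaction_ids):
        for metabolite, coeff in reactions[rid].items():
            S[row_of[metabolite]][j] = coeff
    return metabolites, reaction_ids, S


def is_steady_state(S, fluxes, tol=1e-9):
    """True if S . v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(abs(sum(c * v for c, v in zip(row, fluxes))) <= tol for row in S)


# Toy pathway (hypothetical, not SEED data): uptake -> A, A -> B, B -> export.
TOY = {
    "uptake": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "export": {"B": -1},
}

if __name__ == "__main__":
    mets, rxns, S = stoichiometric_matrix(TOY)
    print(rxns)
    for met, row in zip(mets, S):
        print(met, row)
```

In the toy pathway, equal flux through all three reactions is a steady state. Unequal fluxes would make A accumulate.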

Page 13

Page 14

The Populated Subsystem


Page 15

Automatically Compare Metabolic Reconstructions
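One simple way to picture an automatic comparison is to represent each reconstruction as a mapping from subsystem to the functional roles found in that genome, then report where the two differ. The Python sketch below assumes that representation. The subsystem names come from earlier slides, but the short gene-style role labels and the function name are placeholders, not RAST output.

```python
# Sketch: compare two metabolic reconstructions, each represented (by assumption)
# as {subsystem name: set of functional roles found in that genome}.


def compare_reconstructions(a, b):
    """Return {subsystem: {"only_a": roles, "only_b": roles}} for every
    subsystem whose role content differs between the two genomes."""
    report = {}
    for subsystem in sorted(set(a) | set(b)):
        roles_a = a.get(subsystem, set())
        roles_b = b.get(subsystem, set())
        if roles_a != roles_b:
            report[subsystem] = {"only_a": roles_a - roles_b,
                                 "only_b": roles_b - roles_a}
    return report


if __name__ == "__main__":
    genome_a = {"Serine Biosynthesis": {"SerA", "SerB", "SerC"}}
    genome_b = {"Serine Biosynthesis": {"SerA", "SerC"},
                "Methionine Biosynthesis": {"MetE"}}
    for subsystem, diff in compare_reconstructions(genome_a, genome_b).items():
        print(subsystem, diff)
```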

Page 16

Find And Suggest Candidate Functions

• Rapidly correct missing annotations

• Add more members to subsystems

• Improves future genome annotations! (especially with new subsystems)
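A toy version of the idea: if a genome already has most of the roles in a subsystem, the roles it lacks are good candidates to search for, for example among unannotated genes. In the Python sketch below, the 50% threshold, the gene-style role labels, and the function name are all illustrative assumptions, not RAST's actual rule.

```python
# Sketch: suggest missing functions. If a genome already has most roles of a
# subsystem, the roles it lacks are candidates to look for. The default
# threshold of 0.5 is an illustrative assumption.


def suggest_missing_roles(genome_roles, subsystems, min_fraction=0.5):
    """genome_roles: set of roles annotated in the genome.
    subsystems: {subsystem name: set of roles it contains}.
    Returns {subsystem: roles missing from the genome} for subsystems that are
    at least min_fraction populated but not complete."""
    suggestions = {}
    for name, roles in subsystems.items():
        if not roles:
            continue
        present = roles & genome_roles
        if len(present) / len(roles) >= min_fraction and present != roles:
            suggestions[name] = roles - present
    return suggestions


if __name__ == "__main__":
    subsystems = {
        "Serine Biosynthesis": {"SerA", "SerB", "SerC"},
        "Methionine Biosynthesis": {"MetA", "MetB", "MetC", "MetE"},
    }
    print(suggest_missing_roles({"SerA", "SerC", "MetE"}, subsystems))
```

Once such a gap is filled, the subsystem gains another member, which is how the slide's last point (better future annotations) follows from this step.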


Page 17

The Real Live Test

• 10 genomes submitted on Thursday at 6 pm

• First annotation complete before 8 am Friday

• Remaining annotations completed Friday before noon

• (there were others in the pipeline too!)
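The timings above give upper bounds on wall-clock turnaround. A quick check with Python's `datetime` module is shown below. The calendar date is hypothetical; only the weekday and clock times come from the slide.

```python
from datetime import datetime, timedelta

# Times from the slide; the calendar date is hypothetical (any Thursday works).
submitted = datetime(2009, 5, 14, 18, 0)     # Thursday, 6 pm
first_done_by = datetime(2009, 5, 15, 8, 0)  # Friday, 8 am
all_done_by = datetime(2009, 5, 15, 12, 0)   # Friday, noon


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


first_bound = hours(first_done_by - submitted)  # upper bound for the first genome
all_bound = hours(all_done_by - submitted)      # upper bound for all ten genomes

if __name__ == "__main__":
    print(f"First annotation: under {first_bound:.0f} h; all ten: under {all_bound:.0f} h")
```

Because the slide says "before" 8 am and "before" noon, these are upper bounds: at most 14 hours for the first genome and 18 hours for all ten, while other jobs were also in the pipeline.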


Page 18

Subsystems Coverage

Genome                      Percent of proteins in subsystems
Haloferax denitrificans     20%
Haloferax mediterranei      19%
Haloferax sulfurifontis     19%
Haloferax volcanii DS2      19%
Haloarcula sp. 33800        19%
Haloarcula sp. 33799        18%
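The coverage figure is the share of a genome's proteins assigned to at least one subsystem. The minimal Python sketch below assumes that subsystem membership is available as sets of protein IDs. The IDs in the example are made up and are not the Haloferax or Haloarcula data, and the function name is hypothetical.

```python
# Sketch: percent of a genome's proteins that fall in at least one subsystem.
# Protein IDs and memberships in the example are made up for illustration.


def subsystem_coverage(protein_ids, subsystem_members):
    """protein_ids: iterable of all protein IDs in the genome.
    subsystem_members: {subsystem name: set of protein IDs assigned to it}.
    Each protein counts once, even if it belongs to several subsystems."""
    proteins = set(protein_ids)
    if not proteins:
        return 0.0
    assigned = set()
    for members in subsystem_members.values():
        assigned |= members
    return 100.0 * len(proteins & assigned) / len(proteins)


if __name__ == "__main__":
    proteins = ["p1", "p2", "p3", "p4", "p5"]
    members = {"S1": {"p1", "p2"}, "S2": {"p2", "p3"}}
    print(f"{subsystem_coverage(proteins, members):.0f}% of proteins in subsystems")
```

Taking the union before counting matters: protein `p2` belongs to two subsystems but is counted once, so the example reports 60% rather than 80%.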


Page 19

Prophages

PHANTOME

Mya Breitbart, Matt Sullivan, Jeff Elhai, Rob Edwards

NSF

Haloferax sulfurifontis prophage

Page 20

Metagenome Comparisons

Metagenomics RAST has 300 public metagenomes

Compared using tblastx
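tblastx is the BLAST mode that translates both the nucleotide query and the nucleotide database in all six reading frames and then compares the resulting protein sequences. The Python sketch below shows only that six-frame translation step, using the standard genetic code. It does not reproduce BLAST's alignment or statistics, and it is not how the Metagenomics RAST comparison pipeline is implemented.

```python
# Six-frame translation with the standard genetic code (NCBI table 1).
# Codon order is TTT, TTC, TTA, TTG, TCT, ... with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Three forward frames, then three frames of the reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (seq, reverse_complement)
            for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(six_frames("ATGGCC"), start=1):
        print(frame, protein)
```

Translating every frame on both sides is what lets tblastx detect protein-level similarity between unannotated metagenomic reads. It is also why all-against-all tblastx comparisons are computationally expensive.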


Page 21

Human Poop

Page 22

High Salinity Salterns, San Diego, July 2004

Thanks Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer

Page 23

[Figure: low-salinity vs. high-salinity salterns, samples from July 2004 and Nov 2005]

Page 24

Free workshops on NMPDR, RAST, mg-RAST, SEED

Contact Leslie McNeil

or visit http://www.nmpdr.org/

Page 25

Acknowledgements

Environmental Genomics: Forest Rohwer, Beltran Rodriguez-Mueller

Annotation Servers: Rick Stevens, Ross Overbeek, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke

FIG: Ross Overbeek, Veronika Vonstein, Annotators

Artist: Paula Morris
