Annotating Metagenomes Using the SEED Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory NSF/EU Cyberinfrastructure Meeting, Washington, DC. www.nmpdr.org www.theseed.org


Dec 21, 2015




How much has been sequenced?

[Figure: number of known sequences by year, with milestones labeled: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]


How much will be sequenced?

[Figure: projected sequencing milestones, labeled: everybody in San Diego; everybody in USA; all cultured bacteria; 100 people; one genome from every species; most major microbial environments.]


What do we want from annotations?

• Consistent
• Accurate
• Available
• Reliable


Consistent


The Importance of Consistency

• Consistency: same genes connected to same functional role

• Enables communication

• Required for most comparative genomics assays


hisA FIG function:

Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)

Other functions in RefSeq:

• phosphoribosylformimino-5-aminoimidazole carboxamide
• phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
• phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
• 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase
• N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5- phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'- phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4- imidazolecarboxamide isomerase
• Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]
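To make the scale of the problem concrete, the minimal Python sketch below (an illustration, not part of the SEED) takes six of the hisA names listed above and counts how many distinct strings remain after trivial differences are cleaned up. The cleanup rules in the hypothetical `normalize` helper (lowercasing, dropping prime marks, removing spaces left after hyphens) are assumptions chosen for this example.

```python
import re

def normalize(name):
    """Crude cleanup (illustrative assumption): lowercase, drop prime
    marks, remove spaces left after hyphens, collapse other whitespace."""
    name = name.lower().replace("'", "")
    name = re.sub(r"-\s+", "-", name)  # e.g. "1- (5" -> "1-(5"
    name = re.sub(r"\s+", " ", name)
    return name.strip()

# Six of the RefSeq names listed above for hisA, copied from the slide.
names = [
    "phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase",
    "1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase",
    "N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5- phosphoribosyl)-4-imidazolecarboxamide isomerase",
    "N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase",
    "N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'- phosphoribosyl)-4-imidazolecarboxamide isomerase",
    "N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4- imidazolecarboxamide isomerase",
]

raw = set(names)
cleaned = {normalize(n) for n in names}
print(f"{len(raw)} distinct strings, {len(cleaned)} after cleanup")
```

Cleanup collapses the five "N-(5-phospho-L-ribosyl-formimino)..." spellings into one, but three names survive. They appear to be different names for the same enzyme, so string matching alone cannot reconcile them. That gap is what connecting genes to a single functional role, as the FIG function above does, is meant to close.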


Measuring Consistency

• Define a set of protein families such that each family contains genes performing the same function
• Attach functional roles to protein families
• Measure the consistency of the annotations made to genes within each family

1. "Consistency" is the odds that two proteins from the same family have the same function
2. Evaluate both families and functions.
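The consistency measure in point 1 can be sketched as follows. This is a minimal Python illustration, not the SEED's own code. It assumes each family is given as a list of annotation strings (one per protein) and treats two annotations as "the same function" only when the strings match exactly. It also reads "odds" as the probability over all unordered pairs of proteins in a family. The family IDs and annotations are hypothetical.

```python
from collections import Counter

def _pair_counts(annotations):
    """Return (agreeing pairs, total pairs) for one family's annotations."""
    n = len(annotations)
    same = sum(c * (c - 1) // 2 for c in Counter(annotations).values())
    return same, n * (n - 1) // 2

def family_consistency(annotations):
    """Probability that two distinct proteins from one family carry
    identical annotation strings (exact match is an assumption)."""
    same, total = _pair_counts(annotations)
    return same / total if total else 1.0  # singleton: nothing to disagree with

def pooled_consistency(families):
    """Pool pairs over all families: agreeing pairs / all pairs."""
    same = total = 0
    for annotations in families.values():
        s, t = _pair_counts(annotations)
        same += s
        total += t
    return same / total if total else 1.0

# Hypothetical families with made-up annotations, for illustration only.
families = {
    "FAM_1": ["function A", "function A", "function A", "hypothetical protein"],
    "FAM_2": ["function B", "function B"],
}
for fam, ann in families.items():
    print(fam, round(family_consistency(ann), 3))
print("pooled", round(pooled_consistency(families), 3))
```

Scoring per family shows which families (or functions) to revisit. A family annotated entirely as "hypothetical protein" scores a perfect 1.0, which is why consistency alone is not enough.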


Consistency among databases


Accurate


How to measure accuracy

• If everything were called “hypothetical protein”, the database would be 100% consistent

• Need to measure accuracy (specificity) as well as consistency

• Sample 100 proteins at random from the “curated” set (i.e., proteins whose annotations are believed to be correct)

• Manually inspect annotations to score correctness
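The sampling and scoring steps can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure used for the talk. The curated set is taken to be a plain list of hypothetical protein IDs, and manual inspection is represented as a list of True/False verdicts entered by a curator. The Wilson score interval is an addition not found in the slides; it indicates how precisely a 100-protein sample can pin down accuracy.

```python
import math
import random

def sample_for_review(curated_ids, k=100, seed=0):
    """Draw k protein IDs uniformly at random, without replacement.
    The fixed seed (an assumption) makes the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(curated_ids), k)

def accuracy_with_interval(judgments, z=1.96):
    """judgments: booleans from manual inspection (True = correct).
    Returns (accuracy, low, high) using the Wilson score interval."""
    n = len(judgments)
    p = sum(judgments) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half, center + half

# Hypothetical curated set; real IDs would come from the database.
curated = [f"protein_{i:05d}" for i in range(5000)]
to_review = sample_for_review(curated)

# Placeholder verdicts standing in for a curator's review (not real results).
verdicts = [True] * 90 + [False] * 10
acc, low, high = accuracy_with_interval(verdicts)
print(f"accuracy {acc:.2f} (95% interval {low:.2f}-{high:.2f})")
```

With only 100 proteins, the interval spans several percentage points. Comparing two databases this way therefore separates large differences in accuracy more reliably than small ones.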


Available


http://metagenomics.theseed.org

• Free service
• User registration/log in
• Free to upload sequences in several formats
• Automatically annotates sequences
• Download in several formats

Complete genomes too: http://www.nmpdr.org/anno-server

Soon to come: plasmids, phages, other short genomes


Metagenome Metabolic Reconstruction


Metabolic potential in environments


Phylogenomics


Comparing Metagenomes to Genomes (or other metagenomes!)


Reliable (Believable)


Metabolic potential in environments


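From sequences to environments

[Figure: metagenomes from different environments (Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater) plotted on two CDA axes (60.2% and 21.7%), with subsystem categories labeled: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA.]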
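The starting point for a plot like this is a per-environment profile showing how annotated reads are distributed across subsystem categories. A hypothetical Python sketch follows. It assumes each read has already been assigned to at most one subsystem category, with reads lacking a hit dropped beforehand. It uses the category names from the figure and runs on invented toy counts, not data from the talk. The resulting matrix of proportions is the kind of input a multivariate analysis such as the CDA in the figure could take; that analysis is not shown.

```python
from collections import Counter

# Subsystem categories taken from the figure labels.
CATEGORIES = ["Sulfur", "Respiration", "Capsule", "Motility",
              "Membrane transport", "Stress", "Signaling", "Phosphorus", "RNA"]

def subsystem_profile(read_hits):
    """read_hits: one subsystem-category label per annotated read.
    Returns {category: fraction of annotated reads}."""
    counts = Counter(read_hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}

def profile_matrix(samples, categories=CATEGORIES):
    """One row per sample (sorted by name), one column per category.
    Categories a sample lacks get 0.0; labels outside `categories` are dropped."""
    names = sorted(samples)
    rows = []
    for name in names:
        prof = subsystem_profile(samples[name])
        rows.append([prof.get(cat, 0.0) for cat in categories])
    return names, rows

# Toy input: invented category hits for two environments, NOT real data.
toy = {
    "Saltern": ["Respiration", "Respiration", "Stress", "Motility"],
    "Coral": ["Sulfur", "Signaling", "Stress", "Stress"],
}
names, rows = profile_matrix(toy)
for name, row in zip(names, rows):
    print(name, [round(x, 2) for x in row])
```

Working with proportions rather than raw counts keeps metagenomes of very different sequencing depth on the same scale.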


What do we want from annotations?

• Consistent
• Accurate
• Available
• Reliable

When do we want it?

NOW


Acknowledgements

Environmental Genomics: Forest Rohwer; Rohwer lab members; all the labs that provided sequence

Metagenomics Annotation Server: Rick Stevens; Daniel Paarman; Folker Meyer; Bob Olsen

Statistics: Liz Dinsdale; Dana Hall; Beltran Rodriguez-Brito

FIG: Ross Overbeek; Veronika Vonstein; Annotators


Subsystems make up metabolism

[Figure credit: Wikipedia, Metabolism portal, http://en.wikipedia.org/wiki/Portal:Metabolism]