Page 1
Annotating Metagenomes Using the SEED
Rob Edwards
Department of Computer Sciences, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
NSF/EU Cyberinfrastructure Meeting, Washington, DC.
www.nmpdr.org www.theseed.org
Page 3
How much has been sequenced?
[Figure: number of known sequences by year, with milestones marked: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]
Page 4
How much will be sequenced?
[Figure labels: 100 people; everybody in San Diego; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments]
Page 5
What do we want from annotations?
• Consistent
• Accurate
• Available
• Reliable
Page 6
Consistent
Page 7
The Importance of Consistency
• Consistency: same genes connected to same functional role
• Enables communication
• Required for most comparative genomics assays
Page 8
hisA
FIG function:
Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)
Other functions in RefSeq:
phosphoribosylformimino-5-aminoimidazole carboxamide
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase
N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5- phosphoribosyl)-4-imidazolecarboxamide isomerase
N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'- phosphoribosyl)-4-imidazolecarboxamide isomerase
N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4- imidazolecarboxamide isomerase
Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]
Page 9
Measuring Consistency
• Define a set of protein families such that each family contains genes playing the same function
• Attach functional roles to protein families
• Measure the consistency of the annotations made to genes within each family
1. "consistency" is the odds that two proteins from the same family have the same function
2. Evaluate both families and functions.
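The measure in point 1 can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the database comparison: it assumes "odds" means the probability that two distinct proteins drawn from the same family carry exactly the same function string, and the family names and annotations in `families` are hypothetical toy data.

```python
from collections import Counter
from math import comb


def family_consistency(functions):
    """Fraction of protein pairs in one family that share the same function string."""
    total_pairs = comb(len(functions), 2)
    if total_pairs == 0:
        return None  # a family with fewer than two members has no pairs to compare
    same_pairs = sum(comb(count, 2) for count in Counter(functions).values())
    return same_pairs / total_pairs


def database_consistency(families):
    """Pool all within-family pairs: same-function pairs divided by all pairs."""
    same = total = 0
    for functions in families.values():
        same += sum(comb(count, 2) for count in Counter(functions).values())
        total += comb(len(functions), 2)
    return same / total if total else None


# Hypothetical toy data: family name -> function assigned to each member.
families = {
    "famA": ["isomerase", "isomerase", "isomerase", "isomerase-like protein"],
    "famB": ["kinase", "kinase"],
}

for name, functions in families.items():
    print(name, family_consistency(functions))
print("pooled", database_consistency(families))
```

Exact string matching is the strictest possible choice: two spellings of the same role, as in the hisA example, count as inconsistent.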
Page 10
Consistency among databases
Page 11
Accurate
Page 12
How to measure accuracy
• If everything was called “hypothetical protein” the database would be 100% consistent
• Need to measure accuracy (specificity) as well as consistency
• Sample 100 proteins at random from the “curated” set (i.e., those believed to be correct)
• Manually inspect annotations to score correctness
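The sampling and scoring steps above could look like the following sketch. The function names, the fixed seed, and the protein identifiers are assumptions made for illustration; the manual inspection itself is represented only by a mapping from protein id to a reviewer's True/False verdict.

```python
import random


def sample_for_review(curated_ids, k=100, seed=0):
    """Draw up to k distinct protein ids at random from the curated set."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is repeatable
    ids = sorted(curated_ids)  # sort first so the result does not depend on input order
    return rng.sample(ids, min(k, len(ids)))


def accuracy(verdicts):
    """Fraction of reviewed annotations that the reviewer judged correct."""
    if not verdicts:
        raise ValueError("no reviewed annotations to score")
    return sum(verdicts.values()) / len(verdicts)


# Hypothetical curated set of 5,000 protein ids.
curated = [f"protein_{i}" for i in range(5000)]
to_review = sample_for_review(curated)

# Stand-in for manual inspection: here every sampled annotation is marked correct.
verdicts = {protein_id: True for protein_id in to_review}
print(len(to_review), accuracy(verdicts))
```

With a sample of 100 the estimate is coarse, so small differences between databases should be read with caution.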
Page 13
Available
Page 14
http://metagenomics.theseed.org
• Free service
• User registration/log in
• Free to upload sequences in several formats
• Automatically annotates sequences
• Download in several formats
Complete genomes too: http://www.nmpdr.org/anno-server
Soon to come: plasmids, phages, other short genomes
Page 15
Metagenome Metabolic Reconstruction
Page 16
Metabolic potential in environments
Page 18
Comparing Metagenomes to Genomes (or other metagenomes!)
Page 19
Reliable (Believable)
Page 20
Metabolic potential in environments
Page 21
From sequences to environments
[Figure: CDA plot, axes CDA 60.2% and CDA 21.7%. Function labels: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA. Environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]
Page 22
What do we want from annotations?
• Consistent
• Accurate
• Available
• Reliable
When do we want it? NOW
Page 23
Acknowledgements
Environmental Genomics
• Forest Rohwer
• Rohwer lab members
• All the labs that provided sequence
Metagenomics Annotation Server
• Rick Stevens
• Daniel Paarman
• Folker Meyer
• Bob Olsen
Statistics
• Liz Dinsdale
• Dana Hall
• Beltran Rodriguez-Brito
FIG
• Ross Overbeek
• Veronika Vonstein
• Annotators
Page 25
Subsystems make up metabolism
Wikipedia Metabolism http://en.wikipedia.org/wiki/Portal:Metabolism