Evolution of Structure and Function in the o-Succinylbenzoate Synthase/N-Acylamino Acid Racemase Family of the Enolase Superfamily

Margaret E. Glasner¹, Nima Fayazmanesh², Ranyee A. Chiang¹, Ayano Sakai³, Matthew P. Jacobson², John A. Gerlt³ and Patricia C. Babbitt¹,²*

¹Department of Biopharmaceutical Sciences, University of California, San Francisco, CA 94143, USA
²Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143, USA
³Departments of Biochemistry and Chemistry, University of Illinois, Urbana, IL 61801, USA

Understanding how proteins evolve to provide both exquisite specificity and proficient activity is a fundamental problem in biology that has implications for protein function prediction and protein engineering. To study this problem, we analyzed the evolution of structure and function in the o-succinylbenzoate synthase/N-acylamino acid racemase (OSBS/NAAAR) family, part of the mechanistically diverse enolase superfamily. Although all characterized members of the family catalyze the OSBS reaction, this family is extraordinarily divergent, with some members sharing <15% identity. In addition, a member of this family, Amycolatopsis OSBS/NAAAR, is promiscuous, catalyzing both dehydration and racemization. Although the OSBS/NAAAR family appears to have a single evolutionary origin, no sequence or structural motifs unique to this family could be identified; all residues conserved in the family are also found in enolase superfamily members that have different functions. Based on their species distribution, several uncharacterized proteins similar to Amycolatopsis OSBS/NAAAR appear to have been transmitted by lateral gene transfer. Like Amycolatopsis OSBS/NAAAR, these might have additional or alternative functions to OSBS because many are from organisms lacking the pathway in which OSBS is an intermediate. In addition to functional differences, the OSBS/NAAAR family exhibits surprising structural variations, including large differences in orientation between the two domains. These results offer several insights into protein evolution. First, orthologous proteins can exhibit significant structural variation, and specificity can be maintained with little conservation of ligand-contacting residues. Second, the discovery of a set of proteins similar to Amycolatopsis OSBS/NAAAR supports the hypothesis that new protein functions evolve through promiscuous intermediates. Finally, a combination of evolutionary, structural, and sequence analyses identified characteristics that might prime proteins, such as Amycolatopsis OSBS/NAAAR, for the evolution of new activities.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: enolase superfamily; protein evolution; mechanistically diverse superfamily; substrate specificity; functional promiscuity

*Corresponding author

Abbreviations used: OSBS/NAAAR, o-succinylbenzoate synthase/N-acylamino acid racemase; MLE, muconate lactonizing enzyme; nr, non-redundant; HMM, Hidden Markov Model; AEE, L-Ala-D/L-Glu epimerase.

E-mail address of the corresponding author: [email protected]

0022-2836/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jmb.2006.04.055 J. Mol. Biol. (2006) 360, 228–250


Introduction

The evolution of new protein functions is a major puzzle in biochemistry. Given that closely related proteins can have different functions, and distantly related proteins can have the same function, what kinds of structural alterations are required or tolerated during protein evolution? In addition, what characteristics of a particular protein determine its degree of evolvability, or the likelihood that it will evolve a new function? Some previous work has indicated that evolution often proceeds through promiscuous intermediates¹–¹⁰ and that conformational flexibility of surface loops near the active site might contribute to promiscuous substrate binding and hence to the evolution of promiscuous functions.¹¹ Unfortunately, there are still few proteins whose evolution, structure, and function have been analyzed in enough detail to fully evaluate these hypotheses. With the advent of large-scale genomic sequencing we are poised to answer these questions. Understanding how proteins evolve will help address several longstanding problems in biochemistry, including how to redesign proteins in the laboratory and how to predict function from sequence and structure.

Studying protein evolution requires identification of homologous proteins that have evolved to perform different functions, such as those found in mechanistically diverse superfamilies. Mechanistically diverse superfamilies are defined as assemblies of homologous proteins which are unified by a common chemical attribute of catalysis, although overall reactions can be quite different.⁴ Here, we focus on the enolase superfamily, which includes enzymes catalyzing at least 14 different reactions.¹² All enolase superfamily enzymes utilize a common partial reaction in which a proton alpha to a carboxylate is abstracted by a base, leading to a metal-stabilized enolate anion intermediate. Apart from this conserved partial reaction, the overall reactions catalyzed by enzymes in this superfamily are quite divergent, including racemization, β-elimination, and cycloisomerization. Very few residues are required for the superfamily partial reaction; three metal-binding residues are well conserved across the superfamily, but the identity and position of the general base are not universally conserved.

Enolase superfamily proteins are composed of two domains, a ∼200 amino acid C-terminal modified (β/α)₈-barrel domain ((β/α)₇β) and a ∼100–150 amino acid α+β domain comprised of elements from both the N and C termini, which we call the capping domain. As with other (β/α)₈-barrel domain proteins, the active site is nestled in a depression formed by the C-terminal ends of the β-strands of the barrel domain. The capping domain is structurally conserved among all members of the enolase superfamily and has not been found in combination with any other (β/α)₈-barrel domain protein superfamily, with domains of other folds, or as a single domain protein. Thus, it appears that the two domains have been co-evolving since the origin of the enolase superfamily. The capping domain closes the active site and appears to play a role in determining substrate specificity and conformational changes that occur upon substrate binding. These functions are thought to be primarily mediated by two N-terminal loops, centered around positions 20 and 50 (numbering defined relative to Escherichia coli o-succinylbenzoate synthase; PDB identifier 1FHV), which will be referred to as the 20s and 50s loops. In most enolase superfamily members, the 20s loop is disordered in the absence of ligand, and ordering of this loop upon substrate binding results in interactions with the ligand and shields the active site from solvent.¹³–¹⁸

The enolase superfamily has been divided into subgroups based on sequence clustering and the identity and position of the catalytic residues.¹⁹ Based on currently available sequences, the most functionally diverse subgroup is the muconate lactonizing enzyme (MLE) subgroup, which includes enzymes catalyzing cycloisomerization (MLE), β-elimination (o-succinylbenzoate synthase), racemization (L-Ala-D/L-Glu epimerase), and probably other uncharacterized functions.

Understanding protein evolution is complicated by difficulties in assigning functions to superfamily members. Categorizing superfamily members into families, or groups of proteins sharing the same function, is often accomplished by establishing a sequence similarity threshold.²⁰–²⁴ However, families in the enolase superfamily, as in other superfamilies, have most likely diverged at different rates or at different times during evolutionary history, making it difficult to define a similarity score cutoff that separates different isofunctional families. The o-succinylbenzoate synthase (OSBS) family poses a particularly thorny problem. First, sequence similarity between some OSBSs barely exceeds random similarity scores expected between unrelated proteins, making it impossible to define a similarity score that encompasses all OSBSs but excludes proteins of other functions. Second, a promiscuous protein from Amycolatopsis sp. T-1-60 that shares 42% identity with the OSBS from Bacillus subtilis catalyzes both OSB synthesis and N-acylamino acid racemization.²⁵ Even experimental characterization does not adequately determine the physiological function of this enzyme, since it catalyzes OSB synthesis and racemization of N-succinylphenylglycine at equivalent rates.²⁶ Thus, the OSBS/N-acylamino acid racemase (NAAAR) family is an especially interesting subject for investigating protein evolution because it includes both extremely divergent enzymes having the same function and very similar enzymes having different functions.

Here, we have studied the evolution of the OSBS/NAAAR family. This study begins to answer several questions about how function and structure evolve in extremely divergent protein families. First, what sequence and structural features must be conserved to maintain function in extremely divergent families? Second, by what mechanisms do proteins evolve new functions? And finally, what functional and structural characteristics of a protein make it more or less capable of evolving a new function? Our study of the OSBS/NAAAR family's evolution demonstrates that sequence, structure, and modes of substrate binding are surprisingly malleable. In addition, we have identified a number of proteins of unknown function whose experimental characterization would be valuable for understanding evolutionary relationships and structural determinants of catalysis in the enolase superfamily. We also determined that the accuracy and extent of functional annotation could be improved using rigorous phylogenetic reconstruction accompanied by analysis of genomic context. Lastly, our in-depth analysis of the evolution, structure and function of the OSBS/NAAAR family identified several characteristics of Amycolatopsis OSBS/NAAAR which might enhance its evolvability relative to other OSBSs.

Results

Identification of OSBS enzymes

To understand the evolution of the OSBS/NAAAR family, we began by identifying species which must have OSBS activity. OSBS catalyzes an intermediate step in the menaquinone (vitamin K₂) biosynthesis pathway, which is essential for electron transport in a wide variety of prokaryotes.²⁷ Because characterized OSBSs are highly divergent, the presence of proteins catalyzing other steps in the menaquinone pathway was used as a marker for species encoding OSBS (Figure 1). First, proteins sharing >40% identity with a characterized menaquinone pathway enzyme were annotated as having that function if the alignment covered >90% of their lengths. This corresponds to BLAST E-values of ∼10⁻⁴⁰–10⁻⁹⁵ using the NCBI non-redundant (nr) protein database. Although using this threshold is expected to produce some error,²³,²⁴ homologs of most menaquinone pathway proteins could only be identified in a few of the species which are known to produce menaquinone,²⁸ suggesting that this threshold is fairly stringent. However, the menB protein, the most highly conserved protein in the pathway (average percent identity of 58%), could be identified in many genomes and served as a marker to identify species likely to encode the menaquinone operon. More distantly related menaquinone pathway proteins were identified by sequence similarity (BLAST E-values <10⁻²⁰ relative to a reliably annotated menaquinone pathway protein using the nr database) and proximity to other menaquinone pathway genes (≤5 non-pathway genes intervening between a pair of menaquinone pathway genes). The combination of sequence similarity and genomic context is expected to identify orthologs, because consecutive enzymes in a pathway are rarely recruited together to function in a different pathway.²⁹ An important exception is that menB and menE both have homologs in carnitine metabolism; however, the homologs of menE in carnitine metabolism were too divergent to meet our criteria.

Figure 1. Menaquinone biosynthesis pathway of E. coli.²⁷ The OSBS reaction is boxed. Compounds are abbreviated as follows: TPP, thiamine pyrophosphate; SHCHC, 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate; OSB, o-succinylbenzoate; CoASH, coenzyme A; DHNA, 1,4-dihydroxy-2-naphthoate; DMK, demethylmenaquinone; SAM, S-adenosylmethionine; and SAH, S-adenosylhomocysteine.

Using these criteria, we identified 127 strains in which at least five of the eight menaquinone pathway genes could be identified (Figure 2; Supplementary Data, Figure 1). In organisms in which most menaquinone pathway genes were identified, some or all are co-localized in the genome and are likely to be coregulated as operons. Gene order is fairly well conserved among the γ-Proteobacteria (the phylum which includes E. coli), Bacteroidetes, and Firmicutes (the phylum which includes B. subtilis). In particular, the menF and menD genes, whose proteins catalyze the first three steps of the pathway, are adjacent in most species within these groups. Among some Cyanobacteria, the menaquinone operon has been almost completely fragmented. Gene order in the Actinobacteria is the least similar to other organisms, and most menaquinone pathway genes are separated by intervening genes; thus, it is unclear whether they are coregulated.

In a number of genomes, only one or two possible menaquinone biosynthesis proteins could be identified. UbiE, a methyltransferase which functions in both menaquinone and ubiquinone synthesis, was identified in many genomes. Of species in which other homologs of menaquinone pathway proteins were found, four were in draft genomes which might encode the menaquinone pathway, and in six a close homolog of menB was found which may be more likely to function in benzoate degradation or fatty acid metabolism. This suggestion is based on the observation that no other menaquinone pathway proteins can be identified in these genomes, adjacent genes are annotated as being in these pathways, and homologs of menB which are difficult to distinguish from menB have been characterized and shown to function in these pathways. The remaining strains encode OSBS homologs, but no other menaquinone pathway proteins could be identified. These OSBS homologs share >40% identity with B. subtilis OSBS or Amycolatopsis OSBS/NAAAR, raising interesting questions about the evolution of new enzyme functions, as discussed below.

In species in which most but not all of the menaquinone pathway proteins were found, the most difficult proteins to identify were OSBS, menH (which is not fully characterized), and ubiE, which catalyzes the final step in the pathway and might not be required in all species.³⁰ As expected from previous work, OSBS is not well conserved, and its gene was difficult to identify in many species, as shown by open arrows in Figure 2. However, possible OSBSs could be identified in all finished and most unfinished genomes in which the menaquinone pathway was identified. In most genomes, a gene encoding an enolase superfamily member is adjacent to a menaquinone pathway gene and is likely to encode OSBS; however, outside of the Firmicutes or γ-Proteobacteria, the sequence similarity of the putative OSBS rarely met our criteria for annotation. Also, in a few genomes there was no OSBS candidate near a menaquinone pathway gene, but a gene encoding an MLE subgroup enzyme of unknown function could be identified elsewhere in the genome. The only genomes in which an OSBS candidate gene could not be identified were unfinished (Bacillus cereus ATCC 14579, Haemophilus influenzae 86028NP, and Salmonella enteritidis); as these species are closely related to E. coli or B. subtilis, it will be surprising if they do not encode an E. coli or B. subtilis-like OSBS, respectively.
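The two annotation rules described in this section can be summarized as a small decision function. The Python sketch below is illustrative only: the `Hit` record, its field names, and the returned labels are hypothetical, and it encodes just the thresholds quoted in the text (>40% identity with >90% alignment coverage, or a BLAST E-value <10⁻²⁰ together with ≤5 intervening non-pathway genes). It is not the authors' actual procedure, which is defined in Materials and Methods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Best hit of a query protein to a menaquinone pathway enzyme (hypothetical record)."""
    percent_identity: float           # percent identity over the alignment
    coverage: float                   # fraction of the protein lengths aligned, 0.0-1.0
    e_value: float                    # BLAST E-value against the reference protein
    intervening_genes: Optional[int]  # non-pathway genes separating the query from the
                                      # nearest pathway gene; None if none is nearby


def annotate(hit: Hit) -> str:
    """Apply the two rules quoted in the text; the labels are illustrative."""
    # Rule 1: high identity over nearly the full length of both proteins.
    if hit.percent_identity > 40.0 and hit.coverage > 0.90:
        return "by_identity"
    # Rule 2: weaker similarity, accepted only with supporting genomic context.
    near_pathway = hit.intervening_genes is not None and hit.intervening_genes <= 5
    if hit.e_value < 1e-20 and near_pathway:
        return "by_similarity_and_context"
    return "unassigned"


print(annotate(Hit(58.0, 0.97, 1e-80, 0)))     # by_identity
print(annotate(Hit(28.0, 0.85, 1e-25, 3)))     # by_similarity_and_context
print(annotate(Hit(28.0, 0.85, 1e-25, None)))  # unassigned
```

Keeping the context rule separate from the identity rule mirrors the reasoning in the text: weak sequence similarity alone is not trusted, but it becomes informative when the gene sits next to other pathway genes.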

Phylogeny of the OSBS/NAAAR family

The difficulty of unequivocally identifying OSBSsbased on sequence similarity and genome context isin agreement with the observation of Palmer et al.that OSBSs are extremely divergent and can share

<15% identity.25 In fact, some putative OSBSs arebarely recognizable as enolase superfamily mem-bers. For instance, sequence similarity searchesusing the OSBS from Bdellovibrio bacteriovorus as aquery identifies another very divergent, putativeOSBS as the best match, but the E-value (0.05) isbarely significant. Thus, we speculated that OSBSactivity might have evolved multiple times withinthe enolase superfamily. To investigate this hypoth-esis and to understand how the NAAAR-likeproteins from organisms lacking menaquinone arerelated to OSBS, we examined the phylogeny of a

Page 5: Evolution of Structure and Function in the o-Succinylbenzoate Synthase/ N-Acylamino Acid Racemase Family of the Enolase Superfamily

232 Evolution of the o-Succinylbenzoate Synthase Family

subset of the enolase superfamily comprised of 288sequences which includes all OSBS candidates, therest of the MLE subgroup, and any other enolasesuperfamily members which could not be assignedto a subgroup or family by Hidden Markov Models(HMMs) created to describe OSBS and other enolasesuperfamily members in the Structure-FunctionLinkage Database (SFLD).31,32 Contrary to our hy-pothesis, the phylogenetic tree of a representativesubset of these sequences demonstrated that allOSBSs and NAAAR-like proteins are included in asingle clade (Figure 3)†. Although the resolution atmany interior nodes is low, the branch confidencevalue separating the OSBS/NAAAR family from therest of the MLE subgroup is 1.00. This resultconfirms that the OSBSs identified by sequencesimilarity and genomic context, including those thatare too divergent to match the MLE subgroup HMMand those that are not encoded near other menaqui-none pathway genes, belong to the OSBS/NAAARfamily. In addition, this result strongly suggests thatthis family had a single evolutionary origin, becauserooting the tree with MLE or L-Ala-D/L-Gluepimerase (AEE), the closest known paralogs ofthe OSBS/NAAAR family,19 leaves the family as amonophyletic group.The other characterized proteins included in the

MLE subgroup phylogenetic tree are MLE and AEE.The characterized AEEs from B. subtilis (aee.Bacsu)and E. coli (aee.Escco) are part of a large clade en-compassing proteins from a diverse set of species,suggesting that these proteins are all AEEs. Howev-er, because the branch support for this clade is nothigh (0.76) and the genomic context of these proteinshas not been thoroughly examined, determiningtheir functions requires more study. Finally, anumber of proteins on the MLE subgroup tree donot cluster with the OSBS/NAAAR, MLE, or AEEfamilies, suggesting that there are several morecatalytic activities within the MLE subgroup re-maining to be discovered.If the OSBS/NAAAR family has a single evolu-

tionary origin (i.e. all family members are ortholo-gous), we expected its phylogenetic tree to be similarto trees built using other proteins or methods. Themain difficulty with comparing phylogenetic trees

† Phylogenetic trees for the MLE subgroup and theOSBS/NAAAR family were also constructed using onlythe capping or barrel domain (data not shown). For thewhole MLE subgroup, trees built using only the barreldomain were nearly identical to trees built using theentire protein, although the resolution was somewhatlower. In contrast, trees built using the capping domain,which is shorter and more divergent than the barreldomain, were highly multifurcating. For the OSBS/NAAAR family, trees constructed using only the cappingor barrel domain were nearly identical to trees built usingthe entire protein, although the resolution was somewhatlower. This data is consistent with the notion that domainshuffling is not likely to have occurred among differentmembers of the MLE subgroup or OSBS/NAAAR family,although it cannot be completely ruled out.

encompassing all menaquinone-producing organ-isms is that it has not been possible to generate fullyresolved prokaryotic evolutionary trees because ofextensive lateral gene transfer, variable evolutionaryrates, mutational saturation, and other factors thatlimit the statistical consistency and resolving powerof phylogenetic methods.33–36 However, compari-sons of more well resolved branches might provideinsight into whether lateral gene transfer or inclu-sion of paralogous proteins contribute to differencesbetween the OSBS/NAAAR family tree and othertrees.We compared the phylogeny of the OSBS/

NAAAR family (Figure 4(a)) to those of the menBand enolase families, which are much more highlyconserved (Figure 4(b), Supplementary Data, Figure2). In spite of the greater divergence of the OSBS/NAAAR family, all three trees had similar topolo-gies and resolution. With a few exceptions, well-resolved branches are in agreement with publishedprokaryotic phylogenies, but higher level clusteringof phyla (such as the reported Deinococcus/Cyano-bacteria/Actinobacteria group) is absent, which isnot unexpected, since these groups become appar-ent only when multiple genes or genome character-istics are used for tree construction.37–43While certain differences among the OSBS/

NAAAR, enolase, menB, and species trees mightbe artifacts of phylogenetic reconstruction, theunusual phylogenetic positions of some proteinsappear to have more biologically interesting expla-nations. For instance, the δ-Proteobacteria Desulfo-talea psychrophila (osbs.Desps) groups withBacteroidetes in both the OSBS/NAAAR andmenB trees, but not in the enolase tree (Supplemen-tary Data, Figure 2). Inspection of the sequencesdemonstrates that the D. psychrophila OSBS andmenB are much more similar to the Bacteroidetesproteins (percent identity >40% and >75%, respec-tively) than Proteobacteria (percent identity ≤26%and ≤62%, respectively). Thus, D. psychrophila OSBSand menB appear to be correctly positioned on thephylogenetic trees, suggesting that the menaqui-none operons of D. psychrophila and Bacteroidetesare related by lateral transfer. Another unusualfeature of the OSBS/NAAAR tree is that theArchaea do not cluster together. In fact, the onlyArchaea that have a menaquinone operon are thetwo Halobacteria (osbs.Halma and osbs.Hal). Al-though they cluster with the Actinobacteria in boththe OSBS/NAAAR and menB trees, the menaqui-none operon structures of the two groups are verydifferent; thus, it is possible that clustering of theActinobacteria and Halobacteria is an artifact ofphylogenetic reconstruction. In either case, it islikely that the Halobacteria attained the menaqui-none operon by lateral transfer, since no otherarchaeon has this pathway.The most striking feature of the OSBS/NAAAR

tree is the placement and taxonomic distribution ofthe NAAAR-like proteins. This cluster of proteinsencompasses not only the Firmicute OSBSs, but alsoproteins from a number of taxonomic groups,

Page 6: Evolution of Structure and Function in the o-Succinylbenzoate Synthase/ N-Acylamino Acid Racemase Family of the Enolase Superfamily

Figure 2. Genomic context ofmenaquinone biosynthesis genes from representative species.All identifiedmenaquinonesynthesis genes are shown as arrows; open arrows indicate provisional assignments, as defined in Materials andMethods.Menaquinone synthesis genes have been aligned to show similarities in gene order; as a result, spaces between genes are notproportional to the length of the DNA separating the genes. Each horizontal segment indicates a contiguousDNA segment.The genomes of some species have multiple chromosomes or have not been completely assembled, as indicated by gapsbetween segments. Hash marks indicate an intervening region encoding >40 genes. Smaller intervening regions are shownas light grey arrows with the number of intervening genes and their orientation on the chromosome indicated. Althoughgene neighborhood in plants does not suggest transcriptional coregulation as it does inmost prokaryotes, genome locationsof menaquinone synthesis genes in Arabidopsis thaliana are shown because two pairs of genes (MenF/MenD and MenC/MenH) are predicted to be gene fusions. Intriguingly, these two fusions are adjacent in the genome, and the gene orderresembles that found inmany bacteria, suggesting that this locus could be a remnant of DNA that was transferred from themitochondrial or chloroplast genome to the nucleus. For the complete list of species used in phylogenetic analysis and theirmenaquinone operons, see Supplementary Data, Figure 1.

233Evolution of the o-Succinylbenzoate Synthase Family

including Deinococcus-Thermus, Actinobacteria,Cyanobacteria, and Archaea. Because the resolutionof this part of the tree is low, we constructed aphylogenetic tree using only sequences in thisgroup, hoping to see a clear distinction betweenOSBS and NAAAR (Figure 5). The topology of theFirmicute OSBS/NAAAR subfamily tree differsslightly from its topology in the whole OSBS/NAAAR tree, but the branch confidence values arehigher, suggesting that this tree might be a betterrepresentation of the subfamily’s phylogeny.The Firmicute OSBS/NAAAR subfamily tree

contains several surprises. First, a few species havea NAAAR-like protein but do not appear to encodethe other menaquinone pathway proteins, suggest-ing that the NAAAR-like proteins have a physio-logical role distinct from menaquinone synthesis inthese species. Second, Erwinia carotovora (unk.Erwcaand osbs.Erwca) and Thermobifida fusca (unk.Thefu

and osbs.Thefu) encode both a NAAAR-like protein(which is not encoded in the menaquinone operon)and an OSBS (which is encoded in the menaquinoneoperon and, unlike their NAAAR-like proteins,clusters with OSBSs of species in the same phylaas these two organisms). This also suggests thatthese NAAAR-like proteins have a physiologicalfunction distinct from OSBS. Third, several speciesdo not encode OSBS in their menaquinone operonsbut have a NAAAR-like protein encoded elsewhere.Conceivably, these function physiologically asOSBS, but they might have an additional functionas well. This seems particularly likely for Oceanoba-cillus iheyensis (unk.Oceih) and Geobacillus kaustophi-lus (unk.Geoka), which cluster with E. carotovoraNAAAR. Likewise, the NAAAR-like proteins fromCrocosphaera watsonii (unk.Crowa) and Chloroflexusaurantiacus (unk.Chlau), which are Cyanobacteriaand Chloroflexi, respectively, cluster most closely

Page 7: Evolution of Structure and Function in the o-Succinylbenzoate Synthase/ N-Acylamino Acid Racemase Family of the Enolase Superfamily

Figure 3. Bayesian phylogenetic tree of the proteins in the MLE subgroup. A representative set of 54 proteins wasselected from the 288-protein subgroup by using only proteins sharing <40% identity. The predicted or verified function isindicated by the prefix osbs, aee, or mleI, and characterized proteins are indicated with an asterisk (*). Proteins ofunknown function are prefixed by unk. OSBS/NAAAR family members are shown in red, characterized AEEs are ingreen, and MLE I is in blue. Other possible AEEs are in gray, but they cluster with the characterized AEEs with onlymoderate statistical support. Proteins of unknown function are in black. Branch confidence values are indicated as filledcircles (≥0.95), open circles (0.7–0.94), or no indication (0.5–0.7).

234 Evolution of the o-Succinylbenzoate Synthase Family

with NAAARs from species that appear to lack the menaquinone pathway (albeit with mediocre branch confidence values), suggesting that these NAAAR-like proteins are required for their OSBS activity and have replaced the original Cyanobacteria or Chloroflexi OSBS.

Given the high sequence similarity of the NAAAR-like proteins (most share >40% identity with B. subtilis OSBS) and the fact that they are found in very distantly related species, it seems likely that this protein has been transmitted by multiple lateral transfer events. For instance, the E. carotovora NAAAR might derive from an ancestor of O. iheyensis and G. kaustophilus, and there may have been separate transfer events to the two groups of Archaea, the euryarchaeote Thermoplasma clade (unk.Theac, unk.Thevo, unk.Ferac, and unk.Picto) and the crenarchaeote Aeropyrum pernix (unk.Aerpe). In summary, the phylogeny of the Firmicute OSBS/NAAAR subfamily does not clearly differentiate between apparently monofunctional OSBSs such as that of B. subtilis and promiscuous OSBS/NAAARs such as that of Amycolatopsis. The presence of NAAAR-like proteins in species lacking the menaquinone pathway suggests that they have an unknown function, perhaps amino acid racemization.

Diversity of the OSBS/NAAAR family

Having performed a comprehensive survey of the distribution of the OSBS/NAAAR family, we were interested in reevaluating the family’s diversity to discover whether it is unusually divergent compared to other protein families, as suggested previously.25 Initially, we compared lengths of OSBS/NAAAR family trees to tree lengths of other families in the menaquinone pathway or enolase superfamily. Tree length (measured as substitutions per site) is expected to be the most accurate measure of


Figure 4 (legend on next page)


Figure 5. Bayesian phylogenetic tree of the proteins in the Firmicute OSBS/NAAAR subfamily. Blue indicates that the OSBS is encoded in a menaquinone operon; green indicates that the OSBS/NAAAR protein is not encoded in a menaquinone operon, but the species has the menaquinone pathway; purple indicates that the NAAAR-like protein is not encoded in the menaquinone operon, but a different OSBS is encoded in the menaquinone operon; red indicates that there is no menaquinone operon detected in the species; gray indicates that the genome sequence is unavailable. Branch confidence values are indicated as in Figure 3.


sequence divergence, because it corrects for multiple substitutions per site. In comparisons of trees built using sequences from the same set of species, the OSBS/NAAAR trees were usually at least twice as long as those of other protein families, indicating that the OSBS/NAAAR family has indeed evolved at a much faster rate (data not shown). However, the topology of the OSBS/NAAAR tree was similar but rarely identical to the topology of trees built using other families, even when using subsets of the OSBS/NAAAR family that are well resolved on the phylogenetic tree.

Because the significance of comparing lengths of trees that have different topologies is uncertain, we also calculated pairwise percent sequence identities, even though these are a more approximate measure of evolutionary distance. Comparisons of OSBSs and menBs from a wide taxonomic distribution agree well with those previously reported, with menB proteins generally sharing >40% identity while OSBSs from the same set of species generally share <30% identity.25 To gain a better perspective concerning the divergence of the OSBS family, we compared minimum and average percent identities of the OSBS family to other families in the enolase superfamily and menaquinone pathway (Table 1; Figure 1). For each comparison, the set of OSBSs and

the set of proteins from the compared family were taken from the same set of species. Compared to other families in the enolase superfamily, the OSBS family is unusually divergent. However, comparison to other proteins in the menaquinone pathway reveals a different picture. Although MenB is extremely well conserved, the sequence divergence of MenD and MenE is more similar to OSBS. On average, the OSBS family is slightly more divergent than the MenD or MenE families, but because percent identity is only a rough approximation of evolutionary distance, it is unclear whether the OSBS family is significantly more divergent than these proteins. Thus, although the OSBS family is unusually divergent for the enolase superfamily, it is less extraordinary compared to other proteins in its pathway.

In addition to being more divergent than other families in the enolase superfamily, the OSBS/NAAAR family is unusual in that it includes proteins catalyzing at least two different reactions. Surprisingly, the NAAAR-like proteins are not among the more divergent proteins in the family, but are closely related to proteins identified as OSBS based on genomic context and experimental evidence.25 As shown above, phylogenetic analysis failed to separate the NAAAR-like proteins into a

Figure 4. Bayesian phylogenetic tree of the proteins in the OSBS/NAAAR and menB families. Branch confidence values are shown as in Figure 3. (a) The OSBS/NAAAR family. To build the tree, the full set of OSBS/NAAAR proteins was filtered to remove proteins sharing >94% identity with any other in the set. Proteins are colored according to phylum,37 and arcs indicate the main subfamilies. Proteins in gray are environmental sequences derived from the Sargasso Sea data set.84 A plus sign (+) indicates NAAAR-like proteins found in strains in which menaquinone synthesis genes could not be identified. An asterisk (*) identifies proteins that are not encoded in menaquinone operons but are found in strains which have the menaquinone pathway. Two of these species (Erwinia carotovora (Erwca) and Thermobifida fusca (Thefu)), encode both an OSBS in the menaquinone operon and a NAAAR-like protein elsewhere in the genome. (b) The menB family. Proteins are colored as in (a).


Table 1. Relative divergence of the OSBS family

                                              Compared family [b]       OSBS [b]
Family for comparison         Number of       Average     Minimum      Average     Minimum
                              species [a]     % identity  % identity   % identity  % identity
Enolase [c]                   66              56          27           26          15
Galactonate dehydratase [c]   8               55          32           31          20
Glucarate dehydratase [c,d]   11              78          66           45          20
AEE [c]                       30              38          24           33          18
MenB                          67              58          35           26          14
MenD                          66              32          21           26          14
MenE                          67              27          14           26          14

[a] OSBSs were compared to proteins from a second family which were taken from the same set of species as the OSBSs.
[b] Percentage identities were calculated as number identical/length of the longer sequence from pair-wise alignments generated by ALIGN.83
[c] Some NAAAR-like proteins not encoded in menaquinone operons are included in the OSBS family.
[d] Glucarate dehydratase related protein, which has an unknown function, was excluded.
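The percent identity definition in the Table 1 footnotes (number identical divided by the length of the longer sequence) can be written out as a short sketch. It assumes the two sequences have already been aligned and that gaps are written as "-"; no alignment is performed here, and the function names are hypothetical rather than part of ALIGN.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap character in the pre-aligned input


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned positions divided by the length of the longer sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    identical = sum(1 for x, y in zip(a, b) if x == y and x != GAP)
    # Ungapped lengths; the longer one is the denominator.
    longer = max(len(a) - a.count(GAP), len(b) - b.count(GAP))
    if longer == 0:
        raise ValueError("both sequences are empty")
    return 100.0 * identical / longer


def average_and_minimum(aligned_pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Average and minimum percent identity over a list of aligned pairs."""
    values = [percent_identity(a, b) for a, b in aligned_pairs]
    if not values:
        raise ValueError("no pairs supplied")
    return sum(values) / len(values), min(values)
```

Dividing by the longer sequence rather than the alignment length means that a short protein aligned to a long one is penalized for the unmatched residues, which is one reason these values are only a rough proxy for evolutionary distance.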


separate clade. In fact, most NAAAR-like proteins which are not encoded in menaquinone operons share >40% identity with B. subtilis OSBS. Only the genomic position of the genes encoding NAAAR-like proteins hints that their function might differ from the menaquinone operon-encoded OSBSs.

Conservation of sequence and structure in the OSBS/NAAAR family

Despite the high sequence divergence of the OSBS/NAAAR family, all proteins in the family form a single clade in the MLE subgroup phylogenetic tree, indicating that there must be conserved sequence information that differentiates this family from the rest of the MLE subgroup. To identify conserved residues specific to the OSBS/NAAAR family, we compared the pattern of sequence conservation among the OSBS/NAAAR, MLE, and AEE families. For this analysis, the OSBS/NAAAR family was treated as a single unit or divided into subfamilies representing clades containing at least five sequences (γ-Proteobacteria, Cyanobacteria, Bacteroidetes, Actinobacteria, and Firmicutes/NAAAR-like proteins), as indicated in Figure 4(a). Except for unk.Thefu (gi23018694 from Thermobifida fusca, discussed below), the NAAAR-like proteins were included with the Firmicute OSBSs because they could not be cleanly separated based on phylogeny or the presence of the menaquinone operon. In addition, the AEEs were divided into two groups comprised of close relatives of characterized E. coli or B. subtilis epimerases because the clade including both groups had poor statistical support on the MLE subgroup phylogenetic tree (Figure 3).

The pattern of sequence conservation is summarized in Figure 6, in which residues conserved in >90% of subfamily members are highlighted in blue, and residues conserved in both >90% of the subfamily and >90% of the entire MLE subgroup are highlighted in black. The only residues conserved throughout the entire MLE subgroup are the catalytic residues in the barrel domain, except for the lysine on barrel domain strand β6 (Bar-β6), which is replaced by tyrosine or arginine in some MLE subgroup members, including one branch of the Cyanobacteria OSBS subfamily. For these Cyanobacteria OSBSs, an arginine at this position might have little effect on catalysis, because the lysine at this position in E. coli OSBS appears to stabilize the enediolate intermediate rather than act as a general acid/base catalyst.44 The other highly conserved residues in the MLE subgroup appear to be involved in maintaining the structure. For instance, the conserved elements of capping domain strand β3 and helix α3 (Cap-β3 and Cap-α3) are adjacent and probably important for capping domain structure, and the glycine before Bar-β6 is located in a tight turn. Other than these residues, the pattern of sequence conservation is somewhat variable. Although some groups appear to have greater numbers of conserved residues, this is mostly because these groups are small (e.g. the Bacteroidetes group) or include sequences of limited diversity (e.g. MLE and AEE groups, in which sequences share >40% identity). In comparison, the Firmicutes/NAAAR-like subfamily includes more divergent sequences; it should be noted that the most divergent sequences in this group (osbs.Staau, osbs.Staep, osbs.Lacla, osbs.Desha, osbs.Leume, and osbs.Exi) are menaquinone operon-encoded OSBSs, not NAAAR-like proteins.

Surprisingly, the results of this analysis indicate that there are no conserved residues shared by all five OSBS/NAAAR subfamilies, other than residues also shared with the rest of the MLE subgroup. Conserved residues within subfamilies are most likely to fall in regions near the active site, either on two loops of the capping domain or on the strands or loops of the barrel domain. Although one or more OSBS/NAAAR subfamilies often have conserved residues at the same position, the identities of those residues are rarely the same. In cases where the residue identity is conserved, the same residue is often present in the MLE or AEE families. Thus, although the OSBS/NAAAR family is phylogenetically unified and most, if not all (including characterized NAAAR-like proteins) catalyze the


Figure 6. Analysis of sequence conservation in the OSBS/NAAAR family. The sequence alignment shows representatives of each of the five OSBS/NAAAR subfamilies, the MLE family, and two AEE subfamilies. The membership of each OSBS/NAAAR subfamily is shown in Figure 4(a), as indicated by the arcs, with the exception that the NAAAR-like T. fusca protein (unk.Thefu) was not included in this analysis. γ-Proteobacteria is represented by OSBS.16130196.Escco; Cyanobacteria by OSBS.33864323.Proma; Bacteroidetes by OSBS.53712611.Bctfr; Actinobacteria by OSBS.17367875.Myctu; and the Firmicute/NAAAR-like protein subfamily by NAAAR.2147746.Amy. The membership of the AEE subfamilies and the MLE family consists of proteins sharing >40% identity with each sequence that is shown. Blue residues indicate conservation in >90% of subfamily members, and black residues indicate conservation in both >90% of the subfamily and >90% of the entire MLE subgroup. Gray numbers indicate the length of segments that are not shown. Secondary structure elements of the capping and barrel domains are indicated by Cap- and Bar-, respectively. Catalytic residues are indicated by a five-pointed star below the sequences (★). Positions of residues lining the active site pocket are indicated for E. coli OSBS (•), Amycolatopsis OSBS/NAAAR (O), and B. bacteriovorus OSBS (✦, sequence not shown). Filled symbols represent residues <5 Å away from bound OSB, and open symbols indicate residues 5 Å to 6 Å away from the ligand. The arrow indicates the position of the glutamate or aspartate to glycine mutation that confers OSBS activity on E. coli AEE or Pseudomonas sp. P51 MLE II.6
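The blue/black coloring rule in this legend can be sketched as a per-column calculation on a multiple sequence alignment. This is an illustrative reading, not the procedure used to prepare the figure: it assumes that gaps count against conservation, that "black" requires the same residue to dominate in both the subfamily and the whole subgroup, and that the subgroup alignment includes the subfamily sequences. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def conserved_columns(
    alignment: List[str], threshold: float = 0.9, gap: str = "-"
) -> Dict[int, str]:
    """Map column index -> residue present in more than `threshold` of sequences."""
    if not alignment:
        return {}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved: Dict[int, str] = {}
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if not counts:
            continue  # column is all gaps
        residue, n = counts.most_common(1)[0]
        # Gaps stay in the denominator, so gapped columns are harder to conserve.
        if n / len(alignment) > threshold:
            conserved[col] = residue
    return conserved


def color_columns(
    subfamily: List[str], subgroup: List[str], threshold: float = 0.9
) -> Dict[int, str]:
    """Label subfamily-conserved columns 'black' if also subgroup-conserved, else 'blue'."""
    sub = conserved_columns(subfamily, threshold)
    grp = conserved_columns(subgroup, threshold)
    return {
        col: ("black" if grp.get(col) == residue else "blue")
        for col, residue in sub.items()
    }
```

Under these assumptions, a small subfamily or one with limited diversity will show more blue columns simply because the >90% threshold is easier to meet, which matches the caution raised in the text about the Bacteroidetes, MLE and AEE groups.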

Figure 7. Structural differences in the active sites of OSBS/NAAAR family proteins. (a) Comparison of OSB binding orientation. Amycolatopsis OSBS/NAAAR (1SJB) is in red, E. coli OSBS (1FHV) in cyan, and B. bacteriovorus OSBS in green. (b) Comparison of the 20s and 50s loop positions in E. coli OSBS and Amycolatopsis OSBS/NAAAR. The native structures are shown in the top panels. In the bottom panels, the capping domain of Amycolatopsis OSBS/NAAAR has been rotated to match the position of the E. coli OSBS capping domain (left), and the E. coli OSBS capping domain has been rotated to match the Amycolatopsis OSBS/NAAAR capping domain (right). Metal binding residues and the metal ion are shown in green, the Bar-β2 lysine that acts as the general base is shown in blue, the Bar-β6 lysine required for catalysis is in purple, and residues on the 20s and 50s loops that contact the ligand are in orange. The carbon from which the proton is abstracted is shown in black.


OSBS reaction, there are no unique OSBS/NAAAR family motifs to differentiate them from other MLE subgroup members.

To understand how substrate specificity is conserved with so little sequence conservation, we compared the structures of E. coli OSBS bound to the substrate or OSB (1FHV and 1R6W), Amycolatopsis OSBS/NAAAR bound to OSB (1SJB), and B. bacteriovorus OSBS bound to OSB (coordinates generously provided by Alexander Fedorov, Elena Fedorov and Dr Steven Almo, Albert Einstein College of Medicine).17,44,45 In all three structures, residues lining the active site pocket are in homologous positions, and these residues tend to be more highly conserved within and between subfamilies than regions distant from the active site (Figure 6). The structures exhibit similar hydrophobic interactions between the benzene ring of OSB and the 50s loop, in which at least one of the residues interacting with ligand is aromatic. Most members of the OSBS/NAAAR family (and many other members of the MLE subgroup) have aromatic residues at one or both positions, suggesting that this hydrophobic pocket is important for ligand binding.

In contrast to these similarities, there are also some striking differences in active site structure, which might contribute to differences in function and inherent evolvability. As previously reported, the conformation of OSB differs in the Amycolatopsis and E. coli enzymes.45 In Amycolatopsis, the succinyl tail of OSB is extended, while it is bent in E. coli and B. bacteriovorus (Figure 7(a)). Likewise, the succinyl or acetyl moieties of N-acylamino acid substrates also lie in extended conformations in Amycolatopsis OSBS/NAAAR. For N-succinyl-methionine, this conformation provides suitable hydrogen bond donors and acceptors, which are unavailable in E. coli OSBS, accounting for the inability of E. coli OSBS to racemize this substrate.45

The second major difference among these structures is the position of the 20s loop (Figure 7(b), top). In spite of its proximity to the active site, the 20s loop is poorly conserved within and between different subfamilies. The lack of conservation might be explained by the necessity of compensatory mutations to accommodate other structural changes, such as shifts in the orientation between the two domains, although there might also be consequences for the catalytic activity (see below). In Amycolatopsis OSBS/NAAAR bound to OSB, the 20s loop contacts the catalytic lysine that acts as a general base (the second lysine in the KXK motif), sandwiching it between the loop and the barrel and orienting it appropriately for proton abstraction. In contrast, the 20s loop of E. coli OSBS bound to either substrate or product does not contact the barrel, leaving the active site slightly open and the catalytic lysine disordered and solvent accessible. Similarly, the catalytic lysine is also solvent accessible in B. bacteriovorus OSBS, although the 20s loop is disordered, even when OSB is bound (data not shown).

In addition to comparing active site structure, we analyzed overall structural differences. Examination of all pairwise superpositions of the three members of the OSBS/NAAAR family and MLE I from Pseudomonas putida (1MUC) revealed significant structural differences (Figure 8; Tables 2 and 3). Intriguingly, Amycolatopsis OSBS/NAAAR is more similar to MLE I than it is to the other OSBSs. While this correlates with the central positions of MLE I and Amycolatopsis OSBS/NAAAR in the MLE subgroup phylogenetic tree (Figure 3), it is remarkable that structural differences are more pronounced between enzymes catalyzing the same reaction than between those catalyzing different reactions. One obvious structural difference among the enzymes is that the capping domains of the OSBSs are poorly aligned. To investigate this difference further, the capping and barrel domains were aligned separately. This resulted in better alignments of both domains, in which structural differences tend to be located at the surfaces of the proteins (Figure 8). Thus, orientation between the domains differs among the OSBS/NAAAR enzymes.

To quantify these differences in domain orientation, we measured the angle of rotation of the capping domain between pairs of structures in which the barrel domain had been superposed. This was accomplished by using sets of structurally aligned residues in the capping domains to define planes representing capping domain orientation and measuring the dihedral angle between these planes. Because defining planes in this manner depends on which residues are used, the calculation was


Figure 7 (legend on previous page)


repeated several times with different sets of residues, revealing similar results (data not shown).

Table 3 shows differences in capping domain orientation between pairs of structures in which the capping or barrel domains are superposed. A slight rotation (3°–4°) between the two domains is observed when comparing structures of E. coli or B. bacteriovorus OSBSs with and without ligand. A


Figure 8. Overall structural differences within the OSBS/NAAAR family. Superpositions of the whole protein are shown at the top, superpositions of the capping domain are shown in the middle, and superpositions of the barrel domain are shown at the bottom. Colored segments show regions where the aligned alpha carbon atoms are >3 Å apart or where there is an insertion in one sequence relative to the other. Segments in yellow correspond to disordered regions in the other structure. Amycolatopsis OSBS/NAAAR (1SJB) is in red, E. coli OSBS (1FHV) in cyan, B. bacteriovorus OSBS in green, and P. putida MLE I (1MUC) in blue.


somewhat higher degree of rotation is observed between Amycolatopsis OSBS/NAAAR and MLE I and between B. bacteriovorus OSBS and MLE I. Because no ligand is bound to MLE I, rotations of this magnitude might reflect conformational differences due to ligand binding as well as slight structural differences between different proteins. In the remaining comparisons, the rotation of the capping domain is significantly higher than that observed for liganded versus unliganded structures of the same protein. In particular, the capping domain of E. coli OSBS is rotated 13.3° or 17.7° relative to those of MLE I and Amycolatopsis OSBS/NAAAR, respectively. We hypothesize that these structural differences might contribute to differences in binding specificity and catalysis among these enzymes, as well as to their capacities to evolve new functions, as discussed below.

In order to understand the consequences of domain orientation on the structure of the active site and the function of the enzymes, we analyzed the effect of twisting the E. coli OSBS capping domain to match the orientation of the Amycolatopsis OSBS/NAAAR capping domain (Figure 7(b), bottom). To do this, the capping and barrel domains were superimposed separately on the Amycolatopsis enzyme. Twisting the E. coli capping domain shifts the 20s and 50s loops ∼6 Å down toward Bar-β2. As a result, the 20s loop is no longer in contact with the ligand. Instead, it now


approaches the catalytic lysine of the KXK motif, which is disordered in the E. coli structures. Having the 20s loop in this position would prevent this lysine from adopting an extended conformation, possibly forcing it into the active site toward the substrate. When the converse experiment is performed and the Amycolatopsis capping domain is twisted to match that of E. coli, the 20s and 50s loops shift ∼6 Å away from the barrel so that the 50s loop is no longer in contact with the ligand. In this position, the 20s loop barely contacts the second lysine of the KXK motif, leaving it mostly exposed to solvent outside the active site.

Although we have only shifted the orientations of the two domains and have not refined the models to ameliorate steric hindrances or reposition loop residues into more favorable conformations, these results suggest that proper orientation of the capping and barrel domains is required for positioning the catalytic lysine for catalysis in Amycolatopsis OSBS/NAAAR. For E. coli OSBS, these results suggest two possibilities. First, perhaps the flexible lysine is resident in the active site often or long enough for catalysis. Second, it is also conceivable that the crystal structures of E. coli OSBS bound to either substrate or product do not capture the structure of the enzyme in the transition state. As in Amycolatopsis OSBS/NAAAR, repositioning the 20s loop through domain rotation or other conformation changes might be required in order to correctly position the lysine for catalysis. The fact that the 20s loop is disordered in B. bacteriovorus OSBS in the presence of ligand provides some support for the latter possibility.

Table 2. Comparison of root-mean-square deviation (RMSD) [a] between pairs of structures

                        Whole           Barrel        Capping
                        structure (Å)   domain (Å)    domain (Å)
1FHV versus 1SJB        3.4 (1.4)       2.4 (1.6)     1.8 (1.0)
1FHV versus 1MUC        3.3 (1.7)       2.2 (1.6)     1.9 (1.2)
1FHV versus BDEBA       4.1 (1.4)       3.2 (1.7)     4.7 (1.5)
BDEBA versus 1SJB       2.7 (1.2)       3.4 (1.6)     2.1 (1.5)
BDEBA versus 1MUC       2.8 (1.3)       2.6 (1.7)     2.0 (1.4)
1SJB versus 1MUC        1.3 (0.7)       1.3 (1.0)     0.7 (0.5)

Abbreviations: 1FHV, E. coli OSBS bound to OSB; BDEBA, B. bacteriovorus OSBS bound to OSB; 1SJB, Amycolatopsis OSBS/NAAAR bound to OSB; and 1MUC, MLE I without ligand.
[a] Calculated using the same number of atoms in each comparison (264 for the whole structure, 163 for the barrel domain, and 98 for the capping domain). Similar trends are observed using fewer atoms to calculate the RMSD (125 for the whole structure, 121 for the barrel domain, and 65 for the capping domain), as shown in parentheses.

Table 3. Comparison of capping domain orientation between pairs of structures

                              Angle with barrel         Angle with capping
                              domains aligned (deg.)    domains aligned [a] (deg.)
1FHV versus 1FHU              3.3                       1.1
BDEBA-OSB versus BDEBA-apo    1.0 (3.7) [b]             0.5 (0.9)
1SJB versus 1MUC              5.5                       0.3
BDEBA-OSB versus 1MUC         4.2                       1.2
BDEBA-apo versus 1MUC         5.2 (7.8)                 1.2 (1.2)
BDEBA-OSB versus 1SJB         8.3                       1.3
1FHV versus BDEBA-OSB         9.5                       1.1
1FHU versus 1MUC              11.8                      0.3
1FHV versus 1MUC              13.3                      1.4
1FHV versus 1SJB              17.7                      1.7

Abbreviations: 1FHV, E. coli OSBS bound to OSB; 1FHU, E. coli OSBS without ligand; BDEBA-OSB, B. bacteriovorus OSBS bound to OSB; BDEBA-apo, B. bacteriovorus OSBS without ligand; 1SJB, Amycolatopsis OSBS/NAAAR bound to OSB; and 1MUC, MLE I without ligand.
[a] As a control for defining planes that reflect domain orientation and not other structural differences, we calculated dihedral angles between planes from superpositions in which only the capping domains were superposed. Since dihedral angles between these planes should approach zero if they reflect domain orientation alone, the set of residues from all structures that minimizes this angle was used to calculate dihedral angles when the barrel domain was aligned. Dihedral angles between aligned capping domains of different proteins were <2° for the residue set that minimizes this angle. These values are comparable to those obtained by comparing liganded versus unliganded structures of E. coli and B. bacteriovorus OSBSs, suggesting that structural differences other than domain rotation marginally affect this measurement.
[b] Because differences in domain orientation could be an artifact of crystal packing, dihedral angles were calculated for all available chains. For 1MUC and 1SJB, this resulted in differences of <0.7° (data not shown). However, the dihedral angle between the two chains of the BDEBA-apo structure was 2.7° when the barrel domains were aligned. Thus, dihedral angles are given for both chains, with those of chain A in parentheses.
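The two structural measures reported in Tables 2 and 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' software: coordinates are assumed to be already superposed, each plane is assumed to be defined by three atoms (for example, the alpha carbons of three structurally aligned capping domain residues, taken in the same order in both structures), and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation over paired atoms of pre-superposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def _sub(p: Point, q: Point) -> Point:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u: Point, v: Point) -> Point:
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _dot(u: Point, v: Point) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def plane_normal(p0: Point, p1: Point, p2: Point) -> Point:
    """Normal vector of the plane through three points (assumed non-collinear)."""
    normal = _cross(_sub(p1, p0), _sub(p2, p0))
    if _dot(normal, normal) == 0.0:
        raise ValueError("points are collinear and do not define a plane")
    return normal


def plane_angle_deg(
    plane_a: Tuple[Point, Point, Point], plane_b: Tuple[Point, Point, Point]
) -> float:
    """Angle in degrees between the oriented normals of two planes."""
    na, nb = plane_normal(*plane_a), plane_normal(*plane_b)
    cosine = _dot(na, nb) / math.sqrt(_dot(na, na) * _dot(nb, nb))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))
```

Because the result depends on which three atoms define each plane, repeating the calculation with several residue sets, as described in the text, is a sensible check on any single value.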

Discussion

Changes in protein structure during evolution

Investigating the evolutionary relationships among the OSBS and NAAAR-like proteins of the enolase superfamily uncovered several surprising observations. The most remarkable are that these proteins exhibit significant structural variation and that sequence motifs unique to the OSBS/NAAAR family, which would distinguish it from other families in the enolase superfamily, could not be identified, in spite of the fact that OSBS activity has been conserved and the family appears to have a single evolutionary origin.

This raises the question of how enzyme specificity can be maintained over the course of evolution. Some structural differences would be expected between Amycolatopsis OSBS/NAAAR and the other two OSBSs, since the Amycolatopsis enzyme has an additional activity. However, structural differences as exemplified by both RMSD and domain orientation are at least as great between E. coli and B. bacteriovorus OSBSs. One way in which specificity might be maintained during evolution is through compensatory mutations and structural flexibility of surface loops that close the active site.11 In the three OSBS/NAAAR family structures, the function of the 50s loop appears to be conserved,


since it is structurally well aligned and forms a hydrophobic binding pocket for the benzene ring (Figure 6). The ring is anchored at one end by the carboxyl group binding to the metal ion and by the 50s loop at the other. Mutations that affect the orientation of the benzene ring could be accommodated by structural reorganization and mutations of the 50s loop, such as the small insertion observed in the Amycolatopsis enzyme.

The 20s loop is also likely to play an important role in maintaining, and perhaps altering, enzyme specificity. In most enolase superfamily members, this loop is disordered in the absence of ligand.13–18 In addition to being less well-conserved than the 50s loop, the 20s loop is not well-aligned in the structures of Amycolatopsis OSBS/NAAAR and E. coli OSBS bound to OSB, and it is disordered in B. bacteriovorus OSBS bound to OSB. The flexibility and apparent mutability of this loop suggest that it could have co-evolved with other sequence and structure elements (such as those determining domain orientation) to maintain substrate binding. In addition, the flexibility of this loop might allow promiscuous binding and reactions with new substrates without impairing OSBS activity, leading to the evolution of new protein functions, such as NAAAR activity.10,11

While the role of flexible loops in maintaining OSBS activity is somewhat speculative, it has also been proposed that structural requirements for catalysis are relatively permissive because the OSBS reaction is highly exergonic and can proceed uncatalyzed at significant rates.25,46 In all three OSBS/NAAAR family structures, interactions with OSB are largely hydrophobic, and most hydrogen bonds are formed with water or residues conserved in the whole MLE subgroup (Alexander Fedorov, Elena Fedorov and Dr Steven Almo, unpublished results).17,45 Thus, it appears that interaction with subgroup-conserved residues is sufficient for correctly orienting the substrate for catalysis, and the only additional requirement is a hydrophobic cavity of an appropriate size and shape. Additional evidence for this is supplied by single point mutations in Pseudomonas sp. P51 MLE II and E. coli AEE which confer OSBS activity on these enzymes.6 These mutations are located at the same position in Bar-β8 and exchange an aspartate or glutamate for a glycine, creating space to accommodate the succinyl tail of OSB if it is bound in the same conformation as in E. coli OSBS (Figures 6 and 7(a)).

The differences in substrate binding and overall structure in only three of the divergent OSBS/NAAAR subfamilies raise the question of how many strategies for substrate binding there might be and whether or not there are additional promiscuous and perhaps biologically relevant activities catalyzed by members of this family. All OSBS/NAAAR subfamilies exhibit variation in length. The regions preceding and following Cap-α3 are especially variable throughout the family, and the entire region, including Cap-α3, is deleted from all members of the Actinobacteria subfamily, except Tropheryma whipplei (osbs.Trowh) (Figure 6, represented by Mycobacterium tuberculosis). In Amycolatopsis OSBS/NAAAR, these regions include helical sections and are at the oligomeric interface of the octamer.45 In contrast, these regions are shorter and not helical in E. coli and B. bacteriovorus OSBSs, which are monomers (A.S. & J.A.G., unpublished data).17 Thus, much of the length variation in these enzymes appears to have altered their oligomeric structure.

The structural differences among members of the OSBS/NAAAR family appear surprisingly large, but it is unknown whether such differences are typical among orthologous proteins. Although a number of orthologous proteins from distantly related species have been characterized structurally, detailed structural comparisons have not always been performed. The structures of several enolases from both eukaryotes and prokaryotes have been solved, and they share much more similarity than the OSBS/NAAAR family proteins do, which correlates with their much higher sequence conservation (RMSD=0.5 Å–0.6 Å when 265 atoms are aligned between enolases from five different species; see Tables 1 and 2 for comparison).13,47–51 In contrast, the ribonucleotide reductase family is more mechanistically and structurally divergent than the OSBS/NAAAR family. Whereas the mechanism for OSBS synthesis has been conserved in the OSBS/NAAAR family, the ribonucleotide reductases can be divided into three classes which employ different means of radical generation, have slightly different substrate preferences, and share <10% sequence identity even though they catalyze the same general reaction utilizing a conserved cysteine radical.52,53 These proteins have a common ten-stranded α/β barrel, but each class has different insertions and deletions resulting in larger structural variations than observed in the OSBS/NAAAR family, which has limited numbers of insertions and a conserved bi-domain structure. A more thorough analysis of allowable structural variation in other protein families will be required to determine to what degree structural variations reflect functional divergence in different protein families.

Changes in protein function during evolution

In addition to the surprising structural variations in the OSBS/NAAAR family, we have discovered that at least one new function has apparently evolved within the family and has been transmitted by lateral transfer to diverse species (see below). Recently, we have begun characterizing other NAAAR-like proteins and have discovered that those from Deinococcus radiodurans, Thermus thermophilus, and Geobacillus kaustophilus are also promiscuous (A.S. & J.A.G., unpublished results). In addition, the genes for these proteins are adjacent to succinyltransferase genes, suggesting that racemization of N-succinylamino acids is their biological function.54 While D. radiodurans and T. thermophilus do not appear to have the menaquinone pathway, a gene encoding OSBS is missing from G. kaustophilus’s menaquinone operon, suggesting that its NAAAR-like protein might also function in the menaquinone pathway. Other NAAAR-like proteins found in organisms whose menaquinone operons are missing an OSBS gene might also have two biologically relevant activities. Some of these, such as the NAAAR-like proteins from E. carotovora and O. iheyensis, are found in species that appear to lack succinyltransferases, suggesting that they might function only as OSBSs or have additional, unknown functions.

The identification of a whole set of related, promiscuous proteins strongly supports the role of promiscuity in protein evolution and the idea that new activities can evolve prior to gene duplication.1–10 The fact that several NAAAR-like proteins are promiscuous and the strong statistical support for their position in the phylogenetic tree support a scenario in which racemization activity arose in a Firmicute ancestor. The converse hypothesis that OSBS activity evolved in an ancestral racemase seems less likely given that OSBS is more widespread, more divergent and plays an essential metabolic role in many prokaryotes. The possibility that the ancestor of the entire OSBS/NAAAR family was promiscuous for the two activities cannot be ruled out, however.

Although sequence and phylogenetic analysis could not separate most NAAAR-like proteins from operon-encoded OSBSs, one sequence does stand out. This protein (unk.Thefu, gi23018694) is found in the Actinobacterium T. fusca, which has both this NAAAR-like protein and an operon-encoded Actinobacteria-like OSBS. What makes this protein unique is that, although it shares 35% identity with Amycolatopsis OSBS/NAAAR, its catalytic residues differ from all other members of the MLE subgroup. Instead of the conserved KXK motif on Bar-β2, this protein has RLH; DGG replaces DXN on Bar-β3, and there is an arginine instead of the conserved lysine on Bar-β6. Because arginine is expected to be a poor general base, it seems unlikely that this enzyme is a racemase, which would require a base on both sides of the active site. The presence of histidine instead of lysine on Bar-β2 also suggests that this protein might have a different activity. Thus, it appears that at least three activities may have evolved within the OSBS/NAAAR family.

Evolvability of the NAAAR-like proteins

Consideration of the structural and functional variation between the Firmicute/NAAAR-like subfamily and other apparently monofunctional OSBSs prompts the question of what structural differences contribute to the ostensible functional evolvability of the Firmicute/NAAAR-like subfamily. While it is possible that any of the other OSBSs might have uncharacterized, promiscuous activities, one or more new activities appear to have evolved in the Firmicute/NAAAR-like subfamily. In addition, the stronger structural similarity between Amycolatopsis OSBS/NAAAR and MLE compared to E. coli OSBS raises the possibility that the structural configuration of the Amycolatopsis enzyme could be more suitable for evolving new functions. Alternatively, its similarity to MLE might also reflect similarities in their quaternary structures, since both Amycolatopsis OSBS/NAAAR and MLE are octamers, while E. coli and B. bacteriovorus OSBSs are monomers (A.S. & J.A.G., unpublished data).17,45,55

If its higher structural similarity to proteins outside the OSBS/NAAAR family reflects its suitability as a scaffold for evolving new functions, what structural features of Amycolatopsis OSBS/NAAAR might facilitate this? The major structural difference between Amycolatopsis OSBS/NAAAR and E. coli OSBS is the orientation of the capping domain and its effect on the position of the 20s loop. Whereas the 20s loop contacts and orients the catalytic lysine in Amycolatopsis OSBS/NAAAR, it is more distant from the ligand in E. coli OSBS, leaving the catalytic lysine disordered and exposed to solvent. This structural difference might affect the kinetic constants of the two enzymes. The kcat value of Amycolatopsis OSBS/NAAAR is tenfold higher than that of E. coli OSBS, possibly because the catalytic lysine is held in a more appropriate position. However, the kcat/Km value of the Amycolatopsis enzyme is sixfold lower.25 Inasmuch as Km reflects the strength of substrate binding, this suggests that substrate affinity is as much as 40-fold higher for E. coli OSBS. Thus, E. coli OSBS might be evolutionarily optimized to maximize the strength of substrate binding; mutations that alter substrate binding might be more deleterious relative to similar mutations in the Amycolatopsis enzyme, rendering E. coli OSBS less likely to gain promiscuous functions during evolution and therefore less likely to be a source for novel enzymatic activities. In contrast, the position of the 20s loop in Amycolatopsis OSBS/NAAAR might be optimized to maximize kcat by clamping down on the catalytic lysine to hold it in position and more effectively close the active site. This could result in relaxation of constraints on substrate binding such that decreases in substrate affinity would remain within physiological tolerances. As a result, the interior size and shape of the active site could evolve to allow promiscuous binding and catalysis, leading to the evolution of new protein functions. Thus, the relatively minor alterations in the size, shape and potential hydrogen bond donors and acceptors between Amycolatopsis OSBS/NAAAR and E. coli OSBS may not have been sufficient to allow productive binding and catalysis of N-acylamino acids in the Amycolatopsis enzyme by themselves. Instead, other aspects of the protein structure including domain orientation and the position of the 20s loop may have also been required in order to produce an “evolvable” environment which could tolerate these mutations. This argument is speculative and assumes that the conformation of the capping domain in the E. coli OSBS structures is not an artifact of crystallization, but reflects either the catalytically competent form of the enzyme or a more stable form of the enzyme which requires a conformation change to reach the transition state, resulting in a lower apparent kcat value relative to that of Amycolatopsis OSBS/NAAAR. It might be possible to test this hypothesis by experimental evolution and by identifying mutations that affect substrate binding or the position of the 20s loop and studying their effects on catalysis.

Although some of the characteristics suggesting that the Firmicute/NAAAR subfamily might be particularly evolvable are specific to this subfamily, a number of them may be more generally applicable to other protein families. First, more highly evolvable proteins are expected to share more similarities with functionally different proteins in their superfamily than less evolvable proteins, as manifested by a central location in their superfamily’s phylogenetic tree and structural similarities with functionally different proteins in their superfamily. Second, there might be enzymatic traits such as highly optimized kcat values, which enhance the likelihood that highly evolvable proteins are promiscuous. Examining these characteristics in the OSBS/NAAAR family as well as expanding these ideas to other superfamilies will be necessary to test these hypotheses. In addition, such detailed analysis of structure–function relationships might be extremely valuable for identifying scaffolds that are particularly amenable to protein engineering.56,57

Lateral gene transfer in the OSBS/NAAAR family

Phylogenetic comparison of the OSBS/NAAAR, menB, and enolase families revealed several probable instances of lateral gene transfer. Lateral gene transfer is a major driving force of prokaryotic genome evolution, accounting for the origin of as much as 15% of the genes in some species.58–63 Evidence for lateral gene transfer has been inferred from nucleotide composition, codon bias, unusual species distributions of genes, sequence similarity, and phylogenetic analysis, which is considered the most robust method.61,62 Although extensive statistical comparisons of phylogenetic trees are beyond the scope of this paper, several instances of lateral gene transfer in the OSBS/NAAAR family are supported by phylogeny, species distributions of genes, and sequence similarity. For example, the Halobacteria are the only Archaea that encode the menaquinone operon, and the gene order of the operon resembles that of menaquinone operons in other species (Figures 2 and 4(a)). Lateral transfer of this operon in Halobacterium sp. NRC-1 (osbs.Hal) was previously detected by sequence similarity to bacterial proteins and anomalous nucleotide composition.64 The menaquinone operon of the δ-Proteobacteria D. psychrophila also appears to have been the donor or recipient of lateral transfer, since both its OSBS and menB cluster with the Bacteroidetes.

The most compelling examples of lateral gene transfer are among the NAAAR-like proteins. These cluster with the Firmicute OSBSs but are found in extremely distant species. In addition, the Firmicute/NAAAR-like subfamily exhibits relatively high levels of sequence identity, averaging 36% identity (41% excluding OSBSs encoded in menaquinone operons), compared to 26% for the whole OSBS/NAAAR family. The NAAAR-like proteins are found in γ-Proteobacteria, Cyanobacteria, Actinobacteria, Deinococcus/Thermus, Chloroflexi, and Archaea, suggesting that this protein has been transferred multiple times. The fact that the archaeal sequences do not cluster together suggests that the proteins were transferred in separate events.

Characterized NAAAR-like proteins exhibit both NAAAR and OSBS activity (A.S. & J.A.G., unpublished results), and some of these might be required to perform both functions in vivo. For instance, the cyanobacterium Crocosphaera watsonii does not encode a cyanobacterial OSBS, but it does encode a NAAAR-like protein, which may be required for both NAAAR and OSBS activities and could have replaced the original cyanobacterial OSBS. Two other species, the actinobacterium Thermobifida fusca and the γ-proteobacterium Erwinia carotovora, have complete menaquinone operons encoding actinobacterial or γ-proteobacterial OSBSs, respectively, in addition to a NAAAR-like protein. While the NAAAR-like protein of E. carotovora might have OSBS activity, like other characterized NAAAR-like proteins, it is possible that the T. fusca NAAAR-like protein lacks OSBS activity because of mutations in the catalytic residues discussed above.

Rejecting lateral gene transfer as an explanation for these observations would imply that a NAAAR-like protein was present in the common ancestor of Archaea and Bacteria. Explaining the current distribution of this protein would require an enormous number of gene loss events (suggesting that the protein is not essential in most environments), in spite of the seemingly contradictory observation that the sequences are well-conserved (suggesting that selection has acted to maintain the protein’s function). Thus, lateral gene transfer is the more parsimonious explanation.61

Ramifications for structure and function prediction in genomics

Two important contributions of genomics are to correctly annotate protein functions and to identify proteins of unknown structure and function whose characterization will enhance biological understanding. As noted previously and shown here, simple sequence metrics are often inadequate for predicting protein function.23,24 Perusal of GenBank annotations of the OSBS/NAAAR family reveals that only 60% are correctly annotated (43% excluding proteins misleadingly annotated as “o-succinylbenzoate-CoA synthases”)‡. While only 7% of these annotations are completely incorrect, the remainder are incomplete or somewhat misleading, often assigning OSBS/NAAAR proteins to the wrong family or subgroup of the enolase superfamily. For example, several proteins are incorrectly annotated as muconate or chloromuconate cycloisomerases. Many others are annotated as “COG4948: L-alanine-DL-glutamate epimerase and related enzymes of enolase superfamily”, which correctly relates them to the MLE subgroup but also implies an incorrect function.

Functional annotation of the OSBS/NAAAR family is difficult for two reasons. First, some members of the family are so divergent that sequence similarity cannot be used to distinguish them. Outliers such as the B. bacteriovorus OSBS could only be identified using a combination of genomic context, phylogenetic analyses, and ultimately experimental validation. Second, the NAAAR-like proteins could not be separated from the OSBSs based on sequence similarity or position in the phylogenetic tree. Instead, their main characteristics are that they are closely related to Amycolatopsis OSBS/NAAAR and they are not encoded in menaquinone operons.

Given such complexities, it is not surprising that automated annotation methods have had so much difficulty with this family. The orthogonal information furnished by phylogenetic reconstruction and analysis of genome context not only provides stronger confidence in functional annotation, but it is also invaluable for identifying proteins whose functions cannot be predicted with certainty. Similarly rigorous application of these methods will probably be required for accurate annotation of other protein families that exhibit high sequence, structural, and functional divergence.

Detailed studies of the sort undertaken here are also useful for identifying candidates for experimental characterization and structural genomics projects. Not only is there significant functional diversity in the OSBS/NAAAR family, but we also discovered significant structural variation among the family’s three crystallized members. As discussed above, it is expected that several other subfamilies, especially the Actinobacteria subfamily, also exhibit structural variations. Solving the three-dimensional structures of representatives of other subfamilies will be valuable for understanding allowable variations in protein–substrate interactions in isofunctional proteins. In addition, our current and future studies of the structure and function of the NAAAR-like proteins will help elucidate how new protein functions evolve. Although our strategy is more labor-intensive than purely automated methods of target selection for structural genomics projects, it provides more context for understanding structure–function relationships and evolutionary mechanisms.

‡ A detailed and systematic study of misannotation in the enolase and other superfamilies is currently underway in our laboratory, and the corrected annotations will be incorporated into the Structure-Function Linkage Database (SFLD).32

Concluding remarks

Our analysis of the OSBS/NAAAR family revealed several insights into how protein function and structure evolve. First, highly divergent protein families can exhibit significant structural variations. Second, enzyme specificity can be maintained in spite of limited sequence conservation among ligand-contacting residues. Third, new activities can evolve through promiscuous intermediates, and there might be structural features of proteins that make them more or less prone to evolve promiscuous activities. Few analyses of protein structure, function, and evolution have been performed in this depth; thus, extending these studies to other protein families will be important for testing the generality of these conclusions.

Materials and Methods

Identification of menaquinone pathway genes

Menaquinone biosynthesis genes were identified in complete and incomplete genomes using the Seed Annotation and Analysis Tool from the Fellowship for Interpretation of Genomes (FIG).65 Genes were initially annotated as menaquinone pathway genes if the percent identity of a pairwise protein alignment covering >90% of the length of a characterized menaquinone pathway protein was >40%. Experimentally characterized menaquinone pathway proteins include all pathway proteins from E. coli; menB, menC, menD, menE, and menF from B. subtilis; ubiE from Geobacillus stearothermophilus; and menA and menB from Synechocystis sp. PCC 6803 (Figure 1).25,27,66–71 As a second criterion, genes were annotated as encoding a menaquinone pathway protein if they were five or fewer genes distant from another menaquinone pathway gene and their proteins had BLAST expectation values <10⁻²⁰ relative to reliably annotated menaquinone pathway proteins when searching the nr database. Most of the remaining genes were provisionally assigned functions if their proteins share ∼25%–40% identity with a characterized menaquinone pathway protein and nearly all proteins identified as being similar (BLAST E-values <10⁻⁵ using the nr database) are annotated as having that function.
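The three criteria above can be read as an ordered decision rule. The following Python sketch is only an illustration of that reading, not the procedure actually run in the Seed tool: the `Candidate` fields and the `classify` function are hypothetical names, and the 0.9 cutoff used for "nearly all" similar proteins is an assumption, since the text gives no number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Evidence gathered for one candidate gene (all field names are hypothetical)."""
    percent_identity: float                   # identity to a characterized pathway protein
    coverage: float                           # fraction of that protein's length aligned
    genes_to_nearest_men_gene: Optional[int]  # None if no pathway gene is nearby
    evalue_vs_reliable: float                 # best BLAST E-value vs reliably annotated proteins
    fraction_hits_same_function: float        # share of hits (E < 1e-5) with that annotation


def classify(c: Candidate, nearly_all: float = 0.9) -> str:
    """Apply the three criteria in the order the text gives them."""
    # Criterion 1: >40% identity over >90% of a characterized protein.
    if c.coverage > 0.90 and c.percent_identity > 40.0:
        return "annotated (identity)"
    # Criterion 2: five or fewer genes from a pathway gene and E-value < 1e-20.
    if (c.genes_to_nearest_men_gene is not None
            and c.genes_to_nearest_men_gene <= 5
            and c.evalue_vs_reliable < 1e-20):
        return "annotated (genome context)"
    # Criterion 3: roughly 25-40% identity and nearly all similar proteins agree.
    # The `nearly_all` threshold is an assumed value, not taken from the text.
    if 25.0 <= c.percent_identity <= 40.0 and c.fraction_hits_same_function >= nearly_all:
        return "provisional"
    return "unassigned"
```

Ordering the checks this way means a gene that satisfies the identity criterion is never downgraded to a provisional assignment, which matches the phrase "most of the remaining genes" in the third criterion.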

Identification of MLE subgroup members

The initial enolase superfamily data set was downloaded from the Structure-Function Linkage Database (SFLD).31,32 Additional superfamily members were identified using a subset of the superfamily filtered to include only proteins sharing <35% identity as input for Shotgun.72 This program performs a BLAST search73 of each input sequence and outputs a score indicating the number of input sequences that find a given BLAST hit, allowing homologs which have barely significant BLAST E-value scores to be identified. These sequences were then manually screened to remove fragments and to verify that they contained the canonical catalytic residues of the enolase superfamily. The final enolase superfamily data set was compared to HMMs from the SFLD to classify sequences into subgroups and isofunctional families. All further analyses were performed using protein sequences matching the MLE subgroup HMM with expectation values <10⁻¹⁸ and any other enolase superfamily sequences that could not be classified into a subgroup or family by the HMMs.
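The scoring idea described for Shotgun, counting how many input sequences retrieve a given hit, can be sketched in a few lines of Python. This is a toy illustration of the counting step only and is not the Shotgun program itself; the function name `shotgun_scores` and the identifiers in the example are made up, and the BLAST searches are assumed to have been run already.

```python
from collections import Counter
from typing import Dict, Iterable


def shotgun_scores(hits_by_query: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each database sequence, how many input queries retrieved it."""
    scores: Counter = Counter()
    for query, hits in hits_by_query.items():
        # Using a set ensures one query contributes at most one vote per hit.
        for hit in set(hits):
            if hit != query:  # ignore a query finding itself
                scores[hit] += 1
    return scores


# Toy input: three queries and the identifiers each one retrieved (made-up names).
example = {
    "queryA": ["seq1", "seq2"],
    "queryB": ["seq2", "seq3", "seq2"],
    "queryC": ["seq2"],
}
scores = shotgun_scores(example)
```

A hit found by many divergent queries is a stronger homolog candidate than its individual E-values suggest, which is why a count across queries can surface sequences that any single search would score as barely significant.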

Phylogenetic analysis

The MLE subgroup and outlying enolase superfamily members were aligned using Muscle (v.3.52).74 The initial alignment was manually refined using structural alignments of muconate lactonizing enzyme (1MUC), L-Ala-D/L-Glu epimerase (1JPM and 1JPD), N-acylamino acid racemase (1SJB and 1XS2), and OSBS (1FHV and B. bacteriovorus OSBS). Structural alignments were generated by MinRMS75 and the structure matching and alignment feature of UCSF Chimera from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIH P41 RR-01081).76 Phylogenetic reconstruction was performed using Bayesian and distance methods. Bayesian trees were constructed with MrBayes (v3.1.1)77,78 under the WAG amino acid substitution model79 using a gamma distribution to approximate rate variation among sites. Distance trees were constructed using the NEIGHBOR program in PHYLIP80 under the JTT amino acid substitution model81 and a gamma distribution of rate variation among sites using the alpha parameter estimated in the Bayesian analysis. Trees produced by the two methods were similar, although the Bayesian method produced trees with higher resolution and branch confidence values. Accession numbers of sequences and species abbreviations used for phylogenetic analysis are listed in Supplementary Data, Tables 1, 2, 3, 4. In general, species names are abbreviated using the first three letters of the genus and first two letters of the species. The strain is indicated if multiple strains of the same species were used in the analysis, and Bacteroides is abbreviated with “Bct” to avoid confusion with Bacillus.
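The abbreviation rule stated above is simple enough to express directly. The short Python sketch below is illustrative only: the function name `abbreviate` is hypothetical, and both the underscore used to attach a strain and the assumption that "Bct" is followed by the usual two species letters are guesses, since the text does not specify them.

```python
def abbreviate(genus: str, species: str, strain: str = "") -> str:
    """First three letters of the genus plus the first two of the species."""
    # Bacteroides is written "Bct" so it cannot be confused with Bacillus ("Bac").
    prefix = "Bct" if genus == "Bacteroides" else genus[:3].capitalize()
    label = prefix + species[:2].lower()
    # The strain is added only when several strains of one species are analyzed;
    # the "_" separator is an assumption.
    return f"{label}_{strain}" if strain else label
```

Under this reading, Tropheryma whipplei gives "Trowh" and Thermobifida fusca gives "Thefu", matching the labels osbs.Trowh and unk.Thefu used in the text.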

Sequence and structural analysis

Sequence conservation was analyzed by comparing the aligned OSBS/NAAAR, MLE, and AEE families. Family assignments of MLE and AEE proteins were taken from the SFLD, which uses HMMs and information from the literature to assign proteins to families. Conserved positions were defined as those in which >90% of family or subfamily members have the same amino acid residue. Phenylalanine and tyrosine or aspartate and glutamate were treated as equivalent. Conserved residues were mapped onto the structures of 1FHV (E. coli OSBS) and 1SJB (Amycolatopsis OSBS/NAAAR) in Chimera.77
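The conservation criterion can be sketched as a column-wise count over an alignment. The Python below is a minimal illustration under stated assumptions rather than the analysis code used for the paper: the function name `conserved_positions` is hypothetical, sequences are assumed to be equal-length strings with "-" for gaps, and gaps are counted against conservation, which the text does not address.

```python
from collections import Counter
from typing import List

# Residues the text treats as interchangeable: Phe/Tyr and Asp/Glu.
EQUIVALENT = {"Y": "F", "E": "D"}


def conserved_positions(alignment: List[str], threshold: float = 0.90) -> List[int]:
    """Return 0-based columns where more than `threshold` of sequences share a residue."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    conserved = []
    for col in range(len(alignment[0])):
        # Map residues to their equivalence class. Gaps are dropped here but the
        # denominator stays n_seqs, so a gap counts against conservation
        # (an assumption, not stated in the text).
        residues = [EQUIVALENT.get(seq[col], seq[col])
                    for seq in alignment if seq[col] != "-"]
        if not residues:
            continue
        _, count = Counter(residues).most_common(1)[0]
        if count / n_seqs > threshold:
            conserved.append(col)
    return conserved
```

For example, in the toy alignment `["KDF", "KEY", "KDA", "K-F"]` only the first column passes the 90% cutoff; the second and third columns each reach 75% once D/E and F/Y are merged.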

Structural superpositions of the whole proteins, capping domains, and barrel domains of 1SJB, 1FHV, B. bacteriovorus OSBS, and 1MUC were generated from the structure-based sequence alignment of the MLE subgroup using the Match feature of Chimera76 or Combinatorial Extension (CE).82 To quantify differences in the orientation between the barrel and capping domains, barrel domains of each structure were first superimposed relative to 1SJB. Then, a plane was fit to each capping domain using the alpha carbon atoms of specific sets of residues that are closely aligned in structural superpositions of the capping domains. The dihedral angles between these planes were calculated in Chimera to measure differences in relative rotation between the capping and barrel domains of each pair of superimposed structures.
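The plane-fitting step can be illustrated with a small, dependency-free Python sketch. It is not Chimera's implementation and makes several assumptions: coordinates are assumed to be already superimposed on the barrel domain, the function names are hypothetical, a least-squares plane is used, and the result is the acute angle between the two planes, which may differ in convention from the value Chimera reports.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def plane_normal(points: Sequence[Vec], iterations: int = 500) -> Vec:
    """Unit normal of the least-squares plane through non-degenerate 3D points."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    # 3x3 scatter matrix of the centred coordinates.
    cov = [[sum((p[i] - centroid[i]) * (p[j] - centroid[j]) for p in points)
            for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # The normal is the eigenvector of cov with the smallest eigenvalue; power
    # iteration on (trace*I - cov) converges to it. The start vector is arbitrary
    # and would fail only if it were exactly perpendicular to the normal.
    shifted = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
               for i in range(3)]
    v = [0.3, 0.5, 0.8]
    for _ in range(iterations):
        w = [sum(shifted[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return (v[0], v[1], v[2])


def angle_between_planes(a: Sequence[Vec], b: Sequence[Vec]) -> float:
    """Acute angle in degrees between the best-fit planes of two point sets."""
    na, nb = plane_normal(a), plane_normal(b)
    # A normal's sign is arbitrary, so use the absolute value of the dot product.
    cos_angle = min(1.0, abs(sum(x * y for x, y in zip(na, nb))))
    return math.degrees(math.acos(cos_angle))


# Synthetic check: a flat grid and the same grid tilted by 30 degrees about x.
flat = [(float(x), float(y), 0.0) for x in range(4) for y in range(4)]
tilt = math.radians(30.0)
tilted = [(x, y * math.cos(tilt), y * math.sin(tilt)) for x, y, _ in flat]
angle = angle_between_planes(flat, tilted)  # approximately 30 degrees
```

In practice the point sets would be the alpha carbon coordinates of the closely aligned capping-domain residues; the synthetic grids here only confirm that the geometry behaves as expected.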

Acknowledgements

We thank Alexander Fedorov, Elena Fedorov, and Dr Steven Almo for providing unpublished structure coordinates and Dr Scott Pegg, Kai-Yeung Lau, Dr Elaine Meng and Eric Pettersen for their assistance. This work was supported by National Institutes of Health grants GM60595 (to P.C.B.) and P01071790 and GM52594 (to J.A.G.). M.P.J. acknowledges start-up funds provided by HHMI Biomedical Research Support Program grant 5300246 to the UCSF School of Medicine. M.E.G. is supported by a postdoctoral fellowship in informatics from the Pharmaceutical Researchers and Manufacturers of America.

Supplementary Data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2006.04.055

References

1. Ycas, M. (1974). On earlier states of the biochemical system. J. Theor. Biol. 44, 145–160.

2. Jensen, R. A. (1976). Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol. 30, 409–425.

3. O’Brien, P. J. & Herschlag, D. (1999). Catalytic promiscuity and the evolution of new enzymatic activities. Chem. Biol. 6, R91–R105.

4. Gerlt, J. A. & Babbitt, P. C. (2001). Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Annu. Rev. Biochem. 70, 209–246.

5. Matsumura, I. & Ellington, A. D. (2001). In vitro evolution of beta-glucuronidase into a beta-galactosidase proceeds through non-specific intermediates. J. Mol. Biol. 305, 331–339.

6. Schmidt, D. M. Z., Mundorff, E. C., Dojka, M., Bermudez, E., Ness, J. E., Govindarajan, S. et al. (2003). Evolutionary potential of (β/α)8-barrels: functional promiscuity produced by single substitutions in the enolase superfamily. Biochemistry, 42, 8387–8393.

7. Copley, S. D. (2003). Enzymes with extra talents: moonlighting functions and catalytic promiscuity. Curr. Opin. Chem. Biol. 7, 265–272.

8. Hughes, A. L. (1994). The evolution of functionally novel proteins after gene duplication. Proc. Biol. Sci. 256, 119–124.

9. Schultes, E. A. & Bartel, D. P. (2000). One sequence, two ribozymes: implications for the emergence of new ribozyme folds. Science, 289, 448–452.

10. Aharoni, A., Gaidukov, L., Khersonsky, O., Mc, Q. G. S., Roodveldt, C. & Tawfik, D. S. (2005). The ‘evolvability’ of promiscuous protein functions. Nature Genet. 37, 73–76.


11. James, L. C. & Tawfik, D. S. (2003). Conformational diversity and protein evolution–a 60-year-old hypothesis revisited. Trends Biochem. Sci. 28, 361–368.

12. Gerlt, J. A., Babbitt, P. C. & Rayment, I. (2005). Divergent evolution in the enolase superfamily: the interplay of mechanism and specificity. Arch. Biochem. Biophys. 433, 59–70.

13. Lebioda, L. & Stec, B. (1988). Crystal structure of enolase indicates that enolase and pyruvate kinase evolved from a common ancestor. Nature, 333, 683–686.

14. Neidhart, D. J., Howell, P. L., Petsko, G. A., Powers, V. M., Li, R. S., Kenyon, G. L. & Gerlt, J. A. (1991). Mechanism of the reaction catalyzed by mandelate racemase. 2. Crystal structure of mandelate racemase at 2.5-Å resolution: identification of the active site and possible catalytic residues. Biochemistry, 30, 9264–9273.

15. Landro, J. A., Gerlt, J. A., Kozarich, J. W., Koo, C. W., Shah, V. J., Kenyon, G. L. et al. (1994). The role of lysine 166 in the mechanism of mandelate racemase from Pseudomonas putida: mechanistic and crystallographic evidence for stereospecific alkylation by (R)-alpha-phenylglycidate. Biochemistry, 33, 635–643.

16. Wedekind, J. E., Poyner, R. R., Reed, G. H. & Rayment, I. (1994). Chelation of serine 39 to Mg2+ latches a gate at the active site of enolase: structure of the bis(Mg2+) complex of yeast enolase and the intermediate analog phosphonoacetohydroxamate at 2.1-Å resolution. Biochemistry, 33, 9333–9342.

17. Thompson, T. B., Garrett, J. B., Taylor, E. A., Meganathan, R., Gerlt, J. A. & Rayment, I. (2000). Evolution of enzymatic activity in the enolase superfamily: structure of o-succinylbenzoate synthase from Escherichia coli in complex with Mg2+ and o-succinylbenzoate. Biochemistry, 39, 10662–10676.

18. Gulick, A. M., Hubbard, B. K., Gerlt, J. A. & Rayment, I. (2000). Evolution of enzymatic activities in the enolase superfamily: crystallographic and mutagenesis studies of the reaction catalyzed by D-glucarate dehydratase from Escherichia coli. Biochemistry, 39, 4590–4602.

19. Babbitt, P. C., Hasson, M. S., Wedekind, J. E., Palmer, D. R., Barrett, W. C., Reed, G. H. et al. (1996). The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids. Biochemistry, 35, 16489–16501.

20. Devos, D. & Valencia, A. (2000). Practical limits of function prediction. Proteins: Struct. Funct. Genet. 41, 98–107.

21. Wilson, C. A., Kreychman, J. & Gerstein, M. (2000). Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J. Mol. Biol. 297, 233–249.

22. Todd, A. E., Orengo, C. A. & Thornton, J. M. (2001). Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307, 1113–1143.

23. Rost, B. (2002). Enzyme function less conserved than anticipated. J. Mol. Biol. 318, 595–608.

24. Tian, W. & Skolnick, J. (2003). How well is enzyme function conserved as a function of pairwise sequence identity? J. Mol. Biol. 333, 863–882.

25. Palmer, D. R., Garrett, J. B., Sharma, V., Meganathan, R., Babbitt, P. C. & Gerlt, J. A. (1999). Unexpected divergence of enzyme function and sequence: “N-acylamino acid racemase” is o-succinylbenzoate synthase. Biochemistry, 38, 4252–4258.

26. Taylor Ringia, E. A., Garrett, J. B., Thoden, J. B., Holden, H. M., Rayment, I. & Gerlt, J. A. (2004). Evolution of enzymatic activity in the enolase superfamily: functional studies of the promiscuous o-succinylbenzoate synthase from Amycolatopsis. Biochemistry, 43, 224–229.

27. Meganathan, R. (2001). Biosynthesis of menaquinone (vitamin K2) and ubiquinone (coenzyme Q): a perspective on enzymatic mechanisms. Vitam. Horm. 61, 173–218.

28. Collins, M. D. & Jones, D. (1981). Distribution of isoprenoid quinone structural types in bacteria and their taxonomic implication. Microbiol. Rev. 45, 316–354.

29. Teichmann, S. A., Rison, S. C., Thornton, J. M., Riley, M., Gough, J. & Chothia, C. (2001). The evolution and structural anatomy of the small molecule metabolic pathways in Escherichia coli. J. Mol. Biol. 311, 693–708.

30. Holt, J. G., Ed. (1984). Bergey’s Manual of Systematic Bacteriology, 1st edit., vol. 1. Williams & Wilkins, Baltimore.

31. Pegg, S. C., Brown, S., Ojha, S., Huang, C. C., Ferrin, T. E. & Babbitt, P. C. (2005). Representing structure-function relationships in mechanistically diverse enzyme superfamilies. Pac. Symp. Biocomput. 10, 358–369.

32. Pegg, S. C., Brown, S. D., Ojha, S., Seffernick, J., Meng, E. C., Morris, J. H. et al. (2006). Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry, 45, 2545–2555.

33. Doolittle, W. F. (1999). Phylogenetic classification and the universal tree. Science, 284, 2124–2129.

34. Teichmann, S. A. & Mitchison, G. (1999). Is there a phylogenetic signal in prokaryote proteins? J. Mol. Evol. 49, 98–107.

35. Gribaldo, S. & Philippe, H. (2002). Ancient phylogenetic relationships. Theor. Popul. Biol. 61, 391–408.

36. Delsuc, F., Brinkmann, H. & Philippe, H. (2005). Phylogenomics and the reconstruction of the tree of life. Nature Rev. Genet. 6, 361–375.

37. Garrity, G. M., Bell, J. A. & Lilburn, T. G. (2004). Taxonomic Outline of the Prokaryotes. Bergey’s Manual of Systematic Bacteriology. Williams and Wilkins, Baltimore.

38. Wolf, Y. I., Rogozin, I. B., Grishin, N. V., Tatusov, R. L. & Koonin, E. V. (2001). Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1, 8.

39. Brown, J. R., Douady, C. J., Italia, M. J., Marshall, W. E. & Stanhope, M. J. (2001). Universal trees based on large combined protein sequence data sets. Nature Genet. 28, 281–285.

40. Battistuzzi, F. U., Feijao, A. & Hedges, S. B. (2004). A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol. Biol. 4, 44.

41. Yang, S., Doolittle, R. F. & Bourne, P. E. (2005). Phylogeny determined by protein domain content. Proc. Natl Acad. Sci. USA, 102, 373–378.

42. Deeds, E. J., Hennessey, H. & Shakhnovich, E. I. (2005). Prokaryotic phylogenies inferred from protein structural domains. Genome Res. 15, 393–402.

43. Gophna, U., Doolittle, W. F. & Charlebois, R. L. (2005). Weighted genome trees: refinements and applications. J. Bacteriol. 187, 1305–1316.

44. Klenchin, V. A., Taylor Ringia, E. A., Gerlt, J. A. & Rayment, I. (2003). Evolution of enzymatic activity in the enolase superfamily: structural and mutagenic studies of the mechanism of the reaction catalyzed by o-succinylbenzoate synthase from Escherichia coli. Biochemistry, 42, 14427–14433.

45. Thoden, J. B., Taylor Ringia, E. A., Garrett, J. B., Gerlt, J.A., Holden, H. M. & Rayment, I. (2004). Evolution ofenzymatic activity in the enolase superfamily: struc-tural studies of the promiscuous o-succinylbenzoatesynthase from Amycolatopsis. Biochemistry, 43,5716–5727.

46. Taylor, E. A., Palmer, D. R. & Gerlt, J. A. (2001). The lesser “burden borne” by o-succinylbenzoate synthase: an “easy” reaction involving a carboxylate carbon acid. J. Am. Chem. Soc. 123, 5824–5825.

47. Duquerroy, S., Camus, C. & Janin, J. (1995). X-ray structure and catalytic mechanism of lobster enolase. Biochemistry, 34, 12513–12523.

48. Kuhnel, K. & Luisi, B. F. (2001). Crystal structure of the Escherichia coli RNA degradosome component enolase. J. Mol. Biol. 313, 583–592.

49. Chai, G., Brewer, J. M., Lovelace, L. L., Aoki, T., Minor, W. & Lebioda, L. (2004). Expression, purification and the 1.8 angstroms resolution crystal structure of human neuron specific enolase. J. Mol. Biol. 341, 1015–1021.

50. da Silva Giotto, M. T., Hannaert, V., Vertommen, D., de A. S. Navarro, M. V., Rider, M. H., Michels, P. A. et al. (2003). The crystal structure of Trypanosoma brucei enolase: visualisation of the inhibitory metal binding site III and potential as target for selective, irreversible inhibition. J. Mol. Biol. 331, 653–665.

51. Hosaka, T., Meguro, T., Yamato, I. & Shirakihara, Y. (2003). Crystal structure of Enterococcus hirae enolase at 2.8 Å resolution. J. Biochem. (Tokyo), 133, 817–823.

52. Stubbe, J. (2000). Ribonucleotide reductases: the link between an RNA and a DNA world? Curr. Opin. Struct. Biol. 10, 731–736.

53. Kolberg, M., Strand, K. R., Graff, P. & Andersson, K. K. (2004). Structure, function, and mechanism of ribonucleotide reductases. Biochim. Biophys. Acta, 1699, 1–34.

54. Sakai, A., Xiang, D. F., Xu, C., Song, L., Yew, W. S., Raushel, F. M. & Gerlt, J. A. (2006). Evolution of Enzymatic Activities in the Enolase Superfamily: N-Succinylamino Acid Racemase and a New Pathway for the Irreversible Conversion of D- to L-Amino Acids. Biochemistry, 45, 4455–4462.

55. Helin, S., Kahn, P. C., Guha, B. L., Mallows, D. G. & Goldman, A. (1995). The refined X-ray structure of muconate lactonizing enzyme from Pseudomonas putida PRS2000 at 1.85 Å resolution. J. Mol. Biol. 254, 918–941.

56. Minshull, J., Ness, J. E., Gustafsson, C. & Govindarajan, S. (2005). Predicting enzyme function from protein sequence. Curr. Opin. Chem. Biol. 9, 202–209.

57. Glasner, M. E., Gerlt, J. A. & Babbitt, P. C. (in press). Mechanisms of Protein Evolution and Their Application to Protein Engineering. In Advances in Enzymology and Related Areas of Molecular Biology (Toone, E., ed), Wiley and Sons, New York, NY.

58. Ochman, H., Lawrence, J. G. & Groisman, E. A. (2000). Lateral gene transfer and the nature of bacterial innovation. Nature, 405, 299–304.

59. Koonin, E. V., Makarova, K. S. & Aravind, L. (2001). Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55, 709–742.

60. Garcia-Vallve, S., Romeu, A. & Palau, J. (2000). Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res. 10, 1719–1725.

61. Doolittle, W. F., Boucher, Y., Nesbo, C. L., Douady, C. J., Andersson, J. O. & Roger, A. J. (2003). How big is the iceberg of which organellar genes in nuclear genomes are but the tip? Phil. Trans. Roy. Soc. ser. B Biol. Sci. 358, 39–58.

62. Brown, J. R. (2003). Ancient horizontal gene transfer. Nature Rev. Genet. 4, 121–132.

63. Philippe, H. & Douady, C. J. (2003). Horizontal gene transfer and phylogenetics. Curr. Opin. Microbiol. 6, 498–505.

64. Kennedy, S. P., Ng, W. V., Salzberg, S. L., Hood, L. & DasSarma, S. (2001). Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res. 11, 1641–1650.

65. Overbeek, R., Disz, T. & Stevens, R. (2004). The SEED: a peer-to-peer environment for genome annotation. Communications of the ACM, 47, 45–46.

66. Meganathan, R., Bentley, R. & Taber, H. (1981). Identification of Bacillus subtilis men mutants which lack O-succinylbenzoyl-coenzyme A synthetase and dihydroxynaphthoate synthase. J. Bacteriol. 145, 328–332.

67. Taber, H. W., Dellers, E. A. & Lombardo, L. R. (1981). Menaquinone biosynthesis in Bacillus subtilis: isolation of men mutants and evidence for clustering of men genes. J. Bacteriol. 145, 321–327.

68. Driscoll, J. R. & Taber, H. W. (1992). Sequence organization and regulation of the Bacillus subtilis menBE operon. J. Bacteriol. 174, 5063–5071.

69. Rowland, B., Hill, K., Miller, P., Driscoll, J. & Taber, H. (1995). Structural organization of a Bacillus subtilis operon encoding menaquinone biosynthetic enzymes. Gene, 167, 105–109.

70. Koike-Takeshita, A., Koyama, T. & Ogura, K. (1997). Identification of a novel gene cluster participating in menaquinone (vitamin K2) biosynthesis. Cloning and sequence determination of the 2-heptaprenyl-1,4-naphthoquinone methyltransferase gene of Bacillus stearothermophilus. J. Biol. Chem. 272, 12380–12383.

71. Johnson, T. W., Shen, G., Zybailov, B., Kolling, D., Reategui, R., Beauparlant, S. et al. (2000). Recruitment of a foreign quinone into the A(1) site of photosystem I. I. Genetic and physiological characterization of phylloquinone biosynthetic pathway mutants in Synechocystis sp. pcc 6803. J. Biol. Chem. 275, 8523–8530.

72. Pegg, S. C. & Babbitt, P. C. (1999). Shotgun: getting more from sequence similarity searches. Bioinformatics, 15, 729–740.

73. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402.

74. Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl. Acids Res. 32, 1792–1797.

75. Jewett, A. I., Huang, C. C. & Ferrin, T. E. (2003). MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance. Bioinformatics, 19, 625–634.

76. Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. & Ferrin, T. E. (2004). UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612.

77. Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19, 1572–1574.

78. Altekar, G., Dwarkadas, S., Huelsenbeck, J. P. & Ronquist, F. (2004). Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics, 20, 407–415.

79. Whelan, S. & Goldman, N. (2001). A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699.

80. Felsenstein, J. (2004). PHYLIP (Phylogeny Inference Package) version 3.6. Department of Genome Science, University of Washington, Seattle.

81. Jones, D. T., Taylor, W. R. & Thornton, J. M. (1992). The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282.

82. Shindyalov, I. N. & Bourne, P. E. (1998). Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11, 739–747.

83. Myers, E. W. & Miller, W. (1988). Optimal alignments in linear space. Comput. Appl. Biosci. 4, 11–17.

84. Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A. et al. (2004). Environmental genome shotgun sequencing of the Sargasso Sea. Science, 304, 66–74.

Edited by Michael J. Sternberg

(Received 28 February 2006; received in revised form 22 April 2006; accepted 25 April 2006) Available online 11 May 2006