A 5-formyltetrahydrofolate cycloligase paralog from all domains of life: comparative genomic and experimental evidence for a cryptic role in thiamin metabolism


ORIGINAL PAPER


Anne Pribat & Ian K. Blaby & Aurora Lara-Núñez & Linda Jeanguenin & Romain Fouquet & Océane Frelin & Jesse F. Gregory III & Benjamin Philmus & Tadhg P. Begley & Valérie de Crécy-Lagard & Andrew D. Hanson

Received: 14 January 2011 / Revised: 19 March 2011 / Accepted: 3 April 2011
© Springer-Verlag 2011

Abstract A paralog (here termed COG0212) of the ATP-dependent folate salvage enzyme 5-formyltetrahydrofolate cycloligase (5-FCL) occurs in all domains of life and, although typically annotated as 5-FCL in pro- and eukaryotic genomes, is of unknown function. COG0212 is similar in overall structure to 5-FCL, particularly in the substrate binding region, and has distant similarity to other kinases. The Arabidopsis thaliana COG0212 protein was shown to be targeted to chloroplasts and to be required for embryo viability. Comparative genomic analysis revealed that a high proportion (19%) of archaeal and bacterial COG0212 genes are clustered on the chromosome with various genes implicated in thiamin metabolism or transport but showed no such association between COG0212 and folate metabolism. Consistent with the bioinformatic evidence for a role in thiamin metabolism, ablating COG0212 in the archaeon Haloferax volcanii caused accumulation of thiamin monophosphate. Biochemical and functional complementation tests of several known and hypothetical thiamin-related activities (involving thiamin, its breakdown products, and their phosphates) were, however, negative. Also consistent with the bioinformatic evidence, the COG0212 proteins from A. thaliana and prokaryote sources lacked 5-FCL activity in vitro and did not complement the growth defect or the characteristic 5-formyltetrahydrofolate accumulation of a 5-FCL-deficient (ΔygfA) Escherichia coli strain. We therefore propose (a) that COG0212 has an unrecognized yet sometimes crucial role in thiamin metabolism, most probably in salvage or detoxification, and (b) that it is not a 5-FCL and should no longer be so annotated.

Keywords At1g76730 · Chloroplast · COG0212 · Thiamin · Folate

Introduction

With over 1,200 prokaryote and 100 eukaryote genomes now sequenced (Liolios et al. 2010), it has become starkly clear that genes of unknown or uncertain function outnumber those of known function in many genomes (Hanson et al. 2009; Janga et al. 2011). This “unknown gene function” problem is exacerbated by misannotations, in which functions are wrongly projected onto genes, based on sequence homology (Schnoes et al. 2009; Galperin and Koonin 2010). Most common are “overannotations” in which overly specific functions are assigned to relatively distant homologs (in fact paralogs) of genes of known function (Schnoes et al. 2009). Thus, while long-range homology is useful for assigning proteins to a general class (e.g., “dehydrogenase”), it is a poor guide to their precise functions (Frishman 2007; Janga et al. 2011). Overannotations have knock-on effects. First, they propagate as new genomes are added to databases, leading to a downward spiral of annotation accuracy (Schnoes et al. 2009). Second, they corrupt metabolic reconstructions, which seek to infer the metabolic capabilities of organisms from genome sequences (Durot et al. 2009).

Electronic supplementary material The online version of this article (doi:10.1007/s10142-011-0224-5) contains supplementary material, which is available to authorized users.

A. Pribat · L. Jeanguenin · R. Fouquet · O. Frelin · A. D. Hanson (*)
Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
e-mail: [email protected]

I. K. Blaby · V. de Crécy-Lagard
Microbiology and Cell Science Department, University of Florida, Gainesville, FL 32611, USA

A. Lara-Núñez · J. F. Gregory III
Food Science and Human Nutrition Department, University of Florida, Gainesville, FL 32611, USA

B. Philmus · T. P. Begley
Department of Chemistry, Texas A&M University, College Station, TX 77842, USA

Funct Integr Genomics
DOI 10.1007/s10142-011-0224-5

During a comparative genomic analysis of folate synthesis and metabolism (de Crécy-Lagard et al. 2007), we noticed a striking case of a protein that is almost always annotated as having a precise function although there is no experimental evidence for this function and obvious reason to question it. This protein is classified in the Clusters of Orthologous Groups database (Tatusov et al. 2003) as COG0212 (the name used from here on). COG0212 is typically annotated as the folate salvage enzyme 5-formyltetrahydrofolate cycloligase (5-FCL; EC 6.3.3.2, also called 5,10-methenyltetrahydrofolate synthetase), although it shares only ~30% identity with the 80-residue C-terminal region of canonical 5-FCL proteins.

5-FCL metabolizes 5-formyltetrahydrofolate (5-CHO-THF), which is generated from 5,10-methenyltetrahydrofolate (5,10-CH=THF) in a side reaction of serine hydroxymethyltransferase. Unlike other one-carbon (C1) folates, 5-CHO-THF is not a C1 donor but a potent inhibitor of many folate-dependent enzymes (Stover and Schirch 1993) and must consequently be removed. 5-FCL is the main enzyme known to do this, and ablating it leads to 5-CHO-THF accumulation (Holmes and Appling 2002; Goyer et al. 2005; Jeanguenin et al. 2010). 5-FCL is mechanistically a kinase, the initial reaction product being an iminium phosphate intermediate, which then undergoes cyclization and phosphate elimination to give back 5,10-CH=THF (Fig. 1; Field et al. 2007).

An initial survey of the distribution of COG0212 and 5-FCL genes revealed that plants, certain bacteria, and animals had both. This finding underscored the possibility that these proteins differ in function and established plants as representative models in which to study COG0212. Accordingly, we comprehensively surveyed the distribution of COG0212 and 5-FCL genes, investigated the essentiality and subcellular location of the plant COG0212 protein, used comparative genomics to predict possible metabolic functions for COG0212, and tested the predictions. A thiamin-related function was both predicted and supported experimentally, but a folate-related function was neither predicted nor found.

Materials and methods

Bioinformatics

Genomes were analyzed using STRING (Jensen et al. 2009; http://string-db.org/) and the SEED database and its tools (Overbeek et al. 2005; http://theseed.uchicago.edu). COG0212 protein sequences were obtained from the NCBI (http://www.ncbi.nlm.nih.gov/) and Joint Genome Institute (http://www.jgi.doe.gov/) databases. Sequences were aligned with ClustalW, and phylogenetic analyses were made with MEGA 4 (Tamura et al. 2007). Organellar targeting was predicted with TargetP (http://www.cbs.dtu.dk/services/TargetP/) and Predotar (http://urgi.versailles.inra.fr/predotar/predotar.html).

Chemicals

(6R,6S) 5-CHO-THF was obtained from Schircks Laboratories (Jona, Switzerland). Near-saturated stock solutions of 5-CHO-THF were freshly prepared in 25 mM potassium phosphate, pH 7.5, excluding light, and titered spectrophotometrically (ε287 nm = 31,500 M−1 cm−1; Temple and Montgomery 1984). [14C]Formate (52.5 mCi/mmol) was from Moravek Biochemicals (Brea, CA, USA). Thiamin and its phosphates, oxythiamin, and 5-(2-hydroxyethyl)-4-methylthiazole (thiazole) were from Sigma-Aldrich. Oxothiamin (Thomas et al. 2008), 4-amino-5-hydroxymethyl-2-methylpyrimidine (HMP; Reddick et al. 2001), and N-formylpyrimidine (Jenkins et al. 2007) were synthesized as described.
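The spectrophotometric titering of 5-CHO-THF stocks follows from the Beer-Lambert law, c = A / (ε · l). The short Python sketch below illustrates that arithmetic with the extinction coefficient quoted above; the function name, the 1-cm path length default, and the dilution-factor parameter are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch of a Beer-Lambert concentration estimate (illustrative only).
# EPSILON_287 is the value quoted in the text; everything else is hypothetical.

EPSILON_287 = 31_500.0  # molar extinction coefficient at 287 nm, M^-1 cm^-1


def folate_concentration_molar(absorbance, dilution_factor=1.0,
                               path_cm=1.0, epsilon=EPSILON_287):
    """Return the stock concentration (mol/l) from an A287 reading.

    absorbance      -- A287 measured on the (possibly diluted) sample
    dilution_factor -- fold dilution applied before reading (assumed parameter)
    path_cm         -- cuvette path length in cm (assumed to be 1 cm by default)
    """
    if absorbance < 0 or dilution_factor <= 0 or path_cm <= 0 or epsilon <= 0:
        raise ValueError("inputs must be positive (absorbance may be zero)")
    return absorbance * dilution_factor / (epsilon * path_cm)
```

For instance, a hypothetical reading of A287 = 0.63 on a 100-fold diluted stock would be evaluated as `folate_concentration_molar(0.63, dilution_factor=100)`.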

[Reaction scheme: 5-CHO-THF + ATP → iminium phosphate intermediate + ADP → 5,10-CH=THF + Pi]

Fig. 1 The mechanism of the reaction mediated by 5-FCL. Note that the first step is an ATP-dependent phosphorylation. 5-CHO-THF 5-formyltetrahydrofolate, 5,10-CH=THF 5,10-methenyltetrahydrofolate, pABG p-aminobenzoylglutamate

COG0212 genes, proteins, and enzyme activity assays

Genomic DNA of Bacillus halodurans, Ochrobactrum anthropi, and Halobacterium sp. NRC-1 was from the American Type Culture Collection (Manassas, VA, USA). Genomic DNA of Synechococcus sp. PCC 7002 and Syntrophobacter fumaroxidans was from G. Shen (Pennsylvania State University) and C.M. Plugge (Wageningen University, The Netherlands), respectively. Arabidopsis thaliana COG0212 cDNA clone U82511 was from the Arabidopsis Biological Resource Center (ABRC; Columbus, OH, USA). COG0212 sequences were amplified with PfuTurbo DNA polymerase (Stratagene) using genomic DNA or cDNAs as template and primers (Table S1) designed with restriction sites to insert the amplicons into pBluescript II SK (Stratagene) or pBAD24 (Guzman et al. 1995) for complementation assays or into pET28b (Novagen) for overexpression of proteins with a C-terminal His-tag. The A. thaliana sequence was truncated by using PCR to replace the first 150 bp by a start codon. Constructs were verified by sequencing. The production and isolation of recombinant COG0212 proteins and assays for thiamin- and folate-related enzyme activities are described in Online Resource 1.

Subcellular localization of A. thaliana COG0212

The full-length A. thaliana COG0212 cDNA or its first 327 bp (which includes the predicted targeting peptide) were cloned between SalI and NcoI sites in-frame upstream of the green fluorescent protein sequence in pTH2 (Niwa 2003). Preparation of A. thaliana mesophyll protoplasts, transfection, and subcellular localization of the fusion proteins were as described (Pribat et al. 2010). For dual import assays, the full-length A. thaliana COG0212 cDNA was cloned as an EcoRI-PstI fragment into pGEM-4Z (Promega) using a forward primer that included a Kozak sequence. Coupled in vitro transcription–translation, organelle separation, and dual import assays were as described (Rudhe et al. 2002; Pribat et al. 2010).

Isolation of A. thaliana COG0212 mutants

Two T-DNA insertional mutant lines (ecotype Columbia) for the gene (At1g76730) encoding COG0212 were identified in the Salk collection (SALK_037940 and SALK_037945). Seeds were obtained from the ABRC, sterilized for 1 min in 20% (v/v) bleach containing 0.1% SDS, and plated on MS medium (4.3 g/l MS salts, 1% sucrose, 0.35% Phytagel, pH 5.7). Germinated seedlings were transferred to potting medium and cultured in a growth chamber with an 8-h light period (250 μmol m−2 s−1) at 23°C and a 16-h dark period at 19°C. Wild-type and heterozygous mutant segregants from each line were identified by PCR screening using At1g76730 gene-specific primers located 5′ and 3′ of the insertion site and a primer located in the T-DNA (Table S1). Genomic DNA was isolated as described (Edwards et al. 1991) and amplified with Taq DNA polymerase (Invitrogen). After 3 min at 95°C, the reaction was carried out with 30 cycles of 95°C for 45 s, 56°C for 30 s, and 72°C for 1.5 min (wild-type allele) or 45 s (mutant allele) and a final step at 72°C for 10 min. Insertion sites were confirmed by sequencing amplicons obtained from heterozygotes. Heterozygous mutant and wild-type segregants were selfed, and the progeny further analyzed. Siliques were split lengthwise with a razor blade and examined with a dissecting microscope. For germination assays, seeds were plated in lots of 50 on MS medium (plus or minus 1% sucrose) and scored after 10 days; results plus or minus sucrose were the same. Plantlets were transferred to potting medium and genotyped by PCR screening as above. Reciprocal crosses were made between wild-type and heterozygous plants. The resulting seeds were harvested, grown, and genotyped as above.
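The PCR screening described above distinguishes genotypes by which of two amplicons appears: one from the gene-specific primer pair (wild-type allele) and one from a gene-specific primer paired with the T-DNA primer (mutant allele). The Python sketch below spells out that standard decision logic; the function and argument names are hypothetical and the band calls are assumed to come from gel inspection, which the paper does not describe in detail.

```python
# Illustrative genotype call from presence/absence of the two diagnostic amplicons.
# Names are hypothetical; this is a sketch of the usual three-primer logic,
# not the authors' own scoring procedure.

def call_genotype(wild_type_band: bool, insertion_band: bool) -> str:
    """Classify a plant from its two diagnostic PCR results."""
    if wild_type_band and insertion_band:
        return "heterozygous"       # both alleles amplified
    if wild_type_band:
        return "wild type"          # only the intact gene amplified
    if insertion_band:
        return "homozygous mutant"  # only the T-DNA junction amplified
    return "no call"                # neither amplicon: failed reaction or bad DNA
```

Under this scheme, the absence of any "homozygous mutant" call among the progeny of selfed heterozygotes is the observation that points to lethality.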

Deletion of Haloferax volcanii COG0212

H. volcanii strain H26 (a uracil auxotroph lacking the pyrE2 gene) was used to make the deletion. Cells were grown at 44°C (unless otherwise specified) in Hv-YPC, Hv-CA, or Hv-Min media (Dyal-Smith 2008). The deletion construct was designed to delete >80% of the COG0212 ORF (HVO_1928) by homologous recombination (El Yacoubi et al. 2009). A region from 1 kb before the start codon to the first 114 nucleotides of the ORF was amplified by PCR with primers Hv2Rev and Hv1Fwd, which includes a KpnI site (Table S1). A second region of 1 kb starting just after the stop codon was amplified with primers Hv3Fwd and Hv4Rev, which bears a BamHI site (Table S1). Both fragments were amplified from genomic DNA using Herculase (Stratagene) and 6% dimethyl sulfoxide, A-tailed with Taq DNA polymerase (Invitrogen), subcloned into pGEM-T (Promega), and sequence-verified. The two regions were assembled into pBluescript II SK (Stratagene) using their own above-mentioned restriction site and an internal pGEM-T EcoRI site. The whole construct was excised with KpnI and BamHI, then cloned into pTA131 (Allers et al. 2004). Once obtained, the confirmed deletion plasmid was passed through a dam− strain of Escherichia coli (Inv110; Invitrogen) and transformed into H. volcanii H26 (or derivatives) using a polyethylene glycol-mediated protocol (Dyal-Smith 2008). Deletion of the targeted locus was selected for in a two-step process as described previously (Allers et al. 2004) (Fig. S1). Briefly, recombination of the deletion plasmid into the chromosome by a single cross-over event was selected for by growth on Hv-CA (i.e., in the absence of uracil). Subsequent excision of the integrated plasmid and target gene by a second recombination event was selected for by plating onto Hv-CA supplemented with uracil (10 μg ml−1) and 5-fluoroorotic acid (50 μg ml−1). PCR was used to confirm the deletion as follows: One pair of primers (Table S1) was designed to anneal in the regions flanking HVO_1928, and the amplicon size was compared to prediction for wild type and deletant. To confirm gene loss, a second pair of primers (Table S1) was designed to anneal within HVO_1928.

Functional complementation assays

Functional complementation of an E. coli ΔthiD strain was used to test for hydroxymethyl pyrimidine phosphate (HMP-P) kinase activity (Ajjawi et al. 2007). The E. coli HMP-P kinase deletant NI500 (ΔthiD) was obtained from the E. coli Genetic Stock Center (New Haven, CT, USA). E. coli ΔthiD cells harboring pACYC-RP were transformed with pBS II SK alone (negative control) or containing E. coli thiD (positive control) or a COG0212 gene (S. fumaroxidans, O. anthropi Oant_2976 or O. anthropi Oant_2980), plated on LB containing 1 mM isopropyl-β-D-thio-galactoside (IPTG) and appropriate antibiotics, and incubated at 37°C. The next day, independent clones were streaked on M9 medium as above containing 0.2% (w/v) glucose, micronutrients, and FeSO4 and supplemented with 1 mM IPTG and 100 μg/ml of histidine, leucine, arginine, tryptophan, and methionine, plus or minus 10 μM thiamin. Plates were incubated for 4 days at 37°C. A functional complementation assay based on an E. coli ΔygfA strain (Jeanguenin et al. 2010) was used to test COG0212 proteins for 5-FCL activity. Details on this assay are given in Online Resource 1.

Vitamin analyses

For analysis of thiamin vitamers, the pellets obtained from 250 ml H. volcanii wild-type and COG0212 deletant cultures (OD600 = 0.7) grown in Hv-min medium without thiamin were resuspended in one volume of 7.2% perchloric acid and sonicated. The sonicate was held on ice for 15 min with periodic vortex mixing, then cleared by centrifugation at 4°C (2,000×g, 15 min). Thiamin and its phosphates were analyzed by oxidation to thiochrome derivatives followed by HPLC with fluorometric detection (Ishii et al. 1979). The oxidation reagent was a freshly prepared solution of 12.14 mM potassium ferricyanide in 3.35 M NaOH. Samples or standards (1 ml) were mixed with 100 μl methanol; 200 μl of oxidation agent was added, mixed for 30 s, and 100 μl of 1.43 M phosphoric acid was then added; the final pH was 6.9±0.2. The standards (thiamin and its mono- and diphosphates) were made up in 7.2% perchloric acid/0.25 M NaOH (1:1, v/v). Samples (50 μl) were separated on an analytical C18 column (100×4.6 mm, 3 μm particle size) eluted (1 ml min−1) with a gradient of 10–20% methanol/water (70:30, v/v) in 0.2 M KH2PO4 containing 0.3 mM tetrabutylammonium hydroxide, pH 7.0/methanol (88.5:11.5, v/v). Fluorometric detection wavelengths were 365 nm (excitation) and 435 nm (emission). Analysis of folate vitamers is described in Online Resource 1.

Results

COG0212 is an ancient, widely distributed paralog of 5-FCL

A survey of genomes in GenBank and the Joint Genome Institute (as of August 2010) showed that COG0212 proteins occur in plants (algae, mosses, lycopods, gymnosperms, angiosperms), animals (chordates, arthropods, annelids, mollusks, echinoderms), certain ascomycetes, many archaea, and a small number of taxonomically disparate bacteria. COG0212 appears not to occur in most fungi, protists, or some lower animals (nematodes, flatworms, cnidarians). Distribution data for 913 prokaryotes and representative eukaryotes are available at the SEED database (http://theseed.uchicago.edu) in the subsystem titled 5-FCL-like protein.

All COG0212 proteins share a domain of approximately 250 residues (Fig. 2a). In addition, plant proteins have a predicted N-terminal chloroplast targeting peptide, and most animal proteins have a C-terminal extension that, in chordates, contains an RNA recognition motif (RRM; Fig. 2a). RRMs are common, versatile domains that interact with nucleic acids or proteins (Maris et al. 2005). As noted above, COG0212 proteins have limited sequence similarity to 5-FCL proteins in a roughly 80-residue region toward the C terminus (Fig. 2b). The most conserved set of residues (underlined in red in Fig. 2b) correspond in 5-FCLs to the core of the active site that binds both 5-CHO-THF and ATP (Chen et al. 2004; Chen et al. 2005). In 5-FCLs, the penultimate residue of this conserved set is tyrosine, and changing it to alanine causes almost total (97–99%) loss of 5-FCL activity (Field et al. 2007; Wu et al. 2009). In contrast, the penultimate residue in COG0212 proteins is typically alanine or serine and never tyrosine (Fig. 2b). This single-residue difference, like the overall sequence divergence, suggests that COG0212 lacks 5-FCL activity and has some other function. Besides sharing homology with the active site region of 5-FCL, which is a kinase (Fig. 1), COG0212 proteins have long-range homology to other kinases, as detected by PSI-Blast (Altschul et al. 1997). Thus, whatever the specific function of COG0212 may be, it seems likely to involve an ATP-dependent phosphorylation.

Phylogenetic analysis of pro- and eukaryotic COG0212 and 5-FCL proteins, including many that co-occur in the same genomes, shows that they belong to separate clades (Fig. 2c). The COG0212 and 5-FCL families are thus anciently diverged paralogs, which again suggests different functions. Within the COG0212 clade, most eukaryote proteins robustly branch together, whereas for prokaryotes the deeper branches of the tree are largely unresolved. As a group, COG0212 proteins are highly conserved. Thus, pairwise sequence comparisons between COG0212 proteins typically show higher percent identities than those between 5-FCL proteins from the same genomes (Fig. S2). As very diverse 5-FCL proteins are known to be isofunctional (Holmes and Appling 2002; Chen et al. 2005; Jeanguenin et al. 2010), the greater sequence conservation of COG0212 proteins implies that they may likewise be isofunctional.
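Pairwise percent identity of the kind compared here can be computed directly from two rows of an alignment. The Python sketch below is one simple way to do so; it assumes the convention of counting identical residues over columns in which neither sequence has a gap, which may differ from the convention used for Fig. S2. The two example strings are the B. halodurans 5-FCL and COG0212 rows of the Fig. 2b alignment, which covers only the conserved region.

```python
# Minimal sketch: percent identity between two pre-aligned sequences.
# Assumed convention: identities / columns where both rows are ungapped.

def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity over ungapped columns of two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Rows taken from the Fig. 2b alignment (conserved region only).
FCL_B_HALODURANS = ("LDLLLVPGVAFD-QKGNRLGYGGGYYDRFLH------SYKGKTIALAYSQQLVESVP"
                    "------TDERDERVQMIVTERGVY")
COG0212_B_HALODURANS = ("IDLIVMGSVAVD-RSGRRIGKGEGYADREYAIIRELGNR--PVPVVTTVHQVQLVDV"
                        "---ELPRDAYDVTVDWIATTEGLM")

identity = percent_identity(FCL_B_HALODURANS, COG0212_B_HALODURANS)
print(f"5-FCL vs COG0212 (B. halodurans, conserved region): {identity:.1f}% identity")
```

Because the denominator convention changes the result, any comparison across protein families should apply the same convention to every pair.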

[Figure 2, panel b alignment. Columns: family (5 = 5-FCL, C = COG0212), organism, position of first residue, sequence]

5 B. halodurans 118 LDLLLVPGVAFD-QKGNRLGYGGGYYDRFLH------SYKGKTIALAYSQQLVESVP------TDERDERVQMIVTERGVY
5 O. anthropi   121 PAILLMPLAGFD-QRGHRLGYGAGHYDRALARFTERGLQPLLIGMAFDCQEVEHVPN-------EPHDIALNQILTESGLR
5 Human         129 LDLIFMPGLGFD-KHGNRLGRGKGYYDAYLKRCLQHQEVKPYTLALAFKEQICLQVP------VNENDMKVDEVLYEDSST
5 Arabidopsis   188 IDLFILPGLAFD-RCGRRLGRGGGYYDTFLKRYQDRAKEKGWRYPLMVALSYSPQILEDGSIPVTPNDVLIDALVTPSGVV
C H. volcanii   127 IDLVVSGSVAVS-ETGARVGKGEGFSDLEYAVLRGLGAVTAETTVATTVHERQVRDD---LPEPDAHDVPMDFVVTPDRLV
C B. halodurans 126 IDLIVMGSVAVD-RSGRRIGKGEGYADREYAIIRELGNR--PVPVVTTVHQVQLVDV---ELPRDAYDVTVDWIATTEGLM
C O. anthropi   134 VDYMVTGTGAIN-LEGVRFGKGHGFFDAEWGMLYRLGRITAATPAAAVVHDCQVLSEK---LTPDVYDTVADVIFTPTRTI
C Human         134 VDLVVVGSVAVS-EKGWRIGKGEGYADLEYAMMVSMGAVSKETPVVTIVHDCQVVDIP--EELVEEHDITVDYILTPTRVI
C Arabidopsis   215 VDLIVIGSVAVNPQTGARLGKGEGFAELEYGMLRYMGAIDDSTPVVTTVHDCQLVDDIP-LEKLAIHDVPVDIICTPTRVI

[Figure 2, panel c taxa. COG0212 clade: Syntrophobacter fumaroxidans, Clostridiales bacterium*, Ochrobactrum anthropi 2976*, Ochrobactrum anthropi 2980*, Pyrenophora tritici-repentis*, Bacillus halodurans*, Korarchaeum cryptofilum, Thermofilum pendens, Haloferax volcanii, Salinibacter ruber, Aeropyrum pernix, Streptomyces scabiei, Synechococcus PCC7002, Thiomonas sp.*, Sulfolobus solfataricus, Pyrobaculum aerophilum, Drosophila melanogaster*, Danio rerio*, Xenopus laevis*, Gallus gallus*, Homo sapiens*, Chlamydomonas reinhardtii*, Arabidopsis thaliana*, Physcomitrella patens*. 5-FCL clade: Pyrenophora tritici-repentis*, Danio rerio*, Xenopus laevis*, Gallus gallus*, Homo sapiens*, Drosophila melanogaster*, Chlamydomonas reinhardtii*, Physcomitrella patens*, Arabidopsis thaliana*, Ochrobactrum anthropi*, Thiomonas sp.*, Clostridiales bacterium*, Bacillus halodurans*.]

Fig. 2 Primary structure and phylogeny of COG0212 and 5-FCL proteins. a Comparison of the domain structures of prokaryotic, plant, and animal COG0212 proteins. Note the common core domain, the predicted targeting peptide (TP) in plants, and the C-terminal extension that in chordates contains an RNA recognition motif (RRM). Insect proteins have a longer C-terminal extension without an RRM motif. The positions of the conserved regions in the alignment below are marked in darker gray. b Amino acid sequence alignment of the conserved regions of representative 5-FCL (5) and COG0212 (C) proteins. Identical residues are shaded in black, similar residues in gray. Dashes are gaps introduced to maximize alignment. The most conserved set of residues is underlined in red. Full names of prokaryotes: Bacillus halodurans, Ochrobactrum anthropi, Haloferax volcanii. The COG0212 sequence from O. anthropi is Oant_2976. c Unrooted neighbor-joining tree for COG0212 and 5-FCL proteins. Only nodes with bootstrap values (1,000 replicates) of >50% are indicated; values are shown next to nodes. Only the tree topology is shown, so that branch lengths are not proportional to estimated numbers of amino acid substitutions. Species with both COG0212 and 5-FCL sequences in the tree are marked with asterisks

The plant COG0212 protein is chloroplast-localized

That plant COG0212 proteins have an N-terminal extension with the properties of a chloroplast targeting peptide led us to test for organellar targeting using in vivo and in vitro approaches. When the full-length A. thaliana COG0212 protein (At1g76730), or its predicted targeting peptide (residues 1–109), were fused to green fluorescent protein (GFP), they directed GFP exclusively to chloroplasts in transient expression experiments with A. thaliana mesophyll protoplasts (Fig. 3a). Controls using GFP alone showed no organellar targeting (Fig. 3a). This result was substantiated by in vitro data from dual import assays (Rudhe et al. 2002) in which mixtures of isolated pea chloroplasts and mitochondria were incubated with radiolabeled full-length A. thaliana COG0212 (Fig. 3b). After incubation, chloroplasts contained a labeled product that was smaller in size than the full-length precursor and resistant to thermolysin digestion, as expected for a translocated protein. No translocated protein was detected in mitochondria (Fig. 3b). A proteomics study of A. thaliana also detected the COG0212 protein in chloroplast stroma (Zybailov et al. 2008). The chloroplastic location of COG0212 contrasts with that of 5-FCL, which is mitochondrial in plants (Roje et al. 2002).

Fig. 3 Evidence that the plant COG0212 protein is chloroplast-targeted. a Transient expression in A. thaliana mesophyll protoplasts of green fluorescent protein (GFP) fused to the C terminus of full-length A. thaliana COG0212 (upper panels) or to the predicted targeting peptide of A. thaliana COG0212 (middle panels) and GFP alone (lower panels). GFP (green pseudo-color) and chlorophyll (red pseudo-color) fluorescence were observed by confocal microscopy. Scale bars = 10 μm. b Protein import into isolated pea chloroplasts and mitochondria. The full-length A. thaliana COG0212 sequence was translated in vitro in the presence of [3H]leucine. The translation products were incubated for 15 min in the light with chloroplasts (CP) and mitochondria (MT), which were then re-purified on an 8% (v/v) Percoll gradient, without or with prior thermolysin (TH) treatment to remove adsorbed proteins. Proteins were separated by SDS-PAGE and visualized by fluorography. Samples were loaded on the basis of equal chlorophyll or mitochondrial protein content next to an aliquot of the translation product (TP). The positions of molecular mass standards (kilodaltons) are indicated

COG0212 is essential in A. thaliana

To assess the physiological significance of COG0212, we analyzed two A. thaliana T-DNA mutant lines from the Salk collection. PCR of genomic DNA confirmed that both lines had an insertion at the same site in the third exon (Fig. S3a). The seed of both lines obtained from ABRC contained only wild-type and heterozygous individuals, and no homozygous mutants were found in the progeny of heterozygotes of either line. Further analysis of one line showed that selfed heterozygotes gave only wild-type and heterozygous progeny in a ratio that was a good fit to 1:2 (Fig. S3b, c). This result is consistent with zygotic lethality. In agreement with this explanation, reciprocal crosses between heterozygous and wild-type plants gave almost equal numbers of heterozygous and wild-type progeny (Fig. S3d). The lethality was presumably manifested early in development because germination was normal (Fig. S3c) and siliques contained no malformed seeds. The essentiality of A. thaliana COG0212 again distinguishes it from 5-FCL, which is non-essential (Goyer et al. 2005).
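Whether progeny counts are "a good fit to 1:2" is typically judged with a chi-square goodness-of-fit test with one degree of freedom. The Python sketch below shows that calculation; the statistical procedure is an assumption (the text does not say which test was used), and the counts in the usage note are hypothetical, not the data of Fig. S3.

```python
import math

# Sketch of a chi-square goodness-of-fit test for a two-class segregation ratio.
# The test choice and all counts are illustrative assumptions.

def chi_square_statistic(observed, expected_ratio):
    """Return (statistic, expected counts) for observed counts vs. a ratio."""
    if len(observed) != len(expected_ratio):
        raise ValueError("observed and expected_ratio must have equal length")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * part / ratio_sum for part in expected_ratio]
    statistic = sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))
    return statistic, expected


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))
```

With hypothetical counts of 35 wild-type and 65 heterozygous plants, `chi_square_statistic([35, 65], [1, 2])` gives the statistic to pass to `p_value_one_df`; a large p-value means the 1:2 ratio expected when homozygous mutants die is not rejected.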

Comparative genomics links COG0212 to thiamin, not folates

The evidence that COG0212 has an indispensable function in plants prompted us to apply comparative genomics analysis to predict what that function might be (Overbeek et al. 1999; Date and Marcotte 2003; Hanson et al. 2009). Exploratory work used the STRING database (Jensen et al. 2009); the bulk of the analysis was done with the SEED database and its tools (Overbeek et al. 2005). Both databases integrate evidence for associations between genes based on their physical clustering on the chromosome, their distribution among genomes (“phylogenetic profiles”), as well as postgenomic data. STRING is entirely pre-computed whereas the SEED is user-driven, more flexible, and consequently more powerful.

STRING predicted a medium to high confidence relationship with thiamin synthesis or salvage, based on an operonic arrangement in the archaeon Pyrobaculum islandicum of genes encoding COG0212 and the thiamin synthesis and salvage enzyme ThiD, which has both HMP and HMP-P kinase activities. Further analysis with SEED robustly implicated COG0212 in thiamin metabolism. First, the COG0212 gene in other archaea was found next to a fusion gene specifying ThiD and thiamin phosphate synthase (ThiN); in Pyrobaculum aerophilum, the gene cluster also includes an operon encoding an ECF-type transporter whose substrate capture component (ThiW) is predicted to bind the thiamin precursor thiazole (Rodionov et al. 2009; Fig. 4a). Second, the COG0212 gene in yet other archaea and the bacterium Thermus thermophilus is clustered with genes encoding one or more subunits of the thiamin pyrophosphate-dependent pyruvate dehydrogenase complex (Fig. 4b). Relatedly, the ATTED-II database (Obayashi and Kinoshita 2010) shows that A. thaliana COG0212 is coexpressed with pyruvate dehydrogenase kinase (At3g06483), which regulates pyruvate dehydrogenase. Third, among bacteria with COG0212, O. anthropi and Ochrobactrum intermedium have operonic structures in which genes for two COG0212 proteins (having 40% identity) flank genes for the subunits ThiX-ThiY-ThiZ of an ABC transporter predicted to import the thiamin degradation products HMP and/or N-formylpyrimidine (Rodionov et al. 2002; Jenkins et al. 2007; Fig. 4c). The substrate binding component of this transporter (ThiY) shares sequence similarity with Thi3 of Schizosaccharomyces pombe and Thi5 of Saccharomyces cerevisiae, which are enzymes of HMP synthesis (Rodionov et al. 2002). Other bacteria have a single COG0212 gene in an operonic arrangement with genes for ABC transporters predicted to import pyrimidines (potentially including HMP), based on clustering with pyrimidine-related genes in other genomes. Examples include Thiomonas sp. and a Clostridiales bacterium (Fig. 4c).
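The logic of a chromosomal clustering search can be sketched in a few lines of Python. This is a simplified illustration and not the procedure implemented in STRING or the SEED. The gene orders, genome names, window size, and function names below are all hypothetical. For each genome, the sketch asks whether a target gene lies within a fixed number of positions of any gene from a set of thiamin-related genes, and then reports the fraction of genomes showing such clustering.

```python
# Thiamin-related gene names taken from the text; used here only as labels.
THIAMIN_GENES = {"thiD", "thiN", "thiW", "thiX", "thiY", "thiZ"}


def clustered_partners(gene_order, target, partners, window=3):
    """Return the partner genes found within `window` positions of `target`.

    `gene_order` is a list of gene names in chromosomal order.
    """
    hits = set()
    for i, gene in enumerate(gene_order):
        if gene != target:
            continue
        lo = max(0, i - window)
        neighbors = gene_order[lo:i] + gene_order[i + 1:i + 1 + window]
        hits.update(g for g in neighbors if g in partners)
    return hits


def fraction_clustered(genomes, target, partners, window=3):
    """Fraction of genomes in which `target` has at least one partner nearby."""
    clustered = sum(
        1 for order in genomes.values()
        if clustered_partners(order, target, partners, window)
    )
    return clustered / len(genomes)


# HYPOTHETICAL gene orders (not real genome data).
genomes = {
    "genome_A": ["geneA", "COG0212", "thiD", "geneB", "geneC"],
    "genome_B": ["COG0212", "thiX", "thiY", "thiZ", "COG0212", "geneD"],
    "genome_C": ["geneE", "geneF", "geneG", "geneH", "COG0212", "geneI"],
    "genome_D": ["thiD", "g1", "g2", "g3", "g4", "COG0212"],
}

fraction = fraction_clustered(genomes, "COG0212", THIAMIN_GENES)
print(f"Fraction of genomes with clustering: {fraction:.2f}")
```

A real analysis would also take into account gene orientation, intergenic distances, and conservation of the arrangement across distantly related genomes, none of which this sketch models.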

Because of the structural similarity between COG0212 and 5-FCL, we also searched for associations with genes of folate metabolism, beginning with phylogenetic profiles. Whereas virtually all eubacterial and eukaryote genomes with COG0212 encode a canonical 5-FCL protein, some archaea have COG0212, a few have 5-FCL, and many have neither (Fig. 4d). COG0212 and 5-FCL genes thus fail to show a reciprocal distribution pattern indicative of functional interchangeability (Fig. 4d). Moreover, only certain archaea (particularly class Halobacteria) have folates (Worrell and Nagle 1988; White 1991, 1993; Buchenau and Thauer 2004). The rest have methanopterins or other folate analogs whose chemistry differs from that of folates such that the analog of 5-CHO-THF (5-formylmethanopterin) is not metabolized via a reaction like that of 5-FCL (Maden 2000). Were COG0212 folate-dependent, it would therefore be expected to be confined to archaea with folates, but this is not the case (Fig. 4d). The phylogenetic profile of COG0212 thus adds to the evidence against its being a 5-FCL and further suggests that its function is not connected with folates. Additional negative evidence on this point is that prokaryotic COG0212 genes do not cluster with genes of folate synthesis, metabolism, or transport.
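The phylogenetic profile argument can be made concrete with a small Python sketch. The presence/absence data and genome names below are hypothetical and are not taken from Fig. 4d. The function names are likewise illustrative. If two genes were functionally interchangeable, most genomes would be expected to carry exactly one of them, so the sketch tallies the four possible presence/absence classes and reports the fraction of genomes with exactly one of the two genes.

```python
from collections import Counter


def profile_classes(profiles):
    """Tally genomes by joint presence/absence of two genes.

    `profiles` maps genome name -> (has_cog0212, has_5fcl).
    """
    counts = Counter()
    for has_cog0212, has_5fcl in profiles.values():
        if has_cog0212 and has_5fcl:
            counts["both"] += 1
        elif has_cog0212:
            counts["only_COG0212"] += 1
        elif has_5fcl:
            counts["only_5FCL"] += 1
        else:
            counts["neither"] += 1
    return counts


def reciprocity(profiles):
    """Fraction of genomes carrying exactly one of the two genes."""
    exclusive = sum(1 for a, b in profiles.values() if a != b)
    return exclusive / len(profiles)


# HYPOTHETICAL profiles (not the data of Fig. 4d).
profiles = {
    "bacterium_1": (True, True),
    "bacterium_2": (True, True),
    "archaeon_1": (True, False),
    "archaeon_2": (False, True),
    "archaeon_3": (False, False),
    "archaeon_4": (False, False),
}

classes = profile_classes(profiles)
print(dict(classes))
print(f"Reciprocity: {reciprocity(profiles):.2f}")
```

A reciprocity value near 1 would point toward interchangeable genes. Many genomes in the "both" and "neither" classes, as in this toy data set, argue against it.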

Experimental evidence implicating COG0212 in thiamin metabolism

Support for the prediction that COG0212 is linked to thiamin was sought using mutational, functional complementation, and biochemical approaches.

Analysis of H. volcanii COG0212 deletants

The archaeon H. volcanii has a single COG0212 gene (locus tag HVO_1928). This gene was ablated by targeted


deletion (Fig. S4a, b). Deletant strains showed no growth phenotype on thiamin-free medium, indicating that COG0212 is not required for de novo thiamin formation. Analysis of thiamin and its phosphates, however, showed a significant three-fold accumulation of thiamin monophosphate in deletant strains (Fig. 5a), which is consistent with a role for COG0212 in thiamin metabolism.
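A comparison of this kind, with three biological replicates per strain and a Student's t test, can be sketched as follows. The replicate values are hypothetical and chosen only to give a three-fold difference. They are not the measurements plotted in Fig. 5a. The sketch computes the fold change and the pooled-variance t statistic, then compares the statistic with 2.776, the two-sided 5% critical value of the t distribution for 4 degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_b) - mean(sample_a)) / std_err


# HYPOTHETICAL thiamin monophosphate levels (pmol per mg protein),
# three biological replicates each; not data from the paper.
wild_type = [10.0, 12.0, 11.0]
knockout = [31.0, 35.0, 33.0]

fold_change = mean(knockout) / mean(wild_type)
t_stat = pooled_t_statistic(wild_type, knockout)

# Two-sided critical value for p = 0.05 with 3 + 3 - 2 = 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776
significant = abs(t_stat) > T_CRITICAL_DF4

print(f"fold change = {fold_change:.1f}, t = {t_stat:.2f}, significant = {significant}")
```

With only three replicates per group the test has little power, so a difference must be large relative to the replicate scatter to reach significance.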

Complementation and biochemical assays

A complementation assay was used to test for the capacity to phosphorylate HMP-P, which is needed for both salvage and synthesis. None of the three COG0212 genes tested restored thiamin prototrophy to an E. coli thiD (HMP phosphate kinase) deletant, although the positive control (E. coli thiD) did so (Fig. 5b). Thiamin is known to undergo numerous degradative reactions (Fig. S5), some of whose products are toxic, but this area of metabolism is little explored and salvage and detoxification enzymes are still being discovered (Jenkins et al. 2007; Jurgenson et al. 2009; Mukherjee et al. 2010). We therefore tested representative COG0212 proteins for several known and hypothetical activities involving thiamin, its breakdown products, and their phosphates; the reactions tested are summarized in Figures S6 and S7 and described in detail in Online Resource 1. No activity was detected for any of these reactions.

Experimental evidence that COG0212 is unconnected to folates

Experimental support for the bioinformatic predictions that COG0212 is neither a 5-FCL nor otherwise folate-related

Fig. 4 Comparative genomic evidence associating COG0212 with thiamin metabolism and dissociating it from folate metabolism. a Clustering of archaeal COG0212 genes with genes for thiamin metabolism and transport. Arrows represent the direction of transcription. Colors denote homologous genes; gray denotes other genes. Note that the COG0212-thiD/thiN duplet is conserved despite changes in gene orientation and flanking genes. Genes of the predicted thiazole ECF family transporter: thiW substrate capture component, TM transmembrane component, ATPase ATPase component, LP lipoprotein component (Rodionov et al. 2009). b Clustering of bacterial and archaeal COG0212 genes with genes encoding one or more subunits (E1, E2, E3) of the pyruvate dehydrogenase complex, which requires thiamin pyrophosphate as cofactor. Color key as above. c Clustering of bacterial COG0212 genes with genes encoding components of ABC transporters predicted to import HMP and/or N-formylpyrimidine (Rodionov et al. 2002; Jenkins et al. 2007), or pyrimidines or purines. PBP periplasmic binding protein. In O. anthropi, the COG0212 gene on the left of the cluster is locus Oant_2980 and that on the right is Oant_2976. d Distribution among archaea of folates and folate analogs in relation to distribution of genes encoding 5-FCL and COG0212. The figure shows only species from genera in which chemical, biochemical, or genomic evidence supports the presence of folates or folate analogs such as methanopterin and sarcinapterin (Worrell and Nagle 1988; van de Wijngaard et al. 1991; White 1988, 1991, 1993; Gorris and van der Drift 1994; Lin and Sparling 1998; Lin and White 1988; Buchenau and Thauer 2004; Grochowski et al. 2007; Levin et al. 2007; Falb et al. 2008; Boroujerdi and Young 2009). The phylogeny is from Spang et al. (2010)


was sought using three approaches: functional complementation in E. coli, enzyme assays in vitro and in vivo, and folate analysis of recombinant or mutant strains. All gave negative results.

Functional complementation of a ygfA mutant

Ablation of the gene (ygfA) encoding 5-FCL in E. coli results in accumulation of 5-CHO-THF and inability to grow on minimal medium when glycine is the sole nitrogen source (Jeanguenin et al. 2010). This growth phenotype makes possible a complementation assay for genes encoding 5-FCL activity (or other activities that remove 5-CHO-THF). When six diverse COG0212 genes (from A. thaliana and various bacteria) were tested in this assay, none supported growth although the ygfA positive control did so (Fig. S8a). The A. thaliana COG0212 construct specified the predicted mature protein, i.e., without the targeting peptide.

Enzyme assays

Mature A. thaliana COG0212 and B. halodurans COG0212 were expressed in E. coli, and crude extracts were used to assay spectrophotometrically for 5-FCL activity (Fig. S8b). Neither protein showed activity although the positive control (A. thaliana 5-FCL) was active, as expected (Roje et al. 2002). Furthermore, HPLC analysis of COG0212 reaction mixtures detected no products of any kind. An in vivo [14C]formate fixation assay tested the possibility that COG0212 proteins have formate–tetrahydrofolate ligase activity, i.e., that they mediate the ATP-dependent coupling of formate to tetrahydrofolate. Expressing B. halodurans COG0212 in E. coli (which lacks formate–tetrahydrofolate ligase) did not confer [14C]formate fixation, and ablating the COG0212 gene in H. volcanii, which has formate–tetrahydrofolate ligase, did not impair [14C]formate fixation (not shown).

Folate analyses

To confirm that COG0212 proteins do not act on 5-CHO-THF and to screen for other folate-related activities, folate profiles were determined for the E. coli ΔygfA strain expressing A. thaliana or Synechococcus sp. COG0212, or vector alone. Cells were grown on rich (LB) or minimal (M9) medium. Expression of the COG0212 proteins did not significantly affect levels of 5-CHO-THF or other folates (Fig. S8c). Were, for instance, COG0212 to have 5-FCL activity, a reduction in 5-CHO-THF level would be anticipated (Jeanguenin et al. 2010). The folate profiles of H. volcanii COG0212 knockout strains were also analyzed; these displayed no accumulation of 5-CHO-THF or other changes relative to wild type (not shown).

Discussion

The comparative genomic and experimental evidence presented above establishes a positive connection between COG0212 and thiamin. Comparative genomic evidence shows that some 19% of prokaryotic COG0212 genes (from a total of 70 in GenBank as of August 2010) are clustered on the chromosome with one or more of a dozen genes that are known, or strongly inferred, to mediate metabolism and transport of thiamin or its precursors. The link to thiamin can be more specifically made to thiamin metabolism, not de novo synthesis, because (a) COG0212 occurs in animals, which cannot synthesize thiamin, and (b) COG0212 occurs in prokaryote and plant genomes that encode complete thiamin synthesis pathways (Rodionov et al. 2002; Goyer 2010). These arguments assume that animal, prokaryote, and plant COG0212 proteins are

[Fig. 5 graphic. Panel a: thiamin forms (pmol mg-1 protein) for T, TMP, and TPP in WT and KO cells; an asterisk marks a significant difference. Panel b: growth of strains carrying thiD, V, Oa76, Oa80, or Sf, with (+ Thiamin) or without (- Thiamin) thiamin.]

Fig. 5 Experimental evidence implicating COG0212 proteins in thiamin metabolism. a Levels of thiamin (T), its monophosphate (TMP), and pyrophosphate (TPP) in H. volcanii COG0212 knockout cells and wild-type controls. Data are means of three biological replicates; error bars show the standard error of the mean. The asterisk indicates a significant difference (p<0.05; Student’s t test) between the wild type and knockouts. b Failure of COG0212 genes to functionally complement the E. coli ΔthiD strain. Cells were grown on M9 medium containing 0.2% glucose, plus or minus 10 μM thiamin. Note that the positive control, E. coli thiD, restored thiamin prototrophy. Sources of COG0212 genes: Sf S. fumaroxidans, Oa76 O. anthropi Oant_2976, Oa80 O. anthropi Oant_2980. V vector alone


isofunctional; this seems warranted inasmuch as COG0212 proteins are more conserved than 5-FCL proteins that are known to be isofunctional (Fig. S2). The chloroplastic location of COG0212 is consistent with a role in thiamin metabolism because chloroplasts are the site of at least one thiamin salvage reaction (HMP phosphorylation) as well as several biosynthetic steps (Goyer 2010).

Experimental support for a role in thiamin metabolism as opposed to synthesis comes from the thiamin prototrophy of H. volcanii COG0212 knockout strains and from the expanded thiamin monophosphate pool in these strains. That the COG0212 knockout is lethal in A. thaliana is not necessarily inconsistent with its non-essentiality in H. volcanii. In plants and other higher organisms, particular cells or tissues often rely on others to synthesize essential metabolites de novo, being themselves capable only of salvage. A salvage defect in a critical cell type or developmental stage can therefore impact viability or growth. Indeed, thiamin itself provides an instance of this: Thiamin synthesis genes are barely expressed in roots, which cannot produce thiamin at a sufficient rate for growth (Goyer 2010).

The evidence for a role in thiamin metabolism (such as salvage or detoxification of a degradation product) prompted tests for certain known and hypothetical activities of this type, particularly those using ATP (based on the kinase-like structure of COG0212). That the results were negative by no means rules out a role for COG0212 in metabolizing thiamin breakdown products because this area is too poorly known to define the full set of reactions that should be tested; our exploratory work therefore covered only a subset of reasonable possibilities. More generally, it should be noted that salvage and detoxification are probably major but under-recognized facets of the metabolism of many labile compounds besides thiamin and that the enzymes involved are mostly still unidentified (Galperin et al. 2006). In sum, our bioinformatic and experimental data make it reasonable to infer that COG0212 mediates a reaction of thiamin metabolism, particularly salvage or detoxification of breakdown products, and that this reaction requires ATP. An accurate, informative annotation for COG0212 at this point would be “5-FCL paralog implicated in thiamin metabolism.”

Our comparative genomic, genetic, and biochemical evidence all make it unlikely that COG0212 proteins have 5-FCL activity or any other role in folate metabolism. There is consequently no justification for continuing to annotate COG0212 as being 5-FCL, or folate-related in any way.

Finally, it is informative to consider some negative consequences of misannotating COG0212 as “5-FCL” and what can be done to avoid such errors. In archaea, the misannotation confounds the vexed issue of which taxa have folates and folate-dependent enzymes and which do not. In mammals and plants, it falsely implies that there are two redundant 5-FCL enzymes to metabolize 5-CHO-THF. This error is significant in humans because 5-CHO-THF is widely used in cancer chemotherapy (Stover and Schirch 1993). In a wider sense, assigning a precise, superficially plausible but wrong annotation to a gene can deter further inquiry into its function and spark mistaken ideas about its biological role. As this study shows, such problems can be mitigated by using genome context evidence (gene clustering and phylogenetic distribution patterns in relation to those of other genes) to inform the annotation process instead of relying on sequence homology alone (Galperin and Koonin 2000).

Acknowledgments This work was supported in part by US National Science Foundation award # MCB-0839926 (to A.D.H.), by US Department of Energy award # FG02-07ER64498 (to V. de C.-L.), by NIH award # DK44083 (to T.P.B.), and by an endowment from the C.V. Griffin, Sr. Foundation. We thank S.E. Giuliani, D.M. Corgliano, and F.R. Collart for conducting exploratory ligand binding assays; A. Noiriel for making the H. volcanii deletion construct; K. Cline, C. Aldridge, and J.C. Waller for help with dual import experiments; and M. Ziemak for technical support.

References

Ajjawi I, Tsegaye Y, Shintani D (2007) Determination of the genetic, molecular, and biochemical basis of the Arabidopsis thaliana thiamin auxotroph th1. Arch Biochem Biophys 459:107–114

Allers T, Ngo HP, Mevarech M, Lloyd RG (2004) Development of additional selectable markers for the halophilic archaeon Haloferax volcanii based on the leuB and trpA genes. Appl Environ Microbiol 70:943–953

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

Boroujerdi AF, Young JK (2009) NMR-derived folate-bound structure of dihydrofolate reductase 1 from the halophile Haloferax volcanii. Biopolymers 91:140–144

Buchenau B, Thauer RK (2004) Tetrahydrofolate-specific enzymes in Methanosarcina barkeri and growth dependence of this methanogenic archaeon on folic acid or p-aminobenzoic acid. Arch Microbiol 182:313–325

Chen S, Shin DH, Pufan R, Kim R, Kim SH (2004) Crystal structure of methenyltetrahydrofolate synthetase from Mycoplasma pneumoniae (GI: 13508087) at 2.2 Å resolution. Proteins 56:839–843

Chen S, Yakunin AF, Proudfoot M, Kim R, Kim SH (2005) Structural and functional characterization of a 5,10-methenyltetrahydrofolate synthetase from Mycoplasma pneumoniae (GI: 13508087). Proteins 61:433–443

Date SV, Marcotte EM (2003) Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat Biotechnol 21:1055–1062

de Crécy-Lagard V, El Yacoubi B, de la Garza RD, Noiriel A, Hanson AD (2007) Comparative genomics of bacterial and plant folate synthesis and salvage: predictions and validations. BMC Genomics 8:245

Durot M, Bourguignon PY, Schachter V (2009) Genome-scale models of bacterial metabolism: reconstruction and applications. FEMS Microbiol Rev 33:164–190


Dyall-Smith M (2008) The halohandbook: protocols for halobacterial genetics, Version 7. http://www.haloarchaea.com/resources/halohandbook/Halohandbook_2008_v7.pdf

Edwards K, Johnstone C, Thompson C (1991) A simple and rapid method for the preparation of plant genomic DNA for PCR analysis. Nucleic Acids Res 19:1349

El Yacoubi B, Phillips G, Blaby IK, Haas CE, Cruz Y, Greenberg J, de Crécy-Lagard V (2009) A gateway platform for functional genomics in Haloferax volcanii: deletion of three tRNA modification genes. Archaea 2:211–219

Falb M, Müller K, Königsmaier L, Oberwinkler T, Horn P, von Gronau S, Gonzalez O, Pfeiffer F, Bornberg-Bauer E, Oesterhelt D (2008) Metabolism of halophilic archaea. Extremophiles 12:177–196

Field MS, Szebenyi DM, Perry CA, Stover PJ (2007) Inhibition of 5,10-methenyltetrahydrofolate synthetase. Arch Biochem Biophys 458:194–201

Frishman D (2007) Protein annotation at genomic scale: the current status. Chem Rev 107:3448–3466

Galperin MY, Koonin EV (2000) Who’s your neighbor? New computational approaches for functional genomics. Nat Biotechnol 18:609–613

Galperin MY, Koonin EV (2010) From complete genome sequence to ‘complete’ understanding? Trends Biotechnol 28:398–406

Galperin MY, Moroz OV, Wilson KS, Murzin AG (2006) House cleaning, a part of good housekeeping. Mol Microbiol 59:5–19

Gorris LG, van der Drift C (1994) Cofactor contents of methanogenic bacteria reviewed. Biofactors 4:139–145

Goyer A (2010) Thiamine in plants: aspects of its metabolism and functions. Phytochemistry 71:1615–1624

Goyer A, Collakova E, Díaz de la Garza R, Quinlivan EP, Williamson J, Gregory JF 3rd, Shachar-Hill Y, Hanson AD (2005) 5-Formyltetrahydrofolate is an inhibitory but well tolerated metabolite in Arabidopsis leaves. J Biol Chem 280:26137–26142

Grochowski LL, Xu H, Leung K, White RH (2007) Characterization of an Fe2+-dependent archaeal-specific GTP cyclohydrolase, MptA, from Methanocaldococcus jannaschii. Biochemistry 46:6658–6667

Guzman LM, Belin D, Carson MJ, Beckwith J (1995) Tight regulation, modulation, and high-level expression by vectors containing the arabinose pBAD promoter. J Bacteriol 177:4121–4130

Hanson AD, Pribat A, Waller JC, de Crécy-Lagard V (2009) ‘Unknown’ proteins and ‘orphan’ enzymes: the missing half of the engineering parts list—and how to find it. Biochem J 425:1–11

Holmes WB, Appling DR (2002) Cloning and characterization of methenyltetrahydrofolate synthetase from Saccharomyces cerevisiae. J Biol Chem 277:20205–20213

Ishii K, Sarai K, Sanemori H, Kawasaki T (1979) Analysis of thiamine and its phosphate esters by high-performance liquid chromatography. Anal Biochem 97:191–195

Janga SC, Díaz-Mejía JJ, Moreno-Hagelsieb G (2011) Network-based function prediction and interactomics: the case for metabolic enzymes. Metab Eng 13:1–10

Jeanguenin L, Lara-Núñez A, Pribat A, Hamner Mageroy M, Gregory JF 3rd, Rice KC, de Crécy-Lagard V, Hanson AD (2010) Moonlighting glutamate formiminotransferases can functionally replace 5-formyltetrahydrofolate cycloligase. J Biol Chem 285:41557–41566

Jenkins AH, Schyns G, Potot S, Sun G, Begley TP (2007) A new thiamin salvage pathway. Nat Chem Biol 3:492–497

Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C (2009) STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 37:D412–D416

Jurgenson CT, Begley TP, Ealick SE (2009) The structural and biochemical foundations of thiamin biosynthesis. Annu Rev Biochem 78:569–603

Levin I, Mevarech M, Palfey BA (2007) Characterization of a novel bifunctional dihydropteroate synthase/dihydropteroate reductase enzyme from Helicobacter pylori. J Bacteriol 189:4062–4069

Lin Z, Sparling R (1998) Investigation of serine hydroxymethyltransferase in methanogens. Can J Microbiol 44:652–656

Lin XL, White RH (1988) Distribution of charged pterins in nonmethanogenic archaebacteria. Arch Microbiol 150:541–546

Liolios K, Chen IM, Mavromatis K, Tavernarakis N, Hugenholtz P, Markowitz VM, Kyrpides NC (2010) The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 38:D346–D354

Maden BE (2000) Tetrahydrofolate and tetrahydromethanopterin compared: functionally distinct carriers in C1 metabolism. Biochem J 350:609–629

Maris C, Dominguez C, Allain FH (2005) The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J 272:2118–2131

Mukherjee T, McCulloch KM, Ealick SE, Begley TP (2010) Cofactor catabolism. In: Mander L, Liu H-W (eds) Comprehensive natural products II, chemistry and biology, vol 7. Elsevier, Amsterdam, pp 649–674

Niwa Y (2003) A synthetic green fluorescent protein gene for plant biotechnology. Plant Biotechnol 20:1–11

Obayashi T, Kinoshita K (2010) Coexpression landscape in ATTED-II: usage of gene list and gene network for various types of pathways. J Plant Res 123:311–319

Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96:2896–2901

Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Rückert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V (2005) The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691–5702

Pribat A, Noiriel A, Morse AM, Davis JM, Fouquet R, Loizeau K, Ravanel S, Frank W, Haas R, Reski R, Bedair M, Sumner LW, Hanson AD (2010) Nonflowering plants possess a unique folate-dependent phenylalanine hydroxylase that is localized in chloroplasts. Plant Cell 22:3410–3422

Reddick JJ, Nicewonger R, Begley TP (2001) Mechanistic studies on thiamin phosphate synthase: evidence for a dissociative mechanism. Biochemistry 40:10095–10102

Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS (2002) Comparative genomics of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms. J Biol Chem 277:48949–48959

Rodionov DA, Hebbeln P, Eudes A, ter Beek J, Rodionova IA, Erkens GB, Slotboom DJ, Gelfand MS, Osterman AL, Hanson AD, Eitinger T (2009) A novel class of modular transporters for vitamins in prokaryotes. J Bacteriol 191:42–51

Roje S, Janave MT, Ziemak MJ, Hanson AD (2002) Cloning and characterization of mitochondrial 5-formyltetrahydrofolate cycloligase from higher plants. J Biol Chem 277:42748–42754

Rudhe C, Chew O, Whelan J, Glaser E (2002) A novel in vitro system for simultaneous import of precursor proteins into mitochondria and chloroplasts. Plant J 30:213–220


Schnoes AM, Brown SD, Dodevski I, Babbitt PC (2009) Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 5:e1000605

Spang A, Hatzenpichler R, Brochier-Armanet C, Rattei T, Tischler P, Spieck E, Streit W, Stahl DA, Wagner M, Schleper C (2010) Distinct gene set in two different lineages of ammonia-oxidizing archaea supports the phylum Thaumarchaeota. Trends Microbiol 18:331–340

Stover P, Schirch V (1993) The metabolic role of leucovorin. Trends Biochem Sci 18:102–106

Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596–1599

Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA (2003) The COG database: an updated version includes eukaryotes. BMC Bioinform 4:41

Temple CT, Montgomery JA (1984) Chemical and physical properties of folic acid and reduced derivatives. In: Blakley RL, Benkovic SJ (eds) Folates and pterins, vol 1, 2nd edn. Wiley, New York, pp 61–120

Thomas AA, De Meese J, Le Huerou Y, Boyd SA, Romoff TT, Gonzales SS, Gunawardana I, Kaplan T, Sullivan F, Condroski K, Lyssikatos JP, Aicher TD, Ballard J, Bernat B, DeWolf W, Han M, Lemieux C, Smith D, Weiler S, Wright SK, Vigers G, Brandhuber B (2008) Non-charged thiamine analogs as inhibitors of enzyme transketolase. Bioorg Med Chem Lett 18:509–512

van de Wijngaard WM, Creemers J, Vogels GD, van der Drift C (1991) Methanogenic pathways in Methanosphaera stadtmanae. FEMS Microbiol Lett 64:207–211

White RH (1988) Analysis and characterization of the folates in the nonmethanogenic archaebacteria. J Bacteriol 170:4608–4612

White RH (1991) Distribution of folates and modified folates in extremely thermophilic bacteria. J Bacteriol 173:1987–1991

White RH (1993) Structures of the modified folates in the extremely thermophilic archaebacterium Thermococcus litoralis. J Bacteriol 175:3661–3663

Worrell VE, Nagle DP Jr (1988) Folic acid and pteroylpolyglutamate contents of archaebacteria. J Bacteriol 170:4420–4423

Wu D, Li Y, Song G, Cheng C, Zhang R, Joachimiak A, Shaw N, Liu ZJ (2009) Structural basis for the inhibition of human 5,10-methenyltetrahydrofolate synthetase by N10-substituted folate analogues. Cancer Res 69:7294–7301

Zybailov B, Rutschow H, Friso G, Rudella A, Emanuelsson O, Sun Q, van Wijk KJ (2008) Sorting signals, N-terminal modifications and abundance of the chloroplast proteome. PLoS ONE 3:e1994
