
Nature © Macmillan Publishers Ltd 1998


letters to nature

190 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

Received 7 July; accepted 21 September 1998.


Acknowledgements. We thank SmithKline Beecham Pharmaceuticals for penem inhibitor; R. M. Sweet for use of beamline X12C (NSLS, Brookhaven National Laboratory); G. Petsko for the ethylmercury phosphate; M. N. G. James for access to equipment for characterization of earlier crystal forms of SPase; and S. Mosimann and S. Ness for discussions. This work was supported by the Medical Research Council of Canada, the Canadian Bacterial Diseases Network of Excellence, and British Columbia Medical Research Foundation grants to N.C.J.S. M.P. is funded by an MRC of Canada post-doctoral fellowship, N.C.J.S. by an MRC of Canada scholarship, and R.E.D. by the NIH and the American Heart Association.

Correspondence and requests for materials should be addressed to N.C.J.S. (e-mail: [email protected]).

errata

Reconciling the spectrum of Sagittarius A* with a two-temperature plasma model

Rohan Mahadevan

Nature 394, 651–653 (1998)

A misleading typographical error was introduced into the second sentence of the bold introductory paragraph of this Letter: the word "infrared" should be "inferred".

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

S. T. Cole, R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry III, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell

Nature 393, 537–544 (1998)

As a result of an error during film output, Table 1 was published with some symbols missing. The correct version can be found at http://www.sanger.ac.uk and is reproduced again here (following pages).

Also, in Fig. 2, we incorrectly labelled Rv0649 as fadD37 instead of fabD2. Two of the genes for mycolyl transferases were inverted: Rv0129c encodes antigen 85C and not 85C′ as stated, whereas Rv3803c codes for the secreted protein MPT51 and not antigen 85C (Infect. Immun. 59, 372–382; 1991); Rv3803c is now designated fbpD. We thank Morten Harboe and Harald Wiker for drawing this to our attention.

The sequence of Rv0746 from M. bovis BCG-Pasteur presented in Fig. 5b was incorrect and should have shown a 16-codon deletion instead of 29, as indicated here:

H37Rv.....GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST...
..........::::::::::::::::::::                :::::::::::::::::::::
BCG.......GSGAPGGAGGAAGLWGTGGA----------------GGAGGWLLGDGGAGGIGGAST...
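The size of the deletion stated in this erratum can be checked directly from the alignment. The following is a minimal sketch in Python, not part of the original erratum: the helper name `summarize_alignment` is hypothetical, and the conversion to base pairs assumes that each gap column in the protein alignment corresponds to exactly one missing codon.

```python
# Alignment strings transcribed from the erratum. The gap run is written as
# "-" * 16 so that the count being checked is explicit.
H37RV = "GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST"
BCG = "GSGAPGGAGGAAGLWGTGGA" + "-" * 16 + "GGAGGWLLGDGGAGGIGGAST"


def summarize_alignment(ref: str, query: str) -> dict:
    """Count gap and mismatch columns in a pairwise alignment (hypothetical helper)."""
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    # A gap column is one where the query carries '-' opposite a reference residue.
    gaps = sum(1 for q in query if q == "-")
    # A mismatch column is an ungapped column where the two residues differ.
    mismatches = sum(1 for r, q in zip(ref, query) if q != "-" and r != q)
    return {"columns": len(ref), "gaps": gaps, "mismatches": mismatches}


summary = summarize_alignment(H37RV, BCG)
# Assumption: one missing residue corresponds to one deleted codon (3 bp).
deleted_bp = summary["gaps"] * 3
print(summary, deleted_bp)
```

Under these assumptions the two segments differ only by the gap run, so the flanking residues of BCG line up with H37Rv without substitutions.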



NATURE | VOL 393 | 11 JUNE 1998 537

article

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

S. T. Cole*, R. Brosch*, J. Parkhill, T. Garnier*, C. Churcher, D. Harris, S. V. Gordon*, K. Eiglmeier*, S. Gas*, C. E. Barry III†, F. Tekaia‡, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh§, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell

Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
* Unité de Génétique Moléculaire Bactérienne, and ‡ Unité de Génétique Moléculaire des Levures, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
† Tuberculosis Research Unit, Laboratory of Intracellular Parasites, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, Montana 59840, USA
§ Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark


Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

Despite the availability of effective short-course chemotherapy (DOTS) and the Bacille Calmette-Guérin (BCG) vaccine, the tubercle bacillus continues to claim more lives than any other single infectious agent (ref. 1). Recent years have seen increased incidence of tuberculosis in both developing and industrialized countries, the widespread emergence of drug-resistant strains and a deadly synergy with the human immunodeficiency virus (HIV). In 1993, the gravity of the situation led the World Health Organisation (WHO) to declare tuberculosis a global emergency in an attempt to heighten public and political awareness. Radical measures are needed now to prevent the grim predictions of the WHO becoming reality. The combination of genomics and bioinformatics has the potential to generate the information and knowledge that will enable the conception and development of new therapies and interventions needed to treat this airborne disease and to elucidate the unusual biology of its aetiological agent, Mycobacterium tuberculosis.

The characteristic features of the tubercle bacillus include its slow growth, dormancy, complex cell envelope, intracellular pathogenesis and genetic homogeneity (ref. 2). The generation time of M. tuberculosis, in synthetic medium or infected animals, is typically ~24 hours. This contributes to the chronic nature of the disease, imposes lengthy treatment regimens and represents a formidable obstacle for researchers. The state of dormancy in which the bacillus remains quiescent within infected tissue may reflect metabolic shutdown resulting from the action of a cell-mediated immune response that can contain but not eradicate the infection. As immunity wanes, through ageing or immune suppression, the dormant bacteria reactivate, causing an outbreak of disease often many decades after the initial infection (ref. 3). The molecular basis of dormancy and reactivation remains obscure but is expected to be genetically programmed and to involve intracellular signalling pathways.

The cell envelope of M. tuberculosis, a Gram-positive bacterium with a G + C-rich genome, contains an additional layer beyond the peptidoglycan that is exceptionally rich in unusual lipids, glycolipids and polysaccharides (refs 4, 5). Novel biosynthetic pathways generate cell-wall components such as mycolic acids, mycocerosic acid, phenolthiocerol, lipoarabinomannan and arabinogalactan, and several of these may contribute to mycobacterial longevity, trigger inflammatory host reactions and act in pathogenesis. Little is known about the mechanisms involved in life within the macrophage, or the extent and nature of the virulence factors produced by the bacillus and their contribution to disease.

It is thought that the progenitor of the M. tuberculosis complex, comprising M. tuberculosis, M. bovis, M. bovis BCG, M. africanum and M. microti, arose from a soil bacterium and that the human bacillus may have been derived from the bovine form following the domestication of cattle. The complex lacks interstrain genetic diversity, and nucleotide changes are very rare (ref. 6). This is important in terms of immunity and vaccine development as most of the proteins will be identical in all strains and therefore antigenic drift will be restricted. On the basis of the systematic sequence analysis of 26 loci in a large number of independent isolates (ref. 6), it was concluded that the genome of M. tuberculosis is either unusually inert or that the organism is relatively young in evolutionary terms.

Since its isolation in 1905, the H37Rv strain of M. tuberculosis has found extensive, worldwide application in biomedical research because it has retained full virulence in animal models of tuberculosis, unlike some clinical isolates; it is also susceptible to drugs and amenable to genetic manipulation. An integrated map of the 4.4 megabase (Mb) circular chromosome of this slow-growing pathogen had been established previously and ordered libraries of cosmids and bacterial artificial chromosomes (BACs) were available (refs 7, 8).

Organization and sequence of the genome

Sequence analysis. To obtain the contiguous genome sequence, a combined approach was used that involved the systematic sequence analysis of selected large-insert clones (cosmids and BACs) as well as random small-insert clones from a whole-genome shotgun library. This culminated in a composite sequence of 4,411,529 base pairs (bp) (Figs 1, 2), with a G + C content of 65.6%. This represents the second-largest bacterial genome sequence currently available (after that of Escherichia coli; ref. 9). The initiation codon for the dnaA gene, a hallmark for the origin of replication, oriC, was chosen as the start point for numbering. The genome is rich in repetitive DNA, particularly insertion sequences, and in new multigene families and duplicated housekeeping genes. The G + C content is relatively constant throughout the genome (Fig. 1), indicating that horizontally transferred pathogenicity islands of atypical base composition are probably absent. Several regions showing higher than average G + C content (Fig. 1) were detected; these correspond to sequences belonging to a large gene family that includes the polymorphic G + C-rich sequences (PGRSs).

Genes for stable RNA. Fifty genes coding for functional RNA molecules were found. These molecules were the three species produced by the unique ribosomal RNA operon, the 10Sa RNA involved in degradation of proteins encoded by abnormal messenger RNA, the RNA component of RNase P, and 45 transfer RNAs. No 4.5S RNA could be detected. The rrn operon is situated unusually as it occurs about 1,500 kilobases (kb) from the putative oriC; most eubacteria have one or more rrn operons near to oriC to exploit the gene-dosage effect obtained during replication (ref. 10). This arrangement may be related to the slow growth of M. tuberculosis. The genes encoding tRNAs that recognize 43 of the 61 possible sense codons were distributed throughout the genome and, with one exception, none of these uses A in the first position of the anticodon, indicating that extensive wobble occurs during translation. This is consistent with the high G + C content of the genome and the consequent bias in codon usage. Three genes encoding tRNAs for methionine were found; one of these genes (metV) is situated in a region that may correspond to the terminus of replication (Figs 1, 2). As metV is linked to defective genes for integrase and excisionase, perhaps it was once part of a phage or similar mobile genetic element.

Insertion sequences and prophages. Sixteen copies of the promiscuous insertion sequence IS6110 and six copies of the more stable element IS1081 reside within the genome of H37Rv (ref. 8). One copy of IS1081 is truncated. Scrutiny of the genomic sequence led to the identification of a further 32 different insertion sequence elements, most of which have not been described previously, and of the 13E12 family of repetitive sequences which exhibit some of the characteristics of mobile genetic elements (Fig. 1). The newly discovered insertion sequences belong mainly to the IS3 and IS256 families, although six of them define a new group. IS1561 and IS1552 show extensive similarity to insertion sequence elements found in Nocardia and Rhodococcus spp., suggesting that they may be widely disseminated among the actinomycetes.

Most of the insertion sequences in M. tuberculosis H37Rv appear to have inserted in intergenic or non-coding regions, often near tRNA genes (Fig. 1). Many are clustered, suggesting the existence of insertional hot-spots that prevent genes from being inactivated, as has been described for Rhizobium (ref. 11). The chromosomal distribution of the insertion sequences is informative as there appears to have been a selection against insertions in the quadrant encompassing oriC and an overrepresentation in the direct repeat region that contains the prototype IS6110. This bias was also observed experimentally in a transposon mutagenesis study (ref. 12).

At least two prophages have been detected in the genome sequence and their presence may explain why M. tuberculosis shows persistent low-level lysis in culture. Prophages phiRv1 and phiRv2 are both ~10 kb in length and are similarly organized, and some of their gene products show marked similarity to those encoded by certain bacteriophages from Streptomyces and saprophytic mycobacteria. The site of insertion of phiRv1 is intriguing as it corresponds to part of a repetitive sequence of the 13E12 family that itself appears to have integrated into the biotin operon. Some strains of M. tuberculosis have been described as requiring biotin as a growth supplement, indicating either that phiRv1 has a polar effect on expression of the distal bio genes or that aberrant excision, leading to mutation, may occur. During the serial attenuation of M. bovis that led to the vaccine strain M. bovis BCG, the phiRv1 prophage was lost (ref. 13). In a systematic study of the genomic diversity of prophages and insertion sequences (S.V.G. et al., manuscript in preparation), only IS1532 exhibited significant variability, indicating that most of the prophages and insertion sequences are currently stable. However, from these combined observations, one can conclude that horizontal transfer of genetic material into the free-living ancestor of the M. tuberculosis complex probably occurred in nature before the tubercle bacillus adopted its specialized intracellular niche.


[Figure 1: circular chromosome map labelled "M. tuberculosis H37Rv, 4,411,529 bp", with scale marks at 0, 1, 2, 3 and 4 Mb.]

Figure 1 Circular map of the chromosome of M. tuberculosis H37Rv. The outer circle shows the scale in Mb, with 0 representing the origin of replication. The first ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue, others are pink) and the direct repeat region (pink cube); the second ring inwards shows the coding sequence by strand (clockwise, dark green; anticlockwise, light green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12 REP family, dark pink; prophage, blue); the fourth ring shows the positions of the PPE family members (green); the fifth ring shows the PE family members (purple, excluding PGRS); and the sixth ring shows the positions of the PGRS sequences (dark red). The histogram (centre) represents G + C content, with <65% G + C in yellow, and >65% G + C in red. The figure was generated with software from DNASTAR.
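The two-colour G + C histogram described in the Figure 1 caption can be illustrated with a sliding-window calculation. The sketch below is written in Python under stated assumptions and is not the DNASTAR procedure used for the figure: the function names, the window and step sizes, and the toy sequence are all hypothetical, and a window at exactly 65% is assigned to "yellow" because the caption does not say how that boundary case was handled.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA string."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list:
    """Return (start, G+C fraction) for each full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def classify(windows: list, threshold: float = 0.65) -> list:
    """Label each window 'red' (above threshold) or 'yellow' (otherwise)."""
    return [(start, "red" if gc > threshold else "yellow") for start, gc in windows]


# Toy 30-base sequence, not genomic data: three windows of 10 bases each.
toy = "GCGCGCGCGC" + "ATATATATAT" + "GGCCGGCCAT"
windows = windowed_gc(toy, window=10, step=10)
labels = classify(windows)
print(windows)
print(labels)
```

For a real chromosome the window would be far larger than in this toy case; the choice of window size trades positional resolution against noise in the estimate.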

Figure 2 Linear map of the chromosome of M. tuberculosis H37Rv showing the position and orientation of known genes and coding sequences (CDS). We used the following functional categories (adapted from ref. 20): lipid metabolism (black); intermediary metabolism and respiration (yellow); information pathways (pink); regulatory proteins (sky blue); conserved hypothetical proteins (orange); proteins of unknown function (light green); insertion sequences and phage-related functions (blue); stable RNAs (purple); cell wall and cell processes (dark green); PE and PPE protein families (magenta); virulence, detoxification and adaptation (white). For additional information about gene functions, refer to http://www.sanger.ac.uk.


Genes encoding proteins. 3,924 open reading frames were identified in the genome (see Methods), accounting for ~91% of the potential coding capacity (Figs 1, 2). A few of these genes appear to have in-frame stop codons or frameshift mutations (irrespective of the source of the DNA sequenced) and may either use frameshifting during translation or correspond to pseudogenes. Consistent with the high G + C content of the genome, GTG initiation codons (35%) are used more frequently than in Bacillus subtilis (9%) and E. coli (14%), although ATG (61%) is the most common translational start. There are a few examples of atypical initiation codons, the most notable being the ATC used by infC, which begins with ATT in both B. subtilis and E. coli (refs 9, 14). There is a slight bias in the orientation of the genes (Fig. 1) with respect to the direction of replication as ~59% are transcribed with the same polarity as replication, compared with 75% in B. subtilis. In other bacteria, genes transcribed in the same direction as the replication forks are believed to be expressed more efficiently (refs 9, 14). Again, the more even distribution in gene polarity seen in M. tuberculosis may reflect the slow growth and infrequent replication cycles. Three genes (dnaB, recA and Rv1461) have been invaded by sequences encoding inteins (protein introns) and in all three cases their counterparts in M. leprae also contain inteins, but at different sites (ref. 15; S.T.C. et al., unpublished observations).

Protein function, composition and duplication. By using various database comparisons, we attributed precise functions to ~40% of the predicted proteins and found some information or similarity for another 44%. The remaining 16% resembled no known proteins and may account for specific mycobacterial functions. Examination of the amino-acid composition of the M. tuberculosis proteome by correspondence analysis (ref. 16), and comparison with that of other microorganisms whose genome sequences are available, revealed a statistically significant preference for the amino acids Ala, Gly, Pro, Arg and Trp, which are all encoded by G + C-rich codons, and a comparative reduction in the use of amino acids encoded by A + T-rich codons such as Asn, Ile, Lys, Phe and Tyr (Fig. 3). This approach also identified two groups of proteins rich in Asn or Gly that belong to new families, PE and PPE (see below). The fraction of the proteome that has arisen through gene duplication is similar to that seen in E. coli or B. subtilis (~51%; refs 9, 14), except that the level of sequence conservation is considerably higher, indicating that there may be extensive redundancy or differential production of the corresponding polypeptides. The apparent lack of divergence following gene duplication is consistent with the hypothesis that M. tuberculosis is of recent descent (ref. 6).
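The amino-acid bias described above rests on per-proteome residue frequencies, which are the input to a correspondence analysis. The following Python sketch shows only that first step and does not implement correspondence analysis itself. The two residue groups are taken from the text; the function names and the two toy peptides are hypothetical and carry no biological meaning.

```python
from collections import Counter

# Residue groups named in the text, in one-letter code.
GC_RICH = set("AGPRW")  # Ala, Gly, Pro, Arg, Trp (G + C-rich codons)
AT_RICH = set("NIKFY")  # Asn, Ile, Lys, Phe, Tyr (A + T-rich codons)


def composition(proteins: list) -> dict:
    """Relative frequency of each residue across a list of protein strings."""
    counts = Counter()
    for protein in proteins:
        counts.update(protein.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues supplied")
    return {residue: n / total for residue, n in counts.items()}


def group_fraction(freqs: dict, group: set) -> float:
    """Summed frequency of the residues in one group."""
    return sum(freqs.get(residue, 0.0) for residue in group)


# Toy peptides for illustration only (16 residues in total).
toy_proteome = ["MAGPRWAG", "MNIKFYAG"]
freqs = composition(toy_proteome)
gc_share = group_fraction(freqs, GC_RICH)
at_share = group_fraction(freqs, AT_RICH)
print(round(gc_share, 4), round(at_share, 4))
```

Applied to whole predicted proteomes, one such frequency vector per organism would form the rows of the table that a correspondence analysis then projects onto factorial axes.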

General metabolism, regulation and drug resistance

Metabolic pathways. From the genome sequence, it is clear that the tubercle bacillus has the potential to synthesize all the essential amino acids, vitamins and enzyme co-factors, although some of the pathways involved may differ from those found in other bacteria. M. tuberculosis can metabolize a variety of carbohydrates, hydrocarbons, alcohols, ketones and carboxylic acids (refs 2, 17). It is apparent from genome inspection that, in addition to many functions involved in lipid metabolism, the enzymes necessary for glycolysis, the pentose phosphate pathway, and the tricarboxylic acid and glyoxylate cycles are all present. A large number (~200) of oxidoreductases, oxygenases and dehydrogenases is predicted, as well as many oxygenases containing cytochrome P450, that are similar to fungal proteins involved in sterol degradation. Under aerobic growth conditions, ATP will be generated by oxidative phosphorylation from electron transport chains involving a ubiquinone cytochrome b reductase complex and cytochrome c oxidase. Components of several anaerobic phosphorylative electron transport chains are also present, including genes for nitrate reductase (narGHJI), fumarate reductase (frdABCD) and possibly nitrite reductase (nirBD), as well as a new reductase (narX) that results from a rearrangement of a homologue of the narGHJI operon. Two genes encoding haemoglobin-like proteins, which may protect against oxidative stress or be involved in oxygen capture, were found. The ability of the bacillus to adapt its metabolism to environmental change is significant as it not only has to compete with the lung for oxygen but must also adapt to the microaerophilic/anaerobic environment at the heart of the burgeoning granuloma.

Regulation and signal transduction. Given the complexity of the environmental and metabolic choices facing M. tuberculosis, an extensive regulatory repertoire was expected. Thirteen putative sigma factors govern gene expression at the level of transcription initiation, and more than 100 regulatory proteins are predicted (Table 1). Unlike B. subtilis and E. coli, in which there are >30 copies of different two-component regulatory systems (ref. 14), M. tuberculosis has only 11 complete pairs of sensor histidine kinases and response regulators, and a few isolated kinase and regulatory genes. This relative paucity in environmental signal transduction pathways is probably offset by the presence of a family of eukaryotic-like serine/threonine protein kinases (STPKs), which function as part of a phosphorelay system (ref. 18). The STPKs probably have two domains: the well-conserved kinase domain at the amino terminus is predicted to be connected by a transmembrane segment to the carboxy-terminal region that may respond to specific stimuli. Several of the predicted envelope lipoproteins, such as that encoded by lppR (Rv2403), show extensive similarity to this putative receptor domain of STPKs, suggesting possible interplay. The STPKs probably function in signal transduction pathways and may govern important cellular decisions such as dormancy and cell division, and although their partners are unknown, candidate genes for phosphoprotein phosphatases have been identified.

Drug resistance. M. tuberculosis is naturally resistant to many antibiotics, making treatment difficult (ref. 19). This resistance is due mainly to the highly hydrophobic cell envelope acting as a permeability barrier (ref. 4), but many potential resistance determinants are also encoded in the genome. These include hydrolytic or drug-modifying enzymes such as β-lactamases and aminoglycoside acetyl transferases, and many potential drug-efflux systems, such as 14 members of the major facilitator family and numerous ABC transporters. Knowledge of these putative resistance mechanisms will promote better use of existing drugs and facilitate the conception of new therapies.


[Figure 3: scatter plot. Horizontal axis, F1 (55.2%), from -0.30 to 0.30; vertical axis, F2 (22.6%), from -0.3 to 0.1. Points are labelled with the organism abbreviations listed in the caption and with the three-letter codes of the amino acids.]

Figure 3 Correspondence analysis of the proteomes from extensively sequenced organisms as a function of amino-acid composition. Note the extreme position of M. tuberculosis and the shift in amino-acid preference reflecting increasing G + C content from left to right. Abbreviations used: Ae, Aquifex aeolicus; Af, Archaeoglobus fulgidus; Bb, Borrelia burgdorferi; Bs, B. subtilis; Ce, Caenorhabditis elegans; Ec, E. coli; Hi, Haemophilus influenzae; Hp, Helicobacter pylori; Mg, Mycoplasma genitalium; Mj, Methanococcus jannaschii; Mp, Mycoplasma pneumoniae; Mt, M. tuberculosis; Mth, Methanobacterium thermoautotrophicum; Sc, Saccharomyces cerevisiae; Ss, Synechocystis sp. strain PCC6803. F1 and F2, first and second factorial axes (ref. 16).



[Figure 4, panel a: pathway diagram. Host membranes feed β-oxidation (fadA/fadB; fadD1-36, fadE1-36, fadB2-5, echA1-21, fadA2-6), producing acyl-CoA intermediates that enter the citric acid cycle and the glyoxylate shunt, yielding NADH, FADH2, ATP, CO2, amino acids, purines, pyrimidines, porphyrins and glucose; accA1/accD1 and accA2/accD2 lead towards cell wall and mycobacterial lipids.]

[Figure 4, panel b: gene map with a scale of 0 to 8 kb showing fas, fabD, acpM, kasA, kasB, accD, mabA, inhA, cmaA1 and cmaA2.]

Gene    Function
fas     Fatty acid synthase, produces C16–C26 acyl-CoA esters
fabD    Malonyl-CoA:AcpM acyltransferase, AcpM loading
acpM    Acyl carrier protein, meromycolate precursor transport
kasA    Ketoacyl acyl carrier protein synthase, chain elongation
kasB    Ketoacyl acyl carrier protein synthase, chain elongation
accD    Acetyl-CoA carboxylase, malonyl-CoA synthesis
mabA    3-oxo-acyl-acyl carrier protein reductase, reduces KasA/B product
inhA    Enoyl-acyl carrier protein reductase
cmaA1   Cyclopropane mycolic acid synthase 1, distal-position specific
cmaA2   Cyclopropane mycolic acid synthase 2, proximal-position specific

[Figure 4, panel c: gene map with a scale of 0 to 40 kb showing mas, ppsA, ppsB, ppsC, ppsD and ppsE.]

Gene    Function
mas     Four rounds of extension of C16,18 using Me-malonyl CoA
ppsA    Extension of C18 with malonyl CoA, partial reduction
ppsB    Extension with malonyl CoA, partial reduction
ppsC    Extension with malonyl CoA, complete reduction
ppsD    Extension with Me-malonyl CoA, partial reduction
ppsE    Extension with malonyl CoA, partial reduction, decarboxylation

Figure 4 Lipid metabolism. a, Degradation of host-cell lipids is vital in the intracellular life of M. tuberculosis. Host-cell membranes provide precursors for many metabolic processes, as well as potential precursors of mycobacterial cell-wall constituents, through the actions of a broad family of β-oxidative enzymes encoded by multiple copies in the genome. These enzymes produce acetyl CoA, which can be converted into many different metabolites and fuel for the bacteria through the actions of the enzymes of the citric acid cycle and the glyoxylate shunt of this cycle. b, The genes that synthesize mycolic acids, the dominant lipid component of the mycobacterial cell wall, include the type I fatty acid synthase (fas) and a unique type II system which relies on extension of a precursor bound to an acyl carrier protein to form full-length (~80-carbon) mycolic acids. The cma genes are responsible for cyclopropanation. c, The genes that produce phthiocerol dimycocerosate form a large operon and represent type I (mas) and type II (the pps operon) polyketide synthase systems. Functions are colour coordinated.



Lipid metabolism

Very few organisms produce such a diverse array of lipophilic molecules as M. tuberculosis. These molecules range from simple fatty acids such as palmitate and tuberculostearate, through isoprenoids, to very-long-chain, highly complex molecules such as mycolic acids and the phenolphthiocerol alcohols that esterify with mycocerosic acid to form the scaffold for attachment of the mycosides. Mycobacteria contain examples of every known lipid and polyketide biosynthetic system, including enzymes usually found in mammals and plants as well as the common bacterial systems. The biosynthetic capacity is overshadowed by the even more remarkable radiation of degradative, fatty acid oxidation systems and, in total, there are ~250 distinct enzymes involved in fatty acid metabolism in M. tuberculosis compared with only 50 in E. coli (ref. 20).

Fatty acid degradation. In vivo-grown mycobacteria have been suggested to be largely lipolytic, rather than lipogenic, because of the variety and quantity of lipids available within mammalian cells and the tubercle (ref. 2; Fig. 4a). The abundance of genes encoding components of fatty acid oxidation systems found by our genomic approach supports this proposition, as there are 36 acyl-CoA synthases and a family of 36 related enzymes that could catalyse the first step in fatty acid degradation. There are 21 homologous enzymes belonging to the enoyl-CoA hydratase/isomerase superfamily of enzymes, which rehydrate the nascent product of the acyl-CoA dehydrogenase. The four enzymes that convert the 3-hydroxy fatty acid into a 3-keto fatty acid appear less numerous, mainly because they are difficult to distinguish from other members of the short-chain alcohol dehydrogenase family on the basis of primary sequence. The five enzymes that complete the cycle by thiolysis of the β-ketoester, the acetyl-CoA C-acetyltransferases, do indeed appear to be a more limited family. In addition to this extensive set of dissociated degradative enzymes, the genome also encodes the canonical FadA/FadB β-oxidation complex (Rv0859 and Rv0860). Accessory activities are present for the metabolism of odd-chain and multiply unsaturated fatty acids.

Fatty acid biosynthesis. At least two discrete types of enzyme system, fatty acid synthase (FAS) I and FAS II, are involved in fatty acid biosynthesis in mycobacteria (Fig. 4b). FAS I (Rv2524, fas) is a single polypeptide with multiple catalytic activities that generates several shorter CoA esters from acetyl-CoA primers (ref. 5) and probably creates precursors for elongation by all of the other fatty acid and polyketide systems. FAS II consists of dissociable enzyme components which act on a substrate bound to an acyl-carrier protein (ACP). FAS II is incapable of de novo fatty acid synthesis but instead elongates palmitoyl-ACP to fatty acids ranging from 24 to 56 carbons in length (refs 17, 21). Several different components of FAS II may be targets for the important tuberculosis drug isoniazid, including the enoyl-ACP reductase InhA (ref. 22), the ketoacyl-ACP synthase KasA and the ACP AcpM (ref. 21). Analysis of the genome shows that there are only three potential ketoacyl synthases: KasA and KasB are highly related, and their genes cluster with acpM, whereas KasC is a more distant homologue of a ketoacyl synthase III system. The number of ketoacyl synthase and ACP genes indicates that there is a single FAS II system. Its genetic organization, with two clustered ketoacyl synthases, resembles that of type II aromatic polyketide biosynthetic gene clusters, such as those for actinorhodin, tetracycline and tetracenomycin in Streptomyces species (ref. 23). InhA seems to be the sole enoyl-ACP reductase and its gene is co-transcribed with a fabG homologue, which encodes 3-oxoacyl-ACP reductase. Both of these proteins are probably important in the biosynthesis of mycolic acids.
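The chain-length range quoted above for FAS II can be turned into a count of elongation cycles. The following is a minimal sketch, not part of the original analysis: it assumes the standard two-carbon extension per cycle, and the function and constant names are illustrative only.

```python
PRIMER_CARBONS = 16     # palmitoyl-ACP, the FAS II starting substrate named in the text
CARBONS_PER_CYCLE = 2   # assumption: each elongation cycle adds one two-carbon unit


def elongation_cycles(target_carbons: int, primer_carbons: int = PRIMER_CARBONS) -> int:
    """Return the number of two-carbon elongation cycles needed to reach target_carbons."""
    extra = target_carbons - primer_carbons
    if extra < 0 or extra % CARBONS_PER_CYCLE != 0:
        raise ValueError("target must be reachable from the primer in two-carbon steps")
    return extra // CARBONS_PER_CYCLE


if __name__ == "__main__":
    for chain_length in (24, 56):   # range of FAS II products quoted in the text
        print(chain_length, elongation_cycles(chain_length))
```

Under these assumptions, the shortest and longest FAS II products would require 4 and 20 cycles respectively.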

Fatty acids are synthesized from malonyl-CoA and precursors are generated by the enzymatic carboxylation of acetyl (or propionyl)-CoA by a biotin-dependent carboxylase (Fig. 4b). From study of the genome we predict that there are three complete carboxylase systems, each consisting of an α- and a β-subunit, as well as three β-subunits without an α-counterpart. As a group, all of the carboxylases seem to be more related to the mammalian homologues than to the corresponding bacterial enzymes. Two of these carboxylase systems (accA1, accD1 and accA2, accD2) are probably involved in degradation of odd-numbered fatty acids, as they are adjacent to genes for other known degradative enzymes. They may convert propionyl-CoA to succinyl-CoA, which can then be incorporated into the tricarboxylic acid cycle. The synthetic carboxylases (accA3, accD3, accD4, accD5 and accD6) are more difficult to understand. The three extra β-subunits might direct carboxylation to the appropriate precursor or may simply increase the total amount of carboxylated precursor available if this step were rate-limiting.

Synthesis of the paraffinic backbone of fatty and mycolic acids in the cell is followed by extensive postsynthetic modifications and unsaturations, particularly in the case of the mycolic acids (refs 24, 25). Unsaturation is catalysed either by a FabA-like β-hydroxyacyl-ACP dehydrase, acting with a specific ketoacyl synthase, or by an aerobic terminal mixed function desaturase that uses both molecular oxygen and NADPH. Inspection of the genome revealed no obvious candidates for the FabA-like activity. However, three potential aerobic desaturases (encoded by desA1, desA2 and desA3) were evident that show little similarity to related vertebrate or yeast enzymes (which act on CoA esters) but instead resemble plant desaturases (which use ACP esters). Consequently, the genomic data indicate that unsaturation of the meromycolate chain may occur while the acyl group is bound to AcpM.

Much of the subsequent structural diversity in mycolic acids is

article

NATURE | VOL 393 | 11 JUNE 1998 541

[Figure 5a, graphic not reproduced. Representatives of the PE family: PE domain (~110 aa) followed by a PGRS region, (GGAGGA)n, of 0 to >1,400 aa, or by unique sequence; a second size range of 0 to ~500 aa is shown. Representatives of the PPE family: PPE domain (~180 aa) followed by an MPTR region, (NxGxGNxG)n, by a region containing the motif GxxSVPxxW, or by unique sequence; size ranges shown are ~200 to ~400 aa, ~200 to >3,500 aa and 0 to ~400 aa.]

[Figure 5b: PE-PGRS sequence variation in ORF Rv0746. Aligned C-terminal PGRS sequences of H37Rv and BCG; dashes mark residues absent from one strain (29 aa and 46 aa).]

H37Rv  ...GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST
BCG    ...GSGAPGGAGGAAGLWGT-----------------------------GGAGGIGGAST

H37Rv  VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG
BCG    VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG

H37Rv  LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG
BCG    LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG

H37Rv  GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGF------------
BCG    GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGFGGAGGIGGIGGN

H37Rv  ----------------------------------GGAGGVGGSAGLIGTGGNGGNGGTGA
BCG    ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDTGGAGGVGGSAGLIGTGGNGGNGGTGA

H37Rv  NAGSPGTGGAGGLLLGQNGLNGLP*
BCG    NAGSPGTGGAGGLLLGQNGLNGLP*

Figure 5 The PE and PPE protein families. a, Classification of the PE and PPE protein families. b, Sequence variation between M. tuberculosis H37Rv and M. bovis BCG-Pasteur in the PE-PGRS encoded by open reading frame (ORF) Rv0746.



generated by a family of S-adenosyl-L-methionine-dependent enzymes, which use the unsaturated meromycolic acid as a substrate to generate cis and trans cyclopropanes and other mycolates. Six members of this family have been identified and characterized (ref. 25) and two clustered, convergently transcribed new genes are evident in the genome (umaA1 and umaA2). From the functions of the known family members and the structures of mycolic acids in M. tuberculosis, it is tempting to speculate that these new enzymes may introduce the trans cyclopropanes into the meromycolate precursor. In addition to these two methyltransferases, there are two other unrelated lipid methyltransferases (Ufa1 and Ufa2) that share homology with cyclopropane fatty acid synthase of E. coli (ref. 25). Although cyclopropanation seems to be a relatively common modification of mycolic acids, cyclopropanation of plasma-membrane constituents has not been described in mycobacteria. Tuberculostearic acid is produced by methylation of oleic acid, and may be synthesized by one of these two enzymes.

Condensation of the fully functionalized and preformed meromycolate chain with a 26-carbon α-branch generates full-length mycolic acids that must be transported to their final location for attachment to the cell-wall arabinogalactan. The transfer and subsequent transesterification are mediated by three well-known immunogenic proteins of the antigen 85 complex (ref. 26). The genome encodes a fourth member of this complex, antigen 85C′ (fbpC2, Rv0129), which is highly related to antigen 85C. Further studies are needed to show whether the protein possesses mycolyltransferase activity and to clarify the reason behind the apparent redundancy.

Polyketide synthesis. Mycobacteria synthesize polyketides by several different mechanisms. A modular type I system, similar to that involved in erythromycin biosynthesis (ref. 23), is encoded by a very large operon, ppsABCDE, and functions in the production of phenolphthiocerol (ref. 5). The absence of a second type I polyketide synthase suggests that the related lipids phthiocerol A and B, phthiodiolone A and phthiotriol may all be synthesized by the same system, either from alternative primers or by differential postsynthetic modification. It is physiologically significant that the pps gene cluster occurs immediately upstream of mas, which encodes the multifunctional enzyme mycocerosic acid synthase (MAS), as their products phthiocerol and mycocerosic acid esterify to form the very abundant cell-wall-associated molecule phthiocerol dimycocerosate (Fig. 4c).

Members of another large group of polyketide synthase enzymes are similar to MAS, which also generates the multiply methyl-branched fatty acid components of mycosides and phthiocerol dimycocerosate, abundant cell-wall-associated molecules (ref. 5). Although some of these polyketide synthases may extend type I FAS CoA primers to produce other long-chain methyl-branched fatty acids such as mycolipenic, mycolipodienic and mycolipanolic acids or the phthioceranic and hydroxyphthioceranic acids, or may even show functional overlap (ref. 5), there are many more of these enzymes than there are known metabolites. Thus there may be new lipid and polyketide metabolites that are expressed only under certain conditions, such as during infection and disease.

A fourth class of polyketide synthases is related to the plant enzyme superfamily that includes chalcone and stilbene synthase (ref. 23). These polyketide synthases are phylogenetically divergent from all other polyketide and fatty acid synthases and generate unreduced polyketides that are typically associated with anthocyanin pigments and flavonoids. The function of these systems, which are often linked to apparent type I modules, is unknown. An example is the gene cluster spanning pks10, pks7, pks8 and pks9, which includes two of the chalcone-synthase-like enzymes and two modules of an apparent type I system. The unknown metabolites produced by these enzymes are interesting because of the potent biological activities of some polyketides such as the immunosuppressor rapamycin.

Siderophores. Peptides that are not ribosomally synthesized are made by a process that is mechanistically analogous to polyketide synthesis (refs 23, 27). These peptides include the structurally related iron-scavenging siderophores, the mycobactins and the exochelins (refs 2, 28), which are derived from salicylate by the addition of serine (or threonine), two lysines and various fatty acids and possible polyketide segments. The mbt operon, encoding one apparent salicylate-activating protein, three amino-acid ligases, and a single module of a type I polyketide synthase, may be responsible for the biosynthesis of the mycobacterial siderophores. The presence of only one non-ribosomal peptide-synthesis system indicates that this pathway may generate both siderophores and that subsequent modification of a single ε-amino group of one lysine residue may account for the different physical properties and function of the siderophores (ref. 28).

Immunological aspects and pathogenicity

Given the scale of the global tuberculosis burden, vaccination is not only a priority but remains the only realistic public health intervention that is likely to affect both the incidence and the prevalence of the disease (ref. 29). Several areas of vaccine development are promising, including DNA vaccination, use of secreted or surface-exposed proteins as immunogens, recombinant forms of BCG and rational attenuation of M. tuberculosis (ref. 29). All of these avenues of research will benefit from the genome sequence as its availability will stimulate more focused approaches. Genes encoding ~90 lipoproteins were identified, some of which are enzymes or components of transport systems, and a similar number of genes encoding preproteins (with type I signal peptides) that are probably exported by the Sec-dependent pathway. M. tuberculosis seems to have two copies of secA. The potent T-cell antigen Esat-6 (ref. 30), which is probably secreted in a Sec-independent manner, is encoded by a member of a multigene family. Examination of the genetic context reveals several similarly organized operons that include genes encoding large ATP-hydrolysing membrane proteins that might act as transporters. One of the surprises of the genome project was the discovery of two extensive families of novel glycine-rich proteins, which may be of immunological significance as they are predicted to be abundant and potentially polymorphic antigens.

The PE and PPE multigene families. About 10% of the coding capacity of the genome is devoted to two large unrelated families of acidic, glycine-rich proteins, the PE and PPE families, whose genes are clustered (Figs 1, 2) and are often based on multiple copies of the polymorphic repetitive sequences referred to as PGRSs, and major polymorphic tandem repeats (MPTRs), respectively (refs 31, 32). The names PE and PPE derive from the motifs Pro–Glu (PE) and Pro–Pro–Glu (PPE) found near the N terminus in most cases (ref. 33). The 99 members of the PE protein family all have a highly conserved N-terminal domain of ~110 amino-acid residues that is predicted to have a globular structure, followed by a C-terminal segment that varies in size, sequence and repeat copy number (Fig. 5). Phylogenetic analysis separated the PE family into several subfamilies. The largest of these is the highly repetitive PGRS class, which contains 61 members; members of the other subfamilies share very limited sequence similarity in their C-terminal domains (Fig. 5). The predicted molecular weights of the PE proteins vary considerably as a few members contain only the N-terminal domain, whereas most have C-terminal extensions ranging in size from 100 to 1,400 residues. The PGRS proteins have a high glycine content (up to 50%), which is the result of multiple tandem repetitions of Gly–Gly–Ala or Gly–Gly–Asn motifs, or variations thereof.
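The glycine richness described above can be checked on any protein sequence with a simple residue count. The sketch below is illustrative only: the function name is hypothetical, and the test fragment is a 60-residue stretch of the Rv0746 PGRS repeat region taken from the alignment in Fig. 5b, so it is richer in glycine than a whole protein would be.

```python
from collections import Counter


def glycine_fraction(protein: str) -> float:
    """Return the fraction of residues in a one-letter protein string that are glycine."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    return Counter(residues)["G"] / len(residues)


# Fragment of the PGRS repeat region of Rv0746 (from the Fig. 5b alignment).
PGRS_FRAGMENT = "VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG"

if __name__ == "__main__":
    print(f"glycine fraction: {glycine_fraction(PGRS_FRAGMENT):.2f}")
```

Applied to full-length PE–PGRS sequences, the same count would include the N-terminal PE domain and so is expected to give lower values than a repeat-only fragment.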

The 68 members of the PPE protein family (Fig. 5) also have a conserved N-terminal domain that comprises ~180 amino-acid residues, followed by C-terminal segments that vary markedly in sequence and length. These proteins fall into at least three groups, one of which constitutes the MPTR class characterized by the presence of multiple, tandem copies of the motif Asn–X–Gly–X–Gly–Asn–X–Gly. The second subgroup contains a characteristic, well-conserved motif around position 350, whereas the third contains proteins that are unrelated except for the presence of the common 180-residue PPE domain.
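The MPTR motif Asn–X–Gly–X–Gly–Asn–X–Gly translates directly into a pattern over one-letter amino-acid codes. The following sketch shows one way to count non-overlapping copies in a sequence; it is not the procedure used for the classification reported here, and the example sequence is invented for illustration.

```python
import re

# Asn-X-Gly-X-Gly-Asn-X-Gly in one-letter code; "." stands for any residue (X).
MPTR_MOTIF = re.compile(r"N.G.GN.G")


def count_mptr_repeats(protein: str) -> int:
    """Count non-overlapping occurrences of the MPTR motif in a protein string."""
    return len(MPTR_MOTIF.findall(protein.upper()))


if __name__ == "__main__":
    # Hypothetical sequence: two tandem motif copies flanked by unrelated residues.
    example = "MSA" + "NAGSGNTG" + "NIGFGNAG" + "KLV"
    print(count_mptr_repeats(example))
```

A similar pattern, `G..SVP..W`, would correspond to the GxxSVPxxW motif labelled in Fig. 5a.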

The subcellular location of the PE and PPE proteins is unknown and in only one case, that of a lipase (Rv3097), has a function been demonstrated. On examination of the protein database from the extensively sequenced M. leprae (ref. 15), no PGRS- or MPTR-related polypeptides were detected but a few proteins belonging to the non-MPTR subgroup of the PPE family were found. These proteins include one of the major antigens recognized by leprosy patients, the serine-rich antigen (ref. 34). Although it is too early to attribute biological functions to the PE and PPE families, it is tempting to speculate that they could be of immunological importance. Two interesting possibilities spring to mind. First, they could represent the principal source of antigenic variation in what is otherwise a genetically and antigenically homogeneous bacterium. Second, these glycine-rich proteins might interfere with immune responses by inhibiting antigen processing.

Several observations and results support the possibility of antigenic variation associated with both the PE and the PPE family proteins. The PGRS member Rv1759 is a fibronectin-binding protein of relative molecular mass 55,000 (ref. 35) that elicits a variable antibody response, indicating either that individuals mount different immune responses or that this PGRS protein may vary between strains of M. tuberculosis. The latter possibility is supported by restriction fragment length polymorphisms for various PGRS and MPTR sequences in clinical isolates (ref. 33). Direct support for genetic variation within both the PE and the PPE families was obtained by comparative DNA sequence analysis (Fig. 5). The gene for the PE–PGRS protein Rv0746 of BCG differs from that in H37Rv by the deletion of 29 codons and the insertion of 46 codons. Similar variation was seen in the gene for the PPE protein Rv0442 (data not shown). As these differences were all associated with repetitive sequences they could have resulted from intergenic or intragenic recombinational events or, more probably, from strand slippage during replication (ref. 32). These mechanisms are known to generate antigenic variability in other bacterial pathogens (ref. 36).
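The 29-codon deletion and 46-codon insertion can be read off the gapped alignment in Fig. 5b by measuring runs of gap characters. The sketch below does this for the two gap-containing segments of the alignment; the intervening blocks, which are identical in both strains, are omitted, and the function and variable names are illustrative rather than taken from the original analysis.

```python
import re


def gap_runs(aligned: str) -> list[int]:
    """Return the lengths of consecutive gap ('-') runs in one row of an alignment."""
    return [len(match.group()) for match in re.finditer(r"-+", aligned)]


# Segment around the residues present in H37Rv but absent from BCG.
H37RV_SEG1 = "GSGAPGGAGGAAGLWGT" + "GGAGGAGGSSAGGGGAGGAGGAGGWLLGD" + "GGAGGIGGAST"
BCG_SEG1 = "GSGAPGGAGGAAGLWGT" + "-" * 29 + "GGAGGIGGAST"

# Segment around the residues present in BCG but absent from H37Rv.
FLANK_LEFT = "GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGF"
FLANK_RIGHT = "GGAGGVGGSAGLIGTGGNGGNGGTGA"
BCG_EXTRA = "GGAGGIGGIGGN" + "ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDT"
H37RV_SEG2 = FLANK_LEFT + "-" * len(BCG_EXTRA) + FLANK_RIGHT
BCG_SEG2 = FLANK_LEFT + BCG_EXTRA + FLANK_RIGHT

if __name__ == "__main__":
    # Gaps in the BCG row are residues deleted relative to H37Rv.
    print("deleted in BCG:", gap_runs(BCG_SEG1 + BCG_SEG2))
    # Gaps in the H37Rv row are residues inserted in BCG.
    print("inserted in BCG:", gap_runs(H37RV_SEG1 + H37RV_SEG2))
```

Because the alignment is at the protein level, each gap position corresponds to one codon in the gene.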

There are several parallels between the PGRS proteins and the Epstein–Barr virus nuclear antigens (EBNAs). Members of both polypeptide families are glycine-rich, contain extensive Gly–Ala repeats, and exhibit variation in the length of the repeat region between different isolates. The Gly–Ala repeat region of EBNA1 functions as a cis-acting inhibitor of the ubiquitin/proteasome antigen-processing pathway that generates peptides presented in the context of major histocompatibility complex (MHC) class I molecules (refs 37, 38). MHC class I knockout mice are very susceptible to M. tuberculosis, underlining the importance of a cytotoxic T-cell response in protection against disease (refs 3, 39). Given the many potential effects of the PPE and PE proteins, it is important that further studies are performed to understand their activity. If extensive antigenic variability or reduced antigen presentation were indeed found, this would be significant for vaccine design and for understanding protective immunity in tuberculosis, and might even explain the varied responses seen in different BCG vaccination programmes (ref. 40).

Pathogenicity. Despite intensive research efforts, there is little information about the molecular basis of mycobacterial virulence (ref. 41). However, this situation should now change as the genome sequence will accelerate the study of pathogenesis as never before, because other bacterial factors that may contribute to virulence are becoming apparent. Before the completion of the genome sequence, only three virulence factors had been described (ref. 41): catalase-peroxidase, which protects against reactive oxygen species produced by the phagocyte; mce, which encodes macrophage-colonizing factor (ref. 42); and a sigma factor gene, sigA (aka rpoV), mutations in which can lead to attenuation (ref. 41). In addition to these single-gene virulence factors, the mycobacterial cell wall (ref. 4) is also important in pathology, but the complex nature of its biosynthesis makes it difficult to identify critical genes whose inactivation would lead to attenuation.

On inspection of the genome sequence, it was apparent that four copies of mce were present and that these were all situated in operons, comprising eight genes, organized in exactly the same manner. In each case, the genes preceding mce code for integral membrane proteins, whereas mce and the following five genes are all predicted to encode proteins with signal sequences or hydrophobic stretches at the N terminus. These sets of proteins, about which little is known, may well be secreted or surface-exposed; this is consistent with the proposed role of Mce in invasion of host cells (ref. 42). Furthermore, a homologue of smpB, which has been implicated in intracellular survival of Salmonella typhimurium, has also been identified (ref. 43). Among the other secreted proteins identified from the genome sequence that could act as virulence factors are a series of phospholipases C, lipases and esterases, which might attack cellular or vacuolar membranes, as well as several proteases. One of these phospholipases acts as a contact-dependent haemolysin (N. Stoker, personal communication). The presence of storage proteins in the bacillus, such as the haemoglobin-like oxygen captors described above, points to its ability to stockpile essential growth factors, allowing it to persist in the nutrient-limited environment of the phagosome. In this regard, the ferritin-like proteins, encoded by bfrA and bfrB, may be important in intracellular survival as the capacity to acquire enough iron in the vacuole is very limited.

Methods

Sequence analysis. Initially, ~3.2 Mb of sequence was generated from cosmids (ref. 8) and the remainder was obtained from selected BAC clones (ref. 7) and 45,000 whole-genome shotgun clones. Sheared fragments (1.4–2.0 kb) from cosmids and BACs were cloned into M13 vectors, whereas genomic DNA was cloned in pUC18 to obtain both forward and reverse reads. The PGRS genes were grossly underrepresented in pUC18 but better covered in the BAC and cosmid M13 libraries. We used small-insert libraries (ref. 44) to sequence regions prone to compression or deletion and, in some cases, obtained sequences from products of the polymerase chain reaction or directly from BACs (ref. 7). All shotgun sequencing was performed with standard dye terminators to minimize compression problems, whereas finishing reactions used dRhodamine or BigDye terminators (http://www.sanger.ac.uk). Problem areas were verified by using dye primers. Thirty differences were found between the genomic shotgun sequences and the cosmids, twenty of which were due to sequencing errors and ten to mutations in cosmids (1 error per 320 kb). Less than 0.1% of the sequence was from areas of single-clone coverage, and ~0.2% was from one strand with only one sequencing chemistry.

Informatics. Sequence assembly involved PHRAP, GAP4 (ref. 45) and a customized Perl script that merges sequences from different libraries and generates segments that can be processed by several finishers simultaneously. Sequence analysis and annotation was managed by DIANA (B.G.B. et al., unpublished). Genes encoding proteins were identified by TB-parse (ref. 46) using a hidden Markov model trained on known M. tuberculosis coding and non-coding regions and translation-initiation signals, with corroboration by positional base preference. Interrogation of the EMBL, TREMBL, SwissProt, PROSITE (ref. 47) and in-house databases involved BLASTN, BLASTX (ref. 48), DOTTER (http://www.sanger.ac.uk) and FASTA (ref. 49). tRNA genes were located and identified using tRNAscan and tRNAscan-SE (ref. 50). The complete sequence, a list of annotated cosmids and linking regions can be found on our website (http://www.sanger.ac.uk) and in MycDB (http://www.pasteur.fr/mycdb/).
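The figure of 1 error per 320 kb quoted under Sequence analysis is consistent with dividing the ~3.2 Mb of cosmid-derived sequence by the ten cosmid mutations. This reading is an interpretation rather than something the text states explicitly, and the short sketch below, with illustrative names, simply makes the arithmetic reproducible.

```python
def kb_per_event(sequence_kb: float, events: int) -> float:
    """Return the average number of kilobases of sequence per observed event."""
    if events <= 0:
        raise ValueError("events must be positive")
    return sequence_kb / events


COSMID_SEQUENCE_KB = 3200   # ~3.2 Mb generated from cosmids
COSMID_MUTATIONS = 10       # differences attributed to mutations in cosmids

if __name__ == "__main__":
    # Assumed reading: ten cosmid mutations spread over ~3.2 Mb of cosmid sequence.
    print(kb_per_event(COSMID_SEQUENCE_KB, COSMID_MUTATIONS), "kb per cosmid mutation")
```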

Received 15 April; accepted 8 May 1998.

1. Snider, D. E. Jr, Raviglione, M. & Kochi, A. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 2–11 (Am. Soc. Microbiol., Washington DC, 1994).

2. Wheeler, P. R. & Ratledge, C. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 353–385 (Am. Soc. Microbiol., Washington DC, 1994).

3. Chan, J. & Kaufmann, S. H. E. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 271–284 (Am. Soc. Microbiol., Washington DC, 1994).

4. Brennan, P. J. & Draper, P. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 271–284 (Am. Soc. Microbiol., Washington DC, 1994).

5. Kolattukudy, P. E., Fernandes, N. D., Azad, A. K., Fitzmaurice, A. M. & Sirakova, T. D. Biochemistry and molecular genetics of cell-wall lipid biosynthesis in mycobacteria. Mol. Microbiol. 24, 263–270 (1997).

6. Sreevatsan, S. et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc. Natl Acad. Sci. USA 94, 9869–9874 (1997).

7. Brosch, R. et al. Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for genome mapping, sequencing and comparative genomics. Infect. Immun. 66, 2221–2229 (1998).

8. Philipp, W. J. et al. An integrated map of the genome of the tubercle bacillus, Mycobacterium tuberculosis H37Rv, and comparison with Mycobacterium leprae. Proc. Natl Acad. Sci. USA 93, 3132–3137 (1996).

9. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).

10. Cole, S. T. & Saint-Girons, I. Bacterial genomics. FEMS Microbiol. Rev. 14, 139–160 (1994).

11. Freiberg, C. et al. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387, 394–401 (1997).

12. Bardarov, S. et al. Conditionally replicating mycobacteriophages: a system for transposon delivery to Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 94, 10961–10966 (1997).

13. Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M. bovis. J. Bacteriol. 178, 1274–1282 (1996).

14. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249–256 (1997).

15. Smith, D. R. et al. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res. 7, 802–819 (1997).

16. Greenacre, M. Theory and Application of Correspondence Analysis (Academic, London, 1984).

17. Ratledge, C. R. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 53–94 (Academic, San Diego, 1982).

18. Av-Gay, Y. & Davies, J. Components of eukaryotic-like protein signaling pathways in Mycobacterium tuberculosis. Microb. Comp. Genomics 2, 63–73 (1997).

19. Cole, S. T. & Telenti, A. Drug resistance in Mycobacterium tuberculosis. Eur. Resp. Rev. 8, 701S–713S (1995).

20. Riley, M. & Labedan, B. in Escherichia coli and Salmonella (ed. Neidhardt, F. C.) 2118–2202 (ASM, Washington, 1996).

21. Mdluli, K. et al. Inhibition of a Mycobacterium tuberculosis β-ketoacyl ACP synthase by isoniazid. Science 280, 1607–1610 (1998).

22. Banerjee, A. et al. inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium tuberculosis. Science 263, 227–230 (1994).

23. Hopwood, D. A. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465–2497 (1997).

24. Minnikin, D. E. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 95–184 (Academic, London, 1982).

25. Barry, C. E. III et al. Mycolic acids: structure, biosynthesis, and physiological functions. Prog. Lipid Res. (in the press).

26. Belisle, J. T. et al. Role of the major antigen of Mycobacterium tuberculosis in cell wall biogenesis. Science 276, 1420–1422 (1997).

27. Marahiel, M. A., Stachelhaus, T. & Mootz, H. D. Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem. Rev. 97, 2651–2673 (1997).

28. Gobin, J. et al. Iron acquisition by Mycobacterium tuberculosis: isolation and characterization of a family of iron-binding exochelins. Proc. Natl Acad. Sci. USA 92, 5189–5193 (1995).

29. Young, D. B. & Fruth, U. in New Generation Vaccines (eds Levine, M., Woodrow, G., Kaper, J. & Cobon, G. S.) 631–645 (Marcel Dekker, New York, 1997).

30. Sorensen, A. L., Nagai, S., Houen, G., Andersen, P. & Anderson, A. B. Purification and characterization of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63, 1710–1717 (1995).

31. Hermans, P. W. M., van Soolingen, D. & van Embden, J. D. A. Characterization of a major polymorphic tandem repeat in Mycobacterium tuberculosis and its potential use in the epidemiology of Mycobacterium kansasii and Mycobacterium gordonae. J. Bacteriol. 174, 4157–4165 (1992).

32. Poulet, S. & Cole, S. T. Characterisation of the polymorphic GC-rich repetitive sequence (PGRS) present in Mycobacterium tuberculosis. Arch. Microbiol. 163, 87–95 (1995).

33. Cole, S. T. & Barrell, B. G. in Genetics and Tuberculosis (eds Chadwick, D. J. & Cardew, G., Novartis Foundation Symp. 217) 160–172 (Wiley, Chichester, 1998).

34. Vega-Lopez, F. et al. Sequence and immunological characterization of a serine-rich antigen from Mycobacterium leprae. Infect. Immun. 61, 2145–2153 (1993).

35. Abou-Zeid, C. et al. Genetic and immunological analysis of Mycobacterium tuberculosis fibronectin-binding proteins. Infect. Immun. 59, 2712–2718 (1991).

36. Robertson, B. D. & Meyer, T. F. Genetic variation in pathogenic bacteria. Trends Genet. 8, 422–427 (1992).

37. Levitskaya, J. et al. Inhibition of antigen processing by the internal repeat region of the Epstein-Barr virus nuclear antigen-1. Nature 375, 685–688 (1995).

38. Levitskaya, J., Sharipo, A., Leonchiks, A., Ciechanover, A. & Masucci, M. G. Inhibition of ubiquitin/proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virus nuclear antigen 1. Proc. Natl Acad. Sci. USA 94, 12616–12621 (1997).

39. Flynn, J. L., Goldstein, M. A., Treibold, K. J., Koller, B. & Bloom, B. R. Major histocompatibility complex class-I restricted T cells are required for resistance to Mycobacterium tuberculosis infection. Proc. Natl Acad. Sci. USA 89, 12013–12017 (1992).

40. Bloom, B. R. & Fine, P. E. M. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 531–557 (Am. Soc. Microbiol., Washington DC, 1994).

41. Collins, D. M. In search of tuberculosis virulence genes. Trends Microbiol. 4, 426–430 (1996).

42. Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T. & Riley, L. W. Cloning of an M. tuberculosis DNA fragment associated with entry and survival inside cells. Science 261, 1454–1457 (1993).

43. Baumler, A. J., Kusters, J. G., Stojikovic, I. & Heffron, F. Salmonella typhimurium loci involved in survival within macrophages. Infect. Immun. 62, 1623–1630 (1994).

44. McMurray, A. A., Sulston, J. E. & Quail, M. A. Short-insert libraries as a method of problem solving in genome sequencing. Genome Res. 8, 562–566 (1998).

45. Bonfield, J. K., Smith, K. F. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res. 24, 4992–4999 (1995).

46. Krogh, A., Mian, I. S. & Haussler, D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 22, 4768–4778 (1994).

47. Bairoch, A., Bucher, P. & Hofmann, K. The PROSITE database, its status in 1997. Nucleic Acids Res. 25, 217–221 (1997).

48. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. A basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

49. Pearson, W. & Lipman, D. Improved tools for biological sequence comparisons. Proc. Natl Acad. Sci. USA 85, 2444–2448 (1988).

50. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic DNA. Nucleic Acids Res. 25, 955–964 (1997).

Acknowledgements. We thank Y. Av-Gay, F.-C. Bange, A. Danchin, B. Dujon, W. R. Jacobs Jr, L. Jones, M. McNeil, I. Moszer, P. Rice and J. Stephenson for advice, reagents and support. This work was supported by the Wellcome Trust. Additional funding was provided by the Association Française Raoul Follereau, the World Health Organisation and the Institut Pasteur. S.V.G. received a Wellcome Trust travelling research fellowship.

Correspondence and requests for materials should be addressed to B.G.B. ([email protected]) or S.T.C. ([email protected]). The complete sequence has been deposited in EMBL/GenBank/DDBJ as MTBH37RV, accession number AL123456.


544 NATURE | VOL 393 | 11 JUNE 1998



I. Small-molecule metabolism
A. Degradation
1. Carbon compounds
Rv0186 bglS β-glucosidase
Rv2202c cbhK carbohydrate kinase
Rv0727c fucA L-fuculose phosphate aldolase
Rv1731 gabD1 succinate-semialdehyde dehydrogenase
Rv0234c gabD2 succinate-semialdehyde dehydrogenase
Rv0501 galE1 UDP-glucose 4-epimerase
Rv0536 galE2 UDP-glucose 4-epimerase
Rv0620 galK galactokinase
Rv0619 galT galactose-1-phosphate uridylyltransferase C-term
Rv0618 galT' galactose-1-phosphate uridylyltransferase N-term
Rv0993 galU UTP-glucose-1-phosphate uridylyltransferase
Rv3696c glpK ATP:glycerol 3-phosphotransferase
Rv3255c manA mannose-6-phosphate isomerase
Rv3441c mrsA phosphoglucomutase or phosphomannomutase
Rv0118c oxcA oxalyl-CoA decarboxylase
Rv3068c pgmA phosphoglucomutase
Rv3257c pmmA phosphomannomutase
Rv3308 pmmB phosphomannomutase
Rv2702 ppgK polyphosphate glucokinase
Rv0408 pta phosphate acetyltransferase
Rv0729 xylB xylulose kinase
Rv1096 - carbohydrate degrading enzyme

2. Amino acids and amines
Rv1905c aao D-amino acid oxidase
Rv2531c adi ornithine/arginine decarboxylase
Rv2780 ald L-alanine dehydrogenase
Rv1538c ansA L-asparaginase
Rv1001 arcA arginine deiminase
Rv0753c mmsA methylmalonate semialdehyde dehydrogenase
Rv0751c mmsB methylmalonate semialdehyde oxidoreductase
Rv1187 rocA pyrroline-5-carboxylate dehydrogenase
Rv2322c rocD1 ornithine aminotransferase
Rv2321c rocD2 ornithine aminotransferase
Rv1848 ureA urease γ subunit
Rv1849 ureB urease β subunit
Rv1850 ureC urease α subunit
Rv1853 ureD urease accessory protein
Rv1851 ureF urease accessory protein
Rv1852 ureG urease accessory protein
Rv2913c - probable D-amino acid aminohydrolase
Rv3551 - possible glutaconate CoA-transferase

3. Fatty acids
Rv2501c accA1 acetyl/propionyl-CoA carboxylase, α subunit
Rv0973c accA2 acetyl/propionyl-CoA carboxylase, α subunit
Rv2502c accD1 acetyl/propionyl-CoA carboxylase, β subunit
Rv0974c accD2 acetyl/propionyl-CoA carboxylase, β subunit
Rv3667 acs acetyl-CoA synthase
Rv3409c choD cholesterol oxidase
Rv0222 echA1 enoyl-CoA hydratase/isomerase superfamily
Rv0456c echA2 enoyl-CoA hydratase/isomerase superfamily
Rv0673 echA4 enoyl-CoA hydratase/isomerase superfamily (aka eccH)
Rv0971c echA7 enoyl-CoA hydratase/isomerase superfamily
Rv1141c echA11 enoyl-CoA hydratase/isomerase superfamily
Rv1472 echA12 enoyl-CoA hydratase/isomerase superfamily
Rv1935c echA13 enoyl-CoA hydratase/isomerase superfamily, N-term
Rv3374 echA18' enoyl-CoA hydratase/isomerase superfamily, C-term
Rv3516 echA19 enoyl-CoA hydratase/isomerase superfamily
Rv0859 fadA β oxidation complex, β subunit (acetyl-CoA C-acetyltransferase)
Rv0243 fadA2 acetyl-CoA C-acetyltransferase
Rv1074c fadA3 acetyl-CoA C-acetyltransferase
Rv1323 fadA4 acetyl-CoA C-acetyltransferase (aka thiL)
Rv3546 fadA5 acetyl-CoA C-acetyltransferase
Rv3556c fadA6 acetyl-CoA C-acetyltransferase
Rv0860 fadB β oxidation complex, α subunit (multiple activities)
Rv0468 fadB2 3-hydroxyacyl-CoA dehydrogenase
Rv1715 fadB3 3-hydroxyacyl-CoA dehydrogenase
Rv3141 fadB4 3-hydroxyacyl-CoA dehydrogenase
Rv1912c fadB5 3-hydroxyacyl-CoA dehydrogenase
Rv1750c fadD1 acyl-CoA synthase
Rv0270 fadD2 acyl-CoA synthase
Rv3561 fadD3 acyl-CoA synthase
Rv0214 fadD4 acyl-CoA synthase
Rv0166 fadD5 acyl-CoA synthase
Rv1206 fadD6 acyl-CoA synthase
Rv0119 fadD7 acyl-CoA synthase
Rv0551c fadD8 acyl-CoA synthase
Rv2590 fadD9 acyl-CoA synthase
Rv0099 fadD10 acyl-CoA synthase
Rv1550 fadD11 acyl-CoA synthase, N-term
Rv1549 fadD11' acyl-CoA synthase, C-term
Rv1427c fadD12 acyl-CoA synthase
Rv3089 fadD13 acyl-CoA synthase
Rv1058 fadD14 acyl-CoA synthase
Rv2187 fadD15 acyl-CoA synthase
Rv0852 fadD16 acyl-CoA synthase
Rv3506 fadD17 acyl-CoA synthase
Rv3513c fadD18 acyl-CoA synthase
Rv3515c fadD19 acyl-CoA synthase
Rv1185c fadD21 acyl-CoA synthase
Rv2948c fadD22 acyl-CoA synthase
Rv3826 fadD23 acyl-CoA synthase
Rv1529 fadD24 acyl-CoA synthase
Rv1521 fadD25 acyl-CoA synthase
Rv2930 fadD26 acyl-CoA synthase
Rv0275c fadD27 acyl-CoA synthase
Rv2941 fadD28 acyl-CoA synthase
Rv2950c fadD29 acyl-CoA synthase
Rv0404 fadD30 acyl-CoA synthase
Rv1925 fadD31 acyl-CoA synthase
Rv3801c fadD32 acyl-CoA synthase
Rv1345 fadD33 acyl-CoA synthase
Rv0035 fadD34 acyl-CoA synthase
Rv2505c fadD35 acyl-CoA synthase
Rv1193 fadD36 acyl-CoA synthase
Rv0131c fadE1 acyl-CoA dehydrogenase
Rv0154c fadE2 acyl-CoA dehydrogenase
Rv0215c fadE3 acyl-CoA dehydrogenase
Rv0231 fadE4 acyl-CoA dehydrogenase
Rv0244c fadE5 acyl-CoA dehydrogenase
Rv0271c fadE6 acyl-CoA dehydrogenase
Rv0400c fadE7 acyl-CoA dehydrogenase
Rv0672 fadE8 acyl-CoA dehydrogenase (aka aidB)
Rv0752c fadE9 acyl-CoA dehydrogenase
Rv0873 fadE10 acyl-CoA dehydrogenase
Rv0972c fadE12 acyl-CoA dehydrogenase
Rv0975c fadE13 acyl-CoA dehydrogenase
Rv1346 fadE14 acyl-CoA dehydrogenase
Rv1467c fadE15 acyl-CoA dehydrogenase
Rv1679 fadE16 acyl-CoA dehydrogenase
Rv1934c fadE17 acyl-CoA dehydrogenase
Rv1933c fadE18 acyl-CoA dehydrogenase
Rv2500c fadE19 acyl-CoA dehydrogenase (aka mmgC)
Rv2724c fadE20 acyl-CoA dehydrogenase
Rv2789c fadE21 acyl-CoA dehydrogenase
Rv3061c fadE22 acyl-CoA dehydrogenase
Rv3140 fadE23 acyl-CoA dehydrogenase
Rv3139 fadE24 acyl-CoA dehydrogenase
Rv3274c fadE25 acyl-CoA dehydrogenase
Rv3504 fadE26 acyl-CoA dehydrogenase
Rv3505 fadE27 acyl-CoA dehydrogenase
Rv3544c fadE28 acyl-CoA dehydrogenase

Rv3543c fadE29 acyl-CoA dehydrogenase
Rv3560c fadE30 acyl-CoA dehydrogenase
Rv3562 fadE31 acyl-CoA dehydrogenase
Rv3563 fadE32 acyl-CoA dehydrogenase
Rv3564 fadE33 acyl-CoA dehydrogenase
Rv3573c fadE34 acyl-CoA dehydrogenase
Rv3797 fadE35 acyl-CoA dehydrogenase
Rv3761c fadE36 acyl-CoA dehydrogenase
Rv1175c fadH 2,4-dienoyl-CoA reductase
Rv0855 far fatty acyl-CoA racemase
Rv1143 mcr α-methyl acyl-CoA racemase
Rv1492 mutA methylmalonyl-CoA mutase, β subunit
Rv1493 mutB methylmalonyl-CoA mutase, α subunit
Rv2504c scoA 3-oxo acid:CoA transferase, α subunit
Rv2503c scoB 3-oxo acid:CoA transferase, β subunit
Rv1136 - probable carnitine racemase
Rv1683 - possible acyl-CoA synthase

4. Phosphorus compounds
Rv2368c phoH ATP-binding pho regulon component
Rv1095 phoH2 PhoH-like protein
Rv3628 ppa probable inorganic pyrophosphatase
Rv2984 ppk polyphosphate kinase

B. Energy metabolism
1. Glycolysis
Rv1023 eno enolase
Rv0363c fba fructose bisphosphate aldolase
Rv1436 gap glyceraldehyde 3-phosphate dehydrogenase
Rv0489 gpm phosphoglycerate mutase I
Rv3010c pfkA phosphofructokinase I
Rv2029c pfkB phosphofructokinase II
Rv0946c pgi glucose-6-phosphate isomerase
Rv1437 pgk phosphoglycerate kinase
Rv1617 pykA pyruvate kinase
Rv1438 tpi triosephosphate isomerase
Rv2419c - putative phosphoglycerate mutase
Rv3837c - putative phosphoglycerate mutase

2. Pyruvate dehydrogenase
Rv2241 aceE pyruvate dehydrogenase E1 component
Rv3303c lpdA dihydrolipoamide dehydrogenase
Rv2497c pdhA pyruvate dehydrogenase E1 component α subunit
Rv2496c pdhB pyruvate dehydrogenase E1 component β subunit
Rv2495c pdhC dihydrolipoamide acetyltransferase
Rv0462 - probable dihydrolipoamide dehydrogenase

3. TCA cycle
Rv1475c acn aconitate hydratase
Rv0889c citA citrate synthase 2
Rv2498c citE citrate lyase β chain
Rv1098c fum fumarase
Rv1131 gltA1 citrate synthase 3
Rv0896 gltA2 citrate synthase 1
Rv3339c icd1 isocitrate dehydrogenase
Rv0066c icd2 isocitrate dehydrogenase
Rv0794c lpdB dihydrolipoamide dehydrogenase
Rv1240 mdh malate dehydrogenase
Rv2967c pca pyruvate carboxylase
Rv3318 sdhA succinate dehydrogenase A
Rv3319 sdhB succinate dehydrogenase B
Rv3316 sdhC succinate dehydrogenase C subunit
Rv3317 sdhD succinate dehydrogenase D subunit
Rv1248c sucA 2-oxoglutarate dehydrogenase
Rv2215 sucB dihydrolipoamide succinyltransferase
Rv0951 sucC succinyl-CoA synthase β chain
Rv0952 sucD succinyl-CoA synthase α chain

4. Glyoxylate bypass
Rv0467 aceA isocitrate lyase
Rv1915 aceAa isocitrate lyase, α module
Rv1916 aceAb isocitrate lyase, β module
Rv1837c glcB malate synthase
Rv3323c gphA phosphoglycolate phosphatase

5. Pentose phosphate pathway
Rv1445c devB glucose-6-phosphate 1-dehydrogenase
Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram –)
Rv1122 gnd2 6-phosphogluconate dehydrogenase (Gram +)
Rv1446c opcA unknown function, may aid G6PDH
Rv2436 rbsK ribokinase
Rv1408 rpe ribulose-phosphate 3-epimerase
Rv2465c rpi phosphopentose isomerase
Rv1448c tal transaldolase
Rv1449c tkt transketolase
Rv1121 zwf glucose-6-phosphate 1-dehydrogenase
Rv1447c zwf2 glucose-6-phosphate 1-dehydrogenase

Table 1. Functional classification of Mycobacterium tuberculosis protein-coding genes

6. Respiration
a. aerobic
Rv0527 ccsA cytochrome c-type biogenesis protein
Rv0529 ccsB cytochrome c-type biogenesis protein
Rv1451 ctaB cytochrome c oxidase assembly factor
Rv2200c ctaC cytochrome c oxidase chain II
Rv3043c ctaD cytochrome c oxidase polypeptide I
Rv2193 ctaE cytochrome c oxidase polypeptide III
Rv1542c glbN hemoglobin-like, oxygen carrier
Rv2470 glbO hemoglobin-like, oxygen carrier
Rv2249c glpD1 glycerol-3-phosphate dehydrogenase
Rv3302c glpD2 glycerol-3-phosphate dehydrogenase
Rv0694 lldD1 L-lactate dehydrogenase (cytochrome)
Rv1872c lldD2 L-lactate dehydrogenase
Rv1854c ndh probable NADH dehydrogenase
Rv3145 nuoA NADH dehydrogenase chain A
Rv3146 nuoB NADH dehydrogenase chain B
Rv3147 nuoC NADH dehydrogenase chain C
Rv3148 nuoD NADH dehydrogenase chain D
Rv3149 nuoE NADH dehydrogenase chain E
Rv3150 nuoF NADH dehydrogenase chain F
Rv3151 nuoG NADH dehydrogenase chain G
Rv3152 nuoH NADH dehydrogenase chain H
Rv3153 nuoI NADH dehydrogenase chain I
Rv3154 nuoJ NADH dehydrogenase chain J
Rv3155 nuoK NADH dehydrogenase chain K
Rv3156 nuoL NADH dehydrogenase chain L
Rv3157 nuoM NADH dehydrogenase chain M
Rv3158 nuoN NADH dehydrogenase chain N
Rv2195 qcrA Rieske iron-sulphur component of ubiQ-cytB reductase
Rv2196 qcrB cytochrome b component of ubiQ-cytB reductase
Rv2194 qcrC cytochrome b/c component of ubiQ-cytB reductase

b. anaerobic
Rv2392 cysH 3'-phosphoadenylylsulfate (PAPS) reductase
Rv2899c fdhD affects formate dehydrogenase-N
Rv2900c fdhF molybdopterin-containing oxidoreductase
Rv1552 frdA fumarate reductase flavoprotein subunit
Rv1553 frdB fumarate reductase iron sulphur protein
Rv1554 frdC fumarate reductase 15kD anchor protein
Rv1555 frdD fumarate reductase 13kD anchor protein
Rv1161 narG nitrate reductase α subunit
Rv1162 narH nitrate reductase β chain
Rv1164 narI nitrate reductase γ chain
Rv1163 narJ nitrate reductase δ chain
Rv1736c narX fused nitrate reductase
Rv2391 nirA probable nitrite reductase/sulphite reductase
Rv0252 nirB nitrite reductase flavoprotein
Rv0253 nirD probable nitrite reductase small subunit

c. Electron transport
Rv0409 ackA acetate kinase
Rv1623c appC cytochrome bd-II oxidase subunit I
Rv1622c cydB cytochrome d ubiquinol oxidase subunit II
Rv1620c cydC ABC transporter
Rv1621c cydD ABC transporter
Rv2007c fdxA ferredoxin
Rv3554 fdxB ferredoxin
Rv1177 fdxC ferredoxin 4Fe-4S
Rv3503c fdxD probable ferredoxin
Rv3029c fixA electron transfer flavoprotein β subunit
Rv3028c fixB electron transfer flavoprotein α subunit
Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase
Rv0886 fprB ferredoxin, ferredoxin-NADP reductase
Rv3251c rubA rubredoxin A
Rv3250c rubB rubredoxin B

7. Miscellaneous oxidoreductases and oxygenases 171

8. ATP-proton motive force
Rv1308 atpA ATP synthase α chain
Rv1304 atpB ATP synthase a chain
Rv1311 atpC ATP synthase ε chain
Rv1310 atpD ATP synthase β chain
Rv1305 atpE ATP synthase c chain
Rv1306 atpF ATP synthase b chain
Rv1309 atpG ATP synthase γ chain
Rv1307 atpH ATP synthase δ chain

C. Central intermediary metabolism
1. General
Rv2589 gabT 4-aminobutyrate aminotransferase
Rv3432c gadB glutamate decarboxylase
Rv1832 gcvB glycine decarboxylase
Rv1826 gcvH glycine cleavage system H protein
Rv2211c gcvT T protein of glycine cleavage system
Rv1213 glgC glucose-1-phosphate adenylyltransferase
Rv3842c glpQ1 glycerophosphoryl diester phosphodiesterase
Rv0317c glpQ2 glycerophosphoryl diester phosphodiesterase
Rv3566c nhoA N-hydroxyarylamine o-acetyltransferase
Rv0155 pntAA pyridine transhydrogenase subunit α1
Rv0156 pntAB pyridine transhydrogenase subunit α2
Rv0157 pntB pyridine transhydrogenase subunit β
Rv1127c ppdK similar to pyruvate, phosphate dikinase

2. Gluconeogenesis
Rv0211 pckA phosphoenolpyruvate carboxykinase
Rv0069c sdaA L-serine dehydratase 1

3. Sugar nucleotides
Rv1512 epiA nucleotide sugar epimerase
Rv3784 epiB probable UDP-galactose 4-epimerase
Rv1511 gmdA GDP-mannose 4,6 dehydratase
Rv0334 rmlA glucose-1-phosphate thymidyltransferase
Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase
Rv3464 rmlB dTDP-glucose 4,6-dehydratase
Rv3634c rmlB2 dTDP-glucose 4,6-dehydratase
Rv3468c rmlB3 dTDP-glucose 4,6-dehydratase
Rv3465 rmlC dTDP-4-dehydrorhamnose 3,5-epimerase
Rv3266c rmlD dTDP-4-dehydrorhamnose reductase
Rv0322 udgA UDP-glucose dehydrogenase/GDP-mannose 6-dehydrogenase
Rv3265c wbbL dTDP-rhamnosyl transferase
Rv1525 wbbl2 dTDP-rhamnosyl transferase
Rv3400 - probable β-phosphoglucomutase

4. Amino sugars
Rv3436c glmS glucosamine-fructose-6-phosphate aminotransferase

5. Sulphur metabolism
Rv0711 atsA arylsulfatase
Rv3299c atsB probable arylsulfatase
Rv0663 atsD probable arylsulfatase
Rv3077 atsF probable arylsulfatase
Rv0296c atsG probable arylsulfatase
Rv3796 atsH probable arylsulfatase
Rv1285 cysD ATP:sulphurylase subunit 2
Rv1286 cysN ATP:sulphurylase subunit 1
Rv2131c cysQ homologue of M. leprae cysQ
Rv3248c sahH adenosylhomocysteinase
Rv3283 sseA thiosulfate sulfurtransferase
Rv2291 sseB thiosulfate sulfurtransferase
Rv3118 sseC thiosulfate sulfurtransferase
Rv0814c sseC2 thiosulfate sulfurtransferase
Rv3762c - probable alkyl sulfatase

D. Amino acid biosynthesis
1. Glutamate family
Rv1654 argB acetylglutamate kinase
Rv1652 argC N-acetyl-γ-glutamyl-phosphate reductase
Rv1655 argD acetylornithine aminotransferase
Rv1656 argF ornithine carbamoyltransferase
Rv1658 argG arginosuccinate synthase
Rv1659 argH arginosuccinate lyase
Rv1653 argJ glutamate N-acetyltransferase
Rv2220 glnA1 glutamine synthase class I
Rv2222c glnA2 glutamine synthase class II
Rv1878 glnA3 probable glutamine synthase
Rv2860c glnA4 probable glutamine synthase
Rv2918c glnD uridylyltransferase
Rv2221c glnE glutamate-ammonia-ligase adenyltransferase
Rv3859c gltB ferredoxin-dependent glutamate synthase
Rv3858c gltD small subunit of NADH-dependent glutamate synthase
Rv3704c gshA possible γ-glutamylcysteine synthase
Rv2427c proA γ-glutamyl phosphate reductase
Rv2439c proB glutamate 5-kinase
Rv0500 proC pyrroline-5-carboxylate reductase

2. Aspartate family
Rv3708c asd aspartate semialdehyde dehydrogenase
Rv3709c ask aspartokinase
Rv2201 asnB asparagine synthase B
Rv3565 aspB aspartate aminotransferase
Rv0337c aspC aspartate aminotransferase
Rv2753c dapA dihydrodipicolinate synthase
Rv2773c dapB dihydrodipicolinate reductase
Rv1202 dapE succinyl-diaminopimelate desuccinylase
Rv2141c dapE2 ArgE/DapE/Acy1/Cpg2/yscS family
Rv2726c dapF diaminopimelate epimerase
Rv1293 lysA diaminopimelate decarboxylase
Rv3341 metA homoserine o-acetyltransferase
Rv1079 metB cystathionine γ-synthase
Rv3340 metC cystathionine β-lyase
Rv1133c metE 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase
Rv2124c metH 5-methyltetrahydrofolate-homocysteine methyltransferase
Rv1392 metK S-adenosylmethionine synthase
Rv0391 metZ o-succinylhomoserine sulfhydrylase
Rv1294 thrA homoserine dehydrogenase
Rv1296 thrB homoserine kinase
Rv1295 thrC threonine synthase

3. Serine family
Rv0815c cysA2 thiosulfate sulfurtransferase
Rv3117 cysA3 thiosulfate sulfurtransferase
Rv2335 cysE serine acetyltransferase
Rv0511 cysG uroporphyrin-III c-methyltransferase
Rv2847c cysG2 multifunctional enzyme, siroheme synthase
Rv2334 cysK cysteine synthase A
Rv1336 cysM cysteine synthase B
Rv1077 cysM2 cystathionine β-synthase
Rv0848 cysM3 putative cysteine synthase
Rv1093 glyA serine hydroxymethyltransferase
Rv0070c glyA2 serine hydroxymethyltransferase
Rv2996c serA D-3-phosphoglycerate dehydrogenase
Rv0505c serB probable phosphoserine phosphatase
Rv3042c serB2 C-term similar to phosphoserine phosphatase
Rv0884c serC phosphoserine aminotransferase

4. Aromatic amino acid family
Rv3227 aroA 3-phosphoshikimate 1-carboxyvinyl transferase
Rv2538c aroB 3-dehydroquinate synthase
Rv2537c aroD 3-dehydroquinate dehydratase
Rv2552c aroE shikimate 5-dehydrogenase
Rv2540c aroF chorismate synthase
Rv2178c aroG DAHP synthase
Rv2539c aroK shikimate kinase I
Rv3838c pheA prephenate dehydratase
Rv1613 trpA tryptophan synthase α chain
Rv1612 trpB tryptophan synthase β chain
Rv1611 trpC indole-3-glycerol phosphate synthase
Rv2192c trpD anthranilate phosphoribosyltransferase
Rv1609 trpE anthranilate synthase component I
Rv2386c trpE2 anthranilate synthase component I
Rv3754 tyrA prephenate dehydrogenase

5. Histidine
Rv1603 hisA phosphoribosylformimino-5-aminoimidazole carboxamide ribonucleotide isomerase
Rv1601 hisB imidazole glycerol-phosphate dehydratase
Rv1600 hisC histidinol-phosphate aminotransferase
Rv3772 hisC2 histidinol-phosphate aminotransferase
Rv1599 hisD histidinol dehydrogenase
Rv1605 hisF imidazole glycerol-phosphate synthase
Rv2121c hisG ATP phosphoribosyltransferase
Rv1602 hisH amidotransferase
Rv2122c hisI phosphoribosyl-AMP cyclohydrolase
Rv1606 hisI2 probable phosphoribosyl-AMP 1,6 cyclohydrolase
Rv0114 - similar to HisB

6. Pyruvate family
Rv3423c alr alanine racemase

7. Branched amino acid family
Rv1559 ilvA threonine deaminase
Rv3003c ilvB acetolactate synthase I large subunit
Rv3470c ilvB2 acetolactate synthase large subunit
Rv3001c ilvC ketol-acid reductoisomerase
Rv0189c ilvD dihydroxy-acid dehydratase
Rv2210c ilvE branched-chain-amino-acid transaminase
Rv1820 ilvG acetolactate synthase II
Rv3002c ilvN acetolactate synthase I small subunit
Rv3509c ilvX probable acetohydroxyacid synthase I large subunit
Rv3710 leuA α-isopropyl malate synthase
Rv2995c leuB 3-isopropylmalate dehydrogenase
Rv2988c leuC 3-isopropylmalate dehydratase large subunit
Rv2987c leuD 3-isopropylmalate dehydratase small subunit

E. Polyamine synthesis
Rv2601 speE spermidine synthase

F. Purines, pyrimidines, nucleosides and nucleotides
1. Purine ribonucleotide biosynthesis
Rv1389 gmk putative guanylate kinase
Rv3396c guaA GMP synthase
Rv1843c guaB1 inosine-5'-monophosphate dehydrogenase
Rv3411c guaB2 inosine-5'-monophosphate dehydrogenase
Rv3410c guaB3 inosine-5'-monophosphate dehydrogenase
Rv1017c prsA ribose-phosphate pyrophosphokinase
Rv0357c purA adenylosuccinate synthase
Rv0777 purB adenylosuccinate lyase
Rv0780 purC phosphoribosylaminoimidazole-succinocarboxamide synthase
Rv0772 purD phosphoribosylamine-glycine ligase
Rv3275c purE phosphoribosylaminoimidazole carboxylase
Rv0808 purF amidophosphoribosyltransferase
Rv0957 purH phosphoribosylaminoimidazolecarboxamide formyltransferase
Rv3276c purK phosphoribosylaminoimidazole carboxylase ATPase subunit
Rv0803 purL phosphoribosylformylglycinamidine synthase II
Rv0809 purM 5'-phosphoribosyl-5-aminoimidazole synthase
Rv0956 purN phosphoribosylglycinamide formyltransferase I
Rv0788 purQ phosphoribosylformylglycinamidine synthase I
Rv0389 purT phosphoribosylglycinamide formyltransferase II
Rv2964 purU formyltetrahydrofolate deformylase

2. Pyrimidine ribonucleotide biosynthesis
Rv1383 carA carbamoyl-phosphate synthase subunit
Rv1384 carB carbamoyl-phosphate synthase subunit
Rv1380 pyrB aspartate carbamoyltransferase
Rv1381 pyrC dihydroorotase
Rv2139 pyrD dihydroorotate dehydrogenase
Rv1385 pyrF orotidine 5'-phosphate decarboxylase
Rv1699 pyrG CTP synthase
Rv2883c pyrH uridylate kinase
Rv0382c umpA probable uridine 5'-monophosphate synthase

3. 2'-deoxyribonucleotide metabolism
Rv0321 dcd deoxycytidine triphosphate deaminase
Rv2697c dut deoxyuridine triphosphatase
Rv0233 nrdB ribonucleoside-diphosphate reductase B2 (eukaryotic-like)
Rv3051c nrdE ribonucleoside diphosphate reductase α chain
Rv1981c nrdF ribonucleotide reductase small subunit
Rv3048c nrdG ribonucleoside-diphosphate small subunit
Rv3053c nrdH glutaredoxin electron transport component of NrdEF system
Rv3052c nrdI NrdI/YgaO/YmaA family
Rv3247c tmk thymidylate kinase
Rv2764c thyA thymidylate synthase
Rv0570 nrdZ ribonucleotide reductase, class II
Rv3752c - probable cytidine/deoxycytidylate deaminase

4. Salvage of nucleosides and nucleotides
Rv3313c add probable adenosine deaminase
Rv2584c apt adenine phosphoribosyltransferases
Rv3315c cdd probable cytidine deaminase
Rv3314c deoA thymidine phosphorylase
Rv0478 deoC deoxyribose-phosphate aldolase
Rv3307 deoD probable purine nucleoside phosphorylase
Rv3624c hpt probable hypoxanthine-guanine phosphoribosyltransferase
Rv3393 iunH probable inosine-uridine preferring nucleoside hydrolase
Rv0535 pnp phosphorylase from Pnp/MtaP family 2
Rv3309c upp uracil phosphoribosyltransferase

5. Miscellaneous nucleoside/nucleotide reactions
Rv0733 adk probable adenylate kinase
Rv2364c bex GTP-binding protein of Era/ThdF family
Rv1712 cmk cytidylate kinase
Rv2344c dgt probable deoxyguanosine triphosphate hydrolase
Rv2404c lepA GTP-binding protein LepA
Rv2727c miaA tRNA δ(2)-isopentenylpyrophosphate transferase
Rv2445c ndkA nucleoside diphosphate kinase
Rv2440c obg Obg GTP-binding protein
Rv2583c relA (p)ppGpp synthase I

G. Biosynthesis of cofactors, prosthetic groups and carriers
1. Biotin
Rv1568 bioA adenosylmethionine-8-amino-7-oxononanoate aminotransferase
Rv1589 bioB biotin synthase
Rv1570 bioD dethiobiotin synthase
Rv1569 bioF 8-amino-7-oxononanoate synthase
Rv0032 bioF2 C-terminal similar to B. subtilis BioF
Rv3279c birA biotin apo-protein ligase
Rv1442 bisC biotin sulfoxide reductase
Rv0089 - possible bioC biotin synthesis gene

2. Folic acid
Rv2763c dfrA dihydrofolate reductase
Rv2447c folC folylpolyglutamate synthase
Rv3356c folD methylenetetrahydrofolate dehydrogenase
Rv3609c folE GTP cyclohydrolase I
Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
Rv3608c folP dihydropteroate synthase
Rv1207 folP2 dihydropteroate synthase
Rv3607c folX may be involved in folate biosynthesis
Rv0013 pabA p-aminobenzoate synthase glutamine amidotransferase
Rv1005c pabB p-aminobenzoate synthase
Rv0812 pabC aminodeoxychorismate lyase

3. Lipoate
Rv2218 lipA lipoate biosynthesis protein A
Rv2217 lipB lipoate biosynthesis protein B

4. Molybdopterin
Rv3109 moaA molybdenum cofactor biosynthesis, protein A
Rv0869c moaA2 molybdenum cofactor biosynthesis, protein A
Rv0438c moaA3 molybdenum cofactor biosynthesis, protein A
Rv3110 moaB molybdenum cofactor biosynthesis, protein B
Rv0984 moaB2 molybdenum cofactor biosynthesis, protein B
Rv3111 moaC molybdenum cofactor biosynthesis, protein C
Rv0864 moaC2 molybdenum cofactor biosynthesis, protein C
Rv3324c moaC3 molybdenum cofactor biosynthesis, protein C
Rv3112 moaD molybdopterin converting factor subunit 1
Rv0868c moaD2 molybdopterin converting factor subunit 1
Rv3119 moaE molybdopterin-converting factor subunit 2
Rv0866 moaE2 molybdopterin-converting factor subunit 2
Rv3322c moaE3 molybdopterin-converting factor subunit 2
Rv0994 moeA molybdopterin biosynthesis
Rv3116 moeB molybdopterin biosynthesis
Rv2338c moeW molybdopterin biosynthesis
Rv1681 moeX weak similarity to E. coli MoaA
Rv1355c moeY weak similarity to E. coli MoeB
Rv3206c moeZ probably involved in molybdopterin biosynthesis
Rv0865 mog molybdopterin biosynthesis

5. Pantothenate
Rv1092c coaA pantothenate kinase
Rv2225 panB 3-methyl-2-oxobutanoate hydroxymethyltransferase
Rv3602c panC pantoate-β-alanine ligase
Rv3601c panD aspartate 1-decarboxylase

6. Pyridoxine
Rv2607 pdxH pyridoxamine 5'-phosphate oxidase

7. Pyridine nucleotide
Rv1594 nadA quinolinate synthase
Rv1595 nadB L-aspartate oxidase
Rv1596 nadC nicotinate-nucleotide pyrophosphatase
Rv0423c thiC thiamine synthesis, pyrimidine moiety

8. Thiamine
Rv0422c thiD phosphomethylpyrimidine kinase
Rv0414c thiE thiamine synthesis, thiazole moiety
Rv0417 thiG thiamine synthesis, thiazole moiety
Rv2977c thiL probable thiamine-monophosphate kinase

9. Riboflavin
Rv1940 ribA GTP cyclohydrolase II
Rv1415 ribA2 probable GTP cyclohydrolase II
Rv1412 ribC riboflavin synthase α chain
Rv2671 ribD probable riboflavin deaminase
Rv2786c ribF riboflavin kinase
Rv1409 ribG riboflavin biosynthesis
Rv1416 ribH riboflavin synthase β chain
Rv3300c - probable deaminase, riboflavin synthesis

10. Thioredoxin, glutaredoxin and mycothiol
Rv0773c ggtA putative γ-glutamyl transpeptidase
Rv2394 ggtB γ-glutamyltranspeptidase precursor
Rv2855 gorA glutathione reductase homologue
Rv0816c thiX equivalent to M. leprae ThiX
Rv1470 trxA thioredoxin
Rv1471 trxB thioredoxin reductase
Rv3913 trxB2 thioredoxin reductase
Rv3914 trxC thioredoxin

11. Menaquinone, PQQ, ubiquinone and other terpenoids
Rv2682c dxs 1-deoxy-D-xylulose 5-phosphate synthase
Rv0562 grcC1 heptaprenyl diphosphate synthase II
Rv0989c grcC2 heptaprenyl diphosphate synthase II
Rv3398c idsA geranylgeranyl pyrophosphate synthase
Rv2173 idsA2 geranylgeranyl pyrophosphate synthase
Rv3383c idsB transfergeranyl, similar geranyl pyrophosphate synthase
Rv0534c menA 4-dihydroxy-2-naphthoate octaprenyltransferase
Rv0548c menB naphthoate synthase
Rv0553 menC o-succinylbenzoate-CoA synthase
Rv0555 menD 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase
Rv0542c menE o-succinylbenzoic acid-CoA ligase
Rv3853 menG S-adenosylmethionine: 2-demethylmenaquinone
Rv3397c phyA phytoene synthase
Rv0693 pqqE coenzyme PQQ synthesis protein E
Rv0558 ubiE ubiquinone/menaquinone biosynthesis methyltransferase

12. Heme and porphyrin
Rv0509 hemA glutamyl-tRNA reductase
Rv0512 hemB δ-aminolevulinic acid dehydratase
Rv0510 hemC porphobilinogen deaminase
Rv2678c hemE uroporphyrinogen decarboxylase
Rv1300 hemK protoporphyrinogen oxidase
Rv0524 hemL glutamate-1-semialdehyde aminotransferase
Rv2388c hemN oxygen-independent coproporphyrinogen III oxidase
Rv2677c hemY' protoporphyrinogen oxidase
Rv1485 hemZ ferrochelatase

13. Cobalamin
Rv2849c cobA cob(I)alamin adenosyltransferase
Rv2848c cobB cobyrinic acid a,c-diamide synthase
Rv2231c cobC aminotransferase
Rv2236c cobD cobinamide synthase
Rv2064 cobG precorrin reductase
Rv2065 cobH precorrin isomerase
Rv2066 cobI CobI-CobJ fusion protein
Rv2070c cobK precorrin reductase
Rv2072c cobL probable methyltransferase
Rv2071c cobM precorrin-3 methylase
Rv2062c cobN cobalt insertion
Rv2208 cobS cobalamin (5'-phosphate) synthase
Rv2207 cobT nicotinate-nucleotide-dimethylbenzimidazole transferase
Rv0254c cobU cobinamide kinase
Rv0255c cobQ cobyric acid synthase
Rv3713 cobQ2 possible cobyric acid synthase
Rv0306 - similar to BluB cobalamin synthesis protein R. capsulatus

14. Iron utilization
Rv1876 bfrA bacterioferritin
Rv3841 bfrB bacterioferritin
Rv3215 entC probable isochorismate synthase
Rv3214 entD weak similarity to many phosphoglycerate mutases
Rv2895c viuB similar to proteins involved in vibriobactin uptake
Rv3525c - similar to ferripyochelin binding protein

H. Lipid biosynthesis
1. Synthesis of fatty and mycolic acids
Rv3285 accA3 acetyl/propionyl CoA carboxylase α subunit
Rv0904c accD3 acetyl/propionyl CoA carboxylase β subunit
Rv3799c accD4 acetyl/propionyl CoA carboxylase β subunit
Rv3280 accD5 acetyl/propionyl CoA carboxylase β subunit
Rv2247 accD6 acetyl/propionyl CoA carboxylase β subunit
Rv2244 acpM acyl carrier protein (meromycolate extension)
Rv2523c acpS CoA:apo-[ACP] pantetheinephosphotransferase
Rv2243 fabD malonyl CoA-[ACP] transacylase
Rv0649 fabD2 malonyl CoA-[ACP] transacylase
Rv1483 fabG1 3-oxoacyl-[ACP] reductase (aka MabA)
Rv1350 fabG2 3-oxoacyl-[ACP] reductase
Rv2002 fabG3 3-oxoacyl-[ACP] reductase
Rv0242c fabG4 3-oxoacyl-[ACP] reductase
Rv2766c fabG5 3-oxoacyl-[ACP] reductase
Rv0533c fabH β-ketoacyl-ACP synthase III
Rv2524c fas fatty acid synthase
Rv1484 inhA enoyl-[ACP] reductase
Rv2245 kasA β-ketoacyl-ACP synthase (meromycolate extension)
Rv2246 kasB β-ketoacyl-ACP synthase (meromycolate extension)
Rv1618 tesB1 thioesterase II
Rv2605c tesB2 thioesterase II
Rv0033 - possible acyl carrier protein
Rv1344 - possible acyl carrier protein
Rv1722 - possible biotin carboxylase
Rv3221c - resembles biotin carboxyl carrier
Rv3472 - possible acyl carrier protein

2. Modification of fatty and mycolic acids
Rv3391 acrA1 fatty acyl-CoA reductase
Rv3392c cmaA1 cyclopropane mycolic acid synthase 1
Rv0503c cmaA2 cyclopropane mycolic acid synthase 2
Rv0824c desA1 acyl-[ACP] desaturase
Rv1094 desA2 acyl-[ACP] desaturase
Rv3229c desA3 acyl-[ACP] desaturase
Rv0645c mmaA1 methoxymycolic acid synthase 1
Rv0644c mmaA2 methoxymycolic acid synthase 2
Rv0643c mmaA3 methoxymycolic acid synthase 3
Rv0642c mmaA4 methoxymycolic acid synthase 4
Rv0447c ufaA1 unknown fatty acid methyltransferase
Rv3538 ufaA2 unknown fatty acid methyltransferase
Rv0469 umaA1 unknown mycolic acid methyltransferase
Rv0470c umaA2 unknown mycolic acid methyltransferase

3. Acyltransferases, mycoloyltransferases and phospholipid synthesis
Rv2289 cdh CDP-diacylglycerol phosphatidylhydrolase
Rv2881c cdsA phosphatidate cytidylyltransferase
Rv3804c fbpA antigen 85A, mycolyltransferase
Rv1886c fbpB antigen 85B, mycolyltransferase
Rv3803c fbpC1 antigen 85C, mycolyltransferase
Rv0129c fbpC2 antigen 85C', mycolyltransferase
Rv0564c gpdA1 glycerol-3-phosphate dehydrogenase
Rv2982c gpdA2 glycerol-3-phosphate dehydrogenase
Rv2612c pgsA CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv1822 pgsA2 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv2746c pgsA3 CDP-diacylglycerol-glycerol-3-phosphate phosphatidyltransferase
Rv1551 plsB1 glycerol-3-phosphate acyltransferase
Rv2482c plsB2 glycerol-3-phosphate acyltransferase
Rv0437c psd putative phosphatidylserine decarboxylase
Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase
Rv0045c - possible dihydrolipoamide acetyltransferase
Rv0914c - lipid transfer protein
Rv1543 - probable fatty-acyl CoA reductase
Rv1627c - lipid carrier protein
Rv1814 - possible C-5 sterol desaturase
Rv1867 - similar to acetyl CoA synthase/lipid carriers
Rv2261c - apolipoprotein N-acyltransferase-a
Rv2262c - apolipoprotein N-acyltransferase-b
Rv3523 - lipid carrier protein
Rv3720 - C-term similar to cyclopropane fatty acid synthases

I. Polyketide and non-ribosomal peptide synthesis
Rv2940c mas mycocerosic acid synthase
Rv2384 mbtA mycobactin/exochelin synthesis (salicylate-AMP ligase)
Rv2383c mbtB mycobactin/exochelin synthesis (serine/threonine ligation)
Rv2382c mbtC mycobactin/exochelin synthesis
Rv2381c mbtD mycobactin/exochelin synthesis (polyketide synthase)
Rv2380c mbtE mycobactin/exochelin synthesis (lysine ligation)
Rv2379c mbtF mycobactin/exochelin synthesis (lysine ligation)
Rv2378c mbtG mycobactin/exochelin synthesis (lysine hydroxylase)
Rv2377c mbtH mycobactin/exochelin synthesis
Rv0101 nrp unknown non-ribosomal peptide synthase
Rv1153c omt PKS o-methyltransferase
Rv3824c papA1 PKS-associated protein, unknown function
Rv3820c papA2 PKS-associated protein, unknown function
Rv1182 papA3 PKS-associated protein, unknown function
Rv1528c papA4 PKS-associated protein, unknown function
Rv2939 papA5 PKS-associated protein, unknown function
Rv2946c pks1 polyketide synthase
Rv1660 pks10 polyketide synthase (chalcone synthase-like)
Rv1665 pks11 polyketide synthase (chalcone synthase-like)
Rv2048c pks12 polyketide synthase (erythronolide synthase-like)
Rv3800c pks13 polyketide synthase
Rv1342c pks14 polyketide synthase (chalcone synthase-like)
Rv2947c pks15 polyketide synthase
Rv1013 pks16 polyketide synthase
Rv1663 pks17 polyketide synthase
Rv1372 pks18 polyketide synthase
Rv3825c pks2 polyketide synthase
Rv1180 pks3 polyketide synthase
Rv1181 pks4 polyketide synthase
Rv1527c pks5 polyketide synthase
Rv0405 pks6 polyketide synthase
Rv1661 pks7 polyketide synthase
Rv1662 pks8 polyketide synthase
Rv1664 pks9 polyketide synthase
Rv2931 ppsA phenolphthiocerol synthesis (pksB)
Rv2932 ppsB phenolphthiocerol synthesis (pksC)
Rv2933 ppsC phenolphthiocerol synthesis (pksD)
Rv2934 ppsD phenolphthiocerol synthesis (pksE)
Rv2935 ppsE phenolphthiocerol synthesis (pksF)
Rv2928 tesA thioesterase
Rv1544 - probable ketoacyl reductase

J. Broad regulatory functions
1. Repressors/activators
Rv1657 argR arginine repressor
Rv1267c embR regulator of embAB genes (AfsR/DndI/RedD family)
Rv1909c furA ferric uptake regulatory protein
Rv2359 furB ferric uptake regulatory protein
Rv2919c glnB nitrogen regulatory protein
Rv2711 ideR iron dependent repressor, IdeR
Rv2720 lexA LexA, SOS repressor protein
Rv1479 moxR transcriptional regulator, MoxR homologue
Rv3692 moxR2 transcriptional regulator, MoxR homologue
Rv3164c moxR3 transcriptional regulator, MoxR homologue
Rv0212c nadR similar to E. coli NadR
Rv0117 oxyS transcriptional regulator (LysR family)
Rv1379 pyrR regulatory protein pyrimidine biosynthesis
Rv2788 sirR iron-dependent transcriptional repressor
Rv3082c virS putative virulence regulating protein (AraC/XylS family)
Rv3219 whiB1 WhiB transcriptional activator homologue
Rv3260c whiB2 WhiB transcriptional activator homologue
Rv3416 whiB3 WhiB transcriptional activator homologue
Rv3681c whiB4 WhiB transcriptional activator homologue
Rv0023 - putative transcriptional regulator
Rv0043c - transcriptional regulator (GntR family)
Rv0067c - transcriptional regulator (TetR/AcrR family)
Rv0078 - transcriptional regulator (TetR/AcrR family)
Rv0081 - transcriptional regulator (ArsR family)
Rv0135c - putative transcriptional regulator
Rv0144 - putative transcriptional regulator
Rv0158 - transcriptional regulator (TetR/AcrR family)
Rv0165c - transcriptional regulator (GntR family)
Rv0195 - transcriptional regulator (LuxR/UhpA family)
Rv0196 - transcriptional regulator



(TetR/AcrR family)
Rv0273c - putative transcriptional regulator
Rv0302 - transcriptional regulator (TetR/AcrR family)
Rv0324 - putative transcriptional regulator
Rv0328 - transcriptional regulator (TetR/AcrR family)
Rv0348 - putative transcriptional regulator
Rv0377 - transcriptional regulator (LysR


(LuxR/UhpA family)
Rv0452 - putative transcriptional regulator
Rv0465c - transcriptional regulator (PbsX/Xre family)
Rv0472c - transcriptional regulator


(PbsX/Xre family)
Rv0485 - transcriptional regulator (ROK family)
Rv0494 - transcriptional regulator (GntR family)
Rv0552 - putative transcriptional regulator
Rv0576 - putative transcriptional regulator
Rv0586 - transcriptional regulator (GntR family)
Rv0650 - transcriptional regulator (ROK family)
Rv0653c - putative transcriptional regulator
Rv0681 - transcriptional regulator (TetR/AcrR family)
Rv0691c - transcriptional regulator (TetR/AcrR family)
Rv0737 - putative transcriptional regulator
Rv0744c - putative transcriptional regulator
Rv0792c - transcriptional regulator (GntR


[Figure: linear gene map of the chromosome. Gene names and Rv numbers are plotted along a position scale running from 1 to 4,411,529 bp in 150,000-bp steps.]


(NifR3/Smm1 family)
Rv0827c - transcriptional regulator (ArsR


(LuxR/UhpA family)
Rv0891c - putative transcriptional regulator
Rv0894 - putative transcriptional regulator
Rv1019 - transcriptional regulator (TetR/AcrR family)
Rv1049 - transcriptional regulator (MarR


(PbsX/Xre family)
Rv1151c - putative transcriptional regulator
Rv1152 - transcriptional regulator (GntR family)
Rv1167c - putative transcriptional regulator
Rv1219c - putative transcriptional regulator
Rv1255c - transcriptional regulator (TetR/AcrR family)
Rv1332 - putative transcriptional regulator
Rv1353c - transcriptional regulator


(LuxR/UhpA family)
Rv1359 - putative transcriptional regulator
Rv1395 - transcriptional regulator (AraC/XylS family)
Rv1404 - transcriptional regulator (MarR family)
Rv1423 - putative transcriptional regulator
Rv1460 - putative transcriptional regulator
Rv1474c - transcriptional regulator


(TetR/AcrR family)
Rv1556 - putative transcriptional regulator
Rv1674c - putative transcriptional regulator
Rv1675c - putative transcriptional regulator
Rv1719 - transcriptional regulator (IclR family)
Rv1773c - transcriptional regulator (IclR family)
Rv1776c - putative transcriptional regulator
Rv1816 - putative transcriptional regulator
Rv1846c - putative transcriptional regulator
Rv1931c - transcriptional regulator (AraC/XylS family)
Rv1956 - putative transcriptional regulator
Rv1963c - putative transcriptional regulator
Rv1985c - transcriptional regulator (LysR family)
Rv1990c - putative transcriptional regulator
Rv1994c - transcriptional regulator (MerR family)
Rv2017 - putative transcriptional regulator (PbsX/Xre family)
Rv2021c - putative transcriptional regulator
Rv2034 - transcriptional regulator (ArsR family)
Rv2175c - putative transcriptional regulator
Rv2250c - putative transcriptional regulator
Rv2258c - putative transcriptional regulator
Rv2282c - transcriptional regulator (LysR family)
Rv2308 - putative transcriptional regulator
Rv2324 - transcriptional regulator (Lrp/AsnC family)
Rv2358 - transcriptional regulator (ArsR


(LuxR/UhpA family)
Rv2506 - transcriptional regulator (TetR/AcrR family)
Rv2621c - putative transcriptional regulator
Rv2640c - transcriptional regulator (ArsR family)
Rv2642 - transcriptional regulator (ArsR family)
Rv2669 - putative transcriptional regulator
Rv2745c - putative transcriptional regulator
Rv2779c - transcriptional regulator (Lrp/AsnC family)
Rv2887 - transcriptional regulator (MarR


(TetR/AcrR family)
Rv2989 - transcriptional regulator (IclR family)
Rv3050c - putative transcriptional regulator
Rv3055 - putative transcriptional regulator
Rv3058c - putative transcriptional regulator
Rv3060c - transcriptional regulator (GntR family)
Rv3066 - putative transcriptional regulator
Rv3095 - putative transcriptional regulator
Rv3124 - transcriptional regulator (AfsR/DndI/RedD family)
Rv3160c - putative transcriptional regulator
Rv3167c - putative transcriptional regulator
Rv3173c - transcriptional regulator (TetR/AcrR family)
Rv3183 - putative transcriptional regulator
Rv3208 - transcriptional regulator



(Lrp/AsnC family)Rv3295 - transcriptional regulator

(TetR/AcrR family)Rv3334 - transcriptional regulator (MerR

family)Rv3405c - putative transcriptional regulatorRv3522 - putative transcriptional regulatorRv3557c - transcriptional regulator


(TetR/AcrR family)Rv3575c - transcriptional regulator (LacI

family)Rv3583c - putative transcriptional regulatorRv3676 - transcriptional regulator (Crp/Fnr

family)Rv3678c - transcriptional regulator (LysR


(AraC/XylS family)Rv3744 - transcriptional regulator (ArsR



(AraC/XylS family)Rv3840 - putative transcriptional regulatorRv3855 - putative transcriptional regulator

2. Two component systems
Rv1028c kdpD sensor histidine kinase
Rv1027c kdpE two-component response regulator
Rv3246c mtrA two-component response regulator
Rv3245c mtrB sensor histidine kinase
Rv0844c narL two-component response regulator
Rv0757 phoP two-component response regulator
Rv0758 phoR sensor histidine kinase
Rv0491 regX3 two-component response regulator
Rv0490 senX3 sensor histidine kinase
Rv0602c tcrA two-component response regulator
Rv0260c - two-component response regulator
Rv0600c - sensor histidine kinase
Rv0601c - sensor histidine kinase
Rv0818 - two-component response regulator
Rv0845 - sensor histidine kinase
Rv0902c - sensor histidine kinase
Rv0903c - two-component response regulator
Rv0981 - two-component response regulator
Rv0982 - sensor histidine kinase
Rv1032c - sensor histidine kinase
Rv1033c - two-component response regulator
Rv1626 - two-component response regulator
Rv2027c - sensor histidine kinase
Rv2884 - two-component response regulator
Rv3132c - sensor histidine kinase
Rv3133c - two-component response regulator
Rv3143 - putative sensory transduction protein
Rv3220c - sensor histidine kinase
Rv3764c - sensor histidine kinase
Rv3765c - two-component response regulator

3. Serine-threonine protein kinases and phosphoprotein phosphatases
Rv0015c pknA serine-threonine protein kinase
Rv0014c pknB serine-threonine protein kinase
Rv0931c pknD serine-threonine protein kinase
Rv1743 pknE serine-threonine protein kinase
Rv1746 pknF serine-threonine protein kinase
Rv0410c pknG serine-threonine protein kinase
Rv1266c pknH serine-threonine protein kinase
Rv2914c pknI serine-threonine protein kinase
Rv2088 pknJ serine-threonine protein kinase
Rv3080c pknK serine-threonine protein kinase
Rv2176 pknL serine-threonine protein kinase, truncated
Rv0018c ppp putative phosphoprotein phosphatase
Rv2234 ptpA low molecular weight protein-tyrosine-phosphatase
Rv0153c - putative protein-tyrosine-phosphatase

II. Macromolecule metabolism
A. Synthesis and modification of macromolecules
1. Ribosomal protein synthesis and modification
Rv3420c rimI ribosomal protein S18 acetyl transferase
Rv0995 rimJ acetylation of 30S S5 subunit
Rv0641 rplA 50S ribosomal protein L1
Rv0704 rplB 50S ribosomal protein L2
Rv0701 rplC 50S ribosomal protein L3
Rv0702 rplD 50S ribosomal protein L4
Rv0716 rplE 50S ribosomal protein L5
Rv0719 rplF 50S ribosomal protein L6
Rv0056 rplI 50S ribosomal protein L9
Rv0651 rplJ 50S ribosomal protein L10
Rv0640 rplK 50S ribosomal protein L11
Rv0652 rplL 50S ribosomal protein L7/L12
Rv3443c rplM 50S ribosomal protein L13
Rv0714 rplN 50S ribosomal protein L14
Rv0723 rplO 50S ribosomal protein L15
Rv0708 rplP 50S ribosomal protein L16
Rv3456c rplQ 50S ribosomal protein L17
Rv0720 rplR 50S ribosomal protein L18
Rv2904c rplS 50S ribosomal protein L19
Rv1643 rplT 50S ribosomal protein L20
Rv2442c rplU 50S ribosomal protein L21
Rv0706 rplV 50S ribosomal protein L22
Rv0703 rplW 50S ribosomal protein L23
Rv0715 rplX 50S ribosomal protein L24
Rv1015c rplY 50S ribosomal protein L25
Rv2441c rpmA 50S ribosomal protein L27
Rv0105c rpmB 50S ribosomal protein L28
Rv2058c rpmB2 50S ribosomal protein L28
Rv0709 rpmC 50S ribosomal protein L29
Rv0722 rpmD 50S ribosomal protein L30
Rv1298 rpmE 50S ribosomal protein L31
Rv2057c rpmG 50S ribosomal protein L33
Rv3924c rpmH 50S ribosomal protein L34
Rv1642 rpmI 50S ribosomal protein L35
Rv3461c rpmJ 50S ribosomal protein L36
Rv1630 rpsA 30S ribosomal protein S1
Rv2890c rpsB 30S ribosomal protein S2
Rv0707 rpsC 30S ribosomal protein S3
Rv3458c rpsD 30S ribosomal protein S4
Rv0721 rpsE 30S ribosomal protein S5
Rv0053 rpsF 30S ribosomal protein S6
Rv0683 rpsG 30S ribosomal protein S7
Rv0718 rpsH 30S ribosomal protein S8
Rv3442c rpsI 30S ribosomal protein S9
Rv0700 rpsJ 30S ribosomal protein S10
Rv3459c rpsK 30S ribosomal protein S11
Rv0682 rpsL 30S ribosomal protein S12
Rv3460c rpsM 30S ribosomal protein S13
Rv0717 rpsN 30S ribosomal protein S14
Rv2056c rpsN2 30S ribosomal protein S14
Rv2785c rpsO 30S ribosomal protein S15
Rv2909c rpsP 30S ribosomal protein S16
Rv0710 rpsQ 30S ribosomal protein S17
Rv0055 rpsR 30S ribosomal protein S18
Rv2055c rpsR2 30S ribosomal protein S18
Rv0705 rpsS 30S ribosomal protein S19
Rv2412 rpsT 30S ribosomal protein S20
Rv3241c - member of S30AE ribosomal protein family

2. Ribosome modification and maturation
Rv1010 ksgA 16S rRNA dimethyltransferase
Rv2838c rbfA ribosome-binding factor A
Rv2907c rimM 16S rRNA processing protein

3. Aminoacyl tRNA synthases and their modification
Rv2555c alaS alanyl-tRNA synthase
Rv1292 argS arginyl-tRNA synthase
Rv2572c aspS aspartyl-tRNA synthase
Rv3580c cysS cysteinyl-tRNA synthase
Rv2130c cysS2 cysteinyl-tRNA synthase
Rv1406 fmt methionyl-tRNA formyltransferase
Rv3011c gatA glu-tRNA-gln amidotransferase, subunit B
Rv3009c gatB glu-tRNA-gln amidotransferase, subunit A
Rv3012c gatC glu-tRNA-gln amidotransferase, subunit C
Rv2992c gltS glutamyl-tRNA synthase
Rv2357c glyS glycyl-tRNA synthase
Rv2580c hisS histidyl-tRNA synthase
Rv1536 ileS isoleucyl-tRNA synthase
Rv0041 leuS leucyl-tRNA synthase
Rv3598c lysS lysyl-tRNA synthase
Rv1640c lysX C-term lysyl-tRNA synthase
Rv1007c metS methionyl-tRNA synthase
Rv1649 pheS phenylalanyl-tRNA synthase α subunit
Rv1650 pheT phenylalanyl-tRNA synthase β subunit
Rv2845c proS prolyl-tRNA synthase
Rv3834c serS seryl-tRNA synthase
Rv2614c thrS threonyl-tRNA synthase
Rv2906c trmD tRNA (guanine-N1)-methyltransferase
Rv3336c trpS tryptophanyl tRNA synthase
Rv1689 tyrS tyrosyl-tRNA synthase
Rv2448c valS valyl-tRNA synthase

4. Nucleoproteins
Rv1407 fmu similar to Fmu protein
Rv3852 hns HU-histone protein
Rv2986c hupB DNA-binding protein II
Rv1388 mIHF integration host factor

5. DNA replication, repair, recombination and restriction/modification
Rv1317c alkA DNA-3-methyladenine glycosidase II
Rv2836c dinF DNA-damage-inducible protein F
Rv1329c dinG probable ATP-dependent helicase
Rv3056 dinP DNA-damage-inducible protein
Rv1537 dinX probable DNA-damage-inducible protein
Rv0001 dnaA chromosomal replication initiator protein
Rv0058 dnaB DNA helicase (contains intein)
Rv1547 dnaE1 DNA polymerase III, α subunit
Rv3370c dnaE2 DNA polymerase III α chain
Rv2343c dnaG DNA primase
Rv0002 dnaN DNA polymerase III, β subunit
Rv3711c dnaQ DNA polymerase III ε chain
Rv3721c dnaZX DNA polymerase III, γ (dnaZ) and τ (dnaX)
Rv2924c fpg formamidopyrimidine-DNA glycosylase
Rv0006 gyrA DNA gyrase subunit A
Rv0005 gyrB DNA gyrase subunit B
Rv2092c helY probable helicase, Ski2 subfamily
Rv2101 helZ probable helicase, Snf2/Rad54 family
Rv2756c hsdM type I restriction/modification system DNA methylase
Rv2755c hsdS' type I restriction/modification system specificity determinant
Rv3296 lhr ATP-dependent helicase
Rv3014c ligA DNA ligase
Rv3062 ligB DNA ligase
Rv3731 ligC probable DNA ligase
Rv1020 mfd transcription-repair coupling factor
Rv2528c mrr restriction system protein
Rv2985 mutT1 MutT homologue
Rv1160 mutT2 MutT homologue
Rv0413 mutT3 MutT homologue
Rv3589 mutY probable DNA glycosylase
Rv3297 nei probable endonuclease VIII
Rv3674c nth probable endonuclease III
Rv1316c ogt methylated-DNA-protein-cysteine methyltransferase
Rv1629 polA DNA polymerase I
Rv1402 priA putative primosomal protein n' (replication factor Y)
Rv3585 radA probable DNA repair RadA homologue
Rv2737c recA recombinase (contains intein)
Rv0630c recB exodeoxyribonuclease V
Rv0631c recC exodeoxyribonuclease V
Rv0629c recD exodeoxyribonuclease V
Rv0003 recF DNA replication and SOS induction
Rv2973c recG ATP-dependent DNA helicase
Rv1696 recN recombination and DNA repair
Rv3715c recR RecBC-Independent process of DNA repair
Rv2736c recX regulatory protein for RecA
Rv2593c ruvA Holliday junction binding protein, DNA helicase
Rv2592c ruvB Holliday junction binding protein
Rv2594c ruvC Holliday junction resolvase, endodeoxyribonuclease
Rv0054 ssb single strand binding protein
Rv1210 tagA DNA-3-methyladenine glycosidase I
Rv3646c topA DNA topoisomerase
Rv2976c ung uracil-DNA glycosylase
Rv1638 uvrA excinuclease ABC subunit A
Rv1633 uvrB excinuclease ABC subunit B
Rv1420 uvrC excinuclease ABC subunit C
Rv0949 uvrD DNA-dependent ATPase I and helicase II
Rv3198c uvrD2 putative UvrD
Rv0427c xthA exodeoxyribonuclease III
Rv0071 - group II intron maturase
Rv0861c - probable DNA helicase
Rv0944 - possible formamidopyrimidine-DNA glycosylase
Rv1688 - probable 3-methylpurine DNA glycosylase
Rv2090 - partially similar to DNA polymerase I
Rv2191 - similar to both PolC and UvrC proteins
Rv2464c - probable DNA glycosylase, endonuclease VIII
Rv3201c - probable ATP-dependent DNA helicase
Rv3202c - similar to UvrD proteins
Rv3263 - probable DNA methylase
Rv3644c - similar in N-term to DNA polymerase III

6. Protein translation and modification
Rv0429c def polypeptide deformylase
Rv2534c efp elongation factor P
Rv2882c frr ribosome recycling factor
Rv0684 fusA elongation factor G
Rv0120c fusA2 elongation factor G
Rv1080c greA transcription elongation factor G
Rv3462c infA initiation factor IF-1
Rv2839c infB initiation factor IF-2
Rv1641 infC initiation factor IF-3
Rv0009 ppiA peptidyl-prolyl cis-trans isomerase
Rv2582 ppiB peptidyl-prolyl cis-trans isomerase
Rv1299 prfA peptide chain release factor 1
Rv3105c prfB peptide chain release factor 2
Rv2889c tsf elongation factor EF-Ts
Rv0685 tuf elongation factor EF-Tu

7. RNA synthesis, RNA modification and DNA transcription
Rv1253 deaD ATP-dependent DNA/RNA helicase
Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase
Rv2841c nusA transcription termination factor
Rv2533c nusB N-utilization substance protein B
Rv0639 nusG transcription antitermination protein
Rv3907c pcnA polynucleotide polymerase
Rv3232c pvdS alternative sigma factor for siderophore production
Rv3211 rhlE probable ATP-dependent RNA helicase
Rv1297 rho transcription termination factor rho
Rv3457c rpoA α subunit of RNA polymerase
Rv0667 rpoB β subunit of RNA polymerase
Rv0668 rpoC β' subunit of RNA polymerase
Rv1364c rsbU SigB regulation protein
Rv3287c rsbW anti-sigma B factor
Rv2703 sigA RNA polymerase sigma factor (aka MysA, RpoV)
Rv2710 sigB RNA polymerase sigma factor (aka MysB)
Rv2069 sigC ECF subfamily sigma subunit
Rv3414c sigD ECF subfamily sigma subunit
Rv1221 sigE ECF subfamily sigma subunit
Rv3286c sigF ECF subfamily sigma subunit
Rv0182c sigG sigma-70 factors ECF subfamily
Rv3223c sigH ECF subfamily sigma subunit
Rv1189 sigI ECF family sigma factor
Rv3328c sigJ similar to SigI, ECF family
Rv0445c sigK ECF-type sigma factor
Rv0735 sigL sigma-70 factors ECF subfamily
Rv3911 sigM probable sigma factor, similar to SigE
Rv3366 spoU probable rRNA methylase
Rv3455c truA probable pseudouridylate synthase
Rv2793c truB tRNA pseudouridine 55 synthase
Rv1644 tsnR putative 23S rRNA methyltransferase
Rv3649 - ATP-dependent DNA/RNA helicase

8. Polysaccharides (cytoplasmic)
Rv1326c glgB 1,4-α-glucan branching enzyme
Rv1328 glgP probable glycogen phosphorylase
Rv1564c glgX probable glycogen debranching enzyme
Rv1563c glgY putative α-amylase
Rv1562c glgZ maltooligosyltrehalose trehalohydrolase
Rv0126 - probable glycosyl hydrolase
Rv1781c - probable 4-α-glucanotransferase
Rv2471 - probable maltase α-glucosidase

B. Degradation of macromolecules
1. RNA
Rv1014c pth peptidyl-tRNA hydrolase
Rv2925c rnc RNAse III
Rv2444c rne similar at C-term to ribonuclease E
Rv2902c rnhB ribonuclease HII
Rv3923c rnpA ribonuclease P protein component
Rv1340 rphA ribonuclease PH

2. DNA
Rv0670 end endonuclease IV (apurinase)
Rv1108c xseA exonuclease VII large subunit
Rv1107c xseB exonuclease VII small subunit

3. Proteins, peptides and glycopeptides
Rv3305c amiA probable aminohydrolase
Rv3306c amiB probable aminohydrolase
Rv3596c clpC ATP-dependent Clp protease
Rv2461c clpP ATP-dependent Clp protease proteolytic subunit
Rv2460c clpP2 ATP-dependent Clp protease proteolytic subunit
Rv2457c clpX ATP-dependent Clp protease ATP-binding subunit ClpX
Rv2667 clpX' similar to ClpC from M. leprae but shorter
Rv3419c gcp glycoprotease
Rv2725c hflX GTP-binding protein
Rv1223 htrA serine protease
Rv2861c map methionine aminopeptidase
Rv0734 map' probable methionine aminopeptidase
Rv0319 pcp pyrrolidone-carboxylate peptidase
Rv0125 pepA probable serine protease
Rv2213 pepB aminopeptidase A/I
Rv0800 pepC aminopeptidase I
Rv2467 pepD probable aminopeptidase
Rv2089c pepE cytoplasmic peptidase
Rv2535c pepQ cytoplasmic peptidase
Rv2782c pepR protease/peptidase, M16 family (insulinase)
Rv2109c prcA proteasome α-type subunit 1
Rv2110c prcB proteasome β-type subunit 2
Rv0782 ptrBa protease II, α subunit
Rv0781 ptrBb protease II, β subunit
Rv0724 sppA protease IV, signal peptide peptidase
Rv0198c - probable zinc metalloprotease
Rv0457c - probable peptidase
Rv0840c - probable proline iminopeptidase
Rv0983 - probable serine protease
Rv1977 - probable zinc metallopeptidase
Rv3668c - probable alkaline serine protease
Rv3671c - probable serine protease
Rv3883c - probable secreted protease
Rv3886c - protease

4. Polysaccharides, lipopolysaccharides and phospholipids
Rv0062 celA cellulase/endoglucanase
Rv3915 cwlM hydrolase
Rv0315 - probable β-1,3-glucanase
Rv1090 - probable inactivated cellulase/endoglucanase
Rv1327c - probable glycosyl hydrolase, α-amylase family
Rv1333 - probable hydrolase
Rv3463 - probable neuraminidase
Rv3717 - possible N-acetylmuramoyl-L-alanine amidase

5. Esterases and lipases
Rv0220 lipC probable esterase
Rv1923 lipD probable esterase
Rv3775 lipE probable hydrolase
Rv3487c lipF probable esterase
Rv0646c lipG probable hydrolase
Rv1399c lipH probable lipase
Rv1400c lipI probable lipase
Rv1900c lipJ probable esterase
Rv2385 lipK probable acetyl-hydrolase
Rv1497 lipL esterase
Rv2284 lipM probable esterase
Rv2970c lipN probable lipase/esterase
Rv1426c lipO probable esterase
Rv2463 lipP probable esterase
Rv2485c lipQ probable carboxylesterase
Rv3084 lipR probable acetyl-hydrolase
Rv3176c lipS probable esterase/lipase
Rv2045c lipT probable carboxylesterase
Rv1076 lipU probable esterase
Rv3203 lipV probable lipase
Rv0217c lipW probable esterase
Rv2351c plcA phospholipase C precursor
Rv2350c plcB phospholipase C precursor
Rv2349c plcC phospholipase C precursor
Rv1755c plcD partial CDS for phospholipase C
Rv1104 - probable esterase pseudogene
Rv1105 - probable esterase pseudogene

6. Aromatic hydrocarbons
Rv3469c mhpE probable 4-hydroxy-2-oxovalerate aldolase
Rv0316 - probable muconolactone isomerase
Rv0771 - probable 4-carboxymuconolactone decarboxylase
Rv0939 - probable dehydrase
Rv1723 - 6-aminohexanoate-dimer hydrolase
Rv2715 - 2-hydroxymuconic semialdehyde hydrolase
Rv3530c - probable cis-diol dehydrogenase
Rv3534c - 4-hydroxy-2-oxovalerate aldolase
Rv3536c - aromatic hydrocarbon degradation

C. Cell envelope
1. Lipoproteins (lppA-lpr0) 65

2. Surface polysaccharides, lipopolysaccharides, proteins and antigens
Rv0806c cpsY probable UDP-glucose-4-epimerase
Rv3811 csp secreted protein
Rv1677 dsbF highly similar to C-term Mpt53
Rv3794 embA involved in arabinogalactan synthesis
Rv3795 embB involved in arabinogalactan synthesis
Rv3793 embC involved in arabinogalactan synthesis
Rv3875 esat6 early secretory antigen target
Rv0112 gca probable GDP-mannose dehydratase
Rv0113 gmhA phosphoheptose isomerase
Rv2965c kdtB lipopolysaccharide core biosynthesis protein
Rv2878c mpt53 secreted protein Mpt53
Rv1980c mpt64 secreted immunogenic protein Mpb64/Mpt64
Rv2875 mpt70 major secreted immunogenic protein Mpt70 precursor
Rv2873 mpt83 surface lipoprotein Mpt83
Rv0899 ompA member of OmpA family
Rv3810 pirG cell surface protein precursor (Erp protein)
Rv3782 rfbE similar to rhamnosyl transferase
Rv1302 rfe undecaprenyl-phosphate α-N-acetylglucosaminyltransferase
Rv2145c wag31 antigen 84 (aka wag31)
Rv0431 - tuberculin related peptide (AT103)
Rv0954 - cell envelope antigen
Rv1514c - involved in polysaccharide synthesis
Rv1518 - involved in exopolysaccharide synthesis
Rv1758 - partial cutinase
Rv1910c - probable secreted protein
Rv1919c - weak similarity to pollen antigens
Rv1984c - probable secreted protein
Rv1987 - probable secreted protein
Rv2223c - probable exported protease
Rv2224c - probable exported protease
Rv2301 - probable cutinase
Rv2345 - precursor of probable membrane protein
Rv2672 - putative exported protease
Rv3019c - similar to Esat6
Rv3036c - probable secreted protein
Rv3449 - probable precursor of serine protease
Rv3451 - probable cutinase
Rv3452 - probable cutinase precursor
Rv3724 - probable cutinase precursor

3. Murein sacculus and peptidoglycan
Rv2911 dacB penicillin binding protein
Rv2981c ddlA D-alanine-D-alanine ligase A
Rv3809c glf UDP-galactopyranose mutase
Rv1018c glmU UDP-N-acetylglucosamine pyrophosphorylase
Rv3382c lytB LytB protein homologue
Rv1110 lytB' very similar to LytB
Rv1315 murA UDP-N-acetylglucosamine-1-carboxyvinyltransferase
Rv0482 murB UDP-N-acetylenolpyruvoylglucosamine reductase
Rv2152c murC UDP-N-acetyl-muramate-alanine ligase
Rv2155c murD UDP-N-acetylmuramoylalanine-D-glutamate ligase
Rv2158c murE meso-diaminopimelate-adding enzyme
Rv2157c murF D-alanine:D-alanine-adding enzyme
Rv2153c murG transferase in peptidoglycan synthesis
Rv1338 murI glutamate racemase
Rv2156c murX phospho-N-acetylmuramoyl-pentapeptide transferase
Rv3332 nagA N-acetylglucosamine-6-P-deacetylase
Rv0016c pbpA penicillin-binding protein
Rv2163c pbpB penicillin-binding protein 2
Rv0050 ponA penicillin-binding protein
Rv3682 ponA' class A penicillin binding protein
Rv0017c rodA FtsW/RodA/SpovE family
Rv0907 - probable penicillin binding protein
Rv1367c - probable penicillin binding protein
Rv1730c - probable penicillin binding protein
Rv1922 - probable penicillin binding protein
Rv2864c - probable penicillin binding protein
Rv3330 - probable penicillin binding protein
Rv3627c - probable penicillin binding protein

4. Conserved membrane proteins
Rv0402c mmpL1 conserved large membrane protein
Rv0507 mmpL2 conserved large membrane protein
Rv0206c mmpL3 conserved large membrane protein

Rv0403c mmpS1 conserved small membrane protein
Rv0506 mmpS2 conserved small membrane protein

5. Other membrane proteins 211

III. Cell processes
A. Transport/binding proteins
1. Amino acids
Rv2127 ansP L-asparagine permease
Rv0346c aroP2 probable aromatic amino acid permease
Rv0917 betP glycine betaine transport
Rv1704c cycA transport of D-alanine, D-serine and glycine
Rv3666c dppA probable peptide transport system permease
Rv3665c dppB probable peptide transport system permease
Rv3664c dppC probable peptide transport system permease
Rv3663c dppD probable ABC-transporter
Rv0522 gabP probable 4-amino butyrate transporter
Rv0411c glnH putative glutamine binding protein
Rv2564 glnQ probable ATP-binding transport protein
Rv1280c oppA probable oligopeptide transport protein
Rv1283c oppB oligopeptide transport protein
Rv1282c oppC oligopeptide transport system permease
Rv1281c oppD probable peptide transport protein
Rv2320c rocE arginine/ornithine transporter
Rv3253c - probable cationic amino acid transport
Rv3454 - possible proline permease

2. Cations
Rv2920c amt putative ammonium transporter
Rv1607 chaA putative calcium/proton antiporter
Rv1239c corA probable magnesium and cobalt transport protein
Rv0092 ctpA cation-transporting ATPase
Rv0103c ctpB cation transport ATPase
Rv3270 ctpC cation transport ATPase
Rv1469 ctpD probable cadmium-transporting ATPase
Rv0908 ctpE probable cation transport ATPase
Rv1997 ctpF probable cation transport ATPase
Rv1992c ctpG probable cation transport ATPase
Rv0425c ctpH C-terminal region putative cation-transporting ATPase
Rv0107c ctpI probable magnesium transport ATPase
Rv0969 ctpV cation transport ATPase
Rv3044 fecB putative FeIII-dicitrate transporter
Rv0265c fecB2 iron transport protein FeIII dicitrate transporter
Rv1029 kdpA potassium-transporting ATPase A chain
Rv1030 kdpB potassium-transporting ATPase B chain
Rv1031 kdpC potassium-transporting ATPase C chain
Rv3236c kefB probable glutathione-regulated potassium-efflux protein
Rv2877c merT possible mercury resistance transport system
Rv1811 mgtC probable magnesium transport ATPase protein C
Rv0362 mgtE putative magnesium ion transporter
Rv2856 nicT probable nickel transport protein
Rv0924c nramp transmembrane protein belonging to Nramp family
Rv2691 trkA probable potassium uptake protein
Rv2692 trkB probable potassium uptake protein
Rv2287 yjcE probable Na+/H+ exchanger
Rv2723 - probable membrane protein, tellurium resistance
Rv3162c - probable membrane protein
Rv3237c - possible potassium channel protein
Rv3743c - probable cation-transporting ATPase

3. Carbohydrates, organic acids and alcohols
Rv2443 dctA C4-dicarboxylate transport protein
Rv3476c kgtP sugar transport protein
Rv1902c nanT probable sialic acid transporter
Rv1236 sugA membrane protein probably involved in sugar transport
Rv1237 sugB sugar transport protein
Rv1238 sugC ABC transporter component of sugar uptake system
Rv3331 sugI probable sugar transport protein
Rv2835c ugpA sn-glycerol-3-phosphate permease
Rv2833c ugpB sn-glycerol-3-phosphate-binding periplasmic lipoprotein
Rv2832c ugpC sn-glycerol-3-phosphate transport ATP-binding protein
Rv2834c ugpE sn-glycerol-3-phosphate transport system protein
Rv2316 uspA sugar transport protein
Rv2318 uspC sugar transport protein
Rv2317 uspE sugar transport protein
Rv1200 - probable sugar transporter
Rv2038c - probable ABC sugar transporter
Rv2039c - probable sugar transporter
Rv2040c - probable sugar transporter
Rv2041c - probable sugar transporter

4. Anions
Rv2684 arsA probable arsenical pump
Rv2685 arsB probable arsenical pump
Rv3578 arsB2 probable arsenical pump
Rv2643 arsC probable arsenical pump
Rv2397c cysA sulphate transport ATP-binding protein
Rv2399c cysT sulphate transport system permease protein
Rv2398c cysW sulphate transport system permease protein
Rv1857 modA molybdate binding protein
Rv1858 modB transport system permease, molybdate uptake
Rv1859 modC molybdate uptake ABC-transporter
Rv1860 modD precursor of Apa (45/47 kD secreted protein)
Rv2329c narK1 probable nitrite extrusion protein
Rv1737c narK2 nitrite extrusion protein
Rv0261c narK3 nitrite extrusion protein1
Rv0267 narU similar to nitrite extrusion protein 2
Rv0934 phoS1 PstS component of phosphate uptake
Rv0928 phoS2 PstS component of phosphate uptake
Rv0820 phoT phosphate transport system ABC transporter
Rv3301c phoY1 phosphate transport system regulator
Rv0821c phoY2 phosphate transport system regulator
Rv0545c pitA low-affinity inorganic phosphate transporter
Rv2281 pitB phosphate permease
Rv0930 pstA1 PstA component of phosphate uptake
Rv0936 pstA2 PstA component of phosphate uptake
Rv0933 pstB ABC transport component of phosphate uptake
Rv0935 pstC PstC component of phosphate uptake
Rv0929 pstC2 membrane-bound component of phosphate transport system
Rv0932c pstS PstS component of phosphate uptake
Rv2400c subI sulphate binding precursor
Rv0143c - probable chloride channel
Rv1707 - probable sulphate permease
Rv1739c - possible sulphate transporter
Rv3679 - possible anion transporter
Rv3680 - probable anion transporter

5. Fatty acid transport
Rv2790c ltp1 non-specific lipid transport protein
Rv3540c ltp2 non-specific lipid transport protein

6. Efflux proteins
Rv2936 drrA similar daunorubicin resistance ABC-transporter
Rv2937 drrB similar daunorubicin resistance transmembrane protein
Rv2938 drrC similar daunorubicin resistance transmembrane protein
Rv2846c efpA putative efflux protein
Rv3065 emrE resistance to ethidium bromide
Rv0783c - multidrug resistance protein
Rv0849 - possible quinolone efflux pump
Rv1145 - probable drug transporter
Rv1146 - probable drug transporter
Rv1250 - probable drug efflux protein
Rv1258c - probable multidrug resistance pump
Rv1410c - probable drug efflux protein
Rv1634 - probable drug efflux protein
Rv1819c - probable multidrug resistance pump
Rv2136c - putative bacitracin resistance protein
Rv2209 - probable drug efflux protein
Rv2333c - probable tetracenomycin C resistance protein
Rv2994 - probable fluoroquinolone efflux protein
Rv1877 - probable drug efflux protein
Rv2459 - probable drug efflux protein

B. Chaperones/Heat shock
Rv0384c clpB heat shock protein
Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase
Rv2373c dnaJ2 DnaJ homologue
Rv0350 dnaK 70 kD heat shock protein, chromosome replication
Rv3417c groEL1 60 kD chaperonin 1
Rv0440 groEL2 60 kD chaperonin 2
Rv3418c groES 10 kD chaperone
Rv0351 grpE stimulates DnaK ATPase activity
Rv2374c hrcA heat-inducible transcription repressor
Rv0251c hsp possible heat shock protein
Rv0353 hspR heat shock regulator
Rv2031c hspX 14kD antigen, heat shock protein Hsp20 family
Rv2299c htpG heat shock protein Hsp90 family
Rv0563 htpX probable (transmembrane) heat shock protein
Rv2701c suhB putative extragenic suppressor protein
Rv3269 - probable heat shock protein

C. Cell division
Rv3641c fic possible cell division protein
Rv3102c ftsE membrane protein
Rv3610c ftsH inner membrane protein, chaperone
Rv2748c ftsK chromosome partitioning
Rv2151c ftsQ ingrowth of wall at septum
Rv2154c ftsW membrane protein (shape determination)
Rv3101c ftsX membrane protein
Rv2921c ftsY cell division protein FtsY
Rv2150c ftsZ circumferential ring, GTPase
Rv3919c gid glucose inhibited division protein B
Rv3625c mesJ probable cell cycle protein
Rv3917c parA chromosome partitioning; DNA-binding
Rv3918c parB possibly involved in chromosome partitioning
Rv2922c smc member of Smc1/Cut3/Cut14 family
Rv0012 - possible cell division protein
Rv0435c - ATPase of AAA-family
Rv2115c - ATPase of AAA-family
Rv3213c - possible role in chromosome segregation
Rv1708 - possible role in chromosome partitioning

D. Protein and peptide secretion
Rv2916c ffh signal recognition particle protein
Rv2903c lepB signal peptidase I
Rv1614 lgt prolipoprotein diacylglyceryl transferase
Rv1539 lspA lipoprotein signal peptidase
Rv0379 sec probable transport protein SecE/Sec61-γ family
Rv3240c secA SecA, preprotein translocase subunit
Rv1821 secA2 SecA, preprotein translocase subunit
Rv2587c secD protein-export membrane protein
Rv0638 secE SecE preprotein translocase
Rv2586c secF protein-export membrane protein
Rv1440 secG protein-export membrane protein SecG
Rv0732 secY SecY subunit of preprotein translocase
Rv2462c tig chaperone protein, similar to trigger factor
Rv2813 - probable general secretion pathway protein

E. Adaptations and atypical conditions
Rv1901 cinA competence damage protein
Rv3648c cspA cold shock protein, transcriptional regulator
Rv0871 cspB probable cold shock protein
Rv3063 cstA starvation-induced stress response protein
Rv3490 otsA probable α,α-trehalose-phosphate synthase
Rv2006 otsB trehalose-6-phosphate phosphatase
Rv3372 otsB2 trehalose-6-phosphate phosphatase
Rv3758c proV osmoprotection ABC transporter
Rv3757c proW transport system permease
Rv3759c proX similar to osmoprotection proteins
Rv3756c proZ transport system permease
Rv1026 - probable pppGpp-5'phosphohydrolase

F. Detoxification
Rv2428 ahpC alkyl hydroperoxide reductase
Rv2429 ahpD member of AhpC/TSA family
Rv2238c ahpE member of AhpC/TSA family
Rv2521 bcp bacterioferritin comigratory protein
Rv1608c bcpB probable bacterioferritin comigratory protein
Rv3473c bpoA probable non-heme bromoperoxidase
Rv1123c bpoB probable non-heme bromoperoxidase
Rv0554 bpoC probable non-heme bromoperoxidase
Rv3617 ephA probable epoxide hydrolase
Rv1938 ephB probable epoxide hydrolase
Rv1124 ephC probable epoxide hydrolase
Rv2214c ephD probable epoxide hydrolase
Rv3670 ephE probable epoxide hydrolase
Rv0134 ephF probable epoxide hydrolase
Rv3171c hpx probable non-heme haloperoxidase
Rv1908c katG catalase-peroxidase
Rv3846 sodA superoxide dismutase
Rv0432 sodC superoxide dismutase precursor (Cu-Zn)
Rv1932 tpx thiol peroxidase
Rv0634c - putative glyoxylase II
Rv2581c - putative glyoxylase II
Rv3177 - probable non-heme haloperoxidase

IV. Other
A. Virulence
Rv0169 mce1 cell invasion protein
Rv0589 mce2 cell invasion protein
Rv1966 mce3 cell invasion protein
Rv3499c mce4 cell invasion protein
Rv3100c smpB probable small protein b
Rv1694 tlyA cytotoxin/hemolysin homologue
Rv0024 - putative p60 homologue
Rv0167 - part of mce1 operon
Rv0168 - part of mce1 operon
Rv0170 - part of mce1 operon
Rv0171 - part of mce1 operon
Rv0172 - part of mce1 operon
Rv0174 - part of mce1 operon
Rv0587 - part of mce2 operon
Rv0588 - part of mce2 operon
Rv0590 - part of mce2 operon
Rv0591 - part of mce2 operon
Rv0592 - part of mce2 operon
Rv0594 - part of mce2 operon
Rv1085c - possible hemolysin
Rv1477 - putative exported p60 protein homologue
Rv1478 - putative exported p60 protein homologue
Rv1566c - putative exported p60 protein homologue
Rv1964 - part of mce3 operon
Rv1965 - part of mce3 operon
Rv1967 - part of mce3 operon
Rv1968 - part of mce3 operon
Rv1969 - part of mce3 operon
Rv1971 - part of mce3 operon
Rv2190c - putative p60 homologue
Rv3494c - part of mce4 operon
Rv3496c - part of mce4 operon
Rv3497c - part of mce4 operon
Rv3498c - part of mce4 operon
Rv3500c - part of mce4 operon
Rv3501c - part of mce4 operon
Rv3896c - putative p60 homologue
Rv3922c - possible hemolysin

B. IS elements, Repeated sequences, and Phage
1. IS elements
IS6110 16 copies
IS1081 6 copies
Others 37 copies

2. REP13E12 family 7 copies

3. Phage-related functions
Rv2894c xerC integrase/recombinase
Rv1701 xerD integrase/recombinase
Rv1054 - integrase-a
Rv1055 - integrase-b
Rv1573 - phiRV1 phage related protein
Rv1574 - phiRV1 phage related protein
Rv1575 - phiRV1 phage related protein
Rv1576c - phiRV1 phage related protein
Rv1577c - phiRV1 possible prohead protease
Rv1578c - phiRV1 phage related protein
Rv1579c - phiRV1 phage related protein
Rv1580c - phiRV1 phage related protein
Rv1581c - phiRV1 phage related protein
Rv1582c - phiRV1 phage related protein
Rv1583c - phiRV1 phage related protein
Rv1584c - phiRV1 phage related protein
Rv1585c - phiRV1 phage related protein
Rv1586c - phiRV1 integrase
Rv2309c - integrase
Rv2310 - excisionase
Rv2646 - phiRV2 integrase
Rv2647 - phiRV2 phage related protein
Rv2650c - phiRV2 phage related protein
Rv2651c - phiRV2 prohead protease
Rv2652c - phiRV2 phage related protein
Rv2653c - phiRV2 phage related protein
Rv2654c - phiRV2 phage related protein
Rv2655c - phiRV2 phage related protein
Rv2656c - phiRV2 phage related protein
Rv2657c - similar to gp36 of mycobacteriophage L5
Rv2658c - phiRV2 phage related protein
Rv2659c - phiRV2 integrase
Rv2830c - similar to phage P1 phd gene
Rv3750c - excisionase
Rv3751 - putative integrase

C. PE and PPE families
1. PE family
PE subfamily 38 members
PE_PGRS subfamily 61 members

2. PPE family 68 members

D. Antibiotic production and resistance
Rv2068c blaC class A β-lactamase
Rv3290c lat lysine-ε aminotransferase
Rv2043c pncA pyrazinamide resistance/sensitivity
Rv0133 - possible puromycin N-acetyltransferase
Rv0262c - aminoglycoside 2'-N-acetyltransferase
Rv0802c - acetyltransferase
Rv1082 - similar to S. lincolnensis lmbE
Rv1170 - similar to S. lincolnensis lmbE
Rv1347c - possible aminoglycoside 6'-N-acetyltransferase
Rv2036 - similar to lincomycin production genes
Rv2303c - similar to S. griseus macrotetrolide resistance protein
Rv3225c - probable aminoglycoside 3'-phosphotransferases
Rv3700c - probable acetyltransferase
Rv3817 - probable aminoglycoside 3'-phosphotransferase

E. Bacteriocin-like proteins 3

F. Cytochrome P450 enzymes 22

G. Coenzyme F420-dependent enzymes 3

H. Miscellaneous transferases 61

I. Miscellaneous phosphatases, lyases, and hydrolases 18

J. Cyclases 6

K. Chelatases 2

V. Conserved hypotheticals 912

VI. Unknowns 606

TOTAL 3924
