Abstract
BARRANGOU, RODOLPHE. Functional genomic analyses of carbohydrate utilization
by Lactobacillus acidophilus. (Under the direction of Professor Todd R. Klaenhammer).
Carbohydrates are a primary source of energy for microbes. Specifically, lactic
acid bacteria have the ability to utilize a variety of nutrients available in their respective
habitats. For probiotic microbes inhabiting the human gastrointestinal tract, the ability to
utilize sugars not digested by the host plays an important role in their survival.
Lactobacillus acidophilus is a probiotic organism which can utilize a variety of mono-,
di- and poly-saccharides, including prebiotic compounds such as fructooligosaccharides
and raffinose. However, little information is available about the mechanisms and genes
involved in carbohydrate utilization by lactobacilli. The transport and catabolic
machinery involved in utilization of glucose, fructose, sucrose, FOS, raffinose, lactose,
galactose and trehalose was characterized using global transcriptional profiling.
Microarray hybridizations were carried out using a round-robin design and data analyzed
using a two-stage mixed model ANOVA. Genes differentially expressed between
treatments were visualized by hierarchical clustering, volcano plots, and 3-way contour
plots. Globally, a small number of genes were highly induced, including a variety of
carbohydrate transporters and sugar hydrolases. Members of the phosphoenolpyruvate
sugar phosphotransferase system (PTS) family of transporters were identified for uptake
of glucose, fructose, sucrose and trehalose. In contrast, transporters of the ATP binding
cassette (ABC) family were identified for uptake of FOS and raffinose. A member of the
LacS family of galactoside-pentose-hexuronide (GPH) translocators was identified for
uptake of galactose and lactose. Saccharolytic enzymes likely involved in the metabolism
of mono-, di- and poly-saccharides were also identified, including the enzymatic
machinery of the Leloir pathway. Insertional inactivation of genes encoding sugar
transporters and hydrolases confirmed microarray results. Quantitative RT-PCR was also
used to confirm differential gene expression. Additional transcription experiments
showed specific induction of genes encoding sugar transporters and hydrolases, and
transcriptional repression by glucose. Collectively, microarray data revealed coordinated
and regulated transcription of genes involved in sugar utilization based on carbohydrate
availability, likely via carbon catabolite repression.
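
As a rough illustration of the volcano-plot view mentioned above, the following sketch computes the two coordinates usually plotted for each gene: the log2 fold change between two treatments and the negative log10 of a p-value. The function names, cutoffs and numbers are hypothetical and serve only as an example. The sketch does not reproduce the two-stage mixed model ANOVA used in this work; the p-values are assumed to come from whichever test was applied.

```python
import math


def volcano_point(mean_a, mean_b, p_value):
    """Return (log2 fold change, -log10 p) for one gene.

    mean_a and mean_b are hypothetical mean intensities (linear scale)
    under two treatments; p_value is assumed to come from a prior test.
    """
    log2_fc = math.log2(mean_a / mean_b)
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p


def is_highlighted(log2_fc, neg_log10_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Flag genes beyond both cutoffs (the cutoffs are illustrative assumptions)."""
    return abs(log2_fc) >= fc_cutoff and neg_log10_p >= -math.log10(p_cutoff)


# Made-up values for three hypothetical genes: (mean A, mean B, p-value).
genes = {
    "gene_1": (800.0, 100.0, 0.0001),
    "gene_2": (120.0, 100.0, 0.40),
    "gene_3": (50.0, 400.0, 0.001),
}

for name, (a, b, p) in genes.items():
    x, y = volcano_point(a, b, p)
    print(name, round(x, 2), round(y, 2), is_highlighted(x, y))
```

Genes that are both strongly induced (or repressed) and statistically supported fall in the upper corners of such a plot, which is why a small set of transporters and hydrolases can stand out against the rest of the genome.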
The relationships between gene expression level, codon usage, chromosomal
location and intrinsic gene parameters were investigated globally. Gene expression levels
correlated most highly with GC content, codon adaptation index and gene size. In
contrast, gene expression levels did not correlate with GC content at the third codon
position. Perhaps the high correlation between GC content and gene expression is due to
the low genomic GC composition of L. acidophilus. Analysis of variance was used to
investigate the impact of chromosomal location on gene expression after the data were
segregated into four groups, by strand and orientation relative to the origin and terminus
of replication. Results showed genes on the leading strand were more highly expressed.
Also, genes pointing toward the terminus of replication showed higher expression levels.
This preference allows for co-directional replication and transcription. Collectively,
results showed a strong influence of chromosomal architecture, GC content and codon
usage on gene transcription.
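
The intrinsic gene parameters and the four location groups discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in this work: the function names are hypothetical, the chromosome is assumed to be circular with the origin of replication at coordinate 0, and a gene is called "leading" when it is transcribed in the same direction as the replication fork. The group labels (LeOT, LaOT, LeTO, LaTO) follow the list of abbreviations.

```python
def gc_content(seq):
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of G or C at the third position of each codon."""
    third_positions = cds.upper()[2::3]
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def classify_gene(start, strand, terminus, genome_length):
    """Assign a gene to LeOT, LaOT, LeTO or LaTO.

    Assumptions (not taken from the source): the origin is at coordinate 0,
    the terminus at `terminus`, and `strand` is '+' or '-' relative to that
    coordinate system.
    """
    if not 0 <= start < genome_length:
        raise ValueError("start must lie within the genome")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Between origin and terminus the fork moves toward higher coordinates,
    # so '+' genes are co-directional with replication; past the terminus
    # the other fork moves toward lower coordinates, so '-' genes are.
    between_origin_and_terminus = start < terminus
    if between_origin_and_terminus:
        leading = strand == "+"
    else:
        leading = strand == "-"
    return ("Le" if leading else "La") + ("OT" if between_origin_and_terminus else "TO")


# Toy coding sequence and toy coordinates, for illustration only.
toy_cds = "ATGGCATAA"
print(gc_content(toy_cds), gc3_content(toy_cds))
print(classify_gene(100, "+", terminus=1000, genome_length=2000))
print(classify_gene(1500, "+", terminus=1000, genome_length=2000))
```

With per-gene values of this kind, expression levels can then be compared across the four groups or correlated with each parameter.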
Globally, analysis of gene expression in Lactobacillus acidophilus revealed
orchestrated transcription, and adaptation to environmental conditions. Specifically,
dynamic adaptation to carbohydrate sources available in the environment might
contribute to competition with other commensal microbes for the limited nutrient sources
available in the human gastrointestinal tract.
FUNCTIONAL GENOMIC ANALYSES OF CARBOHYDRATE
UTILIZATION BY LACTOBACILLUS ACIDOPHILUS
by
RODOLPHE BARRANGOU
A dissertation submitted to the Graduate Faculty of North Carolina State University
in partial fulfillment of the requirements for the Degree of
Doctor of Philosophy
FUNCTIONAL GENOMICS
Raleigh
2004
APPROVED BY:
Dr. Todd R. Klaenhammer, Chairman of Advisory Committee
Dr. Greg Gibson
Dr. Robert M. Kelly
Dr. Dahlia M. Nielsen
Biography
Rodolphe Barrangou, the son of Charles Barrangou-Poueys and Roseline Helie, was born
on July 20, 1975 in Caen, France and raised in Paris, France. He attended the University
of Rene Descartes, Paris V (France) between 1994 and 1996 where he obtained a degree
in Life Sciences. He also attended the University of Technology of Compiegne (France)
between 1996 and 1998 where he obtained a M. S. degree in Biological Engineering. In
January 1999, he began working towards a Master of Science in Food Science at North
Carolina State University (USA) in the Vegetable Fermentation Laboratory (USDA-
ARS) under the direction of Dr. Henry P. Fleming and Dr. Todd R. Klaenhammer. In
January 2001, he began working towards a Ph. D. in Functional Genomics at North
Carolina State University (USA) in the Southeast Dairy Foods Research Center under the
direction of Dr. Todd R. Klaenhammer.
Acknowledgements
First and foremost, I would like to thank my advisor, Dr. Todd R. Klaenhammer for
giving me the opportunity to pursue another graduate degree at NC State, for his time,
supervision, guidance, availability and support throughout my graduate education. I also
wish to acknowledge Dr. Greg Gibson, Dr. Robert M. Kelly, and Dr. Dahlia Nielsen, for
serving on my advisory committee, giving me time outside of committee meetings, and
insightful discussions. Also, I would like to acknowledge all my co-workers and
collaborators within the “Klaenhammer lab”, especially Evelyn Durmaz, Dr. Andrea
Azcarate Peril, Dr. Eric Altermann, and Tri Duong, for technical help, sharing their
expertise and suggestions. I would also like to thank my other collaborators on campus, at
the GRL (Dr. Bryon Sosinski and Regina Brierley), for providing help with microarray
printing and scanning; in the Microbiology Department (Dr. Jose Bruno-Barcena and Dr.
Hosni Hassan), for providing help with Q-PCR; and my collaborators in the Bioinformatics
Program, namely Shannon Conners and Joshua Starmer for collaborating with me. I
would also like to acknowledge Dr. Barbara Sherry and Dr. Stephanie Curtis for their
leadership in the Functional Genomics program. I would like to dedicate my work to my
whole family for teaching me everything that I need to know, and for understanding my
need to go overseas. I would also like to acknowledge my friends Tri and Mike for
making my experience in the lab (and beyond) particularly enjoyable. Finally, I would
like to give a very special and personal thank you to my wife Lisa, for her patience,
understanding, and permanent support throughout my graduate career, for helping me
make the right decisions, understand what is important, and sharing everything in my life.
Table of contents
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER I – LITERATURE REVIEW: TRANSPORT SYSTEMS IN LACTIC ACID BACTERIA
CHAPTER II – FUNCTIONAL AND COMPARATIVE GENOMIC ANALYSES OF AN OPERON INVOLVED IN FRUCTOOLIGOSACCHARIDE UTILIZATION BY LACTOBACILLUS ACIDOPHILUS
CHAPTER III – GLOBAL ANALYSIS OF CARBOHYDRATE UTILIZATION AND TRANSCRIPTIONAL REGULATION IN LACTOBACILLUS ACIDOPHILUS USING WHOLE-GENOME cDNA MICROARRAYS
CHAPTER IV – GLOBAL CHARACTERIZATION OF THE LACTOBACILLUS ACIDOPHILUS TRANSCRIPTOME AND ANALYSIS OF RELATIONSHIPS BETWEEN GENE EXPRESSION LEVEL, CODON USAGE, CHROMOSOMAL LOCATION AND INTRINSIC GENE CHARACTERISTICS
APPENDIX I – FUNCTIONAL AND COMPARATIVE GENOMIC ANALYSES OF AN OPERON INVOLVED IN FRUCTOOLIGOSACCHARIDE UTILIZATION BY LACTOBACILLUS ACIDOPHILUS
List of tables
Chapter I
1. Genomes of lactic acid bacteria and other probiotic species
2. Hierarchical clustering analyses of gene expression patterns
3. Hierarchical clustering analyses of gene expression patterns for select genes and operons
4. Volcano plot comparison of gene expression between FOS and raffinose
5. Contour plot comparison of gene expression between FOS, raffinose and trehalose
6. Global differential gene expression
3. Correlations between gene expression level and intrinsic gene parameters
4. Analysis of variance, by chromosomal location
5. Correlations between gene expression level and intrinsic gene parameters, by chromosomal location
6. Gene distribution over select parameters, by chromosomal location
List of abbreviations
ABC     ATP Binding Cassette
ANOVA   ANalysis Of Variance
CAI     Codon Adaptation Index
CCR     Carbon Catabolite Repression
CH      CHaperone proteins
CRE     Catabolite Responsive Element
DNA     Deoxyribo Nucleic Acid
EC      Enzyme Commission
FOS     Fructo Oligo Saccharides
GIT     Gastro Intestinal Tract
GPH     Galactoside Pentose Hexuronide
LaOT    Lagging strand, between the Origin and Terminus
LaTO    Lagging strand, between the Terminus and Origin
LeOT    Leading strand, between the Origin and Terminus
LeTO    Leading strand, between the Terminus and Origin
LGT     Lateral Gene Transfer
LSM     Least Squares Means
MSM     Multiple Sugar Metabolism
NCFM    North Carolina Food Microbiology
NDO     Non Digestible Oligosaccharides
ORF     Open Reading Frame
PCR     Polymerase Chain Reaction
PEP     Phospho Enol Pyruvate
PHX     Predicted Highly eXpressed
PTS     Phosphoenolpyruvate Transferase System
RBS     Ribosome Binding Site
RNA     Ribo Nucleic Acid
RP      Ribosomal Proteins
RSCU    Relative Synonymous Codon Usage
SD      Shine Dalgarno
TF      Transcription and Translation Factors
CHAPTER I - Literature review: Transport systems in Lactic Acid Bacteria
1.1 Introduction
Bacteria are a dominant and diverse life form on earth. Molecular comparisons
between life forms divide organisms into three groups, namely eubacteria, archaebacteria
and eukaryotes (Woese et al., 1990). At the molecular level, those three groups are based
on differences within the ribosomal RNA (rRNA) structure and sequence (Woese et al.,
1990). This triad-nomenclature includes the eukaryote-prokaryote dichotomy, which is
based on presence / absence of a nucleus. Specifically, life on earth is divided into three
(ccpA) and HPrK/P (ptsK). Similarly, all those genes were identified in S. pneumoniae
(Tettelin et al., 2001). Those genes are involved in an active regulatory network based on
sugar availability. The regulatory networks involved in sugar utilization are not well
documented in lactobacilli and bifidobacteria, whereas they have been characterized in
streptococci (Vadeboncoeur and Pelletier, 1997). Nevertheless, previous work has
indicated involvement of CcpA in repression of specific operons in L. casei and L.
plantarum (Viana et al., 2000; Muscariello et al., 2001), and in L. pentosus (Mahr et al.,
2000). Specifically, the pepQ-ccpA locus has been identified in L. pentosus, L.
delbrueckii, L. casei, S. mutans and L. lactis (Mahr et al., 2000), and in most cases, a cre
sequence is found in the promoter-operator region of ccpA. The PTS is characterized by a
phosphate transfer cascade involving PEP, EI, HPr, and various EIIABCs, whereby a
phosphate is ultimately transferred to the carbohydrate substrate (Saier, 2000; Titgemeyer
and Hillen, 2002; Warner and Lolkema, 2003). HPr is a key component of CCR, which is
regulated via phosphorylation by enzyme I (EI) and HPr kinase/phosphatase (HPrK/P).
While HPr is the primary regulator of CCR, HPrK/P is the sensor enzyme of CCR in
Gram-positive bacteria (Nessler et al., 2003). HPrK/P has been found in a variety of
LAB, including L. casei, L. brevis, L. delbrueckii, L. gasseri, L. acidophilus, L. lactis,
Streptococcus bovis, S. mutans, S. salivarius, S. pneumoniae, S. pyogenes, S. agalactiae
and Leuconostoc mesenteroides (Warner and Lolkema, 2003; Altermann et al., 2004).
Similarly, HPr has also been found in a variety of LAB, including L. casei, L. sakei, L.
acidophilus, L. gasseri, L. brevis, L. mesenteroides, L. lactis, E. faecalis, S. mutans, S.
salivarius, S. bovis, S. pyogenes, S. pneumoniae, S. thermophilus, S. agalactiae and
Oenococcus oeni (Warner and Lolkema, 2003; Altermann et al., 2004). The HPr-HPrK/P
complex has been characterized structurally (Fieulaine et al., 2002). When HPr is
phosphorylated at His15, the PTS is on (Poolman, 2002), and carbohydrates transported
via the PTS are phosphorylated via EIIABCs. In contrast, when HPr is phosphorylated at
Ser46, the PTS machinery is not functional (Vadeboncoeur and Pelletier, 1997;
Mijakovic et al., 2002; Nessler et al., 2003). HPr-Ser46 acts as a co-repressor by binding
to CcpA (Fieulaine et al., 2002; Nessler et al., 2003). Ultimately, CcpA binds to cre
sequences in the promoter-operator region of operons encoding carbohydrate transporters
and hydrolases, and prevents their transcription (Hueck and Hillen, 1995; Poolman,
2002).
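
The regulatory logic described in this paragraph can be summarized as a small Boolean model. The sketch below is a deliberate simplification written for illustration: the class and function names are hypothetical, HPr is restricted to three mutually exclusive states, and repression is treated as all-or-nothing, which real regulation is not.

```python
from enum import Enum


class HPr(Enum):
    """Simplified, mutually exclusive HPr states (an assumption of this sketch)."""
    UNPHOSPHORYLATED = "HPr"
    HIS15_P = "HPr-His15-P"   # phosphorylated by EI
    SER46_P = "HPr-Ser46-P"   # phosphorylated by HPrK/P


def pts_is_on(hpr):
    """PTS transport proceeds only when HPr carries the His15 phosphate."""
    return hpr is HPr.HIS15_P


def operon_is_transcribed(hpr, ccpa_present=True, promoter_has_cre=True):
    """An operon is repressed when HPr-Ser46-P and CcpA can act on a cre site."""
    corepressor_bound = hpr is HPr.SER46_P and ccpa_present
    repressed = corepressor_bound and promoter_has_cre
    return not repressed


for state in HPr:
    print(state.value, pts_is_on(state), operon_is_transcribed(state))
```

In this toy model, removing either CcpA or the cre site relieves repression even when HPr is phosphorylated at Ser46, which mirrors the requirement for all three components described above.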
HPr has been identified in E. faecalis (Vadeboncoeur and Pelletier, 1997), S.
pyogenes (Deutscher and Saier, 1983; Vadeboncoeur and Pelletier, 1997), and L. lactis
(Luesink et al., 1999a).
CcpA-dependent repression and activation is well documented in a variety of
LAB, including enterococci, lactobacilli, lactococci and streptococci, especially with
regard to repression of the genes involved in utilization of galactosides (Titgemeyer and
Hillen, 2002).
The interaction between HPr and LacS has been shown in S. salivarius (Lessard et
al., 2003). It happens between HPr-His and EIIALacS, although LacS is not a member of
the PTS system. Since HPr is the primary regulator of CCR, the interaction between HPr
and LacS illustrates the likely regulation of the GPH system by CCR. In S. thermophilus,
the control of LacS by CCR has been illustrated, likely via interaction between CcpA and
two cre sequences found in the promoter-operator region of the lacSZ operon (van den
Bogaard et al., 2000).
Although the phosphorylation cascade suggests regulation at the protein level,
studies in LAB report both transcriptional modulation and constitutive expression of
ccpA and ptsHI. Specifically, in S. thermophilus, CcpA production is induced by glucose
(van den Bogaard et al., 2000). Similarly, in other bacteria, the carbohydrate source modulates
ptsHI transcriptional levels (Luesink et al., 1999a). In contrast, expression levels of ccpA
in L. pentosus (Mahr et al., 2000) and of ptsHI in S. thermophilus (Cochu et al., 2003) did
not vary in the presence of different carbohydrates.
Carbon catabolite repression is likely present in L. acidophilus, since all the
necessary regulatory proteins are encoded within its genome, cre-like sequences are
present in the promoter-operator regions of several carbohydrate loci (Barrangou et al.,
2003), and transcription of operons involved in utilization of non-preferred carbohydrates
is repressed by glucose (Barrangou et al., 2003).
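
Searching promoter-operator regions for cre-like sequences can be sketched as a degenerate motif scan. The text above does not state which consensus was used, so the 14-nucleotide pattern TGWAANCGNTNWCA below is an assumption (a consensus often quoted for cre sites in Bacillus subtilis), and the function names and the test sequence are hypothetical. The sketch reports exact matches on one strand only; real surveys usually tolerate mismatches and scan both strands.

```python
import re

# IUPAC nucleotide codes needed for the assumed consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "N": "[ACGT]",
}

ASSUMED_CRE_CONSENSUS = "TGWAANCGNTNWCA"  # assumption, not taken from the text


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into a regex that finds overlapping hits."""
    body = "".join(IUPAC[symbol] for symbol in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, consensus=ASSUMED_CRE_CONSENSUS):
    """Return (0-based position, matched text) for each exact consensus match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Made-up promoter fragment containing one site that fits the assumed consensus.
toy_promoter = "CCCC" + "TGAAAGCGCTTTCA" + "GGGG"
print(find_sites(toy_promoter))
```

A hit found this way only marks a candidate site; whether it is functional still has to be shown experimentally, for example by the glucose-repression experiments cited above.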
Carbon catabolite repression illustrates how lactic acid bacteria adapt dynamically
to the diverse carbohydrate sources available in their various habitats.
1.9 Conclusions and perspectives
Although a variety of putative carbohydrate transporters have been identified in
LAB genomes recently published, little information is available regarding their biological
functions and expression profiles. Specifically, the substrate specificity of most PTS and
ABC transporters remains unclear, as illustrated in the incomplete annotation of most
PTS transporters in L. plantarum, L. acidophilus, L. johnsonii and S. pneumoniae
(Kleerebezem et al., 2003; Altermann et al., 2004; Schell et al., 2003; Tettelin et al., 2001). As a
result, in silico analyses must be confirmed and complemented by transcriptional and
biological analyses.
Surveys of carbohydrate uptake systems revealed greater diversity in prokaryotes
than eukaryotes. Specifically, eukaryotic carbohydrate transport is dominated by the
MFS, whereas that of prokaryotes involves the MFS, PTS and ABC superfamilies
of transporters (Saier, 2000).
Recent advances in high throughput technologies, primarily genome sequencing
and microarrays, have yielded global data that provide insight into the physiology of
microbes. Particularly, LAB genome analyses have illustrated the breadth and importance
of carbohydrate transporters in lactobacilli and bifidobacteria. Similarly, global
transcriptome analyses, similar to those carried out in Escherichia coli (Beloin et al.,
2004), Bacillus subtilis (Blencke et al., 2003), Vibrio cholerae (Meibom et al., 2004),
Thermotoga maritima (Chhabra et al., 2003; Pysz et al., 2004a; Pysz et al., 2004b) and
Pyrococcus furiosus (Shockley et al., 2003), applied to carbohydrate utilization
investigation in LAB will provide further insight into the transporters and metabolic
pathways involved in adaptation of LAB to their various environmental conditions.
Ultimately, genetic engineering of LAB could allow development of better starter
cultures and probiotic strains, optimized for utilization of specific carbohydrate sources,
and competition with other commensals. Genetic engineering in LAB is now possible,
following the development of molecular biology tools, including food-grade systems (de
Vos, 1996; Russell and Klaenhammer, 2001; Boucher et al., 2002; Kleerebezem and
Hugenholtz, 2003).
Overall, the combination of a diverse saccharolytic enzymatic machinery with a
polyvalent transport system, consisting primarily of ABC and PTS transporters, allows
lactic acid bacteria to utilize a variety of nutrient resources efficiently and dynamically
adapt their transcriptomes to environmental conditions, ultimately rendering these microbes
more competitive in their respective environments.
1.10 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A., & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Alles, M. S., Hautvast, J. G. A. J., Nagengast, F. M., Hartemink, R., Van Laere, K. M. J., & Jansen, J. B. M. (1996) Brit. J. Nutr. 76, 211-221
Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B., McAuliffe, O., Souther, N., Dobson, A., Duong, T., Callanan, M., Lick, S., Hamrick, A., Cano, R., & Klaenhammer, T. R. (2004) J. Bacteriol. In review
Barrangou, R., Altermann, E., Hutkins, R., Cano, R., & Klaenhammer, T. R. (2003) Proc. Natl. Acad. Sci. USA 100, 8957-8962
Beloin, C., Valle, J., Latour-Lambert, P., Faure, P., Kzreminski, M., Balestrino, D., Haagensen, J. A. J., Molin, S., Prensier, G., Arbeile, B., & Ghigo, J. M. (2004) Mol. Microbiol. 51, 659-674
Blencke, H. M., Homuth, G., Ludwig, H., Mader, U., Hecker, M., & Stulke, J. (2003) Metab. Eng. 5, 133-149
Boels, I. C., Kleerebezem, M., & de Vos, W. M. (2003) Appl. Environ. Microbiol. 69, 1129-1135
Bolotin, A., Mauger, S., Malarme, K., Ehrlich, S. D., & Sorokin, A. (1999) Antonie van Leeuwenhoek 76, 27-76
Boucher, I., Parrot, M., Gaudreau, H., Champagne, C. P., Vadeboncoeur, C., & Moineau, S. (2002) Appl. Environ. Microbiol. 68, 6152-6161
Boucher, I., Vadeboncoeur, C., & Moineau, S. (2003) Appl. Environ. Microbiol. 69, 4149-4156
Braibant, M., Gilot, P., & Content, J. (2000) FEMS Microbiol. Rev. 24, 449-467
Cavalier-Smith, T. (2004) Proc. R. Soc. Lond. 271, 1251-1262
Chhabra, S. R., Shockley, K. R., Conners, S. B., Scott, K. L., Wolfinger, R. D., & Kelly, R. M. (2003) J. Biol. Chem. 278, 7540-7552
Cochu, A., Vadeboncoeur, C., Moineau, S., & Frenette, M. (2003) Appl. Environ. Microbiol. 69, 5423-32
Curtis, T. P., & Sloan, W. T. (2004) Curr. Opin. Microbiol. 7, 221-226
Curtis, T. P., Sloan, W. T., & Scannell, J. W. (2002) Proc. Natl. Acad. Sci. USA 99, 10494-10499
Davidson, A. L., & Chen, J. (2004) Annu. Rev. Biochem. 73, 241-268
Deutscher, J., & Saier, M. H. (1983) Proc. Natl. Acad. Sci. USA 80, 6790-6794
De Vos, W. M. (1996) Antonie van Leeuwenhoek 70, 223-242
Djordjevic, G. M., Tchieu, J. H., & Saier, M. H. (2001) J. Bacteriol. 183, 3224-3236
Duong, T., Barrangou, R., Russell, M. W., & Klaenhammer, T. R. (2004) In review
Embley, T. M., Hirt, R. P., & Williams, D. M. (1994) Phil. Trans. R. Soc. Lond. 345, 21-33
Ferretti, J. J., McShan, W. M., Ajdic, D., Savic, D. J., Savic, G., Lyon, K., Primeaux, C., Sezate, S., Suvorov, A., Kenton, S., Lai, H. S., Lin, S. P., Qian, Y., Jia, H. G., Najar, F. Z., Ren, Q., Zhu, H., Song, L., White, J., Yuan, X., Clifton, S. W., Roe, B. A., & McLaughlin, R. (2001) Proc. Natl. Acad. Sci. USA 98, 4658-4663
Fieulaine, S., Morera, S., Poncet, S., Mijakovic, I., Galinier, A., Janin, J., Deutscher, J., & Nessler, S. (2002) Proc. Natl. Acad. Sci. USA 99, 13437-13441
Fortina, M. G., Ricci, G., Mora, D., Guglielmetti, S., & Manachini, P. L. (2003) Appl. Environ. Microbiol. 69, 3238-43
Gibson, G. R., & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412
Grantham, R., Gautier, C., Gouy, M., Mercier, R., & Pave, A. (1980) Nucleic Acids Res. 8, r49-r62
Grossiord, B. P., Luesink, E. J., Vaughan, E. E., Arnaud, A., & De Vos, W. M. (2003) J. Bacteriol. 185, 870-8
Hueck, C. J., & Hillen, W. (1995) Mol. Microbiol. 15, 395-401
Hugenholtz, J., Sybesma, W., Groot, M. N., Wisselink, W., Ladero, V., Burgess, K., van Sinderen, D., Piard, J. C., Eggink, G., Smid, E. J., Savoy, G., Sesma, F., Jansen, T., Hols, P., & Kleerebezem, M. (2002) Antonie van Leeuwenhoek 82, 217-235
Kaplan, H., & Hutkins, R. W. (2000) Appl. Environ. Microbiol. 66, 2682-2684
Kaplan, H., & Hutkins, R. W. (2003) Appl. Environ. Microbiol. 69, 2217-2222
Klaenhammer, T. R., Altermann, E., Arigoni, F., Bolotin, A., Breidt, F., Broadbent, J., Cano, R., Chaillou, S., Deutscher, J., Gasson, M., van de Guchte, M., Guzzo, J., Hartke, A., Hawkins, T., Hols, P., Hutkins, R., Kleerebezem, M., Kok, J., Kuipers, O., Lubbers, M., Maguin, E., McKay, L., Mills, D., Nauta, A., Overbeek, R., Pel, H., Pridmore, D., Saier, M., van Sinderen, D., Sorokin, A., Steele, J., O'Sullivan, D., de Vos, W., Weimer, B., Zagorec, M., & Siezen, R. (2002) Antonie van Leeuwenhoek 82, 29-58
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M., & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-5
Kleerebezem, M., & Hugenholtz, J. (2003) Curr. Opin. Biotechnol. 14, 232-237
Konings, W. N. (2002) Antonie van Leeuwenhoek 82, 3-27
Krogh, A., Larsson, B., von Heijne, G., & Sonnhammer, E. L. L. (2001) J. Mol. Biol. 305, 567-580
Kullen, M. J., & Klaenhammer, T. R. (1999) Mol. Microbiol. 33, 1152-1161
Kumar, S., Tamura, K., Jakobsen, I. B., & Nei, M. (2001) Bioinformatics 17, 1244-1245
Lapierre, L., Mollet, B., & Germond, J. E. (2002) J. Bacteriol. 184, 928-35
Leong-Morgenthaler, P., Zwahlen, M. C., & Hottinger, H. (1991) J. Bacteriol. 173, 1951-1957
Lessard, C., Cochu, A., Lemay, J. D., Roy, D., Vaillancourt, K., Frenette, M., Moineau, S., & Vadeboncoeur, C. (2003) J. Bacteriol. 185, 6764-72
Linton, K. J., & Higgins, C. F. (1998) Mol. Microbiol. 28, 5-13
Locher, K. P., Lee, A. T., & Rees, D. C. (2002) Science 296, 1091-1098
Luesink, E. J., Beumer, C. M. A., Kuipers, O. P., & de Vos, W. M. (1999a) J. Bacteriol. 181, 764-771
Luesink, E. J., Marugg, J. D., Kuipers, O. P., & de Vos, W. M. (1999b) J. Bacteriol. 181, 1924-1926
Mahr, K., Hillen, W., & Titgemeyer, F. (2000) Appl. Environ. Microbiol. 66, 277-83
Margulis, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1071-1076
McLaughlin, R. E., & Ferretti, J. J. (1996) FEMS Microbiol. Lett. 140, 261-264
Meibom, K. L., Li, X. B., Wu, C. Y., Roseman, S., & Schoolnik, G. K. (2004) Proc. Natl. Acad. Sci. USA 101, 2524-2529
Mijakovic, I., Poncet, S., Galinier, A., Monedero, V., Fieulaine, S., Janin, J., Nessler, S., Marquez, J. A., Scheffzek, K., Hasenbein, S., Hengstenberg, W., & Deutscher, J. (2002) Proc. Natl. Acad. Sci. USA 99, 13442-7
Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M., & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-10
Muscariello, L., Marasco, R., De Felice, M., & Sacco, M. (2001) Appl. Environ. Microbiol. 67, 2903-2907
Muto, A., & Osawa, S. (1987) Proc. Natl. Acad. Sci. USA 84, 166-169
Naumoff, D. G., & Livshits, V. A. (2001) Mol. Biol. 35, 19-27
Nesbo, C. L., Nelson, K. E., & Doolittle, W. F. (2002) J. Bacteriol. 184, 4475-4488
Nessler, S., Fieulaine, S., Poncet, S., Galinier, A., Deutscher, J., & Janin, J. (2003) J. Bacteriol. 185, 4003-4010
Ouwehand, A. C., Salminen, S., & Isolauri, E. (2002) Antonie van Leeuwenhoek 82, 279-289
Paulsen, I. T., Sliwinski, M. K., & Saier, M. H. (1998) J. Mol. Biol. 277, 573-592
Paulsen, I. T., Nguyen, L., Sliwinski, M. K., Rabus, R., & Saier, M. H. (2000) J. Mol. Biol. 301, 75-100
Paulsen, I. T., Banerjei, L., Myers, G. S. A., Nelson, K. E., Seshadri, R., Read, T. D., Fouts, D. E., Eisen, J. A., Gill, S. R., Heidelberg, J. F., Tettelin, H., Dodson, R. J., Umayam, L., Brinkac, L., Beanan, M., Daugherty, S., DeBoy, R. T., Durkin, S., Kolonay, J., Madupu, R., Nelson, W., Vamathevan, J., Tran, B., Upton, J., Hansen, T., Shetty, J., Khouri, H., Utterback, T., Radune, D., Ketchum, K. A., Dougherty, B. A., & Fraser, C. M. (2003) Science 299, 2071-2074
Poolman, B. (2002) Antonie van Leeuwenhoek 82, 147-164
Pridmore, R. D., Berger, B., Desiere, F., Vilanova, D., Barretto, C., Pittet, A. C., Zwahlen, M. C., Rouvet, M., Altermann, E., Barrangou, R., Mollet, B., Mercenier, A., Klaenhammer, T. R., Arigoni, F., & Schell, M. A. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517
Pysz, M. A., Conners, S. B., Montero, C. I., Shockley, K. R., Johnson, M. R., Ward, D. E., & Kelly, R. M. (2004a) Appl. Environ. Microbiol. 70, 6098-6112
Pysz, M. A., Ward, D. E., Shockley, K. R., Montero, C. I., Conners, S. B., Johnson, M. R., & Kelly, R. M. (2004b) Extremophiles 8, 209-17
Quentin, Y., Fichant, G., & Denizot, F. (1999) J. Mol. Biol. 287, 467-484
Reid, G. (1999) Appl. Environ. Microbiol. 65, 3763-6
Reid, G., Sanders, M. E., Gaskins, H. R., Gibson, G. R., Mercenier, A., Rastall, R., Roberfroid, M., Rowland, I., Cherbut, C., & Klaenhammer, T. R. (2003) J. Clin. Gastroenterol. 37, 105-118
Rivera, M. C., & Lake, J. A. (2004) Nature 431, 152-155
Rosenow, C., Maniar, M., & Trias, J. (1999) Genome Res. 9, 1189-97
Russell, R. R. B., Aduse-Opoku, J., Sutcliffe, I. C., Tao, L., & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637
Russell, W. M., & Klaenhammer, T. R. (2001) Appl. Environ. Microbiol. 67, 4361-4364
Rycroft, C. E., Jones, M. R., Gibson, G. R., & Rastall, R. A. (2001) J. Appl. Microbiol. 91, 878-87
Saier, M. H., & Reizer, J. (1992) J. Bacteriol. 174, 1433-1438
Saier, M. H. (2000) Mol. Microbiol. 35, 699-710
Sanders, M. E., & Klaenhammer, T. R. (2001) J. Dairy. Sci. 84, 319-331
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D., & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427
Shockley, K. R., Ward, D. E., Chhabra, S. R., Conners, S. B., Montero, C. I., & Kelly, R. M. (2003) Appl. Environ. Microbiol. 69, 2365-2371
Siebold, C., Flukiger, K., Beutler, R., & Erni, B. (2001) FEBS Lett. 504, 104-111
Sievers, M., Uermosi, C., Fehlmann, M., & Krieger, S. (2003) System. Appl. Microbiol. 26, 350-356
Siezen, R. J., van Enckevort, F. H. J., Kleerebezem, M., & Teusink, B. (2004) Curr. Opin. Biotechnol. 15, 105-115
Snel, B., Bork, P., & Huynen, M. A. (2002) Proc. Natl. Acad. Sci. USA 99, 5890-5895
Tannock, G. W. (1999) Antonie van Leeuwenhoek 76, 265-278
Tettelin, H., Nelson, K. E., Paulsen, I. T., Eisen, J. A., Read, T. D., Peterson, S., Heidelberg, J., Deboy, R. T., Haft, D. H., Dodson, R. J., Durkin, A. S., Gwinn, M., Kolonay, J. F., Nelson, W. C., Peterson, J. D., Umayam, L. A., White, O., Salzberg, S. L., Lewis, M. R., Radune, D., Holtzapple, E., Khouri, H., Wolf, A. M., Utterback, T. R., Hansen, C. L., McDonald, L. A., Feldblyum, T. V., Angiuoli, S., Dickinson, T., Hickey, E. K., Holt, I. E., Loftus, B. J., Yang, F., Smith, H. O., Venter, J. C., Dougherty, B. A., Morrison, D. A., Hollingshead, S. K., & Fraser, C. M. (2001) Science 293, 498-506
Tettelin, H., Masignani, V., Cieslewicz, M. J., Eisen, J. A., Peterson, S., Wessels, M. R., Paulsen, I. T., Nelson, K. E., Margarit, I., Read, T. D., Madoff, L. C., Wolf, A. M., Beanan, M. J., Brinkac, L. M., Daugherty, S. C., DeBoy, R. T., Durkin, A. S., Kolonay, J. F., Madupu, R., Lewis, M. R., Radune, D., Fedorova, N. B., Scanlan, D., Khouri, H., Mulligan, S., Carty, H. A., Cline, R. T., Van Aken, S. E., Gill, J., Scarselli, M., Mora, M., Iacobini, E. T., Brettoni, C., Galli, G., Mariani, M., Vegni, F., Maione, D., Rinaudo, D., Rappuoli, R., Telford, J. L., Kasper, D. L., Grandi, G., & Fraser, C. M. (2002) Proc. Natl. Acad. Sci. USA 99, 12391-12396
Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680
Titgemeyer, F., & Hillen, W. (2002) Antonie van Leeuwenhoek 82, 59-71
Vadeboncoeur, C., & Pelletier, M. (1997) FEMS Microbiol. Rev. 19, 187-207
Vaillancourt, K., Moineau, S., Frenette, M., Lessard, C., & Vadeboncoeur, C. (2002) J. Bacteriol. 184, 785-793
Van den Bogaard, P. T. C., Kleerebezem, M., Kuipers, O. P., & De Vos, W. M. (2000) J. Bacteriol. 182, 5982-5989
Van Laere, K. M., Hartemink, R., Bosveld, M., Schols, H. A., & Voragen, A. G. (2000) J. Agric. Food Chem. 48, 1644-1652
Van Veen, H. W., Margolles, A., Putman, M., Sakamoto, K., & Konings, W. N. Antonie van Leeuwenhoek 76, 347-352
Vaughan, E. E., David, S., & De Vos, W. M. (1996) Appl. Environ. Microbiol. 62, 1574-82
Vaughan, E. E., de Vries, M. C., Zoetendal, E. G., Ben-Amor, K., Akkermans, A. D. L., & de Vos, W. M. (2002) Antonie van Leeuwenhoek 82, 341-352
Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A., Wu, D., Paulsen, I., Nelson, K. E., Nelson, W., Fouts, D. E., Levy, S., Knap, A. H., Lomas, M. W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y. H., & Smith, H. O. (2004) Science 304, 66-74
Ventura, M., Canchaya, C., van Sinderen, D., Fitzgerald, G. F., & Zink, R. (2004) Appl.
Deutscher, J. (2000) Mol. Microbiol. 36, 570-584
Warner, J. B., & Lolkema, J. S. (2003) Microbiol. Mol. Rev. 67, 475-490
Weickert, M. J. & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-6242
Woese, C. R., Kandler, O., & Wheelis, M. L. (1990) Proc. Natl. Acad. Sci. USA 87, 4576-4579
Table 1. Genomes of lactic acid bacteria and other probiotic species

Genus            Species                Strain      Size (Mbp)  %GC   Status  Reference
Bifidobacterium  longum                 NCC2705     2.3         60.1  C       Schell et al.
                 breve                  NCIMB8807   2.4         58.8  C       Siezen et al.
Enterococcus     faecalis               V583        3.2         37.5  C       Paulsen et al.
Lactobacillus    acidophilus            NCFM        2.0         34.7  C       Altermann et al.
                 gasseri                ATCC333323  1.8         35.1  IP      JGI
                 johnsonii              NCC533      2.0         34.6  C       Pridmore et al.
                 plantarum              WCFS1       3.3         44.5  C       Kleerebezem et al.
                 casei                  ATCC334     2.5         41.1  IP      JGI
                 rhamnosus              HN001       2.4         46.4  IP      Klaenhammer et al.
                 helveticus             CNRZ32      2.4         37.1  IP      Klaenhammer et al.
                 brevis                 ATCC367     2.0         43.1  IP      JGI
                 sakei                  23K         1.9         41.2  C       Klaenhammer et al.
                 delbrueckii            ATCCBAA365  2.3         45.7  IP      JGI
Lactococcus      lactis ssp. lactis     IL1403      2.3         35.4  C       Bolotin et al.
                 lactis ssp. cremoris   SK11        2.3         30.9  IP      JGI
Leuconostoc      mesenteroides          ATCC8293    2.0         37.4  IP      JGI
Oenococcus       oeni                   ATCCBAA331  1.8         37.5  IP      JGI
Pediococcus      pentosaceus            ATCC25745   2.0         37.0  IP      JGI
Streptococcus    agalactiae             2603V/R     2.2         35.7  C       Tettelin et al.
                 mutans                 UA159       2.0         36.8  C       Ajdic et al.
                 pneumoniae             TIGR4       2.2         39.7  C       Tettelin et al.
                 pyogenes               M1          1.9         38.5  C       Ferretti et al.
                 thermophilus           LMD9        1.8         36.8  IP      JGI

C, complete
IP, in progress
JGI, Joint Genome Institute
Adapted from Klaenhammer et al., 2002 and Siezen et al., 2004
Table 2. Carbohydrate utilization profiles for select lactic acid bacteria
TMD, number of transmembrane domains in a protein, as predicted by the algorithm developed by Krogh et al., 2001.
[Figure 1 image not reproduced. Leaf labels recoverable from the tree: L. lactis, S. mutans, S. pneumoniae, S. pyogenes, S. agalactiae, L. gasseri, L. johnsonii, L. acidophilus, L. plantarum, P. pentosaceus, E. faecalis, B. halodurans, B. subtilis, S. aureus, L. mesenteroides, O. oeni, B. longum, B. linens, T. maritima, V. alginolyticus, V. cholerae, E. coli, K. pneumoniae, S. typhimurium.]
Figure 1. Phylogenetic tree of lactic acid bacteria and select microbial species. This phylogenetic tree is a neighbor-joining tree obtained from the multiple sequence alignment of 16S rRNA genes in ClustalW (Thompson et al., 1994), visualized in MEGA2 (Kumar et al., 2001). Black, lactic acid bacteria; red, bacillales; yellow, thermotogae; red, proteobacteria. Within LAB, branches for different subgroups have different colors: blue, streptococci; pink, lactobacilli; purple, high GC brevibacteria and bifidobacteria.
Figure 3. Transmembrane domains in ABC, PTS and GPH transporters in L. acidophilus. A, TMDs in FOS ABC transporter MsmE; B, TMDs in FOS ABC transporter MsmF; C, TMDs in sucrose PTS transporter ScrB; D, TMDs in lactose/galactose GPH transporter LacS
CHAPTER II – Functional and comparative genomic analyses of an operon involved in fructooligosaccharide utilization by Lactobacillus acidophilus
Published in Proc. Natl. Acad. Sci. USA 100, 8957-8962 – see appendix 1
2.1 Abstract
Lactobacillus acidophilus NCFM is a probiotic organism that displays the ability to
utilize prebiotic compounds, such as fructo-oligosaccharides (FOS), which stimulate the
growth of beneficial commensals in the gastrointestinal tract. However, little is known
about the mechanisms and genes involved in FOS utilization by Lactobacillus species.
Analysis of the L. acidophilus NCFM genome revealed an msm locus composed of a
transcriptional regulator of the LacI family, a four component ABC transport system, a
fructosidase and a sucrose phosphorylase. Transcriptional analysis of this operon
demonstrated that gene expression was induced by sucrose and FOS, but not by glucose
or fructose, suggesting some specificity for non-readily fermentable sugars. Additionally,
expression was repressed by glucose, but not by fructose, suggesting catabolite
repression, via two cre-like sequences identified in the promoter-operator region.
Insertional inactivation of the genes encoding the ABC transporter substrate binding
protein and the fructosidase reduced the ability of the mutants to grow on FOS.
Comparative analysis of gene architecture within this cluster revealed a high degree of
synteny with operons in Streptococcus mutans and Streptococcus pneumoniae. However,
the association between a fructosidase and an ABC transporter is unusual, and may be
specific to L. acidophilus. This is the first description of a gene locus involved in
transport and catabolism of FOS compounds, which can promote the competitiveness of
beneficial microorganisms in the human gastrointestinal tract.
2.2 Introduction
The ability of select intestinal microbes to utilize substrates non-digested by the
host may play an important role in their ability to successfully colonize the mammalian
gastrointestinal (GI) tract. A diverse carbohydrate catabolic potential is associated with
cariogenic activity of S. mutans in the oral cavity (Ajdic et al., 2002), adaptation of L.
plantarum to a variety of environmental niches (Kleerebezem et al., 2003), and residence
of B. longum in the colon (Schell et al., 2002), illustrating the competitive benefits of
complex sugar utilization. Prebiotics are non-digestible food ingredients that selectively
stimulate the growth and/or activity of beneficial microbial strains residing in the host
intestine (Gibson and Roberfroid, 1995). Among sugars that qualify as prebiotics, fructo-
oligosaccharides (FOS) are a diverse family of fructose polymers used commercially in
food products and nutritional supplements, that vary in length and can be either
derivatives of simple fructose polymers, or fructose moieties attached to a sucrose
molecule. The linkage and degree of polymerization can vary widely (usually between 2
and 60 moieties), and several names such as inulin, levan, oligofructose and neosugars
are used accordingly. The average daily intake of such compounds, originating mainly
from wheat, onion, artichoke, banana, and asparagus (Gibson and Roberfroid, 1995;
Moshfegh et al., 1999), is substantial: nearly 2.6 g of inulin and 2.5 g of
oligofructose are consumed daily in the average American diet (Moshfegh et al., 1999). FOS are
not digested in the upper gastrointestinal tract and can be degraded by a variety of lactic
acid bacteria (Hartemink et al., 1995; Hartemink et al., 1997; Kaplan and Hutkins, 2000;
Van Laere et al., 2000), residing in the human lower gastrointestinal tract (Gibson and
Roberfroid, 1995; Orrhage et al., 2000). FOS and other oligosaccharides have been
shown in vivo to beneficially modulate the composition of the intestinal microbiota, and
specifically to increase bifidobacteria and lactobacilli (Gibson and Roberfroid, 1995;
Orrhage et al., 2000). A variety of L. acidophilus strains in particular have been shown to
utilize several polysaccharides and oligosaccharides such as arabinogalactan,
arabinoxylan and FOS (Kaplan and Hutkins, 2000; Van Laere et al., 2000). Despite the
recent interest in FOS utilization, little information is available about the metabolic
pathways and enzymes responsible for transport and catabolism of such complex sugars
in lactobacilli.
In silico analysis of a particular locus within the L. acidophilus NCFM genome
revealed the presence of a gene cluster encoding proteins potentially involved in prebiotic
transport and hydrolysis. This specific cluster was analyzed computationally and
functionally to reveal the genetic basis for FOS transport and catabolism by L.
acidophilus NCFM.
2.3 Materials and Methods
2.3.1 Bacterial strain and media used in this study
The strain used in this study is L. acidophilus NCFM (Barefoot and
Klaenhammer, 1983). Cultures were propagated at 37°C, aerobically in MRS broth
(Difco). A semi-synthetic medium consisted of: 1% bactopeptone (w/v) (Difco), 0.5%
0.003 % bromocresol purple (v/v) (Fisher), and 1% sugar (w/v). The carbohydrates added
were either glucose (dextrose) (Sigma), fructose (Sigma), sucrose (Sigma), or FOS. Two
types of complex sugars were used as FOS: a GFn mix (manufactured by R. Hutkins),
consisting of glucose monomers linked α-1,2 to two, three or four fructosyl moieties
linked β-2,1, to form kestose (GF2), nystose (GF3) and fructofuranosyl-nystose (GF4),
respectively; and an Fn mix, raftilose, derived from inulin hydrolysis (Orafti). Without
carbohydrate supplementation, the semi-synthetic medium was unable to sustain bacterial
growth above OD600nm~0.2.
2.3.2 Computational analysis of the putative msm operon
A 10 kbp DNA locus containing a putative msm (multiple sugar metabolism)
operon was identified from the L. acidophilus NCFM genome sequence. ORF predictions
were carried out by four computational programs: Glimmer (Salzberg et al., 1998;
Delcher et al., 1999), Clone Manager (Scientific and Educational Software), the NCBI
ORF caller (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), and GenoMax (InforMax Inc.,
MD). Glimmer was previously trained with a set of L. acidophilus genes available in
public databases. The predicted ORFs were translated into putative proteins that were
submitted to BlastP analysis (Altschul et al., 1990).
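
As a rough illustration of what an ORF caller does at its simplest, the sketch below scans all six reading frames for ATG-to-stop stretches. It is not the Glimmer algorithm (which relies on interpolated Markov models), nor any of the other programs cited above; the function names and the `min_codons` threshold are assumptions made for this example only.

```python
# Naive six-frame ORF scan. Illustrative only: real ORF callers also
# consider alternative start codons, ribosome binding sites and coding
# statistics, none of which are modeled here.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, frame, start, end) tuples for ATG-to-stop ORFs.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned. The stop codon is included in the reported span and
    counted in the length compared with min_codons.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Candidate ORFs found this way would still need to be translated and compared against protein databases, as was done here with BlastP.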
2.3.3 RNA isolation and analysis
Total RNA was isolated using TRIzol (GibcoBRL) by following the supplier’s
instructions. Cells in the mid-log phase were harvested by centrifugation (2 minutes,
14,000 rpm) and cooled on ice. Pellets were resuspended in TRIzol by vortexing and
subjected to five cycles of 1 min of bead beating followed by 1 min on ice. Nucleic acids were
subsequently purified using three chloroform extractions, and precipitated using
isopropanol and centrifugation for 10 min at 12,000 rpm. The RNA pellet was washed
with 70% ethanol, and resuspended into DEPC treated water. RNA samples were treated
with DNase I according to the supplier’s instructions (Boehringer Mannheim). First
strand cDNA was synthesized using the Invitrogen RT-PCR kit according to the
supplier’s instructions. cDNA products were subsequently amplified using PCR with
primers internal to genes of interest. For RNA slot blots, RNA samples were transferred
to nitrocellulose membranes (BioRad) using a slot blot apparatus (Bio-Dot SF, BioRad),
and the RNAs were UV crosslinked to the membranes. Blots were probed with DNA
fragments generated by PCR that had been purified from agarose gels (GeneClean III kit,
Midwest Scientific). Probes were labeled with α-32P, using the Amersham Multiprime
Kit, and consisted of 700-bp and 750-bp fragments internal to the msmE and bfrA genes,
respectively. Hybridization and washes were carried out according to the supplier’s
instructions (Bio-Dot Microfiltration Apparatus, BioRad) and radioactive signals were
detected using a Kodak Biomax film. Primers are listed in Supporting Table 1.
2.3.4 Comparative genomic analysis
A gene cluster bearing a fructosidase gene was selected after computational data-
mining of the L. acidophilus NCFM genome. Additionally, microbial clusters containing
fructosidase EC 3.2.1.26 orthologs, or bearing an ABC transport system associated with
an alpha-galactosidase EC 3.2.1.22 were selected from public databases (NCBI, TIGR).
The sucrose operon is a widely distributed cluster, consisting of either three or four
elements, namely: a regulator, a sucrose PTS transporter, a sucrose hydrolase and
occasionally a fructokinase. Two gene cluster alignments were generated: (i) a PTS
alignment, representing similarities over the sucrose operon, bearing a PTS transport
system associated with a sucrose hydrolase; (ii) an ABC alignment, representing
similarities over the multiple sugar metabolism cluster, bearing an ABC transport system
usually associated with a galactosidase. Sequence information is available in Table 2.
2.3.5 Phylogenetic trees
Nucleotide and protein sequences were aligned computationally using the
CLUSTALW algorithm (Thompson et al., 1994). The multiple alignment outputs were
used for generating unrooted neighbor-joining phylogenetic trees using MEGA2 (Kumar
et al., 2001). In addition to a phylogenetic tree derived from 16S rRNA genes, trees were
generated for ABC transporters, PTS transporters, transcription regulators, fructosidases,
and fructokinases.
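
Neighbor-joining operates on a matrix of pairwise distances derived from the multiple alignment. The sketch below computes the simplest such measure, the p-distance (proportion of differing sites), from pre-aligned sequences. The distance model actually used in MEGA2 for these trees is not stated in the text, so p-distance is an assumption chosen for illustration, as are the function names and the toy sequences in the usage note.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Build a symmetric nested-dict distance matrix from {name: sequence}."""
    names = list(aligned)
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(aligned[a], aligned[b])
        dist[a][b] = d
        dist[b][a] = d
    return dist
```

For example, `distance_matrix({"A": "ACGTACGT", "B": "ACGTACGA"})["A"]["B"]` evaluates to 0.125 (one difference over eight compared sites). A tree-building program would then cluster taxa from such a matrix.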
2.3.6 Gene inactivation
Gene inactivation was conducted by site-specific plasmid integration into the L.
acidophilus chromosome via homologous recombination (Russell and Klaenhammer,
2001). Internal fragments of the msmE and bfrA genes were cloned into pORI28 using E.
coli as a host (Law et al., 1995), and the constructs were subsequently purified and
transformed into L. acidophilus NCFM. The ability of the mutant strains to grow on a
variety of carbohydrate substrates was investigated using growth curves. Strains were
grown on semi-synthetic medium supplemented with 0.5% w/v carbohydrate.
2.4 Results
2.4.1 Computational analysis of the msm operon
Analysis of the msm locus using four ORF calling programs revealed the presence
of seven putative ORFs. Because most of the encoded proteins were homologous to
those of the msm operon present in S. mutans (Russell et al., 1992), a similar gene
nomenclature was used. The analysis of the predicted ORFs suggested the presence of a
transcriptional regulator of the LacI repressor family, MsmR; a four-component transport
system of the ATP binding cassette (ABC) family, MsmEFGK; and two enzymes
involved in carbohydrate metabolism, namely a fructosidase EC 3.2.1.26, BfrA; and a
sucrose phosphorylase EC 2.4.1.7, GtfA. A putative Shine-Dalgarno sequence
5'-AGGAGG-3' was found within 10 bp upstream of the msmE start codon. A dyad
symmetry analysis revealed two stem-loop structures that could act as
Rho-independent transcriptional terminators: one between msmK and gtfA
(bp 6,986 to 7,014; free energy -13.6 kcal/mol), and one 20 bp downstream of
the last gene of the putative operon (bp 8,500 to 8,538; free energy -16.5
kcal/mol). The operon structure is shown in Figure 1.
The regulator contained two distinct domains: a DNA binding domain at the
amino-terminus with a predicted helix-turn-helix motif (pfam00354), and a sugar-binding
domain at the carboxy-terminus (pfam00532). The transport elements consisted of a
periplasmic solute binding protein (pfam01547), two membrane spanning permeases
(pfam00528), and a cytoplasmic nucleotide binding protein (pfam00005), characteristic
of the different subunits of a typical ABC transport system (Quentin et al., 1999). A
putative anchoring motif LSLTG was present at the amino-terminus of the substrate-
binding protein. Each permease contained five trans-membrane regions predicted
computationally (Krogh et al., 2001). Analyses of ABC transporters in recently
sequenced microbial genomes have defined four characteristic sequence motifs (Linton
and Higgins, 1998; Braibant et al., 2000). The predicted MsmK protein included all four
ABC conserved motifs, namely: Walker A: GPSGCGKST (consensus GxxGxGKST or
[AG]xxxxGK[ST]); Walker B: IFLMDEPLSNLD (consensus hhhhDEPT or
DExxxxxD); ABC signature sequence: LSGG; and Linton and Higgins motif: IAKLHQ
(consensus hhhhH+/-, with h, hydrophobic and +/- charged residues). The putative
fructosidase showed high similarity to glycosyl hydrolases (pfam00251). The putative
sucrose phosphorylase shared 63% residue identity with that of S. mutans.
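
The consensus patterns quoted above translate directly into regular expressions, with each `x` becoming a single-character wildcard. The sketch below is one possible way to scan a protein sequence for them; it covers only the consensus forms that can be written unambiguously (both Walker A forms, the DExxxxxD form of Walker B, and the LSGG signature), and the dictionary keys and function name are assumptions for this example.

```python
import re

# Consensus motifs from the text, with 'x' (any residue) written as '.'.
MOTIFS = {
    "Walker A (GxxGxGKST)": r"G..G.GKST",
    "Walker A ([AG]xxxxGK[ST])": r"[AG].{4}GK[ST]",
    "Walker B (DExxxxxD)": r"DE.{5}D",
    "ABC signature (LSGG)": r"LSGG",
}


def scan_motifs(protein):
    """Return {motif name: [(0-based start, matched text), ...]}.

    re.finditer reports non-overlapping matches only, which is adequate
    for locating these short, well-separated motifs.
    """
    protein = protein.upper()
    return {
        name: [(m.start(), m.group()) for m in re.finditer(pattern, protein)]
        for name, pattern in MOTIFS.items()
    }
```

Applied to the MsmK fragments reported above, `scan_motifs("GPSGCGKST")` finds the strict Walker A form at position 0, and `scan_motifs("IFLMDEPLSNLD")` finds the Walker B form `DEPLSNLD` at position 4.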
2.4.2 Sugar induction and co-expression of contiguous genes
Transcriptional analysis of the msm operon using RT-PCR and RNA slot blots
showed that sucrose and both types of oligofructose (GFn and Fn) were able to induce
expression of msmE and bfrA (Figure 2A). In contrast, glucose and fructose did not
induce transcription of those genes, suggesting specificity for non-readily fermentable
sugars and the presence of a regulation system based on carbohydrate availability. In the
presence of both FOS and readily fermentable sugars, glucose repressed expression of
msmE, even if present at a lower concentration, whereas fructose did not (Figure 2B).
Analysis of the transcripts induced by oligofructose indicated that all genes within the
operon are co-expressed (Figure 6) in a manner consistent with the S. mutans msm
operon (McLaughlin and Ferretti, 1996).
2.4.3 Mutant phenotype analysis
The ability of the bfrA (fructosidase) and msmE (ABC transporter) mutant strains
to grow on a variety of carbohydrates was monitored by both optical density at 600nm
and colony forming units (cfu). The mutants retained the ability to grow on glucose,
fructose, sucrose, galactose, lactose and FOS-GFn, in a manner similar to that of the
control strain (Figure 7), a lacZ mutant of the L. acidophilus parental strain also
generated by plasmid integration (Russell and Klaenhammer, 2001). This strain was
chosen because it also bears a copy of the plasmid used for gene inactivation integrated in
the genome. In contrast, both the bfrA and msmE mutants halted growth on FOS-Fn
prematurely (Figure 3), likely upon exhaustion of simple carbohydrate from the semi-
synthetic medium. After one passage, the msmE mutant displayed slower growth on FOS-
Fn, while the bfrA mutant could not grow (Figure 3). Additionally, terminal cell counts
from overnight cultures grown on FOS-Fn were significantly lower for the mutants,
especially after one passage (Figure 7).
2.4.4 Comparative genomic analyses and locus alignments
Comparative genomic analysis of gene architecture between L. acidophilus, S.
mutans, S. pneumoniae, B. subtilis and B. halodurans revealed a high degree of synteny
within the msm cluster, except for the core sugar hydrolase (Figure 4A). In contrast, gene
content was consistent, whereas gene order was not well conserved for the sucrose
operon (Figure 4B). The lactic acid bacteria exhibit a divergent sucrose operon, where the
regulator and the hydrolase are transcribed opposite to the transporter and the
fructokinase. In contrast, gene architecture was variable amongst the proteobacteria.
2.4.5 Phylogenetic trees
Phylogenetic trees were generated to investigate whether there was a correlation
between protein similarity, gene architecture and the phylogenetic relationships of the
selected microorganisms. The phylogenetic relationships were obtained from 16S
ribosomal DNA alignment. All proteobacteria appeared distant from the LAB, and the
Clostridium species formed a well-defined cluster between T. maritima and the bacillales
(Figure 5A).
For the fructosidases, all enzymes obtained from the LAB sucrose operons
clustered extremely well together, at the left end of the tree, whereas there was apparent
shuffling of the other three groups (Figure 5B). The paralogs of those fructosidases in S.
mutans, S. pneumoniae, and L. acidophilus clustered at the opposite end of the tree.
Interestingly, the L. acidophilus fructosidase was distant from the LAB sucrose
hydrolases cluster, and showed strong homology to enzymes experimentally associated
with oligosaccharide hydrolysis, in organisms such as T. maritima, M. laevaniformans,
and B. subtilis.
Each component of the ABC transport system clustered together (Figure 5C):
MsmE (substrate-binding protein), MsmF and MsmG (membrane-spanning
permeases), and MsmK (nucleotide-binding unit). For MsmE, MsmF and MsmG, three
consistent sub-clusters were obtained: (i) the two Bacillus species; (ii) L. acidophilus, S.
mutans and S. pneumoniae from the operons bearing a galactosidase; (iii) L. acidophilus
and S. pneumoniae from the operons bearing a fructosidase.
For the phospho-transferase system (PTS) transporters, the clustering did not
proceed according to phylogeny, especially for lactic acid bacteria, which formed two
separate clusters (Figure 5D). The two distant transporters at the bottom of the tree are
non-PTS sucrose transporters of the major facilitator family of transporters, as suggested
by their initial annotation.
All regulators were repressors, with the exception of those regulators of L.
acidophilus, S. pneumoniae and S. mutans clustering at the bottom of the tree (Figure
5E), which activate transcription of operons bearing an ABC transport system associated
with a galactosidase (Russell et al., 1992). In contrast, the msm regulators for both S.
pneumoniae and L. acidophilus seemed to be repressors similar to that of the sucrose
operon (Figure 5E). The helix-turn-helix DNA binding motif of the regulator was very well
conserved amongst selected regulators of the LacI family (Supporting Figure 3A), as
shown previously (Nguyen and Saier, 1995). In contrast, the seven regulators at the
bottom of the tree did not contain this conserved motif.
The fructokinase clustering was the most similar to that of the 16S phylogenetic
tree, with distinct clustering of lactobacillales, bacillales, clostridia, and proteobacteria
(Figure 5F). The lack of correlation between phylogeny, gene architecture and protein
similarity may be due to extensive gene transfer amongst bacteria and independent
sequence divergence.
2.4.6 Catabolite response elements (cre) analysis
Analysis of the promoter-operator region upstream of the msmE gene revealed the
presence of two 17-bp palindromes separated by 30 nucleotides, showing high similarity
to a consensus sequence for the cis-acting sites controlling catabolite repression in Gram
positive bacteria, notably Bacillus subtilis (Burne et al., 1999; Weickert and Chambliss,
1990; Miwa et al., 2000; Yamamoto et al., 2001). Several cre-like sequences highly
similar to those found in B. subtilis and S. mutans (Weickert and Chambliss, 1990; Miwa
et al., 2000; Yamamoto et al., 2001) were also retrieved from the promoter-operator
region of the L. acidophilus NCFM sucrose operon as well as that of the other msm locus
(Table 1). Interestingly, sequences nearly identical to the cre-like elements found in the
L. acidophilus msm operon, were found in the promoter-operator region of the msm locus
in S. pneumoniae (Table 1).
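
One simple way to screen a promoter-operator region for imperfect palindromes such as the 17-bp elements described above is to slide a fixed-width window along the sequence and count how many symmetric position pairs are complementary. The sketch below is an assumption-laden illustration, not the procedure used in this study: it does not compare windows against a cre consensus, and the function names, the `min_pairs` threshold and the example sequence are all hypothetical.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def palindrome_score(window):
    """Count symmetric position pairs that are Watson-Crick complementary.

    For a 17-bp window there are 8 such pairs; the central base is unpaired.
    Ambiguous bases never count as complementary.
    """
    n = len(window)
    return sum(
        1 for i in range(n // 2) if PAIR.get(window[i]) == window[n - 1 - i]
    )


def scan_palindromes(seq, width=17, min_pairs=7):
    """Return (start, window, score) for windows with at least min_pairs pairs."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = palindrome_score(window)
        if score >= min_pairs:
            hits.append((start, window, score))
    return hits
```

With the made-up perfect palindrome `TGAAAGCGACGCTTTCA`, `palindrome_score` returns 8. Candidate windows would then be compared with published cre sequences to judge whether they are plausible catabolite response elements.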
2.5 Discussion
The L. acidophilus NCFM msm operon encodes an ABC transporter and a
fructosidase, both of which are induced in the presence of FOS. Sucrose and both types
of oligofructose induced expression of the operon, whereas glucose and fructose did not.
Additionally, glucose repressed expression of the operon, suggesting the presence of a
regulation mechanism of preferred carbohydrate utilization based on availability. Specific
induction by FOS and sucrose, and repression by glucose indicated transcriptional
regulation, likely through cre present in the operator-promoter region, similar to those
found in B. subtilis (Miwa et al., 2000) and S. mutans (Burne et al., 1999). Catabolite
repression is a mechanism widely distributed amongst Gram-positive bacteria, usually
mediated in cis by catabolite response elements, and in trans by repressors of the LacI
family, responsible for transcriptional repression of genes encoding catabolic enzymes in
the presence of readily fermentable sugars (Weickert and Chambliss, 1990; Hueck et al.,
1994; Wen and Burne, 2002).
A variety of enzymes have been associated with microbial utilization of fructo-
oligosaccharides, namely: fructosidase EC 3.2.1.26 (Burne et al., 1987; Liebl et al.,
1998), inulinase EC 3.2.1.7 (Onodera and Shiomi, 1988; McKellar and Modler, 1989;
Xiao et al., 1989), levanase EC 3.2.1.65 (Menendez et al., 2002), fructofuranosidase EC
3.2.1.26 (Muramatsu et al., 1992; Oda and Ito, 2000; Perrin et al., 2000), fructanase EC
3.2.1.80 (Hartemink et al., 1995), and levan biohydrolase EC 3.2.1.64 (Saito et al., 2000;
Song et al., 2002). Despite the diversity of names, these enzymes are functionally related,
and should be considered members of the same β-fructosidase superfamily, which
includes members of glycosyl hydrolase families 32 and 68 (Naumoff, 2001). All of these
enzymes share the conserved motif H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G, and are all
involved in the hydrolysis of β-D-fructosidic linkages to release fructose. Generally,
fructosidases across genera share approximately 25-30% identity and 35-50% similarity
(Burne et al., 1999), with several regions widely conserved across the glycosyl hydrolase
32 family (Naumoff, 2001). The two residues shown to be involved in the enzymatic
activity of fructan-hydrolases, namely Asp 47 and Cys 230 (Reddy and Maley, 1990;
Liebl et al., 1998), as well as motifs highly conserved in the beta-fructosidase
superfamily, such as the NDPNG, FRDP, and ECP motifs (Liebl et al., 1998; Naumoff,
2001), were extremely well conserved amongst all fructosidase sequences (Supporting
Figure 3B).
Since the L. acidophilus fructosidase was similar to that of T. maritima and S.
mutans’ FruA (see Figure 5B), two enzymes that have experimentally been associated
with oligofructose hydrolysis (Burne et al., 1987; Liebl et al., 1998), we initially
hypothesized that BfrA is responsible for FOS hydrolysis. Induction and gene
inactivation data confirmed the correlation between the msm locus and FOS utilization.
The L. acidophilus BfrA fructosidase was most similar to that of T. maritima, which has
the ability to release fructose from sucrose, raffinose, levan (β2,6) and inulin (β2,1) in an
exo-type manner (Liebl et al., 1998). It was also very similar to other enzymes which
have been characterized experimentally, and associated with hydrolysis of FOS
compounds by S. mutans (Burne et al., 1999) and M. laevaniformans (Song et al., 2002).
Analysis of FOS degradation by S. mutans showed that FruA is involved in hydrolysis of
levan, inulin, sucrose and raffinose (Burne et al., 1987; Russell et al., 1992; Hartemink et
al., 1995; Burne et al., 1999). Additionally, it was shown that expression of this gene was
regulated by catabolite response elements (Burne et al., 1999; Wen et al., 2002) and that
fruA transcription was induced by levan, inulin and sucrose, but repressed by readily
metabolizable hexoses (Burne et al., 1987; Burne et al., 1999).
In S. mutans, FruA was shown to be an extracellular enzyme, which is anchored
to the cell wall by a LPxTG motif (Burne and Penders, 1992), that catalyses the
degradation of available complex carbohydrates outside of the cell. Additionally,
microbial fructosidases associated with FOS hydrolysis such as M. laevaniformans LevM
(Song et al., 2002) and S. exfoliatus levanbiohydrolase (Saito et al., 2000) have been
reported as extracellular enzymes as well. In contrast, the L. acidophilus NCFM
fructosidase does not contain an anchoring signal, thus is likely a cytoplasmic enzyme
requiring transport of its substrate(s) through the cell membrane. No additional secreted
levanase or inulinase was found in the L. acidophilus genome sequence. Since transporter
genes are often co-expressed with genes involved in the metabolism of the transported
compounds (Lambert et al., 2001), in silico analysis of the msm operon indicates that the
substrate of the fructosidase is transported by an ABC transport system. This is rather
unusual since when the fructosidase is not extracellular, the fructosidase gene is
commonly associated with a sucrose PTS transporter (Figure 4), notably in lactococci,
streptococci and bacilli (Hiratsuka et al., 1998; Luesink et al., 1999), or a sucrose
permease of the major facilitator family, as in B. longum. Those fructosidases usually
associated with PTS transporters are generally sucrose-6-phosphate hydrolases that do
not have FOS as cognate substrate. Therefore, L. acidophilus NCFM may have combined
the ABC transport system usually associated with an alpha-galactosidase with a
fructosidase in the msm locus. This genetic arrangement in NCFM is seemingly distinct, and
similar only to that of S. pneumoniae. Additionally, recent evidence in L.
paracasei suggested that an ABC transport system might be involved in FOS utilization
(Kaplan and Hutkins, 2003), which further supports the hypothesis that FOS is
transported by an ABC transporter in L. acidophilus.
Lateral gene transfer (LGT) has increasingly been shown to account for a
significant number of genes in bacterial genomes (Koonin et al., 2001), and may account
for a large proportion of the strain-specific genes found in microbes, as shown in H.
pylori (Salama et al., 2000), C. jejuni (Dorrell et al., 2001), S. pneumoniae (Hakenbeck
et al., 2001), and T. maritima (Nesbo et al., 2002). Notably, in T. maritima, genes
involved in sugar transport and polysaccharide degradation represent a large proportion
of variable genes, with ABC transporters having the highest horizontal gene transfer
frequency (Nesbo et al., 2002). In addition, it was recently suggested that oligosaccharide
catabolic capabilities of B. longum have been expanded through horizontal transfer, as
part of its adaptation to the human GI tract (Schell et al., 2002), and that the large set of
sugar uptake and utilization genes in L. plantarum was acquired through LGT
(Kleerebezem et al., 2003).
Intestinal microbes would benefit greatly from acquisition of gene clusters
involved in transport and catabolism of undigested sugars, especially if they conferred a
competitive edge towards successful colonization of the host GI tract. It is possible that L.
acidophilus acquired the ability to utilize FOS through genetic exchange, since ABC
transporters and polysaccharide degradation enzymes have a high horizontal gene transfer
frequency (Nesbo et al., 2002). The two fructosidase paralogs seemed fairly distant from
one another, sharing 28% identity and 44% similarity, suggesting those genes might have
arisen from LGT rather than gene duplication. Also, since no neighboring genes or
sequences are common to those two genes, a duplication event seems unlikely. Given the
lack of consistency between phylogeny, gene architecture, and protein similarity, it is
possible both the msm and sucrose operons underwent gene rearrangements. However,
there was no evidence the msm cluster was obtained through LGT, since the GC content
was very similar to that of the genome, and there was no discrepancy in the genetic code
usage.
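
The GC-content comparison mentioned above can be sketched as follows: compute %GC for the cluster, and optionally along sliding windows, then compare the values with the genome-wide figure (34.7% for L. acidophilus NCFM in Table 1). The window and step sizes and the function names are assumptions for this example; the text does not state how the comparison was carried out.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) in a DNA string."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def sliding_gc(seq, window=1000, step=500):
    """Return (start, %GC) for successive full-length windows along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A region whose %GC departs markedly from the genome-wide value would be a candidate for recent acquisition, whereas a value close to it, as reported here for the msm cluster, gives no such indication.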
Based on these observations, we conclude that L. acidophilus has combined the
ABC transport system derived from the raffinose operon with a β-fructosidase to form a
distinct gene cluster involved in transport and catabolism of prebiotic compounds
including FOS, suggesting a possible adaptation of the sugar catabolism system towards
different complex sugars. The catabolic properties of this operon might differ from those
of the raffinose and sucrose operons (Figure 9). In light of the theory that environmental
factors and ecology might be dominant over phylogeny for variable genes (Nesbo et al.,
2002), we hypothesize that L. acidophilus either acquired FOS utilization capabilities
through LGT or rearranged its genetic make-up, gaining a competitive edge in
colonizing the human GI tract by using prebiotic compounds and ultimately contributing
to a more beneficial microbiota.
2.6 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A. & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410
Barefoot, S. F. & Klaenhammer, T. R. (1983) Appl. Environ. Microbiol. 45, 1808-1815
Braibant, M., Gilot, P. & Content, J. (2000) FEMS Microbiol. Rev. 24, 449-467
Burne, R. A., Schilling, K., Bowen, W. H. & Yasbin, R. E. (1987) J. Bacteriol. 169, 4507-4517
Burne, R. A. & Penders, J. E. (1992) Infect. Immun. 60, 4621-4632
Burne, R. A., Wen, Z. T., Chen, Y. Y. M. & Penders, J. E. C. (1999) J. Bacteriol. 181, 2863-2871
Delcher, A. L., Harmon, D., Kasif, S., White, O. & Salzberg, S. L. (1999) Nucleic Acids Res. 27, 4636-4641
Dorrell, N., Mangan, J. A., Laing, K. G., Hinds, J., Linton, D., Al-Ghusein, H., Barrell, B. G., Parkhill, J., Stoker, N. G., Karlyshev, A. V., Butcher, P. D. & Wren, B. W. (2001) Genome Res. 11, 1706-1715
Gibson, G. R. & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412
Hakenbeck, R., Balmelle, N., Weber, B., Gardes, C., Keck, W. & de Saizieu, A. (2001) Infect. Immun. 69, 2477-2486
Hartemink, R., Quataert, M. C. J., Vanlaere, K. M. J., Nout, M. J. R. & Rombouts, F. M. (1995) J. Appl. Bacteriol. 79, 551-557
Hartemink, R., VanLaere, K. M. J. & Rombouts, F. M. (1997) J. Appl. Microbiol. 83, 367-374
Hiratsuka, K., Wang, B., Sato, Y. & Kuramitsu, H. (1998) Infect. Immun. 66, 3736-3743
Hueck, C. J., Hillen, W. & Saier, M. H., Jr. (1994) Res. Microbiol. 145, 503-518
Kaplan, H. & Hutkins, R. W. (2000) Appl. Environ. Microbiol. 66, 2682-2684
Kaplan, H. & Hutkins, R. W. (2003) Appl. Environ. Microbiol. 69, 2217-2222
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M. & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-1995
Koonin, E. V., Makarova, K. S. & Aravind, L. (2001) Annu. Rev. Microbiol. 55, 709-742
Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001) J. Mol. Biol. 305, 567-580
Kumar, S., Tamura, K., Jakobsen, I. B. & Nei, M. (2001) Bioinformatics 17, 1244-1245
Lambert, A., Osteras, M., Mandon, K., Poggi, M. C. & Le Rudulier, D. (2001) J. Bacteriol. 183, 4709-4717
Law, J., Buist, G., Haandrikman, A., Kok, J., Venema, G. & Leenhouts, K. (1995) J. Bacteriol. 177, 7011-7018
Liebl, W., Brem, D. & Gotschlich, A. (1998) Appl. Microbiol. Biotechnol. 50, 55-64
Linton, K. J. & Higgins, C. F. (1998) Mol. Microbiol. 28, 5-13
Luesink, E. J., Marugg, J. D., Kuipers, O. P. & de Vos, W. M. (1999) J. Bacteriol. 181, 1924-1926
McKellar, R. C. & Modler, H. W. (1989) Appl. Microbiol. Biotechnol. 31, 537-541
McLaughlin, R. E. & Ferretti, J. J. (1996) FEMS Microbiol. Lett. 140, 261-264
Menendez, C., Hernandez, L., Selman, G., Mendoza, M. F., Hevia, P., Sotolongo, M. & Arrieta, J. G. (2002) Curr. Microbiol. 45, 5-12
Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M. & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-1210
Moshfegh, A. J., Friday, J. E., Goldman, J. P. & Ahuja, J. K. C. (1999) J. Nutr. 129, 1407s-1411s
Muramatsu, K., Onodera, S., Kikuchi, M. & Shiomi, N. (1992) Biosci. Biotech. Biochem. 56, 1451-1454
Naumoff, D. G. (2001) Proteins 42, 66-76
Nesbo, C. L., Nelson, K. E. & Doolittle, W. F. (2002) J. Bacteriol. 184, 4475-4488
Nguyen, C. C. & Saier, M. H., Jr. (1995) FEBS Lett. 377, 98-102
Oda, Y. & Ito, M. (2000) Curr. Microbiol. 41, 392-395
Onodera, S. & Shiomi, N. (1988) Agric. Biol. Chem. 52, 2569-2576
Orrhage, K., Sjostedt, S. & Nord, C. E. (2000) J. Antimicrob. Chemother. 46, 603-612
Perrin, S., Grill, J. P. & Schneider, F. (2000) J. Appl. Microbiol. 88, 968-974
Quentin, Y., Fichant, G. & Denizot, F. (1999) J. Mol. Biol. 287, 467-484
Reddy, V. A. & Maley, F. (1990) J. Biol. Chem. 265, 10817-10820
Russell, R. R. B., Aduseopoku, J., Sutcliffe, I. C., Tao, L. & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637
Russell, W. M. & Klaenhammer, T. R. (2001) Appl. Environ. Microbiol. 67, 4361-4364
Rycroft, C. E., Jones, M. R., Gibson, G. R. & Rastall, R. A. (2001) J. Appl. Microbiol. 91, 878-887
Saito, K., Kondo, K., Kojima, I., Yokota, A. & Tomita, F. (2000) Appl. Environ. Microbiol. 66, 252-256
Salama, N., Guillemin, K., McDaniel, T. K., Sherlock, G., Tompkins, L. & Falkow, S. (2000) Proc. Natl. Acad. Sci. USA 97, 14668-14673
Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. (1998) Nucleic Acids Res. 26, 544-548
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D. & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427
Song, E. K., Kim, H., Sung, H. K. & Cha, J. (2002) Gene 291, 45-55
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680
Van Laere, K. M., Hartemink, R., Bosveld, M., Schols, H. A. & Voragen, A. G. (2000) J. Agric. Food Chem. 48, 1644-1652
Weickert, M. J. & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-6242
Wen, Z. T. & Burne, R. A. (2002) J. Bacteriol. 184, 126-133
Xiao, R., Tanida, M. & Takao, S. (1989) J. Ferment. Bioeng. 67, 331-334
Yamamoto, H., Serizawa, M., Thompson, J. & Sekiguchi, J. (2001) J. Bacteriol. 183, 5110-5121
Table 1. Catabolite responsive elements sequences
Bacterium             Sequence*            Origin
B. subtilis           WTGNAANCGNWNNCW      search sequence, Miwa et al., 2000
B. subtilis           WWTGNAARCGNWWWCAWW   new consensus, Miwa et al., 2000
B. subtilis           TGWAANCGNTNWCA       consensus, Weickert and Chambliss, 1990
B. subtilis           TGTAAGCGCTTACA       optimal operator, Weickert and Chambliss, 1990
B. subtilis           TGTAAACGTTATCA       Yamamoto et al., 2001
L. acidophilus cre1   ATTG-AAACGTTT-CAA    upstream of msmE
L. acidophilus cre2   ATAG-AAACGTTT-CAA    upstream of msmE
S. pneumoniae cre1    AATG-AAACGTTT-CAA    upstream of msmE2
S. pneumoniae cre2    AATG-AAACGTTT-CAA    upstream of msmE2
L. acidophilus scr    AATAAAAGCGTTTACAT    upstream of scrB
L. acidophilus cre3   TATGAAAGCGCTTAAAA    upstream of msmE2
S. mutans creW        AGATAGCGATTTGG       Burne et al., 1999
S. mutans creS        AGATAGCGCTTACA       Burne et al., 1999
* N, any; W, A or T; R, G or A; shaded nucleotides were specifically conserved and consistent with the consensus sequences
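As a minimal sketch of how the degenerate consensus sequences in Table 1 can be compared with candidate sites, the snippet below expands the three ambiguity codes defined in the footnote (N, W, R) into a regular expression. The function names are hypothetical, and only full-length, ungapped matches are tested; this is an illustration, not the search procedure used in the study.

```python
import re

# Ambiguity codes taken from the Table 1 footnote: N, any; W, A or T; R, G or A.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "W": "[AT]", "R": "[AG]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a degenerate consensus into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in consensus.upper()))


def matches_consensus(consensus: str, site: str) -> bool:
    """True if site matches the consensus over its whole length, without gaps."""
    return consensus_to_regex(consensus).fullmatch(site.upper()) is not None


if __name__ == "__main__":
    consensus = "TGWAANCGNTNWCA"  # Weickert and Chambliss, 1990
    for name, site in [
        ("optimal operator", "TGTAAGCGCTTACA"),
        ("Yamamoto et al., 2001", "TGTAAACGTTATCA"),
        ("S. mutans creS", "AGATAGCGCTTACA"),
    ]:
        print(name, matches_consensus(consensus, site))
```

Sites written with alignment gaps (the hyphenated cre1 and cre2 entries) would need the gaps handled before a comparison of this kind.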
Table 2. Primers used in this study
Primer   Sequence*                         Gene†    Position‡
A        GTAATAATAGTCAAAGTGGC              msmEf    1,518
B        GATCGGATCCAAGATCAATGCTGCTTTAAA    msmEf2   1,706
C        GGAAGGCTGAAGTAGTTTGC              msmEr    2,192
D        GATCGAATTCGATACAGGATATGGCATTACG   msmEr2   2,355
E        AGGATCCATCCATATGCTCCACACT         bfrAf    4,655
F        AGAATTCAACATGATCAGCACTTCT         bfrAr    5,370
G        GGAATATCTTCGGCTAATTG              bfrAr2   5,540
H        CCACTTCAAGTAGCTGTTACTAATA         msmGf    4,337
I        CTTGAGTAAGATACTTTTGG              msmGr    4,469
J        GACCAGAAGATATTCACGCC              msmKf    6,661
K        ACCTGGCTTGTGATAATCAC              msmKr    6,833
L        GGTCTTTGAACTTGTTCCGC              gtfAr    8,269
* underlined sequence indicates restriction site used for cloning
† f, indicates forward strand; r, indicates reverse strand.
‡ position of the 5’ end of the primer, relative to the 10,000 bp DNA locus.
Table 3. Genes and proteins used for comparative genomic analyses
Bacterium       Genome or locus   Sequence information
B. anthracis    NC_003995         bfrA NP_654697
B. halodurans   NC_002570         BH1855 NP_242721, SacP NP242722, BH1857
Figure 1. Operon layout. The start and stop codons are in bold, the putative ribosome binding site is boxed, and the cre-like elements are underlined. Terminators are indicated by hairpin structures
[Figure 2 image. Panel A: msmE and bfrA signals for lanes Glc, Fru, Suc, GFn, Fn and ctrl. Panel B: msmE and bfrA signals at 1.0% Fn with 0.0, 0.1, 0.5 or 1.0% Glc, and at 1.0% Fn with 0.0, 0.1, 0.5 or 1.0% Fru, each with a DNA ctrl lane.]
Figure 2. Sugar induction and repression. A. Transcriptional induction of the msmE and bfrA genes, monitored by RT-PCR (top) and RNA slot blots (bottom). Cells were grown on glucose (Glc), fructose (Fru), sucrose (Suc), FOS GFn, and FOS Fn. Chromosomal DNA was used as a positive control for the probe. B. Transcriptional repression analysis of msmE and bfrA by variable levels of glucose (Glc) and fructose (Fru): 0.1% (5.5 mM), 0.5% (28 mM) and 1.0% (55 mM), in the presence of 1% Fn. Cells were grown in the presence of Fn until OD600nm approximated 0.5-0.6, glucose was added and cells were propagated for an additional 30 minutes.
[Figure 3 image. Growth curves plotted as Ln OD600nm (e-3 to e1) against time (0 to 18 hrs); legend: fructose, GFn, Fn, lacZ, Fn passage.]
Figure 3. Growth curves. The two mutants, bfrA (top) and msmE (bottom) were grown on semi-synthetic medium supplemented with 0.5% w/v carbohydrate: fructose (●), GFn (○), Fn (▼), Fn for one passage ( ). The lacZ mutant grown on Fn was used as control (∇)
[Figure 4 image. Gene labels per locus, in the order shown in the figure.]
A
S. pneumoniae          R A E F G S
S. mutans              R A E F G S K
B. subtilis            R E F G A
B. halodurans          R E F G A
L. acidophilus         R2 E2 F2 G2 K2 A S2
L. acidophilus         R E F G B K S
S. pneumoniae          R2 E2 F2 G2 B
B
E. faecalis            1604 1603 1601
E. faecalis plasmid    0070 0069 0067
S. pyogenes            scrR scrB scrA scrK
P. pentosaceus         scrR scrB scrA scrK
S. mutans              scrR scrB scrA scrK
S. agalactiae          scrR scrB sag1 scrK
S. pneumoniae          scrR scrB scrA scrK
L. acidophilus         scrR scrB scrA
L. plantarum           sacR sacA PTS sacK
L. lactis              sacR sacB sacB sacK
E. coli O157:H7        3626 3625 3624 3623
S. aureus              scrR scrB 2040
C. beijerinckii        scrA scrR scrB scrK
C. perfringens         1534 1533 sacA 1531
P. multocida           ptsB scrR scrB 1849
V. cholerae            0653 scrR 0655 0656
B. subtilis            sacT sacP sacA
B. halodurans          1855 sacP 1857 sacA
C. acetobutylicum      licT 0423 0424 sacA
G. stearothermophilus  surT surP surA
R. solanacearum        scrR scrA scrB
Figure 4. Operon architecture analysis. A. Alignment of the msm locus from selected bacteria. Regulators, white; α-galactosidases, blue; ABC transporters, gray; fructosidases, yellow; sucrose phosphorylase, red. B. Alignment of the sucrose locus from selected microbes. Regulators, white; fructosidases, yellow; PTS transporters, green; fructokinase, purple; putative proteins, black
[Figure 5 image: six neighbor-joining trees, panels A to F. Scale bars: A, 0.05; B, 0.2; C, 0.2; D, 0.2; E, 0.5; F, 0.2.]
Figure 5. Neighbor-joining phylogenetic trees. Lactobacillales, black; bacillales, green; clostridia, blue; thermotogae, yellow; proteobacteria, red. A, 16S; B, fructosidase; C, ABC; D, PTS; E, regulators; F, fructokinase. L. acidophilus proteins are boxed, and shaded when encoded by the msm locus. Bars indicate scales for computed pairwise distances
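[Figure 6 image. Upper panel: gel with a marker lane (M) and lane sets A to D, each with no-RT, RT and DNA lanes. Lower panel: map of msmE, msmF, msmG, bfrA, msmK and gtfA showing the amplified regions A to D.]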
Figure 6. Co-expression of contiguous genes. Co-transcription of contiguous genes was monitored by RT-PCR using primers as shown on the lower panel. In each set of three bands, a negative control did not undergo reverse transcription (left), and a positive control was obtained from chromosomal DNA used as a template for PCR (right)
[Figure 7 image. Upper graph: log10 cfu/ml (1e+6 to 1e+10) for the bfrA, msmE and lacZ mutants on glc, fru, suc, GFn, Fn, FnRP, lac, Gal and Raff. Lower graph: log10 cfu/ml (1e+3 to 1e+10) for ncfm, lacZ, msmE and bfrA on fructose and Fn.]
Figure 7. Mutant growth on select carbohydrates. Strains were grown overnight (18 hours) on semi-synthetic medium supplemented with 0.5% w/v carbohydrates, either glucose (Glc), fructose (Fru), sucrose (Suc), FOS-GFn (GFn), FOS-Fn from Orafti (Fn), FOS-Fn from Rhone-Poulenc (FnRP), lactose (Lac), or galactose (Gal). Cell counts obtained after one passage of the bfrA mutant on FOS-Fn are shown in the lower graph.
A  Helix Turn Helix
LacI consensus*           TIKDVARLAGVSKSTVSRVLN
B. halodurans_msmR      MATIKDIAKLANVSNATVSRVLNR   24
B. subtilis_msmR        MVRIKDIALKAKVSSATVSRILNE   24
K. pneumoniae_scrR      RVTIKDIAELAGVSKATASLVLNG   28
S. typhimurium_scrR     RVTIKDIAEQAGVSKATASLVLNG   28
P. multocida_scrR       RITLSDIAKCCGLSTTTVSMILNN   31
C. beijerinckii_scrR    KVTIQDIANMVNVSKSSVSRYLNN   27
C. perfringens_1533     KVTIQDIANMVGVSKSTVSRYLNG   26
B. halodurans_1855      MTTILDIAKLAGVAKSTVSRYLNG   24
S. aureus_scrR          MKNISDIAKLAGVSKSTVSRFLNN   24
S. xylosus_scrR         MKNIADIAKIAGVSKSTVSRYLNN   24
E. faecalis_0070        VAKLTDVAELAGVSPTTVSRVINN   35
S. pyogenes_scrR        VAKLTDVAALAGVSPTTVSRVINK   25
S. agalactiae_scrR      VAKLTDVAALAGVSPTTVSRVINK   25
S. mutans_scrR          VAKLTDVAKLAGVSPTTVSRVINR   25
S. pneumoniae_scrR      VAKLTDVAKLAGVSPTTVSRVINK   25
E. faecalis_1604        VVKLTDVAKLAGVSPTTVSRVINN   28
L. lactis_sacR          MIKLEDVANKAGVSVTTVSRVINR   24
L. acidophilus_scrR     PAKLSDVAREAGCSVTTVSRVINN   25
L. gasseri_scrR         MVKLTDVAAKAGCSVTTVSRVINN   26
L. plantarum_sacR       KPKLNDVAKLAGVSATTVSRVINN   25
P. pentosaceus_scrR     KPKLNDVAKLAGVSATTVSRVINN   25
L. acidophilus_msmR     MATMKDVAQRAGVGVGTVSRVINH   23
S. pneumoniae_scrR2     SITMKDVALEAGVSVGTVSRVINK   32
V. alginolyticus_scrR   --SLHDVARLAGVSKSTVSRVIND   24
E. coli_3626            MASLKDVARLAGVSMMTVSRVMHN   24
B. longum_BL0107        MVTMKEIANKAGVSVSTVSLVLNG   25
R. solanacearum_scrR    RPTIRDVATLAGVSTSTVSRVLNN   34

B  NDPNG
L. acidophilus_bfrA         WINDPNGL   38
S. pneumoniae_sacA          WINDPNGF   45
E. coli_3625                WMNDPNGL   43
B. longum                   WINDPNGL   45
P. multocida_scrB           LLNDPNGL   71
V. alginolyticus_scrB       LLNDPNGL   55
L. acidophilus_scrB         LINDPNGF   51
L. gasseri_scrB38           LLNDPNGF   51
S. mutans_scrB              LLNDPNGF   51
S. sobrinus_scrB            LLNDPNGF   51
E. faecalisp_0069           LLNDPNGF   51
S. pneumoniae_scrB          LLNDPNGF   51
S. agalactiae               LLNDPNGH   51
S. pyogenes_scrB            LLNDPNGF   51
L. lactis_sacA              LLNDPNGF   51
L. plantarum_sacA           LLNDPNGF   51
P. pentosaceus_scrB         LLNDPNGF   51
E. faecalis_1603            LLNDPNGF   56
S. aureus_scrB              LLNDPNGL   52
S. xylosus_scrB             LLNDPNGL   52
L. gasseri_scrB58           MLGDPNGF   77
C. beijerinckii_scrB        LINDPNGL   45
C. perfringens_sacA         LINDPNGL   43
B. subtilis_sacA            LLNDPNGV   47
G. stearothermophilus_surA  LMNDPNGL   47
B. halodurans_sacA          LLNDPNGF   47
C. acetobutylicum_sacA      FMNDPNGL   45
K. pneumoniae_scrB          LLNDPNGF   45
S. typhimurium_scrB         LMNDPNGF   45
V. cholerae_0655            LLNDPNGF   112
R. solanacearum_scrB        LLNDPNGL   33
T. maritima_bfrA            WMNDPNGL   21
B. subtilis_sacC            WMNDPNGM   53
S. mutans_fruA              WANDPNGL   499
M. laevaniformans_levM      WMNDPQRP   90
S. mutans_fruB              FMNDIQTI   52
Figure 8. Motifs highly conserved amongst repressors and fructosidases. A, conserved helix-turn-helix motif of the regulators, the consensus sequence was obtained from Nguyen et al., 1995; B, conserved motifs of the β-fructosidases
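How closely each fragment in panel B follows the NDPNG motif can be scored with a short sketch like the one below. It assumes an ungapped comparison at every offset and reports the best fraction of matching positions; the function name is hypothetical, and the five fragments are a subset copied from panel B.

```python
MOTIF = "NDPNG"

# Subset of the fructosidase fragments listed in Figure 8, panel B.
FRAGMENTS = {
    "L. acidophilus_bfrA": "WINDPNGL",
    "L. acidophilus_scrB": "LINDPNGF",
    "L. gasseri_scrB58": "MLGDPNGF",
    "M. laevaniformans_levM": "WMNDPQRP",
    "S. mutans_fruB": "FMNDIQTI",
}


def motif_identity(fragment: str, motif: str = MOTIF) -> float:
    """Best fraction of motif positions matched over all ungapped offsets."""
    best = 0
    for start in range(len(fragment) - len(motif) + 1):
        window = fragment[start:start + len(motif)]
        best = max(best, sum(a == b for a, b in zip(window, motif)))
    return best / len(motif)


if __name__ == "__main__":
    for name, fragment in FRAGMENTS.items():
        print(f"{name:24s} {fragment}  {motif_identity(fragment):.1f}")
```

A score below 1.0 flags fragments that depart from the motif, which in this subset are the L. gasseri scrB58, M. laevaniformans levM and S. mutans fruB entries.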
[Figure 9 image. Loci: scr (scrR, scrB, scrA), msm (msmR, msmE, msmF, msmG, bfrA, msmK, gtfA) and msm2 (msmR2, msmE2, msmF2, msmG2, msmK2, melA, gtfA2). Substrates: SUCROSE, FOS, RAFFINOSE. Enzymes: ScrB, 3.2.1.26; BfrA, 3.2.1.26; MelA, 3.2.1.22; GtfA and GtfA2, 2.4.1.7. Products: GLUCOSE-6P + FRUCTOSE; SUCROSE + FRUCTOSE; SUCROSE + GALACTOSE; GLUCOSE-1P + FRUCTOSE.]
Figure 9. Biochemical pathways. Biochemical pathways describing the likely reactions carried out by the enzymes and transporters encoded in the sucrose, FOS and raffinose loci. For the scr operon, sucrose is transported across the membrane and phosphorylated by a PTS transporter; the sucrose phosphate hydrolase hydrolyses the phosphorylated sucrose molecule into glucose-6-phosphate and fructose. For the msm operon, FOS is transported across the membrane by an ABC transporter; the fructosidase hydrolyses fructose moieties, and the sucrose phosphorylase cleaves sucrose into glucose-1-phosphate and fructose. For the msm2 operon, raffinose is transported across the membrane by an ABC transporter; the alpha-galactosidase hydrolyses the galactose moiety, and the sucrose phosphorylase cleaves sucrose into glucose-1-phosphate and fructose.
Chapter III – Global analysis of carbohydrate utilization and transcriptional regulation in Lactobacillus acidophilus using whole-genome cDNA microarrays
3.1 Abstract
The transport and catabolic machinery involved in carbohydrate utilization by the
probiotic lactic acid bacterium Lactobacillus acidophilus was characterized using
whole-genome cDNA microarrays. Global transcriptional profiles were determined for growth
on glucose, fructose, galactose, sucrose, lactose, trehalose, raffinose and
fructooligosaccharides. Hybridizations were carried out using a round-robin design, and
microarray data were analyzed using a two-stage mixed model ANOVA. Differentially
expressed genes were visualized by hierarchical clustering, volcano plots and
novel 3-way contour plots. Quantitative PCR confirmed the fold induction determined by
microarrays. Although 379 genes (20% of the genome) were significantly differentially
expressed, only 63 genes showed induction above 4-fold, indicating that a small
number of genes were highly induced; these included a variety of carbohydrate transporters
and sugar hydrolases. Specifically, members of the phosphoenolpyruvate:sugar
phosphotransferase system family of transporters were identified for uptake of glucose,
fructose, sucrose and trehalose. Transporters of the ATP binding cassette family were
identified for uptake of raffinose and fructooligosaccharides. A member of the LacS
subfamily of galactoside-pentose-hexuronide translocators was identified for uptake of
galactose and lactose. Saccharolytic enzymes likely involved in the metabolism of mono-,
di- and poly-saccharides into substrates of glycolysis were also identified, including the
enzymatic machinery of the Leloir pathway, involved in the catabolism of galactosides.
Results suggested the transcriptome is regulated by carbon catabolite repression.
Although substrate-specific carbohydrate transporters and hydrolases were regulated at
the transcriptional level, genes encoding regulatory proteins CcpA, HPr, HPrK/P and EI
were consistently highly expressed. Collectively, microarray data revealed coordinated
and regulated transcription of genes involved in sugar uptake and metabolism based on
carbohydrate availability in the environment. This dynamic adaptation to environmental
conditions likely contributes to competition with commensals for limited carbohydrate
sources available in the human gastrointestinal tract. This model study provides a global
view of carbohydrate metabolism in L. acidophilus, and illustrates how recently
implemented genomic tools can be used to investigate microbial physiology on a global
scale.
3.2 Introduction
A large, diverse and dynamic microbial community resides in the human
gastrointestinal tract (Tannock, 1999). In particular, the complex intestinal microbial
population includes beneficial bacteria such as bifidobacteria and lactobacilli (Gibson and
Roberfroid, 1995). Among species considered important for human health, a number of
documented lactobacilli have been characterized as probiotics (Reid, 1999). Probiotics
are generally defined as “live microorganisms which, when administered in adequate
amounts confer a health benefit on the host” (Reid et al., 2003). For such microbes,
survival and residence in the intestine relies on their ability to survive gastric passage,
adhere to epithelial cells and utilize nutrients available in the intestine.
Lactobacillus acidophilus NCFM is a gram-positive probiotic lactic acid
bacterium which has the ability to survive in the gastrointestinal tract (Sanders and
Klaenhammer, 2001; Sui et al., 2002), adhere to human epithelial cells in vitro (Greene
and Klaenhammer, 1994; Sanders and Klaenhammer, 2001), modify fecal flora (Sui et
al., 2002), modulate the host immune response (Varcoe et al., 2003), and prevent
microbial gastroenteritis (Varcoe et al., 2003). Additionally, L. acidophilus NCFM has
the ability to utilize prebiotic compounds, which may contribute to the organism’s ability
to compete in the human GIT (Barrangou et al., 2003).
Undigested carbohydrates are a primary source of energy for intestinal microbes
residing in the large intestine. Non-digestible oligosaccharides (NDO) consist primarily
of plant carbohydrates that are resistant to enzymatic degradation and are not absorbed in
the upper intestinal tract. Such dietary compounds eventually reach the large intestine,
whereby they are hydrolyzed by a limited range of organisms. As a result, NDO have the
ability to selectively modulate the composition of the intestinal microflora (Sui et al.,
2002). NDO such as raffinose and fructooligosaccharides have been shown to selectively
promote the growth of probiotic species, thus are considered prebiotic compounds
(Benno et al., 1987; Gibson et al., 1995). Prebiotics are defined as non-digestible
substances that provide a beneficial physiological effect on the host by selectively
stimulating the favorable growth or activity of a limited number of indigenous bacteria
(Reid et al., 2003). Although considerable attention has been devoted to studying
modulation of the intestinal flora by prebiotics, the molecular mechanisms involved in
uptake and metabolism of those compounds by desirable intestinal microbes remain
mostly uncharacterized.
Lactic acid bacteria are a heterogeneous family of microbes which can use a
variety of nutrients. Specifically, bifidobacteria, streptococci and lactobacilli possess
specialized saccharolytic potentials which reflect the nutrient availability in their
respective environments (Ajdic et al., 2002; Schell et al., 2002; Kleerebezem et al., 2003;
Pridmore et al., 2004). In particular, the versatile saccharolytic potential of L. acidophilus
likely reflects its ability to efficiently utilize energy sources available in the intestinal
environment. Although the Lactobacillus acidophilus NCFM genome encodes numerous
putative genes potentially involved in uptake and metabolism of a variety of
carbohydrates (Altermann et al., 2004), little information is available regarding their
biological functions and expression profiles.
The objective of this study was to use cDNA microarrays to characterize and compare
global gene expression in Lactobacillus acidophilus. Global gene transcription profiles
were used to identify uptake systems, catabolic machinery and regulatory networks
involved in utilization of eight carbohydrates. This is the first comparative global
transcriptional analysis of the fermentation pathways of a lactic acid bacterium over a
range of carbohydrates.
3.3 Materials and Methods
3.3.1 Bacterial strains and media used in this study
The strain used in this study is L. acidophilus NCFM (NCK56) (Altermann et al.,
2004). Cultures were propagated at 37°C, aerobically in MRS broth (Difco). A semi-
withdraws energy sources from the intestinal environment and deprives other bacteria of
access to such resources. Consequently, L. acidophilus may compete well against other
commensals for nutrients.
In summary, a variety of carbohydrate uptake systems were identified and
characterized, with respect to expression profiles in the presence of different
carbohydrates, including PTS, ABC and GPH transporters. The uptake and catabolic
machinery is highly regulated at the transcription level, suggesting the L. acidophilus
transcriptome is flexible, dynamic and designed for efficient carbohydrate utilization.
Differential gene expression indicated the presence of a global carbon catabolite
repression regulatory network. Regulatory proteins were consistently highly expressed,
suggesting regulation at the protein level, rather than the transcriptional level.
Collectively, L. acidophilus appears to be able to efficiently adapt its metabolic
machinery to fluctuating carbohydrate sources available in the nutritionally complex
environment of the small intestine. In particular, ABC transporters of the MsmEFG
family involved in uptake of FOS and raffinose likely play an important role in the ability
of L. acidophilus to compete with intestinal commensals for complex sugars that are not
digested by the human host. Ultimately, this information provides new insights into how
undigested dietary compounds influence the intestinal microbial balance. This study is a
model for comparative transcriptional analysis of a bacterium exposed to varying growth
substrates.
3.6 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A. & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B., McAuliffe, O., Souther, N., Dobson, A., Duong, T., Callanan, M., Lick, S., Hamrick, A., Cano, R., & Klaenhammer, T. R. (2004). J. Bacteriol In review
Azcarate-Peril et al., 2004 In review
Barrangou R, Altermann E, Hutkins R, Cano, & Klaenhammer, TR. (2003) Proc. Natl. Acad. Sci. USA 100, 8957-8962
Benno, Y., Endo, K., Shiragami, N., Sayama, K., and Mitsuoka, T. (1987) Bifido. Micro. 6, 59-63
Bogaard, van den P. T. C., Kleerebezem, M., Kuipers, O. P., & De Vos, W. M. (2000) J. Bacteriol. 182, 5982-9
Boucher, I., Vadeboncoeur, C., & Moineau, S. (2003) Appl. Environ. Microbiol. 69, 4149-56
Chhabra, S. R., Shockley, K. R., Conners, S. B., Scott, K. L., Wolfinger, R. D., & Kelly, R. M. (2003) J. Biol. Chem. 278, 7540-7552
Cochu, A., Vadeboncoeur, C., Moineau, S, & Frenette, M. (2003) Appl. Environ. Microbiol. 69, 5423-32
Duong, T., Barrangou, R., Russell, M. W., & Klaenhammer, T. R. (2004) In review
Eisen, M. B., Spellman, P. T., Brown, P. O., & Botstein, D. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-8
Fortina, M. G., Ricci, G., Mora, D., Guglielmetti, S., & Manachini, P. L. (2003) Appl. Environ. Microbiol. 69, 3238-43
Gibson, G. R., Beatty, E. R., Wan, X., & Cummings, J. H. (1995) Gastroent. 108, 975-82
Gibson, G. R. & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412.
Greene, J. D., & Klaenhammer, T. R. (1994) Appl. Environ. Microbiol. 60, 4487-4494
Grossiord, B. P., Luesink, E. J., Vaughan, E. E., Arnaud, A., & De Vos, W. M. (2003) J. Bacteriol. 185, 870-8
Hedge, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J. E., Snesrud, E., Lee, N., & Quackenbush J. (2000) Biotechniques 29, 548-562
Helden, van J., Andre, B., & Collado-Vides, J. (2000) Yeast 16, 177-87
Hsieh, W. P., Chu, T. M., Wolfinger, R. D., & Gibson, G. (2003) Genetics 165, 747-57
Jin, W., Riley, R. M., Wolfinger, R. D., White, K. P., Passador-Gurgel, G., & Gibson, G. (2001) Nature Genet. 29, 389-395
Kerr, M. K., and Churchill G. A. (2001) Genet. Res. Camb. 77, 123-8
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M. & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-5.
Lapierre, L., Mollet, B., & Germond, J. E. (2002) J. Bacteriol. 184, 928-35
Lessard, C., Cochu, A., Lemay, J. D., Roy, D., Vaillancourt, K., Frenette, M., Moineau, S., & Vadeboncoeur, C. (2003) J. Bacteriol. 185, 6764-72
Luesink, E. J., Marugg, J. D., Kuipers, O. P. & de Vos, W. M. (1999) J. Bacteriol. 181, 764-71
Madsen, S. A., Chang, L. C., Hickey, M. C., Rosa, G. J. M., Coussens, P. M., & Burton, J. L. (2004) Physiol. Genomics 16, 212-21
Mahr, K., Hillen, W., & Titgemeyer, F. (2000) Appl. Environ. Microbiol. 66, 277-83
Mijakovic, I., Poncet, S., Galinier, A., Monedero, V., Fieulaine, S., Janin, J., Nessler, S., Marquez, J. A., Scheffzek, K., Hasenbein, S., Hengstenberg, W., & Deutscher, J. (2002) Proc. Natl. Acad. Sci. USA 99, 13442-7
Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M. & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-10
Muscariello, L., Marasco, R., De Felice M., & Sacco, M. (2001) Appl. Environ. Microbiol. 67, 2903-7
Pridmore RD, Berger B, Desiere F, Vilanova D, Barretto C, Pittet AC, Zwahlen MC, Rouvet M, Altermann E, Barrangou R, Mollet B, Mercenier A, Klaenhammer TR, Arigoni F, & Schell MA. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517
Pysz, M. A., Ward, D. E., Shockley, K. R., Montero, C. I., Conners, S. B., Johnson, M. R., & Kelly, R. M. (2004) Extremophiles 8, 209-17
Reid, G. (1999) Appl. Environ. Microbiol. 65, 3763-6
Reid, G., Sanders, M. E., Gaskins, H. R., Gibson, G. R., Mercenier, A., Rastall, R., Roberfroid, M., Rowland, I., Cherbut, C., & Klaenhammer T. R. (2003) J. Clin. Gastroenterol. 37, 105-118
Rosenow, C., Maniar, M., & Trias, J. (1999) Genome Res. 9, 1189-97
Russell, R. R. B., Aduseopoku, J., Sutcliffe, I. C., Tao, L. & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637.
Saier, M. H. Jr. (2000) Mol. Microbiol. 35, 699-710
Sanders, M. E., & Klaenhammer, T. R. (2001) J. Dairy. Sci. 84, 319-331
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D. & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427.
Sui, J., Leighton, S., Busta, F., & Brady, L. (2002) J. Appl. Microbiol. 92, 907-12
Tannock, G. W. (1999) Antonie van Leeuwenhoek 76, 265-78
Vaillancourt, K., Moineau, S., Frenette, M., Lessard, C., & Vadeboncoeur, C. (2002) J. Bacteriol. 184, 785-93
Varcoe, J. J., Krejcarek, G., Busta, F., & Brady, L. (2003) J. Food Prot. 66, 457-465
Vaughan, E. E., David, S., & De Vos W. M. (1996) Appl. Environ. Microbiol. 62, 1574-
Deutscher, J. (2000) Mol. Microbiol. 36, 570-584
Wagner, V. E., Bushnell, D., Passador, L., Brooks, A. I., & Iglewski, H. I. (2003) J. Bac. 185, 2080-95
Warner, J. B., & Lolkema, J. S. (2003) Microbiol. Mol. Rev. 67, 475-90
Weickert, M. J. & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-42
Wolfinger, R. D., Gibson, G., Wolfinger, E. D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., & Paules, R. S. (2001) J. Comput. Biol. 8, 625-637
[Figure 1 diagram: octagon with the eight carbohydrates (Glc, Fru, Tre, Suc, Gal, FOS, Lac, Raff) at its vertices and hybridization arrows numbered 1 to 28.]
Figure 1. Round-robin microarray hybridization design. Each carbohydrate is at a vertex of an octagon. Glc, glucose; Fru, fructose; Suc, sucrose; FOS, fructooligosaccharides; Raf, raffinose; Lac, lactose; Gal, galactose; Tre, trehalose. Each arrow represents a hybridization whereby the plain end of the arrow indicates labeling with Cy3, and the tip of the arrow indicates labeling with Cy5. This design allows all possible direct comparisons between treatments.
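As a rough check on this design, the sketch below enumerates the hybridizations under the assumption that each pair of carbohydrates is hybridized exactly once, which is what "all possible direct comparisons" implies. The names SUGARS and round_robin_pairs are illustrative only, and the Cy3/Cy5 orientation of each arrow is not modeled.

```python
from itertools import combinations

# Abbreviations taken from the Figure 1 caption
SUGARS = ["Glc", "Fru", "Suc", "FOS", "Raf", "Lac", "Gal", "Tre"]

def round_robin_pairs(treatments):
    """Return every unordered pair of treatments, one hybridization per pair."""
    return list(combinations(treatments, 2))

pairs = round_robin_pairs(SUGARS)

# Count how many hybridizations each carbohydrate takes part in
hybs_per_sugar = {s: sum(s in pair for pair in pairs) for s in SUGARS}

print(len(pairs))       # 28 hybridizations for 8 treatments
print(hybs_per_sugar)   # each carbohydrate appears in 7 of them
```

With eight treatments this gives 8 x 7 / 2 = 28 pairs, matching the 28 treatment comparisons referred to in the Figure 6 caption.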
Figure 2. Hierarchical clustering analyses of gene expression patterns. The expression of 1,889 genes (vertically) after growth on eight carbohydrates (horizontally) is shown colorimetrically. (A) Least squares means, representing overall gene expression level corrected for systematic and random errors (see Methods): low=blue, high=red; Hierarchical clustering of least squares means allows visualization of the relative expression levels of all genes within each treatment (Figure 1A). (B) Standardized least squares means, representing gene expression level standardized across all 8 treatments, with color indicating expression level relative to the mean expression level across all treatments: low=green, high=red. Clustering of standardized least squares means allows comparison of the standardized expression profile of every gene, across all treatments (Figure 1B). FOS, fructooligosaccharides; FRU, fructose; GAL, galactose; GLC, glucose; LAC, lactose; RAF, raffinose; SUC, sucrose; TRE, trehalose.
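The caption does not give the standardization formula. One common reading, shown here as a sketch under that assumption, is a per-gene z-score: each gene's least squares means are centered on their mean across the eight treatments and divided by their standard deviation. The function name, the use of the population standard deviation, and the example values are all hypothetical.

```python
from statistics import mean, pstdev

def standardize(lsmeans):
    """Center one gene's least squares means and scale them by their spread."""
    mu = mean(lsmeans)
    sd = pstdev(lsmeans)  # population SD; the source does not say which was used
    if sd == 0:
        return [0.0] * len(lsmeans)  # flat profile: nothing to scale
    return [(value - mu) / sd for value in lsmeans]

# Hypothetical least squares means for one gene on the eight carbohydrates
profile = [0.1, 0.2, 0.1, 0.3, 0.2, 3.5, 0.2, 0.1]
z_scores = standardize(profile)
print([round(z, 2) for z in z_scores])
```

After this transformation a gene induced on a single carbohydrate stands out in that column regardless of its absolute expression level, which is the property panel B relies on.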
Figure 3. Hierarchical clustering analysis of expression patterns for select genes and operons. (A) Least squares means of selected genes and operons of interest, representing overall gene expression within treatments: low=blue, high=red; (B) Standardized least squares means of genes of interest, indicating relative expression level across all treatments: low=green, high=red. Carbohydrate sources are displayed at the bottom: FOS, fructooligosaccharides; FRU, fructose; GAL, galactose; GLC, glucose; LAC, lactose; RAF, raffinose; SUC, sucrose; TRE, trehalose.
[Figure 4 plot: X axis, fold change FOS/RAFF (-64 to 64); Y axis, significance (-log10 P-value, 0 to 50); labeled ORFs 1437 to 1442, 502 to 507, 1012 and 1014.]
Figure 4. Volcano plot comparison of gene expression between FOS and raffinose. Visualization of the global differential gene expression profiles in the presence of raffinose and FOS. The X axis indicates the differential expression profiles, plotting the fold-induction ratios in a logarithmic-2 scale. The Y axis indicates the statistical significance, plotting the statistical significance of the difference in expression (P-value from a t-test) in a logarithmic-10 scale. Genes within the raffinose msm locus are shown in green, genes within the FOS msm locus are shown in blue, and two genes within the trehalose tre locus are shown in red.
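The two axes of such a plot can be computed per gene as sketched below. This is only an illustration: the function names and the example gene are hypothetical, and the cutoff is the -log10(P-value) threshold quoted in the Figure 6 caption, which may or may not have been drawn on this plot.

```python
import math

BONFERRONI_CUTOFF = 6.04  # -log10(P) threshold quoted in the Figure 6 caption

def volcano_point(fold_ratio, p_value):
    """Return (log2 fold ratio, -log10 P-value) for one gene."""
    return math.log2(fold_ratio), -math.log10(p_value)

def is_significant(p_value, cutoff=BONFERRONI_CUTOFF):
    """True when the gene lies above the significance cutoff on the Y axis."""
    return -math.log10(p_value) > cutoff

# Hypothetical gene: 8-fold higher on FOS than on raffinose, P = 1e-10
x, y = volcano_point(8.0, 1e-10)
print(x, y, is_significant(1e-10))
```

Genes that are both strongly induced and highly significant end up in the upper left or upper right corner, which is where the msm and tre loci are labeled in the figure.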
[Figure 5 plot: X axis, Lsm FOS (-3.0 to 5.0); Y axis, Lsm RAFFINOSE (-3.0 to 4.0); color scale, Lsm TREHALOSE in bins <= -2, <= -1, <= 0, <= 1, <= 2, <= 3 and > 3.]
Figure 5. Contour plot comparison of gene expression between FOS, raffinose and trehalose. Three-way plot of the least squares means of all the genes in the presence of FOS (X axis), raffinose (Y axis), trehalose (Z axis, color coded). In the third dimension (Z axis) the gene expression level is coded colorimetrically: blue=low gene expression, red=high gene expression. Each color in-between is representative of a value range. Differentially expressed operons are annotated: 1437-1442 raffinose msm operon, 502-507 FOS msm operon, 1012, 1014 trehalose tre locus.
[Figure 6 plot: Y axis, Number of genes differentially expressed (0 to 400); X axis, Treatment Comparison, labels as extracted: Lac-Raf, Raf-Suc, Raf-Fru, Fos-Raf, Raf-Glu, Tre-Raf, Lac-Gal, Gal-Fos, Gal-Raf, Gal-Suc, Gal-Tre, Gal-Fru, Fos-Glu, Tre-Fru, Lac-Fos, Lac-Fru, Lac-Glu, Glu-Fru, Gal-Glu, Tre-Fos, Tre-Suc, Tre-Glu, Tre-Lac, Suc-Glu, Lac-Suc, Fru-Fos, Suc-Fos, Fru-Suc.]
Figure 6. Global differential gene expression. Quantification of the number of genes declared differentially expressed by statistical criteria. For all 28 possible treatment comparisons, genes with p-values from a t-test below the Bonferroni correction (-log10(p-value) > 6.04) were considered differentially expressed. For each comparison, the number of genes statistically differentially expressed is plotted, in decreasing order.
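The cutoff on the -log10 scale follows directly from the Bonferroni rule, as the sketch below shows. The family-wise alpha of 0.05 and the test count (1,889 genes times 28 comparisons) are assumptions made for illustration; the caption states neither.

```python
import math

def bonferroni_cutoff(alpha, n_tests):
    """Return the -log10(P-value) a single test must exceed after Bonferroni correction."""
    return -math.log10(alpha / n_tests)

# Assumed inputs: alpha = 0.05, and one test per gene per comparison
n_tests = 1889 * 28
cutoff = bonferroni_cutoff(0.05, n_tests)
print(round(cutoff, 2))
```

Under these assumptions the cutoff comes out near 6.02 rather than the 6.04 quoted in the caption, so the number of tests or the alpha actually used presumably differed slightly from the values assumed here.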
[Figure 7 plot: X axis, Minimum Fold Induction (1 to 11); Y axis, Number of genes (0 to 400).]
Figure 7. Gene fold induction. Quantification of the number of genes differentially expressed above various fold induction cut offs. All possible treatment comparisons were considered, and a gene was considered induced above a particular level if it showed induction in at least one treatment comparison. For genes that showed induction in more than one instance, the highest induction level was selected.
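The counting rule described in this caption can be sketched as follows: keep the highest fold induction seen for each gene over all comparisons, then count the genes above each cutoff. The gene names and fold values are hypothetical, and "above" is read here as strictly greater than the cutoff.

```python
def max_fold_per_gene(fold_table):
    """Map each gene to its highest fold induction over all treatment comparisons."""
    return {gene: max(folds) for gene, folds in fold_table.items()}

def count_above(fold_table, cutoffs):
    """Count the genes whose highest fold induction is above each cutoff."""
    best = max_fold_per_gene(fold_table)
    return {c: sum(1 for fold in best.values() if fold > c) for c in cutoffs}

# Hypothetical fold inductions, one list of comparisons per gene
example = {
    "geneA": [1.2, 5.0, 17.3],
    "geneB": [1.1, 2.4],
    "geneC": [1.0, 1.3],
}
counts = count_above(example, cutoffs=[1, 2, 4, 8])
print(counts)
```

Because each gene is counted once at its maximum, the counts can only stay level or fall as the cutoff rises, which gives the decreasing curve plotted in the figure.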
[Figure 8 plot: X axis, fold induction microarrays (1 to 64); Y axis, fold induction Q-PCR (1 to 256); genes plotted: La1467, La505, La1436, La1012, La1679.]
Figure 8. RT-Q-PCR analysis of differentially expressed genes. For five selected genes, induction levels were compared between six different treatments, resulting in 15 induction levels for each gene. The comparison between the fold induction determined by microarrays (X axis) and real-time quantitative RT-PCR (Y axis) is plotted, on a logarithmic-2 scale. Induction levels for each gene are color-coded.
[Figure 9 diagram, gene labels as extracted: manL manM manN; fruR fruK fruA; scrR scrA scrB; msmE msmF msmG bfrA msmK gtfA; msmE2 msmF2 msmG2 msmK2 melA gtfA2; galK galT galM lacS lacZ hypo muB; lacL lacM galE; treC treB treR; pepQ ccpA; ptsH ptsI ptsK; msmR; msmR2. Locus labels: Man, Fru, Suc, Fos, Raff, Lac, Lac, Tre, CCR.]
Figure 9. Genetic loci of interest. The layouts of the loci discussed in the text are shown: man, glucose-mannose locus; fru, fructose locus; suc, sucrose locus; fos, FOS locus; raff, raffinose locus; Lac, lactose-galactose loci; tre, trehalose locus; CCR, carbon catabolite loci.
[Figure 10 diagram, gene order as extracted.
L. johnsonii: reg lacS bgaB galK galT galM galE lacM lacL
L. gasseri: reg galK galT galM galE
L. acidophilus: galK galT galM lacS lacZ hypo muB reg Tn galE lacM lacL
Legend: lacS, lactose-proton symporter; lacLM, beta-galactosidase; galE, galactose epimerase; galK, galactokinase; galT, galactose-1P uridyl transferase.]
Figure 10. Lactose locus in select lactobacilli. Layout of the lactose loci in Lactobacillus gasseri, Lactobacillus johnsonii and Lactobacillus acidophilus.
La400 cre1 TGataaaCGtttgaCA -72 bp
cre2 AGataaCGcttaCA -17 bp
La401 cre1 TGaataCGttatCA -48 bp
Figure 11. Catabolite responsive element sequences. Putative catabolite responsive elements are highlighted in the promoter regions of select differentially expressed genes. Numbers indicate the position of the last cre nucleotide relative to the translational start of the ORF mentioned. The promoter-operator regions of differentially expressed genes and operons were searched for putative catabolite responsive elements according to the consensus sequences TGNNWNCGNNWNCA (Miwa et al., 2000) and TGWAANCGNTNWCA (Weickert and Chambliss, 1990).
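A consensus search of this kind can be sketched by translating the IUPAC codes into a regular expression. This is not the author's search tool; the function name and the flanking bases in the example are hypothetical, only the forward strand is scanned, and only the IUPAC codes that occur in the two consensus sequences (W = A or T, N = any base) are handled.

```python
import re

# IUPAC codes needed for the two cre consensus sequences
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}

def find_cre(sequence, consensus):
    """Return (start, site) for every match of the consensus, overlaps included."""
    core = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile("(?=(" + core + "))")  # lookahead keeps overlapping hits
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

MIWA = "TGNNWNCGNNWNCA"      # Miwa et al., 2000
WEICKERT = "TGWAANCGNTNWCA"  # Weickert and Chambliss, 1990

# La401 cre1 from Figure 11, with two hypothetical flanking bases on each side
promoter = "cc" + "TGaataCGttatCA" + "gg"
print(find_cre(promoter, MIWA))
print(find_cre(promoter, WEICKERT))
```

In this example the La401 cre1 site matches the Miwa et al. consensus but not the stricter Weickert and Chambliss one, because its fifth base is T where the latter requires A.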
[Figure 12 diagram, labels as extracted. Substrates: glucose, fructose, sucrose, trehalose, FOS, raffinose, lactose, galactose. Transporters: ManLMN, FruA, ScrA, TreB, MsmEFGK, MsmEFGK2, LacS. Enzymes: FruK (2.7.1.56), ScrB (3.2.1.26), BfrA (3.2.1.26), MelA (3.2.1.22), GtfA (3.2.1.26), LacZ (3.2.1.23), GalK (2.7.1.6), GalT (2.7.1.10), GalE (5.1.3.2), GalU (2.7.7.9), Pgm (5.4.2.2), TreC (3.2.1.93). Intermediates: glucose-6P, fructose-1P, fructose-1-6P2, sucrose-6P, trehalose-6P, galactose-1P, UDP-galactose, UDP-glucose, glucose-1P. All pathways lead to glycolysis.]
Figure 12. Carbohydrate utilization in L. acidophilus. This diagram shows carbohydrate transporters and hydrolases as predicted by transcriptional profiles. Protein names and EC numbers are specified for each element. PTS transporters are shown in red. GPH transporters are shown in yellow. ABC transporters are shown in green.
Chapter IV – Global characterization of the Lactobacillus acidophilus transcriptome and analysis of relationships between gene expression level, codon usage, chromosomal location and intrinsic gene characteristics
4.1 Abstract
The relationships between gene expression level, codon usage, chromosomal
location and intrinsic gene parameters were investigated globally, in Lactobacillus
acidophilus. The codon usage profile revealed a general bias towards AT-rich codons, as
expected for a low GC content organism. In contrast, genes showing high codon usage
bias had higher GC-content at the third codon position. Correlation analyses showed that
gene expression levels were most highly correlated with GC content, codon adaptation
index, size and then RBS. Gene expression levels did not correlate with GC content at the
third codon position. The high correlation between GC content and gene expression level
may reflect that genes with GC contents much higher than that of the genome signature
are biologically important and highly expressed. Data were segregated into four
chromosomal locations, by strand, location and orientation, relative to the origin and
terminus of replication. Analysis of variance was used to investigate whether there were
differences in gene expression between the four chromosomal locations. The results
showed that genes on the leading strand were more highly expressed, and showed higher
codon usage bias. Genes located between the origin and terminus of replication,
relative to the forward strand, were also more highly expressed. Overall, genes on either
strand pointing towards the terminus of replication were more highly expressed. Analysis
of the correlation between gene expression level and intrinsic gene parameters, by
location, revealed a strong influence of chromosomal architecture on gene transcription.
Codon usage showed a strong strand bias. Specifically, genes on the leading strand
located between the origin and terminus of replication, pointing towards the terminus,
showed both the highest codon usage bias and gene expression levels. For this particular
location, gene expression levels were most highly correlated with codon adaptation
index. Additionally, genes on the lagging strand located between the terminus and the
origin of replication, oriented towards the terminus, showed high expression levels, but
low codon usage bias. The correlations between gene expression level, CAI and GC
content indicate very highly expressed genes have a higher GC content, and display
coefficients of 0.427, 0.411 and 0.289, for Pearson, Spearman and Kendall, respectively.
Additionally, the linear regression analysis between GCall and LSM gave a statistically
very significant fit, although the relationship may not be fully linear (Figure 3).
Visualization of the correlations between gene expression levels and all other
parameters is shown in Figure 3. The best regression curve (smallest residual sum of squares)
corresponded with the most statistically significant P-value, and the highest correlation
coefficient (see Table 2). As previously suggested, the parameter showing the best
regression curve is the best predictor of gene expression level (Coghlan and Wolfe,
2000). However, a prior report suggests that the Pearson correlation is inappropriate for
analyzing CAI correlation with gene expression level, thus Spearman should be preferred
(Coghlan and Wolfe, 2000). Our results indicate both tests gave similar results, with the
exception of correlation to gene size.
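The difference between the three coefficients can be illustrated with a small pure-Python sketch; the thesis does not say which software was used, and the data below are hypothetical. Pearson measures linear association, Spearman is Pearson computed on ranks, and Kendall (tau-b here) counts concordant and discordant pairs.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall tau-b from concordant, discordant and tied pairs."""
    conc = disc = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    n0 = n * (n - 1) / 2
    return (conc - disc) / math.sqrt((n0 - ties_x) * (n0 - ties_y))

# Hypothetical monotonic but non-linear relationship
cai = [1.0, 2.0, 3.0, 4.0, 5.0]
lsm = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(cai, lsm), spearman(cai, lsm), kendall_tau_b(cai, lsm))
```

For a monotonic but curved relationship the two rank-based coefficients reach 1 while Pearson stays below 1, which is the usual argument for preferring Spearman when the relationship is not expected to be linear.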
A previous study reported a strong correlation between PHX genes and strong
Shine-Dalgarno sequences (Karlin and Mrazek, 2000), and a second study found a
correlation between Shine-Dalgarno sequence conservation and codon usage (Sakai et al.,
2001). In contrast, we found a small correlation between gene expression level and RBS
strength. This weak correlation has been reported for three archaebacteria previously
(Sakai et al., 2001).
4.4.3 Chromosomal location
Although it is tempting to arbitrarily select subsets of genes, based on expression
level, CAI range, or GC content, we segregated the data according to genome location,
within strand, and relative to the origin and terminus of replication (Figure 2). The
correlations between gene expression level and all other parameters were then investigated
again, using the methodologies presented above. ANOVA was used to investigate whether
there were differences in the distributions of the parameters between the four
chromosomal locations. Results are summarized in Table 3. Analyses revealed genes on
the leading strand were more highly expressed than those on the lagging strand. Similarly,
genes located from the origin to the terminus were more highly expressed than those
located from the terminus to the origin. Genes were segregated by strand (Leading or
Lagging) and relative to the terminus (from the Origin to the Terminus OT, or from the
Terminus to the Origin TO) into four groups, namely LeTO, LeOT, LaTO, LaOT. A
comparison of gene expression across these four groups revealed that LeOT genes were
the most highly expressed, followed by LaTO, and then by LeTO and LaOT, which were
the least expressed (Figure 4).
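The four-way grouping can be sketched as below. The coordinate convention is an assumption made for illustration: genes on the forward strand of the sequence are labeled Le and those on the reverse strand La, and a gene is OT when its start lies in the origin-to-terminus half read along the forward strand. Under that reading, LeOT and LaTO genes both point towards the terminus, as described in the text. The positions and expression values are hypothetical.

```python
from statistics import mean

def location_group(start, strand, origin, terminus):
    """Assign a gene to LeOT, LeTO, LaOT or LaTO (assumed convention, see lead-in)."""
    side = "Le" if strand == "+" else "La"
    if origin <= terminus:
        in_ot = origin <= start < terminus
    else:  # the origin-to-terminus interval wraps past the end of the sequence
        in_ot = start >= origin or start < terminus
    return side + ("OT" if in_ot else "TO")

def group_means(genes, origin, terminus):
    """Average expression (LSM) per location group for (start, strand, lsm) records."""
    groups = {}
    for start, strand, lsm in genes:
        name = location_group(start, strand, origin, terminus)
        groups.setdefault(name, []).append(lsm)
    return {name: mean(values) for name, values in groups.items()}

# Hypothetical genes on a chromosome with origin at 0 and terminus at 1000
genes = [(100, "+", 1.5), (300, "+", 0.9), (200, "-", -0.4),
         (1500, "-", 0.6), (1700, "+", -0.5)]
print(group_means(genes, origin=0, terminus=1000))
```

Group means computed this way are the quantities a one-way ANOVA across the four locations would then compare.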
In contrast, GC3 showed opposite distributions between the four locations, with
LeOT and LaTO showing a lower GC3 content than LeTO and LaOT (Table 3, Figure 4).
CAI10 and CAI50 both showed differences between strands, with higher values
on the leading strand. These results correlate well with distributions observed on Figure
1, showing differences between CAI10-CAI50 and CAIall. Also, the strand differences
explain the dual distribution observed for CAIall on Figure 1.
Additionally, correlation analyses were carried out, after data were split into the
four chromosomal locations (Table 4). Differences between locations were observed,
consistent with ANOVA results (Table 3). Since genes LeOT and LaTO showed the
highest expression levels (Table 3, Figure 4), particular attention was given to their
correlations with other parameters. For LeOT, CAI showed the highest correlation,
followed by GCall, GC1, GC2, Size and RBS. Interestingly, CAI10/CAI50 showed the
highest correlation coefficients, namely 0.52 and 0.51. For LaTO, GCall, GC1, CAIall,
GC2, CAI10, size and CAI50 showed the highest correlations. Neither GC3 nor start
showed any significant correlation, which is similar to the global results (Table 2).
Visualization of the correlation analyses for CAI10, GCall and size can be seen on
Figure 5. For select parameters, gene distribution for each location can be seen on Figure
6. CAI10 showed the strongest correlation with gene expression level, for LeOT (Table
4). Most of those high CAI10 values correspond to genes with high expression levels
(Figure 5). Most of the highly expressed genes are located on LeOT, and some on LaTO
(Figures 5 and 6), while only a few genes located on LeTO and LaOT show expression
above LSM=1.0. In contrast, for the lagging strand, only a few genes show CAI10 above
0.55 (Figures 5 and 6), none of which have high gene expression.
When comparing the relationship between CAI10 and LSM globally (Figure 3)
vs. by location (Figure 5), there is a location-specific difference. In contrast, the
relationship between LSM and gene size, or LSM and GCall, does not change when data
are segregated by location (Figures 3 and 5). This is consistent with a strand discrepancy in
codon adaptation (CAI10), as seen on Figure 6.
Both GCall and GC3 contents are consistent regardless of location (Figure 6),
with a value close to that of the genome in the case of GCall. For GC3, the value has to
be lower than that of the genome, because the constraints on the first two codon positions
result in a higher GC content at those positions. As a result, since L. acidophilus is a low GC
organism, the GC content at the third codon position has to be lower than that of the
genome. Gene size distribution is seemingly equal throughout the chromosome, although
many more genes are present on LeOT and LaTO. There is a strong difference
between the two strands as to codon adaptation (Table 3, Figures 4 and 6), which was
observed regardless of the training set used. The codon adaptation index is always
higher for the leading strand, regardless of direction relative to the origin or terminus of
replication (Figure 6).
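The position-specific GC values discussed above (GC1, GC2, GC3 and GCall) can be illustrated with a minimal sketch. The function name and the example sequence below are hypothetical and are not taken from the analysis pipeline used in this study; the sketch only assumes that each gene is supplied as an in-frame coding sequence.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3, GCall) as fractions for one in-frame coding sequence."""
    seq = cds.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    if not seq:
        raise ValueError("sequence is shorter than one codon")

    def gc_fraction(bases):
        # Fraction of G or C among the supplied bases.
        return sum(base in "GC" for base in bases) / len(bases)

    # seq[0::3], seq[1::3], seq[2::3] are the first, second and third codon positions.
    gc1, gc2, gc3 = (gc_fraction(seq[offset::3]) for offset in range(3))
    return gc1, gc2, gc3, gc_fraction(seq)


# Made-up six-codon sequence (ATG GCT AAA GGT TTA TAA), for illustration only.
example_cds = "ATGGCTAAAGGTTTATAA"
gc1, gc2, gc3, gc_all = gc_by_codon_position(example_cds)
print(f"GC1={gc1:.3f} GC2={gc2:.3f} GC3={gc3:.3f} GCall={gc_all:.3f}")
```

In this toy sequence the third position carries the lowest GC fraction, which is the pattern described above for a low GC organism.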
4.5 Discussion
The analysis of the relationships between gene expression levels, codon usage,
chromosomal location and intrinsic gene properties in L. acidophilus revealed strong
correlations between GC content, codon usage, chromosomal location and gene
expression levels. However, there was no correlation between GC3 and gene expression
level. Globally, chromosomal architecture seemed to influence gene expression strongly,
with both a strand bias, and a gene location and orientation effect, relative to the origin
and terminus of replication.
Globally, a relatively small number of genes showed high expression levels.
Predicted highly expressed genes usually encompass ribosomal proteins (RP),
transcription and translation processing factors (TF), chaperone proteins (CH),
recombination and repair proteins, outer membrane proteins and energy metabolism
enzymes (Karlin and Mrazek, 2000; Karlin et al., 2004). Throughout a variety of
prokaryotes, those genes display a high codon bias (Karlin and Mrazek, 2000). Our
results indicate that the 20 most highly expressed genes included genes involved in
glycolysis, transcription, ATP synthesis, membrane construction, ribosomal proteins,
regulators, and a peptidase. Genes encoding glycolytic enzymes and translation factors
have also been shown to be highly expressed in S. pneumoniae (Martin-Galiano et al.,
2004). Although this is consistent with RP and TF families of genes, genes most highly
expressed in L. acidophilus did not include CH genes.
Although most studies analyzing codon bias have relied on multivariate statistical
analyses such as correspondence analysis (Perriere and Thioulouse, 2002), the major
trends identified in codon usage account for a low proportion of the variation (Grocock
and Sharp, 2002). In a thorough study of Pseudomonas aeruginosa, the first axis
accounted for 17% of the variation, and the first four axes combined accounted for a total
of 30% of the variation (Grocock and Sharp, 2002). In another study, the combination of
the first three axes accounted for less than 23% of the variation in codon usage (McInerney,
1994).
Since our objective was to investigate correlation between gene features and
expression levels, rather than describe the variation within CAI distributions, we used
correlation analysis rather than correspondence analysis. Although no assumption can be
made as to the linearity of the relationships between parameters being tested, a linear
regression was attempted nonetheless. Several correlation analyses were carried out,
including both parametric and non-parametric analyses, namely Pearson, Spearman and
Kendall, since no assumptions were made a priori regarding data distribution and
linearity of the relationships. Spearman correlation analysis has previously been used in
codon analysis studies and Spearman ranking was considered a more appropriate statistic
than the Pearson correlation coefficient (Coghlan and Wolfe, 2000). Similarly, Spearman
correlation has also been used previously to investigate correlation between effective
number of codons in a gene (Nc) and CAI (Fuglsang, 2003). Additionally, Kendall
correlation has also been used to analyze the correlation between gene expression level
and codon usage (dosReis et al., 2003). A combination of both Pearson and Spearman
correlation analyses has also been used to investigate correlations between CAI and other
parameters (Jansen et al., 2003). Pearson correlation coefficients have also been used to
analyze the correlation between codon bias and microarray expression data (Fraser et al.,
2004). Our strategy allows comparison of results obtained from both parametric
(Pearson) and non-parametric (Kendall, Spearman) correlation tests. It was previously
suggested that non-parametric tests are more appropriate for such analyses, since they are
robust against non-linearity and non-normality (dosReis et al., 2003).
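The difference between the parametric and non-parametric statistics compared above can be sketched in a few lines. This is a minimal illustration written with the standard library only; it is not the statistical software used for the analyses reported here, the function names are hypothetical, and the Kendall statistic shown is the simple tau-a variant without tie correction.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / all pairs, O(n^2)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Made-up monotonic but non-linear data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

For a monotonic but non-linear relationship such as this one, both rank-based statistics equal 1 while the Pearson coefficient falls below 1, which illustrates why rank-based tests are described as robust against non-linearity.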
Prior studies carried out using correspondence analysis to investigate CAI statistic
distribution (Lloyd and Sharp, 1992) have identified a major and a secondary trend, with
the first axis appearing to differentiate genes according to their expression level (Lloyd
and Sharp, 1992; Kliman et al., 2003). Although our results indicate a correlation
between CAI and gene expression level, globally, our strongest correlation was
established between gene expression level and GC content. Additionally, our
investigation of the correlation between CAI and other statistics indicated it is not
correlated with GC3.
CAI has previously been shown to be the best codon usage bias indicator
(Coghlan and Wolfe, 2000). CAI was also shown to be highly correlated with mRNA
expression levels in S. cerevisiae (Coghlan and Wolfe, 2000). CAI and mRNA levels
have been shown to be correlated previously (dosReis et al., 2003). In our study, we
found a strong correlation between gene expression level and CAI10/CAI50. Although it
was not as strong as that between gene expression level and GCall on a global scale, it
was the strongest correlation for genes positioned on LeOT. Our results show gene GC
content is the parameter most highly correlated with gene expression, which is different
from results shown previously (dosReis et al., 2003), but similar to findings from rodents
(Konu and Li, 2002).
Our results, indicating a positive correlation between gene expression level and
gene size, differ from previous studies reporting a negative correlation between mRNA
concentration and protein length (Coghlan and Wolfe, 2000) and gene length and codon
usage (Kliman et al., 2003). Perhaps this discrepancy reflects the differences between the
organisms used in the studies, eukaryotic S. cerevisiae and prokaryotic L. acidophilus. The
relationship between CAI and mRNA levels has been shown previously in S. cerevisiae
(Coghlan and Wolfe, 2000), and in E. coli (dosReis et al., 2003). Also, a non-parametric
regression on mRNA expression levels in E. coli has shown that gene size followed by
GC and then CAI are the best predictors of mRNA concentration (dosReis et al., 2003).
Although several studies have used the CAI as an indicator of gene expression, a
variable positive correlation is found between codon bias and level of gene expression.
Historically, initial CAI studies claimed the strong correlation between CAI and levels of
gene expression allow utilization of CAI as a predictor of gene expression (Sharp and Li,
1987). In contrast, we believe the correlation between CAI and gene expression level is
indicative, rather than predictive of the level at which a gene is expressed.
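For reference, the CAI statistic discussed here (Sharp and Li, 1987) is the geometric mean of the relative adaptiveness values of the codons in a gene, where each value is the count of a codon in a training set divided by the count of the most frequent synonymous codon. The sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, excludes Met, Trp and stop codons, and assigns a floor count of 0.5 to codons absent from the training set. All function names and the toy sequences are hypothetical and do not reproduce the software used in this study.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Met and Trp have a single codon each, and stop codons are not scored.
SCORED = {codon: aa for codon, aa in CODON_TO_AA.items() if aa not in "MW*"}


def split_codons(cds):
    """Split an in-frame coding sequence into codons, ignoring a partial tail."""
    seq = cds.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(training_set, floor=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    training_set plays the role of the highly expressed reference genes
    (for example the 10 or 50 most highly expressed genes). The floor
    value for unseen codons is an assumption of this sketch.
    """
    counts = Counter(c for cds in training_set for c in split_codons(cds))
    weights = {}
    for codon, aa in SCORED.items():
        family = [c for c, a in SCORED.items() if a == aa]
        best = max(max(counts[c], floor) for c in family)
        weights[codon] = max(counts[codon], floor) / best
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the scored codons in one gene."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy training set (codons AAA AAA AAG GAA), for illustration only.
toy_weights = relative_adaptiveness(["AAAAAAAAGGAA"])
print(round(cai("AAGGAA", toy_weights), 3))
```

Because the weights depend entirely on the training set, the same gene receives a different CAI under each training set, which is why CAI10, CAI50 and CAIall are reported separately.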
The a priori assumption that genes with codon bias close to that of highly
expressed genes should be highly expressed is not consistent with the fact that some
genes with very high CAI values are not highly expressed (Figure 5). Analysis of CAI
and microarray gene expression levels in Streptococcus pneumoniae showed that CAI is
not always predictive of gene expression (Martin-Galiano et al., 2004). Specifically,
genes with high CAI are not always highly expressed, and genes with low CAI can be
highly expressed (Martin-Galiano et al., 2004), which is also shown in our current
findings (Figure 5). Interestingly, S. pneumoniae and L. acidophilus are both low GC
Gram-positive lactic acid bacteria.
A small correlation (r² = 0.09) has been shown between CAI and microarray
fluorescence in S. pneumoniae (Martin-Galiano et al., 2004). A similar correlation level
(r² = 0.07) was shown in L. acidophilus. In contrast, a higher correlation (r² = 0.18) was
found between GC and gene expression level in L. acidophilus.
The genomic DNA GC-content varies widely between species, as a result of
mutation pressure (Muto and Osawa, 1987). GC variation has been shown to be the most
important parameter differentiating codon usage bias between organisms, in archaea and
eubacteria (Chen et al., 2004). The relationship between codon usage bias and GC
composition has been characterized across unicellular genomes (Wan et al., 2004).
Specifically, GC3 was shown to be the primary factor within GC content to correlate
highly with codon usage bias (Wan et al., 2004). Further, GC3 was hypothesized as the
key factor driving synonymous codon usage, independently of species (Zhang and Chou,
1994; Wan et al., 2004). Although those results were inferred across 70 bacterial species
and 16 archaeal genomes, our results show this is not the case for L. acidophilus. We
found no correlation between GC3 and CAI. The non-linearity of the relationship
between codon usage bias measures and GC3 has been shown previously in a variety of
bacteria and archaea (Wan et al., 2004).
The L. acidophilus NCFM genome is 34.7% GC, so it is not surprising that codon
usage is related to base composition bias. The observed differences in GC content at the
three codon positions illustrate the overall GC content. Codon degeneracy is located
primarily at the third position of the codon, since there are strict constraints on the first
and second position of each codon (Zhang and Chou, 1994). As a result, the third codon
position is representative of the GC content of an organism, and reflects differences
between species (Muto and Osawa, 1987; Carbone et al., 2003). GC3 has previously been
shown to vary between species (Zhang and Chou, 1994), explaining the species impact
on the correlation between GC3 and CAI (Lloyd and Sharp, 1992). Also, it was previously
reported that CAI can most highly correlate with GC skew (Carbone et al., 2003), and
that gene expression levels are correlated with GC3 (Kliman et al., 2003). The position-
specific GC content within codons has been investigated previously (Muto and Osawa,
1987; Chen and Zhang, 2003), across species with varying GC content, indicating that
low GC content bacteria have higher GC content at the first codon position and lower GC
content at the third codon position, than that of their overall genome content, while that
of the second codon position is close to their genomic content (Chen and Zhang, 2003).
This is consistent with our findings in L. acidophilus (Figure 1). Early work showed that
there is a codon position bias in GC content, which is correlated with genome GC content
(Muto and Osawa, 1987). Specifically, the correlation between GC3 and genome GC
content explains the discrepancies observed at the third codon position between species
with varying GC content (Muto and Osawa, 1987).
A previous study investigating codon bias in P. aeruginosa (Grocock and Sharp, 2002)
reported that for species with highly biased GC base composition, the CAI methodology
may not be appropriate. While the study in P. aeruginosa (67% GC) illustrated this point
for high GC organisms, our analyses in L. acidophilus (35% GC) might validate this
theory for low GC organisms. It was recently suggested lactic acid bacteria are a
desirable group of organisms for analysis of codon usage (Fuglsang, 2003), but our results
suggest that caution should be applied when using the CAI methodology.
Perhaps the high correlation between GC content and gene expression level is due
to the genomic composition of L. acidophilus. The genomic GC content in prokaryotes
ranges between approximately 25% and 75% (Muto and Osawa, 1987), which allows
great codon usage flexibility and variability. Since L. acidophilus is a low GC organism
(Altermann et al., 2004), perhaps the strong correlation between GC content and gene
expression level is due to the importance of high GC content genes. Indeed, for a low GC
organism such as L. acidophilus, genes with a high GC content differ widely from its
genomic “fingerprint”, since GC content is a main component of genomic signature
(Sandberg et al., 2003). Therefore, retaining genes that vary from its overall genomic
signature may indicate that they are biologically important, and consequently highly
expressed.
A correlation between RBS and gene expression level was found, although it was
minor compared to that of GCall. Nonetheless, a positive correlation between a strong
RBS and gene expression level is intuitive, and consistent with previous findings (Ma et
al., 2002).
We observed a discrepancy between the genome signature (low GC) and highly
expressed genes (high GC), perhaps indicating the codon usage for highly expressed
genes is different from that of the genome. Specifically, the genome-wide codon usage is
characterized by a high AT content at the third codon position, which is consistent with a
low GC organism. In contrast, genes with high codon bias showed a specific preference
for high GC content at the third codon position for select amino acids (Table 1).
However, GC3 was not a good indicator of gene expression (Tables 2 and 4). Perhaps
this is an indicator that for low GC organisms, overall gene GC content is more
representative of bias than codon usage.
Differences in the base composition between strands have been shown previously
(Grocock and Sharp, 2002; Lobry and Sueoka, 2002). The leading-lagging strand bias in
codon usage has been shown in Borrelia burgdorferi (McInerney, 1998; Carbone et al.,
2003). Additionally, replication selection is seemingly responsible for the presence of the
majority of the genes on the leading strand, whereas transcription selection results in
higher expression of genes present on the leading strand (McInerney, 1998).
Interestingly, location per se did not correlate with gene expression level globally
(Table 2). This means that the position of the start of any gene on the chromosome does
not correlate with gene expression level. However, it was shown previously that location
is indeed an important factor in gene expression. We therefore further investigated the
effect of both strand location, and orientation relative to the terminus on gene expression
level.
The importance of chromosomal location has been illustrated before in P.
aeruginosa (Grocock and Sharp, 2002), Borrelia burgdorferi (McInerney, 1998), and
Treponema pallidum (Lafay et al., 1999). Specifically, differences between strands have
been illustrated for codon usage (McInerney, 1998). Strand location was shown to be a
major cause of variation in codon usage (McInerney, 1998). However, the correlation
between gene location and expression level has been estimated to be weak in P. aeruginosa,
where gene location was only the tertiary trend in correspondence analysis, accounting
for only 4.4% of variation (Grocock and Sharp, 2002). Strand location accounted for
8.6% of the variation in codon usage, as the secondary source of variation (Lafay et al.,
1999). In contrast, in B. burgdorferi, strand location is the primary parameter involved in
codon usage, accounting for 13.7% of the variation (McInerney, 1998). Within species,
inter-strand differences appear on the primary axis of correspondence analysis (Lafay et
al., 1999). Nevertheless, they showed that the position of a gene relative to the strand has
an influence on codon usage (Grocock and Sharp, 2002). In addition to strand location,
the orientation of a gene relative to the direction of DNA replication is also important in
codon usage pattern (McInerney, 1998). Nevertheless, the impact of both strand and
orientation on gene expression had not yet been illustrated simultaneously, prior to our
study.
Chromosomal architecture has a major effect on gene expression, both relative to
strand bias and gene position and orientation relative to the terminus of replication. The
impact of chromosomal architecture is important for many of the parameters measured in
our study, showing a significant bias for genes converging towards the terminus.
Although it was previously shown that the leading strand in low GC Gram-positives
consistently carries more than 75% of the genes (Karlin et al., 2004), this is not the case in L.
acidophilus, where only 55% of the genes are on the leading strand. Nevertheless, very
significant differences in codon usage, GC content and other parameters were observed
between the two strands.
Interestingly, while codon usage, GC content and gene size all showed a global
correlation with gene expression levels, CAI was the parameter which showed the most
variability between chromosomal locations, relative to the strand bias, and the position
and orientation relative to the terminus. Specifically, the correlation between CAI10 and
gene expression level is higher for LeOT genes (Table 4). In contrast, the correlation
between gene expression level and GCall or GC3 was consistent regardless of location.
For CAI particularly, genes on the leading strand located between the origin and the
terminus of replication show the most codon usage bias. Specifically, genes that show the
most codon bias are located in this region, and are likely to be highly expressed (Figure
5).
Globally, it seems chromosomal architecture is a primary factor controlling gene
expression in L. acidophilus. Perhaps the combination of replication efficiency and
transcription efficiency underlies the impact of chromosomal location on gene
expressivity. Indeed, replication is thought to be more efficient while co-directional with
transcription (French, 1992), since collisions between the RNA and DNA polymerases
are likely to slow down both processes (French, 1992). Hence, there is a selective
advantage towards locating the majority of the genes on each strand pointing towards the
terminus. As suggested previously, more efficient replication may be a selective
advantage (McInerney, 1998) and the most desirable gene location would combine genes
on the leading strand and on the lagging strand pointing toward the terminus. This is
consistent with our observations, namely genes on LeOT and LaTO showing higher
expression levels (Table 3, Figures 4 and 5).
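The four chromosomal locations used throughout this chapter can be expressed as a simple classification rule. The sketch below is one reading of the definitions given in Figure 2 and is an assumption rather than the procedure actually used: it supposes that gene coordinates start at the origin of replication, that the strand labelled "+" is the one referred to as the leading strand, and that genes starting before the terminus are "OT" while those starting after it are "TO". The function names and the coordinates in the example are hypothetical.

```python
def chromosomal_location(start, strand, terminus):
    """Classify a gene as LeOT, LeTO, LaOT or LaTO.

    start    -- position of the first translated nucleotide, counted from the origin
    strand   -- "+" for the strand labelled leading, "-" for the lagging strand
    terminus -- position of the replication terminus on the same coordinate system
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    strand_label = "Le" if strand == "+" else "La"
    region_label = "OT" if start < terminus else "TO"
    return strand_label + region_label


def points_toward_terminus(location):
    """True for the two locations described in the text as pointing toward the terminus."""
    return location in ("LeOT", "LaTO")


# Hypothetical terminus position and gene coordinates, for illustration only.
TERMINUS = 1_000_000
for gene_start, gene_strand in [(250_000, "+"), (1_500_000, "+"),
                                (250_000, "-"), (1_500_000, "-")]:
    label = chromosomal_location(gene_start, gene_strand, TERMINUS)
    print(gene_start, gene_strand, label, points_toward_terminus(label))
```

Under these assumptions, LeOT and LaTO are the two groups flagged as pointing toward the terminus, matching the two locations reported above as showing higher expression levels.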
The causal link established between codon usage and gene expression level is still
as controversial as when the concept was initially presented (Sharp et al., 1986). Early
work aimed at predicting expression level of a gene given only the nucleotide sequence
of the coding region (Sharp et al., 1986). Although relative tRNA abundances also
impact gene expression level, we did not include them in our study, since our primary
objective was to investigate the correlation between intrinsic gene features and
expression. Nevertheless, a correlation between usage of preferred codons and level of
their respective major isoacceptor tRNA has been shown in E. coli. This correlation
explains an adaptation of highly expressed genes towards translational efficiency
(dosReis et al., 2003).
Although CAI has previously been reported as a predictor of mRNA
concentration, it is an imperfect and unreliable measure of gene expression (Coghlan and
Wolfe, 2000). From a biological standpoint, intrinsic gene parameters are set, regardless
of environmental conditions. Since environmental conditions have been shown to impact
gene expression on a large scale, as shown by microarray studies, intrinsic gene
parameters are unable to predict changes in mRNA levels with changing biological
conditions, as mentioned previously (Coghlan and Wolfe, 2000). Indeed, extrinsic
parameters such as intergenic regions comprising promoter sequences are also involved
in gene expression control.
Globally, gene expression is controlled at several levels, including initiation of
transcription, transcription termination and codon usage. Additionally, the minor codon
modulator hypothesis stipulates that minor codons near the initiation site may play a role
in regulating gene expression (Ohno et al., 2001). As a result, although codon bias
measures may be correlated with intrinsic parameters, they are not good predictors of
mRNA levels. Perhaps a mixed model similar to that presented by dosReis et al. (2003),
including several parameters, is more representative of the heteroscedastic nature of gene
expression. Overall, many factors are involved in gene expression, including codon
usage, gene length, transcription initiation, amino-acid composition, protein function,
tRNA abundance, environmental conditions, mutation and evolutionary forces, GC
compositions, and others, which underlies the complexity in modeling and predicting
gene expression based on a defined number of parameters. It would be utopian to consider
that intrinsic gene parameters alone can be used to predict gene expression. An effective
predictor of gene expression has to include all of the parameters involved in translation,
transcription, environmental conditions and physiological state of the organism.
Nevertheless, this study illustrates the importance of chromosomal architecture for gene
transcription, and shows that codon usage and GC content are best correlated with
expression levels in L. acidophilus.
4.6 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A. & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B., McAuliffe, O., Souther, N., Dobson, A., Duong, T., Callanan, M., Lick, S., Hamrick, A., Cano, R., & Klaenhammer, T. R. (2004) J. Bacteriol. In review
Aota, S. I., Gojobori, T., Ishibashi, F., Maruyama, T., & Ikemura, T. (1988) Nucleic Acids Res. 16, r315-r402
Azcarate-Peril et al. (2004) In review
Barrangou, R., Azcarate-Peril, M. A., Duong, T., Conners, S. B., Kelly, R. M., & Klaenhammer, T. R. (2004) In review
Bolotin, A., Mauger, S., Malarme, K., Ehrlich, S. D., & Sorokin, A. (1999) Antonie van Leeuwenhoek 76, 27-76
Carbone, A., Zinovyev, A., & Kepes, F. (2003) Bioinformatics 19, 2005-2015
Chen, L. L., & Zhang, C. T. (2003) Biochem. Biophys. Res. Comm. 306, 310-317
Chen, S. L., Lee, W., Hottes, A. K., Shapiro, L., & McAdams, H. H. (2004) Proc. Natl. Acad. Sci. USA 101, 3480-3485
Coghlan, A., & Wolfe, K. H. (2000) Yeast 16, 1131-1145
dosReis, M., Wernisch, L., & Savva, R. (2003) Nucleic Acids Res. 31, 6976-6985
Ferretti, J. J., McShan, W. M., Ajdic, D., Savic, D. J., Savic, G., Lyon, K., Primeaux, C., Sezate, S., Suvorov, A., Kenton, S., Lai, H. S., Lin, S. P., Qian, Y., Jia, H. G., Najar, F. Z., Ren, Q., Zhu, H., Song, L., White, J., Yuan, X., Clifton, S. W., Roe, B. A., & McLaughlin, R. (2001) Proc. Natl. Acad. Sci. USA 98, 4658-4663
Fraser, H. B., Hirsch, A. E., Wall, D. P., & Eisen, M. B. (2004) Proc. Natl. Acad. Sci. USA 101, 9033-9038
French, S. (1992) Science 258, 1362-1365
Fuglsang, A. (2003) Biochem. Biophys. Res. Comm. 312, 285-291
Fuglsang, A. (2004) Antonie van Leeuwenhoek 86, 135-147
Grantham, R., Gautier, C., Gouy, M., Mercier, R., & Pave, A. (1980) Nucleic Acids Res. 8, r49-r62
Grocock, R. J., & Sharp, P. M. (2002) Gene 289, 131-139
Jansen, R., Bussemaker, H. J., & Gerstein, M. (2003) Nucleic Acids Res. 31, 2242-2251
Karlin, S., & Mrazek, J. (2000) J. Bacteriol. 182, 5238-5250
Karlin, S., Theriot, J., & Mrazek, J. (2004) Proc. Natl. Acad. Sci. USA 101, 6182-6187
Klaenhammer, T. R., Altermann, E., Arigoni, F., Bolotin, A., Breidt, F., Broadbent, J., Cano, R., Chaillou, S., Deutscher, J., Gasson, M., van de Guchte, M., Guzzo, J., Hartke, A., Hawkins, T., Hols, P., Hutkins, R., Kleerebezem, M., Kok, J., Kuipers, O., Lubbers, M., Maguin, E., McKay, L., Mills, D., Nauta, A., Overbeek, R., Pel, H., Pridmore, D., Saier, M., van Sinderen, D., Sorokin, A., Steele, J., O'Sullivan, D., de Vos, W., Weimer, B., Zagorec, M., & Siezen, R. (2002) Antonie van Leeuwenhoek 82, 29-58
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M. & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-5
Kliman, R. M., Irving, N., & Santiago, M. (2003) J. Mol. Evol. 57, 98-109
Lafay, B., Lloyd, A. T., McLean, M. J., Devine, K. M., Sharp, P. M., & Wolfe, K. H. (1999) Nucleic Acids Res. 27, 1642-1649
Lloyd, A. T., & Sharp, P. M. (1992) Nucleic Acids Res. 20, 5289-5295
Lobry, J. R., & Sueoka, N. (2002) Genome Biol. 3, 1-14
Ma, J., Campbell, A., & Karlin, S. (2002) J. Bacteriol. 184, 5733-5745
Martin-Galiano, A. J., Wells, J. M., & de la Campa, A. G. (2004) Microbiol. 150, 2313-2325
McInerney, J. O. (1998) Proc. Natl. Acad. Sci. USA 95, 10698-10703
Muto, A., & Osawa, S. (1987) Proc. Natl. Acad. Sci. USA 84, 166-169
Ohno, H., Sakai, H., Washio, T., & Tomita, M. (2001) Gene 276, 107-115
Rouvet M, Altermann E, Barrangou R, Mollet B, Mercenier A, Klaenhammer TR, Arigoni F, & Schell MA. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517
Rice, P., Longden, I., & Bleasby, A. (2000) Trends Gen. 16, 276-7
Sakai, H., Imamura, C., Osada, Y., Saito, R., Washio, T., & Tomita, M. (2001) J. Mol. Evol. 52, 164-170
Sandberg, R., Branden, C. I., Ernberg, I., & Coster, J. (2003) Gene 311, 35-42
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D. & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427
Sharp, P. M., Tuohy, T. M. F., & Mosurski, K. R. (1986) Nucleic Acids Res. 14, 5125-5143
Sharp, P. M., & Li, W. H. (1987) Nucleic Acids Res. 15, 1281-1295
Sharp, P. M., Cowe, E., Higgins, D. G., Shields, D. C., Wolfe, K. H., & Wright, F. (1988) Nucleic Acids Res. 16, 8207-8211
Siezen, R. J., van Enckevort, F. H. J., Kleerebezem, K., & Teusink, B. (2004) Curr. Opin. Biotechnol. 15, 105-115
Tettelin, H., Nelson, K. E., Paulsen, I. T., Eisen, J. A., Read, T. D., Peterson, S., Heidelberg, J., Deboy, R. T., Haft, D. H., Dodson, R. J., Durkin, A. S., Gwinn, M., Kolonay, J. F., Nelson, W. C., Peretron, J. D., Umayam, L. A., While, O., Salzberg, S. L., Lewis, M. R., Radune, D., Holtzapple, E., Khouri, H., Wolf, A. M., Utterback, T. R., Hansen, C. L., McDonald, L. A., Feldblyum, T. V., Angiuoli, S., Dickinson, T., Hickey, E. K., Holt, I. E., Loftus, B. J., Yang, F., Smith, H. O., Venter, J. C., Dougherty, B. A., Morrison, D. A., Hollingshead, S. K., & Fraser, C. M. (2001) Science 293, 498-506
Wan, X. F., Xu, D., Kleinhofs, A., & Zhou, J. (2004) BMC Evol. Biol. 28, 19-30
Zhang, C. T., & Chou, K. C. (1994) J. Mol. Biol. 238, 1-8
Table 1. Codon usage table
Columns: Codon; AA; Fraction per AA (CAI10, CAI50, CAIall); Fraction per 1000 (CAI10, CAI50, CAIall); Total number (CAI10, CAI50, CAIall)

* The first number indicates the correlation coefficient, and the second number indicates the statistical significance.
rP: Pearson correlation coefficient
rS: Spearman correlation coefficient
rK: Kendall correlation coefficient
P-value: significance of the linear regression statistic
SSR: Sum of Square Residuals from the linear regression

Table 3. Analysis of variance between chromosomal locations
* The first number indicates the mean, and the second number indicates the statistical significance. Means with the same letter are not significantly different.
LeOT: leading strand, from the origin to the terminus
LeTO: leading strand, from the terminus to the origin
LaOT: lagging strand, from the origin to the terminus
LaTO: lagging strand, from the terminus to the origin

Table 4. Correlation analyses, by chromosomal location
* The first number indicates the Pearson correlation coefficient, the second number indicates the statistical significance.
LeOT: leading strand, from the origin to the terminus
LeTO: leading strand, from the terminus to the origin
LaOT: lagging strand, from the origin to the terminus
LaTO: lagging strand, from the terminus to the origin
[Figure 1 panels, each with number of genes on the y-axis: %GC (series GC1, GC2, GC3, GCall); gene expression level (array LSM); CAI (series CAI10, CAI50, CAIall); gene size (nt)]
Figure 1. Gene distribution over select parameters. The gene distribution is shown over gene expression level (top left), gene size (top right), %GC (bottom left) and codon adaptation index (bottom right). For gene expression levels, the distribution is plotted as a factor of the transcription level determined by microarray experiments, namely the LSM (least square means), representing median gene expression level. For gene size, the distribution is plotted as a factor of the size of the gene, in nucleotides. For %GC, the distribution is plotted for each gene, globally, and for each codon position, namely GC1, GC2 and GC3 for the first, second and third position, respectively. For codon adaptation index, the distribution is plotted for all three training sets, namely CAI10 (using the 10 most highly expressed genes as training set), CAI50 (using the 50 most highly expressed genes as training set), and CAIall (using all the genes as training set).
[Figure 2 diagram labels: leading strand (LeOT, LeTO) and lagging strand (LaTO, LaOT), each drawn between origin and terminus]
Figure 2. Chromosomal locations. Each strand is represented individually. The leading strand is colored in blue, while the lagging strand is colored in red. For genes on the leading strand (Le), genes from the origin to the terminus (OT) are solid blue (LeOT), whereas genes from the terminus to the origin (TO) are dashed blue (LeTO). For genes on the lagging strand (La), genes from the origin to the terminus (OT) (relative to the leading strand) are dashed red (LaOT), whereas genes from the terminus to the origin (TO) are solid red (LaTO).
Figure 3. Correlations between gene expression level and intrinsic gene parameters.
[Figure 3 panels: LSM (y-axis) against start, CAI10, CAI50, CAIall, Size and RBSall (x-axes)]
Figure 3 (continued). Correlations between gene expression level and intrinsic gene parameters. All plots investigate the relationship between an intrinsic parameter (X axis) and gene expression level (Y axis). The microarray median LSM represents the gene expression level. For intrinsic parameters, the position of the first translated nucleotide is used as the “start”; the gene length in nucleotide is used as the “size”; CAI10, CAI50 and CAI all are used as the values for codon adaptation index calculated for each training set; GC1, GC2, GC3 and GCall are used as the GC contents of the first, second, and third codon positions, respectively, while GCall represents the global GC content of a gene; RBSall represents the free energy level of the putative Shine Dalgarno sequence found upstream of the translational start.
[Plots: gene expression level (LSM, Y axis) against GC1, GC2, GC3 and GCall (X axes).]
Figure 4. Analysis of variance by chromosomal location. For each chromosomal location, namely LeOT (Leading strand, between the origin and the terminus), LeTO (Leading strand, between the terminus and the origin), LaOT (Lagging strand, between the origin and the terminus, relative to the leading strand) and LaTO (Lagging strand, between the terminus and the origin, relative to the leading strand), the mean values for gene expression level, GC content at the third codon position (%GC3), and codon adaptation index as determined by the first training set (CAI10) are plotted. Within each plot, data points with the same letter are not significantly different.
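For readers unfamiliar with the comparison summarized in this figure, the sketch below shows how a one-way analysis of variance across the four chromosomal locations could be computed for one parameter. It is a generic textbook calculation, not the statistical procedure or software used here, and it does not perform the mean separation that assigns the letters. The function name and the numeric values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of name -> list of values.

    Hypothetical helper for illustration: a plain one-way ANOVA F statistic.
    """
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up values for illustration only; these are not data from this study.
    toy = {
        "LeOT": [0.30, 0.28, 0.35],
        "LeTO": [0.25, 0.31, 0.27],
        "LaOT": [-0.10, 0.05, -0.02],
        "LaTO": [-0.30, -0.25, -0.41],
    }
    print(one_way_anova_f(toy))
```

A significant F statistic only indicates that at least one location mean differs; grouping the means by letter requires a separate multiple-comparison step.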
[Plots: %GC3, CAI10 and mean expression level (Y axes) for the leading and lagging strands, before and after the terminus (X axis). Letter labels: LeOT (A, B, A), LaOT (C, A, B), LaTO (B, B, B), LeTO (C, A, A); the assignment of letters to individual plots is not recoverable here.]
[Plots: array LSM (Y axis) against gene size and against GCall (X axes), with one plot per chromosomal location (LeOT, LeTO, LaOT, LaTO).]
Figure 5. Correlations between gene expression level and intrinsic gene parameters, by chromosomal location.
[Plots: array LSM (Y axis) against CAI10 (X axis), with one plot per chromosomal location (LeOT, LeTO, LaOT, LaTO).]
Figure 5 (continued). Correlations between gene expression level and intrinsic gene parameters, by chromosomal location. The relationships between gene expression level (array LSM) and three intrinsic parameters (gene size, GCall and CAI10) are plotted for each location, namely LeOT, LeTO, LaOT and LaTO, as specified previously.
[Plots: number of genes (Y axis) against GC3, gene size (bp), CAIall, CAI10, %GCall and gene expression level (LSM) (X axes), each with one series per chromosomal location (LaTO, LeTO, LeOT, LaOT).]
Figure 6. Gene distribution over select parameters, by chromosomal location. The distribution of genes over gene expression level, %GCall, gene size, CAI10, CAIall and GC3 is plotted for each chromosomal location, namely LeOT, LeTO, LaOT and LaTO, as specified previously.
APPENDIX I – Functional and comparative genomic analyses of an operon involved in fructooligosaccharides utilization by