
Sequence Heterogeneities Among 16S Ribosomal RNA Sequences, and Their Effect on Phylogenetic Analyses at the Species Level

Valérie Cilia, Bénédicte Lafay, and Richard Christen
CNRS & Université Paris 6, Observatoire Océanologique, Villefranche sur mer, France

We have analyzed what phylogenetic signal can be derived by small subunit rRNA comparison for bacteria of different but closely related genera (enterobacteria) and for different species or strains within a single genus (Escherichia or Salmonella), and finally how similar the ribosomal operons are within a single organism (Escherichia coli). These sequences have been analyzed by neighbor-joining, maximum likelihood, and parsimony. The robustness of each topology was assessed by bootstrap.

Sequences were obtained for the seven rrn operons of E. coli strain PK3. These data demonstrated differences located in three highly variable domains. Their nature and localization suggest that since the divergence of E. coli and Salmonella typhimurium, most point mutations that occurred within each gene have been propagated among the gene family by conversions involving short domains, and that homogenization by conversions may not have affected the entire sequence of each gene. We show that the differences that exist between the different operons are ignored when sequences are obtained either after cloning of a single operon or directly from polymerase chain reaction (PCR) products. Direct sequencing of PCR products produces a mean sequence in which mutations present in the most variable domains become hidden. Cloning a single operon results in a sequence that differs from that of the other operons and of the mean sequence by several point mutations. For identification of unknown bacteria at the species level or below, a mean sequence or the sequence of a single nonidentified operon should therefore be avoided. Taking into account the seven operons and therefore mutations that accumulate in the most variable domains would perhaps increase tree resolution. However, if gene conversions that homogenize the rRNA multigene family are rare events, some nodes in phylogenetic trees will reflect these recombination events and these trees may therefore be gene trees rather than organismal trees.

Introduction

Ribosomal RNA sequences are now widely used for phylogenetic analyses and identifications of species. For phylogeny, rRNA sequences have become popular because they are universally present and have a conserved function (they are homologous in all organisms), because they are easy to sequence, and finally because they are constituted of an interspersion of highly conserved to very variable domains. Ribosomal RNA sequences are now widely used in bacteriology because it is often easier to identify bacteria by specific nucleic acid sequences rather than by their biochemical or physiological traits. Moreover, the coupling of a polymerase chain reaction (PCR) amplification using universal primers to molecular cloning allows the identification of microorganisms that cannot be easily cultured under laboratory conditions (Giovannoni et al. 1990; Ward, Weller, and Bateson 1990; Britschgi and Giovannoni 1991; Schmidt, DeLong, and Pace 1991; Fuhrman, McCallum, and Davis 1992; Berchtold, Ludwig, and König 1994).

The large ribosomal subunit RNA (LSU rRNA) has a larger information content than the small ribosomal subunit RNA (SSU rRNA), not only because of its larger size but also because it contains domains of rapid expansion and higher mutation rates (Hassouna, Michot, and Bachellerie 1984) that could be more adequate for a distance evaluation between closely related species. Despite their seemingly lower phylogenetic information content, the SSU rRNA sequences have become more widely used for estimating phylogenetic relationships among unicellular organisms (Fox, Wisotzkey, and Jurtshuk 1992). However, SSU rRNA sequences may not be adequate to analyze phylogenetic relationships between closely related species (not to speak of different strains within the same species) because there may not be enough differences between these sequences; this has been suggested by phylogenetic analyses of genera such as Bacillus (Ash et al. 1991; Rössler et al. 1991) or Vibrio (Ruimy et al. 1994).

Key words: 16S rRNA, phylogeny, rrn operons, Escherichia coli.

Address for correspondence and reprints: Richard Christen, CNRS & Université Paris 6, Observatoire Océanologique, Station Zoologique, Villefranche sur mer, 06230 France. Email: [email protected].

Mol. Biol. Evol. 13(3):451-461. 1996 © 1996 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Another problem derives from the organization of rRNA genes (rDNA) as a multigene family, often as rrn operons in bacteria. In eubacteria, for example, there are two rrn operons in Mycobacterium smegmatis (Suzuki and Yamada 1988) and seven rrn operons in Escherichia coli and Salmonella typhimurium (Hill and Harnish 1981); Bacillus subtilis is reported to have 9 or 10 rDNA per genome (Loughney, Lund, and Dahlberg 1982) and a few hundred copies are often present in eukaryotic genomes (Long and Dawid 1980). The members of a multigene family are subject to a homogenization process (Dover 1982, 1987; Ohta 1991) and rDNA sequences tend to evolve in concert. A particular rRNA gene is probably more similar to a paralogous copy in the same organism than to its homologous copy in another species, but we do not know how the necessary homology condition is met when nonhomologous rRNA genes are compared in different species, that is, when the sequence of a single gene is obtained by cloning. Finally, the differences that may exist between the different members of the rRNA multigene family are probably ignored when sequences are obtained by direct sequencing of RNA or of PCR products.

Downloaded from https://academic.oup.com/mbe/article/13/3/451/1057313 by guest on 17 December 2021

In order to analyze whether a lack of resolution in phylogenetic trees of closely related species could be the result of a lack of divergence between rRNA sequences, of recombination events that may blur the hierarchical phylogenetic information (Harvey et al. 1988; Harvey and Hill 1990; Sneath 1993), or of analyses performed irrespective of operon homology, the following questions have been investigated: (1) What phylogenetic signal can be derived for bacteria of different but closely related genera? (2) Is it possible to distinguish different species within a genus and to analyze their phylogeny? (3) How similar are the different operons within a single organism? Bacteria belonging to the family Enterobacteriaceae have been chosen as a model system. Within this family, most genera are at least 20% related to one another as measured by reassociation of hybrid genomic DNA as a function of temperature: 20%-60% between Escherichia coli and the genera Citrobacter, Klebsiella, or Erwinia; 40%-50% between E. coli and Salmonella; and finally, more than 70% between E. coli and Shigella (Brenner and Falkow 1971; Krieg and Holt 1984). Sequences have been obtained for bacteria of each of these genera, for several species within two genera (Escherichia and Salmonella), for several strains of E. coli, and finally for the seven rrn operons of a single strain of E. coli. These sequences have been compared and analyzed by three phylogenetic methods: neighbor-joining, maximum likelihood, and parsimony, and the robustness of each topology has been assessed by bootstrap. It is concluded that the method used for obtaining a sequence (i.e., cloning versus direct sequencing of PCR products) and the occurrence of recombinations between operons are important parameters that should be taken into account for deriving phylogenies between closely related organisms.

Materials and Methods

Bacterial Strains

Bacterial strains belonging to the family Enterobacteriaceae for which SSU rRNA sequences are now available are listed in table 1. Escherichia sp. and Salmonella sp. are peculiar strains whose identifications by phenotypic methods could not be obtained at the species level, but that were characterized at the genus level. They were therefore considered of interest for our analysis. All phenotypic characterizations of "BioM" strains have been done by the API laboratories at the Institut BioMérieux. Escherichia coli 5 strain MC4100 has been characterized previously (Casadaban 1976). Table 1 summarizes the methods used, when the information was available, for obtaining a sequence: with reverse transcriptase directly from RNAs (rRNA), with T7 DNA polymerase directly from PCR products of full genomic DNAs (PCR) or after cloning (E. coli 2 and 3), or isolation of an operon (rrnX).

DNA Preparation and Hybridization

Genomic DNA of E. coli PK3 strain was purified as previously described (Ausubel et al. 1991). DNA was digested using BamHI and PstI (Hill and Harnish 1981). Restriction fragments were separated in two different lanes on a 0.6% low melting agarose gel (Gibco BRL). Southern blot transfers of DNA were performed (Ausubel et al. 1991) on Positive nylon membrane (Appligene, France). One lane was transferred on the membrane and the other lane was stored at 4°C. DNA fixation was achieved by baking for 15 min.

A DNA probe of 1.4 kb specific for E. coli SSU rDNA was generated by PCR. Unincorporated dNTP were separated from the probe with a Sepharose CL-4B column (Pharmacia). DNA probes were then labeled by random priming with digoxigenin-11-dUTP using a DIG DNA labeling kit (Boehringer Mannheim) as recommended in the instructions. Hybridization and immunological detection of DIG-labeled probes were performed using a DIG-DNA detection kit according to the manufacturer's instructions (Boehringer Mannheim). The membrane was then superimposed on the remaining half of the agarose gel. Bands corresponding to each rrn operon were excised and used for amplification.

DNA PCR

All PCR reactions were carried out using Taq DNA polymerase supplied by Appligene France, paraffin wax (fusion temperature: 60°C), and a hot start procedure. Reaction buffer (final concentrations: Tris-HCl, 50 mM; (NH4)2SO4, 16 mM; KCl, 50 mM; MgCl2, 7 mM; bovine serum albumin, 0.2 mg/ml; pH 9.0), dNTP 250 µM each and 30 pmol of each primer were added in a 50-µl volume below the wax layer. Taq DNA polymerase (2.5 U), 5 µl of DNA sample in 0.6% low melting agarose, and reaction buffer were in a 50-µl volume above the wax layer. Reactions were carried out using a Techne PHC-2 (Techne UK) with the following set of conditions: 95°C, 3 min; 53°C, 2 min; 72°C, 1 min followed by 94°C, 30 sec; 53°C, 1 min; 72°C, 1 min for 25 cycles; a final cycle was 94°C, 30 sec; 53°C, 1 min; 72°C, 5 min. Samples were precipitated with PEG 8000 20%, NaCl 2.5 M (Paithankar and Prasad 1991) and purified on 1% low melting agarose in TAE buffer (Tris-acetate pH 7.0, 40 mM; EDTA, pH 8.0, 2 mM).

Table 1
Strains and rrn Operons Analyzed and Methods Used to Obtain SSU rRNA Sequences

Species (a)                 Source (strain) (b)       EMBL (c)   Method (d)
Escherichia coli 1          -                         J01859     rRNA
Escherichia coli 2          -                         J01695     rrnB
Escherichia coli 3          -                         V00348     rrnB
Escherichia coli 4*         (PK3)                     X80731     PCR
Escherichia coli 5*         (MC4100)                  X80732     PCR
Escherichia coli 6*         CIP (ATCC 11775T)         X80725     PCR
Escherichia coli 7*         ATCC 25922                X80724     PCR
Escherichia coli 8A         (K12)                     M87049     rrnA
Escherichia coli 8B         (K12)                     U00006     rrnB
Escherichia coli 8C         (K12)                     L10328     rrnC
Escherichia coli 8D         (K12)                     U18997     rrnD
Escherichia coli 8E         (K12)                     U00006     rrnE
Escherichia coli 9          -                         D15061     rrnH
Escherichia coli 10         -                         M29364     rrnG
Escherichia sp.*            BioM                      X80733     PCR
Escherichia vulneris*       CIP (ATCC 33821T)         X80734     PCR
Escherichia hermanii*       BioM                      X80675     rRNA
Shigella flexneri*          BioM                      X80679     rRNA
Shigella dysenteriae*       BioM                      X80680     rRNA
Shigella sonnei*            BioM                      X80726     rRNA
Salmonella give*            BioM                      X80683     rRNA
Salmonella paratyphi A*     BioM                      X80682     rRNA
Salmonella sp.*             ATCC 9712                 X80676     rRNA
Salmonella sofia*           BioM                      X80677     rRNA
Salmonella typhimurium*     ATCC 13311                X80681     rRNA
Salmonella shomron*         BioM                      X80678     rRNA
Citrobacter freundii        -                         M59291     -
Klebsiella pneumoniae*      Nice, Pasteur hospital    X80684     rRNA
Serratia marcescens         -                         M59160     -
Erwinia carotovora          -                         M59149     -
Escherichia coli 4A*        (PK3)                     X80721     rrnA
Escherichia coli 4B*        (PK3)                     X80722     rrnB
Escherichia coli 4C*        (PK3)                     X80723     rrnC
Escherichia coli 4D*        (PK3)                     X80727     rrnD
Escherichia coli 4E*        (PK3)                     X80728     rrnE
Escherichia coli 4G*        (PK3)                     X80729     rrnG
Escherichia coli 4H*        (PK3)                     X80730     rrnH

(a) Asterisks indicate strains that have been sequenced for the purpose of this study.
(b) Bacterial collection from which each strain has been obtained: BioM (BioMérieux, Marcy l'Etoile, France), CIP (Collection de l'Institut Pasteur), ATCC (American Type Culture Collection).
(c) Accession numbers under which each sequence is available.
(d) Method by which each sequence has been obtained: rRNA indicates that total rRNA has been sequenced using reverse transcriptase; PCR indicates that total PCR products have been sequenced using T7 DNA polymerase; rrnX indicates that the sequence of a single operon (X) has been obtained. The method of sequencing could not be retrieved for some sequences.
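The cycling conditions given in the DNA PCR section can be written down as a small data structure, which makes the three phases (initial cycle, 25 repeated cycles, final cycle) easy to check. The sketch below is only an illustration: the names INITIAL, CYCLE, FINAL and total_hold_minutes are hypothetical, and the total counts programmed hold times only, ignoring the ramp times of the thermal cycler.

```python
from typing import List, Tuple

# One step is (temperature in degrees C, hold time in minutes).
Step = Tuple[float, float]

# Values transcribed from the text; the variable names are illustrative only.
INITIAL: List[Step] = [(95, 3.0), (53, 2.0), (72, 1.0)]
CYCLE: List[Step] = [(94, 0.5), (53, 1.0), (72, 1.0)]
FINAL: List[Step] = [(94, 0.5), (53, 1.0), (72, 5.0)]
N_CYCLES = 25


def hold_minutes(steps: List[Step]) -> float:
    """Sum of the programmed hold times of one phase, in minutes."""
    return sum(minutes for _, minutes in steps)


def total_hold_minutes() -> float:
    """Total programmed hold time of the whole run (ramp times excluded)."""
    return hold_minutes(INITIAL) + N_CYCLES * hold_minutes(CYCLE) + hold_minutes(FINAL)


if __name__ == "__main__":
    print(total_hold_minutes())
```

Under these assumptions the programmed hold time is 6 + 25 x 2.5 + 6.5 = 75 min.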

PCR product direct sequencing was carried out using T7 DNA polymerase (USB). Direct sequencing reactions of total PCR products were performed as previously described (Ruimy et al. 1994).

Direct SSU rRNA Sequencing

Ribosomal RNA purification and sequencing reactions using reverse transcriptase were carried out as previously described (Ruimy et al. 1994).

Primer Positions

Ten DNA primers were used in the sequencing reactions. These primers corresponded to the following positions in E. coli sequence: positions 99-119; positions 342-356; positions 518-534; positions 684-702; positions 906-925; positions 1099-1114; positions 1223-1240; positions 1384-1400; positions 1493-1509. A reverse primer (positions 8-28) was used in conjunction with the last primer to amplify SSU rDNAs.

Phylogenetic Analysis

The phylogenetic data described below were obtained by alignment of the different sequences and phylogenetic analyses. All operations were done using computer programs developed by us and available on request from R. Christen. The SSU rRNA sequences were aligned by eye. Domains used for deriving phylogenies were restricted to parts of the sequences for which homologies were without doubt and did not include too many undetermined nucleotides; for figure 1 they were positions 14-78, 101-858, 862-1124, 1137-1431 (corresponding to E. coli 2 sequence). The 5' end and 3' end analyses used the same domains but, respectively, up to and from position 665. For the Escherichia and Shigella species trees, we used positions 14-80, 94-205, 207-418, 420-860, 863-1023, 1029-1381, 1385-1438. Domains used for the Salmonella species trees were: 3-78, 103-205, 207-418, 420-1023, 1029-1431 (corresponding to E. coli 2 sequence). Individual positions that contained undetermined nucleotides have also been removed from analyses. Although all available sequences are mentioned in table 1, they have not all been included in phylogenetic analyses shown in this paper because some of them were identical to other sequences (sequences of E. coli 8 and 9) or because one of them was only a partial sequence (E. coli 10).
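The restriction of an alignment to selected domains, followed by the removal of positions with undetermined nucleotides, can be sketched as follows. This is not the authors' program (which is not shown in the paper) but a minimal illustration. It assumes that alignment columns coincide with the E. coli 2 numbering (no inserted gap columns), that undetermined nucleotides are coded as "N", and that the function name select_columns is hypothetical.

```python
from typing import Dict, List, Tuple

# Domains quoted in the text for figure 1 (1-based, inclusive).
FIG1_DOMAINS: List[Tuple[int, int]] = [(14, 78), (101, 858), (862, 1124), (1137, 1431)]


def select_columns(alignment: Dict[str, str],
                   domains: List[Tuple[int, int]],
                   undetermined: str = "N") -> Dict[str, str]:
    """Keep only columns inside `domains` that contain no undetermined base.

    Assumption: column i of the alignment is position i of the reference
    numbering, so no coordinate mapping is needed.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    kept: List[int] = []
    for start, end in domains:
        for pos in range(start, min(end, length) + 1):
            column = [seq[pos - 1].upper() for seq in alignment.values()]
            # Drop the whole column if any sequence is undetermined there.
            if not any(base in undetermined for base in column):
                kept.append(pos - 1)
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"a": "ACGTNACGTA", "b": "ACGTAACGNA"}  # invented toy data
    print(select_columns(toy, [(2, 5), (7, 9)]))
```

With the toy input, columns 5 and 9 are discarded because one sequence carries an "N" there, and both sequences reduce to the same five retained columns.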

A neighbor-joining algorithm like that developed by Saitou and Nei (1987) was used. The program was rewritten to include inputs and outputs compatible with the ribosomal database and other programs developed in our laboratory. Distances were calculated as uncorrected percentages of divergence or using the one-parameter correction (Jukes and Cantor 1969) or the two-parameter correction (Kimura 1980). Identical trees were obtained by neighbor-joining whichever distance was used. Topologies shown in this paper were obtained using the correction of Jukes and Cantor for estimating distances. For parsimony, the PAUP program (Swofford 1992) for Macintosh computers was used. All topologies were first obtained by using the heuristic options. When several most-parsimonious trees were obtained, a 100% consensus tree was constructed and treated as the most-parsimonious tree for constructing figures. For the parsimony data matrices only positions that had at least one mutation in one sequence were considered (nonuniform positions). For maximum likelihood, every position was taken into account in order to assess base frequency. Finally, a bootstrap analysis was performed (heuristic option, 100 replications) to check each topology for robustness. For maximum likelihood the fDNAml program rewritten by G. J. Olsen (University of Illinois, Urbana) was used (using F, Y, and G options). All trees were plotted by using a program developed by M. Gouy (University Claude Bernard, Lyon, France) that allows transformation of a formal tree representation (Newick's format) into MacDraw drawings.
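The three distance measures named above have standard closed forms, and a short sketch may help to see why they give nearly identical values for sequences as similar as those compared here. The code below is an illustration, not the laboratory's program. It assumes aligned, equal-length, uppercase sequences containing only A, C, G and T, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor(p: float) -> float:
    """Jukes and Cantor (1969) correction: d = -(3/4) ln(1 - 4p/3).

    Only defined for p < 0.75.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """Kimura (1980) two-parameter distance.

    d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the
    proportions of transitions and transversions.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    big_p = transitions / len(seq_a)
    big_q = transversions / len(seq_a)
    return -0.5 * math.log((1.0 - 2.0 * big_p - big_q) * math.sqrt(1.0 - 2.0 * big_q))


if __name__ == "__main__":
    print(p_distance("ACGT", "ACGA"), jukes_cantor(0.01))
```

At a divergence of about 1%, the Jukes and Cantor correction changes the distance by well under 1% of its value, which is consistent with the observation that the same trees were obtained whichever distance was used.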

EMBL Accession Numbers

All sequences obtained for the purpose of this study have been deposited in the EMBL databank under accession numbers indicated in table 1.

Results

Phylogenetic analyses of SSU rRNA sequences representative of different enterobacterial genera were performed using three methods: neighbor-joining, maximum parsimony, and maximum likelihood. Considering that E. coli and Shigella in fact belong to the same bacterial species because they share a genomic DNA relatedness greater than 70%, and that Escherichia hermanii is probably misclassified as Escherichia (Hartl 1992), the phylogenetic analyses provided resolution at the genus level as two robust monophyletic units were found in all analyses (fig. 1): (1) the genus Salmonella (S. give, S. paratyphi A, S. sp., S. sofia, S. typhimurium, and S. shomron) and (2) the genera Escherichia (all seven strains of E. coli, E. vulneris, E. sp.) and Shigella (S. sonnei, S. dysenteriae, S. flexneri). The parsimony analysis with a heuristic search option found 12 trees (for 23 sequences, 146 nonuniform characters, 81 informative positions, tree length 238, consistency index excluding uninformative characters 0.576, retention index 0.822). The two other methods produced very similar topologies as summarized in figure 1. The internal branches for these two monophyletic taxa were well supported, either by maximum likelihood (each branch showed a significantly positive length at P < 0.01) or in a bootstrap analysis using the parsimony method that showed, respectively, a support in 97% and 100% of bootstrap replications (fig. 1). Two other analyses were also undertaken, by excluding the leftmost or the rightmost parts of the sequences. Analyses performed using the 5' ends of the sequences produced a topology identical to that obtained when considering the entire sequences. However, trees obtained with the 3' ends of the sequences showed significant differences. In particular, the two monophyletic units described above were not retrieved (fig. 1). The most significant changes (topologies obtained using neighbor-joining, maximum parsimony and maximum likelihood) were for the positions of E. sp. and S. paratyphi A. Escherichia sp. clustered with Salmonella (but this was not supported in a bootstrap analysis), while S. paratyphi A clustered with Escherichia and Shigella (supported at the 75% level of bootstrap replications).

FIG. 1. Phylogenetic relationships among Enterobacteriaceae inferred from SSU rRNA sequences. The topology shown is an unrooted tree obtained using a neighbor-joining algorithm. Branches also found by maximum likelihood (branches significantly positive at P < 0.01) are labeled with asterisks. Supports from a bootstrap analysis using parsimony are shown as percentages of replications above each branch (only percentages above 50% are indicated); these numbers also indicate branches retrieved in the most parsimonious tree (strict consensus of 12 trees). Internal branches retrieved by all three methods using the 5' or the 3' ends are indicated (as well as percentage of bootstrap replications). For Escherichia coli strain PK3, the mean sequence obtained from direct sequencing of PCR products was used. A resolution at the genus level is achieved, since two robust monophyletic taxa regroup strains that belong to the same genus (see the text for more details).
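The parsimony statistics quoted in this section distinguish nonuniform characters (columns with at least one difference) from informative positions. A minimal sketch of how the two counts can be obtained from an alignment is given below. It assumes the usual definition of a parsimony-informative site (at least two character states, each present in at least two sequences), which the paper does not spell out, and the function name count_sites is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def count_sites(sequences: List[str]) -> Tuple[int, int]:
    """Return (nonuniform, informative) column counts for aligned sequences."""
    nonuniform = 0
    informative = 0
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 1:
            nonuniform += 1
            # Informative: at least two states, each seen in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return nonuniform, informative


if __name__ == "__main__":
    toy = ["AAGT", "AAGC", "ACGC", "ACTT"]  # invented toy alignment
    print(count_sites(toy))
```

In the toy alignment three columns vary, but the third column carries a change in a single sequence only, so it is nonuniform without being informative.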

These problems could result either from a lack of data or from inappropriate data. The two reduced data matrices used in parsimony had similar characteristics: for the 3’ end: 77 nonuniform characters, 45 informative positions, tree length 118, consistency index excluding noninformative characters 0.612, retention index 0.876; and 69 nonuniform characters, 36 informative positions, tree length 110, consistency index excluding noninfor- mative characters 0.610, retention index 0.790 for the 5 ’ end. A lack of support in bootstrap analyses suggested that there was at least a lack of significant data, as did the low distances between any two sequences used in the neighbor-joining approach. A reduced signal when only half of the sequences was used may however not be the only problem, as the presence of distant out- groups could bring a spurious signal in phylogenetic analyses because multiple mutations occurring at the same site may not be detected during the sequence alignment and the choice of characters retained for the phylogenetic analysis (Smith, Lafay, and Christen

a phylogenetic study was undertaken, restricted to the genera Escherichia, Shigella, and Salmonella. These phylogenetic analyses revealed the same pattern as anal- yses with an outgroup (data not shown), therefore ex- cluding the outgroup effect mentioned above.

We then analyzed how phylogenetic relationships were resolved within the monophyletic group that can be tentatively identified as the true genus Escherichia. For this taxon, the maximum parsimony analysis (heu- ristic search) found four trees (for 12 sequences, 34 non- uniform characters, 7 informative positions, tree length 40, consistency index excluding uninformative charac- ters 0.818, retention index 0.800). The branch separating all E. coli strains on the one hand and E. vulneris, S. dysenteriae, and S. flexneri on the other hand was sup- ported at the 70% level of bootstrap replications and was also observed in analyses using the two other methods (fig. 2). But the other internal branches could not be resolved. For the Salmonella taxon, maximum parsi- mony found three trees (for 6 sequences, 36 nonuniform characters, 7 informative positions, tree length 45, con- sistency index excluding uninformative characters 0.750, retention index 0.571). A single internal branch was retrieved by all three methods (and supported by 70% of bootstrap replications). In both taxa, internal branches that were retrieved with some robust support also corresponded to the deepest branchings observed in the general analysis of figure 1. The lack of resolution observed at the species level clearly resulted from a lack of significant data in all analyses. In particular, the data matrices used for parsimony had only seven informative positions for, respectively, 6 or 12 species.

In order to investigate if the lack of differences between species was indeed a lack of divergence or if it resulted from an inappropriate use of sequencing methods (see below), we then compared the SSU rRNA sequences of the different operons within a single or- ganism. Escherichia coli strain PK3 (Kahn 1968) was chosen because it is one of the parental strain of the recombinant bacteria “Salmorichia,” which can used in experimental studies to estimate the rates of recombi- nation between rrn operons (Rayssiguier, Thaler, and Radman 1989). The presence of seven rrn operons has been described for E. coli (Hill and Harnish 1981). These data were confirmed by DNA hybridization of a labeled probe to the genomic DNA of E. coli strain PK3, after digestion by a combination of restriction enzymes (see the Materials and Methods). These seven SSU rRNA genes were sequenced revealing that there were mutations in the different genes. These sequences were then aligned with the available SSU rRNA sequences of the other enterobacterial species and strains of E. coli retrieved from EMBL (fig. 3). This analysis showed that

1992). In order to examine this problem in more detail, these mutations were not-dispersed over-the entire length

Downloaded from https://academic.oup.com/mbe/article/13/3/451/1057313 by guest on 17 December 2021


FIG. 2.-Phylogenetic relationships among Escherichia and Shigella or among Salmonella. The trees shown in this figure are unrooted trees obtained by the neighbor-joining method (trees A and D), by the maximum likelihood method (trees B and E), and by the maximum parsimony method (trees C and F). Different topologies were obtained with different methods. Branches significantly positive at P < 0.01 in maximum likelihood are labeled with asterisks. When several trees were obtained by parsimony, a strict consensus was computed. Support values from the bootstrap analysis are indicated as percentages of replications above each branch (only percentages above 50% are indicated); these numbers also indicate branches retrieved in the most parsimonious tree. Thicker lines indicate branches that were found by all three methods.

of the sequence but appeared in domains that were also variable in the SSU rRNA sequences of other species (fig. 3), and that corresponded to helices in the secondary structure (De Rijk et al. 1992). Identical mutations were often found when our sequences were compared to published sequences for homologous operons, although probably from other strains of E. coli. The most obvious example was the 3’ end of the sequence of rrnH (Nakayashiki et al. 1992; Miyamoto 1993), but some minor differences can also be found (fig. 3). These differences are difficult to analyze because strains have often not been mentioned in previously published works. For strain PK3, it is striking that differences between operons do not seem to appear at random. Operons D and G shared identical bases at positions 80, 81, 90, 91, and 94, where they differed from every other operon. The sequence of operon D differed from every other operon at positions 251, 254, and 274. Finally, the sequence of operon H differed from every other operon at positions 1004, 1008, 1012, 1021, 1022, 1023, 1024, 1025, and 1041. A “bulk” sequence was also obtained by amplification of total genomic DNA of E. coli PK3 followed by direct sequencing of the PCR products using T7 DNA polymerase. The readings of the sequencing gels did not show any strongly ambiguous position; some double bands could be observed at positions that corresponded to differences between rrn operons, but the second band was weak and would have been interpreted as a small sequencing artifact in a usual reading. As a result, the sequence obtained can indeed be considered a mean rRNA sequence for the organism (compare this sequence with that of the seven operons in fig. 3).
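The way a bulk reading hides operon-specific mutations can be sketched as a per-position majority vote across the gene copies. This is only an illustration under simplifying assumptions: the short sequences below are hypothetical toy data (not the PK3 sequences), every copy is assumed to contribute equally to the signal, and the function names are invented for the example.

```python
from collections import Counter


def mean_sequence(copies):
    """Majority base at each aligned position across gene copies.

    Ties, if any, are broken arbitrarily by Counter ordering.
    """
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


def masked_variants(operons):
    """Map each operon name to the 0-based positions hidden by the mean sequence."""
    consensus = mean_sequence(list(operons.values()))
    return {
        name: [i for i, (base, ref) in enumerate(zip(seq, consensus)) if base != ref]
        for name, seq in operons.items()
    }


# Hypothetical toy data: seven copies, two sharing one variant, one with two variants.
common = "GATTACAGGC"
operons = {
    "rrnA": common, "rrnB": common, "rrnC": common, "rrnE": common,
    "rrnD": "GACTACAGGC", "rrnG": "GACTACAGGC",
    "rrnH": "GATTACAGTA",
}
print(mean_sequence(list(operons.values())))  # GATTACAGGC
print(masked_variants(operons)["rrnH"])       # [8, 9]
```

A variant carried by one or two of seven copies never reaches the majority, so it is absent from the mean sequence, in the same way that the weak second band on the gel would be dismissed in a usual reading.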

A phylogenetic analysis including the sequences of the seven rrn operons showed that rrnH clustered with E. sp. (fig. 4). The parsimony analysis found seven trees (for 19 sequences, 37 nonuniform characters, 15 informative positions, tree length 46, consistency index excluding uninformative characters 0.696, retention index 0.720). The internal branch linking E. coli rrnH with E. sp. was supported at the 78% level of bootstrap replications and showed a significantly positive length at P < 0.01 using the maximum likelihood method.

Discussion

Relationships among organisms are often presented as evolutionary trees. The resolution obtained in a given phylogenetic tree is generally thought to be a schematic picture of our understanding of the evolutionary history of these organisms. A group of species appears as a clade when all its members descend from a node from which no species outside the group also descends. In figure 1, for example, two clades can be robustly identified, i.e., that of {Escherichia + Shigella} and that of {Salmonella}. However, further analyses could not derive decisive phylogenetic ingroup relationships within the {Escherichia + Shigella} and Salmonella clades, at least on the basis of SSU rRNA sequence analysis. This problem is clearly due to a lack of data because not enough differences exist within these genera between the SSU rRNA sequences of such closely related species, at least as these sequences are currently obtained. Usually, a mean sequence of the different copies of the rRNA genes is now obtained by direct sequencing of rRNA molecules (Devereux et al. 1990; this work) or by direct sequencing of PCR products (Lawson et al. 1993; Rainey and Stackebrandt 1993; Willems and Collins 1993; Ruimy et al. 1994; this work). Such sequences are filtered either because mutations present only in a single gene copy are not apparent or because bases are scored as undetermined when ambiguities are apparent. The success of this filtering effort was verified by comparing bulk sequences obtained by direct sequencing of PCR products or rRNA with individual sequences obtained after isolation of each operon (fig. 3).

[Figure 3 alignment: per-sequence rows not recoverable from the source. Domains shown, with the bases printed for the first sequence: positions 80-94 (GAAGCTTGCT TCTTT), 251-274 (AGTAGGTGGG GTAACGGCTC ACCT), and 1003-1042 (CGGAAGTTTT CAGAGATGAG *AAT*GTGCC TTCGGGAACC). Row labels: Eco 4 rrnABCE, Eco 4 rrnD, Eco 4 rrnG, Eco 4 rrnH, Eco 1 to Eco 7, Eco 8 rrnABE, Eco 8 rrnC, Eco 8 rrnD, Eco 9 rrnH, Eco 10 rrnG, E. sp., E. vulneris, E. hermannii, Sh. flexneri, Sh. dysenteriae, Sh. sonnei, S. give, S. paratyphi A, S. sp., S. sofia, S. typhimurium, S. shomron, C. freundii, K. pneumoniae, Se. marcescens, Er. carotovora.]

FIG. 3.-Aligned SSU rRNA sequences for Enterobacteriaceae and for the seven operons of E. coli strain PK3. Only parts of the sequences that showed differences from one operon to another are shown. Positions identical to that in the first sequence are indicated by dashes. Asterisks show deletions necessary to maximize homologies. For strain PK3, individual operons were isolated and sequenced (Escherichia coli 4 rrn A-H), but a sequence was also obtained directly from PCR products using full genomic DNA (E. coli 4). Note that for each domain, a sequence that is different in one operon of strain PK3 matches a mean sequence of another species (underlined). The sequence for operon B was identical to that obtained after cloning of the same operon: E. coli 2 (Brosius et al. 1978; Weisburg et al. 1991). Operon H differed from every other operon at positions strikingly located in the 3’ end of the sequence. Confirmation of the peculiar 3’ end sequence of operon H can be found elsewhere (Nakayashiki et al. 1992; Miyamoto 1993). Numbers refer to positions in the E. coli 2 sequence. Abbreviations: Eco: Escherichia coli; E.: Escherichia; Sh.: Shigella; S.: Salmonella; C.: Citrobacter; K.: Klebsiella; Se.: Serratia; Er.: Erwinia. Eco 4 rrnABCE corresponds to rrnA, B, C, and E of E. coli 4, which possess identical sequences. rrnA, B, and E of E. coli 8 were also identical and are referenced as Eco 8 rrnABE.

Most of the variable positions that have been observed from one operon to another are located in domains of high mutation rates. These highly divergent domains are usually excluded from the data matrix when a phylogenetic analysis is undertaken for resolving distant relationships because these positions would include too many characters that are obviously homoplasic. It is therefore not important that the existence of a heterogeneity is not known or not taken into account, provided that these characters are removed from phylogenetic analyses. Any sequencing method mentioned above is then probably perfectly appropriate for determining the sequences of conserved domains. It is worth noting that the sequences of conserved domains as determined from different operons in different species do not respect the homology condition, but they are identical in practice within a clade of related organisms, probably because the rate of fixation of mutations allowed by the selective pressure is lower than the rate of homogenization through conversion (see below). As heterogeneity between the different members of the rrn family has already been described in Mycoplasma mycoides and Mycoplasma sp. (Bascuñana et al. 1994; Pettersson, Johansson, and Uhlen 1994), our results suggest that heterogeneity among rrn operons might be a common phenomenon, at least in bacteria. Because this heterogeneity increases the number of differences that can be analyzed between the rRNA sequences of any two organisms, obtaining a more robust phylogeny between closely related organisms may be considered after sequencing each operon, determining operon homology, and comparing sequences that are homologous. Although this approach may seem sound, it is probably not appropriate because of a likely loss of homology at the nucleotide level that will result from frequent conversions (discussed below).

[Figure 4 tree: topology not recoverable from the source. Legible tip labels: Escherichia coli 7; Escherichia coli 4 & 8 rrnD; Escherichia coli 1; Escherichia coli 4 (mean sequence); Escherichia coli 3; Escherichia coli 2; Escherichia coli 4 rrnE; Escherichia coli 4 rrnC; Escherichia coli 4 rrnB; Escherichia coli 4 rrnA; Escherichia sp.; Escherichia coli 4 rrnH. Legible branch annotations: 94%, **.]

FIG. 4.-Phylogenetic relationships among the seven rrn operons of Escherichia coli PK3. The topology shown is an unrooted tree obtained using a neighbor-joining algorithm. Branches also found by maximum likelihood (branches significantly positive at P < 0.01) are labeled with asterisks. Support values from a bootstrap analysis using parsimony are shown as percentages of replications above each branch (only percentages above 50% are indicated). These numbers also indicate branches retrieved in the most parsimonious tree.

The differences observed between the seven operons that we have sequenced are not located randomly over the entire SSU rRNA sequence; for each operon they are localized within a single particular domain of high evolutionary rate (fig. 3). Because two different operons can bear identical mutations all located over short domains (operons D and G between positions 71 and 100, for example), this suggests that the independent appearance of point mutations in the sequence of each operon has been obliterated by recombination events. The alternative hypothesis would be that these identical mutations at five different positions have appeared independently in two sequences. Although these mutations are compensatory mutations because they do not alter the secondary structure, it seems unlikely that five identical mutations occurred in a single variable domain for only two operons and not in the other domains nor in the other operons (see operons D and G in positions 251-280). A decisive proof that such domain identity is the result of gene recombination and not of convergent point mutations would require a statistical analysis including data from several closely related strains of E. coli, but we suggest that recombinations have occurred and that they did not affect the entire length of the SSU rRNA sequences. Finally, the very different sequences observed in the 1001-1050 domain of operon H and the 251-280 domain of operon D are very similar to some bulk sequences of species belonging to other genera.
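The nonrandom localization of the differences can be made concrete by grouping the variable positions reported in the Results into runs of neighboring sites. The sketch below is purely descriptive and is not the statistical analysis called for above; the 30-nucleotide gap threshold and the function name are arbitrary assumptions chosen for illustration.

```python
def cluster_positions(positions, max_gap=30):
    """Group positions into runs whose consecutive members are at most max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Operon-specific positions as listed in the Results for strain PK3.
reported = {
    "D and G": [80, 81, 90, 91, 94],
    "D": [251, 254, 274],
    "H": [1004, 1008, 1012, 1021, 1022, 1023, 1024, 1025, 1041],
}
all_positions = [p for group in reported.values() for p in group]

# Summarize each cluster as (first position, last position, number of sites).
domains = [(c[0], c[-1], len(c)) for c in cluster_positions(all_positions)]
print(domains)  # [(80, 94, 5), (251, 274, 3), (1004, 1041, 9)]
```

All 17 positions fall into three short windows, each associated with one operon or one pair of operons, which is the pattern the conversion hypothesis is meant to explain.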

The SSU rRNA sequence of each operon within a particular line of descent probably results from the superposition of different phenomena: (1) an initially heterogeneous multigene family when the taxon diverged, (2) fixation of random mutations, (3) gene conversions within the multigene family, and (4) lateral transfers. Gene conversions affecting short domains are suggested by our results, which show small identical domains in different operons. It is still difficult to ascertain the origin of the differences that allow identification of such recombinations within a single organism. They may result from point mutations since the establishment of the lineage, from a heterogeneous rrn family at the time of divergence, or from a lateral transfer. For E. coli, lateral transfers can occur even from distantly related organisms such as S. typhimurium (Rayssiguier, Thaler, and Radman 1989). An interesting hypothesis is that the sequences observed presently in strain PK3 are in part the result of a polymorphic rrn family in the bacterium that was the common ancestor at least of Escherichia and Salmonella, and of an absence of homogenizing conversion between some domains (3’ end of rrnH, for example) and the rest of the family during the lapse of time that separates strain PK3 from this ancestor. In this view, it can be predicted that some strains of E. coli have probably experienced a conversion and that these domains have been replaced by a typical E. coli sequence in these strains.

After the completion of homogenization within the rRNA family in a single genome, a direction of homogenization can be defined by considering which particular operonic sequence has replaced the ancestral sequences in all other operons. If the homogenization process does not involve the entire SSU rRNA sequence, different directions of homogenization can occur for different domains of the molecule; as a result, even without sequencing each operon, this phenomenon could be traced from the observation of conflicting trees when, respectively, the 5’ end or the 3’ end of mean sequences or different variable domains are used in a phylogenetic analysis. Apart from our observations, other studies have demonstrated this problem either with SSU rRNA sequences (Sneath 1993) or with the multigene families of heat-shock HSP70 (Boorstein, Ziegelhoffer, and Craig 1994) and chorion proteins (Regier et al. 1994), for which trees derived from the C-terminal domain differed from those obtained using the N termini. Inheritance of a polymorphism followed by a subsequent complete homogenization through conversions would cause all gene trees to become coherent and display robust taxa that may reflect a similarity in the directions of conversions rather than a true monophyletic assemblage of species. This is the well-known problem that phylogenetic analyses reflect the genealogy of the genes, not necessarily that of the organisms that possess the sequences in question (Nei 1987). One way to circumvent such chance phylogenetic relationships due to historical recombinations is to use another gene, such as the rRNA gene for the large subunit, which is expected to be differently homogenized by conversion, or to use a different type of gene, such as a single-copy gene not liable to gene conversion (Barcak and Wolf 1988; Nelson, Whittam, and Selander 1991; Nelson and Selander 1992; Boyd et al. 1994). However, lateral transfers have been demonstrated for many single-copy genes, resulting in conflicting phylogenies when different genes are studied (Nelson, Whittam, and Selander 1991; Nelson and Selander 1992, 1994); an example of such conflict can be observed for the position of E. vulneris in our study and that of Lawrence and collaborators (Lawrence, Ochman, and Hartl 1991). Organismal trees need to be derived from comparisons of phylogenies of several genes widely spaced on the genome in order to avoid as much as possible the effect of lateral transfers. However, hierarchical relationships and a rigid concept of species may not always apply at the gene level, because evolutionary processes are probably in part reticulate as opposed to strictly hierarchical (“modification with descent”) and taxon dependent.
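The idea of tracing partial homogenization through conflicting signal between the 5’ and 3’ ends can be sketched with a much cruder proxy than tree building: comparing which sequence is closest to a query in each segment. The sequences, names, and segment boundaries below are hypothetical, and uncorrected p-distances stand in for the distance and tree methods used in this work.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbor(query, others, start, end):
    """Name of the sequence closest to the query within columns [start, end)."""
    return min(
        others,
        key=lambda name: p_distance(query[start:end], others[name][start:end]),
    )


# Hypothetical toy alignment: the query matches X in its 5' half and Y in its 3' half.
query = "AAAAACCCCC" + "GGGGGTTTTT"
others = {
    "X": "AAAAACCCCC" + "GGTGGTATTA",
    "Y": "ATAAACGCCA" + "GGGGGTTTTT",
}

five_prime = nearest_neighbor(query, others, 0, 10)
three_prime = nearest_neighbor(query, others, 10, 20)
print(five_prime, three_prime, five_prime != three_prime)  # X Y True
```

A disagreement of this kind between segments is only a hint; with real data, separate phylogenetic analyses of each domain, with bootstrap support, would be needed before inferring a recombination event.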

Considering the large database of SSU rRNA sequences now available (Maidak et al. 1994), the ease with which these sequences are now obtained for any organism, and their wide use in bacterial systematics, it will be important to determine the rates at which gene homogenizations and lateral transfers occur in the rRNA multigene family. A precise assessment of the relations between the species tree and the gene tree will require knowledge of the respective rates of speciation, recombination, and point mutation. Finally, some care should be taken when deriving species-specific probes, especially for detection by PCR, as a single operon in one species might bear a sequence that is present in the majority of the operons in another species (see fig. 3). How often such problems are encountered is still difficult to assess, because we lack data for individual operonic sequences in closely related organisms and because we have only poor estimates of the frequencies of recombination for rrn operons (Milkman and McKane Bridges 1990, 1993; Médigue et al. 1991; Hartl 1992; Lenski 1993; Guttman and Dykhuizen 1994).
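The caution about species-specific probes suggests checking a candidate probe against every operon copy of non-target species rather than against a single mean sequence. The sketch below assumes exact substring matching on hypothetical toy sequences; real probe design would also consider mismatches, the reverse complement, and hybridization conditions, none of which are modeled here.

```python
def copies_matching_probe(probe, operons):
    """Sorted names of operon copies that contain the probe as an exact substring."""
    return sorted(name for name, seq in operons.items() if probe in seq)


# Hypothetical toy data: the probe site is shared by all copies in the target
# species, but also by a single divergent copy in a non-target species.
probe = "GGTACC"
target_species = {"rrnA": "TTGGTACCAA", "rrnB": "TTGGTACCAA"}
non_target_species = {
    "rrnA": "TTGATACCAA",
    "rrnB": "TTGATACCAA",
    "rrnH": "TTGGTACCAA",
}

print(copies_matching_probe(probe, target_species))      # ['rrnA', 'rrnB']
print(copies_matching_probe(probe, non_target_species))  # ['rrnH']
```

A mean sequence of the non-target species would show no match, yet PCR could still amplify from the single matching copy, which is the situation warned against above.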

Acknowledgments

This work was supported by the CNRS, funding from BioMerieux, and an MRT fellowship to V. Cilia. We thank Bernard Michot, Christiane Rayssiguier, Manolo Gouy, and Andrew B. Smith for reading this manuscript.

LITERATURE CITED

ASH, C., J. A. E. FARROW, S. WALLBANKS, and M. D. COLLINS. 1991. Phylogenetic heterogeneity of the genus Bacillus revealed by comparative analysis of small-subunit-ribosomal RNA sequences. Lett. Appl. Microbiol. 13:202-206.

AUSUBEL, F. M., R. BRENT, R. E. KINGSTON, D. D. MOORE, J. G. SEIDMAN, J. A. SMITH, and K. STRUHL, eds. 1991. Current protocols in molecular biology. John Wiley & Sons, New York.

BARCAK, G. J., and J. WOLF. 1988. Comparative nucleotide sequence analysis of growth-rate-regulated gnd alleles from natural isolates of Escherichia coli and from Salmonella typhimurium LT-2. J. Bacteriol. 170:372-379.

BASCUÑANA, C. R., J. G. MATTSSON, G. BOLSKE, and K. E. JOHANSSON. 1994. Characterization of the 16S rRNA genes from Mycoplasma sp. strain F38 and development of an identification system based on PCR. J. Bacteriol. 176:2577-2586.

BERCHTOLD, M., W. LUDWIG, and H. KÖNIG. 1994. 16S rDNA sequence and phylogenetic position of an uncultivated spirochete from the hindgut of the termite Mastotermes darwiniensis Froggatt. FEMS Microbiol. Lett. 123:269-274.

BOORSTEIN, W. R., T. ZIEGELHOFFER, and E. A. CRAIG. 1994. Molecular evolution of the HSP70 multigene family. J. Mol. Evol. 38:1-17.

BOYD, E. F., K. NELSON, F. S. WANG, T. S. WHITTAM, and R. K. SELANDER. 1994. Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proc. Natl. Acad. Sci. USA 91:1280-1284.

BRENNER, D. J., and S. FALKOW. 1971. Molecular relationships among members of Enterobacteriaceae. Adv. Genet. 16:81-118.

BRITSCHGI, T. B., and S. J. GIOVANNONI. 1991. Phylogenetic analysis of a natural marine bacterioplankton population by rRNA gene cloning and sequencing. Appl. Environ. Microbiol. 57:1707-1713.

BROSIUS, J., M. L. PALMER, J. P. KENNEDY, and H. F. NOLLER. 1978. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc. Natl. Acad. Sci. USA 75:4801-4805.

CASADABAN, M. J. 1976. Transposition and fusion of the lac genes to selected promoters in Escherichia coli using bacteriophage lambda and Mu. J. Mol. Biol. 104:541-555.

DE RIJK, P., J. M. NEEFS, Y. VAN DE PEER, and R. DE WACHTER. 1992. Compilation of small ribosomal subunit RNA sequence. Nucleic Acids Res. 20:2075-2089.

DEVEREUX, R., S. HE, C. L. DOYLE, S. ORKLAND, D. A. STAHL, J. LEGALL, and W. B. WHITMAN. 1990. Diversity and origin of Desulfovibrio species: phylogenetic definition of a family. J. Bacteriol. 172:3609-3619.

DOVER, G. 1982. Molecular drive: a cohesive mode of species evolution. Nature 299:111-117.

DOVER, G. 1987. DNA turnover and the molecular clock. J. Mol. Evol. 26:47-58.

FOX, G. E., J. D. WISOTZKEY, and P. J. JURTSHUK. 1992. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int. J. Syst. Bacteriol. 42:166-170.

FUHRMAN, J. A., K. MCCALLUM, and A. A. DAVIS. 1992. Novel major archaebacterial group from marine plankton. Nature 356:148-149.

GIOVANNONI, S. J., T. B. BRITSCHGI, C. L. MOYER, and K. G. FIELD. 1990. Genetic diversity in Sargasso Sea bacterioplankton. Nature 345:60-63.

GUTTMAN, D. S., and D. E. DYKHUIZEN. 1994. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266:1380-1383.

HARTL, D. 1992. Population genetics of microbial organisms. Curr. Opin. Genet. Dev. 2:937-942.

HARVEY, S., and C. W. HILL. 1990. Exchange of spacer regions between rRNA operons in Escherichia coli. Genetics 125:683-690.

HARVEY, S., C. W. HILL, C. SQUIRES, and C. L. SQUIRES. 1988. Loss of the spacer loop sequence from the rrnB operon in the Escherichia coli K-12 subline that bears the relA1 mutation. J. Bacteriol. 170:1235-1238.

HASSOUNA, N., B. MICHOT, and J. P. BACHELLERIE. 1984. The complete nucleotide sequence of mouse 28S rRNA gene. Implications for the process of size increase of large subunit rRNA in higher eukaryotes. Nucleic Acids Res. 12:3563-3583.

HILL, C. W., and B. W. HARNISH. 1981. Inversion between ribosomal RNA genes of Escherichia coli. Proc. Natl. Acad. Sci. USA 78:7069-7072.

JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism, Vol. III. Academic Press, New York.

KAHN, P. L. 1968. Isolation of high-frequency recombining strains from Escherichia coli containing the V colicinogenic factor. J. Bacteriol. 96:205-214.

KIMURA, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

KRIEG, N. R., and J. G. HOLT. 1984. Bergey’s manual of systematic bacteriology, Vol. 1. Williams & Wilkins, Baltimore, London.

LAWRENCE, J. G., H. OCHMAN, and D. L. HARTL. 1991. Molecular and evolutionary relationships among enteric bacteria. J. Gen. Microbiol. 137:1911-1921.

LAWSON, P. A., P. LLOP-PEREZ, R. A. HUTSON, H. HIPPE, and M. D. COLLINS. 1993. Towards a phylogeny of the clostridia based on 16S rRNA sequences. FEMS Microbiol. Lett. 113:87-92.

LENSKI, R. 1993. Assessing the genetic structure of microbial populations. Proc. Natl. Acad. Sci. USA 90:4334-4336.

LONG, E. O., and I. B. DAWID. 1980. Repeated genes in eukaryotes. Annu. Rev. Biochem. 49:727-764.

LOUGHNEY, K., E. LUND, and J. E. DAHLBERG. 1982. tRNA genes are found between the 16S and 23S rRNA genes in Bacillus subtilis. Nucleic Acids Res. 10:1607-1625.

MAIDAK, B. L., N. LARSEN, M. J. MCCAUGHEY, R. OVERBEEK, G. J. OLSEN, K. FOGEL, J. BLANDY, and C. R. WOESE. 1994. The ribosomal database project. Nucleic Acids Res. 22:3485-3488.

MÉDIGUE, C., T. ROUXEL, P. VIGIER, A. HENAUT, and A. DANCHIN. 1991. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. 222:851-856.

MILKMAN, R., and M. MCKANE BRIDGES. 1990. Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126:505-517.

MILKMAN, R., and M. MCKANE BRIDGES. 1993. Molecular evolution of the Escherichia coli chromosome. IV. Sequence comparisons. Genetics 133:455-468.

MIYAMOTO, K. 1993. Nucleotide sequence of 5’ flanking region of the ribosomal RNA. Unpublished material. EMBL, accession number: D15061.

NAKAYASHIKI, T., T. ADACHI, T. MIKI, and H. INOKUCHI. 1992. Nucleotide sequence of the rrnH gene from Escherichia coli. Unpublished material. EMBL, accession number: D12649.

NEI, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

NELSON, K., and R. K. SELANDER. 1992. Evolutionary genetics of the proline permease gene (PutP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. J. Bacteriol. 174:6886-6895.

NELSON, K., and R. K. SELANDER. 1994. Intergeneric transfer and recombination of the 6-phosphogluconate dehydrogenase gene (gnd) in enteric bacteria. Proc. Natl. Acad. Sci. USA 91:10227-10231.

NELSON, K., T. S. WHITTAM, and R. K. SELANDER. 1991. Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural population of Salmonella and Escherichia coli. Proc. Natl. Acad. Sci. USA 88:6667-6671.

OHTA, T. 1991. Multigene families and the evolution of complexity. J. Mol. Evol. 33:34-41.

PAITHANKAR, K. R., and K. S. N. PRASAD. 1991. Precipitation of DNA by polyethylene glycol and ethanol. Nucleic Acids Res. 19:1346.

PETTERSSON, B., K. E. JOHANSSON, and M. UHLEN. 1994. Sequence analysis of 16S rRNA from mycoplasmas by direct solid-phase sequencing. Appl. Environ. Microbiol. 60:2456-2461.


RAINEY, F. A., and E. STACKEBRANDT. 1993. 16S rDNA analysis reveals phylogenetic diversity among the polysaccharolytic clostridia. FEMS Microbiol. Lett. 113:125-128.

RAYSSIGUIER, C., D. S. THALER, and M. RADMAN. 1989. The barrier to recombination between Escherichia coli and Salmonella typhimurium is disrupted in mismatch repair mutants. Nature 342:396-401.

REGIER, J. C., B. M. WIEGMANN, R. F. LECLERC, and T. P. FRIEDLANDER. 1994. Loss of phylogenetic information in chorion gene families of Bombyx mori by gene conversion. Mol. Biol. Evol. 11:72-87.

RÖSSLER, D., W. LUDWIG, K. H. SCHLEIFER, C. LIN, T. J. MCGILL, J. D. WISOTZKEY, J. R. JURTSHUK, and G. E. FOX. 1991. Phylogenetic diversity in the genus Bacillus as seen by 16S rRNA sequencing studies. System. Appl. Microbiol. 14:266-269.

RUIMY, R., V. BREITTMAYER, P. ELBAZE, B. LAFAY, O. BOUSSEMART, and R. CHRISTEN. 1994. Phylogenetic analysis and assessment of the genera Vibrio, Photobacterium, Aeromonas, and Plesiomonas deduced from small-subunit rRNA sequences. Int. J. Syst. Bacteriol. 44:416-426.

SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SCHMIDT, T. M., E. F. DELONG, and N. R. PACE. 1991. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing. J. Bacteriol. 173:4371-4378.

SMITH, A. B., B. LAFAY, and R. CHRISTEN. 1992. Comparative variation of morphological and molecular evolution through geologic time: 28S ribosomal RNA versus morphology in echinoids. Phil. Trans. R. Soc. Lond. B 338:365-382.

SNEATH, P. H. A. 1993. Evidence from Aeromonas for genetic crossing-over in ribosomal sequences. Int. J. Syst. Bacteriol. 43:626-629.

SUZUKI, Y., and T. YAMADA. 1988. Study on rRNA genes in Mycobacterium smegmatis. Microbiol. Immunol. 32:1259-1262.

SWOFFORD, D. 1992. PAUP: phylogenetic analysis using parsimony, version 3.0s. Illinois Natural History Survey, Champaign.

WARD, D. M., R. WELLER, and M. M. BATESON. 1990. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 345:63-65.

WEISBURG, W. G., S. M. BARNS, D. A. PELLETIER, and D. L. LANE. 1991. 16S ribosomal DNA amplification for phylogenetic study. J. Bacteriol. 173:697-703.

WILLEMS, A., and M. D. COLLINS. 1993. Phylogenetic analysis of rhizobia and agrobacteria based on 16S rRNA gene sequences. Int. J. Syst. Bacteriol. 43:305-313.

PAUL M. SHARP, reviewing editor

Accepted November 8, 1995
