Proc. Natl. Acad. Sci. USA
Vol. 92, pp. 11829-11833, December 1995
Genetics
Characterization of repetitive DNA in the Mycoplasma genitalium genome: Possible role in the generation of antigenic variation
(genetic variability/adhesins/minimal genome)
SCOTT N. PETERSON*†, CAMELLA C. BAILEY‡§, JØRGEN S. JENSEN¶, MARTIN B. BORRE¶, ELIZABETH S. KING‡, KENNETH F. BOTT*‡, AND CLYDE A. HUTCHISON III*‡‖
*Curriculum in Genetics and ‡Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, NC 27599; and ¶Mycoplasma Laboratory, Neisseria Department, Statens Seruminstitut, DK-2300 Copenhagen, Denmark
Contributed by Clyde A. Hutchison III, September 7, 1995
ABSTRACT We have characterized a family of repetitive DNA elements with homology to the MgPa cellular adhesion operon of Mycoplasma genitalium, a bacterium that has the smallest known genome of any free-living organism. One element, 2272 bp in length and flanked by DNA with no homology to MgPa, was completely sequenced. At least four others were partially sequenced. The complete element is a composite of six regions. Five of these regions show sequence similarity with nonadjacent segments of genes of the MgPa operon. The sixth region, located near the center of the element, is an A+T-rich sequence that has only been found in this repeat family. Open reading frames are present within the five individual regions showing sequence homology to MgPa and the adjacent open reading frame 3 (ORF3) gene. However, termination codons are found between adjacent regions of homology to the MgPa operon and in the A+T-rich sequence. Thus, these repetitive elements do not appear to be directly expressible protein coding sequences. The sequence of one region from five different repetitive elements was compared with the homologous region of the MgPa gene from the type strain G37 and four newly isolated M. genitalium strains. Recombination between repetitive elements of strain G37 and the MgPa operon can explain the majority of polymorphisms within our partial sequences of the MgPa genes of the new isolates. Therefore, we propose that the repetitive elements of M. genitalium provide a reservoir of sequence that contributes to antigenic variation in proteins of the MgPa cellular adhesion operon.
Mycoplasmas are a class of wall-less bacteria with genomes ranging from 1700 kb to as little as 580 kb as in the case of Mycoplasma genitalium (1-4). It has been estimated that M. genitalium has the capacity to encode ~400 proteins (5, 6). Its genome is arranged in a conservative manner, making heavy use of operons and having minimal spacer regions between coding sequences (6, 7). All indications suggest that mycoplasmas are under selective pressure to reduce their genome size to that of a minimal system (8). In light of this, it was somewhat surprising that M. genitalium, as well as several other Mycoplasma species, possess repetitive DNAs in their genomes (1, 9-11). In M. genitalium it has been estimated that repetitive DNA represents up to 4% of the genome (6).
Repetitive DNA is a common feature of many bacterial genomes. In certain cases, this DNA has the clearly established function of creating antigenically distinct cells among a population (12-15). Such genetic variation may be important in evading host immune responses as well as optimizing the bacterial surface for colonization of a particular host. In other
cases, repetitive DNA has been characterized at the molecular level, but its function remains unclear.
Repetitive DNA in M. genitalium has sequence homology to the three-gene operon that encodes one of the major surface proteins, the adhesin molecule MgPa (16). MgPa is highly immunogenic and necessary for attachment of the organism to the host epithelium (17-19). Spontaneous mutants of M. genitalium that lack or are unable to correctly localize MgPa lose the ability to cause hemagglutination or cellular adhesion, a phenotype that has been associated with the ability of the organism to attach to host epithelial cells in vivo (17, 18). Dallo and Baseman (11) divided the MgPa gene into 10 restriction fragments (A-J; see Fig. 1A) and used these individual fragments to probe restriction digests of genomic M. genitalium DNA. They showed that certain regions of the MgPa gene were repeated, while other regions were present only once, in the operon itself (11).
To investigate the role of repetitive DNA in the minimal genome of M. genitalium, we have analyzed repetitive DNA sequences and compared them with sequences from the functional MgPa operon in new clinical isolates.** Our data support the idea that maintenance of these sequences by this minimal organism is required to provide a mechanism for antigenic variation.
MATERIALS AND METHODS
M. genitalium Strains. Type strain G37 was generously provided by P.-C. Hu (University of North Carolina, Chapel Hill). This strain was originally obtained from the American Type Culture Collection (no. 33530) in 1983 and was propagated in modified Hayflick's medium (30) containing agamma horse serum, 10% yeast dialysate, and penicillin G (1000 units/ml). It is referred to here as G37-US. Four new strains of M. genitalium were isolated as described (20). The G37 strain used by one of us (J.S.J.) was obtained from David Taylor-Robinson in 1982 after its initial isolation but prior to deposit into the American Type Culture Collection. This strain was propagated in modified Friis FF medium (31); it is referred to here as G37-DK.
Clones and Sequencing. Escherichia coli clones containing fragments of M. genitalium G37-US DNA in pUC118 were sequenced partially as part of a random sequencing project (6).
Abbreviation: ORF, open reading frame.
†Present address: Burroughs Wellcome, Division of Cell Biology, Research Triangle Park, NC 27709.
§Present address: Center for Vaccine Development, University of Maryland at Baltimore, Baltimore, MD 21201.
‖To whom reprint requests should be addressed.
**The sequences reported in this paper have been deposited in the GenBank data base (accession nos. U34967-U34970 and X91071-X91075 as identified in Figs. 1 and 2).
In this study we have performed a more complete analysis of some of these clones and of some clones from the same project that were not previously sequenced. DNA inserts from clones SA11, SC2, SE6, SF4, SG8, and XF2, each ~2 kb, were used to prepare sonication libraries. Sonicated DNA fragments were electrophoresed in a 1% low-melting-point agarose gel. DNA fragments of >300 nt were repaired with the Klenow fragment of DNA polymerase I by standard procedures. Repaired fragments were cloned into the HincII site of pUC118 and sequenced with the Universal primer. In a few cases, gaps in sequence were closed by use of specific oligonucleotide primers. In such cases, sequence was generally determined for only one of the two strands.
Sequence Analysis. Sequence data were analyzed with the Genetics Computer Group (Madison, WI) program package (GCG) version 7.0 (21). Sequences were read manually and entered by using the program SEQED. The Staden programs for shotgun sequencing (22) were used to assemble sequences obtained from sonication libraries. The Genetics Computer Group program GAP was used for comparing individual sequences. The programs PILEUP or LINEUP were used for multiple sequence alignments. Data base searches were performed on flanking sequences and the A+T-rich repetitive DNA region by using the program FASTA (23).
Analysis of MgPa Genes from Four New M. genitalium Isolates. Crude boiled lysates of strains G37-DK, M2288, M2300, M2321, and M2341 were subjected to PCR primed by sequences from nonrepetitive parts of the MgPa operon flanking the B region (see Fig. 1): Mgpat-1010 (5'-AAATTAGTGATGTTGTTAGTGATTGTGTG) and Mgpat-3072 (5'-TAGGGGAGTGTTGGTTAGTTTGTTAGA). Individual PCR products were digested with BstNI, and the 1008-bp fragment was recovered by electrophoresis through an agarose gel and was purified with Prep-A-Gene (Bio-Rad). The fragments were blunt-ended with phage T4 polymerase and cloned into the EcoRV site of pBluescript. Cloned fragments were sequenced with the Applied Biosystems cycle-sequencing kit with dye-terminators by using oligonucleotide sequencing primers. Usually only one representative clone was sequenced in conserved regions. In regions where variability between sequences was encountered, both strands of more than one clone were sequenced. Sequences were read by an Applied Biosystems model 373A automatic sequencer.
RESULTS AND ANALYSIS
Sequences of Repetitive Elements. Fifteen clones containing repetitive DNA homologous to the MgPa operon were identified in a random sequencing analysis of the M. genitalium genome (6). These clones contained DNA fragments that showed 71-92% nucleotide sequence identity to regions of the MgPa operon. Additional sequence was obtained for several of these clones. Extended regions of exact sequence identity or shared non-MgPa flanking sequence allowed the sequences from certain clones to be joined. The resulting nine contiguous sequences, each named for one of the clones that contributed to it, are diagrammed in Fig. 1.
Structure of a Complete Repetitive Element. The sequence SA11 contains a complete repetitive element, defined as an MgPa-like sequence flanked on both sides by sequence with no relationship to MgPa. Fig. 1B depicts the structure of this complete repetitive element sequence, 2272 bp long, with individual regions of homology to the MgPa operon shaded to match the corresponding regions of the MgPa operon shown in Fig. 1A. The restriction fragments defined in the original Dallo and Baseman study (11) are also indicated. We have chosen to retain the Dallo and Baseman nomenclature and refer to regions present in the elements by the names of the restriction fragments from which they come. For example, repetitive region B contains sequence with similarity to nt 1635-2080 of the MgPa gene, which lies almost entirely within the B restriction fragment. Our results confirm the presence of repetitive sequence in restriction fragments B, E, F, and G and the absence from fragments A, D, H, and I. Analysis of a complete repetitive element within sequence SA11 allowed us to define three additional restriction fragments downstream of the fragments tested by Dallo and Baseman in ORF-3 of the MgPa operon. We refer to these fragments as K, L, and M, and the regions found within repetitive DNA containing these sequences as KL and LM (Fig. 1).
The structure of the SA11 repeat indicates that this element is a composite of noncontiguous regions from two genes encoded by the MgPa operon, MgPa and ORF-3. The order of these regions as they appear in SA11 is B, EF, KL, G, and LM. We find an unusually A+T-rich sequence (20% G+C) between the EF and KL regions; this sequence shows no apparent sequence similarity to the MgPa operon. During random sequencing of this genome, this region was never encountered out of the context of the repetitive DNAs (6). The A+T richness of this DNA is expected in sequence that is no longer functional; such sequence is free to accumulate A and T nucleotides because of the mutational pressure thought to be acting in M. genitalium (6, 8).
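A base-composition figure such as the 20% G+C quoted above can be computed directly from a sequence. The sketch below is illustrative only: the function names, the handling of ambiguous bases (ignored), and the window parameters are assumptions, and the example strings in the usage note are toy sequences rather than data from this study.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def windowed_gc(seq, window, step):
    """G+C fraction in successive windows, as (start, fraction) pairs.

    Start coordinates are 0-based; a trailing partial window is skipped.
    """
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

For example, `gc_fraction("AATTAATTGC")` gives 0.2. Scanning an element with `windowed_gc` would be one way to locate an A+T-rich stretch such as the one between EF and KL, since its G+C fraction should fall well below that of the flanking MgPa-like regions.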
Partial Element Sequences. Analysis of other partial repetitive element sequences gave a minimum estimate of five distinct repetitive elements in the M. genitalium genome. The sequences in Fig. 1 show five different nonrepetitive sequences flanking B regions of repeats. These partial sequences allowed us to define some common structural features of repetitive sequences (Fig. 1 B and C). Several of the structural features seen in SA11 were present on all of the elements that we characterized. Five elements begin with a B region. We have four examples of B followed by EF. An A+T-rich region follows EF in three cases. We encountered an exception to the structure of the complete element (SA11) in the clone HSA7. This does not contain an A+T-rich region downstream of EF. Four elements contain A+T-rich sequence followed by KL. The complete element sequence is our only example of the KL-G-LM arrangement. Finally, we have three examples ending with region LM.
The exact endpoints of each region within the repeat vary somewhat (Fig. 1). The approximate nucleotide ranges corresponding to repeated DNA sequence are as follows: B (nt 1651-2082), EF (nt 3359-3942), G (nt 4368-4553), KL (nt 5527-6283), and LM (nt 6788-6938) (numbers refer to the MgPa operon sequence found in GenBank accession no. M31431). It should be noted that, although a very small number of nucleotides from restriction fragments C and J (42 and 31, respectively) are found in repetitive elements, we have not included these fragments in the names given to repetitive sequence regions.
Coding Potential of the Repeats. A pairwise alignment was
was
performed on the repetitive DNA sequences. In each case,
thewindow of comparison was defined as the boundary of a
region(B-M). This comparison revealed several noteworthy featuresas
illustrated by the alignment shown in Fig. 2.ORFs are maintained
throughout individual regions that
share a high level of amino acid sequence similarity with
theMgPa or ORF-3 proteins. Deletions and insertions (withrespect to
the published MgPa gene sequence) that occur in thesequence of the
repetitive DNAs are almost always in multiplesof three nucleotides
(Fig. 2). In general, we find stop codonsnear the beginning and end
of the B, EF, KL, and LM regions.No methionine codons or other
start codons are found nearthe 5' end of the ORFs contained by the
various regions of theelements. Because these sequences lack
obvious translationinitiation signals and in view of the high
mutation rate thoughtto be acting in M. genitalium (8), it was
unexpected to findORFs throughout almost all of the individual
regions (the onlyexception is in region KL of sequence SG8). Even
morestriking was the maintenance of high levels of sequence
[Fig. 1: diagram in three panels. (A) MgPa operon with three genes (labeled 29kD, MgPa, and ORF-3) divided into restriction fragments A-M; site coordinates shown: 835, 1380, 2040, 2573, 3174, 3805, 4285, 4770, 5195, 5537, 5960, 6815, and 7080. (B) Complete element SA11 with regions B, EF, AT, KL, G, and LM. (C) Partial elements SC2, SG8, ESH10, X5, HSA7, HB7, SD3A, and HG4.]
FIG. 1. Structures of MgPa-related repetitive elements. The sequence of the MgPa operon (GenBank accession no. M31431) is compared diagrammatically to nine homologous repetitive element sequences. Accession numbers of repetitive sequences are listed in parentheses following the names of the contributing genomic clones. Sequences based upon data from more than one clone are named for the clone listed first: SA11, SE6, and SF4 (U34967); SC2 and XF2 (U34968); SG8 and ESA4 (U34969); ESH10 and HSD3 (U34970); X5 (U01810); HSA7 and HE3 (U01766); HB7 (U02105); SD3A (U02157); HG4 (U02110). (A) Schematic representation of the three-gene MgPa operon. Divisions (A-J) represent the restriction fragments described by Dallo and Baseman (11). Positions of restriction sites used by Dallo and Baseman are indicated by vertical arrows. The coordinate of each site on the MgPa operon sequence is also shown. Sites defining three additional fragments, K, L, and M, are also indicated. Five shaded boxes indicate the regions that are found in repetitive DNA elements. These are named B, EF, KL, G, and LM. Positions of the PCR primers used in the analysis of new isolates are indicated by horizontal arrows. ORF-3, open reading frame 3. (B) The structure of a complete repetitive element found in sequence SA11. Regions of sequence similarity between repetitive elements and the operon are indicated by identical shading. Flanking sequences with no similarity to the MgPa operon are indicated by open boxes. Coordinates of the regions of homology within the MgPa sequence are shown above the junctions between regions. Letters below each box indicate regions of homology to the MgPa operon (B, EF, KL, G, and LM) and the A+T-rich region of the repeat (AT). (C) Schematic representation of eight partial repetitive element sequences analyzed in this study. Boxes are shaded as in B to indicate different regions of homology with the MgPa operon or are open to indicate nonrepetitive flanking sequence. The partial elements are aligned to illustrate structural similarities to the SA11 sequence. Coordinates of the regions of homology within the MgPa sequence are shown as in B. The positions of single base insertions and deletions within sequence SG8 are indicated by (+1) and (-1) and the corresponding position in the homologous region of the MgPa operon sequence is given.
identity to the corresponding region of the MgPa protein (amino acid sequence identities range from 66% to 98%). Since Mycoplasma promoter sequences are not well characterized, we cannot rule out the possibility that these elements are transcribed from their own promoters or in a promiscuous fashion from neighboring promoters. However, it is an attractive hypothesis that these sequences are not directly expressed as protein but are under indirect selective pressure to maintain ORFs with sequence similarity to the MgPa protein.
Second, sequence polymorphisms are shared between two or more elements over varying lengths of sequence. Particularly noteworthy is the fact that polymorphisms involving deletions and insertions are shared between repetitive DNAs (Fig. 2). When we compared pairs of repetitive elements in a particular region, the two elements most closely related to one another with respect to both sequence identity and length polymorphisms changed over the length of that region. Third, some elements shared 100% nucleotide sequence identity over regions as long as 107 bp, and repetitive sequence SC2 is identical to the MgPa operon sequence for 333 bp. Given the overall relatedness of these sequences and assuming a random distribution of mutational events, the absence of even third position changes is remarkable. All of these features strongly imply that recombination events occur between repetitive DNAs and likewise between repetitive DNAs and the MgPa coding sequence.
Analysis of the MgPa Operon B Region Sequence from Four New Isolates of M. genitalium. To obtain additional evidence concerning the role of repetitive sequences in genetic variation of the MgPa operon, we made use of four new M. genitalium strains isolated from urethral specimens (by J.S.J.) after primary culture in Vero monkey kidney cells. A segment of the MgPa gene from these strains containing the B region was amplified by PCR. Oligonucleotide primers were derived from nonrepetitive sequences flanking the B region, so that the functional MgPa gene would be amplified and the repeated sequences would not (see Fig. 1). Restriction enzyme analyses of PCR-amplified sequences from the four new isolates and other clinical specimens showed clear differences in their MgPa genes (24). The cloned BstNI fragments generated by cleavage of the PCR product encompassing nt 1320-2327 from the MgPa operons of the four new strains were sequenced. Comparison of these sequences to one another showed large regions of sequence identity flanking a variable region of ~380
[Fig. 2: nucleotide alignment of a portion of the B region. Rows: repeats ESH10, SA11, X5, SG8, and SC2 from G37-US, followed by the MgPa operons of M2288, M2341, M2300, M2321, G37-DK, and G37-US.]
FIG. 2. Sequence alignment of a portion of the B region of five repetitive element sequences from G37-US (ESH10, SA11, X5, SG8, SC2) and MgPa genes from six strains. The G37-US and repetitive element sequences are the same as in Fig. 1. The new MgPa gene sequence GenBank accession numbers follow the strain name in parentheses: M2288 (X91071); M2300 (X91072); M2321 (X91073); M2341 (X91074); G37-DK (X91075). Identities to bases present in G37-US are represented by dots, and differences with respect to the sequence of G37-US by the corresponding bases. Dashes represent the absence of bases and are added for alignment of the sequences.
nt spanning positions 1710-2080. This corresponds closely to the B region found in the repetitive DNA elements. In the variable region, more than one clone derived from each strain was sequenced to exclude the possibility that reassortment of sequences between the MgPa gene and the repeated sequences was occurring as an artifact of the PCR. Several differences within the B-region DNA sequence were identified between the MgPa gene from the Danish stock of the laboratory strain (G37-DK) and the published MgPa sequence from stocks maintained in Chapel Hill (G37-US). Our analyses of the repetitive elements of M. genitalium were performed on DNA isolated from the same stock from which the sequence of the published MgPa gene was determined (G37-US) (16).
Almost the entire sequence of the MgPa gene B regions of five strains (G37-DK, M2288, M2300, M2321, and M2341) could be generated by recombination between small portions of several of the repetitive element B regions sequenced from G37-US (Fig. 3). The sequence differences found in the B region DNA sequence of the G37-DK and G37-US MgPa genes, spanning nt 1997-2010, can be explained by a single recombination event involving the B region of the repetitive element contained on clone X5. The sequence similarity between X5 and both G37 strains extends 100 nt 5' and 40 nt
3' to the polymorphic region, making it impossible to identify the exact endpoint of the putative sequence exchange.
DISCUSSION
Previous estimates from an analysis of a complete cosmid library suggested evidence for at least seven repetitive elements with sequence homology to the MgPa operon within the M. genitalium genome (3). Here we have characterized at least five examples of this DNA, including one complete repetitive unit, which share several structural features. In M. genitalium G37, repetitive elements appear to be ~2200 nt long and to be a composite of noncontiguous regions of the MgPa adhesin operon. In all cases examined, we saw an extremely A+T-rich sequence (with no apparent homology to MgPa or any other sequences from M. genitalium) present between the EF and KL regions. This A+T-rich region averaged 266 nt in length and possessed several translational stop codons in all frames.
The genomic sequence of M. genitalium has been obtained since the work described in this paper was performed and will allow a detailed global analysis of the structure of this family of repeated sequences (25).
[Fig. 3: block diagram of the MgPa B regions of M2321, M2300, M2341, M2288, G37-DK, and G37-US. Shading key: ESH10, SA11, X5, SC2, G37-US, M2321, M2300, M2341, M2288.]
FIG. 3. Schematic representation of the MgPa operon B regions from the four new M. genitalium strains and G37-US, showing sequence similarities between the sequences present in the operon and in the repetitive DNA sequences. Different shadings represent repetitive DNA sequences or strains of M. genitalium, as shown in the key. Regions of 100% sequence identity between repetitive elements and MgPa genes of various strains are indicated by identical shading. Blocks of white represent regions of the particular MgPa gene that were not identical to any of the elements we have sequenced or other MgPa B region sequence. In each case the repetitive element whose sequence showed the longest stretch of sequence identity with the operon was chosen. Block sizes are multiples of 10 nt.
Our sequence data strongly suggest that five regions of the MgPa operon found in repetitive elements (B, EF, KL, G, and LM) recombine with themselves and the MgPa operon, both in the laboratory and in vivo. These conclusions are based upon (i) the wide range of similarities exhibited between pairs of repetitive elements over the length of their entire sequence; (ii) the maintenance of ORFs in individual regions of the elements; (iii) the occurrence of deletions and insertions in multiples of three nucleotides; (iv) our inability to suggest a possible mode for the translation of these sequences, suggesting some indirect selective pressure for the maintenance of ORFs with similarity to the MgPa sequence; and (v) the confirmation that four new isolates of M. genitalium and an independently maintained stock of the laboratory strain G37-DK show polymorphisms within the B region of their respective MgPa genes. Recombination events involving the repetitive elements and the MgPa operon could have created these sequences, thus providing a potential mechanism for antigenic variation of the surface proteins encoded by the adhesin operon. This idea is further supported by the fact that at least some of the non-repeated regions of MgPa genes are highly or perfectly conserved among different strains.
It is of additional interest that the MgPa protein was shown to be immunodominant in chimpanzees which had been infected with M. genitalium (19). Opitz and Jacobs (26) mapped epitopes of adherence-inhibiting monoclonal antibodies to overlapping eight-residue synthetic peptides designed from the MgPa protein sequence. Four of these epitopes mapped within regions of the protein encoded by the B and EF regions, suggesting that these regions may be surface-exposed and important in adherence (26). The ability of M. genitalium to vary the sequence of these regions could enable this organism to survive the host immune response.
Neisseria gonorrhoeae exhibits genetic variation in its pilin, which functions as an adhesin, by a mechanism similar to the one we are proposing for the MgPa operon of M. genitalium (15). However, these systems differ in several ways: (i) the proteins encoded by the MgPa operon are not pilins; (ii) "silent copies" of the pilin gene in N. gonorrhoeae are found in tandem arrays, whereas repeats are scattered in the M. genitalium genome; and (iii) silent copies in N. gonorrhoeae represent a contiguous segment of the pilin gene, whereas the M. genitalium repeat is composed of several regions that are noncontiguous in the MgPa operon. The fact that only selected regions of the adhesin operon are found in the repetitive DNA of M. genitalium rather than larger truncated gene duplications may be a reflection of selective pressure on this organism to minimize its genome size.
Our findings suggest that two genes of M. genitalium, MgPa and ORF-3, are subject to genetic variation by the mechanism proposed here. Although no direct evidence that the ORF-3 product functions in cellular adhesion has been reported, it should be noted that the Mycoplasma pneumoniae homolog, ORF-6 in the P1 adhesin operon, has been implicated in the cellular adhesion process (27). The adhesin operon of M. genitalium is closely related to the P1 adhesin operon of M. pneumoniae by several different criteria (28). M. pneumoniae also contains repetitive DNA in the form of truncated copies of the P1 adhesin operon, but no evidence for composite elements similar to those described here has been reported. Recent evidence suggests that the ORF-6 gene of the P1 adhesin operon may be undergoing gene conversion events with sequences present in one type of repetitive element (29). Since both species maintain DNA in the form of truncated copies of their adhesin operons, and these sequences seem to undergo recombination events with the functional operons, it appears that the ability to alter the appearance of adhesin proteins is necessary for survival.
Our finding that M. genitalium repeated sequences appear to be involved in genetic variation of the adhesin MgPa helps to rationalize the presence of repeats in a minimal genome. This organism is extremely fastidious as a direct consequence of its small genome size and in nature must obtain essential nutrients from its mammalian host. It seems plausible that an efficient mechanism to optimize cellular adhesion and to evade the host immune response is therefore a necessary part of such a genome.
Note Added in Proof. Comparison of our data to the complete sequence of M. genitalium indicates that the sequence SC2 was derived from a clone that is a recombinant between the MgPa operon and the MgPa repeat at positions 167174-169797 in the genome (25). Genomic coordinates of the sequences presented here are available on request (send electronic mail to [email protected]).
C.C.B. and S.N.P. contributed equally to this manuscript. We thank Janne Cannon and Kevin Dybvig for helpful comments. This work was supported by National Institutes of Health Grants A108998 and GM21313 to C.A.H. and A133161 to K.F.B.
1. Colman, S. D., Hu, P.-C., Litaker, W. & Bott, K. F. (1990) Mol. Microbiol. 4, 683-687.
2. Krawiec, S. & Riley, M. (1990) Microbiol. Rev. 54, 502-539.
3. Lucier, T. S., Hu, P.-Q., Peterson, S. N., Song, X.-Y., Miller, L., Heitzman, K., Bott, K. F., Hutchison, C. A., III, & Hu, P.-C. (1994) Gene 150, 27-34.
4. Su, C. J. & Baseman, J. B. (1990) J. Bacteriol. 172, 4705-4707.
5. Morowitz, H. J. (1984) Isr. J. Med. Sci. 20, 750-753.
6. Peterson, S. N., Hu, P.-C., Bott, K. F. & Hutchison, C. A., III (1993) J. Bacteriol. 175, 7918-7930.
7. Bailey, C. C. & Bott, K. F. (1994) J. Bacteriol. 176, 5814-5819.
8. Maniloff, J. (1992) in Phylogeny of Mycoplasmas, eds. Maniloff, J., McElhaney, R. N., Finch, L. R. & Baseman, J. B. (Am. Soc. Microbiol., Washington, DC), pp. 549-559.
9. Ferrell, R. V., Heidari, M. B., Wise, K. S. & McIntosh, M. A. (1989) Mol. Microbiol. 3, 957-967.
10. Wenzel, R. & Herrmann, R. (1988) Nucleic Acids Res. 16, 8337-8350.
11. Dallo, S. F. & Baseman, J. B. (1991) Microb. Pathog. 10, 475-480.
12. Haas, R. & Meyer, T. F. (1986) Cell 44, 107-115.
13. Meier, J. T., Simon, M. I. & Barbour, A. G. (1985) Cell 41, 403-409.
14. Moxon, E. R., Rainey, P. B., Nowak, M. A. & Lenski, R. E. (1994) Curr. Biol. 4, 24-33.
15. Robertson, B. D. & Meyer, T. F. (1992) Trends Genet. 8, 422-427.
16. Inamine, J. M., Loechel, S., Collier, A. M., Barile, R. M. & Hu, P.-C. (1989) Gene 82, 259-267.
17. Collier, A. M., Carson, J. L., Hu, P.-C., Hu, S. S., Huang, C. H. & Barile, M. F. (1990) Zbl. Bakt. S20, 730-732.
18. Mernaugh, G. R., Dallo, S. F., Holt, S. C. & Baseman, J. B. (1993) Clin. Infect. Dis. 17, S69-78.
19. Tully, J. G., Taylor-Robinson, D., Rose, D. L., Furr, P. M., Graham, C. E. & Barile, M. F. (1986) J. Infect. Dis. 153, 1046-1054.
20. Jensen, J. S., Hansen, H. T. & Lind, K. (1994) Int. Organ. Mycoplasmol. Lett. 3, 143-144 (abstr.).
21. Devereaux, J., Haberli, P. & Smithies, O. (1984) Nucleic Acids Res. 12, 387-395.
22. Staden, R. (1982) Nucleic Acids Res. 10, 4731-4751.
23. Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448.
24. Jensen, J. S. (1994) Int. Organ. Mycoplasmol. Lett. 3, 429-430 (abstr.).
25. Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., et al. (1995) Science 270, 397-403.
26. Opitz, O. & Jacobs, E. (1992) J. Gen. Microbiol. 138, 1785-1790.
27. Layh-Schmitt, G., Hilbert, H. & Pirkl, E. (1995) J. Bacteriol. 177, 843-846.
28. Razin, S. & Jacobs, E. (1992) J. Gen. Microbiol. 138, 407-422.
29. Ruland, K., Himmelreich, R. & Herrmann, R. (1994) J. Bacteriol. 176, 5202-5209.
30. Hayflick, L. (1965) Tex. Rep. Biol. Med. 23, 285-303.
31. Jensen, J. S., Ørsum, R., Dohn, B., Uldum, S., Worm, A.-M. & Lind, K. (1993) Genitourin. Med. 69, 265-269.