
THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1994 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 269, No. 46, Issue of November 18, pp. 28803-28808, 1994 Printed in U.S.A.

The Mouse Gene Coding for High Mobility Group 1 Protein (HMG1)*

(Received for publication, July 27, 1994, and in revised form, August 29, 1994)

Simona Ferrari, Lorenza Ronfani, Sabina Calogero, and Marco E. Bianchi¶

From DIBIT, San Raffaele Scientific Institute, via Olgettina 58, 20132 Milano, Italy and the Dipartimento di Genetica e di Biologia dei Microrganismi, Università di Milano, via Celoria 26, 20133 Milano, Italy

We have isolated an active gene encoding the mouse HMG1 protein among a multitude of cross-hybridizing sequences, which most likely are retrotransposed pseudogenes. The hmg1 gene contains five exons, of which the first is not translated, and the last contains a long 3′-untranslated sequence and three alternative polyadenylation sites. We found no evidence for a sequence encoding a membrane localization signal in the hmg1 gene, despite the presence of HMG1 protein on the surface of several cell types. The hmg1 promoter coincides with a CpG island, contains no TATA sequence, and drives the expression of reporter genes placed under its control. The hmg1 gene may be a member of a family of closely related genes but appears to be the major or the only active gene coding for HMG1 protein.

High mobility group 1 protein (HMG1)¹ is a very abundant and highly conserved chromatin protein, which is present in all vertebrate nuclei. HMG1-like proteins exist in invertebrates, yeast, protozoa, and plants and are probably present in all eukaryotic cells (reviewed in Refs. 1 and 2). The ubiquity and sequence conservation of HMG1-like proteins suggest that they perform fundamental functions; roles in DNA replication, chromatin assembly, and transcription have been proposed but so far have not been proved unequivocally.

In vitro, mammalian HMG1 and its two DNA-binding domains bind with low affinity and no specificity to single-stranded, linear duplex, and supercoiled DNA (3, 4). They also bind with high specificity and in a sequence-independent manner to DNA containing a sharp bend or kinks (5-7). More generally, HMG1 has the ability to introduce bends or kinks into linear DNA and therefore is functionally (but not structurally) similar to the prokaryotic proteins HU and IHF (reviewed in Ref. 8). The main role of HMG1 may be to facilitate the formation of specific nucleoprotein complexes (9) and perhaps to modulate the structure of chromatin (10).

Finally, HMG1 has been shown to be present on the surface of neurons and other cell types (11), where it is probably bound to the polysaccharide moiety of proteoglycans and may play roles in adhesion and tissue remodeling. Several cell types also display on their surface other proteins that differ from HMG1 only by a few amino acids (12). Thus, the hmg1 gene may be a member of a gene family.

* This work was supported by Telethon Grant A.07 and funds from Istituto Superiore di Sanità, Progetto AIDS 1994 (to M. E. B.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¶ To whom correspondence should be addressed: DIBIT, via Olgettina 58, 4° piano A2, 20132 Milano, Italy. Tel.: 39-2-26434780; Fax: 39-2-26434767; E-mail: [email protected].

¹ The abbreviations used are: HMG, high mobility group; HMG1-R, HMG1-related; kb, kilobase(s); nt, nucleotide(s); PCR, polymerase chain reaction; RSV, Rous sarcoma virus.

Despite this wealth of biochemical information, the analysis of the physiological role of HMG1 has been impaired by the lack of mutations in the gene coding for it. In fact, whereas HMG1 cDNAs have been cloned from a variety of sources (1), the genomic locus coding for HMG1 had not been identified so far. In this paper we describe the identification and the organization of the mouse hmg1 gene.

MATERIALS AND METHODS

Oligonucleotides and Enzymes-All oligonucleotides were purchased from Genset. DNA modification and restriction enzymes were from Boehringer Mannheim, Promega, and New England Biolabs.

PCR-Intron 3 and intron 4 of gene hmg1 were obtained by PCR on genomic DNA from mouse NIH3T3 cells, using the following oligonucleotides: INT3for (coding strand), 5′-ACCCAAGAGGCCTCCG-3′; INT3rev (non-coding strand), 5′-CAAGAAGAAGGCCGAACT-3′; INT4for (coding strand), 5′-GGAGAAGTATGAGAAGGT-3′; INT4rev (non-coding strand), 5′-TGTAGGCAGCAATATCC-3′. PCR mixtures (50 µl) contained 50 pmol each of oligonucleotides INT3for and INT3rev for amplification of intron 3 and oligonucleotides INT4for and INT4rev for amplification of intron 4, 0.2 mM dNTPs, 5 µl of Taq polymerase 10× buffer, 20 ng of mouse genomic DNA, and 1 unit of Taq polymerase (Promega). Thirty-five cycles of denaturation (60 s at 94 °C), annealing (60 s at 55 °C for INT3 and 60 s at 51 °C for INT4), and extension (60 s at 72 °C) were performed on a Perkin-Elmer instrument.
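As a rough cross-check of the annealing temperatures quoted above, the following sketch estimates primer melting temperatures with the Wallace rule, Tm = 2(A+T) + 4(G+C), and converts the primer amount into a final concentration. The Wallace rule is an assumption made here for illustration; the paper does not say how the annealing temperatures were chosen, and the function names are hypothetical. The primer sequences are those listed in the text.

```python
# Illustrative only: the Wallace rule is a common rule of thumb for short
# oligonucleotides, not a method stated in the paper.
PRIMERS = {
    "INT3for": "ACCCAAGAGGCCTCCG",
    "INT3rev": "CAAGAAGAAGGCCGAACT",
    "INT4for": "GGAGAAGTATGAGAAGGT",
    "INT4rev": "TGTAGGCAGCAATATCC",
}


def wallace_tm(seq):
    """Approximate melting temperature (degrees C): 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def final_concentration_um(amount_pmol, volume_ul):
    """1 pmol per microliter equals 1 micromolar."""
    return amount_pmol / volume_ul


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, Wallace Tm ~ {wallace_tm(seq)} C")

# 50 pmol of each primer in a 50-microliter reaction
print("primer concentration (uM):", final_concentration_um(50, 50))
```

Under this rule of thumb the INT3 pair comes out slightly higher than the INT4 pair, which is at least consistent with the higher annealing temperature used for INT3.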

Isolation of Genomic hmg1 Clones-10⁶ phage plaques of the 129SV mouse genomic library in the λ-FIXII vector (5 × 10⁵ primary recombinant phages, insert size of 9-22 kb, provided by Stratagene) were screened initially with the 3′-untranslated sequence of the mouse HMG1 cDNA, obtained by PCR and labeled by random priming. The same number of plaques was later screened with two probes obtained from PCR products INT3 and INT4. Fragments from the positive clones were subcloned into Bluescript KS(+) and sequenced with T7 DNA polymerase (Pharmacia Biotech Inc.).

Plasmid Construction-To obtain the pHMG1-neo plasmid, the 4.5-kb NsiI-NsiI fragment (containing a 2-kb region upstream of the transcription start site, exon 1, intron 1, and a part of exon 2) and a 2-kb XhoI-XhoI fragment (containing the aph gene from Tn5 transposon) were cloned into the pBlueScript KS(+) vector. This plasmid expresses a chimeric protein in which the first 2 residues of the bacterial aph gene product are substituted by 15 amino acids from HMG1 protein, followed by 11 amino acids coded by polylinker sequences. The pΔBamHI-neo and the pΔEcoRI-neo constructs were obtained from pHMG1-neo by internal deletion of a 1.5-kb BamHI-BamHI fragment and a 3-kb EcoRI-EcoRI fragment, respectively.

Cell Culture and Transfection-The mouse NIH3T3 cell line was grown in high glucose Dulbecco's modified Eagle's medium, supplemented with 10% newborn calf serum and antibiotics. Cells were transfected by calcium phosphate co-precipitation in 6-cm dishes as indicated in the legend to Fig. 5.

RNA Extraction, Primer Extension, and RNase Protection-Total RNA was prepared as described by Chomczynski and Sacchi (13). For primer extension, the ³²P-labeled oligonucleotide MnlI (5′-GCCTCTCGGCTTCTTAG-3′) was hybridized to 20 µg of total RNA extracted from NIH3T3 cells in 10 µl of hybridization buffer containing 10 mM Tris-Cl, pH 7.5, 2 mM EDTA, and 60 mM NaCl. After 3 h of hybridization at 42 °C, 40 µl of reverse transcriptase reaction mixture containing 10 mM Tris-Cl, pH 8.4, 5 mM MgCl₂, 10 mM dithiothreitol, 1 mM dNTPs, 20 units of RNase inhibitor, and 12.5 units of avian myeloblastosis virus reverse transcriptase (Boehringer Mannheim) were added to the samples, and incubation was continued for 30 min at 42 °C. Reactions were stopped


FIG. 1. Southern blot analysis of mouse genomic DNA. High molecular weight DNA from mouse teratocarcinoma F9 cells was digested with HindIII (lanes 1 and 3) and BamHI (lanes 2 and 4), separated in a 0.8% agarose gel, transferred to a GeneScreen Plus membrane, and probed with ³²P-labeled full-length HMG1 cDNA (lanes 1 and 2) or with the INT4 fragment (lanes 3 and 4).

by the addition of 1 µl of 0.5 M EDTA; reverse transcripts were ethanol-precipitated and run on a 6.5% sequencing gel. For RNase protection, total RNAs extracted from NIH3T3 cells 48 h after transfection were treated with RNase-free DNase (Boehringer Mannheim) and analyzed with a ribonuclease protection assay kit (Ambion). Approximately 10⁵ cpm of riboprobe (obtained by in vitro transcription of a 1.3-kb NotI-PstI fragment derived from the pHMG1-neo plasmid) was incubated overnight at 45 °C with 10 µg of total RNA. Following RNase treatment, protected fragments were separated on a 6.5% sequencing gel.

RESULTS AND DISCUSSION

The Mouse Genome Contains a Large Number of Sequences Similar to the HMG1 cDNA-The presence of several sequences with homology to the HMG1 cDNA has been previously reported in human and pig genomes (14, 15). Southern blot analysis using as probes different regions of the rat HMG1 cDNA (HMG1 box A and the 3′-untranslated region) confirmed the presence of many HMG1-related sequences in mouse, rat, and hamster (Fig. 1 and data not shown). We reasoned that hmg1 and related genes might be distinguished from processed retropseudogenes by the presence of introns. We performed polymerase chain reactions on total mouse genomic DNA using as primers several pairs of oligonucleotides corresponding to the mouse cDNA sequence (16), but despite the extensive variation of reaction conditions the PCR products were always identical to those obtained using the cDNA clone as template (data not shown). We concluded that the mouse genome must contain a large number of processed retropseudogenes that compete with the putative intron-containing gene for amplification or that the presumptive introns are too long for efficient amplification.

We then screened a phage bank of mouse genomic DNA for plaques hybridizing to the mouse cDNA sequence and repeated the PCR reactions with several pairs of primers on the material picked from the positive plaques. We obtained 240 positive plaques from 4 genome equivalents, but none of them yielded PCR products longer than the cDNA controls. We continued the analysis only on 15 plaques that contained both ends of the cDNA sequence and gave no PCR products on amplification. We obtained partial sequences from 9 phages; all were highly similar to the cDNA sequence but contained no intron. Seven of them cannot code for HMG1 protein because they contain premature stop codons or frameshifts. Phage λ-154 contains no stop codon or frameshift, but its conceptual translation differs from HMG1. Phage λ-227 contains a sequence potentially coding for HMG1, but we did not verify whether this particular clone corresponded to the hmg1 gene because an alternative strategy had yielded a more promising candidate (see below).
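The kind of check that rules a pseudogene out as a coding sequence can be sketched as follows: scan the reading frame for stop codons before the final codon, and test whether a net insertion or deletion shifts the frame. This is a minimal illustration, not the authors' procedure; the function names and the short toy sequences (which only mimic the first codons of an open reading frame) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds):
    """Return 0-based codon indices of in-frame stops before the last codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def shifts_frame(reference, candidate):
    """True if the net length difference is not a multiple of three."""
    return (len(candidate) - len(reference)) % 3 != 0


# Hypothetical toy sequences, not taken from the paper.
reference = "ATGGGCAAAGGAGATCCTAAGTAA"       # intact frame, terminal stop only
nonsense = "ATGGGCTAAGGAGATCCTAAGTAA"        # third codon mutated to TAA
one_base_deletion = "ATGGGCAAGGAGATCCTAAGTAA"  # one A deleted

print(premature_stops(reference))                  # no premature stop
print(premature_stops(nonsense))                   # stop at codon index 2
print(shifts_frame(reference, one_base_deletion))  # frame is shifted
```

A sequence passing both checks, like the one in phage λ-154, can still differ from HMG1 at the protein level, so a translation comparison would be needed as a further step.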

Isolation and Characterization of the hmg1 Gene-Recently, Shirakawa and Yoshida (17) reported the cloning of the functional human gene coding for HMG2 protein, which is closely related to HMG1. We supposed that genes that belong to the same family and that show high sequence similarity might have a conserved intron-exon organization, as is the case for the globin family. We then decided to test whether the presumptive hmg1 gene contained introns in the same position as the hmg2 gene.

We designed two pairs of oligonucleotides with sequences corresponding to the HMG1 cDNA sequence flanking the presumptive intron 3 and intron 4 and ending at their 3′ terminus with one or two bases corresponding to the canonical splicing junction 5′-GT and 3′-CT. PCR reactions on genomic mouse DNA yielded two fragments of about 350 and 850 base pairs, which we named INT3 and INT4, respectively. Control PCRs performed on the cDNA clone gave no amplified band. Direct sequencing of the 350-base pair fragment showed that it had features characteristic of introns (AT-rich sequence, lack of any ORF, and presence of a 3′-splicing consensus site). Finally, a PCR reaction performed with the primers 5′ to the presumptive intron 3 and 3′ to the presumptive intron 4 gave an amplified band of 1.4 kb, indicating that INT3 and INT4 are colinear on the mouse genome.

Evidence that INT3 and INT4 fragments were represented by a single-copy sequence in the mouse genome was provided by Southern blot analysis of mouse genomic DNA, shown in Fig. 1. We then isolated from the phage mouse genomic library four positive clones (HMG1-Φ1 to HMG1-Φ4), which hybridized to both INT3 and INT4 probes and had overlapping restriction maps (Fig. 3). HindIII and NotI fragments from phages HMG1-Φ1 and HMG1-Φ4 were subcloned in plasmid pBlueScript KS(+).

Fig. 2 shows the sequence we obtained, which includes the entire 5′-untranslated region and the coding region of the mouse HMG1 cDNA (16). The cDNA sequence contains four base substitutions, one of which causes the conservative replacement of a glutamic acid for an aspartic acid residue in the acidic tail of HMG1 protein. These sequence divergences most probably represent genetic polymorphisms between inbred mouse strains; the cDNA was derived from P19 cells (corresponding to the C3H strain), while our gene was isolated from a bank made with the DNA of 129SV mice.

Structure of the Mouse hmg1 Gene-The hmg1 gene contains five exons, as indicated by the comparison of the cDNA and genomic sequences (Fig. 3). An untranslated first exon falls in a region of very high C and G content with the features of a CpG island. Exon 2, which is located about 2.5 kb 3′ to exon 1, contains the translation start site. The DNA binding domain A is encoded by exon 2 and exon 3, and the DNA binding domain B is encoded by exons 3, 4, and 5. The relative positions of the introns within the segments coding for the two HMG boxes are different, which is somewhat unexpected if one supposes that the vertebrate hmg1 gene arose by internal duplication of an ancestral gene containing a single HMG box, similar to the modern gene for HMG1-like proteins in lower eukaryotes, plants, and insects (18-22). The terminal acidic tail is encoded by exon 5, which is the longest and contains a long untranslated region and multiple polyadenylation sites.
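The statement that exon 1 lies in a CpG island can be made concrete with two simple sequence statistics: the G+C fraction and the ratio of observed to expected CpG dinucleotides. The sketch below uses the widely cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, G+C above 50%, observed/expected CpG above 0.6); these thresholds are an assumption for illustration, since the paper does not state which criteria were applied, and the test sequences are synthetic.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    observed = seq.count("CG")          # CpG dinucleotides on this strand
    expected = c_count * g_count / n    # expectation from base composition
    ratio = observed / expected if expected else 0.0
    return (c_count + g_count) / n, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content, and obs/exp thresholds."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio


# Synthetic examples, not sequences from the hmg1 locus.
cpg_rich = "CCGG" * 50   # 200 bp, all G/C, one CpG per repeat
at_rich = "AT" * 100     # 200 bp, no G/C at all

print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
print(cpg_stats(at_rich), looks_like_cpg_island(at_rich))
```

In practice such statistics are computed in a sliding window along the genomic sequence rather than on one fixed fragment.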

In order to map the transcription start site of the hmg1 gene, we performed a primer extension on total RNA extracted from NIH3T3 cells (Fig. 4), using an oligonucleotide that maps immediately downstream of the translation start site in exon 2. One major extended product and two minor ones were obtained. Two of them are not far upstream of the 5′ terminus of the longest known cDNA for mouse HMG1 (16); the third, shorter extended product may correspond to a weak start site or to a strong pause site for reverse transcriptase.

We found no evidence of alternatively spliced variants of the HMG1 mRNAs, and we found no sequence in the genomic locus that could code for membrane-targeting signals. Thus, the presence of the HMG1 protein outside of the cell membrane in several cell types probably does not depend on classical protein secretion routes.

The Promoter of the hmg1 Gene-To qualify as the authentic hmg1 gene, the genomic fragment must contain a region in cis capable of driving its transcription. We constructed a plasmid, pHMG1-neo, in which a fragment encompassing 2 kb upstream of the transcription start sites, exon 1, the entire intron 1, and part of exon 2 was fused in frame to the prokaryotic aminoglycoside 3′-phosphotransferase gene, aph. The expression of this construct results in the production of a chimeric protein capable of inactivating the G418 antibiotic. The plasmid was introduced in NIH3T3 cells, and G418-resistant clones were selected (Fig. 5). Plasmids pRSVneo and pSV2neo, in which the aph gene is under the control of the Rous sarcoma virus LTR or the SV40 promoter, respectively, were used as positive controls. In several independent experiments, transfection with the pHMG1-neo plasmid gave rise to a large number of resistant clones, almost comparable with the number of clones arising after transfection with the positive controls. Transfection with the bacterial aph gene with no eukaryotic sequences in cis, or with promoterless derivatives of the pHMG1-neo plasmid, gave rise to no resistant clones or to very few. Northern blot analysis of total RNA extracted from three clones stably transfected with pHMG1-neo and three clones stably transfected with pRSVneo confirmed that transcripts of the expected length and containing the aph sequence were indeed present (Fig. 5B and results not shown).

FIG. 3. Structure of the hmg1 gene. A, exon-intron organization of the hmg1 gene; exons are indicated by boxes (open for translated regions; filled for untranslated regions). H, HindIII sites; N, NotI sites. B, sequence alignment of HMG box A and HMG box B. The locations of the introns are indicated by arrows. The position of the intron in the single HMG box coding sequence in the TCF-1 gene (28) is different yet again.

FIG. 4. Primer extension of the HMG1 mRNA. A, 20 µg of total RNA from NIH3T3 cells was hybridized with oligonucleotide MnlI, mapping to exon 2. B, major predicted transcription start sites are indicated by closed dots; an open dot indicates a possible weaker start site. An asterisk indicates the start of the longest mouse HMG1 cDNA clone. CCAAT sites are underlined.

In order to estimate the strength of the hmg1 promoter, we compared by RNase protection the relative abundance of the transcripts produced under its control with the abundance of those produced under the control of the RSV long terminal repeat (Fig. 6). Total RNAs were isolated from NIH3T3 cells 48 h after transfection. Transcripts from the endogenous hmg1 genes give rise to a 45-nt band (not shown in Fig. 6), the RSV-neo mRNA gives rise to a 177-nt band, and the HMG1-neo chimeric mRNA gives rise to a 284-nt band. The intensities of the signals from a cotransfection (lane 1) point to a high level of expression of the chimeric mRNA and indirectly to a high activity of the hmg1 promoter.

The region immediately upstream of the proposed transcription start sites does not contain any sequence conforming to the consensus TATAA; this promoter is probably TATA-less, as is often the case for promoters of housekeeping genes (23). However, it does contain several CAAT boxes, which might promote transcription by binding any one of several factor types (24).
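A search of this kind amounts to scanning both strands of the upstream region for short consensus motifs. The sketch below looks for exact matches to TATAA and CCAAT and their reverse complements; exact matching is a simplifying assumption (real promoter elements tolerate mismatches), the helper names are hypothetical, and the example sequence is invented rather than taken from the hmg1 promoter.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (position, strand) hits; positions are 0-based on the top strand."""
    seq = seq.upper()
    motif = motif.upper()
    # A lookahead is used so that overlapping matches are also reported.
    hits = [(m.start(), "+") for m in re.finditer(f"(?={motif})", seq)]
    rc = reverse_complement(motif)
    if rc != motif:
        hits += [(m.start(), "-") for m in re.finditer(f"(?={rc})", seq)]
    return sorted(hits)


# Invented GC-rich example with one CCAAT on each strand and no TATAA.
toy_promoter = "GGCCAATCGGCGATTGGCGC"

print("TATAA:", find_motif(toy_promoter, "TATAA"))
print("CCAAT:", find_motif(toy_promoter, "CCAAT"))
```

An empty result for TATAA combined with several CCAAT hits would match the pattern described in the text for a TATA-less promoter.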

Since its promoter can direct the expression of reporter genes placed under its control, we have no doubt that the gene we have isolated is active and encodes the HMG1 protein. The hmg1 gene is expressed at high but not identical levels in all tissues of the mouse embryo at day 10.5 post coitum²; likewise, we and others found in all tissues that were investigated the typical set of three HMG1 mRNAs due to differential usage of three alternative polyadenylation signals (14, 25). Being quite active and compact, the hmg1 promoter should be useful to direct the ubiquitous expression of transgenes.

How Many hmg1 Genes Are Present?-Although the data reported in the preceding paragraphs indicate that we have isolated a bona fide hmg1 gene, additional hmg1-related genes may exist, especially if they are intronless like the genes coding for SRY and SOX HMG box proteins (26). We did not find direct evidence for additional active genes; the fragments amplified by PCR were all colinear and belonged to the same hmg1 gene we isolated. Additional evidence, however, may be gathered from the examination of the sequences of hmg1 pseudogenes. If all the pseudogenes are derived by retrotranscription from the single hmg1 gene we isolated, they should all originally have had the same sequence, identical to the exons of the gene. Later, they should have accumulated base changes and additions or deletions, but these should have occurred independently in the various pseudogenes. In other words, the different pseudogenes are not expected to contain the same mutations.

² M. Gulisano and M. E. Bianchi, unpublished results.

FIG. 5. The hmg1 promoter is active in NIH3T3 cells. A, NIH3T3 cells were transfected in duplicate with the indicated plasmids. In experiment I, 3.5 × 10⁵ cells were transfected with 20 µg of the expression vectors, and in experiment II, 7 × 10⁵ cells were transfected with 5 µg of the expression vectors. Stably transfected clones were selected by adding 900 µg/ml G418 24 h after transfection and were counted 11 days later. Plasmid pHMG1-neo contains the aph gene driven by the NsiI-NsiI promoter fragment; plasmids pΔBamHI-neo and pΔEcoRI-neo are derived from pHMG1-neo by deletion of a 1.5-kb BamHI fragment and a 3-kb EcoRI fragment, respectively. pRSV-neo and pSV2-neo contain the aph gene driven by the RSV and SV40 promoters. B, 10 µg of total RNA derived from stably transfected clones was separated in a 1.2% agarose/formaldehyde gel, transferred to a GeneScreen Plus membrane, and probed with the 2-kb XhoI-XhoI fragment containing the aph gene. The migration of ribosomal RNAs is indicated. Lane 1, untransfected NIH3T3 cell line; lane 2, NIH3T3 clone stably transfected with pRSV-neo; lanes 3 and 4, two independent NIH3T3 clones stably transfected with pHMG1-neo.

FIG. 6. RNase protection analysis. A, experimental strategy: the riboprobe was obtained by in vitro transcription of a 1.3-kb NotI-PstI fragment derived from plasmid pHMG1-neo. It can hybridize to HMG1, HMG1-neo, and RSV-neo mRNAs, protecting fragments of 45, 284, and 177 nucleotides, respectively. Black boxes, HMG1 mRNA sequences; gray boxes, aph sequences. B, RNase protection analysis of transcripts from NIH3T3 cells transiently transfected with 20 µg of pHMG1-neo plus 2 µg of pRSV-neo (lane 1), 2.5 µg of pRSV-neo (lane 2), 10 µg of pHMG1-neo (lane 3), 20 µg of pΔBamHI-neo (lane 4), or mock-transfected with 10 µg of pBlueScript KS(+) (lane 5). The arrows indicate the migration of the 284- and 177-nucleotide fragments. The signal for the 45-nt band is outside of the area shown in the figure. Lane M contains molecular weight markers.

We aligned the HMG1 cDNA to the intronless HMG1-related sequences we had partially characterized during the quest for the gene. Starting from the translation start site and up to the codon corresponding to amino acid 65 of HMG1, the degree of similarity was between 90 and 99.5%. However, several mutations were identical in a few of the HMG1-related sequences. In particular, a group of five sequences contained the same pattern of 10 deviations from the cDNA sequence (Fig. 7A); this finding rules out an independent origin for each one in this group of sequences.
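The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming pre-aligned sequences of equal length in which a dash marks an undetermined position; the function names and the toy sequences are hypothetical and are not the deposited HMG1-R data.

```python
from collections import defaultdict


def percent_identity(ref, seq):
    """Percent of aligned, determined positions at which seq matches ref."""
    pairs = [(r, s) for r, s in zip(ref, seq) if r != "-" and s != "-"]
    matches = sum(r == s for r, s in pairs)
    return 100.0 * matches / len(pairs)


def deviations(ref, seq):
    """Set of (1-based position, base) differences of seq from ref."""
    return {
        (i, s)
        for i, (r, s) in enumerate(zip(ref, seq), start=1)
        if r != s and r != "-" and s != "-"
    }


def shared_deviations(ref, seqs, min_count=2):
    """Deviations carried by at least min_count sequences, with their carriers."""
    carriers = defaultdict(list)
    for name, seq in seqs.items():
        for dev in deviations(ref, seq):
            carriers[dev].append(name)
    return {
        dev: sorted(names)
        for dev, names in carriers.items()
        if len(names) >= min_count
    }


# Toy data (hypothetical, for illustration only).
REFERENCE = "ACGTACGTACGTACGTACGT"
TOY_SEQS = {
    "toy_a": "ACGCACGTATGTACGTACGT",  # differs at positions 4 and 10
    "toy_b": "ACGCACGTATGTACGTGCGT",  # same two changes plus position 17
    "toy_c": "ACGTACATACGTACGTACGT",  # one private change at position 7
}

if __name__ == "__main__":
    for name, seq in TOY_SEQS.items():
        print(name, percent_identity(REFERENCE, seq))
    print(shared_deviations(REFERENCE, TOY_SEQS))
```

In the toy data, toy_a and toy_b share two deviations while toy_c carries only a private one; shared changes of this kind are what argue against independent derivation from the reference.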

We applied to the HMG1 cDNA and to the nine related sequences a maximum parsimony algorithm, which tries to arrange nucleotide sequences in a topology that minimizes the number of substitutions from an ancestral sequence. It is clear that the HMG1 cDNA sequence could not correspond to the sequence from which the other ones were derived independently (an example of the trees generated is shown in Fig. 7B). By default, we conclude that some or all of the HMG1-related sequences may derive from a gene distinct from hmg1, which we code name hmg-Z (Fig. 7C). hmg-Z cannot code for the HMG1-related cell surface proteins; it is nonetheless much more closely related to the hmg1 than to the hmg2 gene, which diverged from a common ancestor at least 200 million years ago (27). Northern analysis of total RNA from adult or embryonic mouse tissues gives no evidence of additional mRNA species beyond those deriving from the hmg1 gene, and the selection of HMG1-related clones from cDNA libraries consistently leads to the identification of sequences deriving from the hmg1 gene. This implies that the expression of the hmg-Z gene and/or of the genes coding for cell surface proteins very closely related to HMG1 may be restricted to limited districts or times or may be totally absent nowadays.
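The parsimony criterion can be illustrated with Fitch's small-parsimony count, which gives the minimum number of substitutions required by a fixed tree topology. The sketch below is a generic textbook procedure, not the program used for Fig. 7B; the tree encoding (nested tuples with leaf names as strings), the leaf names, and the four-column alignment are hypothetical.

```python
def fitch_site(tree, leaf_states):
    """Return (candidate ancestral states, substitution count) for one column.

    tree is either a leaf name (str) or a (left, right) tuple of subtrees.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], leaf_states)
    right_set, right_cost = fitch_site(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: one substitution is needed on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch substitution counts over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Toy alignment (hypothetical): R1 and R2 share two changes relative to cDNA.
ALIGNMENT = {"cDNA": "AAAA", "R1": "CCAA", "R2": "CCAG", "R3": "AAGA"}

# Two candidate topologies for the same four sequences.
TREE_GROUPED = (("R1", "R2"), ("cDNA", "R3"))  # R1 and R2 as sister sequences
TREE_SPLIT = (("R1", "cDNA"), ("R2", "R3"))    # R1 and R2 kept apart

if __name__ == "__main__":
    print(parsimony_score(TREE_GROUPED, ALIGNMENT))
    print(parsimony_score(TREE_SPLIT, ALIGNMENT))
```

With these toy data the topology that groups R1 with R2 requires fewer substitutions than the one that separates them. A full maximum parsimony search repeats this count over many candidate topologies and retains the one with the lowest score.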


[Fig. 7 graphic not recoverable from the extracted text. A, alignment of the HMG1 cDNA with sequences HMG1-R-87, -135, -145, -154, -159, -161, -168, -177, and -227 at codon positions 3 through 65; B, tree with a distance scale bar of 0.1; C, model of gene evolution.]

FIG. 7. Comparison of HMG1-related sequences. A, alignment of the nine HMG1-R sequences to HMG1 cDNA. Only the conceptual translation of the codons with non-silent substitutions with respect to the HMG1 cDNA is given here; most of the HMG1-R sequences contain frameshifts too. The nucleotide sequences are deposited at the EMBL data library with accession numbers X80459-X80467, and their similarity to the cDNA ranges between 90 and 99.5%. Z represents a stop codon; a dash indicates that the sequence corresponding to the indicated position was not determined. A group of five sequences contains the same pattern of mutations. B, the maximum parsimony algorithm TREECOM was applied to the HMG1 cDNA and HMG1-R sequences. The predicted evolutionary tree shows that HMG1 cDNA does not correspond to the sequence from which the HMG1-R sequences were derived, with the exception of HMG1-R-227. C, a model for the evolution of HMG1-R pseudogenes. hmg1 and hmg2 genes probably were derived from a common ancestral gene, hmg-X, as indicated by the high sequence similarity and the identical exon-intron organization. We suppose that another gene, hmg-Z, itself derived from hmg1 or directly from hmg-X, might have originated some of the HMG1-R sequences.

Acknowledgments-We thank Dr. Pierre Ferrier for hospitality (to S. C.), Dr. Claudio Bandi for help in the evolutionary analysis, and all the members of our laboratory for helpful suggestions and critical reading of the manuscript.

REFERENCES

1. Bustin, M., Lehn, D. A., and Landsman, D. (1990) Biochim. Biophys. Acta 1049, 231-243
2. Bianchi, M. E., Beltrame, M., and Falciola, L. (1992) in Nucleic Acids and Molecular Biology (Eckstein, F., and Lilley, D. M. J., eds) Vol. 6, pp. 112-128, Springer-Verlag, Berlin
3. Sheflin, L. G., and Spaulding, S. W. (1989) Biochemistry 28, 5658-5664
4. Stros, M., Stokrová, J., and Thomas, J. O. (1994) Nucleic Acids Res. 22, 1044-1051
5. Bianchi, M. E., Beltrame, M., and Paonessa, G. (1989) Science 243, 1056-1059
6. Bianchi, M. E., Falciola, L., Ferrari, S., and Lilley, D. M. J. (1992) EMBO J. 11, 1055-1063
7. Pil, P. M., and Lippard, S. J. (1992) Science 256, 234-236
8. Bianchi, M. E. (1994) Mol. Microbiol. 13, in press
9. Oñate, S. A., Prendergast, P., Wagner, J. P., Nissen, M., Reeves, R., Pettijohn, D. E., and Edwards, D. P. (1994) Mol. Cell. Biol. 14, 3376-3391
10. Ner, S. S., and Travers, A. A. (1994) EMBO J. 13, 1817-1822
11. Merenmies, J., Pihlaskari, R., Laitinen, J., Wartiovaara, J., and Rauvala, H. (1991) J. Biol. Chem. 266, 16722-16729
12. Parkkinen, J., Raulo, E., Merenmies, J., Nolo, R., Kajander, E. O., Baumann, M., and Rauvala, H. (1993) J. Biol. Chem. 268, 19726-19738
13. Chomczynski, P., and Sacchi, N. (1987) Anal. Biochem. 162, 156-159
14. Wen, L., Huang, J.-K., Johnson, B. H., and Reeck, G. R. (1989) Nucleic Acids Res. 17, 1197-1214
15. Tsuda, K., Kikuchi, M., Mori, K., Waga, S., and Yoshida, M. (1988) Biochemistry 27, 6159-6163
16. Yotov, W. V., and St.-Arnaud, R. (1992) Nucleic Acids Res. 20, 3516
17. Shirakawa, H., and Yoshida, M. (1992) J. Biol. Chem. 267, 6641-6645
18. Kolodrubetz, D., and Burgum, A. (1989) J. Biol. Chem. 265, 3234-3239
19. Roth, S. Y., Schulman, I. G., Cook, R. G., and Allis, C. D. (1987) Nucleic Acids Res. 15, 8112
20. Grasser, K. D., and Feix, G. (1991) Nucleic Acids Res. 19, 2573-2577
21. Wagner, C. R., Hamana, K., and Elgin, S. C. R. (1992) Mol. Cell. Biol. 12, 1915-1923
22. Ner, S. S., Churchill, M. A. E., Searles, M. A., and Travers, A. A. (1993) Nucleic Acids Res. 21, 4369-4371
23. Maniatis, T., Goodbourn, S., and Fischer, J. A. (1987) Science 236, 1237-1245
24. Locker, J. (1993) in Gene Transcription. A Practical Approach (Hames, B. D., and Higgins, S. J., eds) pp. 320-345, Oxford University Press, Oxford
25. Lee, K.-L. D., Pentecost, B. T., D'Anna, J. A., Tobey, R. A., Gurley, L. R., and Dixon, G. H. (1987) Nucleic Acids Res. 15, 5051-5068
26. Gubbay, J., Vivian, N., Economou, A., Jackson, D., Goodfellow, P., and Lovell-Badge, R. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 7953-7957
27. Laudet, V., Stehelin, D., and Clevers, H. (1993) Nucleic Acids Res. 21, 2493-2501
28. van de Wetering, M., Oosterwegel, M., Holstege, F., Dooyes, D., Suijkerbuijk, R., van Kessel, A. G., and Clevers, H. (1992) J. Biol. Chem. 267, 8530-8536