
Nucleic Acids Research Volume 12 Number 5 1984

Complete nucleotide sequence of tobacco streak virus RNA 3

Ben J.C.Cornelissen, Henk Janssen, Douwe Zuidema and John F.Bol

Department of Biochemistry, State University of Leiden, P.O. Box 9505, 2300 RA Leiden, The Netherlands

Received 31 January 1984; Accepted 16 February 1984

ABSTRACT

Double-stranded cDNA of in vitro polyadenylated tobacco streak virus (TSV) RNA 3 has been cloned and sequenced. The complete primary structure of 2,205 nucleotides reveals two open reading frames flanked by a leader sequence of 210 bases, an intercistronic region of 123 nucleotides and a 3'-extracistronic sequence of 288 nucleotides. The 5'-terminal open reading frame codes for a Mr 31,742 protein, which probably corresponds to the only in vitro translation product of TSV RNA 3. The 3'-terminal coding region predicts a Mr 26,346 protein, probably the viral coat protein, which is the translation product of the subgenomic messenger, RNA 4. Although the coat proteins of alfalfa mosaic virus (AlMV) and TSV are functionally equivalent in activating their own and each other's genomes, no homology between the primary structures of these two proteins is detectable.

INTRODUCTION

Tobacco streak virus (TSV) is a member of the ilarvirus group, one of the three proposed genera of the family of plant viruses with a single-stranded, tripartite RNA genome, the tricornaviridae (1). The complete nucleotide sequence of the genome is known for two members of this family: alfalfa mosaic virus (AlMV, ilarvirus group) and brome mosaic virus (BMV, bromovirus group) (2-6). The data support the view that RNAs 1 and 2 of these viruses are monocistronic, whereas RNA 3 contains two cistrons, encoding a "35K" protein and the viral coat protein, respectively. Only the 5'-proximal "35K" cistron is open to translation in RNA 3; viral coat protein is translated from a subgenomic messenger, RNA 4 (1). A similar organization of genetic information has been reported for RNA 3 of cucumber mosaic virus (CMV, cucumovirus group) (7).

A unique property of the ilarviruses is that coat protein is required to initiate infection. This phenomenon, which differentiates ilarviruses from bromo- and cucumoviruses, has been studied in most detail with AlMV. The genome of this virus is not infectious unless each of the RNAs 1, 2 and 3 has bound a few coat protein molecules (8,9). Studies in our laboratory (10-12) have shown that coat protein binds preferentially to a region in the 3'-terminal sequence of 145 nucleotides that is highly homologous in RNAs 1, 2, and 3 (13-15). Houwing and Jaspars (16) have put forward the hypothesis that binding of coat protein to the 3'-termini of the genome segments is required for proper recognition of the viral RNAs by the viral replicase.

© IRL Press Limited, Oxford, England.

Nucleic Acids Research Volume 12 Number 5 1984, 2427

For several ilarviruses it has been shown that the respective coat proteins are freely exchangeable in the process of genome activation (17-19). For example, the AlMV genome can be activated by TSV coat protein and vice versa. In line with this observation, Zuidema and Jaspars (12) have shown that TSV protein binds at the 3'-termini of AlMV RNA 3, whereas AlMV protein binds at the 3'-termini of TSV RNAs. An analysis of the 3'-terminal sequences of TSV RNAs 1, 2, and 3 has revealed a homology in the last 45 nucleotides (12,20). Although there is virtually no similarity in primary structure between the 3'-terminal homologous regions of AlMV and TSV RNAs, they have in common the occurrence of stable hairpin structures flanked by the repeating tetranucleotide sequence AUGC (20). Probably, these RNA structures are responsible for the mutual recognition of AlMV and TSV RNAs by each other's coat proteins. In addition, it would be interesting to know whether AlMV and TSV coat proteins have structural features in common which might be involved in the recognition of their own and each other's RNAs. The lack of a serological relationship between AlMV and TSV (17), the differences between the tryptic peptide maps of the two coat proteins (17) and the failure to detect sequence homology between the RNAs of the two viruses by competition hybridization (21) indicate that the two proteins may be rather different. To obtain further information on the mechanism of ilarvirus genome activation by homologous and heterologous coat proteins, we deduced the nucleotide sequence of TSV RNA 3 from in vitro synthesized or cloned cDNA. Like RNA 3 of other tricornaviruses, TSV RNA 3 contains two open reading frames encoding a "35K" protein and the presumptive coat protein. Although some homology was found between the "35K" proteins of AlMV and TSV, no homology was detected between the primary amino acid sequences of the coat proteins of the two viruses.

MATERIALS AND METHODS

Materials. T4 polynucleotide kinase and (γ-32P)ATP were from New England Nuclear. ATP:RNA adenyltransferase was isolated from E. coli Q13 as described (22). Restriction enzymes were purchased from New England Biolabs, Boehringer and Amersham. Reverse transcriptase was kindly provided by Dr. J.W. Beard (St. Petersburg, Florida).

Virus isolation and RNA purification. TSV (strain WC) was grown in N. glutinosa × N. clevelandii (23) and isolated as described (17). RNA 3 was purified essentially as reported by Van Vloten-Doting (17).

Synthesis and sequence analysis of randomly primed cDNA fragments to RNA 3. Randomly primed cDNA to TSV RNA 3 was synthesized according to Rice and Strauss (24). After cleavage of the single-stranded cDNA by a restriction enzyme, the fragments were 5'-end labelled and sized on a gel (24). The fragments were then purified and sequenced according to Maxam and Gilbert (25).

Synthesis and cloning of double-stranded cDNA. A poly(A) chain was attached to the 3'-end of RNA 3 with ATP:RNA adenyltransferase by the procedure of Devos et al. (26). Double-stranded cDNA to polyadenylated RNA 3 was synthesized and cloned into the PstI site of pBR322 as described (2).

DNA sequencing. DNA of recombinant plasmids was isolated (2) and cut with an appropriate enzyme. The resulting fragments were separated on, and subsequently eluted from, 5% polyacrylamide gels (25). After 5'-end labelling, the fragments were digested with a second enzyme, purified and sequenced (25).

Sequence determination at the 5'-terminus of TSV RNA 3. TSV RNA 3 was decapped, 5'-end labelled and purified, and the sequence of its 5'-terminal 18 nucleotides was determined by the "wandering spot" method as described (27).

RESULTS AND DISCUSSION

Construction of the sequence

[Figure 1: panels A to C drawn along TSV RNA 3 on a scale of 1 to 2,000 nucleotides; panel B shows the inserts of pTS3-69, pTS3-17 and pTS3-29.]

Figure 1. (A) TSV RNA 3 sequences deduced from cDNA fragments, generated according to Rice and Strauss (24). (B) Alignment of inserts of hybrid plasmids that were used to construct the sequence of TSV RNA 3. (C) Physical map and sequencing strategy.

Previously we have reported the sequence of the 3'-terminal 140 nucleotides of TSV RNA 3 (20). In order to elucidate the complete primary structure of this RNA, we initially used the approach described by Rice and Strauss (24). Single-stranded DNA fragments were generated by cutting randomly primed cDNA to TSV RNA 3 with a restriction endonuclease. After labelling with (γ-32P)ATP and kinase and sizing on a 5% acrylamide gel, the fragments were sequenced by the Maxam and Gilbert technique (25). In this way we deduced the sequence of about 65% of the RNA (Figure 1A), including the 5'-terminal sequence (see below). However, at this stage of the work it became clear that we would not be able to establish the complete primary structure unambiguously by this approach, because of severe band compression in some sequence gels, caused by strong secondary structures in some DNA fragments. In addition, some of the potential single-stranded DNA cutting restriction endonucleases, like HpaII and HaeIII, did not give satisfactory results after digesting randomly primed cDNA. In retrospect, this can be explained by the fact that the recognition sites for those enzymes occur considerably less frequently in the sequence than expected statistically. In fact, in our hands only the endonucleases TaqI and AluI gave good results.
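The statistical argument about HpaII (CCGG) and HaeIII (GGCC) sites can be made explicit. The sketch below assumes that bases occur independently at the frequencies of the overall base composition, a simplification that ignores dinucleotide bias: the expected number of sites is then the number of possible start positions times the product of the base frequencies, and it can be compared with the number actually found. The function names are illustrative, and the sequence in the usage lines is a made-up toy, not TSV RNA 3.

```python
from __future__ import annotations


def normalize(seq: str) -> str:
    """Upper-case and write U as T so RNA and cDNA sequences compare equally."""
    return seq.upper().replace("U", "T")


def base_composition(seq: str) -> dict[str, float]:
    """Fraction of A, C, G and T in the sequence."""
    seq = normalize(seq)
    return {base: seq.count(base) / len(seq) for base in "ACGT"}


def expected_site_count(seq_len: int, site: str, composition: dict[str, float]) -> float:
    """Expected occurrences of `site` in a random sequence of length `seq_len`
    whose bases are drawn independently with the given frequencies."""
    p_site = 1.0
    for base in normalize(site):
        p_site *= composition[base]
    n_positions = max(seq_len - len(site) + 1, 0)  # possible start positions
    return n_positions * p_site


def observed_site_count(seq: str, site: str) -> int:
    """Number of (possibly overlapping) occurrences of `site` in `seq`."""
    seq, site = normalize(seq), normalize(site)
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


if __name__ == "__main__":
    toy = "ACGTTCCGGAACCGGTTGGCCATCGAAGCTT"  # hypothetical sequence
    comp = base_composition(toy)
    for enzyme, site in [("HpaII", "CCGG"), ("HaeIII", "GGCC"), ("TaqI", "TCGA"), ("AluI", "AGCT")]:
        expected = expected_site_count(len(toy), site, comp)
        print(f"{enzyme}: observed {observed_site_count(toy, site)}, expected {expected:.2f}")
```

Applied to a full sequence, an observed count well below the expected one would fit the explanation given above; the independence assumption is the main caveat of this kind of estimate.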

In a second approach we synthesized double-stranded DNA copies of TSV RNA 3 and cloned them in the PstI site of pBR322 vector DNA. Three hybrid plasmids with inserts longer than 1,500 bp, namely pTS3-69, pTS3-29 and pTS3-17, were selected for further studies. Figure 1B shows the alignment of the inserts of those three plasmids, and Figure 1C shows a restriction map and the strategy used to sequence the DNA by the Maxam and Gilbert technique. The insert of pTS3-69 appeared to cover the RNA 3 sequence from nucleotide 15 to the 3'-end. The initial 14 nucleotides of RNA 3 were sequenced in two ways. By the method of Rice and Strauss (see above), AluI and TaqI fragments were generated with the same 3'-terminus, suggesting a strong stop in the reverse transcription of TSV RNA 3. Since the end of a template is the strongest stop possible, we assumed that we had generated and sequenced fragments complementary to the 5'-terminus of RNA 3. This assumption proved to be correct after sequencing 5'-end labelled TSV RNA 3 with the wandering spot technique (result not shown).

Sequence and features of TSV RNA 3

[Figure 2: nucleotide sequence of TSV RNA 3, residues 1 to 2,205, with the deduced amino acid sequences of the two open reading frames shown codon by codon.]

Figure 2. The complete nucleotide sequence of TSV RNA 3 (strain WC) and the amino acid sequences corresponding to the two open reading frames. The three sequences (22, 7, and 7 nucleotides, respectively) in the 5'-noncoding region that are repeated in the 3'-noncoding region are underlined.

TSV RNA 3 (2,205 nucleotides)
5'-noncoding region: residues 1-210 (210 nucleotides)
Open reading frame 1: residues 211-1,080 (870 nucleotides), Mr 31,742 protein of 289 amino acids
Intercistronic region: residues 1,081-1,203 (123 nucleotides)
Open reading frame 2: residues 1,204-1,917 (714 nucleotides), Mr 26,346 protein of 237 amino acids
3'-noncoding region: residues 1,918-2,205 (288 nucleotides)

Figure 3. The overall structure of TSV RNA 3. The numbers under the bar are the residue numbers counted from the 5'-terminus. The lengths in nucleotides of the various regions in TSV RNA 3 are indicated above the bar. The black parts of the bar represent the two open reading frames.

The complete sequence of 2,205 nucleotides of TSV RNA 3 is shown in Figure 2, confirming the previously determined 140 3'-terminal residues. Two long open reading frames starting with a methionine codon are revealed by this sequence. One begins at the first AUG triplet from the 5'-end, at nucleotides 211-213, and terminates with a UGA triplet at residues 1,078-1,080 (Figures 2 and 3). The AUG codon at positions 1,204-1,206 is followed by the second long open reading frame, terminating with an opal codon at nucleotides 1,915-1,917. Apart from these two coding regions, the longest reading frame starting with a methionine codon codes for only 44 amino acids (AUG triplet at residues 2,039-2,041). The sequence between the two long reading frames is 123 nucleotides long; the 3'-noncoding region is 288 residues (Figures 2 and 3). We therefore conclude that the overall structure of TSV RNA 3 (Figure 3) is very similar to the structures of the RNAs 3 of AlMV, BMV, and CMV. The underlined nucleotides in Figure 2 denote three sequences with a total length of 36 nucleotides in the 5'-noncoding region which are exactly repeated in the 3'-noncoding region.

TSV Mr 31,742 protein

In eukaryotes, protein synthesis is usually initiated at the first AUG triplet from the 5'-end of the mRNA (28). Since the first 5'-proximal AUG codon in TSV RNA 3 is followed by a long open reading frame, this triplet is likely to be the initiation codon for translation. Starting at this AUG, the coding region of 870 nucleotides predicts a protein of Mr 31,742. This protein probably corresponds to the only in vitro translation product of TSV RNA 3, which has an approximate Mr of 33,000 (29-31). The similarities in structure of the RNAs 3 of the tricornaviruses suggest similar functions for the proteins they encode. Thus, the Mr 31,742 protein encoded by TSV RNA 3 might be functionally similar to the Mr 32,400 protein of AlMV RNA 3 (4), the Mr 32,480 BMV protein (6), and the Mr 36,700 CMV protein (7). If these proteins indeed have the same function(s), the question arises whether they also show structural similarities. From the amino acid compositions of the four proteins it can be calculated that all four are slightly basic at pH 7.0 (TSV protein: +2; AlMV and CMV proteins: +4; BMV protein: +5). In addition, Murthy recently showed that there is a homology of over 35% between the BMV and CMV proteins (32). We compared the primary structure of the Mr 31,742 TSV protein with the amino acid sequences of the Mr 32,480 BMV protein and the Mr 36,700 CMV protein, but we did not find significant homologies. However, the Mr 31,742 TSV protein does show some homology with the corresponding AlMV protein. The details of this comparison will be reported elsewhere (Gibbs et al., to be published).

TSV coat protein

By analogy with AlMV, BMV, and CMV, the 3'-proximal open reading frame of TSV RNA 3 is likely to code for the viral coat protein. From amino acid analysis a Mr of 30,000 was derived for the TSV coat protein, whereas a Mr of 28,500 was obtained from SDS polyacrylamide gel electrophoresis (33). The open reading frame at the 3'-end of RNA 3 predicts a protein of Mr 26,346 (including the first methionine residue). The amino acid composition of this protein of 237 residues, as deduced from the sequence, corresponds rather well with the amino acid composition reported for TSV coat protein (33) (Table 1). Like the coat proteins of AlMV and BMV, the TSV coat protein occurring in vivo might lack the first methionine residue of the deduced amino acid sequence. Some of the other differences between the deduced and determined amino acid compositions can be explained by the use of different TSV strains. The result of an amino acid analysis of the coat protein of TSV strain WC agrees with the amino acid composition of the protein deduced from the sequence (B. Kraal and J.M. De Graaf, unpublished results).
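The route from an open reading frame to a predicted Mr can be sketched as follows. The code finds AUG-initiated open reading frames in the three forward frames (taking the first AUG of each stretch, in the spirit of the first-AUG rule cited above), translates them with the standard genetic code, and sums average residue masses plus one water. The mass values are commonly tabulated average residue masses, assumed here; the text does not say which table was used, so a recomputed value may differ slightly from Mr 26,346 or Mr 31,742. All function names are hypothetical.

```python
from __future__ import annotations

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def find_orfs(seq: str, min_aa: int = 1) -> list[tuple[int, int]]:
    """1-based (start, end) of AUG-initiated ORFs on the given strand; `end` is
    the last base of the stop codon. ORFs without a stop codon are skipped."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if CODON_TABLE.get(seq[j:j + 3]) == "*":
                        if (j - i) // 3 >= min_aa:
                            orfs.append((i + 1, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return sorted(orfs)


def translate(cds: str) -> str:
    """Translate from the first base up to (not including) the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X: codon with an ambiguous base
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def average_mr(protein: str) -> float:
    """Sum of average residue masses plus one water for the two chain termini."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    toy = "GGAUGGCUUGGAAAUAAGG"  # hypothetical RNA, not TSV RNA 3
    for start, end in find_orfs(toy):
        protein = translate(toy[start - 1:end])
        print(start, end, protein, round(average_mr(protein), 2))
```

To mimic a mature protein lacking the initiator methionine, as suggested above for the in vivo coat protein, subtract `RESIDUE_MASS["M"]` from the result.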

As already stated in the Introduction, the coat proteins of AlMV and TSV are functionally equivalent in activating their respective genomes. Comparison of the primary structures of both coat proteins by a simple computer program and by visual inspection does not reveal significant sequence homology. This is in agreement with an earlier observation that the tryptic peptide maps of both proteins are very different (17). Possibly, the ability of AlMV and TSV coat proteins to recognize and activate each other's genomes is due to a common feature at the level of the tertiary structure of the respective polypeptide chains.
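The text does not describe the "simple computer program" used for the comparison. One minimal possibility, shown only as an illustration and not as the authors' method, slides one sequence along the other without gaps and reports the best fraction of identical residues over a minimum overlap; a value well above that expected by chance for a 20-letter alphabet would flag a candidate region for closer inspection. The function name and the `min_overlap` default are assumptions.

```python
from __future__ import annotations


def best_ungapped_identity(a: str, b: str, min_overlap: int = 20) -> tuple[float, int | None]:
    """Slide `b` along `a` without gaps. Return the highest fraction of identical
    positions over the overlap and the offset (index in `a` facing b[0], which may
    be negative). Returns (0.0, None) if no offset reaches `min_overlap`
    positions or no position ever matches."""
    best_identity, best_offset = 0.0, None
    for offset in range(-(len(b) - 1), len(a)):
        a_start, b_start = max(offset, 0), max(-offset, 0)
        overlap = min(len(a) - a_start, len(b) - b_start)
        if overlap < min_overlap:
            continue
        matches = sum(a[a_start + k] == b[b_start + k] for k in range(overlap))
        identity = matches / overlap
        if identity > best_identity:
            best_identity, best_offset = identity, offset
    return best_identity, best_offset


if __name__ == "__main__":
    # Hypothetical peptide fragments, not the AlMV or TSV coat proteins.
    identity, offset = best_ungapped_identity("MSKRADELLKRG", "GADELIKR", min_overlap=6)
    print(f"best identity {identity:.2f} at offset {offset}")
```

Because it allows no gaps, this scan can miss homologies that a gapped alignment would detect; it is meant only to make the idea of a simple mechanical comparison concrete.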

Table 1. Amino acid composition of TSV coat protein of three different strains

[Table 1: number of residues of each amino acid (Ala, Arg, Asp, Asn, Cys, Glu, Gln, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val) and total number of residues in the coat protein of TSV strains BRAZ (a), GV (a) and WC (b).]

a As determined by amino acid analysis (33).
b As deduced from the sequence.

Based on proton nuclear magnetic resonance studies, a model has been postulated for AlMV coat protein consisting of a rigid core and a flexible N-terminal part of approximately 36 amino acids (34). This N-terminus is, like the N-termini of the BMV and CMV coat proteins, rather basic (7-8 Arg and/or Lys residues within the 30 N-terminal amino acids) and is believed to interact with the viral RNA (35). Upon removal of the 25 N-terminal amino acids by a mild trypsin digestion, the AlMV coat protein loses the capability to activate the AlMV genome (36,37) and the specificity to bind to the 3'-termini of the AlMV RNAs (37). In contrast with the coat proteins of AlMV, BMV, and CMV, the N-terminus of TSV coat protein is not basic (3 Arg residues and 1 His residue within the first 30 amino acids). However, the part of TSV coat protein between residues 52 and 71 is very basic (7 Arg residues) and might have the same function in vivo as the N-termini of the AlMV, BMV, and CMV coat proteins.
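The residue counts quoted in this section (7-8 Arg and/or Lys in the first 30 residues of the AlMV, BMV and CMV coat proteins; 3 Arg and 1 His in the first 30 residues of TSV coat protein; 7 Arg in residues 52-71) are window tallies over a one-letter sequence, and the net charges at pH 7.0 given earlier for the Mr 31,742-type proteins can be approximated from composition in a similar way. A minimal sketch; the charge convention (Lys + Arg minus Asp + Glu, with His treated as neutral) is an assumption, since the text does not state how the charges were computed, and the function names are hypothetical.

```python
from __future__ import annotations

BASIC = frozenset("KR")   # Lys, Arg
ACIDIC = frozenset("DE")  # Asp, Glu


def count_in_window(protein: str, start: int, end: int, residues: str) -> int:
    """How many residues listed in `residues` occur at positions start..end
    (1-based, inclusive) of a one-letter protein sequence."""
    return sum(1 for aa in protein[start - 1:end] if aa in residues)


def approx_net_charge(protein: str) -> int:
    """Crude charge estimate near pH 7: (Lys + Arg) - (Asp + Glu).
    His, the chain termini and pKa shifts are ignored."""
    basic = sum(1 for aa in protein if aa in BASIC)
    acidic = sum(1 for aa in protein if aa in ACIDIC)
    return basic - acidic


if __name__ == "__main__":
    toy = "MSRRAKDEHRSK"  # hypothetical sequence
    print("Arg in 1-30:", count_in_window(toy, 1, 30, "R"))
    print("Arg + Lys in 1-30:", count_in_window(toy, 1, 30, "KR"))
    print("approximate net charge:", approx_net_charge(toy))
```

Windows running past the end of the sequence are simply truncated by the slice, so a 30-residue window can be applied to short test peptides without special handling.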

The noncoding regions

The 210-nucleotide 5'-noncoding region of TSV RNA 3 is comparable in length to the leader of AlMV RNA 3, but more than twice as long as the leaders of BMV and CMV RNA 3 (Table 2). Compared with the total base content (27.2% A, 21.9% C, 24.4% G, 26.5% U), the leader of TSV RNA 3 is slightly enriched in U (30.5%). The leaders of AlMV, BMV, and CMV RNA 3, however, contain approximately 40% U. A sequence of 28-30 nucleotides is repeated three times in the leader sequence of AlMV RNA 3 (4,38). No such repeat is found in TSV RNA 3. However, in this RNA, repeats between the 5'- and 3'-noncoding sequences were noticed. A possible function of these repeats is not known at present. The fact that they are not of general occurrence in tricornaviruses may suggest that they have no function in virus replication.
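Both observations in this paragraph, the base composition of the leader relative to the whole RNA and the sequences shared by the 5'- and 3'-noncoding regions, are simple string computations. A minimal sketch, assuming the regions are cut from the full sequence as plain strings (for TSV RNA 3, residues 1-210 and 1,918-2,205 in the numbering of Figure 3). The repeat search reports maximal exact matches only; the text does not state how the underlined repeats were identified, and the names used here are hypothetical.

```python
from __future__ import annotations


def percent_composition(seq: str) -> dict[str, float]:
    """Percentage of A, C, G and U (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return {base: 100.0 * seq.count(base) / len(seq) for base in "ACGU"}


def maximal_exact_repeats(a: str, b: str, min_len: int) -> list[tuple[int, int, int]]:
    """All maximal exact matches of at least `min_len` bases shared by `a` and `b`,
    as (start_in_a, start_in_b, length) with 1-based starts. Uses the classic
    longest-common-suffix dynamic programme, O(len(a) * len(b))."""
    a, b = a.upper().replace("T", "U"), b.upper().replace("T", "U")
    prev = [0] * (len(b) + 1)
    hits = []
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending one base earlier
            length = cur[j]
            # Report a match only where it cannot be extended further to the right.
            if length >= min_len and (i == len(a) or j == len(b) or a[i] != b[j]):
                hits.append((i - length + 1, j - length + 1, length))
        prev = cur
    return hits


if __name__ == "__main__":
    leader = "AUGCUUAGGCAUUUCGAUU"  # hypothetical 5'-noncoding fragment
    trailer = "CCGGCAUUUCGAACGU"    # hypothetical 3'-noncoding fragment
    print(percent_composition(leader))
    print(maximal_exact_repeats(leader, trailer, 7))
```

Each match is left-maximal by construction (its length counts back to the first mismatch) and is reported only at its right end, so overlapping sub-matches of a longer repeat are not listed separately.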

Table 2. Organization of genetic information in RNA 3 of several tricornaviruses (a)

Columns (number of nucleotides): viral RNA 3; 5'-noncoding region; open reading frame 1 (b); intercistronic region; open reading frame 2 (b); 3'-noncoding region; total number of residues.

TSV     210   870     123            714   288   2,205
AlMV    240   903     49             666   179   2,037
BMV     91    912     232-238 (c)    579   297   2,114
CMV     94    1,002   123            711   263   2,193

a RNA 3 sequences of AlMV, BMV, and CMV are from references (4), (6), and (7), respectively.
b Stop codons are included.
c Variation in number of nucleotides due to the variable length of the internal poly(A) tract.

The intercistronic region of TSV RNA 3, 123 nucleotides, is exactly as long as the corresponding region in CMV RNA 3 (Table 2). The intercistronic region of 49 residues in AlMV RNA 3 is much smaller, and that of 232-238 nucleotides in BMV RNA 3 is much larger (Table 2). The intercistronic region not only separates the two coding regions but also contains the recognition site for the production of the subgenomic messenger RNA 4. For AlMV, BMV, and CMV it is known exactly where the sequence of RNA 4 starts in the intercistronic region of RNA 3. The start points of these RNAs 4 have in common that they are preceded by a C residue and that the first base of the subgenomic RNA is a G (4,6,7). For TSV RNA 3 it is not yet known where the sequence of RNA 4 begins. The amount of RNA 4 in our TSV RNA preparations was too low to permit an analysis of the 5'-terminal sequence of this subgenomic messenger. A dinucleotide CG that might represent the start point of RNA 4 occurs five times in the intercistronic region of TSV RNA 3 (Figure 2).
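Locating the candidate start points amounts to scanning the intercistronic region for a G immediately preceded by a C. A minimal sketch, assuming the region boundaries are given as 1-based residue numbers (for TSV RNA 3, residues 1,081-1,203 according to Figure 3) and that, as for AlMV, BMV and CMV, RNA 4 would start at the G of such a CG dinucleotide. It only lists candidates and cannot tell which one, if any, is used. The function name is hypothetical.

```python
from __future__ import annotations


def candidate_subgenomic_starts(rna3: str, region_start: int, region_end: int) -> list[int]:
    """1-based positions of each G preceded by a C, where both bases lie
    within residues region_start..region_end (inclusive)."""
    seq = rna3.upper().replace("T", "U")
    region_end = min(region_end, len(seq))  # do not run past the 3'-end
    return [
        pos
        for pos in range(region_start + 1, region_end + 1)  # pos = position of the G
        if seq[pos - 1] == "G" and seq[pos - 2] == "C"
    ]


if __name__ == "__main__":
    toy = "AUGCGAUUCGCAGCGU"  # hypothetical RNA, not TSV RNA 3
    print(candidate_subgenomic_starts(toy, 3, 14))
```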

BMV RNA 3 contains in its intercistronic region a poly(A) tract which varies from 16 to 22 nucleotides in length (6). It has been suggested that this tract plays a role in the formation of BMV RNA 4 (6). If this is true, this mechanism would be unique to BMV, since the RNAs 3 of AlMV, CMV, and TSV do not have an internal poly(A) tract. The 288 3'-terminal nucleotides of TSV RNA 3 form a 3'-extracistronic region which is comparable in length to the 3'-extracistronic regions of AlMV, BMV, and CMV RNAs 3 (Table 2). The 3'-extracistronic region of TSV RNA 3 contains several potential hairpin structures and in this respect resembles the 3'-extracistronic region of AlMV RNA 3. Details and biological significance of those potential hairpin structures have been discussed elsewhere (12,20).
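The region lengths in Figure 3 and Table 2 follow directly from the ORF coordinates given in the text: AUG at 211 and UGA ending at 1,080; AUG at 1,204 and stop codon ending at 1,917; 2,205 residues in total. Region lengths count both end points, and because the ORF lengths include the stop codon (Table 2, footnote b), the encoded protein has one residue fewer than the number of codons. The function names below are illustrative.

```python
from __future__ import annotations


def rna3_regions(total: int, orf1: tuple[int, int], orf2: tuple[int, int]) -> dict[str, int]:
    """Region lengths of a bicistronic RNA 3 from 1-based, inclusive ORF
    coordinates whose end is the last base of the stop codon."""
    (s1, e1), (s2, e2) = orf1, orf2
    return {
        "5'-noncoding": s1 - 1,
        "ORF 1": e1 - s1 + 1,
        "intercistronic": s2 - e1 - 1,
        "ORF 2": e2 - s2 + 1,
        "3'-noncoding": total - e2,
    }


def protein_length(orf_nt: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_nt % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1


regions = rna3_regions(2205, (211, 1080), (1204, 1917))
assert sum(regions.values()) == 2205  # the five regions tile the whole RNA
print(regions)
print(protein_length(regions["ORF 1"]), protein_length(regions["ORF 2"]))  # 289 237
```

The same two functions reproduce the 210/870/123/714/288 split of Figure 3 and the 289- and 237-residue proteins, which is a quick consistency check on coordinates transcribed from a sequence figure.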

ACKNOWLEDGEMENTS

We thank J. Martien De Graaf and Dr. Barend Kraal for an amino acid analysis of TSV coat protein, Frans Th. Brederode for the wandering spot analysis of 5'-labelled TSV RNA 3, and Dr. Lous Van Vloten-Doting for valuable advice, helpful discussions and critical reading of the manuscript.

This work was sponsored in part by the Netherlands Foundation for Chemical Research (S.O.N.) with financial aid from the Netherlands Organization for the Advancement of Pure Research (Z.W.O.).

REFERENCES

1. Van Vloten-Doting, L., Francki, R.I.B., Fulton, R.W., Kaper, J.M. and Lane, L.C. (1981) Intervirology 15, 198-203.
2. Cornelissen, B.J.C., Brederode, F.Th., Moormann, R.J.M. and Bol, J.F. (1983) Nucleic Acids Res. 11, 1253-1265.
3. Cornelissen, B.J.C., Brederode, F.Th., Veeneman, G.H., Van Boom, J.H. and Bol, J.F. (1983) Nucleic Acids Res. 11, 3019-3025.
4. Barker, R.F., Jarvis, N.P., Thompson, D.V., Loesch-Fries, L.S. and Hall, T.C. (1983) Nucleic Acids Res. 11, 2881-2891.
5. Ahlquist, P., Dasgupta, R. and Kaesberg, P., in press.
6. Ahlquist, P., Luckow, V. and Kaesberg, P. (1981) J. Mol. Biol. 153, 23-38.
7. Gould, A.R. and Symons, R.H. (1982) Eur. J. Biochem. 126, 217-226.
8. Smit, C.H. and Jaspars, E.M.J. (1980) Virology 104, 454-461.
9. Smit, C.H., Roosien, J., Van Vloten-Doting, L. and Jaspars, E.M.J. (1981) Virology 112, 169-173.
10. Houwing, C.J. and Jaspars, E.M.J. (1982) Biochemistry 21, 3408-3414.
11. Zuidema, D., Bierhuizen, M.F.A., Cornelissen, B.J.C., Bol, J.F. and Jaspars, E.M.J. (1983) Virology 125, 361-369.
12. Zuidema, D. and Jaspars, E.M.J. (1984) Virology, in press.
13. Koper-Zwarthoff, E.C., Brederode, F.Th., Walstra, P. and Bol, J.F. (1979) Nucleic Acids Res. 7, 1887-1900.
14. Pinck, L. and Pinck, M. (1979) FEBS Lett. 107, 61-65.
15. Gunn, M.R. and Symons, R.H. (1980) FEBS Lett. 109, 145-150.
16. Houwing, C.J. and Jaspars, E.M.J. (1978) Biochemistry 17, 2927-2933.
17. Van Vloten-Doting, L. (1975) Virology 65, 215-225.
18. Gonsalves, D. and Garnsey, S.M. (1975) Virology 67, 319-326.
19. Gonsalves, D. and Fulton, R.W. (1977) Virology 81, 398-407.
20. Koper-Zwarthoff, E.C. and Bol, J.F. (1980) Nucleic Acids Res. 8, 3307-3318.
21. Bol, J.F., Brederode, F.Th., Janze, G.C. and Rauh, D.C. (1976) Virology 65, 1-15.
22. Sippel, A. (1973) Eur. J. Biochem. 37, 31-40.
23. Christie, S.R. (1969) Plant Dis. Rep. 53, 939-941.
24. Rice, C.M. and Strauss, J.H. (1981) J. Mol. Biol. 150, 315-340.
25. Maxam, A.M. and Gilbert, W. (1980) in "Methods in Enzymology" (Grossman, L., ed.), Vol. 65, pp. 499-560, Academic Press, New York.
26. Devos, R., Van Emmelo, J., Seurinck-Opsomer, C., Gillis, E. and Fiers, W. (1976) Biochim. Biophys. Acta 447, 319-327.
27. Koper-Zwarthoff, E.C., Lockard, R.E., Alzner-De Weerd, B., RajBhandary, U.L. and Bol, J.F. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5504-5508.
28. Kozak, M. (1981) Nucleic Acids Res. 9, 5233-5252.
29. Rutgers, A.S. (1977) Ph.D. Thesis, University of Leiden, The Netherlands.
30. Davies, J.W. (1979) in "Nucleic Acids in Plants" (Hall, T.C. and Davies, J.W., eds.), Vol. II, pp. 111-149, CRC Press, Boca Raton, Florida, U.S.A.
31. Van Tol, R.G.L. and Van Vloten-Doting, L. (1981) Virology 109, 444-447.
32. Murthy, M.R.N. (1983) J. Mol. Biol. 168, 469-475.
33. Ghabrial, S.A. and Lister, R.M. (1974) Virology 57, 1-10.
34. Kan, J.H., Andree, P.-J., Kouijzer, L.C. and Mellema, J.E. (1982) Eur. J. Biochem. 126, 29-33.
35. Argos, P. (1981) Virology 110, 55-62.
36. Bol, J.F., Kraal, B. and Brederode, F.Th. (1974) Virology 46, 73-85.
37. Zuidema, D., Bierhuizen, M.F.A. and Jaspars, E.M.J. (1983) Virology 129, 255-260.
38. Pinck, M., Fritsch, C., Ravelonandro, M., Thivent, C. and Pinck, L. (1981) Nucleic Acids Res. 9, 1087-1100.