Genomic and Proteomic Analysis of Invertebrate Iridovirus Type 9▿†
Chun K. Wong,1 Vivienne L. Young,1 Torsten Kleffmann,2 and Vernon K. Ward1*
Department of Microbiology and Immunology1 and Centre for Protein Research, Department of Biochemistry,2
School of Medical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand 9054
Received 30 March 2011/Accepted 23 May 2011
Iridoviruses (IV) are nuclear cytoplasmic large DNA viruses that are receiving increasing attention as sublethal pathogens of a range of insects. Invertebrate iridovirus type 9 (IIV-9; Wiseana iridovirus) is a member of the major phylogenetic group of iridoviruses for which there is very limited genomic and proteomic information. The genome is 205,791 bp, has a G+C content of 31%, and contains 191 predicted genes; repeat sequences make up approximately 20% of the genome and are located predominantly within coding regions. The repeated sequences include genes for 11 proteins with helix-turn-helix motifs and genes encoding related tandem repeat amino acid sequences. Of the 191 proteins encoded by IIV-9, 108 are most closely related to orthologs in IIV-3 (Chloriridovirus genus), and 114 of the 126 IIV-3 genes have orthologs in IIV-9. In contrast, only 97 of 211 IIV-6 genes have orthologs in IIV-9. There is almost no conservation of gene order between IIV-3, IIV-6, and IIV-9. Phylogenetic analysis using a concatenated sequence of 26 core IV genes confirms that IIV-3 is more closely related to IIV-9 than to IIV-6, despite being from a different genus of the Iridoviridae. An interaction between IIV and small RNA regulatory systems is supported by the prediction of seven putative microRNA (miRNA) sequences combined with XRN exonuclease, RNase III, and double-stranded RNA binding activities encoded on the genome. Proteomic analysis of IIV-9 identified 64 proteins in the virus particle and, when combined with infected cell analysis, confirmed the expression of 94 viral proteins. This study provides the first full-genome and consequent proteomic analysis of group II IIV.
Iridoviruses (IV) are members of the nucleocytoplasmic large DNA viruses (NCLDV) (19). They possess a linear double-stranded DNA (dsDNA) genome with circular permutation and terminal redundancy (6, 13), and replication of the viral genome includes distinct nuclear and cytoplasmic phases (12). The genomes are encapsidated within an icosahedral shell ranging between 120 and 180 nm in diameter and composed predominantly of a 50-kDa major capsid protein (MCP). The invertebrate iridoviruses (IIV), studied by cryo-electron microscopy, have 2-nm-diameter surface fibrils (23, 42); for invertebrate iridovirus type 6 (IIV-6), these fibrils extend from the 3-fold rotational axis of the 1,460 hexameric capsomers found in the virus particle (43). IV are divided into 5 genera (Table 1), with members of three genera infecting poikilothermic vertebrates and members of the Iridovirus and Chloriridovirus genera infecting invertebrates. The Chloriridovirus genus has only one member, IIV-3 (mosquito iridovirus), and the primary defining differences between the Chloriridovirus and Iridovirus genera are particle sizes of approximately 180 and 135 nm, respectively, and the mosquito host range restriction of IIV-3 (4).
The vertebrate IV cause disease in fish, amphibians, and reptiles and have received considerable attention due to their effects upon aquaculture. In contrast, the IIV cause predominantly subpathogenic infections, and their consequently limited utility for pest control has meant that less is known about IIV. Of particular importance has been the recent study of Bromenshenk et al. (3) linking colony collapse disorder in honey bees to coinfection with Nosema and an unidentified iridovirus(es). A strong correlation was established; however, the identity of the IV was not determined, at least in part due to a lack of IIV genomic information. In addition, the refraction of light by assemblies of IIV particles offers new opportunities in materials development (23, 28) that would benefit from more information on the virus particle and its constituents. The roles of viral proteins, such as the surface fiber, in iridescence are unknown, and the proteins and functional activities associated with the virus particle remain to be elucidated. Central to this is the need for information on IIV genomes and the proteomic analysis of the virus particle.
Fourteen iridovirus species have been fully sequenced (Table 1), with multiple members of the Ranavirus, Lymphocystivirus, and Megalocytivirus genera providing a comprehensive coverage of these vertebrate genera of IV. Vertebrate IV genomes range from 105 kbp for tiger frog virus (17) to 186 kbp for lymphocystis disease virus, China strain (LCDV-C) (46). The Ranavirus and Megalocytivirus species have G+C contents of approximately 50%, while the Lymphocystivirus species have G+C contents of less than 30%. There is a consistent lack of genome colinearity between IV except with very closely related isolates, although all IV sequenced to date possess a core cohort of 26 conserved genes (8). In contrast to the vertebrate IV, the only fully sequenced IIV are IIV-6 (Chilo iridovirus [CIV]) (20) and IIV-3 (mosquito chloriridovirus [MIV]) (5). IIV-6 is the type species of the Iridovirus genus, with a genome of 212 kbp and a G+C content of 29%; however, phylogenetic studies show that IIV-6 belongs in a clade distant from that of most iridoviruses (Fig. 1A) (38). IIV-3, with a genome of 191 kbp and a G+C content of 48%, represents a different genus that may be more closely related to members of the Iridovirus genus than its placement in a separate genus suggests.
* Corresponding author. Mailing address: Department of Microbiology and Immunology, University of Otago, P.O. Box 56, Dunedin, New Zealand 9054. Phone: 64 3 4799028. Fax: 64 3 4798540. E-mail: [email protected].
† Supplemental material for this article may be found at http://jvi.asm.org/.
▿ Published ahead of print on 1 June 2011.
Downloaded from https://journals.asm.org/journal/jvi on 05 January 2022 by 92.49.160.27.
To date, only limited sequence information is available from members of the major clade of IIV, defined as group II iridoviruses by Williams and Cory (41), and genome analysis of a member of this clade would provide information on the relationships between disparate IIV. IIV-9 (Wiseana iridovirus [WIV]), a representative of the major clade, was isolated in New Zealand from larvae of the pasture pest Wiseana spp. (Lepidoptera: Hepialidae) (9). The mechanism of transmission of this virus is unknown, though the presence of this virus in damp and cryptic habitats is consistent with many other IIV (40), and suggestions of vector transmission have been made, though not confirmed. Like most invertebrate iridoviruses, IIV-9 replicates in larvae of the greater wax moth Galleria mellonella upon injection, and heavily infected larvae display typical iridescence upon accumulation of paracrystalline arrays of virus particles within infected tissues (9). IIV-9 also replicates in Spodoptera frugiperda (Sf9, Sf21) cells, albeit at the restricted temperature of 21°C. IIV-9 is a member of the major clade of IIV, as determined by partial major capsid protein phylogeny (38).
This study presents the complete genomic sequence of IIV-9 and uses this information for proteomic analysis of IIV-9’s encoded proteins in purified virus particles and within infected cells. Analysis of the genome indicates that IIV-9 is more closely related to IIV-3 than to IIV-6 and provides the first complete genome from the major clade of invertebrate iridoviruses.
MATERIALS AND METHODS
IIV-9 purification, DNA extraction, and sequencing. Sf21 cells were grown in SF900II serum-free medium (Invitrogen, Auckland, New Zealand) and infected with dilutions of a field isolate of IIV-9 that had been passaged repeatedly through G. mellonella larvae. Infected cells were incubated for 5 days at 21°C under an agarose overlay and stained with neutral red. Individual plaques were picked and passaged once in cell culture. One plaque isolate was randomly selected, propagated in G. mellonella, and purified on sucrose gradients as described previously (23). Genomic DNA was extracted by phenol-chloroform extraction (37), and 50 μl (100 ng/μl) of genomic DNA in deionized water was sequenced using the Roche/454 GS FLX High Throughput Sequencing Service provided by the Department of Anatomy and Structural Biology, University of Otago. All contig junctions were determined by sequencing of available restriction fragment clones or by PCR. Briefly, primers were designed near the termini of contigs and used with primers on adjacent contigs to generate PCR products directly from genomic DNA using the Expand high-fidelity PCR kit (Roche Diagnostics, Auckland, New Zealand). The PCR products were either sequenced directly at the Allan Wilson Sequencing Centre, Palmerston North, New Zealand, on an ABI 3730 automated sequencer or cloned into pGEMTeasy (Promega Corp., Madison, WI) prior to sequencing. Sequence conflicts, long repeats, and long runs of single nucleotides were confirmed by PCR and sequencing of the region in question. All ABI 3730-generated sequences were edited in SeqMan (DNAStar) for sequence quality prior to use.
Sequence analysis. Newbler Assembler software (454 Life Sciences, Branford, CT) was used to assemble data into unordered and unoriented contigs (default settings). The contigs were exported to the SeqMan program in the Lasergene suite of DNA analysis programs (DNAStar, Madison, WI) and reassembled into a draft alignment using the SeqMan assembler (match size, 12; minimum match percentage, 80%; minimum sequence length, 100; maximum number of added gaps per kb in the contig, 70; maximum number of added gaps per kb in the sequence, 70; maximum register shift difference, 70; last group considered, 2; gap penalty, 0.00; gap length penalty, 0.70). All contigs were aligned to generate a draft alignment with a minimum match percentage of 95%. PCR primers were designed using PrimerDesign (DNAStar). An in silico analysis of the restriction profile of the complete genome was performed using GeneQuest (DNAStar), and results were compared to published restriction profiles of IIV-9 genomic DNA as a confirmation of the assembly profile.
TABLE 1. Fully sequenced genomes from vertebrate and invertebrate iridoviruses
Genus and virus^a   Genome size (bp)   % G+C   ORF^b   Coding density (%)   Protein size range (aa)   GenBank accession no.   Reference
Iridovirus
  IIV-9             205,791            31      191     90                   50–2,051                  GQ918152                This study
  IIV-6             212,482            29      243^c   85                   40–2,432                  AF303741                Jakob et al. (20)
b Essentially nonoverlapping ORF encoding a minimum length of 40 to 62 aa.
c Revised annotation of Eaton et al. (8).
VOL. 85, 2011 GENOMIC ANALYSIS OF IIV-9 7901
Tandem repeats within the IIV-9 genome were identified using Tandem Repeats Finder (2), with the match, mismatch, and indel parameters set to 2, 7, and 7, respectively. The minimum alignment score was set at 50, with a maximum period size of 2,000 bases. Direct, inverted, and dyad repeats were identified using GeneQuest (DNAStar) with an unlimited loop size. The minimum period sizes set for direct, inverted, and dyad repeats were 25 bp, 25 and 50 bp, and 16 bp, respectively. Dot plot analysis was performed to identify DNA repeat clusters using MegAlign (DNAStar), with a window size of 50 bp and a 75% match.
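The windowed dot plot criterion (50-bp window, 75% match) can be illustrated with a short sketch. This is a naive illustration of the general technique, not the MegAlign implementation; the function name `window_dot_plot` and the toy sequence are hypothetical, and the quadratic scan is only practical for short sequences.

```python
def window_dot_plot(seq_a, seq_b, window=50, min_match=0.75):
    """Return (i, j) window start positions whose windows share at least
    min_match identity. Naive O(n*m*window) scan for illustration only."""
    needed = window * min_match
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(a == b for a, b in zip(win_a, win_b))
            if matches >= needed:
                hits.append((i, j))
    return hits


# Hypothetical toy sequence: a 10-nt unit, a G-rich spacer, then the unit again.
toy = "ACGTTGCAAT" + "GGGGGGGGGG" + "ACGTTGCAAT"
hits = window_dot_plot(toy, toy, window=10, min_match=1.0)
# Points off the main diagonal mark repeated sequence within the molecule.
off_diagonal = {(i, j) for i, j in hits if i != j}
```

In a self-comparison every position matches itself, so only the off-diagonal hits are informative; clusters of such hits correspond to repeat regions.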
Open reading frames (ORF) encoding proteins with a minimum size of 50 amino acids (aa) and that contained a start codon were designated using SeqBuilder (DNAStar). All designated ORF were named with “orf” followed by numbers corresponding to their position and a forward/reverse (right [R]/left [L], respectively) designation to indicate their orientation. ORF that fell completely within a larger ORF were excluded. Heavily overlapping open reading frames where the most likely ORF could not be determined were given the same number but different orientations. All designated IIV-9 open reading frames were exported from SeqBuilder (DNAStar) to EditSeq (DNAStar), and BLASTP analysis of the predicted amino acid sequences was performed for each open reading frame. Amino acid identity to the closest BLASTP match was performed using MegAlign (DNAStar). Analysis of IIV-9 ORF function was performed via the ExPASy Proteomics Server and included InterProScan, SignalP, and PredictProtein. Protein repeats were identified using the XSTREAM prediction server (27), and subsequent alignments of protein repeats were generated using MegAlign.
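A minimal sketch of the ORF designation criteria (ATG start, at least 50 aa, either strand) is given here for illustration. It is not the SeqBuilder algorithm; the function name `find_orfs` and the test sequence are hypothetical, and the exclusion of ORF nested inside larger ORF is not implemented.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_aa=50):
    """Return (start, end, strand, n_aa) for ATG-initiated ORF of >= min_aa codons.

    Strand is "R" (forward) or "L" (reverse). Coordinates are 0-based,
    end-exclusive, and for "L" they refer to the reverse-complemented string.
    The first ATG in a frame is kept, so the longest ORF per stop is reported.
    """
    orfs = []
    reverse = seq.translate(COMPLEMENT)[::-1]
    for strand, s in (("R", seq), ("L", reverse)):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if codon == "ATG" and start is None:
                    start = pos
                elif codon in STOPS and start is not None:
                    n_aa = (pos - start) // 3  # codons from ATG up to the stop
                    if n_aa >= min_aa:
                        orfs.append((start, pos + 3, strand, n_aa))
                    start = None
    return orfs


# Hypothetical test sequence: ATG plus 50 alanine codons, then a stop (51 aa).
long_orf = "ATG" + "GCT" * 50 + "TAA"
short_orf = "ATG" + "GCT" * 48 + "TAA"  # 49 aa, below the 50-aa cutoff
```

An R/L suffix could then be attached to a position-based number to give names of the form used in the text, such as orf001R.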
A phylogenetic tree was constructed based on the alignment of the 26 core gene amino acid sequences found in IIV-9, IIV-6, IIV-3, Singapore grouper iridovirus (SGIV), lymphocystis disease virus 1 (LCDV-1), and infectious spleen and kidney necrosis virus (ISKNV) using MegAlign (DNAStar), with bootstrap trials set at 1,000. All core gene-encoded proteins were combined as one continuous amino acid sequence in the same gene order prior to assembly. This was compared to the partial major capsid protein tree as described in the work of Webby and Kalmakoff (38).
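The concatenation step can be sketched as follows, assuming the core proteins of each virus are held in a dictionary keyed by a shared gene label. The function name `concatenate_core_proteins`, the gene labels, and the toy sequences are hypothetical and stand in for the 26 core genes.

```python
def concatenate_core_proteins(proteins_by_virus, gene_order):
    """Join each virus's core proteins into one sequence, using the same
    gene order for every virus so that aligned columns stay comparable."""
    concatenated = {}
    for virus, proteins in proteins_by_virus.items():
        missing = [gene for gene in gene_order if gene not in proteins]
        if missing:
            raise KeyError(f"{virus} lacks core genes: {missing}")
        concatenated[virus] = "".join(proteins[gene] for gene in gene_order)
    return concatenated


# Hypothetical two-gene example; real input would carry all 26 core proteins.
toy = {
    "virusA": {"gene1": "MK", "gene2": "LV"},
    "virusB": {"gene2": "LI", "gene1": "MR"},
}
joined = concatenate_core_proteins(toy, ["gene1", "gene2"])
```

Fixing the gene order matters because gene order is not conserved between these genomes, so the order in each genome cannot be used directly.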
The complete genome was scanned for miRNA coding regions using VMir (14, 33), and possible miRNA coding sequences were further analyzed by MiPred (22). All images were generated using Microsoft PowerPoint and/or Adobe Photoshop CS4.
MS analysis. Purified IIV-9 virions or infected Sf21 cells harvested 24 h postinfection were denatured in SDS-PAGE sample buffer, and proteins were separated on individual 10% SDS-PAGE gels using standard techniques. The gels were stained with Coomassie G250 and protein lanes cut into five (for liquid chromatography coupled with electrospray ionization linear ion trap [LC-ESI LTQ] Orbitrap tandem mass spectrometry [MS/MS] analyses of IIV-9 virions and infected Sf21 cells) or eight (for LC–matrix-assisted laser desorption ionization–tandem time of flight [MALDI TOF/TOF] analysis of IIV-9 virions) equally sized fractions. Fractions were subjected to in-gel protein digestion with trypsin essentially by following the protocol of Shevchenko et al. (30), using a liquid handling robotic workstation (DigestPro MSi; Intavis AG, Cologne, Germany). Each digested fraction was concentrated using a centrifugal vacuum concentrator and reconstituted in a 10-μl aqueous solution of 2% (vol/vol) acetonitrile (ACN) supplemented with either 0.1% (vol/vol) trifluoroacetic acid (TFA) for LC-MALDI TOF/TOF analyses or 0.2% formic acid for LC-ESI LTQ Orbitrap analyses.
Structural proteins from purified IIV-9 virions were analyzed by LC-MALDI TOF/TOF MS and LC-ESI LTQ Orbitrap MS/MS, and proteins from infected Sf21 cells were analyzed by LC-ESI LTQ Orbitrap MS/MS according to the details of methods described in the supplemental material.
Peak lists were processed through the 4000 series Explorer software (Applied Biosystems, MA) for MALDI TOF/TOF data and the Proteome Discoverer 1.1 software (Thermo Scientific, San Jose, CA) for all ESI LTQ Orbitrap data using the software’s default settings. All peak lists were then searched with an in-house Mascot server (version 2.1.0; Matrix Science) against an amino acid sequence database combining all predicted and translated IIV-9 ORF and all entries from the NCBI nonredundant sequence database, matching the taxa Lepidoptera and Drosophila melanogaster (downloaded January 2011; 355,290 sequence entries). Mascot search settings allowed for full tryptic peptides with up to 3 missed cleavage sites and variable modifications of carbamidomethyl (C), oxidation (M), and pyroglutamic acid (E, Q). The precursor and fragment mass tolerances were set to ±10 ppm and 0.8 Da for LTQ Orbitrap data and 75 ppm and 0.4 Da for TOF/TOF data. To evaluate the false-discovery rate (FDR), all peak lists were searched against a decoy database using identical search settings. The decoy database was built using the decoy database tool at the Trans-Proteomic Pipeline (TPP; Seattle Proteome Center), comprising the reversed sequence entries of the aforementioned combined database. The FDR was calculated by determining the number of false-positive peptide hits from the decoy search versus the number of peptide identifications from the true search using the same Mascot score as a significance threshold.
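The FDR calculation described above reduces to a ratio of hit counts at a shared score threshold. The sketch below assumes the threshold is inclusive; the function name `false_discovery_rate` and all scores are hypothetical illustration values, not data from this study.

```python
def false_discovery_rate(target_scores, decoy_scores, threshold):
    """Estimate FDR as decoy hits divided by target hits, counting only
    peptide matches scoring at or above the same threshold in both searches."""
    target_hits = sum(score >= threshold for score in target_scores)
    decoy_hits = sum(score >= threshold for score in decoy_scores)
    if target_hits == 0:
        return 0.0  # no identifications, so nothing can be a false discovery
    return decoy_hits / target_hits


# Hypothetical ion scores from a true (target) search and a reversed-decoy search.
target = [55, 42, 38, 61, 47, 40]
decoy = [12, 41, 25]
fdr = false_discovery_rate(target, decoy, threshold=40)
```

With these toy values, five target matches and one decoy match pass the threshold, giving an estimated FDR of 0.2.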
Only peptide hits with an individual ion score of ≥40 (Mascot significance threshold at a P of <0.05) were accepted as significant identifications. This resulted in an FDR of <0.02 for all searches. A significant protein identification required at least two significant peptide hits covering different sequences of the protein. In addition, a protein that was identified by a single peptide-based protein identification in one experiment (IIV-9 particles analyzed by LC-MALDI TOF/TOF or LC-ESI LTQ Orbitrap MS/MS or infected cells analyzed by LC-ESI LTQ Orbitrap MS/MS) that was also confirmed by a different peptide identification covering another sequence stretch in one of the other experiments was considered a significant multipeptide identification.
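The two acceptance rules for protein identifications can be sketched as a small filter. This is an illustrative simplification: distinct peptide strings are treated as "different sequences" without checking for overlap, the score cutoff of 40 is assumed to be inclusive, and the function name, experiment labels, ORF labels, and peptides are all hypothetical.

```python
from collections import defaultdict


def significant_proteins(hits, min_score=40):
    """hits: iterable of (experiment, protein, peptide, ion_score).

    A protein is accepted if one experiment yields at least two distinct
    significant peptides, or if single significant peptides from different
    experiments are themselves different peptides.
    """
    peptides = defaultdict(lambda: defaultdict(set))  # protein -> experiment -> peptides
    for experiment, protein, peptide, score in hits:
        if score >= min_score:
            peptides[protein][experiment].add(peptide)

    accepted = set()
    for protein, by_experiment in peptides.items():
        within_one = any(len(p) >= 2 for p in by_experiment.values())
        across_all = len(set().union(*by_experiment.values())) >= 2
        if within_one or across_all:
            accepted.add(protein)
    return accepted


# Hypothetical hits illustrating each case.
toy_hits = [
    ("virion_maldi", "orfA", "PEPTIDEK", 55), ("virion_maldi", "orfA", "ANOTHERK", 47),
    ("virion_esi", "orfB", "SINGLEK", 60), ("cell_esi", "orfB", "OTHERR", 45),
    ("virion_esi", "orfC", "LONELYK", 70), ("cell_esi", "orfC", "LONELYK", 65),
    ("cell_esi", "orfD", "WEAKK", 30), ("cell_esi", "orfD", "STRONGK", 80),
]
accepted = significant_proteins(toy_hits)
```

Here orfA passes on two peptides in one experiment, orfB passes on different peptides in two experiments, orfC fails because the same peptide recurs, and orfD fails because only one peptide is significant.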
Nucleotide sequence accession number. The IIV-9 genome has been deposited in GenBank under accession number GQ918152.
RESULTS AND DISCUSSION
FIG. 1. Phylogenetic trees of iridoviruses. (A) Alignment of the genus Iridovirus based upon a partial major capsid protein sequence as described in the work of Webby and Kalmakoff (38). The Chloriridovirus IIV-3 and the Lymphocystivirus LCDV-1 are included. (B) Alignment of representatives of the five iridovirus genera Chloriridovirus (IIV-3), Iridovirus (IIV-6, IIV-9), Lymphocystivirus (LCDV-1), Ranavirus (SGIV), and Megalocytivirus (ISKNV) based upon a concatenated amino acid sequence of the 26 core iridovirus genes. Bootstrap support from 1,000 iterations is indicated for all branches with at least 70% support.
7902 WONG ET AL. J. VIROL.
Genome assembly and properties. Sequencing of the IIV-9 genome using a 454 FLX sequencer generated 20,734 sequences totaling 5,597,884 bases of sequence with 50.4% and 49.6% sequence orientation biases, for an average coverage of 27-fold. The initial Newbler assembly generated 3 large and 10 small contigs that were subsequently assembled into a single contiguous sequence by targeted PCR-based cloning and sequencing. The initial contig boundaries were defined by repeat sequences that the assembler was unable to resolve. The genome was shown to be 205,791 bp in size, with a G+C content of 31% (Table 1). This genome size compares to estimates of 192.5 and 222.6 kbp obtained from restriction profiles using standard (37) and pulsed-field gel (39) electrophoresis, respectively. Based upon an estimated 4.7% terminal redundancy in the IIV-9 genome (39), this equates to approximately 9.7 kbp of redundant sequence. Due to the high A+T content in the genome and the challenge of resolving long single base runs using 454 technology, a total of 34 PCR-based clones were generated to resolve 57 potential base conflicts. All base calls were inspected visually and resolved as necessary. In silico restriction endonuclease profiles were compared to experimentally derived restriction endonuclease profiles (37) to confirm correct global assembly of the genomic sequence (data not shown).
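The coverage and terminal redundancy figures follow directly from the numbers quoted in this paragraph, as the short check below shows. Variable names are illustrative only.

```python
# Values quoted in the text.
total_bases = 5_597_884      # bases of 454 sequence
read_count = 20_734          # number of sequences
genome_bp = 205_791          # assembled genome size
terminal_redundancy = 0.047  # estimated fraction of the genome (4.7%)

coverage = total_bases / genome_bp            # about 27.2-fold
mean_read_length = total_bases / read_count   # about 270 bases per read
redundant_bp = terminal_redundancy * genome_bp  # about 9,672 bp, i.e. ~9.7 kbp

print(f"coverage: {coverage:.1f}-fold")
print(f"mean read length: {mean_read_length:.0f} bases")
print(f"terminally redundant sequence: {redundant_bp / 1000:.1f} kbp")
```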
Genome analysis identified a range of complex repeat sequences, including tandem, direct, dyad, and inverted repeats. The percentage of repeat sequences in the genome is dependent upon the stringency of parameters employed and ranges from 20 to 23% of the genome. The largest repeat identified is 3.4 copies of a 1,002-bp repeat between nucleotides 69621 and 73024, with a 75% consensus match. Identification of repeat sequences on the genome is illustrated by dot plot analysis (Fig. 2A). The repeats highlighted in the boxed region of the genome shown in Fig. 2A represented 10 of the contigs generated in the initial sequence assembly.
IIV-9 ORF and their predicted protein products. Analysis of the complete genome predicted 191 predominantly nonoverlapping ORF encoding proteins of 50 aa or more in length with an AUG start codon, with the genome displaying a coding density of 90% (Fig. 3). The genome shows a bias in that 63% of genes were oriented in the reverse direction. In conjunction with the genome analysis, we conducted a proteomic analysis, first to confirm expressed ORF and second to establish the first profile of expressed IIV-9 ORF in both isolated virions and infected insect cells. Of the total of 191 ORF, 94 were identified in either isolated virions (64 ORF detected) or infected insect cells (72 ORF detected), with 42 being expressed in both (Table 2; see also Tables S1 to S3 in the supplemental material). The number of expressed proteins in isolated virions roughly correlates with 44 identified proteins in a previous proteomic study of SGIV particles (31). Open reading frames that are discussed in the following paragraph are marked with a superscript “p” if their expression has been confirmed by proteomics of isolated virions or with a superscript “i” if they were confirmed to be protein products in infected insect cells.
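The detection counts are mutually consistent under simple inclusion-exclusion, which the following check makes explicit. Variable names are illustrative only.

```python
# Counts quoted in the text.
total_orf = 191
in_virions = 64
in_cells = 72
in_both = 42

detected = in_virions + in_cells - in_both  # union of the two sets
virion_only = in_virions - in_both
cell_only = in_cells - in_both
fraction_confirmed = detected / total_orf

print(f"detected in virions or cells: {detected}")
print(f"virion only: {virion_only}, infected cells only: {cell_only}")
print(f"fraction of predicted ORF confirmed: {fraction_confirmed:.1%}")
```

The union comes to 94 ORF, matching the total reported, so roughly half of the predicted ORF were confirmed at the protein level.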
The genome orientation and gene designation were defined by the start codon of the IIV-9 orf001R ortholog of the first conserved iridovirus core gene in mosquito IIV-3 (orf004R [5]). Four short ORF (orf007, -088, -061, -139) that were represented by dual heavily overlapping ORF in opposite orientations could not be resolved as forward or reverse by bioinformatics analysis alone. orf007 and -088 were subsequently designated orf007R^p and -088L^p based on the identification of their protein products by the proteomic analysis (Table 2; see also Table S1 in the supplemental material). The remaining two ORF could not be resolved, and hence, both overlapping ORF were designated orf061R and -061L or -139R and -139L, respectively.
The majority of IIV ORF have no predicted function. However, a wide range of predicted proteins showed similarity to proteins involved in nucleotide metabolism and DNA replication. These include enzymes required for deoxyribonucleotide synthesis, such as thymidylate synthase (095R), dUTPase (045R), deoxyribonucleoside kinase (098R), and both the large and small subunits of ribonucleotide reductase (070R^i, 187R^i). The last two have been confirmed as expressed proteins in infected cells. There are two putative NUDIX hydrolase proteins (075R and 152L), and these may play an important role in regulating nucleotides in the host cell. Delhon et al. (5) postulated that the IIV-3 ortholog of 075R (IIV-3 080R) might act similarly to the vaccinia virus NUDIX ortholog (D10R) and function as a repressor of transcription and translation. The reported presence of an intein in the large subunit of ribonucleotide reductase (orf070R^i) was confirmed (10).
FIG. 2. Sequence repeats and IIV-9 versus IIV-3 gene parity plot analysis. (A) The IIV-9 genome was compared against itself by dot plot analysis to identify repeats within the IIV-9 genome. The major clusters of sequence repeats are boxed. (B) The IIV-9 and IIV-3 gene orders were compared by parity plot analysis. Where three or more genes are contiguous in both genomes, regardless of orientation, they have been boxed. Numbers on the x and y axes represent ORF numbers.
Forty-four genes that could be postulated to have a role in DNA metabolism or DNA replication were identified. These include viral DNA ligase (109R), DNA polymerase (116R), helicase/primase (120R, 145R), PCNA (053R), endonuclease (081R), DNA exonuclease (040R), topoisomerase II (089R^i), and phosphodiesterase (022R^i) genes. However, only topoisomerase II (089R^i) and phosphodiesterase (022R^i) have been identified in infected cells. Genes encoding putative chromatin-binding regions, such as SWIB/MDM2 (056R^pi) and HMG box domains (169L^pi), are also present. Although it is not known if these affect the host or viral genome structure, the identification of both proteins in isolated virions suggests a possible association with the viral genome.
This study is the first to identify a putative chitinase gene (020R^i) in an IIV. Analysis of this chitinase indicates that it is a member of the family 18 glycohydrolases (exochitinases) and is most closely related to the chitinase of a bacterial pathogen of fish, Yersinia ruckeri (57% identity), and to the chitinases of other bacteria and slime molds. Baculovirus chitinases, along with cathepsin, have been shown to be important in facilitating the release of virus from the host (15). Despite the IIV-9 chitinase displaying less than 20% identity to baculovirus chitinases, the presence of a viral cathepsin (177R^pi) in IIV-9 may reflect similar roles of chitinase and cathepsin, acting in concert to degrade the insect, thereby facilitating viral release and dissemination from the host insect (15). Both enzymes were identified in IIV-9-infected insect cells by our proteomic analysis.
FIG. 3. Open reading frame map of the IIV-9 genome. The 205,791-bp IIV-9 genome is represented as a solid line, and predicted open reading frames are indicated with arrows. Arrows representing genes in the forward (right) direction are stippled, and those in the reverse (left) direction are open. The 26 core IV genes are indicated with bold ORF numbering. Arrows with a bold outline are HTH 7 domain-containing orthologs of orf091L of IIV-3, and those with a broken outline are orthologs of IIV-6 468L. orf061R/L and -139R/L are almost fully overlapping genes facing in opposite directions and have been represented as both R and L forms.
Expressed orf180L^pi also encodes a protein with strong similarity to baculovirus genes, possessing 46% identity and 66% similarity to orf110 of Choristoneura fumiferana nucleopolyhedrovirus (CfDEF NPV). A related gene is also present in IIV-6 (422L), and alignment of CfDEF NPV orf110, Epiphyas postvittana NPV orf102, and the IIV-9 and IIV-6 proteins shows the presence of a highly conserved pan-caspase DEVD cleavage site. It is not known if this exploits caspase activity for processing or if it might regulate apoptosis. A further enzyme identified in IIV-9 is a putative ErvI/augmenter of liver regeneration (ALR) sulfhydryl oxidase (134L^p). This protein is common in large cytoplasmic DNA viruses (29), and, as in poxviruses, it was found in the IIV-9 virus particle. The role of this enzyme activity is unclear but has been postulated to work in concert with glutaredoxin or thioredoxin systems for regulating cytoplasmic disulfide bonds and protein folding. IIV-9 encodes two proteins with putative thioredoxin domains, 043R^p and 062R^pi, both of which were identified in the virus particle.
The repeat sequences identified in the genome are located predominantly in coding regions. This is reflected in the presence of multiple copies of closely related genes on the genome (see Fig. S1 and S3 in the supplemental material). IIV-9 orf006L^pi, -026R, -035L, -046R^i, -054R^i, -071R^i, -103R, -118R, -142R, -144R^i, and -182R^i form one cluster of paralogs, as reflected in amino acid identities ranging from 60 to 84% between the encoded proteins (see Fig. S1 in the supplemental material). With InterProScan, all of these proteins were predicted to contain helix-turn-helix 7 (HTH 7 [Pfam 02796]) motifs and/or the more stable homeodomain motifs that are involved in DNA binding, with a wide spectrum of roles, ranging from transcription regulation to DNA repair (1). These proteins may have a role in the regulation of viral gene expression or viral genome replication, with an array of closely related proteins being involved in sequence-specific fine-tuning of the viral gene cascade. An alternative role could be in the resolution of the branched complexes generated during IV genome replication, although it is not clear why so many copies would be required.
In addition, IIV-9 orf063R, -131R, and -168R (53 to 79% identity) form a less well conserved cluster of proteins (Fig. 3; see also Fig. S2 in the supplemental material) whose genes display some motif conservation to orf092R and -096R (46% identical). No motifs or predicted functions were identified for this cluster of repeated genes, but by BLAST analysis, they were distantly related to the same helix-turn-helix cluster of genes identified above.
a IIV-9 open reading frame number. Proteins identified by the proteomic experiments are indicated by bold ORF numbers for the purified IIV-9 particles and by underlined ORF numbers for infected cells. Note that the protein products of 071R and 182R were identified by the same set of peptides and can therefore not be distinguished unambiguously by the proteomic analysis.
b Most closely related gene by BLASTP analysis.
c Matching IIV proteins, with the first-listed protein being the most closely related. Non-IIV proteins with a high similarity score to an IIV-9 protein by BLASTP analysis are indicated. NCLDV species abbreviations are PBCV, Paramecium bursaria chlorella virus, and MAR, Marseilles virus. CfDEF NPV, Choristoneura fumiferana nucleopolyhedrovirus; Eppo NPV, Epiphyas postvittana nucleopolyhedrovirus.
d Amino acid percent identity for most closely related protein. * indicates similarity to the cluster of proteins encoded by IIV-6 468L and its homologous genes in IIV-6.
e TF IIS, transcription factor IIS; BIR, baculovirus inhibitor of apoptosis protein repeat; Bro, baculovirus repeated ORF; NUDIX, nucleoside diphosphate linked.
f Protein identified by a single peptide hit.
Analysis of the protein of orf068L^p, which is indicated by the double-boxed repeat highlighted in Fig. 2A, using the XSTREAM protein tandem repeat finder (27) identified 4.8 copies of a 131-aa repeat at aa 104 to 731 (Fig. 4A) and an immediately adjacent repeat consisting of 14.4 copies of an 84-aa repeat at aa 714 to 1922 (Fig. 4B). The respective C- and N-terminal flanks of the repeats overlap. The orf067L^pi protein possesses a repeat related to that illustrated in Fig. 4A (Fig. 4C). Both proteins have a high level of predicted β-sheet composition, and both were identified in the virus particle. The high β-sheet structure is similar to what is found in a range of fiber structures, such as bacteriophage fibers and tubulin. BLAST analysis of a single copy of the 131-aa repeat identified a very weak match between a short sequence located around the conserved PDATT motif and bacteriophage fiber proteins (data not shown). However, the location of orf067L^pi and orf068L^p in the particle is unknown, and hence a possible role in the surface fibril cannot be confirmed. An ortholog of this protein is in IIV-3 (091L) and IIV-6 (443R) but is not conserved in vertebrate IV, which would be consistent with the lack of surface fibrils on the vertebrate IV.
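The copy numbers and the overlap of the two repeat regions can be recovered from the residue coordinates given above. The helper name `copy_number` is illustrative; coordinates are 1-based and inclusive, as in the text.

```python
def copy_number(first, last, period):
    """Copies of a repeat unit spanning residues first..last (inclusive)."""
    return (last - first + 1) / period


copies_131 = copy_number(104, 731, 131)   # 628 aa / 131 aa
copies_84 = copy_number(714, 1922, 84)    # 1,209 aa / 84 aa
overlap_aa = 731 - 714 + 1                # residues shared by both regions

print(f"131-aa repeat: {copies_131:.1f} copies")
print(f"84-aa repeat: {copies_84:.1f} copies")
print(f"overlap between repeat regions: {overlap_aa} aa")
```

The two regions share 18 residues (aa 714 to 731), which is the overlap of the C- and N-terminal flanks noted in the text.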
Relationship to other viruses. Of the 191 ORF predicted in IIV-9, 108 were most closely related to IIV-3 ORF (Table 2), indicating that IIV-9 is more closely related to the chloriridovirus IIV-3 than to IIV-6. Analysis of IIV-3 shows that 114 of the 126 ORF in IIV-3 (5) were identified as having an ortholog in IIV-9. In contrast, IIV-6 (Iridovirus genus) has 211 ORF, as defined by Eaton et al. (8), of which only 97 have orthologs in IIV-9. A total of 88 ORF are common to all 3 fully sequenced IIV. Interestingly, of the 45 ORF without an ortholog in other IV, 23 encoded proteins that were smaller than 100 amino acids, of which four (002R, 088L, 093L, 111L) were confirmed as expressed proteins by the proteomic analysis. In contrast to the high level of conserved genes between IIV genomes, there is a very low level of conservation in gene order. Comparison to IIV-3 with a gene parity plot (Fig. 2B) indicates only 5 clusters of 3 or more genes, with the largest conservation of gene order and orientation being a cluster of 5 genes represented by IIV-9 097R to 101R and IIV-3 028R to 032R (Table 2). IIV-9 and IIV-6 genomes possess no more than two genes that are conserved in order in any one cluster.
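The gene parity comparison amounts to looking for runs of orthologs whose ORF numbers are consecutive in both genomes. The sketch below is one simple way to find such runs and is not the method used to draw Fig. 2B; the function name `colinear_clusters` is hypothetical. The 097R to 101R and 028R to 032R ranges come from the text, but their one-to-one pairing in order is an assumption, and all other pairs are invented for illustration.

```python
def colinear_clusters(pairs, min_size=3):
    """pairs: (orf_a, orf_b) ortholog ORF numbers from two genomes.

    Returns runs of at least min_size pairs in which orf_a increases by 1
    while orf_b changes by +1 or -1 in a consistent direction.
    """
    clusters, run, step = [], [], 0
    for pair in sorted(pairs):
        if run:
            da = pair[0] - run[-1][0]
            db = pair[1] - run[-1][1]
            adjacent = da == 1 and abs(db) == 1
            if adjacent and (len(run) == 1 or db == step):
                run.append(pair)
                step = db
                continue
            if len(run) >= min_size:
                clusters.append(run)
            # A direction change can still start a new run with the previous pair.
            run = [run[-1], pair] if adjacent else [pair]
            step = db if adjacent else 0
        else:
            run = [pair]
    if len(run) >= min_size:
        clusters.append(run)
    return clusters


# Assumed pairing for the 5-gene cluster, plus hypothetical extra pairs.
toy_pairs = [(97, 28), (98, 29), (99, 30), (100, 31), (101, 32),
             (102, 90), (110, 12), (111, 11), (112, 10), (120, 50), (121, 51)]
clusters = colinear_clusters(toy_pairs)
```

With these toy pairs the function reports one run of five genes and one inverted run of three, while the final two-gene run falls below the three-gene cutoff.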
The 26 core genes previously identified as being conserved in all iridoviruses (8) were identified in IIV-9, and consistent with other genes in the genome, no conservation of gene order was apparent for these conserved genes. Phylogenetic analysis of the coding sequences for all 26 genes collated as a concatenated protein sequence for each of IIV-9, IIV-6, IIV-3, SGIV, LCDV-1, and ISKNV (Fig. 1B) shows the clear separation of the vertebrate and invertebrate IV. In addition, the core set provides strong evidence for the main features of the phylogenetic trees established for the partial MCP sequence (Fig. 1A) (38), with IIV-6 being in a separate clade and IIV-3 being more closely related to the major clade of IIV than its current taxonomic position in a separate genus suggests.
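One common way to build a concatenated sequence of this kind is to align each gene separately and then join the per-gene alignments for each virus. The sketch below shows only that joining step. It assumes per-gene alignments are already available, which may not match the workflow used for Fig. 1B. The function name `concatenate_alignments` is hypothetical, and the short sequences are invented placeholders rather than real iridovirus proteins. Filling a missing gene with gap characters is included for generality; all 26 core genes were found in IIV-9.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene protein alignments into one sequence per taxon.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    All sequences within one gene alignment must have the same length.
    A taxon missing from a gene alignment is padded with '-' characters.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


taxa = ["IIV-9", "IIV-3", "IIV-6"]
toy_alignments = [  # placeholder sequences for illustration only
    {"IIV-9": "MKT-L", "IIV-3": "MKTAL", "IIV-6": "MRTAL"},
    {"IIV-9": "GGS", "IIV-3": "GAS"},
]

supermatrix = concatenate_alignments(toy_alignments, taxa)
for taxon in taxa:
    print(taxon, supermatrix[taxon])
```

The resulting sequences all have the same length and can be passed to a tree-building program of choice.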
There were 36 NCLDV genes identified in the genome, including nine conserved orthologs found in all NCLDV (19) and seven that were present in all four families but that are missing from some lineages within those families. IIV-9 057R, 076R, and 164L were most closely related to predicted proteins of unknown function from Acanthamoeba polyphaga mimivirus.
FIG. 4. Tandem protein repeats in the proteins of orf068L and -067L. The repeat regions from aa 104 to 731 (A) and 714 to 1922 (B) of the orf068L protein are shown with residues matching the consensus (shaded). Underlined residues are the same amino acids. (C) The repeat sequence within the orf067L protein is shown aligned with the orf068L protein repeat shown in panel A (boxed underneath). The starred Y’s are the same residue (to orient the 068L and 067L repeats).
miRNA coding prediction. The analysis of miRNA shows an increasing complexity of viral interactions with this posttranscriptional control system, including the control of host and viral genes. Roles have included the establishment of latency and the avoidance of mammalian immune responses, as well as manipulation of the cellular environment to facilitate replication. Studies of viral miRNA to date have predominantly focused upon viruses that are relatively slow in their replication, such as the herpesviruses, and upon the role of virally encoded miRNA in latency. Because IIV have a nuclear replication phase, are relatively slow growing, and often have a range of sublethal effects upon their host insect, they are potential candidates for encoding miRNA.
Combined analysis for pre-miRNA sequences using VMir and MiPred generated seven possible pre-miRNAs (Table 3). All were located within open reading frames of unknown function, and five had no observable predicted motifs. Three putative pre-miRNAs were identified in the same orientation as the associated ORF, and four were in the opposite orientation. The absence of host cell sequence information makes identification of potential host target genes unfeasible. An XRN exonuclease gene (048R) is predicted on the genome of IIV-9, with orthologs in IIV-3 and IIV-6. This enzyme has a role in the processing of miRNA, in particular, the degradation of mature miRNA (24); hence, even if IIV genomes do not encode miRNAs, there is a strong likelihood that IIV interact with small noncoding RNA systems within the cells that they infect.
The presence of an RNase III gene (orf034L), which has also been identified in IIV-3, IIV-6, and vertebrate IV (45), and of a dsRNA binding protein (060R; IIV-6 340R) supports a role for noncoding RNA in IV replication. RNase III was identified in purified virus particles and infected cells by our proteomic experiments (Table 2; see also Tables S1 and S3 in the supplemental material) and was previously identified in particles of SGIV (31), confirming that this protein is produced and, hence, likely to have a role in IV replication. miRNA has also been predicted for soft-shelled turtle iridovirus (STIV) (18), and the presence of miRNAs has recently been confirmed for SGIV (44).
Conclusions. IIV-9 is a member of the major IIV (group II) clade, and the complete genome provides insight into the relationships within and between IIV genera. The apparent close relationship to IIV-3, a virus from a separate genus, and the more distant relationship to IIV-6 have been confirmed through full-genome analysis. The genome encodes a wide range of proteins for which there is no functional prediction, and many of these are found in the complex virus particle. The presence of paralog proteins on the genome is a major contributor to the high incidence of repeat sequences associated with the genome, unlike with IIV-3, where the repeats are more likely to be in noncoding regions (5). The clustering of repeats predominantly within the β-sheet proteins suggests that these proteins may form filamentous structures that are associated with the virus particle and, as such, are candidates for the surface fibrils identified on IIV-9 particles. As for other IV, a large number of proteins are predicted to be involved in nucleotide regulation and genome replication, consistent with a life cycle that includes DNA replication in both the cytoplasm and nucleus and the branched concatemeric replication strategy of IV, which requires resolution of complex genome structures (11).
ACKNOWLEDGMENT
This research was supported by the University of Otago.
TABLE 3. Predicted pre-miRNA sequences in the IIV-9 genomeᶜ
ᵃ R indicates that pre-miRNA is derived from the reverse genomic strand, and D indicates the direct genomic strand. The number refers to the pre-miRNA identified from the initial screen of the entire genomic sequence by VMir.
ᵇ First nucleotide position of the pre-miRNA in the IIV-9 genome.
ᶜ The minimum free energy (MFE), P value, and percent confidence (conf) were determined by MiPred.
REFERENCES
1. Aravind, L., V. Anantharaman, S. Balaji, M. M. Babu, and L. M. Iyer. 2005. The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol. Rev. 29:231–262.
2. Benson, G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.
3. Bromenshenk, J. J., et al. 2010. Iridovirus and microsporidian linked to honey bee colony decline. PLoS One 5:e13181.
4. Chinchar, V. G., et al. 2005. Iridoviridae, p. 145–162. In C. M. Fauquet, M. A. Mayo, J. Maniloff, U. Desselberger, and L. A. Ball (ed.), Virus taxonomy. Eighth report of the International Committee on Taxonomy of Viruses. Elsevier Academic Press, San Diego, CA.
5. Delhon, G., et al. 2006. Genome of invertebrate iridescent virus type 3 (mosquito iridescent virus). J. Virol. 80:8439–8449.
6. Delius, H., G. Darai, and R. M. Flugel. 1984. DNA analysis of insect iridescent virus 6: evidence for circular permutation and terminal redundancy. J. Virol. 49:609–614.
7. Do, J. W., et al. 2004. Complete genomic DNA sequence of rock bream iridovirus. Virology 325:351–363.
8. Eaton, H. E., et al. 2007. Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes. Virol. J. 4:11.
9. Fowler, M., and J. S. Robertson. 1972. Iridescent virus-infection in field populations of Wiseana-Cervinata (Lepidoptera-Hepialidae) and Witlesia sp. (Lepidoptera-Pyralidae) in New Zealand. J. Invertebr. Pathol. 19:154–155.
10. Goodwin, T. J., M. I. Butler, and R. T. Poulter. 2006. Multiple, non-allelic, intein-coding sequences in eukaryotic RNA polymerase genes. BMC Biol. 4:38.
11. Goorha, R. 1982. Frog virus 3 DNA replication occurs in two stages. J. Virol. 43:519–528.
12. Goorha, R., G. Murti, A. Granoff, and R. Tirey. 1978. Macromolecular synthesis in cells infected by frog virus 3. VIII. The nucleus is a site of frog virus 3 DNA and RNA synthesis. Virology 84:32–50.
13. Goorha, R., and K. G. Murti. 1982. The genome of frog virus 3, an animal DNA virus, is circularly permuted and terminally redundant. Proc. Natl. Acad. Sci. U. S. A. 79:248–252.
14. Grundhoff, A., C. S. Sullivan, and D. Ganem. 2006. A combined computational and microarray-based approach identifies novel microRNAs encoded by human gamma-herpesviruses. RNA 12:733–750.
15. Hawtin, R. E., et al. 1997. Liquefaction of Autographa californica nucleopolyhedrovirus-infected insects is dependent on the integrity of virus-encoded chitinase and cathepsin genes. Virology 238:243–253.
16. He, J. G., et al. 2001. Complete genome analysis of the mandarin fish infectious spleen and kidney necrosis iridovirus. Virology 291:126–139.
17. He, J. G., et al. 2002. Sequence analysis of the complete genome of an iridovirus isolated from the tiger frog. Virology 292:185–197.
18. Huang, Y., et al. 2009. Complete sequence determination of a novel reptile iridovirus isolated from soft-shelled turtle and evolutionary analysis of Iridoviridae. BMC Genomics 10:224.
19. Iyer, L. M., L. Aravind, and E. V. Koonin. 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75:11720–11734.
20. Jakob, N. J., K. Muller, U. Bahr, and G. Darai. 2001. Analysis of the first complete DNA sequence of an invertebrate iridovirus: coding strategy of the genome of Chilo iridescent virus. Virology 286:182–196.
21. Jancovich, J. K., et al. 2003. Genomic sequence of a ranavirus (family Iridoviridae) associated with salamander mortalities in North America. Virology 316:90–103.
22. Jiang, P., et al. 2007. MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res. 35:W339–W344.
23. Juhl, S., et al. 2006. Assembly of Wiseana iridovirus: viruses for colloidal photonic crystals. Adv. Funct. Mater. 16:1086–1094.
24. Kim, Y. K., I. Heo, and V. N. Kim. 2010. Modifications of small RNAs and their associated proteins. Cell 143:703–709.
25. Kurita, J., K. Nakajima, I. Hirono, and T. Aoki. 2002. Complete genome sequencing of Red Sea bream iridovirus (RSIV). Fish. Sci. 68:1113–1115.
26. Lu, L., et al. 2005. Complete genome sequence analysis of an iridovirus isolated from the orange-spotted grouper, Epinephelus coioides. Virology 339:81–100.
27. Newman, A. M., and J. B. Cooper. 2007. XSTREAM: a practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. BMC Bioinformatics 8:382.
28. Radloff, C., R. A. Vaia, J. Brunton, G. T. Bouwer, and V. K. Ward. 2005. Metal nanoshell assembly on a virus bioscaffold. Nano Lett. 5:1187–1191.
29. Senkevich, T. G., C. L. White, E. V. Koonin, and B. Moss. 2000. A viral member of the ERV1/ALR protein family participates in a cytoplasmic pathway of disulfide bond formation. Proc. Natl. Acad. Sci. U. S. A. 97:12068–12073.
30. Shevchenko, A., et al. 1996. Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. U. S. A. 93:14440–14445.
31. Song, W., Q. Lin, S. B. Joshi, T. K. Lim, and C. L. Hew. 2006. Proteomic studies of the Singapore grouper iridovirus. Mol. Cell. Proteomics 5:256–264.
32. Song, W. J., et al. 2004. Functional genomics analysis of Singapore grouper iridovirus: complete sequence determination and proteomic analysis. J. Virol. 78:12576–12590.
33. Sullivan, C. S., and A. Grundhoff. 2007. Identification of viral microRNAs. Methods Enzymol. 427:3–23.
34. Tan, W. G., T. J. Barkman, V. G. Chinchar, and K. Essani. 2004. Comparative genomic analyses of frog virus 3, type species of the genus Ranavirus (family Iridoviridae). Virology 323:70–84.
35. Tidona, C. A., and G. Darai. 1997. The complete DNA sequence of lymphocystis disease virus. Virology 230:207–216.
36. Tsai, C. T., et al. 2005. Complete genome sequence of the grouper iridovirus and comparison of genomic organization with those of other iridoviruses. J. Virol. 79:2010–2023.
37. Ward, V. K., and J. Kalmakoff. 1987. Physical mapping of the DNA genome of insect iridescent virus type 9 from Wiseana spp. larvae. Virology 160:507–510.
38. Webby, R., and J. Kalmakoff. 1998. Sequence comparison of the major capsid protein gene from 18 diverse iridoviruses. Arch. Virol. 143:1949–1966.
39. Webby, R. J., and J. Kalmakoff. 1999. Comparison of the major capsid protein genes, terminal redundancies, and DNA-DNA homologies of two New Zealand iridoviruses. Virus Res. 59:179–189.
40. Williams, T. 2008. Natural invertebrate hosts of iridoviruses (Iridoviridae). Neotrop. Entomol. 37:615–632.
41. Williams, T., and J. S. Cory. 1994. Proposals for a new classification of iridescent viruses. J. Gen. Virol. 75:1291–1301.
42. Yan, X., et al. 2000. Structure and assembly of large lipid-containing dsDNA viruses. Nat. Struct. Biol. 7:101–103.
43. Yan, X., et al. 2009. The capsid proteins of a large, icosahedral dsDNA virus. J. Mol. Biol. 385:1287–1299.
44. Yan, Y., et al. 2011. Identification of a novel marine fish virus, Singapore grouper iridovirus-encoded microRNAs expressed in grouper cells by Solexa sequencing. PLoS One 6:e19148.
45. Zenke, K., and K. H. Kim. 2008. Functional characterization of the RNase III gene of rock bream iridovirus. Arch. Virol. 153:1651–1656.
46. Zhang, Q. Y., F. Xiao, J. Xie, Z. Q. Li, and J. F. Gui. 2004. Complete genome sequence of lymphocystis disease virus isolated from China. J. Virol. 78:6982–6994.