1 Genomic and Proteomic Analysis of Invertebrate Iridovirus Type 9

JOURNAL OF VIROLOGY, Aug. 2011, p. 7900–7911, Vol. 85, No. 15
0022-538X/11/$12.00 doi:10.1128/JVI.00645-11
Copyright © 2011, American Society for Microbiology. All Rights Reserved.

Genomic and Proteomic Analysis of Invertebrate Iridovirus Type 9▿†

Chun K. Wong,1 Vivienne L. Young,1 Torsten Kleffmann,2 and Vernon K. Ward1*

Department of Microbiology and Immunology1 and Centre for Protein Research, Department of Biochemistry,2

School of Medical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand 9054

Received 30 March 2011/Accepted 23 May 2011

Iridoviruses (IV) are nuclear cytoplasmic large DNA viruses that are receiving increasing attention as sublethal pathogens of a range of insects. Invertebrate iridovirus type 9 (IIV-9; Wiseana iridovirus) is a member of the major phylogenetic group of iridoviruses for which there is very limited genomic and proteomic information. The genome is 205,791 bp, has a G+C content of 31%, and contains 191 predicted genes, with approximately 20% of its repeat sequences being located predominantly within coding regions. The repeated sequences include 11 proteins with helix-turn-helix motifs and genes encoding related tandem repeat amino acid sequences. Of the 191 proteins encoded by IIV-9, 108 are most closely related to orthologs in IIV-3 (Chloriridovirus genus), and 114 of the 126 IIV-3 genes have orthologs in IIV-9. In contrast, only 97 of 211 IIV-6 genes have orthologs in IIV-9. There is almost no conservation of gene order between IIV-3, IIV-6, and IIV-9. Phylogenetic analysis using a concatenated sequence of 26 core IV genes confirms that IIV-3 is more closely related to IIV-9 than to IIV-6, despite being from a different genus of the Iridoviridae. An interaction between IIV and small RNA regulatory systems is supported by the prediction of seven putative microRNA (miRNA) sequences combined with XRN exonuclease, RNase III, and double-stranded RNA binding activities encoded on the genome. Proteomic analysis of IIV-9 identified 64 proteins in the virus particle and, when combined with infected cell analysis, confirmed the expression of 94 viral proteins. This study provides the first full-genome and consequent proteomic analysis of group II IIV.

Iridoviruses (IV) are members of the nucleocytoplasmic large DNA viruses (NCLDV) (19). They possess a linear double-stranded DNA (dsDNA) genome with circular permutation and terminal redundancy (6, 13), and replication of the viral genome includes distinct nuclear and cytoplasmic phases (12). The genomes are encapsidated within an icosahedral shell ranging between 120 and 180 nm in diameter and comprised predominantly of a 50-kDa major capsid protein (MCP). The invertebrate iridoviruses (IIV), studied by cryo-electron microscopy, have 2-nm-diameter surface fibrils (23, 42); for invertebrate iridovirus type 6 (IIV-6), these fibrils extend from the 3-fold rotational axis of the 1,460 hexameric capsids found in the virus particle (43). IV are divided into 5 genera (Table 1), with members of three genera infecting poikilothermic vertebrates and members of the Iridovirus and Chloriridovirus genera infecting invertebrates. The Chloriridovirus genus has only one member, IIV-3 (mosquito iridovirus), and the primary defining differences between the Chloriridovirus and Iridovirus genera are particle sizes of approximately 180 and 135 nm, respectively, and the mosquito host range restriction of IIV-3 (4).

The vertebrate IV cause disease in fish, amphibians, and reptiles and have received considerable attention due to their effects upon aquaculture. In contrast, the IIV cause predominantly subpathogenic infections, and their consequently limited utility for pest control has meant that less is known about IIV. Of particular importance has been the recent study of Bromenshenk et al. (3) linking colony collapse disorder in honey bees to coinfection with Nosema and an unidentified iridovirus(es). A strong causal relationship was established; however, the identity of the IV was not established, at least in part due to a lack of IIV genomic information. In addition, the refraction of light by assemblies of IIV particles offers new opportunities in materials development (23, 28) that would benefit from more information on the virus particle and its constituents. The roles of viral proteins, such as the surface fiber, in iridescence are unknown, and the proteins and functional activities associated with the virus particle remain to be elucidated. Central to this is the need for information on IIV genomes and the proteomic analysis of the virus particle.

Fourteen iridovirus species have been fully sequenced (Table 1), with multiple members of the Ranavirus, Lymphocystivirus, and Megalocytivirus genera providing a comprehensive coverage of these vertebrate genera of IV. Vertebrate IV genomes range from 105 kbp for tiger frog virus (17) to 186 kbp for lymphocystis disease virus, China strain (LCDV-C) (46). The Ranavirus and Megalocytivirus species have G+C contents of approximately 50%, while the Lymphocystivirus species have G+C contents of less than 30%. There is a consistent lack of genome colinearity between IV except with very closely related isolates, although all IV sequenced to date possess a core cohort of 26 conserved genes (8). In contrast to the vertebrate IV, the only fully sequenced IIV are IIV-6 (Chilo iridovirus [CIV]) (20) and IIV-3 (mosquito chloriridovirus [MIV]) (5). IIV-6 is the type species of the Iridovirus genus, with a genome of 212 kbp and a G+C content of 29%; however, phylogenetic studies show that IIV-6 belongs in a clade distant from that of most iridoviruses (Fig. 1A) (38). IIV-3, with a genome of 191 kbp and a G+C content of 48%, represents a different genus

* Corresponding author. Mailing address: Department of Microbiology and Immunology, University of Otago, P.O. Box 56, Dunedin, New Zealand 9054. Phone: 64 3 4799028. Fax: 64 3 4798540. E-mail: [email protected].

† Supplemental material for this article may be found at http://jvi.asm.org/.

▿ Published ahead of print on 1 June 2011.


Downloaded from https://journals.asm.org/journal/jvi on 05 January 2022 by 92.49.160.27.

that may be more closely related to members of the Iridovirus genus than its placement in a separate genus suggests.

To date, only limited sequence information is available from members of the major clade of IIV, defined as group II iridoviruses by Williams and Cory (41), and genome analysis of a member of this clade would provide information on the relationships between disparate IIV. IIV-9 (Wiseana iridovirus [WIV]), a representative of the major clade, was isolated in New Zealand from larvae of the pasture pest Wiseana spp. (Lepidoptera: Hepialidae) (9). The mechanism of transmission of this virus is unknown, though the presence of this virus in damp and cryptic habitats is consistent with many other IIV (40), and suggestions of vector transmission have been made, though not confirmed. Like most invertebrate iridoviruses, IIV-9 replicates in larvae of the greater wax moth Galleria mellonella upon injection, and heavily infected larvae display typical iridescence upon accumulation of paracrystalline arrays of virus particles within infected tissues (9). IIV-9 also replicates in Spodoptera frugiperda (Sf9, Sf21) cells, albeit at the restricted temperature of 21°C. IIV-9 is a member of the major clade of IIV, as determined by partial major capsid protein phylogeny (38).

This study presents the complete genomic sequence of IIV-9 and uses this information for proteomic analysis of IIV-9’s encoded proteins in purified virus particles and within infected cells. Analysis of the genome indicates that IIV-9 is more closely related to IIV-3 than to IIV-6 and provides the first complete genome from the major clade of invertebrate iridoviruses.

MATERIALS AND METHODS

IIV-9 purification, DNA extraction, and sequencing. Sf21 cells were grown in SF900II serum-free medium (Invitrogen, Auckland, New Zealand) and infected with dilutions of a field isolate of IIV-9 that had been passaged repeatedly through G. mellonella larvae. Infected cells were incubated for 5 days at 21°C under an agarose overlay and stained with neutral red. Individual plaques were picked and passaged once in cell culture. One plaque isolate was randomly selected, propagated in G. mellonella, and purified on sucrose gradients as described previously (23). Genomic DNA was extracted by phenol-chloroform extraction (37), and 50 μl (100 ng μl⁻¹) of genomic DNA in deionized water was sequenced using the Roche/454 GS FLX High Throughput Sequencing Service provided by the Department of Anatomy and Structural Biology, University of Otago. All contig junctions were determined by sequencing of available restriction fragment clones or by PCR. Briefly, primers were designed near the termini of contigs and used with primers on adjacent contigs to generate PCR products directly from genomic DNA using the Expand high-fidelity PCR kit (Roche Diagnostics, Auckland, New Zealand). The PCR products were either sequenced directly at the Allan Wilson Sequencing Centre, Palmerston North, New Zealand, on an ABI 3730 automated sequencer or cloned into pGEMTeasy (Promega Corp., Madison, WI) prior to sequencing. Sequence conflicts, long repeats, and long runs of single nucleotides were confirmed by PCR and sequencing of the region in question. All ABI 3730-generated sequences were edited in SeqMan (DNAStar) for sequence quality prior to use.

Sequence analysis. Newbler Assembler software (454 Life Sciences, Branford, CT) was used to assemble data into unordered and unoriented contigs (default settings). The contigs were exported to the SeqMan program in the Lasergene suite of DNA analysis programs (DNAStar, Madison, WI) and reassembled into a draft alignment using the SeqMan assembler (match size, 12; minimum match percentage, 80%; minimum sequence length, 100; maximum number of added gaps per kb in the contig, 70; maximum number of added gaps per kb in the sequence, 70; maximum register shift difference, 70; last group considered, 2; gap penalty, 0.00; gap length penalty, 0.70). All contigs were aligned to generate a draft alignment with a minimum match percentage of 95%. PCR primers were designed using PrimerDesign (DNAStar). An in silico analysis of the restriction

TABLE 1. Fully sequenced genomes from vertebrate and invertebrate iridoviruses

| Genus and virusᵃ | Genome size (bp) | % G+C | ORFᵇ | Coding density (%) | Protein size range (aa) | GenBank accession no. | Reference |
|---|---|---|---|---|---|---|---|
| Iridovirus | | | | | | | |
| IIV-9 | 205,791 | 31 | 191 | 90 | 50–2,051 | GQ918152 | This study |
| IIV-6 | 212,482 | 29 | 243ᶜ | 85 | 40–2,432 | AF303741 | Jakob et al. (20) |
| Chloriridovirus | | | | | | | |
| IIV-3 | 191,132 | 48 | 126 | 68 | 60–1,377 | DQ643392 | Delhon et al. (5) |
| Lymphocystivirus | | | | | | | |
| LCDV-1 | 102,653 | 29 | 110 | 82 | 40–1,199 | L63545 | Tidona and Darai (35) |
| LCDV-C | 186,250 | 27 | 240 | 67 | 40–1,193 | AY380826 | Zhang et al. (46) |
| Ranavirus | | | | | | | |
| TFV | 105,057 | 55 | 105 | 94 | 40–1,294 | AF389451 | He et al. (17) |
| ATV | 106,332 | 54 | 96 | 79 | 32–1,294 | AY150217 | Jancovich et al. (21) |
| FV-3 | 105,903 | 55 | 98 | 80 | 50–1,293 | AY548484 | Tan et al. (34) |
| STIV | 105,890 | 55 | 105 | 80 | 40–1,294 | EU627010 | Huang et al. (18) |
| SGIV | 140,131 | 49 | 162 | 98 | 41–1,268 | AY521625 | Song et al. (32) |
| GIV | 139,793 | 49 | 120 | 83 | 62–1,268 | AY666015 | Tsai et al. (36) |
| Megalocytivirus | | | | | | | |
| ISKNV | 111,362 | 55 | 124 | 93 | 40–1,208 | AF371960 | He et al. (16) |
| RBIV | 112,080 | 53 | 118 | 86 | 50–1,253 | AY532606 | Do et al. (7) |
| RSIV | 112,414 | 53 | 93 | | 86–1,309 | BD143114 | Kurita et al. (25) |
| OSGIV | 112,636 | 54 | 121 | 91 | 40–1,168 | AY894343 | Lu et al. (26) |

ᵃ IIV-3, invertebrate iridescent virus type 3; IIV-6, invertebrate iridescent virus type 6; LCDV-1, lymphocystis disease virus 1; LCDV-C, lymphocystis disease virus, China strain; TFV, tiger frog virus; ATV, ambystoma tigrinum virus; FV-3, frog virus 3; STIV, soft-shelled turtle iridovirus; SGIV, Singapore grouper iridovirus; GIV, grouper iridovirus; ISKNV, infectious spleen and kidney necrosis virus; RBIV, rock bream iridovirus; RSIV, red sea bream iridovirus; OSGIV, orange spotted grouper iridovirus.

ᵇ Essentially nonoverlapping ORF encoding a minimum length of 40 to 62 aa.

ᶜ Revised annotation of Eaton et al. (8).



profile of the complete genome was performed using GeneQuest (DNAStar), and results were compared to published restriction profiles of IIV-9 genomic DNA as a confirmation of the assembly profile.

Tandem repeats within the IIV-9 genome were identified using Tandem Repeats Finder (2), with parameters set for match and mismatch and indels equal to 2, 7, and 7, respectively. The minimum alignment score was set at 50, with a maximum period size of 2,000 bases. Direct, inverted, and dyad repeats were identified using GeneQuest (DNAStar) with an unlimited loop size. The minimum period sizes set for direct, inverted, and dyad repeats were 25 bp, 25 and 50 bp, and 16 bp, respectively. Dot plot analysis was performed to identify DNA repeat clusters using MegAlign (DNAStar), with a window size of 50 bp and a 75% match.
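The dot plot criterion (a 50-bp window with a 75% match) can be illustrated with a minimal Python sketch. This is not the MegAlign implementation, whose internals are not described here; the function name `dot_plot_hits` and the brute-force window comparison are assumptions made purely for illustration, and the quadratic scan would be slow on a full 205,791-bp genome.

```python
def dot_plot_hits(seq_a, seq_b, window=50, min_match=0.75):
    """Return (i, j) window start positions (0-based) where at least
    `min_match` of the `window` aligned bases are identical.

    Hypothetical helper: a plain sliding-window identity test, assumed
    here as a stand-in for a dot plot with a window size and percent match.
    """
    threshold = window * min_match
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(a == b for a, b in zip(win_a, win_b))
            if matches >= threshold:
                hits.append((i, j))
    return hits


# Toy self-comparison: off-diagonal hits reveal the repeated "ACGT" unit.
toy = "ACGTACGT"
off_diagonal = [(i, j) for i, j in dot_plot_hits(toy, toy, window=4, min_match=1.0) if i != j]
print(off_diagonal)
```

Comparing a sequence against itself always produces the main diagonal, so only the off-diagonal hits are informative when looking for repeat clusters.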

Open reading frames (ORF) encoding proteins with a minimum size of 50 amino acids (aa) and that contained a start codon were designated using SeqBuilder (DNAStar). All designated ORF were named with “orf” followed by numbers corresponding to their position and a forward/reverse (right [R]/left [L], respectively) designation to indicate their orientation. ORF that fell completely within a larger ORF were excluded. Heavily overlapping open reading frames where the most likely ORF could not be determined were given the same number but different orientations. All designated IIV-9 open reading frames were exported from SeqBuilder (DNAStar) to EditSeq (DNAStar), and BLASTP analysis of the predicted amino acid sequences was performed for each open reading frame. Amino acid identity to the closest BLASTP match was determined using MegAlign (DNAStar). Analysis of IIV-9 ORF function was performed via the ExPASy Proteomics Server and included InterProScan, SignalP, and PredictProtein. Protein repeats were identified using the XSTREAM prediction server (27), and subsequent alignments of protein repeats were generated using MegAlign.
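The ORF designation rules (ATG start codon, at least 50 aa, both strands, ORF nested completely within a larger ORF excluded, R/L naming by position) can be sketched as follows. This is an illustrative approximation and not the SeqBuilder procedure: the function names are hypothetical, each stop codon is paired with the first in-frame ATG, and the paper's manual handling of heavily overlapping ORF pairs (same number, opposite orientations) is not reproduced.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_aa):
    """Yield (start, end) 0-based half-open spans running from the first
    in-frame ATG through the stop codon, for ORF of at least min_aa codons."""
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOPS:
                if (pos - start) // 3 >= min_aa:
                    yield start, pos + 3
                start = None


def find_orfs(genome, min_aa=50):
    """Return (name, first_nt, last_nt, strand) tuples in 1-based coordinates."""
    genome = genome.upper()
    n = len(genome)
    found = [(s, e, "R") for s, e in orfs_on_strand(genome, min_aa)]
    # Map reverse-strand spans back onto forward-strand coordinates.
    found += [(n - e, n - s, "L") for s, e in orfs_on_strand(revcomp(genome), min_aa)]
    # Exclude ORF that fall completely within a larger ORF.
    kept = [o for o in found
            if not any(p[0] <= o[0] and o[1] <= p[1] and (p[1] - p[0]) > (o[1] - o[0])
                       for p in found)]
    kept.sort()
    return [(f"orf{idx:03d}{strand}", s + 1, e, strand)
            for idx, (s, e, strand) in enumerate(kept, 1)]


# Toy example with a lowered length cutoff (3 aa instead of 50).
toy_genome = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GGGG"
print(find_orfs(toy_genome, min_aa=3))
```

With the default `min_aa=50` the same function applies the length cutoff stated in the text; the toy call lowers it only so that a short sequence yields a result.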

A phylogenetic tree was constructed based on the alignment of the 26 core gene amino acid sequences found in IIV-9, IIV-6, IIV-3, Singapore grouper iridovirus (SGIV), lymphocystis disease virus 1 (LCDV-1), and infectious spleen and kidney necrosis virus (ISKNV) using MegAlign (DNAStar), with bootstrap trials set at 1,000. All core gene-encoded proteins were combined as one continuous amino acid sequence in the same gene order prior to assembly. This was compared to the partial major capsid protein tree as described in the work of Webby and Kalmakoff (38).
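The concatenation step, in which the core gene-encoded proteins of each virus are joined in one fixed gene order before alignment, might look like the sketch below. The function name, the gene labels, and the toy sequences are hypothetical; the alignment and bootstrapping themselves were done in MegAlign and are not reproduced here.

```python
def concatenate_core_proteins(proteins_by_virus, core_gene_order):
    """Join each virus's core proteins into one sequence, using the same
    gene order for every virus so that alignment columns stay comparable."""
    concatenated = {}
    for virus, proteins in proteins_by_virus.items():
        missing = [gene for gene in core_gene_order if gene not in proteins]
        if missing:
            raise ValueError(f"{virus} lacks core genes: {missing}")
        concatenated[virus] = "".join(proteins[gene] for gene in core_gene_order)
    return concatenated


# Toy data: two made-up core genes with made-up peptide sequences.
order = ["MCP", "DNApol"]
toy = {
    "IIV-9": {"DNApol": "MKL", "MCP": "MSA"},
    "IIV-3": {"MCP": "MSG", "DNApol": "MRL"},
}
print(concatenate_core_proteins(toy, order))
```

Fixing the gene order matters because there is almost no conservation of gene order between the genomes, so concatenating in genomic order would misalign unrelated proteins.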

The complete genome was scanned for miRNA coding regions using VMir (14, 33) and possible miRNA coding sequences further analyzed by MiPred (22). All images were generated using Microsoft PowerPoint and/or Adobe Photoshop CS4.

MS analysis. Purified IIV-9 virions or infected Sf21 cells harvested 24 h postinfection were denatured in SDS-PAGE sample buffer, and proteins were separated on individual 10% SDS-PAGE gels using standard techniques. The gels were stained with Coomassie G250 and protein lanes cut into five (for liquid chromatography coupled with electrospray ionization linear ion trap [LC-ESI LTQ] Orbitrap tandem mass spectrometry [MS/MS] analyses of IIV-9 virions and infected Sf21 cells) or eight (for LC–matrix-assisted laser desorption ionization–tandem time of flight [MALDI TOF/TOF] analysis of IIV-9 virions) equally sized fractions. Fractions were subjected to in-gel protein digestion with trypsin essentially by following the protocol of Shevchenko et al. (30), using a liquid handling robotic workstation (DigestPro MSi; Intavis AG, Cologne, Germany). Each digested fraction was concentrated using a centrifugal vacuum concentrator and reconstituted in a 10-μl aqueous solution of 2% (vol/vol) acetonitrile (ACN) supplemented with either 0.1% (vol/vol) trifluoroacetic acid (TFA) for LC-MALDI TOF/TOF analyses or 0.2% formic acid for LC-ESI LTQ Orbitrap analyses.

Structural proteins from purified IIV-9 virions were analyzed by LC-MALDI TOF/TOF MS and LC-ESI LTQ Orbitrap MS/MS, and proteins from infected Sf21 cells were analyzed by LC-ESI LTQ Orbitrap MS/MS according to the details of methods described in the supplemental material.

Peak lists were processed through the 4000 series Explorer software (Applied Biosystems, MA) for MALDI TOF/TOF data and the Proteome Discoverer 1.1 software (Thermo Scientific, San Jose, CA) for all ESI LTQ Orbitrap data using the software’s default settings. All peak lists were then searched with an in-house Mascot server (version 2.1.0; Matrix Science) against an amino acid sequence database combining all predicted and translated IIV-9 ORF and all entries from the NCBI nonredundant sequence database, matching the taxa Lepidoptera and Drosophila melanogaster (downloaded January 2011; 355,290 sequence entries). Mascot search settings allowed for full tryptic peptides with up to 3 missed cleavage sites and variable modifications of carbamidomethyl (C), oxidation (M), and pyroglutamic acid (E, Q). The precursor and fragment mass tolerances were set to ±10 ppm and 0.8 Da for LTQ Orbitrap data and 75 ppm and 0.4 Da for TOF/TOF data. To evaluate the false-discovery rate (FDR), all peak lists were searched against a decoy database using identical search settings. The decoy database was built using the decoy database tool at the Trans-Proteomic Pipeline (TPP; Seattle Proteome Center), comprising the reversed sequence entries of the aforementioned combined database. The FDR was calculated by determining the number of false-positive peptide hits from the decoy search versus the number of peptide identifications from the true search using the same Mascot score as a significance threshold.
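The FDR estimate described here, decoy hits versus true-search hits at one shared score threshold, reduces to a simple ratio. The sketch below is a hedged illustration and not the authors' pipeline: the function name and the toy score lists are hypothetical, and counting a hit when its score is greater than or equal to the threshold is an assumption.

```python
def false_discovery_rate(target_scores, decoy_scores, threshold):
    """Estimate FDR as (decoy hits) / (target hits) at one score threshold.

    Both searches are assumed to use identical settings; a hit is counted
    when its ion score is >= threshold (an assumption for this sketch).
    """
    target_hits = sum(score >= threshold for score in target_scores)
    decoy_hits = sum(score >= threshold for score in decoy_scores)
    if target_hits == 0:
        return 0.0
    return decoy_hits / target_hits


# Made-up ion scores from a true search and a reversed-database search.
target = [55, 42, 40, 39, 61]
decoy = [41, 20, 12]
print(false_discovery_rate(target, decoy, threshold=40))
```

Using the same threshold for both searches is what makes the decoy count a usable estimate of false positives among the accepted target hits.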

Only peptide hits with an individual ion score of ≥40 (Mascot significance threshold at a P of <0.05) were accepted as significant identifications. This resulted in an FDR of <0.02 for all searches. A significant protein identification required at least two significant peptide hits covering different sequences of the protein. In addition, a protein that was identified by a single peptide-based protein identification in one experiment (IIV-9 particles analyzed by LC-MALDI TOF/TOF or LC-ESI LTQ Orbitrap MS/MS or infected cells analyzed by LC-ESI LTQ Orbitrap MS/MS) that was also confirmed by a different peptide identification covering another sequence stretch in one of the other experiments was considered a significant multipeptide identification.
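The protein acceptance rule can be expressed as a short sketch: a protein passes if one experiment yields at least two different significant peptides, or if single-peptide hits in different experiments cover different peptides. The data layout, experiment labels, and peptide strings below are hypothetical, and treating any two distinct peptide strings as "different sequence stretches" is a simplifying assumption (overlapping peptides are not detected).

```python
def significant_proteins(peptides_by_experiment):
    """Return the set of proteins meeting the multipeptide criterion.

    peptides_by_experiment maps experiment name -> {protein: set of
    significant peptide sequences found in that experiment}.
    """
    per_protein = {}
    for hits in peptides_by_experiment.values():
        for protein, peptides in hits.items():
            per_protein.setdefault(protein, []).append(set(peptides))

    accepted = set()
    for protein, peptide_sets in per_protein.items():
        # Rule 1: two or more different peptides within a single experiment.
        within_one = any(len(peptides) >= 2 for peptides in peptide_sets)
        # Rule 2: single peptides in separate experiments that differ.
        across = len(set().union(*peptide_sets)) >= 2
        if within_one or across:
            accepted.add(protein)
    return accepted


toy = {
    "virions_maldi": {"orf010R": {"AAK", "GGR"}, "orf020R": {"LLK"}, "orf030R": {"TTK"}},
    "virions_orbitrap": {"orf020R": {"VVR"}, "orf030R": {"TTK"}},
    "cells_orbitrap": {},
}
print(sorted(significant_proteins(toy)))
```

In the toy data the third protein is rejected because both experiments report the same single peptide, which does not count as independent confirmation.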

Nucleotide sequence accession number. The IIV-9 genome has been deposited in GenBank under accession number GQ918152.

RESULTS AND DISCUSSION

Genome assembly and properties. Sequencing of the IIV-9 genome using a 454 FLX sequencer generated 20,734 sequences totaling 5,597,884 bases of sequence with 50.4% and 49.6% sequence orientation biases, for an average coverage of 27-fold. The initial Newbler assembly generated 3 large and 10

FIG. 1. Phylogenetic trees of iridoviruses. (A) Alignment of the genus Iridovirus based upon a partial major capsid protein sequence as described in the work of Webby and Kalmakoff (38). The Chloriridovirus IIV-3 and the Lymphocystivirus LCDV-1 are included. (B) Alignment of representatives of the five iridovirus genera Chloriridovirus (IIV-3), Iridovirus (IIV-6, IIV-9), Lymphocystivirus (LCDV-1), Ranavirus (SGIV), and Megalocytivirus (ISKNV) based upon a concatenated amino acid sequence of the 26 core iridovirus genes. Bootstrap support from 1,000 iterations is indicated for all branches, with at least 70% support.



small contigs that were subsequently assembled into a single contiguous sequence by targeted PCR-based cloning and sequencing. The initial contig boundaries were defined by repeat sequences that the assembler was unable to resolve. The genome was shown to be 205,791 bp in size, with a G+C content of 31% (Table 1). This genome size compares to estimates of 192.5 and 222.6 kbp, as estimated by restriction profiles using standard (37) and pulsed-field gel (39) electrophoresis, respectively. Based upon an estimated 4.7% terminal redundancy in the IIV-9 genome (39), this equates to approximately 9.7 kbp of redundant sequence. Due to the high A+T content in the genome and the challenge of resolving long single base runs using 454 technology, a total of 34 PCR-based clones were generated to resolve 57 potential base conflicts. All base calls were inspected visually and resolved as necessary. In silico restriction endonuclease profiles were compared to experimentally derived restriction endonuclease profiles (37) to confirm correct global assembly of the genomic sequence (data not shown).
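The coverage and terminal redundancy figures quoted for the assembly follow directly from the reported totals. The short check below uses only numbers stated in the text; the function and constant names are illustrative.

```python
GENOME_BP = 205_791           # assembled genome size
READ_BASES = 5_597_884        # total bases from the 454 run
TERMINAL_REDUNDANCY = 0.047   # estimated fraction of terminal redundancy


def fold_coverage(read_bases, genome_bp):
    """Average sequencing depth: total read bases divided by genome length."""
    return read_bases / genome_bp


def redundant_kbp(genome_bp, fraction):
    """Length of terminally redundant sequence in kbp."""
    return genome_bp * fraction / 1000


print(f"coverage: {fold_coverage(READ_BASES, GENOME_BP):.1f}-fold")
print(f"terminal redundancy: {redundant_kbp(GENOME_BP, TERMINAL_REDUNDANCY):.1f} kbp")
```

Both values agree with the rounded figures in the text (27-fold coverage and approximately 9.7 kbp of redundant sequence).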

Genome analysis identified a range of complex repeat sequences, including tandem, direct, dyad, and inverted repeats. The percentage of repeat sequences in the genome is dependent upon the stringency of parameters employed and ranges from 20 to 23% of the genome. The largest repeat identified is 3.4 copies of a 1,002-bp repeat between nucleotides 69621 and 73024, with a 75% consensus match. Identification of repeat sequences on the genome is illustrated by dot plot analysis (Fig. 2A). The repeats highlighted in the boxed region of the genome shown in Fig. 2A represented 10 of the contigs generated in the initial sequence assembly.
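The copy number of the largest repeat can be checked against its coordinates. The sketch assumes the two nucleotide positions are inclusive endpoints, which the text does not state explicitly; the function name is illustrative.

```python
def repeat_copies(start, end, period):
    """Copies of a tandem repeat spanning start..end (inclusive, assumed)."""
    return (end - start + 1) / period


copies = repeat_copies(69_621, 73_024, 1_002)
print(f"{copies:.1f} copies")
```

The 3,404-nucleotide span divided by the 1,002-bp period gives roughly 3.4 copies, matching the reported value.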

IIV-9 ORF and their predicted protein products. Analysis of the complete genome predicted 191 predominantly nonoverlapping ORF encoding proteins of 50 aa or more in length with an AUG start codon, with the genome displaying a coding density of 90% (Fig. 3). The genome shows a bias in that 63% of genes were oriented in the reverse direction. In conjunction with the genome analysis, we conducted a proteomic analysis, first to confirm expressed ORF and second to establish the first profile of expressed IIV-9 ORF in both isolated virions and infected insect cells. Of the total of 191 ORF, 94 were identified in either isolated virions (64 ORF detected) or infected insect cells (72 ORF detected), with 42 being expressed in both (Table 2; see also Tables S1 to S3 in the supplemental material). The number of expressed proteins in isolated virions roughly correlates with 44 identified proteins in a previous proteomic study of SGIV particles (31). Open reading frames that are discussed in the following paragraph are marked with a superscript “p” if their expression has been confirmed by proteomics of isolated virions or with a superscript “i” if they were confirmed to be protein products in infected insect cells.
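The proteomic totals are related by inclusion-exclusion: proteins found in virions plus proteins found in infected cells, minus those found in both, gives the number of distinct expressed ORF. The check below uses only the counts reported in the text; the variable names are illustrative.

```python
VIRION = 64   # ORF detected in isolated virions
CELLS = 72    # ORF detected in infected insect cells
BOTH = 42     # ORF detected in both sample types

total_detected = VIRION + CELLS - BOTH
virion_only = VIRION - BOTH
cells_only = CELLS - BOTH

print(total_detected, virion_only, cells_only)
```

The total of 94 matches the number of confirmed viral proteins, with 22 ORF seen only in virions and 30 only in infected cells.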

The genome orientation and gene designation were defined by the start codon of IIV-9 orf001R, the ortholog of the first conserved iridovirus core gene in mosquito IIV-3 (orf004R [5]). Four short ORF (orf007, -088, -061, -139) that were represented by dual heavily overlapping ORF in opposite orientations could not be resolved as forward or reverse by bioinformatics analysis alone. orf007 and -088 were subsequently designated orf007Rᵖ and -088Lᵖ based on the identification of their protein products by the proteomic analysis (Table 2; see also Table S1 in the supplemental material). The remaining two ORF could not be resolved, and hence, both overlapping ORF were designated orf061R and -061L or -139R and -139L, respectively.

The majority of IIV ORF have no predicted function. However, a wide range of predicted proteins showed similarity to proteins involved in nucleotide metabolism and DNA replication. These include enzymes required for deoxyribonucleotide synthesis, such as thymidylate synthase (095R), dUTPase (045R), deoxyribonucleoside kinase (098R), and both the large and small subunits of ribonucleotide reductase (070Rⁱ, 187Rⁱ). The last two have been confirmed as expressed proteins in infected cells. There are two putative NUDIX hydrolase proteins (075R and 152L), and these may play an important role in regulating nucleotides in the host cell. Delhon et al. (5) postulated that the IIV-3 ortholog of 075R (IIV-3 080R) might act similarly to the vaccinia virus NUDIX ortholog (D10R) and function as a repressor of transcription and translation. The reported presence

FIG. 2. Sequence repeats and IIV-9 versus IIV-3 gene parity plot analysis. (A) The IIV-9 genome was compared against itself by dot plot analysis to identify repeats within the IIV-9 genome. The major clusters of sequence repeats are boxed. (B) The IIV-9 and IIV-3 gene orders were compared by parity plot analysis. Where three or more genes are contiguous in both genomes, regardless of orientation, they have been boxed. Numbers on the x and y axes represent ORF numbers.



of an intein in the large subunit of ribonucleotide reductase (orf070Rⁱ) was confirmed (10).

Forty-four genes that could be postulated to have a role in DNA metabolism or DNA replication were identified. These include viral DNA ligase (109R), DNA polymerase (116R), helicase/primase (120R, 145R), PCNA (053R), endonuclease (081R), DNA exonuclease (040R), topoisomerase II (089Rⁱ), and phosphodiesterase (022Rⁱ) genes. However, only topoisomerase II (089Rⁱ) and phosphodiesterase (022Rⁱ) have been identified in infected cells. Genes encoding putative chromatin-binding regions, such as SWIB/MDM2 (056Rᵖⁱ) and HMG box domains (169Lᵖⁱ), are also present. Although it is not known if these affect the host or viral genome structure, the identification of both proteins in isolated virions suggests a possible association with the viral genome.

This study is the first to identify a putative chitinase gene (020Rⁱ) in an IIV. Analysis of this chitinase indicates that it is a member of the family 18 glycohydrolases (exochitinases) and is most closely related to the chitinase of a bacterial pathogen of fish, Yersinia ruckeri (57% identity), and to the chitinases of other bacteria and slime molds. Baculovirus chitinases, along with cathepsin, have been shown to be important in facilitating the release of virus from the host (15). Despite the IIV-9 chitinase displaying less than 20% identity to baculovirus chitinases, the presence of a viral cathepsin (177Rᵖⁱ) in IIV-9 may reflect similar roles of chitinase and cathepsin, acting in con-

FIG. 3. Open reading frame map of the IIV-9 genome. The 205,791-bp IIV-9 genome is represented as a solid line, and predicted open reading frames are indicated with arrows. Arrows representing genes in the forward (right) direction are stippled, and those in the reverse (left) direction are open. The 26 core IV genes are indicated with bold ORF numbering. Arrows with a bold outline are HTH 7 domain-containing orthologs of orf091L of IIV-3, and those with a broken outline are orthologs of IIV-6 468L. orf061R/L and -139R/L are almost fully overlapping genes facing in opposite directions and have been represented as both R and L forms.



TABLE 2. IIV-9 predicted open reading frames

| IIV-9 ORFᵃ | Nucleotide positions | Length (aa) | Best match(es)ᵇ, IIV protein(s)ᶜ | Best match GenBank accession no. | Best match BLASTP score | Best match % aa identityᵈ | Predicted motif and/or functionᵉ |
|---|---|---|---|---|---|---|---|
| 001R | 30–1223 | 397 | IIV-3 004R; IIV-6 067R | YP_654576 | 466 | 62.6 | |
| 002R | 1238–1516 | 92 | | | | | Signal peptide |
| 003R | 1741–1980 | 79 | | | | | |
| 004L | 2999–2037 | 320 | IIV-3 005L | YP_654577 | 62 | 31.7 | Signal peptide, RING finger |
| 005R | 3158–4705 | 515 | IIV-3 006R; IIV-6 118L | YP_654578 | 582 | 56.9 | NCLDV membrane protein |
| 006L | 6016–4757 | 419 | IIV-6 468L* | NP_149463 | 285 | 44.9 | Helix-turn-helix 7 motif |
| 007R | 6133–6459 | 108 | IIV-6 248R; PBCV N288R | NP_149711 | 70 | 44.1 | Transmembrane |
| 008R | 6587–7456 | 289 | IIV-6 404L | NP_149867 | 102 | 32.5 | |
| 009Lᶠ | 8013–7519 | 164 | IIV-22 15.9 kDa; IIV-3 15R | P25097 | 134 | 48.6 | 15.9-kDa protein, 5′ MCP gene |
| 010R | 8235–9689 | 484 | IIV-9-MCP; IIV-1 MCP; IIV-3 014L; IIV-6 274L | O39163 | 987 | 100.0 | MCP |
| 011L | 9967–9782 | 61 | | | | | |
| 012R | 10245–11540 | 431 | MAR 344; IIV-6 273R | NP_149736 | 108 | 47.0 | |
| 013R | 11545–11865 | 106 | | | | | Transmembrane |
| 014L | 12780–11935 | 281 | IIV-6 219L; IIV-3 036R,091L | NP_149682 | 239 | 51.8 | |
| 015R | 13053–13316 | 87 | IIV-3 013L | YP_654585 | 84 | 55.8 | Signal peptide |
| 016R | 13458–16700 | 1080 | IIV-3 035R; IIV-6 179R | YP_654607 | 1154 | 51.2 | Tyr protein kinase-like domain |
| 017R | 16743–17156 | 137 | IIV-3 055R; IIV-6 349L | YP_654627 | 148 | 54.1 | TF IIS C-terminal domain |
| 018R | 17457–17726 | 89 | IIV-3 019R | YP_654591 | 55 | 52.8 | Bro-N domain |
| 019L | 19158–17881 | 425 | IIV-3 069L; IIV-6 198R | YP_654641 | 389 | 49.9 | |
| 020R | 19213–20130 | 305 | Yersinia ruckeri chitinase | ZP_04617184 | 352 | 57.2 | Chitinase, family 18 glycohydrolase |
| 021R | 20183–20626 | 147 | IIV-3 057L | YP_654629 | 44 | 25.6 | |
| 022R | 20680–21762 | 360 | IIV-3 056L; IIV-6 287R; MAR 339 | YP_654628 | 301 | 44.7 | Putative phosphodiesterase |
| 023L | 23370–21841 | 509 | IIV-6 380R; IIV-3 010L,011L | NP_149843 | 367 | 45.0 | Serine/threonine protein kinase |
| 024R | 23480–23920 | 146 | IIV-6 293R | NP_149756 | 123 | 44.9 | |
| 025R | 24044–25165 | 373 | IIV-3 012R; IIV-6 302L | YP_654584 | 322 | 47.6 | C2H2 Zn finger |
| 026R | 25236–26552 | 438 | IIV-6 468L*; IIV-3 093 | NP_149463 | 306 | 44.9 | Helix-turn-helix 7 motif |
| 027L | 26619–27086 | 155 | IIV-3 085L; IIV-6 325L | YP_654657 | 188 | 58.4 | Signal peptide |
| 028R | 27029–27229 | 66 | | | | | Transmembrane |
| 029R | 27260–28093 | 277 | IIV-3 054L | YP_654626 | 243 | 48.0 | |
| 030Rᶠ | 28156–28518 | 120 | IIV-3 102R; IIV-6 122R | YP_654674 | 119 | 55.0 | |
| 031R | 28627–29910 | 427 | IIV-3 047R; IIV-6 337L | YP_654619 | 446 | 66.3 | Transmembrane |
| 032Lᶠ | 31068–30394 | 224 | IIV-3 021L | YP_654593 | 194 | 55.6 | C3HC4 RING finger/BIR |
| 033L | 31730–31113 | 205 | IIV-3 022L | YP_654594 | 142 | 45.4 | Transmembrane |
| 034L | 32649–31816 | 277 | IIV-3 101R; IIV-6 142R | YP_654673 | 430 | 75.8 | RNase III |
| 035L | 34117–32813 | 434 | IIV-6 468L* | NP_149463 | 295 | 43.8 | Helix-turn-helix 7 motif |
| 036Lᶠ | 34736–34176 | 186 | IIV-3 104L; IIV-6 355R | YP_654676 | 271 | 66.7 | Phosphatase |
| 037R | 34846–35577 | 243 | IIV-3 105R; IIV-6 359L | YP_654677 | 330 | 65.3 | |
| 038L | 37133–35655 | 492 | IIV-6 159L,219L,261R,443R; IIV-3 091L,36R | NP_149622 | 135 | 33.3 | |
| 039L | 38695–37220 | 491 | IIV-6 159L,219L,261R,443R; IIV-3 091L | NP_149622 | 151 | 36.1 | |
| 040R | 38826–40250 | 474 | IIV-3 106R; IIV-6 030L | YP_654678 | 629 | 63.9 | ATP-dependent exo-DNase� subunit |
| 041R | 40283–41143 | 286 | | | | | Transmembrane |
| 042L | 41817–41188 | 209 | IIV-3 071L; IIV-6 259R | YP_654643 | 285 | 68.4 | Transmembrane |
| 043R | 42393–42890 | 165 | IIV-3 020R; IIV-6 196R | YP_654592 | 195 | 59.9 | Thioredoxin domain/isomerase |
| 044L | 43528–42923 | 201 | IIV-3 097L; IIV-6 170L; MAR 216 | YP_654669 | 228 | 55.0 | Holliday junction resolvase |
| 045Rᶠ | 43660–44226 | 188 | PBCV N269L | XP_973701 | 137 | 44.4 | Putative dUTPase |
| 046R | 44338–45663 | 441 | IIV-6 468L*; IIV-3 093L | NP_149463 | 294 | 42.0 | Helix-turn-helix 7 motif |
| 047L | 45971–45708 | 87 | | | | | Transmembrane |
| 048R | 46162–47862 | 566 | IIV-3 059L; IIV-6 012L | YP_654631 | 729 | 63.9 | XRN 5′–3′ exonuclease |
| 049R | 48016–48504 | 162 | | | | | |
| 050R | 48602–49111 | 169 | IIV-3 032R | YP_654604 | 43 | 31.2 | |
| 051L | 49744–49310 | 144 | IIV-3 058R, IIV-6 391R | YP_654630 | 200 | 65.3 | |
| 052L | 50273–49851 | 140 | IIV-6 413R; IIV-3 021L | | | | RING/U box motif |
| 053R | 50475–51242 | 255 | IIV-3 060L; PBCV A193L | YP_654632 | 254 | 56.0 | Proliferating cell nuclear antigen |
| 054R | 51304–52656 | 450 | IIV-6 468L* | NP_149463 | 303 | 45.4 | Helix-turn-helix 7 motif |
| 055L | 55623–52687 | 978 | IIV-3 087L; IIV-6 022L | YP_654659 | 1258 | 63.5 | DEAD/H motif/putative NTPase |
| 056R | 55805–56563 | 252 | IIV-3 070L; IIV-6 306R | YP_654642 | 240 | 59.7 | SWIB/MDM2 domain |



TABLE 2—Continued

| IIV-9 ORFᵃ | Nucleotide positions | Length (aa) | Best match(es)ᵇ, IIV protein(s)ᶜ | Best match GenBank accession no. | Best match BLASTP score | Best match % aa identityᵈ | Predicted motif and/or functionᵉ |
|---|---|---|---|---|---|---|---|
| 057R | 56642–57964 | 440 | Acanthamoeba polyphaga mimivirus L12 | YP_142366 | 152 | 26.7 | |
| 058L | 59618–58074 | 514 | IIV-3 098L; IIV-6 493L | YP_654670 | 627 | 60.9 | Serine/threonine protein kinase |
| 059R | 59726–61090 | 454 | IIV-3 039R; IIV-6 393L | YP_654611 | 501 | 55.3 | |
| 060Rᶠ | 61210–61602 | 130 | Ixodes scapularis RNA-binding protein; IIV-6 340R | EEC17992 | 53 | 26.7 | dsRNA binding protein |
| 061L | 61781–61599 | 60 | | | | | |
| 061R | 61662–61832 | 56 | | | | | |
| 062R | 61866–62225 | 119 | IIV-3 041R; IIV-6 453L | YP_654613 | 152 | 59.7 | Thioredoxin domain |
| 063R | 62315–63559 | 414 | IIV-6 420R* | NP_149883 | 204 | 33.9 | |
| 064Lᶠ | 64363–63656 | 235 | | | | | Transmembrane |
| 065R | 64427–66082 | 551 | IIV-3 038R; IIV-6 098R | YP_654610 | 589 | 53.2 | |
| 066L | 66388–66197 | 63 | IIV-3 043R; IIV-6 010R | YP_654615 | 101 | 67.2 | Transmembrane |
| 067L | 69088–66401 | 894 | IIV-3 091L; IIV-6 443R,261R,396L | YP_654663 | 447 | 43.0 | |
| 068L | 75310–69158 | 2050 | IIV-3 091L; IIV-6 443R,261R | YP_654663 | 256 | 33.9 | Transmembrane |
| 069R | 75461–77572 | 703 | IIV-3 074L; IIV-6 268L | YP_654646 | 617 | 50.5 | |
| 070R | 77864–80311 | 815 | IIV-16-RNR; IIV-3 065R; IIV-6 085L | AAY24450.1 | 559 | 71.4 | RNR large-chain precursor/intein |
| 071R | 80408–81727 | 439 | IIV-6 468L*; IIV-3 093L | NP_149463 | 289 | 43.7 | Helix-turn-helix 7 motif |
| 072L | 82690–81764 | 308 | IIV-3 091L,036R,008L; IIV-6 219L,443R | YP_654663 | 114 | 40.0 | |
| 073Lᶠ | 83243–82767 | 158 | IIV-3 042R; IIV-6 136R | YP_654614 | 192 | 59.2 | |
| 074L | 84548–83364 | 394 | IIV-3 079L; IIV-6 282R | YP_654651 | 535 | 69.8 | Poxvirus very late transcription factor |
| 075R | 84711–85385 | 224 | IIV-3 080R | YP_654652 | 181 | 43.8 | NUDIX hydrolase domain |
| 076R | 85649–86833 | 394 | Mimivirus L5, L12, R821, R865, L754, R433 protein | YP_142359 | 151 | 30.3 | Bro domain |
| 077R | 86853–87038 | 61 | | | | | |
| 078L | 87331–87053 | 92 | | | | | Signal peptide |
| 079R | 87349–88107 | 252 | IIV-3 088R; IIV-6 075L | YP_654660 | 410 | 77.9 | NTPase domain |
| 080L | 88717–88181 | 178 | Histones | Q27443 | 55 | 17.5 | H4 and H3 histone domains |
| 081R | 88803–89180 | 125 | Pseudoalteromonas haloplanktis TAC125 | YP_339369 | 92 | 48.0 | GIY-YIG endonuclease |
| 082L | 90202–89231 | 323 | IIV-3 044L | YP_654616 | 333 | 51.7 | Protein kinase domain |
| 083R | 90412–90696 | 94 | IIV-3 045R | YP_654617 | 127 | 70.0 | |
| 084R | 90821–92098 | 425 | IIV-6 229L; IIV-3 046R | NP_149692 | 389 | 47.6 | |
| 085R | 92175–92822 | 215 | IIV-6 378R,232R; IIV-3 100L | NP_149841 | 216 | 69.4 | 2-Cys adaptor domain |
| 086L | 93855–92863 | 330 | IIV-3 099R; IIV-6 329R | YP_654671 | 249 | 48.0 | |
| 087R | 94000–95811 | 603 | IIV-3 019R; IIV-6 420R | YP_654591 | 318 | 56.4 | Bro-N domain |
| 088L | 96116–95925 | 63 | | | | | Transmembrane |
| 089R | 96461–99868 | 1135 | IIV-3 086L; IIV-6 045L | YP_654658 | 1401 | 62.0 | DNA topoisomerase II |
| 090R | 99858–100511 | 217 | IIV-3 064L | YP_654636 | 88 | 27.2 | |
| 091L | 101269–100583 | 228 | IIV-3 063R; IIV-6 309L | YP_654635 | 170 | 47.6 | |
| 092R | 101417–102835 | 472 | IIV-6 420R*; IIV-3 019R,093L | NP_149883 | 283 | 40.8 | |
| 093L | 103109–102885 | 74 | | | | | |
| 094L | 104606–103173 | 477 | IIV-3 061R; IIV-6 467R | YP_654633 | 346 | 39.5 | |
| 095R | 104681–105583 | 300 | Bombyx mori thymidylate synthase | XP_001033394 | 322 | 52.8 | Thymidylate synthase |

096R 105651–107114 487 IIV-3 019R,093R; IIV-6420R*

YP_654591 248 39.5

097R 107167–108060 297 IIV-3 028R YP_654600 191 39.7098R 108107–108679 190 IIV-3 029R; IIV-6 143R YP_654601 231 55.0 Deoxyribonucleoside kinase099L 109129–108725 134 IIV-3 030L YP_654602 109 42.3100R 109185–109688 167 IIV-3 031R: IIV-6 115R YP_654603 84 32.4101R 109843–110592 249 IIV-3 032R YP_654604 139 33.5102R 110660–110956 98 Macaca mulatta regulatory

subunitXP_001097695 60 35.4 Protein phosphatase 1C

binding103R 110998–112287 429 IIV-6 468L*; IIV-3 093L NP_149463 288 43.6 Helix-turn-helix 7 motif104L 112901–112320 193 IIV-3 033L: IIV-6 307L YP_654605 222 61.5 Signal peptide,

transmembrane105Rf 113487–114305 272 IIV-3 034R; IIV-6 077L YP_654606 164 38.6 C3H1 Zinc finger106R 114438–116879 813 IIV-3 094L; IIV-6 050L YP_654666 509 34.9107R 117308–117733 141 IIV-3 053L YP_654625 143 51.1108L 118009–117770 79109R 118083–119972 629 IIV-3 052L; IIV-6 205R YP_654624 387 41.3 NAD-dependent DNA ligase110R 120070–121347 425 IIV-3 007R YP_654579 415 52.6




TABLE 2—Continued

ORFa | Position | Length (aa) | Best match(es)b | Accession no. | BLASTP score | % aa identityd | Predicted function/motif

111L | 121746–121513 | 77 | | | | | Signal peptide
112L | 123866–121758 | 702 | IIV-3 091L; IIV-6 443R,261R,396L,219L,159L | YP_654663 | 226 | 43.1 |
113R | 123945–127328 | 1127 | IIV-3 009R; IIV-6 428L | YP_654581 | 1801 | 77.2 | DNA-dependent RNA Pol subunit 2
114R | 127384–127998 | 204 | IIV-6 404L | NP_149867 | 238 | 60.5 |
115L | 129575–128067 | 502 | IIV-3 051L; IIV-6 213R | YP_654623 | 141 | 46.3 |
116Rf | 129723–133142 | 1139 | IIV-3 120R; IIV-6 037L | YP_654692 | 1503 | 66.1 | DNA polymerase
117R | 133208–133576 | 122 | IIV-6 049L | NP_149512 | 88 | 41.5 | Transmembrane
118R | 133691–135031 | 446 | IIV-6 468L* | NP_149463 | 277 | 39.7 | Helix-turn-helix 7 motif
119R | 135057–135410 | 117 | | | | |
120R | 135520–138408 | 962 | IIV-3 121R; IIV-6 184R | YP_654693 | 1342 | 69.2 | Helicase/primase
121R | 138595–138990 | 131 | Amsacta moorei entomopoxvirus AMV-075 | NP_064857 | 107 | 45.8 |
122L | 139310–139029 | 93 | IIV-3 126R | YP_654698 | 91 | 48.4 | Transmembrane
123L | 140261–139410 | 283 | IIV-3 125R | YP_654697 | 271 | 49.5 |
124L | 144216–140356 | 1286 | IIV-6 443R,261R,396L; IIV-3 091L | NP_149906 | 587 | 47.5 |
125L | 144906–144316 | 196 | IIV-3 124R | YP_654696 | 112 | 42.3 |
126R | 144925–145308 | 127 | IIV-3 123L | YP_654695 | 80 | 39.5 |
127L | 145500–145342 | 52 | | | | | Transmembrane
128R | 145520–145729 | 69 | IIV-3 117L | YP_654689 | 35 | 34.4 |
129R | 145818–148631 | 936 | IIV-1 L96; IIV-3 084L; IIV-6 232R | P22856 | 927 | 70.0 | OTU-like cysteine protease
130R | 148675–149193 | 172 | IIV-3 083L; IIV-6 358L | YP_654655 | 95 | 34.7 |
131R | 149308–150282 | 324 | IIV-6 420R* | NP_149883 | 179 | 36.5 |
132R | 150254–150514 | 86 | | | | |
133R | 150722–151150 | 142 | IIV-3 082L | YP_654654 | 82 | 34.5 |
134L | 151813–151349 | 154 | IIV-3 096R; IIV-6 347L | YP_654668 | 108 | 40.5 | ErvI/Alr sulfhydryl oxidase domain
135R | 151908–152948 | 346 | Acyrthosiphon pisum metalloprotein; IIV-3 095L | XP_001945941 | 189 | 38.5 | Matrix metalloproteinase
136L | 154120–153011 | 369 | IIV-3 091L,036R,008L; IIV-6 443R,219L,317L | YP_654663 | 132 | 34.0 |
137L | 154906–154187 | 239 | IIV-3 067L; IIV-6 197R | YP_654639 | 264 | 54.0 | Protein tyrosine phosphatase
138R | 155025–156068 | 347 | IIV-3 078R; IIV-6 244L | YP_654650 | 402 | 56.4 | Phosphodiesterase domain
139L | 156398–156222 | 58 | | | | |
139R | 156283–156492 | 69 | | | | |
140L | 157089–156538 | 183 | IIV-3 073R, IIV-6 234R | YP_654645 | 134 | 48.0 | Transmembrane
141R | 157589–158050 | 153 | IIV-3 072L; IIV-6 374R | YP_654644 | 188 | 60.1 |
142R | 158110–159399 | 429 | IIV-6 468L*; IIV-3 093L | NP_149463 | 333 | 48.0 | Homeodomain
143L | 162815–159438 | 1125 | IIV-3 016R; IIV-6 295L | YP_654588 | 1079 | 49.9 |
144R | 162892–164241 | 449 | IIV-6 468L* | NP_149463 | 308 | 45.2 | Homeodomain
145Rf | 164430–165746 | 438 | IIV-6 161L; IIV-3 109L,108L | NP_149624 | 342 | 44.0 | Helicase
146R | 165859–166065 | 68 | IIV-6 212L,211L | NP_149675 | 40 | 39.7 |
147R | 166109–166318 | 69 | IIV-6 388R*; IIV-3 093L | NP_149851 | 43 | 42.9 |
148L | 167577–166534 | 347 | | | | |
149L | 168222–167611 | 203 | IIV-3 066L; IIV-6 357R | YP_654638 | 102 | 30.9 | Transmembrane
150R | 168376–170697 | 773 | IIV-3 113L; IIV-6 155L,149L | YP_654685 | 809 | 53.7 |
151L | 171154–170834 | 106 | IIV-3 112R; IIV-6 466R | YP_654684 | 101 | 48.1 | Transmembrane
152L | 171684–171214 | 156 | IIV-3 111R, IIV-6 414L | YP_654683 | 203 | 63.7 | NUDIX hydrolase domain
153R | 171761–172324 | 187 | IIV-3 001R; IIV-6 395R | YP_654573 | 104 | 47.3 |
154L | 173033–172515 | 172 | IIV-3 092R; IIV-6 454R | YP_654664 | 206 | 61.9 | RPB5 domain
155L | 173580–173080 | 166 | Burkholderia oklahomensis EO147; MAR 217 | ZP_02357920 | 80 | 31.9 | dNMP kinase
156L | 174825–173689 | 378 | Dictyostelium discoideum AX4 | XP_636066 | 88 | 24.3 |
157R | 175059–175748 | 229 | IIV-3 032R | YP_654604 | 94 | 32.3 |
158R | 175933–176637 | 234 | Ixodes scapularis E3 UBQ ligase | EEC07169 | 48 | 24.7 | RING finger
159R | 176791–177318 | 175 | IIV-3 018L; IIV-6 415R | YP_654590 | 157 | 50.6 |
160R | 177923–179023 | 366 | IIV-3 076L; IIV-6 369L | YP_654648 | 372 | 52.2 | XPG-like protein (excision repair)
161R | 179110–179382 | 90 | | | | |
162R | 179441–179710 | 89 | | | | |
163L | 180006–179749 | 85 | | | | |
164L | 181405–180044 | 453 | A. polyphaga mimivirus L5,L12 | YP_142359 | 176 | 30.6 | Bro-N domain
165R | 181529–185557 | 1342 | IIV-3 090L; IIV-6 176R | YP_654662 | 1964 | 72.3 | DNA-dependent RNA Pol II large subunit
166L | 186055–185801 | 84 | IIV-3 089L | YP_654661 | 44 | 48.8 |




cert to degrade the insect, thereby facilitating viral release and dissemination from the host insect (15). Both enzymes were identified in IIV-9-infected insect cells by our proteomic analysis.

Expressed orf180Lpi also encodes a protein with strong similarity to baculovirus genes, possessing 46% identity and 66% similarity to orf110 of Choristoneura fumiferana nucleopolyhedrovirus (CfDEF NPV). A related gene is also present in IIV-6 (422L), and alignment of CfDEF NPV orf110, Epiphyas postvittana NPV orf102, and the IIV-9 and IIV-6 proteins shows the presence of a highly conserved pan-caspase DEVD cleavage site. It is not known if this exploits caspase activity for processing or if it might regulate apoptosis. A further enzyme identified in IIV-9 is a putative ErvI/augmenter of liver regeneration (ALR) sulfhydryl oxidase (134Lp). This protein is common in large cytoplasmic DNA viruses (29), and in common with poxviruses, this was found in the IIV-9 virus particle. The role of this enzyme activity is unclear but has been postulated to work in concert with glutaredoxin or thioredoxin systems for regulating cytoplasmic disulfide bonds and protein folding. IIV-9 encodes two proteins with putative thioredoxin domains, 043Rp and 062Rpi, both of which were identified in the virus particle.

The repeat sequences identified in the genome are located predominantly in coding regions. This is reflected in the presence of multiple copies of closely related genes on the genome (see Fig. S1 and S3 in the supplemental material). IIV-9 orf006Lpi, -026R, -035L, -046Ri, -054Ri, -071Ri, -103R, -118R, -142R, -144Ri, and -182Ri form one cluster of paralogs, as reflected in amino acid identities ranging from 60 to 84% between the encoded proteins (see Fig. S1 in the supplemental material). With InterProScan, all of these proteins were predicted to contain helix-turn-helix 7 (HTH 7 [Pfam 02796]) motifs and/or the more stable homeodomain motifs that are involved in DNA binding, with a wide spectrum of roles, ranging from transcription regulation to DNA repair (1). These proteins may have a role in the regulation of viral gene expression or viral genome replication, with an array of closely related proteins being involved in sequence-specific fine-tuning of the viral gene cascade. An alternative role could be in the resolution of the branched complexes generated during IV genome replication, although it is not clear why so many copies would be required.
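The identity ranges quoted for the paralog clusters are pairwise values taken from protein alignments. As a minimal sketch of how such a figure can be computed, the function below scores two pre-aligned sequences, assuming that identity is counted only over alignment columns in which neither sequence has a gap (alignment tools differ in the denominator they use). The function name and the two short sequences are hypothetical and are not IIV-9 proteins.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns (an assumption; conventions differ)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "MKT-LLVA"
toy_b = "MKSALL-A"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Applied to every pair of proteins in a cluster, the minimum and maximum of these values would give a range of the kind reported above.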

In addition, IIV-9 orf063R, -131R, and -168R (53 to 79% identity) form a less well conserved cluster of proteins (Fig. 3; see also Fig. S2 in the supplemental material) whose genes display some motif conservation to orf092R and -096R (46% identical). No motifs or predicted functions were identified for this cluster of repeated genes, but by BLAST analysis, they were distantly related to the same helix-turn-helix cluster of genes identified above.

TABLE 2—Continued

ORFa | Position | Length (aa) | Best match(es)b | Accession no. | BLASTP score | % aa identityd | Predicted function/motif
167L | 186230–186060 | 56 | | | | | Transmembrane
168R | 186298–187545 | 415 | IIV-6 420R* | NP_149883 | 181 | 33.8 |
169L | 188173–187583 | 196 | IIV-3 068R; IIV-6 401R | YP_654640 | 343 | 84.7 | HMG box
170Rf | 188324–188803 | 159 | Apis mellifera | XP_624869 | 121 | 43.4 | Dual-specificity phosphatase
171R | 188941–189651 | 236 | IIV-3 116R | YP_654688 | 46 | 20.2 |
172L | 190186–189704 | 160 | IIV-3 119R | YP_654691 | 73 | 48.0 |
173R | 190308–190547 | 79 | IIV-6 420R,200R | NP_149883 | 55 | 40.5 |
174L | 190957–190700 | 85 | IIV-3 115R; IIV-6 342R | YP_654687 | 97 | 64.5 |
175R | 191065–191487 | 140 | IIV-3 114L | YP_654686 | 51 | 27.2 | Signal peptide
176R | 191553–192140 | 195 | IIV-3 081L | YP_654653 | 97 | 34.7 | FasI domain
177R | 192245–193678 | 477 | IIV-3 024R; IIV-6 361L,224L | YP_654596 | 539 | 56.3 | Cathepsin
178R | 193729–194130 | 133 | | | | |
179R | 194134–194400 | 88 | | | | |
180L | 195002–194424 | 192 | CfDEF NPV 110; Eppo NPV 102 | NP_932719 | 182 | 46.1 |
181R | 195132–196025 | 297 | IIV-3 017R; IIV-6 335L | YP_654589 | 320 | 60.4 |
182R | 196333–197700 | 455 | IIV-6 468L* | NP_149463 | 282 | 40.0 | Helix-turn-helix 7, homeodomain
183R | 197801–198733 | 310 | IIV-3 107R; IIV-6 117L | YP_654679 | 232 | 54.4 | Transmembrane
184L | 199094–198795 | 99 | IIV-3 023R | YP_654595 | 145 | 68.4 |
185R | 199157–199636 | 159 | IIV-3 050L | YP_654622 | 162 | 63.6 |
186L | 202115–199674 | 813 | IIV-3 049R | YP_654621 | 95 | 18.0 |
187R | 202220–203323 | 367 | IIV-3 048L; IIV-6 376L | YP_654620 | 581 | 72.2 | RNR small subunit
188R | 203536–204051 | 171 | IIV-3 025R; IIV-6 111R | YP_654597 | 87 | 33.5 | Transmembrane
189R | 204093–204767 | 224 | IIV-3 026R; IIV-6 350L | YP_654598 | 296 | 67.6 |
190R | 204824–205276 | 150 | IIV-3 027R; IIV-6 157L | YP_654599 | 73 | 31.8 | C3HC4 RING finger
191L | 205760–205347 | 137 | | | | |

a IIV-9 open reading frame number. Proteins identified by the proteomic experiments are indicated by bold ORF numbers for the purified IIV-9 particles and by underlined ORF numbers for infected cells. Note that the protein products of 071R and 182R were identified by the same set of peptides and can therefore not be distinguished unambiguously by the proteomic analysis.

b Most closely related gene by BLASTP analysis.

c Matching IIV proteins, with the first-listed protein being the most closely related. Non-IIV proteins with a high similarity score to an IIV-9 protein by BLASTP analysis are indicated. NCLDV species abbreviations are PBCV, Paramecium bursaria chlorella virus, and MAR, Marseilles virus. CfDEF NPV, Choristoneura fumiferana nucleopolyhedrovirus; Eppo NPV, Epiphyas postvittana nucleopolyhedrovirus.

d Amino acid percent identity for most closely related protein. * indicates similarity to the cluster of proteins encoded by IIV-6 468L and its homologous genes in IIV-6.

e TF IIS, transcription factor IIS; BIR, baculovirus inhibitor of apoptosis protein repeat; Bro, baculovirus repeated ORF; NUDIX, nucleoside diphosphate linked.

f Protein identified by a single peptide hit.
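The position and length columns of Table 2 are related by simple arithmetic, which can be useful when checking rows of the table. The sketch below assumes that positions are written start to end in the direction of coding (so that an L-suffixed ORF has a start coordinate larger than its end), that the coordinates include the stop codon, and that the R/L suffix denotes coding direction; these readings are inferred from the rows themselves and are not stated in the table. The function name is hypothetical. The three rows used are 169L, 187R, and 191L from Table 2.

```python
def orf_summary(start: int, end: int) -> tuple[str, int, int]:
    """Return (direction, nucleotide length, amino acid length) for one ORF."""
    direction = "R" if start < end else "L"
    nt_len = abs(end - start) + 1          # coordinates are inclusive
    if nt_len % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    aa_len = nt_len // 3 - 1               # assume the last codon is a stop
    return direction, nt_len, aa_len


# (ORF name, start, end, length in aa as listed in Table 2)
ROWS = [
    ("169L", 188173, 187583, 196),
    ("187R", 202220, 203323, 367),
    ("191L", 205760, 205347, 137),
]

for name, start, end, listed_aa in ROWS:
    direction, nt_len, aa_len = orf_summary(start, end)
    consistent = aa_len == listed_aa and name.endswith(direction)
    print(name, direction, nt_len, aa_len, consistent)
```

Not every row need satisfy this relation exactly (for example, if an annotated coordinate differs by a codon), so a mismatch is a prompt to recheck the row rather than proof of an error.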

Analysis of the protein of orf068Lp, which is indicated by the double-boxed repeat highlighted in Fig. 2A, using the XSTREAM protein tandem repeat finder (27) identified 4.8 copies of a 131-aa repeat at aa 104 to 731 (Fig. 4A) and an immediately adjacent repeat consisting of 14.4 copies of an 84-aa repeat at aa 714 to 1922 (Fig. 4B). The respective C- and N-terminal flanks of the repeats overlap. The orf067Lpi protein possesses a repeat related to that illustrated in Fig. 4A (Fig. 4C). Both proteins have a high level of predicted β-sheet composition, and both were identified in the virus particle. The high β-sheet structure is similar to what is found in a range of fiber structures, such as bacteriophage fibers and tubulin. BLAST analysis of a single copy of the 131-aa repeat identified a very weak match between a short sequence located around the conserved PDATT motif and bacteriophage fiber proteins (data not shown). However, the location of orf067Lpi and orf068Lp in the particle is unknown, and hence a possible role in the surface fibril cannot be confirmed. An ortholog of this protein is in IIV-3 (091L) and IIV-6 (443R) but is not conserved in vertebrate IV, which would be consistent with the lack of surface fibrils on the vertebrate IV.
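The fractional copy numbers reported for the two orf068L repeats follow from the repeat boundaries and unit lengths. The sketch below assumes that copy number is simply the length of the repeat region divided by the unit length; XSTREAM may define it differently, but this quotient reproduces both reported values. The function names are hypothetical.

```python
def repeat_copies(first_aa: int, last_aa: int, unit_len: int) -> float:
    """Copies of a repeat unit spanning residues first_aa..last_aa (inclusive)."""
    return (last_aa - first_aa + 1) / unit_len


def overlap_len(region_a: tuple[int, int], region_b: tuple[int, int]) -> int:
    """Number of residues shared by two inclusive residue ranges."""
    return max(0, min(region_a[1], region_b[1]) - max(region_a[0], region_b[0]) + 1)


repeat_1 = (104, 731)    # 131-aa unit (Fig. 4A)
repeat_2 = (714, 1922)   # 84-aa unit (Fig. 4B)

print(round(repeat_copies(*repeat_1, 131), 1))   # 628 / 131
print(round(repeat_copies(*repeat_2, 84), 1))    # 1209 / 84
print(overlap_len(repeat_1, repeat_2))           # residues 714 to 731
```

Under this reading, the two repeat regions share 18 residues (aa 714 to 731), which corresponds to the overlap of the flanks noted in the text.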

Relationship to other viruses. Of the 191 ORF predicted in IIV-9, 108 were most closely related to IIV-3 ORF (Table 2), indicating that IIV-9 is more closely related to the chloriridovirus IIV-3 than to IIV-6. Analysis of IIV-3 shows that 114 of the 126 ORF in IIV-3 (5) were identified as having an ortholog in IIV-9. In contrast, IIV-6 (Iridovirus genus) has 211 ORF, as defined by Eaton et al. (8), of which only 97 have orthologs in IIV-9. A total of 88 ORF are common to all 3 fully sequenced IIV. Interestingly, of the 45 ORF without an ortholog in other IV, 23 encoded proteins that were smaller than 100 amino acids, of which four (002R, 088L, 093L, 111L) were confirmed as expressed proteins by the proteomic analysis. In contrast to the high level of conserved genes between IIV genomes, there is a very low level of conservation in gene order. Comparison to IIV-3 with a gene parity plot (Fig. 2B) indicates only 5 clusters of 3 or more genes, with the largest conservation of gene order and orientation being a cluster of 5 genes represented by IIV-9 097R to 101R and IIV-3 028R to 032R (Table 2). IIV-9 and IIV-6 genomes possess no more than two genes that are conserved in order in any one cluster.
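Conserved gene order of the kind read from a gene parity plot can be found by looking for runs of adjacent genes whose orthologs are also adjacent in the other genome. The following is a minimal sketch of that idea, not the method used for Fig. 2B: it works on ORF numbers only, ignores gene orientation, and requires strictly consecutive numbering in both genomes. The function name is hypothetical. The input pairs are IIV-9 ORF numbers and the IIV-3 matches listed for them in Table 2, for the region around 097R.

```python
def collinear_runs(pairs, min_len=3):
    """Find runs of consecutive genes in genome A whose orthologs in genome B
    are also consecutive (ascending or descending). pairs: (index_a, index_b)."""
    pairs = sorted(pairs)
    if not pairs:
        return []
    runs = []
    current, step = [pairs[0]], 0
    for prev, cur in zip(pairs, pairs[1:]):
        da, db = cur[0] - prev[0], cur[1] - prev[1]
        adjacent = da == 1 and abs(db) == 1
        if adjacent and step in (0, db):
            current.append(cur)          # run continues in the same direction
            step = db
        else:
            if len(current) >= min_len:
                runs.append(current)
            # a direction change can start a new run from the previous gene
            current, step = ([prev, cur], db) if adjacent else ([cur], 0)
    if len(current) >= min_len:
        runs.append(current)
    return runs


# (IIV-9 ORF number, IIV-3 ORF number) taken from Table 2; orientation ignored.
IIV9_TO_IIV3 = [(94, 61), (96, 19), (97, 28), (98, 29),
                (99, 30), (100, 31), (101, 32), (103, 93)]

for run in collinear_runs(IIV9_TO_IIV3):
    print(f"IIV-9 {run[0][0]:03d}-{run[-1][0]:03d} <-> "
          f"IIV-3 {run[0][1]:03d}-{run[-1][1]:03d} ({len(run)} genes)")
```

On these pairs the only run of three or more genes is the five-gene block from IIV-9 097 to 101 and IIV-3 028 to 032, matching the cluster described in the text.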

The 26 core genes previously identified as being conserved in all iridoviruses (8) were identified in IIV-9, and consistent with other genes in the genome, no conservation of gene order was apparent for these conserved genes. Phylogenetic analysis of the coding sequences for all 26 genes collated as a concatenated protein sequence for each of IIV-9, IIV-6, IIV-3, SGIV, LCDV-1, and ISKNV (Fig. 1B) shows the clear separation of the vertebrate and invertebrate IV. In addition, the core set provides strong evidence for the main features of the phylogenetic trees established for the partial MCP sequence (Fig. 1A) (38), with IIV-6 being in a separate clade and IIV-3 being more closely related to the major clade of IIV than its current taxonomic position in a separate genus suggests.
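Building a concatenated sequence for each virus amounts to joining the per-gene alignments in a fixed gene order, after checking that every virus is present in every alignment. The sketch below illustrates that bookkeeping only; it assumes each gene has already been aligned separately, and the gene names and sequences are hypothetical placeholders rather than iridovirus data. The alignment and tree-building steps used for Fig. 1B are not reproduced here.

```python
TAXA = ["IIV-9", "IIV-6", "IIV-3", "SGIV", "LCDV-1", "ISKNV"]


def concatenate(gene_alignments: dict[str, dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments (gene -> taxon -> aligned sequence) per taxon."""
    parts = {taxon: [] for taxon in TAXA}
    for gene, alignment in gene_alignments.items():
        missing = set(TAXA) - set(alignment)
        if missing:
            raise ValueError(f"{gene}: missing taxa {sorted(missing)}")
        if len({len(alignment[taxon]) for taxon in TAXA}) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        for taxon in TAXA:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical two-gene example; a real analysis would use all 26 core genes.
toy_alignments = {
    "core_gene_1": {"IIV-9": "MKTA", "IIV-6": "MKSA", "IIV-3": "MKTA",
                    "SGIV": "MRSA", "LCDV-1": "MRS-", "ISKNV": "MR-A"},
    "core_gene_2": {"IIV-9": "GLV", "IIV-6": "GIV", "IIV-3": "GLV",
                    "SGIV": "AIV", "LCDV-1": "AIL", "ISKNV": "A-L"},
}

supermatrix = concatenate(toy_alignments)
for taxon in TAXA:
    print(f"{taxon:7s} {supermatrix[taxon]}")
```

Keeping the gene order identical for every virus is what makes each column of the concatenated alignment comparable across taxa.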

There were 36 NCLDV genes identified in the genome, including nine conserved orthologs found in all NCLDV (19) and seven that were present in all four families but that are missing from some lineages within those families. IIV-9 057R, 076R, and 164L were most closely related to predicted proteins of unknown function from Acanthamoeba polyphaga mimivirus.

FIG. 4. Tandem protein repeats in the proteins of orf068L and -067L. The repeat regions from aa 104 to 731 (A) and 714 to 1922 (B) of the orf068L protein are shown with residues matching the consensus (shaded). Underlined residues are the same amino acids. (C) The repeat sequence within the orf067L protein is shown aligned with the orf068L protein repeat shown in panel A (boxed underneath). The starred Y's are the same residue (to orient the 068L and 067L repeats).

miRNA coding prediction. The analysis of miRNA shows an increasing complexity of viral interactions with this posttranscriptional control system, including the control of host and viral genes. Roles have included the establishment of latency and the avoidance of mammalian immune responses, as well as manipulation of the cellular environment to facilitate replication. Studies of viral miRNA to date have predominantly focused upon viruses that are relatively slow in their replication, such as the herpesviruses, and upon the role of virally encoded miRNA in latency. Because IIV have a nuclear replication phase, are relatively slow growing, and often have a range of sublethal effects upon their host insect, they are potential candidates for encoding miRNA.

Combined analysis for pre-miRNA sequences using VMir and MiPred generated seven possible pre-miRNAs (Table 3). All were located within open reading frames of unknown function, and five had no predicted motifs observable. Three putative pre-miRNAs were identified in the same orientation as the associated ORF, and four were in the opposite orientation. The absence of host cell sequence information makes identification of potential host target genes unfeasible. An XRN exonuclease gene (048R) is predicted on the genome of IIV-9, with orthologs in IIV-3 and IIV-6. This enzyme has a role in the processing of miRNA, in particular, the degradation of mature miRNA (24); hence, even if IIV genomes do not encode miRNAs, there is a strong likelihood that IIV interact with small noncoding RNA systems within the cells that they infect.

The presence of an RNase III gene (orf034L) that has also been identified in IIV-3, IIV-6, and vertebrate IV (45), and a dsRNA binding protein (060R; IIV-6 340R), supports a role for noncoding RNA in IV replication. RNase III was identified in purified virus particles and infected cells by our proteomic experiments (Table 2; see also Tables S1 and S3 in the supplemental material) and was previously identified in particles of SGIV (31), confirming that this protein is produced and, hence, likely to have a role in IV replication. miRNA has also been predicted for soft-shelled turtle iridovirus (STIV) (18), and the presence of miRNAs has recently been confirmed for SGIV (44).

Conclusions. IIV-9 is a member of the major IIV (group II) clade, and the complete genome provides insight into the relationships within and between IIV genera. The apparent close relationship to IIV-3, a virus from a separate genus, and the more distant relationship to IIV-6 have been confirmed through full-genome analysis. The genome encodes a wide range of proteins for which there is no functional prediction, and many of these are found in the complex virus particle. The presence of paralog proteins on the genome is a major contributor to the high incidence of repeat sequences associated with the genome, unlike with IIV-3, where the repeats are more likely to be in noncoding regions (5). The clustering of repeats within predominantly the β-sheet proteins suggests that these proteins may form filamentous structures that are associated with the virus particle and, as such, are candidates for the surface fibrils identified on IIV-9 particles. As for other IV, a large number of proteins are predicted to be involved in nucleotide regulation and genome replication, consistent with a life cycle that includes DNA replication in both the cytoplasm and nucleus and the branched concatemeric replication strategy of IV, which requires resolution of complex genome structures (11).

ACKNOWLEDGMENT

This research was supported by the University of Otago.

TABLE 3. Predicted pre-miRNA sequences in the IIV-9 genomec

miRNAa | Start siteb | Apex nucleotide | Sequence (5′–3′) (length [bp]) | VMir score | % G+C | MiPred MFE | P value | MiPred % conf
MR171 | 11099 | 1134 | CAAAGUCGACUCUUCCACUCGGAAAUGUGAAUGGUUUCCGAGUAAAAGCUGAAGAAAUGACUUUG (65) | 130 | 42 | −23.6 | 0.001 | 63
MR529 | 31179 | 31212 | AAUGGGGUGUGUGAUGGAAUUGGAUUACCCACCCUUAUUUUAGGGUGGGUAAUUUUGUAUUACACUUUUGA (71) | 181 | 38 | −31 | 0.001 | 75
MR598 | 35790 | 35822 | GGGUUGUGGGAGAGCCAACUGGAUCUACAUAUGUAGAUCCAGCUUGGGAUACAUCUACAGC (61) | 127 | 48 | −27.9 | 0.001 | 66
MD653 | 37793 | 37835 | UGAGUAUUAUCAGCUUUUACCAAAGAUGCUGGAUCUACAAUUAACGUUGGACUAGCACCAGCUGAUAAACUUAAA (75) | 127 | 36 | −21.6 | 0.001 | 63
MR1469 | 89360 | 89395 | UUUGGUAUUAUGUUGCACUUUUUAACCUUGAUGAAAUAUCCUUUUUUACGAGAAGAAGAAGUGCUCGAGUAUCA (74) | 150 | 32 | −20.2 | 0.005 | 63
MD2135 | 128453 | 128488 | GGUGAUGUAAUCUGUGGAAUAACUCUGUUUAGUUUUUUUAGACAUUUGUUCAACAAAGAUUAAAUCGUCACCGU (74) | 141 | 34 | −22.9 | 0.001 | 69
MR2250 | 135129 | 135164 | CGCCAUUAUAAUCAUUUUUAUAGUGGAUAGACUCCAAACUAUCAUUGUUCAAAUGAUUAUAAGAACCGG (69) | 148 | 30 | −18.6 | 0.001 | 64

a R indicates that pre-miRNA is derived from the reverse genomic strand, and D indicates the direct genomic strand. The number refers to the pre-miRNA identified from the initial screen of the entire genomic sequence by VMir.

b First nucleotide position of the pre-miRNA in the IIV-9 genome.

c The minimum free energy (MFE), P value, and percent confidence (conf) were determined by MiPred.
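Two of the properties listed in Table 3 can be checked directly from a sequence: its length and G+C content, and whether its two ends are complementary, as expected for the base of a hairpin stem. The sketch below does this for MR171 using only Watson-Crick pairing. It is an illustrative check with hypothetical function names, not a reimplementation of VMir or MiPred, and it does not predict the full secondary structure or the minimum free energy.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def gc_percent(rna: str) -> float:
    """Percentage of G and C residues in an RNA sequence."""
    return 100.0 * sum(base in "GC" for base in rna) / len(rna)


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence."""
    return "".join(PAIR[base] for base in reversed(rna))


# MR171 sequence as listed in Table 3.
MR171 = ("CAAAGUCGACUCUUCCACUCGGAAAUGUGAAUGGUUUCCGAG"
         "UAAAAGCUGAAGAAAUGACUUUG")

print(len(MR171))                      # length listed in Table 3: 65
print(round(gc_percent(MR171)))        # % G+C listed in Table 3: 42
# The last 7 nt can pair with the first 7 nt, closing the base of a stem.
print(reverse_complement(MR171[-7:]) == MR171[:7])
```

The same two functions can be applied to the other six sequences in the table to compare their composition with the listed % G+C values.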



REFERENCES

1. Aravind, L., V. Anantharaman, S. Balaji, M. M. Babu, and L. M. Iyer. 2005. The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol. Rev. 29:231–262.

2. Benson, G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

3. Bromenshenk, J. J., et al. 2010. Iridovirus and microsporidian linked to honey bee colony decline. PLoS One 5:e13181.

4. Chinchar, V. G., et al. 2005. Iridoviridae, p. 145–162. In C. M. Fauquet, M. A. Mayo, J. Maniloff, U. Desselberger, and L. A. Ball (ed.), Virus taxonomy. Eighth report of the International Committee on Taxonomy of Viruses. Elsevier Academic Press, San Diego, CA.

5. Delhon, G., et al. 2006. Genome of invertebrate iridescent virus type 3 (mosquito iridescent virus). J. Virol. 80:8439–8449.

6. Delius, H., G. Darai, and R. M. Flugel. 1984. DNA analysis of insect iridescent virus 6: evidence for circular permutation and terminal redundancy. J. Virol. 49:609–614.

7. Do, J. W., et al. 2004. Complete genomic DNA sequence of rock bream iridovirus. Virology 325:351–363.

8. Eaton, H. E., et al. 2007. Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes. Virol. J. 4:11.

9. Fowler, M., and J. S. Robertson. 1972. Iridescent virus-infection in field populations of Wiseana-Cervinata (Lepidoptera-Hepialidae) and Witlesia sp. (Lepidoptera-Pyralidae) in New Zealand. J. Invertebr. Pathol. 19:154–155.

10. Goodwin, T. J., M. I. Butler, and R. T. Poulter. 2006. Multiple, non-allelic, intein-coding sequences in eukaryotic RNA polymerase genes. BMC Biol. 4:38.

11. Goorha, R. 1982. Frog virus 3 DNA replication occurs in two stages. J. Virol. 43:519–528.

12. Goorha, R., G. Murti, A. Granoff, and R. Tirey. 1978. Macromolecular synthesis in cells infected by frog virus 3. VIII. The nucleus is a site of frog virus 3 DNA and RNA synthesis. Virology 84:32–50.

13. Goorha, R., and K. G. Murti. 1982. The genome of frog virus 3, an animal DNA virus, is circularly permuted and terminally redundant. Proc. Natl. Acad. Sci. U. S. A. 79:248–252.

14. Grundhoff, A., C. S. Sullivan, and D. Ganem. 2006. A combined computational and microarray-based approach identifies novel microRNAs encoded by human gamma-herpesviruses. RNA 12:733–750.

15. Hawtin, R. E., et al. 1997. Liquefaction of Autographa californica nucleopolyhedrovirus-infected insects is dependent on the integrity of virus-encoded chitinase and cathepsin genes. Virology 238:243–253.

16. He, J. G., et al. 2001. Complete genome analysis of the mandarin fish infectious spleen and kidney necrosis iridovirus. Virology 291:126–139.

17. He, J. G., et al. 2002. Sequence analysis of the complete genome of an iridovirus isolated from the tiger frog. Virology 292:185–197.

18. Huang, Y., et al. 2009. Complete sequence determination of a novel reptile iridovirus isolated from soft-shelled turtle and evolutionary analysis of Iridoviridae. BMC Genomics 10:224.

19. Iyer, L. M., L. Aravind, and E. V. Koonin. 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75:11720–11734.

20. Jakob, N. J., K. Muller, U. Bahr, and G. Darai. 2001. Analysis of the first complete DNA sequence of an invertebrate iridovirus: coding strategy of the genome of Chilo iridescent virus. Virology 286:182–196.

21. Jancovich, J. K., et al. 2003. Genomic sequence of a ranavirus (family Iridoviridae) associated with salamander mortalities in North America. Virology 316:90–103.

22. Jiang, P., et al. 2007. MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res. 35:W339–W344.

23. Juhl, S., et al. 2006. Assembly of Wiseana iridovirus: viruses for colloidal photonic crystals. Adv. Funct. Mater. 16:1086–1094.

24. Kim, Y. K., I. Heo, and V. N. Kim. 2010. Modifications of small RNAs and their associated proteins. Cell 143:703–709.

25. Kurita, J., K. Nakajima, I. Hirono, and T. Aoki. 2002. Complete genome sequencing of Red Sea bream iridovirus (RSIV). Fish. Sci. 68:1113–1115.

26. Lu, L., et al. 2005. Complete genome sequence analysis of an iridovirus isolated from the orange-spotted grouper, Epinephelus coioides. Virology 339:81–100.

27. Newman, A. M., and J. B. Cooper. 2007. XSTREAM: a practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. BMC Bioinformatics 8:382.

28. Radloff, C., R. A. Vaia, J. Brunton, G. T. Bouwer, and V. K. Ward. 2005. Metal nanoshell assembly on a virus bioscaffold. Nano Lett. 5:1187–1191.

29. Senkevich, T. G., C. L. White, E. V. Koonin, and B. Moss. 2000. A viral member of the ERV1/ALR protein family participates in a cytoplasmic pathway of disulfide bond formation. Proc. Natl. Acad. Sci. U. S. A. 97:12068–12073.

30. Shevchenko, A., et al. 1996. Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. U. S. A. 93:14440–14445.

31. Song, W., Q. Lin, S. B. Joshi, T. K. Lim, and C. L. Hew. 2006. Proteomic studies of the Singapore grouper iridovirus. Mol. Cell. Proteomics 5:256–264.

32. Song, W. J., et al. 2004. Functional genomics analysis of Singapore grouper iridovirus: complete sequence determination and proteomic analysis. J. Virol. 78:12576–12590.

33. Sullivan, C. S., and A. Grundhoff. 2007. Identification of viral microRNAs. Methods Enzymol. 427:3–23.

34. Tan, W. G., T. J. Barkman, V. G. Chinchar, and K. Essani. 2004. Comparative genomic analyses of frog virus 3, type species of the genus Ranavirus (family Iridoviridae). Virology 323:70–84.

35. Tidona, C. A., and G. Darai. 1997. The complete DNA sequence of lymphocystis disease virus. Virology 230:207–216.

36. Tsai, C. T., et al. 2005. Complete genome sequence of the grouper iridovirus and comparison of genomic organization with those of other iridoviruses. J. Virol. 79:2010–2023.

37. Ward, V. K., and J. Kalmakoff. 1987. Physical mapping of the DNA genome of insect iridescent virus type 9 from Wiseana spp. larvae. Virology 160:507–510.

38. Webby, R., and J. Kalmakoff. 1998. Sequence comparison of the major capsid protein gene from 18 diverse iridoviruses. Arch. Virol. 143:1949–1966.

39. Webby, R. J., and J. Kalmakoff. 1999. Comparison of the major capsid protein genes, terminal redundancies, and DNA-DNA homologies of two New Zealand iridoviruses. Virus Res. 59:179–189.

40. Williams, T. 2008. Natural invertebrate hosts of iridoviruses (Iridoviridae). Neotrop. Entomol. 37:615–632.

41. Williams, T., and J. S. Cory. 1994. Proposals for a new classification of iridescent viruses. J. Gen. Virol. 75:1291–1301.

42. Yan, X., et al. 2000. Structure and assembly of large lipid-containing dsDNA viruses. Nat. Struct. Biol. 7:101–103.

43. Yan, X., et al. 2009. The capsid proteins of a large, icosahedral dsDNA virus. J. Mol. Biol. 385:1287–1299.

44. Yan, Y., et al. 2011. Identification of a novel marine fish virus, Singapore grouper iridovirus-encoded microRNAs expressed in grouper cells by Solexa sequencing. PLoS One 6:e19148.

45. Zenke, K., and K. H. Kim. 2008. Functional characterization of the RNase III gene of rock bream iridovirus. Arch. Virol. 153:1651–1656.

46. Zhang, Q. Y., F. Xiao, J. Xie, Z. Q. Li, and J. F. Gui. 2004. Complete genome sequence of lymphocystis disease virus isolated from China. J. Virol. 78:6982–6994.

