Complete Sequence, Analysis and Organization of the Orgyia leucostigma Nucleopolyhedrovirus Genome

The complete genome of the Orgyia leucostigma nucleopolyhedrovirus (OrleNPV), isolated from the whitemarked tussock moth (Orgyia leucostigma, Lymantriidae: Lepidoptera), was sequenced, analyzed, and compared to other baculovirus genomes. The OrleNPV genome was 156,179 base pairs (bp) in size and had a G+C content of 39%. The genome encoded 135 putative open reading frames (ORFs), which occupied 79% of the entire genome sequence. Three inhibitor of apoptosis genes (ORFs 16, 43 and 63) and five baculovirus repeated ORFs (bro-a through bro-e) were interspersed in the OrleNPV genome. In addition to six direct repeats (drs), a common feature shared among most baculoviruses, the OrleNPV genome contained three homologous regions (hrs) located in the latter half of the genome. The presence of an F-protein homologue and the results from phylogenetic analyses placed OrleNPV in the genus Alphabaculovirus, group II. Overall, OrleNPV appears to be most closely related to the group II alphabaculoviruses Ectropis obliqua NPV (EcobNPV), Apocheima cinerarium NPV (ApciNPV), Euproctis pseudoconspersa NPV (EupsNPV), and Clanis bilineata NPV (ClbiNPV).


Introduction
The family Baculoviridae consists of rod-shaped, enveloped, and occluded viruses mainly pathogenic to insects in the orders Lepidoptera, Diptera and Hymenoptera [1]. In recent years, there has been continued interest in baculovirus genome studies mainly due to their application as biopesticides of major insect pests in forests and agriculture [2][3][4], and also as vectors for protein expression and gene therapy [5][6].
Baculoviruses produce two distinct virion phenotypes: (i) budded virions (BV) and (ii) occlusion-derived virions (ODV), which are produced during the early and late phases of virus replication, respectively. The ODVs of granuloviruses (GVs; Betabaculovirus) are embedded in a small granulin protein matrix, unlike those of nucleopolyhedroviruses (NPVs), which are contained in a large polyhedrin matrix. The occlusion bodies (OBs) of GVs contain single virions, whereas those of NPVs contain multiple virions within the polyhedrin matrix.
The ODVs of the alphabaculovirus infecting the whitemarked tussock moth (WMTM), Orgyia leucostigma (L.), have a single nucleocapsid per virion. The WMTM is a common pest of balsam fir trees (Abies balsamea (L.) Miller) in Canada but will feed on a number of coniferous and deciduous trees and on agricultural crops such as blueberries (Vaccinium spp.). OrleNPV has been documented to contribute to the collapse of WMTM populations in Atlantic Canada [13]. Experimental applications of OrleNPV for the control of WMTM infestations have been made in Nova Scotia [14]. To better understand the molecular basis of OrleNPV pathogenicity, viral DNA from a Nova Scotia isolate was purified, sequenced, and compared with other baculoviruses using various phylogenetic tools. The OrleNPV genome was found to be 156,179 bp in size and was confirmed to belong to the genus Alphabaculovirus, group II.

Nucleotide Sequence Analysis
The circular OrleNPV genome was 156,179 bp in size, making it the eighth largest baculovirus genome sequenced to date, with others ranging from 81,755 bp for Neodiprion lecontei NPV (NeleNPV) [12] to 178,733 bp for Xestia c-nigrum GV (XecnGV) [15]. The OrleNPV genome is AT-rich, with a G+C content of only 39%, lower than the average G+C content of group I (44.9%) and group II (41.6%) alphabaculoviruses [16]. As originally defined [17], the adenine of the methionine start codon of the polyhedrin gene represented the zero point on the OrleNPV physical map (Figure 1), and polyhedrin was designated as ORF 1 (Table 2). Overall, 135 putative ORFs, with a minimal size of 50 amino acids and with promoter motifs corresponding to various transcriptional profiles, were detected; together they accounted for 79% of the total genome (Tables 1 and 2). Fifty-eight (43.0%) of the ORFs were forward oriented, whereas 77 (57.0%) were reverse oriented (Figure 1). Interestingly, OrleNPV encodes relatively fewer genes (135) than other baculoviruses, for which the average number of ORFs is 140.5. This could be attributed to the presence of several copies of hrs, drs, and large intergenic regions, which collectively account for 21% of the entire genome. Multiple copies of hrs and drs may be a replication strategy that allows the virus to initiate DNA replication from several regions, leading to a rapid rate of DNA replication in the host.
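The summary statistics in this paragraph are simple ratios. As a minimal sketch (not the authors' pipeline), the following Python shows how a G+C percentage and the strand-orientation percentages could be computed. The function names are illustrative, the short sequence is a toy example rather than OrleNPV data, and only the ORF counts (58 forward, 77 reverse) come from the text.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq) if seq else 0.0


def orientation_summary(strands: list) -> dict:
    """Percentage of ORFs on the forward ('+') and reverse ('-') strands."""
    total = len(strands)
    forward = strands.count("+")
    return {
        "forward_pct": round(100.0 * forward / total, 1),
        "reverse_pct": round(100.0 * (total - forward) / total, 1),
    }


# ORF counts reported in the text: 58 forward and 77 reverse (135 in total).
strands = ["+"] * 58 + ["-"] * 77
summary = orientation_summary(strands)

# Toy sequence (an assumption for illustration, not OrleNPV data).
toy_gc = gc_content("ATGCATATTA")
```

Applied to the full genome sequence and the annotated strand of each ORF, the same two functions would give the genome-wide values.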

Gene Organization
Gene parity plots were used to reveal the gene order between two baculoviruses, where closely related baculoviruses show a collinear arrangement of genes and, conversely, collinearity decreases with increased divergence between baculoviruses [12,20]. Comparisons of relative gene order between OrleNPV and its EcobNPV, ChchNPV, AcMNPV or OpMNPV homologues revealed five gene clusters that were conserved in these five alphabaculoviruses (Figure 2). The gene clusters consisted of 57 genes, among which 22 (76%) core baculovirus genes and 23 (72%) lepidopteran-specific genes were identified. Gene cluster 5 showed the largest collinear segment, which contained a core cluster of four genes, helicase, ac96 (odv-e28), 38K and lef5, that are thought to be present in all baculoviruses [18]. The gene arrangements of these five gene clusters indicate further that OrleNPV is more closely related to group II than to group I alphabaculoviruses. Inversions in gene clusters 3 and 4 were identified in the group I alphabaculoviruses examined (Figure 2), and ChchNPV, a group II alphabaculovirus, showed inverted orientations in gene clusters 2 and 5 similar to those of the group I alphabaculoviruses AcMNPV and OpMNPV. These results are in agreement with a recent study, which showed similar gene arrangements between ClbiNPV and OrleNPV [21].
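The idea behind a gene parity plot can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the method used to produce Figure 2: the function names are hypothetical, gene names are assumed to be unique within each genome, and the two gene orders below are invented toy lists that merely reuse gene names mentioned in the text. Each shared gene is plotted at (rank in genome A, rank in genome B); runs of points that step by +1 indicate collinear clusters, and runs that step by -1 indicate inverted clusters.

```python
def parity_points(order_a, order_b):
    """Pair each shared gene's rank in genome A with its rank in genome B.

    Ranks are counted among shared genes only, so unshared genes do not
    break up an otherwise collinear run.
    """
    shared = set(order_a) & set(order_b)
    shared_b = (g for g in order_b if g in shared)
    rank_b = {g: j for j, g in enumerate(shared_b, start=1)}
    shared_a = (g for g in order_a if g in shared)
    return [(i, rank_b[g], g) for i, g in enumerate(shared_a, start=1)]


def collinear_blocks(points, min_len=2):
    """Group consecutive points whose B rank changes by a constant +1 or -1.

    Returns (direction, [gene names]) tuples; direction -1 marks an inversion.
    """
    blocks = []
    current = [points[0]] if points else []
    direction = 0
    for prev, nxt in zip(points, points[1:]):
        step = nxt[1] - prev[1]
        if step in (1, -1) and direction in (0, step):
            direction = step
            current.append(nxt)
        else:
            if len(current) >= min_len:
                blocks.append((direction, [g for _, _, g in current]))
            current, direction = [nxt], 0
    if len(current) >= min_len:
        blocks.append((direction, [g for _, _, g in current]))
    return blocks


# Toy gene orders (hypothetical; not the real OrleNPV or AcMNPV gene order).
genome_a = ["polh", "helicase", "odv-e28", "38K", "lef5", "pif1"]
genome_b = ["polh", "lef5", "38K", "odv-e28", "helicase", "xyz"]

points = parity_points(genome_a, genome_b)
blocks = collinear_blocks(points)
```

In this toy case the four-gene run is reported with direction -1, which is how an inverted cluster would appear as a descending diagonal on the plot.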

Homologous Regions and Direct Repeats
Homologous regions (hrs) are cis-acting elements present in many baculovirus genomes and most are characterized by the presence of several copies of direct repeats and imperfect palindromes.
Comparisons of 57 sequenced baculovirus genomes showed that 47 contain hrs and eight, including Trichoplusia ni SNPV (TnSNPV), Adoxophyes orana GV (AdorGV), Agrotis segetum GV (AgseGV), Cydia pomonella GV (CpGV), Cryptophlebia leucotreta GV (CrleGV), Pieris rapae GV (PrGV), SlGV, and NeleNPV, have no hrs (Table 1). The Spodoptera litura MNPV genome (SpltMNPV-G2) contains 17 hrs, which is the highest number of hrs reported to date [22]. Although the in vivo origin(s) of baculovirus DNA replication (oris) remain unclear, hrs have been implicated, using transient replication assays, as putative oris and as enhancers of early gene expression [23][24][25]. In addition, the interpalindromic sequences of baculovirus hrs contain certain cAMP response elements (CRE), which could bind host transcription factors of the bZIP family and possibly provide synergy for the binding of IE-1 to the hrs [26,27]. In other baculovirus genera, particularly some betabaculoviruses, hrs exhibit both spatial and sequence heterogeneity and are devoid of consensus palindromic sequences such as those found in the type baculovirus AcMNPV. For example, of the 9 hrs of XecnGV, hrs 1-8 contain three to six imperfect direct repeats about 120 bp long with no obvious palindromes and are interspersed within AT-rich regions, but the other hr is located within an ORF [15]. The OrleNPV genome contained three hrs that were located in the latter half of the genome (Table 2). Like those of other alphabaculoviruses, OrleNPV hrs were characterized by the presence of a single tandemly repeated imperfect palindrome sequence within a direct repeat of two to three copies. In the AcMNPV genome, however, hrs have common repetitive elements with an imperfect palindrome of about 30 bp and are interspersed at multiple loci in the genome [27]. The size of the three OrleNPV hrs was variable (hr1, 305 bp; hr2, 219 bp; hr3, 324 bp; Table 2). Similar variations were observed in EcobNPV, which appears to be closely related to OrleNPV [28].
However, given the different numbers of hrs, the combined size of the OrleNPV hrs is relatively smaller (848 bp or 0.5% of the genome) than the combined size of the EcobNPV hrs (3541 bp or 2.7% of the genome). Nevertheless, the similarity in number and arrangement of the hrs (Figure 3), coupled with the phylogenetic analysis (Figure 4), strongly supports the possibility that the two viruses share a common ancestor. In both OrleNPV and EcobNPV, there are indications of genome variability and rearrangements around hr3. In OrleNPV, the two ribonucleotide reductase genes (rr1 and rr2) are located on either side of hr3 and in the same orientation. In the EcobNPV genome, however, both rr1 and rr2 appear on the right side of hr3 and in opposite orientations. Moreover, the EcobNPV pif1 gene does not flank hr3 as it does in the OrleNPV genome. Together, these findings suggest that, although both viruses appear to share a common ancestor (Figure 4), gene rearrangements around hr3 could have occurred through homologous recombination as the two viruses adapted to their specific hosts. Similar observations have been reported in other alphabaculoviruses, although at different hr loci [29][30][31].
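The combined hr size quoted above follows directly from the three individual sizes and the genome length. As a small worked check in Python (variable names are illustrative; all numbers are taken from the text):

```python
GENOME_BP = 156_179                               # OrleNPV genome size from the text
HR_SIZES_BP = {"hr1": 305, "hr2": 219, "hr3": 324}  # hr sizes from the text

combined_bp = sum(HR_SIZES_BP.values())           # total length of the three hrs
fraction_pct = 100 * combined_bp / GENOME_BP      # share of the genome, in percent

print(f"{combined_bp} bp, {fraction_pct:.1f}% of the genome")
```

The sum is 848 bp, which rounds to 0.5% of the genome, in agreement with the values reported in the text.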
In addition to hrs, the OrleNPV genome contained six direct repeat regions (drs), all AT-rich except rep 5, containing 2 to 10 copies of direct tandem repeat sequences ranging in size from 31 to 97 bp (Table 2). Common in other baculoviruses, drs have been implicated as origins of DNA replication [11,32,33]. They have also been shown to be more complex in structure than hrs and to have a sequence organization similar to that of eukaryotic oris [34][35][36][37]. Interestingly, the six drs located in the first half of the OrleNPV genome are absent in EcobNPV, ApciNPV, and Euproctis pseudoconspersa NPV (EupsNPV) in spite of their evolutionary relatedness. This observation suggests that these viruses may utilize different in vivo origins of DNA replication. This notion is supported by recent findings, which showed that none of the hrs is essential for baculovirus DNA replication [38].
Non-hr repeats, although structurally distinct from those of OrleNPV, have been reported in OpMNPV [29]. Since non-hrs mimic eukaryotic oris and OrleNPV and OpMNPV infect the same hosts, it is likely that the non-hrs were acquired from the host and may play an important role in viral DNA replication. It is worth noting that the only gene unique to OrleNPV (orle001) is flanked by rep1 and rep2 (Table 2). This gene may have been acquired from the insect host, and thus the drs may play a role in the horizontal transfer of genes into viral genomes. Although the molecular function of orle001 is unknown, its close proximity to these non-hr oris also makes it possible that the orle001 product interacts with other replication machinery at the drs cis-acting elements during viral gene expression or DNA replication.
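To make the notion of a direct tandem repeat concrete, the following Python is a deliberately naive sketch that reports runs of perfect, head-to-tail copies of a unit. It is an assumption-laden illustration and not the algorithm of the repeat-finding tools used in this study, which also tolerate mismatches and indels; the function name, parameters, and toy sequence are all hypothetical.

```python
def find_tandem_repeats(seq, min_unit=3, max_unit=100, min_copies=2):
    """Report perfect tandem repeats as (start, unit_length, copies) tuples.

    At each position the unit giving the longest repeated span is kept
    (ties go to the shorter unit); scanning resumes after that span.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        longest_unit = min(max_unit, (n - i) // min_copies)
        for unit in range(min_unit, longest_unit + 1):
            copies = 1
            # Count how many times the unit starting at i repeats back to back.
            while seq[i + copies * unit : i + (copies + 1) * unit] == seq[i : i + unit]:
                copies += 1
            if copies >= min_copies and (best is None or copies * unit > best[1] * best[2]):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += best[1] * best[2]
        else:
            i += 1
    return hits


# Toy sequence: four copies of a 5 bp unit flanked by unrelated bases.
toy = "GG" + "ATTCA" * 4 + "CC"
repeats = find_tandem_repeats(toy)
```

For the drs described above, a realistic search would use unit lengths in the 31 to 97 bp range and allow imperfect copies.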

Baculovirus-Repeated ORFs
Bro genes are common in many baculovirus genomes and constitute a family of repetitive genes that vary in number and distribution. Among the baculovirus genomes sequenced to date, LdMNPV has the highest number of bro genes (16 copies), but they are absent in 13 baculovirus genomes, including Maruca vitrata NPV (MaviNPV), Rachiplusia ou MNPV (RoMNPV), ApciNPV, Spodoptera exigua MNPV (SeMNPV), AdorGV, AgseGV, ChocGV, CrleGV, PrGV, Plutella xylostella GV (PxGV), Neodiprion abietis NPV (NeabNPV), NeleNPV, and Neodiprion sertifer NPV (NeseNPV) (Table 1). In addition to baculoviruses, homologues of bro genes have been reported in ascoviruses, iridoviruses, entomopoxviruses, and class II transposons of prokaryotes [39]. Although their role during baculovirus replication is not clear, bro gene products have been implicated as potential DNA-binding proteins involved in host transcriptional regulation and DNA replication [40,41]. Other putative functions of bro genes, which may depend on the target insect host, include enhancement of the late phase of virus replication [39] and activity as CRM1-dependent nuclear export shuttle proteins [42]. Five bro genes were identified in OrleNPV (ORFs 23, 30, 49, 61 and 100) and were designated as bro-a, bro-b, bro-c, bro-d, and bro-e based on their order of appearance in the genome (Table 2). Bro-a shares 27% and 38% identity with its respective homologues in ChchNPV and LdMNPV. Bro-b shares 25% identity with its LdMNPV homologue. Bro-c shares between 21% and 56% identity with its homologues in AcMNPV, EcobNPV, ChchNPV, and LdMNPV. Bro-d shares between 25% and 28% identity with its homologues in AcMNPV, EcobNPV, and LdMNPV. Bro-e shares between 47% and 64% identity with its homologues in EcobNPV, ChchNPV and LdMNPV. Based on the LdMNPV Bro protein classification, bro-d belongs to group I; bro-b, bro-c, and bro-e belong to group II (with bro-c and bro-e more similar to each other than to bro-b); and bro-a belongs to group III [43].
As shown in the phylogenetic tree (Figure 4), ApciNPV is phylogenetically related to OrleNPV but lacks bro genes, suggesting that ApciNPV may have lost its bro genes by recombination as the viruses specialized to their hosts. Phylogenetic analysis of a multigene family of bro-like genes of other invertebrate DNA viruses and bacteria suggested that bro genes resulted from recombination events leading to loss, duplication, and acquisition of genes by horizontal gene transfer [39].

Inhibitors of Apoptosis (iap)
Baculovirus inhibitor of apoptosis (iap) genes help to circumvent insect defense mechanisms involving programmed death of virus-infected cells and may act as host range factors [51]. Some alphabaculoviruses, such as AcMNPV, RoMNPV, Bombyx mori NPV (BmNPV), MaviNPV, and SpltMNPV, contain the p35 family of anti-apoptosis genes in addition to the iap family of genes. The two families of anti-apoptotic genes have been shown to have complementary functions; for example, replication of an AcMNPV p35 mutant was rescued by iap genes from other baculoviruses [52,53]. The OrleNPV genome contained three iap genes but lacked p35. ORF 16 (iap3) showed the highest identity with iap3 of the group I alphabaculovirus Choristoneura fumiferana MNPV (CfMNPV) (61%) and with iap1 of the group II alphabaculovirus TnSNPV (54%). ORF 43 (iap2), however, showed the highest identity to EupsNPV iap1 (35%) and Anticarsia gemmatalis MNPV (AgMNPV-2D) iap-3 (30%), from group II and group I alphabaculoviruses, respectively. The other copy of OrleNPV iap2 (ORF 63) showed the highest identities with iap2 of the group II alphabaculoviruses EcobNPV (33%) and ChchNPV (32%) (Table 2). All three proteins contained one copy of the baculovirus inhibitor of apoptosis repeat (BIR). BIRs contain a zinc-finger domain and a RING domain at the C-terminus and are involved in binding apoptosis-inducing proteins [54,55]. ORFs 43 and 63 contained a zinc finger at the C-terminus, whereas ORF 16 lacked a zinc-finger domain, which suggests that they may be involved in binding different forms of host apoptosis proteins.

Phylogenetic Analysis
To date, 31 core genes have been used to elucidate the evolutionary relationships of various baculovirus species [16]. In this study, the phylogenetic tree was generated from the concatenated sequences of the ac22 homologue (pif-2) and lef-8 genes from the baculovirus genomes currently available in the NCBI database. These two genes were previously shown to generate a robust tree that is comparable to that of the baculovirus core genes [58]. The tree showed a clear separation of baculoviruses according to their current scheme of classification (Figure 4) [7]. In addition, group I alphabaculoviruses and betabaculoviruses showed clear separation into two distinct clades (a and b), which have been reported in previous studies using the analysis of all baculovirus core genes [7,16,58]. This information further highlights the significance of these two conserved genes, albeit non-essential for replication, in elucidating the evolution of baculoviruses in their respective hosts [58]. Our results indicate that OrleNPV is most closely related to EcobNPV, ApciNPV, EupsNPV, and ClbiNPV (Figure 4).
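Distance-based tree building on concatenated genes starts from a pairwise distance matrix. The Python below is a minimal sketch of that first step only, using the simple p-distance (proportion of differing sites); it does not reproduce the tree-building analysis itself, and the distance model actually used is not stated in the text. The function names, virus labels, and four-residue "alignments" are all hypothetical toy inputs, and the sequences are assumed to be already aligned.

```python
from itertools import combinations


def concatenate(alignments):
    """Join per-gene aligned sequences for each taxon, in the gene order given."""
    taxa = alignments[0].keys()
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Pairwise p-distances keyed by alphabetically ordered taxon pairs."""
    return {
        (s, t): p_distance(seqs[s], seqs[t])
        for s, t in combinations(sorted(seqs), 2)
    }


# Toy aligned fragments (hypothetical; not real lef-8 or pif-2 sequences).
lef8 = {"VirusA": "MKLV", "VirusB": "MKIV", "VirusC": "MR-V"}
pif2 = {"VirusA": "ACDE", "VirusB": "ACDE", "VirusC": "GCDE"}

concatenated = concatenate([lef8, pif2])
dist = distance_matrix(concatenated)
```

A neighbour-joining implementation would then take a matrix like `dist` as input and iteratively join the pair of taxa that minimizes the total branch length.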

Virus Propagation, DNA Extraction and Purification
Third-instar O. leucostigma larvae were fed OrleNPV at a rate of 5 × 10⁴ OBs per larva. Infected larvae were reared on artificial diet and were monitored for viral pathogenesis at various days post infection (dpi). Occlusion bodies were isolated from infected larvae as previously described with minor modifications [59,60]. In brief, 20 larvae were collected at 10 dpi, homogenized with a hand blender, and stirred for 2 h in 0.5% SDS.
The homogenate was filtered through cheesecloth and centrifuged at 7000 × g for 15 min at 15 °C. The pellet containing the OBs and insect debris was washed three times in deionized water by centrifugation as described above. The final pellet of purified OBs was resuspended in 5 mL of sterile distilled water. OBs were dissolved in an alkaline solution (1.0 M sodium carbonate and 0.4 M sodium thioglycolate) to release the occlusion-derived virus (ODV). The solution was centrifuged at 1000 × g to remove debris and undissolved OBs. ODV was purified in a continuous 10-45% sucrose gradient as previously described [61]. The band containing the ODV was collected, diluted with TE (10 mM Tris and 1.0 mM EDTA, pH 8.0), and centrifuged at 22,000 rpm (SW28 rotor, Beckman ultracentrifuge) for 2 h at 4 °C. The pellet was suspended in 500 μL of TE and digested with proteinase K (25 μL, 20 mg/mL) along with 1% N-lauryl-sarcosine (final concentration) for 2 h at 37 °C. Viral DNA was further purified on a CsCl gradient as previously described [60], followed by dialysis against several changes of TE for 72 h at 4 °C. Purified DNA was resuspended in TE and quantified spectrophotometrically to ascertain DNA yield and purity.

DNA Sequencing and Analysis
The OrleNPV genome sequence was determined using the shotgun sequencing approach as previously described [12]. In brief, total genomic DNA was sheared into small random fragments by nebulization and cloned to generate an overlapping genomic library. Purified template DNA from the genomic library was sequenced in an ABI Prism® 3700 analyzer using the BigDye® terminator chemistry (v.3.0/3) as per Agencourt Bioscience Corporation (Beverly, MA, USA) sequencing specifications. The overall sequence obtained accounted for 12-fold genomic coverage. The OrleNPV genome was assembled by Agencourt Bioscience. Lasergene DNAStar SeqManPro (version 7.2) was used to manually edit and verify the contiguous sequence data. Genomic data were submitted remotely to the EMBOSS software suite, and start-to-stop translational searching was performed using the getorf program [62]. Open reading frames (ORFs) encoding 50 or more amino acids with minimal overlap were accepted as putative genes based on the established criteria [17]. An NCBI web BLAST Perl script was used to submit relevant ORFs to GenBank. Homologue identification was done using standard protein-protein BLAST (blastp) with default settings [63].
Sequences 160 bp upstream of the predicted ORF start codons were analyzed for potential promoter motifs using WebLogo [64]. Upstream sequences were also scanned for exact matches to commonly known baculovirus promoter elements, including the TATA sequence (TATA/TATAW), CAKT and DTAAG, within the 160 bp region or the region 40 to 20 bp upstream, using the regular expression notation native to Perl. Further motif optimizations were performed on BioProspector and AlignACE results using BioOptimizer v 3.0 [65]. Repeat regions, including hrs and direct repeat regions, were identified using Tandem Repeats Finder [66], EMBOSS Palindrome [62], and REPuter [67].
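The start-to-stop ORF criterion described above can be illustrated with a short Python sketch. This is a simplified stand-in written for illustration, not the program used in the study: it scans all six reading frames of a linear sequence for ATG-to-stop ORFs of at least 50 codons, ignores the circularity of the genome and the "minimal overlap" filter, and uses hypothetical function names and a toy input.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Return (strand, frame, start, end, aa_length) for ATG-to-stop ORFs.

    Coordinates are 0-based and half-open on the scanned strand; end includes
    the stop codon. aa_length counts the start methionine but not the stop.
    Within a frame, the first ATG after the previous stop is used.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i : i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3
                    if aa_len >= min_aa:
                        orfs.append((strand, frame, start, i + 3, aa_len))
                    start = None
    return orfs


# Toy sequence: one 61-codon ORF (ATG + 60 alanine codons) with short flanks.
toy = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
orfs = find_orfs(toy)
```

Raising `min_aa` above 61 removes the single toy ORF, which mirrors how the 50-amino-acid threshold filters out short spurious reading frames.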
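The exact-match motif scan can be sketched with regular expressions. The study used Perl; the Python below is an equivalent illustration under stated assumptions, not the authors' script. The IUPAC codes W (A or T), K (G or T) and D (A, G or T) are expanded into character classes, and hits are reported as negative offsets relative to the start codon. The function names and the short upstream sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the three motifs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "K": "[GT]", "D": "[AGT]"}

MOTIFS = ["TATAW", "CAKT", "DTAAG"]


def motif_to_regex(motif):
    """Compile an IUPAC motif into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + body + "))")


def scan_upstream(upstream, window=160):
    """Return {motif: [offsets]} for the last `window` bases before the start codon.

    Offsets are negative: -1 is the base immediately upstream of the ATG.
    """
    region = upstream.upper()[-window:]
    hits = {}
    for motif in MOTIFS:
        pattern = motif_to_regex(motif)
        hits[motif] = [m.start() - len(region) for m in pattern.finditer(region)]
    return hits


# Toy upstream sequence (hypothetical, 25 bases ending just before an ATG).
toy_upstream = "GGGTATAAAGGCAGTCCCATAAGCC"
hits = scan_upstream(toy_upstream)
```

Restricting the scan to a narrower interval, such as the region 40 to 20 bp upstream, only requires filtering the returned offsets.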

Gene Homology and Phylogenetic Analysis
Predicted ORFs were compared with homologues in four alphabaculoviruses: AcMNPV (NCBI reference NC_001623), OpMNPV (NC_001875), ChchNPV (NC_007151), and EcobNPV (NC_008586). Gene parity plots were used to analyze the gene order of OrleNPV relative to these representatives of group I and group II alphabaculoviruses. The phylogenetic tree was generated using concatenated amino acid sequences derived from the lef-8 and pif-2 genes of the 56 complete baculovirus genomes that were available in the NCBI database at the time of analysis. The tree was inferred by the neighbour-joining method using the MEGA5 program [68].
Table 2. OrleNPV genome annotation. (1) ORFs were named starting from the polyhedrin gene (ORF1) to ribonucleotide reductase 1 (ORF135). (2) Non-coding sequences, including the direct repeat regions and the hrs, are printed in bold. The direction of each gene is indicated by arrowheads. (3) The promoter motif for each gene is designated as Early (E) or Late (L) based on the consensus elements. (4) Respective homologues for EcobNPV, ChchNPV, OpMNPV, and AcMNPV ORFs are shown, followed by the percentage amino acid identities in brackets.

Conclusions
The OrleNPV genome is the eighth largest baculovirus genome sequenced to date (156,179 bp) and encodes a total of 135 ORFs, one of which is unique to the virus. Interspersed within the genome are three hrs, five bro genes, and three iap genes, which are common features in most baculovirus genomes. Duplicate copies of dbp, odv-e66, and p26 were present in the OrleNPV genome. Based on phylogenetic analysis and gene arrangements, OrleNPV appears to be most closely related to the group II alphabaculoviruses EcobNPV, ApciNPV, EupsNPV, and ClbiNPV. Together, the OrleNPV genomic data should be useful for future analyses of emerging baculovirus genomes and for providing a deeper understanding of the molecular basis of OrleNPV pathogenicity.