Genome Analysis of a Novel Clade II.b Alphabaculovirus Obtained from Artaxa digramma

Artaxa digramma is a lepidopteran pest distributed throughout southern China, Myanmar, Indonesia, and India. Artaxa digramma nucleopolyhedrovirus (ArdiNPV) is a specific viral pathogen of A. digramma and is regarded as a promising biocontrol agent against the pest. In this study, the complete genome sequence of ArdiNPV was determined by deep sequencing. The ArdiNPV genome consists of double-stranded DNA (dsDNA) of 161,734 bp with a G+C content of 39.1%. Further, 149 hypothetical open reading frames (ORFs) were predicted to encode proteins >50 amino acids in length, covering 83% of the whole genome. Among these ORFs, 38 were baculovirus core genes, 22 were lepidopteran baculovirus conserved genes, and seven were unique to ArdiNPV. No typical baculoviral homologous regions (hrs) were identified in the genome. ArdiNPV had five multi-copy genes: baculovirus repeated ORFs (bros), calcium/sodium antiporter B (chaB), DNA binding protein (dbp), inhibitor of apoptosis protein (iap), and p26. Interestingly, phylogenetic analyses showed that ArdiNPV belongs to Clade II.b of Group II Alphabaculoviruses, all of which contain a second copy of dbp. The genome of ArdiNPV was closest to that of Euproctis pseudoconspersa nucleopolyhedrovirus, with 57.4% whole-genome similarity. Therefore, these results suggest that ArdiNPV is a novel baculovirus belonging to a newly identified cluster of Clade II.b Alphabaculoviruses.


Introduction
Baculoviruses are double-stranded DNA viruses that specifically infect the larvae of the insect orders Lepidoptera, Hymenoptera, and Diptera [1]. A typical baculoviral life cycle produces two distinct progeny virions: occlusion-derived virus (ODV), which initiates primary infection in the midgut epithelia of insect larvae, and budded virus (BV), which spreads the infection systemically within the larval body [2]. Baculoviruses are widely applied as biocontrol agents for pest control and as protein expression vectors [3][4][5]. According to phylogenetic analysis, Baculoviridae is classified into four genera: Alphabaculovirus (nucleopolyhedroviruses (NPVs) that specifically infect lepidopteran species), Betabaculovirus (granuloviruses (GVs) that specifically infect lepidopteran species), Gammabaculovirus (NPVs that infect hymenopteran species), and Deltabaculovirus (NPVs that infect dipteran species) [6,7]. Further, the genus Alphabaculovirus can be subdivided into two large lineages (Group I and Group II) and four small lineages; Group I contains Clade I.a and Clade I.b, and Group II contains Clade II.a, Clade II.b, and Clade II.c, based on the phylogenetic analysis of the late expression factor 8 (lef-8), late expression factor 9 (lef-9), per os infectivity factor 2 (pif-2), and polyhedrin (polh) genes [8].

Genomic DNA Sequencing and Bioinformatics Analysis
Reads of the ArdiNPV DNA were generated using the Roche 454 GS FLX pyrosequencing system. Subsequently, the reads were filtered and underwent de novo assembly into contigs using the 454 Newbler software (version 2.7). Gaps or ambiguous regions were further confirmed by PCR and Sanger sequencing. The complete genome and annotation information of ArdiNPV were submitted to GenBank (accession number: MN233792). The tool EMBOSS stretcher [15] was used to calculate the global similarity of the two sequences (ArdiNPV and EupsNPV).
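As a rough illustration of what a global similarity figure for two genomes summarizes, the sketch below computes percent identity over a pairwise global alignment that has already been produced. The function name percent_identity and the toy sequences are hypothetical, and the sketch does not reproduce the alignment step or the scoring scheme of EMBOSS stretcher; it only assumes that identity is counted over the full alignment length, gaps included.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same base in both rows.

    Both inputs must be rows of one global alignment (equal length, '-' = gap).
    Identity is reported over the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy example: 8 identical columns out of 10 aligned positions.
print(percent_identity("ATGC-ATGCA", "ATGCTATGGA"))
```

Dividing by the alignment length rather than by the shorter sequence keeps the value comparable between genome pairs of different sizes, at the cost of penalizing large insertions and deletions.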
The Tandem Repeats Finder [16] and BLAST [17] were employed to search for homologous regions (hrs). FGENESV0 [18] and ORF finder [19] were used to predict hypothetical ORFs in the ArdiNPV genome, requiring a length of at least 50 codons and minimal overlap (less than 200 bp). Further, the BLAST algorithm was used to compare the hypothetical ORFs against known baculoviral proteins and identify them. Gene parity plots were generated to assess the pairwise ORF synteny between ArdiNPV and selected baculoviruses [20].
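The length criterion for ORF prediction can be sketched as a simple six-frame scan. This is a minimal illustration, not the algorithm used by FGENESV0 or ORF finder: the names find_orfs and reverse_complement are hypothetical, the genome is treated as linear (an ORF spanning the origin of a circular genome would be missed), the codon count is assumed to exclude the stop codon, and the overlap filter is not implemented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement; any base other than A/C/G/T becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def find_orfs(seq: str, min_codons: int = 50):
    """Return sorted (start, end, strand) tuples for ATG-to-stop ORFs.

    Coordinates are 0-based, half-open, and always given on the forward strand.
    In each frame the first ATG after the previous stop is used, so only the
    longest ORF ending at a given stop codon is reported.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = template[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Codon count includes the ATG but not the stop codon.
                    if (i - start) // 3 >= min_codons:
                        end = i + 3
                        if strand == "+":
                            orfs.append((start, end, "+"))
                        else:
                            # Map reverse-strand coordinates back to the forward strand.
                            orfs.append((n - end, n - start, "-"))
                    start = None
    return sorted(orfs)


# Toy sequence: one 61-codon ORF (ATG + 60 x GCT) followed by a TAA stop.
toy = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
print(find_orfs(toy))
```

In practice the candidates from such a scan would still need the overlap filter and a homology search before being treated as putative genes.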

Phylogenetic Analysis
Thirty-eight core protein sequences [21,22] were extracted from 107 sequenced baculovirus genomes (including ArdiNPV) (Tables S2 and S3) and concatenated in the same order as in the Autographa californica MNPV (AcMNPV) genome. Alignments were performed with ClustalW using default parameters [23]. The Maximum Likelihood method was employed to construct the phylogenetic tree using Mega7 software [24] based on the LG+G model, with 1000 bootstrap replicates to assess the reliability of the tree [25]. For the alignments of ChaB and DNA binding protein (DBP), ProtTest 3.4.2 was employed to determine the best-fit model of amino acid substitution [26]. The phylogenetic tree of DBP was constructed using the LG+G model and that of ChaB using the JTT+I+G model. All other parameters were the same as described above.
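The concatenation step described above can be sketched as follows. The function names, the three-gene order, and the toy peptides are hypothetical placeholders for the 38 core proteins and their AcMNPV order; the output is FASTA text of the kind a multiple aligner would take as input.

```python
from typing import Dict, List


def concatenate_core_proteins(
    proteins_by_genome: Dict[str, Dict[str, str]],
    gene_order: List[str],
) -> Dict[str, str]:
    """Join each genome's core proteins end to end in one shared gene order."""
    concatenated = {}
    for genome, proteins in proteins_by_genome.items():
        missing = [gene for gene in gene_order if gene not in proteins]
        if missing:
            # Concatenations are only comparable if every genome supplies every gene.
            raise KeyError(f"{genome} is missing core genes: {missing}")
        concatenated[genome] = "".join(proteins[gene] for gene in gene_order)
    return concatenated


def to_fasta(records: Dict[str, str]) -> str:
    """Render {name: sequence} as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Hypothetical three-gene subset with toy peptides, standing in for the 38 core genes.
GENE_ORDER = ["pif-2", "lef-8", "lef-9"]
TOY_PROTEINS = {
    "AcMNPV": {"lef-8": "MTDV", "lef-9": "MSLK", "pif-2": "MYRV"},
    "ArdiNPV": {"lef-9": "MSIK", "pif-2": "MYKV", "lef-8": "MTEV"},
}

print(to_fasta(concatenate_core_proteins(TOY_PROTEINS, GENE_ORDER)))
```

Fixing one gene order for all genomes matters because gene order on the genome differs between viruses; without it, homologous proteins would not line up in the concatenated alignment.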

Sequencing and Characterization of ArdiNPV Genome
Using the Roche 454 sequencing system, 124,744 high-quality reads of the ArdiNPV genome were generated. A complete genome of ArdiNPV was assembled using 454 Newbler software, with 230× genome coverage. The final confirmed ArdiNPV genome had a length of 161,734 bp, with 39.1% G+C content. Further, it contained 149 putative open reading frames (ORFs) beyond 50 codons, covering 83% of the ArdiNPV genome (Table S1). By convention, the polyhedrin gene was designated as the first ORF, and the first A of its initiation codon was defined as the start of the genome. In addition, 73 and 76 ORFs were in the clockwise and counterclockwise orientations, respectively, relative to the transcription direction of the polyhedrin gene. Using the BLAST algorithm, the following genes were detected in the ArdiNPV genome: 38 baculovirus core genes (red), 23 lepidopteran baculovirus conserved genes (blue), 71 other baculovirus common genes (gray), and 10 bro genes (purple) (Figure 1). Moreover, seven hypothetical ArdiNPV unique genes (open arrows, Figure 1) were found to have no homologous sequences in the NCBI database.
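The sequencing figures above can be cross-checked with a short calculation. The sketch assumes that coverage is approximately the read count times the mean read length divided by the genome length, and that every high-quality read contributed to the assembly; the mean read length is not stated in the text, so the value printed here is only what the reported, rounded numbers imply.

```python
GENOME_LENGTH_BP = 161_734    # final assembled genome
HIGH_QUALITY_READS = 124_744
REPORTED_COVERAGE = 230       # fold coverage, as rounded in the text


def implied_mean_read_length(coverage, genome_length_bp, read_count):
    """Mean read length implied by coverage = reads * mean_length / genome_length."""
    return coverage * genome_length_bp / read_count


mean_len = implied_mean_read_length(
    REPORTED_COVERAGE, GENOME_LENGTH_BP, HIGH_QUALITY_READS
)
print(f"implied mean read length: {mean_len:.0f} bp")

# Strand bookkeeping: clockwise plus counterclockwise ORFs should equal the ORF total.
CLOCKWISE, COUNTERCLOCKWISE, TOTAL_ORFS = 73, 76, 149
print(CLOCKWISE + COUNTERCLOCKWISE == TOTAL_ORFS)
```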

Hrs are repeated sequences with an imperfect palindromic core that are present in many baculovirus genomes. These regions act as enhancers of early gene transcription and may serve as origins of replication [27][28][29]. Although there are four hrs in the EupsNPV genome, no hrs were found in the ArdiNPV genome. Other baculoviruses that do not contain hrs include Chrysodeixis chalcites NPV (ChchNPV) [30], Pseudoplusia includens NPV (PsinNPV) [31], and Trichoplusia ni SNPV (TnSNPV) of Clade II.a [32], as well as Buzura suppressaria NPV (BusuNPV) [33] and Clanis bilineata NPV (ClbiNPV) [34] of Clade II.b.
Viruses 2019, 11, x FOR PEER REVIEW
Figure 1. Genome map of ArdiNPV. Red = baculovirus core genes, blue = lepidopteran baculovirus conserved genes, gray = other baculoviral common genes, open arrows = unique genes of ArdiNPV, and purple = bro genes. The inner circle indicates the gene locations. The collinearly conserved region of lepidopteran baculoviruses is also indicated.

Classification of ArdiNPV Genes
Among the 149 hypothetical ArdiNPV ORFs, 142 have homologs in other baculoviruses, including 15 genes potentially related to viral DNA replication, 12 to gene transcription, 31 to structure and assembly, 11 to oral infection, 27 auxiliary genes, and 46 genes of unknown function (Table 1). In addition, ArdiNPV was found to encode seven unique genes for which no homolog was found in GenBank by BLAST search: orf5 (61 aa), orf6 (252 aa), orf22 (70 aa), orf27 (95 aa), orf62 (57 aa), orf113 (94 aa), and orf115 (182 aa). Further studies are required to determine whether these are functional ORFs of ArdiNPV.
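The classification above can be tallied to confirm that the categories account for every predicted ORF. The sketch only restates the counts and lengths given in the text; the variable names are hypothetical, and the 50-residue threshold is taken from the ORF prediction criterion.

```python
# ORFs with baculovirus homologs, grouped by putative function (counts from the text).
HOMOLOG_CLASSES = {
    "DNA replication": 15,
    "gene transcription": 12,
    "structure and assembly": 31,
    "oral infection": 11,
    "auxiliary": 27,
    "unknown": 46,
}

# ORFs without any BLAST homolog, with their lengths in amino acids.
UNIQUE_ORFS_AA = {
    "orf5": 61, "orf6": 252, "orf22": 70, "orf27": 95,
    "orf62": 57, "orf113": 94, "orf115": 182,
}
MIN_LENGTH_AA = 50

with_homologs = sum(HOMOLOG_CLASSES.values())
total_orfs = with_homologs + len(UNIQUE_ORFS_AA)
too_short = [name for name, aa in UNIQUE_ORFS_AA.items() if aa < MIN_LENGTH_AA]

print(with_homologs, total_orfs, too_short)
```

Several of the unique ORFs sit only slightly above the length threshold (orf62 at 57 aa, orf5 at 61 aa), which is one reason their status as functional genes remains open.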

ArdiNPV Belongs to a Cluster Clade II.b of Group II Alphabaculoviruses Which Contains a Second Copy of dbp Gene
ArdiNPV contains five multi-copy genes: 10 copies of baculovirus repeated ORFs (bros), three of calcium/sodium antiporter B (chaB), two of dbp, three of inhibitor of apoptosis protein (iap), and two of p26. So far, six lineages of iap genes, named iap-1 to iap-6, have been defined in baculoviruses [46]. All three iap genes of ArdiNPV belong to the iap-2 lineage; we therefore named them iap-2_1, iap-2_2, and iap-2_3. Multiple gene copies are normally generated by gene duplication during evolution; therefore, phylogenetic analysis of those genes may provide insight into their evolutionary history. Here we focused on the phylogeny of chaB and dbp.
ChaB is conserved in all completely sequenced alphabaculoviruses and also in some GVs. It is a putative DNA binding protein and contains a 60-residue conserved domain in the N-terminal region. In HearNPV, the chaB homolog is involved in viral DNA replication and BV production and is transcribed in the early stage of infection [47]. Most alphabaculoviruses contain two copies of chaB, whereas ArdiNPV and three other Clade II.b viruses (BusuNPV, HespNPV, and OrleNPV) have three copies of chaB in their genomes. A phylogenetic tree of baculovirus ChaB proteins was constructed (Figure 4). According to the tree, the two copies of alphabaculovirus ChaB (Type I and Type II) are well separated, while the third ChaB clusters with those from GVs and is grouped within Type I. Interestingly, the third ChaB proteins of ArdiNPV, BusuNPV, HespNPV, and OrleNPV are closely related to the ChaB of Agrotis segetum GV (AgseGV) (bootstrap value of 95%), suggesting that the third ChaB may have been acquired from GVs (Figure 4).
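The multi-copy gene tally at the start of this section amounts to counting how often each gene family appears in the annotation. The sketch below assumes a hypothetical list of per-ORF family labels; the copy numbers follow the counts given in the text, while the three single-copy entries are placeholders for the rest of the annotation.

```python
from collections import Counter

# Hypothetical per-ORF family labels; copy numbers follow the counts in the text,
# and the three single-copy entries stand in for the remaining annotated ORFs.
ANNOTATION = (
    ["bro"] * 10 + ["chaB"] * 3 + ["dbp"] * 2 + ["iap"] * 3 + ["p26"] * 2
    + ["polh", "lef-8", "lef-9"]
)


def multi_copy_families(families):
    """Return {family: copies} for every family annotated more than once."""
    counts = Counter(families)
    return {family: n for family, n in counts.items() if n > 1}


def has_second_dbp(families):
    """True when at least two ORFs are annotated as dbp."""
    return Counter(families)["dbp"] >= 2


print(multi_copy_families(ANNOTATION))
print(has_second_dbp(ANNOTATION))
```

A count of this kind only flags candidates; whether the copies arose by duplication or by acquisition from another virus still has to be resolved by phylogenetic analysis.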

Figure 4. Phylogenetic tree of baculovirus ChaB proteins. Bootstrap values of more than 50 are shown on the branches. The taxonomic lineages of the viruses and the types of ChaB are indicated on the right. The lineage of the third ChaB of ArdiNPV, BusuNPV, HespNPV, and OrleNPV and that of GVs is marked with a yellow box.
The phylogenetic tree of all baculoviral DBP proteins is shown in Figure 5. To date, dbp has been found in all sequenced baculoviruses except CuniNPV, and 14 baculoviruses contain a second copy of dbp. Interestingly, all 14 baculoviruses that contain a second copy of dbp belong to Clade II.b (Figure 5). The lineage of the second copy of dbp (dbp-2) grouped well with that of dbp-1 of alphabaculoviruses (bootstrap value of 99%). dbp-2 was likely acquired by duplication of dbp-1 in the ancestor of these viruses and has been retained during their evolution. DBP can unwind short DNA strands, protect ssDNA from hydrolysis reactions, and function as an intermediate in the DNA replication process [48]. Further, the dbp gene is essential for BV production [49]. Although it remains unclear whether the second dbp copy is functionally redundant, it can serve as a useful marker to distinguish viruses containing two copies of dbp from other members of Group II Alphabaculoviruses.

Conclusions
In this study, the ArdiNPV genome was annotated and compared against other baculoviruses. Our results showed that ArdiNPV is a novel Clade II.b member most closely related to EupsNPV, with 57.4% whole-genome similarity. Interestingly, among the 107 baculoviruses with full genome sequences, we found that only the members of Clade II.b contain a second copy of dbp, suggesting that the presence of two dbp copies can serve as a marker of the lineage. Also, some of the Clade II.b viruses contain a third copy of chaB. Previously, the hosts of Clade II.b viruses were shown to be insects that specifically infest woody plants [8], indicating that this lineage shares some common genetic and ecological features. These discoveries allow a greater understanding of baculoviral evolution from a wider perspective.

Conflicts of Interest:
The authors declare no conflict of interest.