Identification and Full Characterisation of Two Novel Crustacean Infecting Members of the Family Nudiviridae Provides Support for Two Subfamilies

Multiple enveloped viruses with rod-shaped nucleocapsids have been described, infecting the epithelial cell nuclei within the hepatopancreas tubules of crustaceans. These bacilliform viruses share the ultrastructural characteristics of nudiviruses, a specific clade of viruses infecting arthropods. Using histology, electron microscopy and high throughput sequencing, we characterise two further bacilliform viruses from aquatic hosts, the brown shrimp (Crangon crangon) and the European shore crab (Carcinus maenas). We assembled the full double stranded, circular DNA genome sequences of these viruses (~132 and ~113 kbp, respectively). Comparative genomics and phylogenetic analyses confirm that both belong within the family Nudiviridae but in separate clades representing nudiviruses found in freshwater and marine environments. We show that the three thymidine kinase (tk) genes present in all nudivirus genomes sequenced thus far were absent in the Crangon crangon nudivirus, suggesting there are twenty-eight core genes shared by all nudiviruses. Furthermore, the phylogenetic data no longer support the subdivision of the family Nudiviridae into four genera (Alphanudivirus to Deltanudivirus), as recently adopted by the International Committee on Taxonomy of Viruses (ICTV), but rather show two main branches of the family that are further subdivided. Our data support a recent proposal to create two subfamilies within the family Nudiviridae, each subdivided into several genera.


Introduction
Few viruses infecting marine invertebrates have been formally characterised with most tentatively assigned to families based upon morphological, developmental and replicative characteristics within the host cell [1]. This was largely due to the lack of crustacean cell lines for culturing viral infections, but with the recent development and increasing availability of high-throughput sequencing technologies comprehensive descriptions now facilitate classifications and taxonomic placement of novel viruses, at least to family level [2,3]. Using these approaches, several previously unclassified crustacean viruses have been assigned to the family Nudiviridae [4].
Nudiviruses infect a wide array of arthropods and exhibit nuclear replication. They have double-stranded, circular DNA genomes ranging in size between 96 and 232 kbp. Their virions are enveloped and contain rod-shaped nucleocapsids [4][5][6][7]. Although gene order is poorly conserved among nudivirus genomes [4], to date 31 core genes have been identified as being shared amongst all members of the Nudiviridae, including homologs of several baculovirus core genes [8,9]. Viruses now classified in the family Nudiviridae have previously been named 'non-occluded baculoviruses' [10] and 'intranuclear bacilliform viruses' [11]. However, nudiviruses have been shown to form a distinct lineage separate from the baculoviruses [4], despite sharing a set of core genes and several ultrastructural features. The family Nudiviridae was initially named to reflect the lack of viral occlusion bodies ('nudi-' meaning bare), differentiating these non-occluded viruses from the occluded baculoviruses. However, there are now several examples of viruses classified in the family Nudiviridae, based on gene content and phylogeny, for which occlusion bodies have been observed and where genes encoding (structural) homologs of the baculovirus polyhedrin protein have been identified [8,12,13]. In addition, endogenous viral elements (EVEs) derived from nudiviruses have been reported to be incorporated into the genomes of multiple arthropod species [14][15][16][17].
The subdivision of the family Nudiviridae into four genera has recently been approved by the International Committee on Taxonomy of Viruses (ICTV) and will be published later this year (ICTV release 2021). The genera Alphanudivirus and Betanudivirus contain species affecting insect hosts. Penaeus monodon nudivirus (PmNV) was the first aquatic species to be placed within the Nudiviridae [12] and is now grouped with Homarus gammarus nudivirus (HgNV) [18] within the genus Gammanudivirus. Isolates of these two viruses were found in aquatic hosts from marine environments. A virus infecting cranefly larvae [8] belongs to the species Tipula oleracea nudivirus (ToNV) and was assigned to the genus Deltanudivirus. A third aquatic virus, Dikerogammarus haemobaphes nudivirus (DhNV) infecting a peracarid host from a freshwater environment was also tentatively placed within the family Nudiviridae, but has not yet been formally assigned to a genus. Due to the low level of similarity between the encoded proteins predicted for DhNV, when compared to members of the Gammanudivirus and Deltanudivirus, Allain et al. [19] proposed the erection of the additional genus Epsilonnudivirus to contain peracarid-infecting nudiviruses.
As highlighted previously [1,19], it is very likely there are additional nudiviruses infecting aquatic crustacean hosts from both marine and freshwater systems. Virome analysis of the European brown shrimp Crangon crangon identified sequences homologous to nudiviruses that formed three large contigs, but which could however not be assembled into a full genome sequence [20]. Crangon crangon bacilliform virus (CcBV) has been described in the brown shrimp, caught in the Clyde Estuary, UK [21]. The infection was initially described as an intranuclear bacilliform virus, owing to the ultrastructure, morphology and size of the virions [21]. This virus targets the hepatopancreatic epithelial cells, and infected cells display hypertrophied nuclei with marginalised chromatin and an eosinophilic inclusion body. The rod-shaped nucleocapsids are enveloped with a characteristic bulb shaped protuberance of the envelope at one end and measure 280 nm × 72 nm [21]. The virus has since been found to be ubiquitous within this species in European waters [22,23]. Similarly, Carcinus maenas bacilliform virus (CmBV) was described infecting the European shore crab from the Clyde and Tyne estuaries in the UK [23]. The virus also targets the hepatopancreatic epithelial cells and displays pathology comparable to that described for CcBV. CmBV has since been identified affecting crabs in both their native ranges in Northern Europe and invasive ranges in Atlantic Canada [24]. We re-isolated CcBV and CmBV from shrimp and crab tissues sampled in UK and Canadian waters, respectively, to enable in depth characterisation of these viruses.
Here, our aim was to collect histological, ultrastructural and genomic data from bacilliform viruses infecting the two aquatic crustacean species, C. crangon and C. maenas, and compare these characteristics to what is known from nudivirus infections from terrestrial and aquatic environments. First, collected samples of both crustacean species were analysed using histology and electron microscopy to identify the infection and analyse the virions morphologically. Next, genome sequencing and de novo assembly was carried out to determine the genome structure, the presence/absence of the nudivirus core genes and to compare related viruses using phylogenetics. Utilizing the new latinized binominal method for the naming of virus species, we propose the virus species Gammanudivirus cracrangoni and Gammanudivirus camaenasi, with the common names Crangon crangon nudivirus (CcNV) and Carcinus maenas nudivirus (CmNV), respectively, to be used in the rest of this manuscript.

Sample Collection
Crangon crangon specimens were caught off the coast of Belgium as described by Van Eynde et al. [22]. Carcinus maenas were collected from the shoreline in Canada as described by Bojko et al. [24]. Hepatopancreas samples were dissected from C. crangon and C. maenas specimens and fixed in Davidson's sea water fixative for histology and in 2.5% glutaraldehyde in 0.1 M sodium cacodylate buffer for electron microscopy. Hepatopancreas from each animal was also dissected for molecular analysis: C. crangon samples were snap frozen in liquid nitrogen and stored at −80 °C, and C. maenas samples were fixed in 100% ethanol.

Histology
Samples were fixed in Davidson's sea water fixative for a minimum of 24 h before samples were transferred to 70% industrial methylated spirit (Ethanol and Methanol mixture, Pioneer Research Chemicals Ltd., Colchester, UK). Fixed samples were processed to wax in a vacuum infiltration processor (Leica Peloris) using standard protocols. Sections were cut at a thickness of 3-5 µm on a rotary microtome and were mounted onto glass slides before staining with haematoxylin and eosin (H&E). Stained sections were analysed by light microscopy (Nikon Eclipse E800) and digital images were taken using the Lucia™ Screen Measurement System (Nikon, Surbiton, UK).
Samples were analysed for the presence of viral inclusions within hypertrophied nuclei of the epithelial cells within the hepatopancreas tubules. A grading scheme (Grade 0 to Grade 4), as described by Stentiford and Feist [23], was used to grade the level of viral infection: at Grade 0 the viral infection appeared to be absent from the histology section, whereas at Grade 4 most cells within the hepatopancreatic tubules showed infection.

Transmission Electron Microscopy (TEM)
Hepatopancreas tissue was fixed in 2.5% glutaraldehyde (Agar Scientific, Stansted, UK) in 0.1 M sodium cacodylate buffer (pH 7.4) (Agar Scientific, Stansted, UK) for a minimum of 2 h at room temperature and rinsed in 0.1 M sodium cacodylate buffer (pH 7.4). Tissues were post-fixed for 1 h in 1% osmium tetroxide (Agar Scientific, Stansted, UK) in 0.1 M sodium cacodylate buffer. Samples were washed in three changes of 0.1 M sodium cacodylate buffer before dehydration through a graded acetone series. Samples were embedded in Agar 100 epoxy (Agar Scientific, Agar 100 premix kit medium) and polymerised overnight at 60 °C in an oven. Semi-thin (1-2 µm) sections were stained with Toluidine Blue for viewing with a light microscope to identify suitable target areas. Ultrathin sections (70-90 nm) of these areas were mounted on uncoated copper grids (Agar Scientific, Stansted, UK) and stained with 2% aqueous uranyl acetate (Agar Scientific, Stansted, UK) and Reynolds' lead citrate [25]. Grids were examined using a JEOL JEM 1400 transmission electron microscope and digital images captured using an Advanced Microscopy Techniques (AMT) XR80 camera and AMT V602 software.

DNA Extraction
Total DNA was extracted from hepatopancreas tissue samples of C. crangon using the DNeasy Blood and Tissue kit (Qiagen), following the manufacturer's instructions, in preparation for Illumina sequencing. DNA was also extracted from these tissue samples for Nanopore sequencing using an adapted phenol:chloroform:isoamyl alcohol (PCI) method (25:24:1). Roughly 10 mg of tissue was added to 900 µL of Lifton's buffer (100 mM EDTA, 25 mM Tris-HCl pH 7.5, 1% SDS) with Proteinase K (0.2 mg/mL end concentration). Samples were homogenised very gently with a pellet pestle for 30 s and incubated at 56 °C overnight. One hundred microlitres of 5 M potassium acetate were added and mixed by gentle inversion prior to a 30 min incubation on ice and a 10 min centrifugation at 10,000 rpm. DNA was extracted from this supernatant with an equal volume of PCI and cleaned with a two-step ethanol precipitation. All mixing was done by gentle inversion and pellet resuspension was completed overnight in TE buffer without agitation [26]. DNA was quantified by Quantus fluorometer (Promega, UK) and checked by gel electrophoresis. Unless otherwise stated all chemicals were provided by Sigma-Aldrich (Gillingham, UK). The latter extraction method was also used to extract DNA from hepatopancreas tissue samples of C. maenas.

DNA Library Construction and Sequencing
DNA sequencing libraries were prepared for Illumina sequencing using the Nextera XT library preparation kit (Illumina, San Diego, CA, USA) and sequenced on an Illumina MiSeq using V3 chemistry (Illumina; 2 × 150 bp for C. crangon (5 individual samples pooled) and 2 × 300 bp for C. maenas (single sample)). A Nanopore sequencing library was constructed using the ligation sequencing kit SQK-LSK109 and the native barcoding kit (EXP-NBD103) for 5 C. crangon samples (Oxford Nanopore Technologies Ltd., Oxford, UK). The barcoded DNA samples were pooled in equimolar concentrations and the prepared library loaded onto a SpotON flowcell R9.4.1 (FLO-MIN106) and sequenced for 11 h on a MinION device (Oxford Nanopore Technologies). Data were base-called and demultiplexed locally on a laptop using Albacore 2.3.1 (base-calling software released by Oxford Nanopore Technologies).

Sequence Assembly
Illumina reads obtained for the C. maenas and C. crangon samples were quality-trimmed using Fastp v0.20.0 ([27]; default parameters) and normalised using BBnorm, which is part of the BBMap suite ([28]; default parameters). The quality-trimmed normalised Illumina reads of the C. maenas sample were assembled de novo using Unicycler v0.4.8 ([29]; using the --no_correct parameter). For the C. crangon sample, all Nanopore reads were combined into a single fastq file, followed by demultiplexing and adapter and barcode sequence removal using Porechop v0.2.3 (default parameters) [30]. A hybrid de novo genome sequence assembly was performed using a combination of the quality-trimmed normalised Illumina and Nanopore reads with Unicycler v0.4.8 (using the --no_correct parameter). Resulting assembled contig sequences were submitted to similarity searches using blastn v2.9.0+ [31] and the NCBI nucleotide database (accessed on 5 July 2020) to identify potential viral sequences.
Sequence statistics and general manipulation of fasta/fastq files were performed using SeqKit v0.11.0 [32]. All reads for each sample were mapped to the corresponding assembled CmNV or CcNV genome sequences using Minimap2 2.17-r941 [33]. SAMtools v1.9 [34] was used to convert the resulting SAM files into BAM format and to sort and index these prior to statistical analysis and visualisation of the mapping results using QualiMap v2.2.2 [35]. The genome sequences were screened for tandem repeats using the Tandem Repeats Finder tool v4.09 ([36]; default settings and alignment score > 100).

Gene Prediction and Annotation
Open reading frames were predicted using a selection of gene prediction tools, including Prokka v1.14.0 ([37]; default settings and --kingdom Viruses), fgenesv0 (http://www.softberry.com, accessed on 3 September 2020; standard code and circular sequence), GenemarkS ([38]; using both the Intronless eukaryotic and Virus sequence types and genetic code 11) and Vgas ([39]; using ATG as start codon type). Putative protein sequences were further analysed when they were predicted by two or more tools. The protein sequences were annotated using blastp v2.9.0+ sequence similarity searches ([40]; E-value cut off 0.001) against a reference database of nudivirus protein sequences including PmNV (KJ184318.
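The rule that a putative protein was retained only when predicted by two or more tools can be sketched as a simple vote. This is a minimal illustration, not the pipeline used in this study: it assumes that two predictions count as the same ORF only when start, end and strand match exactly, and the function name `consensus_orfs`, the tool labels and all coordinates are hypothetical.

```python
from collections import defaultdict


def consensus_orfs(predictions, min_tools=2):
    """Return ORFs reported by at least `min_tools` different tools.

    `predictions` maps a tool name to an iterable of (start, end, strand)
    tuples. Assumption: identical coordinates and strand define the same ORF.
    """
    support = defaultdict(set)
    for tool, orfs in predictions.items():
        for orf in orfs:
            support[orf].add(tool)
    return sorted(orf for orf, tools in support.items() if len(tools) >= min_tools)


# Hypothetical coordinates, for illustration only.
predictions = {
    "prokka": [(100, 400, "+"), (900, 1500, "-")],
    "fgenesv0": [(100, 400, "+"), (2000, 2300, "+")],
    "genemarks": [(900, 1500, "-")],
    "vgas": [(100, 400, "+"), (3000, 3300, "-")],
}
kept = consensus_orfs(predictions)
print(kept)
```

In practice, tools often disagree on the start codon of an otherwise identical ORF, so a real comparison might match on stop position and strand instead; that choice is not specified in the text.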

Gene Orthology and Nudivirus Core Gene Analysis
Comparative proteome analyses across all nudiviruses, including the two newly sequenced viruses, were performed by blastp v2.9.0+ sequence similarity searches (E-value < 0.001) using the protein sequences of each virus and the nudivirus protein reference database described above. OrthoFinder v2.3.11 ([41]; parameters: -A muscle -M msa -T raxml) was used to identify orthologous groups of proteins (orthogroups) in the proteomes of all of the nudiviruses and AcMNPV (NC_001623.1) as outgroup. Reciprocal BLAST searches were conducted using the protein sequences of all nudiviruses (blastp v2.9.0+; E-value cut-off 0.001; [40]) and the results were used together with the orthologous genes identified by OrthoFinder to identify core nudivirus genes. Genome maps showing the position and orientation of nudivirus core genes were created using the R packages gggenes (available at https://github.com/wilkox/gggenes; accessed on 13 February 2021) and ggplot2 [42] in RStudio v1.2.1335 [43]. To allow for better comparisons, the genome sequences were rearranged, such that all linear representations of the viral genomes started with the DNA polymerase gene.
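Rearranging a circular genome so that its linear representation starts at a chosen gene amounts to rotating the sequence and shifting all feature coordinates by the same offset. The sketch below illustrates the idea on a toy sequence; the function names are hypothetical, positions are assumed to be 1-based, and the case where the anchor gene lies on the reverse strand (which would also require reverse-complementing) is not handled.

```python
def rotate_circular(sequence, new_start):
    """Rotate a circular sequence so that 1-based position `new_start` becomes position 1."""
    if not 1 <= new_start <= len(sequence):
        raise ValueError("new_start must lie within the sequence")
    offset = new_start - 1
    return sequence[offset:] + sequence[:offset]


def shift_position(position, new_start, genome_length):
    """Map a 1-based position on the original sequence onto the rotated sequence."""
    return (position - new_start) % genome_length + 1


genome = "AACCGGTTAC"  # toy 10-nt circular genome, not real data
rotated = rotate_circular(genome, 5)
print(rotated)
print(shift_position(5, 5, len(genome)))  # the new origin
print(shift_position(4, 5, len(genome)))  # the base just before the new origin
```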

Histological and Ultrastructural Observations
A total of 50 C. crangon were sampled, 90% of which were shown to be positive for infection with a bacilliform virus via histological analysis (Figure 1). Infected nuclei within the hepatopancreatic epithelial cells were hypertrophied with marginalised chromatin and contained eosinophilic inclusion bodies (Figure 1A). By applying a previously established severity grading scheme [23] for the pathology level of the infections, we revealed that 10% of shrimp were Grade 0 (uninfected), 20% Grade 1, 20% Grade 2, 30% Grade 3 and 20% Grade 4 infected. TEM showed enveloped virions with rod-shaped nucleocapsids and with a characteristic expansion of the envelope at one end (Figure 1B), as previously described for the bacilliform virus [21].
As described by Bojko et al. [24], a bacilliform virus was also identified in 17.4% of C. maenas crabs sampled from Canada (n = 432). Pathology was similar to that described by Stentiford and Feist [23], with hepatopancreatic epithelial cells showing enlarged nuclei with marginalised chromatin and eosinophilic inclusion bodies (Figure 1C). Using the severity grading scheme [23], the majority of infections were shown to be Grade 1 (76%), with 20% Grade 2, 2% Grade 3 and 2% Grade 4. TEM revealed rod-shaped nucleocapsids within an envelope and with an extension at one end; however, unlike the virus in C. crangon, some of these nucleocapsids appeared slightly bent and u-shaped within the envelopes (Figure 1D). The crab-derived virions were also slightly larger, measuring 340 nm by 75 nm (n = 30), compared to the CcNV virions, which measured 280 nm by 71.8 nm [21].

De Novo Genome Assembly of Two Novel Nudiviruses
Illumina sequencing resulted in the generation of 843,215 and 2,601,878 read pairs for the Crangon crangon and Carcinus maenas samples, respectively. The number of read pairs remaining for the shrimp and crab samples after quality-trimming were 837,743 (99.4%) and 1,720,539 (66.1%) read pairs, respectively, and after normalisation, 175,562 and 250,312 read pairs, respectively. A total of 613,455 Nanopore reads were obtained with lengths ranging from 62 nt to 46,624 nt and a mean length of 748 nt. After removal of barcode sequences and reads <1000 nt in size, 66,775 reads remained with a mean length of 2184 nt.
Assembly of the Crangon crangon quality-trimmed and normalised Illumina sequences alone resulted in the generation of four contigs, of which the largest two contigs (89,711 nt and 41,818 nt in size) were shown to represent novel nudivirus genome sequences by blastn similarity searches. The third contig (11,337 nt) represented Crangon crangon 28S and 18S ribosomal DNA sequences, whereas the fourth contig (291 nt) did not have any hits with any of the sequences in the NCBI database. A hybrid assembly using both the Illumina read sequences and the Nanopore sequences produced two contigs, the largest of which represented a circular putative CcNV genome sequence of 132,068 nt in length, with a read coverage of 338× and a GC content of 29.5% (Figure 2A). The second contig (11,337 nt) was identical to the third contig generated by the Illumina-only assembly and represented Crangon crangon 28S/18S ribosomal DNA sequences. De novo assembly of the C. maenas sample using the normalised quality-trimmed Illumina reads resulted in 27 assembled contig sequences, the largest of which was a circular sequence of 113,840 nt in length, with a read coverage of 995× and a GC content of 38.8% (Figure 2B). Using blastn similarity searches, this sequence also represented a novel putative nudivirus genome sequence; the other sequences showed high similarity to 18S ribosomal RNA, mitochondrial and microsatellite sequences of the host.
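The retention percentages quoted above follow directly from the read-pair counts, and the Nanopore length filter is a simple threshold. The sketch below reproduces the arithmetic using the counts reported in the text; the function names are hypothetical and the list of read lengths is a toy example, not the actual data.

```python
def percent_retained(before, after):
    """Percentage of read pairs remaining after a filtering step, to one decimal."""
    return round(100 * after / before, 1)


def filter_by_length(lengths, min_length=1000):
    """Keep reads of at least `min_length` nt (reads <1000 nt are discarded)."""
    return [n for n in lengths if n >= min_length]


# Read-pair counts as reported in the text.
raw = {"C. crangon": 843_215, "C. maenas": 2_601_878}
trimmed = {"C. crangon": 837_743, "C. maenas": 1_720_539}
retained = {sample: percent_retained(raw[sample], trimmed[sample]) for sample in raw}
print(retained)

# Toy read lengths, for illustration only.
toy_lengths = [62, 748, 1000, 2184, 46_624]
kept_lengths = filter_by_length(toy_lengths)
print(kept_lengths)
```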

Open Reading Frame (ORF) Prediction
Annotation of the genome sequences identified 106 (ranging from 51 to 1488 nt) and 99 (ranging from 41 to 1828 nt in length) predicted protein sequences for CcNV and CmNV, respectively (based on a consensus prediction of open reading frames (ORFs) by four different software tools) as shown in Tables 1 and 2, and Tables S1 and S2 in the Supplemental File S1. The gene densities for CcNV and CmNV were 1.3 and 1.2 genes per kb, respectively.

Tandem Repeats
A total of 36 tandem repeats covering 19 separate regions were identified in the CcNV genome, ten of which overlapped predicted ORFs (Table S3). The lengths of these regions ranged from 58 to 367 nt, covering 3057 nt or 2.3% of the genome. In the CmNV genome, 39 tandem repeats were found that covered 22 genome regions. These regions ranged from 56 to 552 nt in length, with a total length of 3838 nt (3.4% of the genome) and overlapped with eight predicted ORFs (Table S4).
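Collapsing individual tandem repeats into distinct genome regions and expressing their combined length as a fraction of the genome can be sketched as an interval merge. The example is illustrative only: the repeat coordinates are hypothetical, the decision to merge directly adjacent intervals is an assumption, and only the final percentages use figures reported in the text.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals):
    """Total number of positions covered by the merged intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


def percent_of_genome(covered_nt, genome_length):
    """Covered length as a percentage of the genome, to one decimal."""
    return round(100 * covered_nt / genome_length, 1)


# Hypothetical repeat coordinates: four repeats collapsing into two regions.
repeats = [(100, 180), (150, 220), (500, 560), (561, 600)]
regions = merge_intervals(repeats)
print(regions, covered_length(repeats))

# Covered lengths and genome sizes as reported in the text.
ccnv_percent = percent_of_genome(3057, 132_068)
cmnv_percent = percent_of_genome(3838, 113_840)
print(ccnv_percent, cmnv_percent)
```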

Protein Orthology and Gene Content
OrthoFinder assigned 1265 of the 1786 nudivirus and Autographa californica multiple nucleopolyhedrovirus genes (71.0% of total) to 237 orthogroups. In total, 50% of all genes were in orthogroups with five or more genes (G50 = 5) and were contained in the largest 103 orthogroups (O50 = 103). There were seven orthogroups with all species present and six of these consisted entirely of single-copy genes. The orthogroups and the single copy orthologues are listed in Table S5 in the Supplemental Material, along with gene descriptions derived from the corresponding NCBI nucleotide records.
Table 1. Identified open reading frames (ORFs) in the CcNV genome. The start and end positions of the ORFs are shown, as well as the strand on which each was found, the deduced protein length (in amino acids), the best hits obtained using blastp similarity searches using the NCBI nr protein database (E-value < 0.001), and additional information obtained from annotated orthogroups (Table S5). Nudivirus core genes are highlighted in grey.
Table 2. Identified open reading frames (ORFs) in the CmNV genome. The start and end positions of the ORFs are shown, as well as the strand on which each was found, the deduced protein length (in amino acids), the best hits obtained using blastp similarity searches using the NCBI nr protein database (E-value < 0.001), and additional information obtained from annotated orthogroups (Table S5). Nudivirus core genes are highlighted in grey.
We were not able to find an ortholog of the baculovirus major capsid protein VP39 in CcNV and CmNV, but we did find a homolog of the major capsid protein identified through proteomic analysis in ToNV (ToNV ORF87; CcNV ORF14 and CmNV ORF15) [8]. This protein was found in four different orthogroups: OG0000071 (which includes DiNV ORF99, ENV ORF57, GbNV ORF64, KNV ORF69, MNV ORF63, OrNV ORF15 and TNV ORF18), OG0000729 (ToNV ORF87), OG0000204 (HzNV-1 ORF89 and HzNV-2 ORF52) and OG0000098 (CcNV ORF14, CmNV ORF15, DhNV ORF14, HgNV ORF15 and PmNV ORF22) (Table S5). Reciprocal BLAST searches confirmed that the VP39 protein sequences of the nudiviruses in OG0000071 were similar, but different from the proteins in the other orthogroups. The protein sequences across OG0000729, OG0000204 and OG0000098 were also found to be similar using BLAST searches.
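The orthogroup summary statistics reported in this section (G50, O50, and the number of orthogroups containing all species or only single-copy genes) can be sketched as follows. This is an illustration of how such statistics are commonly defined, not OrthoFinder's own code: the function names are hypothetical, the gene total is taken as the sum of the sizes supplied, and the toy inputs are not the nudivirus data (the individual sizes of the 237 orthogroups are not given in the text).

```python
def g50_o50(orthogroup_sizes):
    """Return (G50, O50) for a list of orthogroup sizes.

    Orthogroups are visited from largest to smallest. G50 is the size of the
    orthogroup at which the running gene total first reaches half of all
    genes; O50 is the number of orthogroups needed to get there.
    """
    sizes = sorted(orthogroup_sizes, reverse=True)
    half = sum(sizes) / 2
    running = 0
    for count, size in enumerate(sizes, start=1):
        running += size
        if running >= half:
            return size, count
    raise ValueError("no orthogroups supplied")


def shared_orthogroups(counts, species):
    """Split orthogroups into those present in all species and the single-copy subset.

    `counts` maps an orthogroup ID to a dict of species -> gene count.
    """
    shared = [og for og, c in counts.items() if all(c.get(s, 0) >= 1 for s in species)]
    single_copy = [og for og in shared if all(counts[og][s] == 1 for s in species)]
    return shared, single_copy


# Toy data, for illustration only.
toy_sizes = [10, 8, 6, 4, 2, 1, 1]
print(g50_o50(toy_sizes))

toy_species = ["A", "B", "C"]
toy_counts = {
    "OG1": {"A": 1, "B": 1, "C": 1},
    "OG2": {"A": 2, "B": 1, "C": 1},
    "OG3": {"A": 1, "B": 1},
}
shared, single_copy = shared_orthogroups(toy_counts, toy_species)
print(shared, single_copy)
```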

Using orthology analysis and reciprocal BLAST searches we could not identify p6.9 in the CcNV and CmNV proteomes. It has been shown that the p6.9 homolog of OrNV and related alphanudiviruses from drosophilid hosts is a fusion of the homologs of GbNV p6.9 (ORF73) and GbNV ORF72 [4]. In other nudiviruses sequenced to date, this gene has been identified as an independent ORF; however, for some nudiviruses including DhNV, HzNV-1, HzNV-2 and PmNV this gene was not annotated in NCBI. The most likely explanation for this could be the repetitive serine (S) and arginine (R) sequences in the deduced protein sequences, which may not be identified by gene prediction tools as being part of potential protein sequences. Bezier et al. [8] identified separate ORFs for p6.9 in HzNV-1 (ORF142; position 210,245-210,493), HzNV-2 (position 24,375-24,127) and PmNV (position 64,881-65,078) and in the current study, this gene was identified in CcNV (72,007-72,231) and CmNV (position 45,460-45,651) using custom BLAST searches. However, a p6.9 homolog could not be found in the DhNV genome.
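The p6.9 coordinates listed above can be checked for internal consistency by deriving the ORF length and orientation from each coordinate pair. The sketch below assumes 1-based inclusive positions and assumes that a start position greater than the end position denotes an ORF on the reverse strand; neither convention is stated explicitly in the text, and the function name is hypothetical.

```python
def describe_orf(start, end):
    """Return (length_nt, strand, whole_codons) for 1-based inclusive coordinates.

    Assumption: start > end is read as an ORF on the reverse strand.
    """
    length = abs(end - start) + 1
    strand = "+" if start <= end else "-"
    return length, strand, length % 3 == 0


# p6.9 positions as reported in the text.
p69_positions = {
    "HzNV-1": (210_245, 210_493),
    "HzNV-2": (24_375, 24_127),
    "PmNV": (64_881, 65_078),
    "CcNV": (72_007, 72_231),
    "CmNV": (45_460, 45_651),
}
summary = {virus: describe_orf(*coords) for virus, coords in p69_positions.items()}
for virus, (length, strand, whole) in summary.items():
    print(virus, length, strand, whole)
```

Under these assumptions every listed interval spans a whole number of codons, which is what would be expected of complete ORFs.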
Another protein previously considered as a core gene, LEF-5, was found in the proteomes of all nudiviruses, apart from ENV, KNV and MNV. Using the vgas gene prediction tool [39] that was used for CcNV and CmNV in this study, we identified this gene in the genomes of all three nudiviruses: ENV (position 74,295-74,462), KNV (position 120,410-120,246) and MNV (position 57,038-57,274). Furthermore, GbNV_51-like was present in the proteomes of all nudiviruses apart from HzNV-1. However, in a previous study, this gene was found to be located between ORF57 and ORF 58 (position 79,463 to 79,900) in the HzNV-1 genome [8]. Previous work has also shown that the genes encoding for P47 and LEF-9 homologs are fused in the HzNV-1 and HzNV-2 genomes [8,18].
Also remarkable was the absence of the three thymidine kinase genes in CcNV, while in CmNV and in all other sequenced nudivirus genomes, three genes for thymidine synthesis have been identified (tk1-3). Compared to most other nudiviruses found in non-aquatic host species, CcNV has additional copies of the gene encoding orthologues of the baculovirus ODV-E66 protein (a total of five copies). Orthology analysis indicated that CcNV ORF29 originated from an ancestral gene shared by all other nudiviruses that contain odv-e66 in their genome and that it was duplicated three times, resulting in ORFs 30, 31 and 33 (Figure 3B). An additional ODV-E66 homolog (CcNV ORF23) appears to have been acquired by CcNV independently. All ODV-E66 genes are in close vicinity to each other in the genome and the similarity between these copies ranges between 42 and 85 percent at the nucleotide level. Similarly, gene duplication events resulted in multiple copies of odv-e66 in the genomes of DhNV (ORF18, ORF21 and ORF23), PmNV (ORF34 and ORF36) and HgNV (ORF24 and ORF25). Interestingly, only a single copy of this gene was found in CmNV (ORF56; Figure 3B). The baculovirus ODV-E66 protein plays an important role in oral infectivity and may help the virus to overcome the peritrophic membrane lining the midgut [50]. Whether increased levels of ODV-E66 would give these nudiviruses a particular benefit in their crustacean hosts or their environment awaits further analysis.
To further analyse the evolutionary relationship between the various nudiviruses, we conducted phylogenetic analysis as described above, but this time based on a supermatrix of the deduced amino acid sequences of all genes shared by nudiviruses, i.e., the nudivirus core genes (5244 amino acids; the p6.9 protein sequence was excluded from these analyses as no sequence has (yet) been obtained for DhNV) (Figure 3B). This analysis further emphasized the demarcation of two major lineages within the nudiviruses as a whole.
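A supermatrix is built by concatenating the per-gene alignments end to end, so that each virus is represented by one long sequence. The sketch below shows the concatenation step only, under stated assumptions: the function name and the toy alignments are hypothetical, and a taxon missing from a gene is padded with gap characters here, whereas in the analysis described above the one gene lacking a sequence (p6.9) was excluded altogether.

```python
def build_supermatrix(alignments, taxa, gap="-"):
    """Concatenate per-gene alignments into one sequence per taxon.

    `alignments` is a list of dicts (taxon -> aligned sequence), one per gene.
    A taxon absent from a gene is padded with `gap` characters.
    """
    matrix = {taxon: [] for taxon in taxa}
    for gene in alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            matrix[taxon].append(gene.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}


# Toy alignments, for illustration only.
taxa = ["CcNV", "CmNV", "DhNV"]
alignments = [
    {"CcNV": "MKT-L", "CmNV": "MRTAL", "DhNV": "MKSAL"},
    {"CcNV": "GAV", "CmNV": "GSV"},  # gene missing in DhNV
]
supermatrix = build_supermatrix(alignments, taxa)
for taxon, sequence in supermatrix.items():
    print(taxon, sequence)
```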

Discussion
We used genomic data to characterise two previously described viruses infecting crustaceans, Crangon crangon bacilliform virus and Carcinus maenas bacilliform virus. Histological and ultrastructural details highlight morphological similarities between these viruses and viruses described from the hepatopancreas of other crustacean species (Penaeus monodon, Homarus gammarus and Dikerogammarus haemobaphes). Infected nuclei within the hepatopancreatic epithelial cells appeared hypertrophied with marginalised chromatin and contained eosinophilic inclusion bodies. Although appearing similar histologically, electron microscopy identified differences in the appearance of the virions between the two infections. Virions in C. crangon appeared smaller in size with straight nucleocapsids within the envelope, whereas virions in C. maenas were larger and possessed a curved, slightly bent nucleocapsid within the envelope. Severity of infection also appeared to vary, with a larger proportion of C. crangon samples showing Grade 3 and 4 infections when compared to C. maenas.
There are currently no cell lines available for crustacean tissues [51], meaning that many classical viral classification techniques are not available to characterise crustacean viruses. The ability to generate full-length DNA genomes (as opposed to a set of separate contig sequences of novel pathogens), however, significantly improves the classification and characterisation of crustacean DNA viruses. We have developed protocols to enable sequencing of large, non-culturable viral DNA genomes, involving viral DNA purification steps, sequencing on an Illumina MiSeq, and subsequent analysis of sequence data. One of the intrinsic limitations of Illumina sequencing is that it is not capable of sequencing (large) repeat regions and regions of low complexity due to read length limitations, which in our case also prevented the assembly of full-length genome sequences. The additional use of MinION technology provided long read sections of the genomes, allowing us to determine the order of the available contigs and to obtain complete, circular genomes. We used the complete genomes to study their evolutionary relationship and their position within described viral families.
The genome size, number of ORFs and presence of conserved genes in the two novel virus sequences show that the two crustacean dsDNA viruses belong to the family Nudiviridae. Utilizing the new latinized binominal method for the naming of virus species, we propose they are named Gammanudivirus cracrangoni and Gammanudivirus camaenasi, with the common names Crangon crangon nudivirus (CcNV) and Carcinus maenas nudivirus (CmNV), respectively. Their position in phylogenetic trees (Figure 3) indicated that these viruses each belong to separate clades and are genetically distinct from other viruses within these clades, and thus should be classified as distinct species.
When comparing the gene content, it became clear that CcNV lacked the three thymidine kinase genes (tk1-3), found thus far in all other nudiviruses. Instead of encoding its own enzymes to synthesize thymidine monophosphate, a crucial molecule in DNA synthesis and viral DNA replication, CcNV apparently relies on the host's machinery for this process. A similar situation is seen in baculoviruses that also do not have any tk genes. Whether the absence of these genes affects the fitness of this virus is not clear, but from the histological images presented here ( Figure 1) and reported in other studies [21], it appears that CcNV replicates to high levels, despite the absence of viral tk genes.
Homologs for p6.9 were found in all nudiviruses, with the exception for DhNV. This gene encodes a small arginine-and serine-rich protein and plays an essential role in various viral physiological processes during infection [52,53]. It is conserved across all baculoviruses and nudiviruses and it is highly likely that this gene is also present in DhNV and that it has yet to be discovered.
The presence or absence of an obvious ortholog of the baculovirus major capsid protein VP39 (or evolutionary related protein) has not been identified with certainty for all nudiviruses. Alphanudiviruses and Betanudiviruses all have such an ortholog. In contrast, viruses in the genera Deltanudivirus and Gammanudivirus appeared to have a functional homolog in the form of a capsid protein of approximately 34 kDa, as first identified in ToNV (ORF087) through proteomic analysis [8]. Based on our analyses, we consider CcNV ORF14, CmNV ORF15, DhNV ORF14, HgNV ORF15, HzNV-1 ORF89, HzNV-2 ORF52, PmNV ORF22 and ToNV ORF87 to be vp39 homologs. We have therefore found VP39 homologs in all described nudiviruses and have included this gene within the core gene list, bringing the total number of nudivirus core genes to 28, as presented here. Interestingly, we show that nudiviruses described from aquatic hosts (DhNV, CcNV, PmNV, HgNV and CmNV) possess a highly similar gene order and orientation within the genome when compared with nudiviruses from terrestrial hosts. Viral genomes described from hosts in marine environments (CcNV, PmNV, HgNV and CmNV) showing identical gene order and orientation within the genomes (Figure 4). The genome of DhNV, described in a freshwater host, displayed variation in the orientation of the pif-0/p74 gene and in the gene order and orientation of the pif-6, ac81 and helicase 2 genes when compared with viral genomes from hosts in the marine environment. As highlighted previously, we expect further nudivirus infections affecting aquatic crustaceans to be identified from both marine and freshwater systems. Further comparisons between viral genomes from the two environments, in particular gene order and orientation, will be possible as these are described.
Within the family Nudiviridae, the ICTV has recently approved a proposal for four parallel genera [4]. A subdivision of the family into five parallel genera (Alphanudivirus to Epsilonnudivirus) has also been proposed [8,12,19]. However, after studying core gene content and comparing the data across all described nudivirus species, we believe that this subdivision into five parallel genera is no longer supported by the data. After the addition of the two aquatic nudiviruses described in this paper (CmNV and CcNV; Figure 3), we observe a clear demarcation of two major clades in the nudivirus phylogeny. The first clade contains nudiviruses found in Drosophila (Diptera: suborder Brachycera), Coleoptera and Orthoptera, while the second clade harbours nudiviruses isolated from crustaceans, Heliothis zea moths (Lepidoptera) and the marsh crane fly Tipula oleracea (Diptera: suborder Nematocera). Within these two clades, subclades begin to appear that may form the basis for distinct genera. We therefore propose to create two subfamilies within the family Nudiviridae and to subdivide these further into genera. Recently, Liu et al. [9] came to a similar conclusion after analysing nudiviruses from two corn rootworm species (order Coleoptera) and suggested names for the respective taxa. The subfamilies would be called Alphanudivirinae and Betanudivirinae, and the new nudiviruses CcNV and CmNV would group together with the other crustacean nudiviruses. However, after adding the CcNV and CmNV sequences, the phylogeny no longer shows a clear demarcation between the species placed by Liu et al. [9] in a proposed genus Helnudivirus (HzNV-1, HzNV-2 and ToNV) and the crustacean-infecting nudiviruses, for which the genus name Malnudivirus was proposed (derived from the class name Malacostraca). Based on our data, a further subdivision of the subfamily Betanudivirinae is therefore not currently warranted.

Conclusions
In conclusion, we have shown, using histological, ultrastructural and genomic data, that the bacilliform viruses infecting the two aquatic crustacean species, C. crangon and C. maenas, should be classified in separate genera of the family Nudiviridae under the species Gammanudivirus cracrangoni (Crangon crangon nudivirus) and Gammanudivirus camaenasi (Carcinus maenas nudivirus). The compiled genomic and phylogenetic information presented here provides further support for an alternative structure of the nudivirus family, as also recently proposed by Liu et al. [9], in which the family Nudiviridae will harbour two subfamilies, Alphanudivirinae and Betanudivirinae, each of which is subdivided into several genera.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/v13091694/s1, Table S1. Full annotation table for CcNV, detailing the position of predicted gene sequences and their annotations based on blastp similarity searches (best hit; p < 0.001; NCBI nr protein database) and InterProScan results. Table S2. Full annotation table for CmNV, detailing the position of predicted gene sequences and their annotations based on blastp similarity searches (best