Discovery of Two Novel Negeviruses in a Dungfly Collected from the Arctic

Negeviruses are a proposed group of insect-specific viruses that can be separated into two distinct phylogenetic clades, Nelorpivirus and Sandewavirus. Negeviruses are well-known for their wide geographic distribution and broad host range among hematophagous insects. In this study, the full genomes of two novel negeviruses, one from each of these clades, were identified by RNA extraction and sequencing from a single dungfly (Scathophaga furcata) collected from the Arctic Yellow River Station; these are the first negeviruses to be discovered in a cold-zone region. Nelorpivirus dungfly1 (NVD1) and Sandewavirus dungfly1 (SVD1) have the typical negevirus genome organization, and viral transcripts showed very high coverage. Small interfering RNAs derived from both viruses were readily detected in S. furcata, clearly showing that negeviruses are targeted by the host antiviral RNA interference (RNAi) pathway. These results and subsequent in silico analysis of public databases and published virome data showed that the hosts of nege-like viruses include insects belonging to many orders as well as various non-insects, in addition to the hematophagous insects previously reported. Phylogenetic analysis reveals at least three further groups of negeviruses, as well as several poorly resolved solitary branches, filling in the gaps within the two sub-groups of negeviruses and plant-associated viruses in the Kitaviridae. The results of this study will contribute to a better understanding of the geographic distribution, host range, evolution and host antiviral immune responses of negeviruses.


Introduction
Insect-specific viruses (ISVs) are viruses that are confined exclusively to insects and are unable to replicate in vertebrates or vertebrate cells [1]. ISVs were largely overlooked for a long time because they do not cause disease in vertebrate hosts and usually have no economic impact on animals or plants. The recent rise of next generation sequencing and metagenomics has led to the discovery of a growing number of novel ISVs [2]. These have mostly been discovered in hematophagous insects, especially mosquitoes, as a result of research into the risks that mosquito-borne viruses pose to the health of humans and domesticated animals [3]. Interestingly, the majority of ISVs are phylogenetically related to the classical arthropod-borne viruses (arboviruses) transmitted by mosquitoes. It has therefore been hypothesized that ISVs might be the ancestors of arboviruses and can act as natural regulators of the infection, replication and transmission of arboviruses [3,4]. Most ISVs [...]
[...] to RNA keeper tissue stabilizer (Vazyme, Nanjing, China) at low temperature (4 °C) and sent to our laboratory for RNA extraction. Total RNAs were extracted using TRIzol reagent (Invitrogen, Waltham, MA, USA) following the manufacturer's instructions and subdivided to provide samples for transcriptome (approximately 2 µg), small RNA (sRNA) (approximately 5 µg) and virus genome Sanger sequencing (approximately 5 µg).

Transcriptome and sRNA Sequencing
For transcriptome sequencing, ribosomal RNA (rRNA) was first removed from the total RNA using Ribo-Zero Gold rRNA Removal Kit (Illumina, San Diego, CA, USA) before preparing the sequencing library. Paired-end (150 bp) sequencing of the RNA library was performed on the Illumina HiSeq 4000 platform (Illumina, San Diego, CA, USA) by Novogene (Tianjin, China). The transcriptome reads were quality trimmed and assembled de novo using the Trinity software (Version 2.8.5) with default parameters [21].
The cDNAs of the sRNA library were prepared using the Illumina TruSeq Small RNA Sample Preparation Kit (Illumina, San Diego, CA, USA). sRNA sequencing was performed on an Illumina HiSeq 2500 by Novogene (Tianjin, China). Preliminary treatment of sRNA raw data (removal of adapter, low quality, and junk sequence) was carried out as described previously [22].

Host Insect Identification
To accurately identify the dungfly species, the assembled contigs from the transcriptome were compared using Blastn with all the available cytochrome oxidase subunit 1 (COI) barcode records from the Barcode of Life Data (BOLD) Systems (http://www.boldsystems.org/) and the National Center for Biotechnology Information (NCBI) nucleotide (nt) database. The identified COI sequence of the dungfly was further confirmed by Sanger sequencing and submitted to GenBank with the accession number MT072894.

Virus Discovery and Confirmation by Reverse Transcription-PCR (RT-PCR)
To identify nege-like viral contigs, the assembled transcriptome contigs were compared to a nucleotide/protein database comprising representative negeviruses (Supplementary Table S1) downloaded from GenBank using BLAST+ (Version 2.9.0) and DIAMOND (Version 0.9.28.129). The e-value threshold for the comparisons was set at 2 × 10⁻¹⁰. The candidate nege-like virus contigs were then extracted using a custom Perl script based on the significance of the e-value and the matched length of the contig. To eliminate false positives, the candidate nege-like virus contigs were further compared with the entire NCBI nucleotide (NT) and non-redundant (NR) protein databases. RT-PCR followed by Sanger sequencing was then performed to confirm the presence of the two full nege-like virus contigs using the method described previously [22]. The primers used for RT-PCR are listed in Supplementary Table S2.
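The filtering step described above (keeping contigs by e-value and matched length) can be illustrated with a minimal Python sketch. It is not the authors' Perl script: it assumes the search results were written in the standard 12-column tabular format (`-outfmt 6`), and the function name `filter_hits`, the `min_length` cutoff of 100 and the example rows are hypothetical, since the source does not state a length threshold.

```python
import csv
import io

# Column order of the standard 12-column BLAST+/DIAMOND tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=2e-10, min_length=100):
    """Return {contig_id: best_hit} for hits that pass both thresholds.

    max_evalue follows the threshold given in the text; min_length is an
    assumed, illustrative cutoff on the aligned length.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue or int(hit["length"]) < min_length:
            continue
        previous = best.get(hit["qseqid"])
        # Keep only the most significant hit per contig.
        if previous is None or evalue < float(previous["evalue"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical example rows (not data from the study).
example = (
    "contig_1\tvirusA\t69.0\t2300\t700\t5\t1\t6900\t1\t2300\t0.0\t3100\n"
    "contig_2\tvirusB\t35.0\t60\t39\t0\t1\t180\t10\t69\t1e-12\t70.1\n"
    "contig_3\tvirusA\t30.0\t150\t105\t2\t1\t450\t5\t154\t1e-5\t48.0\n"
)
candidates = filter_hits(io.StringIO(example))
print(sorted(candidates))
```

In this toy input only `contig_1` survives: `contig_2` is too short and `contig_3` exceeds the e-value threshold. Candidates retained this way would still need the second comparison against the full NT and NR databases described above.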

Determination of Viral Genome Termini and Transcript Abundance
To obtain the full length of the two identified negeviruses in the insect sample, the extreme 5′ and 3′ terminal sequences were determined by rapid amplification of cDNA ends (RACE) using the SMARTer® RACE 5′/3′ kit (Takara, Beijing, China). After total RNA isolation, first-strand cDNA synthesis was performed to obtain 5′-RACE-ready and 3′-RACE-ready cDNA according to the manufacturer's instructions. Touchdown PCR was performed to amplify RACE products using 5′ or 3′ gene-specific primers (GSPs) and the Universal Primer A Mix (UPM). The PCR products were then cloned into the pMD19-T vector (Takara, Beijing, China) and further verified by Sanger sequencing. The primers used for RACE are listed in Supplementary Table S2. Based on the Blast search results, the two identified negeviruses were named Nelorpivirus dungfly1 (NVD1) and Sandewavirus dungfly1 (SVD1) and the sequences were submitted to GenBank with the respective accession numbers MT344120 and MT344121.
To investigate the transcript abundance and coverage of the two identified negeviruses, the adaptor- and quality-trimmed reads of the transcriptome were mapped back to the whole genomes of NVD1 and SVD1 using Bowtie2 [23] and Samtools [24]. The coverage of the aligned reads on the virus genomes was further visualized using the Integrative Genomics Viewer [25].
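As an illustration of how a mean coverage value can be derived from such a mapping, the sketch below averages per-base depth for each reference. It assumes the alignments were first summarized as three tab-separated columns (reference name, position, depth), as produced by `samtools depth -a`; whether the authors computed coverage this way is not stated, and the function name and example values are hypothetical.

```python
from collections import defaultdict


def mean_coverage(depth_lines):
    """Compute mean per-base depth for each reference sequence.

    Each input line is expected to hold: reference name, 1-based position
    and depth, separated by tabs. If zero-depth positions are included in
    the input, the number of lines per reference equals its length.
    """
    total_depth = defaultdict(int)
    n_positions = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        ref, _position, depth = line.split("\t")
        total_depth[ref] += int(depth)
        n_positions[ref] += 1
    return {ref: total_depth[ref] / n_positions[ref] for ref in total_depth}


# Hypothetical toy input (not data from the study).
toy_depth = ["NVD1\t1\t10", "NVD1\t2\t20", "SVD1\t1\t5"]
print(mean_coverage(toy_depth))
```

Reporting the mean over every genome position, including positions with zero depth, keeps the value comparable between genomes of different lengths.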

Small RNA Analysis
To identify siRNAs derived from NVD1 and SVD1, clean sRNA reads 18 to 30 nt long were extracted and collapsed using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). The processed reads were mapped to the assembled full genomes of NVD1 and SVD1 using Bowtie software allowing for zero mismatches [27]. Downstream analysis of the mapped virus-derived siRNAs (vsiRNAs) was performed with custom Perl and Linux bash scripts, and included the size distribution of vsiRNAs, their distribution along the corresponding viral genome, and the 5′-terminal nucleotide preference of 21-nt vsiRNAs.
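The three downstream summaries listed above (size distribution, strand of origin and 5′-terminal nucleotide of 21-nt reads) can be sketched in Python as follows. This is an illustrative stand-in for the authors' custom scripts, which are not shown in the source. It assumes each mapped read is supplied as a pair of the read sequence as sequenced (5′ to 3′) and the strand it mapped to; the function name and the example reads are hypothetical.

```python
from collections import Counter


def summarize_vsirnas(reads, min_len=18, max_len=30, focus_len=21):
    """Summarize mapped small RNA reads.

    reads: iterable of (sequence, strand) pairs, where sequence is the read
    as sequenced (5' to 3') and strand is "+" (sense) or "-" (antisense).
    Returns three Counters: read length, strand, and first nucleotide of
    reads that are exactly focus_len nt long.
    """
    sizes, strands, first_nt = Counter(), Counter(), Counter()
    for seq, strand in reads:
        seq = seq.upper().replace("T", "U")  # report nucleotides as RNA
        length = len(seq)
        if not min_len <= length <= max_len:
            continue  # outside the size window used in the text
        sizes[length] += 1
        strands[strand] += 1
        if length == focus_len:
            first_nt[seq[0]] += 1
    return sizes, strands, first_nt


# Hypothetical toy reads (not data from the study).
toy_reads = [
    ("A" + "C" * 20, "+"),   # 21 nt, starts with A
    ("T" + "G" * 20, "-"),   # 21 nt, starts with U after conversion
    ("G" * 22, "+"),         # 22 nt
    ("A" * 15, "+"),         # too short, ignored
]
sizes, strands, first_nt = summarize_vsirnas(toy_reads)
print(dict(sizes), dict(strands), dict(first_nt))
```

One caveat when adapting this to real alignment files: many aligners report reverse-strand reads as their reverse complement, so the original read sequence must be recovered before the 5′ nucleotide is counted.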

Investigation of the Prevalence of Nege-Like Viruses in Invertebrates
The prevalence of possible nege-like viruses in other hosts was investigated using the public Expressed Sequence Tag (EST) and Transcriptome Shotgun Assembly (TSA) databases of NCBI. The putative protein sequences of representative known negeviruses (Supplementary Table S1), together with those of NVD1 and SVD1, were used as queries in tblastn searches against the EST and TSA databases. The potential novel nege-like viral contigs were then compared with the entire NCBI NT and NR databases to eliminate false positives.

Phylogenetic Analyses
Phylogenetic analysis used the amino acid sequences of the predicted RNA-dependent RNA polymerase (RdRp) region of the newly identified nege-like viruses from this study, together with some previously described negeviruses from various hosts and plant viruses of the related families Kitaviridae and Virgaviridae. Sequences were obtained from NCBI and aligned using Muscle (Version 3.8.31) [28]. A phylogenetic tree was then constructed in MEGA X using the maximum likelihood (ML) method and the Jones-Taylor-Thornton (JTT) substitution model, with 1000 bootstrap replications [29].

Negeviruses Identified in Dungfly
A total of 84,181 contigs were generated from de novo assembly of the clean RNA-seq reads (38,056,018). A Blast search among the COI sequences confirmed that the dungfly was Scathophaga furcata (Diptera: Scathophagidae). A BlastX search against the proteins of representative negeviruses suggested the presence of two potential new negeviruses. The nearly complete genomes of both were identified in the insect. One contig of 9212 nt was identified as a nelorpivirus (NVD1), and was most similar to the Loreto virus (LRV, YP_009351835.1) with a protein sequence identity of 69%. The second contig (8858 nt), representing SVD1, was most similar to the sandewavirus Andrena haemorrhoa nege-like virus (AHNLV, YP_009553581.1) with an identity of 55%. A Blastn search against the NCBI NT database did not find any other sequences closely related to either NVD1 or SVD1, indicating that the two viruses are probably new negeviruses. The full genome sequences of both viruses were then verified by RT-PCR followed by Sanger sequencing, and RACE was used to determine their 5′ and 3′ termini.

Genome Organization of NVD1 and SVD1
The full-length sequences of NVD1 and SVD1 were respectively 9239 and 8894 nt long excluding the polyA tail. The predicted genome organization of both viruses is typical of that reported for negeviruses, with three major ORFs (Figure 1). ORF1 has the four conserved domains of the replication polyprotein (vMet, FtsJ, Hel and RdRP). N-glycosylation sites were predicted in ORF2 at amino acid positions 105, 138, 170, and 175 for NVD1, and at positions 124, 161, 248, and 278 for SVD1. One (NVD1) or two (SVD1) transmembrane domains are present at the C-terminus of ORF2. While previously reported negeviruses and SVD1 have short intergenic regions between each of the ORFs, ORFs 2 and 3 of NVD1 unusually overlap by 8 nt and are in different frames (Figure 1A). Re-alignment of the RNA-seq reads to the reconstructed complete genomes of NVD1 and SVD1 shows a very high mean coverage (6858× for NVD1 and 8857× for SVD1), suggesting that the viruses replicate very efficiently in their host. Viral transcripts were very highly elevated in the 3′ region of the genome of both viruses (Figure 1).

NVD1 and SVD1 Are Targeted by the Host siRNA-Based Antiviral RNAi Pathway
siRNA-based RNA silencing is an important antiviral pathway in insects and is usually associated with the accumulation of vsiRNAs as viral RNA is degraded in a sequence-specific manner [30]. To better understand siRNA-based antiviral pathways in dungfly in response to negeviruses, we conducted a computational analysis of vsiRNAs in the sRNA library of S. furcata. A large number of siRNAs (18 to 30 nt) derived from the two negeviruses were identified. A total of 68,717 sRNA reads (16,995 unique) mapped perfectly to the assembled genome of NVD1, accounting for 0.25% (1.65% unique) of the whole sRNA library. The corresponding vsiRNA reads for SVD1 totaled 38,672 (11,053 unique), accounting for 0.14% (1.07% unique) of the library.
Most of these vsiRNAs were 21 nt long (69.5% and 70.1% of the totals for NVD1 and SVD1, respectively) and they were equally derived from the sense and antisense strands of the viral genomic RNA (Figure 2A,D), which is similar to a recent report [10]. The vsiRNAs were derived from the entire genome of both viruses including the untranslated regions, but there were notable asymmetric hotspots on both strands, suggesting that these regions might be preferential targets of the host immune system (Figure 2B,E). The vsiRNAs of both viruses had a strong A/U preference at their 5′-terminal nucleotide (Figure 2C,F), which is typical of vsiRNAs from various organisms, including insects [22,31]. These characteristics provide strong evidence that the antiviral RNAi pathway of dungfly is actively involved in the response to negevirus infection.

The Presence of Further Nege-Like Virus Sequences in Public Databases Suggests That They Occur in Many Different Insects
Negeviruses are well-known for their wide geographic distribution and broad host range, but most of the well-described ones have been isolated from hematophagous insects such as mosquitoes and sandflies [3,15]. The identification of NVD1 and SVD1 in a different type of insect and from a much colder environment prompted a search for other, potentially new, nege-like viral sequences within the current public databases. Seven potentially new negeviruses were identified in the TSA database, originating from firefly, flower thrips, sucking bugs, and various fruit fly species, suggesting a diversity and prevalence of insect hosts for nege-like viruses in nature (Table 1). In addition, reanalysis of previous virome studies confirmed that the host range of negeviruses extends beyond insects (Table 2). Within Insecta, and in addition to the Diptera, hosts of negeviruses included representatives of Hemiptera, Coleoptera, Thysanoptera, Odonata, and Orthoptera. Another four classes of the Arthropoda were also represented, including three species in Arachnida, one species in Malacostraca, one species in Maxillopoda, and one species in Chilopoda. Outside the arthropods entirely were two species in Nematoda and one species in Cnidaria [19,32]. We also have unpublished data [33] from a field investigation in 2019, which identified three nege-like viruses in whitefly (Bemisia tabaci) and one in grasshopper (Metaleptea brevicornis); the sequences of the viral RdRP regions were submitted to NCBI GenBank with the accession numbers listed in Table 2. It is clear that negeviruses are common in Insecta generally and not just in hematophagous insects.

Putative New Phylogenetic Clades and Host Diversity of Negeviruses in Invertebrates
A phylogenetic tree was constructed using sequences of negeviruses from various hosts and those of closely related plant viruses. NVD1 clusters with two LRV isolates and some other insect viruses in the previously identified Nelorpivirus clade, while SVD1 falls clearly with AHNLV and other insect viruses in the Sandewavirus clade (Figure 3). The topology of the tree also confirmed the previously reported close relationship with plant viruses in the family Kitaviridae [11,15,16]. Using plant viruses of the family Virgaviridae as an out-group, a number of other obvious groups of nege-like viruses can also be recognized. These include a branch with three insect viruses (Abisko virus, Adelphocoris suturalis virus, and Negelikevirus fruitfly3) and at least three other groups formed with high bootstrap values (labelled Group 1, Group 2, and Group 3 in Figure 3). There are also several other poorly resolved solitary branches of nege-like viruses. Although the Nelorpivirus and Sandewavirus clades consist mostly of viruses from the order Diptera within Insecta (mostly mosquitoes), they also include viruses from hosts in the orders Hemiptera (whitefly) and Hymenoptera (bee), and from non-insect hosts (house centipedes and spiders), from various regions of the world. Group 2 contains two closely related nege-like viruses, Sanxia atyid shrimp virus 1 and Beihai anemone virus 1, that are from hosts classified in different phyla (Arthropoda and Cnidaria) but from a similar ecological niche.

Discussion
Since the taxon Negevirus was initially suggested, more than 100 negeviruses have been isolated worldwide, particularly from Asia, Africa, Oceania, Europe and America [15]. All previously discovered negeviruses were from tropical, sub-tropical and temperate regions (latitudes between 42° N and 42° S), raising the possibility that negeviruses might be affected significantly by environmental factors and not be adapted to hosts living in extreme conditions such as low temperature [16]. In this study, two new negeviruses (NVD1 and SVD1) were identified in a dungfly (S. furcata) collected from the Arctic region (latitude 79° N), much further north than any previously described negevirus. Earlier work suggested that negeviruses from different clades might be found in the same host and geographic location [15]. This is supported and extended by our finding of two negeviruses from different clades (NVD1 in Nelorpivirus, SVD1 in Sandewavirus) within a single individual host insect.
Negeviruses are also well-known for having a broad host range among biting Diptera, including nine genera of mosquitoes, that have been studied because of their importance to public health [3,5,15]. The viruses reported here and the subsequent in silico studies of public databases and published virome data show that the hosts of nege-like viruses are much more diverse. Known hosts now include insects belonging to the orders Diptera, Hemiptera, Coleoptera, Thysanoptera, Hymenoptera, Odonata, Orthoptera, and Lepidoptera and, interestingly, non-insect arthropods (spider, shrimp, etc.) and even non-arthropod organisms (nematodes and anemone) (Table 2). These results provide strong evidence to support the previous hypothesis that the host range of negeviruses might have been greatly underestimated due to current sampling bias in favor of biting or blood-sucking arthropods [5,15]. It is clearly no longer tenable to regard negeviruses as insect-specific or mosquito-specific viruses.
Phylogenetic studies and the discovery of ISV genomic material integrated into the mosquito genome have led to the hypothesis that a number of pathogenic arboviruses may have acquired their dual-host range through long-term adaptive evolution of former ISVs in vertebrates [20,45]. Negeviruses are genetically and evolutionarily related to plant viruses in the family Kitaviridae [5,6,17], and investigation of endogenous viral elements indicated that virga/nege-related viruses in insects and plants might share common viral origins [46]. Our phylogenetic analysis indicated that three nege-like viruses in the unassigned Group 1 and Tetranychus urticae kitavirus are phylogenetically closer to plant viruses (Kitaviridae), filling the phylogenetic "gaps" between plant-associated viruses and the two proposed clades of negeviruses (Figure 3). Interestingly, a newly reported nege-like virus (Fragaria vesca-associated virus 1) isolated from a symptomatic strawberry plant also shows high homology to Aphis glycines virus 3 [47], indicating that this unassigned group might be the key connection between nege-related arthropod viruses and plant viruses. The increasing number of newly discovered nege-like viruses will surely help to clarify the uncertain relationship between nege-like viruses and associated plant viruses. The phylogenetic relationships of some nege-like viruses in the tree were incongruent with host phylogeny, especially for the viruses in the three unassigned groups (Figure 3), indicating the possibility of cross-species virus transmission in a similar ecological niche.
The high numbers of viral transcripts of NVD1 (6858×) and SVD1 (8857×) in S. furcata show that the viruses were infecting and propagating in the insect and were not just contaminants. Previous studies have shown that negeviruses can replicate to high viral loads in cell lines of some mosquitoes and sandflies [5,15]. In addition, dsRNA intermediates of negeviruses can be detected in mosquito C6/36 cells in early stages of infection, indicating that this replication may use a dsRNA intermediate [9]. In this study, we detected and characterized vsiRNAs derived from both NVD1 and SVD1 in S. furcata. These were mostly 21 nt long and were more or less equally derived from both strands of the dsRNA replication intermediates, providing clear evidence that negeviruses are targeted by the host siRNA-based antiviral RNAi pathway. It will be interesting to investigate in the future whether negeviruses can induce similar siRNA-based antiviral immunity in non-insect hosts. Several studies have shown that infection by some strains of Wolbachia can upregulate the mosquito's innate immune system and then interfere with mosquito-borne virus replication, decreasing vector competence [48,49]. In addition, laboratory experiments have indicated that insect-specific flaviviruses can downregulate the replication of heterologous flaviviruses in mosquito cells [50,51]. Since negeviruses can replicate actively in several well-known arthropod vectors of vertebrate viruses (mosquitoes, sandflies) and plant viruses (aphids, whiteflies), it will be fascinating to evaluate the impact of negevirus replication on the fitness and vector competence of these important vectors. If negeviruses do reduce vector competence, they could have enormous potential value as biological control agents of pathogenic viruses.