Discovery and Prevalence of Divergent RNA Viruses in European Field Voles and Rabbits

The advent of unbiased metagenomic virus discovery has revolutionized studies of virus biodiversity and evolution. Despite this, our knowledge of the virosphere, including in mammalian species, remains limited. We used unbiased metagenomic sequencing to identify RNA viruses in European field voles and rabbits. Accordingly, we identified a number of novel RNA viruses including astrovirus, rotavirus A, picorna-like virus and a narmovirus (paramyxovirus). In addition, we identified a sobemovirus and a novel luteovirus that likely originated from the rabbit diet. These newly discovered viruses were often divergent from those previously described. The novel astrovirus was most closely related to a virus sampled from the rodent-eating European roller bird (Coracias garrulous). PCR screening revealed that the novel narmovirus in the UK field vole had a prevalence of approximately 4%, and shared common ancestry with other rodent narmoviruses sampled globally. Two novel rotavirus A sequences were detected in a UK field vole and a French rabbit, the latter with a prevalence of 5%. Finally, a highly divergent picorna-like virus found in the gut of the French rabbit virus was only ~35% similar to an arilivirus at the amino acid level, suggesting the presence of a novel viral genus within the Picornaviridae.


Introduction
RNA viruses likely infect every species of cellular life [1]. However, due to their potential and sometimes profound impact on public health and the agricultural industries, most attention has understandably been directed toward those viruses that affect humans and economically important animals and plants. Hence, our knowledge of the diversity, evolution and functional biology of RNA viruses infecting a wider range of host species is limited. This major sampling bias has been in part addressed by recent studies reporting the discovery of a multitude of new invertebrates and vertebrate viruses, massively increasing the known virosphere and demonstrating the importance of investigating species that are often overlooked [2][3][4][5][6][7][8].
Rodents are an important source of emerging virus infections [9] and are reservoir hosts for a wide range of viruses including coronaviruses, paramyxoviruses and picornaviruses [10][11][12][13]. The abundant and diverse virome harbored by rodents likely reflects their propensity to live in relatively large and dense populations and their speciose nature: Rodentia is the largest mammalian order, comprising 2200 species, and hence approximately 40% of all mammalian species [14,15]. Lagomorphs are closely related to rodents, although comprise only 89 extant species including rabbits and hares [16]. Recent reports have shown that rabbits and hares harbor an array of viruses such as caliciviruses, coronaviruses, bocaparvoviruses, hepatitis E and myxomavirus [17][18][19][20][21][22]. Importantly, the discovery of these novel viruses in rodents and lagomorphs has provided a more refined understanding of the evolutionary history of known pathogenic viruses, including coronaviruses, hepatitis A virus and hepaciviruses [23][24][25][26][27]. However, despite these recent advances, we have only scratched the surface of the potential diversity of rodent and lagomorph viruses.
In this study, we generated and investigated RNA sequencing data from European rodents and rabbits that were previously found to be positive for novel alphacoronaviruses and hantaviruses [11,20,25,28]. Our aim was to perform an unbiased discovery of additional viruses that were present in these samples, reveal their evolutionary history, and assess their prevalence in their host populations. Accordingly, we demonstrate the presence of novel and highly divergent RNA viruses and discuss the potential pitfalls of the use of enteric samples for virus discovery.
Additional screening for paramyxoviruses was performed on 130 field vole kidney samples collected from two locations in the UK (Cheshire and Leicestershire ( Figure 1A)) [11,28], using primers specific for the novel paramyxovirus identified in UKMa K4D. The primers were targeting the L gene F: 5 -ACTAATTACATTTACCAACAAGG-3 and R: 5 -AGTTGGTCACGTTCWATAAT-3 and generated a 245 bp PCR product.
All the samples used in this study were sourced from existing pest controls programs and approved by the University of Nottingham School of Veterinary Science Ethical Panel, reference numbers 1602 151102 and 1786 160518.

Nucleic Acid Preparation
Total RNA was extracted from 1 mm 3 sections of kidney (UKMa K4D) or intestinal tissue samples (UKMa1) using the GenElute ™ Mammalian Total RNA Miniprep Kit (Sigma Aldrich, Steinheim, Germany). Total RNA was extracted from intestinal fluid for the French L232 rabbit using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). The RNA to cDNA EcoDry™ Premix-Random Hexamers kit (Clontech, Saint-Germain-en-Laye, France)-was used for cDNA synthesis.

High-Throughput Sequencing
High-throughput sequencing was performed on total RNA extracted from all the three samples (UKMa K4D, UKMa1 and L232) using the Illumina HiSeq platform at Source Bioscience, Nottingham, UK. Genomic DNA (gDNA) was depleted using DNaseI, and ribosomal RNA (rRNA) was removed using NEBNext ® rRNA Depletion Kit (Human/Mouse/Rat) with RNA Sample Purification Beads (New England Biolabs, Ipswich, MA, USA) prior to the library construction. Each read length was 2 × 150 bp, and the insert size was 200 bp on average. All the sequence data generated were analyzed using the Geneious Prime 2019.0.4 software. The reads were assembled into contiguous sequence (contigs) with de novo assembly performed with Geneious Prime. By using BlastX, the assembled contigs were compared against the RefSeq sequence database of all the virus proteins downloaded from GenBank. The minimum e-value was set to 1 × 10 −5 to maintain both high sensitivity and low rates of false positive hits [2]. All the contigs that matched viruses were confirmed by PCR.

Analysis of Virus Abundance
The number of viral reads in each data set was measured by mapping the raw total reads generated from the high-throughput sequencing to the identified viral contigs. The host ribosomal protein L4 (RPL4) gene was used as a host gene reference marker. The Oryctolagus cuniculus RPL4 gene (NM_001195817.1) was used for the rabbit, while the Microtus ochrogaster RPL4 gene

Nucleic Acid Preparation
Total RNA was extracted from 1 mm 3 sections of kidney (UKMa K4D) or intestinal tissue samples (UKMa1) using the GenElute ™ Mammalian Total RNA Miniprep Kit (Sigma Aldrich, Steinheim, Germany). Total RNA was extracted from intestinal fluid for the French L232 rabbit using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). The RNA to cDNA EcoDry™ Premix-Random Hexamers kit (Clontech, Saint-Germain-en-Laye, France)-was used for cDNA synthesis.

High-Throughput Sequencing
High-throughput sequencing was performed on total RNA extracted from all the three samples (UKMa K4D, UKMa1 and L232) using the Illumina HiSeq platform at Source Bioscience, Nottingham, UK. Genomic DNA (gDNA) was depleted using DNaseI, and ribosomal RNA (rRNA) was removed using NEBNext ® rRNA Depletion Kit (Human/Mouse/Rat) with RNA Sample Purification Beads (New England Biolabs, Ipswich, MA, USA) prior to the library construction. Each read length was 2 × 150 bp, and the insert size was 200 bp on average. All the sequence data generated were analyzed using the Geneious Prime 2019.0.4 software. The reads were assembled into contiguous sequence (contigs) with de novo assembly performed with Geneious Prime. By using BlastX, the assembled contigs were compared against the RefSeq sequence database of all the virus proteins downloaded from GenBank. The minimum e-value was set to 1 × 10 −5 to maintain both high sensitivity and low rates of false positive hits [2]. All the contigs that matched viruses were confirmed by PCR.

Analysis of Virus Abundance
The number of viral reads in each data set was measured by mapping the raw total reads generated from the high-throughput sequencing to the identified viral contigs. The host ribosomal protein L4 (RPL4) gene was used as a host gene reference marker. The Oryctolagus cuniculus RPL4 gene (NM_001195817.1) was used for the rabbit, while the Microtus ochrogaster RPL4 gene (XM_005347781.3) was employed for the field vole. The abundance of viral reads was calculated by dividing the number of mapped reads by the total number of reads in each library. The graphs were generated with GraphPad Prism 8 (v8.1.2, GraphPad Software, San Diego, California USA).

Contig Confirmation PCR
Specific primers were designed for contig confirmation, and for gap-filling between contigs, based on the known sequences of the contigs. All the primers were evaluated in silico with the Primer3 online tool. The Geneious Prime software was used for mapping and annotating. PCR reactions were carried out in a PTC-200 Peltier Thermal Cycler (MJ Research, Waltham, Massachusetts, USA). The PCR reactions were performed with HotStarTaq Polymerase (Qiagen, Hilden, Germany) according to manufacturer's instructions. The PCR products were subjected to Sanger sequencing at Source Bioscience, Nottingham.
All the virus sequences generated in this study have been deposited on GenBank under the accession numbers MN626413-MN626440.

Phylogenetic Analysis
To facilitate phylogenetic analysis, 57 reference paramyxovirus sequences were downloaded from GenBank and the F, H, L, M, N and P genes extracted. Similarly, 50 rotavirus A reference sequences were downloaded for the VP1, VP2, VP3, VP6, NSP2, NSP3 segments, as were 47 reference sequences of astroviruses for ORF1a, ORF1b and capsid genes. Amino acid and nucleotide sequences were aligned using ClustalW [29] and phylogenetic analyses were performed using the maximum likelihood (ML) method within the Molecular Evolutionary Genetics Analysis version 7 (MEGA7) [30] package. Analyses of aligned amino acid sequences utilized a Jones-Taylor-Thornton (JTT) amino acid substitution model with uniform rates of variation, and complete deletion of gaps, with statistical robustness assessed using bootstrap resampling (500 pseudo-replicates). In addition, a phylogenetic analysis of nucleotide sequence data was performed using the General Time Reversible (GTR) nucleotide substitution model with a gamma distribution of rate variation, a class of invariant sites (Γ+I), and complete deletion of gaps, with statistical robustness assessed using bootstrap resampling (100 pseudo-replicates). As the nucleotide-based phylogenies (Figures S1-S5) were topologically similar to those inferred using amino acid sequences only the latter are shown in the main text.
Details on the number of sequences used for the phylogenetic analysis of each gene as well as their length are given in Table 1.

Abundance of Viral Reads
A total of 51,744,718 paired reads were generated for UKMa K4D, 63,152,134 for UKMa1, and 8,396,334 for L232. The rabbit L232 (intestinal wash) had the highest abundance of viral reads ranging between 0.0008% and 0.19%, followed by the field vole UKMa K4D (kidney) with 0.0003-0.002% and finally the field vole UKMa1 (intestine) with 0.00004-0.0002% (Figure 2A,B). The number of host RPL4 reads was proportional to the number of viral reads in each sample. The abundance of viral and RPL4 reads in each animal were correlated: the animals with the highest abundance of host reads also had the highest abundance of viral reads. A novel rotavirus A, picorna-like virus and coronavirus (previous study [20]) were identified in the rabbit, a novel paramyxovirus and hantavirus (previous study [28]) in the field vole UKMa K4D, and a novel astrovirus, rotavirus A and coronavirus (previous study [11,25]) in the field vole UKMa1 (Figure 2A,B). We now briefly describe each in turn.

Paramyxovirus
A novel morbilli-like paramyxovirus was discovered in the kidney of the UK field vole (UKMa K4D). A total of 6734 nt were retrieved for this virus, representing the F, H, L, M, N and P genes ( Table  1). In terms of sequence similarity, its closest relative was Bank vole virus (MF943130), with 76%, 61%, 71%, 87%, 90% and 62% similarity for F, H, L, M, N and P genes, respectively, at the amino acid level ( Table 2).
To determine the evolutionary relationship of the novel UK field vole paramyxovirus to the other viruses within the family, we performed phylogenetic analyses for each of the F, H, L, M, N and P genes. The same phylogenetic pattern was observed in every gene, with the rodent morbilli-like paramyxoviruses comprising a distinct clade within the family with >90% bootstrap support ( Figure  3, Figures S1 and S2). Specifically, this clade comprised UKMa K4D, Bank vole virus (MF943130), Mossman virus (NC005339) and Nariva virus (NC017937), and we suggest that this unassigned cluster may represent a novel rodent-specific genus within the Paramyxoviridae.

Paramyxovirus
A novel narmovirus was discovered in the kidney of the UK field vole (UKMa K4D). A total of 6734 nt were retrieved for this virus, representing the F, H, L, M, N and P genes (Table 1). In terms of sequence similarity, its closest relative was Bank vole virus (MF943130), with 76%, 61%, 71%, 87%, 90% and 62% similarity for F, H, L, M, N and P genes, respectively, at the amino acid level (Table 2).
To determine the evolutionary relationship of the novel UK field vole paramyxovirus to the other viruses within the family, we performed phylogenetic analyses for each of the F, H, L, M, N and P genes. The same phylogenetic pattern was observed in every gene, with the rodent narmoviruses comprising a distinct clade within the family with >90% bootstrap support (Figures 3, S1 and S2). Specifically, this rodent-specific clade comprised UKMa K4D, Bank vole virus (MF943130), Mossman virus (NC005339) and Nariva virus (NC017937), and has been named narmovirus genus within the Paramyxoviridae by ICTV.

Rotavirus A
Two rotaviruses A were discovered in the intestinal samples of the UK field vole (UKMa1) and the French rabbit L232. A total of 7565 nt and 13,278 nt (Table 1) were retrieved for UKMa1 and L232, respectively. Blastn analysis of all the available rotavirus A segments (VP1, VP2, VP3, VP6, VP7, NSP2, NSP3, and NSP4) revealed that the rabbit RVA was closely related (>96%) to other viruses from a range of host species including human, bovine, giraffe and roe deer ( Table 2). These findings were confirmed by the phylogenetics analysis for each segment (Figure 4, Figures S3 and S4). In contrast, the field vole rotavirus A was more divergent, with nucleotide similarities of~75% to the closest relatives in the VP1, VP2, VP3, VP6, NSP2 and NSP3 segments, suggesting a novel genotype of RVA in each of the retrieved segments (Table 2). Phylogenetic analysis showed that UKMa1 RVA formed a distinct lineage within rotavirus A species, although it loosely grouped with a mouse RVA in the VP3 phylogeny ( Figure 4 and Figure S3). Detailed conclusions regarding the evolutionary history of each segment of the rabbit and field vole RVA could not be drawn due to low bootstrap support on the branches of each tree. Additional PCR screening on more French rabbit intestinal washes from the same cohort revealed a 4.9% prevalence (13 positives in 268 samples). Six originated from Creuse department in central France, four from the neighboring Puy-de-Dôme department, with the other three from the departments Deux-Sèvres, Loire and Dordogne ( Figure 1B). All the positive samples originated from central France with the exception of Charente, whereas no positive cases were identified in the north-west and south-west departments. However, no more positives were detected in the UK rodent cohort.

Astrovirus
A novel mamastrovirus was discovered in the intestine of the UK field vole (UKMa1). In total, 4334 nt were retrieved for the ORF1a, ORF1b and the capsid genes (Table 1). BlastX results revealed that the closest relative was Astrovirus Er/SZAL6/HUN/2011 from a European roller bird (Coracias garrulus). The two viruses were 46%, 70% and 73% similar at the amino acid level in the ORF1a, ORF1b and capsid genes, respectively ( Table 2). Phylogenetic analysis of all the three genes showed that the field vole and roller viruses formed a distinct clade within the Mamastrovirus genus supported by >90% bootstrap replicates in capsid and ORF1a ( Figure 5 and Figure S5).

Picorna-Like Virus
Two contigs of a novel picorna-like virus were retrieved from the intestine of the French rabbit L232 with a total of 5642 nt (4752 nt and 890 nt) ( Table 1). A blastX analysis revealed that in the NTPase, 3C peptidase, RNA-dependent-RNA-polymerase (RdRp) and capsid proteins the novel rabbit picorna-like virus was only 44%, 52%, 61% and 34% similar at the amino acid level to their closest relative (Blackbird arilivirus) (NC040820) ( Table 2). Phylogenetic analysis revealed that the virus clustered with the Blackbird arilivirus with 100% bootstrap support ( Figure 6). Surprisingly, both viruses clustered within the invertebrate picorna-like viruses, suggesting that they might have in fact been derived from a dietary component.

Plant Viruses
In addition to the vertebrate and invertebrate viruses described above, two likely plant viruses were detected in the intestinal wash of the French rabbit: a sobemovirus and a luteovirus. A 1653 nt contig was retrieved for the sobemovirus, encompassing polyprotein P2ab ( Table 1). The polyprotein P2ab was both 98% similar to the closest relative, Cocksfoot mottle virus, at the amino acid level.
Three contigs with a total of 3909 nt were recovered for the novel luteovirus. They contained a peptidase, RdRp, coat protein ( Table 1) that were ~39%, 78 and 54% similar, respectively, to their closest relative at the amino acid level ( Table 2).

Discussion
Metagenomic next-generation sequencing enables the unbiased discovery of viruses within samples that would otherwise be missed with conventional methods targeted to specific viruses. Using this approach, we report the discovery of a novel field vole paramyxovirus (UKMa K4D) and a novel rabbit rotavirus A (L232). We also identifed a novel astrovirus, rotavirus A, in a field vole (UKMa1) and a picorna-like virus in a rabbit (L232). While unbiased metagenomics is capable of identifying novel viral species, it is noticeable that only a small proportion of the total number of reads corresponded to viral reads, confirming previous observations [31]. An obvious limitation of metagenomics in detecting viral genomes is the requirement for a sufficient number of viral reads within a library to detect a virus and construct complete virus genomes, which is itself dependent on the RNA quality within a sample [32]. Comparison of virus abundance in each animal revealed an association between number of viral/host reads and sample quality, with rabbit L232 being the best quality sample. The difference between the quality of the rabbit and the field vole samples is likely due to the method of collection; the rabbit was euthanized and immediately processed [20], whereas

Plant Viruses
In addition to the vertebrate and invertebrate viruses described above, two likely plant viruses were detected in the intestinal wash of the French rabbit: a sobemovirus and a luteovirus. A 1653 nt contig was retrieved for the sobemovirus, encompassing polyprotein P2ab ( Table 1). The polyprotein P2ab was both 98% similar to the closest relative, Cocksfoot mottle virus, at the amino acid level.
Three contigs with a total of 3909 nt were recovered for the novel luteovirus. They contained a peptidase, RdRp, coat protein ( Table 1) that were~39%, 78 and 54% similar, respectively, to their closest relative at the amino acid level (Table 2).

Discussion
Metagenomic next-generation sequencing enables the unbiased discovery of viruses within samples that would otherwise be missed with conventional methods targeted to specific viruses. Using this approach, we report the discovery of a novel field vole paramyxovirus (UKMa K4D) and a novel rabbit rotavirus A (L232). We also identifed a novel astrovirus, rotavirus A, in a field vole (UKMa1) and a picorna-like virus in a rabbit (L232). While unbiased metagenomics is capable of identifying novel viral species, it is noticeable that only a small proportion of the total number of reads corresponded to viral reads, confirming previous observations [31]. An obvious limitation of metagenomics in detecting viral genomes is the requirement for a sufficient number of viral reads within a library to detect a virus and construct complete virus genomes, which is itself dependent on the RNA quality within a sample [32]. Comparison of virus abundance in each animal revealed an association between number of viral/host reads and sample quality, with rabbit L232 being the best quality sample. The difference between the quality of the rabbit and the field vole samples is likely due to the method of collection; the rabbit was euthanized and immediately processed [20], whereas the field voles were a result of routine pest control, and there was a delay between time of death and collection and sample preservation [11].
This study presents the first evidence of a narmovirus in European field voles. In every gene analyzed this virus clustered with narmoviruses that had been isolated from a range of rodents from diverse locations globally. Specifically, other viruses within this clade were identified in bank voles from Russia [33], wild rats in Australia [34] and common cane mice in Trinidad [35]. Such a phylogenetic pattern suggests that rodent narmoviruses are temporally and spatially widespread and arose from a single common ancestor, although this will need to be confirmed with a larger sample of viruses. We previously observed a similar pattern of shared ancestry for rodent alphacoronaviruses [25].
Phylogenetic analysis of the rabbit rotavirus A revealed both high sequence similarity (>96%) and clustering with rotavirus A sequences sampled from different mammalian species. However, the lack of a several reference sequences for all the segments analyzed, together with the ability of rotaviruses A to reassort [36], makes it difficult to draw any detailed conclusions about the origin of these viruses. In contrast, the rotavirus A identified in the UK field vole was considerably more divergent phylogenetically, falling in a more basal position, although with some similarity to another rodent rotavirus. In addition, the closest relatives only exhibited~75% sequence similarity (Table 2), thus fulfilling the criteria for a novel rotavirus A genotype in each segment [37], and perhaps indicating the presence of another rodent-specific virus lineage.
Although next-generation metagenomic sequencing has fundamentally changed virus discovery, many viruses still lack information about their host species [38] and the interpretation of results is greatly complicated by commonplace reagent contamination [39]. It is therefore important to confirm any metagenomic findings by PCR on the original samples. In addition, depending on the tissue, it may be difficult to determine the exact virus host (i.e., intestinal tissue sample) due to 'contamination' from the diet of the animal. Indeed, previous studies have shown the detection of viruses in gut and fecal samples of humans and animals that are likely derived from their diet [40][41][42]. Due to their detection in multiple animals we are confident that the field vole and the rabbit were the likely hosts for the paramyxovirus and the rotavirus A, respectively. However, it has not been determined whether the field vole is the true host of the rotavirus and astrovirus detected. The field vole-related astrovirus clustered with an astrovirus detected in fecal samples of European rollers (Coracias garrulus) in Hungary [43]. This bird species is found in a wide variety of habitats and is known to feed on insects and small vertebrates, including rodents [44]. However, as noted by the original authors, this virus shared a common ancestor with rodent-borne astroviruses so it is possible that it has a dietary origin since European rollers prey on small rodents [44]. Hence, the clade comprising UKMa1 and Er/SZAL6/HUN/2011 astrovirus may in reality represent a rodent clade of mamastroviruses. This will require confirmation through the analysis of additional rodent astroviruses.
Another example of probable dietary contamination was the detection of the novel picorna-like virus and the plant viruses in the rabbit intestinal sample. According to our phylogenetic analysis, the host of the picorna-like virus was likely an invertebrate because it closely grouped with other invertebrate picorna-like viruses such as Beihai barnacle virus 4 (NC032446) and Beihai sesarmid crab virus 2 (NC032641). Dietary contamination was also the likely explanation for Blackbird arilivirus (NC040820) discovered in a metagenomic analysis of a mixed pool obtained from brain, lung and intestine tissues [45]. While the original authors suggested that the virus might have a plant origin, our phylogenetic indicates that both viruses in fact likely have an invertebrate host.
In sum, our results confirm that metagenomic next-generation sequencing is capable of discovering previously unknown and divergent RNA viruses. However, our data also highlight that definitive identification of the likely host species can be complicated because of dietary or environmental contamination. As a consequence, true assignment of host species requires wider prevalence studies together analysis of multiple tissue types.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/1/47/s1, Figure S1: Phylogenetic relationships of the field vole paramyxovirus based on the L, F and H genes., Figure S2: Phylogenetic relationships of the field vole paramyxovirus based on the M, N and P genes, Figure S3: Phylogenetic relationships of the field vole and the rabbit rotavirus A based on the VP1, VP2 and VP3 segments, Figure S4: Phylogenetic relationships of the field vole and the rabbit rotavirus A based on the VP6, VP7 and NSP2 segments, Figure S5: Phylogenetic relationships of the novel field vole astrovirus based on the ORF1a, ORF1b and capsid genes.