Possible Arbovirus Found in Virome of Melophagus ovinus

Members of the Lipopteninae subfamily are blood-sucking ectoparasites of mammals. The sheep ked (Melophagus ovinus) is a widely distributed ectoparasite of sheep. It can be found in most sheep-rearing areas and can cause skin irritation, restlessness, anemia, weight loss and skin injuries. Various bacteria and some viruses have been detected in M. ovinus; however, the virome of this ked has never been studied using modern approaches. Here, we study the virome of M. ovinus collected in the Republic of Tuva, Russia. We assembled full genomes for five novel viruses related to the Rhabdoviridae (Sigmavirus), Iflaviridae, Reoviridae and Solemoviridae families. Four viruses were found in all five of the studied pools, while one virus was found in two pools. Phylogenetically, all of the novel viruses clustered together with various recently described arthropod viruses. All the discovered viruses were tested for their ability to replicate in the mammalian porcine embryo kidney (PEK) cell line. Aksy-Durug Melophagus sigmavirus RNA was detected in the PEK cell culture supernate after the first, second and third passages. Such data imply that this virus might be able to replicate in mammalian cells and thus can be considered a possible arbovirus.


Introduction
With the advances in the transcriptomic approach, the number of newly described viruses has increased dramatically [1][2][3], which has led to a breakthrough in our understanding of virus biodiversity and evolution and has led us to rethink the existing virus systematics [4]. Many novel viruses were discovered in arthropods [1][2][3]. Viruses of arthropods are objects of special interest in virology since arthropods can be vectors of arboviruses, i.e., viruses that cycle between invertebrate and vertebrate hosts [5]. Moreover, arthropod viruses provide us with insights into viral evolution, host switching and virus pathogenicity. Blood-sucking invertebrates are hosts to many arboviruses that cause diseases of humans and domestic animals and pose a great challenge to the healthcare system and the agricultural industry around the world [5]. While viromes of the well-established vector invertebrates, such as various species of mosquitoes [6][7][8] and ixodid ticks [9,10], are actively studied, other blood-sucking invertebrates, such as louse flies, receive less attention.
Louse flies of the Lipopteninae subfamily are blood-sucking ectoparasites of mammals. The sheep ked (Melophagus ovinus) has been widely distributed with sheep. It can be found

Sample Preparation and High-Throughput Sequencing
Individual specimens of M. ovinus were homogenized using Tissue Lyser 2 (12 min, frequency 25 s⁻¹). Prior to extraction of the nucleic acid, aliquots of the homogenate of keds collected from the same animal were pooled together in equal amounts. RNA was extracted from pooled homogenates using TRI Reagent LS (Sigma, St. Louis, MO, USA) according to the manufacturer's instructions. After extraction, host rRNA was depleted using a NEBNext Globin and rRNA Depletion Kit (NEB, E7750S, Ipswich, MA, USA) according to the manufacturer's instructions. The obtained RNA was used for library preparation without polyA-enrichment using a NEBNext Ultra II RNA Library Prep Kit for Illumina (NEB, E7770, Ipswich, MA, USA) according to the manufacturer's instructions. Final libraries were sequenced (single-end, 250-nt reads) on a HiSeq1500 (Illumina, San Diego, CA, USA). Raw reads were deposited in the sequence read archive (BioProject accession number PRJNA777535).

Sanger Sequencing
For Sanger sequencing, RNA was extracted from homogenates of the individual specimens using TRI Reagent LS (Sigma, St. Louis, MO, USA) according to the manufacturer's instructions. Reverse transcription was carried out using an MMLV RT kit (Evrogen, Moscow, Russia) with a random hexamer primer according to the manufacturer's instructions. cDNA was used for a PCR with DreamTaq DNA polymerase (Thermo Fisher Scientific, Vilnius, Lithuania) using virus-specific oligonucleotides (Table S1). PCR fragments were gel-purified with a QIAquick Gel Extraction Kit (QIAGEN, Hilden, Germany) and then sequenced with an Applied Biosystems 3500 genetic analyzer (Waltham, MA, USA) using a BigDye Terminator v3.1 Cycle Sequencing Kit (Thermo Fisher Scientific, Vilnius, Lithuania). The obtained sequences were aligned in SeqMan v.7.0.0 using contigs from high-throughput sequencing as reference sequences.

Assembly and Analysis
Adapter sequences, bases with low quality (<Q30) and short reads (length < 35) were discarded using Trimmomatic v0.39 [26]. Trimmed reads were used for de novo contig assembly with SPAdes v3.13.0 [27]. The resultant contigs were screened for viral sequences using the blastn algorithm in BLAST v2.9.0+ with the nt database, and contigs containing virus-related sequences were extracted for further investigation. Open reading frames were extracted from such contigs and tested using the blastp algorithm to determine whether they were virus related.
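The ORF extraction step can be illustrated with a minimal sketch. The text does not say which tool was used for this step, so the function name `find_orfs`, the ATG start codon, the standard stop codons and the `min_len_nt` threshold are all assumptions made for illustration only.

```python
# Hypothetical helper names; not the authors' actual pipeline.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_len_nt=300):
    """Find ATG..stop ORFs in all six reading frames.

    Returns tuples (strand, start, end, nt_sequence). Coordinates are
    0-based, end-exclusive, and relative to the strand being scanned.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len_nt:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Each reported nucleotide sequence could then be translated and submitted to a protein search. ORFs that run off the end of a contig without a stop codon are not reported by this sketch, which may matter for incomplete contigs.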
For all contigs related to viruses of the same family, evolutionary divergence was estimated to assess whether they all belonged to the same virus species. All contigs were aligned in the Mega X program. The obtained alignment was then used as the input to compute pairwise distances in the Mega X program with the default settings [28].
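To make the idea of a pairwise divergence concrete, the sketch below computes an uncorrected p-distance (the fraction of differing sites) between aligned sequences. This is a simplified stand-in and an assumption on our part: the default distance model in Mega X may apply a substitution-model correction, so the values are not guaranteed to match. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b, skip_chars="-N?"):
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap or ambiguous character
    are ignored (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in skip_chars or y in skip_chars:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def pairwise_distances(aligned):
    """Map each pair of sequence names to its p-distance.

    `aligned` is a dict of name -> aligned sequence string.
    """
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }
```

Low values across all pairs would be consistent with the contigs representing a single virus, while clearly separated groups of values would point to more than one.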
Contigs containing a complete coding region for the virus were extracted and used for further studies. In cases when there were several very closely related contigs in the same M. ovinus pool, Sanger sequencing of individual ked suspensions was performed. Contigs with the closest identities to the fragments from Sanger sequencing were used as viral sequences for further studies. In some cases, we were unable to perform a PCR due to a lack of material, or the PCR was negative. In that case, the consensus sequences were recomputed using the longest contig sequence as a reference with uGene v.1.32.0 [29] (up to 10% mismatches allowed). Obtained virus sequences were deposited in the GenBank database (accession numbers OL420682-OL420732).
The prevalence of viral reads in each pool was estimated by aligning reads from the pool to the sequences of viral contigs using Bowtie2 v.2.3.5.1 software [30]. The reported percentage of reads uniquely aligned to the viral genome was taken as the percentage of viral reads in the sample.
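As an illustration of how such a percentage could be derived, the sketch below parses the alignment summary that Bowtie2 prints for single-end reads and returns the share of reads that aligned exactly once. It assumes the standard unpaired summary layout; the function name and the example numbers are hypothetical and are not taken from the study.

```python
import re


def unique_viral_read_percentage(summary_text):
    """Percentage of all reads that aligned exactly once.

    `summary_text` is the text of a Bowtie2 single-end alignment summary.
    """
    total = re.search(r"^\s*(\d+) reads; of these:", summary_text, re.MULTILINE)
    unique = re.search(
        r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time",
        summary_text,
        re.MULTILINE,
    )
    if total is None or unique is None:
        raise ValueError("unrecognized Bowtie2 summary format")
    total_reads = int(total.group(1))
    if total_reads == 0:
        raise ValueError("summary reports zero reads")
    return 100.0 * int(unique.group(1)) / total_reads


# Made-up summary for illustration only.
example = """1000000 reads; of these:
  1000000 (100.00%) were unpaired; of these:
    998500 (99.85%) aligned 0 times
    1400 (0.14%) aligned exactly 1 time
    100 (0.01%) aligned >1 times
0.15% overall alignment rate
"""
print(round(unique_viral_read_percentage(example), 2))
```

Counting only uniquely aligned reads avoids double counting reads that map to several closely related contigs, at the cost of slightly underestimating abundance.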

Phylogenetics and Visualization
From the obtained contigs, either the polyprotein (if available) or the RNA-dependent RNA polymerase protein sequence was extracted. This sequence, along with homologs, was aligned using MAFFT v7.310 [31]. Alignments were processed with the trimAl v1.4.rev15 [32] program to remove ambiguously aligned regions, and maximum-likelihood phylogenetic trees were constructed with the PhyML 3.3.20200621 [33] program with 1008 bootstrap replications. Phylogenetic trees were visualized in FigTree v. 1.4.4. Genomes of the viruses were drawn using a custom Python script. All post-processing of the images was performed with the GIMP v.2.10.24 program.

Virus Passages in Pig Embryo Kidney Cell Line
A PEK cell line was used to assess the ability of the identified viruses to replicate in mammalian cells. The PEK cell line was maintained at 37 °C in Medium 199, Earle's Salts (FSASI Chumakov FSC R&D IBP RAS, Moscow, Russia), supplemented with 5% fetal bovine serum (FBS, Gibco, Paisley, UK).
For the experiment, cells were seeded in flat-sided cell culture tubes (NUNC, Thermo Fisher Scientific, Roskilde, Denmark) and cultivated for one to two days for a final cell count of 0.5-2 × 10⁶ cells per tube. Then, cells were infected with either 200 µL of the ked homogenate or 200 µL of the culture fluid collected from the previous virus passage, and incubated in the thermostat at 37 °C. After four days, the culture supernate was collected and stored.

High-Throughput Sequencing and Detection of Virus-like Contigs
We processed five pools of the sheep ked M. ovinus (two to six specimens in each pool) collected in the Republic of Tuva, Russia, in 2010 and 2012. No previously known viruses were identified in the samples, but five distinct types of contigs with homology to the viral polymerases were identified. All of them were close to the various groups of the RNA viruses, Rhabdoviridae (Sigmavirus), Iflaviridae, Reoviridae and Solemoviridae.

Iflaviridae-Related Contigs
Classical iflaviruses are non-enveloped, single-stranded, non-segmented, positive-sense RNA viruses. The genome has a 9-11 kb length and encodes a single open reading frame (ORF). This is translated into the single polyprotein that is processed into structural (N-terminus) and non-structural proteins. All the iflaviruses were isolated from arthropods [34].
In the current work, we found contigs exhibiting similarities with members of Iflaviridae in all of the studied pools. Contigs from all the pools except pool #21 had the typical length for iflaviruses (around 10,200 nt) and encoded a single ORF of ~3050 amino acids in length (Figure 1). They were not identical, with nucleotide divergence from 0.0002 to 0.0984 for different sequence pairs (Table S2). A protein blast of the polyprotein sequence showed around 43% identity with 96% coverage to the closest relative (Bactrocera tryoni iflavirus 1). The data showed that all of these contigs belong to the single novel virus named Khandagaity Melophagus ifla-like virus (KMIV). We then reassembled full virus genomes (Section 2.4) and obtained four full-genome sequences.
KMIV reads were rare, accounting for 0.02-0.14% of the total reads in the pools (Table 2). The low abundance (0.02%) of the viral reads in pool #21 was likely the reason we were not able to assemble the full genome from its data.

Solemoviridae-Related Contigs
Classical solemoviruses are non-enveloped viruses with a ~4.5 kb positive-sense RNA genome that infect different groups of flowering plants. They rely on −1 ribosomal frameshifting, leaky scanning and generating subgenomic RNA to produce their proteins. The RNA-dependent RNA polymerase (RdRp) of Solemoviridae is phylogenetically close to the RdRps of the Luteoviridae family. Recently, many new viruses with RdRps related to Solemoviridae and Luteoviridae were discovered. Some of those novel viruses differed drastically in their overall genome structure, for example, by having a different ORF count and/or by splitting the genome into two segments [2].
Here, we discovered multiple solemoviridae-related contigs in all of the studied pools. It should be noted, however, that said contigs were not closely related to the classical solemoviruses, but instead were closely related to the novel segmented solemoviridae-like viruses. The contigs from our data formed two separate clusters, with little homology between the clusters. Even within each cluster, the contigs were relatively diverse (nucleotide divergence of up to 0.22) (Table S5). A protein blast of the putative polymerase showed 43% identity to the closest relative (Teise virus) for the first cluster of contigs. Hubei diptera virus 14 was the closest to the contigs in the second cluster with 59% identity in the polymerase. Using the second segment of closest viruses, we were able to recover the second segment for each cluster (Table S6). A protein blast of the first cluster of contigs (the first ORF on the second segment) showed 41% identity to the Motts Mill virus (second closest, Teise virus). The second segment for the second cluster of sequences showed 47-48% identity to the Hubei solemo-like virus based on the amino acid sequences of the first ORF. These data show that these two clusters of contigs belong to two separate novel viruses. The contigs from the first cluster were reassembled (Section 2.4) and named Bayan-Khairhan-Ula Melophagus solemo-like virus (BKUMSV), while the contigs from the second were named Ulaatai Melophagus solemo-like virus (UMSV).
UMSV and BKUMSV were found in all of the pools. The abundance of these viruses was close and varied between 0.41 and 1.17% (Table 2). These viruses belong to different parts of the Luteo-Solemo supergroup, but they have similar genome structures (Figure 2). They contain two segments, with four ORFs located on them. The first segment is large: 3400 nt in BKUMSV and ~2800 nt in UMSV. It encodes putative peptidase and polymerase. The second segment is smaller (~1500 nt for both viruses). The ORFs on the second segment likely take part in viral coat protein production. These two frames are divided by a single UAG stop-codon, making it likely that the mechanism of expression involves a stop-codon read-through.

Phylogenetic analysis based on the polymerase sequence showed that UMSV formed a monophyletic group with Hubei diptera virus 14 (found in the pool of diptera) [2] and Erysiphe necator-associated solemo-like virus 2 (discovered in the Erysiphe necator fungus) (Figure 2A). The other closely related viruses are Hubei solemo-like virus 42 [2] and Soybean thrips solemo-like virus 10 discovered in arthropods [39].

Sigmavirus-Related Contigs
Sigmaviruses (Rhabdoviridae) are common pathogens of Drosophilidae, and recently, they were found in the other members of Diptera and other arthropods. They have a negative-sense RNA genome of ~12.5 kb in length, encoding five to six genes. In addition to the five proteins classical for rhabdoviruses (L, G, M, P, N), they may have an additional protein X encoded between the P and M genes, or another additional ORF between the G and L genes [44].
In the current work, we found contigs exhibiting similarities with members of genus Sigmavirus in all of the studied pools. All the contigs had a classical Rhabdoviridae genome structure, encoding the L, G, M, P and N genes (Figure 3), but not encoding protein X. Most of the contigs were ~11.6 kb in length; however, one from pool #22 was shorter (10,988 nt) and had a shortened polymerase sequence. Apart from this, the recovered contigs have low divergence in comparison to each other (from 0.00119 to 0.05223) (Table S3).
In the genus Sigmavirus, there are species discrimination criteria established by the International Committee on Taxonomy of Viruses. These contigs match both criteria for a new species. They have low identity in RdRp to the closest relative (around 53% to the Wuhan Louse Fly Virus 9) and occupy a different ecological niche, as there is no other sigmavirus that infects M. ovinus. Taking these data into account, we conclude that all of these contigs belong to the single novel virus named Aksy-Durug Melophagus sigmavirus (ADMSV). ADMSV sequence abundances were relatively high in pools #21-24, representing 1.97-4.76% of the total reads, while being low (0.15%) in pool #20 (Table 2).

Reoviridae-Related Contigs
Viruses of the family Reoviridae have double-stranded linear RNA genomes divided into 9, 10, 11 or 12 segments. Viral RNAs are mostly monocistronic, although in some cases, a second ORF is present. Proteins are always encoded on only one strand of the RNA duplex. The biological properties of reoviruses are quite diverse. Some viruses infect only vertebrates or invertebrates, some are known arboviruses and some viruses replicate in both plants and arthropod vectors [45].
In the current work, we found contigs with similarities to the polymerases (first segment) of the reoviridae-like viruses in pools #23 and 24. A subsequent search allowed us to identify nine more contigs from each pool with homologies to the second to 10th segments of the reoviridae-like viruses. The segments found in the different pools were very close to each other (0.0005-0.029 nucleotide difference). At the same time, they were considerably different from any entries found in the GenBank (28-61% identity, dependent on the segment) (Table S4). This allowed us to conclude that these 10 contigs from each pool represent a single novel reoviridae-like virus. It was named Bercke-Baary Melophagus reo-like virus (BBMRV). The complete BBMRV genome consists of 10 genome segments, with the first one, which contains putative RdRp, being the largest (~4000 nt) and the tenth segment being the smallest (~1200 nt) (Figure 4B). Each segment encodes one ORF. Based on the nucleotide sequence of segment one, the BBMRVs found in pools 23 and 24 were extremely similar, with only a 0.0005 divergence.
In the pools where BBMRV was found, its abundance was higher than that of any other virus found in the current work: 19.36% in pool 23 and 54.33% in pool 24 (Table 2). Phylogenetic analysis of the sequence of the viral polymerase places BBMRV together with various reoviruses of Diptera, such as Hubei diptera virus 20 [2], Bobbyc reo-like virus [42] and Bloomfield virus [40] (Figure 4A).

Multiplication of Viruses in Mammalian Cells
Here, we studied the virome of the obligate blood-sucking ectoparasite M. ovinus. Such a lifestyle implies that the viruses it harbors may be transmitted to the mammalian host and potentially cause illness. We tested the ability of the discovered viruses to infect mammalian cells. We used PEK cells as a model culture because they were previously shown to support the reproduction of different arboviruses, including orbiviruses [46].
We infected PEK cells with suspensions of the individual keds and then performed a second passage by infecting fresh PEK cells with culture supernate from infected PEK cells. After this, we tested the supernate collected from the first, second and third passages for the presence of all five viruses described here using virus-specific oligonucleotides (Table S1). Each PCR-positive result was confirmed using Sanger sequencing of the obtained PCR product. When a PCR-positive result was not confirmed using Sanger sequencing, the sample was considered negative (Table 3).
KMIV was detected in ked suspensions #7457 and #13578, in the first passage of suspension #13581, in suspension #7456 only at the second passage, in suspension #13580 in both the first and second passages, and in suspension #13579 and in the PEK cell supernate after the second passage. ADMSV was detected in suspension #7464, in suspension #13581 only in the first passage, in suspension #7473 in the second passage and in suspensions #13579 and #7477 in the PEK cell supernate after the first and second passages. Moreover, ADMSV was detected in the line of suspension #7474 throughout all three passages. Thus, ADMSV was detected in the first, second and third passages of the ked suspensions in the PEK cell culture (Figures S1-S4). This may indicate the ability of this virus to replicate in mammalian cells.

Discussion
In the last decade, meta-transcriptomic studies have revealed the extensive diversity of the RNA viruses of invertebrates [1,2]. Some of the new viruses are considered to be pathogenic arboviruses and are being extensively studied [46][47][48][49][50]. While the viromes of some blood-sucking ectoparasites, such as mosquitoes and ticks, are relatively well studied [6,7,9,10], there are few data on less-known species, such as louse flies [2]. Here, we present data on the virome of M. ovinus, a widely distributed sheep ectoparasite. Five pools of two to six specimens collected in the Republic of Tuva, Russia, were studied.
Bluetongue virus and Border disease virus were previously detected in the sheep ked [22,23]; however, no known pathogenic viruses were found in the present study. We were able to assemble five full genomes of novel viruses. All of the novel viruses were fairly divergent from known viruses. They belonged to four major virus groups (Iflaviridae, Rhabdoviridae, Reoviridae and Solemoviridae), representing different genome coding strategies. KMIV and ADMSV had a genome structure similar to the well-known viruses within their supposed virus family.
A segmented structure of some solemoviridae-like viruses was discovered recently, with the genome divided into two separate segments [2]. While we were able to recover both segments from our data, some of the closely related viruses (for example, Erysiphe necator associated solemo-like virus 2, Prestney Burn virus and Jeffords solemo-like virus) only have a sequence homologous to segment one in the databases (as of 26 October 2021). This is even more relevant for reoviridae-like viruses. There are nine known segments of the Bloomfield virus, three for Hubei odonate virus 14, two for Hubei diptera virus 20 and only the polymerase-encoding segment for Bobbyc reo-like virus and the Elf-Loch viruses [2,40,42]. Such a situation can arise for various reasons, ranging from a low prevalence of the virus in the sample to the loose homology of some proteins, making them hard to identify as viral proteins with standard procedures. In the current work, many of the proteins of BBMRV had significant similarity via a protein blast only to the homologous proteins of the Bloomfield virus. Moreover, we cannot even be certain that BBMRV has 10 segments and not 12, as with some of the Reoviridae, since additional segments may have been left unidentified due to low homology. The segment count not only remains one of the taxon-defining features of the Reoviridae [45] but can also shed light on the evolution of these viruses. Thus, it is crucial to try to recover as many viral segments as possible from the data obtained.
Melophagus ovinus is a blood-sucking insect, and it was collected directly from sheep in our study. This means it is possible that the viruses discovered in this work infect M. ovinus or sheep, or circulate between sheep and keds (as arboviruses). The majority of the closest relatives of the discovered viruses are various viruses found in other species (mostly non-parasitic) of the Diptera order. Such phylogenetic relationships imply that all the discovered viruses are arthropod-specific; however, we cannot exclude the possibility of them being sheep viruses (with viral RNA detected in the blood carried by M. ovinus) or arboviruses. Indeed, the viruses discovered in the current work only have a loose homology to the previously known viruses, and there are examples of purely insect-specific viruses being closely related to arboviruses and vertebrate viruses [51]. We decided to test the ability of the discovered viruses to replicate in mammalian cells in the PEK cell model. Three passages were performed. All the viruses were detected in at least one line after the first passage, while two (KMIV and ADMSV) were detected after both the first and second passages. Only ADMSV was detected after the third passage.
While detection after the first passage could be explained by detecting diluted RNA from the original ked suspension, this is less likely to be the case for the second passage and even less so for the third. With ADMSV being continuously detected throughout three passages, we suggest that it is able to replicate in the mammalian PEK cells. However, we were not able to detect KMIV after the third passage in the PEK cells. This suggests that we were detecting a diluted virus from the ked suspension during the first and second passages. At the same time, in some cases, we were able to detect KMIV in the second passage while not detecting it in the first one (Table 3). Thus, it is possible that KMIV might replicate in the PEK cells at a low level, with our primer pair not being sensitive enough to always detect it. Additional experiments are needed to come to a conclusion on the ability of KMIV to replicate in the PEK cells.
According to the definition of the Subcommittee on the Evaluation of Arthropod-Borne Status, all the viruses can be divided as follows: (1) arbovirus, (2) probable arbovirus, (3) possible arbovirus, (4) probably not arbovirus, (5) not arbovirus. Categories one and five include viruses with their status proven beyond reasonable doubt. If the data on the arbovirus nature of a virus fail to meet strong criteria, such viruses are registered in categories two and four. The viruses included in category three (possible arbovirus) have data too meager for firm judgment [52]. The phylogenetic relationships of ADMSV suggest that it might be an insect virus. At the same time, our data also imply that it may be able to replicate in mammalian PEK cells. Such data hint at the possibility of ADMSV being an arbovirus, but this is not enough to establish the arbovirus nature of this virus. This would mark ADMSV as a possible arbovirus, as per the abovementioned definition. Additional data on both replication in mammalian cell cultures and vector-host cycling are needed to determine its arbovirus nature.
There are known Rhabdoviridae arboviruses [5]. However, sigmaviruses are known to only be transmitted vertically [53]. Recently, it has been speculated that some of the viruses of louse flies may infect bats [54]. Our data suggest that ADMSV, a louse fly-derived sigmavirus, may be able to replicate in mammalian cells. Overall, it seems that the routes of transmission of blood-sucking dipteran sigmaviruses should be studied more thoroughly.

Conclusions
We studied the virome of M. ovinus, a blood-sucking ectoparasite of sheep. The full genomes of five novel viruses were assembled. Phylogenetically, all the discovered viruses clustered mostly with dipteran viruses. Our data suggest that Aksy-Durug Melophagus sigmavirus may be able to replicate in mammalian cells. Such data mark this virus as a possible arbovirus, and additional research is needed to precisely assess its biology.