Expanding the Repertoire of the Plant-Infecting Ophioviruses through Metatranscriptomics Data

Ophioviruses (genus Ophiovirus, family Aspiviridae) are plant-infecting viruses with non-enveloped, filamentous, naked nucleocapsid virions. Members of the genus Ophiovirus have a segmented single-stranded negative-sense RNA genome (ca. 11.3–12.5 kb), encompassing three or four linear segments. In total, these segments encode four to seven proteins in the sense and antisense orientation, both in the viral and complementary strands. The genus Ophiovirus includes seven species with viruses infecting both monocots and dicots, mostly trees, shrubs and some ornamentals. From a genomic perspective, as of today, there are complete genomes available for only four species. Here, by exploring large publicly available metatranscriptomics datasets, we report the identification and molecular characterization of 33 novel viruses with genetic and evolutionary cues of ophioviruses. Genetic distance and evolutionary insights suggest that all the detected viruses could correspond to members of novel species, which expand the current diversity of ophioviruses ca. 4.5-fold. The detected viruses increase the tentative host range of ophioviruses for the first time to mosses, liverwort and ferns. In addition, the viruses were linked to several Asteraceae, Orchidaceae and Poaceae crops/ornamental plants. Phylogenetic analyses showed a novel clade of mosses, liverworts and fern ophioviruses, characterized by long branches, suggesting that there is still plenty of unsampled hidden diversity within the genus. This study represents a significant expansion of the genomics of ophioviruses, opening the door to future works on the molecular and evolutionary peculiarity of this virus genus.


Introduction
A vast number of viruses are being discovered in this new metagenomic era, revealing a multifaceted and diverse evolutionary landscape of replicating entities and the complexities associated with their arduous classification [1]. Several strategies to leverage this dynamically growing wide-ranging assemblage of viruses have led to an initial comprehensive proposal to generate a virus world megataxonomy [2]. Despite extensive and broad efforts to characterize the virus share of the biosphere, only an infinitesimal portion, which probably embodies less than one percent of the virosphere, appears to be characterized so far [3]. Consequently, our knowledge about the massive global virome, with its outstanding diversity and including every prospective host organism assessed so far, is scarce [4][5][6]. Data mining of publicly available transcriptome datasets derived from high-throughput sequencing (HTS) has become an efficient and inexpensive strategy to uncover the hidden diversity of the plant virosphere [5,7]. Data-driven virus discovery emerges in the context of a massive number of open datasets in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). This wonderful reserve of sequences, which is growing at an exceptional rate, represents a substantial (but still limited and biased) portion of all the organisms that populate our world, and the NCBI-SRA database is an efficient and cost-effective resource to identify novel viruses [8]. From a virus taxonomy perspective, a consensus statement has defined that viruses that are known only from metagenomic data can, and should, be incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV) [9].
Ophioviruses (genus Ophiovirus, family Aspiviridae) are plant-infecting viruses with non-enveloped, filamentous, naked nucleocapsid virions. Members of the genus Ophiovirus have a segmented single-stranded negative-sense and possibly ambisense RNA genome, encompassing three or four linear segments (in total ca. 11.3-12.5 kb) [10]. These segments encode four to seven proteins in the sense and antisense orientation, both in the viral and complementary strands [10]. The genus Ophiovirus includes seven recognized species with viruses infecting both monocots and dicots, mostly trees, shrubs and some ornamentals, and members of four of these seven species are reported to be transmitted by soil-borne fungi of the genus Olpidium [10]. From a genomic perspective, as of today, there are complete genomes available for only four of these seven member species. In the context of a systematic expansion of virus discovery supported by the extensive use of HTS, a plethora of novel viruses of many families from diverse plants has been described. Nevertheless, to our knowledge, the diversity of ophioviruses appears to have stagnated, with no new ophiovirus species recognized by the ICTV since 2015. Two recent works have described the complete genome of a novel proposed ophiovirus associated with carrot, carrot ophiovirus 1 (CaOV1) [11], and another found in pepper, pepper chlorosis-associated virus (PCaV) [12]. In addition, the segment that encodes the capsid protein (CP) of a putative novel ophiovirus was assembled from transcriptomic data of Dactylorhiza hatagirea [13]. This is the first study oriented to identify and characterize ophiovirus sequences that are hidden in publicly available metatranscriptomic data, which resulted in the identification and characterization of 33 novel tentative ophioviruses.
Our findings significantly expand the status quo of the genomics of ophioviruses, opening the door to future works on the molecular and evolutionary peculiarities of this virus genus and the Aspiviridae family.

Identification of Ophiovirus Sequences from Public Plant RNA-Seq Datasets
Two strategies were used to detect ophiovirus sequences. (1) Assembled and raw sequence data corresponding to the 1K study [14] were explored using tBlastn searches (E-value < 1e-5) for ophiovirus sequences using the NCBI-refseq proteins of ophioviruses in the 1KP:BLAST tool (https://db.cngb.org/onekp, accessed on 20 January 2023), and hits were curated with the raw SRA data retrieved from the NCBI BioProject PRJEB4922.
(2) The Serratus database was analyzed, employing the serratus explorer tool [5] using as the query the predicted RNA-dependent RNA polymerase protein (RdRP) of ophioviruses available in the NCBI-refseq database. The SRA libraries that matched the query sequences (alignment identity > 45%; score > 10) were further explored in detail.
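The library selection step can be illustrated with a minimal sketch. It assumes that the matches have already been exported into simple records; the `Hit` class, its field names and the `filter_hits` function are hypothetical and are not part of the Serratus explorer tool. Only the two cut-offs (alignment identity > 45%, score > 10) come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One library-versus-query match (hypothetical record layout)."""
    run_id: str       # identifier of the sequencing library
    identity: float   # alignment identity, in percent
    score: float      # match score reported for the library


def filter_hits(hits, min_identity=45.0, min_score=10.0):
    """Keep hits strictly above both cut-offs stated in the text."""
    return [h for h in hits if h.identity > min_identity and h.score > min_score]


# Toy records with placeholder identifiers (not real accessions).
example = [
    Hit("RUN_A", 62.0, 35.0),   # passes both cut-offs
    Hit("RUN_B", 44.0, 80.0),   # identity too low
    Hit("RUN_C", 70.0, 10.0),   # score not strictly above 10
]
selected = filter_hits(example)
print([h.run_id for h in selected])
```

Both comparisons are strict, mirroring the ">" signs in the stated criteria.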

Sequence Assembly and Virus Identification
Virus discovery was implemented as described elsewhere [15,16]. In brief, the raw nucleotide sequence reads from each SRA experiment that matched the query sequences in both the 1k and Serratus platforms were downloaded from their associated NCBI BioProjects (Table 1). The datasets were pre-processed by trimming and filtering with the Trimmomatic v0.40 tool as implemented in http://www.usadellab.org/cms/?page=trimmomatic, accessed on 20 January 2023 with standard parameters except quality required, which was raised from 20 to 30 (initial ILLUMINACLIP step, sliding window trimming, average quality required = 30). The resulting reads were assembled de novo with rnaSPAdes using standard parameters on the Galaxy server (https://usegalaxy.org/, accessed on 20 January 2023). The transcripts obtained from the de novo transcriptome assembly were subjected to , accessed on 20 January 2023. The resulting viral sequence hits of each dataset were explored in detail. Tentative virus-like contigs were curated (extended and/or confirmed) by iterative mapping of each SRA library's filtered reads. This strategy extracts a subset of reads related to the query contig, uses the retrieved reads from each mapping to extend the contig and then repeats the process iteratively using the extended sequence as the query. The extended and polished transcripts were reassembled using the Geneious v8.1.9 (Biomatters Ltd., Boston, MA, USA) alignment tool with high sensitivity parameters.
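The iterative extension idea can be sketched with a toy example. This is not the workflow used in the study, which relied on read mapping; the sketch below assumes error-free reads, exact suffix-prefix overlaps and extension in one direction only. The function names `extend_right` and `iterative_extend` and the `min_overlap` parameter are hypothetical.

```python
def extend_right(contig, reads, min_overlap=20):
    """Extend the contig once, using the read that adds the most new bases.

    A read qualifies when one of its prefixes (at least min_overlap long)
    exactly matches the end of the contig.
    """
    best = ""
    for read in reads:
        max_ov = min(len(contig), len(read) - 1)
        # Try the longest possible overlap first, then shorter ones.
        for ov in range(max_ov, min_overlap - 1, -1):
            if contig.endswith(read[:ov]):
                extension = read[ov:]
                if len(extension) > len(best):
                    best = extension
                break
    return contig + best


def iterative_extend(contig, reads, min_overlap=20, max_rounds=100):
    """Repeat the extension, using the extended contig as the new query."""
    for _ in range(max_rounds):
        extended = extend_right(contig, reads, min_overlap)
        if extended == contig:   # no read extends the contig any further
            break
        contig = extended
    return contig


# Toy data: a short made-up sequence, a seed contig and overlapping reads.
genome = "ACGTTGCAAGGCTTAACCGGTATCGA"
seed = genome[:10]
reads = [genome[3:13], genome[8:18], genome[13:23], genome[16:26]]
print(iterative_extend(seed, reads, min_overlap=5))
```

A real implementation would tolerate mismatches, extend both ends and build a consensus from all overlapping reads rather than trusting a single read.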

Sequence Analyses
ORFs were predicted with ORFfinder (minimal ORF length 150 nt, genetic code 1, https://www.ncbi.nlm.nih.gov/orffinder/, accessed on 20 January 2023) and the functional domains and architecture of translated gene products were determined using InterPro (https://www.ebi.ac.uk/interpro/search/sequence-search, accessed on 20 January 2023) and the NCBI Conserved domain database-CDD v3.20 (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, accessed on 20 January 2023) with e-value = 0.01. Furthermore, HHPred and HHBlits as implemented in https://toolkit.tuebingen.mpg.de/#/tools/, accessed on 20 January 2023 were used to complement the annotation of divergent predicted proteins with hidden Markov models. Transmembrane domains were predicted using the TMHMM version 2.0 tool (http://www.cbs.dtu.dk/services/TMHMM/, accessed on 20 January 2023). The predicted proteins were then subjected to NCBI-BLASTP searches against the non-redundant protein sequences (nr) database to filter out any virus-like sequences that did not show an ophiovirus protein as best hit.
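As an illustration of what ORF prediction with a 150 nt minimum and the standard genetic code involves, the following is a simplified six-frame scan. It is a sketch only and does not reproduce ORFfinder: it assumes ATG-only starts, counts the stop codon in the ORF length and ignores ORFs that run off the end of the sequence. The names `find_orfs` and `revcomp` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}          # stops of the standard code (code 1)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=150):
    """Return (strand, start, end) for ATG...stop ORFs of at least min_len nt.

    Coordinates are 0-based, end-exclusive, and for the "-" strand they
    refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start codon
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:   # length includes the stop
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


# Toy sequence: ATG + 50 alanine codons + stop = 156 nt.
toy = "ATG" + "GCT" * 50 + "TAA"
print(find_orfs(toy))
```

For viruses with a negative-sense genome, the minus-strand scan is the relevant one, since most proteins are encoded in the viral complementary strand.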

Pairwise Sequence Identity
Percentage amino acid (aa) sequence identities of the predicted CP protein of the ophioviruses identified in this study, as well as those available in the NCBI database, were calculated using SDTv1.2 [42] based on MAFFT 7.505 (https://mafft.cbrc.jp/alignment/software, accessed on 20 January 2023) alignments with standard parameters. Virus names, abbreviations and NCBI accession numbers of ophioviruses already reported are shown in Supplementary Table S1.
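A pairwise identity value of this kind can be computed from two aligned sequences as sketched below. The sketch assumes that only columns where neither sequence has a gap are compared, which is a common convention; whether this matches the exact SDT calculation is an assumption here. The function name `pairwise_identity` is hypothetical.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Columns where either sequence has a gap are skipped (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 5 ungapped columns, 4 of them identical.
print(pairwise_identity("MKT-LV", "MRTALV"))
```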

Phylogenetic Analysis
Phylogenetic analysis based on the predicted CP protein or the polymerase protein of all available ophioviruses was carried out using MAFFT 7.505 with multiple aa sequence alignments using G-INS-i and E-INS-i as the best-fit model, respectively. The aligned aa sequences were used as input to generate phylogenetic trees through the maximum-likelihood method with the FastTree 2.1.11 tool available at http://www.microbesonline.org/fasttree/, accessed on 20 January 2023. Local support values were calculated with the Shimodaira-Hasegawa test (SH) and 1000 tree resamples. The capsid proteins of two selected cytorhabdoviruses (alfalfa dwarf virus YP_009177015 and lettuce necrotic yellows virus YP_425087) were used as the outgroup in the CP tree. The polymerase proteins of three related and unclassified aspivirus-like viruses (nees' pellia aspi-like virus CAH2618860, Plasmopara viticola lesion ass. mycoophiovirus 1 QJX19787, grapevine-associated serpento-like virus 1 QXN75438) were used as the outgroup in the polymerase trees. To explore the potential phylogenetic co-divergence of ophioviruses with their associated host plants, plant host cladograms were generated in phyloT v.2 (https://phylot.biobyte.de/, accessed on 20 January 2023), based on NCBI hierarchical taxonomy. Host associations were based on connections manually inferred between the viral phylogram and the plant cladograms.

Summary of Discovered Ophiovirus Genomic Sequences
In this study, through the identification, assembly and curation of raw NCBI-SRA reads of publicly available transcriptomic data, we identified genomic evidence of 33 novel ophioviruses. Full-length viral genome sequences were obtained for 12/33, and 5/33 of the putative viruses had all their RNA segments detected, while 16/33 had some missing, mostly derived from the technical difficulties of assembling segments that are at relatively low RNA levels during infection such as RNA 1 (Table 1, Supplementary Table S2). Importantly, 85% of the identified viruses included the detection of two or more RNA segments of the virus in the same sequencing library, which improved the level of confidence in the discovery. The detected viruses were associated with 33 different plant host species (Table 1). The majority of the host plants were herbaceous dicots, with 20 out of 33 identified as such. The remaining hosts were herbaceous monocots, liverworts, mosses and ferns (Table 1). The genomes of 15 out of 17 viruses with all RNA segments annotated had three segments, while two monocot-associated ophioviruses had four segments (Table 1, Figure 1).

Structural and Functional Annotation of Ophiovirus Sequences
The RNA segments of the detected viruses were found to encode various proteins, including the polymerase, movement protein, and capsid protein. The RNA 1 encoded two proteins at 3′ of the vcRNA, a large 261-280 kDa protein including the core polymerase module with the typical conserved motifs "A-E" of the RdRP, with the expected SDD signature sequence in motif "C" (Mononeg_RNA_pol, pfam00946). Separated by an intergenic region, the other ORF at 5′ of the vcRNA encoded a small protein with a size that ranged from 105 to 245 amino acids (aa) (Figure 1). Interestingly, this small protein was quite diverse in most of the viruses identified in this study, and no hits were found when BLASTP searches were conducted (Table 1). The vcRNA 2 encoded a putative movement protein (MP) ranging from 47 to 58 kDa, and all the predicted MP proteins presented the 30K core MP domain (30K_MP, pfam17644). In addition, a few detected viruses encoded a small 6-10 kDa protein in the vRNA 2 with no blast hits or conserved domains, supporting the possibility of the ambisense coding strategy suggested for MLBVV. The vcRNA 3 encoded the capsid protein [10,43], ranging from 48 to 57 kDa and presenting an ssRNA negative plant viral coat protein nucleocapsid domain (Nucleocap, pfam11128) and no additional ORFs (Figure 1). The RNA 4 encoded a protein with unknown function with a size that ranged between 322 and 360 aa, in some instances including an overlapped ORF encoding a 10-12 kDa protein of unknown function. Nuclear localization signals were also found in the polymerase, MP and CP encoded by the viruses identified in this study.

Pairwise Identities of Ophiovirus Sequences and Species Demarcation Criteria
The pairwise aa sequence identities between the CP proteins of all reported ophioviruses, including those identified in this study, showed great diversity with an identity ranging from 14.2% to 98.9%, but importantly with a mean identity of only 32.1% (Supplementary Figure S1). Using the molecular criterion for species demarcation threshold of 85% aa identity of the CP [10], all ophioviruses with complete CP coding regions assembled in this study with an identity below 85% were tentatively deemed to be members of new ophiovirus species (Supplementary Figure S2), increasing the number of potential members of the genus more than 4.5-fold. We suggest potential latinized binomial virus species names to include the viruses described here as members of novel species within the genus Ophiovirus (Table 2).
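The demarcation rule can be expressed as a simple decision on precomputed identity values. The sketch below assumes that the CP aa identities (in percent) between one assembled virus and every other virus in the comparison are already available; the function name `is_tentative_new_species` and the example values are hypothetical and are not results from this study. Only the 85% threshold comes from the text.

```python
def is_tentative_new_species(cp_identities, threshold=85.0):
    """True when every CP aa identity to other viruses is below the threshold.

    cp_identities: percent identities between one virus and all others
    (hypothetical input; an empty list is treated as "no close relative").
    """
    return max(cp_identities, default=0.0) < threshold


# Made-up identity values for illustration only.
print(is_tentative_new_species([32.1, 41.7, 67.9]))   # all below 85%
print(is_tentative_new_species([32.1, 91.3]))         # one above 85%
```

Under this rule, a virus whose closest CP match exceeds the threshold would be treated as a variant of an existing species rather than a member of a new one.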

Phylogenetic Relationships between Ophioviruses and Hosts
Phylogenetic analyses based on the deduced CP protein aa sequences of the detected viruses revealed a complex evolutionary history, showing distinctive groups and associations (Figure 2). One cluster included a group of 11 viruses with affinities to BlMaV, six to CPsV and a novel basal group of two viruses detected in Asteraceae plants (Figure 2). The other known clade of five ophioviruses was expanded with two grass viruses with affinities to LRNV, and the recently reported CaOV1 and PCaV were linked to the MLBVV/TMMMV group and the freesia sneak virus (FreSV) and ranunculus white mottle virus (RWMV) group, respectively. More distantly, three small groups of viruses were found, including four new viruses of orchids, and the third, most basal group, with very long branches, of a virus associated with a Poaceae plant and another one with the aquatic plant Zostera japonica. Furthermore, a novel divergent clade was found, mostly represented by viruses detected in basal plants such as mosses, liverworts and ferns (Figure 2). Additional phylogenetic analyses based on the deduced RdRP protein aa sequences showed a similar evolutionary history of the corresponding viruses to the one predicted with the CP protein (Supplementary Figure S3), that is, shared local clustering of many viruses indicating co-divergence in both the CP and RdRP trees, consistent with a common phylogenetic trajectory (Supplementary Figure S3). In addition, we generated a tanglegram to compare the virus phylogram and plant host cladogram to further explore potential virus-host relationships (Figure 3 and Supplementary Figure S4). This analysis showed that viruses of some clades clearly co-diverged with their hosts, including an orchid-associated virus clade and a clade of fern, moss and liverwort viruses (Figure 3 and Supplementary Figure S4).

Discovery of Novel Ophioviruses Expands Their Diversity and Evolutionary History
Known ophioviruses are agronomically relevant, including viruses generating detrimental infections and disease in crops and ornamental plants. This status quo is grounded on a tradition of biased sampling oriented to virus discovery in symptomatic and economically important plants. In this scenario, ophiovirus presence is not expected in the sequencing libraries of non-symptomatic vegetables; thus, they are ideal candidates to be identified through the mining of publicly available metatranscriptomic data. However, in the context of massive efforts directed to virus discovery in plants, as of today, only the partial genome of just one novel tentative ophiovirus was discovered when publicly available transcriptome datasets were mined [13]. Therefore, to assess whether this apparently limited ophiovirus diversity was biological or technical, we directed our efforts to specifically address ophiovirus discovery. We extensively searched for these viruses in already available plant transcriptome datasets to expand the repertoire of plant-infecting ophioviruses. This in silico-driven search resulted in the identification of virus sequence evidence of 33 novel ophioviruses. We also detected three novel variants of members of two known ophiovirus species. This substantial number of newly discovered putative ophioviruses represents a 4.5-fold increase in the known ophioviruses, which shows the importance of data-driven virus discovery to expand our understanding of the genomic diversity and peculiarities of virus taxa, such as the ophioviruses.

Host Range and Genomic Organization of the Novel Ophioviruses
Most of the host plants in which the novel viruses of this study were identified are herbaceous dicots, which, overall, are the most common hosts of known ophioviruses. Ophioviruses were detected in liverworts, mosses and ferns for the first time, thus expanding the host range of these viruses. Only two viruses with all RNA segments annotated had four segments, which is also the genomic organization of the ophioviruses Mirafiori lettuce big-vein virus (MLBVV), lettuce ring necrosis virus (LRNV) [10] and the recently reported carrot ophiovirus 1 [11]. Thus, the most frequent genomic organization found for ophioviruses consists of three RNA segments.

Genomic Features of the Discovered Ophioviruses
Like all previously reported ophioviruses [10], the RNA1 encoded the polymerase and a small protein. The RNA 1 small protein of the citrus psorosis virus (CPsV), the 24K protein, has been described to localize at the nucleus, is involved in miRNA misprocessing in citrus [44] and is an RNA-silencing suppressor [45]. The RNA2 encoded the putative MP, which was characterized as a cell-to-cell MP for CPsV (54K protein) and MLBVV (55K protein) [46,47]. All the predicted MP proteins detected presented the 30K core MP domain including the signature aspartate involved in cell-to-cell movement [48]. In addition, in the vRNA2, a highly divergent small protein was found to be encoded by a few of the identified viruses, which is consistent with the proposed ambisense nature of RNA2 postulated for MLBVV, which harbors a 10 kDa protein of unknown function at the same locus [49]. Further, the RNA3 encoded the CP [10,43], with its typical ssRNA negative nucleocapsid domain. The RNA 4, which we identified only in three monocot-associated viruses, encoded a protein with unknown function. MLBVV RNA 4 contains a second overlapping ORF with no initiation codon and is proposed to be expressed by a +1 translational frameshift, encoding a 10.6 kDa protein [49]. We failed to detect a similar additional overlapped ORF in the identified viruses, but we tentatively annotated a small ORF encoding a 12 kDa protein that was separated by an intergenic region at 3′ of the vcRNA 4 of Agrostis ophiovirus, which was conserved in the virus sequences of both plant hosts where these viruses were detected. Similarly to what was previously reported for ophioviruses [10], we identified nuclear localization signals in the polymerase, MP and CP encoded by the ophioviruses identified in this study.

Sequence Diversity and Evolutionary Clues of Identified Ophioviruses
A great diversity was found within the pairwise aa sequence identities between the CP proteins of all reported ophioviruses, including those identified in this study. The overall low sequence identity determined suggests that there is likely a substantial amount of undiscovered ophioviruses that may inhabit this virus space, despite the numerous viruses identified in this study. The genetic distance assessment was complemented with phylogenetic insights to provide evolutionary clues of the identified viruses.
Previous studies placed the ophioviruses in two distinct clades, one including a closer relationship between MLBVV and tulip mild mottle mosaic virus (TMMMV) and a separate clade formed by blueberry mosaic-associated virus (BlMaV) and CPsV. These two are placed more distantly to the other ophioviruses, suggesting that this might lead to the re-assignment of the existing species into two separate genera [10]. On the one hand, the long branches linking BlMaV and CPsV in previous analyses [10] appear to have reflected viral "dark matter", as at least 19 new viruses expand the bounds of the viral sequence space between these two viruses, including a novel basal group of two viruses detected in Asteraceae plants. On the other hand, the other clade was expanded with two grass viruses with affinities to LRNV. Three small groups of viruses were found with a distant evolutionary history, including a virus associated with the aquatic plant Zostera japonica. Interestingly, a few years ago, the first endogenous sequence of an ophiovirus was detected in the genome of the related eelgrass Zostera marina [50]. In the genome of this plant, a CP-like sequence was found, flanked by transposable elements, suggesting an ancient shared evolutionary history of eelgrass and ophioviruses, and the possibility that this group of plants might host contemporary ophioviruses, which is in line with the detected virus hosted by eelgrass in this work. Moreover, we found a novel divergent clade that consisted of viruses associated with basal plants such as mosses, liverworts and ferns, which represents the first association of ophioviruses with non-vascular plants and pteridophytes. The phylogenetic analyses based on the deduced RdRP protein aa sequences showed a similar evolutionary history of the corresponding viruses, supporting the results based on CP assessment. For instance, fern-, moss- and liverwort-associated ophioviruses clustered together both in CP- and RdRP-based trees, suggesting that they share a unique evolutionary history among ophioviruses. The tanglegram showed that the orchid-associated virus clade and the clade of fern, moss and liverwort viruses clearly co-diverged with their hosts, suggesting a shared host-virus evolution in these groups. Nevertheless, the tanglegram topology also showed that for many of the ophioviruses, there is no apparent concordant evolutionary history with their potential plant hosts.

Ophiovirus Tentative Taxonomical Classification
The distinctive phylogenetic clustering and the significant divergence in terms of aa identity of the predicted proteins of several of the identified viruses raise questions about taxonomic classification. Currently, the family Aspiviridae includes a single recognized genus with seven member species, and following the molecular criterion for ophiovirus species demarcation of a CP amino acid sequence identity <85%, we suggest that all the identified viruses in this study could be members of novel species, which were named based on current guidelines [51]. Nevertheless, it has not escaped our notice that eventually, some of the groups of viruses reported here, if recognized, could be included in new genera within the Aspiviridae family, applying a genus demarcation criterion still not defined. The outstanding divergence we found in some identified viruses highlights the need for novel approaches to classify this emerging ophio-like virus diversity. For instance, a percentage CP identity threshold could also be defined as a genus demarcation criterion (e.g., <40-45%), which should be integrated with predictions based on phylogenetic insights. Moreover, the existence of unclassified aspi-like viruses reported with as yet unknown CP predicted proteins raises the possibility of using other genetic markers. One possibility to define subfamilies within Aspiviridae could be implemented by using an identity threshold of the RdRP as a molecular criterion (e.g., <30% identity), as is the case for several RNA virus families.

Potential Vectors and Transmission Modes
Members of four out of the seven ophiovirus species recognized so far are reported to be transmitted via soil-borne fungi of the genus Olpidium [10], while for CPsV, which is transmitted by vegetative propagation of the host, no natural vector has been identified [10]. Nevertheless, while we assessed thousands of sequencing libraries in the Serratus platform, we failed to robustly detect ophiovirus-like sequences in any fungal library. Interestingly, one of the ophioviruses identified in this study was discovered in a transcriptome dataset of bumblebees. Further inspection of the raw reads of this dataset retrieved a significant amount of plant reads, which, based on rRNA analysis, corresponded to the Boraginaceae family. We tentatively linked this virus to this family of plants, and we cautiously speculate on the possibility that this ophiovirus could be pollen-associated and transported to other plants by bumblebees. In this line, a recent study characterized the pollen virome of wild plants, identifying plenty of pollen-associated viruses, but no ophioviruses [52]. Moreover, these authors found that the pollen virome is visually asymptomatic. This anecdotal observation and our difficulties in detecting ophiovirus-like sequences in fungal libraries could provide some grounds for the possibility that a share of ophioviruses could be vertically transmitted. Other lines of evidence could support this suggestion: (i) host-virus co-divergence in some clades may imply isolation and a lack of horizontal transmission and (ii) an emerging characteristic of the persistent, chronic infections of several vertically transmitted plant viruses is that they are latent/asymptomatic, a feature that could be shared by ophioviruses. Thus, further studies should be carried out to elucidate alternative transmission modes of ophioviruses beyond the fungally transmitted MLBVV, TMMMV, LRNV and FreSV [53,54].

Limitations of Sequence Discovery through Data Mining
This study has several limitations. For instance, the inability to return to the original biological material to repeat and check the assembled viral genome sequences is a noteworthy restriction of the data mining approach for virus discovery. Another restriction is derived from difficulties during the assembly of genome segments represented at relatively low viral RNA titers in sequencing libraries (e.g., RNA 1). This resulted in many detections where we failed to assemble complete or nearly complete genomes, or where the level of confidence in the consensus sequence is lower. The reader may find Supplementary Table S2 useful to assess the robustness of each identified virus sequence based on several metrics. Similarly, contamination, low sequencing quality, spill-over and other technical artefacts could result in false positive detections, chimeric assemblies or poor host assignment. New RNAseq datasets derived from the predicted plant hosts would improve and complement our results. In addition, the lack of a directed strategy to address virus segment termini, such as RACE, results in difficulties in determining bona fide RNA virus ends, which have conserved functional and structural cues in ophioviruses [10]. Some aspects of our strategy for virus discovery can mitigate several of these limitations by providing additional evidence for identification, for instance, the detection of the same putative virus in independent libraries from the same plant host, a robust depth coverage of virus reads, the detection of more than one RNA segment of the virus in the same library or the detection of strains of a virus in evolutionarily related plants. Nevertheless, associations and detections should be complemented by further studies.

Conclusions
In summary, this study illustrates the significance of the analysis of NCBI-SRA public data as a valuable tool not only to accelerate the discovery of novel viruses but also to increase our understanding of their evolution and to improve virus taxonomy. Using this approach, we looked for hidden ophio-like virus sequences to expand the repertoire of these viruses, increasing the number of potential members within the genus 4.5-fold. Additionally, we generated the most comprehensive phylogeny of ophioviruses to date and shed new light on the phylogenetic relationships and evolutionary landscape of this group of viruses. Future studies should focus not only on complementing our genomic predictions, but also on providing clues for the biology and ecology of these viruses such as associated symptomatology, transmission and putative vectors.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15040840/s1. Figure S1. Plot of frequency of percentage pairwise identity of ophiovirus complete capsid proteins generated using SDT v1.2 software based on MAFFT amino acid sequence alignments. Figure S2. Pairwise identity matrix of the amino acid sequences of the ophiovirus complete capsid proteins generated using SDT v1.2 software based on MAFFT alignments. The colored cut-off is based on ICTV demarcation criteria of ophioviruses, which include CP amino acid sequence identity <85% to be considered novel species (blue-light blue). Figure S3. Maximum-likelihood phylogenetic tree based on the amino acid MAFFT sequence alignments of the RdRp protein of all the ophioviruses reported thus far and in this study. The scale bar indicates the number of substitutions per site. The node labels indicate FastTree support values. The RdRp proteins of three related and unclassified aspivirus-like viruses (nees' pellia aspi-like virus CAH2618860, Plasmopara viticola lesion ass. mycoophiovirus 1 QJX19787, grapevine-associated serpento-like virus 1 QXN75438) were used as outgroup. Figure S4. Tanglegram contrasting phylogenetic relationships of the ophioviruses predicted with the CP protein (left) against an RdRP protein maximum-likelihood phylogenetic tree shown on the right. Links of well-supported clusters of viruses co-diverging in both trees are indicated in colors. Viruses corresponding to members of ICTV-recognized species are depicted in blue. The scale bar indicates the number of substitutions per site. Table S1. Virus names, abbreviations and NCBI accession numbers of ophiovirus sequences used in this study. Table S2. Additional data of each assessed NCBI-SRA library.