High-Throughput Sequencing of Small RNAs for Diagnostics of Grapevine Viruses and Viroids in Russia

The use of high-throughput sequencing (HTS) technology has led to significant progress in the identification of many viruses and their genetic variants. In this study, we used the HTS platform to sequence small RNAs (sRNAs) of grapevine to study the virome. Isolation of RNA was performed using symptomatic grapevines collected from commercial vineyards in Krasnodar Krai in 2017–2018. To determine the viromes of vineyards, we used an integrated approach that included a bioinformatic analysis of the results of sRNA HTS and the molecular method RT-PCR, which made it possible to identify 13 viruses and 4 viroids. Grapevine leafroll-associated virus 4 (GLRaV-4), Grapevine Syrah Virus-1 (GSyV-1), Raspberry bushy dwarf virus (RBDV), Australian grapevine viroid (AGVd), and Grapevine yellow speckle viroid 2 (GYSVd-2) were identified for the first time in Russia. Out of 38 samples analyzed, 37 had mixed infections with 4–11 viruses, indicating a high viral load. Analysis of the obtained sequences of fragments of virus genomes made it possible to identify recombination events in GLRaV-1, GLRaV-2, GLRaV-3, GLRaV-4, GVT, GPGV, GRSPaV, GVA, and GFLV. The obtained results indicate a wide spread of the viruses and a high genetic diversity in the vineyards of Krasnodar Krai and emphasize the urgent need to develop and implement long-term strategies for the control of viral grapevine diseases.


Introduction
The viticulture and viniculture industries are important economic constituents in many countries [1]. Due to a significant decrease in productivity, grapevine diseases caused by viral pathogens are major obstacles to the production of grapes and wine [2,3].
Grapevines are the host for the largest number of viruses among cultivated species [4]. More than 85 grapevine-infecting viruses are currently known [5]. About 30 of them are caused by the 4 disease complexes [3]. Sixteen nepoviruses are associated with infectious degeneration and decline and characterized by infectious malformations, yellow mosaic, vein banding, stunting and decline [6]. Five viruses are associated with grapevine leafroll disease and cause the reddening or yellowing of leaves with veins remaining green and the down-rolling of leaves [7][8][9]. Six viruses among them are vitiviruses and Grapevine rupestris stem pitting-associated virus (GRSPaV), which affect the woody cylinder, cambium tissues, bark and associated with rugose wood complex [10]. Four viruses of the fleck complex cause clearing of the small veins, with the appearance of translucent spots, wrinkling, twisting and curling upward of the leaves [3].
Traditional diagnostics methods based on the use of immunological (ELISA and lateral flow immunoassay) or molecular methods (PCR, reverse transcription PCR (RT-PCR), real-time RT-PCR, nested PCR, DNA microarray, and loop-mediated isothermal amplification) only allow the detection of viruses if information about the genome of the virus or antibodies against it is available. In contrast, high-throughput sequencing (HTS) technology provides a unique opportunity to detect the presence of both previously described and unknown viruses or viroids in the sample [11][12][13].
Despite the broad capabilities of high-throughput sequencing, there is currently no universal protocol for detecting viruses and obtaining complete information about the nucleotide sequence of any viral genome. The choice of protocol depends on the type of plant, the tissue under study, the virus to be detected, and the experience of the scientist [14,15].
One of the most commonly used methods for the detection of grapevine viruses is sRNA sequencing. It is believed that by using sRNA HTS, it is difficult to detect viruses that are constantly present in a plant, and it is not always possible to assemble complete genomes of new viruses [15]. However, despite existing difficulties, this approach has made it possible to establish sequences of five new viruses, including Grapevine Pinot Gris virus [16], Grapevine Roditis leaf-discoloration-associated virus [17], Grapevine vein clearing virus [18] and others [19,20], and examine the viromes of vineyards [16,[21][22][23].
The aim of this study was to investigate the viromes of vineyards and to study the distribution of viruses in vineyards in Krasnodar Krai, using the sRNA high-throughput sequencing method.

Bioinformatics Analysis of HTS Results
The sequencing reads were processed using the Geneious Prime v. 2020.0.4 software package (Biomatters, Ltd., Auckland, New Zealand) [25] according to the scheme presented in Figure 1. We used two bioinformatics pipelines named 'with Vitis' and 'Vitis free'. According to both pipelines, the reads were trimmed using the BBDuk tool (minimum length 10 bp, minimum quality 20), and quality control was carried out. Then, for the pipeline 'Vitis free', the reads were mapped to the grapevine reference genome (GCF_000003745), and unmapped reads were used for the removal of duplicated reads to obtain unique (nonredundant) reads and their de novo assembly. The step of mapping to the grapevine reference genome was avoided in the pipeline 'with Vitis', and redundant reads were directly used for the removal of duplicated reads and de novo assembly. For de novo assembly of non-redundant reads according to both pipelines, we used the Geneious and SPAdes assemblers and tested different kmers (−15, −17, −19). The resulting contigs were analyzed using blastn against the NCBI GenBank RefSeq of viruses and viroids (release date 31 January 2021). For further analysis, we selected contigs of plant viruses with E-Values below −5 to obtain the list of viruses and viroids found in the sRNA libraries. These contigs were also analyzed using the blastn tool in NCBI.
To assess the number of reads corresponding to each virus, we mapped redundant and non-redundant reads of both pipelines to the reference sequences of detected viruses with different sensitivity levels: low and medium-low sensitivity with default program parameters as well as custom sensitivity. The parameters of the custom sensitivity were as follows: gaps maximum per read-10%, maximum gap size-3, word length-18, ignore words repeated more than 12 times, maximum mismatches per read-10%, index word length-13, and maximum ambiguity-4. The coverage of the genomes of the detected viruses was calculated as a percentage after mapping the obtained reads to the corresponding reference genomes.

Data Validation by RT-PCR
Validation of the presence of each virus detected in silico was initially performed by RT-PCR on all pools of total RNA used for the sRNA HTS library preparation. Only viruses and viroids for which at least one contig with an E-Value below −5 was found were validated. If a virus was detected in the pool, we carried out PCR analysis on each total RNA sample of the corresponding library. Reverse transcription was performed using 1 µg of total RNA, Random Hexamer primers, and the RevertAid H Minus Reverse Transcriptase (EP0452, Thermo Fisher Scientific, Waltham, MA, USA) in accordance with the manufacturer's protocol.
One microliter of cDNA was used to perform PCR for the 18S rRNA gene for quality control and then to detect viruses using Taq DNA Polymerase (EP042, Thermo Fisher Scientific, USA) and previously published primers as well as primers designed on the basis of HTS reads (Supplementary Table S2). The visualization of PCR results was carried out by electrophoresis in 1.2% agarose gel stained with ethidium bromide. One amplicon was sequenced for each virus from two primers using the Sanger method using the Big Dye Terminator v.3.1 Cycle Sequencing Kit on an ABI PRIZM 3730 DNA Analyzer (Thermo Fisher Scientific, USA) in accordance with the manufacturer's instructions.
Sequencing results were analyzed using Finch TV 1.4.0 software [26]. After validation, the assembled sequences were submitted to GenBank (www.ncbi.nlm.nih.gov/genbank/) (accessed on 30 November 2021) [27]. Additionally, we determined the Pairwise Identity (%) by comparing the validated sequences with each other, with the reference genomes (using the Geneious Prime), and with the closest genomes (according to the results of the blastn analysis) (in NCBI).

Phylogenetic Analysis
To carry out the phylogenetic analysis, we amplified a fragment of the RNA2 polyprotein gene for GFLV and GRVFV, a fragment of the hsp70 gene for GLRaV-4, and fragments of coat protein genes for GLRaV-1, GLRaV-2, GLRaV-3, GFkV, GVT, GPGV, GRSPaV, GVA, GSyV-1, and RBDV. From each vineyard, we considered one sequenced isolate for this analysis. PCR was performed with 1 µL of cDNA using Phusion Hot Start II High-Fidelity DNA Polymerase (F530L, Thermo Fisher Scientific, USA) and previously published primers or those designed on the basis of HTS reads (Supplementary Table S3). The real-time PCR thermal conditions for all genes were 95 • C for 10 min, followed by 40 cycles of 95 • C for 10 s, and 55 • C (signal collection temperature) for 1 min. The amplification conditions were 98 • C for 30 s, followed by 40 cycles of 98 • C for 10 s, 56 • C for 30 s, 72 • C for 30 s per 1 kb, and a final elongation step of 72 • C for 10 min. The amplicons were sequenced bidirectionally using the Sanger method, and the nucleotide sequence was assembled using the Finch TV 1.4.0 program and submitted to GenBank (www.ncbi.nlm.nih.gov/genbank/) (accessed on 30 November 2021) [27] (Supplementary Table S4). Moreover, to determine the GLRaV-2 group, we used in silico RFLP analysis [28,29]. Multiple sequence alignment was performed using the Geneious Alignment tool of the Geneious Prime program. To determine the pairwise identity (%), the resulting sequences were compared with each other, with the reference genome, and with the closest genome. Phylogenetic trees were constructed using the maximum likelihood (ML) method with 1000 bootstrap replicates [30] in the RDP v4.10 program [31].

Recombination Analysis
The nucleotide sequences used for the phylogeny study of the detected viruses and others taken from GenBank were analyzed for the presence of recombinations, using the RDP v4.10 program with default program parameters. This program includes a set of eight recombination detection tools [31]. Recombination sites identified by three or more of the eight tools were considered 'significant and clear recombination events' [32]. Sites detected by two or fewer methods were considered 'tentative recombination events'.

Phytosanitary Monitoring
Phytosanitary monitoring of 12 commercial vineyards planted with 18 cultivars was carried out in Krasnodar Krai from July 2017 to September 2018 ( Figure 2). Among the selected vines, cultivars of Russian and foreign selection that are widespread worldwide were identified. Out of the 38 selected plants, 14 samples showed leafroll symptoms, four showed mosaic symptoms, 14 showed fanleaf symptoms, and 14 showed reddening of varying extents. We also observed shoot deformation, points along the veins and over the entire surface of the leaf, circles on the leaves, overgrowth, and diminishment of absence of leaves (Supplementary Table S1). Photographs of the selected plants are presented in the Supplementary Photographic Materials.

Results of the sRNA HTS Bioinformatics Analysis
As a result of sequencing of the sRNA libraries, we obtained 14.8-20.6 million reads for each library. After trimming the data, 14.3-20 million reads remained in the pipeline 'with Vitis' and 8.3-16 million redundant reads in the pipeline 'Vitis free'. The numbers of nonredundant reads were much lower, 1.8-2.8 million and 1.8-2.8 million reads, respectively (Supplementary Table S5).
After assembling unique reads using the SPAdes assembler, we obtained 156-1128 contigs with N50 from 149 until 227 bp. At the same time, the Geneious assembler allowed us to assemble a larger number of contigs: 136,580-259,103 but of smaller length (N50 = 56-57 bp). Contig assembly using SPAdes and Geneious with kmer15, −17, and −19 showed that, in most cases, assembling with kmer15 allowed more virus contigs to be detected than assembling with kmer17 and kmer19 (Supplementary Table S6).
When mapping all reads to the reference virus genomes, we also evaluated a number of Geneious program settings: low sensitivity, medium-low sensitivity, and custom sensitivity. As a result, we determined that, for our data, the best option was to map the reads using the 'custom sensitivity' settings. Using the default 'medium-low sensitivity' settings of the program resulted in the mapping of non-specific reads and a large number of gaps. The 'low sensitivity' parameters were found to be too strict for our data, i.e., reads with a small number of substitutions were not mapped to the reference sequence under these conditions. The results obtained from mapping the reads to reference virus sequences using different sensitivity parameters are shown in Supplementary Table S7.
The optimal pipeline for our data was found to be the pipeline 'with Vitis' including mapping of reads to reference sequences using the 'custom sensitivity' parameters.
According to the results of the blastn analysis of the de novo assembly of reads into contigs, none of the data processing algorithms allowed us to detect GVT. However, as a result of mapping the unique and redundant reads ('with Vitis') of each of the libraries to the reference genome GVT NC_035203, we were able to find 7787 non-redundant reads in the Lib-03_S20 library, one non-redundant read in the Lib-04_S21 library, and two redundant read in the Lib-05_S22 library (Supplementary Table S6). In a number of earlier studies devoted to the analysis of the grapevine virome, GVT was not detected in sRNA libraries [22,[38][39][40]. However, a reanalysis of the results obtained in 2018 and 2020 [21,22] by bioinformatics and molecular methods confirmed the presence of the virus in Hungarian vineyards [41].
During the course of this study, based on the sRNA HTS results, we were unable to reconstruct the complete genomes of the detected viruses; however, there is evidence that this is possible [16][17][18]. Despite this, the approach based on using sRNA and combining material from several symptomatic samples for sequencing allowed us to perform monitoring and identify viruses present in the pooled sample at the lowest cost so that we could proceed to the analysis of each plant by RT-PCR at the next stage of validation. In the future, if a detailed analysis is required, it will be possible to sequence sRNA with large coverage or total RNA to obtain longer reads and carry out a detailed analysis of the virome of an individual sample.

Validation of the Results of sRNA HTS by RT-PCR and the Phylogenetic Analysis of Detected Viruses
Summary of virus detection by RT-PCR is presented in Figure 3. The most frequently detected viruses in the samples were GFkV, GPGV, GRSPaV, GSyV-1, GYSVd-1, and HSVd. These pathogens were found in more than 50% of the samples.
Moreover, the analyzed samples contained viruses from the Closteroviridae family-GLRaV-1 (13% of samples), GLRaV-2 (11%), GLRaV-3 (21%), and GLRaV-4 (5%); the Betaflexiviridae family-GVT (42%) and GVA (16%); the Secoviridae family-GFLV (11%); and the Tymovidae family-GRVFV (26%). ArMV, GDeV, GRSLaV, and GRGV could not be detected in the samples, which confirms the results from the reinvestigation of the automated pipeline usage during the bioinformatics analysis. Based on the validation results, 37 out of 38 plants were determined to be simultaneously infected with 4 to 11 pathogens (Supplementary Table S8). Thus, our results show that the reliability of HTS differs for the detection of different types of viruses, which is consistent with the data obtained earlier [21,22,42]. Grapevine diseases may be caused and modified by interactions among multiple infectious agents [43]. The symptoms cannot be correlated with a causative agent due to the existence of multiple infections in the vineyard.
As a next step, the nucleotide sequences obtained for each detected virus and viroid were phylogenetically analyzed. The numbers of sequences selected for the construction of a phylogenetic tree and the search for recombination sites of viral genome fragments are presented in Supplementary Table S9, while the results are presented below for each virus family.
GLRaV-1 was detected in five samples from three libraries. The Russian isolates cluster with isolates from China and Italy and belong to the second phylogenetic group (according to Lehad et al. (2019) [48] (Supplementary Figure S1). The coat protein genes were found to be 93.8% identical to each other and 89.6-94.4% identical to the GLRaV-1 reference sequence NC_016509 (Supplementary Table S4).
Eight recombination events were detected in GLRaV-1 coat protein genes (Supplementary Table S10). In the Russian isolate V1609 (MZ031984), potential parents were two isolates found in other vineyards: the major parent CA21 (JF811846.1) and the minor parent V471 (MW810492).
GLRaV-2 was detected in four samples from the Zs85-SV2_S19 library (10.5% of the total number of samples in the study). At the same time, we were unable to confirm the presence of the GLRaV-2RG strain, the contigs of which were mapped to GRSLaV NC_004724.1 using two pairs of primers. Using the restrictions of the GLR2CP1/GLR2CP2 fragment, we established that MZ065369 belongs to group 3, like other Russian isolates that we found earlier in Krasnodar Krai [29]. Phylogenetic analysis of the complete nucleotide sequence of the coat protein gene of the detected isolate showed that it is 87.9% identical to the sequence of the reference genome GLRaV-2 NC_007448. The Russian isolate was shown to cluster with isolates of the H4 CNP group [49], the Russian isolate published by us in 2019 [46], and isolates from Italy (Supplementary Figure S2).
Two recombination events were detected in GLRaV-2 coat protein genes (Supplementary Table S10).
GLRaV-3 was detected in three libraries and eight samples (21% of the total number of samples in the study). The sequences of the coat protein genes of the Russian isolates were found to have a high level of identity to each other (99.7%) and to the GLRaV-3 reference sequence NC_004667 (99-99.4%) (Supplementary Table S4). According to the phylogenetic analysis, the identified isolates clustered with representative isolates of phylogroup I (according to Diaz-Lara et al. (2018) [50]), next to isolates from Portugal, Chile, Greece, and China (Supplementary Figure S3).
Seven recombination events were detected in GLRaV-3 coat protein genes (Supplementary Table S10).
According to modern taxonomy, GLRaV-4 belongs to the Ampelovirus genus [51], while GLRaV-5, GLRaV-6, GLRaV-9, GLRaV-Car, GLRaV-Pr, and GLRaV-Ob are its strains [52]. During this study, GLRaV-4 was detected in only one sample in the Lib-03_S20 library. This virus was detected in Russia for the first time. We amplified a fragment of the hsp70 sequence, and as a result, we found that our isolate had a rather low percentage identity (73.1%) to the GLRaV-4 reference sequence NC_016416 (Supplementary Table S4); however, the percentage identity of this fragment of the viral genome with the closest isolate from the USA was 96.4%. The phylogenetic analysis showed that the identified isolate clustered with isolates from the USA, and this was supported by a bootstrap value of 93% (Supplementary Figure S4).
Secoviridae. GFLV (genus Nepovirus), which is responsible for fanleaf degeneration disease, is one of the most important grapevine viruses worldwide. The presence of this pathogen has been confirmed on all continents where grapes are grown [53]. In the Russian territory, this virus was previously detected in Krasnodar Krai and the Republic of Crimea [29,54].
During the study, we identified only one representative of the genus Nepovirus, GFLV, which was detected in four samples from two libraries (10.5% of the total number of samples in the study). Analysis of the sequence of the RNA2 polyprotein gene fragment obtained as a result of sequencing showed that the Russian isolates were 95.6% identical to each other and 90.3-94.1% identical to the reference gene NC_003523 (Supplementary  Table S4). According to the phylogenetic analysis, the identified isolates clustered into separate clades next to genetic variants from Italy, Spain, and Tunisia (Supplementary Figure S5).
Thirty-seven recombination events were detected in GFLV polyprotein sequences (Supplementary Table S10).
Betaflexiviridae. GVA, a member of the Vitivirus genus, is one of the most common grapevine viruses. According to the literature, GVA is found in almost all regions of the world where grapes are grown [55,56]. In Russia, we previously detected GVA in vineyards in the Republic of Crimea and Krasnodar Krai [29,57].
During this study, we were able to detect GVA in six plants from three libraries. The coat protein genes of the detected isolates were 87.5% identical to each other and 87.2-88.8% identical to the reference genome GVA NC_003604 (Supplementary Table S4). As a result of the phylogenetic analysis of the coat protein genes of the Russian isolates, we identified three GVA molecular groups (according to Alabi et al. (2014) [58]): I, II, and IV. The isolates clustered with genetic variants from Greece, China, and the USA (Supplementary Figure S6). Moreover, one isolate clustered separately from representative isolates of the previously described GVA molecular groups.
Eight recombination events were found in GVA sequences (Supplementary Table S10). [21,59,60]. In Russia, it has been found in commercial vineyards [29,61]. The data obtained in this study are in line with the trend showing that the virus is widespread in vineyards worldwide [62][63][64]. GRSPaV was detected in all libraries by sRNA HTS and in 35 samples using RT-PCR. For the phylogenetic analysis, we amplified and sequenced 22 sequences of the GRSPaV coat protein genes, which were 88.2% identical to each other and 81.9-98.5% identical to the GRSPaV reference sequence NC_001948 (Supplementary Table S4). The Russian isolates had a uniform distribution throughout the phylogenetic tree. Eleven isolates belonged to the GRSPaV molecular group III, four to group VII, two to group I, and three to group II (according to Hily et al. (2018) and Selmi et al. (2020) [65,66]). Two isolates clustered separately from the representative isolates of the previously described GRSPaV molecular groups (Supplementary Figure S7).

GRSPaV (Foveavirus) is a widespread virus that infects grapevines
Twenty-three recombination events were found in GRSPaV coat protein genes (Supplementary Table S10), which may explain why frequent recombination can occur [67] as well as the existence of several remarkably different strains of this virus.
Another representative of the Foveavirus genus that we detected in the analyzed libraries was GVT. Recently, an analysis of the results of total RNA library sequencing allowed us to identify this virus in a commercial vineyard [47]. In this study, as a result of the bioinformatics analysis, GVT reads were found in three libraries. The sequencing data validation approach that we used (first examining the RNA library pools and then, for positive results, examining each sample for the presence of the virus) made it possible to identify GVT in all five sRNA libraries and in 16 samples (42.1% of the total number of samples in the study). Phylogenetic analysis of the coat protein genes of the Russian GVT isolates and sequences from GenBank showed a uniform distribution of the Russian isolates throughout the phylogenetic tree (Supplementary Figure S8). Our isolates clustered together with isolates that were previously assigned to groups IV, V, and I by Demian et al. (2021) [41]. Sequencing of sRNAs for the detection of viruses of the Foveavirus genus was performed previously, in particular, for GRSPaV [16][17][18]. However, in these studies, scientists were not able to detect GVT. Our results support the idea that the combined application of computer analysis methods for sequenced sRNAs and molecular biological methods is necessary for the most complete characterization of the grapevine virome, at least for some viruses, including GVT.
Five recombination events were detected in GVT sequences (Supplementary Table S10).
One recombination event was detected in the GPGV coat protein gene (Supplementary Table S10).
Tymovidae. GFkV (genus Maculavirus) is widespread worldwide [95]. Recently, it was detected in the territories of Bosnia and Herzegovina [96], the USA [97], Canada [98], Macedonia [99], the Republic of Korea [100], and the United Kingdom [101], but it is known to also be present in other countries. This is the first time that this virus has been described in the Russian territory [29]. We detected GFkV in all libraries by the bioinformatics method (Supplementary Table S6) and by RT-PCR (27 samples, 71.1%). The coat protein genes of the detected isolates were 94.5-97.9% identical to the Italian GFkV reference sequence NC_003347 and 96.1% identical to each other (Supplementary Table S4). The Russian isolates had a uniform distribution throughout the phylogenetic tree (Supplementary Figure S10). The analysis found no recombination events in GFkV coat protein genes (Supplementary Table S10).
Since the discovery of GRVFV (tentative genus Marafivirus) in vineyards in Greece (Elbeaino et al. (2001)), this virus has been found in a number of European countries. It has also been detected in China [102], Australia [103], Canada [104], Pakistan [105], Iran [106], Japan [107], and the Republic of Korea [108]. GRVFV was first detected in Russia during the analysis of the virome of a vineyard [47]. In this study, we detected GRVFV using bioinformatics methods in each library and validated its presence in 10 samples (26.3%) (Supplementary Table S10). Comparison of the genome fragments of Russian isolates showed that they were 87.7% identical to each other and 84.2-91.8% identical to the GRVFV reference sequence NC_034205, which indicates the high genetic diversity of this virus (Supplementary Table S4). The Russian isolates had a uniform distribution throughout the phylogenetic tree (Supplementary Figure S11). The analysis found no recombination events in sequences of the GRVFV genome fragments (Supplementary Table S10).
GSyV-1 (Marafivirus) has been described as a virus that infects grapevines in the USA [109], Chile [110], China [111], Spain [112], Italy [16], Republic of Korea [113], Hungary [114], Africa [115], Turkey [116], Slovakia [117], and other countries. This virus was found for the first time here in the Russian territory. GSyV-1 was detected using bioinformatics methods in all libraries, and this was confirmed by RT-PCR (33 samples infected, or 86.8%) (Supplementary Table S4 and Supplementary Table S9). For the phylogenetic analysis, the coat protein genes of 11 isolates were amplified. The obtained sequences were 93.9% identical to each other and 91.4-98.3% identical to the GSyV-1 reference sequence NC_012484 (Supplementary Table S4). The results of the phylogenetic sequence analysis of the GSyV-1 coat protein gene are provided in Supplementary Figure S12. The GSyV-1 isolates obtained for the first time in Russia clustered with isolates from Brazil, Hungary, and USA; one isolate (MZ436820) clustered separately (Supplementary Figure S12). The analysis found no recombination events in GSyV-1 coat protein genes (Supplementary  Table S10).

Idaeovirus.
RBDV is an economically important virus that infects raspberries and grapevines [118]. It has been identified on raspberries in Slovenia, Hungary, Turkey, Japan, China, Argentina, Sweden, Finland, and Belarus and on grapevines in Slovenia [119], Hungary [120], and Serbia [121]. This virus was detected here for the first time in Russia; it was not previously detected on either raspberries or grapevines. Using the bioinformatics analysis, RBDV was found in one library, and using RT-PCR, its presence was shown in one sample. The phylogenetic analysis using a 930 bp fragment of the RNA1 gene of RBDV (MZ457018) showed that the Russian isolate clustered next to isolates from Serbia, Hungary, and Ecuador (Supplementary Figure S13), while the phylogenetic analysis using the sequence of the RNA2 coat protein gene of RBDV (MZ457017) showed that the Russian isolate clustered next to isolates from Slovenia and Hungary (Supplementary Figure S13). It is worth noting that these isolates were sampled from both Rubus sp. and Vitis sp. In addition, a fragment of the RBDV RNA1 gene clustered next to isolates that were found not only on grapevines, but also on Rubus sp., Fraxinus excelsior, and Chenopodium quinoa plants, which indicates that there is a wide range of host plants for this pathogen. The nucleotide sequence of the RBDV coat protein gene (MZ457017) obtained by us was found to be 95.4% identical to the RBDV RNA2 reference sequence NC_003740. Moreover, a fragment of the RBDV RNA1 gene obtained as a result of data validation (MZ457018) was 94.4% identical to the RBDV RNA1 reference sequence NC_003739 (Supplementary Table S4). The analysis found no recombination events in RBDV sequences.

Viroids.
Grapevine viroids have been known for a long time and are widespread worldwide [122]. Currently, six viroids that infect grapevine are known [123]. Previously, we found HSVd and GYSVd-1 in Russia [47]. In this study, we detected the presence of four viroids: HSVd, AGVd, GYSVd-1 and GYSVd-2. It should be noted that AGVd and GYSVd-2 were identified in Russia for the first time.
Among all known viroids, HSVd has the widest range of host plants [124][125][126]. To date, the GenBank contains HSVd sequences found on hop, wild apple tree, apricot, sweet cherry, sour cherry, hibiscus, cucumber, grapevine, citrus fruits, plum, and peach [127]. As a result of the phylogenetic analysis of HSVd sequences, the Russian isolates were classified as belonging to the Plum-Hop/Cit3 group and the Hop group (Supplementary Figure S14). The sequences were found to be 98.8% identical to each other and 91.4-95.8% identical to the HSVd reference sequence NC_001351 (Supplementary Table S4). The analysis found no recombination events in HSVd sequences.
As a result of the bioinformatics analysis, GYSVd-1 was detected in all sRNA libraries, and this was confirmed by RT-PCR. It should be noted that in order to validate GYSVd-1, the number of amplification cycles for all libraries had to be increased from 40 to 45. PCR products obtained under the new conditions were also sequenced and their levels of similarity to GYSVd-1 were confirmed. The nucleotide sequences of these isolates were 97.8% identical to each other and 87.7-99.4% identical to the GYSVd-1 reference sequences NC_001920 (Supplementary Table S4). The Russian isolates of GYSVd-1 clustered on the phylogenetic tree next to isolates from Australia, Turkey, Spain, Chile, and Hungary (Supplementary Figure S15). No recombination events were detected in GYSVd-1 sequences.
GYSVd-2 was found in four out of five libraries (Zs85-SV2_S19, Lib-03_S20, Lib-04_S21, Lib-05_S22). In addition, according to the results of the de novo assembly by the SPAdes assembler, one contig (42 nucleotides) of GYSVd-2 was found in the Lib-06_S23 library, but the sequence was not amplified by RT-PCR. Moreover, the results of the blastn analysis showed that contig was 92.7% identical to the GYSVd-1 genome, which, together with the validation of GYSVd-1 by RT-PCR in the library Lib-06_S23, indicates that only GYSVd-1 is present in this library. The nucleotide sequences of the identified isolates were 97.9% identical to each other and 96.2-99.5% identical to the GYSVd-2 reference sequence NC_003612 (Supplementary Table S4). The Russian isolates of GYSVd-2 clustered on the phylogenetic tree next to isolates from China, Iran, India, the Republic of Korea, and Nigeria (Supplementary Figure S16). No recombination events were detected in sequences of GYSVd-2.
Thus, in this study, we used the HTS method to analyze sRNA libraries for the detection of plant viruses for the first time in Russia. The reliability and power of this approach were confirmed, making it possible to identify a wide species composition for the grapevine virome. Moreover, we detected five grapevine viruses and viroids for the first time in the Russian territory: GLRaV-4, GSyV-1, RBDV, AGVd, and GYSVd-2. Another 10 viruses and 2 viroids previously detected in Russia, some of which are economically important ones (for example, GLRaV-1, GLRaV-2, GLRaV-3, GVA, GFLV), were also detected by us in commercial vineyards.
The approach of using sRNA and combining material from several symptomatic samples for sequencing allowed us to carry out monitoring and determine the virome of the combined sample at the lowest cost, and at the next stage of validation, we were able to analyze the virome of an individual plant by RT-PCR. The detection of at least a few contigs or reads of a virus in a plant allows additional research to be performed in the future and the complete genome of the detected virus to be assembled using higher coverage of sequencing sRNA or, more preferably, the total RNA from an individual sample.
Our data show that exact analysis of the virome of each sample requires a combination of the sRNA HTS method and molecular biology-based methods. In particular, Sanger sequencing of the coat protein genes of the detected viruses made it possible to carry out a phylogenetic analysis and identify the molecular groups of the studied viruses. The phylogenetic analysis of the sequences of the detected viruses identified the presence of GVA in groups I, II and IV; GRSPaV in groups I, II, III, VII, and VIII; GVT in groups I, IV, and V; GLRaV-1 in group II; GLRaV-2 in group H4 CNP; GLRaV-3 in group I; and HSVd in groups Hop and Plum-Hop/Cit3. The clustering of the Russian isolates (HSVd, GLRaV-1, GRSPaV, and GVA) separately from the representative members of the groups from the GenBank database indicates their genetic diversity and suggests the presence of new molecular groups. This suggests that further analysis of the complete genomes of these viruses is needed. In addition, our recombination analysis of genome fragments of detected viruses allowed us to expand knowledge about their genetic variability and to discover new recombination sites for further confirmation using molecular biological methods.
Our research shows the need for the use of different approaches in the analysis of bioinformatics data. In particular, the use of two assemblers and the analysis of data without removing the reads of the host plant made it possible to thoroughly characterize the species composition of the virome. Moreover, using GVT as an example, we demonstrated that, at least for some viruses, mapping of reads from sRNA libraries to the reference genomes of grapevine viruses and viroids leads to the most reliable detection.
Both well-known varieties, such as Merlot, Saperavi, and Cabernet Sauvignon, and valuable local wine varieties, Krasnostop Zolotovsky, for example, were affected by a number of viruses, including economically important ones. The largest number of viruses was found in the samples of table varieties that can be explained by the source of the planting material. Wine varieties are much more common in Russia, but the importance of table varieties is also great; therefore, it is important to use high-quality planting materials and methods of viral infection elimination, which should be introduced in the grapevine selection.
In this study, among the detected viruses were both the economically important GLRaV-1, GLRaV-3, and GFLV, and viruses that were recently discovered (GVT, GSyV-1, and so on); their harmfulness is still being studied. Economically important viruses cause systemic infections in vineyards, reducing the yield and quality of berries, and they are used in the certification of planting material in many countries. Of particular importance are viruses that do not show symptoms and do not strongly affect the yield but are important for grafting, for example, and can interact with other viruses and strengthen their symptoms. Thus, it is important to obtain the most complete information about the viruses that cause the symptoms using metagenomics methods, as well as the viruses of which the infection is asymptomatic. In addition, the information about the presence of a virus complex in the virome can help in phenomics and in the study of the correlation between the manifestation of symptoms and the presence of viruses in a plant.

Conclusions
The application of the HTS method to the analysis of sRNA libraries in order to study the biodiversity of viral pathogens in Russian vineyards was shown to be expedient and useful. The complexity of virus populations detected in Krasnodar Krai highlights the need for detection methods that are able to identify all viruses and their variants in vineyards. A lack of systemic phytosanitary monitoring and agrotechnical, pest, carrier, and vector control are possible reasons for the presence of multiple infections. As the isolates are usually not clustered together on the phylogenetic trees, geographically, it is highly possible that they entered the country independently through infected planting material. The study of viral associations, with multiple infections, can help in the study of pathogenic effects on host plants as well as in the certification of plant material when laying vineyards. A comprehensive, coordinated approach is imperative to reduce the viral load.