A Diverse Virome of Leafroll-Infected Grapevine Unveiled by dsRNA Sequencing

Quebec is the third-largest wine grape producing province in Canada, and the industry is constantly expanding. Traditionally, 90% of the grapevine cultivars grown in Quebec were winter hardy and largely dominated by interspecific hybrid Vitis sp. cultivars. Over the years, the winter protection techniques adopted by growers and climate changes have offered an opportunity to establish V. vinifera L. cultivars (e.g., Pinot noir). We characterized the virome of leafroll-infected interspecific hybrid cultivar and compared it to the virome of V. vinifera cultivar to support and facilitate the transition of the industry. A dsRNA sequencing method was used to sequence symptomatic and asymptomatic grapevine leaves of different cultivars. The results suggested a complex virome in terms of composition, abundance, richness, and phylogenetic diversity. Three viruses, grapevine Rupestris stem pitting-associated virus, grapevine leafroll-associated virus (GLRaV) 3 and 2 and hop stunt viroid (HSVd) largely dominated the virome. However, their presence and abundance varied among grapevine cultivars. The symptomless grapevine cultivar Vidal was frequently infected by multiple virus and viroid species and different strains of the same virus, including GLRaV-3 and 2. Our data show that viruses and viroids associated with the highest number of grapevines expressing symptoms included HSVd, GLRaV-3 and GLRaV-2, in gradient order. However, co-occurrence analysis revealed that the presence of GLRaV species was randomly associated with the development of virus-like symptoms. These findings and their implications for grapevine leafroll disease management are discussed.


Introduction
The Canadian grape industry is constantly expanding, with a production area of more than 12,000 ha in 2019 for a total revenue of CAD 9 billion dollars [1]. Quebec is the third-largest wine grape producer in Canada in terms of acreage, tonnage and wine grape sales. Quebec's industry is also increasing the area dedicated to grapevine production. Traditionally 90% of the grapevine cultivars cultivated in Quebec were rustic (winter hardy), with nearly 50% of the grapevine production based on an interspecific hybrid Vitis sp. cultivar named Vidal, which was developed in the 1930s, because these hybrid cultivars were able to survive harsh winter conditions and produced mature berries in the short and warm growing season. Over the years, knowledge about the most promising cultivars and Most of our actions in agriculture to cure diseases are in fact done in a reactive fashion, i.e., actions are taken to control the disease once it is established on the crop. In the case of the relationship between grapevines and viruses, it is essential to switch from pathosystems to a more holistic view of the system where grapevine and its virome constitute a micro-ecosystem. From this standpoint, we would benefit from increasing our knowledge on disease epidemiology, anticipating potential threats and designing knowledge-based mitigation strategies before virus epidemics become endemic and difficult to manage [22,23]. Therefore, the objectives of the present work were to characterize the virome of the interspecific hybrid cultivar and compare it to the virome of V. vinifera cultivar. Specifically, the aims were to (i) characterize the diversity of the virome of leafroll-infected leaves of different grapevine cultivars under north-eastern weather conditions, (ii) investigate the association between viruses and symptom development and (iii) determine the genetic diversity of viruses and viroids that were detected for the first time in QC.

Plant Material
Leaf samples of asymptomatic and symptomatic leafroll-infected grapevine were collected from three different vineyards in Quebec, Canada (Figure 1). The presence of GLR was confirmed by RT-PCR using primers described by Xiao et al. [26] and Poojari et al. [27]. The first vineyard was at the Agriculture and Agri-Food Canada's experimental farm in Frelighsburg (Fr, latitude 45 • 03 12 N; longitude 72 • 51 42 W), the second at a commercial vineyard in Hemmingford (Hem, latitude 45 • 02 50.99 N; longitude 73 • 35 4.04 W), and the third at a commercial vineyard in Saint-Jacques-le-Mineur (SJM, latitude 45 • 16 39 N; longitude 73 • 25 4.82 W). The frequencies of sampled grapevine cultivars were 50%, 36%, 7%, 4% and 3% of Vidal (hybrid), Pinot noir (Vitis vinifera), Marechal Foch (hybrid), DM85 (hybrid), Seyval blanc (hybrid), respectively. A total of 140 composite samples from 28 grapevine plants (five entire leaves with petiole per plant) were collected per plant between July and September 2018 and 2019 and placed in a plastic bag or a sterile 50-mL centrifuge tube and brought back to the laboratory for cold storage at −20 • C. Leaves were washed with distilled water and roughly crushed before being homogenized in a liquid-nitrogen-cooled 50-mL stainless-steel (SS) grinding jar with one 20-25-mm SS ball with a Retsch tissue lyser MM 400 (Retsch, Haan, Germany). The powdered leaves (2.5-3 g), were then transferred in sterile 50-mL centrifuge tubes and kept at −80 • C until proceeding with double-stranded RNA (dsRNA) extraction.

dsRNA Extraction, Libraries Construction and Sequencing
Extraction of dsRNA from grapevine leaves was performed based on a modified version of the technique from Kesanakurti et al. [28]. Differences brought to the protocol were: black bean (Phaseolus vulgaris, Black Turtle Baking Beans (Vesey, York, PEI, Canada) was added as positive control to the homogenized sample-extraction buffer mix at a final concentration of 1% (w/w), and DNase I and RNase T1 digestions were made after the final dsRNA elution. The cultivar of P. vulgaris contains distinct endornavirus species, including Phaseolus vulgaris endorvirus1 (PvEV1), which was used to assess the efficiency of dsRNA extraction and for monitoring of correctness of high throughput sequencing results. Then digestion was stopped with EDTA 50 mM for 10 min at 65 • C. The dsRNA was denatured at 99 • C for 5 min in the presence of 1 µL of 60 µM random primers and 1 µL of 10 mM dNTP and 4 µL First Strand buffer, and then reverse transcribed with 400 U of Superscript III (Invitrogen) in a final volume of 20 µL for 50 min at 65 • C. The second-strand DNA was synthetized using Klenow polymerase DNA (Promega, Canada) treated with RNase H to get rid of any remaining RNA hybrids, and then cleaned using Agencourt AMPure XP magnetic beads (Beckman-Coulter). Libraries (28, 300 bp mean insert size) were constructed using the Nextera XT DNA library prep kit (Illumina) with 1 ng of input double-stranded cDNA. MiSeq 2 × 250 cycle paired-end sequencing was conducted using MiSeq Reagent Nano Kits v2 in the Illumina Miseq sequencer. The black bean added to all grapevine samples was sequenced alone to differentiate its virome from the grapevine virome associated with each sample.

Diversity of Virome in Leafroll-Infected Leaves of Different Grapevine Cultivars
The grapevine plants infected with leafroll expressed different intensity of symptoms. The cultivar Vidal, an interspecific hybrid, expressed little to no visible leafroll symptoms in comparison with other cultivars of Vitis vinifera (e.g., Pinot gris) (Figure 1). Over 2,228,600 reads were obtained with a percentage of mapped viral reads ranging between 3.12% and 19.50% of the total number of reads. The viral communities were relatively well-sampled across the three sampling sites. The plateau in terms of number of virus species was detected after a cumulus of 17 samples, and 60% of the viral species were detected after a cumulus of five samples, suggesting that many of the viral species were present in the majority of sampled plants (Figure 2A). The three most abundant virus species in the leafroll-infected samples were grapevine Ruspestris stem pitting virus (GRSPaV), grapevine leafroll-associated virus 3 (GLRaV-3) and grapevine leafrollassociated virus 2 (GLRaV-2). Hop stunt viroid (HSVd) was the fourth most abundant species across the grapevine samples ( Figure 2B). Using the dsRNA sequencing method, it was possible to detect RNA and DNA viruses and viroids. Except for GLRaV-3, all the viruses and viroids reported in Figure 3 were detected for the first time in Quebec (Table 1). Even though three viruses (GRSPaV, GLR and GLRaV-2) and one viroid (HSVd) largely dominated the virome, their presence and abundance can be sorted by grapevine cultivars (Figure 3). In the hybrid cultivar Vidal, GRSPaV dominated the virome with 61% of the viral population, whereas GLRaV-3 and GLRaV-2 represented 13% and 18% of the viral population, respectively, and the other viruses occupied between 0.001% and 0.1% of the virome. Grapevine virus H was only present in the hybrid cultivar Vidal. In the cultivar Pinot noir (Vitis vinifera), GLRaV-2 dominated the virome with 43% of the viral population, whereas GRSPaV, GLRaV-3 and HSVd represented 7%, 38% and 8% of the viral population, respectively, and the other viruses occupied between 0.01% and 2.7% of the virome. In the cultivar Marechal Foch (hybrid cultivar), GLRaV-3 dominated the virome with 43% of the viral population, whereas GRSPaV, GVE and HSVd represented 37%, 13% and 5.6% of the viral population, respectively, and the other viruses occupied between 0.01% and 0.7% of the virome. The HSVd was only detected in the Pinot noir and Marechal Foch cultivars, whereas grapevine red blotch-associated

Raw Data Treatment
The raw data fastq files were demultiplexed and checked for sequencing quality using fastQc (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Removal of adapter sequencing and reads quality trimming (minimum quality score of 20) were performed using Trimmomatic V.0.32 [29]. Paired fastq files were imported in two different pipelines developed to detect and discover viruses. The first pipeline used was Virtool, an in-house implemented pipeline developed by Rott et al. [30], and the second pipeline was VirFind, an online tool developed by Ho and Tzanetakis [31]. From Virtool, the depth, coverage and weight were deducted. The depth represents the number of times a genome is covered by mapped reads, the coverage measures the percentage of the mapped reads that cover the viral genome, and the weight represents the proportion of reads mapping to a virus that is proportional to the titer. Consensus virus detected by both pipelines was considered as a positive detection. In general, a coverage greater than 0.5 and a weight greater than 0.001 were considered as positive detection of a given virus. In addition, only samples with positive detection of the positive control virus (PvEV1) were conserved for further analysis. The mean proportion of viral read that mapped for a given virus (MPVR), total number of symptomatic leaves (TNSL) associated with a given virus, mean depth (MD), mean depth relative to the depth of the positive control virus (MDRC), mean relative abundance (MRA) and mean weight (MW) were calculated for each virus. The depth was used as a measure of absolute abundance and, since many multivariate methods are sensitive to the total abundance [32], the relative abundance of each detected virus in each sample was calculated using a function of the vegan package. Vegan functions (specpool, estimateR) were used to estimate virus species richness based on incidence in sample plants. The richness Z score and the richness of each detected virus relative to the richness of the positive control virus (PvEV1) were calculated.

Analysis of the Diversity of the Virome and Association between Viruses and Symptom Development
Because grapevine is subject to infection by more than 80 different viruses and viroids, obtaining the ideal number of samples to capture the diversity is challenging. Therefore, the virus species accumulation and abundance as function of virus species rank curves were made using the vegan and ade4 packages to assess the robustness of our sampling effort.
After standardization of the data using the generic function "scale" (R built-in command) to make variables comparable, the distance matrix was calculated, and functions of the ComplexHeatmap package with clustering were used to visualize the heatmap made from Pearson correlation analysis showing the association between variables (MPVR, MW, TNSL, MDRC, MD, MRA and genome size [GS]) and virus species. The pam function of the cluster package was used for partitioning of the data into k clusters around Medoids (k-Medoids, (k = 3)) because it is less sensitive to outliers compared to k-means [33]. In addition, the matrix of data displaying the normalized relative abundance with virus species in columns and the grapevine samples in rows was used to generate a heatmap and a hierarchical clustering to classify virus species and grapevine samples into different groups. The Bray-Curtis dissimilarity matrix and average linkage distance were generated using the vegan package.
To investigate the degree to which viruses species, leafroll symptoms and grapevine cultivars are negatively or positively associated with each other, a recently-published co-occurrence model that is metric-free, distribution-free and randomization-free was used [34,35]. The model determined the probability that observed frequency of co-occurrence between two events (e.g., virus or symptoms presence in a given sample) is significantly greater (large, Pgt) than expected (meaning a positive association), significantly less (small, Plt) than expected (meaning negative association), or not significantly different and approximately equal to expectation (meaning random association) [34,35]. Therefore, Pgt < 0.05 is considered to highlight a positive co-occurrence between the two events, and Plt < 0.05 is interpreted as a negative co-occurrence between the two events. The R package co-occur was used to determine the co-occurrence, and the probability (P) that two species (viruses or grapevine species) co-occur at j number of sampling units (a grapevine plant) is calculated as follows: where N 1 is the number of sampling units where the first event occurs, N 2 is number of sampling units where the second event occurs, and N is the total number of sampling units. To gain more robustness, additional data were gathered from five published manuscripts using the same NGS methodologies to characterize the virome of grapevine plants [9,[36][37][38][39]. In summary, a total of 66 sampling units and 1452 points of comparison were obtained and used in this analysis. Additionally, any event pairs that were expected to share less than one sampling unit were removed from the analysis.
To determine the genetic diversity of important viruses and viroids that were detected for the first time in QC, phylogenetic trees were generated from nucleotide alignments of partial (concatenated, details in Tables S2 and S3, respectively) or complete sequences of the genome from samples that were positive for a given virus. Maximum likelihood algorithms were carried out using bootstrap (1000 pseudo-random iterations) to assess the confidence of the branching pattern, and the phylogram package in R was used. Full genomes from different clades and geographical regions were downloaded from GenBank and added to the tree. Selection of reference genomes was performed by running a BLAST in GenBank using an isolate as query and from the result page; the genome with the highest alignment scores was chosen. One genome of the same cluster with selected isolate and at least two other genomes from different clusters were selected. The Newick format file was exported to the interactive tree of life platform for better visualization and creation of high-quality figures [40].
To display an overview of the diversity among the sequences, pairwise nucleotides comparison analysis was used to generate percentage of identity and distance for each pair of sequences using CLC Main Workbench software (V 20.0.3). The percentage of identity was calculated by dividing the number of identical residues in the alignment positions with the total number of overlapping alignment positions between the two sequences. The distance was measured using the Jukes-Cantor distance between the two sequences and is the proportion between identical and overlapping alignment positions between two sequences.

Diversity of Virome in Leafroll-Infected Leaves of Different Grapevine Cultivars
The grapevine plants infected with leafroll expressed different intensity of symptoms. The cultivar Vidal, an interspecific hybrid, expressed little to no visible leafroll symptoms in comparison with other cultivars of Vitis vinifera (e.g., Pinot gris) ( Figure 1). Over 2,228,600 reads were obtained with a percentage of mapped viral reads ranging between 3.12% and 19.50% of the total number of reads. The viral communities were relatively well-sampled across the three sampling sites. The plateau in terms of number of virus species was detected after a cumulus of 17 samples, and 60% of the viral species were detected after a cumulus of five samples, suggesting that many of the viral species were present in the majority of sampled plants ( Figure 2A). The three most abundant virus species in the leafroll-infected samples were grapevine Ruspestris stem pitting virus (GRSPaV), grapevine leafroll-associated virus 3 (GLRaV-3) and grapevine leafroll-associated virus 2 (GLRaV-2). Hop stunt viroid (HSVd) was the fourth most abundant species across the grapevine samples ( Figure 2B). Using the dsRNA sequencing method, it was possible to detect RNA and DNA viruses and viroids. Except for GLRaV-3, all the viruses and viroids reported in Figure 3 were detected for the first time in Quebec (Table 1). Even though three viruses (GRSPaV, GLR and GLRaV-2) and one viroid (HSVd) largely dominated the virome, their presence and abundance can be sorted by grapevine cultivars (Figure 3). In the hybrid cultivar Vidal, GRSPaV dominated the virome with 61% of the viral population, whereas GLRaV-3 and GLRaV-2 represented 13% and 18% of the viral population, respectively, and the other viruses occupied between 0.001% and 0.1% of the virome. Grapevine virus H was only present in the hybrid cultivar Vidal. In the cultivar Pinot noir (Vitis vinifera), GLRaV-2 dominated the virome with 43% of the viral population, whereas GRSPaV, GLRaV-3 and HSVd represented 7%, 38% and 8% of the viral population, respectively, and the other viruses occupied between 0.01% and 2.7% of the virome. In the cultivar Marechal Foch (hybrid cultivar), GLRaV-3 dominated the virome with 43% of the viral population, whereas GRSPaV, GVE and HSVd represented 37%, 13% and 5.6% of the viral population, respectively, and the other viruses occupied between 0.01% and 0.7% of the virome. The HSVd was only detected in the Pinot noir and Marechal Foch cultivars, whereas grapevine red blotch-associated virus (GRBV) was only detected in the Pinot noir cultivar ( Figure 3A-D). When considering the species abundance relative to the abundance of the positive control virus (PvEV1) that was added to the samples, only GLRaV-2, GLRaV-3 and GRSPaV were more abundant than PvEV1. In Marechal Foch grapevine cultivar, none of the detected viruses seem to be more abundant than PvEV1 ( Figure 3D). Foch grapevine cultivar, none of the detected viruses seem to be more abundant than PvEV1 ( Figure  3D).     The optimal number of clusters based on independent variables (MPVR, MW, TNSL, MDRC, MD, MRA and GS) used as inputs was 2. One cluster was composed by GLRaV-3, GLRaV-2 and GRSPaV, and the other consisted of the remaining viruses. However, the latter cluster can be divided into two subclusters, one subcluster composed of grapevine asteroid mosaic virus (GAMaV), grapevine Fleck virus (GFkV), HSVd and grapevine pinot gris virus (GPGV) and a second composed of the rest of the viruses (Figure 4). When we collapsed the variables to have two dimensions using discriminant principal component analysis, the viral population was divided into four groups (GRSPaV, GLRaV-3, GLRaV-2 and the rest of the viruses) ( Figure S1). Viruses and viroids that were associated with the highest number of plants expressing symptoms (variable TNSL) were GRSPaV, HSVd, GLRaV-3, GPG, GRBV and GLRaV-2 in gradient order ( Figure 4). However, this association may be random and not directly linked to the symptoms because many NGS studies described GRSPaV as one of the viruses of the background virome. Nevertheless, the analysis of the association between virus and leafroll symptom development will bring more insights. The optimal number of clusters based on independent variables (MPVR, MW, TNSL, MDRC, MD, MRA and GS) used as inputs was 2. One cluster was composed by GLRaV-3, GLRaV-2 and GRSPaV, and the other consisted of the remaining viruses. However, the latter cluster can be divided into two subclusters, one subcluster composed of grapevine asteroid mosaic virus (GAMaV), grapevine Fleck virus (GFkV), HSVd and grapevine pinot gris virus (GPGV) and a second composed of the rest of the viruses (Figure 4). When we collapsed the variables to have two dimensions using discriminant principal component analysis, the viral population was divided into four groups (GRSPaV, GLRaV-3, GLRaV-2 and the rest of the viruses) ( Figure S1). Viruses and viroids that were associated with the highest number of plants expressing symptoms (variable TNSL) were GRSPaV, HSVd, GLRaV-3, GPG, GRBV and GLRaV-2 in gradient order (Figure 4). However, this association may be random and not directly linked to the symptoms because many NGS studies described GRSPaV as one of the viruses of the background virome. Nevertheless, the analysis of the association between virus and leafroll symptom development will bring more insights. . Heatmap displaying the characteristics and association between detected viruses and the mean proportion of viral reads that mapped for a given virus (MPVR), total number of symptomatic leaves (TNSL) associated with a given virus, mean depth (MD), mean depth relative to the depth of the positive control virus (MDRC), mean relative abundance (MRA), mean weight and genome size (GS). The pam function of the cluster package was used for partitioning the data into k cluster around Medoids (k-Medoids, (k = 3)) because it is less sensitive to outliers compared to k-means [33]. R packages ComplexHeatmap, circlize, colorspace, dendextend, cluster and GetoptLong were used (see materials and methods section for more details). For virus names and abbreviations, please see Table  1.
The viral signature (virome composition) was clustered by sampling site (Figure 5). Sample names starting with "Bac" were from the first commercial vineyard (Hem), those started with "DSJ" were from the second commercial vineyard (SJM), and the rest of the samples were from the experimental farm (Fr, see Plant Material section). Samples from the same site were in the same cluster, except for three samples from Hem vineyard, and the grapevine cultivar type seems to have no impact on it ( Figure 5). The virome in the Hem and SJM commercial vineyards was dominated by GLRaV-3 and GLRaV-2, respectively, whereas it was dominated by GRSPaV at the experimental vineyard, except for four samples ( Figure 5). The GRBV was only detected in the SJM commercial vineyard, and the new grapevine-associated tymo-like virus (GaTLV) was only detected in the Fr vineyard, whereas the grapevine asteroid mosaic-associated virus (GAMaV) was only detected in the Hem commercial vineyard ( Figure 5, Table 1).

Figure 4.
Heatmap displaying the characteristics and association between detected viruses and the mean proportion of viral reads that mapped for a given virus (MPVR), total number of symptomatic leaves (TNSL) associated with a given virus, mean depth (MD), mean depth relative to the depth of the positive control virus (MDRC), mean relative abundance (MRA), mean weight and genome size (GS). The pam function of the cluster package was used for partitioning the data into k cluster around Medoids (k-Medoids, (k = 3)) because it is less sensitive to outliers compared to k-means [33]. R packages ComplexHeatmap, circlize, colorspace, dendextend, cluster and GetoptLong were used (see materials and methods section for more details). For virus names and abbreviations, please see Table 1.
The viral signature (virome composition) was clustered by sampling site (Figure 5). Sample names starting with "Bac" were from the first commercial vineyard (Hem), those started with "DSJ" were from the second commercial vineyard (SJM), and the rest of the samples were from the experimental farm (Fr, see Plant Material section). Samples from the same site were in the same cluster, except for three samples from Hem vineyard, and the grapevine cultivar type seems to have no impact on it ( Figure 5). The virome in the Hem and SJM commercial vineyards was dominated by GLRaV-3 and GLRaV-2, respectively, whereas it was dominated by GRSPaV at the experimental vineyard, except for four samples ( Figure 5). The GRBV was only detected in the SJM commercial vineyard, and the new grapevine-associated tymo-like virus (GaTLV) was only detected in the Fr vineyard, whereas the grapevine asteroid mosaic-associated virus (GAMaV) was only detected in the Hem commercial vineyard ( Figure 5, Table 1).

Association between Viruses and Virus-Like Symptom Development
The co-occurrence analysis removed 75 pairs (32.7%) that had an expected co-occurrence <1, and 156 pairs of combination were analyzed. Sixteen positive associations, 17 negative associations and 123 random associations were found (Table 2). Only those significant of the most important associations were displayed, and the events that did not have sufficient occurrence data were removed ( Figure 6). Grapevine leafroll symptom expression had a significant positive association with the cultivar Vitis vinifera (Pgt < 0.00000) and a significant negative association with the cultivar Vidal (Hybrid) (Plt < 0.00000) (Table 2, Figure 6, Table S1). Presence of GLRaV species (GLRaV-2 and GLRaV-3) was randomly associated with symptom development. When the analyses of co-occurrence were done without the cultivar Vidal, the association between GLD species and symptom expression was still random, and grapevine leafroll-associated virus 3 (GLRaV-3) was significantly and negatively associated with hop stunt viroid (Pgt < 0.00050) and grapevine red blotch-associated virus (GRBV). However, the number of plants that were infected with GRBV was low (9), and therefore the negative association between GRBV and GLRaV-3 was considered as weak (Table 2, Figure 6). a Represents virus presence, symptom presence in a given sample or a given variety; b represents number of sampling sites where a given event occurs; c represents observed number of sampling sites having two given events; d represents probability that the two given events occur at a sampling site; e represents expected number of sampling sites with the two given events; f represents probability that the two events would co-occur at a frequency less than the observed number of co-occurrence sampling sites if the two events were distributed randomly; g represents probability of co-occurrence at a frequency greater than the observed frequency. Viruses 2020, 12, x FOR PEER REVIEW 12 of 19 Figure 6. The percent of total pairings for virus species, leafroll-like symptoms and grapevine cultivars that are positive, negative and random (boxplot top graph). The bar outline in white shows the assemblage-wide percentages. A heatmap of the positive, negative and random associations (bottom-left graph) displays the probabilistic co-occurrence that observed frequency of co-occurrence between two events (e.g., virus, viroids or symptoms presence in a given sample) is significantly greater (large) than expected (meaning a positive association), significantly less (small) than expected (meaning negative association), or not significantly different and approximately equal to expectation (meaning random association)). Events are positioned to indicate the columns and the rows that represent their pairwise relationships. Graphic displaying the scatter plot of observed versus expected co-occurrence (bottom-right graph). Each event pair in the analysis is represented by a circle colored based on whether it was classified as positive (orange), negative (dark green) or random (gray). Any event pairs that were expected to share less than one sampling unit were removed from the analysis. R package co-occur was used (see materials and methods section for more details). See Table 1 for virus names and abbreviations. The bar outline in white shows the assemblage-wide percentages. A heatmap of the positive, negative and random associations (bottom-left graph) displays the probabilistic co-occurrence that observed frequency of co-occurrence between two events (e.g., virus, viroids or symptoms presence in a given sample) is significantly greater (large) than expected (meaning a positive association), significantly less (small) than expected (meaning negative association), or not significantly different and approximately equal to expectation (meaning random association)). Events are positioned to indicate the columns and the rows that represent their pairwise relationships. Graphic displaying the scatter plot of observed versus expected co-occurrence (bottom-right graph). Each event pair in the analysis is represented by a circle colored based on whether it was classified as positive (orange), negative (dark green) or random (gray). Any event pairs that were expected to share less than one sampling unit were removed from the analysis. R package co-occur was used (see materials and methods section for more details). See Table 1 for virus names and abbreviations.

The Genetic Diversity of Viruses and Viroids Detected for the First Time in Quebec
Among all viruses that were detected, it was possible to recover consensus sequences for GLRaV-2, HSVd and GRSPaV phylogenetic analyses, and these sequences were deposited in GenBank (MT899925-MT899930, MT769768-MT769774, MT832848-MT832891 and MT855968-MT855978). For GLRaV-2, partial sequences (5788 bp) from six samples (Table S2), all from the cultivar Pinot noir, were recovered and used for the phylogenetic analysis. Four corresponding sequences from full genomes of GLRaV-2 from China, Brazil and Canada (BC) available in GenBank were aligned and used to construct a phylogenetic tree. All our isolates clustered in the same group with isolates originating from China and BC (Canada). Within this cluster, two subgroups were supported by a significant bootstrap value >70%. Comparative analysis revealed nucleotide (NT) identities in the range of 98.78% to 99.98% between our isolates and those from BC (Canada, MH814500) and China (KU508672). Our isolates were dissimilar to the isolates from Brazil (KX774192) and BC (Canada, MH814498) in terms of NT identity and Jukes-Cantor distance (70.78% to 78.30% NT identity, Figure 7A,B). For HSVd, seven whole genomes (297-299 bp) from five samples were used for the phylogenetic analysis. Three full genomes of HSVd from USA, China and India retrieved from GenBank were added for the alignment and the construction of the phylogenetic tree ( Figure 7C). The phylogenetic relationship within HSVd isolates could not be determined due to the low bootstrap support for the clustering ( Figure 7C). Comparative analysis revealed that HSVd sequences were highly similar to the isolate from China (HM357802) in terms of NT identity and Jukes-Cantor distance (97.66% to 100% NT identity, Figure 7D) and slightly dissimilar to the HSVd isolate from citrus in USA from 1988 (NC001351). Overall, our HSVd full genomes were highly similar to isolates from China, India and USA (92.01% to 100% NT identity) ( Figure 7C,D). For GRSPaV, a total of 14 sequences, 11 from our isolates (5244 bp) (Table S3) and three corresponding sequences from GRSPaV full genome (one from USA and two from France) obtained from GenBank, were aligned and used to construct a phylogenetic tree (Figure 8). Two main well-supported GRSPaV phylogenetic groups were detected in the analysis of our isolates. Most of our isolates grouped into distant large subclusters. The first cluster was closely related (94.96% to 96.13% NT identity) to the GenBank sequences of GRSPaV from France (MG938295) and USA (AY368590) (Figure 8). The second cluster was phylogenetically diverse (79.85% to 92.95% NT identity) with some isolates (BacPN4, DM85, BacMF6 and Co15_56J) closely related (89.84% to 92.29% NT identity) to the GenBank sequence of GRSPaV isolate from France (MG938303). Presence of any GRSPaV phylo-groups was not associated with grapevine cultivars (Figure 8). Overall, all isolates of GRSPaV were in the same Clade 1 with recombinant genomes of isolates from France (MG938295 and MG938303) and USA (AY368590).
cluster was phylogenetically diverse (79.85% to 92.95% NT identity) with some isolates (BacPN4, DM85, BacMF6 and Co15_56J) closely related (89.84% to 92.29% NT identity) to the GenBank sequence of GRSPaV isolate from France (MG938303). Presence of any GRSPaV phylo-groups was not associated with grapevine cultivars (Figure 8). Overall, all isolates of GRSPaV were in the same Clade 1 with recombinant genomes of isolates from France (MG938295 and MG938303) and USA (AY368590). Figure 7. Maximum likelihood phylogenetic tree constructed from recovered consensus partial sequences of (A) grapevine leafroll-associated virus 2 (GLRaV-2) and full genome of (C) hop stunt viroid (HSVd). Neighbor-joining construction method with 1000 bootstrap replicates was used. Branch length represents phylogenetic distances determined with distance matrices of nucleotide sequences. Purple dots above critical branches are significant bootstrap values (>70%). GenBank accession number of fully sequenced genomes from different countries are highlighted for reference. Isolates from this study are represented and are not highlighted. The percent of identity (purple-red scale) and Jukes-Cantor distance (yellow-red scale) are presented for each pair of sequences of (B) GLRaV-2 and (D) HSVd. The percent of identity is the proportion of identical residues in the alignment positions to overlapping alignment positions between the two sequences. The distance was calculated using the Jukes-Cantor distance between the two sequences and is the proportion between identical and overlapping alignment position between two sequences. R package phylogram and the interactive tree of life platform were used to generate the tree (see the materials and methods section for more details). Figure 7. Maximum likelihood phylogenetic tree constructed from recovered consensus partial sequences of (A) grapevine leafroll-associated virus 2 (GLRaV-2) and full genome of (C) hop stunt viroid (HSVd). Neighbor-joining construction method with 1000 bootstrap replicates was used. Branch length represents phylogenetic distances determined with distance matrices of nucleotide sequences. Purple dots above critical branches are significant bootstrap values (>70%). GenBank accession number of fully sequenced genomes from different countries are highlighted for reference. Isolates from this study are represented and are not highlighted. The percent of identity (purple-red scale) and Jukes-Cantor distance (yellow-red scale) are presented for each pair of sequences of (B) GLRaV-2 and (D) HSVd. The percent of identity is the proportion of identical residues in the alignment positions to overlapping alignment positions between the two sequences. The distance was calculated using the Jukes-Cantor distance between the two sequences and is the proportion between identical and overlapping alignment position between two sequences. R package phylogram and the interactive tree of life platform were used to generate the tree (see the materials and methods section for more details).  sequences of grapevine Ruspestris stem pitting-associated virus (GRSPaV). Neighbor-joining construction method with 1000 bootstrap replicates was used. Branch length represents phylogenetic distances determined with distances matrices of nucleotide sequences. Purple dots above critical branches are significant bootstrap values (>70%). GenBank accession numbers of fully sequenced genomes from different countries are highlighted for reference. Isolates from this study are represented and are not highlighted. The percent of identity (purple-red scale) and Jukes-Cantor distance (yellow-red scale) are represented for each pair of sequences of GRSPaV. The percent of identity is the proportion of identical residues in the alignment positions to overlapping alignment positions between the two sequences. The distance was calculated using the Jukes-Cantor distance between the two sequences and is the proportion between identical and overlapping alignment position between two sequences. R package phylogram and the interactive tree of life platform was used to generate the tree (see the materials and methods section for more details).

Discussion
Growers' increasing experience and expertise in northern viticulture and climate change represent an opportunity for major change in Quebec's grapevine industry by allowing a shift from traditionally cultivated hybrid cultivars (e.g., interspecific hybrids Vidal) into cultivars with lower cold hardiness and disease tolerance, such as Vitis vinifera (Pinot noir). However, this situation comes with potential risk including grapevine leafroll disease (GLD), one of the most economically damaging grapevine diseases. Considering that the disease is latent in some grapevine hybrid cultivars (e.g., Vidal) and that these cultivars will continue to be grown, they can serve as a virus reservoir [14]. In this study, we characterized the virome of the interspecific hybrid cultivars and compared it to the virome of a V. vinifera cultivar to guide grapevine producers during the transition.
Grapevine leafroll-infected plants had a diverse virome in terms of composition, abundance and richness. Three viruses (GRSPaV, GLRaV-3 and GLRaV-2) and one viroid (HSVd) largely dominated the overall virome. However, their presence and abundance can be sorted by grapevine cultivars. Indeed, in the hybrid cultivar Vidal, GRSPaV dominated the virome with 61% of the viral population, and GLRaV-3 and GLRaV-2 represented 13% and 18% of the viral population, respectively, whereas in the cultivar Pinot noir (Vitis vinifera), GLRaV-2 dominated the virome with 43% of the viral population, and GRSPaV, GLRaV-3 and HSVd represented 7%, 38% and 8% of the viral population, respectively. The virome composition was clustered by sampling site and not by grapevine cultivars. These results suggested that GLRaV-2 and GLRaV-3 were widely distributed in Quebec and in the cultivar Vidal even though the relative abundance of these viruses was lower in Vidal than in Pinot noir. Therefore, Vidal cultivar could serve as a virus reservoir and play a key epidemiological role during the transition. In addition, Vidal cultivar did not express any notable visual symptoms despite the presence of both GLRaV-2 and GLRaV-3. Indeed, leafroll symptom expression had a significant positive association with the cultivar Vitis vinifera and a significant negative association with cultivar Vidal (Vitis sp. Hybrid). These results are supported by Beuve et al. [9] and Kovacs et al. [14], who noticed strong symptom expression associated with grapevine cultivar Pinot noir and that the grapevine leafroll viruses are latent in Vidal cultivar. Therefore, from a strictly GLD disease management decision-making standpoint, Vidal should be classified with cultivars such as Thompson Seedless and Sauvignon Blanc, in which GLD is nearly impossible to detect visually [41]. When establishing new cultivars (e.g., Vitis vinifera), growers cannot rely only on clean planting materials derived from virus-tested stocks. They should also test their vineyards for at least the presence of GLR (2 and 3), and the testing should particularly include the symptomless Vidal cultivar. In vineyards where it is not possible to randomly test enough Vidal plants, managers should consider maximizing the distance between new plantings and the existing grapevine plants to reduce potential GLD spread.
Most of the virus and viroid species detected, including GRSPaV, HSVd, grapevine syrah virus 1 (GSyV-1) grapevine fleck virus and grapevine Red Globe, can be considered as part of the "background" virome. These background viruses and viroids are not primarily critical in the expression of virus-like symptoms [42]. Indeed, the symptom expressions are believed to be associated with additional infection of rarely-damaging viruses (GLRaV-3, GLRaV-2, grapevine pinot gris virus, grapevine red blotch, etc.) that disrupt the existing virome, leading to the expression of symptoms [42]. Our data show that viruses and viroids associated with the highest number of plants expressing leafroll symptoms (variables TNSL) were GRSPaV, HSVd, GLRaV-3, GPG, GRBV and GLRaV-2, in gradient order. However, co-occurrence analysis revealed that the presence of GLR species (GLRaV-2 and GLRaV-3) was randomly associated with symptom development. This conclusion derived from analysis of 1452 points of comparison resulting from compilation of our data and data from five other published studies. In white grapevine cultivars such as Vidal, the symptoms are often subtle and difficult to diagnose, which can induce a bias in the co-occurrence analysis. However, even when the co-occurrence analyses were conducted without Vidal cultivar, the association between GLD virus species and symptom expression was still random. A possible explanation of this unexpected result may be that the link between GLR viruses (2 and 3) and the symptoms that have been demonstrated in many published studies is more likely related to the virus titer in a given plant than its presence or absence. Indeed, co-occurrence analyses are based on presence and absence and do not consider the virus titer. Hence, this is not an evidence of quantitative interaction between GLR viruses and symptom expression [43]. Monis and Bestwick demonstrated that symptomatic leafroll-infected leaves of grapevine had high titers of the virus and while undetectable virus titers was obtained from no leafroll symptoms samples [44].
Because symptoms result from complex abiotic and biotic interactions [8], the role of abiotic factors (e.g., weather) and the different genetic variants of GLR and other viruses and viroids that were detected may also explain this unexpected result. In fact, association between GLRaV-2 and symptoms were linked with the phylogenetic clustering of the genetic variants [45], and Thompson et al. [46] discovered a novel genetic variant of GLRaV-3 that induces no foliar symptoms. However, we did not have enough consensus sequences of GLRaV-3 to study the phylogenetic diversity of GLRaV-3 strains associated with our samples. The phylogenetic analysis of GLRaV-2 strains associated with our samples revealed a single monophyletic group that is highly similar to some isolates from BC and also significantly dissimilar to others isolates from BC. As in BC, this monophyletic group was associated with the cultivar Pinot noir, which indicated possible infection via planting materials [16]. However, as highlighted by Poojari et al. [16], there is a need to understand the impact of GLRaV-2 genetic variants in Canadian vineyards. In summary, more research is needed to shed light on GLD symptom expression resulting from the interaction of different genetic variants of GLR (2 and 3) and grapevine cultivars. Moreover, the role of genetic variability among strains of the most abundant viruses/viroids of the background virome, such as GRSPaV and HSVd, should be studied. Indeed, our isolates of GRSPaV were grouped into distant large clusters and were all classified in the same clade (clade 1) with other recombinant genomes from France [47]. Until today, no published studies have shown a relationship between strains from the four main clades of GRSPaV and symptom development even though it is known that some strains of GRSPaV can cause symptoms on certain grapevine rootstock genotype (Vitis Rupestris cv. St. George) [48]. Implication of HSVd in symptom development, which is not yet proven, cannot be related to the genetic variability among our HSVd strains because our HSVd sequences were highly similar and no significant dissimilarities of our HSVd full genomes were observed when compared to known grapevine HSVd. Nevertheless, as these two members of the background virome are frequently observed in grapevine-infected plants [9,42,49], more studies are needed to understand the corpus of these viruses and their potential role in symptom expression.

Conclusions
This study highlights the complexity of the grapevine virome associated with leafroll-infected plants of different cultivars. Independently of the grapevine cultivar, the plants were infected by multiple viral and viroid species and different variants of the same virus. These results are supported by Beuve et al. However, these authors described this complex virome only on the cultivar Pinot noir [9]. The symptomless hybrid cultivar Vidal presents a complex virome which, in contrast, was dominated by grapevine Rupestris stem pitting-associated virus despite the presence of grapevine leafroll disease (GLD) virus species. Therefore, removing and destroying of symptomatic grapevines and possibly vines immediately adjacent to those symptomatic grapevines, which is proven to be efficient to control GLD spread [8,50,51], will not be effective in vineyards where Vidal grapevines are grown because this latter cultivar will hamper the management efforts. During the transition and establishment of new and less winter hardy cultivars such as Vitis vinifera (e.g., Pinot noir) in Quebec, growers cannot rely only on certified material derived from virus-tested stocks, they also need to periodically test their vineyards for at least the presence of GLRaV (2 and 3), and the testing should particularly include the symptomless Vidal cultivar.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/10/1142/s1, Figure S1: Discriminant principal component analysis of the virome and association between detected viruses and the mean proportion of viral reads that mapped for a given virus (MPVR), total number of symptomatic leaves (TNSL) associated with a given virus, mean depth (MD), mean depth relative to the depth of the positive control virus (MDRC), mean relative abundance (MRA), mean weight, and genome size (GS). Showing groups of species that induce similar response. Table S1: Co-occurrence table displaying association between all events. Table S2: GenBank accession numbers of Grapevine leafroll-associated virus 2 and their nucleotide positions of sequences used for generation of concatenated sequences in this study. Table S3: GenBank accession numbers of Grapevine Rupestris stem pitting-associated virus and their nucleotide positions of sequences used for generation of concatenated sequences in this study.