Genetic Structure and Molecular Variability Analysis of Citrus sudden death-associated virus Isolates from Infected Plants Grown in Brazil

Citrus sudden death-associated virus (CSDaV) is a monopartite positive-sense single-stranded RNA virus that was suggested to be associated with citrus sudden death (CSD) disease in Brazil. Here, we report the first study of the genetic structure and molecular variability among 31 CSDaV isolates collected from both symptomatic and asymptomatic trees in CSD-affected areas. Analyses of partial nucleotide sequences of five domains of the CSDaV genomic RNA, including those encoding for the methyltransferase, the multi-domain region (MDR), the helicase, the RNA-dependent RNA polymerase and the coat protein, showed that the MDR coding region was the most diverse region assessed here, and a possible association between this region and virus adaption to different host or plant tissues is considered. Overall, the nucleotide diversity (π) was low for CSDaV isolates, but the phylogenetic analyses revealed the predominance of two main groups, one of which showed a higher association with CSD-symptomatic plants. Isolates obtained from CSD-symptomatic plants, compared to those obtained from asymptomatic plants, showed higher nucleotide diversity, nonsynonymous and synonymous substitution rates and number of amino acid changes on the coding regions located closer to the 5’ end region of the genomic RNA. This work provides new insights into the genetic diversity of the CSDaV, giving support for further epidemiological studies.


Introduction
Citrus sudden death-associated virus (CSDaV) is a member of the genus Marafivirus in the family Tymoviridae, and has shown a strong association with Citrus sudden death (CSD), an important citrus disease in Brazil [1]. CSDaV virions are isometric particles of ≈30 nm in diameter, composed of a monopartite, positive-sense, single-stranded RNA genome of approximately 6.8 kb with a high cytosine content (37.4%) and encompassing two ORFs [1,2]. A large ORF (ORF1) encodes a 240 kDa polyprotein (p240) which contains conserved signatures of domains involved with replication and virion structure, including the methyltransferase (MT), the papain-like protease (PRO), the helicase (He), the RNA-dependent RNA polymerase (RdRP) domains and two subunits of the coat protein (CP) of 21 and 22 kDa in size, respectively [1]. Moreover, a multi-domain region that contains numerous predicted single domains is detected in ORF1 (between the MT and PRO domains), but the function of this region in CSDaV is unknown. The small ORF (ORF 2) at the 3' end region encodes a 16 kDa putative

Plant Collection
The CSDaV population was assessed from different citrus plants: different cultivars of sweet orange grafted on different rootstocks, susceptible and tolerant to CSD. A total of 31 plants was sampled: fifteen trees were asymptomatic and 16 trees had clear CSD symptoms (i.e., occurrence of yellow stain in the rootstock bark), including a tree grafted on Sunki mandarin of China, which is supposed to be tolerant to CSD, and trees grafted on CSD-susceptible rootstock (Rangpur lime), but intergrafted with tolerant rootstocks (Trifoliate orange and Cleopatra mandarin). Genotypes and symptom information are summarized in Table 1

RNA Extraction and RT-PCR Amplification
Total RNA was extracted from all samples using the RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. A set of primers (Table 2) was designed to amplify nucleotide sequences, which corresponded partially to the five domains: the methyltransferase (MT), the multi-domain region (MDR), the helicase (He), the RNA-dependent RNA polymerase (RdRP) and the coat protein (CP) coding regions based on the CSDaV reference genomes (GenBank accession no. AY884005 and DQ185573) ( Figure 1). cDNAs were synthesized in a 20 µL volume of 1× Reaction Buffer, containing 0.5 mM dNTPs mix, 200 U of RevertAid H Minus M-MuLV Reverse Transcriptase (Thermo Scientific, Waltham, MA, USA), and 5 µM of a random hexamer primer. PCR reactions were performed in 25 µL volume, containing 1× High Fidelity PCR Buffer (Invitrogen, Carlsbad, CA, USA), 0.2 mM dNTP mix, 2 mM MgSO4, 0.02 U of Platinum Taq DNA Polymerase High Fidelity (Invitrogen) and 10 mM of each forward and reverse primers. The following PCR conditions were used: 94 • C for 2 min; 35 cycles each of 94 • C for 15 s, 55 • C (for all pair of primers) for 30 s and 68 • C for 1 min. The resulted PCR products were separated by electrophoresis in a 1% agarose gel and detected by ethidium bromide staining. Bands were cut from the gel and the PCR products were purified using a QIAquick gel extraction kit (Qiagen).

RNA Extraction and RT-PCR Amplification
Total RNA was extracted from all samples using the RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. A set of primers (Table 2) was designed to amplify nucleotide sequences, which corresponded partially to the five domains: the methyltransferase (MT), the multi-domain region (MDR), the helicase (He), the RNA-dependent RNA polymerase (RdRP) and the coat protein (CP) coding regions based on the CSDaV reference genomes (GenBank accession no. AY884005 and DQ185573) ( Figure 1). cDNAs were synthesized in a 20 μL volume of 1× Reaction Buffer, containing 0.5 mM dNTPs mix, 200 U of RevertAid H Minus M-MuLV Reverse Transcriptase (Thermo Scientific, Waltham, MA, USA), and 5 μM of a random hexamer primer. PCR reactions were performed in 25 μL volume, containing 1× High Fidelity PCR Buffer (Invitrogen, Carlsbad, CA, USA), 0.2 mM dNTP mix, 2 mM MgSO4, 0.02 U of Platinum Taq DNA Polymerase High Fidelity (Invitrogen) and 10 mM of each forward and reverse primers. The following PCR conditions were used: 94 °C for 2 min; 35 cycles each of 94 °C for 15 s, 55 °C (for all pair of primers) for 30 s and 68 °C for 1 min. The resulted PCR products were separated by electrophoresis in a 1% agarose gel and detected by ethidium bromide staining. Bands were cut from the gel and the PCR products were purified using a QIAquick gel extraction kit (Qiagen).

Cloning and Sequencing
The purified PCR products were cloned into pGEM-T vector (Promega, Madison, WI, USA) using T4 DNA ligase (Promega) according to the manufacturer's instructions, followed by transformation into Escherichia coli DH5α competent cells [11]. Ten recombinant colonies were selected on screening media and confirmed by colony PCRs. Plasmid DNAs were extracted using the PureYield plasmid miniprep system kit (Promega) following the manufacturer's instructions and were bi-directionally sequenced using an Applied Biosystems 3730 DNA Analyzer.

Nucleotide Sequence Analysis
CSDaV reference sequences, identified as AY884005 (CSDaV) and DQ185573 (CSDaV strain p15) were downloaded from GenBank [12] and included in this analysis as representatives of CSDaV. Multiple nucleotide sequence alignments for each genomic region were obtained using the CLUSTAL W [13], and manually edited in the program MEGA 6.06 [14]. Neighbor joining phylogenetic trees were inferred with 1000 bootstraps in the MEGA 6.06 program and the generated trees were edited using FigTree version 1.4.2 [15]. A set of sequences for each genomic region of CSDaV were assessed using DnaSP software version 5.1 [16] to estimate genetic diversity and other population genetic parameters.
Recombination events among CSDaV isolates were examined using phylogenetic analysis and the boot-scan method in the SimPlot program [17]. Evidence of recombination was considered when 70% of permuted trees supported a particular grouping of sequences. The window width and the step size were set to 200 and 20 bp, respectively. The degree of selective constraints imposed on different regions of CSDaV genome was estimated with MEGA 6.06 program by analyzing the nonsynonymous and synonymous substitutions ratios (dN/dS = ω) using the Kumar method and bootstrap with 500 replicates [18]. Fixed effects likelihood (FEL), random effects likelihood (REL), and single likelihood ancestor counting (SLAC) tests, all included in the Hyphy package [19], were performed to determine the site specific selection pressure for the coding regions. For SLAC and FEL, the cut-off p-value was defined as 0.1 and for REL, a Bayes factor of 50 was selected as the cut-off value. Only positive selections determined by at least two methods were accepted [20].

Genetic Diversity of CSDaV Population
The presence of CSDaV was confirmed immediately after plants collection in both symptomatic and asymptomatic plants, including trees grafted on the CSD tolerant rootstocks, such as Cleopatra and Sunki mandarins, Swingle citrumelo and Poncirus trifoliata. Two step RT-PCRs with specific degenerate primers sets (Table 2)   of CSDaV genomic RNA including those encoding for the MT, the MDR, the He, the RdRp and the CP, respectively [21]. Figures S1 and S2 and Tables S1 and S2 in the supplemental material show all conserved domains detected from conserved domain search using the CSDaV reference genomes as queries in the NCBI Conserved Domain Database (CDD). A total of 31 CSDaV isolates were obtained (Table 3) and the number of sequences for each region is illustrated in Table 4. Nucleotide diversities were estimated based on the number of segregating sites (θw) and the average number of nucleotide differences per site between sequences (π). Overall, the genetic diversity for CSDaV isolates evaluated in this study was low ranging from 0.01013 (the CP fragment) to 0.04185 (the He fragment) (Table 4) with a mean genetic diversity of 0.026118. Table 3. List of CSDaV sequences obtained in this work. Accession numbers in the GenBank database for the different genomic regions of each isolates are indicated.

Phylogenetic Relationships of CSDaV Isolates
The sequences of four representative viruses from the genera Tymovirus (Turnip yellow mosaic  regions except the MDR segment because this region of CSDaV did not show any homology with any genomic region of four Tymoviridae representatives. Because six isolates were detected as possible recombinants based on the topology of phylogenetic trees (see details in the recombination analysis section), the final trees were constructed after removing these recombinants. In general, the topology of the MT (Figure 2a), the MDR (Figure 2b), the He (Figure 2c) and the RdRP (Figure 2d) trees was similar and showed the presence of two main groups of CSDaV isolates assessed in this study with high supporting bootstrap values equal or higher than 83%. The intra-group diversity was best illustrated in the MDR tree (Figure 2b). The topology of the CP tree was different, in which all CSDaV isolates formed a single un-resolved polytomy clade with a supporting bootstrap value of 99% (Figure 2e). For all phylogenetic trees, with the exception of the CP, the main groups were clustered separately from the two CSDaV reference sequences. Divergence between CSDaV reference sequences (AY884005 and DQ185573 isolates) was observed in the MDR, He and RdRP trees (Figure 2b-d).

Comparison of Genetic Diversity between Isolates from Asymptomatic and Symptomatic Plants
Based on the MT, the MDR, the He and the RdRP phylogenetic trees, group I of the CSDaV isolates was formed by the majority of isolates from asymptomatic plants, whereas group II contained the majority of isolates from symptomatic plants (Figure 2 and Table 5). To further strengthen these results, CSDaV consensus sequences were obtained from transcriptome sequencing, conducted for both symptomatic and asymptomatic plants by using Illumina next generation sequencing (NGS) technology [22]. The coding regions studied here were accessed in

Comparison of Genetic Diversity between Isolates from Asymptomatic and Symptomatic Plants
Based on the MT, the MDR, the He and the RdRP phylogenetic trees, group I of the CSDaV isolates was formed by the majority of isolates from asymptomatic plants, whereas group II contained the majority of isolates from symptomatic plants (Figure 2 and Table 5). To further strengthen these results, CSDaV consensus sequences were obtained from transcriptome sequencing, conducted for both symptomatic and asymptomatic plants by using Illumina next generation sequencing (NGS) technology [22]. The coding regions studied here were accessed in these consensus sequences and included in the phylogenetic analysis. Based on the MT, the MDR, the He and the RdRP, the consensus sequence obtained from the asymptomatic library clustered close to the reference isolates, whereas the consensus sequence obtained from the symptomatic library was grouped in group II ( Figure S3), which strongly supports the results presented above. Table 5. Number of CSDaV sequences from symptomatic and asymptomatic plants between the two main phylogenetic groups assessed in this study.
Compared to isolates from asymptomatic plants, the nucleotide diversities estimated only for isolates obtained from symptomatic plants were higher at about 2.2, 1.5, 1.1, 1.1 and 0.9 times for the MT, MDR, He, RdRP and CP regions, respectively ( Table 6). The dN/dS ratio values were higher for the MDR region for isolates from both symptomatic and asymptomatic plants. However, this estimated value for the isolates from symptomatic plants was 1.4 times higher than the ratio estimated for the isolates from asymptomatic plants ( Table 6). The deduced amino acid sequences from each CSDaV genomic region showed silent mutations between isolates from symptomatic and asymptomatic plants ( Table 7).

Recombination Analysis
Based on the phylogenetic trees constructed with all, including the possible recombinants CSDaV isolates ( Figure S4), six isolates, named CR8D, CLBR43S, VASW23S, HACL38S, HACL52D and HACR55S showed some phylogenetic incompatibilities and evidence of recombination events. The two root isolates (CLBR43S and CR8D) clustered in the same clade according to the RdRP and the CP phylogenetic trees ( Figure S4d,e), while they were placed separately based on the MT, the MDR and the He trees ( Figure S4a-c). Isolate VASW23S grouped separately in MT and the RdRP phylogenetic trees ( Figure S4a,d), but it clustered with the main groups in the MDR, the He and the CP trees ( Figure S4b,c,e). Isolates HACL38S, HACL52D and HACR55S were not compared in all regions of the genome analyzed here because we were not able to obtain PCR products for all segments, and they were excluded from the recombination analysis. We selected nine representative isolates from this study including three suggested recombinants and six isolates representing the two main groups of CSDaV, and two CSDaV reference sequences to concatenate their nucleotide sequences of the MT, the MDR, the He, the RdRP and the CP segments. Concatenated sequences were further analyzed using SimPlot. Both phylogenetic and Bootscan methods included in Simplot identified recombination signals as well as their possible parental sequences when VASW23S, CR8D and CLBR43S isolates were used as queries (Figures 3 and 4). Phylogenetic analysis of the concatenated sequences detected several recombination hotspots when different isolates were used as queries: positions 600 and 1322 when VASW23S isolate was used as the query (Figure 3a), positions 603, 1203 and 2458 when CR8D isolate was used as the query (Figure 3b) and positions 170 and 609 when CLBR43S isolate was used as the query (Figure 3c). On the other hand, Bootscan analyses demonstrated that the MDR and He segments of VASW23S isolate come from PRCR24D-like and VASW30S-like isolates, respectively (Figure 4a). For the CR8D isolate, a recombination event was detected by the Bootscan algorithm in which the MDR and He segments of CR8D were generated from two different origins: AY884005 and DQ185573 reference isolates, respectively (Figure 4b). When Bootscan analysis was performed using CLBR43S isolate as the query, we detected four recombination hotspots in which two of them were placed close to positions 170 and 619 (already shown by phylogenetic analysis), and two other hotspots were detected at positions 1271 and 1850 (Figures 3c and 4c). Furthermore, Bootscan results confirmed that the MDR and He segments of CLBR43S were generated from two CSDaV reference isolates (Figure 3c) and the region from the MT segment was likely driven by recombination events between a DQ185573 reference-like isolate and CR8D-like isolate. The RdRP and the CP segments from CLBR43S isolate showed phylogenetically inconsistent regions with some similarity with the CR8D isolate and the AY884005 reference isolate (Figure 4c).
positions 1271 and 1850 (Figures 3c and 4c). Furthermore, Bootscan results confirmed that the MDR and He segments of CLBR43S were generated from two CSDaV reference isolates (Figure 3c) and the region from the MT segment was likely driven by recombination events between a DQ185573 reference-like isolate and CR8D-like isolate. The RdRP and the CP segments from CLBR43S isolate showed phylogenetically inconsistent regions with some similarity with the CR8D isolate and the AY884005 reference isolate (Figure 4c).   based on concatenated nucleotide sequences of the MT, MDR, He, RdRP and CP genomic regions using Simplot. Three CSDaV isolates, VASW23S (a), CR8D (b) and CLBR43S (c), were used as query sequences and two CSDaV isolates were used as reference sequences. The Y-axis illustrates variation in identity percentage. Analyses were done using a sliding window of 200 bp and a step size of 20 bp. Red vertical dashed line shows the proposed recombination break point. Sequences compared with the query sequence are indicated in the legend.  Only the two root isolates detected as recombinants showed a close phylogenetic relationship to one of the CSDaV reference isolates: CR8D isolate in the MDR, the He and the RdRP trees and CLBR43S isolate in the RdRP tree ( Figure S4). However, these isolates were phylogenetically distant from the main groups of CSDaV isolates assessed in this study. According to the MT, MDR, He and  Only the two root isolates detected as recombinants showed a close phylogenetic relationship to one of the CSDaV reference isolates: CR8D isolate in the MDR, the He and the RdRP trees and CLBR43S isolate in the RdRP tree ( Figure S4). However, these isolates were phylogenetically distant from the main groups of CSDaV isolates assessed in this study. According to the MT, MDR, He and RdRP phylogenetic trees, isolate CLBR43S, obtained from tissues of Cleopatra mandarin rootstock, was found in a separate clade which was phylogenetically distant from the two main groups of CSDaV isolates ( Figure S4). Similar results were found for isolate CR8D, obtained from a Rangpur lime rootstock, according to MDR, He and RdRP phylogenetic trees ( Figure S4). Both of those separations were well-supported.

Selective Pressure for Different Genomic Regions of CSDaV
Evidence of positive selection was not found in any region of the genome for the CSDaV isolates included in this study. The mean ω (dN/dS) value for all genomic regions analyzed here was less than 1.0, indicating that all segments were subjected to negative or purifying selection. Among regions, the MT, He, RdRP and CP regions showed low ω ratios, while this ratio was higher for the MDR region (Table 4)

Discussion
We provided for the first time a snapshot of the genetic structure and variability among Brazilian CSDaV isolates collected from both symptomatic and asymptomatic citrus trees grown in fields affected by CSD disease. To date, only two CSDaV genome sequences were fully described, showing 11% nucleotide diversity between them [23] (GenBank accession no. AY884005 and DQ185573). Both of these well-described CSDaV isolates were obtained from Rangpur lime tissues as rootstock of sweet orange trees collected from the same citrus region assessed in this study [1,24], which is relevant since we can study the evolution of this virus in this CSD-affected area. In the current study, sequence analyses of five regions of the CSDaV genome representing almost 42% of the whole genome of 31 isolates, sampled from different hosts/plant tissues, showed a low genetic diversity. It is not a surprising finding because genetic stability has been considered as a rule in natural plant virus populations [23] and similar low genetic diversity was previously reported for many other RNA plant viruses [20,[25][26][27][28][29][30][31]. It has been shown that systemic infections and other events such as host change and transmission can impose bottlenecks, the most common effects of genetic drift, which have been inferred from the low genetic diversity of plant virus populations [32,33] and which might be the reason for the low genetic diversity among the CSDaV isolates.
The low nucleotide variability observed for the CP, the MT and the RdRP regions of CSDaV genome included in this study suggests that selective pressures in these segments are high to maintain nucleotide and amino acid conservation probably for biological functions [34]. It has been shown that the coat protein (CP) plays critical roles in virus packaging and stability, and interactions with plant host [34]. Similarly, the MT and the RdRP domains play key roles in viral replication, involving mRNA capping and enhanced stability of viral genomes (methyltransferase) [35]; and transcription and replication of RNA virus genomes (RdRP) [36]. On the other hand, the MDR and He regions demonstrated higher nucleotide variability. The MDR segment showed the highest genetic diversity among all studied regions here, and it was the only region that had one site detected as being under positive selection. The MDR segment represents a multi-domain region that contains numerous predicted single domains related to different activities. Interestingly, the MDR was the unique multi-domain region found along the CSDaV genome and was the single region that we could not align with other reference members of the family Tymoviridae. It seems that this region is unique and associated with CSDaV isolates and could be related with some processes of virus adaption [37] to a different host or plant tissues. However, at this time, there is no information about the function(s) of this multi-domain, and then further studies are needed to evaluate the real role of this region in CSDaV. Probably because the pair of primers designed for the He region was highly degenerate, we were not able to amplify the He segment in several samples assessed here and it may be possible that the low number of isolates (nine) has influenced the results. From this work, it is clear that there is a genetic diversity between the CSDaV isolates assessed here and the CSDaV references previously reported. The only isolates that showed close phylogenetic relationships with the CSDaV reference isolates were those isolated from the citrus roots, which were also detected as recombinants in this work, pointing the CSDaV reference sequences as the possible parents. Since we know that the CSDaV reference isolates were isolated from rootstock tissues of citrus trees as well [1,24], these results also provide some evidence of the heterogeneous distribution of virus variants at different locations (leaves and roots) within hosts. Other studies have already reported that the diversity of virus populations is different between old and young tissues, suggesting the tree could reflect the chronology of the appearance of virus diversity [38].
Phylogenetic analyses showed two new genetic clades for the CSDaV isolates included in this investigation, and one of them showed higher association with symptomatic trees. Higher nucleotide diversity, dN/dS ratio values and number of amino acid changes were found for isolates from symptomatic plants in coding regions located closer to the 5' end region of the CSDaV genome (MT and MDR), whereas coding regions located closer to the 3' end region showed more conservation. It is important to say that these isolates belonging to these two new genetic clades were all isolated from the citrus leaves, which have been shown to have CSDaV variants, compared to the CSDaV isolates from the roots (this work and the references isolates). It is possible that the CSDaV isolates infecting rootstock tissues were subjected to some positive selection pressures, mainly on the coding regions closer to the 5' end region, to be able to infect tissues in the citrus canopy, culminating with two different variants of CSDaV, where one of them might be more efficient in infecting CSD-susceptible plants and/or more severe in developing CSD symptoms. Other factors, such as the susceptibility of the citrus rootstock and climate (drought and higher temperature), seem to contribute to the development of the CSD. The confirmed presence of CSDaV in trees grafted on symptomatic and asymptomatic susceptible rootstocks, and symptomatic and asymptomatic tolerant rootstocks, suggest that CSDaV is able to infect a wide host range in CSD-affected regions, but the symptoms are not always developed. The results obtained here do not discard the possibility of a mixed or co-infection of the CSDaV and other virus(es), which was already proposed as a cause of CSD [1]. CSDaV and other members of the genus Marafivirus have been frequently associated in mixed or co-infections in other pathosystems. CSDaV was found to be part of a multiple virus infection in Pinot Noir grapevine [7] and in grapevine Syrah showing decline symptoms [9]. Recently, Villamor et al. [8] found CSDaV infecting California nectarines showing stem-pitting symptoms and also revealed the presence of a new virus of the genus Marafivirus, which shared 70% of nucleotide sequence similarities to CSDaV, co-infecting these plants. All these results obtained in this investigation could together provide new insights into the role of CSDaV in symptom development in plants affected by CSD and contribute to further epidemiological studies.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/8/12/330/s1, Figure S1: Graphical summary showing the conserved domains detected from conserved domain search using the CSDaV AY884005 reference sequence as query in the NCBI Conserved Domain Database (CDD), Figure S2: Graphical summary showing the conserved domains detected from conserved domain search using the CSDaV DQ185573 reference sequence as query in the NCBI Conserved Domain Database (CDD), Figure S3: Bootstrap majority rule (70%) consensus trees reconstructed by the neighbor joining method for five genomic regions of CSDaV isolates including the consensus sequences from transcriptome sequencing of the symptomatic and asymptomatic plants by using Illumina platform, Figure S4: Bootstrap majority rule (70%) consensus trees reconstructed by the neighbor joining method for five genomic regions of CSDaV isolates including possible recombinant isolates, field collected and reference sequences, Table S1: Description of domains detected from conserved domain search using the CSDaV AY884005 reference sequence as query. The interval and E-value of each identified domain are shown, Table S2: Description of domains detected from conserved domain search using the CSDaV DQ185573 reference sequence as query. The interval and E-value of each identified domain are shown.