Multilocus Genotyping Reveals New Molecular Markers for Differentiating Distinct Genetic Lineages among “Candidatus Phytoplasma Solani” Strains Associated with Grapevine Bois Noir

Grapevine Bois noir (BN) is associated with infection by “Candidatus Phytoplasma solani” (CaPsol). In this study, an array of CaPsol strains was identified from 142 symptomatic grapevines in vineyards of northern, central, and southern Italy and North Macedonia. Molecular typing of the CaPsol strains was carried out by analysis of genes encoding 16S rRNA and translation elongation factor EF-Tu, as well as eight other previously uncharacterized genomic fragments. Strains of tuf-type a and b were found to be differentially distributed in the examined geographic regions in correlation with the prevalence of nettle and bindweed. Two sequence variants were identified in each of the four genomic segments harboring hlyC, cbiQ-glyA, trxA-truB-rsuA, and rplS-tyrS-csdB, respectively. Fifteen CaPsol lineages were identified based on distinct combinations of sequence variations within these genetic loci. Each CaPsol lineage exhibited a unique collective restriction fragment length polymorphism (RFLP) pattern and differed from each other in geographic distribution, probably in relation to the diverse ecological complexity of vineyards and their surroundings. This RFLP-based typing method could be a useful tool for investigating the ecology of CaPsol and the epidemiology of its associated diseases. Phylogenetic analyses highlighted that the sequence variants of the gene hlyC, which encodes a hemolysin III-like protein, separated into two clusters consistent with the separation of two distinct lineages on the basis of tufB gene sequences. Alignments of deduced full protein sequences of elongation factor-Tu (tufB gene) and hemolysin III-like protein (hlyC gene) revealed the presence of critical amino acid substitutions distinguishing CaPsol strains of tuf-type a and b. Findings from the present study provide new insights into the genetic diversity and ecology of CaPsol populations in vineyards.


Introduction
Bois noir (BN), a grapevine disease associated with "Candidatus Phytoplasma solani" (CaPsol) infection, causes typical grapevine yellows (GY) symptoms and results in important crop losses in the majority of vine-growing European countries, in the Middle East, and in South America [1,2]. Due to the involvement of multiple insect vectors and plant hosts, the biological cycle of CaPsol is extremely complex [3–10], hindering the development of effective strategies for controlling BN epidemics [11]. Molecular markers of genetic diversity among grapevine-affecting phytoplasmas provide a solid foundation for improving knowledge of BN epidemiology. In Europe, sequence analysis of the translation elongation factor EF-Tu gene tufB revealed two main tuf-types of CaPsol (tuf-type a and tuf-type b) present in diseased grapevines, as well as in alternative plant hosts nearby [3], suggesting that ecological differences could be associated with molecular diversification of CaPsol populations and their differential distributions. Numerous molecular typing studies, based on analyses of nucleotide sequences of more variable genes (e.g., secY, stamp, vmp1), showed large variability among CaPsol strain populations, shedding light on differences in virulence, origin, and host range of different strains [10,12–15]. Multiple gene typing analysis has been applied to investigate genetic diversity in various bacterial taxa [16–19]. Such molecular typing has also been used to refine phytoplasma classification [1,20–23] and to improve knowledge of the epidemiology of phytoplasmal diseases [4,6–8,24]. In the present study, molecular characterization of CaPsol strains from Italian and North Macedonian vineyards was carried out by analyses of 16S rRNA and tufB genes and eight other previously uncharacterized genomic fragments.
The study identified new molecular markers useful for fine differentiation of CaPsol genetic lineages associated with different biological and geographic features.

CaPsol Identification
The primer pair R16F1/R16R1, which is known to prime amplification of 16S rDNA from phytoplasmas classified in groups 16SrI and 16SrXII by PCR [25], was used to characterize 142 DNA samples from grapevines. All symptomatic samples yielded amplicons of approximately 1.1 kb. As expected, positive-control PCRs containing template DNA derived from periwinkle plants infected by the phytoplasma reference strain STOL also produced an amplicon of the same size, whereas negative-control PCRs containing healthy periwinkle DNA or water instead of DNA showed no observable DNA amplification following electrophoresis on agarose gels. All 142 amplicons from symptomatic field samples yielded MseI-RFLP patterns, visualized by electrophoresis on agarose gels, that were indistinguishable from one another and from the pattern typical of the reference strain STOL, indicating that the strains detected in diseased grapevines and other hosts were members of the subgroup 16SrXII-A. The MseI restriction patterns of 15 representative samples are shown in Figure 1a.

Characterization and Distribution of CaPsol tuf-Types
Fragments of tufB genes were amplified from all the grapevine samples in nested PCRs using the primer pair fTufAY/rTufAY. Two different HpaII-RFLP patterns were found among digested amplicons (Figure 1b). These two patterns were identical to those previously reported for CaPsol tuf-type a and tuf-type b, respectively [3]. These results were consistent with previous findings of two tuf-types present in vineyards of northern [5,8], central [10,12], and southern [26] Italy. In the present work, CaPsol tuf-type a and tuf-type b were detected in 49.3% and 50.7% of the 142 symptomatic grapevine samples tested, respectively, but the two CaPsol tuf-types were differentially distributed (Table 1). In northern Italy, 63 out of 85 (74.1%) symptomatic vines carried CaPsol tuf-type a, and in southern Italy, 30 out of 32 (91.7%) symptomatic vines carried CaPsol tuf-type b. In central Italy, the prevalent CaPsol population was also tuf-type b, as 14 out of 16 symptomatic vines carried CaPsol strains of this lineage. It is worth noting that central Italy has a similar latitude to North Macedonia. A previous study revealed that the dominant CaPsol type in North Macedonia was also tuf-type b [27]. Such differential distributions of CaPsol tuf-type a and tuf-type b in northern and central/southern Italy are conceivably linked to ecological differences, particularly the potential CaPsol reservoirs in the respective regions. The polyphagous planthopper Hyalesthes obsoletus is a known vector of CaPsol. It was reported that nettle (Urtica dioica L.) is the main host plant of H. obsoletus in northern Italy [28], while bindweed (Convolvulus arvensis L.) is a major host of the planthopper in central and southern Italy [26]. Since nettle and bindweed are likely reservoirs of CaPsol inoculum, it would be interesting to learn whether they also play a role as niches for the differentiation and adaptation of these two distinct CaPsol types.

In the present study, a close examination of the CaPsol populations within northern Italy unveiled a striking difference in the distribution of the two CaPsol tuf-types in vineyards of the Lombardy and Veneto regions: while there was a high prevalence of tuf-type a (92.3%) in Veneto, an almost even distribution of CaPsol tuf-type a (58.7%) and tuf-type b (41.3%) was found in Lombardy. These differential distribution patterns within northern Italy could be explained by differences in the weeds present within the vineyards. In fact, in Europe, CaPsol tuf-type a and tuf-type b are mainly associated with nettle and bindweed, respectively [3]. Previous studies indicated that nettle is highly present within vineyards in Veneto [5], while in Lombardy it is found mainly in the surroundings rather than in the vineyards [8]. Likewise, in central and southern Italy, where rainfall is limited in the summer, nettle is less frequent and bindweed prevails [29]. These considerations reinforce the idea that, even within a given geographic area, the variation in the prevalence of weed species among vineyards influences the composition of CaPsol strain populations in nearby grapevine plants [15].

Possible Role of Protein Encoded by tufB Gene (EF-Tu) in Host Selection
Forty symptomatic vine samples, representing seven geographic regions, were selected for further analyses. Half of the vine samples were infected with tuf-type a CaPsol strains and the other half were infected with tuf-type b CaPsol strains (Table 2). Nested PCR conducted with the primer pair fusAF2/tufBR1 allowed the amplification of a 1399 bp DNA segment from the 40 samples. The nucleotide sequences of the amplicons were determined, with each amplicon containing a partial fusA gene (positions 1–92) and a full-length tufB gene (positions 215–1399). Alignment of the sequences confirmed the presence of two tufB sequence variants characteristic of tuf-type a (Acc. No. MW175420) and b (Acc. No. MW175421), respectively, distinguished on the basis of four single nucleotide polymorphisms (SNPs), positioned in the tufB gene at nucleotides 277, 880, 941, and 1304 relative to the annealing site of the primer fusAF2 (Figure 2). The SNP at nucleotide 880 (C/T) differentiated between tuf-type a and b; this SNP accounts for the difference in HpaII-RFLP patterns, as previously reported [3] (Figures 1b and 2). Full protein sequences (394 amino acids, as implied by the 1185 bp tufB coding region) of elongation factor Tu obtained from in silico translation of tufB nucleotide sequences related to tuf-type a and b were aligned. The alignment revealed differences in amino acid composition at positions 243 (Val/Ile, corresponding to the SNP at nucleotide position 941) and 364 (Asp/Asn, corresponding to the SNP at nucleotide position 1304) (Figure 2). Previous studies on plants and humans indicated that a single substitution between Val and Ile or between Asp and Asn could modify receptor binding activity [30] or enzymatic catalytic activity [31,32]. Thus, these amino acid substitutions in CaPsol EF-Tu could also change its activity and/or modify interactions with its binding protein(s).
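The correspondence between SNP coordinates in the amplicon and residue positions in EF-Tu can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the tufB coding sequence starts in frame at amplicon position 215 and runs uninterrupted to position 1399 (1-based coordinates as given in the text); the function name is illustrative only.

```python
# Coordinates taken from the text: tufB occupies positions 215-1399 of the
# fusAF2/tufBR1 amplicon (1-based, inclusive). Reading frame is assumed to
# start at position 215.
TUFB_START = 215
TUFB_END = 1399


def codon_number(nt_pos, cds_start=TUFB_START, cds_end=TUFB_END):
    """Return (codon number, position within codon), both 1-based."""
    if not cds_start <= nt_pos <= cds_end:
        raise ValueError(f"position {nt_pos} lies outside the coding region")
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# The four SNP positions reported for tufB.
for snp in (277, 880, 941, 1304):
    codon, pos_in_codon = codon_number(snp)
    print(f"nt {snp}: codon {codon}, codon position {pos_in_codon}")
```

Under these assumptions, the SNPs at nucleotides 941 and 1304 fall on the first base of codons 243 and 364, in agreement with the reported Val/Ile and Asp/Asn substitutions, whereas the SNPs at nucleotides 277 and 880 fall on third codon positions.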
Although EF-Tu is well known as a cytoplasmic protein involved in translation [33], it was reported that EF-Tu can (i) be localized at the cell surface and act as a virulence factor [34], and (ii) interact with virulence factors inside the bacterial cytoplasm [35]. Previous studies reported differences in EF-Tu protein sequences of CaPsol strains infecting grapevine in Austria and Iran [36,37], and suggested that these differences in EF-Tu primary structure may act as CaPsol fitness factors in host selection.
From the CaPsol strains identified in the 40 selected vines, 15 multiple gene profiles associated with distinct CaPsol lineages (named CaPsol lineage 1 to 15) were determined by the combination of genomic fragment sequence variants (Table 2). A phylogenetic analysis of hlyC gene sequences demonstrated that the two hlyC sequence variants grouped into two clusters (hlyC-1 and -2) consistent with those identified on the basis of tufB gene sequences (Figure 3a and Figure S1a). In phytoplasmas, the hlyC gene encodes a hemolysin III-like protein. Hemolysins have been reported to act as virulence factors in human [38] and plant [39] pathogens. While it remains unknown whether hemolysin III-like proteins are involved in phytoplasma virulence and/or fitness in different hosts, our finding that the separation of the two CaPsol hlyC gene sequence variants (lineages) parallels the separation of the two CaPsol tuf-types delineated previously raises this possibility.
Two distinct clusters of sequence variants were also identified based on the alignment of the truB, glyA, and tyrS genes. While the sequence alignment alone did not provide a clear picture as to whether such clustering was parallel to the separation of the two CaPsol tuf types, results from a phylogenetic analysis on the concatenated hlyC, truB, glyA, and tyrS gene sequences showed that the 15 CaPsol lineages were grouped into two distinct clusters (cluster-1 and -2) (Figure 3b and Figure S1a-c), consistent with the separation of the two lineages delineated on the basis of tufB genes (Figure 3c). Further studies should be conducted to investigate whether EF-Tu and/or hemolysin III-like protein are involved in interactions of CaPsol with host plants and/or insect vectors, thereby driving adaptation to varied vineyard ecosystems.

RFLP Analyses: Prevalence of CaPsol Lineages
In silico digestion, carried out on the representative nucleotide sequences of CaPsol sequence variants of the hlyC, cbiQ-glyA, trxA-truB-rsuA, and rplS-tyrS-csdB genomic fragments, allowed generation of virtual RFLP profiles for the restriction enzymes SspI, Hpy188I, BsaHI, and HpyCH4V, respectively (Figure 4). For each genetic locus analyzed, the obtained virtual RFLP profiles consistently differentiated the two sequence variants identified in the present study (Figure 2). In vitro RFLP assays, conducted on nested PCR products amplified from the 40 selected vines, produced the predicted restriction profiles for each sequence variant of each genomic fragment (data not shown). Having determined the resolution power of these RFLP assays, the CaPsol strains identified in the remaining 102 vines were attributed to lineages using in vitro digestion (Table 1). Based on the obtained collective RFLP patterns, each CaPsol strain was attributed to one of the 15 previously determined lineages (Table 1). Molecular markers distinguishing such lineages can be exploited to study different aspects of BN disease, such as CaPsol strain population structure determination and epidemiology in different agroecosystems. In particular, the major contribution to the variability among these lineages is due to SNPs within the cbiQ-glyA, trxA-truB-rsuA, and rplS-tyrS-csdB genomic fragments. In fact, numerous studies reported the usefulness of genes not directly related to phytoplasma virulence (e.g., rplV-rpsC, groEL, map) as valuable markers for phytoplasma strain typing related to their ecology [40–42].
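The attribution of a strain to a lineage from its collective RFLP pattern amounts to a table lookup on the combination of per-locus patterns. The sketch below illustrates the idea only: the pattern labels ("A", "B"), the lineage names, and the sample records are hypothetical placeholders and do not reproduce the contents of Table 1 or Table 2.

```python
from collections import Counter

# The four loci typed by RFLP in this study, in a fixed order.
LOCI = ("hlyC", "cbiQ-glyA", "trxA-truB-rsuA", "rplS-tyrS-csdB")

# Hypothetical reference table: collective pattern -> lineage label.
KNOWN_PROFILES = {
    ("A", "A", "A", "A"): "lineage_x",
    ("B", "A", "B", "A"): "lineage_y",
}


def assign_lineage(patterns, known=KNOWN_PROFILES):
    """Return the lineage matching a sample's per-locus patterns, or None."""
    profile = tuple(patterns[locus] for locus in LOCI)
    return known.get(profile)


# Hypothetical samples, each with one RFLP pattern per locus.
samples = {
    "vine_01": dict(zip(LOCI, "AAAA")),
    "vine_02": dict(zip(LOCI, "BABA")),
    "vine_03": dict(zip(LOCI, "AAAA")),
    "vine_04": dict(zip(LOCI, "BBBB")),  # no matching reference profile
}

assigned = {name: assign_lineage(p) for name, p in samples.items()}
prevalence = Counter(assigned.values())
print(assigned)
print(prevalence)
```

Samples whose collective pattern is absent from the reference table are returned as `None`, which flags them for sequencing rather than forcing them into an existing lineage.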

The prevalence of CaPsol lineages was evaluated in the different geographic areas under study, leading to new insights into CaPsol populations in Italian vineyards: (i) three lineages (13,14,15) were found exclusively in North Macedonia and not in Italy; (ii) a single lineage (6) was found in southern Italy; (iii) four lineages were found in central Italy, with two in Marche (10,12) and two in Tuscany (5,11); (iv) six lineages (1,2,4,6,7,10) were found in Veneto, three of which (1,4,10) represented 87% of CaPsol strains; (v) twelve lineages (1 to 12) were found in Lombardy, three of which (1,4,7) represented 65% of CaPsol strains, with three (3,8,9) found exclusively in this region (Table 1). Notably, a low number of CaPsol lineages, reflecting low genetic diversity within strain populations, was found in the geographic areas (Veneto, Marche, Tuscany, Apulia, Sicily, North Macedonia) where high prevalence of a single tuf-type was reported. In contrast, numerous CaPsol lineages, reflecting an elevated genetic diversity within strain populations, were found in Lombardy, where the two tuf-types (a and b) were almost equally present.
Such genetic heterogeneity within the CaPsol population in Lombardy was also noted in a recent study based on CaPsol molecular characterization using the hypervariable gene stamp [8]. It is reasonable to hypothesize that this variability in CaPsol strains could be related to the ecological complexity of vineyards and their surroundings, including the presence of multiple insect vectors and alternative plant hosts [8]. The RFLP-based typing method used in the present study could be considered a valuable tool for research on the ecology of CaPsol and the epidemiology of its associated diseases.

(Table 1). For each plant sample, 1 g of leaf petioles was stored at −30 °C until molecular analysis.

CaPsol Molecular Identification
Total nucleic acids were extracted from the leaf petioles of the examined plants, as previously described [43]. Detection and identification of CaPsol were carried out by means of nested PCR amplification of 16S rDNA primed by the universal primer pair P1/P7 [44], followed by the 16SrI group-specific primer pair R16F1/R16R1 [25], with a subsequent MseI-RFLP assay performed on the obtained amplicons. PCR and RFLP reaction conditions were as previously described [45]. PCRs were performed using Taq polymerase (Promega, Milan, Italy) in an automated thermal cycler (MasterCycler Gradient, Eppendorf, Milan, Italy). PCR and enzymatic digestion products were electrophoresed through 1% and 3% agarose gels, respectively, in Tris-Borate-Ethylenediaminetetraacetic acid (TBE) buffer, stained with Midori Green Advance (Biosigma, Venice, Italy), and visualized under a UV transilluminator. Total nucleic acids from periwinkle (Catharanthus roseus (L.) G. Don) infected by the phytoplasma strain STOL (CaPsol, subgroup 16SrXII-A) were used as the reference control. Total nucleic acids extracted from healthy periwinkle and a PCR mixture devoid of nucleic acids were used as negative controls.

Molecular Characterization of CaPsol Strains through Multilocus Genotyping Analysis
The tufB genotyping of CaPsol strains identified in infected samples was performed by nested PCR amplification using the primer pair fTuf1/rTuf1 followed by fTufAY/rTufAY, with subsequent HpaII-RFLP assays performed on the obtained amplicons [3]. The experimental controls, PCR conditions, and PCR-RFLP analysis were the same as for the 16S rRNA gene analysis described above.
Multilocus genotyping analysis was carried out on 40 CaPsol strains representing distinct tuf-types (20 tuf-type a and 20 tuf-type b) and different geographic origins. Based on the genome sequences of CaPsol strains 284/09 (FO393427) and 231/09 (FO393428) [46], primer pairs were designed to amplify the complete tufB gene sequence and 8 genomic fragments (cbiQ-glyA, rplS-tyrS-csdB, trxA-truB-rsuA, hlyC, potC-potD, pnp, gyrA-gyrB, aspS-mesJ) containing 16 previously uncharacterized protein-coding genes (Table 3). The experimental controls and PCR conditions were the same as for the 16S rRNA gene analysis described above. Obtained amplicons of the full tufB gene and the 8 genomic fragments were sequenced in both directions (5× coverage per base position) by a commercial service (Eurofins Genomics, Germany). Nucleotide sequences were compiled in FASTA format, assembled by employing the Contig Assembling Program of the software BioEdit version 7.2 [47], and trimmed to the annealing sites of the primers used in the nested PCRs. For each genomic fragment under study, nucleotide sequences were aligned using the ClustalW Multiple Alignment application within the software BioEdit, and the alignments were searched for single nucleotide polymorphisms (SNPs), restriction enzyme recognition sites, and deduced amino acid substitutions.
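The search for SNPs in an alignment can be illustrated with a short script. This is a minimal sketch and not the procedure implemented in BioEdit: it assumes sequences that are already aligned to equal length, ignores gaps and ambiguous bases, and uses short made-up sequences and names in place of real data.

```python
def find_snps(aligned):
    """List polymorphic columns of an alignment.

    `aligned` maps a sequence name to an aligned sequence; all sequences
    must have equal length. Returns (1-based position, {name: base}) tuples
    for every column holding more than one distinct unambiguous base.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for index, column in enumerate(zip(*aligned.values())):
        column = [base.upper() for base in column]
        distinct = set(column) - {"-", "N"}  # skip gaps and unknown bases
        if len(distinct) > 1:
            snps.append((index + 1, dict(zip(aligned, column))))
    return snps


# Hypothetical toy alignment with one substitution at position 5.
toy_alignment = {
    "variant_1": "ATGCCGGTA",
    "variant_2": "ATGCTGGTA",
}
print(find_snps(toy_alignment))
```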

Phylogenetic Analyses
Phylogenetic trees were constructed by aligning the nucleotide sequences of tufB, hlyC, glyA, truB, and tyrS genes of CaPsol strains representing genetic lineages identified in this study and those of phytoplasma strains available in the National Center for Biotechnology Information (NCBI) GenBank. Phylogenetic analysis was also performed on concatenated hlyC, glyA, truB, and tyrS nucleotide sequences. The evolutionary history was inferred using the Maximum Likelihood method and the Tamura-Nei model [48]. Initial trees for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value. Evolutionary analyses were conducted in MEGA X [49].
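Concatenation of per-gene alignments before tree inference can be sketched as follows. The snippet is illustrative only: it assumes each gene has already been aligned separately, keeps only strains present in every gene alignment, and joins the genes in the order named above; the strain names and sequences are hypothetical.

```python
# Gene order used for concatenation, as listed in the text.
GENE_ORDER = ("hlyC", "glyA", "truB", "tyrS")


def concatenate(alignments, gene_order=GENE_ORDER):
    """Join per-gene aligned sequences into one sequence per strain.

    `alignments` maps gene name -> {strain name: aligned sequence}.
    Only strains present in every gene alignment are retained.
    """
    shared = set.intersection(*(set(alignments[g]) for g in gene_order))
    return {
        strain: "".join(alignments[g][strain] for g in gene_order)
        for strain in sorted(shared)
    }


# Hypothetical toy alignments (real alignments would be far longer).
toy = {
    "hlyC": {"strain_1": "ATGA", "strain_2": "ATGG"},
    "glyA": {"strain_1": "CCT", "strain_2": "CCT"},
    "truB": {"strain_1": "GA", "strain_2": "GC", "strain_3": "GA"},
    "tyrS": {"strain_1": "TTA", "strain_2": "TTA"},
}
supermatrix = concatenate(toy)
print(supermatrix)
```

Here `strain_3` is dropped because it lacks sequences for three of the four genes; the resulting concatenated alignment could then be exported to FASTA for analysis in a program such as MEGA X.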

Survey on CaPsol Genetic Lineages by Restriction Fragment Length Polymorphism Analysis
SNPs distinguishing CaPsol genetic lineages were checked for their position in recognition sites for restriction enzymes by in silico restriction fragment length polymorphism (RFLP) assays using the software pDRAW32 (AcaClone Software, http://acaclone.com). The obtained virtual RFLP profiles were confirmed by in vitro digestion of genomic fragments amplified from the 40 grapevines selected for multiple gene sequencing. Nested PCR amplification of the genomic fragments (hlyC, cbiQ-glyA, trxA-truB-rsuA, and rplS-tyrS-csdB) was then conducted on the remaining 102 CaPsol-infected grapevines (not included in the multilocus genotyping). In vitro RFLP analyses were performed using the enzymes SspI on hlyC amplicons, Hpy188I on cbiQ-glyA amplicons, BsaHI on trxA-truB-rsuA amplicons, and HpyCH4V on rplS-tyrS-csdB amplicons. Digestion reactions were carried out according to the enzyme manufacturer's instructions (New England Biolabs, Ipswich, MA, USA). RFLP profiles were visualized by electrophoresis on 3% agarose gel.
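The principle behind the in silico digestion can be illustrated with a short script. This is a minimal sketch and not a substitute for pDRAW32: it handles only a linear fragment and a non-degenerate, palindromic recognition site, here SspI (AAT^ATT, blunt cut after the third base). The two sequences are hypothetical toy variants, not CaPsol hlyC sequences.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence.

    `site` is a non-degenerate recognition sequence and `cut_offset` is the
    number of bases from the start of the site to the cut position.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


SSPI_SITE, SSPI_CUT = "AATATT", 3  # SspI: AAT^ATT

# Hypothetical toy variants differing by one SNP inside the SspI site.
variant_1 = "GGCC" * 5 + "AATATT" + "GGCC" * 3  # site present
variant_2 = "GGCC" * 5 + "AATACT" + "GGCC" * 3  # site abolished by T->C

print(digest(variant_1, SSPI_SITE, SSPI_CUT))
print(digest(variant_2, SSPI_SITE, SSPI_CUT))
```

A single substitution within the recognition site turns a two-fragment profile into an undigested one, which is the kind of difference the RFLP assays described above rely on. Enzymes with degenerate recognition sites, such as Hpy188I and BsaHI, would require pattern matching beyond this sketch.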