Phylogeny and Genetic Divergence among Sorghum Mosaic Virus Isolates Infecting Sugarcane

Sorghum mosaic virus (SrMV, genus Potyvirus, family Potyviridae) is a causal agent of common mosaic in sugarcane and poses a threat to the global sugar industry. In this study, a total of 901 sugarcane leaf samples with mosaic symptoms were collected from eight provinces in China and tested via RT-PCR using a primer pair specific to the SrMV coat protein (CP). These leaf samples included 839 samples from modern cultivars (Saccharum spp. hybrids) and 62 samples from chewing cane (S. officinarum). Of the 901 samples, 632 (70.1%) tested positive for SrMV. The incidences of SrMV infection were 72.3% and 40.3% in modern cultivars and chewing cane, respectively. Phylogenetic analysis based on 306 CP sequences (this study = 265 and GenBank database = 41) showed that all tested SrMV isolates clustered into three clades consisting of six phylogenetic groups. A total of 10 SrMV isolates from the Americas (the United States and Argentina) along with 106 isolates from China clustered in group D, while the remaining 190 SrMV isolates from Asia (China and Vietnam) were dispersed in five groups. The SrMV isolates in group F were limited to Yunnan province in China, and those in group A were spread over eight provinces. Significant genetic heterogeneity was found among all SrMV CPs, with nucleotide sequence identities ranging from 69.0% to 100%. A potential recombination event was postulated among SrMV isolates based on CP sequences. All tested SrMV CPs underwent dominant negative selection. Geographical isolation (the Americas vs. Asia) and host types (modern cultivars vs. chewing cane) are important factors promoting the genetic differentiation of SrMV populations. Overall, this study contributes to the global understanding of the genetic evolution of SrMV and provides a valuable resource for the epidemiology and management of mosaic in sugarcane.

The potyviral genome encodes a long polyprotein that is processed by proteinases, giving rise to at least 10 mature proteins: P1 (protein 1 protease), HC-Pro (helper component protease), P3 (protein 3), PIPO (pretty interesting Potyviridae ORF), 6K1 (6 kDa peptide 1), CI (cylindrical inclusion), 6K2 (6 kDa peptide 2), NIa-Pro (nuclear inclusion a-protease), NIb (nuclear inclusion b, RNA-directed RNA polymerase), CP (capsid protein), as well as VPg (virus protein genome-linked) [5,13,14]. The potyviral CP participates in various biological functions such as coating and protection of the RNA genome, aphid transmission, as well as cell-to-cell and long-distance movement [4,15]. Meanwhile, this viral protein may also be involved in the regulation of CP stability and functional diversity during the viral life cycle through various post-translational modifications [4]. However, these biological functions have not yet been identified for the SrMV CP. Among the potyvirus-encoded proteins, the CP-coding region is the region preferentially targeted for viral genetic diversity and phylogeny analysis as well as for disease diagnosis using molecular and serological approaches [5,8,16].
Evolutionary driving forces, population structure, and differentiation among SrMV isolates have been investigated. For instance, insertion/deletion mutations, negative selection, and frequent gene flow were proposed to contribute to the genetic divergence and population structure of SrMV isolates [25]. No obvious recombination event was found in the CP gene region of all tested SrMV isolates [24,25]. High rates of mutation and recombination between potyvirus strains result in the creation of new viral strains. These novel isolates show a high degree of pathogenicity in a variety of host species and cultivars, which poses a challenge to global crop production [4,26,27]. Importantly, it is critical to distinguish between SrMV strains when breeding resistant sugarcane genotypes. However, these research aspects of SrMV remain unclear. In China, SrMV is one of the main viruses infecting sugarcane, particularly modern commercial cultivars, followed by SCSMV [8,25,28]. Therefore, in this study we extensively analyzed the occurrence, distribution, genetic variation, and population differentiation of SrMV infecting sugarcane based on viral CP fragments. These findings offer insights into the virus's prevalence in China's sugarcane-growing regions and crucial recommendations for the management and prevention of mosaic disease.

Detection of SrMV Using RT-PCR
SrMV was detected using RT-PCR in 632 of 901 (70.1%) leaf samples. The incidences of SrMV-positive samples were 72.3% and 40.3% in modern cultivars and chewing cane, respectively. Subsequently, 265 representative CP fragments (approximately 850 bp) were selected for further sequence analysis.
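As a quick arithmetic check on these figures, the sketch below back-calculates which integer numbers of positive samples are consistent with the reported, rounded incidences. The per-host positive counts it finds are not stated in the text; they are inferred here purely for illustration, and the function name `consistent_counts` is hypothetical.

```python
def consistent_counts(total, pct, digits=1):
    """Return every integer count k for which 100*k/total rounds to pct."""
    return [k for k in range(total + 1) if round(100 * k / total, digits) == pct]


# Sample sizes and incidences as reported in the text.
modern = consistent_counts(839, 72.3)    # modern cultivars
chewing = consistent_counts(62, 40.3)    # chewing cane
overall = round(100 * 632 / 901, 1)      # overall incidence

print(modern, chewing, overall)
```

If each list holds a single value, the two inferred counts should add up to the 632 positives reported overall, which is a simple way to confirm that the three percentages are mutually consistent.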

Phylogenetic Relationship among SrMV Isolates
A phylogenetic analysis showed that all 306 SrMV isolates (this study = 265 and GenBank database = 41) clustered into three clades (I, II, and III), comprising six different groups (A-F) with 4-120 isolates each. Clades I and II consisted of two (A and B) and three (C-E) groups, respectively. Clade III included a unique group F. Moreover, 39.2% and 37.9% of SrMV isolates were assigned to groups A and D, respectively. Apart from 106 SrMV isolates from China, 10 isolates from the United States and Argentina clustered into group D. The remaining SrMV isolates from Asia (China and Vietnam) clustered in the other five groups (Figure 1). Notably, the 18 SrMV isolates from chewing cane were distributed in four groups (SrMV-A, -D, -E, and -F). The frequency of SrMV phylogroups over eight Chinese sugarcane-planting provinces is shown in Figure 2. The SrMV isolates from groups A and D were observed in all eight provinces, while the SrMV isolates from group B were found in seven provinces, all except Guangdong (GD). Additionally, the SrMV isolates from group E were found in six provinces, all except Sichuan (SC) and Yunnan (YN). Group C was present in four provinces, namely Fujian (FJ), Guangxi (GX), Hainan (HN), and Sichuan, whereas group F occurred only in Yunnan province.

Sequence Identities between SrMV Populations
The sequence identities of the 265 SrMV isolates obtained in this study ranged from 70.3 to 100% (nucleotide) and from 73.8 to 100% (amino acid). Within phylogenetic groups, the minimum sequence identities of 73.2% (nucleotide) and 80.5% (amino acid) were observed between SCJK003 (MZ419743) and other isolates in group D (Table 1). Among the six phylogenetic groups, nucleotide sequence identities ranged between 69.0% (between groups D and E) and 97.5% (between groups C and D), while amino acid sequence identities ranged between 72.8% (between groups A and D) and 100% (between groups C and D). Notably, obvious divergence was exhibited between clade I and the other two clades (II and III), as evidenced by lower nucleotide sequence identities (<85%) among SrMV isolates, except those between the SCJK003 isolate (group D) and 111 SrMV isolates in group A. In addition, the nucleotide and amino acid sequence identities between geographical groups ranged from 71.3 to 100% and from 74.9 to 100%, respectively (Table S1). Meanwhile, nucleotide and amino acid identities between host origin groups ranged from 71.3 to 100% and from 74.9 to 100%, respectively (Table S2).

To further investigate the variation among SrMV CP sequences, 12 representative CP amino acid sequences (two sequences in each phylogroup) were aligned. At least four insertions/deletions (InDels) at the N-terminus and 26 mutation sites were observed in SrMV CP sequences (Figure S1). It is noteworthy that the CP amino acid sequences in the SrMV-F group had no deletion but carried a unique site mutation as compared to other groups.

Neutrality Test and Selection Pressure on SrMV Populations
Nucleotide diversity (π) showed that SrMV CP sequences in the Asian population had a higher genetic variation (π = 0.12160), while the sequences in the American population had a lower genetic variation (π = 0.02200). By contrast, the π values of the SrMV CP sequences in modern cultivars and chewing cane were 0.12062 and 0.10265, respectively, indicating high genetic variation of SrMV CP in both host origins. A neutrality test showed that Tajima's D values for the four SrMV populations were all negative, suggesting that the SrMV population exhibited a trend of expansion. However, the Tajima's D values were not statistically significant (p > 0.10) in any case. Meanwhile, the dN/dS ratios ranged from 0.070 to 0.078 (less than 1) among the four populations, suggesting that the SrMV CP gene was under negative selection (Table 3).
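For readers unfamiliar with these statistics, the following minimal sketch shows how π and Tajima's D can be computed from a gap-free alignment using the standard textbook formulas. It is not the DnaSP implementation used in this study; the function names and the toy alignment are made up for illustration, and gaps or ambiguous bases are assumed to be absent.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs)


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free sequences."""
    n = len(seqs)
    s = sum(len(set(col)) > 1 for col in zip(*seqs))  # segregating sites
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (hypothetical data): every variant is a singleton.
toy = ["ATGCA", "ATGCT", "ATGGA", "TTGCA"]
pi = nucleotide_diversity(toy)
d = tajimas_d(toy)
print(pi, d)
```

In the toy alignment every polymorphism occurs in a single sequence, so D comes out negative. An excess of rare variants of this kind is the pattern usually read as population expansion or purifying selection.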

Genetic Differentiation and Gene Flow between SrMV Populations
Geographic (Asia vs. America) and host (modern cultivars vs. chewing cane) origins showed considerable genetic differentiation as detected by three permutation-based statistical tests (Ks*, Z*, and Snn) that reached significant levels (p < 0.05). The Fst values were >0.33, and the Nm values were <1.0 between geographical groups (Asia vs. America), suggesting that the gene flow between the two populations was not frequent. Conversely, the Fst values were <0.33, and the Nm values were >1.0 between host origins (modern cultivars vs. chewing cane), indicating that the gene flow between the two populations was frequent (Table 4).

Discussion
Crop productivity is affected by a wide range of adverse environmental factors, including biotic and abiotic stress [29]. Mosaic can cause losses ranging from 17% to 50% in susceptible varieties [8]. A survey of the occurrence and distribution of causal agents is an important step in the prevention and control of this disease. However, SrMV is often mixed with other viruses causing mosaic diseases, and, therefore, distinguishing the species or strain of viruses causing mosaic disease is nearly impossible through visual observation [8,25,30,31]. In this study, RT-PCR was employed to accurately identify SrMV. A higher SrMV detection rate was found in modern cultivars than in chewing cane. The lower SrMV detection rate in chewing cane was attributed to the lower vulnerability of this host to SrMV pathogenesis [21,32]. In addition to cultivar resistance, vector populations and their vagility, which are subject to ecosystem simplification, also affect virus infection rates [33]. However, this difference in the interaction between the virus and the sugarcane host needs to be further explored. In addition to SrMV, SCSMV is another main causal agent of mosaic in modern sugarcane cultivars in China [8,21,22].
According to our findings, the SrMV isolates from China and the Americas were grouped together in the SrMV-D group, while the SrMV isolates from Asia were distributed throughout the six phylogroups. Compared to a previous study by Luo et al. (2016), a larger number of sugarcane samples was used in this study, but no new phylogroup was proposed [25]. Nonetheless, more phylogroups were discovered in some specific provincial regions in China. For example, Luo et al. (2016) proposed that only one phylogroup (SrMV-G1) occurred in Guizhou province, while the results of the current study indicate that four phylogroups (SrMV-A, -B, -D, and -E) were present in this region [25]. The possible reason is that more leaf samples with mosaic were analyzed in this study. The low sequence identities among the SrMV isolates were indicative of high genetic divergence. Viral species demarcation in the family Potyviridae is typically based on the sequence identity of the CP-coding region with a threshold of <76% (nucleotide) and <82% (amino acid) [34]. Therefore, even if these isolates meet the species demarcation threshold for this family, more research is required to determine whether SrMV isolates from Yunnan Province clustered in phylogroup F belong to a unique quasispecies or species. New viral species will be considered following investigations based on full genome sequences (genomic features and phylogeny), together with additional data about biological characteristics such as host range and vector [5]. A high rate of viral genome mutation aids in the creation of novel strains, including resistance-breaking isolates [4]. Our data showed that there are at least four InDels at the N-terminus of SrMV CPs and numerous site mutations across the viral CP sequence. Notably, an obvious feature of the CP amino acid sequences in the SrMV-F group is that they have no deletion but carry a unique mutation site compared to other groups. It is unclear whether these SrMV isolates in different phylogroups are associated with variation in viral pathogenicity. Additionally, recombination is a major driving force in the evolution of potyviruses [35,36], but this evolutionary force seems to be an uncommon mechanism of speciation [35]. Our data showed that there was a potential recombination event among all tested SrMV CP sequences. However, no recombination was found in previous studies by Zhang et al. (2015) [24] and Luo et al. (2016) [25]. Natural selection is another important evolutionary mechanism and driving force for viral population variation, and purifying selection accelerates the elimination of harmful mutations in genes as well as the formation of a stable population genetic structure [37]. Notably, all potyvirus genomes undergo negative selection, with certain genes such as HC-Pro, CP, NIa, and NIb being more strongly selected than others [35]. In this study, the tested SrMV CP was subjected to negative selection.
The genetic makeup of viral populations is significantly influenced by geographic isolation [37]. However, modern travel and trade have grown to be significant factors in the transmission of viruses and the swapping of their hosts [35]. Sugarcane is a vegetatively propagated crop, and germplasm resources or planting material are frequently exchanged between Asian countries, which likely resulted in the absence of obvious population divergence of SrMV within the Asian population. Similarly, Wang et al. (2017) also demonstrated that there was no obvious geographic difference among SrMV isolates [38]. To some extent, however, SrMV populations in China were linked to their geographical origins [24]. Here, our data showed that geographic isolation plays a significant role in the divergence of SrMV isolates between Asia and the Americas. The host type is another crucial factor leading to the genetic differentiation of plant viruses [33,37]. In a previous phylogenetic analysis, 18 Chinese SrMV isolates were divided into two virus populations associated with host types (modern cultivar and chewing cane) [21]. Our data revealed that the phylogenetic grouping of SrMV isolates was not related to the two host sources. On the other hand, according to the genetic differentiation analysis, the SrMV populations from chewing cane and modern cultivars were strongly differentiated. A large-scale study of SrMV samples, host sources, and sugarcane-planting regions should be carried out to further explore SrMV population differentiation. Overall, various driving forces contribute to the formation of different SrMV populations or quasispecies.

RT-PCR Detection
Total RNA was extracted from leaf samples using the TRIzol® Reagent (Invitrogen, Carlsbad, CA, USA). After the quality and quantity of total RNA were checked, these RNA samples were used for molecular detection using RT-PCR [39]. The HiScript II 1st Strand cDNA Synthesis Kit (Novozan, Nanjing, China) was used to synthesize cDNA from each RNA sample (1.0 µg) with the reverse transcription primer Oligo(dT)23VN. The set of SrMV-specific primers SrMV-F (5′-ACAGCAGAWGCAACRGCACAAGC-3′) and SrMV-R (5′-CTCWCCGACATTCCCATCCAAGCC-3′) was used for PCR amplification [39]. The PCR amplification in a 25 µL volume included 1 µL cDNA, 12.5 µL Premix Taq (Ex Taq Version 2.0 plus dye) (TaKaRa, Dalian, China), and 1 µL of each primer (10 µmol/L). The PCR was performed under the following conditions: 94 °C for 5 min; followed by 35 cycles at 94 °C for 30 s, 52 °C for 30 s, and 72 °C for 1 min; and a final extension at 72 °C for 10 min. The PCR products were analyzed via gel electrophoresis on 1.0% agarose gels.

Cloning and Sequencing of RT-PCR Fragments
The target fragments from a subset of SrMV-positive PCR products were eluted using a Gel Extraction Kit (OMEGA Bio-Tek, Norcross, GA, USA). The purified PCR fragments were ligated into the pMD19-T vector (TaKaRa) and then transformed into Escherichia coli DH5α competent cells. Three positive colonies from each leaf sample were sent to Sangon Biotech Co., Ltd. (Shanghai, China) for sequencing. The inserted fragments were sequenced bidirectionally using the M13 universal primers.

Sequence Alignment and Phylogenetic Analysis
A total of 306 CP sequences (this study = 265 and GenBank database = 41), with the primer sequences trimmed, were used for sequence alignment and phylogenetic analysis, including 288 sequences from modern cultivars and 18 sequences from chewing cane (Table S3). Sequence alignment was carried out using the ClustalW algorithm implemented in MEGA 10.1.8 software [40]. The neighbor-joining (NJ) method was used to construct the phylogenetic tree, and the robustness of the nodes of the phylogenetic tree was assessed from 1000 bootstrap replicates. A sequence of SCMV isolate SCMV-HZ (NC_003398) was used as the outgroup. Sequence identity analysis was conducted using BioEdit 7.1.9 software [41].
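To make the identity values concrete, here is a minimal sketch of how percent identity between two aligned sequences can be computed. It is not BioEdit's implementation: the function name is hypothetical, and the gap handling (columns gapped in both sequences are skipped, a gap opposite a residue counts as a mismatch) is an assumption that may differ from the software used in this study.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1  # a gap opposite a residue is never a match
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100 * matches / compared


# Toy aligned fragments (hypothetical data, not SrMV sequences).
identity = percent_identity("ATGC-A", "ATGT-A")
print(identity)
```

Applying such a function to every pair of sequences in the alignment yields an identity matrix of the kind summarized in Table 1.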

Evaluation of Population Genetic Parameters
All population genetic parameters of SrMV CP sequences based on different geographical origins (the Americas vs. Asia) and sugarcane hosts (modern cultivar vs. chewing cane) were calculated using DnaSP version 5.10.01 software [43]. Genetic parameters included nucleotide diversity (π) [44] and Tajima's D [45]. Three statistical test values (Ks*, Z*, and Snn) were used to evaluate the genetic differentiation between SrMV populations. |Fst| > 0.33 or Nm < 1 indicates that gene flow between populations is not frequent, while |Fst| < 0.33 or Nm > 1 suggests frequent gene flow. The selection pressure on SrMV CP in each population was evaluated by calculating the ratio of nonsynonymous (dN) to synonymous (dS) substitutions in nucleotide sequences. Positive, neutral, and negative selection were indicated by dN/dS ratios >1, =1, and <1, respectively.
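The Fst and Nm cut-offs can be related to each other with a short sketch. It assumes the haploid island-model relation Fst = 1/(1 + 2Nm), under which Fst = 1/3 corresponds to Nm = 1; the text does not state which estimator was used, so this relation and the function names below are illustrative assumptions rather than the method applied in this study.

```python
def nm_from_fst(fst):
    """Nm under the haploid island model: Fst = 1 / (1 + 2 * Nm)."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in (0, 1]")
    return (1 / fst - 1) / 2


def gene_flow(fst, threshold=0.33):
    """Apply the |Fst| cut-off quoted in the text."""
    return "infrequent" if abs(fst) > threshold else "frequent"


def selection_type(dn_ds, tol=1e-9):
    """Classify selection from a dN/dS ratio (>1, =1, <1)."""
    if abs(dn_ds - 1) <= tol:
        return "neutral"
    return "positive" if dn_ds > 1 else "negative"


print(nm_from_fst(1 / 3))      # Nm at the Fst cut-off
print(gene_flow(0.5))          # hypothetical Fst value
print(selection_type(0.070))   # lowest dN/dS ratio reported in the text
```

Under this assumed relation the two criteria are two views of the same boundary, which is why an Fst above 0.33 and an Nm below 1 lead to the same conclusion.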

Conclusions
In this study, the molecular divergence and population structure of SrMV isolates infecting sugarcane (modern cultivars and ancient chewing cane) were investigated based on the CP fragment sequences. A high proportion (70.1%) of the samples tested positive for SrMV in an RT-PCR assay. Based on 306 SrMV CP sequences, three clades including six phylogenetic groups were proposed. High genetic diversity was present among all tested SrMV isolates, with CP sequence identities ranging from 69.0% to 100%. The SrMV-A and -D groups were dispersed across eight sugarcane-planting regions/provinces, but SrMV-F occurred only in Yunnan province, China. Our data suggested that numerous evolutionary driving forces, such as nucleotide mutation, gene recombination, and purifying selection as well as geographical and host isolation, contributed to the formation of different SrMV populations around the world. These findings enrich the information on the genetic diversity of this virus. However, the molecular divergence and population genetics of SrMV at the complete genome level remain unclear. In addition, the molecular mechanism of the interaction between this virus and host sugarcane is largely unknown. Therefore, these research aspects need to be further explored.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants12213759/s1. Table S1: Percent identities (%) of nucleotide (lower-left) and amino acid (upper-right) sequences of SrMV CP within and between different geographical origins. Table S2: Percent identities (%) of nucleotide (lower-left) and amino acid (upper-right) sequences of SrMV CP within and between host types. Table S3: Information on sequences from sorghum mosaic virus (SrMV) isolates worldwide. Figure S1: Insertion/deletion (InDel) and site mutation among amino acid sequences of SrMV CP. Twelve representative CP sequences (two sequences in each phylogroup) were selected for the alignment using DNAMAN version 6 software. Insertions/deletions (InDels) are shown in red boxes. A unique site mutation of SrMV CP between SrMV-F and other groups is marked with a solid triangle.

Figure 1. Phylogenetic tree based on nucleotide sequences of coat protein (CP) from 306 sorghum mosaic virus (SrMV) isolates. All tested SrMV CP sequences included 265 isolates from this study plus 41 isolates from the GenBank database. A sequence of sugarcane mosaic virus (SCMV) isolate SCMV-HZ (GenBank accession no. NC_003398) was used as the outgroup. The number (n) of isolates in each phylogroup is in parentheses.

Table 1. Percentage identities (%) of nucleotide (lower-left) and amino acid (upper-right) sequences of SrMV coat protein within and between phylogenetic groups a.
a Nucleotide sequence identities (%) of SrMV CP within phylogenetic groups are shown in parentheses.

Table 2. Detection of recombination signals among 306 SrMV isolates based on the CP genes.

Table 3. Genetic variation and population genetic parameters between different populations based on SrMV CP sequences.
a ns, non-significant. b dN/dS, the ratio of nonsynonymous (dN) to synonymous (dS) substitutions.

Table 4. Tests of genetic differentiation and gene flow among SrMV groups based on geographical origins and host types a.