Complete Genomic Sequence Analysis of a Sugarcane Streak Mosaic Virus Isolate from Yunnan Province of China

In recent years, the sugarcane streak mosaic virus (SCSMV) has been the primary pathogen of sugarcane mosaic disease in southern China. In this study, the complete genome of a sugarcane mosaic sample (named YN-21) from Kaiyuan City, Yunnan Province, was amplified and sequenced. By comparing the amino acid sequences of YN-21 and 15 other SCSMV isolates from the NCBI database, the protease recognition site of SCSMV was determined. YN-21 had the highest nucleotide and amino acid identities of 97.66% and 99.30%, respectively, in comparison with the SCSMV isolate (JF488066). The P1 had the highest variability of 83.38–99.72% in the amino acid sequence, and 6K2 was the most conserved, with 97.92–100% amino acid sequence identity. A phylogenetic analysis of nucleotide and amino acid sequences clustered the 16 SCSMV isolates into two groups. All the Chinese isolates were clustered into the same group, and YN-21 was closely related to the Yunnan and Hainan isolates in China. Recombination analysis showed no major recombination sites in YN-21. Selective pressure analysis showed that the dN/dS values of 11 proteins of SCSMV were less than 1, all of which were undergoing negative selection. These results can provide practical guidance for monitoring SCSMV epidemics and genetics.


Introduction
Sugarcane streak mosaic virus (SCSMV; species Sugarcane streak mosaic virus, genus Poacevirus, family Potyviridae) has been the primary pathogen of sugarcane mosaic disease in southern China in recent years [1,2]. SCSMV infection causes continuous or discontinuous chlorotic stripes on sugarcane leaves, decreasing cane tonnage and saccharose yield [3]. The intensity of symptoms varies with the sugarcane variety [4]. In addition to infecting sugarcane, SCSMV can be artificially inoculated into sorghum, corn, crowfoot grass, and Sudan grass of the Gramineae family [3,5]. SCSMV can be transmitted mechanically, or over long distances through stem cuttings, and no insect vector has been found to transmit it [3]. A high incidence of SCSMV has been reported in sugarcane-producing areas such as China, India, Thailand, and Indonesia [3,6–8], and the virus has also been reported in Iran and Africa [9,10]. Currently, the management of SCSMV involves growing virus-free planting materials and disease-resistant sugarcane varieties [11].
The population structure of viruses depends on the environment, mainly geographical and host factors [16]. Gillaspie et al. isolated SCSMV from sugarcane germplasm introduced to the US from Pakistan in 1978 [17]. Hema et al. then obtained an SCSMV isolate from India, found that it shared about 93.6% nucleotide identity with the Pakistani isolate in the 3′-terminal region, and proposed that the virus represented a new genus in the family Potyviridae [12,18]. Based on the CP gene, Wang et al. found that the Chinese isolates showed no genetic differentiation among themselves but differed from isolates from other countries [19]. Zhang et al. found that the Chinese isolates clustered into one group, showing prominent geographical characteristics [20]. Based on the HC-Pro gene of SCSMV, Bagyalakshmi et al. found a large difference in the variability of the nucleotide (nt) and amino acid (aa) sequences between Indian isolates and those from other Asian countries [21]. Although the genetic characteristics of SCSMV in Yunnan province, China, have been reported previously, these studies focused only on the P1, CP, HC-Pro, and NIa-Pro genes. There are no reports on the genetic variation within the complete genome of SCSMV isolates from Yunnan.
In this study, we collected a sugarcane sample infected with SCSMV (named YN-21) from the Sugarcane Research Institute, Yunnan Academy of Agricultural Sciences, in Kaiyuan City, Yunnan Province, and obtained its full genome sequence. The sequence was compared with other SCSMV sequences deposited in GenBank, and the genome structure and codon usage bias of SCSMV were analyzed. Phylogenetic analyses of genetic diversity and population composition were performed based on whole-genome sequences. The study provides a background for the in-depth study of the virus's origin, occurrence, and evolution, and for the formulation of appropriate prevention and control measures.
Total RNA was extracted using Total RNA Extraction Reagent (Vazyme, Nanjing, China) according to the manufacturer's instructions. First-strand cDNA was synthesized using the HiScript II 1st Strand cDNA Synthesis Kit (Vazyme, Nanjing, China) and stored at −20 °C until further use. Using the cDNA as a template, seven overlapping segments covering the coding region of the genome (Figure 1B, Table A1) were amplified with specific primers and 2× Taq Master Mix (Vazyme, Nanjing, China). The reaction conditions were as follows: initial denaturation at 95 °C for 3 min; 32 cycles of denaturation at 95 °C for 15 s, annealing for 15 s, and extension at 72 °C (1 min per kb); and final extension at 72 °C for 10 min. The 3′-UTR sequence was obtained using 2× Taq Master Mix with the primers SCSMV-9408F and Oligo(dT)18. The 5′-UTR sequence was amplified using the Rapid Amplification Kit for cDNA Ends (RACE) (Sangon Biotech, Shanghai, China) according to the manufacturer's protocol. The 5′ RACE primers SCSMV-413-R and SCSMV-337-R were designed according to the 5′-terminal nucleotide sequence of isolate YN-21 obtained with primers 1F/1R. After PCR amplification, 5 µL of each PCR product was resolved on a 1% agarose gel to observe the target bands, and the PCR products were sent to Sangon Biotech (Shanghai) for sequencing. The complete genome of YN-21 was obtained by sequence assembly.

Recombination Analysis
Recombination analysis was performed using the RDP [25], GENECONV [26], BOOTSCAN [27], MAXCHI [28], CHIMAERA [29], 3SEQ, and SISCAN [30] algorithms in the RDP4 software. Default parameters were used for each algorithm, with a p-value cutoff of 0.05 [31]. A fragment was considered to show clear recombination only when more than four algorithms supported the event with p < 1.0 × 10⁻⁶; otherwise, it was considered non-recombinant. To check for recombination breakpoints in the analyzed data, the Genetic Algorithm for Recombination Detection (GARD) method, as implemented in the Datamonkey server (http://www.datamonkey.org/ (accessed on 28 September 2022)), was also used.
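The acceptance rule described above can be expressed as a small filter. The following is a minimal sketch, not the RDP4 implementation: the record type and function name are hypothetical, and it assumes the rule means that more than four of the seven algorithms must each report p < 1.0 × 10⁻⁶ for the same event.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the names are illustrative only.
MIN_SUPPORTING_METHODS = 4   # "more than four" algorithms must agree
P_VALUE_CUTOFF = 1.0e-6


@dataclass
class RecombinationEvent:
    """Hypothetical record of one candidate event: method name -> p-value."""
    p_values: dict


def is_clear_recombination(event: RecombinationEvent) -> bool:
    """Return True if more than four methods support the event at p < 1e-6."""
    supporting = [m for m, p in event.p_values.items() if p < P_VALUE_CUTOFF]
    return len(supporting) > MIN_SUPPORTING_METHODS
```

Under this reading, an event supported by exactly four algorithms is rejected, which matches the strict "more than 4" wording.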

Phylogenetic and Genetic Distance Analysis
The Clustal W program in MEGA v11.0 software was used to align the nucleotide sequences of 16 SCSMV isolates, including YN-21 [32]. All phylogenetic trees were generated by the maximum likelihood method with 1000 bootstrap replicates [33]. Maize dwarf mosaic virus (MDMV, AJ001691) was included as the outgroup. The Kimura two-parameter model (K2P) in MEGA v11.0 was used to calculate the genetic distance between and within groups.
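For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below illustrates that formula only; it is not MEGA's code, the function name is hypothetical, and it assumes that columns containing gaps or ambiguous bases are skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Group-level distances are then averages of such pairwise values over the relevant isolate pairs.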

Selection Pressure Analysis
Selection pressure on each viral gene was assessed by calculating its dN/dS (nonsynonymous/synonymous substitution rate) ratio. The codon-level selection pressure on the 11 genes in the 16 SCSMV isolates was determined by FEL [34] in the online analysis software Datamonkey, with a p-value threshold of 0.1. Values of dN/dS < 1, dN/dS = 1, and dN/dS > 1 indicated negative (or purifying) selection, neutral evolution, and positive (or diversifying) selection, respectively.
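The interpretation rule for dN/dS can be written as a one-line classifier. This is an illustrative sketch only: the function name and the tolerance used to treat a ratio as "equal to 1" are assumptions, and it says nothing about the statistical test that FEL applies at each codon. The two example ratios are the gene-level extremes reported in this paper (P3N-PIPO and 6K2).

```python
def classify_selection(dn_ds: float, tolerance: float = 1e-9) -> str:
    """Map a dN/dS ratio to the selection regime it indicates."""
    if dn_ds < 0:
        raise ValueError("dN/dS cannot be negative")
    if abs(dn_ds - 1.0) <= tolerance:
        return "neutral"
    return "negative (purifying)" if dn_ds < 1.0 else "positive (diversifying)"


# Highest and lowest gene-level ratios reported in this paper.
reported = {"P3N-PIPO": 0.112, "6K2": 0.0013}
regimes = {gene: classify_selection(value) for gene, value in reported.items()}
```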

Genome Base Composition and Codon Preference of YN-21
Sugarcane (cultivar ROC22) leaves showing typical sugarcane mosaic symptoms were collected in Yunnan province, China (Figure 1A).
The total length of the YN-21 genome (GenBank accession number OR259188) is 9808 nucleotides, and the coding sequence is 9390 nucleotides, encoding 3130 amino acids (Table A2). Within the polyprotein, leucine (Leu) was the most abundant residue, with 264 occurrences (8.43%), while cysteine (Cys) was the least abundant, with 51 occurrences (1.63%).
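The figures above are internally consistent, as a quick arithmetic check shows. The snippet is only a sanity check on the quoted numbers; the constant and function names are illustrative.

```python
# Figures quoted in the text above; names are illustrative.
CDS_LENGTH_NT = 9390
POLYPROTEIN_LENGTH_AA = 3130
RESIDUE_COUNTS = {"Leu": 264, "Cys": 51}


def residue_percentages(counts: dict, total: int) -> dict:
    """Return each residue's share of the polyprotein, in percent (2 d.p.)."""
    return {aa: round(100 * n / total, 2) for aa, n in counts.items()}


# 9390 nt / 3 nt per codon = 3130 codons, matching the polyprotein length.
codons = CDS_LENGTH_NT // 3
percentages = residue_percentages(RESIDUE_COUNTS, POLYPROTEIN_LENGTH_AA)
print(codons, percentages)  # 3130 {'Leu': 8.43, 'Cys': 1.63}
```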
Comparison of the nucleotide sequences of the SCSMV isolates showed high percentage identities between YN-21 and the other 15 isolates, up to 96.5%. Among the 11 genes, CP showed the highest nucleotide sequence identities among the SCSMV isolates (85.88–97.98%). At the genome level, the nucleotide and amino acid sequence identities between YN-21 and the other 15 SCSMV isolates were 80.98–97.66% and 92.12–99.30%, respectively. Among the 15 isolates, isolate JF488066 had the highest nucleotide and amino acid sequence identities with isolate YN-21 (Table A3).
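Percent identity values of this kind are computed from pairwise alignments. The sketch below shows one common convention and is not the DNAMAN calculation: the function name is hypothetical, and it assumes that a column with a gap in only one sequence counts as a mismatch while columns with gaps in both sequences are ignored. Other tools use different denominators, so values may differ slightly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences (nucleotide or protein)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column is a gap in both sequences: ignore it
        compared += 1
        if x == y:
            matches += 1  # a lone gap never matches, so it counts as a mismatch
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```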

Recombination Analysis
The analysis performed with RDP4 revealed recombination breakpoints in 7 of the 16 SCSMV isolates (Table 2). Recombination analysis of the coding sequences (CDS) of the 11 genes showed no obvious recombination sites in 6K1, 6K2, or VPg. Three recombination regions were found in CI, and one each in CP and P1 (Table 3). The CDS with recombination sites belonged to isolates in group II, and no recombinants were found in group I, indicating that the frequency of recombination was low among SCSMV isolates from China. To test the reliability of the recombination results, GARD was also used, and its results were consistent with those of RDP4. To reduce the impact of recombination on the inferred population structure, we excluded the three genes showing recombination and divided the SCSMV genome into two segments, A (HC-Pro, P3, 6K1) and B (6K2, VPg, NIa-Pro, NIb), and constructed phylogenetic trees for each by the maximum likelihood method with 1000 bootstrap replicates (Figure 3).

Phylogenetic Analysis
Using the MDMV (AJ001691) genome as the outgroup, a phylogenetic tree of the complete genomes of YN-21 and the 15 other isolates was constructed by the maximum likelihood method in MEGA v11.0 (Figure 4). The sixteen SCSMV isolates clustered into two groups (group I and group II). Group I included YN-21 and eight isolates from Yunnan (YN-21211, ID, JP1, JP2) and Hainan (HN-YZ49) in China, Thailand (THA-NP3), Myanmar (MYA-Formasa), and India (TPT). Group II included seven isolates from Iran (IR-Khuz57, IR-Khuz6), Pakistan (PAK), and India (IND5268, INDR-71, IND369, IND671). The results indicated that the YN-21 isolate was closely related to the SCSMV isolates from the Yunnan and Hainan provinces of China. The grouping in the phylogenetic trees built without the recombinant genes was consistent with that of the full genome, although there were small differences between the second partition and the full genome. To verify the reliability of the grouping, we further calculated the genetic distances between and within groups (Table 4). The average genetic distances within group I and group II were 0.03 and 0.10, respectively. The within-group distances were lower than the between-group distances, indicating that the results of the phylogenetic analysis are reliable.
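The within-group versus between-group comparison can be sketched as simple averaging over a pairwise distance table. This is a minimal illustration with made-up toy distances and hypothetical isolate labels, not the values behind Table 4, and the function names are illustrative.

```python
from itertools import combinations, product
from statistics import mean


def pair_key(a: str, b: str) -> frozenset:
    """Order-independent key for a pair of isolates."""
    return frozenset((a, b))


def mean_within(distances: dict, group: list) -> float:
    """Average pairwise distance among members of one group (needs >= 2 members)."""
    return mean(distances[pair_key(a, b)] for a, b in combinations(group, 2))


def mean_between(distances: dict, group_a: list, group_b: list) -> float:
    """Average pairwise distance between members of two different groups."""
    return mean(distances[pair_key(a, b)] for a, b in product(group_a, group_b))


# Toy distances for four hypothetical isolates (not data from this study).
toy = {
    pair_key("a1", "a2"): 0.02,
    pair_key("b1", "b2"): 0.08,
    pair_key("a1", "b1"): 0.20,
    pair_key("a1", "b2"): 0.22,
    pair_key("a2", "b1"): 0.21,
    pair_key("a2", "b2"): 0.19,
}
within_a = mean_within(toy, ["a1", "a2"])
between = mean_between(toy, ["a1", "a2"], ["b1", "b2"])
```

A grouping is supported in this sense when each within-group mean is smaller than the between-group mean.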

Selection Pressure Analysis
The dN/dS values of the different SCSMV populations were all less than 1, and one positive selection site was found in each of P1 (322nd codon) and P3N-PIPO (240th codon) (Table 5). The dN/dS value was highest for P3N-PIPO (0.112) and lowest for 6K2 (0.0013), showing that the different populations of SCSMV were under strong negative selection pressure.

Discussion
In a previous study, Xu et al. compared the sequence of PAK (GQ388116) with other representative species of the family Potyviridae and obtained the polyprotein cleavage sites of SCSMV [1]. In this study, we compared the complete genome sequences of 16 SCSMV isolates and found that the conserved amino acid positions differed from those reported by Xu et al.: the amino acids at the cleavage sites were more conserved, possibly because our comparison was restricted to isolates of the same species. We also deduced the cleavage site of P3N-PIPO. Based on relative synonymous codon usage (RSCU) analysis, 25 codons in YN-21 had RSCU values greater than 1, of which 11 ended in A and 9 ended in U, indicating that YN-21 prefers codons ending in A and U. The translation rate of a codon is positively correlated with its usage frequency and the concentration of its cognate tRNA [35]. Therefore, we believe that the codon bias of YN-21 is a result of its evolution toward higher rates of gene expression and may also be influenced by tRNA abundance. In other words, YN-21 has evolved to continuously select codons that produce proteins rapidly, thereby increasing the growth rate of the virus.
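RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark over-represented codons. The sketch below illustrates the calculation under the standard genetic code; it is not the software used in this study, the function names are hypothetical, and sequences are handled in DNA notation (U is converted to T). Note that Met and Trp, which have a single codon each, always score 1.0 and are usually excluded from such tallies.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for every sense codon whose amino acid occurs."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def preferred_codon_endings(values: dict) -> Counter:
    """Tally the third base of codons with RSCU > 1."""
    return Counter(codon[-1] for codon, v in values.items() if v > 1)
```

For example, `rscu("CTGCTGCTGCTA")` gives CTG a value of 4.5 and CTA a value of 1.5, because four leucine codons spread evenly over six synonyms would give an expected count of 4/6 each.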
Recombination occurs frequently in Potyviridae and can improve the adaptive evolution of RNA viruses to their hosts, break host resistance, and produce virus strains capable of causing epidemics [36,37]. Previous studies have shown recombination events in the HC-Pro gene of Indian SCSMV isolates [21], while the CP gene of Chinese SCSMV isolates rarely recombined and was under negative selection pressure [20]. In this study, the coding sequence of SCSMV was examined by recombination analysis and selection pressure analysis. Recombination sites were found only in the CI, CP, and P1 genes, and no obvious recombination sites were found in the YN-21 isolate. The dN/dS values among different populations of SCSMV were less than 1, with one positive selection site in each of P1 and P3N-PIPO, indicating that Chinese SCSMV isolates seldom recombine and are under negative selection pressure. This suggests that negative selection, rather than recombination, is the main force driving the molecular evolution of SCSMV.
The genome contains a great deal of genetic information about biological evolution. By establishing a phylogenetic tree, we can infer the genetic relationship among species and the evolutionary process of speciation [38]. Based on the phylogenetic analysis of the P1 and CP genes, SCSMV populations were grouped into two groups, and all Chinese SCSMV isolates were clustered into one group [7]. The phylogenetic analysis of SCSMV CP genes showed obvious geographical variations, and the SCSMV isolates from China were clustered into one group [19]. However, SCSMV isolates from India, Australia, South Africa, and the United States were distributed into 14 phylogroups, implying that the virus isolates could not be simply classified according to the geographical origin of the host species [39]. Based on the phylogenetic analysis of the complete genomes of 16 SCSMV isolates, we found that the SCSMV isolates from China clustered into one group, and YN-21 was closely related to the isolates from the Yunnan and Hainan provinces of China, indicating that the population of SCSMV isolates from China had no obvious genetic differentiation, which is consistent with the results of the complete genome sequence analysis of SCSMV by Li et al. [40]. After removing the genes with recombination sites, the grouping of SCSMV isolates into two main phylogenetic groups remained consistent between the trees based on the full genome sequences and those based on the individual partitions.
Some viruses in the family Potyviridae, such as sugarcane mosaic virus [41] and sorghum mosaic virus [11], are transmitted by aphids in a non-persistent manner. SCSMV can be transmitted mechanically by infiltration, rub inoculation, and knife injury, as well as through vegetative propagation materials such as stem cuttings. There is no evidence of insect transmission [3], possibly because HC-Pro and CP lack the KITC, PTK, and DAG motifs that are necessary for aphid transmission [1]. Germplasm exchange between countries and the transfer of seedlings between regions are therefore the only known routes for the long-distance transmission of SCSMV. YN-21 is closely related to isolates from Thailand and Myanmar, which may be due to the frequent germplasm exchange between Yunnan province of China, Myanmar, and Thailand in recent years and the spread of SCSMV with seedlings [42].

Figure 1. The symptoms of plants infected by SCSMV (A). Genome structure and amplification strategy of YN-21 (B).

Sequence Assembly and Determination of Protease Cleavage Sites
To obtain the complete genome sequence of YN-21, we used DNAMAN V6.0.3.99 and the SeqMan program in DNASTAR V11.1 software to assemble the sequenced fragments [22]. Complete genome sequences of 15 SCSMV isolates were obtained from the NCBI (National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/ (accessed on 25 September 2022)) database and aligned together with the YN-21 genome sequence using the DNAMAN program and the online analysis database EMBL-EBI (https://www.ebi.ac.uk/ (accessed on 31 July 2023)). Based on the protease cleavage sites of Potyviridae described by Adams et al., we re-determined the protease cleavage sites of SCSMV [13,15,23,24].


Figure 3. Phylogenetic trees of two partitions of the SCSMV genome, obtained after deleting recombinant genes, for YN-21 (triangle symbol) and 15 SCSMV isolates. MDMV was used as the outgroup. The bootstrap value was 1000. Phylogenetic tree based on HC-Pro, P3, and 6K1 gene sequences (A). Phylogenetic tree based on 6K2, VPg, NIa-Pro, and NIb gene sequences (B).

Figure 4. Maximum likelihood tree calculated from the complete genomes of YN-21 (triangle symbol) and 15 SCSMV isolates. MDMV was used as the outgroup. Each terminal node label consists of the accession number, isolate name, collection location, and time.

Table 1. Genomic structure of YN-21 and protease cleavage sites determined by the comparison of YN-21 with other SCSMV isolates. Conserved amino acids in the cleavage sites are bolded.

Table 4. Genetic distances within and between groups.

Table A2. The amino acid composition and codon usage frequency of YN-21.

Table A3. Nucleotide/amino acid identity between YN-21 and the 15 other SCSMV isolates.