Analysis of Synonymous Codon Usage Bias in Potato Virus M and Its Adaption to Hosts

Potato virus M (PVM) is a member of the genus Carlavirus of the family Betaflexviridae and causes large economic losses of nightshade crops. Several previous studies have elucidated the population structure, evolutionary timescale and adaptive evolution of PVM. However, the synonymous codon usage pattern of PVM remains unclear. In this study, we performed comprehensive analyses of the codon usage and composition of PVM based on 152 nucleotide sequences of the coat protein (CP) gene and 125 sequences of the cysteine-rich nucleic acid binding protein (NABP) gene. We observed that the PVM CP and NABP coding sequences were GC-and AU-rich, respectively, whereas U- and G-ending codons were preferred in the PVM CP and NABP coding sequences. The lower codon usage of the PVM CP and NABP coding sequences indicated a relatively stable and conserved genomic composition. Natural selection and mutation pressure shaped the codon usage patterns of PVM, with natural selection being the most important factor. The codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analysis revealed that the greatest adaption of PVM was to pepino, followed by tomato and potato. Moreover, similarity Index (SiD) analysis showed that pepino had a greater impact on PVM than tomato and potato. Our study is the first attempt to evaluate the codon usage pattern of the PVM CP and NABP genes to better understand the evolutionary changes of a carlavirus.


Introduction
Potato virus M (PVM) is a RNA virus in the Carlavirus genus, Betaflexviridae family. PVM was first isolated from potato (Solanum tuberosum) in the United States in 1923 [1], and then isolated from pepino (Solanum muricatum), tomato and tobacco [2,3]. PVM is transmitted in a non-persistent manner by aphid and causes large economic losses of nightshade crops worldwide [2][3][4][5][6][7]. PVM contains a single stranded positive-sense RNA molecule that is approximately 8.5 kb in length and is enveloped by flexuous filamentous virions [8]. PVM possesses a cap structure at the 5 end, a poly(A) tail at the 3 end, and encodes six open reading frames (ORFs) [2,8,9]. ORF1 encodes a multi-domain protein that includes methyltransferase, helicase, and polymerase domains and is involved in RNA replication [10,11]. Three putative proteins within the so-called triple gene blocks (TGBs) are encoded by overlapping ORF2-4, and are involved in membrane binding, suppression of RNA silencing and cell-to-cell movement [12][13][14]. Coat protein (CP) and cysteine-rich nucleic acid binding protein (NABP) are encoded by ORF5 and ORF6, respectively [8,15].
Generally, the degeneracy of the genetic code allows for 61 triplet codons that encode all 20 amino acids such that many of them are synonymous [16,17]. Thus, codons encoding the same amino acid are termed synonymous codons. Interestingly, synonymous codons are not randomly used in a given cellular species, and the preference for specific codons over other synonymous codons determined in Emboss explorer (http://www.bioinformatics.nl/emboss-explorer/), where the GC12s is the mean of GC1s and GC2s.

Relative Synonymous Codon Usage (RSCU) Analysis
As described by Sharp and Li (1986), the RSCU value of a codon is the ratio between the observed and expected usage frequency with the assumption that all codons for a particular amino acid are used equally [54]. An RSCU value equals 1 indicates that there is no bias for that codon, whereas codons with RSCU values of >1.6 and <0.6 are considered to be "overrepresented" and "underrepresented", respectively. The RSCU values were calculated using the following equation: where RSCU ij is the relative synonymous codon usage value of the i-th codon for the j-th amino acid, and g ij is the observed number of the i-th codon for the j-th amino acid that has an "n i " type of synonymous codon. The mean RSCU values of the PVM CP and NABP genes were calculated using MEGA 7 to determine the codon usage patterns without the effect of the amino acid composition and sequence length.

Principal Component Analysis (PCA)
PCA is a multivariate statistical method that was used to identify the correlations between variables and samples. Each strain was represented as a 59-dimensional vector to reduce the effect of the amino acid composition on codon usage and the RSCU value of each sense codon corresponds to each dimension. Whereas, codons UGG and AUG and the three termination codons were excluded from the analysis. PCA was analyzed using Origin 8.0.

Effective Number of Codons (ENC) Analysis
The ENC was used to calculate the absolute codon usage bias of the PVM CP and NABP coding sequences, regardless of the number of amino acids and gene length. The ENC values range from 20 (only one synonymous codon is used, an extreme codon usage bias) to 61 (the synonymous codons are used equally, no bias). The ENC values were calculated as follows: where F k (k = 2, 3, 4, 6) represents the mean values for F k and k represents k-fold degenerate amino acids. F k is estimated with the following formulae: where n is the total occurrences number of the codon for that amino acid, while where n i is the total number of the i-th codon for the corresponding amino acid. The ENC was calculated using CodonW v1.4.2. In general, the smaller the ENC value, the stronger the codon preference is. It is also accepted that ENC values ≤35 are indicative of genes with significant codon bias.

ENC-Plot Analysis
ENC-plot analyses (ENC values against GC3s values), which consist of plotting GC3s values in the abscissa and the ENC values in the ordinate, are used to investigate the role of mutational pressure in codon usage bias. If the only factor driving the codon usage bias is mutation pressure, these points will be on the standard curve. Otherwise, other factors such as natural selection may play a crucial role. The ENC is estimated with the following formulae: where s indicates for the value of GC3s.

Parity Rule 2 (PR2) Analysis
Parity rule 2 (PR2) plot analysis was performed to investigate the effects of mutation and natural selection on the codon usage of individual genes by exploring the relationship of the four-codon amino acid families, with A3/(A3+U3) plotted against G3/(G3+C3). The center of the plot is where A=U and G=C (PR2), indicating a balance between mutation pressure and natural selection.

Neutrality Analysis
The influence of mutation bias and natural selection on codon usage were investigated by neutral plot (GC12 values against GC3 values). The mutation pressure is represented by the slope of the regression line plotted between the GC12 and GC3 contents, where regression lines that fall near the diagonal (slope = 1.0) indicates no or weak external selection pressure. In contrast, if regression curves deviate from the diagonal, it indicates a significant influence of natural selection on codon usage bias.

Codon Adaptation Index (CAI) Analysis
The codon adaptation index (CAI) was calculated using the CAIcal SERVER (http://genomes.urv. cat/CAIcal/RCDI/). CAI is a quantitative measure that predicts the highest relative adaptation of the viruses to their potential host. CAI values range from 0 to 1, and sequences with higher CAIs are indicative of a stronger adaptability to the host.

Relative Codon Deoptimization Index (RCDI) Analysis
The relative codon deoptimization index (RCDI) values for the PVM CP and NABP coding sequences were calculated using the RCDI/eRCDI server (http://genomes.urv.cat/CAIcal/RCDI/) to determine the codon deoptimization trends. An RCDI value of 1 indicates that the virus follows the host codon usage pattern and displays a host-adapted codon usage pattern. In contrast, RCDI values higher than 1 indicates lower adaptability.

Similarity Index (SiD) Analysis
To measure the effect of the codon usage bias of the hosts on the PVM CP and NABP genes, the SiD was calculated as follows: where a i indicates the RSCU value of 59 synonymous codons of the PVM coding sequences, b i indicates the RSCU value of the identical codons of the potential host, and D(A, B) (SiD) indicates the potential impact of the entire codon usage of the host on the different clades of the PVM CP and NABP genes. The SiD values range from 0 to 1.0, with higher values indicating that the host has a dominant effect on the usage of codons.

Gravy and Aroma Statistics
The Gravy value is calculated by CodonW (v1.4.2) and indicates the effect of protein hydrophobicity on codon usage bias, ranging in value from −2 to 2. In contrast, the Aroma value measures the effect of aromatic hydrocarbon proteins on codon usage bias.

Statistical Analysis
The correlation analysis was performed to identify the relationships between the GC, GC3s, ENC, the first two principal component axes, Aroma and Gravy, which were calculated using Spearman's rank correlation analysis, with an highly significant relationship (**) of p < 0.01 and a significant relationship (*) of 0.01 < p < 0.05. All statistical analyses were performed using Origin 8.0.

Recombination and Phylogenetic Analysis
The occurrence of recombination events can influence phylogenetic and codon usage analyses at either the gene or genome levels [55,56]. Only a previous reported recombinant isolate 501 (KC129095) [41] was observed in the CP gene regions, while no obvious recombination events were detected in NABP gene. After discarding the recombinant sequences, we conducted the phylogenetic analyses using the NJ method based on above data sets. Consistent with the findings of He et al. (2019) [41], three lineages were formed based on the CP ( Figure S1) and NABP ( Figure S2) coding sequences. Compared with the results of He et al., in both the CP and NABP trees, two novel potato isolates from Bangladesh and one novel potato isolate from Yunnan province of China were clustered into GP2, while two novel tomato isolates from Slovakia were clustered into GP1.

Nucleotide Composition Analysis
The nucleotide compositions of the PVM CP and NABP coding sequences were determined to explore the potential influence of compositional constraints on codon usage. We observed that the nucleotides G and A were the most abundant in the CP coding sequences, with mean compositions of 29.86 ± 0.53% and 26.63 ± 0.54% (Table S2), respectively, compared with C (23.56 ± 0.25%) and U (19.94 ± 0.64%). In contrast, the nucleotide composition at the third position of synonymous codons (A 3S , C 3S , G 3S and U 3S ) in the CP coding sequences were significantly different from the nucleotide composition. The most frequent nucleotide was G 3S (40.25% ± 0.020), followed by U 3S (29.65% ± 0.018), A 3S (27.44% ± 0.019) and C 3S (26.61% ± 0.010). In contrast, the nucleotides U and G were most abundant in the NABP coding sequences, with mean compositions of 30.27 ± 0.74% and 27.75 ± 2.47% (Table S3), respectively, followed by A (23.83 ± 1.09%) and C (18.52 ± 1.67%). In addition, the mean U 3S (48.91% ± 0.020) and G 3S (30.33% ± 0.015) values were also higher compared with A 3S (26.13% ± 0.018) and C 3S (15.93% ± 0.017) in the NABP gene sequences (Table S3). The GC composition was higher than that of AU in the CP coding sequences, with 53.43 ± 0.58% observed compared with 46.57 ± 0.58% (Table S2), respectively, indicating that there is a GC-biased composition of PVM CP coding sequences. Additionally, the average GC contents at the first, second, and third positions (for GC12s and GC3s) of the CP coding sequences were 52.69 ± 0.45% and 54.89 ± 1.24%, respectively. However, the AU contents (54.04 ± 1.84%) were significantly higher than that of GC (45.86 ± 1.32%) in the NABP coding sequences (Table S3), indicating that the PVM NABP coding sequences were AU-rich. Furthermore, the mean GC12s and GC3s of the NABP coding sequences were 50.07 ± 1.09% and 37.44 ± 2.31%, respectively.

U-and G-Ending Codons Are Preferred in PVM CP and NABP Coding Sequences
To estimate the codon usage pattern of the PVM CP and NABP coding sequences, RSCU analysis was performed (Table 1). In the CP gene, 12 out of 18 preferred codons were G/U-ending and exhibited an equal distribution of G and U (G-ending: 6; U-ending: 6), while the remaining six were C/A-ending (C-ending: 4; A-ending: 2) ( Table 1). This result shows that G-and U-ending codons are preferred in the PVM CP coding sequences. Within these preferred codons, irrespective of the PVM host, four had an RSCU value > 1.6, with the highest being GUG (2.41), indicative of extreme over-presentation, whereas the remaining preferred codons had RSCU values >0.6 and <1.6. No optional synonymous codons were underrepresented (RSCU < 0.6) from the PVM CP gene. In addition, we also calculated the host-specific RSCU values of the CP coding sequences and observed that G/U-ending codons were more common than C/A-ending codons. In the NABP gene, among the 18 preferred codons, 14 were U/G-ending (U-ending: 10; G-ending: 4) and four were A/C-ending (A-ending: 2; G-ending: 2) ( Table 1). This result shows that U-and G-ending codons are also preferred in the PVM NABP coding sequences. Within these preferred codons, 12 had a RSCU value >1.6, while the remaining six preferred codons that had RSCU values >0.6 and <1.6. Similar to the CP gene, no optional synonymous codons were underrepresented (RSCU < 0.6) for the NABP gene. According to the hosts, we also calculated the RSCU values of the NABP coding sequences and observed that G/U-ending codons were more common than C/A-ending codons. The three exceptions were the UUA, AUU and UAU codons (coding for L, I and Y, respectively), which only exhibited preferred use in potato isolates (Table 1). Both the CP and NABP RSCU analyses suggested that the preferred codons have been mostly influenced by compositional constraints (G and U in this case). In our RSCU analysis, several codons were differentially selected in different host plants in CP and NABP genes. For example, three codons AUA, AUC, and AUU encode amino acid I, however AUA was the most frequently used codons in potato rather than AUC in tomato and pepino in CP gene, whereas AUU was the most frequently used codons in potato rather than AUA in tomato and pepino in NABP gene.

Codon Usage Bias of the PVM CP and NABP Coding Sequences
The ENC values were calculated to infer the magnitude of the PVM CP and NABP codon usage bias. In general, the smaller the ENC value, the stronger the codon preference is. It is also accepted that ENC values ≤35 are indicative of genes with significant codon bias. Individually, the highest ENC values were both observed for the CP and NABP coding sequences from potato hosts (Figure 1), whereas the lowest values for the CP and NABP coding sequences were all obtained from pepino hosts ( Figure 1). In addition, the mean ENC values for the overall CP and NABP genes were 54.91 ± 2.28 (Table S2) and 49.66 ± 4.22 (Table S3), respectively, indicating a relatively stable and conserved genomic composition with a low codon usage bias in all of the assayed PVM coding sequences. Compared with CP coding sequences, the lower ENC values for the NABP gene suggested a slightly greater codon bias than was observed for the CP gene.

Trends in Codon Usage Variations
PCA is a multivariate statistical method that was used to identify the correlations between variables and samples. In this study, PCA was performed to assess the synonymous codon usage variation in the CP and NABP coding sequences of PVM. The first four principal axes (axes 1-4) of the CP and NABP coding sequences accounted for 65.96 and 65.23% of the synonymous codon usage variation, respectively ( Figure 2). The values of the first four axes for the CP coding sequences were 30.56, 18.03, 10.35 and 7.02% (Figure 2A), while those observed for NABP were 27.97, 16.23, 13.95 and 7.08% ( Figure 2B). These values revealed that axis 1 was the major factor affecting codon usage for the CP and NABP genes. Furthermore, we explored the distribution of the CP and NABP coding sequences in different hosts based on the RSCU values on the first two axes (Figure 3). The PCA for both the CP and NABP genes showed several overlapped sites among the three different hosts, these suggesting similar codon usage trends ( Figure 3). Notably, because the analysis included only eight tomato isolates of PVM, these results require further confirmation.

Trends in Codon Usage Variations
PCA is a multivariate statistical method that was used to identify the correlations between variables and samples. In this study, PCA was performed to assess the synonymous codon usage variation in the CP and NABP coding sequences of PVM. The first four principal axes (axes 1-4) of the CP and NABP coding sequences accounted for 65.96 and 65.23% of the synonymous codon usage variation, respectively ( Figure 2). The values of the first four axes for the CP coding sequences were 30.56, 18.03, 10.35 and 7.02% (Figure 2A), while those observed for NABP were 27.97, 16.23, 13.95 and 7.08% ( Figure 2B). These values revealed that axis 1 was the major factor affecting codon usage for the CP and NABP genes. Furthermore, we explored the distribution of the CP and NABP coding sequences in different hosts based on the RSCU values on the first two axes (Figure 3). The PCA for both the CP and NABP genes showed several overlapped sites among the three different hosts, these suggesting similar codon usage trends ( Figure 3). Notably, because the analysis included only eight tomato isolates of PVM, these results require further confirmation.

Trends in Codon Usage Variations
PCA is a multivariate statistical method that was used to identify the correlations between variables and samples. In this study, PCA was performed to assess the synonymous codon usage variation in the CP and NABP coding sequences of PVM. The first four principal axes (axes 1-4) of the CP and NABP coding sequences accounted for 65.96 and 65.23% of the synonymous codon usage variation, respectively ( Figure 2). The values of the first four axes for the CP coding sequences were 30.56, 18.03, 10.35 and 7.02% (Figure 2A), while those observed for NABP were 27.97, 16.23, 13.95 and 7.08% ( Figure 2B). These values revealed that axis 1 was the major factor affecting codon usage for the CP and NABP genes. Furthermore, we explored the distribution of the CP and NABP coding sequences in different hosts based on the RSCU values on the first two axes (Figure 3). The PCA for both the CP and NABP genes showed several overlapped sites among the three different hosts, these suggesting similar codon usage trends ( Figure 3). Notably, because the analysis included only eight tomato isolates of PVM, these results require further confirmation.

ENC-Plot Analysis
We performed an ENC-plot analysis for GC3s to study the factors influencing the codon usage bias of PVM according to the CP and NABP coding sequences. If the points fall below the expected curve, the codon usage is said to be affected by selection pressure rather than mutation pressure, whereas mutational pressure is indicated when the data points fall onto the expected curve. In both the CP and NABP coding sequence plots, the PVM isolates from different hosts and groups mostly clustered together below the expected ENC curve (Figure 4), indicating that the natural selection pressure was more important than mutation pressure in the PVM isolates. In relation to hosts, stronger natural selection on synonymous codon usages were observed in pepino-hosted CP and NABP genes. However, several PVM potato isolates fell onto the expected curve for both the CP and NABP coding sequences (Figure 4), indicating that the influence of mutation pressure was not completely absent.

ENC-Plot Analysis
We performed an ENC-plot analysis for GC3s to study the factors influencing the codon usage bias of PVM according to the CP and NABP coding sequences. If the points fall below the expected curve, the codon usage is said to be affected by selection pressure rather than mutation pressure, whereas mutational pressure is indicated when the data points fall onto the expected curve. In both the CP and NABP coding sequence plots, the PVM isolates from different hosts and groups mostly clustered together below the expected ENC curve (Figure 4), indicating that the natural selection pressure was more important than mutation pressure in the PVM isolates. In relation to hosts, stronger natural selection on synonymous codon usages were observed in pepino-hosted CP and NABP genes. However, several PVM potato isolates fell onto the expected curve for both the CP and NABP coding sequences (Figure 4), indicating that the influence of mutation pressure was not completely absent.

ENC-Plot Analysis
We performed an ENC-plot analysis for GC3s to study the factors influencing the codon usage bias of PVM according to the CP and NABP coding sequences. If the points fall below the expected curve, the codon usage is said to be affected by selection pressure rather than mutation pressure, whereas mutational pressure is indicated when the data points fall onto the expected curve. In both the CP and NABP coding sequence plots, the PVM isolates from different hosts and groups mostly clustered together below the expected ENC curve (Figure 4), indicating that the natural selection pressure was more important than mutation pressure in the PVM isolates. In relation to hosts, stronger natural selection on synonymous codon usages were observed in pepino-hosted CP and NABP genes. However, several PVM potato isolates fell onto the expected curve for both the CP and NABP coding sequences (Figure 4), indicating that the influence of mutation pressure was not completely absent.

Neutrality Plot
To determine the degree of mutational pressure and natural selection on the codon usage in PVM, we performed neutrality analyses between GC12 and GC3 for all of the sequences and grouped the results by the PVM hosts for the CP and NABP coding sequences ( Figure 5). As nucleotide changes at the third position of the codon do not contribute to changes in the amino acid, these changes are indicative of a mutational pressure alone. In contrast, nucleotide changes that bring about changes in the altered amino acid lead to selection pressure. Significant positive correlations were observed between the GC12 and GC3 values (r 2 = 0.0757, P = 7.76 × 10 −4 ; r 2 = 0.2328, P = 2.75 × 10 −8 ) for the PVM CP and NABP coding sequences, respectively ( Figure 5A,C). The slopes of the linear regression were 0.1009 and 0.1871 for all CP and NABP coding sequences ( Figure 5A,C), demonstrating that mutation pressure accounted for 10.09 and 18.71% of the selection pressure, whereas natural selection accounted for 88.91 and 81.29%, respectively. With respect to the hosts, the slope of the linear regression for the PVM CP coding sequences from potato was 0.1749 ( Figure 5B) and had a significant p value (p = 0.00156), whereas the correlations between GC12 and GC3 for pepino and tomato were not significant (p > 0.05), with observed slopes of −0.0159 and 0.2037 ( Figure 5B), respectively. For the NABP coding sequences, significant correlations between GC12 and GC3 for pepino and tomato were observed, with slopes of 0.0769 and 0.1899 ( Figure 5D), respectively, whereas the correlation for potato was not significant (p > 0.05), with a slope of 0.0298 ( Figure 5D). Above all, natural selection was the dominant pressure driving the codon usage bias for the CP and NABP coding sequences of PVM.

Neutrality Plot
To determine the degree of mutational pressure and natural selection on the codon usage in PVM, we performed neutrality analyses between GC12 and GC3 for all of the sequences and grouped the results by the PVM hosts for the CP and NABP coding sequences ( Figure 5). As nucleotide changes at the third position of the codon do not contribute to changes in the amino acid, these changes are indicative of a mutational pressure alone. In contrast, nucleotide changes that bring about changes in the altered amino acid lead to selection pressure. Significant positive correlations were observed between the GC12 and GC3 values (r 2 = 0.0757, P = 7.76 × 10 −4 ; r 2 = 0.2328, P = 2.75 × 10 −8 ) for the PVM CP and NABP coding sequences, respectively ( Figure 5A,C). The slopes of the linear regression were 0.1009 and 0.1871 for all CP and NABP coding sequences ( Figure 5A,C), demonstrating that mutation pressure accounted for 10.09 and 18.71% of the selection pressure, whereas natural selection accounted for 88.91 and 81.29%, respectively. With respect to the hosts, the slope of the linear regression for the PVM CP coding sequences from potato was 0.1749 ( Figure 5B) and had a significant p value (p = 0.00156), whereas the correlations between GC12 and GC3 for pepino and tomato were not significant (p > 0.05), with observed slopes of −0.0159 and 0.2037 ( Figure  5B), respectively. For the NABP coding sequences, significant correlations between GC12 and GC3 for pepino and tomato were observed, with slopes of 0.0769 and 0.1899 ( Figure 5D), respectively, whereas the correlation for potato was not significant (p > 0.05), with a slope of 0.0298 ( Figure 5D). Above all, natural selection was the dominant pressure driving the codon usage bias for the CP and NABP coding sequences of PVM.

Parity Analysis
We performed a PR2 bias plot analysis to determine whether the biased codon selection was restricted to highly biased genes. There is no bias in the selection or mutation pressure when the plot lie on the center, where both coordinates are 0.5 [19]. The results showed that U and G are more frequently used than A and C in the PVM CP and NABP coding sequences ( Figure 6A,B), demonstrating that the codon usage pattern of PVM is also shaped by mutation pressure and other factors, including natural selection.

Parity Analysis
We performed a PR2 bias plot analysis to determine whether the biased codon selection was restricted to highly biased genes. There is no bias in the selection or mutation pressure when the plot lie on the center, where both coordinates are 0.5 [19]. The results showed that U and G are more frequently used than A and C in the PVM CP and NABP coding sequences ( Figure 6A,B), demonstrating that the codon usage pattern of PVM is also shaped by mutation pressure and other factors, including natural selection.

Natural Selection is a Major Player in Shaping PVM Codon Usage Patterns
To investigate the influence of natural selection on PVM codon usage patterns, we performed linear regression analysis between the general average hydropathicity (GRAVY) and aromaticity (ARO) values and the values for ENC, GC3S, GC, and the first two principle axes. Our correlation analysis based on the CP coding sequences indicated that GRAVY is significantly positively correlated with ENC but negatively correlated with axis 2. In addition, ARO showed a significant positive correlation with ENC and axis 1 but was significantly negatively correlated with GC3s, GC, and axis 2 ( Table 2). For the NABP coding sequences, our correlation analysis indicated that GRAVY is significant negatively correlated with ENC, GC3s, GC, and axis 1, whereas ARO showed a significant positive correlation with GC3s axes 1 and 2 ( Table 2). These results indicated that the general average hydropathicity and aromaticity are linked to the codon usage variation in PVM, indicating the influence of natural selection on the codon usage pattern.  ns non-significant (p > 0.05); * represents 0.01 < p < 0.05; ** represents p < 0.01.

Codon Usage Adaptation in PVM
To assess the codon usage optimization and adaptation of PVM to its hosts, an analysis of codon adaptation index values was performed. Particular hosts are considered to be more suitable for isolates with higher CAI values than lower values. The mean CAI values of the CP coding sequences were 0.652, 0.628, and 0.617 for the pepino, tomato, and potato hosts, respectively, while those of the NABP coding sequences were 0.721, 0.707 and 0.680 for the pepino, tomato, and potato hosts, respectively ( Figure 7A). These values indicated that PVM host adaptation was greatest for pepino and the lowest for potato. Furthermore, we performed relative codon deoptimization index (RCDI)

Natural Selection is a Major Player in Shaping PVM Codon Usage Patterns
To investigate the influence of natural selection on PVM codon usage patterns, we performed linear regression analysis between the general average hydropathicity (GRAVY) and aromaticity (ARO) values and the values for ENC, GC3S, GC, and the first two principle axes. Our correlation analysis based on the CP coding sequences indicated that GRAVY is significantly positively correlated with ENC but negatively correlated with axis 2. In addition, ARO showed a significant positive correlation with ENC and axis 1 but was significantly negatively correlated with GC3s, GC, and axis 2 ( Table 2). For the NABP coding sequences, our correlation analysis indicated that GRAVY is significant negatively correlated with ENC, GC3s, GC, and axis 1, whereas ARO showed a significant positive correlation with GC3s axes 1 and 2 ( Table 2). These results indicated that the general average hydropathicity and aromaticity are linked to the codon usage variation in PVM, indicating the influence of natural selection on the codon usage pattern. ns non-significant (p > 0.05); * represents 0.01 < p < 0.05; ** represents p < 0.01.

Codon Usage Adaptation in PVM
To assess the codon usage optimization and adaptation of PVM to its hosts, an analysis of codon adaptation index values was performed. Particular hosts are considered to be more suitable for isolates with higher CAI values than lower values. The mean CAI values of the CP coding sequences were 0.652, 0.628, and 0.617 for the pepino, tomato, and potato hosts, respectively, while those of the NABP coding sequences were 0.721, 0.707 and 0.680 for the pepino, tomato, and potato hosts, respectively ( Figure 7A). These values indicated that PVM host adaptation was greatest for pepino and the lowest for potato. Furthermore, we performed relative codon deoptimization index (RCDI) analysis to assess the cumulative effects of codon biases on gene expression. The mean RCDI values were highest for potato, followed by tomato and pepino ( Figure 7B), indicating that codon usage deoptimization was the highest for the potato. Additionally, SiD analysis was also performed to understand how the PVM codon usage pattern is affected by the three hosts' codon usage patterns (Figure 8). We observed that the SiD value of pepino was higher than that of tomato and potato in both the CP and NABP coding sequences, indicating that pepino had a greater impact on the virus than tomato and potato.
Viruses 2019, 11, x FOR PEER REVIEW 12 of 17 analysis to assess the cumulative effects of codon biases on gene expression. The mean RCDI values were highest for potato, followed by tomato and pepino ( Figure 7B), indicating that codon usage deoptimization was the highest for the potato. Additionally, SiD analysis was also performed to understand how the PVM codon usage pattern is affected by the three hosts' codon usage patterns ( Figure 8). We observed that the SiD value of pepino was higher than that of tomato and potato in both the CP and NABP coding sequences, indicating that pepino had a greater impact on the virus than tomato and potato.   analysis to assess the cumulative effects of codon biases on gene expression. The mean RCDI values were highest for potato, followed by tomato and pepino ( Figure 7B), indicating that codon usage deoptimization was the highest for the potato. Additionally, SiD analysis was also performed to understand how the PVM codon usage pattern is affected by the three hosts' codon usage patterns ( Figure 8). We observed that the SiD value of pepino was higher than that of tomato and potato in both the CP and NABP coding sequences, indicating that pepino had a greater impact on the virus than tomato and potato.

Discussion
The codon usage pattern of viruses and hosts reflects the occurrence of evolutionary changes, such as evasion from the host's immune system, survival, adaption [38,39,[57][58][59]. Codon usage plays a significant role in the evolution of animal and human viruses, whereas our understanding of that in the evolution of plant viruses is limited. Until now, the codon usage pattern of citrus tristeza virus (CTV) [60], papaya ringspot virus (PRSV) [61], rice stripe virus (RSV) [62] and begomoviruses [63] had been reported. Low codon usage bias and higher genomic stability were observed from CTV, PRSV and RSV [60][61][62]. PVM, a carlavirus belonging to the family of Betaflexviridae, is an economically important pathogen of potato and pepino worldwide [3][4][5][6]. In the present study, comprehensive analyses of codon usage of PVM based on the CP and NABP coding sequences were performed. We observed that (1) the PVM CP coding sequences were GC-rich, while the NABP coding sequences were AU-rich; (2) U-and G-ending codons were preferred in the PVM CP and NABP coding sequences; (3) a relatively stable and conserved genomic composition with a low codon usage bias was observed in the PVM CP and NABP coding sequences; (4) natural selection and mutation pressure shaped the codon usage patterns of the PVM CP and NABP gene, with natural selection being the most important factor; (5) both CAI and RCDI analyses revealed that the greatest adaption of PVM was to pepino, followed by tomato and potato; and (6) pepino had a greater impact on PVM than tomato and potato. Jenkins and Holmes (2003) reported that the overall nucleotide composition can influence codon usage bias [64]. Our nucleotide composition results showed that the PVM CP and NABP coding sequences were GC and AU-rich, respectively. In general, a GC-or AU-rich composition tends to correlate with the RSCU patterns for several organisms, including viruses [39,57,59]. For example, AU-rich genomes tend to contain codons ending with A and U, whereas GC-rich genomes tend to contain codons ending with G and C. This trend supports the influence of mutation pressure. However, we observed that U-and G-ending codons are preferred in the PVM CP and NABP coding sequences despite a higher percentage of GC versus AU, similar to the RSCU patterns observed for the Zika virus [57]. Nucleotide composition and RSCU analyses showed that selection of the preferred codons in PVM was primarily influenced by compositional constraints (G and U in this case), which accounts for the presence of mutation pressure.
We observed that the ENC values of the PVM CP and NABP coding sequences were higher than 35 (Tables S2 and S3), indicative of a low degree of preference. Thus, we considered that the lower codon usage pattern in PVM could aid in facilitating infectivity in multiple hosts. Additionally, the strains isolated from pepino had a lower codon usage bias compared to tomato and potato. Moreover, this CAI analysis further confirmed that PVM is more adapted to pepino than tomato or potato.
Normally, the balance of mutation pressure and natural selection plays a significant role in the codon usage in eukaryotes and prokaryotes [35,39,65,66]. The results of the ENC-plot, neutrality plot and PR2 analyses clearly showed that PVM is influenced by natural selection and mutation pressure to variable degrees. Additionally, ENC-plot, neutrality plot and linear regression analysis between GRAVY and ARO values and those of the ENC, GC3S, GC, and first two principle axes demonstrated that natural selection is the major factor that has contributed to shaping codon usage of PVM.
The emergence, dynamics, and evolution of infectious diseases can be influenced by host-parasite interactions [67][68][69][70]. PVM was isolated from pepino and tomato early in this decade but was firstly isolated from potato in the early 1920s [1][2][3]. Our previous results showed that the diversification of PVM in potato is more diverse than pepino and tomato [41], and PVM pepino isolates appear to be experiencing a new expansion [3]. In the present study, the results of our CAI analysis also showed that the PVM CP and NABP genes were strongly adapted to pepino. Moreover, RCDI analysis also supported that highest codon usage deoptimization occurred for isolates for potato, followed by tomato and pepino. These results were consistent with those of the CAI analysis, because a low RCDI may indicate strong adaptation to a host [71]. Furthermore, the SiD value for isolates from pepino was higher than those observed for tomato and potato, both in the CP and NABP coding sequences, indicating that the selection pressure of pepino on PVM was greater than that of tomato and potato, in agreement with the neutrality and ENC-plot analysis results. Therefore, a strong link between PVM and pepino was observed in this study, although potato has always been suggested to be the primary PVM host [1].
In summary, in this study, the codon usage patterns of the PVM CP and NABP genes were studied for the first time to better understand the evolutionary changes of a carlavirus. Results from this study promote a better understanding of the evolutionary changes of PVM, which could assist in the prevention and control of this virus.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/8/752/s1, Figure S1: The neighbor-joining (NJ) trees calculated from the CP gene sequences of potato virus M obtained in this study. The numbers at each node indicate the percentage of bootstrap samples in the NJ trees. Horizontal branch length is drawn to scale, with the bar indicating 0.05 nt replacements per site. Figure S2: The neighbor-joining (NJ) trees calculated from the NABP gene sequences of potato virus M obtained in this study. The numbers at each node indicate the percentage of bootstrap samples in the NJ trees. Horizontal branch length is drawn to scale, with the bar indicating 0.02 nt replacements per site. Table S1: The PVM isolates using in this study.

Conflicts of Interest:
The authors declare no conflicts of interest.