Comprehensive Analysis of Codon Usage on Porcine Astrovirus

Porcine astrovirus (PAstV), associated with mild diarrhea and neurological disease, is transmitted in pig farms worldwide. The purpose of this study is to elucidate the main factors affecting the codon usage of PAstVs. Phylogenetic analysis showed that the subtype PAstV-5 occupied the basal position of the phylogenetic tree, followed by PAstV-3, PAstV-1, PAstV-2, and PAstV-4, indicating that the five existing subtypes (PAstV-1 to PAstV-5) may have been formed by multiple differentiations of PAstV ancestors. A codon usage bias was found in PAstV-2, PAstV-3, PAstV-4, and PAstV-5 from the analyses of effective number of codons (ENC) and relative synonymous codon usage (RSCU). Nucleotides A/U are more frequently used than nucleotides C/G in the genome CDSs of PAstV-3, PAstV-4, and PAstV-5. Codon usage patterns of PAstV-5 are shaped by both mutation pressure and natural selection, while natural selection is the main evolutionary force that affects the codon usage patterns of PAstV-2, PAstV-3, and PAstV-4. The analyses of codon adaptation index (CAI), relative codon deoptimization index (RCDI), and similarity index (SiD) showed that the codon usage similarities between PAstV and animals might contribute to the broad host range and the cross-species transmission of astrovirus. Our results provide insight into PAstV evolution and codon usage patterns.


Introduction
Porcine astroviruses (PAstVs), comprising five distinct lineages (PAstV-1 to PAstV-5), are highly prevalent in both diarrheic and clinically healthy pigs [1][2][3]. PAstV-3 is found in tissues from the central nervous system of piglets and sows with encephalomyelitis and neural necrosis [4,5]. Infection of piglets with PAstV-1 can cause mild diarrhea, growth retardation, and damage to the villi of the small intestinal mucosa [6]. PAstV-4 was detected in the nasal swabs [7] and the feces of pigs [8,9]. Co-infection of individual pigs with several lineages of PAstVs has also been observed [9]. Both PAstV-2 and PAstV-5 have been identified in the brains of newborn piglets with congenital tremors [10]. PAstV-2 and PAstV-4 were simultaneously detected in the blood samples of apparently healthy domestic pigs, while the coexistence of PAstV-2, PAstV-4, and PAstV-5 has been observed in porcine fecal samples collected from the same farms [9,11].

Effective Number of Codons (ENC)
The effective number of codons is designed to quantify how far the codon usage of a gene departs from equal usage of synonymous codons, regardless of the gene length and the number of amino acids [26]. The value of ENC ranges from 20 (if only one synonymous codon is exclusively used for the corresponding amino acid) to 61 (if all of the synonymous codons are used with no preference) [26,27]. The smaller the ENC value of a gene is, the stronger the codon preference of this gene. ENC values were calculated using the following equation [26]:

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

where $\bar{F}_i$ (i = 2, 3, 4, 6) represents the mean of the $F_i$ values for the i-fold degenerate amino acids. $F_i$ can be calculated using the equation below [26]:

$$F_i = \frac{n \sum_{j=1}^{i} \left( \frac{n_j}{n} \right)^2 - 1}{n - 1}$$

where $n$ represents the total number of observed codons for that amino acid, and $n_j$ represents the total observed number of the jth codon for that amino acid. The ENC values for PAstV CDSs were calculated using the cordon package (version 1.4.0) [28] of R (version 3.6.2) [24].
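The two equations above can be illustrated with a minimal Python sketch. It is not the cordon implementation used in the study; the function name `enc`, the standard genetic code table, the cap at 61, and the fallback that averages the 2-fold and 4-fold classes when isoleucine is absent are assumptions made for this illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def enc(cds):
    """Effective number of codons for one in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    f_values = defaultdict(list)  # degeneracy class -> F values
    for codons in families.values():
        degeneracy = len(codons)
        n = sum(counts[c] for c in codons)
        if degeneracy == 1 or n < 2:
            continue  # Met, Trp, or too few codons to estimate F
        homozygosity = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * homozygosity - 1) / (n - 1)
        if f > 0:
            f_values[degeneracy].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    # Assumed fallback: estimate the 3-fold class from the 2- and 4-fold classes.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    missing = [k for k in (2, 3, 4, 6) if k not in mean_f]
    if missing:
        raise ValueError(f"no usable amino acids with degeneracy {missing}")

    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)  # values above 61 are capped at the upper bound
```

A sequence that uses a single codon for every amino acid gives the lower bound of 20, and one that uses all synonymous codons equally gives the upper bound of 61.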

Relative Synonymous Codon Usage (RSCU)
To measure the non-uniform usage of synonymous codons in a coding sequence, RSCU is defined as the ratio of observed to expected codon frequency under equal codon usage, without being affected by the amino acid compositions or the CDS sizes of different gene samples [16]. Synonymous codons with RSCU values <1.0, =1.0, and >1.0 represent negative codon usage bias, no bias, and positive codon usage bias, respectively. Furthermore, synonymous codons with RSCU values >1.6 and <0.6 were regarded as "overrepresented" and "underrepresented" codons, respectively [29]. The RSCU was calculated as:

$$\mathrm{RSCU} = \frac{g_{ij}}{\sum_{j}^{n_i} g_{ij}} \times n_i$$

where $g_{ij}$ represents the observed number of the ith codon for the jth amino acid, which is encoded by $n_i$ synonymous codons [30]. The RSCU index was calculated for each sequence using the seqinr package (version 3.6-1) [23] of R (version 3.6.2) [24].
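As a rough illustration of this definition (observed count divided by the count expected if all synonymous codons were used equally), the following Python sketch computes RSCU for a single coding sequence. It is not the seqinr routine used in the study; the function name `rscu` and the choice to return 0 for amino acids that are absent from the sequence are assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the 59 codons that have synonymous alternatives."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # drop stop codons, AUG (Met), and UGG (Trp)
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # observed count relative to total / len(codons) expected under equal usage
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


example = rscu("GAUGAUGAUGAC")  # three GAU and one GAC, both encoding Asp
print(example["GAT"], example["GAC"])
```

In the toy sequence, GAU is used three times out of four Asp codons, so its RSCU is 3 / (4 / 2) = 1.5, and GAC has an RSCU of 0.5.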

Principal Component Analysis (PCA)
Principal component analysis (PCA) is a multivariate statistical method that reduces data dimensionality by performing a covariance analysis between factors [31]. To investigate the dominant patterns and variations in the codon usage of PAstV CDSs, we performed a PCA with the RSCU values of the PAstV genomes. To transform the RSCU values into uncorrelated variables, the RSCU values of each PAstV sequence were arranged into a 59-dimensional vector corresponding to the 59 synonymous codons, excluding AUG, UGG, and the three termination codons. A matrix comprising the 59 RSCU values of each sequence was built for the PCA and transformed into several major axes. PCA was performed on the obtained RSCU dataset by using the factoextra package (version 1.0.6) [32] of R (version 3.6.2) [24].

Viruses 2020, 12, 991

ENC-Plot Analysis
In the ENC-plot analysis, the projection of ENC values versus GC3s is commonly used to explore factors influencing the codon usage patterns, e.g., selection [26]. In an ENC plot, the observed and expected ENC values are compared to determine which force structures the synonymous codon usage bias. The expected ENC values for all of the GC3 compositions, ranging from 0 to 1, were calculated using the following equation:

$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where $s$ is the frequency of G + C at the third codon position of synonymous codons. An expected curve was generated using the expected ENC values. In the ENC-GC3s plot, if observed ENC values fell on the curve of expected ENC values, it meant that mutation was the main force acting on third-position bases of codons, whereas if observed ENC values fell considerably below the expected curve, it meant that selection was the main force driving codon usage bias.
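The expected curve can be generated with a few lines of Python. This is a minimal sketch of the standard expected-ENC formula under GC3s-only constraint; the function name `expected_enc` and the 0.01 step size are arbitrary choices for this illustration.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is shaped only by GC3s composition."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


# Points of the expected curve for GC3s from 0 to 1 in steps of 0.01.
curve = [(step / 100, expected_enc(step / 100)) for step in range(101)]
```

The curve peaks at GC3s = 0.5 and falls off symmetrically toward about 31 to 32 at the extremes, so an observed ENC well below the curve at its GC3s value points to constraints other than base composition.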

Neutrality Plot Analysis
A neutrality plot was used to identify the effects of natural selection and mutation pressure on the codon usage patterns [33]. The obtained GC3 and GC12 values (means of GC1 and GC2) of the synonymous codons were plotted on the horizontal and vertical axes, respectively, to produce a scatter diagram for the neutrality plot. The regression line was plotted between the GC3-variable and the GC12-variable. The slope (regression coefficient) of the regression line is regarded as the mutation-selection equilibrium coefficient [33]. If all of the points are distributed along the diagonal (slope = 1) and the correlation between GC3-variable and GC12-variable is statistically significant, this indicates that mutation is the main force shaping the codon usage. Alternatively, if the regression curve is parallel or tilted toward the horizontal axis (close to zero slope), selection is considered as the dominant factor. The regression analysis, which estimates the linear relationship between GC3-variable and GC12-variable, was performed by using R (version 3.6.2) [24].
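The neutrality analysis can be sketched in Python as follows. This is an illustration rather than the R code used in the study: the helper names `gc_by_position` and `fit_line` are hypothetical, GC content is computed over all codons rather than synonymous codons only, and the regression is an ordinary least-squares fit of GC12 on GC3.

```python
def gc_by_position(cds):
    """Return (GC12, GC3) for one in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every base at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    gc1, gc2, gc3 = fractions
    return (gc1 + gc2) / 2, gc3


def fit_line(points):
    """Least-squares slope and intercept for (GC3, GC12) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def selection_share(slope):
    """Share attributed to natural selection when the slope estimates mutation pressure."""
    return 1 - slope
```

Under this reading, a slope of 0.4089 corresponds to a mutation pressure share of 40.89% and a natural selection share of 59.11%.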

Parity Rule 2 (PR2) Analysis
Parity rule 2 (PR2) plot analysis is another method used to investigate the influence of mutation and selection on codon usage. In the PR2 plot, AT bias [A3/(A3 + T3)] and GC bias [G3/(G3 + C3)] were chosen as the ordinate and abscissa, respectively. The center of the plot, i.e., A = U and G = C (PR2), defined as coordinates of the origin (0.5, 0.5), indicates no bias between the influences of mutation pressure and natural selection [34,35].
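The coordinates of one sequence in the PR2 plot can be computed as in the sketch below. The function name `pr2_point` is hypothetical, and for simplicity the sketch counts the third position of every codon, whereas PR2 analyses are often restricted to particular codon families.

```python
from collections import Counter


def pr2_point(cds):
    """Return (GC bias, AT bias) = (G3/(G3+C3), A3/(A3+T3)) for one sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    third = Counter(seq[2:usable:3])  # bases at the third codon position
    at_total = third["A"] + third["T"]
    gc_total = third["G"] + third["C"]
    if at_total == 0 or gc_total == 0:
        raise ValueError("sequence lacks A/T or G/C at third codon positions")
    return third["G"] / gc_total, third["A"] / at_total
```

A sequence with equal numbers of A and U, and of G and C, at third positions maps to the center (0.5, 0.5); the further a point lies from the center, the stronger the deviation from parity.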

Codon Adaptation Index (CAI) Analysis
Codon adaptation index (CAI) is a quantitative measure for assessing the codon usage similarities between viral genes and their hosts [36]. The values of CAI range from 0 to 1. Virus sequences with higher CAI values are considered to be better adapted to the host than those with lower CAI values. The CAI analysis of the PAstV coding sequences was performed with CAIcal [36]. The reference datasets of synonymous codon usage patterns of chicken (Gallus gallus), duck (Anas platyrhynchos platyrhynchos), human (Homo sapiens), dog (Canis lupus familiaris), horse (Equus caballus), mouse (Mus musculus), pig (Sus scrofa), cat, cattle, sheep, and goat were obtained from the Codon and Codon-Pair Usage Tables (CoCoPUTs) database updated in January 2020 [37].
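For orientation, the CAI can be sketched as the geometric mean of the relative adaptiveness of each codon, i.e., its frequency in the reference set divided by the frequency of the most used synonymous codon. The sketch below is not CAIcal; the function names, the skipping of codons that are absent from the reference, and the toy reference counts are assumptions for illustration only and do not represent real host data.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(reference_counts):
    """Weight of each codon = its reference count / count of the best synonym."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    weights = {}
    for codons in families.values():
        best = max(reference_counts.get(c, 0) for c in codons)
        if best:
            for c in codons:
                weights[c] = reference_counts.get(c, 0) / best
    return weights


def cai(cds, weights):
    """Geometric mean of codon weights; codons without a positive weight are skipped."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no codons with a usable reference weight")
    return math.exp(sum(logs) / len(logs))


# Toy reference counts (hypothetical, not taken from any host table).
toy_weights = relative_adaptiveness({"GAT": 30, "GAC": 10, "AAA": 20, "AAG": 20})
```

With these toy weights, a sequence built only from GAU has a CAI of 1, while mixing in the less used GAC lowers the value.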

Relative Codon Deoptimization Index (RCDI)
RCDI is used to estimate the codon usage deoptimization trend of a virus relative to its host [38]. An RCDI value of 1 indicates that the virus has a completely host-adapted codon usage pattern, while an RCDI value higher than 1 indicates lower adaptation to the host [39]. RCDI values of PAstV sequences were calculated using CAIcal [36].

Similarity Index (SiD) Analysis
To measure the influence of the codon usage patterns of the host on the codon usage bias of PAstV CDSs, a SiD analysis was performed. SiD is calculated using the following equations:

$$R(A,B) = \frac{\sum_{i=1}^{59} a_i b_i}{\sqrt{\sum_{i=1}^{59} a_i^2 \cdot \sum_{i=1}^{59} b_i^2}}$$

$$D(A,B) = \frac{1 - R(A,B)}{2}$$

where $R(A,B)$ is defined as the cosine of the angle between the A and B spatial vectors; $a_i$ is the RSCU value of the ith of the 59 synonymous codons of the PAstV coding sequence; $b_i$ is the RSCU value of the same codon in the host; and $D(A,B)$ represents the potential effect of the overall codon usage of the host on that of PAstV [40]. A high value of SiD indicates that the host has dominant effects on the codon usage of the virus.
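The SiD calculation reduces to a cosine similarity between two RSCU vectors. The following Python sketch assumes both inputs are dictionaries mapping codons to RSCU values; the function name `sid` and the restriction to codons present in both dictionaries are choices made for this illustration.

```python
import math


def sid(virus_rscu, host_rscu):
    """D(A,B) = (1 - R(A,B)) / 2, with R the cosine between the RSCU vectors."""
    codons = sorted(set(virus_rscu) & set(host_rscu))
    dot = sum(virus_rscu[c] * host_rscu[c] for c in codons)
    norm = math.sqrt(
        sum(virus_rscu[c] ** 2 for c in codons)
        * sum(host_rscu[c] ** 2 for c in codons)
    )
    if norm == 0:
        raise ValueError("one of the RSCU vectors is all zeros")
    return (1 - dot / norm) / 2
```

Identical RSCU vectors give R = 1 and D = 0, whereas vectors with no shared preference (R = 0) give D = 0.5.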

Statistical Analysis
A non-parametric Kruskal-Wallis test was separately used to determine any significant differences between the values of CAI, RCDI and SiD of the four PAstV subtypes. The p-values for Dunn's multiple comparisons were adjusted with the Benjamini-Hochberg method. The level of significance was set at p < 0.05. The statistical analysis was performed with the package dunn.test (version 1.3.5) [41] of R (version 3.6.2) [24].
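The Benjamini-Hochberg adjustment of the pairwise p-values can be sketched as the standard step-up procedure below. This is an illustration of the general method, not the internals of the dunn.test package, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A comparison is then called significant when its adjusted p-value is below the chosen level (p < 0.05 in this study).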

PAstV-5 Subtype Has the Basal Position of Phylogenetic Tree
To explore the phylogeny among the five subtypes of PAstV strains, we constructed phylogenetic trees using the maximum likelihood (ML) and Bayesian inference (BI) methods. The topologies of the BI and ML trees were identical (Figure 1). The subtype PAstV-5 was a basal clade in the BI and ML trees, followed by PAstV-3, PAstV-1, PAstV-2, and PAstV-4, suggesting that the ancestors of PAstV may have undergone multiple differentiations before forming the five existing subtypes.

RSCU Patterns of PAstV
The codons UGU[Cys] and GAU[Asp] were preferentially used by all four PAstV subtypes. The preferred codon usage profiles of PAstV-3 and PAstV-5 were very similar: 16 out of 18 preferred codons were shared, with exceptions for the preferred codons of isoleucine and glutamine (Table 2). The PCA of the RSCU values showed that the first two principal axes accounted for 41.9% and 24.6% of the total variation of RSCUs (Figure S1). The points representing the four subgroups of PAstV genomes were mapped and clustered in clearly separate regions, although a small degree of overlap existed between PAstV-3 and PAstV-5 (Figure 2). Taken together, the RSCU analyses revealed the RSCU patterns of the four PAstV genotypes, and compositional constraints of third-position nucleotides in codons (G/C-ending codons of PAstV-2 versus A/U-ending codons of PAstV-3, PAstV-4, and PAstV-5) had the most influence on the selection of the preferred codons. The trend of the 59 synonymous codon usages indicated that the evolution of the four genotypes of PAstVs might be influenced.

Dinucleotide Frequency Abundancy Influences the Codon Usage Bias of PAstV
We performed a dinucleotide analysis on the four subtypes of PAstVs to understand the possible effect of dinucleotide frequencies on codon usage. The dinucleotide UpG was overrepresented (Pxy ≥ 1.23), whereas the dinucleotides CpG and UpA were underrepresented (Pxy ≤ 0.78) in the genome CDSs of the four subtypes of PAstVs (Table 3 and Figure 3). Additionally, the dinucleotide CpA was overrepresented (Pxy ≥ 1.23) in the genome CDSs of PAstV-2 and PAstV-4, and the dinucleotide CpU was overrepresented in the genome CDSs of PAstV-5. These results showed significant biases in dinucleotide content in the four subtypes of PAstVs.
To determine the effect of dinucleotide usage on codon usage bias, we compared the overrepresented and underrepresented dinucleotides with the preferred and underrepresented codons. Among the eight CpG-containing codons, five (GCG, CCG, CGA, UCG, and ACG) were underrepresented (RSCU value < 0.6), indicating that the dinucleotide CpG was suppressed. Furthermore, the RSCU values of all six UpA-containing codons (AUA, CUA, UUA, GUA, UAC, and UAU) were <1.6, suggesting that the dinucleotide UpA was suppressed. Among the five UpG-containing codons, UGU was a preferred codon, and the RSCU values of all UpG-containing codons were >0.6. Of the eight CpA-containing codons, five (CAC, CCA, CAG, UCA, and ACA) and six (GCA, CAU, CCA, CAA, UCA, and ACA) were preferentially used synonymous codons in PAstV-2 and PAstV-4, respectively. Of the eight CpU-containing codons, five (GCU, CUU, CCU, UCU, and ACU) were preferentially used in PAstV-2. These results indicated that dinucleotide abundance influences the codon usage bias of the four subtypes of PAstVs.
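The dinucleotide abundance compared against the 1.23 and 0.78 thresholds is commonly computed as an odds ratio, Pxy = fxy / (fx fy), i.e., the observed dinucleotide frequency divided by the product of its component nucleotide frequencies. The Python sketch below assumes that definition; the function name `dinucleotide_odds_ratios` is hypothetical, and overlapping dinucleotides are counted along the whole sequence.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(sequence):
    """Return {dinucleotide: observed frequency / expected frequency}."""
    seq = sequence.upper().replace("U", "T")
    n = len(seq)
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (base_counts[x] / n) * (base_counts[y] / n)
        observed = pair_counts[x + y] / (n - 1)
        ratios[x + y] = observed / expected if expected else 0.0
    return ratios


def classify(ratio):
    """Apply the thresholds quoted in the text: >= 1.23 over, <= 0.78 under."""
    if ratio >= 1.23:
        return "overrepresented"
    if ratio <= 0.78:
        return "underrepresented"
    return "normal"
```

For example, a CpG ratio of 0.5 would be classified as underrepresented and a UpG ratio of 1.3 as overrepresented.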

Identification of the Forces Influencing Codon Usage Patterns
To evaluate the forces shaping the codon usage patterns in the four genotypes of PAstVs, PR2 bias, ENC plot, and neutrality analyses were carried out. In the PR2 bias analysis, significant deviations from the parity rules were observed (A ≠ U, C ≠ G) (Figure 4), indicating that the evolutionary forces shaping the codon usage patterns of the four subtypes of PAstVs were not unique.
In the ENC plot, all of the ENC values of PAstV strains fell below but were close to the expected ENC curve (Figure 5). Additionally, sequences of PAstV-2 and PAstV-4 were clustered separately, whereas sequences of PAstV-3 and PAstV-5 were clustered together in the ENC plots. These results indicate that both mutation pressure and natural selection led to the codon usage bias of the four genotypes of PAstVs.
The neutrality analysis between the GC3s and GC12s values was employed to determine the extent of the two evolutionary forces on the codon usage pattern of PAstV strains. A significant correlation between GC3s and GC12s was observed in the PAstV-5 strains (y = 0.4089x + 0.3156; R² = 0.657; p < 0.01) (Figure 6). Because the slope of the regression line (0.4089) estimates the relative contribution of mutation pressure, the constraint of natural selection was 1 − 0.4089 = 59.11% for the PAstV-5 strains. No significant correlation between GC3s and GC12s was observed in the genomes of PAstV-2 (p = 0.2639; R² = 0.062), PAstV-3 (p = 0.679; R² = 0.0161), or PAstV-4 (p = 0.158; R² = 0.0925) strains. Therefore, natural selection plays a dominant role in driving codon usage bias for these three subtypes. Overall, the above results indicate that the effect of directional mutation pressure is present in the codon usage of PAstV-5, but natural selection dominates the evolution of codon usage of the four subtypes of PAstVs.
Figure 5. ENC plot of PAstV coding sequences. The curve represents the expected ENC values for all GC3 compositions. PAstV-2, PAstV-3, PAstV-4, and PAstV-5 strains are represented in orange, green, blue, and purple, respectively.

PAstV Strains Adaptation to Host Species
The analyses of CAI, RCDI, and SiD values were employed to evaluate the codon usage similarities between the PAstV strains and potential host species. The results based on CAI values show that PAstV presented the highest CAI value for ducks, followed by chickens, humans, dogs, horses, mice, pigs, cats, cattle, and sheep, while it appeared comparatively poorly adapted to goats (Figure S2). PAstV-2 displayed significantly higher CAI values for pigs than the other three subtypes of PAstVs (Figure 7). The RCDI analysis showed that the mean RCDI values of PAstV-3, PAstV-4, and PAstV-5 were significantly higher than that of PAstV-2 (Figure 8), suggesting that the codon usage of PAstV-2 is less deoptimized than that of PAstV-3, PAstV-4, and PAstV-5. The SiD values of PAstV-3 and PAstV-4 were significantly higher than those of PAstV-2 and PAstV-5 (Figure 9), indicating that pigs had a significantly stronger effect on the codon usage of PAstV-3 and PAstV-4 than on that of PAstV-2 and PAstV-5.

Discussion
In this study, we analyzed the phylogenetic relationship of PAstV. Phylogenetic analysis demonstrated that the PAstV-5 occupied the basal position in the phylogenetic tree, indicating the multiple differentiations of PAstV. Given that the phylogenetic differentiation in porcine astrovirus might imply its evolutionary history, the identification of the phylogeny of porcine astrovirus provides valuable insight into the origin and evolution of porcine astrovirus.
To adapt to changes in the host and the environment, RNA viruses evolve by altering the composition of their genomes [42]. As an important indicator of viral evolution, codon usage preference is affected by many factors, including natural selection, mutation pressure, composition of the genomes or genome regions, and gene length [43]. To dissect evolutionary forces of codon usage bias, a total of 67 complete coding sequences of PAstV genomes were used to perform a comprehensive analysis of codon usage among PAstV-2, PAstV-3, PAstV-4, and PAstV-5.
The genotype-specific preferences of the four subtypes of PAstVs were observed in the third nucleotide position of the codons. More specifically, PAstV-2 tends to use the G/C ending codons, whereas PAstV-3, PAstV-4, and PAst-5 prefer A/U ending codons. Codon usage bias, largely determined by the nucleotide at the third position of the codon, allows a different perspective on the evolution of the virus [29]. Differences in the nucleotide usage of genome CDSs demonstrate that nucleotide composition indeed affects codon usage bias of the four subtypes of PAstVs.
The effective number of codons (ENC) was calculated to identify bias in the use of synonymous codons. High ENC values (>40) have been identified in many animal viruses, such as porcine circovirus 3 [44], porcine deltacoronavirus [45], and rabies virus [46]. In our study, the mean ENC values of PAstV-2, PAstV-3, PAstV-4, and PAstV-5 were 56.265 ± 0.602, 53.059 ± 0.656, 52.007 ± 0.678, and 53.647 ± 0.316, respectively, demonstrating that a low codon preference was present in the four subtypes of PAstVs. As suggested by previous reports [47,48], the four subtypes of PAstVs with low codon bias may have a selective advantage for their efficient replication in pigs.
To understand the codon usage patterns, RSCU values of the 59 synonymous codons were estimated. The RSCU analysis revealed that A/U-ended codons were preferentially used over G/C-ended codons in the genomes of PAstV-3, PAstV-4, and PAstV-5, while PAstV-2 tended to prefer G/C-ended codons. The PCA plot showed a clear separation among the different PAstV subtypes, indicating that synonymous codon usage is distinct for each subtype of PAstV strains. These results showed that, despite being a single-stranded RNA virus with a very high mutation rate, PAstV has a relatively stable synonymous codon usage at the subtype level.
Although RSCU analysis is generally used to investigate synonymous codon usage patterns, it has limitations in revealing the forces that affect codon usage [49]. Therefore, the analysis was extended to the dinucleotides of the four subtypes of PAstVs. The results indicated remarkable divergence of dinucleotide patterns among the four subtypes. In the coding sequences of PAstV genomes, the dinucleotides CpG and UpA were underrepresented, and the dinucleotide UpG was overrepresented. The dinucleotide CpA was specifically overrepresented in the genome CDSs of PAstV-2 and PAstV-4. The frequency of dinucleotides is affected by codon usage, mutation pressure, and natural selection [50]. Increases in CpA and UpG are regarded as a compensatory mechanism for the reduction of both CpG and UpA [51,52]. Low CpG content in viruses is usually considered to help evade host defenses and to be shaped by natural selection [50,53]. UpA is another dinucleotide that is commonly underrepresented in viral genomes due to natural selection [48]. Decreasing the UpA content can reduce sensitivity to ribonucleases, which is conducive to maintaining the stability of mRNA [50] and to avoiding energy instability [54]. The results demonstrated that CpG and UpA were underrepresented in the four subtypes of PAstVs, suggesting that natural selection may have an important role in shaping the codon usage patterns of PAstV strains.
To better understand the roles of mutation pressure and natural selection in shaping the codon usage, PR2 analysis, ENC-GC3 plots, and neutrality analysis were performed. We found deviations from the parity rules, suggesting that both mutation pressure and natural selection contributed to the codon usage bias of the four subtypes of PAstVs. The ENC-GC3 plot showed that the points representing PAstV sequences fell below the expected ENC curve. In the ENC-GC3s analysis, if the codons were affected only by mutation pressure, the observed ENC values would fall on the expected ENC curve [55]. Conversely, if the observed ENC values fall far below the expected curve, natural selection has played a major role in codon usage patterns [50]. Therefore, the ENC plot indicated that both mutation pressure and natural selection have driven the codon usage bias of the four subtypes of PAstVs. Although the ENC-GC3s analysis provides a method for quantitative analysis of codon usage bias, this method does not accurately measure the contributions of natural selection and mutation pressure to the codon usage bias of a species [56,57]. To provide more information on this issue, a neutrality analysis was performed. According to the neutrality plot, the codon usage patterns of the PAstV subgroups were shaped by different evolutionary pressures. Specifically, mutation pressure and natural selection contributed 40.89% and 59.11%, respectively, to shaping the codon usage pattern of PAstV-5. Natural selection accounted for 94.01%, 96.84%, and 89.35% of the forces driving the codon usage bias of PAstV-2, PAstV-3, and PAstV-4, respectively. Taken together, these results suggest that different evolutionary pressures are acting on the four subtypes of PAstVs. Both mutation pressure and natural selection influence the codon usage patterns of PAstV-5, while natural selection is the dominant evolutionary force driving the codon usage bias of PAstV-2, PAstV-3, and PAstV-4.
The emergence, dynamics, and evolution of viral diseases are determined by host-virus interactions. Intriguingly, all of the four subtypes of PAstVs have the highest CAI value to ducks among the 10 tested hosts. Multiple interspecies transmission events have occurred among human astroviruses, non-human mammalian astroviruses, and avian astroviruses [58]. There have been reports of Avastrovirus infecting mammalian species in ecotones, such as small and medium sized farms that rear multiple species [58,59]. The prevalent interspecies transmission of astroviruses reflects their varying origins. Codon usage study can easily predict the carrier hosts that may act as a source of infection in other co-circulating species [50]. The high CAI value of four subtypes of PAstVs to animals indicated a similar codon usage pattern between animals and PAstVs. We proposed the similar codon usage between PAstV and hosts might advance the cross-species transmission of astrovirus. The values of CAI, RCDI, and SiD may reveal the different adaptabilities of four subtypes of PAstVs to pigs. Of these, PAstV-2 may be most adaptive to pigs in theory than the others, in view of its high value of CAI and low values of RCDI and SiD. This might explain to some extent why PAstV-2 was found as the predominant genotype in many countries [60]. Future studies are warranted to pay more attention to the epidemiology and pathogenicity of PAstV-2 strains.
Codon usage analysis could be used to design protein-based vaccines against pathogenic viruses by controlling viral protein expression. Deoptimization of dinucleotides and/or codons has emerged as a rapid and efficient strategy for attenuating the virulence of various small RNA viruses, and is used in the development of live attenuated RNA virus vaccines with superior genetic stability [61][62][63][64]. Conversely, optimization of dinucleotides and/or codons in viral genes increases the protein expression level dramatically and is often performed in vaccine research to increase the immunogenicity of the target [64]. In addition, codon usage bias provides a theoretical basis for studying the transcriptional regulation, function, and pathological relevance of viral proteins. A new mode of regulation has been found in some persistent viruses, which use poor codons in a distinctive way to temporally regulate the late expression of structural gene products [65]. Information on the codon usage pattern and host adaptability of the four subtypes of PAstVs may be useful for identifying potential hosts and suitable experimental animal models for pathogenesis and vaccine research.

Conclusions
To our knowledge, this is the first study to reveal the codon usage pattern of PAstVs. Phylogenetic analysis showed that the clade PAstV-5 occupies the basal position of the phylogenetic tree. Nucleotide composition analysis showed that the genome CDSs of PAstVs-3,4,5 are richer in A/U nucleotides than in G/C nucleotides. C/G-ended codons are the preferred synonymous codons in PAstV-2, whereas A/U-ended codons are the preferred synonymous codons in PAstV-3, PAstV-4, and PAstV-5. Natural selection and mutation pressure are the main factors influencing codon usage bias in the PAstV-5 genome. The codon usage of PAstV-2, PAstV-3, and PAstV-4 is mainly constrained by selection pressure. The highly similar codon usage between PAstV and animals might account for the broad host range and the cross-species transmission of astrovirus. Overall, this study provides new insights into PAstV evolution with regard to codon usage pattern and host adaptability.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/9/991/s1, Figure S1: Scree plot of percentage of explained variances for each principal component of the RSCU values of PAstV complete CDS. Figure S2: CAI analysis of the PAstV complete coding genomes in relation to potential host species. Table S1: The detailed information of PAstV strains in NCBI nucleic acid database. Table S2: The nucleotide composition and properties of complete CDS of the PAstVs.
Author Contributions: H.W. analyzed the data, and wrote and finalized the manuscript. Z.B. and C.M. collected the data. Z.C. and J.Z. proposed the work. Z.B., C.M., Z.C., and J.Z. revised the important technical content of the manuscript, and finalized the manuscript. All authors approve the version to be published, and agree to take responsibility for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.