Analysis of Codon Usage Bias in Xyloglucan Endotransglycosylase (XET) Genes

Xyloglucan endotransglycosylase (XET) genes are widely distributed in plants, but their codon usage bias has remained uncharacterized. We therefore analyzed codon usage bias using 4500 codons of 20 XET genes to elucidate the underlying genetic and evolutionary patterns. Phylogenetic and hierarchical cluster analyses revealed that the 20 XET genes belonged to two groups. The closer the genetic distance, the more similar the codon usage preference. The codon usage bias of most XET genes was weak, although some genes showed a clear bias. AGA, AGG, AUC, and GUG were the top four codons (RSCU > 1.5) in the 20 XET genes. CitXET had a stronger codon usage bias, with eight optimal codons (i.e., AGA, AUU, UCU, CUU, CCA, GCU, GUU, and AAA). The RSCU values were subjected to correspondence analysis. The two main factors affecting codon usage bias (i.e., Axes 1 and 2) accounted for 54.8% and 17.6% of the total variation, respectively. Multiple correspondence analysis revealed that the XET genes were widely distributed, with Group 1 genes being closer to Axis 1 than Group 2 genes, which were closer to Axis 2. Codons with A/U at the third codon position were distributed closer to Axis 1 than codons with G/C at the third codon position. PgXET, ZmXET, VlXET, VrXET, and PcXET were biased toward codons ending with G/C. In contrast, CitXET, DpXET, and BrpXET were strongly biased toward codons ending with A/U, indicating that these XET genes have a strong codon usage bias. Translational selection and base composition (especially A and U at the third codon position), followed by mutation pressure and natural selection, may be the most important factors affecting codon usage of the 20 XET genes. These results may be useful in clarifying the codon usage bias of XET genes and the relevant evolutionary characteristics.


Introduction
Codon usage bias, which is usually the result of adaptive evolution, refers to the unequal frequency with which synonymous codons are used during translation [1]. Codon usage bias is also relevant to the chemical distances between amino acids, because the occurrence of errors depends on the frequency of the different codons [2]. Even within a genome, different codon usage preferences are observed among genes [3]. Recent studies have revealed that specific synonymous codon usage biases affect protein folding [4,5]. Genes with higher expression levels tend to have lower evolutionary rates and selective pressure but a stronger codon usage bias [6]. Except for tryptophan (Trp) and methionine (Met), all amino acids are encoded by at least two synonymous codons [7]. However, the synonymous codons do not occur with equal frequency, and a specific codon usage pattern is referred to as codon usage bias [8]. Mutation, natural selection, and random drift are the three major factors contributing to codon usage bias [9,10]. There are some additional factors, including gene length and expression level, tRNA abundance, RNA stability, protein structure, and

Clustering Analysis
The coding sequences of 20 different XET genes were analyzed using the neighbor-joining method to construct a phylogenetic tree. The following genes were observed to belong to Group (Table 1). These results suggest that the similarity in codon usage increases with increasing evolutionary relatedness of genes. Group 2 consisted of seven XET genes: Pyrus communis XET (PcXET), Gossypium barbadense XET (GbXET), Zea mays XET (ZmXET), Actinidia deliciosa XET (AdXET), Vigna radiata cultivar T44 XET (VrXET), Vigna luteola XET (VlXET), and Pennisetum glaucum XET (PgXET) (Figure 1). In contrast to the corresponding Group 1 data, the ranges of the ENc values and GC3 contents were relatively wide (i.e., 31.1-56.5 and 43.2-97.9, respectively) (Table 1). These observations indicated that the codon usage was different among these evolutionarily related genes. Figure 2). These results suggested that different XET genes had diverse codon usage patterns, but there was also some codon usage bias.

Analysis of Codon Usage Bias in 20 XET Genes
The average GC content of the 20 analyzed XET genes was 50.9% (range: 38.8-69.5%). The GC content was highest in the third codon position (i.e., 58.2%). Additionally, the mean ENc value was 48.60 (range: 31.14-60.71). PgXET, VlXET, VrXET, and ZmXET had ENc values <35, indicating a strong codon bias. These data suggested that some XET genes exhibited a directional codon usage, reflecting the codon usage bias of a few XET genes (Table 1).
U3s: frequency of the nucleotide U at the third codon position; C3s: frequency of the nucleotide C at the third codon position; A3s: frequency of the nucleotide A at the third codon position; G3s: frequency of the nucleotide G at the third codon position; GC3s: frequency of the nucleotides G + C at the third codon position; GC: G + C content; CAI: codon adaptation index; CBI: codon bias index; ENc: effective number of codons; Gravy: protein hydrophobicity; Aromo: protein aromaticity.
coefficients were greater than 0.83 (Figure 4). The multiple correspondence analysis revealed that the XET genes were widely distributed, with Group 1 genes closer to Axis 1 than the Group 2 genes, which were closer to Axis 2 (Figure 5). The codons with A/U at the third codon position were distributed closer to Axis 1 than the codons with G/C at the third codon position (Figure 6). These results suggested that the base composition, especially for codons with A/U at the third codon position, had some effect on codon usage bias. Thus, many factors might influence codon usage bias. Axis 1 represented the main factor affecting codon usage bias, but the base composition was another very important factor.

Base Composition Affects the Formation of Codon Usage Bias
To clarify the relationships between the two groups of XET genes regarding the third codon positions of synonymous codons, the RSCU value for each codon was subjected to a multidimensional preference analysis. The Group 1 and Group 2 genes were closely associated with codons ending with A/U and G/C, respectively. The relatively short distance between PgXET, ZmXET, VlXET, VrXET, and PcXET and GC3s suggested that the codon usage might be biased toward G/C. However, CitXET, DpXET, and BrpXET were very close to AU3s, suggesting a strong codon usage bias toward codons ending with A/U (Figures 5 and 6). Correlation analysis between CAI and ENc was used to determine the effects of translational selection and mutation pressure on codon usage bias of XET genes. The results showed that CAI was significantly negatively correlated with ENc (r = −0.737, p < 0.01) (Figure 7), reflecting that translational selection contributed more to codon usage patterns of the 20 XET genes than mutations.

Discussion
The flow of genetic information from mRNA to protein depends on codons [34]. Codons are an important part of the output of nucleic acid information [35,36]. Previous research has found that codon usage bias is mainly influenced by mutation pressure and natural selection [37]. Codon usage bias exists widely in different species [38,39]. However, the main factors that shape codon usage bias differ significantly among species.
GC composition has been shown to drive amino acid and codon usage, which are closely related to the usage pattern of the third codon base (GC3) [40]. The GC3s values of the XET genes spanned a wide range (27.4-97.9%). The GC3 content of PgXET, VlXET, VrXET, and ZmXET was more than 93%, indicating that these XET genes strongly preferred codons ending with G/C, while CitXET, BrfXET, BrpXET, and MtXET preferred codons ending with A/T(U). These results suggested that mutation pressure was the main factor affecting codon usage of the 20 XET genes.
In this study, phylogenetic and hierarchical cluster analyses produced similar results. Compared to the corresponding Group 1 data, the ranges of ENc values and GC3 contents in Group 2 were relatively wide (i.e., 31.1-56.5 and 43.2-97.9, respectively) (Table 1). These observations indicated that different XET genes had different codon usage patterns, although some codon usage bias was also present.
ENc and CAI are two parameters related to gene expression level. Our analyses showed that most XET genes have an ENc value > 35, suggesting that these genes exhibit a general codon usage pattern. The exceptions were ZmXET, VlXET, VrXET, and PgXET. The ENc is one of the most important indicators of codon usage preference. An ENc value approaching 20 indicates a strong codon usage bias. In contrast, an ENc value close to 61 implies that the codons are used relatively equally [3]. Many factors contribute to biased synonymous codon usage in the 20 XET genes. For example, base composition influences codon usage bias (e.g., CitXET, DpXET, and BrpXET), especially regarding codons with A/U at the third codon position.
To evaluate the synonymous codon usage pattern for each amino acid in the different XET genes, the bias of individual codons was assessed by RSCU, which reflects the frequency of a specific codon relative to that of its synonymous codons. A relatively strong codon usage bias is indicated by an RSCU value > 1. In contrast, an RSCU value < 1 indicates that a particular codon is used less frequently than the other synonymous codons [41]. In this study, AGA, AGG, AUC, and GUG were the top four codons (RSCU > 1.5) in the 20 XET genes. A comparison of the RSCU values of the other 19 XET genes revealed a similar codon usage bias, while CitXET had a stronger codon usage bias. There were eight optimal codons of CitXET (i.e., AGA, AUU, UCU, CUU, CCA, GCU, GUU, and AAA), suggesting CitXET was biased toward the synonymous codons with A or U at the third codon position. In previous research, we found that XET activity was specific to the elongation of root cells during citrus seedling etiolation [42]. Consequently, the codon usage bias may be very closely related to the functionality of CitXET. Further research is needed to confirm this assumption.
The correlation between the first two principal axes (Axis 1 and Axis 2) and the nucleotide content of the 20 XET genes was analyzed. The results showed that there were various significant correlations between the two axes and nucleotide content. In addition, there was a significant negative correlation between Gravy values and Axis 1 and a significant positive correlation between Gravy values and Axis 2. In contrast, there was a positive correlation between Aromo and Axis 1 and a significant negative correlation between Aromo and Axis 2 (Figure 4). These results suggested that Axis 1 and Axis 2 had important roles in shaping XET codon usage patterns.
The CAI, with values ranging from 0 to 1, is another effective measure of codon usage bias. The higher the CAI value, the better the adaptability of the sequence [6]. A CAI value approaching 1 indicates a strong codon bias [3]. In the present study, with the exception of PgXET and ShXET, the analyzed genes had CAI values between 0.2 and 0.3. These results indicated that the expression levels of the XET genes were not randomly variable. In previous studies, the correlation between CAI and ENc values was used to demonstrate the effects of translational selection and mutation pressure on codon usage bias [43-45]. A correlation coefficient approaching −1 suggests that the effect of translational selection on codon usage bias is greater than that of mutation [7]. Our correlation analysis showed that CAI was significantly negatively correlated with ENc (r = −0.737, p < 0.01) (Figure 7), reflecting that translational selection contributed more to codon usage patterns of the 20 XET genes than mutations. Similar results were reported by Qiu et al. [46], but these observations were inconsistent with the findings of other studies, which suggested that mutations had a stronger effect on codon usage bias than translational selection in some viruses [47,48]. This inconsistency may be because the virus genes are more extensively mutated than the plant XET genes. In other words, XET genes may be relatively conserved.

Sequence Data
We downloaded all the available XET gene sequences from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/, accessed on 1 January 2023) and analyzed their codon usage bias. The details of the 20 available XET genes are provided in Table 1.

Analysis of Codon Usage in XET Coding Sequences
We used several indicators to analyze codon usage in the XET coding sequences. The codon adaptation index (CAI), relative synonymous codon usage (RSCU), the effective number of codons (ENc), codon bias index, frequency of optimal codon usage, hydrophobicity, aromaticity, as well as the T3s, C3s, A3s, G3s, GC, and GC3s contents were calculated using the CodonW 1.4.4 program (http://codonw.sourceforge.net/, accessed on 1 January 2023). The CAI analysis of XET genes was performed using the CAIcal server [49]. CAI, which is the geometric mean of the relative use of codons in genes, is used to measure the adaptability of genes to the codon usage of highly expressed genes [38].
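The geometric-mean definition of CAI given above can be illustrated with a minimal sketch. This is not the CodonW or CAIcal implementation: the function names (`relative_adaptiveness`, `cai`), the reference codon counts supplied by the caller, and the substitution of 0.5 for zero reference counts are assumptions made for illustration only.

```python
import math
from itertools import product

# Standard genetic code in RNA alphabet (NCBI translation table 1 ordering).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_families():
    """Return codon families with at least two synonymous codons (no stops)."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    return [f for aa, f in families.items() if aa != "*" and len(f) > 1]


def relative_adaptiveness(ref_counts):
    """w = count of a codon / count of the most used synonymous codon.

    ref_counts is a hypothetical dict of codon counts taken from a
    reference set of highly expressed genes.
    """
    w = {}
    for family in synonymous_families():
        top = max(ref_counts.get(c, 0) for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumption: zero counts are replaced by 0.5 so that w is never 0.
            w[c] = max(ref_counts.get(c, 0), 0.5) / top
    return w


def cai(cds, ref_counts):
    """Geometric mean of w over the scorable codons of a coding sequence."""
    cds = cds.upper().replace("T", "U")
    w = relative_adaptiveness(ref_counts)
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))
```

Met (AUG), Trp (UGG), and stop codons are skipped because they carry no synonymous choice. A call such as `cai(sequence, ref_counts)` returns a value between 0 and 1, with values near 1 meaning that the gene mostly uses the codons favored in the reference set.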
Pearson correlation coefficients between nucleotide contents were calculated using SPSS 20.0 statistical software. The ENc value was calculated to measure the degree of deviation from equal use of synonymous codons in the ORFs of the XET genes. The ENc value, reflecting the degree of codon usage bias, ranges from 20 (when only one synonymous codon is used for each amino acid) to 61 (when all synonymous codons are used equally) [50]. When the ENc value is greater than 35, the codon usage bias is considered to be low [2].
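To make the 20 to 61 range of ENc concrete, the following is a simplified sketch based on the commonly cited formulation of Wright's effective number of codons, in which a homozygosity value F is averaged within each degeneracy class. It is an illustration under stated assumptions rather than the CodonW routine: the function name `enc`, the decision to drop amino acids with F ≤ 0, and the error raised for sequences that are too short are choices made here.

```python
from collections import Counter
from itertools import product

# Standard genetic code in RNA alphabet (NCBI translation table 1 ordering).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def enc(cds):
    """Effective number of codons of one coding sequence (simplified sketch)."""
    cds = cds.upper().replace("T", "U")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))

    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    # Collect the homozygosity F of each amino acid by degeneracy class.
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for family in families.values():
        k = len(family)
        if k == 1:
            continue  # Met and Trp have no synonymous codons
        n = sum(counts[c] for c in family)
        if n <= 1:
            continue  # F is undefined for fewer than two observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in family)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:  # assumption: non-positive estimates are discarded
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items() if v}
    # If Ile (the only 3-fold family) is unusable, average the 2- and 4-fold F.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if set(mean_f) != {2, 3, 4, 6}:
        raise ValueError("sequence too short to estimate ENc")

    # 9 two-fold, 1 three-fold, 5 four-fold, 3 six-fold amino acids, plus Met and Trp.
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)
```

A sequence that uses exactly one codon per amino acid gives F = 1 in every class and therefore ENc = 2 + 9 + 1 + 5 + 3 = 20, while a sequence that uses all synonymous codons evenly reaches the cap of 61.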

Phylogenetic and Hierarchical Cluster Analyses
The default parameters of MEGA 7.0.12 were used to construct the phylogenetic tree based on the 20 XET coding sequences, and the neighbor-joining (NJ) method was used with 1000 bootstrap replicates [51]. Specific parameters were as follows: the test of phylogeny was the bootstrap method, the substitution type was amino acid, the substitution model was the Poisson model, rates among sites were uniform, the pattern among lineages was the same (homogeneous), and the gaps/missing data treatment was complete deletion. The RSCU values were subjected to a hierarchical cluster analysis.

Comparison of the XET Codon Usage Patterns
RSCU is the ratio between the observed usage frequency of a codon in a gene sample and the usage frequency expected if all synonymous codons of a particular amino acid were used equally [52]. It was used to investigate the overall synonymous codon usage bias among the 20 XET genes. In a comparison of the XET codon usage patterns, if the RSCU values of the same codon in the coding regions being compared were both <1.0, both >1.5, or both between 1.0 and 1.5, their codon usage patterns were judged to be similar. In addition, synonymous codons with RSCU values >1.5 and <1.0 were treated as high-frequency codons and low-frequency codons, respectively [43]. The preferred codons, which were defined as those with ∆RSCU > 0.08, were analyzed as described by Zhang et al. [53].
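The RSCU index was calculated as follows:

$$\mathrm{RSCU}_{ij} = \frac{G_{ij}}{\sum_{j}^{N_i} G_{ij}} \, N_i$$

where $G_{ij}$ is the observed number of the ith codon for the jth amino acid, which has $N_i$ kinds of synonymous codons; the sum runs over the $N_i$ synonymous codons of that amino acid.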
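The RSCU definition (observed codon count divided by the count expected under equal use of the synonymous codons) can be sketched in a few lines. This is an illustrative implementation, not the CodonW code used in the study; the function name `rscu` and the choice to omit amino acids that do not occur in the sequence are assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code in RNA alphabet (NCBI translation table 1 ordering).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the amino acids present in a coding sequence."""
    cds = cds.upper().replace("T", "U")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) < 2:
            continue  # skip stop codons, Met (AUG), and Trp (UGG)
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # assumption: absent amino acids are left out
        for c in family:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values
```

For example, in `rscu("GCUGCUGCUGCC")` alanine is encoded three times by GCU and once by GCC, so GCU receives 3 × 4 / 4 = 3.0, GCC receives 1.0, and the unused GCA and GCG receive 0.0. When every amino acid is present, the result has 59 entries, one for each codon other than AUG, UGG, and the stop codons.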

Correspondence and Multidimensional Preference Analyses
Correspondence analysis converts the RSCU values into a series of dimensional factors, and the results can be used to analyze major trends in codon usage patterns among different samples. Each gene is represented by a 59-dimensional vector, and each dimension corresponds to the RSCU value of one codon, excluding AUG, UGG, and the stop codons. Correspondence analysis was performed using the CodonW 1.4.4 program (http://codonw.sourceforge.net/, accessed on 1 January 2023), and correlation analysis was performed for the first two axes (Axis 1 and Axis 2) of the correspondence analysis [7]. Principal component analysis of the codon usage frequencies and multiple correspondence and multidimensional preference analyses of the RSCU value for each codon of the 20 XET genes were conducted using the SPSS 20.0 program.

Conclusions
In conclusion, different XET genes had different codon usage patterns. The codon usage bias of most XET genes was weak, although some genes showed a clear bias. AGA, AGG, AUC, and GUG were the top four codons (RSCU > 1.5) in the 20 XET genes. CitXET had a stronger codon usage bias, with eight optimal codons (i.e., AGA, AUU, UCU, CUU, CCA, GCU, GUU, and AAA). PgXET, VlXET, VrXET, ZmXET, CitXET, BrfXET, BrpXET, and MtXET had strong codon biases. Translational selection and base composition, followed by mutation pressure and natural selection, may be the most important factors affecting codon usage of the 20 XET genes. Although codon usage bias is not necessarily considered in traditional phylogenetic analyses, the data presented here clarify the codon usage patterns for 20 XET genes. Our findings may be useful for comprehensively characterizing the factors mediating genetic evolution.