Codon Usage Bias Analysis in Macronuclear Genomes of Ciliated Protozoa

Ciliated protozoa (ciliates) are unicellular eukaryotes, several of which are important model organisms for molecular biology research. Analyses of codon usage bias (CUB) of the macronuclear (MAC) genome of ciliates can promote a better understanding of the genetic mode and evolutionary history of these organisms and help optimize codons to improve gene editing efficiency in model ciliates. In this study, the following indices were calculated: the guanine-cytosine (GC) content, the frequency of the nucleotides at the third position of codons (T3, C3, A3, G3), the effective number of codons (ENc), GC content at the 3rd position of synonymous codons (GC3s), and the relative synonymous codon usage (RSCU). Parity rule 2 plot analysis, Neutrality plot analysis, ENc plot analysis, and correlation analysis were employed to explore the main influencing factors of CUB. The results showed that the GC content in the MAC genomes of each of 21 ciliate species, the genomes of which were relatively complete, was lower than 50%, and the base compositions of GC and GC3s were markedly distinct. Synonymous codon analysis revealed that the codons in most of the 21 ciliates ended with A or T and four codons were the general putative optimal codons. Collectively, our results indicated that most of the ciliates investigated preferred using the codons with anof AT-ending and that codon usage bias was affected by gene mutation and natural selection.


Introduction
Codons are nucleotide triplets of messenger RNA that carry genetic information. In all organisms, there are 64 kinds of codons, including three kinds of stop codons and 61 kinds of codons encoding 20 amino acids. The phenomenon that the same amino acid is encoded by more than one synonymous codon is called the degeneracy of the codon [1]. Except for methionine (Met) and tryptophan (Trp) which are encoded by ATG and TGG respectively, the other amino acids are encoded by 2-6 synonymous codons [2]. The non-randomness of synonymous codons is called codon usage bias (CUB). CUB can be mainly caused by base preference and natural selection in the genome, which is the result of the balance of mutation, natural selection, and genetic drift [3][4][5][6].

Effective Number of Codons
The effective number of codons (ENc) is a typical parameter to measure the magnitude of synonymous codon usage bias for any gene [55]. The ENc value quantifies synonymous codons usage frequency of a gene and is independent of gene length or amino acid composition [56]. ENc values range from 20 (extreme bias for using one codon) to 61 (no bias for using synonymous codons). In general, an ENc value < 35 is considered a significant CUB for the gene in question [56,57]. The Enc was calculated using the formulation of codon family (F CF ) in the equation given by [58]: Then, the ENc could be calculated by: where n i is the count of codon i in m amino acid family and m is the number of codons in an amino acid family. The subscript CF stands for "codon family" and refers to the fact that F CF and ENc. CF are for a specific codon family rather than for a gene. ENc values were plotted against GC3s values to reveal the determinants of CUB, thereby indicating whether there are other factors affecting CUB. The standard curve of the plot was calculated by the following formula [58]: If codon usage is limited only by GC mutation bias, the predicted ENc value will be on or near the standard curve. If the predicted ENc values are considerably far from the standard curve, the CUB is mainly affected by natural selection.
The ENc ratio was calculated according to the following formula: The ENc ratio value shows the difference between observed and expected ENc values [59,60].

Codon Adaptation Index (CAI)
The CAI is an effective measure for the relative adaptiveness of CUB in one gene compared with highly expressed genes [61]. A high CAI value indicates a stronger CUB and a higher expression level. The CAI value ranges from 0 to 1 according to the gene expression level. CAI was calculated by the equation [61]: where L is the count of codons in the gene and ω c(k) is the ω relative adaptiveness value for the k-th codon in the gene [61]. The CAI value was calculated using BCAWT.

Relative Synonymous Codon Usage (RSCU) and Putative Optimal Codons
The RSCU value for the codon was analyzed as the ratio of the observed frequency of a codon to the expected frequency under the assumption that all synonymous codons of a particular amino acid are used equally, and the RSCU value is unaffected by gene length and amino acid frequency [62]. RSCU directly reflects the CUB. If the RSCU value of a codon is lower than 1, this means that the codon in question is used less frequently than average; if the RSCU value equals 1, this indicates that codon usage is unbiased; if the RSCU value is higher than 1, this means that the codon in question is used more frequently than average. Similarly, codons with RSCU values higher than 1.6 and lower than 0.6 are considered to be over-represented and under-represented in the CDS, respectively [63]. The equation to calculate the RSCU is [61]: where O ac is the count of codon c for an amino acid and k a is the number of synonymous codons. The RSCU value was calculated using BCAWT. The putative optimal codons were determined by BCAWT. If each synonymous codon of an amino acid family is correlated with the Enc of all genes, we defined the optimal codon of each amino acid family as the codon with the strongest negative correlation between the RSCU and ENc values [55].

Grand Average Hydropathicity (Gravy) and Aromaticity (Aroma) Indices
Changes in Gravy and Aroma reflect variations in the number of amino acids used. The Gravy was calculated as the arithmetic average of the sum of the hydrophilic indices of each amino acid, with scores ranging from −2 to 2, i.e., positive values for hydrophobic proteins and negative values for hydrophilic proteins [64]. Aroma refers to the aromatic properties of proteins. It represents the frequency of the aromatic amino acids (phenylalanine, tyrosine, and tryptophan) in the translated gene product [65]. Gravy and Aroma indices were calculated using BCAWT.

Correspondence Analysis
Correspondence analysis is a multivariate statistical method that we applied to 59 codons (the exceptions being ATG, TGG, TAA, TAG, and TGA) to investigate the major trends in codon usage variation in all CDSs. Correspondence analysis plotted the distribution of genes and codons on a continuum of 59 dimensions based on the trends that affect the usage of synonymous codons in the genome [55]. The first axis (axis 1) represents the majority of variation in codon usage, and subsequently, the amount of variation explained by each axis gradually decreases [66].

Parity Rule 2 (PR2) Plot Analysis
The base composition at the 3rd codon position has extensive heterogeneity in the genomes of higher eukaryotes. We can analyze whether the factors affecting CUB are only random mutations (A3 = T3, G3 = C3), or mutations with selection (A3 = T3, G3 = C3). A PR2-plot that used two-fold, four-fold, and six-fold degenerate codon families utilized A3/(A3 + T3) as the vertical axis and G3/(G3 + C3) as the horizontal axis. A = T and G = C were the central positions and the coordinates were (0.5, 0.5) [67].

Translational Selection Index
The translation selection (P2) index measures the degree of bias of anticodon-codon interactions and thus can indicate translation efficiency [68]. Similar to PR2 which reflects the selection of cytosine and thymidine as the 3rd base of the codon, P2 is based on the different usage of homologous tRNA species in the process of gene translation to indicate CUB in the gene [12]. P2 was calculated according to the following formula [68]: where W = A or T, S = C or G, Y = C or T.

Neutrality Plot Analysis
GC content at the 3rd codon position is almost neutral to natural selection, whereas GC content at the 1st and 2nd codon positions is negatively affected by directional mutational pressure and selective restriction, respectively [69]. Thus, neutrality plots were illustrated by GC3 on the horizontal axis and GC12 on the vertical axis. The slope of the regression line indicated a neutral degree of GC content. If the slope of the line is close to or equal to ±1, this indicates that mutation pressure is the sole determinant of CUB. In contrast, if the slope of the line is close to 0, this indicates that natural selection is the sole determinant of CUB. When the slope of the regression line is equal to ± 1/2, it means that mutation pressure and selective constraints are equal [70].

Statistical Analysis
Correlation analysis was done using IBM SPSS Statistics 26 software. The figures were constructed using BCAWT version 1.0.0, GraphPad Prism version 8.0, and R software version 3.6.3.

Nucleotide Compositions
CUB may be shaped by nucleotide composition bias, specifically the GC content of CDSs. Genomic GC content determined by mutational processes was the prime factor of codon usage variation across species, and it was important evidence that mutation pressure determined CUB [74,75]. In this study, genomic GC content and GC content at different codon positions of the 21 ciliate species are shown in Figure S1 and Table 1. The GC content varied greatly among the four classes of ciliates investigated. Strombidium stylifer in the class Spirotrichea had the highest GC content (49.74%). In addition, the GC content of Pseudokeronopsis flava, Pseudokeronopsis carnea and Halteria grandinella in the class Spirotrichea was each over 40% (46.55%, 45.18% and 44.34%, respectively). The GC content of species belonging to the classes Litostomatea, Oligohymenophorea and Heterotrichea were relatively low, ranging from 23.49% to 32.75%. Furthermore, the GC contents at different codon positions (GC1, GC2, GC3) of the 21 ciliate species ranged from 32.83% to 52.32%, 24.20% to 37.48%, and 12.8% to 61.98%, respectively. There was a difference in GC content at different codon positions of the whole CDSs among the 21 ciliate species. The largest difference was found in the GC content of the 3rd codon position. These data indicated that there were differences in genome nucleotide compositions among different ciliate species. The nucleotide compositions at the 3rd codon base (A3, T3, C3, G3) in the 21 ciliate species were also analyzed ( Table 1 and Figure S2). The A3 content of E. caudatum (class Litostomatea) was the highest (42.95%), while that of S. stylifer (class Heterotrichea) was the lowest (19.40%). The T3 content of Uronema marinum (class Oligohymenophorea) was the highest (45.43%), while that of S. stylifer was the lowest (18.61%). Stylonychia stylifer had the highest C3 content (35.78%), while E. caudatum had the lowest (7.61%). Stylonychia stylifer also had the highest G3 content (26.20%), while U. marinum had the lowest (4.95%). These findings suggested that for the 21 species investigated, the composition at the 3rd base of codon varied greatly within species and among classes.

Effective Number of Codons and its Association with GC3
The effective number of codons (ENc) is used to measure the CUB in a gene, and the codon bias degree increases with the decline of the ENc value [56]. The ENc values in the 21 ciliate species ranged from 31.48 ± 2.55 to 44.86 ± 5.48 ( Table 2, mean ± SD), with an average value of 38.34 (SD = 4.0120). A lower ENc value indicated that there was a strong CUB in the ciliates (average ENc value was approximately 35), but the different ciliate species codon usage patterns were also remarkably distinct, i.e., the maximum ENc value was 44.86, while the minimum ENc value was 31.48. The low ENc value of the 21 ciliate species codons revealed the instability and evolutionary diversity of the ciliate genome. An ENc-plot was constructed with the ENc value of each ciliate species on the x-axis and GC3s on the y-axis ( Figure 1). According to GC3s content, the ciliates were divided into some with low GC3s content, ranging from 12.80% to 36.71%, and others with high GC3s content, ranging from 47.61% to 61.98% (Table 1). The average value of GC3s was 29.46% (SD: 13.6437). As shown in Figure 1, some genes were located near or on the standard curve, suggesting that their CUB was mainly affected by mutation pressure. However, most of Microorganisms 2023, 11, 1833 6 of 24 the genes were located above or below the standard curve, suggesting that other factors, such as natural selection together with mutation pressure, determined CUB. In addition, a significant correlation between ENc and GC3s was observed (Table S2), suggesting that mutation pressure had a significant influence on CUB in the ciliates.  The ENc ratio, as given by (ENc exp − ENc obs )/ENc exp , was calculated to show the difference between ENc obs and ENc exp more clearly. The ENc ratio was in the range of 0.1 to 0.3 (Table S3), indicating that the ENc exp values of most genes were significantly different from the ENc obs values. These data suggested that although the ciliate CUB was related to differences in GC3s, it was mainly affected by other factors such as natural selection.

Relative Synonymous Codon Usage (RSCU) and Putative Optimal Codons
The RSCU value can reveal the codon usage pattern of the gene. RSCU values <1, =1, and >1 indicate that the frequency of codon usage is below, equal to, or above average values, respectively. Codons with RSCU values >1.6 and <0.6 were considered overrepresented and under-represented, respectively [63]. The RSCU values of 59 codons (the exceptions being ATG, TGG, and three stop codons) in the 21 ciliate species were analyzed to show if there were differences among the four classes represented.
The RSCU values of the classes Oligohymenophorea, Litostomatea, and Heterotrichea were similar as shown in Figure 2 and Table  In the class Spirotrichea, however, the species whose GC contents were more than 40% (S. stylifer, Pseudokeronopsis flava, P. carnea, and H. grandinella) had 48 codons with RSCU value > 1 (A:11, T:14, G:8, C:15), 23 codons of which ended in G/C, accounting for 47.92% of all codons; the other species, which had GC content ranging from 30% to 40% (Euplotes vannus, Euplotes octocarinatus, Oxytricha trifallax, and Stylonychia lemnae) had 29 codons with RSCU value > 1 (A:12, T:14, G:2, C:1), 26 codons of which ended in A/T, accounting for 89.66% of all codons. In 20 ciliate species (the exception being P. flava in the class Spirotrichea), the most preferred codon was AGA encoding arginine. The results showed that the ciliate species of the classes Oligohymenophorea, Litostomatea, and Heterotrichea, and some of those in the class Spirotrichea (E. vannus, E. octocarinatus, O. trifallax, and S. lemnae), preferred using codons ending in A/T, whereas the other species of the class Spirotrichea, including S. stylifer, P. flava, P. carnea and H. grandinella, preferred using codons ending in G/C, which is consistent with the bias for the 3rd base of codons in different ciliate classes [26]. The RSCU was affected by the restriction of nucleotide composition, suggesting that mutation pressure was one of the most impactful factors of CUB.
The method described in [58] was used to determine the putative optimal codons (Table S5). In the class Oligohymenophorea, the putative optimal codons of 17 of the 18 amino acids ended in A/T, the exception being lysine (which was coded by AAG in Tetrahymena borealis and T. elliotti) and phenylalanine (which was coded by TTC). The putative optimal codons of the 18 amino acids in the classes Litostomatea and Heterotrichea all ended in A/T. By contrast, in the class Spirotrichea, there were 15 amino acids in S. stylifer, 18 in P. flava, 17 in P. carnea, and five in H. grandinella, whose putative optimal codons ended in G/C. There were differences in CUB among different classes of the 21 species investigated here, indicating that ciliates may be restricted by CUB in the evolution process.

PR2-Plot Analysis
PR2 is an intrastrand rule where A = T and G = C are expected if there is no mutation pressure or selection bias. If the usage of AT and CG are unbalanced, then both natural selection bias and mutation pressure together determine the composition of synonymous codons at the 3rd codon position and influence the CUB [76]. In most protein-coding genes, there are wide differences between both C and G content and A and T content [77]. We observed that the genes were distributed in four regions in the PR2-plot (Figure 3). In the 21 ciliate species, the AT bias ranged from 43.86% to 52.45% (Table S6), and only five species had a higher AT bias than 50% (P. caudatum, P. tetraurelia, E. octocarinatus, E. vannus, and S. stylifer). The GC bias ranged from 36.89% to 58.35% and only S. stylifer had a GC bias higher than 50%. Thus, in the 21 species, the rate of codon usage ending in T/C was higher than that ending in A/G, which was consistent with the nucleotide composition in species where the 3rd codon position ending in pyrimidine bases was preferred. This finding was also supported by correlation analysis between ENc and A, T3, C3, G3, which showed a more significant correlation between ENc and T3, C3 (Table S2). A codon usage imbalance between A/T and G/C as shown in the PR2-plot suggested that both natural selection bias and mutation pressure worked together on CUB in the 21 species.

PR2-Plot Analysis
PR2 is an intrastrand rule where A = T and G = C are expected if there is no mutation pressure or selection bias. If the usage of AT and CG are unbalanced, then both natural selection bias and mutation pressure together determine the composition of synonymous codons at the 3rd codon position and influence the CUB [76]. In most protein-coding genes, there are wide differences between both C and G content and A and T content [77]. We observed that the genes were distributed in four regions in the PR2-plot (Figure 3). In the 21 ciliate species, the AT bias ranged from 43.86% to 52.45% (Table S6), and only five species had a higher AT bias than 50% (P. caudatum, P. tetraurelia, E. octocarinatus, E. vannus, and S. stylifer). The GC bias ranged from 36.89% to 58.35% and only S. stylifer had a GC bias higher than 50%. Thus, in the 21 species, the rate of codon usage ending in T/C This finding was also supported by correlation analysis between ENc and A, T3, C3, G3, which showed a more significant correlation between ENc and T3, C3 (Table S2). A codon usage imbalance between A/T and G/C as shown in the PR2-plot suggested that both natural selection bias and mutation pressure worked together on CUB in the 21 species.

Neutrality Plot Analysis
The difference in GC3 among the different species reflects the mutation pressure [78]. A neutrality plot analysis, which shows the relationship between GC12 and GC3, was

Neutrality Plot Analysis
The difference in GC3 among the different species reflects the mutation pressure [78]. A neutrality plot analysis, which shows the relationship between GC12 and GC3, was conducted in the 21 ciliate species to explore the influence of mutation pressure and selection bias on CUB. There was a significant correlation between GC12 and GC3 (Table S2), meaning that mutation pressure had a significant effect on CUB. Furthermore, the absolute value of the slope of the regression line in the neutrality plot ranged from 0.020 to 0.377, which indicated the effect of mutation pressure was only about 2% to 37.7% (Figure 4). The above results showed that although the CUB was affected by mutation pressure, natural selection seemed to have a greater influence. Four species (E. caudatum, P. persalinus, Stentor roeselii, and S. stylifer) with higher mutational pressure may have more rapid rates of evolution and higher adaptability than the other species investigated.
Microorganisms 2023, 11, x FOR PEER REVIEW 11 of 24 conducted in the 21 ciliate species to explore the influence of mutation pressure and selection bias on CUB. There was a significant correlation between GC12 and GC3 (Table S2), meaning that mutation pressure had a significant effect on CUB. Furthermore, the absolute value of the slope of the regression line in the neutrality plot ranged from 0.020 to 0.377, which indicated the effect of mutation pressure was only about 2% to 37.7% ( Figure  4). The above results showed that although the CUB was affected by mutation pressure, natural selection seemed to have a greater influence. Four species (E. caudatum, P. persalinus, Stentor roeselii, and S. stylifer) with higher mutational pressure may have more rapid rates of evolution and higher adaptability than the other species investigated.

Correspondence Analysis
Correspondence analysis creates a series of orthogonal axes to determine the tendency to explain variation in data, with each subsequent axis explaining a gradual decrease in the amount of variation [79]. RSCU correspondence analysis in the 21 ciliate species was carried out show the trend of CUB based on RSCU values. In order to minimize the influence of amino acid composition on codon usage, each gene was represented as a vector with 59 dimensions, and each dimension corresponded to the RSCU value of a justice codon (excluding Met, Trp, and three stop codons) [80]. The first axis (axis 1) of the 21 species contributed 5.82% to 37.59% of the total variation, and the accumulative variation of the first four axes was 23.71% to 55.11% ( Table 3). The first axis accounted for most of the variation of the RSCU deviation in these genes and was the main factor determining the codon usage pattern in these ciliates, the influence of the other axes being insignificant. The genes were plotted on a planar graph with the first axis as the abscissa (horizontal axis) and the second axis as the ordinate (vertical axis), respectively ( Figure 5). The scattering of the genes in the graph indicated that CUB was not affected by a single factor but rather was determined by many different factors. To verify the association between CUB and nucleotide compositions, we performed a correlation analysis between nucleotide compositions and axis 1 (Table S2). This showed a significant correlation in each species indicating that there was a correlation between CUB and nucleotide compositions. Axis 1 was significantly correlated with GC and GC3, indicating that mutation pressure was an important factor affecting CUB. In addition, CAI, ENc and axis 1 were significantly correlated with each other. Through codon correspondence analysis, we explored the codon usage patterns ( Figure S3). We found that axis 1 could distinguish codons ending in G/C and A/T just as easily as axis 2 could distinguish codons ending in T/C and A/G, confirming the previous conclusion (described in the section on Nucleotide Compositions and PR2-Plot Analysis) that these ciliates preferred using codons ending in AT, especially pyrimidines.

Prediction of Gene Expression in 21 Ciliates Species
There is a significant positive correlation between CUB and gene expression [12,81]. The codon adaptation index (CAI) was used to predict gene expression level and codon usage bias in the 21 ciliate species. A higher CAI value means a higher gene expression level, and the CAI value range is 0 to 1 [61]. The gene expression levels of the 21 species were predicted by CAI values (Table 4). Among the 21 species, H. grandinella had the highest average CAI value (0.7572, SD = 0.0391), while P. flava had the lowest (0.5226, SD = 0.1326). The CAI values of the 21 species were all greater than 0.5, indicating that these ciliates have high gene expression levels and strong CUB. We conducted a correlation analysis between CAI and ENc values as well as between CAI and GC3 values ( Figure 6 and Table S2). With the exception of P. carnea, a significant negative correlation between CAI and ENc values and between CAI and GC3 values was observed, suggesting that gene expression levels may play a key role in determining the CUB in these ciliates.

Compositions and Gene Lengths of Amino Acids
Amino acid composition and gene length can affect CUB. Here, we conducted the correlation among Gravy and Aroma of amino acids, gene length ENc, and GC content in the genome of each of the 21 ciliate species. As shown in Table S2, it can be seen that the Gravy of 20 species (the exception being Tetrahymena malaccensis) was significantly correlated with GC content (Figure 7), and the Gravy of 19 ciliate species (the exceptions being P. caudatum and E. caudatum) was significantly correlated with ENc. In the 21 ciliate species, there was a significant correlation between Aroma and GC content (Figure 8), and a very high correlation between Aroma and ENc in 16 species (the exceptions being P. tetraurelia, U. marinum, T. malaccensis, S. coeruleus, and P. carnea). The gene length of 17 species (the exceptions being P. traurelia, P. persalinus, Ichthyophthirius multifiliis, and H. grandinella) was significantly correlated with GC content, and the gene length of all 21 species was significantly correlated with ENc. In addition, except for P. caudatum and P. persalinus, the gene length was markedly positively correlated with ENc (Table S2). This indicates that gene length was significantly negatively correlated with CUB. A previous study has reported that longer genes have weak CUB because selection may reduce the size of highly expressed proteins, and this effect is particularly pronounced in eukaryotes [10]. The results of the present study showed that the amino acid compositions and gene lengths could affect the CUB, but the absolute values of the correlation were low, indicating that they were only the secondary factors affecting CUB in the ciliate species investigated here.

Compositions and Gene Lengths of Amino Acids
Amino acid composition and gene length can affect CUB. Here, we conducted the correlation among Gravy and Aroma of amino acids, gene length ENc, and GC content in the genome of each of the 21 ciliate species. As shown in Table S2, it can be seen that the Gravy of 20 species (the exception being Tetrahymena malaccensis) was significantly correlated with GC content (Figure 7), and the Gravy of 19 ciliate species (the exceptions being P. caudatum and E. caudatum) was significantly correlated with ENc. In the 21 ciliate species, there was a significant correlation between Aroma and GC content (Figure 8), and a very high correla- cantly negatively correlated with CUB. A previous study has reported that longer genes have weak CUB because selection may reduce the size of highly expressed proteins, and this effect is particularly pronounced in eukaryotes [10]. The results of the present study showed that the amino acid compositions and gene lengths could affect the CUB, but the absolute values of the correlation were low, indicating that they were only the secondary factors affecting CUB in the ciliate species investigated here.  The P2 index, created using the principle of the distance between expected and observed codon usage, predicts the CUB using the strength of codon-anticodon binding between mRNA and tRNA [14]. The P2 index is defined as the frequency of the correct choice between pyrimidines in codons beginning with AA, AT, TA, TT, CC, CG, GC, or GG [12]. In 19 of the 21 ciliate species (the exceptions being P. flava and S. stylifer, the values of SST

Translation Selection (P2) and Choice between Pyrimidines in the 3rd Position of Codon
The P2 index, created using the principle of the distance between expected and observed codon usage, predicts the CUB using the strength of codon-anticodon binding between mRNA and tRNA [14]. The P2 index is defined as the frequency of the correct choice between pyrimidines in codons beginning with AA, AT, TA, TT, CC, CG, GC, or GG [12]. In 19 of the 21 ciliate species (the exceptions being P. flava and S. stylifer, the values of SST and WWT were higher than those of SSC and WWC (Table 5). This suggests the 3rd codon tends to end in T than C, which is consistent with the nucleotide composition as described above. Only S. stylifer, P. carnea, and P. flava had P2 values higher than 0.5, suggesting that translation selection played a major role in directional mutation stress in these three species, perhaps because their GC content was nearly equal to the AT content [68,82].

Phylogenomic Analyses
We performed phylogenetic analyses based on orthologous protein sequences of 21 ciliates to determine the relationship between ciliate systematic position and CUB ( Figure 9). The phylogenetic trees constructed on BI and ML analysis had similar topologies. The result corresponded to the GC content result where in Spirotrichea the species which had higher GC content including H. grandinella, P. carnea, P. flava and S. stylifer were clustered together and other species which had lower GC content had a similar phylogenetic relationship in concatenated protein tree. Hence, our study further supported the part of Spirotrichea species that had a unique codon usage pattern and preferred using codons ending with GC. Microorganisms 2023, 11, x FOR PEER REVIEW 19 of 24

Discussion
From prokaryotes to eukaryotes, CUB has a profound influence on genome evolution [8,83]. Contrary to Crick's description, some ciliates do not follow the conventional protein-coding pattern of codons, but reassign termination codons to encode glutamine [54,84]. The codon is an important carrier of genetic information transmission, and the CUB in coding genes is often generated for more accurate and efficient translation. The study of CUB is therefore crucial for fully understanding the genetic and translation mechanism of ciliates [85,86].
In this study, we analyzed the CUB of 21 ciliate species representing four classes and two subphyla, and explored their molecular evolutionary mechanism so as to further understand the evolutionary relationship in different ciliate classes. Related species invariably had similar nucleotide compositions and codon usage patterns. Most species had a GC content of less than 50%, with a bias for synonymous codons ending A or T. However, the GC content of species in the class Spirotrichea was higher than that in the other classes, and the difference in GC content at the 3rd codon position was particularly significant. Measuring GC content at the 3rd codon position is a good indicator of the degree of base composition bias [78]. Based on the significant differences in GC3 content, it can be shown that there are differences in CUB among the 21 species. GC3 content and codon usage were strongly correlated among genes, suggesting that CUB may be due to a mutational bias at the DNA level rather than natural selection at the translation level. The results of nucleotide composition analysis were consistent with the PR2, RSCU and codon correspondence analyses. The CUB in most species had a bias for codons ending in AT, which contrasts with plants such as monocotyledons, which have a bias for codons ending in

Discussion
From prokaryotes to eukaryotes, CUB has a profound influence on genome evolution [8,83]. Contrary to Crick's description, some ciliates do not follow the conventional protein-coding pattern of codons, but reassign termination codons to encode glutamine [54,84]. The codon is an important carrier of genetic information transmission, and the CUB in coding genes is often generated for more accurate and efficient translation. The study of CUB is therefore crucial for fully understanding the genetic and translation mechanism of ciliates [85,86].
In this study, we analyzed the CUB of 21 ciliate species representing four classes and two subphyla, and explored their molecular evolutionary mechanism so as to further understand the evolutionary relationship in different ciliate classes. Related species invariably had similar nucleotide compositions and codon usage patterns. Most species had a GC content of less than 50%, with a bias for synonymous codons ending A or T. However, the GC content of species in the class Spirotrichea was higher than that in the other classes, and the difference in GC content at the 3rd codon position was particularly significant. Measuring GC content at the 3rd codon position is a good indicator of the degree of base composition bias [78]. Based on the significant differences in GC3 content, it can be shown that there are differences in CUB among the 21 species. GC3 content and codon usage were strongly correlated among genes, suggesting that CUB may be due to a mutational bias at the DNA level rather than natural selection at the translation level. The results of nucleotide composition analysis were consistent with the PR2, RSCU and codon correspondence analyses. The CUB in most species had a bias for codons ending in AT, which contrasts with plants such as monocotyledons, which have a bias for codons ending in GC, and dicotyledons, which have a bias for codons ending in AT [87]. Due to compositional constraints, ciliates may prefer using codons ending in AT [78]. The bias in the composition of the 3rd codon base indicated that compositional constraints under mutational pressure may influence the CUB in different ciliate species. In P. flava, P. carnea and S. stylifer, however, nucleotide compositions, RSCU values, and putative optimal codons suggested a bias for codons ending in GC. There were only four high-frequency codons (CCT, CCA, AGA and GGA) that were common to the four ciliate classes, which suggested that CUB had large differences among the 21 species. In addition, although CUB of most genes reflects the overall AT content of the genome in Tetrahymena thermophila, there is a set of genes in which the optimal codon has no connection with AT content, indicating that the factors affecting ciliate CUB are complex [88].
The ENc value of ciliates was low (ENc < 40), which can indicate that many ciliates may need high gene expression to adapt to environmental stress. Furthermore, ENc-plots showed that there was a significant correlation between ENc and GC3s, which indicated that mutation pressure existed in the CUB. However, some genes in ENc-plots were far from the curve, indicating that mutation pressure did not play a major role in CUB. The significant negative correlation between ENc and CAI, and the significant negative correlation between CAI and GC3 indicated that genes with high expression had stronger CUB, and that gene expression was one of the most important factors for CUB. Surprisingly, however, in Tetrahymena genes with high expression had higher GC content and tended to have codons ending in GC. It has previously been speculated that codons ending in GC have higher translation efficiency and accuracy [48,49].
Correspondence analysis showed that nucleotide composition, which plays an important role in CUB, is significantly correlated with axis 1. Furthermore, correlation analysis indicated that multiple factors such as gene length and Gravy and Aroma of amino acids together influenced CUB in the 21 ciliate species. PR2 analysis showed that mutation pressure and natural selection bias were both involved in CUB. The neutral theory of molecular evolution suggests that silent mutation sites in codons represent neutral evolution [89]. In this study, GC12 and GC3 showed a significant correlation, indicating that mutation pressure plays an important role in CUB in ciliates. However, the linear regression slope of the neutral plot was less than 50%, suggesting that natural selection bias may also play a major role.
The high GC content in the genome of the class Spirotrichea may be due to environmental stress, resulting from stable DNA. Biased gene transformation (BGC) or mutation pressure that changed AT into GC may be the reason for the differences in nucleotide compositions [50]. The BGC is a GC-biased repair process occurring in the recombinant genome, which is the main driving force of genome evolution [90]. CUB in ciliates may be an adaptive mechanism to facilitate adaptation to environmental conditions. Therefore, ciliates from different environments may differ in their CUB.

Conclusions
Though different species in ciliate have variant genome size and GC content, we conclude that most of the ciliates investigated prefer using the codons of AT-ending and the CUB of ciliates is affected by gene mutation and natural selection together.