Next Article in Journal
Proton-Pump Inhibitors Suppress T Cell Response by Shifting Intracellular Zinc Distribution
Previous Article in Journal
Meta-Analysis and Systematic Review of HLA DQ2/DQ8 in Adults with Celiac Disease
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

The Influence of the Selection at the Amino Acid Level on Synonymous Codon Usage from the Viewpoint of Alternative Genetic Codes

Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wroclaw, ul. Joliot-Curie 14a, 50-383 Wroclaw, Poland
*
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(2), 1185; https://doi.org/10.3390/ijms24021185
Submission received: 10 November 2022 / Revised: 19 December 2022 / Accepted: 30 December 2022 / Published: 7 January 2023
(This article belongs to the Section Molecular Microbiology)

Abstract

:
Synonymous codon usage can be influenced by mutations and/or selection, e.g., for speed of protein translation and correct folding. However, this codon bias can also be affected by a general selection at the amino acid level due to differences in the acceptance of the loss and generation of these codons. To assess the importance of this effect, we constructed a mutation–selection model model, in which we generated almost 90,000 stationary nucleotide distributions produced by mutational processes and applied a selection based on differences in physicochemical properties of amino acids. Under these conditions, we calculated the usage of fourfold degenerated (4FD) codons and compared it with the usage characteristic of the pure mutations. We considered both the standard genetic code (SGC) and alternative genetic codes (AGCs). The analyses showed that a majority of AGCs produced a greater 4FD codon bias than the SGC. The mutations producing more thymine or adenine than guanine and cytosine increased the differences in usage. On the other hand, the mutational pressures generating a lot of cytosine or guanine with a low content of adenine and thymine decreased this bias because the nucleotide content of most 4FD codons stayed in the compositional equilibrium with these pressures. The comparison of the theoretical results with those for real protein coding sequences showed that the influence of selection at the amino acid level on the synonymous codon usage cannot be neglected. The analyses indicate that the effect of amino acid selection cannot be disregarded and that it can interfere with other selection factors influencing codon usage, especially in AT-rich genomes, in which AGCs are usually used.

1. Introduction

The redundancy of the standard genetic code (SGC) is a direct consequence of the fact that 64 codons encode 20 amino acids and the signal for the termination of protein synthesis (translation). Therefore, groups of codons exist that encode the same genetic information. In most cases, codons belonging to the given group differ in one position. As a consequence, some point mutations, which may occur between codons in the group, are silent, i.e., they do not change coded information. These mutations and codons are called synonymous. For example, there are codon groups named fourfold degenerated (4FD), in which the third codon position can be changed to any nucleotide without consequences on the encoded amino acid. Such codons encode alanine (Ala), glycine (Gly), proline (Pro), threonine (Thr) and valine (Val) in the SGC. Nevertheless, the distribution of synonymous codons’ usage in protein-coding sequences is not uniform, and some preferences in this usage are observed. This phenomenon is well-known as a codon usage bias [1]. Causes and implications of this bias have been widely investigated and discussed [2,3,4].
There are two main explanations of the codon bias, namely, directional mutational pressure and specific selection [2,3,4]. The mutational view assumes that variability in the usage of synonymous codons results from the mutational processes, which substitute nucleotides with different rates. Consequently, we should expect more codons composed of nucleotides generated more frequently. In the light of this assumption, G C / A T content is the main force responsible for the pattern of codon usage in genomes [5,6,7]. For example, in the genomes characterized by a high G C content, synonymous codons ending with guanine and cytosine are more frequently observed in comparison to those ending with thymine and adenine. It should also be noted that the mutational pressure can be different in individual genome regions [6,8,9] and depend on differently replicating DNA strands, i.e., the leading and the lagging strands [10,11,12,13,14]. Thus, genes located in various genomic regions and DNA strands can demonstrate a specific codon usage.
The second opinion points out that the synonymous codon bias arises from selection factors. For example, it was shown that highly expressed protein-coding genes have a tendency to use codons for which there are more tRNA molecules in a cell [15,16,17,18,19,20,21,22]. This codon bias is supposed to be an adaptation to a more effective, faster and accurate translation process.
The variable usage of synonymous codons was also detected in different regions of protein-coding sequences. It was explained by selection against the formation of mRNA secondary structures at the 5 end to ensure the efficient initiation of translation [23] or by selection for the optimal protein folding and adopting the correct structure during the translation [24,25,26]. What is more, some poorly adapted codons in the first 90–150 codon positions in genes are expected to slow down the translation process to reduce the number of ribosomal blockages and collision-induced abortions [27]. It was also proposed that the codon usage changes the gene expression by the influence on the transcription process [28,29,30] and mRNA stability [31].
Morton [32] proposed another important factor influencing the synonymous codon usage in protein-coding genes. He postulated that this effect results from a general selection at the amino acid level and associated with that a biased selection for codon substitutions. For example, glycine is encoded by four codons, namely, GGA, GGC, GGT and GGG in the standard genetic code. Its codon GGA can potentially mutate by a single substitution into TGA, which encodes a translation termination signal causing premature protein synthesis. Other glycine codons have no risk to be changed into the stop codon by single-nucleotide mutations. Since the GGA→TGA mutation is deleterious, it is not accepted by selection. Thus, the codon GGA would not disappear, at least due to these mutations, as would other Gly codons whose changes to other codons are more tolerated. In consequence, the frequency of GGA should be higher than the other codons. If the codon TGA encodes Gly or other similar amino acid, we should expect a greater tolerance for the GGA→TGA mutation and a similar disappearance rate of GGA as other glycine codons. Then, the usage of all GGN codons would be more balanced. Actually, there are the alternative genetic codes in which TGA codes for an amino acid and this effect should be observed. Of course, the frequency of these codons is the result of many other substitutions, which are not equivalent from a selectional point of view. Thus, the diverse selection against substitutions can lead to distinct balances between the loss and generation of the codons and can finally cause their different usage.
In fact, Morton [32] showed that the composition of the third positions in synonymous codons deviates from the composition of non-coding neutral sites, even in the absence of direct selection on the synonymous codons. However, his study only considered four selected mutational processes generating equal frequencies of complementary nucleotides. Therefore, this hypothesis was further explored by Błażej et. al. [33], who developed a theoretical mutation–selection model model and examined 88 , 560 nucleotide stationary distributions produced by mutational pressures under selection at the amino acid level. They found that the influence of selection at the amino acid level is visible, especially for theoretical mutational pressures generating more adenine and thymine than guanine and cytosine, as well as more purines than pyrimidines. Therefore, this effect cannot be neglected and requires further studies.
The previous study [33] analyzed the synonymous codon usage assuming the assignment of amino acids to codons in the standard genetic code (SGC), which is nearly universal among all living organisms. However, there are deviations from this universality, called alternative genetic codes (AGCs). These genetic coding systems operate mainly in small mitochondrial [34,35,36,37,38,39] and plastid genomes [40,41,42], but also in the genomes of parasites and symbionts [43,44,45,46] or even the nuclear genomes of various eukaryotes [47,48,49,50,51,52,53].
Generally, the deviations from the SGC observed in the AGCs can result from: (i) the reassignment of codons encoding the typical 20 amino acids and stopping the protein translation; (ii) loss of codon meaning due to the disappearance of this very codon; and (iii) the incorporation of new amino acids, e.g., selenocysteine and pyrrolysine [54,55,56]. The deviations of the first type are the most frequent. The changes in the codon’s meaning are mainly related to reassignments of the stop to sense codons, e.g., TGA to tryptophan. In the study, we analyzed all AGCs available at the NCBI database that were characterized by different assignments of amino acids and/or stop translation signals to codons than in the SGC. They also had the four amino acids, Ala, Gly, Pro, Thr and Val, coded by the same 4FD codons as the SGC.
Because the acceptance of a given codon substitution to others depends on the assignment of amino acids to codons, we can expect that the alternative genetic codes can demonstrate a different influence on codon usage than the SGC. Therefore, we decided to study the effect of selection at the amino acid level on codon usage, including the alternative codes. The comparison of AGCs with the SGC revealed that the alternatives minimize consequences of point mutations or translational errors much better [56,57], and the SGC is not fully optimized in this respect [58,59,60,61,62,63,64,65,66]. Thus, we can also expect a different strength of selection at the amino acids level on the codon usage in these alternative coding systems.

2. Results

The main goal of our investigation was to assess a potential selection strength at the amino acid level on the usage of codons, assuming coding systems in alternative genetic codes and in the standard genetic code for comparison. Therefore, we calculated the usage of fourfold degenerated (4FD) codons generated under almost 90,000 nucleotide stationary distributions generated by mutational processes and compared it with the usage of codons whose mutations to other codons were subjected to selection based on differences in the physicochemical properties of coded amino acids. The differences were described by Grantham’s chemical similarity matrix [67].
To estimate the selection strength on the codon usage F π , we calculated for each of 4FD codons the difference between the codon frequency after the selection and their expected frequency resulting only from mutations. The difference was divided by the expected frequency. Next, we summed up the differences obtained for five 4FD codons. The high F π values should be interpreted as a large variation in the synonymous codon usage due to selection at the amino acid level. Because a stationary nucleotide distribution can be realized by many mutational pressures, we applied a modified evolutionary optimization algorithm to determine the pressures for which the effect on the relative codon usage was maximized.

2.1. General Performance of Genetic Codes in Terms of the Selection Strength at the Amino Acid Level on the Codon Usage

Based on conducted simulations for each genetic code, we obtained 88,560 values of selection strength on the codon usage F π produced under various mutational pressures generating various stationary nucleotide compositions. In order to present a general performance of the codes in terms of the selection strength on codon usage, we calculated several descriptive statistics of F π such as the maximum, the minimum and the median, as well as the percentage of changes in relation to the SGC (Table 1, Figure 1).
As one can notice, the selection strength is different for the individual genetic codes. On one hand, there are ten AGCs that show a lower median m e ( F π ) among all tested mutational stationary distributions than the SGC. Three of them, i.e., codes 9, 14 and 21, have a 19–21% smaller median than the SGC (Table 1, Figure 1). These codes are used in the mitochondria of flatworms (codes 9 and 14), roundworms (code 14), trematodes (code 21) and echinoderms (code 9). The results suggest that the synonymous codon bias generated under these codes due to the selection at the amino acid level is generally smaller than in the SGC. These codes produce also the maximum selection strength on the codon usage m a x ( F π ) , which is 25%-31% lower than the SGC. Thereby, they induce a weaker selection on the codon usage under the studied mutational pressures.
On the other hand, the rest of the 13 investigated alternative genetic codes are characterized by a larger m e ( F π ) in comparison to the SGC. Five of them, i.e., 22, 23, 2, 12 and 26, can generate a codon bias 12% to 37% larger than the SGC (Table 1, Figure 1). Three codes were found to operate in the mitochondrial genomes of various eukaryotes: code 22 in green alga Scenedesmus obliquus, code 23 in a protist labyrinthulid Thraustochytrium and code 2 in vertebrates, whereas code 12 and 26 is an alternative to the yeast nuclear genome. Their values of m i n ( F π ) and m a x ( F π ) are also substantially higher, from 6% to 75%, than in the SGC. This indicates that the bias in the synonymous codon usage induced by these codes under the selection at the amino acid level is generally greater than in the SGC.

2.2. The Importance of Individual Assignments in Alternative Genetic Codes in the Selection Strength on Codon Usage

It is interesting to check which assignments of amino acids to codons in AGCs contribute the most to decreasing or increasing the selection strength on the codon usage. Codes 9, 14 and 21, characterized by a lower median m e ( F π ) than the SGC, feature the reassignment of stop codon TGA into tryptophan. In code 14, the stop codon TAA also encodes tyrosine. The codes include the reprogramming of several other codons into canonical amino acids, too.
It can be noticed that the codon TGA is redefined to Trp in these codes, which results in the single-nucleotide substitution of GGA (Gly) to TGA being more acceptable from the selection point of view than in the SGC, in which TGA encodes a stop translation signal. This reassignment can decrease differences in the loss and generation of glycine codons and, finally, in their relative usage (Figure 2). Three codes, 9, 14 and 21, are also characterized by the reassignment of arginine codons AGA and AGG into serine, which results in all four codons in the group AGN encode Ser. In consequence, the single nucleotide mutations of individual 4FD codons from the group Thr (ACN) and Gly (GGN) to these serine codons have a similar selective value, which can equalize their relative frequency (Figure 2). Of course, other codon redefinitions can also indirectly modify the usage of the fourfold degenerated codons considered here. Consequently, the individual selection strength on the usage of Thr and Gly codons F π | s (see Material and Methods) is smaller for codes 9, 14 and 21 than for the SGC. In the case of ACN codons this measure is 0.049, 0.047 and 0.037 for codes 9, 14 and 21, whereas 0.053 for the SGC. For GGN codons, F π | s is 0.041, 0.046 and 0.021 for the AGCs and 0.049 for the SGC.
On the other hand, the alternative coding systems 22, 23, 2, 12 and 26 are characterized by a substantially higher m e ( F π ) of selection strength on the 4FD codon usage. In code 22, the codon TCA means the stop translation signal instead of serine as other codons in the TCN group. This codon redefinition entails unbalanced substitutions of alanine codons GCN differing from the serine codons TCN by the first codon position (Figure 3). The deleterious mutation of GCA (Ala) to TCA (stop) can be accepted by selection with much lower probability in the comparison to substitutions of other Ala codons. Consequently, the Ala codons can be unequally used. In fact, F π | s for GCN is much higher for code 22 than for the SGC, i.e., 2.677 vs. 0.706.
Likewise, another Ser codon group AGN is disrupted in code 2 because its codons AGA and AGG encode the stop signal. In this case, the acceptance of substitutions of threonine codons ACA and ACG into these new stop codons can be weaker than substitutions of other Thr codons ACT and ACC into intact Ser codons AGT and AGC, which influences the unbalanced frequency of Thr codons (Figure 3). Therefore, F π | s for ACN is 3.953 in the case of code 2 and only 0.670 for the SGC.
In turn, code 23 has the TTA codon encoding the stop translation signal instead of leucine. Therefore, the substitution of the valine codon GTA into TTA can be much less frequently accepted than substitutions of other Val codons: GTG into TTG still encoding leucine, as well as GTT and GTC, into TTT and TTC, respectively, coding for phenylalanine (Figure 3). All three amino acids, Val, Leu and Phe, show similar hydrophobic properties, so their substitutions are much more easily acceptable. In consequence, the equality in the frequency of Val codons can be disturbed, so F π | s for Val codons for code 23 is 3.264 and larger than 1.369 for the SGC.
A similar explanation can be applied to proline codons (CCN) in code 12, in which CTG encodes Ser rather than Leu similar to other codons in the CTN group. The substitution CCG (Pro)→CTG (Ser) can be less accepted than substitutions of Pro to Leu: CCA→CTA, CCC→CTC and CCT→CTT, because, according to general physicochemical properties, proline is more similar to leucine than serine (Figure 3). In agreement with that, F π | s for Pro codons is 2.557 for code 12 and is greater than 0.473 for the SGC.
Finally, in code 26, CTG encodes Ala instead of Leu as other CTN codons. These amino acids show distinct physicochemical properties. For example, alanine is much smaller than leucine. Thus, the mutations of proline codons CCN and valine codons GTN into CTN codons in this alternative code have different selection values. CCT→CTT and GTT→CTT can be less accepted than substitutions of other Pro or Val codons into unchanged Leu codons (Figure 3). Thus, F π | s values for the CCN (2.641) and GTN (2.641) codons are greater in code 26 than in the SGC (0.473 and 1.369, respectively).

2.3. The Influence of Mutational Pressure on the Selection Strength on Codon Usage

The selection strength at the amino acid level F π acting on the 4FD codon usage should depend on the rate of individual nucleotide substitutions, which can change the general codon frequency. In Figure 4, we compared the rate of individual nucleotide substitutions in the mutational pressures that minimize and maximize F π . The rates of substitutions T→A, T→C, G→A, G→C and C→T were statistically higher (with p < 0.05 corrected by the Benjamini–Hochberg procedure and the paired Wilcoxon test) in most mutational pressures responsible for the increase in F π . This means that these substitutions can elevate the selection strength on the 4FD codon usage. On the other hand, the probability of substitutions A→G, G→T and C→G was statistically higher (with p < 0.05) in the mutational pressures that decreased F π and can reduce the codon bias.
For example, the substitution G→A can increase the selection strength on the codon usage because it is responsible in most genetic codes for many mutations of 4FD codons into those encoding different amino acids. Because of this substitution, Val codons (GTN) can be mutated into ATN codons encoding Ile or Met in the case of all studied genetic codes. Gly codons (GGN) can mutate due to this substitution into two codon groups encoding various amino acids depending on the genetic code: GAN codons for Asp/Glu (in all studied codes) or AGN codons for Arg/Ser (in codes 1, 3, 4, 6, 10–12, 15, 16, 22, 23, 25–32), Ser/Stop (in code 2), Gly/Ser (in code 13) or Lys/Ser (in codes 24 and 33).
We also investigated the stationary nucleotide distributions generated by the mutational pressures that minimize and maximize the selection strength at the amino acid level F π . The results are presented in Table 2 and Table 3. Interestingly, the stationary distributions that minimize the selection strength are characterized by a high G + C nucleotide content, from 0.80 to 0.9 (Table 2). For example, the nucleotide stationary distributions associated with code 21 showing the lowest m i n ( F π ) is characterized by the 0.89 G + C fraction.
In contrast to that, the stationary distributions that maximize the amino acid selection strength on the 4FD codon usage under the respective genetic codes generate a high A + T content (Table 3). The nucleotide distributions are characterized by the fraction of adenine and thymine from 0.88 to 0.89 . The distribution associated with code 23 showing the highest m a x ( F π ) is characterized by the extreme fraction of adenine 0.84 , whereas other nucleotides constitute only 0.05 or 0.06 .
The relationships of m i n ( F π ) and m a x ( F π ) from the stationary nucleotide distributions are presented in Figure 5. It can be seen that the minimum selection strength on the 4FD codon usage can be obtained by all genetic codes when they are subjected to the mutational pressure generating a very low amount of adenine and thymine. Considering the frequency of guanine and cytosine, we can find two groups of codes. One of these groups (codes 2, 5, 12, 13, 21 and 31) produced a m i n ( F π ) under the stationary distribution that was richer in guanine (0.66–0.83) than cytosine (0.07–0.21), and the other group under the excess of cytosine (0.44–0.75) over guanine (0.15–0.41).
Considering the maximum selection strength on the 4FD codon usage, we can notice that all codes are grouped in the plot for a very low frequency of cytosine and guanine (Figure 5). However, in terms of adenine and thymine content, they are separated into two groups. Most of the codes generated m a x ( F π ) under the pressures producing a high frequency of thymine (0.67–0.82) and a lower frequency of adenine (0.07–0.22). In turn, other codes (1, 2, 16, 22, 23, 25 and 30) subjected to the mutations generating a high content of adenine (0.78–0.84) and a small content of thymine (0.05–0.11) resulted in the maximum ( F π ) . The results indicate that extreme codon usage can be generated under the selection on the amino acid level by mutational pressures with stationary nucleotide distributions specific for various groups of genetic codes.
Interestingly, the stationary distributions that are associated with the minimization of the selection influence on the usage of 4FD codons are more similar to the nucleotide composition of most these codons, i.e., Ala, Gly and Pro, because they are also rich in G + C content (0.83). The similarity is visualized in Figure 6, which presents the biplot of correspondence analysis for the nucleotide composition of 4FD codons and nucleotide stationary compositions generated by mutational pressures, which, together with the amino acid selection, minimize or maximize F π for individual genetic codes. The 4FD codon groups for Pro, Ala and Gly are clustered with the compositions associated with the minimum deviation in the relative codon usage. These nucleotide contents are characterized by the high frequency of G and C. The separation of nucleotide compositions associated with the maximization of F π is also clearly visible. One set is characterized by the high content of A and the other of T. The codon groups for threonine and valine are located in the middle of the plot due to the equal content of A + T and G + C .
The comparison in Figure 6 indicates that the composition of most 4FD codons remains closer to the equilibrium state with the pressures minimizing the selection strength on the codon usage. In contrast to that, the pressures generating more A + T cause these codons to be more frequently substituted by the codons rich in adenine and thymine. This would explain why the 4FD codon usage is more biased under the mutational pressures generating a high frequency of A and T.
The results showed only the extreme nucleotide distributions that were associated with the minimum and maximum deviation in the 4FD codon usage. Therefore, we investigated the relationships between the maximum deviation in the selection strength, i.e., m a x ( F π ) , and the stationary frequency of nucleotides induced by the mutational pressures that, together with the amino acid, generated this highest codon bias selection. The plot in Figure 7 demonstrates that the dependence of median F π on the A + T content is monotonically increasing, although non-linearly, for all studied genetic codes. Initially, the m e ( F π ) increases with the rise of A + T in all considered coding systems to about 0.25 of A + T . It stabilizes to ca 0.60 of A + T and then rapidly grows up to the highest values for the extreme A + T content. The alternative genetic codes 9, 14 and 21 show the lowest course of the curve, whereas the curve of code 22 stands out from the others and achieves the highest position in the plot. This indicates that the most biased 4FD codon usage is produced by this code under mutational pressures, generating a very wide range of A + T compositions.

2.4. The Comparison of Calculated Deviations in the Codon Usage with That Observed in Protein-Coding Genes

The results presented above refer to the maximum deviation in the codon usage that can be exerted by theoretical mutational pressures with the selection at the amino acid level for the given genetic codes. Thus, it is interesting to assess the strength of this effect in the case of real biological data. Therefore, we calculated an analogous measure to F π , called F, which is the normalized difference between: (i) the observed usage of 4FD codons in protein-coding sequences that are translated by the studied genetic codes and (ii) the expected frequencies of these codons. The expected frequencies were approximated by the average of relative frequencies of all 4FD codons in these protein-coding genes (See Material and Methods for details). The obtained values were compared with the F π values collected from the theoretical calculations (Figure 8).
Although the medians of F for biological data are larger than those achieved for the theoretical effect (p < 1 × 10 16 corrected by Benjamini–Hochberg procedure, Kruskal–Wallis test), the values are of the same order of magnitude and the distributions overlap (Figure 8). Moreover, some theoretical cases are even greater than those for protein-coding genes. Comparing the median values, we can ascertain that the the selection strength on the 4FD codon usage estimated in the computer simulations constitute about 35 % of the deviation in the codon usage found for protein coding sequences read under the studied genetic codes. The deviation in the 4FD codon usage assessed for the protein genes translated by the SGC and the AGCs are comparable and not statistically different from each other (p = 0.84).
The bias in the codon usage obtained for the real gene sequences can be higher than those for the theoretical calculations because additional factors, e.g., those associated with the effectiveness of translation, can influence the codon usage. Nevertheless, the results indicate that the effect of selection at the amino acid level can explain a substantial proportion of the observed 4FD codon usage.

3. Discussion

The results enabled us to assess the impact of mutational pressures and the selection at the amino acid level on the usage of fourfold degenerated codons. The influence was studied for alternative genetic codes in comparison with the standard genetic code. The analyses showed that there are ten AGCs that generate 4FD codon biases smaller than the SGC. However, 13 AGCs produced a much higher diversity in the 4FD codon usage than the SGC. The codes generating the largest codon bias evolved independently in various eukaryotic groups, in mitochondrial genomes of green algae, labyrinthulids and vertebrates, as well as the nuclear genomes of yeasts. New codes are still being discovered, especially in poorly studied groups of protists [52,68,69,70]. Therefore, we cannot exclude that other codes can show a more extreme influence on the codon usage.
The weaker bias in the 4FD codon usage can be associated with specific codon assignments in AGCs that minimize differences in the loss and generation of codons in a given 4FD group. In turn, the greater dissimilarity in the usage can be associated with a larger disequilibrium between the loss and generation of these codons. This effect should be enhanced when codons in a 4FD group mutate into amino acids that differ substantially from each other in stereochemistry and physicochemical properties. A much greater distinction between the codon usage should be observed when a codon out of this group is substituted into the stop translation signal, whereas other codons in this group are mutated into codons encoding the same amino acid or showing similar properties. In these cases, the difference between probability of accepted substitutions is the largest. Actually, such assignments are present in many AGCs.
The influence of selection at the amino acid level on the 4FD codon usage is clearly related to the mutational pressure acting on the codons and generating a specific nucleotide stationary distribution. The pressures that produce the high frequency of guanine or cytosine decrease the 4FD codon bias. Most 4FD codons are also GC-rich, which can indicate that the mutational pressures generating more G and C hardly change the usage of these codons under the applied selection model. In turn, the pressures generating more thymine or adenine at the expense of guanine and cytosine contribute to the largest difference in the usage of 4FD codons. Thus, these pressures can change the usage of these codons to a greater extent due to a higher rate of some substitutions, e.g., G→A and C→T. It should be emphasized that the diversified usage of 4FD codons is produced after the influence of the selection pressure associated with differences in physicochemical properties of the coded amino acids but not only the pure mutational pressure. The two processes, the mutations and the diverse selection of the codon substitutions, can together generate the high deviation in the usage of 4FD codons. Bearing that in mind, the calculated deviation in codon usage F π was normalized by the codon usage generated only by the mutations.
The comparison of 4FD codon usage bias in the real gene sequences with the theoretical estimations demonstrates that the effect of selection at the amino acid level could explain the observed codon usage bias and should not be disregarded. This effect can constitute even more than one-third of the total 4FD codon usage. Our results also have interesting evolutionary consequences and imply that combinations of various mutational pressures and alternative genetic codes can increase and decrease the 4FD codons’ usage. The effect of selection at the amino acid usage is most visible under the mutations generating the high A + T content. Therefore, this influence should be enhanced in AT-rich genomes, which are present in bacterial and eukaryotic endosymbionts and intracellular parasites [71,72,73,74,75,76,77] and most organelles [78,79]. It was also found that even in free-living bacteria, there is a tendency to generate more adenine and thymine [73,80]. This means that the deviation in 4FD codon usage due to the amino acid selection can be more common than one can assume.
The selection at the amino acid level on codon usage can disturb and interfere with the effects of other factors influencing codon usage, e.g., selection on translational speed and efficiency. This implies that cellular systems involved in protein translation should be modified and adapt with the change in mutational pressure and reassignments in the genetic code. The greater deviation of 4FD codon usage in AGCs than the SGC suggests that the translation systems under the alternative genetic systems needs to undergo a major reorganization, if the accuracy and effectiveness of protein synthesis and folding are maintained at the same level.
Codon usage is important for genomes of viruses, which utilize the translational machinery of their host and have to adapt the usage in order to optimize the synthesis of their proteins. The higher similarity of viral codon usage to the host can facilitate the replication of these pathogenic agents. The usage is similar to specific host genes [81,82] and is visible especially in viruses infecting a narrow spectrum of hosts [83,84]. Therefore, dynamic changes in codon usage in the hosts, e.g., due to changes in mutational pressure, can protect them against infectious agents. Then, the speed and efficiency of viral protein synthesis can be lower and worse. Thereby, the hosts can avoid or diminish the infection.
The findings presenting here indicate that the effect of amino acid selection on the codon bias can occur under various coding systems and cannot be neglected and should be elevated under the mutational pressure generating more AT than GC. Disregarding this effect can mislead that only other factors influence the synonymous codon usage. For example, one can overestimate the number of protein-coding genes whose codon usage is subjected to selection on the translational efficiency. It is not inconceivable that the selection at the amino acid level is a primary cause of synonymous codon usage, and other selectional forces have to fit to it. For example, the concentration of tRNA isoacceptors should be adapted to the number of recognized codons in protein-coding sequences, which can be modified by the selection at the amino acid level. When many tRNA isoacceptors recognizing the most frequent synonymous codons are produced, the translational elongation is more effective [19,85].

4. Materials and Methods

We included in the study all 23 AGCs that encode 20 canonical amino acids and differ from the SGC in at least one assignment of amino acids and/or stop to codons (Table 4). At the same time, these codes contain the same 4FD codons for Ala, Gly, Pro, Thr and Val as the SGC. The codes were compiled by Andrzej (Anjay) Elzanowski and Jim Ostell at the National Center for Biotechnology Information (NCBI), Bethesda, MD, USA. Detailed information about these coding systems are available at the NCBI database (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), accessed on 10 November 2022.
The applied methodology was based on that from [33]. Briefly, to detect the influence of selection at the amino acid level on the usage of 4FD codons for Ala, Gly, Pro, Thr and Val, we compared the usage produced by various mutational pressures with that subjected also to selectional constraints.
The effect of mutations was induced by homogeneous, stationary and continuous-time Markov processes characterized by 88 , 560 different stationary distributions of four nucleotides A, C, G and T. The stationary distributions of these mutational pressures, i.e., the frequencies of particular nucleotide, ranged from 0.05 to 0.85 with 0.01 increments. For each distribution, we generated rate matrices under an unrestricted (UNREST) model of nucleotide substitutions [86]. After uniformization of the rate matrices to transition probability matrices of nucleotides, we calculated mutational transition probability matrices for codons.
The selection pressure was described by an amino acid acceptance matrix based on Grantham’s distance matrix depending on three physicochemical properties of amino acids: composition, polarity and molecular volume [67]. The values in this matrix were converted into acceptance probabilities. In the case of substitutions involving stop translation signal, we assumed the lowest possible probability of acceptance in the derived acceptance matrix. Including the mutational and selectional components in one matrix, we also calculated mutational–selectional transition probability matrices for codons.
However, it is well-known that there are infinitely many Markov processes for a given stationary distribution. Therefore, we applied a modified version of the Evolutionary Strategies approach [87] in order to detect the maximum differences between the relative 4FD codon usage induced by a given Markov process with and without the effect of selection. Thus, for each mutational process acting together with selection, we calculated the selection strength F π in two steps. Firstly, we calculated the normalized difference between the relative frequency of 4FD codons after the selection and their expected frequency resulting only from a mutation process:
F π | s = i A , T , G , C | π i π s i s e l π s s e l | π i ,
where π i is a stationary frequency of i nucleotide; s denotes a group of 4FD codons assigned to a specific amino acid; s i is a selected codon belonging to s group, where i nucleotide occurs at the third codon position. Thus, the measure F π | s describes the selection strength in the case of single codon group s. Based on that, the selection strength for all 4FD codon groups was obtained:
F π = s S F π | s ,
where S is the set of all considered codon groups.
Moreover, we calculated a parameter corresponding to F π for the 4FD codons in protein-coding sequences translated by appropriate genetic codes [33]. This measure F is the summarized deviation from the expectation in the codon usage for all 4FD groups in these sequences:
F = s S f s ,
where
f s = i A , T , G , C | e i o s i o s | e i
is the normalized difference between its relative frequency of 4FD codons in a group and their expected frequency; o s i is the observed frequency of the 4FD codon with a nucleotide i at the third codon position; o s is the observed frequency of all codons in the group; and e i is the expected frequency of the codon calculated as the average of relative frequencies of all 4FD codons with a nucleotide i at the third codon position.
The deviations F from the expectation in the 4FD codon usage for protein genes translated by the SGC in 4879 bacteria were taken from [33], whereas the deviations F for protein genes translated by AGCs were calculated based on the codon usage published by [68] for codes 27 and 28 or derived from sequences of these genes collected from the NCBI GenBank database (https://www.ncbi.nlm.nih.gov/), accessed on 16 April 2022. in the case of other codes. In the analysis, we considered only these codes for which we were able to collect at least 12 protein-coding sequences. Code 30 was excluded from this analysis because we found only two sequences for it. In total, we gathered 637,066 sequences for the studied AGCs.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms24021185/s1.

Author Contributions

Conceptualization, P.B. and P.M.; methodology, P.B.; software, P.B. and K.P.; validation, P.M., D.M. and K.P.; formal analysis, P.B., K.P., P.M. and D.M.; investigation, P.B., K.P. and P.M.; writing, P.M., P.B. and D.M. All authors have read and agreed to the published version of the manuscript.

Funding

Some computations were carried out at the Wrocław Centre for Networking and Supercomputing under Grant No. 552 (granted to K.P.).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The computations were conducted using C++ programming language. All source codes and raw data relevant to our investigations were included in Supplementary Materials.

Acknowledgments

We are grateful for insightful comments of three Reviewers. Their remarks greatly improved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sharp, P.M.; Li, W.H. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 1986, 24, 28–38. [Google Scholar] [CrossRef] [PubMed]
  2. Plotkin, J.B.; Kudla, G. Synonymous but not the same: The causes and consequences of codon bias. Nat. Rev. Genet. 2011, 12, 32–42. [Google Scholar] [CrossRef] [Green Version]
  3. Iriarte, A.; Lamolle, G.; Musto, H. Codon usage bias: An endless tale. J. Mol. Evol. 2021, 89, 589–593. [Google Scholar] [CrossRef] [PubMed]
  4. Parvathy, S.T.; Udayasuriyan, V.; Bhadana, V. Codon usage bias. Mol. Biol. Rep. 2022, 49, 539–565. [Google Scholar] [CrossRef] [PubMed]
  5. Knight, R.D.; Freeland, S.J.; Landweber, L.F. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2001, 2, RESEARCH0010. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Chen, S.L.; Lee, W.; Hottes, A.K.; Shapiro, L.; McAdams, H.H. Codon usage between genomes is constrained by genome-wide mutational processes. Proc. Natl. Acad. Sci. USA 2004, 101, 3480–3485. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Li, J.; Zhou, J.; Wu, Y.; Yang, S.; Tian, D. GC-content of synonymous codons profoundly influences amino acid usage. G3-Genes Genomes Genet. (Bethesda Md.) 2015, 5, 2027–2036. [Google Scholar] [CrossRef] [Green Version]
  8. Fedorov, A.; Saxonov, S.; Gilbert, W. Regularities of context-dependent codon bias in eukaryotic genes. Nucleic Acids Res. 2002, 30, 1192–1197. [Google Scholar] [CrossRef] [Green Version]
  9. Scaiewicz, V.; Sabbía, V.; Piovani, R.; Musto, H. CpG islands are the second main factor shaping codon usage in human genes. Biochem. Biophys. Res. Commun. 2006, 343, 1257–1261. [Google Scholar] [CrossRef]
  10. Lafay, B.; Lloyd, A.T.; McLean, M.J.; Devine, K.M.; Sharp, P.M.; Wolfe, K.H. Proteome composition and codon usage in spirochaetes: Species-specific and DNA strand-specific mutational biases. Nucleic Acids Res. 1999, 27, 1642–1649. [Google Scholar] [CrossRef]
  11. Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Dudek, M.R.; Cebrat, S. How does replication-associated mutational pressure influence amino acid composition of proteins? Genome Res. 1999, 9, 409–416. [Google Scholar] [CrossRef]
  12. Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Szczepanik, D.; Dudek, M.R.; Cebrat, S. Mechanisms generating long-range correlation in nucleotide composition of the Borrelia burgdorferi genome. Phys. A 1999, 273, 103–115. [Google Scholar] [CrossRef]
  13. Rocha, E.P.; Danchin, A.; Viari, A. Universal replication biases in bacteria. Mol. Microbiol. 1999, 32, 11–16. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Qu, H.; Wu, H.; Zhang, T.; Zhang, Z.; Hu, S.; Yu, J. Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes. Res. Microbiol. 2010, 161, 838–846. [Google Scholar] [CrossRef] [PubMed]
  15. Bennetzen, J.L.; Hall, B.D. Codon selection in yeast. J. Biol. Chem. 1982, 257, 3026–3031. [Google Scholar] [CrossRef]
  16. Ikemura, T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 1985, 2, 13–34. [Google Scholar] [CrossRef]
  17. Duret, L.; Mouchiroud, D. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 1999, 96, 4482–4487. [Google Scholar] [CrossRef] [Green Version]
  18. Goetz, R.M.; Fuglsang, A. Correlation of codon bias measures with mRNA levels: Analysis of transcriptome data from Escherichia coli. Biochem. Biophys. Res. Commun. 2005, 327, 4–7. [Google Scholar] [CrossRef]
  19. Kanaya, S.; Yamada, Y.; Kudo, Y.; Ikemura, T. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: Gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 1999, 238, 143–155. [Google Scholar] [CrossRef]
  20. Rocha, E.P.C. Codon usage bias from tRNA’s point of view: Redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 2004, 14, 2279–2286. [Google Scholar] [CrossRef]
  21. Cannarrozzi, G.; Schraudolph, N.N.; Faty, M.; von Rohr, P.; Friberg, M.T.; Roth, A.C.; Gonnet, P.; Gonnet, G.; Barral, Y. A role for codon order in translation dynamics. Cell 2010, 141, 355–367. [Google Scholar] [CrossRef] [Green Version]
  22. Supek, F.; Skunca, N.; Repar, J.; Vlahovicek, K.; Smuc, T. Translational selection is ubiquitous in prokaryotes. PLoS Genet. 2010, 6, e1001004. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  23. Gu, W.; Zhou, T.; Wilke, C.O. A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput. Biol. 2010, 6, e1000664. [Google Scholar] [CrossRef] [Green Version]
  24. Oresic, M.; Shalloway, D. Specific correlations between relative synonymous codon usage and protein secondary structure. J. Mol. Biol. 1998, 281, 31–48. [Google Scholar] [CrossRef] [PubMed]
  25. Quax, T.E.F.; Claassens, N.J.; Söll, D.; van der Oost, J. Codon bias as a means to fine-tune gene expression. Mol. Cell 2015, 59, 149–161. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  26. Liu, Y. A code within the genetic code: Codon usage regulates co-translational protein folding. Cell Commun. Signal. 2020, 18, 145. [Google Scholar] [CrossRef]
  27. Tuller, T.; Carmi, A.; Vestsigian, K.; Navon, S.; Dorfan, Y.; Zaborske, J.; Pan, T.; Dahan, O.; Furman, I.; Pilpel, Y. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 2010, 141, 344–354. [Google Scholar] [CrossRef] [Green Version]
  28. Kudla, G.; Lipinski, L.; Caffin, F.; Helwak, A.; Zylicz, M. High guanine and cytosine content increases mRNA levels in mammalian cells. PLoS Biol. 2006, 4, e180. [Google Scholar] [CrossRef]
  29. Zhou, Z.; Dang, Y.; Zhou, M.; Li, L.; Yu, C.H.; Fu, J.; Chen, S.; Liu, Y. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc. Natl. Acad. Sci. USA 2016, 113, E6117–E6125. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  30. Zhou, Z.; Dang, Y.; Zhou, M.; Yuan, H.; Liu, Y. Codon usage biases co-evolve with transcription termination machinery to suppress premature cleavage and polyadenylation. eLife 2018, 7, e33569. [Google Scholar] [CrossRef]
  31. Presnyak, V.; Alhusaini, N.; Chen, Y.H.; Martin, S.; Morris, N.; Kline, N.; Olson, S.; Weinberg, D.; Baker, K.E.; Graveley, B.R.; et al. Codon optimality is a major determinant of mRNA stability. Cell 2015, 160, 1111–1124. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Morton, B.R. Selection at the amino acid level can influence synonymous codon usage: Implications for the study of codon adaptation in plastid genes. Genetics 2001, 159, 347–358. [Google Scholar] [CrossRef] [PubMed]
  33. Błażej, P.; Mackiewicz, D.; Wnetrzak, M.; Mackiewicz, P. The impact of selection at the amino acid level on the usage of synonymous codons. G3-Genes Genomes Genet. (Bethesda Md.) 2017, 7, 967–981. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  34. Osawa, S.; Ohama, T.; Jukes, T.H.; Watanabe, K. Evolution of the mitochondrial genetic code. I. Origin of AGR serine and stop codons in metazoan mitochondria. J. Mol. Evol. 1989, 29, 202–207. [Google Scholar] [CrossRef]
  35. Boore, J.L.; Brown, W.M. Complete DNA sequence of the mitochondrial genome of the black chiton, Katharina tunicata. Genetics 1994, 138, 423–443. [Google Scholar] [CrossRef]
  36. Clark-Walker, G.D.; Weiller, G.F. The structure of the small mitochondrial DNA of Kluyveromyces thermotolerans is likely to reflect the ancestral gene order in fungi. J. Mol. Evol. 1994, 38, 593–601. [Google Scholar] [CrossRef]
  37. Knight, R.D.; Landweber, L.F.; Yarus, M. How mitochondria redefine the code. J. Mol. Evol. 2001, 53, 299–313. [Google Scholar] [CrossRef]
  38. Yokobori, S.; Watanabe, Y.; Oshima, T. Mitochondrial genome of Ciona savignyi (Urochordata, Ascidiacea, Enterogona): Comparison of gene arrangement and tRNA genes with Halocynthia roretzi mitochondrial genome. J. Mol. Evol. 2003, 57, 574–587. [Google Scholar] [CrossRef]
  39. Abascal, F.; Posada, D.; Zardoya, R. The evolution of the mitochondrial genetic code in arthropods revisited. Mitochondrial DNA 2012, 23, 84–91. [Google Scholar] [CrossRef] [Green Version]
  40. Lang-Unnasch, N.; Aiello, D.P. Sequence evidence for an altered genetic code in the Neospora caninum plastid. Int. J. Parasitol. 1999, 29, 1557–1562. [Google Scholar] [CrossRef]
  41. Janouskovec, J.; Sobotka, R.; Lai, D.H.; Flegontov, P.; Konik, P.; Komenda, J.; Ali, S.; Prasil, O.; Pain, A.; Obornik, M.; et al. Split photosystem protein, linear-mapping topology, and growth of structural complexity in the plastid genome of Chromera velia. Mol. Biol. Evol. 2013, 30, 2447–2462. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  42. Del Cortona, A.; Leliaert, F.; Bogaert, K.A.; Turmel, M.; Boedeker, C.; Janouskovec, J.; Lopez-Bautista, J.M.; Verbruggen, H.; Vandepoele, K.; De Clerck, O. The plastid genome in Cladophorales green algae is encoded by hairpin chromosomes. Curr. Biol. 2017, 27, 3771–3782.e6. [Google Scholar] [CrossRef] [Green Version]
  43. Lim, P.O.; Sears, B.B. Evolutionary relationships of a plant-pathogenic mycoplasmalike organism and Acholeplasma laidlawii deduced from two ribosomal protein gene sequences. J. Bacteriol. 1992, 174, 2606–2611. [Google Scholar] [CrossRef] [Green Version]
  44. Bove, J.M. Molecular features of mollicutes. Clin. Infect. Dis. 1993, 17 (Suppl. 1), S10–S31. [Google Scholar] [CrossRef] [PubMed]
  45. McCutcheon, J.P.; McDonald, B.R.; Moran, N.A. Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genet. 2009, 5, e1000565. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Campbell, J.H.; O’Donoghue, P.; Campbell, A.G.; Schwientek, P.; Sczyrba, A.; Woyke, T.; Soll, D.; Podar, M. UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota. Proc. Natl. Acad. Sci. USA 2013, 110, 5540–5545. [Google Scholar] [CrossRef] [Green Version]
  47. Schneider, S.U.; Leible, M.B.; Yang, X.P. Strong homology between the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase of two species of Acetabularia and the occurrence of unusual codon usage. Mol. Gen. Genet. 1989, 218, 445–452. [Google Scholar] [CrossRef]
  48. Santos, M.A.S.; Keith, G.; Tuite, M.F. Non-standard translational events in Candida albicans mediated by an unusual seryl-tRNA with a 5’-CAG-3’ (leucine) anticodon. EMBO J. 1993, 12, 607–616. [Google Scholar] [CrossRef]
  49. Hoffman, D.C.; Anderson, R.C.; Dubois, M.L.; Prescott, D.M. Macronuclear gene-sized molecules of hypotrichs. Nucleic Acids Res. 1995, 23, 1279–1283. [Google Scholar] [CrossRef] [Green Version]
  50. Panek, T.; Zihala, D.; Sokol, M.; Derelle, R.; Klimes, V.; Hradilova, M.; Zadrobilkova, E.; Susko, E.; Roger, A.J.; Cepicka, I.; et al. Nuclear genetic codes with a different meaning of the UAG and the UAA codon. BMC Biol. 2017, 15, 8. [Google Scholar] [CrossRef]
  51. Sanchez-Silva, R.; Villalobo, E.; Morin, L.; Torres, A. A new noncanonical nuclear genetic code: Translation of UAA into glutamate. Curr. Biol. 2003, 13, 442–447. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  52. Heaphy, S.M.; Mariotti, M.; Gladyshev, V.N.; Atkins, J.F.; Baranov, P.V. Novel ciliate genetic code variants including the reassignment of all three stop codons to sense codons in Condylostoma magnum. Mol. Biol. Evol. 2016, 33, 2885–2889. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  53. Muhlhausen, S.; Findeisen, P.; Plessmann, U.; Urlaub, H.; Kollmar, M. A novel nuclear genetic code alteration in yeasts and the evolution of codon reassignment in eukaryotes. Genome Res. 2016, 26, 945–955. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  54. Sengupta, S.; Higgs, P.G. A unified model of codon reassignment in alternative genetic codes. Genetics 2005, 170, 831–840. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  55. Sengupta, S.; Yang, X.; Higgs, P.G. The mechanisms of codon reassignments in mitochondrial genetic codes. J. Mol. Evol. 2007, 64, 662–688. [Google Scholar] [CrossRef] [Green Version]
  56. Błażej, P.; Wnetrzak, M.; Mackiewicz, D.; Gagat, P.; Mackiewicz, P. Many alternative and theoretical genetic codes are more robust to amino acid replacements than the standard genetic code. J. Theor. Biol. 2019, 464, 21–32. [Google Scholar] [CrossRef]
  57. Błażej, P.; Wnetrzak, M.; Mackiewicz, P. The importance of changes observed in the alternative genetic codes. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies—Volume 4: BIOINFORMATICS, Funchal, Madeira, Portugal, 19–21 January 2018; pp. 154–159. [Google Scholar] [CrossRef]
  58. Novozhilov, A.S.; Wolf, Y.I.; Koonin, E.V. Evolution of the genetic code: Partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol. Direct 2007, 2, 24. [Google Scholar] [CrossRef] [Green Version]
  59. Massey, S.E. A neutral origin for error minimization in the genetic code. J. Mol. Evol. 2008, 67, 510–516. [Google Scholar] [CrossRef]
  60. Santos, M.A.S.; Gomes, A.C.; Santos, M.C.; Carreto, L.C.; Moura, G.R. The genetic code of the fungal CTG clade. C. R. Biol. 2011, 334, 607–611. [Google Scholar] [CrossRef]
  61. Błażej, P.; Wnetrzak, M.; Mackiewicz, P. The role of crossover operator in evolutionary-based approach to the problem of genetic code optimization. BioSystems 2016, 150, 61–72. [Google Scholar] [CrossRef]
  62. Santos, J.; Monteagudo, A. Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability. BMC Bioinform. 2017, 18, 195. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  63. Wnetrzak, M.; Błażej, P.; Mackiewicz, D.; Mackiewicz, P. The optimality of the standard genetic code assessed by an eight-objective evolutionary algorithm. BMC Evol. Biol. 2018, 18, 192. [Google Scholar] [CrossRef] [PubMed]
  64. Błażej, P.; Wnetrzak, M.; Mackiewicz, D.; Mackiewicz, P. Optimization of the standard genetic code according to three codon positions using an evolutionary algorithm. PLoS ONE 2018, 13, e0201715. [Google Scholar] [CrossRef] [Green Version]
  65. Błażej, P.; Wnetrzak, M.; Mackiewicz, D.; Mackiewicz, P. The influence of different types of translational inaccuracies on the genetic code structure. BMC Bioinform. 2019, 20, 114. [Google Scholar] [CrossRef] [Green Version]
  66. Wnetrzak, M.; Błażej, P.; Mackiewicz, P. Optimization of the standard genetic code in terms of two mutation types: Point mutations and frameshifts. BioSystems 2019, 181, 44–50. [Google Scholar] [CrossRef]
  67. Grantham, R. Amino acid difference formula to help explain protein evolution. Science 1974, 185, 862–864. [Google Scholar] [CrossRef]
  68. Swart, E.C.; Serra, V.; Petroni, G.; Nowacki, M. Genetic codes with no dedicated stop codon: Context-dependent translation termination. Cell 2016, 166, 691–702. [Google Scholar] [CrossRef] [Green Version]
  69. Zahonova, K.; Kostygov, A.Y.; Sevcikova, T.; Yurchenko, V.; Elias, M. An unprecedented non-canonical nuclear genetic code with all three termination codons reassigned as sense codons. Curr. Biol. 2016, 26, 2364–2369. [Google Scholar] [CrossRef] [Green Version]
  70. Li, Y.; Kocot, K.M.; Tassia, M.G.; Cannon, J.T.; Bernt, M.; Halanych, K.M. Mitogenomics reveals a novel genetic code in Hemichordata. Genome Biol. Evol. 2019, 11, 29–40. [Google Scholar] [CrossRef]
  71. Moran, N.A. Microbial minimalism: Genome reduction in bacterial pathogens. Cell 2002, 108, 583–586. [Google Scholar] [CrossRef] [Green Version]
  72. Pallen, M.J.; Wren, B.W. Bacterial pathogenomics. Nature 2007, 449, 835–842. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  73. Hershberg, R.; Petrov, D.A. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 2010, 6, e1001115. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  74. McCutcheon, J.P.; Moran, N.A. Functional convergence in reduced genomes of bacterial symbionts spanning 200 My of evolution. Genome Biol. Evol. 2010, 2, 708–718. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  75. Wernegreen, J.J. Endosymbiont evolution: Predictions from theory and surprises from genomes. Ann. N. Y. Acad. Sci. 2015, 1360, 16–35. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  76. Videvall, E. Plasmodium parasites of birds have the most AT-rich genes of eukaryotes. Microb. Genom. 2018, 4, e000150. [Google Scholar] [CrossRef]
  77. George, E.E.; Husnik, F.; Tashyreva, D.; Prokopchuk, G.; Horák, A.; Kwong, W.K.; Lukeš, J.; Keeling, P.J. Highly reduced genomes of protist endosymbionts show evolutionary convergence. Curr. Biol. 2020, 30, 925–933.e3. [Google Scholar] [CrossRef]
  78. Smith, D.R. Updating our view of organelle genome nucleotide landscape. Front. Genet. 2012, 3, 175. [Google Scholar] [CrossRef] [Green Version]
  79. Formaggioni, A.; Luchetti, A.; Plazzi, F. Mitochondrial genomic landscape: A portrait of the mitochondrial genome 40 years after the first complete sequence. Life 2021, 11, 663. [Google Scholar] [CrossRef]
  80. Hildebrand, F.; Meyer, A.; Eyre-Walker, A. Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 2010, 6, e1001107. [Google Scholar] [CrossRef]
  81. Jitobaom, K.; Phakaratsakul, S.; Sirihongthong, T.; Chotewutmontri, S.; Suriyaphol, P.; Suptawiwat, O.; Auewarakul, P. Codon usage similarity between viral and some host genes suggests a codon-specific translational regulation. Heliyon 2020, 6, e03915. [Google Scholar] [CrossRef]
  82. Wang, Q.; Lyu, X.; Cheng, J.; Fu, Y.; Lin, Y.; Abdoulaye, A.H.; Jiang, D.; Xie, J. Codon usage provides insights into the adaptive evolution of mycoviruses in their associated fungi host. Int. J. Mol. Sci. 2022, 23, 7441. [Google Scholar] [CrossRef] [PubMed]
  83. Freire, C.C.d.M.; Palmisano, G.; Braconi, C.T.; Cugola, F.R.; Russo, F.B.; Beltrão-Braga, P.C.; Iamarino, A.; Lima Neto, D.F.d.; Sall, A.A.; Rosa-Fernandes, L.; et al. NS1 codon usage adaptation to humans in pandemic Zika virus. Mem. Do Inst. Oswaldo Cruz 2018, 113, e170385. [Google Scholar] [CrossRef] [PubMed]
  84. Tian, L.; Shen, X.; Murphy, R.W.; Shen, Y. The adaptation of codon usage of +ssRNA viruses to their hosts. Infect. Genet. Evol. J. Mol. Epidemiol. Evol. Genet. Infect. Dis. 2018, 63, 175–179. [Google Scholar] [CrossRef]
  85. Xia, X. How optimized is the translational machinery in Escherichia coli, Salmonella typhimurium and Saccharomyces cerevisiae? Genetics 1998, 149, 37–44. [Google Scholar] [CrossRef]
  86. Yang, Z. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 1994, 39, 105–111. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  87. De Jong, K.; Fogel, D.B.; Schwefel, H.P. A history of evolutionary computation. In Handbook of Evolutionary Computation; Baeck, T., Fogel, D., Michalewicz, Z., Eds.; IOP Publishing Ltd.: Bristol, UK; Oxford University Press: Oxford, UK, 1997; pp. 1–12. [Google Scholar] [CrossRef]
Figure 1. The ranking of alternative genetic codes according to percent changes in the median selection strength on the 4FD codon usage F π in relation to the SGC. The genetic codes were numbered according to the NCBI database (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), accessed on 10 November 2022. See Materials and Methods for details.
Figure 1. The ranking of alternative genetic codes according to percent changes in the median selection strength on the 4FD codon usage F π in relation to the SGC. The genetic codes were numbered according to the NCBI database (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), accessed on 10 November 2022. See Materials and Methods for details.
Ijms 24 01185 g001
Figure 2. Examples of codon substitutions that can decrease the usage of fourfold degenerated codons in alternative genetic codes (AGCs) in comparison to the standard genetic code (SGC). Colors of arrows correspond to substitutions to a specific amino acid or stop translation signal. In contrast to the SGC, substitutions under AGCs generate codons encoding the same or more similar amino acids.
Figure 2. Examples of codon substitutions that can decrease the usage of fourfold degenerated codons in alternative genetic codes (AGCs) in comparison to the standard genetic code (SGC). Colors of arrows correspond to substitutions to a specific amino acid or stop translation signal. In contrast to the SGC, substitutions under AGCs generate codons encoding the same or more similar amino acids.
Ijms 24 01185 g002
Figure 3. Examples of codon substitutions that can increase the usage of fourfold degenerated codons in alternative genetic codes (AGCs) in comparison to the standard genetic code (SGC). Colors of arrows correspond to substitutions to a specific amino acid or stop translation signal. In contrast to the SGC, substitutions under AGCs generate codons encoding more different amino acids or stop translation signal.
Figure 3. Examples of codon substitutions that can increase the usage of fourfold degenerated codons in alternative genetic codes (AGCs) in comparison to the standard genetic code (SGC). Colors of arrows correspond to substitutions to a specific amino acid or stop translation signal. In contrast to the SGC, substitutions under AGCs generate codons encoding more different amino acids or stop translation signal.
Ijms 24 01185 g003
Figure 4. The rates of nucleotide substitutions in the mutational pressures (Y axes) which, together with the selection at the amino acid level, minimize (min) or maximize (max) the selection strength F π on the 4FD codon usage calculated for known genetic codes. The thick line indicates median, the box shows quartile range and the whiskers denote the range without outliers. Substitutions statistically different with p < 0.05 between the compared groups are indicated in red.
Figure 4. The rates of nucleotide substitutions in the mutational pressures (Y axes) which, together with the selection at the amino acid level, minimize (min) or maximize (max) the selection strength F π on the 4FD codon usage calculated for known genetic codes. The thick line indicates median, the box shows quartile range and the whiskers denote the range without outliers. Substitutions statistically different with p < 0.05 between the compared groups are indicated in red.
Ijms 24 01185 g004
Figure 5. The relationship between the selection strength on codon usage F π and the stationary nucleotide composition of adenine (A) and thymine (T), as well as guanine (G) and cytosine (C), produced by mutational pressures, which, together with the selection at the amino acid level, generate the minimum and maximum values of F π . Labels at points in the plots indicate individual genetic codes for which these values were calculated.
Figure 5. The relationship between the selection strength on codon usage F π and the stationary nucleotide composition of adenine (A) and thymine (T), as well as guanine (G) and cytosine (C), produced by mutational pressures, which, together with the selection at the amino acid level, generate the minimum and maximum values of F π . Labels at points in the plots indicate individual genetic codes for which these values were calculated.
Ijms 24 01185 g005
Figure 6. The result of correspondence analysis for stationary nucleotide compositions generated by mutational pressures which, together with the selection at the amino acid level, generate the extreme variation in the 4FD codon usage. Numbers indicate individual genetic codes under which the usage was calculated. Ala, Gly, Pro, Thr and Val are names of amino acids coded by the 4FD codon groups.
Figure 6. The result of correspondence analysis for stationary nucleotide compositions generated by mutational pressures which, together with the selection at the amino acid level, generate the extreme variation in the 4FD codon usage. Numbers indicate individual genetic codes under which the usage was calculated. Ala, Gly, Pro, Thr and Val are names of amino acids coded by the 4FD codon groups.
Ijms 24 01185 g006
Figure 7. The relationship between the median of ( F π ) values and the sum of stationary frequencies of adenine (A) and thymine (T) calculated for each genetic code separately. The median was derived from m a x ( F π ) values that were obtained from substitution models generating the given stationary nucleotide distribution. The lines are the best approximation based on generalized additive models with integrated smoothness estimation.
Figure 7. The relationship between the median of ( F π ) values and the sum of stationary frequencies of adenine (A) and thymine (T) calculated for each genetic code separately. The median was derived from m a x ( F π ) values that were obtained from substitution models generating the given stationary nucleotide distribution. The lines are the best approximation based on generalized additive models with integrated smoothness estimation.
Ijms 24 01185 g007
Figure 8. The deviation from the expectation in the codon usage for the 4FD codon groups calculated for protein-coding sequences translated by the standard genetic code (PCG SGC) and alternative genetic codes (PCG AGCs) in comparison to the selection strength on the 4FD codon usage calculated for the theoretical mutational pressures under the standard genetic code ( F π SGC) and alternative genetic codes ( F π ACGs). The thick line indicates median, the box shows quartile range and the whiskers denote the range without outliers.
Figure 8. The deviation from the expectation in the codon usage for the 4FD codon groups calculated for protein-coding sequences translated by the standard genetic code (PCG SGC) and alternative genetic codes (PCG AGCs) in comparison to the selection strength on the 4FD codon usage calculated for the theoretical mutational pressures under the standard genetic code ( F π SGC) and alternative genetic codes ( F π ACGs). The thick line indicates median, the box shows quartile range and the whiskers denote the range without outliers.
Ijms 24 01185 g008
Table 1. The descriptive statistics: the minimum ( m i n ( F π ) ), the median ( m e ( F π ) ) and the maximum ( m a x ( F π ) ) of the selection strength on the 4FD codon usage ( F π ) computed over 88,560 mutational pressures. Corresponding percent changes in relation to the SGC are also shown (%). The genetic codes were numbered according to the NCBI database (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), accessed on 10 November 2022. See Materials and Methods for details.
Table 1. The descriptive statistics: the minimum ( m i n ( F π ) ), the median ( m e ( F π ) ) and the maximum ( m a x ( F π ) ) of the selection strength on the 4FD codon usage ( F π ) computed over 88,560 mutational pressures. Corresponding percent changes in relation to the SGC are also shown (%). The genetic codes were numbered according to the NCBI database (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), accessed on 10 November 2022. See Materials and Methods for details.
Code min ( F π ) me ( F π ) max ( F π ) % min ( F π ) % me ( F π ) % max ( F π )
1 (SGC)0.2451.6969.0160.00.00.0
20.431.96810.4575.416.015.9
30.2661.7127.9838.70.9−11.5
40.241.6277.855−2.0−4.1−12.9
50.2321.5657.264−5.4−7.7−19.4
60.2711.7348.92610.52.2−1.0
90.2221.3376.208−9.3−21.2−31.1
100.2491.5378.3411.6−9.4−7.5
120.3171.919.55629.312.66.0
130.331.7597.67134.73.7−14.9
140.2271.3416.176−7.4−20.9−31.5
160.2971.6748.34421.2−1.3−7.5
210.2031.3716.758−17.0−19.2−25.0
220.3762.32211.3453.536.925.8
230.2741.98511.4112.017.026.5
240.2331.5887.728−5.0−6.4−14.3
250.2431.7269.002−0.81.7−0.2
260.4061.8949.73765.711.78.0
270.2621.739.0527.02.00.4
280.2621.739.0527.02.00.4
290.2621.5378.2427.0−9.4−8.6
300.2381.7088.831−2.90.7−2.1
310.2561.7087.7424.60.7−14.1
330.2481.6087.421.3−5.2−17.7
Table 2. The stationary distributions of four nucleotides generated by mutational pressures, which, together with the selection at the amino acid level, produce the minimum values of selection strength F π .
Table 2. The stationary distributions of four nucleotides generated by mutational pressures, which, together with the selection at the amino acid level, produce the minimum values of selection strength F π .
CodeATCG min ( F π )
1 (SGC)0.050.070.20.680.245
20.050.120.660.170.430
30.050.050.210.690.266
40.050.060.170.720.240
50.050.060.70.190.232
60.050.060.170.720.271
90.050.070.310.570.222
100.050.090.290.570.249
120.050.070.710.170.317
130.050.070.710.170.330
140.050.050.210.690.227
160.060.050.230.660.297
210.050.060.680.210.203
220.060.070.410.460.376
230.050.080.290.580.274
240.060.090.410.440.233
250.050.060.20.690.243
260.070.130.320.480.406
270.050.090.210.650.262
280.050.090.210.650.262
290.050.130.260.560.262
300.050.050.150.750.238
310.050.050.830.070.256
330.050.050.160.740.248
Table 3. The stationary distributions of four nucleotides generated by mutational pressures, which, together with the selection at the amino acid level, produce the maximum values of selection strength F π .
Table 3. The stationary distributions of four nucleotides generated by mutational pressures, which, together with the selection at the amino acid level, produce the maximum values of selection strength F π .
CodeATCG max ( F π )
1 (SGC)0.840.050.050.069.01596
20.810.070.060.0610.445
30.150.730.060.067.98304
40.220.670.050.067.85467
50.170.710.050.077.26361
60.110.770.050.078.92571
90.110.770.050.076.2084
100.090.80.050.068.34086
120.070.810.060.069.55644
130.070.820.050.067.67111
140.220.670.050.066.17645
160.820.070.050.068.34363
210.120.770.050.066.75813
220.840.050.050.0611.3414
230.840.050.050.0611.4083
240.090.80.050.067.72787
250.780.110.050.069.0016
260.090.80.050.069.73709
270.210.680.050.069.05196
280.210.680.050.069.05196
290.190.690.060.068.24171
300.840.050.050.068.83112
310.10.790.050.067.74191
330.140.750.050.067.41951
Table 4. The genetic codes tested in the study. They are numbered according to the NCBI database (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), accessed on 10 November 2022, where you can find more detailed information about these coding systems.
Table 4. The genetic codes tested in the study. They are numbered according to the NCBI database (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), accessed on 10 November 2022, where you can find more detailed information about these coding systems.
NumberCode Description
1Standard
2Vertebrate Mitochondrial
3Yeast Mitochondrial
4Mold, Protozoan and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
5Invertebrate Mitochondrial
6Ciliate, Dasycladacean and Hexamita Nuclear
9Echinoderm and Flatworm Mitochondrial
10Euplotid Nuclear
12Alternative Yeast Nuclear
13Ascidian Mitochondrial
14Alternative Flatworm Mitochondrial
16Chlorophycean Mitochondrial
21Trematode Mitochondrial
22Scenedesmus obliquus Mitochondrial
23Thraustochytrium Mitochondrial
24Rhabdopleuridae Mitochondrial
25Candidate Division SR1 and Gracilibacteria
26Pachysolen tannophilus Nuclear
27Karyorelict Nuclear
28Condylostoma Nuclear
29Mesodinium Nuclear
30Peritrich Nuclear
31Blastocrithidia Nuclear
33Cephalodiscidae Mitochondrial UAA-Tyr
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pawlak, K.; Błażej, P.; Mackiewicz, D.; Mackiewicz, P. The Influence of the Selection at the Amino Acid Level on Synonymous Codon Usage from the Viewpoint of Alternative Genetic Codes. Int. J. Mol. Sci. 2023, 24, 1185. https://doi.org/10.3390/ijms24021185

AMA Style

Pawlak K, Błażej P, Mackiewicz D, Mackiewicz P. The Influence of the Selection at the Amino Acid Level on Synonymous Codon Usage from the Viewpoint of Alternative Genetic Codes. International Journal of Molecular Sciences. 2023; 24(2):1185. https://doi.org/10.3390/ijms24021185

Chicago/Turabian Style

Pawlak, Konrad, Paweł Błażej, Dorota Mackiewicz, and Paweł Mackiewicz. 2023. "The Influence of the Selection at the Amino Acid Level on Synonymous Codon Usage from the Viewpoint of Alternative Genetic Codes" International Journal of Molecular Sciences 24, no. 2: 1185. https://doi.org/10.3390/ijms24021185

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop