Article

Formation of the Codon Degeneracy during Interdependent Development between Metabolism and Replication

Ministry of Education Key Laboratory for Non-Equilibrium Synthesis and Modulation of Condensed Matter, Shaanxi Province Key Laboratory of Advanced Functional Materials and Mesoscopic Physics, School of Physics, Xi’an Jiaotong University, Xi’an 710049, China
Genes 2021, 12(12), 2023; https://doi.org/10.3390/genes12122023
Submission received: 3 November 2021 / Revised: 30 November 2021 / Accepted: 3 December 2021 / Published: 20 December 2021
(This article belongs to the Section Molecular Genetics and Genomics)

Abstract

Nirenberg’s genetic code chart shows a profound correspondence between codons and amino acids. This article attempts to explain the primordial formation of the codon degeneracy. How informative molecules arose from the supposed random prebiotic sequences remains a puzzle. If an initial driving force based on the relative stabilities of triplex base pairs is introduced, prebiotic sequence evolution becomes innately non-random. On this basis, the primordial assignment of the 64 codons to the 20 amino acids is explained in detail through base substitutions during the coevolution of tRNAs with aaRSs; the classification of aaRSs is explained as well.

1. Introduction

The origin of the genetic code is difficult to study because no key experiment can reproduce the primordial scenario of the evolution of life. Debate about the nature of life makes it hard even to reach a consensus on a definition of life. Pragmatically, a few well-established and enlightening observations need to be put together to gain insight into the transition from non-living to living phenomena. Wong argued that Phase I amino acids appeared earlier than Phase II amino acids in prebiotic evolution [1,2]. Pouplana and Schimmel treat aminoacyl-tRNA synthetases (aaRSs) as clues to the establishment of the genetic code [3,4]. Woese divided cellular life into three domains [5], which helps in comprehending the last universal common ancestor (LUCA). In addition, a living JCVI-syn1.0 cell has been created by combining a cytoplasm without natural DNA and a chemically synthesised chromosome carrying Venter’s watermark [6]. Nevertheless, as explained next, potential contradictions appear among these few observations, which urges us to be serious in collecting experimental observations and extremely cautious in interpreting them.
Pouplana and Schimmel did not take the above two phases of amino acids into account. By comparing sequences and structures, the 20 aaRSs are divided into two distinct classes, each of which is subdivided into three subclasses. To interpret the symmetrical subclasses between the two classes, Pouplana and Schimmel assumed simultaneous association of two aaRSs on a single tRNA, where the aaRS pairs IleRS (subclass Ia) and ThrRS (IIa), GlnRS (Ib) and AspRS (IIb), and TyrRS (Ic) and PheRS (IIc) can cover the tRNA acceptor stem without major steric clashes and, meanwhile, link the specific subclasses together. However, Gln, a Phase II amino acid, was recruited much later than Asp, a Phase I amino acid [7,8]. The simultaneous association of GlnRS and AspRS on a single tRNA is therefore questionable.
The creation of JCVI-syn1.0 is quite different from the primordial picture of the supposed LUCA: the former was synthesised rapidly, while the latter evolved over a long period. Moreover, a new cell can be recreated at any time at the J. Craig Venter Institute if a synthesised cell dies, whereas the evolution of life would have halted if LUCA had died. Life is a phenomenon rather than eternal matter, and it emerged from the interdependent development of metabolism and replication. JCVI-syn1.0 was created by combining a cytoplasm without natural DNA and a chemically synthesised chromosome, and it acquired life by integrating metabolism in the cytoplasm with replication of the chromosome, so JCVI-syn1.0 is a “cell without living parents” (CwoP). The apparatuses at the institute play the role of “‘the Hand that feeds you’ for CwoPs” (HfC). In such an “HfC-CwoP” mechanism, the long-lasting non-living HfC can rapidly create various ephemeral living CwoPs at any time. The successful creation of JCVI-syn1.0 is a contemporary transition from non-living to living phenomena, whose mechanism can shed light on the prebiotic transition from non-living to living phenomena. The notion of LUCA follows Darwin’s vague idea of a common ancestor. Many popular theories do not explain how a simple LUCA could have remained viable over a geologically long period. In fact, a living LUCA alone could hardly have survived such a period. However, an “HfC-CwoP”-like mechanism can bridge the gap between the non-living and the living by continually generating viable systems, where an auxiliary non-living HfC evolved over the geologically long period and created various living CwoPs at any appropriate time. It is logical to assume that the molecular ancestors, at different times, had the same chemical characteristics given the same source of primordial matter. Thus, the death of an individual CwoP and the extinction of its offspring could not interrupt the evolution of life.
When numerous CwoPs and their offsprings gathered and formed into a continuously evolving ecosystem, the HfC-like apparatus stepped down and vanished.
Nirenberg’s genetic code chart [9] revealed a profound correspondence between amino acids and codons. How the codon degeneracy formed is still a mystery. If an HfC-like apparatus is introduced into the coevolution of tRNAs with aaRSs, the symmetrical subclasses of aaRSs can also be explained without the redundant complex of two aaRSs on a single tRNA described above. The crux in the coevolution of aaRSs and tRNAs is how non-random sequences were continually generated and maintained over a geologically long period. In this paper, I propose that certain primordial polymer molecules played the role of HfC in prebiotic evolution. Triple-helical nucleic acids provide a possible picture. Non-random sequences were generated routinely through triplex base substitutions, in which less stable triplex base pairs were spontaneously replaced by more stable ones [10,11]. Numerous non-random DNA sequences were generated, along which aaRSs and tRNAs coevolved. Remarkably, the evolution of complementary strands also accounts for the symmetrical subclasses of aaRSs. This process is intricate but elegant; it accounts for the codon degeneracy, the two classes of aaRSs, and their symmetrical subclasses.
The interesting properties of triplex base pairs in triple-helical nucleic acids have often been ignored in the field of the origin of life. In experiments, homopurine and homopyrimidine strands tend to form triple helices [10,12,13] in addition to double helices. It is not unreasonable to introduce triplex nucleic acids into prebiotic evolution, considering the substantial role of triplex DNAs within recA fibers in the fundamental process of recombination. Furthermore, the triplex base pairs that remain in tRNAs [14] indicate that the origin of tRNAs depended on the stability of triplex base pairs, which supports the plausibility of this scenario. The prebiotic reciprocal impact between the HfC-like triplex DNAs and the CwoP-like systems with an evolving genetic code is essential for the emergence of life: the evolution of prebiotic informative molecules was driven by the triplex DNAs, whose evolution in turn needed the help of prebiotic informative molecules. This is analogous to the reciprocal impact between Wilkinson’s boring machine and Watt’s steam engine in the industrial revolution, where the boring machine was driven by a steam engine whose improved cylinder in turn had to be bored by a boring machine. In such an interim scenario, the “chicken-egg” problem becomes an opportunity rather than a dilemma that the RNA world theory tends to avoid.
The synthesis of adenine from hydrogen cyanide by Oro initiated the prebiotic chemistry of nucleic acids [2,15]. The HCN tetramer polymerises to an intractable solid, from which adenine and guanine can be recovered [16]. There are also HCN-independent routes of purine synthesis [17]. Cyanoacetylene, formed by electric discharge through nitrogen and methane, reacts with cyanic acid to give cytosine, and hydrolysis of cytosine yields uracil [16,18,19]. Progress in the prebiotic synthesis of the pyrimidine ribonucleosides [20], together with recent advances in non-enzymatic RNA replication [21], has given credence to the RNA world theory. So far, however, the abiotic synthesis of purine nucleosides has relied on disputable starting materials [22]. The nature of the first genetic polymer is the subject of major debate. A prebiotic scenario for the coexistence and coevolution of RNA and DNA has been investigated [23,24]. Synthesis under prebiotic conditions gives credence to the idea that DNA could have appeared concurrently with RNA, instead of being its later descendant [25]. Purine deoxyribonucleosides and pyrimidine ribonucleosides may have coexisted before the emergence of life [26]. Recently, Xu et al. demonstrated a high-yielding, completely stereo-, regio-, and furanosyl-selective prebiotic synthesis of the purine deoxyribonucleosides, leading to a mixture of deoxyadenosine, deoxyinosine, cytidine, and uridine [27]. Considering that homopurine and homopyrimidine RNA and DNA strands tend to form triple helices [10], substitutions of triplex base pairs among prebiotic triplex nucleic acids would also have contributed to prebiotic evolution.
Although numerous theories in the literature have attempted to explain the origin of the genetic code [3,28], a candidate theoretical framework must at least be able to explain: (i) the driving force in the prebiotic sequence evolution, (ii) the degeneracies 6, 4, 3, 2, and 1 of the 20 amino acids, and (iii) the two classes of aaRSs, which recognise tRNAs from either the major or the minor groove side. These are the tasks of this article. I found that the evolution of triple-helical nucleic acids, driven by spontaneous substitutions of triplex base pairs, provides an elegant roadmap for prebiotic evolution. Accordingly, the assignment of the 64 codons to the 20 amino acids is explained one by one based on the coevolution of aaRSs and tRNAs, where the symmetrical subclasses of aaRSs need the help of palindromic para-codons. There are many profound relationships among traditionally separate fields, summarised as follows. The coevolution of tRNAs with aaRSs along the roadmap established by the relative stabilities of triplex base pairs [10,11] agrees with the observed codon degeneracy [29,30] and the two classes of aaRSs [31]. The earliest amino acids, recruited in the initiation stage of the roadmap, agree with the Phase I amino acids found in the Miller-Urey experiment [32] and in carbonaceous chondrites [33,34,35] (see also Chapter 6 in [2]). The recruitment orders of amino acids and codons on the roadmap agree with the variation trends of amino acid frequencies in proteomes [36] and with the variation of GC content by codon position [37]. The expansion of codons along the roadmap agrees with the biosynthetic families of amino acids [1]. All these agreements between predictions and observations prompted the formation of the present hypothesis on the origin of the genetic code.

2. Materials and Methods

2.1. Triplex Picture

The genetic code is a common and essential feature of life, which can be regarded as a relic of the prebiotic emergence of informative molecules. The complexity of the problem of the origin of the genetic code may exceed all the theoretical proposals, such as frozen accident, error minimisation, stereochemical interaction, amino acid biosynthesis, expanding codons, etc. [1,38,39,40,41,42,43,44,45,46,47,48]. So far, it has hardly been possible to describe the evolution of the genetic code step by step so as to explain the formation of the codon degeneracy in detail. Here, a triplex picture is proposed to describe the intricate evolution of the genetic code thoroughly, by which both the formation of the codon degeneracy and the classification of aaRSs are explained within the same theoretical framework. The complexity of the following explanation of the codon degeneracy is comparable to that of a symphony score. The simplest method of score-reading is to concentrate on an individual voice part that can be heard particularly well and then to go over to section-by-section or selective reading. Similarly, here are some suggestions for reading the following technical explanation of the codon degeneracy in the triplex picture. Please watch the Supplementary Movie S1, start from Figure 1, and then proceed to the figures on tRNAs and aaRSs, so as to understand the recruitment of the 20 amino acids during the coevolution of tRNAs with aaRSs.
Guessing the right prebiotic picture is the key to understanding the origin of the genetic code. Here, I propose a triplex picture for the prebiotic sequence evolution. There are 8 kinds of triplex nucleic acids S·S∗S (‘·’ represents a Watson-Crick base pair, while ‘∗’ represents a Hoogsteen base pair), where each of the three strands S can be either DNA or RNA [49,50,51], such as the triplex DNA D·D∗D and the triplex nucleic acid D·D∗R mixing DNA and RNA, etc. The YR∗R triplex DNA PolyC·PolyG∗PolyG is taken as the initial physical condition for the evolution of the genetic code. The 64 codons were recruited one by one as the D·D∗D sequences evolved through alternating separation and recombination of the three strands in periodically changing environments. This prebiotic sequence evolution was driven by substitutions of triplex base pairs according to their relative stabilities. The sequence evolution of D·D∗D led to the evolution of the genetic code, while the RNA strands separated from the coevolving D·D∗R yielded tRNAs and the template RNAs for aaRSs. The tRNAs and aaRSs were generated along with the recruitment of the corresponding codons. So, the triplex picture gives a physical basis for the coevolution of the genetic code with the corresponding tRNAs and aaRSs.

Nomenclature and Notation

  • Notations for the 20 amino acids, 20 aaRSs, and the corresponding tRNAs (n = 1 to 20): amino acid No. n aaRSₙ ↔ tRNA tₙ, tₙ, tₙ⁺, tₙ⁺, tₙ, tₙ, where the amino acids from No. 1 to No. 20 are, respectively, as follows: 1Gly, 2Ala, 3Glu, 4Asp, 5Val, 6Pro, 7Ser, 8Leu, 9Thr, 10Arg, 11Cys, 12Trp, 13His, 14Gln, 15Ile, 16Met, 17Phe, 18Tyr, 19Asn, 20Lys, and the aaRSₙ are, respectively, as follows: aaRS₁ (namely GlyRS), aaRS₂ (namely AlaRS), and so on.
  • Triplex DNAs (D·D∗D): YR∗R, YR∗Y and the inverse triplex DNAs: yr∗r, yr∗y, where Y, y stand for pyrimidine strands, and R, r for purine strands.
  • Triplex DNA·DNA∗RNA (D·D∗R): yr∗rₜ, yr∗yₜ, YR∗Rₜ, YR∗Yₜ, where two types of tRNAs can be generated by linking the RNA strands 5′ yₜ + rₜ 3′ or 5′ Rₜ + Yₜ 3′, and aaRSs can approach tRNAs from the major groove side (M) or the minor groove side (m).
  • Codon pairs: #1 GGG·CCC, etc.; pair connections: #1−Gly−#2, etc.; route dualities: #1−Gly−#3 ∼ #2−Gly−#6, etc., where the numbers #m (m = 1 to 32) indicate the positions on the roadmap.

2.2. Origin of the Genetic Code

2.2.1. The Roadmap

In the triplex picture, I obtained a roadmap for the evolution of the genetic code (or the roadmap for short). The validity of the roadmap depends essentially on the experimental data on triplex base pairs. The stabilities of the 16 triplex base pairs in triplex DNA are listed from unstable (−) through weak (+) to strong (++, 3+, 4+) as follows [10,11]:
(−) GC∗A, AT∗C, AT∗A; (+) CG∗G, TA∗C, TA∗A, TA∗G, GC∗C, GC∗G, AT∗T; (++) CG∗A, CG∗T, GC∗T; (3+) AT∗G; (4+) CG∗C, TA∗T.
The above stability order in experiments played a significant role in the primordial evolution of triplex DNA. The substitutions of triplex base pairs from weak to strong provided the principal driving force in the prebiotic sequence evolution.
At the beginning of the evolution of the genetic code, there existed single-stranded DNA PolyG and PolyC, which tended to form a triplex DNA (Figure 1a,b) [10,13]. PolyC·PolyG∗PolyG is a usual YR∗R triplex DNA, built from the triplex base pair CG∗G (Figure 1b and Supplementary Movie S1). The sequences evolved via substitutions of triplex base pairs as the strands of the triple-stranded DNA alternately combined and separated. Only three kinds of substitutions of triplex base pairs are required on the roadmap: (1) substitution of (+) CG∗G by (++) CG∗A [10,11], with the transition from G to A in the third R strand. This is the most common substitution on the roadmap, by which all the codons in Route 0 and most codons in Route 1–3 were recruited (Figure 1a); (2) substitution of (+) CG∗G by (4+) CG∗C, with the transversion from G to C in the third R strand, which opened a new path at #2, #7, #10 for the recruitment of codons in Route 1–3, respectively (Figure 1a); (3) substitution of (+) GC∗C by (++) GC∗T, with the transition from C to T in the third R strand at #6, #19, #12 (Figure 1a and Figure 2), by which the remaining codons in Route 1–3 were recruited (Figure 1a). Thus, all the 64 codons were recruited following the roadmap (Figure 1a, Figure 3a and Figure 4b).
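The driving-force rule can be illustrated with a minimal sketch. It encodes the stability classes listed above as integers (an assumption made purely for illustration; the source gives only the qualitative classes) and checks that each of the three substitutions used on the roadmap keeps the Watson-Crick pair, changes only the third-strand base, and moves to a more stable triplex base pair. The names `STABILITY`, `is_allowed`, and `ROADMAP_SUBSTITUTIONS` are hypothetical.

```python
# Stability classes of the 16 triplex base pairs, transcribed from the list above.
# Mapping (-), (+), (++), (3+), (4+) to -1, 1, 2, 3, 4 is an illustrative assumption.
STABILITY = {
    "GC*A": -1, "AT*C": -1, "AT*A": -1,
    "CG*G": 1, "TA*C": 1, "TA*A": 1, "TA*G": 1,
    "GC*C": 1, "GC*G": 1, "AT*T": 1,
    "CG*A": 2, "CG*T": 2, "GC*T": 2,
    "AT*G": 3,
    "CG*C": 4, "TA*T": 4,
}


def is_allowed(old: str, new: str) -> bool:
    """True if `new` has the same Watson-Crick pair as `old` (only the
    third-strand base differs) and a strictly higher stability class."""
    same_duplex = old[:2] == new[:2]
    return same_duplex and STABILITY[new] > STABILITY[old]


# The three substitutions named in the text.
ROADMAP_SUBSTITUTIONS = [("CG*G", "CG*A"), ("CG*G", "CG*C"), ("GC*C", "GC*T")]

for old, new in ROADMAP_SUBSTITUTIONS:
    print(f"{old} -> {new}: allowed = {is_allowed(old, new)}")
```

The integer ranks only preserve the ordering of the classes; they are not measured stabilities.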
According to the base substitutions on the roadmap, the recruitment order of the codon pairs from #1 to #32 is as follows (Figure 1a):
#1 GGG·CCC, #2 GGC·GCC, #3 GGA·UCC, #4 GAG·CUC, #5 GAC·GUC, #6 GGU·ACC, #7 GCG·CGC, #8 AGC·GCU, #9 GCA·UGC, #10 CGG·CCG, #11 AGG·CCU, #12 UGG·CCA, #13 CGA·UCG, #14 AGA·UCU, #15 UGA·UCA, #16 ACG·CGU, #17 AGU·ACU, #18 ACA·UGU, #19 GUG·CAC, #20 CAG·CUG, #21 GAU·AUC, #22 AUG·CAU, #23 GAA·UUC, #24 GUA·UAC, #25 UAG·CUA, #26 AAC·GUU, #27 AAG·CUU, #28 CAA·UUG, #29 AUA·UAU, #30 AAU·AUU, #31 UAA·UUA, #32 AAA·UUU;
and the recruitment order of the amino acids from No. 1 to No. 20 is as follows (Figure 1a):
No. 1 Gly, No. 2 Ala, No. 3 Glu, No. 4 Asp, No. 5 Val, No. 6 Pro, No. 7 Ser, No. 8 Leu, No. 9 Thr, No. 10 Arg, No. 11 Cys, No. 12 Trp, No. 13 His, No. 14 Gln, No. 15 Ile, No. 16 Met, No. 17 Phe, No. 18 Tyr, No. 19 Asn, No. 20 Lys.
The evolution of the genetic code can be divided into three stages (Figure 1a): the initiation stage (#1 ∼ #6), the midway stage (#7 ∼ #20, #24 ∼ #27), and the ending stage (#21 ∼ #23, #28 ∼ #32). All the amino acids recruited in the initiation stage belong to Phase I. The recruitment of amino acids along the roadmap is described step by step below, and the pair connections and route dualities on the roadmap will be explained later according to the evolution of tRNAs and aaRSs.
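As a quick consistency check on the recruitment order, the following sketch verifies that each of the 32 listed codon pairs consists of two reverse-complementary codons and that together they cover all 64 codons exactly once. The pair list is transcribed from the text; the helper names are hypothetical.

```python
# The 32 codon pairs in recruitment order (#1 to #32), transcribed from the text.
PAIR_TEXT = (
    "GGG-CCC GGC-GCC GGA-UCC GAG-CUC GAC-GUC GGU-ACC GCG-CGC AGC-GCU "
    "GCA-UGC CGG-CCG AGG-CCU UGG-CCA CGA-UCG AGA-UCU UGA-UCA ACG-CGU "
    "AGU-ACU ACA-UGU GUG-CAC CAG-CUG GAU-AUC AUG-CAU GAA-UUC GUA-UAC "
    "UAG-CUA AAC-GUU AAG-CUU CAA-UUG AUA-UAU AAU-AUU UAA-UUA AAA-UUU"
)
pairs = [tuple(item.split("-")) for item in PAIR_TEXT.split()]

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(codon: str) -> str:
    """Complement each base, then read the codon in the opposite direction."""
    return codon.translate(COMPLEMENT)[::-1]


# Every pair should be a codon and its reverse complement.
all_complementary = all(reverse_complement(a) == b for a, b in pairs)

# The 32 pairs should contain 64 distinct codons.
covered = {codon for pair in pairs for codon in pair}

print(len(pairs), all_complementary, len(covered))
```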
  • Initiation
  • step 1: 1Gly Vacant #1
  • step 2: 1Gly Vacant #1; 1Gly Vacant #2
  • step 3: 1Gly Vacant #1; 1Gly 2Ala #2
  • step 4: 1Gly Vacant #1; 1Gly 2Ala #2; 1Gly Vacant #3
  • step 5: 1Gly Vacant #1; 1Gly 2Ala #2; 1Gly Vacant #3; 3Glu Vacant #4
  • step 6: 1Gly Vacant #1; 1Gly 2Ala #2; 1Gly Vacant #3; 3Glu Vacant #4; 4Asp Vacant #5
  • step 7: 1Gly Vacant #1; 1Gly 2Ala #2; 1Gly Vacant #3; 3Glu Vacant #4; 4Asp 5Val #5
  • step 8: 1Gly 6Pro #1; 1Gly 2Ala #2; 1Gly Vacant #3; 3Glu Vacant #4; 4Asp 5Val #5
  • step 9: 1Gly 6Pro #1; 1Gly 2Ala #2; 1Gly 7Ser #3; 3Glu Vacant #4; 4Asp 5Val #5
  • step 10: 1Gly 6Pro #1; 1Gly 2Ala #2; 1Gly 7Ser #3; 3Glu 8Leu #4; 4Asp 5Val #5
  • step 11: 1Gly 6Pro #1; 1Gly 2Ala #2; 1Gly 7Ser #3; 3Glu 8Leu #4; 4Asp 5Val #5; 1Gly Vacant #6
  • step 12: 1Gly 6Pro #1; 1Gly 2Ala #2; 1Gly 7Ser #3; 3Glu 8Leu #4; 4Asp 5Val #5; 1Gly 9Thr #6
  • Midway & ending
  • step 13: (#1 ∼ #6 are fully filled by 1Gly to 9Thr, the same below for the following steps) 2Ala 10Arg #7
  • and the following steps (omitting the previously fully filled #1 ∼ #(n-1) codon pairs in step #n, from #8 to #32):
    • 7Ser 2Ala #8; 2Ala 11Cys #9; 10Arg 6Pro #10; 10Arg 6Pro #11; 12Trp 6Pro #12; 10Arg 7Ser #13; 10Arg 7Ser #14; stop 7Ser #15; 9Thr 10Arg #16; 7Ser 9Thr #17; 9Thr 11Cys #18; 5Val 13His #19; 14Gln 8Leu #20; 4Asp 15Ile #21; 16Met 13His #22; 3Glu 17Phe #23; 5Val 18Tyr #24; stop 8Leu #25; 19Asn 5Val #26; 20Lys 8Leu #27; 14Gln 8Leu #28; 15Ile 18Tyr #29; 19Asn 15Ile #30; stop 8Leu #31; 20Lys 17Phe #32.
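The final state of the step list can be cross-checked against the standard genetic code. The sketch below is an illustration rather than part of the original method: it combines the codon pairs with the amino acid pairs from the steps above (one-letter codes, `*` for stop), compares the result with the standard table, and tallies the degeneracy of each amino acid. The compact `ROADMAP` encoding and all names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order; "*" marks stop codons.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA)}

# "codon.codon=XY": codon pair #1..#32 with the amino acids assigned in the steps.
ROADMAP = """
GGG.CCC=GP GGC.GCC=GA GGA.UCC=GS GAG.CUC=EL GAC.GUC=DV GGU.ACC=GT
GCG.CGC=AR AGC.GCU=SA GCA.UGC=AC CGG.CCG=RP AGG.CCU=RP UGG.CCA=WP
CGA.UCG=RS AGA.UCU=RS UGA.UCA=*S ACG.CGU=TR AGU.ACU=ST ACA.UGU=TC
GUG.CAC=VH CAG.CUG=QL GAU.AUC=DI AUG.CAU=MH GAA.UUC=EF GUA.UAC=VY
UAG.CUA=*L AAC.GUU=NV AAG.CUU=KL CAA.UUG=QL AUA.UAU=IY AAU.AUU=NI
UAA.UUA=*L AAA.UUU=KF
""".split()

# Build the codon -> amino acid assignment implied by the roadmap.
assigned = {}
for entry in ROADMAP:
    assigned[entry[0:3]] = entry[8]
    assigned[entry[4:7]] = entry[9]

matches_standard = assigned == STANDARD

# Number of codons per amino acid, then how many amino acids share each degeneracy.
codons_per_aa = Counter(aa for aa in assigned.values() if aa != "*")
degeneracy_pattern = Counter(codons_per_aa.values())

print(matches_standard, dict(degeneracy_pattern))
```

The resulting pattern groups the 20 amino acids by degeneracy 6, 4, 3, 2, and 1, which is the observation the roadmap is meant to account for.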

2.2.2. Initiation

In the beginning, there was an R (R denotes purine) single-stranded DNA PolyG (Figure 1a,b, #1). Complementary base pairing then formed a YR (Y denotes pyrimidine) double-stranded DNA PolyC·PolyG. Furthermore, triplex base pairing CG∗G formed a YR∗R₁ triple-stranded DNA PolyC·PolyG∗PolyG (Figure 1a,b, #1). The third strand R₁, PolyG, separated from this YR∗R₁ triple-stranded DNA and then formed a new Y₁R₁ double-stranded DNA PolyC·PolyG. So far, there was only the initial codon pair GGG·CCC (Figure 1a,b, #1).
In the initiation stage of the roadmap, the codon pairs from #1 to #6 were recruited along the roadmap, which constituted the initial subset of the genetic code:
#1 GGG (1Gly)·CCC (6Pro), #2 GGC (1Gly)·GCC (2Ala), #3 GGA (1Gly)·UCC (7Ser),
#4 GAG (3Glu)·CUC (8Leu), #5 GAC (4Asp)·GUC (5Val), #6 GGU (1Gly)·ACC (9Thr).
The earliest 9 amino acids were recruited in this stage in the following order: 1Gly, 2Ala, 3Glu, 4Asp, 5Val, 6Pro, 7Ser, 8Leu, 9Thr, all of which are Phase I amino acids [7,8]. For example, at codon pair position #6 on the roadmap, 1Gly and 9Thr are encoded by 5′-GGT-3′ in the R₆ strand and 5′-ACC-3′ in the Y₆ strand, respectively. Although the initial subset is concise, two essential features of the roadmap, pair connection and route duality, had taken shape in this initiation stage (Figure 1a and Figure 3a).
Pair connection is an essential feature of the roadmap. Two connected codon pairs on the roadmap generally encode a common amino acid (Figure 1a and Figure 3b). For instance, the pair connection #1−Gly−#2 indicates that both GGG in #1 and GGC in #2 encode the common amino acid Gly. Pair connections reveal the close relationship between the recruitment of codons and the recruitment of amino acids, which will be explained later according to the evolution of tRNAs.
Route duality is another essential feature of the roadmap, which shows the relationship of pair connections between different routes (Figure 1a and Figure 3b). For instance, the route duality
#1−Gly−#3 ∼ #2−Gly−#6
indicates that the pair connection #1−Gly−#3 in Route 0 and the pair connection #2−Gly−#6 in Route 1 are dual, and both encode the common amino acid Gly. Route dualities generally exist between Route 0 and Route 3, or between Route 1 and Route 2 (Figure 3b), which will be explained later according to the evolution of aaRSs.
Glycine, the simplest amino acid, is encoded by the cytosine triplet, the simplest nitrogen base. Glycine has been identified in the coma of a comet [52] and could be the first amino acid on Earth. Here, glycine (Gly) is also the first amino acid recruited on the roadmap. In the initiation stage of the roadmap, the non-chiral Gly helped to create the first pair connection #1−Gly−#2, recruiting chiral Ala at #2 (Figure 1a). Furthermore, the non-chiral Gly also helped to create the first route duality on the roadmap (Figure 1a):
#1−Gly−#3 ∼ #2−Gly−#6.
This route duality played a central role in the initiation stage; consequently, the initial subset played a central role in the midway stage (Figure 3a). Chirality was required from the beginning of the roadmap by the triplex DNA itself (Figure 1a,b). Even so, there was still a transition period from non-chirality to chirality, given the special role of non-chiral Gly. Competition between opposite homochiral roadmap systems resulted in homochirality through a winner-take-all game [53].

2.2.3. Midway

The genetic code evolved along four routes, Route 0–3, where the 8 codon pairs in each route evolved in the order of four hierarchies, Hierarchy 1–4 (Figure 1a). The roadmap can be divided into two groups by hierarchy: the early hierarchies Hierarchy 1–2 and the late hierarchies Hierarchy 3–4. It can also be divided into two groups by route: the initial route Route 0 (all-purine codons pairing with all-pyrimidine codons) and the expanded routes Route 1–3 (mixed purine-pyrimidine codons).
In the midway stage of the roadmap, the genetic code expanded spontaneously from the initial subset (Figure 1a and Figure 3a). Each of the 6 codon pairs in the initial subset expanded to three additional codon pairs by route dualities. Details are as follows. The codon pair #2 in the initial subset expanded to the three consecutive codon pairs #7, #8, and #9 by route duality
#2−Ala−#8 ∼ #7−Ala−#9;
the codon pair #1 in the initial subset expanded to the three consecutive codon pairs #10, #11, and #12 by route duality
#1−Pro−#11 ∼ #10−Pro−#12;
the codon pair #3 in the initial subset expanded to the three consecutive codon pairs #13, #14, and #15 by route duality
#3−Ser−#14 ∼ #13−Ser−#15;
the codon pair #6 in the initial subset expanded to the three consecutive codon pairs #16, #17, and #18 by route duality
#6−Thr−#17 ∼ #16−Thr−#18;
the codon pair #5 in the initial subset expanded to the three codon pairs #19, #24, and #26 by route duality
#5−Val−#26 ∼ #19−Val−#24;
and the codon pair #4 in the initial subset expanded to the three codon pairs #20, #25, and #27 by route duality
#4−Leu−#27 ∼ #20−Leu−#25.
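The six route dualities of the midway stage can be checked mechanically. In the sketch below, a pair connection between two roadmap positions is taken to hold when both codon pairs contain a codon for the named amino acid under the standard genetic code; a route duality is then simply two such connections sharing that amino acid. This is a simplified reading for illustration (it does not model the routes themselves), and the names `PAIRS`, `DUALITIES`, and `connected` are hypothetical.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks stop codons.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA)}

# Codon pairs at the roadmap positions involved, transcribed from the recruitment order.
PAIRS = {
    1: ("GGG", "CCC"), 2: ("GGC", "GCC"), 3: ("GGA", "UCC"), 4: ("GAG", "CUC"),
    5: ("GAC", "GUC"), 6: ("GGU", "ACC"), 7: ("GCG", "CGC"), 8: ("AGC", "GCU"),
    9: ("GCA", "UGC"), 10: ("CGG", "CCG"), 11: ("AGG", "CCU"), 12: ("UGG", "CCA"),
    13: ("CGA", "UCG"), 14: ("AGA", "UCU"), 15: ("UGA", "UCA"), 16: ("ACG", "CGU"),
    17: ("AGU", "ACU"), 18: ("ACA", "UGU"), 19: ("GUG", "CAC"), 20: ("CAG", "CUG"),
    24: ("GUA", "UAC"), 25: ("UAG", "CUA"), 26: ("AAC", "GUU"), 27: ("AAG", "CUU"),
}

# (one-letter amino acid, first pair connection, dual pair connection)
DUALITIES = [
    ("A", (2, 8), (7, 9)),     # Ala
    ("P", (1, 11), (10, 12)),  # Pro
    ("S", (3, 14), (13, 15)),  # Ser
    ("T", (6, 17), (16, 18)),  # Thr
    ("V", (5, 26), (19, 24)),  # Val
    ("L", (4, 27), (20, 25)),  # Leu
]


def connected(aa: str, a: int, b: int) -> bool:
    """True if codon pairs #a and #b each contain a codon that encodes `aa`."""
    return all(aa in {STANDARD[c] for c in PAIRS[pos]} for pos in (a, b))


results = {
    aa: connected(aa, *first) and connected(aa, *dual)
    for aa, first, dual in DUALITIES
}
print(results)
```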
The recruitment order of the codon pairs and the recruitment order of the amino acids are intricately organised and coherent, following the roadmap (Figure 1a and Figure 3a). In the initiation stage, amino acid No. 1 was first recruited with the codon pair #1, leaving a vacant position. Subsequently, No. 1 and No. 2 were recruited with the codon pair #2; No. 1 was recruited with the codon pair #3, leaving a vacant position; No. 3 was recruited with the codon pair #4, leaving a vacant position; No. 4 and No. 5 were recruited with the codon pair #5; No. 6 filled the vacant position of #1; No. 7 filled the vacant position of #3; No. 8 filled the vacant position of #4; No. 1 and No. 9 were recruited with the codon pair #6 (Figure 3a). Thus, the framework of the genetic code had been established by the end of the initiation stage. From #7 on, the latecomer amino acids no longer jumped the queue in recruitment, so there were no more vacant positions in the recruited codon pairs. Details are as follows. No. 2 and No. 10 were recruited with the codon pair #7; and, subsequently, No. 2 and No. 7 were recruited with #8; No. 2 and No. 11 with #9; No. 6 and No. 10 with #10; No. 6 and No. 10 with #11; No. 6 and No. 12 with #12; No. 7 and No. 10 with #13; No. 7 and No. 10 with #14; No. 7 and stop with #15; No. 9 and No. 10 with #16; No. 7 and No. 9 with #17; No. 9 and No. 11 with #18; No. 5 and No. 13 with #19; No. 8 and No. 14 with #20; No. 4 and No. 15 with #21; No. 13 and No. 16 with #22; No. 3 and No. 17 with #23; No. 5 and No. 18 with #24; No. 8 and stop with #25; No. 5 and No. 19 with #26; No. 8 and No. 20 with #27; No. 8 and No. 14 with #28; No. 15 and No. 18 with #29; No. 15 and No. 19 with #30; No. 8 and stop with #31; No. 17 and No. 20 with #32 (Figure 3a).
Taking the positions from #1 to #29 as an example, the evolution of the genetic code along the roadmap can be described in detail as follows (Figure 1a,b and Supplementary Movie S1). Starting from position #1 (Figure 1b, #1), an R single-stranded DNA brought about a YR double-stranded DNA; next, the YR double-stranded DNA brought about a YR∗R₁ triple-stranded DNA (the subscript 1 denotes #1, and similarly below); next, an R₁ single-stranded DNA departed from the YR∗R₁ triple-stranded DNA; next, the R₁ single-stranded DNA brought about an R₁Y₁ double-stranded DNA. Thus, the codon pair GGG·CCC was achieved at #1. At the beginning of #7 (Figure 1b, #7), the R₁Y₁ double-stranded DNA was renamed as the Y₁R₁ double-stranded DNA, where the 180° rotation in writing did not change the right-handed helix; next, the Y₁R₁ double-stranded DNA brought about a Y₁R₁∗R₇ triple-stranded DNA through the transversion from G to C, where the stability (+) of CG∗G increased to the stability (4+) of CG∗C; next, an R₇ single-stranded DNA departed from the Y₁R₁∗R₇ triple-stranded DNA; next, the R₇ single-stranded DNA brought about an R₇Y₇ double-stranded DNA. Thus, the codon pair GCG·CGC was achieved at #7. The case of #19 is similar to #7 (Figure 1b, #19); the codon pair GTG·CAC was achieved through the transition from C to T, where the stability (+) of GC∗C increased to the stability (2+) of GC∗T. The case of #24 is also similar to #7 (Figure 1b, #24); the codon pair GTA·TAC was achieved through the common transition from G to A, where the stability (+) of CG∗G increased to the stability (2+) of CG∗A. At position #29 (Figure 1b, #29), the codon pair GTA·TAC in Y₂₄R₂₄ is non-palindromic, since neither GTA nor TAC reads the same backwards as forwards. In this case, a reverse operation is necessary, so that the obtained codon pair CAT·ATG in y₂₄r₂₄ reads, in reverse, the same as the codon pair TAC·GTA in Y₂₄R₂₄. The process from y₂₄r₂₄ to R₂₉Y₂₉ is still similar to the case of #7; the codon pair ATA·TAT was achieved through the transition from G to A, where the stability (+) of CG∗G increased to the stability (2+) of CG∗A. Other processes on the roadmap are similar to the above example (Figure 1a,b). The reverse operation is unnecessary in the cases of #2, #7, #10, #11, #3, #4, #16, #9, #19, #27, #23, #22, #24, which follow palindromic codon pairs, and in the last case #32 (Figure 1a), whereas the reverse operation is necessary in the remaining cases of #5, #6, #8, #12, #13, #14, #15, #17, #18, #20, #21, #25, #26, #28, #29, #30, #31 (Figure 1a).
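The palindrome test and the reverse operation can be sketched in a few lines. The sketch assumes that the reverse operation amounts to reading each codon of the pair backwards, which is how the #24 example describes it; the function names are hypothetical. It also counts, as a matter of plain combinatorics rather than a claim from the source, how many of the 64 codons are palindromic.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(codon: str) -> bool:
    """A codon is palindromic if it reads the same backwards as forwards."""
    return codon == codon[::-1]


def reverse_complement(codon: str) -> str:
    """Complement each base, then read the codon in the opposite direction."""
    return codon.translate(COMPLEMENT)[::-1]


def reverse_operation(pair):
    """Read each codon of the pair backwards (assumed form of the reverse operation)."""
    return tuple(codon[::-1] for codon in pair)


pair_7 = ("GCG", "CGC")    # palindromic: no reverse operation needed afterwards
pair_24 = ("TAC", "GTA")   # non-palindromic: reverse operation needed before #29

reversed_24 = reverse_operation(pair_24)

# The reversed pair is still a pair of mutually complementary codons.
still_paired = reverse_complement(reversed_24[0]) == reversed_24[1]

# Codons whose first and third bases coincide are the palindromic ones.
n_palindromic = sum(is_palindromic("".join(c)) for c in product("ACGT", repeat=3))

print(reversed_24, still_paired, n_palindromic)
```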

2.2.4. The Ending

So far, the genetic code table had been expanded from the 6 codon pairs in the initial subset to 6 + 18 codon pairs by route duality; the remaining 8 codon pairs were recruited into the genetic code table in the ending stage of the roadmap (Figure 1a and Figure 3a). Two codon pairs remained in each of the four routes, Route 0 to Route 3. They satisfied the following pair connections: #23-Phe-#32, #21-Ile-#30, #22-Met/Ile-#29, #28-Leu-#31 (Figure 3a). Two of these pair connections satisfied route duality (Figure 3a):
#21-Ile-#30 and #22-Met/Ile-#29.
The last two stop codons appeared in the pair connection #25-stop-#31 (Figure 1a and Figure 3a). When the last two amino acids were recruited through the base pairs #26-Asn-#30 and #27-Lys-#32, the codon UAG at #25 had to be selected as a stop codon. The codon UAA at #31 was selected as the last stop codon, owing to the lack of a corresponding tRNA.
The non-standard codons also satisfy codon pairs and route dualities on the roadmap (Figure 1a). The codon pairs pertaining to non-standard codons are as follows: #11-Arg(Ser, stop)-#14 and #4-Leu(Thr)-#27 in Route 0; none in Route 1; #22-(Met)-#29 in Route 2; #20-Leu(Thr, Gln)-#25, #12-(Trp)-#15, #25-stop(Gln)/Leu-#31 and #28-Leu(Gln)-#31 in Route 3. The majority of non-standard codons appear in the last route, Route 3 (Figure 1a). Route dualities of non-standard codons exist between Route 0 and Route 3 (Figure 1a):
#4-Leu(Thr)-#27 and #20-Leu(Thr)-#25; #11-(stop)-#14 and #12-Trp/stop-#15,
where the first stop codon UGA at #15 is dual to the non-standard stop codons in Route 0.
The choice of the genetic code was by no means random; it resulted from the increasing stabilities of triplex base pairs in the substitutions [10,11], where the rotation of the single glycosidic bond between base and deoxyribose has been considered in the opposite direction. It has been emphasised that the roadmap followed the strict rule that the stabilities of triplex base pairs increase monotonically (Figure 2). Note that the roadmap avoided unstable triplex DNA as far as possible. The roadmap (Figure 1a) is the only possible one that has avoided the unstable triplex base pairs (−) GCA, ATC and ATA, as shown in Table 1, whereas the other possible roadmaps, which were eliminated, cannot avoid them.
Among the 16 possible triplex base pairs, three are relatively unstable, so the statistical ratio of instability for the triplex base pairs is 3/16. However, the ratio of instability for the triplex base pairs on the roadmap is much smaller. There are 49 triplex DNAs from #1 to #32 on the roadmap, which involve 3 × 49 = 147 triplex base pairs (Figure 1a). The relatively unstable triplex base pairs GCA and ATC do not appear on the roadmap; only the relatively unstable triplex base pair ATA appears, inevitably, 7 times in the reverse operations so as to fulfil all the permutations of the 64 codons (Figure 1a). The ratio of instability 7/147 on the roadmap is much smaller than the statistically expected ratio of 3/16. When the relatively unstable ATA appears at positions #15, #17, #21, #25, #29, #30, and #31, the stabilities of the other two triplex base pairs in the triplex DNA are both (4+) (Figure 1a), which compensates for the instability of the triplex DNA to some extent. The amino acid Ile, the only one whose degeneracy is three, occupies three positions, #21, #29, and #30, among those 7 positions. In addition, the three stop codons occupy three other neighbouring positions, #15, #25 and #31 (Figure 1a). The first stop codon UGA appeared at position #15, where the relatively unstable ATA first appeared (Figure 1a). According to the primordial translation mechanism, the weak binding of ATA might have helped to assign stop codons. The route dualities played significant roles in the midway stage, where the remnant codons were chosen as the stop codons (Figure 1a and Figure 3a). The stop codon appeared as early as the midway of the evolution of the genetic code (Figure 1a and Figure 3a), which indicates that the genetic code had taken shape around the midway so as to promote the formation of primitive life. 
Not until the completion of the genetic code did the translation efficiency increase notably through the recognition of all 64 codons.
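The comparison of the two instability ratios can be checked with exact fractions. The following is a minimal sketch that uses only the counts stated above (49 triplex DNAs, 3 base pairs each, 7 occurrences of ATA, and 3 unstable pairs among 16); the variable names are illustrative.

```python
from fractions import Fraction

# Counts taken from the text; variable names are illustrative.
triplex_dnas = 49
base_pairs_per_triplex = 3
total_base_pairs = triplex_dnas * base_pairs_per_triplex

roadmap_ratio = Fraction(7, total_base_pairs)   # unstable ATA on the roadmap
statistical_ratio = Fraction(3, 16)             # unstable pairs among all 16

print(total_base_pairs)
print(roadmap_ratio, float(roadmap_ratio))
print(statistical_ratio, float(statistical_ratio))
print(statistical_ratio / roadmap_ratio)
```

The roadmap ratio reduces to 1/21 (about 0.048), compared with 3/16 (about 0.19), so the statistical ratio is 63/16, or roughly four times, larger.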

2.3. Origin of tRNA

The roadmap illustrates the coevolution of the genetic code with the amino acids, where tRNAs and aaRSs play an intermediary role. The expansion of the genetic code along the roadmap can be explained by the coevolution of tRNAs with aaRSs (Figure 5c, Figure 6b and Figure 7). The cloverleaf shape of tRNA can be explained by assembling the two complementary RNA strands separated from triplex nucleic acid D · D R in the triplex picture (Figure 6a). The origin of aaRS will be explained next.

2.3.1. Anti-Codon

When studying the evolution of the genetic code, we focused on only three bases in the triplex DNA. However, when studying the origin of tRNAs, it is necessary to study the evolution of the entire sequences of both the triplex DNA and the triplex nucleic acid D·DR, where the third RNA strands in D·DR can be used to assemble tRNAs (Figure 5a,b and Figure 6a). According to the order of the relative stabilities of YRY for the 8 kinds of triplex nucleic acids, D·DD, D·DR, R·DR, R·DD > D·RR, R·RR >> R·RD, D·RD [50,54], the relative stabilities of D·DD and D·DR are greater than those of the other kinds of triplex nucleic acids. The choice of triplex DNA for the roadmap and the choice of D·DR for the origin of tRNAs are based on these observed relative stabilities. The other kinds of triplex nucleic acids can be neglected owing to their lower probabilities of appearing.
There are four types of RNA strands for assembling tRNAs that were generated by the triplex base pairing of triplex nucleic acids D·DR: via the triplex nucleic acid yry_t and via the triplex nucleic acid yrr_t (Figure 5a,c), and via the triplex nucleic acid YRY_t and via the triplex nucleic acid YRR_t (Figure 5b,c), where the subscript t indicates that these RNA strands y_t, r_t and Y_t, R_t are used to assemble tRNA (Figure 5a,b and Figure 6a). The sequences Y_t, R_t are the respective reverse sequences of y_t and r_t. There is a difference in the sequence evolution along the roadmap between purine strands and pyrimidine strands. The pyrimidine sequences Y_t, y_t and the purine sequences R_t, r_t are complementary, respectively, owing to the triplex pairing with the purine DNA strand and the pyrimidine DNA strand in the triplex nucleic acids D·DR, respectively. These tRNA strands coevolved with the triplex DNA along the roadmap. Therefore, the evolution of the anti-codons on tRNAs can be explained according to the evolution of the genetic code along the roadmap. The evolution of aaRSs should be considered next. After separating from the triplex nucleic acids D·DR, the pair of complementary single RNA strands y_t and r_t, or R_t and Y_t, can concatenate and fold into a cloverleaf-shaped tRNA [55,56,57,58,59], whose anti-codon corresponds to the codon of the triplex DNA on the roadmap (Figure 6a). Because the anti-codons occupy different positions in the RNA strands, either near the 3′-ends or near the 5′-ends, the different reading directions of Y_t, R_t and y_t, r_t must be carefully taken into account (Figure 6a). There were two types of tRNAs: the type 5′-y_t r_t-3′ tRNA and the type 5′-R_t Y_t-3′ tRNA (Figure 5a,b), where the anti-codons are near the 3′-end of the RNA strand y_t and the 3′-end of the RNA strand R_t, respectively. 
The other concatenated RNA strands 5′-r_t y_t-3′ and 5′-Y_t R_t-3′ cannot evolve together with the above two types of tRNAs because the corresponding triplets would be on the acceptor arms rather than on the anti-codon loops.
It is possible to explain the sequence evolution of tRNAs in detail along the roadmap (Figure 5a–c and Figure 6a). For example, the tRNA t2 for No. 2 Ala can form by concatenating y_t7 and r_t7, which are generated by the triplex base pairings y7r7y_t7 and y7r7r_t7 at the branch node #7. The anti-codon CGC near the 3′-end of the strand y_t7 is palindromic. The two complementary strands y_t7 and r_t7 can combine into a cloverleaf-shaped type 5′-y_t r_t-3′ tRNA t2 by concatenating, pairing, and folding (Figure 6a). Thus, the anti-codon arm of t2 contains the anti-codon CGC, which corresponds to Ala with the help of aaRS; consequently, the codon GCG on the R DNA strand at #7 is assigned to Ala. The sequences evolve from #7 to #16 along the roadmap. As another example, the codons at position #16 are non-palindromic, where the type 5′-y_t r_t-3′ tRNA t9 and the type 5′-R_t Y_t-3′ tRNA t11 are assembled by concatenating y_t16 and r_t16 for t9 and by concatenating R_t16 and Y_t16 for t11, respectively (Figure 6a). Hence, the codon ACG at #16 and the reversely complementary codon UGC at #9 are assigned to No. 9 Thr and No. 11 Cys, respectively.
There are 4 pairs of palindromic codons, #1 CCC·GGG, #4 CUC·GAG, #7 CGC·GCG, and #19 CAC·GUG, among the 16 branch nodes of the roadmap (Figure 1a). Accordingly, there are 12 non-palindromic codons among the branch nodes, at positions #2, #5, #6, #10, #11, #12, #16, #20, #21, #23, #24, and #25. The sets of complementary pairs of RNA strands are the same for the two routes because of the bijection between Route 1 and Route 3 in the sense of the reverse relationship (Figure 1a). Thus, there are in total 4 + (12 − 4) × 2 = 20 pairs of complementary single RNA strands (the 4 palindromic codons, plus twice the 12 non-palindromic codons minus the 4 identities between Route 1 and Route 3), which can assemble into 20 groups of cognate tRNAs, respectively. This could be among the reasons why there are 20 canonical amino acids.
There is another reason, at the sequence level, for the number “20” of the canonical amino acids (Figure 6b). There are 64 triple permutations of the 4 bases, which accounts for the number 64 of the codons. However, little attention has been paid to the 20 triple combinations of the 4 bases. The products p(i)p(j)p(k) (i, j, k = G, C, A, T) are the same within each of the 20 groups of combinations of the 4 bases (Figure 6b), owing to the commutative law of multiplication, where p(i) denotes the base composition for i = G, C, A, T. The products determine the average interval distances of codons in genome sequences. Therefore, there are 20 classes of genomic codon interval distributions, according to the 20 combinations rather than the 64 permutations of the 4 bases [53]. Consequently, there are 20 cognate tRNA-synthetase systems so as to improve the translation efficiency for tRNAs to recognise the corresponding codons, considering the 20 average interval distances of codons. So, the number “20” of the canonical amino acids should actually be attributed to a statistical origin at the sequence level. The 20 combinations of the 4 bases can be divided into 4 groups: <G>, <C>, <A>, <T>. Hierarchy 1 and Hierarchy 2 correspond to <G> and <C>; Hierarchy 3 and Hierarchy 4 correspond to <A> and <T>. Their positions on the roadmap are Hierarchy 1-2 Y: <G>, Hierarchy 1-2 R: <C>, Hierarchy 3-4 Y: <A>, Hierarchy 3-4 R: <T>. Each group can be divided into 5 combinations, which correspond to Route 0 or Route 1-3, respectively. In the case of <G>, <G, G, G> and <G, G, A> belong to Route 0, while <G, G, C>, <G, G, T>, and <G, C, A> belong to Route 1-3; the other cases <C>, <A>, <T> are similar. These 20 combinations roughly correspond to the 20 cognate tRNAs (Figure 6b). 
This rough correspondence shows that the codons, especially those in Hierarchy 1-3, are assigned to the tRNAs based on the combinations, considering that the codons in Hierarchy 4 are AT-rich and the context sequences tend to form AT-rich repeats. Concretely, the groups of codons in the combinations <GGG>, <GGC>, <GGA>, <GGU>, <GCA>, <GCU>, <GAA>, <GAU>, <CCC>, <CCA>, <CCU>, <CAA>, <CAU>, <CUU>, <AAU> are assigned, respectively, to t1, t2 and t10, t3, t5 and t12, t4 and t9 and t14, t8 and t11, t20, t16, t6, t13, t7, t19, t18, t17, t15 (Figure 6b). In addition, the first stop codon appeared halfway through the evolution of tRNAs (Figure 6b). The combinations are simply ordered by the bases in the order “G”, “C”, “A”, “U” (Figure 6b), considering the substitutions “G to C”, “G to A”, “C to U” on the roadmap (Figure 1a), and the amino acids are in the recruitment order. A rough diagonal distribution of tRNAs is thus obtained (Figure 6b), which is due to the evolutionary relationship between the genetic code and the amino acids.
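The counting argument above (64 permutations, 20 combinations, and equal products p(i)p(j)p(k) within each combination) can be illustrated with a short Python sketch. The base compositions in `p` are arbitrary, hypothetical numbers chosen only for illustration; they are not data from the article, and the helper `codon_product` is likewise an illustrative name.

```python
from itertools import combinations_with_replacement, product
from math import isclose, prod

BASES = "GCAT"

# 64 ordered triplets (permutations with repetition) of the 4 bases.
permutations = ["".join(t) for t in product(BASES, repeat=3)]
# 20 unordered triplets (combinations with repetition) of the 4 bases.
combinations = ["".join(t) for t in combinations_with_replacement(BASES, 3)]

# Hypothetical base compositions, for illustration only.
p = {"G": 0.30, "C": 0.20, "A": 0.35, "T": 0.15}


def codon_product(codon: str) -> float:
    """Product p(i)p(j)p(k) for the three bases of a codon."""
    return prod(p[base] for base in codon)


# Group the 64 codons by their unordered base content.
groups = {}
for codon in permutations:
    key = "".join(sorted(codon, key=BASES.index))
    groups.setdefault(key, []).append(codon)

print(len(permutations), len(combinations), len(groups))
for key, members in groups.items():
    # Every codon in a group has the same product, up to rounding.
    assert all(isclose(codon_product(c), codon_product(key)) for c in members)
```

Grouping by unordered base content yields groups of size 1, 3 or 6, which together account for all 64 codons.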

2.3.2. Evolution of tRNA

There was a post-initiation-stage stagnation (Figure 1a) between the initiation stage and the midway stage of the roadmap. This stagnation in prebiotic evolution simply awaited the birth of functional macromolecules. In this period, oligonucleotides with arbitrary finite sequences could be generated via the base substitutions G to A, G to C, and C to T in the triplex picture. The primordial sequences of the prototype tRNAs and the template RNAs of the prototype aaRSs could be generated along the roadmap. In the light of the complicated interactions between oligonucleotides and amino acids, some early tRNAs with certain anti-codons could be generated in the sequence evolution along the roadmap so as to carry the corresponding prebiotically synthesised Phase I amino acids, respectively. These tRNAs were not necessarily homologous, as long as they were capable of fulfilling their respective tasks. There are two independent codon systems for tRNAs: the anti-codons and the para-codons. The anti-codons evolved along the roadmap, while the para-codons evolved with aaRSs (Figure 5c and Figure 7). When the anti-codons evolved but the para-codons did not, only cognate tRNAs originated. However, when both the para-codons and the anti-codons evolved, more new tRNAs originated to carry the remaining amino acids.
There exists an assignment scheme for the genetic code. The 64 codons can be assigned to the 20 amino acids and stop codons with the help of approximate four dozens of tRNAs: t 1 , t 1 , t 1 + , t 2 , t 2 , t 2 + , t 3 , t 3 , t 4 , t 5 , t 5 , t 5 + , t 6 , t 6 + , t 6 + , t 7 , t 7 + , t 7 , t 7 , t 8 , t 8 , t 8 + , t 8 , t 8 , t 9 , t 9 , t 9 + , t 10 , t 10 , t 10 + , t 10 , t 10 , t 11 , t 12 , t 13 , t 14 , t 14 , t 15 , t 15 + , t 16 , t 17 , t 18 , t 19 , t 20 , t 20 (Figure 5c and Figure 6b). The naming rules for tRNAs are as follows. The tRNA series numbers are named after the recruitment order of the respective canonical amino acids. The prime tRNAs t 1 t 20 are the early recruited tRNAs that coevolve with the corresponding aaRSs. The derivative tRNAs t n + are the cognate tRNAs expanded within the codon boxes, namely with the same first two bases in codons. The derivative tRNAs t n are the cognate tRNAs expanded outside the codon boxes. The derivative tRNAs t n , n + and t n are the cognate tRNAs needed by wobble pairing rules. The bracket in “ ( t n ) ” indicates the same tRNA t n . It is also possible to generate more or less new tRNAs in the triplex picture for different species, so the numbers of tRNAs are different among species.
On the one hand, the tRNAs can recognise the respective codons according to the genetic code evolution along the roadmap. On the other hand, they can recognise the respective aaRSs so as to be combined with the respective aminoacyl groups. Among the 20 prime tRNAs t1 to t20, there are 13 type 5′-y_t r_t-3′ tRNAs (t1, t2, t3, t4, t5, t9, t10, t12, t14, t15, t16, t19, t20) and 7 type 5′-R_t Y_t-3′ tRNAs (t6, t7, t8, t11, t13, t17, t18) (Figure 5c). The codons for the type 5′-y_t r_t-3′ prime tRNAs are situated in the purine strand on the roadmap, and their first bases are purines, except for t10, t12, t14, while the codons for the type 5′-R_t Y_t-3′ prime tRNAs are situated in the Y strand on the roadmap, and their first bases are pyrimidines. In total, there are 6 prime tRNAs (t1, t3, t6, t7, t17, t20) in Route 0, 3 prime tRNAs (t4, t8, t19) in Route 1, 8 prime tRNAs (t2, t5, t9, t11, t13, t15, t16, t18) in Route 2, and 3 prime tRNAs (t10, t12, t14) in Route 3 (Figure 5c). The majority of prime tRNAs are situated at the branch nodes, except for t15, t17, t19, t20 (Figure 5c). For each amino acid, several cognate tRNAs can be generated at certain steps of the roadmap:
No. 1 Gly; aaRS1 (GlyRS); t1 (GGG), t1 (GGA), t1+ (GGC, GGU)
No. 2 Ala; aaRS2 (AlaRS); t2 (GCG), t2 (GCA), t2+ (GCC, GCU)
No. 3 Glu; aaRS3 (GluRS); t3 (GAG), t3 (GAA)
No. 4 Asp; aaRS4 (AspRS); t4 (GAC, GAU)
No. 5 Val; aaRS5 (ValRS); t5 (GUG), t5 (GUA), t5+ (GUC, GUU)
No. 6 Pro; aaRS6 (ProRS); t6 (CCC, CCU), t6+ (CCG), t6+ (CCA)
No. 7 Ser; aaRS7 (SerRS); t7 (UCC, UCU), t7+ (UCG), t7+ (UCA), t7 (AGC, AGU)
No. 8 Leu; aaRS8 (LeuRS); t8 (CUG), t8 (CUA), t8+ (CUC, CUU), t8 (UUG), t8 (UUA)
No. 9 Thr; aaRS9 (ThrRS); t9 (ACG), t9 (ACA), t9+ (ACC, ACU)
No. 10 Arg; aaRS10 (ArgRS); t10 (CGG), t10 (CGA), t10+ (CGC, CGU), t10 (AGG), t10 (AGA)
No. 11 Cys; aaRS11 (CysRS); t11 (UGC, UGU)
No. 12 Trp; aaRS12 (TrpRS); t12 (UGG)
No. 13 His; aaRS13 (HisRS); t13 (CAC, CAU)
No. 14 Gln; aaRS14 (GlnRS); t14 (CAG), t14 (CAA)
No. 15 Ile; aaRS15 (IleRS); t15 (AUA), t15+ (AUC, AUU)
No. 16 Met; aaRS16 (MetRS); t16 (AUG)
No. 17 Phe; aaRS17 (PheRS); t17 (UUC, UUU)
No. 18 Tyr; aaRS18 (TyrRS); t18 (UAC, UAU)
No. 19 Asn; aaRS19 (AsnRS); t19 (AAC, AAU)
No. 20 Lys; aaRS20 (LysRS); t20 (AAG), t20 (AAA)
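As a consistency check on the assignment listed above, the codons can be collected into a small Python mapping and compared against all 64 triplets. This is only a sketch: the dictionary `ASSIGNMENT` transcribes the codons from the list, grouped by amino acid rather than by tRNA, and the variable names are illustrative.

```python
from itertools import product

# Codons transcribed from the list above, grouped by amino acid (not by tRNA).
ASSIGNMENT = {
    "Gly": ["GGG", "GGA", "GGC", "GGU"],
    "Ala": ["GCG", "GCA", "GCC", "GCU"],
    "Glu": ["GAG", "GAA"],
    "Asp": ["GAC", "GAU"],
    "Val": ["GUG", "GUA", "GUC", "GUU"],
    "Pro": ["CCC", "CCU", "CCG", "CCA"],
    "Ser": ["UCC", "UCU", "UCG", "UCA", "AGC", "AGU"],
    "Leu": ["CUG", "CUA", "CUC", "CUU", "UUG", "UUA"],
    "Thr": ["ACG", "ACA", "ACC", "ACU"],
    "Arg": ["CGG", "CGA", "CGC", "CGU", "AGG", "AGA"],
    "Cys": ["UGC", "UGU"],
    "Trp": ["UGG"],
    "His": ["CAC", "CAU"],
    "Gln": ["CAG", "CAA"],
    "Ile": ["AUA", "AUC", "AUU"],
    "Met": ["AUG"],
    "Phe": ["UUC", "UUU"],
    "Tyr": ["UAC", "UAU"],
    "Asn": ["AAC", "AAU"],
    "Lys": ["AAG", "AAA"],
}

all_codons = {"".join(t) for t in product("GCAU", repeat=3)}
assigned = [codon for codons in ASSIGNMENT.values() for codon in codons]

# Codons left without an amino acid are the stop codons.
unassigned = sorted(all_codons - set(assigned))
# Degeneracy: number of codons per amino acid.
degeneracy = {aa: len(codons) for aa, codons in ASSIGNMENT.items()}

print(len(assigned), unassigned)
print([aa for aa, n in degeneracy.items() if n == 3])
```

The list covers 61 distinct codons, leaves exactly UAA, UAG and UGA unassigned, and Ile is the only amino acid with degeneracy three, in agreement with the statements made earlier about the stop codons and Ile.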
The following evolution of derivative tRNAs can be explained by the base substitution G to A along the roadmap (Figure 5c): t 1 ( G G G ) to t 1 ( G G A ) , t 2 ( G C G ) to t 2 ( G C A ) , t 3 ( G A G ) to t 3 ( G A A ) , t 5 ( G U G ) to t 5 ( G U A ) , t 6 + ( C C G ) to t 6 + ( C C A ) , t 7 + ( U C G ) to t 7 + ( U C A ) , t 8 ( C U G ) to t 8 ( C U A ) , t 8 ( U U G ) to t 8 ( U U A ) , t 9 ( A C G ) to t 9 ( A C A ) , t 10 ( C G G ) to t 10 ( C G A ) , t 10 ( A G G ) to t 10 ( A G A ) , t 14 ( C A G ) to t 14 ( C A A ) , t 20 ( A A G ) to t 20 ( A A A ) . Moreover, the following evolution of derivative tRNAs can be explained by the base substitution G to C along the roadmap (Figure 5c): t 1 ( G G G ) to t 1 + ( G G C , G G U ) , t 2 ( G C G ) to t 2 + ( G C C , G C U ) , t 5 ( G U G ) to t 5 + ( G U C , G U U ) , t 6 + ( C C G ) to t 6 ( C C C , C C U ) , t 8 ( C U G ) to t 8 + ( C U C , C U U ) , t 9 ( A C G ) to t 9 + ( A C C , A C U ) , t 10 ( C G G ) to t 10 + ( C G C , C G U ) . However, the following tRNAs can recognise the respective two codons whose third bases are C or U, owing to the wobble pairing (Figure 5c): t 1 + ( G G C , G G U ) , t 2 + ( G C C , G C U ) , t 4 ( G A C , G A U ) , t 5 + ( G U C , G U U ) , t 6 ( C C C , C C U ) , t 7 ( U C C , U C U ) , t 7 ( A G C , A G U ) , t 8 + ( C U C , C U U ) , t 9 + ( A C C , A C U ) , t 10 + ( C G C , C G U ) , t 11 ( U G C , U G U ) , t 13 ( C A C , C A U ) , t 15 + ( A U C , A U U ) , t 17 ( U U C , U U U ) , t 18 ( U A C , U A U ) , t 19 ( A A C , A A U ) .
The wobble pairing rules can be explained by the origin and evolution of tRNAs in the triplex picture. The transition from C to T occurred at position #6 on the roadmap, which resulted in the wobble pairing rule G : U or C. Taking y2r2 as a template, y_t2 with GCC is formed by the triplex base pairing, while r_t2 with GGC and r_t2 with GGU are formed, where the transition from C to U occurred in the formation of r_t2. The complementary strands y_t2 and r_t2 combine into a tRNA with anti-codon GCC, where G at the first position of the anti-codon of the tRNA is paired with U at the third position of the triple code of an additional single strand r_t2. This implies that the wobble pairing rule G : U had been established as early as the end of the initiation stage of the roadmap. The transition from C to T occurred at position #12, which resulted in the wobble pairing rule U : G or A. Taking y10r10 as a template, y_t10 with CCG is formed by the triplex base pairing, and r_t10 with CGG and r_t10 with UGG are also formed, where the transition from C to U occurred in the formation of r_t10. The complementary strands y_t10 and r_t10 combine into a tRNA with anti-codon UGG, where U at the first position of the anti-codon of the tRNA is paired with G at the third position of the triple code of an additional single strand y_t10. The above explanation of the wobble pairing rules by tRNA mutations is supported by observations of nonsense suppressors. For instance, the wobble pairing rule C : A for a UGA suppressor can be established by a transition from G to A at the 24th position of tRNA^Trp. The wobble pairing rules G : U or C and U : G or A had been established early in the evolution of the genetic code and continued to flourish so as to make full use of the tRNAs in short supply.
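The two wobble rules named above can be turned into a small lookup that lists the codons read by a given anti-codon. This is a minimal sketch under stated assumptions: the function name `codons_read_by` is hypothetical, Watson-Crick pairing is assumed for the second and third anti-codon positions and for first-position bases other than G and U, and anti-codons and codons are both written 5′ to 3′.

```python
# Illustrative sketch only; names and the Watson-Crick fallback are assumptions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble rules quoted in the text: first anti-codon base -> third codon bases.
WOBBLE = {"G": "CU", "U": "AG"}


def codons_read_by(anticodon: str) -> list:
    """Codons (5'->3') recognised by an anti-codon (5'->3')."""
    first, second, third = anticodon
    # Anti-codon position 3 pairs with codon position 1, position 2 with 2.
    codon_start = WATSON_CRICK[third] + WATSON_CRICK[second]
    # Anti-codon position 1 pairs with codon position 3, with wobble allowed.
    third_bases = WOBBLE.get(first, WATSON_CRICK[first])
    return [codon_start + base for base in third_bases]


print(codons_read_by("GCC"))  # anti-codon formed at the y2r2 template
print(codons_read_by("UGG"))  # anti-codon formed at the y10r10 template
```

With these assumptions, the anti-codon GCC reads GGC and GGU, and the anti-codon UGG reads CCA and CCG, matching the two-codon sets discussed in the text.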
The evolutionary relationship between tRNAs that correspond to pairs of different amino acids can also be explained according to the evolution of tRNAs along the roadmap. For example, based on the substitution G to A, t16 (AUG, Met) can evolve to t15 (AUA, Ile), and based on the substitution G to C, t3 (GAG, Glu) can evolve to t4 (GAC, GAU, Asp), and so on (Figure 5c). However, this kind of evolution of tRNAs involves not only anti-codons but also para-codons because it inevitably needs extra help from aaRSs. There is a close relationship between the evolution of tRNAs and the biosynthetic families of amino acids, so the sequences of tRNAs coevolved with the sequences of aaRSs at each step of the roadmap. The recognition between tRNAs and aaRSs will be explained next; it involves many technical details, and each step needs to be straightened out in order to draw a comprehensive conclusion.
The evolution of tRNAs played a significant role in fixing the number of canonical amino acids at 20. There is an important difference between the early prime tRNAs t_n and the late derivative tRNAs t_n+. Generally speaking, the wobble pairing rules apply to the late derivative tRNAs t_n+ rather than to the early prime tRNAs t_n (Figure 6b). The early prime tRNAs do not need wobble pairings, so that the number of bases in a codon is accurately fixed at 3, whereas the late derivative tRNAs need wobble pairings so as to improve translation efficiency via codon degeneracy. Making the number of canonical amino acids equal to the number of base combinations was a dynamic process, which could hardly be accomplished when tRNAs were lacking but could be adjusted by choosing among the numerous tRNA candidates.

2.3.3. Palindrome

Palindromic sequences play significant roles not only in contemporary molecular biology but also in the prebiotic evolution. Palindromic or non-palindromic codons on the roadmap can produce different effects in the origin and evolution of informative macromolecules. The cloverleaf secondary structure of tRNAs can be explained by the complementary palindrome in assembling tRNAs. Furthermore, the evolution of aaRSs also depended strongly on the evolution of palindromic para-codons along the roadmap, which will be explained next.
There are two types of tRNAs: type 5 y t r t 3 and type 5 R t Y t 3 , where the two single RNA strands y t and r t , Y t and R t are complementary to each other. A D-loop and an anti-codon loop situate in the 5 -end RNA strand ( y t for type 5 y t r t 3 and R t for type 5 R t Y t 3 ), while a T Ψ C loop and a missing loop situate in the 3 -end RNA strand ( r t for type 5 y t r t 3 or Y t for type 5 R t Y t 3 ) (Figure 6a). The strand pair y t and r t or Y t and R t can form two pairs of hairpins in the complementary double-stranded RNA, where the D-loop and the T Ψ C loop constitute a pair of hairpins, and the anti-codon loop and the missing complementary loop constitute another pair of hairpins (Figure 6a). When the missing loop has been deleted, the three other loops form a cloverleaf-shaped tRNA (Figure 6a). A palindromic nucleotide sequence can form a hairpin, and palindromic complementary double RNA sequences can form a pair of hairpins, which can account for the cloverleaf secondary structure of tRNAs (Figure 6a and Figure 8). If there are palindromic sequence intervals in the 5 -end RNA strand, there will also be the corresponding palindromic sequence intervals in the complementary 3 -end RNA strand. A D-loop and an anti-codon loop can form in the 5 -end RNA strand, owing to the complementarity in the palindromic sequence intervals. Accordingly, a T Ψ C loop and a missing loop can also form in the 3 -end RNA strand, which correspond to the D-loop and the anti-codon loop, respectively. After deleting the missing loop, a catenated RNA strand with three loops can form a cloverleaf secondary structure, and consequently, a stable tertiary structure can form. Therefore, palindromic sequences contribute to the formation of stable RNA structures in the prebiotic evolution. It is easy to generate palindromic oligonucleotides according to the base substitutions along the roadmap (Figure 5a,b). 
So, pairs of palindromic single RNA strands tended to be generated, which could assemble into cloverleaf-shaped tRNA candidates. Numerous tRNA candidates could be produced by such an assembly line during the prebiotic evolution, from which several qualified tRNAs with proper anti-codons and para-codons could be selected to carry the respective amino acids. Although the origin of aaRSs in the prebiotic evolution was difficult (Figure 8), the origin of tRNAs and amino acids was not too difficult. The early aaRSs had the chance to adapt by choosing among the numerous tRNA candidates and amino acid candidates. Thus, the degree of difficulty for the origin of life can be reduced to some extent. Yet, if both tRNAs and aaRSs had been rare, there would have been little opportunity to establish the correspondence between aaRSs and tRNAs.

2.4. Origin of aaRS

2.4.1. Para-Codon

On the one hand, an aaRS is able to recognise cognate tRNAs by para-codons (Figure 6b and Figure 8). On the other hand, the aaRS is able to catalyse the esterification of the proper amino acid to its cognate tRNA (Figure 8). The origin of aaRS is one of the most difficult events in the origin of life because a primordial mechanism must have been invented to generate the earliest proteins in the absence of ribosomes, and, meanwhile, aaRSs had to possess both para-codons and enzyme activity. The emergence of the first aaRS with enzyme activity should have been a rare critical event in primordial sequence evolution. Following this event, the enzyme activity could be transmitted from the common ancestor of aaRSs to all the descendant aaRSs, whether class I or class II. Thus, the evolution of para-codons came to play a leading role in the evolution of aaRSs. The evolution of aaRSs was closely related to both the evolution of tRNAs and the biosynthetic families of amino acids. The evolution of para-codons can be explained in the triplex picture. The para-codons of aaRSs coevolved with the sequences of tRNAs along the roadmap. The abilities to recognise certain amino acids came from the coevolution within the biosynthetic families of amino acids. According to the sequence evolution in the triplex picture, the recognition of tRNA by aaRS can be explained by the sequence homology between the template RNA of aaRS and the corresponding major or minor groove side sequence of tRNA. The recognition between aaRS and its template RNA led to the recognition between aaRS and the corresponding tRNA.
There are two types of tRNA according to the generation process of tRNA along the roadmap: type 5′-y_t r_t-3′ and type 5′-R_t Y_t-3′ (Figure 5a,b), where the 5′ side corresponds to the minor groove and the 3′ side to the major groove. Additionally, the aaRSs can combine with the two types of tRNAs from either the minor groove or the major groove (Figure 5c and Figure 8). Thus, there are four classes of aaRSs: class y_t m, class r_t M, class R_t m, and class Y_t M (Figure 5c and Figure 7). The four symbols indicate that the aaRSs combine with tRNAs, respectively, from the minor groove (m) side 5′-y_t (y) of the type 5′-y_t r_t-3′ tRNA, from the major groove (M) side r_t-3′ (r) of the type 5′-y_t r_t-3′ tRNA, from the minor groove (m) side 5′-R_t (R) of the type 5′-R_t Y_t-3′ tRNA, and from the major groove (M) side Y_t-3′ (Y) of the type 5′-R_t Y_t-3′ tRNA.
The evolution of aaRSs occurred between the four classes of aaRSs (Figure 7). The sequences of para-codons can evolve between homologous strands, and they can also evolve between complementary strands when the sequences of the para-codons are palindromic (Figure 7). According to the evolution of palindromic para-codons and the origin of the template RNA of aaRS (Figure 8), the class y_t m aaRS can be complementary to the class r_t M aaRS owing to the two complementary strands 5′-y_t and r_t-3′ that combine into the type 5′-y_t r_t-3′ tRNA (Figure 5a), and the class R_t m aaRS can be complementary to the class Y_t M aaRS owing to the two complementary strands 5′-R_t and Y_t-3′ that combine into the type 5′-R_t Y_t-3′ tRNA (Figure 5b). According to the evolution of palindromic para-codons and the coevolution of the template RNAs of aaRSs with tRNAs (Figure 7 and Figure 8), the class r_t M aaRS can be complementary to the class Y_t M aaRS, and the class R_t m aaRS can be complementary to the class y_t m aaRS. The class y_t m aaRS can be homologous to the class Y_t M aaRS, and the class r_t M aaRS can be homologous to the class R_t m aaRS. These relationships are useful for studying the evolution of aaRSs along the roadmap.
For convenience, the aaRSs are denoted in evolutionary order as aaRS1 to aaRS20 instead of GlyRS to LysRS, according to the recruitment order of the corresponding amino acids from No. 1 Gly to No. 20 Lys, respectively. The ancestor of aaRSs, namely the major groove aaRS1, belongs to the class r_t M aaRS; it catalysed the pairing between the amino acid No. 1 Gly and the tRNA t1, and it approaches the type 5′-y_t r_t-3′ tRNA t1 from the major groove side r_t-3′ (Figure 7). The aaRS1 evolved into aaRS2 of the same class and into aaRS7 of the class Y_t M (Figure 7). The aaRS2 evolved into aaRS3. According to the evolution of the Glu biosynthesis family, aaRS3 evolved into aaRS6, aaRS10, aaRS13, and, furthermore, aaRS14, and aaRS3 evolved into aaRS4 (Figure 7). According to the evolution of the Asp biosynthesis family, aaRS4 evolved into aaRS9, aaRS19, and, furthermore, aaRS15, aaRS16, and aaRS20 (Figure 7). According to the evolution of the Ser biosynthesis family, aaRS7 evolved into aaRS11 and aaRS12. According to the evolution of the Val biosynthesis family, aaRS2 evolved into aaRS5 and aaRS8. According to the evolution of the Phe biosynthesis family, aaRS8 evolved into aaRS17 and aaRS18. In general, the evolution via the Glu and Ser biosynthesis families took place in Hierarchy 1 and Hierarchy 2, corresponding to the codons whose second bases are G or C, while the evolution via the Asp, Val and Phe biosynthesis families took place in Hierarchy 3 and Hierarchy 4, corresponding to the codons whose second bases are A or U (Figure 5c). This result accounts for the observation that the second bases of codons relate to the biosynthesis families of amino acids (Figure 4c).
The evolution of aaRSs depends strongly on the para-codon evolution (Figure 7 and Figure 8). Some para-codons of aaRSs are homologous but not complementary to the previous para-codons. However, the para-codons of aaRSs that are complementary to the previous para-codons had to be palindromic. Some evolutions occurred within the same class, namely from aaRS1 to aaRS2, from aaRS3 to aaRS10, from aaRS15 to aaRS16, from aaRS4 to aaRS9, from aaRS4 to aaRS19, and from aaRS8 to aaRS17 (Figure 7). Some evolutions of palindromic para-codons occurred between class ytm and class rtM, namely from aaRS2 to aaRS3, from aaRS2 to aaRS5, from aaRS3 to aaRS4, from aaRS9 to aaRS15, and from aaRS19 to aaRS20 (Figure 7). Some evolutions of palindromic para-codons occurred between class Rtm and class YtM, namely from aaRS7 to aaRS11 and from aaRS17 to aaRS18 (Figure 7). In addition, the evolution from aaRS1 to aaRS7 occurred between class rtM and class YtM; from aaRS2 to aaRS8 occurred between class rtM and class Rtm; from aaRS3 to aaRS6, from aaRS3 to aaRS13 and from aaRS13 to aaRS14 occurred between class ytm and class YtM; and from aaRS11 to aaRS12 occurred between class Rtm and class ytm (Figure 7).
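The transitions listed above can be collected into a small parent-to-child table, which makes it easy to trace the lineage of any aaRS back to aaRS1. The following is only an illustrative sketch: the names `AARS_CHILDREN`, `AMINO_ACID`, `parent_map` and `lineage` are hypothetical, the edges are my reading of the text (the aaRS3 to aaRS13 step is taken from the Glu-family description), and class information is not modelled.

```python
# Parent -> children edges between aaRS indices (aaRS1 ... aaRS20),
# transcribed from the transitions described in the text (an assumption
# of this sketch, not a data set provided by the article).
AARS_CHILDREN = {
    1: [2, 7],
    2: [3, 5, 8],
    3: [4, 6, 10, 13],
    4: [9, 19],
    7: [11],
    8: [17],
    9: [15],
    11: [12],
    13: [14],
    15: [16],
    17: [18],
    19: [20],
}

# Recruitment order of the cognate amino acids, No. 1 Gly to No. 20 Lys.
AMINO_ACID = dict(enumerate(
    ["Gly", "Ala", "Glu", "Asp", "Val", "Pro", "Ser", "Leu", "Thr", "Arg",
     "Cys", "Trp", "His", "Gln", "Ile", "Met", "Phe", "Tyr", "Asn", "Lys"],
    start=1,
))


def parent_map(children):
    """Invert the parent -> children table into child -> parent."""
    return {child: parent for parent, kids in children.items() for child in kids}


def lineage(n, children=AARS_CHILDREN):
    """Return the chain of aaRS indices leading from aaRS1 to aaRS n."""
    parents = parent_map(children)
    path = [n]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    for n in (14, 16, 18):
        print(" -> ".join(f"aaRS{i} ({AMINO_ACID[i]})" for i in lineage(n)))
```

With these 19 edges every aaRS from aaRS2 to aaRS20 has exactly one parent, so the table describes a single tree rooted at aaRS1.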
The evolution of aaRSs along the roadmap helps to clarify the traditional classifications of aaRSs in the literature (Figure 4c), such as the major groove (M), minor groove (m) classification [31], or the class I (IA, IB, IC), class II (IIA, IIB, IIC) classification (Gesteland et al. 2006). The four-class ytm, rtM, Rtm, YtM classification here clarifies some confused ideas in the above classifications. The majority of class rtM aaRSs correspond to class IIA aaRSs, and the majority of class Rtm aaRSs correspond to class IA aaRSs, which indicates an evolution from IIA to IA due to the reverse sequence relationship between the RNA templates of class rtM aaRS and class Rtm aaRS (Figure 7). The majority of YtM aaRSs correspond to class IIA aaRSs, which were from the homologous rtM aaRSs. In addition, the majority of class ytm aaRSs correspond to class IA or IB aaRSs, which were from the complementary rtM aaRSs due to evolution of palindromic para-codons (Figure 7). The traditional classification of aaRSs by the major groove and minor groove is reasonable in practice because the template RNAs of aaRSs are complementary between the major groove class and the minor groove class, where the para-codons are palindromic to link the two classes. Meanwhile, the traditional classification of aaRSs by classes A, B, and C reflects some reasonable evolutionary relationships between aaRSs based on the evolution of the biosynthetic families.

2.4.2. Coevolution of tRNA with aaRS

A comprehensive study of the evolution of the genetic code inevitably involves the origins of tRNAs and aaRSs. The intricate evolutionary relationships between tRNAs and aaRSs can be explained step by step for each codon in the triplex picture (Figure 7). The initiation stage on the roadmap played a fundamental role. At the end of the initiation stage, arbitrary finite sequences could be generated, which provided opportunities to generate complex RNAs, such as tRNAs, the template RNAs for aaRSs, ribozymes and the prototype of rRNAs, coding and non-coding RNAs, etc. The primordial translation mechanism was invented during the evolution of the genetic code. There were a junior stage and a senior stage of the primordial translation mechanism (Figure 8). The ancestor of aaRSs originated in the junior stage when no tRNAs were involved (Figure 8). However, the tRNAs and ribosomes were indispensable in the senior stage of the primordial translation mechanism, as well as in the modern translation mechanism. Presumably, the translation efficiency was low in the junior stage, medium in the senior stage, and high in the modern translation mechanism. Non-standard translation has been observed in experiments, such as direct translation from DNA to protein [60,61].
The benefits of explaining the origins of tRNAs and aaRSs in the triplex picture are as follows. First, the ancestors of tRNAs and aaRSs did not originate from random sequences; the sequence evolution along the roadmap was recurrent, so the informative molecules were generated recurrently and accumulated in the prebiotic surroundings. Second, the evolutionary relationships between tRNAs and aaRSs can be naturally explained by the relationships of the homologous strands of the evolving triplex DNAs. The sequence of the template of the ancestor aaRS can be generated in the triplex picture by the junior stage of the primordial translation mechanism; meanwhile, the sequence of the ribozyme can also be generated by the other strand of the same triplex nucleic acid. Thus, the earliest proteins, such as the ancestor of aaRSs, can be generated by the complex consisting of the ribozyme, the RNA template of aaRS, as well as a triplex DNA. Such a complex itself was the product of sequence evolution of triplex nucleic acids based on specific substitutions of triplex base pairs, where both the sequence for the ribozyme and the sequence for the template of the ancestor aaRS with enzyme activity were generated in different strands of the same triplex DNA by chance. Although the efficiency of protein production was low in this junior stage, it was feasible to generate a small number of proteins by this complex consisting only of nucleic acids. The ancestor of aaRS with enzyme activity can be generated by this complex, and it naturally tends to combine with the corresponding RNA template.
If the sequence of tRNA is homologous to the above RNA template, the ancestor aaRS also tends to combine with the tRNA. Furthermore, the above requirement can be reduced to homologous para-codons. Thus, in the triplex picture, the aaRSs coevolved with the para-codons, while the tRNAs coevolved with the codons. When considering the homologous or complementary sequence relationships, the reverse sequence relationships and the base substitution relationships in the strands of triplex nucleic acids, the intricate evolutionary relationships between tRNAs and aaRSs can be revealed in detail (Figure 5c and Figure 7). It is more difficult to generate aaRSs than to generate tRNAs, so there existed numerous tRNA candidates in the prebiotic surroundings. Only the tRNAs that were recognised by aaRSs could be recruited into the living system. For example, the RNA 5′yt1rt1 3′ was recognised by the class rtM aaRS1, so it was chosen as the first tRNA t1 to transport 1Gly. The prime RNAs tn were recognised by aaRSn, so they were chosen as the tRNAs to transport No. n amino acids (Figure 5c and Figure 7), respectively. Similarly, the derivative RNAs tn′, tn+, tn+′, tn−, tn−′, with non-palindromic or palindromic para-codons homologous to the para-codons of tn, were recognised by aaRSn, so they became the tRNAs to transport No. n amino acids, respectively. Para-codons are the key factors for the recognition between tRNAs and aaRSs. The types of tRNAs are not necessarily the same for the cognate tRNAs. Generally, the aaRSs combine with the cognate tRNAs from the same side.
For example, aaRS8 combines with the 5′RtYt3′ type cognate tRNAs t8, t8′, t8+, t8−, and t8−′ from the minor groove side, where the para-codons can be non-palindromic (Figure 7); aaRS7 combines with the 5′RtYt3′ type tRNAs t7, t7+, t7+′ and the 5′ytrt3′ type tRNAs t7− from the major groove side, where the para-codons of the two types of tRNAs have to be palindromic (Figure 7). However, aaRS10 combines with the 5′ytrt3′ type tRNAs t10, t10′ and the 5′RtYt3′ type tRNA t10+ from the minor groove side, while it combines with the 5′ytrt3′ type tRNAs t10− and t10−′ from the major groove side, where the para-codons also need to be palindromic (Figure 7).
The biosynthetic families played essential roles in the evolution of aaRSs when both anti-codon and para-codon had changed (Figure 7). There were far more than 20 amino acids in the prebiotic surroundings. Only the amino acids that were recognised by aaRSs could be recruited into the living system. When aaRS1 evolved into aaRS2, aaRS2 recognised 2Ala, as well as t2, from the major groove side, a property inherited from aaRS1, which recognised 1Gly, as well as t1, from the major groove side. When aaRS2 evolved into aaRS3, aaRS3 recognised 3Glu, as well as t3, from the minor groove side owing to the palindromic para-codons, whereas its predecessor aaRS2 recognised 2Ala, as well as t2, from the major groove side. When aaRSs evolved within the same biosynthetic family (the Glu family, Asp family, Val family, Ser family, or Phe family), the new aaRSs tended to recruit new amino acids with similar chemical properties in the same biosynthetic family. When aaRSs evolved from aaRS1 to aaRS20, the enzyme activity was transmitted between the aaRSs, and the recognised tRNAs t1 to t20 and the recognised amino acids No. 1 Gly to No. 20 Lys were recruited, where the evolving non-palindromic or palindromic para-codons linked these evolutions.
The evolutionary pairs of aaRSs combining two sides of the same tRNAs along the roadmap agree with the results based on structures: IleRS and ThrRS, GlnRS (GluRS) and AspRS, and TyrRS and PheRS [4,62], and additionally SerRS and CysRS. The aaRS pair ThrRS and IleRS (namely aaRS9 and aaRS15) corresponds to an evolution from rtM aaRS9 to ytm aaRS15. The aaRS pair GluRS and AspRS (namely aaRS3 and aaRS4) corresponds to an evolution from ytm aaRS3 to rtM aaRS4. The aaRS pair PheRS and TyrRS (namely aaRS17 and aaRS18) corresponds to an evolution from Rtm aaRS17 to YtM aaRS18. The aaRS pair SerRS and CysRS (namely aaRS7 and aaRS11) corresponds to an evolution from YtM aaRS7 to Rtm aaRS11.
The recruitment order of the 20 amino acids from No. 1 to No. 20 can be obtained by the roadmap (Figure 3a and Figure 9), which meets the basic requirement that Phase I amino acids appeared earlier than the Phase II amino acids [1,2]. The species with complete genome sequences are sorted by the order R10/10 according to their amino acid frequencies, where R10/10 is defined as the ratio of the average amino acid frequency of the last 10 amino acids to that of the first 10 amino acids [8,36,63,64,65]. Along the evolutionary direction indicated by increasing R10/10, the amino acid frequencies vary monotonically, each in its own manner, for the 20 amino acids (Figure 9). For the early amino acids Gly, Ala, Asp, Val, Pro, the amino acid frequencies tend to decrease greatly, whereas the frequency of Glu increases slightly (Figure 9); for the midterm amino acids Ser, Leu, Thr, Cys, Trp, His, Gln, the amino acid frequencies tend to vary slightly, whereas the frequency of Arg decreases greatly (Figure 9); for the late amino acids Ile, Phe, Tyr, Asn, Lys, the amino acid frequencies tend to increase greatly, whereas the frequency of Met increases slightly (Figure 9). In the recruitment order from No. 1 to No. 20, the variation trends of the amino acid frequencies increase in general; namely, the later an amino acid was recruited, the more strongly its frequency tends to increase (Figure 9). The recruitment order of the amino acids from No. 1 to No. 20 is supported not only by the previous roadmap theory but also by this pattern of amino acid frequencies based on genomic data.
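The ratio R10/10 defined above is simple to compute once the recruitment order is fixed. The sketch below is a minimal illustration under stated assumptions: the function name `r10_ratio` and the constant `RECRUITMENT_ORDER` are hypothetical, the input is a single protein string rather than a whole proteome, and non-standard residue letters are simply ignored.

```python
from collections import Counter

# Recruitment order No. 1 (Gly) to No. 20 (Lys) in one-letter codes:
# Gly Ala Glu Asp Val Pro Ser Leu Thr Arg | Cys Trp His Gln Ile Met Phe Tyr Asn Lys
RECRUITMENT_ORDER = "GAEDVPSLTRCWHQIMFYNK"


def r10_ratio(protein_sequence):
    """Ratio of the mean frequency of the last 10 recruited amino acids
    to the mean frequency of the first 10 recruited amino acids."""
    counts = Counter(aa for aa in protein_sequence.upper() if aa in RECRUITMENT_ORDER)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    freq = {aa: counts[aa] / total for aa in RECRUITMENT_ORDER}
    first_mean = sum(freq[aa] for aa in RECRUITMENT_ORDER[:10]) / 10
    last_mean = sum(freq[aa] for aa in RECRUITMENT_ORDER[10:]) / 10
    if first_mean == 0:
        raise ValueError("none of the first 10 amino acids occur in the sequence")
    return last_mean / first_mean


if __name__ == "__main__":
    # Toy input: each early amino acid once, five of the late ones once.
    print(r10_ratio("GAEDVPSLTRCWHQI"))
```

For a genome-scale analysis the same function could be applied to the concatenated proteome of each species, and the species then sorted by the returned value.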

2.5. Recruitment of Codons

The roadmap only provided a logical substitution relationship of the 64 codons based on the stabilities of triplex base pairs (Figure 1a). It was the tRNAs and aaRSs that gave the genetic significance to the 64 codons (Figure 5c). The pair connections and route dualities observed in the recruitment of codons along the roadmap should be explained based on the coevolution of tRNAs with aaRSs (Figure 5b and Figure 7). The standard genetic code table can be comprehended in a biological context. Incidentally, the non-standard codons can also be explained.

2.5.1. Pair Connection

The pair connections can be explained by the coevolution of tRNAs with aaRSs when aaRSn recognises, respectively, both the prime tRNAs tn (in bold in the following pair connections and route dualities) and the corresponding derivative tRNAs tn′, tn+ and tn−, where the anti-codons of tRNAs change but the para-codons of tRNAs do not change, or when tn have the efficient ability to recognise similar codons by wobble pairings (Figure 5c and Figure 7). Taking #1-1Gly-#3 as an example, the 5′ytrt3′ type tRNA t1 and the class rtM aaRS1 originated at #1 on the roadmap, and the same type tRNA t1′ appeared at #3 on the roadmap. The aaRS1 for 1Gly can recognise both the same type tRNAs t1 and t1′ via the same para-codon. Namely, tRNAs t1 and t1′ recognise, respectively, the codons GGG at #1 and GGA at #3 on the purine strands (R) on the roadmap (Figure 5c).
The following pair connections are due to wobble pairings or the tRNA evolution from tn to tn′, both of which can be recognised by the respective same aaRSn (Figure 5c, Figure 6b and Figure 7).
1Gly, aaRS1, t1→t1′: #1 R-Gly-#3 R
2Ala, aaRS2, t2→t2′: #7 R-Ala-#9 R
3Glu, aaRS3, t3→t3′: #4 R-Glu-#23 R
4Asp, aaRS4, t4 wobbling: #5 R-Asp-#21 R
5Val, aaRS5, t5→t5′: #19 R-Val-#24 R
6Pro, aaRS6, t6 wobbling: #1 Y-Pro-#11 Y
7Ser, aaRS7, t7 wobbling: #3 Y-Ser-#14 Y
8Leu, aaRS8, t8→t8′: #20 Y-Leu-#25 Y
9Thr, aaRS9, t9→t9′: #16 R-Thr-#18 R
10Arg, aaRS10, t10→t10′: #10 R-Arg-#13 R
11Cys, aaRS11, t11 wobbling: #9 Y-Cys-#18 Y
12Trp, aaRS12, t12 wobbling: #12 R-Trp-#(15 R)
13His, aaRS13, t13 wobbling: #19 Y-His-#22 Y
14Gln, aaRS14, t14→t14′: #20 R-Gln-#28 R
15Ile/16Met, aaRS15/16, t15/t16: #29 R-Ile/Met-#22 R
17Phe, aaRS17, t17 wobbling: #23 Y-Phe-#32 Y
18Tyr, aaRS18, t18 wobbling: #24 Y-Tyr-#29 Y
19Asn, aaRS19, t19 wobbling: #26 R-Asn-#30 R
20Lys, aaRS20, t20→t20′: #27 R-Lys-#32 R
stop, no aaRS, no tRNA: #25 R-stop-#31 R
In particular, in the pair connection #29 R-Ile/Met-#22 R, aaRS15 for 15Ile evolved to aaRS16 for 16Met, and the corresponding t15 evolved to t16 by changing both anti-codon and para-codon.
The following pair connections are due to wobble pairings or the tRNA evolution from tn+ to tn+′, both of which can be recognised by the respective same aaRSn (Figure 5c, Figure 6b, and Figure 7).
1Gly, aaRS1, t1+ wobbling: #2 R-Gly-#6 R
2Ala, aaRS2, t2+ wobbling: #2 Y-Ala-#8 Y
5Val, aaRS5, t5+ wobbling: #5 Y-Val-#26 Y
6Pro, aaRS6, t6+→t6+′: #10 Y-Pro-#12 Y
7Ser, aaRS7, t7+→t7+′: #13 Y-Ser-#15 Y
8Leu, aaRS8, t8+ wobbling: #4 Y-Leu-#27 Y
9Thr, aaRS9, t9+ wobbling: #6 Y-Thr-#17 Y
10Arg, aaRS10, t10+ wobbling: #7 Y-Arg-#16 Y
15Ile, aaRS15, t15+ wobbling: #21 Y-Ile-#30 Y
The following pair connections are due to wobble pairings or the tRNA evolution from tn− to tn−′, both of which can be recognised by the respective same aaRSn (Figure 5c, Figure 6b, and Figure 7).
7Ser, aaRS7, t7− wobbling: #8 R-Ser-#17 R
8Leu, aaRS8, t8−→t8−′: #28 Y-Leu-#31 Y
10Arg, aaRS10, t10−→t10−′: #11 R-Arg-#14 R
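The three lists of standard pair connections above can be checked for internal consistency. The sketch below transcribes them as data and verifies that the 32 pair connections occupy each of the 64 slots (32 roadmap positions on the R strand and 32 on the Y strand) exactly once, and then counts codons per amino acid. The helper `pc` and the variable names are hypothetical; treating #15 R as a stop slot and assigning #29 R to Ile and #22 R to Met are my reading of the entries "#12 R-Trp-#(15 R)" and "#29 R-Ile/Met-#22 R".

```python
from collections import Counter


def pc(strand, pos_a, pos_b, label_a, label_b=None):
    """One pair connection: two roadmap slots on the same strand (R or Y)."""
    return [((pos_a, strand), label_a), ((pos_b, strand), label_b or label_a)]


PAIR_CONNECTIONS = [
    # first list (prime tRNAs and their wobbling or primed derivatives)
    pc("R", 1, 3, "Gly"), pc("R", 7, 9, "Ala"), pc("R", 4, 23, "Glu"),
    pc("R", 5, 21, "Asp"), pc("R", 19, 24, "Val"), pc("Y", 1, 11, "Pro"),
    pc("Y", 3, 14, "Ser"), pc("Y", 20, 25, "Leu"), pc("R", 16, 18, "Thr"),
    pc("R", 10, 13, "Arg"), pc("Y", 9, 18, "Cys"), pc("R", 12, 15, "Trp", "stop"),
    pc("Y", 19, 22, "His"), pc("R", 20, 28, "Gln"), pc("R", 29, 22, "Ile", "Met"),
    pc("Y", 23, 32, "Phe"), pc("Y", 24, 29, "Tyr"), pc("R", 26, 30, "Asn"),
    pc("R", 27, 32, "Lys"), pc("R", 25, 31, "stop"),
    # second list
    pc("R", 2, 6, "Gly"), pc("Y", 2, 8, "Ala"), pc("Y", 5, 26, "Val"),
    pc("Y", 10, 12, "Pro"), pc("Y", 13, 15, "Ser"), pc("Y", 4, 27, "Leu"),
    pc("Y", 6, 17, "Thr"), pc("Y", 7, 16, "Arg"), pc("Y", 21, 30, "Ile"),
    # third list
    pc("R", 8, 17, "Ser"), pc("Y", 28, 31, "Leu"), pc("R", 11, 14, "Arg"),
]

# Flatten to (slot, label) pairs and count codons per amino acid.
slots = [slot for connection in PAIR_CONNECTIONS for slot in connection]
degeneracy = Counter(label for _, label in slots)

if __name__ == "__main__":
    occupied = {key for key, _ in slots}
    expected = {(pos, strand) for pos in range(1, 33) for strand in "RY"}
    print("every slot used exactly once:", occupied == expected and len(slots) == 64)
    for label, n in sorted(degeneracy.items(), key=lambda item: (-item[1], item[0])):
        print(f"{label}: {n}")
```

Under these assumptions the counts reproduce the familiar pattern of the standard code: three amino acids with six codons, five with four, Ile with three, nine with two, Met and Trp with one, and three stop codons.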
The pair connections between non-standard codons are also due to the non-standard tRNA evolution. The non-standard tRNAs t n with non-standard anti-codons can also be recognised by a a R S n . The existence of non-standard codons indicates a variety of possibilities to choose tRNAs among the candidate tRNAs by the aaRSs during the evolution of the genetic code. The non-standard genetic code system can exist in case of certain metabolic cycle (Figure 5c and Figure 7).
7Ser, aaRS7, t 7 t 7 : #11 R-Ser-#14 Rstop, no aaRS, no tRNA: #11 R-Ser-#14 R
9Thr, aaRS9, t 9 wobbling: #4 Y-Thr-#27 Y9Thr, aaRS9, t 9 + t 9 + : #20 Y-Thr-#25 Y
14Gln, aaRS14, t 14 t 14 : #25 R-Gln-#31 R

2.5.2. Route Duality

Route duality refers to the relationships between pair connections in different routes. The route duality can also be explained by the coevolution of tRNAs with aaRSs when aaRSn recognises both the prime tRNAs tn and the corresponding derivative tRNAs tn+ and tn−, respectively. Taking the route duality #7-Ala-#9 ∼ #2-Ala-#8, for example, there were two pair connections: #7-Ala-#9 connecting via the 5′ytrt3′ type tRNAs t2, t2′ and #2-Ala-#8 connecting via the 5′RtYt3′ type tRNA t2+. The route duality between #7-Ala-#9 in Route 2 and #2-Ala-#8 in Route 1 is due to the fact that aaRS2 for 2Ala recognises both the tRNAs t2, t2′ and the different type tRNA t2+ by the same para-codon.
The following route dualities are due to the tRNA evolution from tn to tn+ or tn−, all of which can be recognised by the respective same aaRSn (Figure 5c, Figure 6b and Figure 7).
1Gly, aaRS1, t1→t1+: #1-Gly-#3 (Route 0) ∼ #2-Gly-#6 (Route 1)
2Ala, aaRS2, t2→t2+: #7-Ala-#9 (Route 2) ∼ #2-Ala-#8 (Route 1)
5Val, aaRS5, t5→t5+: #19-Val-#24 (Route 2) ∼ #5-Val-#26 (Route 1)
6Pro, aaRS6, t6→t6+: #1-Pro-#11 (Route 0) ∼ #10-Pro-#12 (Route 3)
7Ser, aaRS7, t7→t7+: #3-Ser-#14 (Route 0) ∼ #13-Ser-#15 (Route 3)
    and t7→t7−: #3-Ser-#14 (Route 0) ∼ #8-Ser-#17 (Route 1)
8Leu, aaRS8, t8→t8+: #20-Leu-#25 (Route 3) ∼ #4-Leu-#27 (Route 0)
    and t8→t8−: #20-Leu-#25 (Route 3) ∼ #28-Leu-#31 (Route 3)
9Thr, aaRS9, t9→t9+: #16-Thr-#18 (Route 2) ∼ #6-Thr-#17 (Route 1)
10Arg, aaRS10, t10→t10+: #10-Arg-#13 (Route 3) ∼ #7-Arg-#16 (Route 2)
    and t10→t10−: #10-Arg-#13 (Route 3) ∼ #11-Arg-#14 (Route 0)
The relationships between pair connections via aaRS evolution can be regarded as quasi route dualities (Figure 5c, Figure 6b and Figure 7).
3Glu/4Asp, t3/t4, aaRS3 → aaRS4: #4-Glu-#23 (Route 0) ∼ #5-Asp-#21 (Route 1)
7Ser/10Arg, t7/t10, aaRS7 / aaRS10: #8-Ser-#17 (Route 1) ∼ #11-Arg-#14 (Route 0)
11Cys/12Trp, t11/t12, aaRS11 → aaRS12: #9-Cys-#18 (Route 2) ∼ #12-Trp-(#15) (Route 3)
13His/14Gln, t13/t14, aaRS13 → aaRS14: #19-His-#22 (Route 2) ∼ #20-Gln-#28 (Route 3)
15Ile/16Met, t15, t16/t15+, aaRS15 → aaRS16: #29-Ile/Met-#22 (Route 2) ∼ #21-Ile-#30 (Route 1)
8Leu/17Phe, t8/t17, aaRS8 → aaRS17: #28-Leu-#31 (Route 3) ∼ #23-Phe-#32 (Route 0)
18Tyr/stop, t18, aaRS18: #24-Tyr-#29 (Route 2) ∼ #25-stop-#31 (Route 3)
19Asn/20Lys, t19/t20, aaRS19 → aaRS20: #26-Asn-#30 (Route 1) ∼ #27-Lys-#32 (Route 0)
The route dualities between non-standard pair connections are also due to the non-standard tRNA evolution. The non-standard tRNAs t n and t n + with non-standard anti-codons can also be recognised by the respective same a a R S n (Figure 5c and Figure 7). The phenomenon of non-standard genetic code is due to alternative choice of tRNAs by aaRSs as small probability events in the fulfilment of the genetic code.
7Ser, aaRS7, t 7 t 7 #8-Ser-#17 (Route 1) ∼ #11-(Ser)-#14 (Route 0)
9Thr, aaRS9, t 9 t 9 + #4-(Thr)-#27 (Route 0) ∼ #20-(Thr)-#25 (Route 3)
stop: #11-(stop)-#14 (Route 0) ∼ #15-stop-#31 (Route 3)
The 4 × 4 codon boxes in the standard genetic code table come from the 8 route dualities and the 8 quasi route dualities (Table 2 and Figure 4a,b), where the pair connections are from Hierarchy 1 to Hierarchy 2, from Hierarchy 2 to Hierarchy 3, and from Hierarchy 3 to Hierarchy 4, only. The route dualities exist only between Route 0 and Route 1, between Route 2 and Route 3, between Route 0 and Route 3, and between Route 1 and Route 2, but not between Route 0 and Route 2 or between Route 1 and Route 3 (Figure 4a,b).

2.6. Codon Degeneracy

The degeneracies 6, 4, 3, 2, or 1 for the 20 amino acids can be explained one by one according to pair connections and route dualities on the roadmap based on the coevolution of tRNAs with aaRSs in the triplex picture (Figure 5c, Figure 6b and Figure 7). In particular, the evolution of aaRSs based on the biosynthetic families played significant roles in the expansion of the genetic code. The degeneracy 2 mainly results from pair connections. The degeneracy 4 or 6 mainly results from the expansion of the genetic code from the initial subset by route dualities for Ser, Leu, Ala, Val, Pro, and Thr (Figure 3a,b).
The degeneracy 6 for Ser, Leu, and Arg can be explained by pair connections and route dualities (Figure 1a, Figure 3b, Figure 5c, Figure 6b and Figure 7), where Ser and Leu belong to the initial subset, and Arg was recruited immediately after the initial subset. All of them have appeared in Route 0. The 6 codons of Ser satisfy both the route duality and pair connection
#3-Ser-#14 ∼ #13-Ser-#15 and #8-Ser-#17.
The 6 codons of Leu satisfy both the route duality and pair connection
#20-Leu-#25 ∼ #4-Leu-#27 and #28-Leu-#31.
The 6 codons of Arg satisfy both the route duality and pair connection
#10-Arg-#13 ∼ #7-Arg-#16 and #11-Arg-#14.
The degeneracy 4 for Gly, Ala, Val, Pro, and Thr can be explained by route dualities (Figure 1a and Figure 3b). All of them belong to the initial subset. The degeneracy 4 for Gly satisfies the route duality:
#1-Gly-#3 ∼ #2-Gly-#6.
The degeneracy 4 for Ala satisfies the route duality:
#2-Ala-#8 ∼ #7-Ala-#9.
The degeneracy 4 for Val satisfies the route duality:
#5-Val-#26 ∼ #19-Val-#24.
The degeneracy 4 for Pro satisfies the route duality:
#1-Pro-#11 ∼ #10-Pro-#12.
The degeneracy 4 for Thr satisfies the route duality:
#6-Thr-#17 ∼ #16-Thr-#18.
The degeneracy 2 for Glu, Asp, Cys, His, Gln, Phe, Tyr, Asn, and Lys can be explained by pair connections (Figure 1a and Figure 3b). They satisfy the following pair connections, respectively: #4-Glu-#23, #5-Asp-#21, #9-Cys-#18, #19-His-#22, #20-Gln-#28, #23-Phe-#32, #24-Tyr-#29, #26-Asn-#30, #27-Lys-#32. The degeneracy 3 for Ile and the degeneracy 1 for Met satisfy the route duality (Figure 1a, Figure 3b, Figure 5c, Figure 6b and Figure 7):
#21-Ile-#30 ∼ #22-Met/Ile-#29.
The degeneracy 1 for Trp satisfies the pair connection for the non-standard genetic code #12-Trp/stop(Trp)-#15. This pair connection includes a stop codon; the other stop codons satisfy the pair connection #25-stop-#31 (Figure 1a, Figure 3b, Figure 5c, Figure 6b and Figure 7).
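The grouping of amino acids by degeneracy used in this section can be cross-checked directly against the standard genetic code table. The sketch below builds the standard table (written with T rather than U, in the conventional TCAG ordering) and groups the amino acids by their number of codons; the variable names are illustrative and the snippet says nothing about the roadmap itself.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the usual TCAG order (first base varies slowest,
# third base fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (and per stop signal).
DEGENERACY = Counter(CODON_TABLE.values())

# Group the 20 amino acids by degeneracy, leaving the stop signal aside.
BY_DEGENERACY = defaultdict(list)
for aa, n in sorted(DEGENERACY.items()):
    if aa != "*":
        BY_DEGENERACY[n].append(aa)

if __name__ == "__main__":
    for n in sorted(BY_DEGENERACY, reverse=True):
        print(n, "".join(BY_DEGENERACY[n]))
    print("stop codons:", DEGENERACY["*"])
```

The groups obtained this way are Leu, Arg, Ser (6); Ala, Gly, Pro, Thr, Val (4); Ile (3); Cys, Asp, Glu, Phe, His, Lys, Asn, Gln, Tyr (2); and Met, Trp (1), which matches the assignment explained above.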

3. Results

3.1. Driving Force in the Prebiotic Sequence Evolution

First, I propose an elegant roadmap for the evolution of the genetic code (Figure 1a). Around the middle of the last century, double helix DNAs, the genetic code, as well as triplex DNAs, were discovered, the former two of which greatly enhanced our understanding of life. There are indeed profound relationships among the above three discoveries. Although triple-helical nucleic acids are rare in vivo, they might be the unsung heroes in the origin of life. According to the substitutions of triplex base pairs from weak to strong along the roadmap, the recruitment of the 64 codons has been described from initiation to expansion and, finally, to the ending, and, hence, the perplexing codon degeneracy has been obtained.
The whole process is complicated and cumbersome, and has been explained step by step in the Methods section. Here is an overview of the basic process. Concretely speaking, the stability of the 16 triplex base pairs in triplex DNAs ranges from unstable (−) through weak (+) to strong (++, 3+, 4+) [10,11]. This experimentally determined stability order is crucial to establish a roadmap for the evolution of the genetic code. PolyC·PolyG*PolyG is a common and easily formed Y·R*R triplex DNA [10,13], which is bound together by the triplex base pair C·G*G. The sequences evolved via substitutions between triplex base pairs when the strands of triplex DNAs combined and separated alternately. Only three kinds of substitutions between triplex base pairs are practically required to obtain a complete set of 64 codons on the roadmap (Figure 1 and Figure 2): (1) substitution of (+) C·G*G by (++) C·G*A (transition from G to A with increasing stability from + to ++). This is the most common substitution on the roadmap, by which all the codons in Route 0 and most codons in Routes 1-3 were recruited; (2) substitution of (+) C·G*G by (4+) C·G*C (transversion from G to C with increasing stability from + to 4+), which blazed a new path at #2, #7, #10 for the recruitment of codons in Routes 1-3, respectively; (3) substitution of (+) G·C*C by (++) G·C*T (transition from C to T with increasing stability from + to ++) at #6, #19, #12, by which the remaining codons in Routes 1-3 were recruited.
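As a toy check of the claim that three kinds of substitution suffice, the sketch below treats a triplet on a single strand as a string and applies only the three base changes named above (G to A, G to C, C to T), starting from GGG. This is a deliberately simplified model of my own: it ignores strand pairing, the stability ordering, and the routes and hierarchies of the actual roadmap, and the names `SUBSTITUTIONS` and `reachable_triplets` are hypothetical.

```python
from collections import deque

# Base changes corresponding to the three substitutions of triplex base pairs:
# (1) G -> A (transition), (2) G -> C (transversion), (3) C -> T (transition).
SUBSTITUTIONS = {"G": ("A", "C"), "C": ("T",)}


def reachable_triplets(start="GGG"):
    """Breadth-first search over triplets reachable from `start` by applying
    one allowed base change at one position per step."""
    seen = {start}
    queue = deque([start])
    while queue:
        triplet = queue.popleft()
        for i, base in enumerate(triplet):
            for new_base in SUBSTITUTIONS.get(base, ()):
                mutated = triplet[:i] + new_base + triplet[i + 1:]
                if mutated not in seen:
                    seen.add(mutated)
                    queue.append(mutated)
    return seen


if __name__ == "__main__":
    print(len(reachable_triplets()))
```

In this simplified setting every one of the 64 triplets is reachable from GGG, because each position can pass from G to A, from G to C, and from C on to T. The sketch does not reproduce the order in which the roadmap recruits codons.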
Hence, a roadmap has been obtained with 4 Routes and 4 Hierarchies (Figure 1a, Figure 3b and Figure 4a). This unique roadmap has narrowly avoided those unstable triplex base pairs that can hinder the sequence evolution of triplex DNAs. The roadmap describes recruitments of both the 64 codons and the 20 amino acids in proper order during coevolution of tRNAs with aaRSs. The initial codon pair GGG·CCC (#1) corresponds to the amino acid pair Gly and Pro, and the consequent codon pair GGC·GCC (G to C at #2) corresponds to a new amino acid pair Gly and Ala. The obtained pair connection #1-Gly-#2 indicates that the common Gly is encoded by GGG in the former pair and GGC in the latter pair. Pair connections appear step by step along the roadmap, which relates to the evolution of the corresponding tRNAs. In addition, there are route dualities between pair connections, which relate to the evolution of the corresponding aaRSs. The expansion of codons along the roadmap has been explained by route dualities from the Phase I amino acids [34] Ala, Val, Pro, Ser, Leu, and Thr, which are due to recognition of tRNAs by the corresponding aaRSs step by step. In addition, stop codons and non-standard genetic code often occur at the ending stage. Thus, the intricate codon degeneracy has been obtained based on the incremental stability of triplex base pairs. In the triplex picture for the prebiotic evolution, the base substitution of triplex DNA drives both the recruitment of the 64 codons and the corresponding coevolution of tRNAs and aaRSs, step by step.
The benefit of the triplex picture is that nonrandom sequences can be generated routinely in the prebiotic evolution. The modification of homopolymers became a routine process in forming the codon degeneracy. This non-living apparatus based on sequence evolution of triplex DNAs was able to persist over geologically long periods, during which similar nonrandom sequences could be statistically generated again and again under selective pressure at any appropriate time. Hence, the nonrandom sequences, e.g., tRNAs and aaRSs, were able to emerge more efficiently than by any mechanism that chooses informative molecules from random sequences. Such an HfC-like apparatus based on sequence evolution of triplex DNAs vanished after the establishment of the genetic code system, and its relic may have remained in the triplex base pairs in tRNAs at present.

3.2. Explanation of Two Classes of aaRSs According to Coevolution of tRNAs with aaRSs

Then, I explain the coevolution of tRNAs with aaRSs (Figure 5, Figure 6 and Figure 7), by which the two classes of aaRSs [31] and the anti-codons and para-codons of tRNAs have been explained in detail. A comprehensive study of the evolution of the genetic code inevitably involves the intricate evolutionary relationships between tRNAs and aaRSs. The evolution of triple-helical nucleic acids D·D*D and D·D*R (D for DNA, R for RNA) [10] created conditions for coevolution of tRNAs and aaRSs along the roadmap. The third RNA strand R and its complementary strand can carry codons and anti-codons in sequence evolution along the roadmap, which, hence, accounts for the fact that the tRNAs can be assembled by pairs of these complementary RNAs [66] whose anti-codons evolved along the roadmap (Figure 5a,b and Figure 6a). Meanwhile, genes of aaRSs also evolved along the roadmap, which were homologous to the complementary [67,68] templates of major or minor groove sides of tRNAs. The recognition of a tRNA by a certain aaRS came from the combining ability between the aaRS and its gene that is homologous to the corresponding side of the tRNA. Hence, the recognition of tRNAs by aaRSs kept pace with the evolution of the genetic code along the roadmap. The tRNAs were relatively easy to assemble, so there existed numerous candidate tRNAs. Only tRNAs that were recognised by aaRSs were recruited into the living system. The genes of aaRSs were scarce, and their enzyme activity came from a common ancestor. The genes of the two classes of aaRSs evolved alternately in two complementary strands. Palindromic para-codons enabled an aaRS to recognise a tRNA by choosing its appropriate side.
The intricate relationships between tRNAs and aaRSs along the roadmap have been explained, which agrees with both the anti-codons of tRNAs and the two classes of aaRSs in observations (Figure 5). The evolution of aaRSs along the roadmap in the triplex picture helps to clarify the traditional classifications of aaRSs in the literature. The major/minor groove classification of aaRSs [31] can be accounted for by the complementary strands of the template RNAs of aaRSs, and the A/B/C sub-classification of aaRSs [69] relates to the impact from biosynthetic families of amino acids. In most cases, the aaRSs combine with the cognate tRNAs from the same side, whose classes are fixed. As a special case, aaRS10 (ArgRS) combines with the 5′ytrt3′ type tRNAs t10, t10′ and the 5′RtYt3′ type tRNA t10+ from the minor groove side, while it combines with the 5′ytrt3′ type tRNAs t10− and t10−′ from the major groove side, where the para-codons need to be palindromic. In the evolution from aaRS1 (GlyRS) to aaRS2 (AlaRS), for instance, aaRS2 (AlaRS) recognised 2Ala from the major groove side of t2, whose class follows the former aaRS1 (GlyRS) that recognised 1Gly from the major groove side of t1. In addition, in the consequent evolution from aaRS2 (AlaRS) to aaRS3 (GluRS), aaRS3 (GluRS) recognised 3Glu from the minor groove side of t3 instead, due to the palindromic para-codons. The biosynthetic families played significant roles in the evolution of aaRSs when both anti-codon and palindromic or non-palindromic para-codon evolved. When aaRSs evolved within the same biosynthetic families, the new aaRSs tended to recruit amino acids in the same biosynthetic family with similar chemical properties. Thus, the observed recognition of tRNAs from major or minor groove sides by aaRSs has been explained for the respective amino acids in detail (Figure 7).
The aaRS pair supposed to combine both sides of a tRNA simultaneously [4,62] should be amended as a new aaRS pair that combined one side of a tRNA and evolved to the other side. The pairs IleRS-ThrRS and TyrRS-PheRS appear both in the above literature and here. However, the pair GluRS-ThrRS in the above literature should be changed to GlnRS-ThrRS. In addition, the pair SerRS-CysRS that appears here was missing in the above literature.

3.3. Explanation of the Codon Degeneracy on the Genetic Code Chart

As the main result, the codon degeneracy should be explained based on the roadmap for the evolution of the genetic code (Figure 1) and the coevolution of tRNAs with aaRSs (Figure 5 and Figure 7). The intricate codon degeneracies are just the relics of learning process for the recognition of tRNAs by aaRSs. The pair connections and route dualities on the roadmap result from the evolution of tRNAs and the recognition of tRNAs by aaRSs (Figure 5). Especially, homologous aaRSs often evolved within the biosynthetic families of amino acids by combining either the same side or the opposite side of tRNAs (Figure 7). The 4 × 4 codon boxes in the standard genetic code table came from the 8 route dualities and the 8 quasi route dualities (Figure 1).
The degeneracies 6, 4, 3, 2, or 1 for the 20 amino acids have been explained, respectively, according to the corresponding pair connections and route dualities (Figure 1, Figure 5 and Figure 7). The large degeneracy 4 or 6 mainly results from the expansion of codons for the amino acids recruited in the initiation stage: Ser, Leu, Ala, Val, Pro, and Thr. The degeneracy 6 for Ser, Leu, and Arg is due to the following route dualities and pair connections, respectively: $\#3 - Ser - \#14 \sim \#13 - Ser - \#15$ and $\#8 - Ser - \#17$; $\#20 - Leu - \#25 \sim \#4 - Leu - \#27$ and $\#28 - Leu - \#31$; $\#10 - Arg - \#13 \sim \#7 - Arg - \#16$ and $\#11 - Arg - \#14$. The degeneracy 4 for Gly, Ala, Val, Pro and Thr is due to the following route dualities, respectively: $\#1 - Gly - \#3 \sim \#2 - Gly - \#6$, $\#2 - Ala - \#8 \sim \#7 - Ala - \#9$, $\#5 - Val - \#26 \sim \#19 - Val - \#24$, $\#1 - Pro - \#11 \sim \#10 - Pro - \#12$, $\#6 - Thr - \#17 \sim \#16 - Thr - \#18$. In addition, the degeneracy 2 for Glu, Asp, Cys, His, Gln, Phe, Tyr, Asn, and Lys is due to the following pair connections, respectively: $\#4 - Glu - \#23$, $\#5 - Asp - \#21$, $\#9 - Cys - \#18$, $\#19 - His - \#22$, $\#20 - Gln - \#28$, $\#23 - Phe - \#32$, $\#24 - Tyr - \#29$, $\#26 - Asn - \#30$, $\#27 - Lys - \#32$. The degeneracies 3 for Ile and 1 for Met are due to the route duality $\#21 - Ile - \#30 \sim \#22 - Met/Ile - \#29$. The degeneracy 1 for Trp satisfies the pair connection with a non-standard genetic code, $\#12 - Trp/stop(Trp) - \#15$. This pair connection includes a stop codon; the other stop codons satisfy the pair connection $\#25 - stop - \#31$. Incidentally, the route dualities for non-standard codons are also due to the recognition of non-standard tRNAs by the corresponding aaRSs: $\#8 - Ser - \#17 \sim \#11 - (Ser) - \#14$, $\#4 - (Thr) - \#27 \sim \#20 - (Thr) - \#25$, $\#11 - (stop) - \#14 \sim \#15 - stop - \#31$.
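The degeneracy classes listed above can be checked against the standard genetic code table. The sketch below is illustrative only and is not the author's method: it counts the codons assigned to each amino acid and groups the amino acids by that count. The names `CODE`, `THREE` and `degeneracy_classes` are hypothetical; the table is the standard code (NCBI translation table 1), so the non-standard assignments discussed in the text are not represented.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered U, C, A, G.
# "*" marks a stop codon. All names here are illustrative.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter amino acid names, matching the text.
THREE = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
         "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
         "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
         "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val"}


def degeneracy_classes(code):
    """Return ({degeneracy: [amino acids]}, number of stop codons)."""
    counts = Counter(code.values())
    classes = {}
    for aa, n in counts.items():
        if aa != "*":
            classes.setdefault(n, []).append(THREE[aa])
    ordered = {n: sorted(classes[n]) for n in sorted(classes, reverse=True)}
    return ordered, counts["*"]


classes, n_stop = degeneracy_classes(CODE)
for n, names in classes.items():
    print(n, names)
print("stop codons:", n_stop)
```

The grouping reproduces the classes in the text: three amino acids with degeneracy 6, five with 4, Ile with 3, nine with 2, and Met and Trp with 1, which together with the three stop codons account for all 64 codons.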

4. Conclusions

In the present prebiotic picture with selective pressure, both the codon degeneracy and the major/minor groove classification of aaRSs have been explained together, consistently with the observations reported in the literature.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/genes12122023/s1, Movie S1: Substitutions of triplex base pairs along the roadmap.

Funding

Supported by the Fundamental Research Funds for the Central Universities (xjtu08143058).

Acknowledgments

My warm thanks go to Jinyi Li for valuable discussions. The author thanks the anonymous reviewers for their helpful comments.

Conflicts of Interest

The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Wong, J.T. A coevolution theory of the genetic code. Proc. Natl. Acad. Sci. USA 1975, 72, 1909–1912.
  2. Wong, J.T.; Lazcano, A. Prebiotic Evolution and Astrobiology; Landes Bioscience: Austin, TX, USA, 2009.
  3. De Pouplana, L.R. (Ed.) The Genetic Code and the Origin of Life; Kluwer Academic: New York, NY, USA, 2004.
  4. De Pouplana, L.R.; Schimmel, P. Aminoacyl-tRNA synthetases: Potential markers of genetic code development. Trends Biochem. Sci. 2001, 26, 591–596.
  5. Woese, C.R.; Kandler, O.; Wheelis, M.L. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. USA 1990, 87, 4576–4579.
  6. Gibson, D.G.; Glass, J.I.; Lartigue, C.; Noskov, V.N.; Chuang, R.Y.; Algire, M.A.; Benders, G.A.; Montague, M.G.; Ma, L.; Moodie, M.M.; et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 2010, 329, 52–56.
  7. Wong, J.T. Coevolution theory of the genetic code at age thirty. BioEssays 2005, 27, 416–425.
  8. Trifonov, E.N.; Gabdank, I.; Barash, D.; Sobolevsky, Y. Primordia vita. Deconvolution from modern sequences. Orig. Life Evol. Biosph. 2006, 36, 559–565.
  9. Reznick, J.S. Embracing the future as stewards of the past: Charting a course forward for historical medical libraries and archives. RBM 2014, 15, 111–123.
  10. Soyfer, V.N.; Potaman, V.N. Triple-Helical Nucleic Acids; Springer: New York, NY, USA, 1996.
  11. Belotserkovskii, B.P.; Veselkov, A.G.; Filippov, S.A.; Dobrynin, V.N.; Mirkin, S.M.; Frank-Kamenetskii, M.D. Formation of intramolecular triplex in homopurine-homopyrimidine mirror repeats with point substitutions. Nucleic Acids Res. 1990, 18, 6621–6624.
  12. Sklenář, V.; Feigon, J. Formation of a stable triplex from a single DNA strand. Nature 1990, 345, 836–838.
  13. Frank-Kamenetskii, M.D. Triplex DNA structures. Annu. Rev. Biochem. 1995, 64, 65–95.
  14. Robertus, J.D.; Ladner, J.E.; Finch, J.T.; Rhodes, D.; Brown, R.S.; Clark, B.F.C.; Klug, A. Structure of yeast phenylalanine tRNA at 3 Å resolution. Nature 1974, 250, 546–551.
  15. Oro, J. Mechanism of synthesis of adenine from hydrogen cyanide under possible primitive Earth conditions. Nature 1961, 191, 1193–1194.
  16. Orgel, L.E. Prebiotic chemistry and the origin of the RNA world. Crit. Rev. Biochem. Mol. Biol. 2006, 39, 99–123.
  17. Miyakawa, S.; Murasawa, K.; Kobayashi, K.; Sawaoka, A.B. Abiotic synthesis of guanine with high temperature plasma. Orig. Life Evol. Biosph. 2000, 30, 557–566.
  18. Ferris, J.P.; Sanchez, R.A.; Orgel, L.E. Studies in prebiotic synthesis. 3. Synthesis of pyrimidines from cyanoacetylene and cyanate. J. Mol. Biol. 1968, 33, 693–704.
  19. Sanchez, R.; Ferris, J.P.; Orgel, L.E. Cyanoacetylene in prebiotic synthesis. Science 1966, 154, 784–785.
  20. Xu, J.; Tsanakopoulou, M.; Magnani, C.J.; Szabla, R.; Šponer, J.E.; Šponer, J.; Góra, R.W.; Sutherland, J.D. A prebiotically plausible synthesis of pyrimidine β-ribonucleosides and their phosphate derivatives involving photoanomerization. Nat. Chem. 2017, 9, 303–309.
  21. Li, L.; Prywes, N.; Tam, C.P.; O’Flaherty, D.K.; Lelyveld, V.S.; Izgu, E.C.; Pal, A.; Szostak, J.W. Enhanced nonenzymatic RNA copying with 2-aminoimidazole activated nucleotides. J. Am. Chem. Soc. 2017, 139, 1810–1813.
  22. Becker, S.; Feldmann, J.; Wiedemann, S.; Okamura, H.; Schneider, C.; Iwan, K.; Crisp, A.; Rossa, M.; Amatov, T.; Carell, T. Unified prebiotically plausible synthesis of pyrimidine and purine RNA ribonucleotides. Science 2019, 366, 76–82.
  23. Powner, M.W.; Zheng, S.; Szostak, J.W. Multicomponent assembly of proposed DNA precursors in water. J. Am. Chem. Soc. 2012, 134, 13889–13895.
  24. Trevino, S.G.; Zhang, N.; Elenko, M.P.; Lupták, A.; Szostak, J.W. Evolution of functional nucleic acids in the presence of nonheritable backbone heterogeneity. Proc. Natl. Acad. Sci. USA 2011, 108, 13492–13497.
  25. Bhowmik, S.; Krishnamurthy, R. The role of sugar-backbone heterogeneity and chimeras in the simultaneous emergence of RNA and DNA. Nat. Chem. 2019, 11, 1009–1018.
  26. Xu, J.; Green, N.J.; Gibard, C.; Krishnamurthy, R.; Sutherland, J.D. Prebiotic phosphorylation of 2-thiouridine provides either nucleotides or DNA building blocks via photoreduction. Nat. Chem. 2019, 11, 457–462.
  27. Xu, J.; Chmela, V.; Green, N.J.; Russell, D.A.; Janicki, M.J.; Góra, R.W.; Szabla, R.; Bond, A.D.; Sutherland, J.D. Selective prebiotic formation of RNA pyrimidine and DNA purine nucleosides. Nature 2020, 582, 60–66.
  28. Wong, J.T.; Ng, S.; Mat, W.; Hu, T.; Xue, H. Coevolution theory of the genetic code at age forty: Pathway to translation and synthetic life. Life 2016, 6, 12.
  29. Nirenberg, M.W.; Matthaei, J.H. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc. Natl. Acad. Sci. USA 1961, 47, 1588–1602.
  30. Crick, F.H.C. Codon-anticodon pairing: The wobble hypothesis. J. Mol. Biol. 1966, 19, 548–555.
  31. Eriani, G.; Delarue, M.; Poch, O.; Gangloff, J.; Moras, D. Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 1990, 347, 203–206.
  32. Miller, S.L.; Urey, H.C. Organic compound synthesis on the primitive earth. Science 1959, 130, 245–251.
  33. Engel, M.H.; Macko, S.A. Isotopic evidence for extraterrestrial non-racemic amino acids in the Murchison meteorite. Nature 1997, 389, 265–268.
  34. Wong, J.T. Coevolution of the genetic code and amino acid biosynthesis. Trends Biochem. Sci. 1981, 6, 33–36.
  35. Kobayashi, K.; Kaneko, T.; Saito, T.; Oshima, T. Amino acid formation in gas mixtures by high energy particle irradiation. Orig. Life Evol. Biosph. 1998, 28, 155–165.
  36. Li, D.J.; Zhang, S. Genetic code evolution as an initial driving force for molecular evolution. Phys. A 2009, 388, 3809–3825.
  37. Muto, A.; Osawa, S. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. USA 1987, 84, 166–169.
  38. Woese, C.R.; Dugre, D.H.; Dugre, S.A.; Kondo, M.; Saxinger, W.C. On the fundamental nature and evolution of the genetic code. Cold Spring Harb. Symp. Quant. Biol. 1966, 31, 723–736.
  39. Crick, F.H.C. The origin of the genetic code. J. Mol. Biol. 1968, 38, 367–379.
  40. Yarus, M. A specific amino acid binding site composed of RNA. Science 1988, 240, 1751–1758.
  41. Di Giulio, M. The Extension Reached by the Minimization of the Polarity Distances during the Evolution of the Genetic Code. J. Mol. Evol. 1989, 29, 288–293.
  42. Di Giulio, M. Some Aspects of the Organization and Evolution of the Genetic Code. J. Mol. Evol. 1989, 29, 191–201.
  43. Osawa, S.; Jukes, T.H. Codon Reassignment (Codon Capture) in Evolution. J. Mol. Evol. 1989, 28, 271–278.
  44. Root-Bernstein, R. Simultaneous origin of homochirality, the genetic code and its directionality. BioEssays 2007, 29, 689–698.
  45. Rodin, A.S.; Szathmáry, E.; Rodin, S.N. One ancestor for two codes viewed from the perspective of two complementary modes of tRNA aminoacylation. Biol. Direct 2009, 4, 4.
  46. Knight, R.D.; Freel, S.J.; Landweber, L.F. Rewiring the keyboard: Evolvability of the genetic code. Nat. Rev. Genet. 2001, 2, 49–58.
  47. Sengupta, S.; Higgs, P.G. Pathways of Genetic Code Evolution in Ancient and Modern Organisms. J. Mol. Evol. 2015, 80, 229–243.
  48. Sengupta, S.; Yang, X.; Higgs, P.G. The Mechanisms of Codon Reassignments in Mitochondrial Genetic Codes. J. Mol. Evol. 2007, 64, 662–688.
  49. Escudé, C.; François, J.C.; Sun, J.S.; Ott, G.; Sprinzl, M.; Garestier, T.; Hélène, J.C. Stability of triple helices containing RNA and DNA strands: Experimental and molecular modeling studies. Nucleic Acids Res. 1993, 21, 5547–5553.
  50. Han, H.; Dervan, P.B. Sequence-specific recognition of double helical RNA and RNA·DNA by triple helix formation. Proc. Natl. Acad. Sci. USA 1993, 90, 3806–3810.
  51. Wang, S.; Kool, E.T. Relative stabilities of triple helices composed of combinations of DNA, RNA and 2’-O-methyl-RNA backbones: Chimeric circular oligonucleotides as probes. Nucleic Acids Res. 1995, 23, 1157–1164.
  52. Altwegg, K.; Balsiger, H.; Bar-Nun, A.; Berthelier, J.J.; Bieler, A.; Bochsler, P.; Briois, C.; Calmonte, U.; Combi, M.R.; Cottin, H.; et al. Prebiotic chemicals–amino acid and phosphorus–in the coma of comet 67P/Churyumov-Gerasimenko. Sci. Adv. 2016, 2, e1600285.
  53. Li, D.J. Concurrent origins of the genetic code and the homochirality of life, and the origin and evolution of biodiversity Part I: Observations and explanations. bioRxiv 2015.
  54. Roberts, R.W.; Crothers, D.M. Stability and properties of double and triple helices: Dramatic effects of RNA or DNA backbone composition. Science 1992, 258, 1463–1466.
  55. Di Giulio, M. On the origin of the transfer RNA molecule. J. Theor. Biol. 1992, 159, 199–214.
  56. Di Giulio, M. Was it an ancient gene codifying for a hairpin RNA that, by means of direct duplication, gave rise to the primitive tRNA molecule? J. Theor. Biol. 1995, 177, 95–101.
  57. Di Giulio, M. The nonmonophyletic origin of tRNA molecule. J. Theor. Biol. 1999, 197, 403–414.
  58. Di Giulio, M. The origin of the tRNA molecule: Implications for the origin of protein synthesis. J. Theor. Biol. 2004, 226, 89–93.
  59. Di Giulio, M. Nanoarchaeum equitans is a living fossil. J. Theor. Biol. 2006, 242, 257–260.
  60. McCarthy, B.J.; Holland, J.J. Denatured DNA as a Direct Template for in vitro Protein Synthesis. Proc. Natl. Acad. Sci. USA 1965, 54, 880–886.
  61. Uzawa, T.; Yamagishi, A.; Oshima, T. Polypeptide Synthesis Directed by DNA as a Messenger in Cell-Free Polypeptide Synthesis by Extreme Thermophiles, Thermus thermophilus HB27 and Sulfolobus tokodaii Strain 7. J. Biochem. 2002, 131, 849–853.
  62. Schimmel, P. Development of tRNA synthetases and connection to genetic code and disease. Protein Sci. 2008, 17, 1643–1652.
  63. Trifonov, E.N. Consensus temporal order of amino acids and evolution of the triplet code. Gene 2000, 261, 139–151.
  64. Trifonov, E.N. The triplet code from first principles. J. Biomol. Struct. Dyn. 2004, 22, 1.
  65. Trifonov, E.N.; Kirzhner, A.; Kirzhner, V.M.; Berezovsky, I.N. Distinct stages of protein evolution as suggested by protein sequence analysis. J. Mol. Evol. 2001, 53, 394–401.
  66. Widmann, J.; Di Giulio, M.; Yarus, M.; Knight, R. tRNA creation by hairpin duplication. J. Mol. Evol. 2005, 61, 524–530.
  67. Rodin, S.N.; Ohno, S. Two types of aminoacyl-tRNA synthetases could be originally encoded by complementary strands of the same nucleic acid. Orig. Life Evol. Biosph. 1995, 25, 565–589.
  68. Martinez-Rodriguez, L.; Erdogan, O.; Jimenez-Rodriguez, M.; Gonzalez-Rivera, K.; Williams, T.; Li, L.; Weinreb, V.; Collier, M.; Chandrasekaran, S.N.; Ambroggio, X.; et al. Functional class I and II amino acid-activating enzymes can be coded by opposite strands of the same gene. J. Biol. Chem. 2015, 290, 19710–19725.
  69. Gesteland, R.F. (Ed.) The RNA World, 3rd ed.; Cold Spring Harbor Laboratory: New York, NY, USA, 2006.
Figure 1. The origin of the genetic code. (a) The roadmap for the evolution of the genetic code. The 64 codons formed from base substitutions in triplex DNAs are in red. Only three-base-length segments of the triplex DNAs are shown explicitly; the full-length right-handed triplex DNAs are indicated in Figure 1b. At each position #n ($n = 1, 2, \ldots, 32$), the #n codon pair on $R_n$ and $Y_n$ is in red. The relative stabilities of the triplex base pairs (−, +, ++, 4+) are written to the right of the base triplexes, where the increases in relative stability brought about by base substitutions are indicated in green. Each triplex DNA is denoted by three arrows, whose directions are from 5′ to 3′. The YRR triplex DNAs are in pink, and the YRY triplex DNAs in azure. The recruitment order of codon pairs runs from #1 to #32, and the recruitment order of the 20 amino acids is given to the left of them. Non-standard genetic codes are indicated by brackets beside the corresponding amino acids. Route 0–3 and Hierarchy 1–4 are indicated to the right of and below the roadmap, respectively. The evolution of the genetic code is denoted by black arrows, beside which pair connections are indicated by the corresponding amino acids. Refer to the example in Figure 1b to understand the details of the roadmap; refer to Figure 2 to understand the critical role of the relative stabilities of triplex base pairs in achieving the real genetic code; refer to Figure 5a,b to see the origin of tRNAs; refer to Figure 3a to see the coherent relationship between the recruitment orders of codons and amino acids; refer to Figure 3b to see the codon degeneracy in the symmetric roadmap. (b) A detailed description of the roadmap (see Supplementary Movie S1). Taking the path from #1 to #29 as an example, the evolution of the genetic code from #1 to #7, to #19, to #24, and, at last, to #29 is explained in detail in the upper boxes, and the corresponding right-handed single-stranded, double-stranded, and triple-stranded DNAs are shown in the lower boxes, respectively.
Figure 2. The driving force in the evolution of the genetic code based on the relative stabilities of triplex base pairs. The base substitutions on the roadmap occur when the relative stabilities of triplex base pairs increase. The roadmap is the best outcome for avoiding unstable triplex base pairs, so the universal genetic code is a narrow choice imposed by the relative stabilities of triplex base pairs. The relative stability increases from (+) for the triplex base pair CGG to (4+) for the triplex base pair CGC at #2, #7, and #10, which initiate Route 1–3, respectively. GCC (+) changes to GCT (++) at #6, #19, and #12, and CGG (+) changes to CGA (++) at other positions on the roadmap.
Figure 3. (a) Cooperative recruitment of codons and amino acids. Codon pairs are plotted from left to right according to their recruitment order. The initial subset plays a crucial role in the expansion of the genetic code along the roadmap. The 6 biosynthetic families of the amino acids are distinguished by different colours. (b) The cubic roadmap. This is a revised version of the roadmap in Figure 1a that indicates the symmetry in the evolution of the genetic code. The four routes are represented by four cubes, respectively. Pair connections are marked beside the evolutionary arrows. Route dualities are indicated by the same colours for the corresponding pair connections.
Figure 4. (a) The distribution of codons from the R- and Y-strands of Route 0–3 in the GCAU genetic code table. The pattern of the 4 × 4 codon boxes for the degenerate codons relates to this distribution of the four routes, owing to the evolution of the genetic code along the roadmap. (b) The GCAU genetic code table, showing the clusterings of biosynthetic families (Glu, Asp, Val, Ser, Phe). Such clusterings are correspondingly observed in the R- and Y-strands of Route 0–3 in Figure 3b (denoted by the same group of colours as in the present figure). The clusterings of biosynthetic families in the present figure are closely related to the distribution of codons from the R- and Y-strands of Route 0–3, owing to the recruitment of amino acids along the roadmap. Generally speaking, the amino acids are arranged in the recruitment order from No. 1 to No. 20 along the direction from G, C to A, U in the GCAU genetic code table. (c) The distribution of types of aaRSs in the GCAU genetic code table. The aaRSs can be divided into Class II and Class I, which can be divided into subclasses IIA, IIB, IIC and IA, IB, IC, respectively. The aaRSs can also be divided into minor groove ones (m) and major groove ones (M).
Figure 5. The origin and evolution of tRNAs along the roadmap. (a) The evolution of the $5'y_t r_t 3'$ type tRNAs by the triplex base pairings $y r y_t$ and $y r r_t$. (b) The evolution of the $5'R_t Y_t 3'$ type tRNAs by the triplex base pairings $y r R$, $y r Y$, $Y R Y_t$ and $Y R R_t$. The node numbers #n on the roadmap may exchange within or between routes because the sequences of Y and R are reversed with respect to the sequences of y and r, respectively. (c) The coevolution of tRNAs with aaRSs along the roadmap, which determines the pair connections and route dualities. The aaRSs $aaRS_1$ to $aaRS_{20}$ combine, respectively, with the tRNAs $t_1$ to $t_{20}$ from a certain major or minor groove side. The complementary relationship between the pyrimidine $y_t$ strand of the $5'y_t r_t 3'$ type tRNAs and the purine $R_t$ strand of the $5'R_t Y_t 3'$ type tRNAs agrees with the complementary relationship between G and C for the second bases of the consensus genes of tRNAs, especially for the early tRNAs in Route 0 and in Hierarchy 1.
Figure 6. (a) The assembly of tRNAs. The tRNAs $t_1$ to $t_{20}$ with anti-codons (Figure 5c) are listed here to carry the amino acids from No. 1 to No. 20, respectively. The two complementary single-stranded RNAs for each tRNA join together and fold into a cloverleaf shape by taking advantage of the complementarity between the two strands. The joining position of the two strands is near the 3′ side of the anti-codon loop, which agrees with the observed position of introns in tRNA genes. The anti-codons are situated at the 3′-ends of the $y_t$ strand or $R_t$ strand. The palindromic sequences tend to form the loops of the tRNAs. The para-codons of tRNAs are non-palindromic or palindromic, adapting to the aaRSs (Figure 7 and Figure 8). (b) The cognate tRNAs. The number of canonical amino acids, 20, is explained on the basis of the relationship between the types of cognate tRNAs and the 20 types of base combinations. The primer tRNAs generally appeared earlier than the derivative tRNAs. The primer tRNAs are generally distributed along the diagonal line owing to the chronological arrangements of both the 20 amino acids and the 20 base combinations, considering the substitution order G, C, A, U along the roadmap. The codon degeneracies 6, 4, 3, 2, and 1 are due to the evolution of each tRNA $t_n$ into variant tRNAs such as $t_n^{+}$, all of which can be recognised by the corresponding $aaRS_n$.
Figure 7. The coevolution of tRNAs with aaRSs. The four classes of aaRSs and the corresponding two types of tRNAs coevolved in accordance with the biosynthetic families, which are indicated by colours. The ancestor of the aaRSs, namely $aaRS_1$, corresponding to the non-chiral amino acid No. 1 Gly, belongs to the $r_t M$ class. The codon degeneracy is due to the coevolution of tRNAs with aaRSs, where the surplus tRNAs were chosen by the rare aaRSs. There is some truth in the traditional classifications of aaRSs, but the evolutionary relationships among aaRSs are more intricate, as shown here. The start and stop codons generally appear in the positions corresponding to the $y_t m$ class. The non-standard codons also evolved as alternative choices of tRNAs by aaRSs.
Figure 8. The origin and evolution of the four classes of early aaRSs in the junior stage of the primordial translation mechanism, in the absence of tRNAs and ribosomes. The first aaRS can be produced through the non-random evolution of the triplex DNA and the corresponding RNAs. At the beginning of the translation mechanism, DNAs are the carriers of information, and RNAs develop the functions of life.
Figure 9. The recruitment orders of amino acids and codon pairs on the roadmap are supported by the variation of the amino acid frequencies. The 20 amino acids are arranged in their recruitment order on the roadmap (Figure 1a). The frequencies of the 20 amino acids are obtained for each of the 803 species from the genomic data in NCBI. For each amino acid, the 803 frequencies (green dots) are arranged in the R10/10 order [36]. The variation trend of the frequencies for each amino acid is given by the regression line (shown in red). In general, the trends for the earlier amino acids tend to decrease, whereas those for the latecomers tend to increase.
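The trend line described in the Figure 9 caption can be illustrated with an ordinary least-squares slope. The following is a minimal sketch, assuming the frequencies of one amino acid are already sorted in the species order used in the figure. The function name `regression_slope` and the toy numbers are hypothetical and are not data from the article; the article does not state which regression procedure or software was used.

```python
def regression_slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ...

    `values` stands for the frequencies of one amino acid, already sorted
    in the species order (assumption of this sketch). Needs at least 2 points.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    return sxy / sxx


# Hypothetical toy series, NOT taken from the article's 803-species data set.
toy_early = [0.09, 0.08, 0.08, 0.07]   # a decreasing trend
toy_late = [0.02, 0.03, 0.03, 0.04]    # an increasing trend

print(regression_slope(toy_early) < 0)  # negative slope for the first series
print(regression_slope(toy_late) > 0)   # positive slope for the second series
```

A negative slope corresponds to the decreasing trend the caption reports for the earlier amino acids, and a positive slope to the increasing trend reported for the latecomers.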
Table 1. Selective pressure due to the unique roadmap with increasing stability.
| Stability | CG*N | GC*N | TA*N | AT*N |
|---|---|---|---|---|
| (−) | | GC*A | | AT*C, AT*A |
| (+) | CG*G | GC*C, GC*G | TA*C, TA*G, TA*A | AT*T |
| (++) | CG*A, CG*T | GC*T | | |
| (3+) | | | | AT*G |
| (4+) | CG*C | | TA*T | |

| (+)CG*G → (++)CG*A, increase in stability | (+)GC*C → (−)GC*A, unstable | (+)TA*A → (+)TA*G, no increase in stability | (+)AT*T → (3+)AT*G |
| (+)CG*G → (4+)CG*C, increase in stability | (+)GC*C → (+)GC*G, no increase in stability | (+)TA*A → (4+)TA*T | (+)AT*T → (−)AT*A, unstable |
| (+)GC*C → (++)GC*T, increase in stability | (+)CG*G → (++)CG*T | (+)AT*T → (+)AT*C, no increase in stability | (+)TA*A → (+)TA*C, no increase in stability |
| POSSIBLE (Roadmap) | Impossible | Impossible | Impossible |

| (+)CG*G → (++)CG*T | (+)GC*C → (++)GC*T | (+)TA*A → (+)TA*C, no increase in stability | (+)AT*T → (−)AT*C, unstable |
| (+)CG*G → (4+)CG*C | (+)GC*C → (+)GC*G, no increase in stability | (+)TA*A → (4+)TA*T | (+)AT*T → (−)AT*A, unstable |
| (+)GC*C → (−)GC*A, unstable | (+)CG*G → (++)CG*A | (+)AT*T → (3+)AT*G | (+)TA*A → (+)TA*G, no increase in stability |
| Impossible | Impossible | Impossible | Impossible |
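The selection rule behind Table 1 can be sketched in a few lines. This is a minimal illustration, assuming that a candidate set of three substitutions counts as possible only if every step strictly increases the stability level; that reading, the numeric encoding of the levels, and the names `STABILITY`, `classify`, `is_possible` and `CANDIDATES` are assumptions of the sketch rather than definitions from the article. For AT*C the sketch uses the (−) level given in the stability rows of the table.

```python
# Stability levels transcribed from the stability rows of Table 1.
# Numeric encoding is an assumption of this sketch:
# (-) -> -1, (+) -> 1, (++) -> 2, (3+) -> 3, (4+) -> 4.
STABILITY = {
    "CG*G": 1, "CG*A": 2, "CG*T": 2, "CG*C": 4,
    "GC*A": -1, "GC*C": 1, "GC*G": 1, "GC*T": 2,
    "TA*C": 1, "TA*G": 1, "TA*A": 1, "TA*T": 4,
    "AT*C": -1, "AT*A": -1, "AT*T": 1, "AT*G": 3,
}


def classify(src, dst):
    """Label one substitution step by how the stability level changes."""
    if STABILITY[dst] < 0:
        return "unstable"
    if STABILITY[dst] > STABILITY[src]:
        return "increase in stability"
    return "no increase in stability"


def is_possible(steps):
    """A candidate set is kept only if every step increases the stability."""
    return all(classify(s, d) == "increase in stability" for s, d in steps)


# The eight candidate sets (columns) listed in Table 1, in reading order.
CANDIDATES = [
    [("CG*G", "CG*A"), ("CG*G", "CG*C"), ("GC*C", "GC*T")],
    [("GC*C", "GC*A"), ("GC*C", "GC*G"), ("CG*G", "CG*T")],
    [("TA*A", "TA*G"), ("TA*A", "TA*T"), ("AT*T", "AT*C")],
    [("AT*T", "AT*G"), ("AT*T", "AT*A"), ("TA*A", "TA*C")],
    [("CG*G", "CG*T"), ("CG*G", "CG*C"), ("GC*C", "GC*A")],
    [("GC*C", "GC*T"), ("GC*C", "GC*G"), ("CG*G", "CG*A")],
    [("TA*A", "TA*C"), ("TA*A", "TA*T"), ("AT*T", "AT*G")],
    [("AT*T", "AT*C"), ("AT*T", "AT*A"), ("TA*A", "TA*G")],
]

verdicts = [is_possible(c) for c in CANDIDATES]
print(verdicts)
```

Under these assumptions only the first candidate set passes, which matches the single "POSSIBLE (Roadmap)" entry in the table.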
Table 2. Formation of the codon boxes via (quasi) route dualities.
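Codon boxes are listed in the order Hierarchy 1 to Hierarchy 2, Hierarchy 2 to Hierarchy 3, Hierarchy 3 to Hierarchy 4.

| Codon box | Route 0 | Route 1 | Route 2 | Route 3 |
|---|---|---|---|---|
| GGN | 1Gly | 1Gly | | |
| GCN | | 2Ala | 2Ala | |
| CCN | 6Pro | | | 6Pro |
| CGN | | | 10Arg | 10Arg |
| GAN | 3Glu | 4Asp | | |
| GUN | | 5Val | 5Val | |
| UCN | 7Ser | | | 7Ser |
| CUN | 8Leu | | | 8Leu |
| ACN | | 9Thr | 9Thr | |
| AGN | 10Arg | 7Ser | | |
| UGN | | | 11Cys | 12Trp |
| CAN | | | 13His | 14Gln |
| AUN | | 15Ile | 16Met | |
| UUN | 17Phe | | | 8Leu |
| UAN | | | 18Tyr | stop |
| AAN | 20Lys | 19Asn | | |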
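The route dualities in Table 2 can be checked mechanically. The sketch below is illustrative only: it assumes that each entry of a route belongs to the codon box in the same table column, a placement inferred from the table layout, and the names `ROUTES`, `coverage` and `box_contents` are hypothetical. It counts how many routes pass through each codon box and lists what the routes assign to it.

```python
from collections import Counter

CODON_BOXES = ["GGN", "GCN", "CCN", "CGN", "GAN", "GUN", "UCN", "CUN",
               "ACN", "AGN", "UGN", "CAN", "AUN", "UUN", "UAN", "AAN"]

# Route entries per codon box; the box placement is an assumption (see above).
ROUTES = {
    0: {"GGN": "Gly", "CCN": "Pro", "GAN": "Glu", "UCN": "Ser",
        "CUN": "Leu", "AGN": "Arg", "UUN": "Phe", "AAN": "Lys"},
    1: {"GGN": "Gly", "GCN": "Ala", "GAN": "Asp", "GUN": "Val",
        "ACN": "Thr", "AGN": "Ser", "AUN": "Ile", "AAN": "Asn"},
    2: {"GCN": "Ala", "CGN": "Arg", "GUN": "Val", "ACN": "Thr",
        "UGN": "Cys", "CAN": "His", "AUN": "Met", "UAN": "Tyr"},
    3: {"CCN": "Pro", "CGN": "Arg", "UCN": "Ser", "CUN": "Leu",
        "UGN": "Trp", "CAN": "Gln", "UUN": "Leu", "UAN": "stop"},
}

# Number of routes that pass through each codon box.
coverage = Counter(box for route in ROUTES.values() for box in route)


def box_contents(box):
    """Sorted distinct assignments that the routes give to one codon box."""
    return sorted({route[box] for route in ROUTES.values() if box in route})


# Boxes whose two routes agree on one amino acid versus boxes that are split.
unsplit = [b for b in CODON_BOXES if len(box_contents(b)) == 1]
split = [b for b in CODON_BOXES if len(box_contents(b)) == 2]
print(unsplit)
print(split)
```

With this placement every codon box is visited by exactly two routes. Boxes where both routes carry the same amino acid stay unsplit (for example GGN), and boxes where the two routes differ are shared between two assignments (for example GAN with Asp and Glu).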
Li, D.J. Formation of the Codon Degeneracy during Interdependent Development between Metabolism and Replication. Genes 2021, 12, 2023. https://doi.org/10.3390/genes12122023
