Formation of the Codon Degeneracy during Interdependent Development between Metabolism and Replication

Nirenberg’s genetic code chart shows a profound correspondence between codons and amino acids. This article aims to explain the primordial formation of the codon degeneracy. It remains a puzzle how informative molecules arose from the supposed prebiotic random sequences. If an initial driving force based on the relative stabilities of triplex base pairs is introduced, the prebiotic sequence evolution becomes innately nonrandom. On this basis, the primordial assignment of the 64 codons to the 20 amino acids is explained in detail according to base substitutions during the coevolution of tRNAs with aaRSs; the classification of aaRSs is explained as well.


Introduction
The difficulty in the field of the origin of the genetic code is due to the lack of key experiments that reproduce the primordial scenario of the evolution of life. The debate about the nature of life makes it difficult even to reach a consensus on the definition of life. Pragmatically, we need to put together the following few well-established and enlightening observations to gain insight into the transition from non-living to living phenomena. Wong holds that Phase I amino acids appeared earlier than Phase II amino acids in prebiotic evolution [1,2]. Pouplana and Schimmel regard aminoacyl-tRNA synthetases (aaRSs) as clues to the establishment of the genetic code [3,4]. Woese divided cellular life into three domains [5], which helps in comprehending the last universal common ancestor (LUCA). In addition, a living JCVI-syn1.0 cell has been created by combining a cytoplasm without natural DNA and a chemically synthesised chromosome carrying Venter's watermark [6]. However, as explained next, potential contradictions have appeared among these few generally accepted observations, which urges us to be serious in collecting experimental observations and extremely cautious in interpreting them.
Pouplana and Schimmel have overlooked the above two phases of amino acids. On the basis of sequence and structure comparisons, the 20 aaRSs are divided into two distinct classes, each of which is subdivided into three subclasses. Pouplana and Schimmel assumed simultaneous association of two aaRSs on a single tRNA to interpret the symmetrical subclasses between the two classes of aaRSs, where the aaRS pairs, namely IleRS (subclass Ia) and ThrRS (IIa), GlnRS (Ib) and AspRS (IIb), and TyrRS (Ic) and PheRS (IIc), can cover the tRNA acceptor stem without major steric clashes and, meanwhile, link together the specific subclasses. However, Gln, a Phase II amino acid, was recruited much later than Asp, a Phase I amino acid [7,8]. It is therefore questionable to associate GlnRS and AspRS on a single tRNA simultaneously.
The creation of JCVI-syn1.0 is quite different from the primordial picture for the supposed LUCA: the former was synthesised rapidly, while the latter evolved over a long period. Moreover, a new cell can be certainly recreated anytime in JCV institute if a

The genetic code is a common and essential feature of life, which can be regarded as a relic of the prebiotic emergence of informative molecules. The complexity of the problem of the origin of the genetic code may exceed all the theoretical estimations, such as frozen accident, error minimisation, stereochemical interaction, amino acid biosynthesis, expanding codons, etc. [1,38–48]. So far, these theories can hardly describe the evolution of the genetic code step by step so as to explain the formation of the codon degeneracy in detail. Here, a triplex picture is proposed to describe the intricate evolution of the genetic code thoroughly, by which both the formation of the codon degeneracy and the classification of aaRSs are explained within a single theoretical framework. The complexity of the following explanation of the codon degeneracy is comparable to that of a symphony score. The simplest method of score-reading is to concentrate on an individual voice part that can be heard particularly well and then to go over to section-by-section or selective reading. Similarly, here are some suggestions for reading the following technical explanation of the codon degeneracy in the triplex picture. Please watch the Supplementary Movie S1 and start from Figure 1, and then move on to the figures on tRNAs and aaRSs, so as to understand the recruitment of the 20 amino acids during the coevolution of tRNAs with aaRSs.

Figure 1. (a) The roadmap. The stability of the base triplex increases from CG*G (+) to CG*C (4+), from GC*C (+) to GC*T (++), and from CG*G (+) to CG*A (++). In each position #n (n = 1, 2, ..., 32), the #n codon pair on Rn and Yn is in red.
The relative stabilities of the triplex base pairs (−, +, ++, 4+) are written to the right of the base triplexes, where the increased relative stabilities of triplex base pairs in base substitutions are indicated in green. Each triplex DNA is denoted by three arrows, whose directions are from 5' to 3'. The YR * R triplex DNAs are in pink, and the YR * Y triplex DNAs in azure. The recruitment order of the codon pairs is from #1 to #32, and the recruitment order of the 20 amino acids is given to the left of them. Non-standard genetic codes are indicated by brackets beside the corresponding amino acids. Route 0 − 3 and Hierarchy 1 ∼ 4 are indicated to the right of and below the roadmap, respectively. The evolution of the genetic code is denoted by black arrows, beside which pair connections are indicated by the corresponding amino acids. Refer to the example in Figure 1b to understand details of the roadmap; refer to Figure 2 to understand the critical role of the relative stabilities of triplex base pairs in achieving the real genetic code; refer to Figure 5a,b to see the origin of tRNAs; refer to Figure 3a to see the coherent relationship between the recruitment orders of codons and amino acids; refer to Figure 3b to see the codon degeneracy in the symmetric roadmap. (b) A detailed description of the roadmap (see Supplementary Movie S1). Taking the path from #1 to #29 as an example, the evolution of the genetic code from #1 to #7, to #19, to #24, and, at last, to #29 is explained in detail in the upper boxes, and the corresponding right-handed single-stranded, double-stranded, and triple-stranded DNAs are shown in the lower boxes, respectively.

Guessing the right prebiotic picture is the key to understanding the origin of the genetic code. Here, I propose a triplex picture for the prebiotic sequence evolution.
There are 8 kinds of triplex nucleic acids S · S′ * S″ ('·' represents a Watson-Crick base pair, while '*' represents a Hoogsteen base pair), where each of the strands S, S′, S″ can be either DNA or RNA [49][50][51], such as the triplex DNA D · D * D and the mixed DNA-RNA triplex nucleic acid D · D * R. The YR * R triplex DNA Poly C · Poly G * Poly G is taken as the initial physical condition for the evolution of the genetic code. The 64 codons were recruited one by one with the D · D * D sequence evolution, through alternating separation and recombination of the three strands in periodically changing environments. Such sequence evolution in the prebiotic evolution was driven by the substitutions of triplex base pairs according to their relative stabilities. The sequence evolution of D · D * D led to the evolution of the genetic code, while the RNA strands separated from the coevolving D · D * R yielded tRNAs and the template RNAs for aaRSs. The tRNAs and aaRSs were generated along with the recruitment of the corresponding codons, respectively. So, the triplex picture gives a physical basis for the coevolution of the genetic code with the corresponding tRNAs and aaRSs. In the triplex picture, I obtained a roadmap for the evolution of the genetic code (or the roadmap for short). The validity of the roadmap depends essentially on the experimental data on triplex base pairs. The stabilities of the 16 triplex base pairs in triplex DNA are listed from unstable (−) and weak (+) to strong (++, 3+, 4+) as follows [10,11]:

Nomenclature and
The above stability order in experiments played a significant role in the primordial evolution of triplex DNA. The substitutions of triplex base pairs from weak to strong provided the principal driving force in the prebiotic sequence evolution.
At the beginning of the evolution of the genetic code, there existed single-stranded DNA Poly G and Poly C, which tended to form a triplex DNA (Figure 1a,b) [10,13]. Poly C · Poly G * Poly G is a usual YR * R triplex DNA, which is built from the triplex base pair CG * G (Figure 1b and Supplementary Movie S1). The sequences evolved via substitutions of triplex base pairs as the strands of the triple-stranded DNA alternately combined and separated. Only three kinds of substitutions of triplex base pairs are practically required on the roadmap: (1) substitution of (+) CG * G by (++) CG * A [10,11], with the transition from G to A in the third R strand. This is the most common substitution on the roadmap, by which all the codons in Route 0 and most codons in Route 1 ∼ 3 were recruited (Figure 1a); (2) substitution of (+) CG * G by (4+) CG * C, with the transversion from G to C in the third R strand, which blazed a new path at #2, #7, #10 for the recruitment of codons in Route 1 ∼ 3, respectively (Figure 1a); (3) substitution of (+) GC * C by (++) GC * T, with the transition from C to T in the third R strand at #6, #19, #12 (Figures 1a and 2), by which the remaining codons in Route 1 ∼ 3 were recruited (Figure 1a). Thus, all the 64 codons have been recruited following the roadmap (Figures 1a, 3a and 4b).

Figure 2. The driving force in the evolution of the genetic code based on the relative stabilities of triplex base pairs. The base substitutions on the roadmap occur when the relative stabilities of triplex base pairs increase. The roadmap is the route that best avoids the unstable triplex base pairs, so the universal genetic code is narrowly selected by the relative stabilities of triplex base pairs. The relative stability increases from (+) of the triplex base pair CG * G to (4+) of the triplex base pair CG * C at #2, #7, and #10, which initiates Route 1 ∼ 3, respectively. GC * C (+) changes to GC * T (++) at #6, #19, and #12, and CG * G (+) changes to CG * A (++) at other positions on the roadmap.
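The rule that a substitution occurs only when the relative stability increases can be sketched in a few lines. This is a minimal illustration, not the author's procedure: the numeric ranks and the names RANK, STABILITY, and is_allowed are assumptions introduced here, and only the triplex base pairs whose stabilities are stated in this article are included.

```python
# Rank the stability symbols from unstable to strong (the numbers are an assumption
# used only to compare symbols; the article gives the symbols, not numeric values).
RANK = {"-": 0, "+": 1, "++": 2, "3+": 3, "4+": 4}

# Stabilities of the triplex base pairs that are stated explicitly in the article.
STABILITY = {
    "CG*G": "+",
    "CG*A": "++",
    "CG*C": "4+",
    "GC*C": "+",
    "GC*T": "++",
    "GC*A": "-",  # listed as unstable in the article
}

# The three kinds of substitutions required on the roadmap.
SUBSTITUTIONS = [
    ("CG*G", "CG*A"),  # transition G to A
    ("CG*G", "CG*C"),  # transversion G to C
    ("GC*C", "GC*T"),  # transition C to T
]


def is_allowed(old_pair, new_pair):
    """Return True if replacing old_pair by new_pair increases the stability."""
    return RANK[STABILITY[new_pair]] > RANK[STABILITY[old_pair]]


allowed = [is_allowed(old, new) for old, new in SUBSTITUTIONS]
print(allowed)
```

Under these assumptions, all three substitutions pass the check, whereas a change from GC * C to the unstable GC * A would be rejected.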

Initiation
In the beginning, there was an R (R denotes purine) single-stranded DNA Poly G (Figure 1a,b, #1). Complementary base pairing formed a YR (Y denotes pyrimidine) double-stranded DNA Poly C · Poly G. Furthermore, triplex base pairing CG * G formed a YR * R1 triple-stranded DNA Poly C · Poly G * Poly G (Figure 1a,b, #1). The third strand R1, Poly G, separated from this YR * R1 triple-stranded DNA and then formed a new Y1R1 double-stranded DNA Poly C · Poly G. At this point, there was only the initial codon pair GGG · CCC (Figure 1a,b, #1).
In the initiation stage of the roadmap, the codon pairs from #1 to #6 were recruited along the roadmap, which constituted the initial subset of the genetic code. In this stage, the earliest 9 amino acids were recruited in order: 1Gly, 2Ala, 3Glu, 4Asp, 5Val, 6Pro, 7Ser, 8Leu, 9Thr, all of which belong to Phase I amino acids [7,8]. For example, at codon pair position #6 on the roadmap, 1Gly and 9Thr are encoded by the codon pair 5′-GGT-3′ in the R6 strand and 5′-ACC-3′ in the Y6 strand, respectively. Although the initial subset is concise, two essential features of the roadmap, pair connection and route duality, had taken shape in this initiation stage (Figures 1a and 3a).
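A codon pair such as GGG · CCC or GGT · ACC consists of a codon and its reverse complement on the opposite strand. The following minimal sketch, in which the function names reverse_complement and codon_pairs are chosen here for illustration, shows that the 64 codons group into exactly 32 such pairs, which matches the 32 positions #1 to #32 on the roadmap.

```python
from itertools import product

# Watson-Crick complements for DNA bases.
COMPLEMENT = {"G": "C", "C": "G", "A": "T", "T": "A"}


def reverse_complement(codon):
    """Return the codon read 5' to 3' on the complementary strand."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def codon_pairs():
    """Group all 64 codons into unordered pairs {codon, reverse complement}."""
    pairs = set()
    for bases in product("GCAT", repeat=3):
        codon = "".join(bases)
        pairs.add(frozenset((codon, reverse_complement(codon))))
    return pairs


print(reverse_complement("GGT"))  # codon pair at position #6
print(len(codon_pairs()))
```

No triplet can equal its own reverse complement, because its middle base would have to pair with itself; this is why the 64 codons split cleanly into 32 pairs.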
Pair connection is an essential feature of the roadmap. Connected codon pairs on the roadmap generally encode a common amino acid (Figures 1a and 3b). For instance, the pair connection #1 − Gly − #2 indicates that both GGG in #1 and GGC in #2 encode the common amino acid Gly. Pair connections reveal the close relationship between the recruitment of codons and the recruitment of amino acids, which will be explained later according to the evolution of tRNAs.
Route duality is another essential feature of the roadmap, which shows the relationship of pair connections between different routes (Figures 1a and 3b). For instance, the pair connection #1 − Gly − #3 in Route 0 and the pair connection #2 − Gly − #6 in Route 1 are dual, and both encode the common amino acid Gly. Route dualities generally exist between Route 0 and Route 3, or between Route 1 and Route 2 (Figure 3b), which will be explained later according to the evolution of aaRSs. Glycine, the simplest amino acid, is encoded by the cytosine triplet, the simplest nitrogen base. Glycine has been identified in the coma of a comet [52] and could be the first amino acid on earth. Here, glycine (Gly) is also the first amino acid recruited on the roadmap. In the initiation stage of the roadmap, the non-chiral Gly helped to create the first pair connection #1 − Gly − #2, recruiting chiral Ala at #2 (Figure 1a). Furthermore, the non-chiral Gly also helped to create the first route duality on the roadmap (Figure 1a). This route duality played a central role in the initiation stage; consequently, the initial subset played a central role in the midway stage (Figure 3a). Chirality was required at the beginning of the roadmap by the triplex DNA itself (Figure 1a,b). Even so, there was still a transition period from non-chirality to chirality, in consideration of the special role of non-chiral Gly. Competition between opposite homochiral roadmap systems resulted in homochirality by a winner-take-all game [53].

Midway
The genetic codes evolved along four routes Route 0 − 3, respectively, where 8 codon pairs in each route evolved in the order of four hierarchies Hierarchy 1 ∼ 4, respectively ( Figure 1a). The roadmap can be divided into two groups: the early hierarchies Hierarchy 1 ∼ 2 and the late hierarchies Hierarchy 3 ∼ 4. It can also be divided into two groups: the initial route Route 0 (all-purine codons pairing with all-pyrimidine codons) and the expanded routes Route 1 ∼ 3 (purine-pyrimidine-mixing codons).
In the midway stage of the roadmap, the genetic codes expanded spontaneously from the initial subset (Figures 1a and 3a). Each of the 6 codon pairs in the initial subset expanded to three additional codon pairs by route duality, as follows: the codon pair #2 expanded to the three consecutive codon pairs #7, #8, and #9; the codon pair #1 expanded to the three consecutive codon pairs #10, #11, and #12; the codon pair #3 expanded to the three consecutive codon pairs #13, #14, and #15; the codon pair #6 expanded to the three consecutive codon pairs #16, #17, and #18; the codon pair #5 expanded to the three codon pairs #19, #24, and #26; and the codon pair #4 expanded to the three codon pairs #20, #25, and #27. The recruitment order of the codon pairs and the recruitment order of the amino acids are intricately organised and coherent, according to the roadmap (Figures 1a and 3a).
Taking the path from #1 to #29 as an example, the evolution of the genetic code along the roadmap can be described in detail as follows (Figure 1a,b and Supplementary Movie S1). Starting from the position #1 (Figure 1b, #1), an R single-stranded DNA brought about a YR double-stranded DNA; next, the YR double-stranded DNA brought about a YR * R1 triple-stranded DNA (the number 1 denotes #1, and similarly below); next, an R1 single-stranded DNA departed from the YR * R1 triple-stranded DNA; next, the R1 single-stranded DNA brought about an R1Y1 double-stranded DNA. Thus, the codon pair GGG · CCC was obtained at #1.
At the beginning of #7 (Figure 1b, #7), the R1Y1 double-stranded DNA was renamed as the Y1R1 double-stranded DNA, where the 180° rotation in writing did not change the right-handed helix; next, the Y1R1 double-stranded DNA brought about a Y1R1 * R7 triple-stranded DNA, through the transversion from G to C, where the stability (+) of CG * G increased to the stability (4+) of CG * C; next, an R7 single-stranded DNA departed from the Y1R1 * R7 triple-stranded DNA; next, the R7 single-stranded DNA brought about an R7Y7 double-stranded DNA. Thus, the codon pair GCG · CGC was obtained at #7. The case of #19 is similar to #7 (Figure 1b, #19); the codon pair GTG · CAC was obtained through the transition from C to T, where the stability (+) of GC * C increased to the stability (++) of GC * T. The case of #24 is also similar to #7 (Figure 1b, #24); the codon pair GTA · TAC was obtained through the common transition from G to A, where the stability (+) of CG * G increased to the stability (++) of CG * A. At the position #29 (Figure 1b, #29), the codon pair GTA · TAC in Y24R24 is non-palindromic, since neither GTA nor TAC reads the same backwards as forwards. In this case, a reverse operation is necessary, so that the obtained codon pair CAT · ATG in y24r24 reads in reverse the same as the codon pair TAC · GTA in Y24R24. The process from y24r24 to R29Y29 is still similar to the case of #7; the codon pair ATA · TAT was obtained through the transition from G to A, where the stability (+) of CG * G increased to the stability (++) of CG * A. Other processes on the roadmap are similar to the above example (Figure 1a,b). The reverse operation is unnecessary in the cases of #2, #7, #10, #11, #3, #4, #16, #9, #19, #27, #23, #22, #24, which follow palindromic codon pairs, and in the last case #32 (Figure 1a), whereas the reverse operation is necessary in the remaining cases of #5, #6, #8, #12, #13, #14, #15, #17, #18, #20, #21, #25, #26, #28, #29, #30, #31 (Figure 1a).
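The example from #1 to #29 can be traced at the level of the purine-strand codon alone. The sketch below is a simplified reading of the walk-through, not a model of the strands themselves: the helper names and the base positions at which each substitution is applied are assumptions inferred from the codons quoted above.

```python
def substitute(codon, position, old, new):
    """Replace the base at a 0-based position, checking the expected old base."""
    if codon[position] != old:
        raise ValueError(f"expected {old} at position {position} of {codon}")
    return codon[:position] + new + codon[position + 1:]


def is_palindromic(codon):
    """A codon is palindromic here if it reads the same backwards as forwards."""
    return codon == codon[::-1]


path = {1: "GGG"}
path[7] = substitute(path[1], 1, "G", "C")    # transversion G to C
path[19] = substitute(path[7], 1, "C", "T")   # transition C to T
path[24] = substitute(path[19], 2, "G", "A")  # transition G to A

# GTA is non-palindromic, so the reverse operation is applied before #29.
reversed_24 = path[24][::-1]
path[29] = substitute(reversed_24, 2, "G", "A")  # transition G to A

# Positions where the reverse operation is unnecessary or necessary, as listed above.
NO_REVERSE = [2, 7, 10, 11, 3, 4, 16, 9, 19, 27, 23, 22, 24, 32]
REVERSE = [5, 6, 8, 12, 13, 14, 15, 17, 18, 20, 21, 25, 26, 28, 29, 30, 31]

print(path)
print(sorted(NO_REVERSE + REVERSE) == list(range(2, 33)))
```

The last line checks that the two lists of positions together cover #2 to #32 exactly once, so every step after #1 falls into one of the two cases.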

The Ending
So far, the genetic code table had been expanded from the 6 codon pairs in the initial subset to 6 + 18 codon pairs by route duality; the remaining 8 codon pairs were recruited into the genetic code table in the ending stage of the roadmap (Figures 1a and 3a). There were 2 codon pairs remaining in each of the four routes Route 0 − 3. They satisfied pair connections (Figure 3a), and two of them satisfied route duality (Figure 3a). The last two stop codons appeared in the pair connection #25 − stop − #31 (Figures 1a and 3a). When the last two amino acids were recruited through the pair connections #26 − Asn − #30 and #27 − Lys − #32, the codon UAG at #25 had to be selected as a stop codon. The codon UAA at #31 was selected as the last stop codon, due to the lack of a corresponding tRNA.
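The bookkeeping of the three stages can be checked directly from the expansions listed for the midway stage. In this minimal sketch the dictionary EXPANSION simply transcribes those expansions; the variable names are illustrative.

```python
# Codon pairs of the initial subset (initiation stage).
INITIAL = [1, 2, 3, 4, 5, 6]

# Midway stage: each initial codon pair expands to three codon pairs by route duality.
EXPANSION = {
    2: [7, 8, 9],
    1: [10, 11, 12],
    3: [13, 14, 15],
    6: [16, 17, 18],
    5: [19, 24, 26],
    4: [20, 25, 27],
}

midway = sorted(n for targets in EXPANSION.values() for n in targets)
recruited = set(INITIAL) | set(midway)

# Ending stage: whatever is left among the 32 codon pairs.
ending = sorted(set(range(1, 33)) - recruited)

print(len(INITIAL), len(midway), len(ending))
print(ending)
```

The counts 6, 18, and 8 add up to the 32 codon pairs, and the remaining positions include #30, #31, and #32, which appear in the pair connections of the ending stage.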
The non-standard codons also satisfy codon pairs and route dualities on the roadmap (Figure 1a). Among the codon pairs pertaining to non-standard codons, the first stop codon UGA at #15 is dual to the non-standard stop codons in Route 0. The choice of the genetic code was by no means random; it resulted from the increasing stabilities of triplex base pairs in the substitutions [10,11], where the rotation of the single glycosidic bond between base and deoxyribose has been considered in the opposite direction. It should be emphasised that the roadmap follows the strict rule that the stabilities of triplex base pairs increase monotonically (Figure 2). Note that the roadmap avoids the unstable triplex DNA as far as possible. The roadmap (Figure 1a) is the only possible one that avoids the unstable triplex base pairs (−) GC * A, AT * C and AT * A, as shown in Table 1, whereas the other candidate roadmaps, which are thereby eliminated, cannot avoid them.
Among the 16 possible triplex base pairs, there are three relatively unstable triplex base pairs. So, the statistical ratio of instability for the triplex base pairs is 3/16. However, the ratio of instability for the triplex base pairs on the roadmap is much smaller. There are 49 triplex DNAs from #1 to #32 on the roadmap, which involve 3 × 49 = 147 triplex base pairs (Figure 1a). The relatively unstable triplex base pairs GC * A and AT * C do not appear on the roadmap; only the relatively unstable triplex base pair AT * A appears, inevitably, 7 times in the reverse operations so as to fulfil all the permutations of 64 codons (Figure 1a). The ratio of instability 7/147 on the roadmap is much smaller than the statistical ratio of instability 3/16. When the relatively unstable AT * A appears at the positions #15, #17, #21, #25, #29, #30, and #31, the stabilities of the other two triplex base pairs in the triplex DNA are both (4+) (Figure 1a), which compensates for the instability of the triplex DNA to some extent. The amino acid Ile, which is unique in having a degeneracy of three, occupies the three positions #21, #29, and #30 among those 7 positions. In addition, the three stop codons occupy three other neighbouring positions #15, #25 and #31 (Figure 1a). The first stop codon UGA appeared at the position #15, where the relatively unstable AT * A first appeared (Figure 1a). According to the primordial translation mechanism, the weak combination of AT * A might have helped to assign stop codons. The route dualities played significant roles in the midway stage, where the remnant codons were chosen as the stop codons (Figures 1a and 3a). The stop codon appeared as early as the midway stage of the evolution of the genetic code (Figures 1a and 3a), which indicates that the genetic code had taken shape around the midway stage to promote the formation of primitive life.
Not until the fulfilment of the genetic code did the translation efficiency increase notably by recognising all the 64 codons.
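The comparison of the two instability ratios is simple arithmetic and can be reproduced exactly with fractions. The sketch below uses only the counts quoted above; the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the text.
triplex_dnas = 49
pairs_per_triplex = 3
unstable_positions = [15, 17, 21, 25, 29, 30, 31]  # positions where AT*A appears

total_pairs = triplex_dnas * pairs_per_triplex
roadmap_ratio = Fraction(len(unstable_positions), total_pairs)
statistical_ratio = Fraction(3, 16)

# Positions occupied by Ile and by the three stop codons among the 7 positions.
ile_positions = {21, 29, 30}
stop_positions = {15, 25, 31}
other_positions = set(unstable_positions) - ile_positions - stop_positions

print(total_pairs, roadmap_ratio, float(roadmap_ratio))
print(statistical_ratio, float(statistical_ratio))
print(other_positions)
```

The ratio 7/147 reduces to 1/21, about 0.048, compared with 3/16 = 0.1875, so the unstable pair occurs on the roadmap at roughly a quarter of the rate expected from the 16 triplex base pairs taken uniformly.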

Origin of tRNA
The roadmap illustrates the coevolution of the genetic code with the amino acids, where tRNAs and aaRSs play an intermediary role. The expansion of the genetic code along the roadmap can be explained by the coevolution of tRNAs with aaRSs (Figures 5c, 6b and 7). The cloverleaf shape of tRNA can be explained by assembling the two complementary RNA strands separated from the triplex nucleic acid D · D * R in the triplex picture (Figure 6a). The origin of aaRS will be explained next.

Figure 5 (caption, partially recovered). (b) The evolution of the 5′-R_tY_t-3′ type tRNAs by the triplex base pairings yr * R, yr * Y and YR * Y_t and YR * R_t. The node numbers #n on the roadmap may exchange within or between routes because the sequences of Y and R are the reverse of the sequences of y and r, respectively. (c) The coevolution of tRNAs with aaRSs along the roadmap, which determines the pair connections and route dualities. The aaRSs aaRS1 to aaRS20 combine, respectively, with the tRNAs t1 to t20 from a certain major/minor groove side. The complementary relationship between the pyrimidine y_t strand of the 5′-y_tr_t-3′ type tRNAs and the purine R_t strand of the 5′-R_tY_t-3′ type tRNAs agrees with the complementary relationship between G and C for the second bases of the consensus genes of tRNAs, especially for the early tRNAs in Route 0 and in Hierarchy 1.

Anti-Codon
When studying the evolution of the genetic code, we focused on only three bases in the triplex DNA. However, when studying the origin of tRNAs, it is necessary to study the evolution of the entire sequences of both the triplex DNA and the triplex nucleic acid D · D * R, where the third RNA strands in D · D * R can be used to assemble tRNAs (Figures 5a,b and 6a). According to the order of the relative stabilities of YR * Y for the 8 kinds of triplex nucleic acids: D · D * D, D · D * R, R · D * R, R · D * D > D · R * R, R · R * R >> R · R * D, D · R * D [50,54], the relative stabilities of D · D * D and D · D * R are among the greatest of all the kinds of triplex nucleic acids. The choice of triplex DNA for the roadmap and the choice of D · D * R for the origin of tRNAs are based on the observed relative stabilities. The other kinds of triplex nucleic acids can be neglected because they were less likely to appear.
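The number 8 follows from each of the three strands being either DNA (D) or RNA (R). As a minimal sketch, the kinds can be enumerated and compared with the stability order quoted above; the tier numbers 0 to 2 and all names in the code are assumptions introduced here to encode the '>' and '>>' signs.

```python
from itertools import product


def triplex_kinds():
    """Enumerate all strand compositions S·S'*S'' with each strand D or R."""
    return [f"{a}·{b}*{c}" for a, b, c in product("DR", repeat=3)]


# Stability order for YR*Y as quoted in the text, grouped into tiers
# (tier 0 is the most stable group; the tier numbers are illustrative).
STABILITY_TIERS = [
    ["D·D*D", "D·D*R", "R·D*R", "R·D*D"],
    ["D·R*R", "R·R*R"],
    ["R·R*D", "D·R*D"],
]

tier_of = {kind: tier for tier, kinds in enumerate(STABILITY_TIERS) for kind in kinds}

kinds = triplex_kinds()
print(len(kinds))
print(sorted(kinds) == sorted(tier_of))
print(tier_of["D·D*D"], tier_of["D·D*R"])
```

The quoted order covers each of the 8 kinds exactly once, and both D · D * D and D · D * R fall in the most stable group, which is the basis for the choices made in the triplex picture.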
There are four types of RNA strands for assembling tRNAs that were generated by the triplex base pairing of the triplex nucleic acids D · D * R: via the triplex nucleic acid yr * y_t and via the triplex nucleic acid yr * r_t (Figure 5a,c), and via the triplex nucleic acid YR * Y_t and via the triplex nucleic acid YR * R_t (Figure 5b,c), where the subscript t indicates that these RNA strands y_t, r_t and Y_t, R_t are used to assemble tRNA (Figures 5a,b and 6a). The sequences Y_t, R_t are the respective reverse sequences of y_t and r_t. There is a difference in the sequence evolution along the roadmap between purine strands and pyrimidine strands. The pyrimidine sequences Y_t, y_t and the purine sequences R_t, r_t are complementary, respectively, owing to the triplex pairing with the purine DNA strand and the pyrimidine DNA strand in the triplex nucleic acids D · D * R, respectively. These tRNA strands coevolved with the triplex DNA along the roadmap. Therefore, the evolution of the anti-codons on tRNAs can be explained according to the evolution of the genetic code along the roadmap. The evolution of aaRSs should be considered next. After separating from the triplex nucleic acids D · D * R, the pair of complementary single RNA strands y_t and r_t, or R_t and Y_t, can concatenate and fold into a cloverleaf-shaped tRNA [55][56][57][58][59], whose anti-codon corresponds to the codon of the triplex DNA on the roadmap (Figure 6a). Owing to the different positions of the anti-codons in the RNA strands, either near the 3′-ends or near the 5′-ends, the different reading directions of Y_t, R_t and y_t, r_t must be carefully considered (Figure 6a). There were two types of tRNAs: the type 5′-y_tr_t-3′ tRNA and the type 5′-R_tY_t-3′ tRNA (Figure 5a,b), where the anti-codons are near the 3′-end of the RNA strand y_t and the 3′-end of the RNA strand R_t, respectively. The other concatenated RNA strands 5′-r_ty_t-3′ and 5′-Y_tR_t-3′ cannot evolve together with the above two types of tRNAs because the corresponding triplets would be on the acceptor arms rather than on the anti-codon loops.
It is possible to explain the sequence evolution of tRNAs in detail along the roadmap (Figures 5a-c and 6a). For example, the tRNA t2 for 2Ala can form by concatenating y_t7 and r_t7, which are generated by the triplex base pairings y7r7 * y_t7 and y7r7 * r_t7 at the branch node #7. The anti-codon CGC near the 3′-end of the strand y_t7 is palindromic. The two complementary strands y_t7 and r_t7 can combine into a cloverleaf-shaped type 5′-y_tr_t-3′ tRNA t2 by concatenating, pairing, and folding (Figure 6a). Thus, the anti-codon arm of t2 contains the anti-codon CGC, which corresponds to Ala with the help of aaRS; consequently, the codon GCG on the R DNA strand in #7 is assigned to Ala. The sequences evolve from #7 to #16 along the roadmap. As another example, the codons at the position #16 are non-palindromic, where the type 5′-y_tr_t-3′ tRNA t9 and the type 5′-R_tY_t-3′ tRNA t11 are assembled by concatenating y_t16 and r_t16 for t9 and by concatenating R_t16 and Y_t16 for t11, respectively (Figure 6a). Hence, the codon ACG at #16 and the reversely complementary codon UGC at #9 are assigned to 9Thr and 11Cys, respectively.
There is another reason at the sequence level for the number "20" of the canonical amino acids (Figure 6b). There are 64 triple permutations of the 4 bases, which accounts for the number 64 of the codons. However, little attention has been paid to the 20 triple combinations of the 4 bases. The products p(i) * p(j) * p(k) (i, j, k = G, C, A, T) are the same within each of the 20 groups of combinations of the 4 bases (Figure 6b), owing to the commutative law of multiplication, where p(i) denotes the base composition for i = G, C, A, T. The products determine the average interval distances of codons in genome sequences. Therefore, there are 20 classes of genomic codon interval distributions according to the 20 combinations rather than the 64 permutations of the 4 bases [53]. Consequently, there are 20 cognate tRNA-synthetase systems so as to improve the translation efficiency for tRNAs to recognise the corresponding codons, considering the 20 average interval distances of codons. So, the number "20" of the canonical amino acids should actually be attributed to a statistical origin at the sequence level. The 20 combinations of the 4 bases can be divided into 4 groups: < G >, < C >, < A >, < T >. Hierarchy 1 and Hierarchy 2 correspond to < G > and < C >; Hierarchy 3 and Hierarchy 4 correspond to < A > and < T >. Their positions on the roadmap are Hierarchy 1 ∼ 2 Y : < G >, Hierarchy 1 ∼ 2 R : < C >, Hierarchy 3 ∼ 4 Y : < A >, Hierarchy 3 ∼ 4 R : < T >. Each group can be divided into 5 combinations, which correspond to Route 0 or Route 1 ∼ 3, respectively. In the case < G >, < G, G, G > and < G, G, A > belong to Route 0; < G, G, C >, < G, G, T >, and < G, C, A > belong to Route 1 ∼ 3, and it is similar for the other cases < C >, < A >, < T >. These 20 combinations roughly correspond to the 20 cognate tRNAs (Figure 6b).
This rough correspondence shows that the codons, especially those in Hierarchy 1 ∼ 3, are assigned to the tRNAs based on the combinations, considering that the codons in Hierarchy 4 are AT-rich, and the context sequences tend to form AT-rich repeats. Concretely speaking, the groups of codons in the combinations < GGG >, < GGC >, < GGA >, < GGU >, < GCA >, < GCU >, < GAA >, < GAU >, < CCC >, < CCA >, < CCU >, < CAA >, < CAU >, < CUU >, < AAU > are assigned, respectively, to t1, t2 and t10, t3, t5 and t12, t4 and t9 and t14, t8 and t11, t20, t16, t6, t13, t7, t19, t18, t17, t15 (Figure 6b). In addition, the first stop codon appeared halfway through the evolution of tRNAs (Figure 6b). The combinations are simply ordered by the bases in the order "G", "C", "A", "U" (Figure 6b), considering the substitutions "G to C", "G to A", "C to U" on the roadmap (Figure 1a), and the amino acids are in the recruitment order. A rough diagonal distribution of tRNAs is then obtained (Figure 6b), which is due to the evolutionary relationship between the genetic code and amino acids.
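The counting argument behind the numbers 64 and 20 can be reproduced with the standard library. The sketch below is illustrative only: the base composition values are hypothetical, chosen merely to show that the product p(i) * p(j) * p(k) is the same for every permutation within one combination.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product
from math import comb, prod

BASES = "GCAU"  # ordered as in the text: G, C, A, U

permutations = ["".join(p) for p in product(BASES, repeat=3)]
combinations = ["".join(c) for c in combinations_with_replacement(BASES, 3)]

# Group the 64 codons by their unordered base content.
classes = {}
for codon in permutations:
    key = "".join(sorted(codon, key=BASES.index))
    classes.setdefault(key, []).append(codon)

# Hypothetical base composition p(i); any values summing to 1 would do.
composition = {
    "G": Fraction(3, 10),
    "C": Fraction(2, 10),
    "A": Fraction(1, 4),
    "U": Fraction(1, 4),
}


def codon_product(codon):
    """Return p(i) * p(j) * p(k) for the three bases of a codon."""
    return prod(composition[base] for base in codon)


print(len(permutations), len(combinations), comb(4 + 3 - 1, 3))
print(classes["GCA"])
print({codon_product(codon) for codon in classes["GCA"]})
```

The six permutations of G, C, and A share a single product, as do the members of each of the other classes, which is the commutativity argument in the text; the count 20 is the number of multisets of size 3 drawn from 4 bases.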

Evolution of tRNA
There was a post-initiation-stage stagnation (Figure 1a) between the initiation stage and the midway stage of the roadmap. Such a stagnation in the prebiotic evolution awaited the birth of functional macromolecules. In this period, oligonucleotides with arbitrary finite sequences could be generated via the base substitutions G to A, G to C, and C to T in the triplex picture. The primordial sequences of the prototype tRNAs and the template RNAs of prototype aaRSs could be generated along the roadmap. In the light of the complicated interactions between oligonucleotides and amino acids, some early tRNAs with certain anti-codons could be generated in the sequence evolution along the roadmap so as to carry the corresponding prebiotically synthesised Phase I amino acids, respectively. These tRNAs were not necessarily homologous, as long as they were capable of fulfilling their respective tasks. There are two independent codon systems for tRNAs: the anti-codons and the para-codons. The anti-codons evolved along the roadmap, while the para-codons evolved with aaRSs (Figures 5c and 7). When the para-codons did not evolve but the anti-codons evolved, only cognate tRNAs originated. However, when both the para-codons and the anti-codons evolved, more new tRNAs originated to carry the remaining amino acids.
The wobble pairing rules can be explained by the origin and evolution of tRNAs in the triplex picture. The transition from C to T occurred at the position #6 on the roadmap, which resulted in the wobble pairing rule G : U or C. Taking y2r2 as a template, y_t2 with GCC is formed by the triplex base pairing, while r_t2 with GGC and r_t2 with GGU are formed, where the transition from C to U occurred in the formation of r_t2. The complementary strands y_t2 and r_t2 combine into a tRNA with anti-codon GCC, where G at the first position of the anti-codon of the tRNA is paired with U at the third position of the triplet code of an additional single strand r_t2. It implies that the wobble pairing rule G : U had been established as early as the end of the initiation stage of the roadmap. The transition from C to T occurred at the position #12, which resulted in the wobble pairing rule U : G or A. Taking y10r10 as a template, y_t10 with CCG is formed by the triplex base pairing, and r_t10 with CGG and r_t10 with UGG are also formed, where the transition from C to U occurred in the formation of r_t10. The complementary strands y_t10 and r_t10 combine into a tRNA with anti-codon UGG, where U at the first position of the anti-codon of the tRNA is paired with G at the third position of the triplet code of an additional single strand y_t10. The above explanation of the wobble pairing rules by tRNA mutations is supported by the observations of nonsense suppressors. For instance, the wobble pairing rule C : A for a UGA suppressor can be established by a transition from G to A at the 24th position of tRNA^Trp. The wobble pairing rules G : U or C and U : G or A had been established early in the evolution of the genetic code, and they continued to flourish so as to make full use of the tRNAs in short supply.
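The two wobble pairing rules can be expressed as a small lookup applied to the first base of the anti-codon. This is a minimal sketch: the table and function names are illustrative, the entries for C and A are the ordinary Watson-Crick pairs, and the suppressor rule C : A mentioned above is deliberately left out.

```python
# Watson-Crick complements for RNA bases.
WATSON_CRICK = {"G": "C", "C": "G", "A": "U", "U": "A"}

# Codon third bases read by the first base of the anti-codon.
# G : U or C and U : G or A are the two wobble rules discussed in the text.
WOBBLE_FIRST = {
    "G": ("C", "U"),
    "U": ("A", "G"),
    "C": ("G",),
    "A": ("U",),
}


def codons_read(anticodon):
    """Return the codons (5' to 3') recognised by an anti-codon written 5' to 3'."""
    first, second, third = anticodon
    # The codon's first two bases pair strictly with the anti-codon's last two.
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return sorted(prefix + base for base in WOBBLE_FIRST[first])


print(codons_read("GCC"))
print(codons_read("UGG"))
```

Under these rules, the anti-codon GCC reads both GGC and GGU, and the anti-codon UGG reads both CCA and CCG, which matches the triplets named in the two examples above.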
The evolutionary relationships between tRNAs that correspond to pairs of different amino acids can also be explained according to the evolution of tRNAs along the roadmap. For example, based on the substitution G to A, t16(AUG, Met) can evolve to t15(AUA, Ile), and based on the substitution G to C, t3(GAG, Glu) can evolve to t4(GAC, GAU, Asp), and so on (Figure 5c). However, this kind of evolution of tRNAs involves not only anti-codons but also para-codons because it inevitably needs extra help from aaRSs. There is a close relationship between the evolution of tRNAs and the biosynthetic families of amino acids, so the sequences of tRNAs coevolved with the sequences of aaRSs at each step of the roadmap. The recognition between tRNAs and aaRSs will be explained next, where there are many technical details, and each step needs to be straightened out in order to draw a comprehensive conclusion.
The evolution of tRNAs played a significant role in fixing the number of canonical amino acids at 20. There is an important difference between the early prime tRNAs tn and the late derivative tRNAs tn+. Generally speaking, the wobble pairing rules apply to the late derivative tRNAs tn+ rather than to the early prime tRNAs tn (Figure 6b). The early prime tRNAs do not need wobble pairings, so they accurately fix the number of bases in codons at 3, whereas the late derivative tRNAs need wobble pairings so as to improve translation efficiency via codon degeneracy. This was a dynamic process of making the number of canonical amino acids equal the combination number of bases, which could hardly be achieved with too few tRNAs but could be adjusted by choosing among the numerous tRNA candidates.

Palindrome
Palindromic sequences play significant roles not only in contemporary molecular biology but also in prebiotic evolution. Palindromic or non-palindromic codons on the roadmap can produce different effects in the origin and evolution of informative macromolecules. The cloverleaf secondary structure of tRNAs can be explained by the complementary palindromes used in assembling tRNAs. Furthermore, the evolution of aaRSs also depended strongly on the evolution of palindromic para-codons along the roadmap, which will be explained next.

There are two types of tRNAs: type 5′ y t r t 3′ and type 5′ R t Y t 3′, where the two single RNA strands y t and r t, or Y t and R t, are complementary to each other. A D-loop and an anti-codon loop are situated in the 5′-end RNA strand (y t for type 5′ y t r t 3′ and R t for type 5′ R t Y t 3′), while a TΨC loop and a missing loop are situated in the 3′-end RNA strand (r t for type 5′ y t r t 3′ or Y t for type 5′ R t Y t 3′) (Figure 6a). The strand pair y t and r t, or Y t and R t, can form two pairs of hairpins in the complementary double-stranded RNA, where the D-loop and the TΨC loop constitute one pair of hairpins, and the anti-codon loop and the missing complementary loop constitute another pair of hairpins (Figure 6a). When the missing loop has been deleted, the three other loops form a cloverleaf-shaped tRNA (Figure 6a). A palindromic nucleotide sequence can form a hairpin, and palindromic complementary double RNA sequences can form a pair of hairpins, which can account for the cloverleaf secondary structure of tRNAs (Figures 6a and 8). If there are palindromic sequence intervals in the 5′-end RNA strand, there will also be corresponding palindromic sequence intervals in the complementary 3′-end RNA strand. A D-loop and an anti-codon loop can form in the 5′-end RNA strand, owing to the complementarity in the palindromic sequence intervals. Accordingly, a TΨC loop and a missing loop can also form in the 3′-end RNA strand, which correspond to the D-loop and the anti-codon loop, respectively. After deleting the missing loop, a catenated RNA strand with three loops can form a cloverleaf secondary structure, and consequently, a stable tertiary structure can form. Therefore, palindromic sequences contribute to the formation of stable RNA structures in prebiotic evolution.

It is easy to generate palindromic oligonucleotides according to the base substitutions along the roadmap (Figure 5a,b). Thus, pairs of palindromic single RNA strands tended to be generated and assembled into cloverleaf-shaped tRNA candidates. Numerous tRNA candidates can be produced by such an assembly line during prebiotic evolution, from which several qualified tRNAs with proper anti-codons and para-codons can be selected to carry the respective amino acids. Although the origin of aaRSs was difficult in prebiotic evolution (Figure 8), the origin of tRNAs and amino acids was not too difficult. The early aaRSs had the chance to adapt by choosing among the numerous tRNA candidates and amino acid candidates. Thus, the degree of difficulty for the origin of life can be reduced to some extent. Yet, if both tRNAs and aaRSs had been rare, there would have been little opportunity to establish the correspondence between aaRSs and tRNAs.
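The link between palindromic intervals and hairpin pairs can be checked on toy sequences. The sketch below assumes that "palindromic" means equal to its own reverse complement and that a hairpin needs a loop of at least three unpaired bases. The example sequence and the helper names are hypothetical and are not taken from any tRNA.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def is_complementary_palindrome(rna):
    """True when a sequence equals its own reverse complement."""
    return rna == reverse_complement(rna)


def stem_length(rna, min_loop=3):
    """Count consecutive base pairs formed when the two ends fold back on each other."""
    pairs, left, right = 0, 0, len(rna) - 1
    while right - left - 1 >= min_loop and COMPLEMENT[rna[left]] == rna[right]:
        pairs, left, right = pairs + 1, left + 1, right - 1
    return pairs


strand = "GGCCUUUUGGCC"               # toy 5'-end interval: stem GGCC, loop UUUU
partner = reverse_complement(strand)  # corresponding interval on the 3'-end strand
print(stem_length(strand), stem_length(partner))
```

In this toy case both the strand and its complementary strand fold into a four-pair hairpin, which is the "pair of hairpins" situation described for the D-loop and the TΨC loop.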

Origin of aaRS

Para-Codon
On one hand, an aaRS is able to recognise cognate tRNAs by para-codons (Figures 6b and 8). On the other hand, the aaRS is able to catalyse the esterification of the proper amino acid to its cognate tRNA (Figure 8). The origin of aaRSs is one of the most difficult events in the origin of life because a primordial mechanism must have been invented to generate the earliest proteins in the absence of ribosomes, and, meanwhile, aaRSs had to possess both para-codons and enzyme activity. The emergence of the first aaRS with enzyme activity should have been a rare critical event in primordial sequence evolution. Following this event, the enzyme activity could be transmitted from the common ancestor of aaRSs to all the descendant aaRSs, in both class I and class II. Thus, the evolution of para-codons came to play a leading role in the evolution of aaRSs. The evolution of aaRSs was closely related to both the evolution of tRNAs and the biosynthetic families of amino acids. The evolution of para-codons can be explained in the triplex picture. The para-codons of aaRSs coevolved with the sequences of tRNAs along the roadmap. The abilities to recognise certain amino acids came from the coevolution within the biosynthetic families of amino acids. According to the sequence evolution in the triplex picture, the recognition of a tRNA by an aaRS can be explained by the sequence homology between the template RNA of the aaRS and the corresponding major or minor groove side sequence of the tRNA. The recognition between an aaRS and its template RNA led to the recognition between the aaRS and the corresponding tRNA.
There are two types of tRNA according to the generation process of tRNA along the roadmap: type 5′ y t r t 3′ and type 5′ R t Y t 3′ (Figure 5a,b), where the 5′ side corresponds to the minor groove, while the 3′ side corresponds to the major groove. Additionally, the aaRSs can combine with the two types of tRNAs from either the minor groove or the major groove (Figures 5c and 8). Thus, there are four classes of aaRSs: class y t -m aaRS, class r t -M aaRS, class R t -m aaRS, and class Y t -M aaRS (Figures 5c and 7). The four symbols indicate that aaRSs combine with tRNAs, respectively, from the minor groove (m) side 5′ y t (y) of type 5′ y t r t 3′ tRNA, from the major groove (M) side r t 3′ (r) of type 5′ y t r t 3′ tRNA, from the minor groove (m) side 5′ R t (R) of type 5′ R t Y t 3′ tRNA, and from the major groove (M) side Y t 3′ (Y) of type 5′ R t Y t 3′ tRNA.
The evolution of aaRSs occurred between the four classes of aaRSs (Figure 7). The sequences of para-codons can evolve between the homologous strands, and they can also evolve between the complementary strands when the sequences of para-codons are palindromic (Figure 7). According to the evolution of palindromic para-codons and the origin of the template RNA of aaRS (Figure 8), the class y t -m aaRS can be complementary with the class r t -M aaRS owing to the two complementary strands 5′ y t and r t 3′ that combine into the type 5′ y t r t 3′ tRNA (Figure 5a), and the class R t -m aaRS can be complementary with the class Y t -M aaRS owing to the two complementary strands 5′ R t and Y t 3′ that combine into the type 5′ R t Y t 3′ tRNA (Figure 5b). According to the evolution of palindromic para-codons and the coevolution of the template RNAs of aaRSs with tRNAs (Figures 7 and 8), the class r t -M aaRS can be complementary with the class Y t -M aaRS, and the class R t -m aaRS can be complementary with the class y t -m aaRS. The class y t -m aaRS can be homologous to the class Y t -M aaRS, and the class r t -M aaRS can be homologous to the class R t -m aaRS. These relationships are useful for studying the evolution of aaRSs along the roadmap.
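The four classes and their pairwise relationships can be written down compactly. The following is a minimal sketch, assuming the class labels are flattened to plain strings such as `yt-m`; the function names are hypothetical, and the relationship table only transcribes the statements in the paragraph above.

```python
# Each tRNA type is a (5' strand, 3' strand) pair; the 5' side is taken as the
# minor groove (m) side and the 3' side as the major groove (M) side.
TRNA_TYPES = [("yt", "rt"), ("Rt", "Yt")]


def aars_classes():
    """Enumerate the four aaRS classes as strand-groove labels."""
    classes = []
    for five_prime, three_prime in TRNA_TYPES:
        classes.append(f"{five_prime}-m")
        classes.append(f"{three_prime}-M")
    return classes


# Relationships between classes as stated in the text.
COMPLEMENTARY = {frozenset(p) for p in [
    ("yt-m", "rt-M"), ("Rt-m", "Yt-M"), ("rt-M", "Yt-M"), ("Rt-m", "yt-m")]}
HOMOLOGOUS = {frozenset(p) for p in [("yt-m", "Yt-M"), ("rt-M", "Rt-m")]}


def relation(class_a, class_b):
    """Return how the template RNAs of two aaRS classes are related."""
    if class_a == class_b:
        return "same class"
    pair = frozenset((class_a, class_b))
    if pair in COMPLEMENTARY:
        return "complementary"
    if pair in HOMOLOGOUS:
        return "homologous"
    raise ValueError(f"unknown class pair: {class_a}, {class_b}")


print(aars_classes())
print(relation("yt-m", "rt-M"), relation("rt-M", "Rt-m"))
```

Every one of the six possible pairs of distinct classes is covered by exactly one of the two relationships, so a transition between any two classes is either a complementary (palindromic) or a homologous step.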
The aaRSs are denoted in evolutionary order as aaRS1 to aaRS20 instead of GlyRS to LysRS for convenience, according to the recruitment order of the corresponding amino acids from No.1 Gly to No.20 Lys, respectively. The ancestor of aaRSs, namely the major groove aaRS1, belongs to the class r t -M aaRS, which catalysed pairing between the amino acid 1Gly and the tRNA t1 and which approaches the type 5′ Y t R t 3′ tRNA t1 from the major groove side R t 3′ (Figure 7). The aaRS1 evolved into the same-class aaRS2 and the Y t -M class aaRS7 (Figure 7). The aaRS2 evolved into aaRS3. According to the evolution of the Glu biosynthesis family, aaRS3 evolved into aaRS6, aaRS10, aaRS13, and, furthermore, aaRS14, and aaRS3 evolved into aaRS4 (Figure 7). According to the evolution of the Asp biosynthesis family, aaRS4 evolved into aaRS9, aaRS19, and, furthermore, aaRS15, aaRS16, and aaRS20 (Figure 7). According to the evolution of the Ser biosynthesis family, aaRS7 evolved into aaRS11 and aaRS12. According to the evolution of the Val biosynthesis family, aaRS2 evolved into aaRS5 and aaRS8. According to the evolution of the Phe biosynthesis family, aaRS8 evolved into aaRS17 and aaRS18. In general, the evolutions via the Glu and Ser biosynthesis families took place in Hierarchy 1 and Hierarchy 2, corresponding to the codons whose second bases are G or C, while the evolutions via the Asp, Val and Phe biosynthesis families took place in Hierarchy 3 and Hierarchy 4, corresponding to the codons whose second bases are A or U (Figure 5c). This result accounts for the observation that the second bases of codons relate to the biosynthesis families of amino acids (Figure 4c).
The evolution of aaRSs depends strongly on the para-codon evolution (Figures 7 and 8). Some para-codons of aaRSs are homologous but not complementary to the previous para-codons. However, the para-codons of aaRSs that are complementary to the previous para-codons had to be palindromic. Some evolutions occurred within the same class, including from aaRS1 to aaRS2, from aaRS3 to aaRS10, from aaRS15 to aaRS16, from aaRS4 to aaRS9, from aaRS4 to aaRS19, and from aaRS8 to aaRS17 (Figure 7). Some evolutions of palindromic para-codons occurred between class y t -m and class r t -M, including from aaRS2 to aaRS3, from aaRS2 to aaRS5, from aaRS3 to aaRS4, from aaRS9 to aaRS15, and from aaRS19 to aaRS20 (Figure 7). Some evolutions of palindromic para-codons occurred between class R t -m and class Y t -M, including from aaRS7 to aaRS11 and from aaRS17 to aaRS18 (Figure 7). In addition, the evolution from aaRS1 to aaRS7 occurred between class r t -M and class Y t -M; from aaRS2 to aaRS8 occurred between class r t -M and class R t -m; from aaRS3 to aaRS6, from aaRS3 to aaRS13, and from aaRS13 to aaRS14 occurred between class y t -m and class Y t -M; and from aaRS11 to aaRS12 occurred between class R t -m and class y t -m (Figure 7).
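The individual transitions named in this section can be collected into a single descent tree. The sketch below is one reading of the text, not a reproduction of Figure 7: where a family-level sentence and a step-level sentence both apply, the step-level one is used (for example, aaRS15 is placed under aaRS9), and the parent of aaRS13 is taken to be aaRS3 following the Glu-family description. The names `PARENT` and `lineage` are hypothetical.

```python
# Child aaRS number -> parent aaRS number, transcribed from the transitions in the text.
PARENT = {
    2: 1, 7: 1,                  # from the ancestor aaRS1
    3: 2, 5: 2, 8: 2,            # from aaRS2
    4: 3, 6: 3, 10: 3, 13: 3,    # from aaRS3 (Glu family)
    14: 13,
    9: 4, 19: 4,                 # from aaRS4 (Asp family)
    15: 9, 16: 15, 20: 19,
    11: 7, 12: 11,               # Ser family
    17: 8, 18: 17,               # Phe family
}


def lineage(n):
    """Return the chain of aaRS numbers from the ancestor aaRS1 down to aaRSn."""
    path = [n]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


print(lineage(16))  # descent of aaRS16, the aaRS paired with t16(AUG, Met)
print(lineage(18))  # descent of aaRS18
```

Under this reading every aaRS from aaRS2 to aaRS20 has exactly one parent, so the enzyme activity of aaRS1 reaches all twenty aaRSs along a unique path.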
The evolution of aaRSs along the roadmap helps to clarify the traditional classifications of aaRSs in the literature (Figure 4c), such as the major groove (M), minor groove (m) classification [31], or the class I (IA, IB, IC), class II (IIA, IIB, IIC) classification (Gesteland et al. 2006). The classification into the four classes y t -m, r t -M, R t -m, and Y t -M clarifies some confusing points in the above classifications. The majority of class r t -M aaRSs correspond to class IIA aaRSs, and the majority of class R t -m aaRSs correspond to class IA aaRSs, which indicates an evolution from IIA to IA due to the reverse sequence relationship between the RNA templates of class r t -M aaRS and class R t -m aaRS (Figure 7). The majority of Y t -M aaRSs correspond to class IIA aaRSs, which came from the homologous r t -M aaRSs. In addition, the majority of class y t -m aaRSs correspond to class IA or IB aaRSs, which came from the complementary r t -M aaRSs due to the evolution of palindromic para-codons (Figure 7). The traditional classification of aaRSs by the major groove and minor groove is reasonable in practice because the template RNAs of aaRSs are complementary between the major groove class and the minor groove class, where the para-codons are palindromic to link the two classes. Meanwhile, the traditional classification of aaRSs by classes A, B, and C reflects some reasonable evolutionary relationships between aaRSs based on the evolution of the biosynthetic families.

Coevolution of tRNA with aaRS
A comprehensive study of the evolution of the genetic code inevitably involves the origins of tRNAs and aaRSs. The intricate evolutionary relationships between tRNAs and aaRSs can be explained step by step for each codon in the triplex picture (Figure 7). The initiation stage on the roadmap played a fundamental role. At the end of the initiation stage, arbitrary finite sequences could be generated, which provided opportunities to generate complex RNAs, such as tRNAs, the template RNAs for aaRSs, ribozymes and the prototype of rRNAs, coding and non-coding RNAs, etc. The primordial translation mechanism was invented during the evolution of the genetic code. There were a junior stage and a senior stage of the primordial translation mechanism (Figure 8). The ancestor of aaRSs originated in the junior stage, when no tRNAs were involved (Figure 8). However, tRNAs and ribosomes were indispensable in the senior stage of the primordial translation mechanism, as well as in the modern translation mechanism. Certainly, the translation efficiency was low in the junior stage, medium in the senior stage, and high in the modern translation mechanism. Non-standard translation has been observed in experiments, such as direct translation from DNA to protein [60,61].
The benefits of explaining the origins of tRNAs and aaRSs in the triplex picture are as follows. First, the ancestors of tRNAs and aaRSs did not originate from random sequences; the sequence evolution along the roadmap was recurrent, so the informative molecules were generated recurrently and accumulated in the prebiotic surroundings. Second, the evolutionary relationships between tRNAs and aaRSs can be naturally explained by the relationships of the homologous strands of the evolving triplex DNAs. The sequence of the template of the ancestor aaRS can be generated in the triplex picture by the junior stage of the primordial translation mechanism; meanwhile, the sequence of a ribozyme can also be generated by the other strand of the same triplex nucleic acid. Thus, the earliest proteins, such as the ancestor of aaRSs, can be generated by the complex consisting of the ribozyme, the RNA template of the aaRS, and a triplex DNA. Such a complex was itself the product of sequence evolution of triplex nucleic acids based on specific substitutions of triplex base pairs, where both the sequence for the ribozyme and the sequence for the template of the ancestor aaRS with enzyme activity were generated in different strands of the same triplex DNA by chance. Although the efficiency of protein production was low in this junior stage, it was feasible to generate a small number of proteins by this complex consisting only of nucleic acids. The ancestor of aaRSs with enzyme activity can be generated by this complex, and it naturally tends to combine with the corresponding RNA template.
If the sequence of a tRNA is homologous to the above RNA template, the ancestor aaRS also tends to combine with the tRNA. Furthermore, the above requirement can be reduced to homologous para-codons. Thus, in the triplex picture, the aaRSs coevolved with the para-codons, while the tRNAs coevolved with the codons. When considering the homologous or complementary sequence relationships, the reverse sequence relationships and the base substitution relationships in the strands of triplex nucleic acids, the intricate evolutionary relationships between tRNAs and aaRSs can be revealed in detail (Figures 5c and 7). It is more difficult to generate aaRSs than to generate tRNAs, so there existed numerous tRNA candidates in the prebiotic surroundings. Only the tRNAs that were recognised by aaRSs could be recruited into the living system. For example, the RNA 5′-y t 1r t 1-3′ was recognised by the class r t -M aaRS1, so it was chosen as the first tRNA t1 to transport 1Gly. The prime RNAs tn were recognised by aaRSn, so they were chosen as the tRNAs to transport No. n amino acids (Figures 5c and 7), respectively. Similarly, the derivative RNAs tn′, tn+, tn′+, tn−, tn′−, with non-palindromic or palindromic para-codons homologous to the para-codons of tn, were recognised by aaRSn, so they became the tRNAs to transport No. n amino acids, respectively. Para-codons are the key factors for the recognition between tRNAs and aaRSs. The cognate tRNAs are not necessarily of the same type. Generally, the aaRSs combine with the cognate tRNAs from the same side. For example, aaRS8 combines with the 5′ R t Y t 3′ type cognate tRNAs t8, t8′, t8+, t8−, and t8′− from the minor groove side, where the para-codons can be non-palindromic (Figure 7); aaRS7 combines with the 5′ R t Y t 3′ type tRNAs t7, t7+, t7− and the 5′ y t r t 3′ type tRNAs t7′− from the major groove side, where the para-codons of the two types of tRNAs have to be palindromic (Figure 7). However, aaRS10 combines with the 5′ y t r t 3′ type tRNAs t10, t10′ and the 5′ R t Y t 3′ type tRNA t10+ from the minor groove side, while it combines with the 5′ y t r t 3′ type tRNAs t10− and t10′− from the major groove side, where the para-codons also need to be palindromic (Figure 7).
The biosynthetic families played essential roles in the evolution of aaRSs when both the anti-codon and the para-codon had changed (Figure 7). There were far more than 20 amino acids in the prebiotic surroundings. Only the amino acids that were recognised by aaRSs could be recruited into the living system. When aaRS1 evolved into aaRS2, aaRS2 recognised 2Ala, as well as t2, from the major groove side, which it inherited from aaRS1 that recognised 1Gly, as well as t1, from the major groove side. When aaRS2 evolved into aaRS3, aaRS3 recognised 3Glu, as well as t3, from the minor groove side owing to the palindromic para-codons, whereas its predecessor aaRS2 recognised 2Ala, as well as t2, from the major groove side. When aaRSs evolved within the same biosynthetic family (the Glu, Asp, Val, Ser, or Phe family), the new aaRSs tended to recruit new amino acids with similar chemical properties in the same biosynthetic family. As aaRSs evolved from aaRS1 to aaRS20, the enzyme activity was transmitted between the aaRSs, and the recognised tRNAs t1 to t20 and the recognised amino acids No.1 Gly to No.20 Lys were recruited, where the evolving non-palindromic or palindromic para-codons linked these evolutions.
The evolutionary pairs of aaRSs combining two sides of the same tRNAs along the roadmap agree with the results based on structures: IleRS and ThrRS, GlnRS (GluRS) and AspRS, and TyrRS and PheRS [4,62], and additionally SerRS and CysRS. The aaRS pair ThrRS and IleRS (namely aaRS9 and aaRS15) corresponds to an evolution from r t -M aaRS9 to y t -m aaRS15. The aaRS pair GluRS and AspRS (namely aaRS3 and aaRS4) corresponds to an evolution from y t -m aaRS3 to r t -M aaRS4. The aaRS pair PheRS and TyrRS (namely aaRS17 and aaRS18) corresponds to an evolution from R t -m aaRS17 to Y t -M aaRS18. The aaRS pair SerRS and CysRS (namely aaRS7 and aaRS11) corresponds to an evolution from Y t -M aaRS7 to R t -m aaRS11.
The recruitment order of the 20 amino acids from No.1 to No.20 can be obtained by the roadmap (Figures 3a and 9), which meets the basic requirement that Phase I amino acids appeared earlier than the Phase II amino acids [1,2]. The species with complete genome sequences are sorted by the order R 10/10 according to their amino acid frequencies, where the order R 10/10 is defined as the ratio of the average amino acid frequencies for the last 10 amino acids to that for the first 10 amino acids [8,36,63–65]. Along the evolutionary direction indicated by the increasing R 10/10, the amino acid frequencies vary in different monotonic manners for the 20 amino acids, respectively (Figure 9). For the early amino acids Gly, Ala, Asp, Val, Pro, the amino acid frequencies tend to decrease greatly, except for Glu, which increases slightly (Figure 9); for the midterm amino acids Ser, Leu, Thr, Cys, Trp, His, Gln, the amino acid frequencies tend to vary slightly, except for Arg, which decreases greatly (Figure 9); for the late amino acids Ile, Phe, Tyr, Asn, Lys, the amino acid frequencies tend to increase greatly, except for Met, which increases slightly (Figure 9). In the recruitment order from No.1 to No.20, the variation trends of the amino acid frequencies increase in general; namely, the later an amino acid was recruited, the more its frequency tends to increase (Figure 9). The recruitment order of the amino acids from No.1 to No.20 is supported not only by the previous roadmap theory but also by this pattern of amino acid frequencies based on genomic data.

Figure 9. The 20 amino acid frequencies for each of the 803 species are obtained, respectively, based on the genomic data in NCBI. The 803 amino acid frequencies (green dots) for each of the 20 amino acids are all arranged in the R 10/10 order [36], respectively. The variation trend of the amino acid frequencies for each of the 20 amino acids is obtained by the regression line (denoted in red). Generally speaking, the variation trends for the earlier amino acids tend to decrease, and the variation trends for the latecomers tend to increase.
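The definition of the order R 10/10 can be made concrete with a short sketch. It assumes that frequencies are given per amino acid as a dictionary and that the recruitment order is supplied as a list of 20 three-letter names. The full order used below is an assumption: it is consistent with the numbered amino acids mentioned in the text (for example 1Gly, 2Ala, 3Glu, 20 Lys), but the remaining positions are not stated in this section. The function names and the toy species are hypothetical.

```python
# ASSUMPTION: a recruitment order consistent with the numbers quoted in the text;
# positions not given explicitly in this section are illustrative only.
RECRUITMENT_ORDER = [
    "Gly", "Ala", "Glu", "Asp", "Val", "Pro", "Ser", "Leu", "Thr", "Arg",
    "Cys", "Trp", "His", "Gln", "Ile", "Met", "Phe", "Tyr", "Asn", "Lys",
]


def r10_10(freq, order=RECRUITMENT_ORDER):
    """Mean frequency of the last 10 amino acids divided by that of the first 10."""
    first, last = order[:10], order[10:]
    mean_first = sum(freq[aa] for aa in first) / len(first)
    mean_last = sum(freq[aa] for aa in last) / len(last)
    return mean_last / mean_first


def sort_species(species_freqs, order=RECRUITMENT_ORDER):
    """Sort species names by increasing R 10/10."""
    return sorted(species_freqs, key=lambda name: r10_10(species_freqs[name], order))


# Toy frequencies, not genomic data.
uniform = {aa: 0.05 for aa in RECRUITMENT_ORDER}
early_rich = {aa: (0.06 if i < 10 else 0.04) for i, aa in enumerate(RECRUITMENT_ORDER)}
print(sort_species({"uniform": uniform, "early_rich": early_rich}))
```

A species whose proteins are richer in the first ten amino acids gets a lower R 10/10 and is therefore placed earlier along the evolutionary direction described above.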

Recruitment of Codons
The roadmap only provided a logical substitution relationship of the 64 codons based on the stabilities of triplex base pairs (Figure 1a). It was the tRNAs and aaRSs that gave the genetic significance to the 64 codons (Figure 5c). The pair connections and route dualities observed in the recruitment of codons along the roadmap should be explained based on the coevolution of tRNAs with aaRSs (Figures 5b and 7). The standard genetic code table can be comprehended in a biological context. Incidentally, the non-standard codons can also be explained.

Pair Connection
The pair connections can be explained by the coevolution of tRNAs with aaRSs when aaRSn recognise, respectively, both the prime tRNAs tn (in bold in the following pair connections and route dualities) and the corresponding derivative tRNAs tn′, tn+ and tn−, where the anti-codons of the tRNAs change but the para-codons do not, or when tn can efficiently recognise similar codons by wobble pairings (Figures 5c and 7). Taking #1 − 1Gly − #3 as an example, the 5′ y t r t 3′ type tRNA t1 and the class r t -M aaRS1 originated at #1 on the roadmap, and the same-type tRNA t1′ appeared at #3 on the roadmap. The aaRS1 for 1Gly can recognise both of the same-type tRNAs t1 and t1′ via the same para-codon. Namely, tRNAs t1 and t1′ recognise, respectively, the codons GGG at #1 and GGA at #3 on the purine strands (R) on the roadmap (Figure 5c).
The following pair connections are due to wobble pairings or the tRNA evolution from tn to tn′, both of which can be recognised by the respective same aaRSn (Figures 5c, 6b and 7). #7 − Ala − #9 in Route 2 and #2 − Ala − #8 in Route 1 are due to the fact that aaRS2 for 2Ala recognises both the tRNAs t2, t2′ and the different-type tRNAs t2+ by the same para-codon.
The following route dualities are due to the tRNA evolution from tn to tn + or tn − , all of which can be recognised by the respective same aaRSn (Figures 5c, 6b and 7).
The route dualities between non-standard pair connections are also due to the non-standard tRNA evolution. The non-standard tRNAs tn* and tn*+ with non-standard anti-codons can also be recognised by the respective same aaRSn (Figures 5c and 7). The phenomenon of the non-standard genetic code is due to alternative choices of tRNAs by aaRSs as small-probability events in the fulfilment of the genetic code. The 4 × 4 codon boxes in the standard genetic code table come from the 8 route dualities and the 8 quasi route dualities (Table 2 and Figure 4a,b), where the pair connections are only from Hierarchy 1 to Hierarchy 2, from Hierarchy 2 to Hierarchy 3, and from Hierarchy 3 to Hierarchy 4. The route dualities exist only between Route 0 and Route 1, between Route 2 and Route 3, between Route 0 and Route 3, and between Route 1 and Route 2, but not between Route 0 and Route 2 or between Route 1 and Route 3 (Figure 4a,b).

Codon Degeneracy
The degeneracies 6, 4, 3, 2, or 1 for the 20 amino acids can be explained one by one according to pair connections and route dualities on the roadmap based on the coevolution of tRNAs with aaRSs in the triplex picture (Figures 5c, 6b and 7). In particular, the evolution of aaRSs based on the biosynthetic families played a significant role in the expansion of the genetic code. The degeneracy 2 mainly results from pair connections. The degeneracies 4 and 6 mainly result from the expansion of the genetic code from the initial subset by route dualities for Ser, Leu, Ala, Val, Pro, and Thr (Figure 3a,b).
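The degeneracy values quoted above can be recovered directly from the standard genetic code table. The sketch below builds the table from the usual 64-letter amino acid string in UCAG order and counts codons per amino acid; it only tabulates the modern code and does not model the roadmap itself. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid, ignoring the three stop codons.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Group one-letter amino acid codes by their degeneracy.
by_degeneracy = {}
for aa, count in degeneracy.items():
    by_degeneracy.setdefault(count, []).append(aa)

for count in sorted(by_degeneracy, reverse=True):
    print(count, sorted(by_degeneracy[count]))
```

The grouping shows three amino acids with degeneracy 6 (Leu, Ser, Arg), five with degeneracy 4 (Gly, Ala, Val, Pro, Thr), one with degeneracy 3 (Ile), nine with degeneracy 2, and two with degeneracy 1 (Met, Trp).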
The degeneracy 6 for Ser, Leu, and Arg can be explained by pair connections and route dualities (Figures 1a, 3b, 5c, 6b and 7), where Ser and Leu belong to the initial subset, and Arg was recruited immediately after the initial subset. All of them have appeared in …

The degeneracy 4 for Gly, Ala, Val, Pro, and Thr can be explained by route dualities (Figures 1a and 3b). All of them belong to the initial subset. The degeneracy 4 for Gly satisfies the route duality: #1 − Gly − #3 ∼ #2 − Gly − #6.

Driving Force in the Prebiotic Sequence Evolution
First, I propose an elegant roadmap for the evolution of the genetic code (Figure 1a). Around the middle of the last century, double helix DNAs, the genetic code, as well as triplex DNAs, were discovered, the former two of which greatly enhanced our understanding of life. There are indeed profound relationships among the above three discoveries. Although triple-helical nucleic acids are rare in vivo, they might be the unsung heroes in the origin of life. According to the substitutions of triplex base pairs from weak to strong along the roadmap, the recruitment of the 64 codons has been described from initiation to expansion and, finally, to the ending, and, hence, the perplexing codon degeneracy has been obtained.
The whole process is complicated and cumbersome, and has been explained step by step in the Methods section. Here is an overview of the basic process. Concretely speaking, the stabilities of the 16 triplex base pairs in triplex DNAs range from unstable (−) through weak (+) to strong (++, 3+, 4+) [10,11]. This experimentally determined stability order is crucial for establishing a roadmap for the evolution of the genetic code. Poly C · Poly G * Poly G is a common and easily formed YR * R triplex DNA [10,13], which is bound together by the triplex base pair CG * G. The sequences evolved via substitutions between triplex base pairs when the strands of triplex DNAs combined and separated alternately. Only three kinds of substitutions between triplex base pairs are practically required to obtain a complete set of 64 codons on the roadmap (Figures 1 and 2): (1) substitution of (+) CG * G by (++) CG * A (transition from G to A with increasing stability from + to ++), which is the most common substitution on the roadmap and by which all the codons in Route 0 and most codons in Routes 1 to 3 were recruited; (2) substitution of (+) CG * G by (4+) CG * C (transversion from G to C with increasing stability from + to 4+), which blazed a new path at #2, #7, #10 for the recruitment of codons in Routes 1 to 3, respectively; (3) substitution of (+) GC * C by (++) GC * T (transition from C to T with increasing stability from + to ++) at #6, #19, #12, by which the remaining codons in Routes 1 to 3 were recruited.
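The three substitutions can be encoded as a small rule table and applied to individual codons. The sketch below is only an illustration of single steps, not a reconstruction of the full roadmap: it assumes the substitution acts on one base of a DNA codon, ranks the stability symbols numerically, and pairs each codon with its reverse complement. The names `ALLOWED`, `substitute` and `stability_gain` are hypothetical.

```python
# Numeric rank for the stability symbols used in the text.
STABILITY_RANK = {"-": 0, "+": 1, "++": 2, "3+": 3, "4+": 4}

# (old base, new base) -> (old triplex pair, old stability, new triplex pair, new stability)
ALLOWED = {
    ("G", "A"): ("CG*G", "+", "CG*A", "++"),   # transition
    ("G", "C"): ("CG*G", "+", "CG*C", "4+"),   # transversion
    ("C", "T"): ("GC*C", "+", "GC*T", "++"),   # transition
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def substitute(codon, position, new_base):
    """Apply one allowed substitution at a 1-based codon position."""
    old_base = codon[position - 1]
    if (old_base, new_base) not in ALLOWED:
        raise ValueError(f"{old_base} to {new_base} is not one of the three substitutions")
    return codon[:position - 1] + new_base + codon[position:]


def stability_gain(old_base, new_base):
    """Increase in stability rank for an allowed substitution."""
    _, old_level, _, new_level = ALLOWED[(old_base, new_base)]
    return STABILITY_RANK[new_level] - STABILITY_RANK[old_level]


def reverse_complement(codon):
    """Codon on the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


start = "GGG"                          # codon at #1
g_to_a = substitute(start, 3, "A")     # GGA, the codon named at #3
g_to_c = substitute(start, 3, "C")     # GGC, the codon named at #2
print(g_to_a, g_to_c, reverse_complement(g_to_c))
```

Each allowed substitution raises the stability rank, and pairing GGC with its reverse complement GCC reproduces the codon pair GGC · GCC quoted for position #2.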
Hence, a roadmap has been obtained with 4 Routes and 4 Hierarchies (Figures 1a, 3b and 4a). This unique roadmap has narrowly avoided those unstable triplex base pairs that can hinder the sequence evolution of triplex DNAs. The roadmap describes the recruitment of both the 64 codons and the 20 amino acids in proper order during the coevolution of tRNAs with aaRSs. The initial codon pair GGG · CCC (#1) corresponds to the amino acid pair Gly and Pro, and the subsequent codon pair GGC · GCC (G to C at #2) corresponds to a new amino acid pair Gly and Ala. The obtained pair connection #1 − Gly − #2 indicates that the common Gly is encoded by GGG in the former pair and GGC in the latter pair. Pair connections appear step by step along the roadmap, which relates to the evolution of the corresponding tRNAs. In addition, there are route dualities between pair connections, which relate to the evolution of the corresponding aaRSs. The expansion of codons along the roadmap has been explained by route dualities from the Phase I amino acids [34] Ala, Val, Pro, Ser, Leu, and Thr, which are due to the step-by-step recognition of tRNAs by the corresponding aaRSs. In addition, stop codons and non-standard genetic codes often occur at the ending stage. Thus, the intricate codon degeneracy has been obtained based on the incremental stability of triplex base pairs. In the triplex picture for prebiotic evolution, the base substitution of triplex DNA drives both the recruitment of the 64 codons and the corresponding coevolution of tRNAs and aaRSs, step by step.
The benefit of the triplex picture is that nonrandom sequences can be generated routinely in prebiotic evolution. The modification of homopolymers became a routine process in forming the codon degeneracy. This non-living apparatus based on the sequence evolution of triplex DNAs was able to persist over geologically long periods, during which similar nonrandom sequences could be statistically generated again and again under selective pressure at any appropriate time. Hence, the nonrandom sequences, e.g., tRNAs and aaRSs, were able to emerge more efficiently than by any mechanism that chooses informative molecules from random sequences. Such an HfC-like apparatus based on the sequence evolution of triplex DNAs vanished after the establishment of the genetic code system, and its relic may remain in the triplex base pairs in present-day tRNAs.

Explanation of Two Classes of aaRSs According to Coevolution of tRNAs with aaRSs
Then, I explain the coevolution of tRNAs with aaRSs (Figures 5-7), by which the two classes of aaRSs [31] and the anti-codons and para-codons of tRNAs have been explained in detail. A comprehensive study of the evolution of the genetic code inevitably involves the intricate evolutionary relationships between tRNAs and aaRSs. The evolution of triple-helical nucleic acids D · D * D and D · D * R (D for DNA, R for RNA) [10] created conditions for the coevolution of tRNAs and aaRSs along the roadmap. The third RNA strand R and its complementary strand can carry codons and anti-codons in sequence evolution along the roadmap, which, hence, accounts for the fact that tRNAs can be assembled by pairs of these complementary RNAs [66] whose anti-codons evolved along the roadmap (Figures 5a,b and 6a). Meanwhile, genes of aaRSs also evolved along the roadmap, which were homologous to the complementary [67,68] templates of the major or minor groove sides of tRNAs. The recognition of a tRNA by a certain aaRS came from the combining ability between the aaRS and its gene, which is homologous to the corresponding side of the tRNA. Hence, the recognition of tRNAs by aaRSs kept pace with the evolution of the genetic code along the roadmap. The tRNAs were relatively easy to assemble, so there existed numerous candidate tRNAs. Only tRNAs that were recognised by aaRSs were recruited into the living system. The genes of aaRSs were scarce, and their enzyme activity came from a common ancestor. The genes of the two classes of aaRSs evolved alternately in two complementary strands. Palindromes enabled the recognition of a tRNA by the corresponding aaRS via the choice of its appropriate side.
The intricate relationships between tRNAs and aaRSs along the roadmap have thus been explained, and the explanation agrees with both the observed anti-codons of tRNAs and the two observed classes of aaRSs (Figure 5). The evolution of aaRSs along the roadmap in the triplex picture helps to clarify the traditional classifications of aaRSs in the literature. The major/minor groove classification of aaRSs [31] can be accounted for by the complementary strands of the template RNAs of aaRSs, and the A/B/C sub-classification of aaRSs [69] relates to the influence of the biosynthetic families of amino acids. In most cases, an aaRS binds its cognate tRNAs from the same side, so its class is fixed. As a special case, aaRS10(ArgRS) binds the 5 y t r t 3 type tRNAs t10, t10 and the 5 R t Y t 3 type tRNA t10 + from the minor groove side, while it binds the 5 y t r t 3 type tRNAs t10 − and t10 − from the major groove side, where the para-codons need to be palindromic. In the evolution from aaRS1(GlyRS) to aaRS2(AlaRS), for instance, aaRS2(AlaRS) recognised 2Ala from the major groove side of t2; its class follows that of its predecessor aaRS1(GlyRS), which recognised 1Gly from the major groove side of t1. In the subsequent evolution from aaRS2(AlaRS) to aaRS3(GluRS), however, aaRS3(GluRS) recognised 3Glu from the minor groove side of t3, owing to the palindromic para-codons. The biosynthetic families played significant roles in the evolution of aaRSs whenever both the anti-codon and the palindromic or non-palindromic para-codon evolved. When aaRSs belonged to the same biosynthetic family, the new aaRSs tended to recruit amino acids from that family with similar chemical properties. Thus, the observed recognition of tRNAs by aaRSs from the major or minor groove side has been explained in detail for the respective amino acids (Figure 7). The aaRS pairs previously supposed to bind both sides of a tRNA simultaneously [4,62] should be reinterpreted as pairs in which one aaRS bound one side of a tRNA and its successor evolved to bind the other side.
The pairs IleRS-ThrRS and TyrRS-PheRS appear both in the above literature and here. However, the pair GluRS-ThrRS in the above literature should be changed to GlnRS-ThrRS. In addition, the pair SerRS-CysRS, which appears here, is missing from the above literature.
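The pairs listed above can be checked against the conventional two-class assignment of aaRSs. The following is a minimal sketch, assuming the textbook assignment in which class I enzymes approach the tRNA acceptor stem from the minor groove side and class II enzymes from the major groove side. The names CLASS_I, CLASS_II, groove_side and is_cross_class are hypothetical helpers introduced here for illustration; they are not taken from the article's figures, and the special case of ArgRS described above is not modelled.

```python
# Conventional class assignment of the 20 canonical aaRSs (textbook values,
# an assumption of this sketch; LysRS also has a class I form in some archaea).
CLASS_I = {"ArgRS", "CysRS", "GlnRS", "GluRS", "IleRS",
           "LeuRS", "MetRS", "TrpRS", "TyrRS", "ValRS"}
CLASS_II = {"AlaRS", "AsnRS", "AspRS", "GlyRS", "HisRS",
            "LysRS", "PheRS", "ProRS", "SerRS", "ThrRS"}

# Side of the tRNA acceptor stem approached by each class.
GROOVE_SIDE = {"I": "minor", "II": "major"}


def aars_class(name):
    """Return "I" or "II" for a canonical aaRS name such as "GlyRS"."""
    if name in CLASS_I:
        return "I"
    if name in CLASS_II:
        return "II"
    raise KeyError(f"unknown aaRS: {name}")


def groove_side(name):
    """Return the groove side ("minor" or "major") used by the aaRS."""
    return GROOVE_SIDE[aars_class(name)]


def is_cross_class(pair):
    """True if the two aaRSs of a pair belong to different classes."""
    first, second = pair
    return aars_class(first) != aars_class(second)


# The four pairs named in the text (after the GlnRS-ThrRS correction).
PAIRS = [("IleRS", "ThrRS"), ("TyrRS", "PheRS"),
         ("GlnRS", "ThrRS"), ("SerRS", "CysRS")]

if __name__ == "__main__":
    for pair in PAIRS:
        sides = [groove_side(name) for name in pair]
        print(pair, sides, "cross-class" if is_cross_class(pair) else "same class")
```

Under this assignment, each of the four pairs joins one class I and one class II enzyme, that is, one member binding from each side of the tRNA.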

Explanation of the Codon Degeneracy on the Genetic Code Chart
As the main result, the codon degeneracy can be explained on the basis of the roadmap for the evolution of the genetic code (Figure 1) and the coevolution of tRNAs with aaRSs (Figures 5 and 7). The intricate codon degeneracies are simply relics of the learning process by which aaRSs came to recognise tRNAs. The pair connections and route dualities on the roadmap result from the evolution of tRNAs and the recognition of tRNAs by aaRSs (Figure 5). In particular, homologous aaRSs often evolved within the biosynthetic families of amino acids by binding either the same side or the opposite side of tRNAs (Figure 7). The 4 × 4 codon boxes in the standard genetic code table came from the 8 route dualities and the 8 quasi route dualities (Figure 1).
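The degeneracy pattern that this section sets out to explain can be tabulated directly from the standard genetic code. The following is a minimal sketch that builds the standard table, counts the codons per amino acid, and groups the 64 codons into the 16 boxes that share their first two bases. The names CODON_TABLE, degeneracy, codon_boxes and family_boxes are illustrative choices, and the sketch makes no claim about which box corresponds to which route duality on the roadmap.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the usual compact form: bases ordered U, C, A, G
# at each position, one-letter amino acid codes, "*" for stop codons.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Map each of the 64 codons (UUU, UUC, UUA, ...) to its amino acid.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def degeneracy():
    """Number of codons assigned to each amino acid (and to stop, "*")."""
    return Counter(CODON_TABLE.values())


def codon_boxes():
    """Group codons by their first two bases: 16 boxes of 4 codons each."""
    boxes = {}
    for codon, aa in CODON_TABLE.items():
        boxes.setdefault(codon[:2], []).append(aa)
    return boxes


def family_boxes():
    """Boxes whose four codons all encode the same amino acid."""
    return sorted(prefix for prefix, aas in codon_boxes().items()
                  if len(set(aas)) == 1)


if __name__ == "__main__":
    for aa, n in sorted(degeneracy().items(), key=lambda item: -item[1]):
        print(aa, n)
    print("family boxes:", family_boxes())
```

In the standard code, 8 of the 16 boxes are occupied by a single amino acid and the other 8 are split between two assignments (amino acids or stop), with degeneracies ranging from 1 (Met, Trp) to 6 (Leu, Ser, Arg).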