1. Introduction
Because of high sequence conservation, tRNAs in living organisms are a molecular fossil of the pre-life to life transition on Earth [
1,
2,
3,
4]. Tracking the evolution of tRNAomes (all the tRNAs of an organism) provides insight into the diversification of life from LUCA (the last universal common (cellular) ancestor). A clear story of the great divergence of Archaea and Bacteria can be inferred from the analysis of tRNAomes in living organisms. The reason that tRNAomes can be analyzed in such detail is that tRNA evolved during pre-life by an orderly process from RNA repeats and inverted repeats of a known sequence. The pre-life tRNA sequence remains apparent, and the diversification of tRNAome sequences can be tracked from the origin of the first tRNAs. Embedded in the tRNA sequence is a history of two overlapping pre-life worlds: (1) the polymer world; and (2) the minihelix world. Since LUCA, organisms have lived in the tRNA world for ~4 billion years. Remarkably, the history of the pre-life to life transition was recorded in tRNA sequences and conserved until the present day. A view centered on tRNA and the tRNA anticodon is key to understanding genetic coding [
5,
6,
7].
The purpose of this review is to integrate an understanding of type II tRNA variable (V) loops into an appreciation of tRNA evolution, tRNAome diversification, aminoacyl-tRNA synthetase (AARS) divergence, and genetic code evolution. Type I and type II tRNAs were generated from defined sequences in pre-life (
Figure 1). The 93 nt tRNA precursor was generated from the ligation of three 31 nt minihelices with two distinct 17 nt cores [
1]. A D loop 31 nt minihelix was ligated to two anticodon/T stem–loop–stem minihelices. A D loop minihelix also forms a stem–loop–stem, but one that is not preserved in the tRNA fold. The 31 nt minihelices were part of a more ancient translation system that attached 3′-ACCA-Gly to synthesize polyglycine, a component of protocells. During pre-life, RNAs were ligated (i.e., by a ribozyme ligase), replicated, and processed to generate, for instance, 31 nt minihelices or, sometimes, more complex molecules such as type I and type II tRNAs. As shown in
Figure 1, type II tRNAs were generated from a 93 nt tRNA precursor molecule by a single internal 9 nt deletion, within ligated 3′- and 5′-acceptor stems (93 nt neglects 3′-ACCA; a primitive adapter molecule). Type I tRNAs were generated by two closely related internal 9 nt deletions. The more 5′ 9 nt deletion was identical for type I and type II tRNAs. The more 3′ 9 nt deletion, unique to type I tRNAs, was within the V loop region. Note that the 5′ and 3′ 9 nt deletions are identical on complementary strands. Longer type II V loops, therefore, are not an expansion from a shorter type I V loop. Rather, in the evolution of the first tRNAs, type I V loops resulted from processing of a type II V loop. The processing event within the V loop region to convert a type II tRNA into a type I tRNA was a single 9 nt internal deletion within ligated 3′- and 5′-acceptor stems.
The three 31 nt minihelix tRNA evolution theorem fully describes the origin of type I and type II tRNAs [
1,
2,
3,
4,
8,
9]. Because type I and type II tRNAs first evolved ~4 billion years ago, this is a remarkable observation. The pre-life sequences of tRNAs and tRNA precursor molecules, however, are known with essential certainty because these sequences are ordered and conserved in living organisms. Type I and type II tRNAs evolved from RNA repeats and inverted repeats, allowing the pre-life sequence to be determined. As shown in
Figure 1, the evolution of tRNA was from a 93 nt precursor molecule formed by the ligation of three 31 nt minihelices. The D loop 31 nt minihelix had the sequence GCGGCGG_UAGCCUAGCCUAGCCUA_CCGCCGC. The anticodon and T loop 31 nt minihelices had the sequence GCGGCGG_CCGGG_CU/???AA_CCCGG_CCGCCGC (_ delineates acceptor stems and stem–loop–stems; / indicates a U-turn; ? indicates sequence ambiguity).
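The length bookkeeping in this model can be checked with a minimal Python sketch. The sequences are copied from the text, with "?" kept as a placeholder for the ambiguous positions; the variable and function names are illustrative assumptions, and the sketch only counts nucleotides rather than modeling the positions of the deletions.

```python
# Minihelix sequences as written in the text. "_" and "/" are the text's
# delimiters; "?" marks ambiguous positions and is kept as a placeholder.
D_LOOP_MINIHELIX = "GCGGCGG_UAGCCUAGCCUAGCCUA_CCGCCGC"
ANTICODON_MINIHELIX = "GCGGCGG_CCGGG_CU/???AA_CCCGG_CCGCCGC"
T_LOOP_MINIHELIX = ANTICODON_MINIHELIX  # same starting sequence, per the text


def strip_delimiters(annotated: str) -> str:
    """Remove the '_' and '/' markers, leaving one character per nucleotide."""
    return annotated.replace("_", "").replace("/", "")


minihelices = [
    strip_delimiters(seq)
    for seq in (D_LOOP_MINIHELIX, ANTICODON_MINIHELIX, T_LOOP_MINIHELIX)
]
precursor = "".join(minihelices)  # ligation order: D loop, anticodon, T loop

minihelix_lengths = [len(m) for m in minihelices]
precursor_length = len(precursor)
type_ii_length = precursor_length - 9      # one internal 9 nt deletion
type_i_length = precursor_length - 2 * 9   # two internal 9 nt deletions

print(minihelix_lengths, precursor_length, type_ii_length, type_i_length)
```

As in the text, these counts neglect the 3′-ACCA.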
Type II variable (V) loops of tRNAs have been misrepresented, and type I and type II tRNA V loops have been improperly aligned in tRNA databases [
10,
11,
12,
13]. Here, we describe (1) the origin of type II V loops; (2) their proper alignment to type I V loops; and (3) their interactions with cognate AARS enzymes. We posit that type II V arms are important for allosteric communication with aminoacylating and editing active sites of cognate AARS.
As noted, the type I V loop was processed from the type II V loop by an internal 9 nt deletion (
Figure 1). The initial type I V loop sequence was 5 nt in length (CCGCC; a fragment of a 3′-acceptor stem). The initial type II V loop sequence was 14 nt in length (CCGCCGC_GCGGCGG). The type II V loop arose from a 3′-acceptor stem (CCGCCGC) ligated to a 5′-acceptor stem (GCGGCGG). In type II tRNAs, the type II V loop evolved to a stem–loop–stem. In a domain (i.e., Archaea or Bacteria), type II V arms for a set of synonymous tRNAs have distinct trajectories (up to three) from the tRNA, allowing for distinct recognition (i.e., by a cognate AARS). The requirement for maintaining distinct trajectories of type II V loops limits the number of synonymous type II V loop tRNA sets in an organism or domain (two in Archaea; three in Bacteria).
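The relationship between the two initial V loop sequences can be illustrated with a short Python sketch. It assumes that the 5 nt retained in the type I V loop are the 5′-most bases of the type II V loop, which is our reading of "a fragment of a 3′-acceptor stem"; the helper names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


ACCEPTOR_3P = "CCGCCGC"  # 3'-acceptor stem, from the text
ACCEPTOR_5P = "GCGGCGG"  # 5'-acceptor stem, from the text

type_ii_v_loop = ACCEPTOR_3P + ACCEPTOR_5P  # ligated stems, 14 nt
type_i_v_loop = "CCGCC"                     # 5 nt, from the text

# Assumption: the type I V loop keeps the 5'-most 5 nt, so the 9 nt deletion
# removes everything 3' of them within the type II V loop.
deleted = type_ii_v_loop[len(type_i_v_loop):]

stems_are_complementary = reverse_complement(ACCEPTOR_5P) == ACCEPTOR_3P
v_loop_is_self_complementary = reverse_complement(type_ii_v_loop) == type_ii_v_loop

print(len(type_ii_v_loop), len(type_i_v_loop), deleted)
print(stems_are_complementary, v_loop_is_self_complementary)
```

Because the 14 nt sequence equals its own reverse complement, many alternative G=C pairings are possible within it.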
For translation functions, Archaea appear older than Bacteria and closer to LUCA. We posit that Bacteria diverged from Archaea very early after LUCA [
14]. After a significant time of separation, Bacteria assumed their new and distinct identity. Archaeal tRNAomes are more highly ordered and simpler than bacterial tRNAomes [
1,
2,
3,
4,
15]. The archaeal genetic code, furthermore, is simpler and more ordered than the bacterial code [
16]. The analysis of type II V loops, type II tRNAs, and cognate AARS provides further insight into the divergence of Archaea and Bacteria.
The type II V arm interacts with AARS enzymes as a determinant for cognate AARS and as an anti-determinant for non-cognate AARS [
5,
6,
7,
16]. Leucine and serine may have entered the evolving genetic code at about the same time, and both leucine and serine utilize type II tRNAs. Leucine, serine, and arginine each occupy a 6-codon sector within the code. LeuRS-IA, SerRS-IIA, and AlaRS-IID lack anticodon loop recognition. During code evolution, serine jumped between column 2 and column 4, and serine is the only amino acid that occupies separate columns of the code. It is likely that serine jumping between columns required tRNA
Ser to be a type II tRNA and also required SerRS-IIA to lack anticodon loop recognition. Because serine can be converted to cysteine by tRNA-linked chemistry [
17,
18,
19], we posit that serine jumping during code evolution may have been related to initial cysteine incorporation into the code.
Knowledge remains incomplete about allostery in cognate AARS charging of tRNAs [
20,
21]. It appears to us that, at least in some cases, the recognition of tRNA determinants by AARS may generate allosteric communication largely via the tRNA (i.e., acting similarly to a coiled spring) to the tRNA 3′-end for cognate tRNA aminoacylation. For many type I tRNAs and their AARS, allostery is largely initiated through accurate anticodon recognition to position the tRNA 3′-end for aminoacylation. Because LeuRS-IA and SerRS-IIA lack anticodon recognition and because tRNA
Leu and tRNA
Ser have protruding type II V arms, type II V arms assumed an enhanced role to tune allosteric communication. Because LeuRS-IA has separate aminoacylating and proofreading/editing active sites, allostery, mediated through different contacts of the V arm, helps direct the tRNA
Leu 3′-end to switch between aminoacylating and editing active sites. In Bacteria but not Archaea, tRNA
Tyr is a type II tRNA, and tRNA
Tyr type II V arm contacts help direct accurate tRNA
Tyr charging in Bacteria.
3. Different Trajectories of the Type II tRNA V Arms
A gallery of type II tRNAs is shown in
Figure 3, emphasizing the contacts and trajectories of the V arms. The type II tRNA
Pri V arm is self-complementary along its entire length and can form many different or tangled G=C pairings. Type II V arms, therefore, evolved to form stem–loop–stems that could be discriminated by a cognate AARS. We find that the trajectory of the V arm depends on the number of unpaired bases just 5′ to the Levitt base pair (2, 1, or 0; sometimes −1). The Levitt base pair is a reverse Watson–Crick pair between tRNA base 15 and V
n (for a V loop of n bases (numbered V
1–V
n)). A reverse Watson–Crick base pair is a standard Watson–Crick pair, as in DNA, with one of the bases flipped over and slightly shifted.
Figure 3A shows archaeal Pyrococcus horikoshii tRNA
Leu (CAA) in the aminoacylating “hairpin” conformation [
24,
28]. Structures of tRNAs are from co-crystal structures with cognate AARS. “Hairpin” relates to the bent-down 3′-end of the tRNA
Leu into the LeuRS-IA aminoacylating active site. V loops are numbered V
1-V
n for a V loop of n bases. For historical reasons, standard numbering of tRNAs breaks down in the D loop and V loop, explaining why we use the V
1 to V
n numbering here. P. horikoshii (an ancient Archaeon) tRNA
Leu (CAA) has a V loop of 14 nt, which is the primordial (pre-life) length [
1,
2,
3]. UV
1 interacts with tRNA-G26 (UV
1~G26). V
2-CCC-V
4 is the V arm 5′-stem. V
5-GUAG-V
8 is the V arm end loop. V arm end loop bases V
6-UAG-V
8 are highly conserved in Archaea and bind to archaeal LeuRS-IA, as indicated (see below). V
9-GGG-V
11 forms the V arm 3′-stem. V
12-UU-V
13 are two unpaired bases, just 5′ of the Levitt base (CV
14). The Levitt base (CV
14) forms a reverse Watson–Crick base pair with tRNA-G15 (CV
14 = G15) (the Levitt base pair). The number of unpaired bases just 5′ of the Levitt base V
n determines the trajectory of the V arm from the tRNA body.
Figure 3B shows bacterial Thermus thermophilus tRNA
Tyr (GU*A) (U* for pseudouridine) from a co-crystal with TyrRS-IC [
29]. The V loop is 14 nt, which is the primordial length. UV
1 interacts with tRNA-G26 (UV
1~G26). The V arm 5′-stem is V
2-GGC-V
4. The V arm end loop is V
5-GUAU-V
8. V arm end loop bases V
5-GU-V
6 (or V
5-UU-V
6 in some Bacteria) are highly conserved in tRNA
Tyr V loops of 14 nt in Bacteria, and bind to TyrRS-IC, as indicated (see below). Type II tRNA
Tyr V loops shorter than 14 nt lose the conserved V
5 and V
6 bases and, presumably, also V arm end loop contacts by TyrRS-IC (see below). V
9-GCC-V
11 form the V arm 3′-stem. V
12-UU-V
13 are unpaired bases just 5′ of the Levitt base (CV
14) that forms the Levitt base pair with tRNA-G15 (CV
14 = G15).
Figure 3C shows bacterial Escherichia coli tRNA
Leu (UAA) from a co-crystal with LeuRS-IA in the editing/proofreading conformation (the 3′-end of tRNA
Leu is in the LeuRS-IA editing/proofreading active site rather than the aminoacylating active site) [
30]. The tRNA
Leu (UAA) V loop is 15 nt in E. coli. CV
1 interacts with tRNA-A26 (CV
1~A26). V
2-GGCG-V
5 is the V arm 5′-stem. V
6-UUCG-V
9 is the V arm end loop. In the editing conformation, UV
6 and GV
9 interact, and the V arm end loop is ordered. V
10-CGCU-V
13 is the V arm 3′-stem. GV
14 is the single unpaired base that determines the V arm trajectory, just 5′ of UV
15 that forms the Levitt reverse Watson–Crick base pair with tRNA-A15 (UV
15 = A15).
Figure 3D shows the same Escherichia coli tRNA
Leu (UAA) from a LeuRS-IA co-crystal but in the aminoacylating hairpin conformation [
31]. The major differences from the editing/proofreading conformation image in
Figure 3C are as follows: (1) the conformation of the V arm 3′-stem (green) is altered; (2) the V arm end loop is disordered (
Figure 3D); (3) the orientation of end loop base GV
9 is changed; and (4) UV
6 has lost contact to GV
9 and is disordered (
Figure 3D). We posit that allosteric interactions initiated by appropriate (aminoacylating) or inappropriate (editing/proofreading) contacts at the tRNA
Leu (UAA) 3′-end are communicated to and amplified by contacts at the V arm (see below).
Figure 3E shows bacterial Thermus thermophilus tRNA
Ser (GGA) from a co-crystal with SerRS-IIA [
32]. The tRNA
Ser (GGA) is significantly disordered, and the structure is not in an aminoacylating conformation (the tRNA
Ser 72-CGCCA 3′-end is disordered). The V loop is 22 nt. UV
1 can probably interact with tRNA-G26 (UV
1~G26) (tRNA-G26 is mostly disordered in the structure). V
2-AGGGGGG-V
8 is the V arm 5′-stem. The V arm end loop is V
9-CUUAAA-V
14 (mostly disordered). The V arm 3′-stem is V
15-CCUCCCU-V
21. CV
22 is the Levitt base that pairs with tRNA-G15 (CV
22 = G15). No unpaired bases are present just 5′ of the Levitt reverse Watson–Crick base pair (trajectory set point of 0). In Bacteria, tRNA
Ser V arm stems are generally longer than archaeal tRNA
Ser V arm stems. We posit that the lengthening of the stem in Bacteria may reflect a need to stabilize the V
2 = V
(n−1) pair. In Archaea, by contrast, the V
2 = V
(n−2) pair forms (in Archaea, the set point trajectory of the tRNA
Ser V arm is 1, in contrast to 0 in Bacteria).
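The V arm stems listed so far can be checked for pairing with a small Python sketch. It assumes that each 5′-stem pairs in register with the 3′-stem read in the opposite direction and that G·U wobble counts as a pair; both are simplifying assumptions for illustration, and the function and dictionary names are hypothetical.

```python
# Watson-Crick pairs plus G.U wobble (an assumption made for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(stem5: str, stem3: str) -> list:
    """Pair the 5'-stem (read 5'->3') against the 3'-stem (read 3'->5')."""
    return [(a, b, (a, b) in PAIRS) for a, b in zip(stem5, reversed(stem3))]


# (5'-stem, 3'-stem) sequences as given in the text for Figure 3A, B, C, and E.
V_ARM_STEMS = {
    "P. horikoshii tRNALeu": ("CCC", "GGG"),
    "T. thermophilus tRNATyr": ("GGC", "GCC"),
    "E. coli tRNALeu": ("GGCG", "CGCU"),
    "T. thermophilus tRNASer": ("AGGGGGG", "CCUCCCU"),
}

results = {
    name: all(ok for _, _, ok in stem_pairs(stem5, stem3))
    for name, (stem5, stem3) in V_ARM_STEMS.items()
}

for name, fully_paired in results.items():
    print(name, fully_paired)
```

In this reading, the outermost pair of the T. thermophilus tRNASer stem is A with U, which corresponds to the V2 = V(n−1) pair for n = 22.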
Figure 3F shows a somewhat unusual case that is included here mostly for completeness. Human tRNA
Sec (UCA) is shown from a co-crystal with SerRS-IIA (Sec for selenocysteine) [
33]. SerRS-IIA attaches serine to tRNA
Sec (Ser-tRNA
Sec), which is then converted to Sec-tRNA
Sec by other enzymes utilizing tRNA-linked chemistry. The V loop is 17 nt. In this case, AV
1 probably interacts with tRNA-U26 (AV
1 = U26). V
2-GCUGUC-V
7 is the V arm 5′-stem. V
8-UAGC-V
11 is the V arm end loop. V
12-GACAGA-V
17 is the V arm 3′-stem. The GV
2~AV
17 interaction is notable. Because AV
17 interacts with GV
2, the Levitt base pair to tRNA-G15 cannot form.
We conclude that V arm trajectories are different with two, one, or zero unpaired bases just 5′ of the Levitt base. In the view shown in
Figure 3,
Figure 3A,B (trajectory score of 2; two unpaired bases just 5′ of the Levitt base) have the type II V arm extending almost straight back.
Figure 3C,D (score of 1) have the V arm angling more to the right, and
Figure 3E,F (scores of 0 and −1) have the V arm pointing to the right. At the time of writing, some desired images were not available. For instance, an archaeal SerRS-IIA-tRNA
Ser structure would be informative (V loop trajectory score 1 in Archaea versus 0 in Bacteria).
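The trajectory scores discussed above can be recomputed from the V loop segments given for Figure 3. The following Python sketch assumes one way of encoding them: each V loop is split into V1, the 5′-stem, the end loop, the 3′-stem, and a "tail" that holds any unpaired bases plus the Levitt base. The class and field names are hypothetical, and the rule that an empty tail gives −1 is our reading of the tRNASec case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLoop:
    """One V loop, split into the segments named in the text (5' to 3')."""
    name: str
    v1: str        # V1, the base that interacts with tRNA base 26
    stem5: str     # V arm 5'-stem
    end_loop: str  # V arm end loop
    stem3: str     # V arm 3'-stem
    tail: str      # unpaired bases plus the Levitt base ("" if Vn is in the stem)

    @property
    def length(self) -> int:
        return len(self.v1 + self.stem5 + self.end_loop + self.stem3 + self.tail)

    @property
    def set_point(self) -> int:
        # Unpaired bases just 5' of the Levitt base; -1 when the last V loop
        # base is part of the 3'-stem, so that no Levitt pair can form.
        return len(self.tail) - 1


V_LOOPS = [
    VLoop("P. horikoshii tRNALeu", "U", "CCC", "GUAG", "GGG", "UUC"),
    VLoop("T. thermophilus tRNATyr", "U", "GGC", "GUAU", "GCC", "UUC"),
    VLoop("E. coli tRNALeu", "C", "GGCG", "UUCG", "CGCU", "GU"),
    VLoop("T. thermophilus tRNASer", "U", "AGGGGGG", "CUUAAA", "CCUCCCU", "C"),
    VLoop("Human tRNASec", "A", "GCUGUC", "UAGC", "GACAGA", ""),
]

summary = [(v.name, v.length, v.set_point) for v in V_LOOPS]
for row in summary:
    print(row)
```

The computed lengths and scores can then be compared directly with the values quoted for each panel.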
4. LeuRS-IA-tRNALeu Co-Crystal Structures
To better understand how the type II V arm is recognized, we have analyzed a number of cognate AARS-type II tRNA co-crystal structures.
Figure 4 shows a co-crystal of archaeal Pyrococcus horikoshii LeuRS-IA complexed with tRNA
Leu (CAA) [
24,
28]. P. horikoshii is an ancient Archaeon with a translation system very similar to the one that must have functioned at LUCA. The tRNA
Leu (CAA) is in the “hairpin” conformation with the tRNA 3′-end bent down into the aminoacylating active site. We could not identify an archaeal co-crystal structure with LeuRS-IA-tRNA
Leu in an editing/proofreading conformation for comparison.
Figure 4A shows the archaeal LeuRS-IA-tRNA
Leu (CAA) structure in the aminoacylating conformation. LeuRS-IA has separate aminoacylating and editing/proofreading active sites. The tRNA
Leu (CAA) 3′-end is in the hairpin conformation for aminoacylation. Because LeuRS-IA is a class I AARS, the aminoacylating active site is also identified by parallel β-sheets.
Figure 4B shows a C-terminal fragment of LeuRS-IA interacting with the tRNA
Leu V arm. In Archaea, the highly conserved V arm end loop sequence V
6-UAG-V
8 interacts with the C-terminal LeuRS-IA β-hairpin. In Pyrococcus horikoshii but not in Bacteria, the β-hairpin has an ~93-amino-acid insert that interacts with the tRNA
Leu elbow, where the D loop and T loop interact (i.e., where tRNA-G19 pairs with tRNA-C56 (a bent Watson–Crick pair)). The length of the insert in the archaeal C-terminal LeuRS-IA β-hairpin depends on how sequences are aligned. The V loop is 14 nt, which is the primordial (pre-life) length [
1,
2,
3]. More detail about the tRNA
Leu (CAA) V loop is shown in
Figure 3A. Comparison with an archaeal editing/proofreading LeuRS-IA-tRNA
Leu structure would enrich this discussion.
Figure 5 shows a bacterial Escherichia coli LeuRS-IA-tRNA
Leu (UAA) co-crystal structure in the aminoacylating hairpin conformation [
30].
Figure 5A shows the intact structure.
Figure 5B shows a LeuRS-IA C-terminal domain detail, highlighting interactions with the type II V loop. In contrast to archaeal LeuRS-IA (
Figure 4), bacterial LeuRS-IA does not make contact to the V arm end loop (
Figure 5). Furthermore, the C-terminal domain of bacterial LeuRS-IA, which lacks the ~93-amino-acid insert of archaeal LeuRS-IA, is significantly rearranged and redirected compared to the archaeal C-terminal domain. Notably, the bacterial C-terminal β-hairpin interacts with the tRNA
Leu elbow (i.e., the tRNA-G19 bent Watson–Crick pair with tRNA-C56). K809, R811, and K837 interact with the V arm 3′-stem (green). The V arm end loop is disordered, and V arm end loop base GV
9 is flipped out of the end loop. See also
Figure 3D. We posit that K809, R811, and K837 interaction with the V arm 3′-stem (green) generates allosteric communication with the tRNA
Leu 3′-end, particularly in the aminoacylating conformation, and that tighter attachment to the V arm 3′-stem in the aminoacylating conformation disrupts the structure of the V arm end loop.
Figure 6 shows bacterial Escherichia coli LeuRS-IA-tRNA
Leu (UAA) in the editing/proofreading conformation [
30]. The tRNA
Leu 3′-end, which was chemically modified, locates to the editing active site.
Figure 6A shows the entire structure.
Figure 6B highlights C-terminal LeuRS-IA-tRNA
Leu V arm contacts, which are significantly altered from the aminoacylating conformation (
Figure 5). The C-terminal β-hairpin maintains its contact to the tRNA-G19 = tRNA-C56 pair at the tRNA
Leu elbow. K809 and R811 move away from the tRNA
Leu V arm 3′-stem. K837, which is visualized in the aminoacylating conformation (
Figure 5), is unstructured in the editing/proofreading conformation (
Figure 6). Because of weakened contacts to the V arm 3′-stem in the editing conformation, the tRNA
Leu V arm end loop is structured. GV
9 stacks on CV
8 and interacts with UV
6 (see
Figure 3C; compare to
Figure 5). We posit that distal tRNA
Leu V arm and elbow determinant contacts act as allosteric effectors to influence accurate tRNA charging and editing. Allosteric communication is transmitted back and forth from the tRNA
Leu 3′-end and the type II V arm, such that contacts to the V arm amplify allostery initiated by tRNA
Leu 3′-end contacts. It is much more difficult to imagine strong allosteric communication being transmitted through the LeuRS-IA protein structure. For instance, the protein connection of the C-terminal LeuRS-IA domain to the aminoacylating active site is very flexible (
Figure 6A).
Figure 7 shows an overlay of Escherichia coli tRNA
Leu (UAA) in the aminoacylating and editing/proofreading conformations to indicate possible allosteric communication [
21,
30]. Structures were aligned based on LeuRS-IA protein. Because leucine occupies a 6-codon box in the genetic code, LeuRS-IA makes no contacts with the tRNA
Leu anticodon loop. Instead, allosteric communication is established between the tRNA
Leu V arm and the 3′-end, reinforcing the appropriate 3′-end placement into the aminoacylating or the editing/proofreading active site.
5. SerRS-IIA-tRNASer and -tRNASec Co-Crystal Structures
Figure 8 shows a bacterial Thermus thermophilus SerRS-IIA-tRNA
Ser co-crystal structure [
32,
34]. As do most or all class II AARSs, SerRS-IIA functions as an α
2-dimer. Serine is in a 6-codon box in the genetic code. Consistent with a 6-codon box, SerRS-IIA lacks tRNA
Ser anticodon recognition.
Figure 8A is the entire structure (1SER) supplemented with a light blue N-terminal helix hairpin and an SSA non-reactive reaction intermediate analog from 1SET.
Figure 8B highlights SerRS-IIA-tRNA
Ser type II V arm contacts. An N-terminal helix hairpin forms a brace that contacts the tRNA elbow (tRNA-G19 = tRNA-C56) (magenta and red), the V arm 3′-stem (yellow), and the V arm 5′-stem (green). Perhaps because only a single tRNA
Ser is bound, the N-terminal helix hairpin from the other α subunit was not visualized in the 1SER structure. The tRNA
Ser 3′-end 72-CGCCA is disordered, so the structure does not fully reflect an aminoacylating conformation. Perhaps the reaction must proceed to the serine–AMP stage to more stably attract the tRNA
Ser 3′-end. The tRNA
Ser structure shows a significant amount of disorder, possibly indicating allosteric communication linking the V arm and 3′-end contacts. The mode of SerRS-IIA-tRNA
Ser binding may have facilitated serine jumping from column 2 to column 4 of the genetic code. Serine is the only amino acid to have split between two genetic code columns. The binding of only a single tRNA
Ser by SerRS-IIA appears to indicate negative cooperativity in tRNA
Ser binding.
Figure 9 shows a human SerRS-IIA-tRNA
Sec (UCA) co-crystal structure [
33]. Because not all organisms encode selenocysteine (Sec), this is a somewhat unusual case. Anticodon UCA reads the stop codon UGA. The structure is important for the current discussion partly because it helps clarify some aspects of the structure shown in
Figure 8. In
Figure 9A, two tRNA
Sec (UCA) are bound, so both N-terminal helix hairpins are observed, possibly rendering the structure more intuitive to interpret (compare to
Figure 8). The aminoacylating active site is identified by (1) a surface of antiparallel β-sheets; (2) ANP binding (ANP is a non-reactive ATP analog); and (3) serine binding. Because the tRNA
Sec 3′-end (72-CGCCA) is disordered, the structure does not fully reflect an aminoacylating conformation. The tRNA
Sec (UCA) has a somewhat unusual V loop structure with a broken Levitt pair and a GV
2~AV
17 interaction (
Figure 3F). The altered tRNA
Sec (UCA) type II V arm trajectory (score of −1 (tRNA
Sec) versus score of 0 (tRNA
Ser)) may aid in specifying subsequent tRNA-linked chemistry to convert Ser-tRNA
Sec to Sec-tRNA
Sec and to not improperly convert Ser-tRNA
Ser to Sec-tRNA
Ser. It appears that negative cooperativity observed in SerRS-IIA-tRNA
Ser binding (
Figure 8) is relieved in SerRS-IIA-tRNA
Sec binding (
Figure 9).
6. ArgRS-IA-tRNAArg
This section is included for completeness and to tie the analysis of type II tRNAs into the formation of 6-codon genetic code sectors. Leucine, serine, and arginine probably entered the genetic code at about the same time and occupied 6-codon sectors [
5,
6,
7]. In contrast to LeuRS-IA and SerRS-IIA, ArgRS-IA utilizes a type I tRNA and, also, anticodon recognition for tRNA
Arg (
Figure 10). Because ArgRS-IA utilizes five tRNA
Arg anticodons from two genetic code rows (2 and 3), there is potential ambiguity in reading the anticodon sequence. The structure shown is a yeast ArgRS-IA-tRNA
Arg (ICG) co-crystal (I for inosine). Inosine is formed by the deamination of adenosine. In Archaea, wobble tRNA-34A is not utilized. Notably, when A is modified to I in Bacteria and Eukarya, the corresponding G anticodon (i.e., GCG) is not utilized. Because tRNA-34I reads mRNA wobble A, C, and U, wobble inosine can only be utilized in a 3- or 4-codon sector [
35,
36]. The structure in
Figure 10 has tRNA
Arg (ICG) in the hairpin conformation with the tRNA
Arg 73-GCCA bending down into the aminoacylating active site, which is also identified by parallel β-sheets and arginine binding. The anticodon loop has the sequence 34-ICGAA-38 [
37]. Because of ambiguities in tRNA
Arg anticodon reading, the anticodon loop is substantially unwound, exposing tRNA-38A to interact with ArgRS-IA. Because of anticodon ambiguity, the strongest sequence-specific contacts are expected for tRNA-35C and tRNA-38A. Substantial unwinding of the anticodon loop is expected to generate torque on the tRNA
Arg 3′-end to support the aminoacylating conformation.
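The wobble argument for inosine can be made concrete with a brief Python sketch. The rule that tRNA-34I reads A, C, and U is taken from the text; the entries for wobble G and C are standard simplified wobble rules added here as assumptions, and the function name is hypothetical.

```python
# Codon wobble bases read by each anticodon wobble base (tRNA position 34).
WOBBLE_READS = {
    "I": "ACU",  # from the text: inosine reads mRNA wobble A, C, and U
    "G": "CU",   # simplified standard wobble rule (assumption)
    "C": "G",    # simplified standard wobble rule (assumption)
}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon: str) -> list:
    """List the codons read by an anticodon written 5'->3' (positions 34-36)."""
    wobble, middle, third = anticodon
    # Codon positions 1 and 2 pair with anticodon positions 36 and 35.
    first_two = WATSON_CRICK[third] + WATSON_CRICK[middle]
    return [first_two + base for base in WOBBLE_READS[wobble]]


icg_codons = codons_read("ICG")
gcg_codons = codons_read("GCG")
gcg_is_redundant = set(gcg_codons) <= set(icg_codons)

print(icg_codons, gcg_codons, gcg_is_redundant)
```

Under these rules, every codon read by GCG is also read by ICG, which is consistent with the GCG anticodon not being utilized when A is modified to I.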
Numbering in the tRNA D loop is confusing, because, for historical reasons, numbering was based on eukaryotic tRNAs with three deleted D loop nts. The D loop evolved from a 17-nucleotide UAGCC repeat (i.e., D
1-UAGCCUAGCCUAGCCUA-D
17) (
Figure 1) [
1,
2]. According to improved numbering, 15-AU*--GGU*-20 would be numbered D
8-AU*--GGU*-D
14 (with two deleted nts (D
10 and D
11)). The tRNA
Arg (ICG) elbow (G19 = C56) and modified bases in the D loop make contact to ArgRS-IA, as shown.
7. TyrRS-IC-tRNATyr (GUA) in Archaea and Bacteria
Interestingly, tRNA
Tyr is a type I tRNA in Archaea and a type II tRNA with a longer V arm in Bacteria [
38]. An archaeal TyrRS-IC-tRNA
Tyr co-crystal is shown in
Figure 11. In contrast to most other class I AARSs, which are monomers, class IC AARSs are obligate α
2-dimers with anticodon-binding domains and aminoacylating domains in opposite subunits. Via TyrRS-IC protein, the anticodon-interaction domain is only loosely connected to the aminoacylating domain, indicating that allosteric contacts at the anticodon-binding region may be communicated to the aminoacylating active site mostly via the tRNA. Because the tRNA
Tyr (GUA) 3′-end is disordered (only A73 of 73-ACCA is ordered), the structure is not in a fully aminoacylating conformation. The aminoacylating active site is indicated by binding of tyrosine and parallel β-sheets.
In Bacteria, tRNA
Tyr is a type II tRNA with a longer V loop [
29]. In the ancient Bacterium Thermus thermophilus, tRNA
Tyr has a type II V loop of 14 nt, the primordial length. A co-crystal structure of T. thermophilus TyrRS-IC bound to tRNA
Tyr is shown in
Figure 12.
Figure 12A shows the entire structure.
Figure 12B shows more detailed contacts by a C-terminal TyrRS-IC fragment to the tRNA
Tyr V arm 5′-stem and end loop. The aminoacylating active site is identified by (1) parallel β-sheets; (2) ATP binding; and (3) TYE (a non-reactive tyrosine analog) binding. Because tRNA
Tyr 74-CCA is unstructured, the image does not fully represent an aminoacylating conformation. The type II V arm is contacted by TyrRS-IC R388 and R389 on its 5′-stem (yellow). V arm end loop bases V
5-GU-V
6 bind TyrRS-IC. The TyrRS-IC C-terminal domain is loosely tethered to the anticodon-binding domain, which is loosely tethered to the aminoacylating domain. From the image, it appears that allosteric effects from tRNA
Tyr anticodon and V arm 5′-stem contacts may be mostly communicated to the tRNA
Tyr 3′-end via tRNA
Tyr more than through the TyrRS-IC protein. The protein linkage of the C-terminal TyrRS-IC domain with the aminoacylating active site is very flexible, and the most relevant communication would be with the aminoacylating active site in the opposite α subunit.
A comparison of the archaeal and bacterial TyrRS-IC-tRNATyr (GUA) structures shows that archaeal TyrRS-IC lacks the C-terminal domain that binds the bacterial tRNATyr type II V arm. Because the archaeal TyrRS-IC enzyme recognizes a type I tRNATyr, the absence of the type II V arm-binding domain in archaeal TyrRS-IC is expected. The TyrRS-IC C-terminal domain was either a bacterial addition or an archaeal deletion. We posit, however, that the TyrRS-IC C-terminal domain in Bacteria may be distantly homologous to the LeuRS-IA C-terminus because of the possible similarities of the C-terminal β-hairpins. If this idea is correct, archaeal and bacterial TyrRS-IC may have diverged very early in evolution (i.e., at LUCA), and archaeal TyrRS-IC may have subsequently lost the C-terminal domain. It appears to us that divergence of archaeal and bacterial TyrRS-IC may have occurred very early in evolution at the time that type I and type II tRNAs were first sorted. Because Thermus thermophilus tRNATyr has a 14 nt V loop, which is the primordial length, this observation is also consistent with early sorting and divergence of type I and type II tRNAs.
8. Type II V Loops in an Ancient Bacterium
Figure 13 compares type II V loops for Thermus thermophilus [
13]. In Archaea, tRNA
Leu and tRNA
Ser are type II tRNAs with V arm trajectory set points of 2 and 1, respectively. In Bacteria, tRNA
Tyr, tRNA
Leu, and tRNA
Ser are type II tRNAs with V arm trajectory set points of 2, 1, and 0, respectively. We consider T. thermophilus to be a reasonable reference organism for the earliest recognizable divergence of Archaea and Bacteria.
Figure 13A represents tRNA
Tyr (trajectory set point 2).
Figure 13B represents a typical tRNA for tRNA
Leu (set point 1).
Figure 13C represents a typical tRNA for tRNA
Ser (set point 0). Typical tRNAs represent a consensus tRNA with relaxed scoring [
13]. The program used to generate the cloverleaf diagrams does not handle V loop sequences appropriately, so the relevant V loop sequences are shown individually. For all type II tRNAs in T. thermophilus, UV
1 forms a wobble pair with tRNA-G26, as in Archaea. Only TyrRS-IC interacts with V arm end loop bases (V
5-GU-V
6 in T. thermophilus). TyrRS-IC also contacts the V arm 5′-stem (
Figure 12). LeuRS-IA interacts with the V arm 3′-stem, making stronger contacts to the stem in the aminoacylating conformation compared to the editing/proofreading conformation (compare
Figure 5 and
Figure 6). SerRS-IIA interacts with both the V arm 5′- and 3′-stems. In Bacteria, the tRNA
Ser V arm set point of 0 may correlate with the longer lengths of the tRNA
Ser V arm stems compared to Archaea. Longer tRNA
Ser stems in Bacteria may be necessary to maintain V
2-V
(n−1) pairing. Also, the longer tRNA
Ser V arm stems may help to discriminate accurately among three type II tRNAs in Bacteria, compared to two type II tRNAs in Archaea.
Because the tRNA
Tyr V loop is 14 nt, which is the primordial length, type II tRNA
Tyr in Bacteria may be as ancient as a time when most type II V loops were 14 nt in length. It appears to us that bacterial type II tRNA
Tyr may have evolved from an ancient type II tRNA
Ser before the expansion of the tRNA
Ser V arm [
4]. In T. thermophilus, tRNA
Tyr and tRNA
Ser are similar tRNAs. The deleted bases in the D loop are the same, the Levitt base pair is the same, and the T loops are the same. By contrast, tRNA
Leu has a different deleted base in the D loop, a different Levitt base pair, and a different T loop base (tRNA-57A versus tRNA-57G). We posit that, in Bacteria, type II tRNA
Tyr may have evolved from an ancient tRNA
Ser with a 14 nt V loop.
11. Divergence of Archaea and Bacteria
The evolution of type II tRNA V arms appears to relay a simple story about divergence of the archaeal and bacterial domains. We support the following model. From LUCA, Archaea and Bacteria diverged. For translation functions, Archaea are most similar to LUCA, and Bacteria are more distinct. Bacteria appear to have assumed their separate identity after significant isolation from Archaea. For instance, no intermediate organisms separating Archaea and Bacteria have been identified. Bacteria partly diverged because of their different transcription system. Bacteria rely on the coevolution of sigma factors, bacterial promoters, and a streamlined RNA polymerase [
14,
47,
48]. For translation functions, Bacteria appear more diverged from LUCA than Archaea. Bacterial divergence is evident from inspection of tRNAomes and the genetic code. In many ways, Bacteria appear to be more successful and innovative prokaryotes than Archaea.
Table 1 summarizes type II V arm trajectory scores and lengths in Pyrococcus furiosus (an ancient Archaeon) and Thermus thermophilus (an ancient Bacterium) [13]. In P. furiosus, tRNATyr (1 tRNATyr) is a type I tRNA (5 nt V loop). In P. furiosus, tRNALeu is a type II tRNA with a trajectory score of 2 and a length of 14 nt (5 tRNALeu), the primordial length. Also, tRNASer has a trajectory score of 1 and a length of 15 nt (4 tRNASer). In T. thermophilus, tRNATyr has a score of 2 and a length of 14 nt, the primordial length (1 tRNATyr). Also, tRNALeu has a score of 1 and lengths of 13–17 nt (5 tRNALeu), and tRNASer has a score of 0 and lengths of 19–22 nt (4 tRNASer). It appears to us that, within a domain, collections of synonymous tRNAs with type II V loops must have different trajectory set point scores and, thus, distinct trajectories of V arms from the tRNA body (Figure 3). Archaeal type II V arms for tRNALeu are generally shorter compared to bacterial V arms. In Archaea, tRNATyr was sorted to become a type I tRNA (closely related to tRNAAsn) [4]. Because Archaea only utilize type II tRNAs encoding leucine and serine, there was little pressure to lengthen V arms, and tRNALeu and tRNASer generally maintained shorter V arms in Archaea than in Bacteria. Because type II tRNATyr (score of 2; Vn of 14 nt) was adopted in Bacteria, tRNALeu was downgraded to a trajectory score of 1 (i.e., Vn of 13–17 nt in T. thermophilus), and tRNASer was downgraded to a score of 0 (Vn of 19–22 nt in T. thermophilus), relative to Archaea. In Bacteria, we posit that longer tRNASer V arm stems may have evolved to stabilize the V2 = V(n−1) pairing. Because of the different V arm set point in Archaea, V2 = V(n−2) pairing is utilized. In order to sort three type II tRNA amino acids in Bacteria, tRNATyr is generally 14 nt or shorter. The primordial length is 14 nt. Interestingly, the tRNATyr V loop is 13 nt in Escherichia coli and the V arm end loop sequence of V5-GU-V6 that binds TyrRS-IC in T. thermophilus (or V5-UU-V6 in some Bacteria) is not present. We posit that, in E. coli, contact to the V arm end loop is not utilized, and only contacts to the V arm 5′-stem are maintained for allosteric contacts. No structure is available currently to test this notion. AlphaFold 3 modeling could perhaps be used to address this issue [39].
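The rule that synonymous type II tRNAs within a domain need distinct trajectory set points can be checked mechanically against the values quoted from Table 1. The following is a minimal sketch, not part of the original analysis: the names TABLE_1, type_ii_scores, and set_points_are_distinct are hypothetical, and the numbers are simply transcribed from the paragraph above.

```python
# Values transcribed from the Table 1 summary in the text.
# v_len is (min, max) V loop length in nt; score None marks a type I tRNA.
# All names here are illustrative assumptions, not from the original paper.
TABLE_1 = {
    "Pyrococcus furiosus": {
        "Tyr": {"score": None, "v_len": (5, 5)},
        "Leu": {"score": 2, "v_len": (14, 14)},
        "Ser": {"score": 1, "v_len": (15, 15)},
    },
    "Thermus thermophilus": {
        "Tyr": {"score": 2, "v_len": (14, 14)},
        "Leu": {"score": 1, "v_len": (13, 17)},
        "Ser": {"score": 0, "v_len": (19, 22)},
    },
}


def type_ii_scores(organism):
    """Return {amino acid: set point score} for type II tRNAs only."""
    return {
        aa: row["score"]
        for aa, row in TABLE_1[organism].items()
        if row["score"] is not None
    }


def set_points_are_distinct(organism):
    """True if no two type II tRNA families share a set point score."""
    scores = list(type_ii_scores(organism).values())
    return len(scores) == len(set(scores))


for organism in TABLE_1:
    print(organism, type_ii_scores(organism), set_points_are_distinct(organism))
```

With these values, the archaeal example uses two set points (2 and 1) and the bacterial example uses three (2, 1, and 0), with no score shared within either organism.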
Above, we have argued that the β-hairpin at the C-terminus of bacterial TyrRS-IC may relate distantly to the β-hairpins at the C-termini of LeuRS-IA (Figure 12). We acknowledge that the sequence match is insufficient to fully demonstrate this idea. If this idea is correct, however, we would argue that the evolution of type II tRNATyr and TyrRS-IC in Bacteria may have been as ancient an occurrence as the initial divergence of Archaea and Bacteria, perhaps as ancient as when all or most type II V arms were 14 nt in length. In such a scenario, Archaea could have adopted a type I tRNATyr and deleted the TyrRS-IC C-terminal domain that was necessary only to recognize a type II tRNATyr. Bacteria would have adopted or maintained a type II tRNATyr and maintained a TyrRS-IC with a C-terminal domain capable of recognizing the type II V arm.
Table 2 summarizes type II V arm and elbow tRNA distal determinants for LeuRS-IA, SerRS-IIA, and TyrRS-IC. Missing data for a full comparison are identified. For LeuRS-IA, Archaea and Bacteria have evolved homologous C-terminal domains that are massively modified and rearranged to make very different tRNA V arm and elbow contacts. In Bacteria, TyrRS-IC has a C-terminal domain that interacts with the V arm end loop and V arm 5′-stem in Thermus thermophilus and probably only the V arm 5′-stem in Escherichia coli (no structure is available). SerRS-IIA utilizes an N-terminal helix hairpin to bind the tRNASer V arm 5′- and 3′-stems and the elbow. LeuRS-IA and SerRS-IIA do not utilize cognate tRNA anticodon recognition, consistent with leucine and serine occupying 6-codon sectors of the genetic code. Serine is the only amino acid that is split between two genetic code columns (columns 2 and 4). As described below, we attempt to explain how SerRS-IIA-tRNASer recognition may have facilitated the jump. Among missing data are (1) archaeal LeuRS-IA-tRNALeu in an editing/proofreading conformation; (2) archaeal SerRS-IIA-tRNASer; and (3) Escherichia coli TyrRS-IC-tRNATyr. We do not know how tRNA elbow recognition affects allosteric communication to a cognate AARS active site(s).
12. Serine Jumping in Genetic Code Evolution
Serine is the only amino acid that is split between two genetic code columns (columns 2 and 4) (see below). We posit that serine jumping was possible because tRNASer has a type II V loop and SerRS-IIA lacks tRNASer anticodon loop recognition [5,6,7]. We further suggest that serine jumping during genetic code establishment may relate to the incorporation of cysteine into the code. Serine can be converted to cysteine through tRNA-linked chemistry [17,18,19]. Because cysteine is important for Zn binding, there is reason to believe that cysteine was an early addition to the genetic code. First proteins that coevolved with the code may have utilized Zn binding for their initial folding [49]. Cysteine, however, now occupies disfavored row 1 (tRNA-36A) of the genetic code, indicating that cysteine may have been a late addition. Row 1 (tRNA-36A) appears to be the last row of the code to fill. Complex aromatic amino acids (phenylalanine, tyrosine, and tryptophan) and stop codons locate to row 1 of the code, indicating that row 1 filled late [50]. It has been posited that row 1 filled late because tRNA-36 was initially a wobble position [5,6,7]. Unmodified A is not observed in a wobble position in Archaea (no unmodified tRNA-34A). The suppression of wobbling at tRNA-36A involved a tightening conformation (closing) of the 30S ribosomal subunit [51,52,53,54,55] and modifications of tRNA-37 [16]. At the base of code evolution, it appears that tRNA-37m1G was necessary to read tRNA-36A and tRNA-37t6A was necessary to read tRNA-36U. Such observations are consistent with tRNA-36 having been a wobble position before wobbling could be suppressed.
We suggest that serine jumped from column 2 to column 4 from an enlarged serine block within the code (i.e., tRNASer (GGU → GCU)). Cysteine, however, may have entered the genetic code by the modification of serine (i.e., Ser-tRNACys → Cys-tRNACys through tRNA-linked chemistry) [17,18,19]. In this way, cysteine could have invaded the genetic code early but settled in its final position in the code late (tRNACys (GCA)). Anticodon GGU now encodes threonine, which is chemically related to serine. We are suggesting that both serine conversion to cysteine by tRNA-linked chemistry and type II tRNASer V arm recognition by SerRS-IIA may have enabled serine jumping from column 2 to column 4 of the genetic code. Serine jumping during code establishment is of interest because it is one of the few instances of apparent chaos in the evolution of the code [5,6,7].
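The column argument can be illustrated with the standard genetic code. The sketch below is only an illustration under stated assumptions: it uses the standard codon table (not the archaeal anticodon table of Figure 14), numbers columns by the codon second base that pairs with tRNA-35 (U, C, A, G for columns 1 to 4, as inferred from the column assignments given in the text), and uses a simple Watson-Crick reverse complement that ignores wobble pairing and base modification. The names CODON_TABLE, codon_for_anticodon, and columns_by_amino_acid are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: column number follows tRNA-35, which pairs with codon position 2.
COLUMN_OF_CODON_BASE2 = {"U": 1, "C": 2, "A": 3, "G": 4}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon):
    """Watson-Crick reverse complement of a 5'->3' anticodon (wobble ignored)."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


def columns_by_amino_acid():
    """Map each amino acid (stops excluded) to the set of columns it occupies."""
    columns = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            columns.setdefault(aa, set()).add(COLUMN_OF_CODON_BASE2[codon[1]])
    return columns


# Amino acids whose codons fall in more than one column.
split = {aa: sorted(cols) for aa, cols in columns_by_amino_acid().items() if len(cols) > 1}
print(split)

# Anticodons named in the text: GGU, GCU, and GCA.
for anticodon in ("GGU", "GCU", "GCA"):
    codon = codon_for_anticodon(anticodon)
    print(anticodon, codon, CODON_TABLE[codon])
```

Under these assumptions, serine is the only amino acid reported in two columns (2 and 4), and the anticodons GGU, GCU, and GCA pair with codons for threonine, serine, and cysteine, respectively, in agreement with the text.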
13. Type II tRNA Evolution and the Origin of the Genetic Code
The effort to understand type II tRNA diversification at the origin of life and during the great divergence at LUCA of Archaea and Bacteria is part of a larger effort to understand the evolution of the genetic code [1,5,6,7,56,57]. Type I and type II tRNAs were sorted very early in evolution. Type II tRNALeu and tRNASer appear to have been sorted before divergence of Archaea and Bacteria. Type II tRNATyr was subsequently adopted in Bacteria but rejected in Archaea. The number of type II tRNAs in a prokaryote was limited by the number of potential trajectory set points of the V arm. Archaea adopted two set points. Bacteria adopted three. In Bacteria, having three type II V arm trajectory set points is correlated with the expansion of tRNASer V arm stems. We posit that the adoption of three trajectory set points for type II V arms in Bacteria (1) resulted in lengthening of tRNASer V arm stems; (2) altered the set points of tRNALeu and tRNASer V arms; (3) caused alterations in how the tRNALeu V arm and elbow are utilized as determinants for LeuRS-IA recognition; and (4) contributed to divergence of Archaea and Bacteria.
Figure 14 shows an archaeal codon–anticodon table [5,6,7,49]. The complexity of the code is a maximum of 32 assignments. The table lists the encoded amino acid and its cognate AARS. Colors emphasize related amino acids and AARS enzymes that mostly align in columns. To suppress superwobbling and allow 2-codon sectors, U must be modified at the 5-carbon (i.e., tRNA-34cnm5U) [16]. Most evolution is in code columns (tRNA-35). Columns 1, 2, and 4 contain 4- and 6-codon sectors. Column 3 is entirely 2-codon sectors. Rows 1, 2, 3, and 4 relate to tRNA-36. Only purine–pyrimidine discrimination is achieved at a wobble position (tRNA-34; A and B rows).
The genetic code is highly ordered. Significant evolution is observed in genetic code columns. Related amino acids Leu, Ile, Met, and Val locate to column 1 of the code. LeuRS-IA, IleRS-IA, MetRS-IA, and ValRS-IA are all closely homologous class IA AARS enzymes. Ser and Thr are chemically related amino acids, and Ser, Pro, Thr, and Ala are neutral amino acids that locate to column 2. SerRS-IIA, ProRS-IIA, and ThrRS-IIA are closely homologous class IIA AARS enzymes. AlaRS-IID is a significantly different AARS, which may have replaced a now extinct AlaRS-IIA before LUCA to suppress translation errors. In Archaea, in column 3, an ordered striped pattern is observed. His, Asn, and Asp occupy column 3, and rows 2A, 3A, and 4A (tRNA-34G). HisRS-IIA, AsnRS-IIB, and AspRS-IIB are closely homologous AARSs. Gln, Lys, and Glu occupy column 3, and rows 2B, 3B, and 4B (tRNA-34U*/C; U* is modified U to suppress superwobbling; i.e., tRNA-34cnm5U) [16]. GlnRS-IB, LysRS-IB (in Archaea), and GluRS-IB are closely homologous AARSs. In column 4, CysRS-IA and ArgRS-IA are closely homologous class IA AARSs. Glycine occupies the most favored sector in the genetic code: column 4 (tRNA-35C) and row 4 (tRNA-36C).
It appears that glycine was the first encoded amino acid (tRNA-35C, tRNA-36C) [5,6,7,58,59]. Glycine, alanine, aspartic acid, and valine (GADV) are the simplest amino acids that occupy the most favored row 4 (tRNA-36C). It appears that GADV were the first four encoded amino acids [60,61,62,63,64,65]. An adequate model for the evolution of the genetic code must specify an order of the addition of amino acids into the code. An adequate model for the evolution of the code must account for the evolution of 6-codon sectors (Leu, Ser, and Arg), 4-codon sectors (Val, Pro, Thr, Ala, and Gly), the 3-codon sector (Ile), 2-codon sectors (Phe, Tyr, His, Gln, Asn, Lys, Asp, Glu, and Cys), and 1-codon sectors (Met and Trp). Leu and Ser occupy 6-codon sectors, and tRNALeu and tRNASer are type II tRNAs that utilize their longer V arms for LeuRS-IA and SerRS-IIA recognition and discrimination. ArgRS-IA, which utilizes a type I tRNAArg, unwinds the anticodon loop to better recognize this feature (Figure 10) [37]. An adequate model for code evolution must rationalize why leucine and serine utilize type II tRNAs and occupy 6-codon sectors. An adequate model must rationalize serine jumping between column 2 and column 4 of the genetic code.
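The sector sizes listed above can be recomputed directly from the standard genetic code. The following is a minimal sketch for orientation only, assuming the standard codon table rather than any organism-specific variant; the names CODON_TABLE, sector_sizes, and SECTORS are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def sector_sizes():
    """Group one-letter amino acid codes by their number of codons (stops excluded)."""
    counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
    sectors = {}
    for aa, n_codons in counts.items():
        sectors.setdefault(n_codons, set()).add(aa)
    return sectors


SECTORS = sector_sizes()
for size in sorted(SECTORS, reverse=True):
    print(f"{size}-codon sectors:", "".join(sorted(SECTORS[size])))
```

The grouping reproduces the sectors named in the text: Leu, Ser, and Arg with six codons; Val, Pro, Thr, Ala, and Gly with four; Ile with three; nine amino acids with two; and Met and Trp with one.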
The genetic code evolved around tRNA and the tRNA anticodon [5,6,7,49]. Degeneracy explains why the genetic code encodes 21 assignments: 20 amino acids plus stops. The genetic code has the capacity to encode up to 32 assignments (Figure 14). At a wobble position (tRNA-34), only purine versus pyrimidine resolution has been achieved. At Watson–Crick positions (tRNA-35 and tRNA-36), codon A, G, C, and U can be read. Thus, the code was limited by tRNA reading to 2 × 4 × 4 = 32 assignments. However, the tRNA-34 (wobble) and tRNA-36 positions show similarities in the utilization of weakly pairing bases U and A. At tRNA-34, tRNA-34U must be modified to suppress “superwobbling” [16]. Superwobbling, in which tRNA-34U reads mRNA wobble A, G, C, and U (as in mitochondria), can only be utilized in a 4-codon box [66,67]. To support 2-codon sectors, tRNA-34U must be modified to restrict its reading (i.e., tRNA-34cnm5U; 5-cyanomethyluridine). In Archaea, tRNA-34A is not observed. At tRNA-36, at the base of the code, tRNA-36A is supported by adjacent tRNA-37m1G modification. Also, tRNA-36U is supported by adjacent tRNA-37t6A modification. We posit that tRNA-34 and tRNA-36 were originally both wobble positions, and only a single wobble position could be read at a time. With both tRNA-34 and tRNA-36 as wobble positions, the complexity of the genetic code was 8 assignments (2 × 4 or 4 × 2). When tRNA-36 was a wobble position, we posit that columns 1, 2, and 4 of the genetic code evolved primarily around tRNA-35 (Watson–Crick) and tRNA-36 (wobble). Column 3 of the genetic code evolved around tRNA-34 (wobble) and tRNA-35 (Watson–Crick). This explains why 6-, 4-, and 3-codon boxes locate to columns 1, 2, and 4. This also explains why only 2-codon boxes are found in column 3, and why column 3 fractionates on A and B rows (tRNA-34; wobble).
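The counting argument above (32 assignments now, 8 assignments when tRNA-36 was also a wobble position) can be made explicit by enumeration. This is a hedged sketch of the arithmetic only: it assumes, following the text, that a wobble position resolves two classes (purine versus pyrimidine), that a Watson–Crick position resolves four bases, and that an unread wobble position contributes no resolution. The helper name count_assignments is hypothetical.

```python
from itertools import product

WOBBLE_CLASSES = ("purine", "pyrimidine")  # resolution at a wobble position
WATSON_CRICK = ("A", "G", "C", "U")        # resolution at a Watson-Crick position


def count_assignments(*positions):
    """Count distinguishable anticodon classes given the resolution at each read position."""
    return len(list(product(*positions)))


# tRNA-34 wobble with tRNA-35 and tRNA-36 Watson-Crick: the present ceiling.
modern_capacity = count_assignments(WOBBLE_CLASSES, WATSON_CRICK, WATSON_CRICK)

# Posited early code: tRNA-35 plus one wobble position read at a time.
early_via_34 = count_assignments(WOBBLE_CLASSES, WATSON_CRICK)  # tRNA-34, tRNA-35 (2 x 4)
early_via_36 = count_assignments(WATSON_CRICK, WOBBLE_CLASSES)  # tRNA-35, tRNA-36 (4 x 2)

print(modern_capacity, early_via_34, early_via_36)
```

Suppressing wobble at tRNA-36 therefore raises the ceiling fourfold, from 8 to 32 possible assignments, which is the expansion step described in the text.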
Because tRNA-36 was originally a wobble position, we posit that disfavored row 1 of the genetic code (tRNA-36A) was the last row to fill. Stop codons locate to disfavored row 1. Stop codons are read by protein release factors that read the mRNA codon directly, so there is no tRNA that corresponds to a stop codon (except in suppressor strains) [68]. Aromatic amino acids Phe, Tyr, and Trp locate to row 1. Phe, Tyr, and Trp are the most complex amino acids, so it is reasonable that they were added late after wobbling was suppressed at tRNA-36 [50]. The suppression of wobbling at tRNA-36 was partly via tRNA-37m1G modification to read tRNA-36A and tRNA-37t6A modification to read tRNA-36U. To suppress tRNA-36 wobbling, a conformational change in the 30S ribosomal subunit tightens the anticodon–codon interaction, dehydrates the bases to promote pairing, and locks the reading frame in place, which helps to maintain translational fidelity [51,52,53,54,55,69,70,71]. Wobbling cannot be suppressed in the same manner at tRNA-34 because the modification of adjacent bases does not assist tRNA-34 reading. The modification of tRNA-33 will not help to suppress tRNA-34 wobbling because tRNA-33 is on the other side of the anticodon loop U-turn. Also, tRNA-35 cannot be easily modified because this is a Watson–Crick base that must pair with mRNA. Too many different and constrained tRNA-35 modifications might be necessary (i.e., 2–4) to suppress wobbling at tRNA-34 for such a mechanism to evolve.
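The row assignments discussed above can be checked against the standard genetic code. In this illustrative sketch, rows are numbered by the codon first base that pairs with tRNA-36 (U, C, A, G for rows 1 to 4). That numbering is an assumption inferred from the text (row 1 is tRNA-36A, row 4 is tRNA-36C, and His, Asn, and Asp sit in rows 2, 3, and 4). The names CODON_TABLE, row_contents, and ROWS are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: row number follows tRNA-36, which pairs with codon position 1.
ROW_OF_CODON_BASE1 = {"U": 1, "C": 2, "A": 3, "G": 4}


def row_contents():
    """Map each row to the set of one-letter assignments it contains ('*' is stop)."""
    rows = {row: set() for row in (1, 2, 3, 4)}
    for codon, aa in CODON_TABLE.items():
        rows[ROW_OF_CODON_BASE1[codon[0]]].add(aa)
    return rows


ROWS = row_contents()
for row in sorted(ROWS):
    print(f"row {row}:", "".join(sorted(ROWS[row])))
```

Under this numbering, all three stop codons and the aromatic amino acids Phe, Tyr, and Trp fall in row 1 together with Cys, whereas row 4 holds Gly, Ala, Asp, and Val plus Glu.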
In the pre-life world, RNA-linked and tRNA-linked chemistries were common. In the evolution of the genetic code, tRNA-linked reactions may have promoted the incorporation of leucine (Val → Leu; five steps), tyrosine (Phe → Tyr; one step), glutamine (Glu → Gln; one step), asparagine (Asp → Asn; one step) [72,73], arginine (Orn → Arg; two steps; Orn for ornithine) [74], and cysteine (Ser → Cys; two steps) [17,18,19]. Because of tRNA-linked reactions, an 8-amino-acid genetic code can be significantly enriched to encode the first RNA sequence-dependent proteins. For instance, in addition to encoding GADVLSER, an 8-amino-acid code with tRNA-34 and tRNA-36 wobbling could also utilize CQN through tRNA-linked reactions.
Adopting a tRNA-centric view of pre-life chemical evolution indicates that the genetic code was initially utilized to generate polyglycine, a component of protocells [5,6,7]. Subsequently, the code progressed to generate GADV polymers. Then, probably, GADVLSER was encoded with CQN added through tRNA-linked chemistry. Leucine and serine, therefore, may have entered the code at about the same time to utilize type II tRNAs and to eventually settle into 6-codon boxes. First RNA sequence-dependent proteins and enzymes emerged at about the 11-amino-acid stage. The suppression of tRNA-36 wobbling allowed the code to expand. The code froze at 20 amino acids plus stops because of fidelity mechanisms.
We consider the utilization of type II tRNALeu, tRNASer, and tRNATyr (in Bacteria) to support this narrative.
14. Conclusions
We conclude that type II V loops in Archaea and Bacteria relay a simple story about the original sorting of type I and type II tRNAs. Within a domain (i.e., Archaea or Bacteria), sets of synonymous type II tRNA V loops must have distinct trajectory set points determined by the number of unpaired bases just 5′ of the Levitt base (Vn). For Archaea, tRNALeu has a set point of 2, and tRNASer has a set point of 1. For Bacteria, tRNATyr has a set point of 2. To accommodate a type II tRNATyr with a set point of 2 in Bacteria, tRNALeu has a set point of 1, and tRNASer has a set point of 0. The longer lengths of the tRNASer V arm stems in Bacteria may relate to the need to stabilize the V2 = V(n−1) base pair. Sharing tRNAs between Archaea and Bacteria is awkward, in part, because of the incompatibility of type II tRNAs. Also, tRNA modification systems in Archaea and Bacteria are largely incompatible.
As organisms became more derived in evolution, V arm end loop contacts by a cognate AARS appear to have given way to V arm stem contacts. This trend appears to be supported by LeuRS-IA-tRNALeu V arm end loop contacts in Archaea being replaced by LeuRS-IA-tRNALeu V arm 3′-stem contacts in Bacteria. Also, V arm end loop and 5′-stem contacts in TyrRS-IC-tRNATyr of Thermus thermophilus appear to give way to V arm 5′-stem contacts in TyrRS-IC-tRNATyr of Escherichia coli (no structure is currently available). We posit that V arm stem contacts may exert greater allosteric communication to the tRNA 3′-end compared to V arm end loop contacts, which would be expected to be more flexible and more weakly allosteric than contacts to a V arm stem.
The three 31 nt minihelix tRNA evolution theorem completely describes the evolution of type I and type II tRNAs, to the last nucleotide (Figure 1) [1]. The reason that tRNA evolution could be solved with such high confidence is that tRNA sequences chemically evolved from RNA repeats and inverted repeats conserved from pre-life. The solution of type II tRNA evolution was predicted based on the model for type I tRNA evolution, making the three 31 nt minihelix tRNA evolution theorem powerfully predictive [3]. The evolution of type II tRNAs in Archaea and Bacteria is fully supportive of the theorem. The evolution of type I and type II tRNAs constitutes the core, successful, and conserved strategy and pathway in the evolution of life on Earth. After ~4 billion years, it is remarkable that such a clear record of pre-life worlds survived in the tRNA sequences of living organisms.