This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Three-dimensional algebraic models, also called Genetic Hotels, are developed to represent the Standard Genetic Code (SGC), the Standard tRNA Code (S-tRNA-C), and the Human tRNA code (H-tRNA-C). New algebraic concepts are introduced to describe these models, to wit, the generalization of the 2^{n}-Klein Group and the concept of a subgroup coset with a tail. We found that the H-tRNA-C displays broken symmetries with respect to the S-tRNA-C, which is highly symmetric. We also show that there are only 12 ways to represent each of the corresponding phenotypic graphs of amino acids. The averages of the statistical centrality measures of the 12 graphs are calculated for each of the three codes and compared statistically. The phenotypic graphs of the S-tRNA-C display a common triangular prism of amino acids in 10 out of the 12 graphs, whilst the corresponding graphs for the H-tRNA-C display only two triangular prisms. The graphs exhibit disjoint clusters of amino acids when their polar requirement values are used. We contend that the S-tRNA-C is in a frozen-like state, whereas the H-tRNA-C may be in an evolving state.

The transfer RNA (tRNA) is perhaps the most important molecule in the origin and evolution of the genetic code. Just two years after the discovery of the double helix structure of DNA, F. Crick [

The SGC has been theoretically derived from a primeval RNY genetic code under a model of sequential symmetry breakings [

In this work, we focus on the codon-anticodon rules for which we develop algebraic models. In previous works [

The manuscript is organized as follows. First, we highlight some fundamental algebraic properties of the SGC in 3D. Second, we review some fundamental biological properties of the tRNA code for both the S-tRNA-C and the H-tRNA-C. Third, we develop three-dimensional algebraic models for both codes. New algebraic concepts are introduced to be able to describe these models, to wit, the generalization of the 2^{n}-Klein Group and the concept of a subgroup with a tail. Next we demonstrate that there are exactly 12 ways to represent a graph of amino acids depending on the ordering of the four RNA nucleotides. Twelve phenotypic graphs of amino acids are calculated for each genetic code. The centrality measures of each of the 12 graphs for each genetic code are calculated and their averages are statistically compared. A common subgraph of connected amino acids that corresponds to a triangular prism is encountered in the SGC, the S-tRNA-C, and in the H-tRNA-C. We also searched for matches between the topology of the networks with physicochemical properties of amino acids. Finally, the present results are discussed in the context of algebraic models of the evolution of the genetic code.

Group theory is the branch of mathematics used for studying symmetry. A group is an algebraic structure consisting of a set G together with an operation × that combines any two of its elements to form a third element of G, and that satisfies the following axioms.

Associativity: x × (y × z) = (x × y) × z for all x, y, z, elements of G.

Existence of neutral: There is an element e in G such that, x × e = e × x = x for all x.

Existence of inverses: For every x of G there is an element x^{−1}, called the inverse of x, such that x × x^{−1} = x^{−1} × x = e.

In physics, groups are important because they describe the symmetries that the laws of physics seem to obey. According to Noether’s theorem, every continuous symmetry of a physical system corresponds to a conservation law of the system. Analogously, a large part of our work has been devoted to determining the symmetry groups that describe not only the structure of the SGC but also its evolution [

In the next sections we derive the symmetry groups of the Genetic Hotels of the SGC, the S-tRNA-C and the H-tRNA-C.

In a previous work [_{2})^{2} = ℤ_{2} × ℤ_{2} = {00, 01, 10, 11} of the binary duplets, with ℤ_{2} = {0, 1} the binary field of two elements, generally known as GF(2), the Galois Field of two elements. There is nothing special about this matching of the nucleotides. In fact the set N can be partitioned into two disjoint binary classes in three different ways, based on chemical criteria: strong-weak, amino-keto, and pyrimidine-purine, and similar results can be obtained [_{2})^{2}, identified with the polynomial ring ℤ_{2}[^{2} + x + 1. The module 2 bitwise addition in the additive group (ℤ_{2})^{2} induces in the set N a group structure whose Cayley Table is (

The Klein Four Group.

+ | C | U | A | G |
---|---|---|---|---|
C | C | U | A | G |
U | U | C | G | A |
A | A | G | C | U |
G | G | A | U | C |
^{3}, where the triplets UCC, CUC, and CCU correspond, respectively, to the unitary vectors e_{1} = (1, 0, 0), e_{2} = (0, 1, 0), and e_{3} = (0, 0, 1), of the so-called orthonormal canonical basis. The set NNN has the geometric shape of a cube, or regular hexahedron, the sides of which are of the length three. Due to its resemblance to a three-floor cubic building, we have called it the Genetic Hotel [

For the genetic hotel NNN, identified with the vector space (GF(4))^{3}, where GF(4) = {0, 1, 2, 3} is the Galois field of four elements with the appropriate field operations, we use the so-called Manhattan or Taxi-cab distance. The Manhattan distance is consistent with the graph-theoretical concept of a distance between vertices. The Manhattan distance between the two triplets (X_{1}X_{2}X_{3}) and (Y_{1}Y_{2}Y_{3}) of NNN is defined as the nonnegative integer |X_{1} − Y_{1}| + |X_{2} − Y_{2}| + |X_{3} − Y_{3}|, where the operations are the ordinary addition and subtraction in the set ℤ of integers, and the vertical bars denote the usual absolute value of a real number. This definition of distance in NNN is similar to the one used for the Hamming distance in a hypercube (ℤ_{2})^{n}, where the only scalars are zero and one. The Manhattan distance gives the minimal number of edges, or unit segments, in a path between two triplets. The greatest distance in NNN is therefore nine, the distance between the null triplet CCC and its complementary GGG. It is easy to prove that two triplets are adjacent if, and only if, they differ in only one component and the differing components are consecutive under the selected order {C, U, A, G}, so that the Manhattan distance between them, D((X_{1}X_{2}X_{3}), (Y_{1}Y_{2}Y_{3})), is equal to one.
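The distance and the adjacency criterion can be sketched in a few lines. The coordinates C = 0, U = 1, A = 2, G = 3 are taken from the order {C, U, A, G}; the function names are illustrative assumptions.

```python
from itertools import product

BASES = "CUAG"  # assumed coordinates: C = 0, U = 1, A = 2, G = 3
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

def manhattan(t1, t2):
    """Manhattan (taxi-cab) distance between two triplets."""
    return sum(abs(BASES.index(a) - BASES.index(b)) for a, b in zip(t1, t2))

def adjacent(t1, t2):
    """Two triplets are adjacent when their distance is exactly one."""
    return manhattan(t1, t2) == 1

def neighbors(triplet):
    """All triplets adjacent to the given one."""
    return [t for t in TRIPLETS if adjacent(triplet, t)]

# Largest distance between any two triplets of the hotel.
DIAMETER = max(manhattan(a, b) for a in TRIPLETS for b in TRIPLETS)
```

For instance, CCC and UCC are adjacent, whereas CCC and ACC are not, because C and A are not consecutive in the chosen order.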

The group (E(NNN,◦)) of all the symmetries of the GF(4)-vector space NNN

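In the first place, the linear isometries are those represented by orthogonal matrices with respect to the canonical basis (e_{1} = (1, 0, 0), e_{2} = (0, 1, 0) and e_{3} = (0, 0, 1)). They are the so-called permutation matrices, obtained from the identity matrix I_{3} by permuting its columns:

$$I_{3}=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix},\quad A=\begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix},\quad A^{2}=\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix},$$

$$B=\begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix},\quad AB=\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix},\quad A^{2}B=\begin{pmatrix}0&0&1\\0&1&0\\1&0&0\end{pmatrix}.$$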

They form a group of order six, which we will denote as P_{3}(GF(4)), generated by A and B, isomorphic to the symmetric group S{3e_{1}, 3e_{2}, 3e_{3}} of the set of the three corner triplets GCC, CGC, and CCG, which are collinear with the three unit canonical vectors UCC = e_{1} = (1, 0, 0), CUC = e_{2} = (0, 1, 0), CCU = e_{3} = (0, 0, 1). The generators A and B satisfy the defining relations A^{3} = B^{2} = I_{3}, BA = A^{2}B. The group P_{3}(GF(4)) is also isomorphic to the dihedral group D_{3} of all the symmetries of an equilateral triangle.

On the other hand, the only isometric translations are the ones associated with the eight triplets CCC, CCG, CGC, CGG, GCC, GCG, GGC, and GGG, situated at the corners of the multicube NNN. These eight translations form an Abelian group of order eight, generated by the three order-two elements GCC, CGC, and CCG. We will denote it as IT(NNN) and call it the group of isometric translations of the multicube NNN. This group is isomorphic to the additive group ℤ_{2} × ℤ_{2} × ℤ_{2}, with the component-wise modulo 2 addition. In fact, this is what we will later call a generalized 2^{3}-Klein Group. Hence, there are 48 isometric transformations, which are the compositions of the six linear isometries with the eight isometric translations. They comprise the known 48 symmetries of a cube or regular hexahedron.

The so-called Euclidean group E(NNN) of all the isometric transformations of the multicube NNN is the semidirect product IT(NNN) ⋊ P_{3}(GF(4)). This means that IT(NNN) is a normal subgroup of E(NNN) and that every isometric transformation of NNN may be represented, in a unique way, as a composition T_{XYZ}◦M of an isometric translation T_{XYZ}, where X, Y, Z ∈ {C, G}, with a permutation matrix M. The Euclidean group E(NNN) is generated by the five elements A, B, T_{GCC}, T_{CGC}, T_{CCG}, with the defining relations:

A^{3} = B^{2} = I_{3}, BA = A^{2}B,

(T_{GCC})^{2} = (T_{CGC})^{2} = (T_{CCG})^{2} = I_{3},

A◦T_{GCC} = T_{CGC}◦A, A◦T_{CGC} = T_{CCG}◦A, A◦T_{CCG} = T_{GCC}◦A,

B◦T_{GCC} = T_{GCC}◦B, B◦T_{CGC} = T_{CCG}◦B, B◦T_{CCG} = T_{CGC}◦B,

T_{GCC}◦T_{CGC} = T_{CGC}◦T_{GCC}, T_{GCC}◦T_{CCG} = T_{CCG}◦T_{GCC}, T_{CGC}◦T_{CCG} = T_{CCG}◦T_{CGC}.
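The count of 48 symmetries can be checked by brute force. The sketch below assumes the coordinates C = 0, U = 1, A = 2, G = 3, models addition in GF(4) as the bitwise XOR of the two-bit codes, and models the permutation matrices as permutations of the three coordinates; all names are illustrative.

```python
from itertools import permutations, product

# Assumed coordinates: C = 0, U = 1, A = 2, G = 3.
POINTS = list(product(range(4), repeat=3))

def dist(u, v):
    """Manhattan distance between two points of the hotel."""
    return sum(abs(a - b) for a, b in zip(u, v))

def translate(v, t):
    """Translation T_t: addition in GF(4), i.e. componentwise XOR."""
    return tuple(a ^ b for a, b in zip(v, t))

def is_isometry(f):
    """True when f preserves every pairwise Manhattan distance."""
    return all(dist(f(u), f(v)) == dist(u, v) for u in POINTS for v in POINTS)

# Translations that preserve all distances (expected: the eight corners).
ISOMETRIC_TRANSLATIONS = [t for t in POINTS
                          if is_isometry(lambda v, t=t: translate(v, t))]

def symmetry(perm, t):
    """The map v -> T_t(M v), where M permutes the coordinates by perm."""
    return lambda v: translate(tuple(v[i] for i in perm), t)

# Each symmetry is stored as the tuple of images of all 64 points.
SYMMETRIES = {tuple(symmetry(p, t)(v) for v in POINTS)
              for p in permutations(range(3))
              for t in ISOMETRIC_TRANSLATIONS}
```

Only the translations whose components are C or G survive the test, because adding G sends each coordinate x to 3 − x, whereas adding U or A moves some consecutive coordinates three steps apart.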

The set N is the disjoint union of two binary sets, the pyrimidines Y = {C, U} and the purines R = {A, G}. The subset YYY = Y × Y × Y is a unitary cube, which, under addition, is a subgroup of the additive group (NNN, +). It is not a GF(4)-vector subspace, since it is not closed under multiplication of scalars by vectors. It is, however, actually a ℤ_{2}-vector subspace, if NNN is seen as a six-dimensional ℤ_{2}-vector space, isomorphic to the binary hypercube (ℤ_{2})^{6}. The multicube NNN is the union of 27 unitary cubes, which are isometric to the subcube YYY whose nucleotide components are pyrimidines. Eight out of those 27 subcubes are the group-theoretical cosets of the subgroup (YYY, +), namely, the subcubes YYY, YYR, YRY, YRR, RYY, RYR, RRY, RRR, which are the corner subcubes of the whole multicube NNN. We refer to

The four walls of the Standard Genetic Hotel with the Cartesian set N × N × N (NNN). Positions of the 64 codons or triplets in the Genetic Hotel. The 64 codons are distributed in eight subcubes or condominiums in three dimensions (3D). Within each condominium we can move from one triplet to its nearest neighbor by means of a transition in the first, second or third position of the codon. In order to abandon a condominium, a transversion is required. We note that there are 48 symmetries of the Hotel.

The multicube NNN is the disjoint union of the four sets: NNC, NNU, NNA, and NNG, which are, respectively, the first, the second, the third floor, and the roof of the hotel. The floor NNC is a two-dimensional vector subspace, generated by the unit vectors UCC and CUC, inserted in the XY-Euclidean plane. The other three, NNU, NNA, and NNG, are two-dimensional affine subspaces, or group-theoretical cosets of NNC. The subset NNY, of all the triplets that end in a pyrimidine nucleotide, C or U, is the union of the first two floors: NNC and NNU. It is an order 32 subgroup of the additive group (NNN, +). Its complement, the subset NNR, of all the triplets that end in a purine nucleotide, A or G, is the union of the two floors NNA and NNG. It is the only other coset of the subgroup (NNY, +).

The very front vertical wall is the set GNN of the triplets that begin with G. The first inner vertical wall is the set ANN of the triplets that begin with A. The second inner vertical wall is the set UNN of the triplets that begin with U. The rear vertical wall is the set CNN of the triplets that begin with C. The latter is the two-dimensional GF(4)-vector subspace, generated by the unit vectors CUC and CCU. The other three walls, GNN, ANN, and UNN, are the affine subspaces, or group-theoretical cosets, determined, respectively, by the translations T_{GCC}, T_{ACC}, and T_{UCC}, associated with the vectors GCC, ACC, and UCC, which are orthogonal to the plane CNN, inserted in the YZ-Euclidean plane. The additive subgroups (NNC, +) and (CNN, +) are isomorphic to the group (ℤ_{2})^{4} = ℤ_{2} × ℤ_{2} × ℤ_{2} × ℤ_{2}, with the bitwise modulo 2 addition. In this latter group every element different from the neutral has order two.

Next, we will introduce a novel concept.

We give the name generalized 2^{n}-Klein Group to a finite Abelian 2-group of order 2^{n} in which every element different from the neutral has order two. It is isomorphic to the additive group of the hypercube (ℤ_{2})^{n} with the bitwise modulo 2 addition. In other terminology, a 2^{n}-Klein Group is a homocyclic Abelian 2-group of rank n in which every nonnull element has order two, that is, an elementary Abelian 2-group of rank n.

The additive group (NNN, +), isomorphic to (ℤ_{2})^{6}, is a 2^{6}-Klein Group.

From this it follows that the additive subgroups (CNN, +) and (NNC, +), associated with the rear wall CNN and the floor NNC, respectively, are generalized 2^{4}-Klein Groups. Analogously, the subgroups (NNY, +) and (YNN, +) of triplets that end or begin with a pyrimidine nucleotide, respectively, are generalized 2^{5}-Klein Groups.
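Whether a given set of triplets is a generalized 2^{n}-Klein Group can be tested directly. The sketch below assumes the encoding C = 00, U = 01, A = 10, G = 11 and uses hypothetical helpers `select` and `klein_rank`, which are not part of the original work.

```python
from itertools import product

BASES = "CUAG"  # assumed encoding: C = 00, U = 01, A = 10, G = 11
CLASSES = {"N": "CUAG", "Y": "CU", "R": "AG"}

def select(pattern):
    """All triplets matching a pattern such as 'CNN', 'NNY' or 'NNR'."""
    pools = [CLASSES.get(ch, ch) for ch in pattern]
    return {"".join(t) for t in product(*pools)}

def add(t1, t2):
    """Componentwise bitwise modulo 2 addition of two triplets."""
    return "".join(BASES[BASES.index(a) ^ BASES.index(b)]
                   for a, b in zip(t1, t2))

def klein_rank(subset):
    """Return n if the subset is a 2^n-Klein group under +, else None."""
    closed = all(add(a, b) in subset for a in subset for b in subset)
    exponent_two = all(add(a, a) == "CCC" for a in subset)
    if "CCC" in subset and closed and exponent_two:
        # A closed subset has 2^n elements; recover n from its size.
        return len(subset).bit_length() - 1
    return None
```

Applied to the patterns CNN, NNC, NNY and YNN, the function returns the ranks 4, 4, 5 and 5, whereas the coset NNR is rejected because it does not contain the neutral triplet CCC.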

We have observed [

Note from _{CCU}, associated with the unitary vector CCU. (3) The subset NNY is the union NNC ∪ NNU of the first and second floors. It consists of all the triplets that end in a pyrimidine. It is an index two subgroup of the additive group (NNN, +). (4) The plane NNA, red, the third floor, is the two-dimensional affine subspace, determined by the vertical translation T_{CCA}, associated with the vector CCA of length two. (5) The plane NNG, green, the roof, is the two-dimensional affine subspace, determined by the vertical translation T_{CCG}, associated with the vector CCG of length three. (6) The subset NNR is the union NNA ∪ NNG of the third floor and the roof. It consists of all the triplets that end in a purine. It is a coset of the subgroup NNY.

Today we know that an amino acid is covalently bound at the 3’ end of a tRNA molecule and that a specific nucleotide triplet elsewhere in the tRNA interacts with a particular triplet codon in mRNA through hydrogen bonding of complementary bases. A striking feature of the genetic code is that an amino acid may be specified by more than one codon, so the code is described as degenerate. This does not suggest that the code is flawed: although an amino acid may have two or more codons, each codon specifies only one amino acid. The degeneracy of the code is not uniform. Wobble allows some tRNAs to recognize more than one codon. Transfer RNAs base-pair with mRNA codons at a three-base sequence on the tRNA called the anticodon. The first base of the codon in mRNA (read in the 5’→3’ direction) pairs with the third base of the anticodon.

Crick proposed a set of four relationships called the

The first two bases of an mRNA codon always form strong Watson-Crick base pairs with the corresponding bases of the tRNA anticodon and confer most of the coding specificity. The third position in each codon is much less specific than the first and second and is said to wobble.

The first base of the anticodon (reading in the 5’→3’ direction; this pairs with the third base of the codon) determines the number of codons recognized by the tRNA. When the first base of the anticodon is C, base pairing is specific and only one codon is recognized by that tRNA. When the first base is U or G, binding is less specific and two different codons may be read. Adenine is very rarely used in this position and pairs mainly with U, but also with C and G. When inosine (I) is the first (wobble) nucleotide of an anticodon, three different codons (U, C, A) can be recognized.

When an amino acid is specified by several different codons, the codons that differ in either of the first two bases require different tRNAs.

A minimum of 32 tRNAs is required to translate all 61 codons (31 to encode the amino acids and one for initiation). Nowadays it is known that the smallest possible number of tRNA species with different anticodons able to read the genetic code of 20 amino acids is 26 [
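The wobble relationships listed above can be condensed into a small lookup. This is a simplified sketch: the table `WOBBLE` encodes only the pairings stated in the text, and the rare 5’ adenine is reduced to its main partner U as a simplifying assumption; the function name `codons_read` is illustrative.

```python
# Watson-Crick complements for the two strictly paired positions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third codon bases read by the first (5') base of the anticodon.
# Assumption: A is reduced to its main partner U; I stands for inosine.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}

def codons_read(anticodon):
    """Codons (5'->3') recognized by an anticodon written 5'->3'."""
    first, second, third = anticodon
    # The third anticodon base pairs with the first codon base, and so on.
    stem = COMPLEMENT[third] + COMPLEMENT[second]
    return [stem + wobble_base for wobble_base in WOBBLE[first]]
```

For example, the anticodon IGC reads the three alanine codons GCU, GCC and GCA, while CAU reads only AUG.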

Further relevant biological properties of the tRNA code are:

tRNAs are grouped into families of isoacceptors, with each family recognized by a single cognate aminoacyl-tRNA synthetase.

All tRNAs conform to a secondary structure described as a “cloverleaf”, and fold in three-dimensional space into an “

The 3’ end of every tRNA has the sequence CCA, with the amino acid attached by the tRNA synthetase to the terminal adenosine residue. In eukaryotic cells, the 3’ terminal CCA is not encoded but is enzymatically added post-transcriptionally.

During protein synthesis, tRNAs interact with the ribosomal “A” (aminoacyl), “P” (peptidyl) and “E” (exit) sites.

All organisms exhibit preferred “codon bias”, in which certain synonymous codons are preferred over others, generally corresponding to cognate tRNA abundance.

The human genome has 497 identified tRNA genes and 324 putative tRNA pseudogenes. There are no tRNAs that decode stop codons [

An individual aminoacyl-tRNA synthetase must be specific not only for a single amino acid but for certain tRNAs as well.

The coding rules appear to be more complex than those in the “first” code [

Next, we will see that in the case of human tRNA set the assignments codon-anticodon [

We call reverse of the triplet X_{1}X_{2}X_{3} the triplet X_{3}X_{2}X_{1}, which is obtained from it by interchanging the first nucleotide with the third one.

The reverse complementary of the triplet X_{1}X_{2}X_{3} is defined as the reverse of the triplet of its complementary nucleotides, that is, the triplet $\bar{X}_{3}\bar{X}_{2}\bar{X}_{1}$, where $\bar{X}_{i}$ denotes the complementary nucleotide of X_{i}.

The permutation matrix $P_{13}=\begin{pmatrix}0&0&1\\0&1&0\\1&0&0\end{pmatrix}$ converts the column vector $V=(X, Y, Z)^{T}$, under multiplication, into $P_{13}V=(Z, Y, X)^{T}$, its reverse. The translation T_{(3,3,3)} associated with the vector GGG converts each triplet (X, Y, Z) into its complementary $(\bar{X}, \bar{Y}, \bar{Z})$. Hence the composition T_{(3,3,3)}◦P_{13}, which is an isometric affine transformation, converts each triplet or codon into its reverse complementary or anticodon. Here we identify the matrix P_{13} with the linear automorphism it defines.
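The agreement between the biological reverse complement and the affine map T_{(3,3,3)}◦P_{13} can be verified on all 64 triplets. The sketch assumes the coordinates C = 0, U = 1, A = 2, G = 3 and addition in GF(4) as bitwise XOR; the function names are illustrative.

```python
from itertools import product

BASES = "CUAG"  # assumed coordinates: C = 0, U = 1, A = 2, G = 3
COMPLEMENT = {"C": "G", "G": "C", "U": "A", "A": "U"}

# Permutation matrix interchanging the first and third coordinates.
P13 = [[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]]

def reverse_complement(triplet):
    """Reverse the triplet and complement each nucleotide."""
    return "".join(COMPLEMENT[b] for b in reversed(triplet))

def affine_anticodon(triplet):
    """Apply P13 and then the translation by GGG = (3, 3, 3)."""
    v = [BASES.index(b) for b in triplet]
    pv = [sum(P13[i][j] * v[j] for j in range(3)) for i in range(3)]
    # Adding G (= 3) in GF(4) is a bitwise XOR with 3.
    return "".join(BASES[x ^ 3] for x in pv)

ALL_AGREE = all(reverse_complement("".join(t)) == affine_anticodon("".join(t))
                for t in product(BASES, repeat=3))
```

As a usage example, the codon AUG is sent to CAU by both functions.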

Herein we describe the correspondences between codons and their anticodons in the standard tRNA code (adapted from [

Here we see that there are four triplets: AGG, AAG, AUG, and ACG that are not utilized as anticodons (due to gene deletion), all beginning with A.

Observe that there are four triplets: AGA, AAA, AUA, and ACA that are not utilized as anticodons of any codon, all beginning with A.

Note the absence of the four triplets: AGU, AAU, AUU, and ACU that are not utilized as anticodons of any codon, all beginning with A.

We remark that there are four triplets: AGC, AAC, AUC and ACC that are not utilized as anticodons of any codon, all beginning with A.

The subgroup cosets of the Standard Genetic Hotel. (

We observe that, in the standard tRNA code that is reduced with respect to the expected full set of tRNAs, the 16 triplets that begin with A, that is, those of the first inner wall, are not utilized as anticodons of any codon (

If, in the original Genetic Hotel NNN, the first inner wall, ANN, formed by the triplets that are not anticodons, is removed, we obtain a Hotel with three walls, formed by the sets CNN, UNN, and GNN. We will call it the Hotel of the Anticodons of the Standard tRNA code. The wall GNN is the set of shared anticodons (see

The Hotel of Anticodons of the S-tRNA-C (See text).

We see that among the five generators A, B, T_{GCC}, T_{CGC}, T_{CCG} of the group E(NNN) of all the symmetries of the genetic hotel NNN, only B, T_{CGC}, and T_{CCG} leave the three-walls hotel invariant. They generate a group with the defining relations B^{2} = (T_{CGC})^{2} = (T_{CCG})^{2} = I_{3},…, B◦T_{CGC} = T_{CCG}◦B, B◦T_{CCG} = T_{CGC}◦B.

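This group has order eight. Since B, T_{CGC}, and T_{CCG} are three distinct elements of order two, it is isomorphic to the dihedral group of the eight symmetries of a square, rather than to the quaternion group, which has a single element of order two.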
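The group generated by B, T_{CGC}, and T_{CCG} is small enough to enumerate directly. The sketch below assumes the coordinates C = 0, U = 1, A = 2, G = 3 and reads B as the interchange of the second and third positions, as the relation B◦T_{CGC} = T_{CCG}◦B suggests. It lists the order of every element; the number of elements of order two is a convenient invariant for identifying a group of order eight.

```python
from itertools import product

# Assumed coordinates: C = 0, U = 1, A = 2, G = 3.
POINTS = list(product(range(4), repeat=3))
INDEX = {p: i for i, p in enumerate(POINTS)}

def as_perm(f):
    """Store a map of the hotel as a permutation of the 64 point indices."""
    return tuple(INDEX[f(p)] for p in POINTS)

def compose(g, h):
    """Composition g∘h, that is, apply h first and then g."""
    return tuple(g[i] for i in h)

IDENTITY = as_perm(lambda v: v)
B = as_perm(lambda v: (v[0], v[2], v[1]))          # assumed: swap positions 2, 3
T_CGC = as_perm(lambda v: (v[0], v[1] ^ 3, v[2]))  # add G in position 2
T_CCG = as_perm(lambda v: (v[0], v[1], v[2] ^ 3))  # add G in position 3

def generate(generators):
    """Closure of the generators under composition."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def order(g):
    """Smallest n >= 1 such that g composed n times is the identity."""
    n, h = 1, g
    while h != IDENTITY:
        h, n = compose(g, h), n + 1
    return n

GROUP = generate([B, T_CGC, T_CCG])
ORDERS = sorted(order(g) for g in GROUP)
# Three-walls hotel: every triplet that does not begin with A (= 2).
HOTEL = {INDEX[p] for p in POINTS if p[0] != 2}
```

Under these assumptions the enumeration yields eight elements, five of order two and two of order four, and every element maps the three-walls hotel onto itself.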

Transfer ribonucleic acid (tRNA) decodes the genetic code by delivering amino acids to the growing protein chain on the ribosome. With the availability of the complete sequence of the human genome, at least 497 tRNA genes have been identified (which include some gene duplications) [

Next, we list the assignments between codons and their anticodons in the human tRNA code. As before, in the following correspondences the arrows show, for each triplet on the left, the corresponding anticodon on the right. This shows, again, that the sets of anticodons may differ from the expected (canonical) one-to-one correspondence.

In this case we observe that there are four triplets: GGG, GAG, AUG, and GCG that are not utilized as anticodons (that is, there are no genes for them, marked zero), one of them beginning with A, and the other three beginning with G. Note that we have the same set of amino acids as those found in the S-tRNA-C but with different wobbling properties. Note also that the remaining 5’R anticodons have the transcribed As modified post-transcriptionally to Hypoxanthine (marked I, for the nucleoside Inosine). The pDiN in the triplets are underlined in order to facilitate visualization of the pairings. The notation here also applies to the following segments of the assignments.

Note that the four triplets GGA, AAA, AUA, and ACA are not utilized as anticodons of any codon, and neither are the three triplets UUA, CUA, and UCA, the anticodons of the stop codons, for which no tRNA exists. Again, this set of anticodons is not the same as those absent in the S-tRNA-C. In some circumstances, the codon UGA can be assigned to selenocysteine (SelCys).

Observe that there are four triplets: GGU, GAU, AUU, and ACU that are not utilized as anticodons of any codon, two beginning with G and two beginning with A.

Note the absence of the four triplets: GGC, GAC, AUC, and ACC that are not utilized as anticodons of any codon, two beginning with G and the other two beginning with A.

The shared front wall: Sets of triplets of type GNN (red) and of type ANN (green) that are shared anticodons in the human tRNA. The subgroup coset with tail in the front red wall GNN is the image under the isometric translation T_{GCC} of the subgroup with tail of the rear wall CNN. The subgroup coset with tail in the front green wall, ANN, is the image under the isometric translation T_{AAU} of the subgroup with tail of the rear wall CNN.

We observe that in the human tRNA set, as in the standard set, there are also 16 triplets that are not anticodons of any codon, eight of them beginning with A, and the other eight beginning with G. Hence, there are 16 triplets that are shared anticodons, eight of which begin with A and the other eight with G, each one being the anticodon of two triplets, one of them being its reverse complementary. The eight triplets beginning with G that are not anticodons in the human tRNA anticode are the images under the translation T_{UCC} of another eight triplets beginning with A that are not anticodons in the standard tRNA anticode. This latter set is the complement in the first inner wall, ANN, of the set of the eight triplets beginning with A that are not anticodons in the human tRNA. The two sets of non-anticodons are next to each other: they share the eight triplets beginning with A that are missing in both codes, and they differ in a further eight triplets beginning with A, missing only in the S-tRNA-C, and in eight triplets beginning with G, missing only in the human tRNA anticode, which are pair-wise adjacent to the former.

Physiological considerations are necessary for understanding the diversity of mechanisms in decoding when utilizing the 5’R anticodons, which is not yet complete. The rationale for the 5’A elimination might encompass combinations of mechanisms. (1) The 5’R anticodons are of one kind only in each box, genes for the other kind are deleted (the zeroes). In the human genome, there are three cases with the exceptional presence of one tRNA gene each for the Ile G

Codons or triplets that share a common anticodon belong to the two first rows of each block, that is, they are triplets that end in a pyrimidine, C or U. Hence, they are situated on the first and the second floors of our Hotel of triplets, in a vertical segment. Additionally, we observe that triplets that share a common anticodon both specify the same amino acid. The set of codons that share their anticodons with neighbors is the subgroup NNY, the union of the first and the second floor NNC and NNU (see

The Hotel of Anticodons in the H-tRNA-C. This is a hotel with only three front walls. The front wall is composed of GNN triplets. The 16 triplets in red and in green form the wall of shared anticodons, joined in a single set. The eight in red are those that begin with G and the eight in green are those that begin with A. The most external wall, anticodons of the type ANN, does not exist: it is the set of eight triplets beginning with A and eight that begin with G that are not anticodons of any codon.

We see that the anticodons of codons that belong to the additive subgroup NNY, the union of the first and the second floors, all belong to the set RNN, the union of the front and the first inner wall GNN and ANN, which is a coset of the additive subgroup YNN, the image of NNY under the reflection P_{13}. As the set RNY is the intersection of NNY with RNN, it is invariant under the action of the affine function T_{(3,3,3)}◦P_{13} of reverse complementarity, and also under the non-bijective function that assigns the anticodons to every corresponding codon in the human tRNA anticode. The 16 triplets that are shared anticodons all belong to the set RNN, eight of them to the front wall GNN and the other eight to the first inner wall ANN. Namely, they are: GCC, GUC, GCU, GUU, GCA, GUA, GAA, GUG in GNN, and AAC, AGC, AAU, AGU, AGA, ACG, AAG, AGG in ANN. Notice that there are no adjacent elements between both sets.

In order to develop the algebraic model of the H-tRNA-C we now need to consider the subset SGwT = {CCC, CUC, CCU, CUU, CCA, CUA, CAA, CUG} of the rear wall CNN. The first four elements CCC, CUC, CCU, CUU, the rear square face of the cubic condominium YYY, form the additive subgroup CYY of NNN, which is isomorphic to the Klein Four Group. Three out of the other four elements of the set SGwT, namely CCA, CUA, and CUG, belong to the coset CYR, obtained from CYY by the translation T_{CCA}. However, the remaining element, CAA, does not belong to the coset CYR, but to the coset CRR, another elementary square. For this reason, we will call the set SGwT a

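Theorem: The sets {GCC, GUC, GCU, GUU, GCA, GUA, GAA, GUG} and {AAC, AGC, AAU, AGU, AGA, ACG, AAG, AGG} of shared anticodons are the images of the set SGwT under the translations T_{GCC} and T_{AAU}, respectively.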

An important observation: The translation T_{GCC} is isometric. For this reason, the image in red {GCA, GUA, GAA, GUG} of the tail in blue, {CCA, CUA, CAA, CUG}, also has the shape of a cross. Since the translation T_{AAU} is not isometric, the image in green, {AAG, AGG, ACG, AGA} of the tail, no longer has the shape of a cross (see

A consequent terminology: As a natural consequence of the Theorem we will call here and further every subset {GCC, GUC, GCU, GUU, GCA, GUA, GAA, GUG}, and {AAC, AGC, AAU, AGU, AGA, ACG, AAG, AGG} of shared anticodons a
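The two images of the set SGwT can be computed explicitly. The following sketch assumes the encoding C = 00, U = 01, A = 10, G = 11, models the translations as componentwise XOR, and compares the multisets of pairwise Manhattan distances of the tail before and after each translation; the helper `distance_profile` is a hypothetical name.

```python
BASES = "CUAG"  # assumed encoding: C = 00, U = 01, A = 10, G = 11
SGWT = ["CCC", "CUC", "CCU", "CUU", "CCA", "CUA", "CAA", "CUG"]
TAIL = SGWT[4:]  # the four elements outside the subgroup CYY

def translate(triplet, vector):
    """Translation T_vector: componentwise addition in GF(4) (XOR)."""
    return "".join(BASES[BASES.index(a) ^ BASES.index(b)]
                   for a, b in zip(triplet, vector))

def manhattan(t1, t2):
    """Manhattan distance between two triplets."""
    return sum(abs(BASES.index(a) - BASES.index(b)) for a, b in zip(t1, t2))

def distance_profile(triplets):
    """Sorted list of all pairwise distances within a set of triplets."""
    return sorted(manhattan(a, b)
                  for i, a in enumerate(triplets)
                  for b in triplets[i + 1:])

RED = [translate(t, "GCC") for t in SGWT]    # image in the wall GNN
GREEN = [translate(t, "AAU") for t in SGWT]  # image in the wall ANN
```

The tail and its image under T_{GCC} share the same distance profile, with CUA (respectively GUA) at distance one from the other three elements, whereas the image under T_{AAU} contains pairs at distances three and four.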

In the next sections we derive the corresponding Phenotypic Networks of amino acids for the SGC, S-tRNA-C, and the H-tRNA-C.

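A relation ℜ on a non-empty set is an equivalence relation if it satisfies the following three properties.

Reflexivity: Every element of the set is related to itself.

Symmetry: If an element of the set is related to a second element, then the second is related to the first.

Transitivity: If an element of the set is related to a second element, and the second is related to a third, then the first is related to the third.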

The equivalence relation ℜ defines

The set containing all the equivalence classes is the quotient set and is denoted by _{i}

_{i}

_{i}

_{i}_{j}_{i}_{j}

Given the set N = {C, U, A, G} of the nitrogenous bases, the set of all the triplets is the set NNN = {(X_{1}X_{2}X_{3})|X_{1}, X_{2}, X_{3} ∈ N}, and the set _{1}X_{2}X_{3}) (Y_{1}Y_{2}Y_{3}) ∈ NNN, (X_{1}X_{2}X_{3}) ℜ (Y_{1}Y_{2}Y_{3}) ⇔ (X_{1}X_{2}X_{3}) and (Y_{1}Y_{2}Y_{3}) encode the same element in
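The partition of NNN induced by the relation "encode the same element" can be illustrated with the standard genetic code. In the sketch below the 64-letter string is the standard code in the conventional U, C, A, G table order, with `*` marking the stop signal; the names `CODE` and `equivalence_classes` are illustrative.

```python
from collections import defaultdict
from itertools import product

# Conventional table order, used only to unpack the string below.
TABLE_ORDER = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {"".join(c): aa for c, aa in zip(product(TABLE_ORDER, repeat=3), AMINO)}

def equivalence_classes(code):
    """Group the codons that encode the same element."""
    classes = defaultdict(set)
    for codon, element in code.items():
        classes[element].add(codon)
    return dict(classes)

CLASSES = equivalence_classes(CODE)
```

The 64 triplets fall into 21 disjoint classes, one per amino acid plus one for the stop signal; these classes are the elements of the quotient set.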

Example: The cube NNN can be seen as a graph

An undirected graph

From a mathematical point of view, a graph can be defined by means of the adjacency matrix X = {x_{ij}}, whose entry x_{ij} is 1 if there is an edge between the nodes i and j, and 0 otherwise.

For undirected graphs the adjacency matrix is symmetric, x_{ij} = x_{ji}.
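As an illustration, the adjacency matrix of the cube NNN seen as a graph can be built from the Manhattan distance. The sketch assumes the ordering {C, U, A, G} with coordinates 0 to 3 and joins two triplets whenever their distance is one.

```python
from itertools import product

BASES = "CUAG"  # assumed coordinates: C = 0, U = 1, A = 2, G = 3
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

def manhattan(t1, t2):
    """Manhattan distance between two triplets."""
    return sum(abs(BASES.index(a) - BASES.index(b)) for a, b in zip(t1, t2))

# Entry [i][j] is 1 when triplets i and j are adjacent, 0 otherwise.
X = [[1 if manhattan(a, b) == 1 else 0 for b in TRIPLETS] for a in TRIPLETS]

# Each undirected edge is counted twice in the matrix.
N_EDGES = sum(map(sum, X)) // 2
```

The matrix is symmetric with a zero diagonal, and the 64 vertices are joined by 144 edges, 48 along each of the three axes.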

The abundance of given types of subgraphs or cliques and their properties will be examined in

In order to characterize the different graphs of the amino acids for the different codes we use the following statistical properties.

_{i}

For an undirected graph the degree measures how well an element is connected to other elements of the graph. For an undirected graph with a symmetric adjacency matrix, k_{in,i} = k_{out,i}, where the in-degree k_{in,i} of the node i is the number of edges arriving at i and the out-degree k_{out,i} is the number of edges departing from i.

This measure gives a large centrality to nodes that have the shortest path distances to the other nodes. It can be interpreted as how long it would take to spread information from a given node to all the other nodes sequentially.

_{hj}_{hj}

While the previous measures consider nodes that are topologically better connected to the rest of the network, they overlook vertices that may be crucial for connecting different regions of the network by acting as bridges. The betweenness measures how often a node lies on the shortest paths between all the other pairs of nodes.
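The degree, closeness and betweenness described above can be sketched without external libraries. The example graph is a triangular prism, and the normalizations chosen here, (n − 1) divided by the sum of distances for the closeness and an unnormalized count for the betweenness, are assumptions that may differ from the conventions used for the reported results. The sketch assumes a connected graph.

```python
from collections import deque

# Triangular prism: triangles (0, 1, 2) and (3, 4, 5) joined by three rungs.
PRISM = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
         3: [4, 5, 0], 4: [3, 5, 1], 5: [3, 4, 2]}

def bfs(graph, source):
    """Distances and numbers of shortest paths from source to every node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w], paths[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]
    return dist, paths

def degree(graph):
    """Number of edges incident to each node."""
    return {v: len(nb) for v, nb in graph.items()}

def closeness(graph):
    """(n - 1) divided by the sum of distances to all other nodes."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs(graph, v)[0].values()) for v in graph}

def betweenness(graph):
    """Fraction of shortest paths between other pairs passing through v."""
    info = {v: bfs(graph, v) for v in graph}
    nodes = list(graph)
    score = {v: 0.0 for v in graph}
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            for v in nodes:
                if v in (s, t):
                    continue
                # v lies on a shortest s-t path when the distances add up.
                if info[s][0][v] + info[v][0][t] == info[s][0][t]:
                    score[v] += info[s][1][v] * info[v][1][t] / info[s][1][t]
    return score
```

Because the prism is vertex-transitive, every vertex receives the same value of each measure: degree 3, closeness 5/7 and betweenness 1.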

As already pointed out, for any ordering {X_{1}, X_{2}, X_{3}, X_{4}} of the nucleotides, the reverse ordering {X_{4}, X_{3}, X_{2}, X_{1}} leads to the same cube, differing only by a reflection between the second and third floors and a rotation of 180° about the Z-axis. Therefore, the 24 algebraic representations of the SGC in 3D or 6D can be reduced to only 12 possible graphs for representing the codons of the SGC.
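The reduction from 24 orderings to 12 can be reproduced by pairing each ordering of the four nucleotides with its reverse. This is a minimal sketch; the choice of the alphabetically smaller string as the representative of each pair is arbitrary.

```python
from itertools import permutations

# All 4! = 24 orderings of the four RNA nucleotides.
ORDERINGS = ["".join(p) for p in permutations("CUAG")]

# An ordering and its reverse give the same cube, so keep one per pair.
REPRESENTATIVES = sorted({min(o, o[::-1]) for o in ORDERINGS})
```

No ordering of four distinct letters equals its own reverse, so the 24 orderings split into exactly 12 pairs.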

The fact that the equivalence relation ℜ is a relation over the set _{i}’

Graph _{i}’

Each graph _{i}’

In order to compare the statistical properties of the 12 graphs for each code, we calculated the centrality measures for each of them and calculated their averages to perform a one-way ANOVA test to determine if they were statistically similar (

The mean of centrality measures. In (_{i}’_{i}_{i}’

The centrality estimates of each of the 12 graphs for each code can be found in SI-1, SI-2, and SI-3.

We tested the null hypothesis that the patterns of all curves for each centrality measure were statistically indistinguishable. By means of a simple one-way ANOVA, we were unable to reject the hypothesis that the betweenness (

We note that, in the case of the SGC, all its graphs _{i}’

When considering only the standard tRNA code as a subset of the genetic code and constructing its graph, this will be a subgraph of _{i}_{i}_{i}

A common factor of these graphs _{i}_{3} × ℤ_{2} which is the product of two cyclic groups, and as a graph it is a six-vertex three-regular graph.

These prisms come in four different types (

Diagrams of the four types of prisms that appear on the graphs _{i}

The distribution of the prisms across all the orderings is given in

When considering the polar requirement scale values for each amino acid [

Distribution of the four types of prisms across the orderings of the nucleotides N.

AUGC | AUCG | ACUG | AGUC | AGCU | ACGU | UAGC | UACG | UGAC | GUAC | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|

An interesting fact is that when the orderings of the graphs _{i}

Diagrams of the four types of prisms that appear on the graphs _{i}

Graphs _{i}

For the case of the human tRNA code the same procedure is carried out such that there are 12 generated graphs, _{i}’_{i}

Diagram of type 5 prism present in only two arrangements of the human tRNA code.

As an example, consider

When this procedure is applied to the SGC, the graphs _{i}’_{i}_{i}’_{i}_{i}’

Diagrams of the human tRNA graph _{i}’_{i}

Novel 3D algebraic models of the S-tRNA-C and H-tRNA-C have been rigorously derived, and we compared them with the Genetic Hotel of the SGC. The symmetry groups of the three-walls Hotels of the SGC, the S-tRNA-C, and the H-tRNA-C have been determined. The most symmetric of these genetic codes is the SGC followed by the S-tRNA-C, and the least symmetric is the H-tRNA-C. In fact, we had to create a new algebraic concept, the subgroup with a tail, in order to describe the latter. The SGC can be broken down into a product of simpler groups reflecting the pattern of degeneracy observed [

We also showed that there could be only 12 different graphs for representing the corresponding amino acids for each code. These 12 ways of representing the networks of amino acids depend on the ordering of the four RNA nucleotides. The averages of the centrality measures of the 12 graphs for each of the three codes were calculated. A simple one-way ANOVA test showed that the eigenvector and betweenness were statistically indistinguishable in the three types of graphs while the degree and closeness differed from each other. Therefore, while the topology of the three graphs is statistically the same, the differences in degree and closeness essentially capture the idiosyncrasies of wobbling of the S-tRNA-C and the H-tRNA-C in regard to the SGC.

The asymmetry of the H-tRNA-C is also reflected in the fact that only two of its 12 possible phenotypic graphs display a common prism of amino acids. In contrast, 10 of the 12 possible graphs of the S-tRNA-C share a common type of prism. These subgraphs are relevant in the characterization of the networks of amino acids for the different genetic codes. For example, when the triangular prisms were colored according to the physicochemical property of polar requirement of amino acids [_{i}_{i}’

The RNY codon model that was utilized for the algebraic procedure of the Hotels has biological backing in the observations of Eigen’s group on tRNA sequences [

There are two ways of deriving the SGC from the primeval RNY code. First, we considered not a strict comma-less code as proposed by Crick _{o}

We remark that there is an intriguing connection between the circular codes and our approach of frame-shift reading mistranslations for obtaining the SGC from RNY codons. Note in fact that there are 12 out of the 16 RNY codons in _{o}

Genetic Hotel of the SGC and its evolution. See text.
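The frame-shift reading of RNY codons discussed above can be made concrete with a small enumeration. The sketch below only illustrates the combinatorics: a string of RNY codons read one or two positions out of frame yields codons of the form NYR or YRN. The helper names (`codons`, `read_frame`) and the example sequence are hypothetical, and the sketch does not attempt to reproduce how the remaining codons of the SGC are obtained in the model.

```python
from itertools import product

PURINES = "AG"       # R
PYRIMIDINES = "CU"   # Y
ANY = "ACGU"         # N

def codons(first: str, second: str, third: str) -> set:
    """All triplets with the given allowed bases at each position."""
    return {"".join(c) for c in product(first, second, third)}

def read_frame(sequence: str, offset: int) -> list:
    """Split a sequence into complete triplets starting at `offset`."""
    return [sequence[i:i + 3] for i in range(offset, len(sequence) - 2, 3)]

rny = codons(PURINES, ANY, PYRIMIDINES)   # in-frame reading
nyr = codons(ANY, PYRIMIDINES, PURINES)   # shifted by one position
yrn = codons(PYRIMIDINES, PURINES, ANY)   # shifted by two positions

all_codons = codons(ANY, ANY, ANY)
remaining = all_codons - (rny | nyr | yrn)

print(len(rny), len(nyr), len(yrn))   # 16 16 16
print(len(rny | nyr | yrn))           # 48
print(len(remaining))                 # 16

# Hypothetical example: two RNY codons, GAU and ACC, read in three frames.
example = "GAUACC"
print(read_frame(example, 0))  # ['GAU', 'ACC']
print(read_frame(example, 1))  # ['AUA']
print(read_frame(example, 2))  # ['UAC']
```

The three reading frames give disjoint sets of 16 codons each, 48 in total. The 16 triplets not reached by frame shifting are exactly those made only of purines (RRR) or only of pyrimidines (YYY).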

The SRM is only now starting to be examined with mathematical procedures. Danckwerts and Neubert [

There are many algebraic models for the formation of the code. A pioneering application of group theory to extracting the symmetries and evolution of the SGC was developed by Hornos and Hornos in 1993 [

Since the evolution of the genetic code is apparently a problem of symmetry breaking, we contend that the H-tRNA-C is in an ongoing evolution whereas the S-tRNA-C is more in a frozen state. The latter preserves symmetry, for example, in the formation of the four isometric subgroup cosets CYY, UYY, AYY, and GYY (

The S-tRNA-C is present in organisms that are not intensely regulatory, such as Archaea. Humans are an example of organisms that are more intensely regulatory, such as eukaryotes or vertebrates.

The paramount importance of tRNA for the evolution of the genetic code cannot be overstated. It has recently been shown that the Peptidyl Transferase Center (PTC) of the ribosome could have evolved from proto-tRNAs [

The symmetry groups of 3D algebraic models of the SGC, the S-tRNA-C, and the H-tRNA-C have been determined. The evolution of the codes can be reflected by successive symmetry breakings. Both the SGC and the S-tRNA-C are more symmetrical than the H-tRNA-C, which means that the former are in a frozen state whereas the latter is still evolving. We demonstrated that there are only 12 ways of representing the phenotypic networks of amino acids for each code. The physicochemical property of polar requirement overlaps with the general topology of the networks of amino acids. We show that the degree and closeness of these networks capture the idiosyncrasies of wobbling of the S-tRNA-C and the H-tRNA-C. The evolution of the whole Genetic Hotel of the SGC can be neatly reconstructed by means of simple translations of the RNY primeval genetic code.

M.V.J. was financially supported by PAPIIT-IN107112, UNAM, México.

Marco V. José performed the analyses and calculations, analyzed the data, contributed programs/analysis tools, coordinated the research, and wrote the whole manuscript. Eberto R. Morgado contributed to ideas, performed the analyses and calculations, contributed the algebraic analysis, and wrote drafts of the manuscript. Romeu Cardoso Guimarães formulated the original problem, analyzed the data, put the results into biological context, and critically edited the manuscript. Gabriel S. Zamudio contributed to the analysis and calculations for the graphs of amino acids, discussed and analyzed the results, wrote drafts of portions of the manuscript, and prepared the Supplementary Material. Sávio Torres de Farías, Juan R. Bobadilla, and Daniela Sosa contributed to ideas, interpretation, calculations, analysis, and the collection and preparation of data. All authors have read and approved the final published manuscript.

The authors declare no conflict of interest.