# The Graph, Geometry and Symmetries of the Genetic Code with Hamming Metric

## Abstract


## 1. Introduction

^{6}) 6-bit binary words to a 6-cube with 192 (=64 × 6/2) edges connecting vertices that represent words at Hamming distance 1 (each vertex is incident on six edges; each code word is at 1-HD from six other words). Hamming distances and their geometric representation are basic tools of mathematical code analysis, relevant, among other things, to a code's error detection and correction capacities [9]. Hamming's cube model has inspired similar models for the genetic code, but, importantly, the genetic code is quaternary: it uses four symbols, {A, C, G, U}, rather than the two binary symbols {0,1}. To the best of this author's knowledge, the implications of this for the geometric model of the code have not been recognized before.
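The edge count quoted above for the binary 6-cube can be reproduced with a few lines of Python. This is only an illustrative sketch; the names `hamming`, `words`, `edges` and `degrees` are chosen here and do not come from the article.

```python
from itertools import product

def hamming(p, q):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(p, q))

# All 6-bit binary words are the vertices of the Hamming 6-cube.
words = ["".join(bits) for bits in product("01", repeat=6)]

# An edge joins two words at Hamming distance 1.
edges = [(p, q) for i, p in enumerate(words) for q in words[i + 1:]
         if hamming(p, q) == 1]

# Degree of every vertex (number of words at Hamming distance 1).
degrees = {w: sum(hamming(w, v) == 1 for v in words) for w in words}

print(len(words), len(edges), set(degrees.values()))
```

The same `hamming` function works unchanged for quaternary words, which is the case the rest of the article develops.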

**Figure 1.** The standard codon table. The table orders the 64 codons into 16 blocks of four codons varying at the third position only, the family boxes. The nucleotides are ordered as (U, C, A, G). The rows follow this order by the first codon position, the columns by the second, and the codons within each block by the third. The 64 slots each contain a codon and the message encoded by this codon, an amino acid or stop signal. The stop codon slots are white; the amino acid slots are colored with the color code for the Polar Requirement of the amino acid shown in Figure 2.

**Figure 2.** The Polar Requirements of 20 amino acids. Polar Requirement (PR) values for the 20 amino acids encoded by the canonical genetic code are listed by increasing value and color coded by a gradation of rainbow colors. Hydrophobic amino acids have PRs less than the PR of Ser and are colored with the purple to blue values, while hydrophilic amino acids have PRs greater than the PR of Ser and are colored with green to yellow and orange values.

**Figure 3.** The Hamming 3-cube. The Hamming 3-cube is the geometric model for the 8-word, length-3 binary code; the Euclidian vertex coordinates correspond with the code words as indicated in the figure.

## 2. Preliminaries

#### 2.1. The Code Function

^{3}) three-letter code words that can be made up from the four-letter alphabet {A, C, G, U}. The genetic code is a length-3 block code (all code words have the same length 3) and a quaternary code, built with four symbols, as opposed to the more common binary computer codes constructed with the two symbols {0,1}. Each code word encodes a single message, an amino acid or stop signal, as reflected in the standard codon table (see Figure 1). The code function C: 64 codons → 21 messages is an onto mapping, or surjection, that reaches all 21 target messages at least once, but it is not a bijection because several codons map to the same message, i.e., the function C is not invertible. The encoding of the same message by different (synonymous) codons is known to biologists as the degeneracy of the code. Degeneracy is not identical with the coding theory notion of redundancy, which relates to code words that are longer than minimally required, say length-3 instead of length-2 blocks. The code is not redundant in this sense, as a 4-letter length-2 block code can encode at most 16 (=$4^{2}$) messages, not 21. We use [n] as notation for the finite n-set {1, 2, …, n}, so that the codons can be indexed (numbered) by [64] = {1, …, 64} and the messages by [21]. The genetic code is one of 1.51 × $10^{84}$ different [64] → [21] surjections, an astronomically large function space [24]. (Appendix A summarizes the notation and combinatorial counting formulas used in this article; see [25].) The fact that just one code (or at most a few, very similar codes) evolved is a strong argument in favor of a last universal common ancestor (LUCA) for all known living organisms, but how this unique code evolved pre-LUCA is an as yet unanswered question.
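The size of the function space mentioned above can be checked with the standard inclusion-exclusion formula for the number of onto maps. The sketch below is an illustration only: the function name `count_surjections` is an assumption made here, and Appendix A of the article may state the counting formula in a different but equivalent form.

```python
from math import comb

def count_surjections(n, k):
    """Number of onto maps [n] -> [k], by inclusion-exclusion.

    Term j removes (or restores) the maps that miss at least j chosen targets.
    """
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))

# Number of [64] -> [21] surjections (codons onto messages).
n_codes = count_surjections(64, 21)
print(f"{n_codes:.3e}")

# Capacity of 4-letter block codes of length 2 and 3.
print(4 ** 2, 4 ** 3)
```

Python integers are exact, so `n_codes` is the exact count; only the printed value is rounded.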

#### 2.2. The Hamming Distances between the 64 Codons of the Codon Set

#### 2.3. The CodonGraph, a Graph Representation of the Codon Set with Hamming Metric

_{1}, v_{2}, …, v_{n}} and E = {{v_{i},v_{j}}, …, {v_{x},v_{y}}}. Two vertices incident on the same edge are adjacent. The CodonGraph comprises 64 vertices, representing the codons, and 288 edges between adjacent vertices [24]. Adjacent vertices represent codons at Hamming 1-distance, and the graph is 9-regular, as each vertex is adjacent to the nine vertices representing its nine nearest-neighbor codons at 1-distance. The graph therefore contains 288 edges (288 = 64 × 9/2, two vertices per edge). Each vertex is connected via 1-edge shortest paths with its nine adjacent vertices, via 2-edge shortest paths with the 27 vertices representing codons at Hamming 2-distance, and via 3-edge shortest paths with the remaining 27 vertices representing codons at Hamming 3-distance. The graph's edge metric (the number of edges on the shortest path) thus corresponds one-to-one with the Hamming metric: the graph representation of the codon set preserves the intercodon Hamming distances. Figure 5 shows a circular embedding of the CodonGraph: all vertices are arranged on a circle, numbered counterclockwise, and labeled with the codons in lexicographical order. The subgraph of the CodonGraph induced by a single vertex (such as vertex-1, AAA, in Figure 6) and its nine adjacent vertices contains three K4-graphs, complete graphs made up of four mutually adjacent vertices and six edges. The inducing vertex (vertex-1) is a cut vertex that connects the three K4-graphs (deleting this vertex disconnects the subgraph into separate pieces). Each K4-graph contains four vertices that represent four codons differing at only one particular codon position; for example, the vertices representing the codons AAA, AAC, AAG and AAU make up a K4-graph in Figure 6. Each of the 64 graph vertices and its nine adjacent vertices form a closed neighborhood made up of three K4-graphs as shown in Figure 6 (only the vertex labels change). Vertices connected via 2- and 3-edge shortest paths are diagonal opposites of, respectively, square and cube subgraphs (graph representations of the eponymous geometric figures), as shown in Figure 7. The CodonGraph has 3-width (diameter 3): the shortest path between any two vertices contains at most three edges, and the subgraphs of Figure 5, Figure 6 and Figure 7 thus illustrate all intercodon Hamming distance relationships.
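The properties of the CodonGraph listed above (64 vertices, 288 edges, 9-regularity, the 1/9/27/27 distance profile, and agreement of the edge metric with the Hamming metric) can be verified by building the graph directly. The following is a minimal sketch; all names in it (`adjacency`, `shortest_path_lengths`, and so on) are illustrative choices, and the numbering assumes the lexicographic order A, C, G, U described for Figure 5.

```python
from collections import Counter
from itertools import product

LETTERS = "ACGU"  # lexicographic order used to number the codons
CODONS = ["".join(c) for c in product(LETTERS, repeat=3)]  # AAA = 1, ..., UUU = 64

def hamming(p, q):
    """Number of codon positions at which p and q differ."""
    return sum(a != b for a, b in zip(p, q))

# Edges join codons at Hamming distance 1.
adjacency = {c: [d for d in CODONS if hamming(c, d) == 1] for c in CODONS}
n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
degrees = {len(nbrs) for nbrs in adjacency.values()}

def shortest_path_lengths(source):
    """Breadth-first search: edges on the shortest path to every vertex."""
    dist = {source: 0}
    frontier = [source]
    while frontier:
        nxt = []
        for v in frontier:
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    nxt.append(w)
        frontier = nxt
    return dist

# Distance profile seen from AAA: how many codons at each distance.
profile = Counter(shortest_path_lengths("AAA").values())

# The edge metric equals the Hamming metric for every pair of codons.
metric_preserved = all(
    shortest_path_lengths(c) == {d: hamming(c, d) for d in CODONS}
    for c in CODONS
)

print(n_edges, degrees, dict(profile), metric_preserved)
print(CODONS.index("UAC") + 1, CODONS.index("UGC") + 1)  # numbering used in Figure 7
```

The last line recovers the vertex numbers of the two codons used as examples in the caption of Figure 7.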

**Figure 4.** The CodonDistanceMatrix. This 64 × 64 matrix shows the Hamming distances between the 64 codons numbered as in Figure 5. Zero-, 1-, 2- and 3-distances between codon-i (=row-i) and codon-j (=column-j) correspond with, respectively, black, dark gray, light gray, and white (small square) matrix entries (i,j). Distance-0 entries (black) fall on the main diagonal; distance-1 (dark gray) entries correspond with edges between vertices i and j of the CodonGraph (Figure 5). Each row/column contains one 0-distance, nine 1-distances, 27 2-distances, and 27 3-distances [24].

**Figure 5.** Circular embedding of the CodonGraph. The graph's 64 vertices are numbered and labeled counterclockwise with codons in lexicographical order, and its 288 edges connect adjacent vertices representing codons at Hamming 1-distance [24].

**Figure 6.** The closed neighborhood of vertex-1 of the CodonGraph. The subgraph of the CodonGraph induced by vertex-1 AAA and its nine adjacent vertices consists of three K4-graphs linked by cut vertex-1. The vertices are numbered and labeled as in Figure 5. Apart from the numbers and labels, the closed neighborhoods of all 64 vertices of the CodonGraph are identical [24].

**Figure 7.** A cube subgraph of the CodonGraph. The cube-graph shows that vertices representing codons at Hamming 2- and 3-distances are diagonal opposites of, respectively, square- and cube-subgraphs of the CodonGraph. For example, codons 50 = UAC and 58 = UGC are, respectively, at 2- and 3-distance from codon 1 = AAA. The vertices are labeled as in Figure 5 [24].

#### 2.4. From Graph to Geometry

#### 2.5. Permutation Symmetries of Graphs and Euclidian Symmetries of Geometric Objects

_{3}, the Symmetric group that contains all six permutations of three points, such as the three vertices of the triangle. (Isomorphic groups are essentially the same group: as abstract groups they have the same number of elements and the same composition rule.) The mirror and rotation symmetries leave the triangle invariant, in the same position in the plane, but induce permutations of the triangle vertices that correspond with those of $S_{3}$. Similarly, the symmetry group of a triangle graph $(\{v_{1}, v_{2}, v_{3}\}, \{\{v_{1},v_{2}\}, \{v_{1},v_{3}\}, \{v_{2},v_{3}\}\})$ permutes the three vertices $\{v_{1}, v_{2}, v_{3}\}$, and because all six permutations leave the edge set $\{\{v_{1},v_{2}\}, \{v_{1},v_{3}\}, \{v_{2},v_{3}\}\}$ invariant, this group is also isomorphic to $S_{3}$.

**Figure 8.** Equilateral triangle with mirrors. The vertices of the triangle are numbered (1, 2, 3) and the mirrors are labeled μ1, μ2, and μ3. The mirrors are incident on one vertex and orthogonally bisect the opposite side; they usually are seen as 2-dimensional planes perpendicular to the plane containing the triangle. Reflections in the three mirrors generate the D3 group of six symmetries.

_{4}) with a central carbon and four hydrogen atoms at the vertices of a tetrahedron. The A3-group is isomorphic with the permutation group $S_{4}$, the Symmetric group on four objects, such as {A, C, G, U}, comprising all 24 permutations of these objects (Appendix B and Appendix C). The symmetry group of the K4-graph (Figure 6), of order 24 (the number of group elements), permutes the graph's four vertices but leaves its vertex and edge sets invariant, and is isomorphic to both $S_{4}$ and A3 (isomorphism is transitive: if a ≈ b and b ≈ c then a ≈ c).

**Figure 9.** Tetrahedron inscribed in a cube with a mirror plane. The vertices of the tetrahedron are numbered (1, 2, 3, 4) and labeled (A, C, G, U). The front face is colored dark grey and the frontal edges are bold-solid-black while the edges hidden behind the front are bold-dashed-black. The edges of the cube are thin-gray-dashed. One of the six mirror planes of the tetrahedron is colored light-gray and outlined in thin-black. All six mirrors are planes intersecting the cube diagonally; reflections in these mirrors generate the Coxeter A3 group of 24 symmetries.

## 3. The CodonArray Embedding of the CodonGraph

^{3} cube-graphs, as there are C(4,2) = 6 ways to pick two from four values for each of the three indices; see Appendix A).

**Figure 10.** The CodonArray embedding of the CodonGraph. The CodonArray is a 3-dimensional 4 × 4 × 4 array embedding of the 64 vertices of the CodonGraph; it is a graph, not a Euclidian cube. Only some vertices are labeled and only three of the six edges per row/column are shown so as not to clutter the image.

**Table 1.** The 3376 subgraphs of the CodonGraph and faces of the CodonPolytope. The subgraphs correspond with subarrays of the 4 × 4 × 4 CodonArray, and each subgraph corresponds with a face of the CodonPolytope. The number of vertices and edges of each subarray is listed; the dimensions are the dimensions of the corresponding polytope faces, and the number of faces corresponds with the number of subarrays. Non-congruent faces of the same dimension are distinguished by proper names (Triangle versus Square) or capital letters (4A-Face versus 4B-Face). Subarray A × B × C stands for all permutations of A, B and C, as their order does not matter: e.g., 4 × 1 × 1 = 1 × 4 × 1 = 1 × 1 × 4.

Sub-Arrays of the CodonArray and the corresponding Faces of the CodonPolytope

| Dimension | Sub-Array | Face Name | Vertices | Edges | Faces | Faces per Dimension |
|---|---|---|---|---|---|---|
| 9 | 4 × 4 × 4 | Polytope | 64 | 288 | 1 | 1 |
| 8 | 4 × 4 × 3 | 8-Facet | 48 | 192 | 12 | 12 |
| 7 | 4 × 4 × 2 | 7A-Ridge | 32 | 112 | 18 | |
| | 4 × 3 × 3 | 7B-Ridge | 36 | 126 | 48 | 66 |
| 6 | 4 × 4 × 1 | 6A-Face | 16 | 48 | 12 | |
| | 4 × 3 × 2 | 6B-Face | 24 | 72 | 144 | |
| | 3 × 3 × 3 | 6C-Face | 27 | 81 | 64 | 220 |
| 5 | 4 × 3 × 1 | 5A-Face | 12 | 30 | 96 | |
| | 4 × 2 × 2 | 5B-Face | 16 | 40 | 108 | |
| | 3 × 3 × 2 | 5C-Face | 18 | 45 | 288 | 492 |
| 4 | 4 × 2 × 1 | 4A-Face | 8 | 16 | 144 | |
| | 3 × 3 × 1 | 4B-Face | 9 | 18 | 192 | |
| | 3 × 2 × 2 | 4C-Face | 12 | 24 | 432 | 768 |
| 3 | 4 × 1 × 1 | Tetrahedron | 4 | 6 | 48 | |
| | 3 × 2 × 1 | Prism | 6 | 9 | 576 | |
| | 2 × 2 × 2 | Cube | 8 | 12 | 216 | 840 |
| 2 | 3 × 1 × 1 | Triangle | 3 | 3 | 192 | |
| | 2 × 2 × 1 | Square | 4 | 4 | 432 | 624 |
| 1 | 2 × 1 × 1 | Edge | 2 | 1 | 288 | 288 |
| 0 | 1 × 1 × 1 | Vertex | 1 | 0 | 64 | 64 |
| −1 | 0 × 0 × 0 | Empty | 0 | 0 | 1 | 1 |
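The counts in Table 1 follow from elementary combinatorics: a subarray of shape a × b × c is obtained by choosing a, b and c of the four values of each index, and every vertex of such a subarray has (a − 1) + (b − 1) + (c − 1) neighbors inside it. The sketch below recomputes the table under those two assumptions; the function names are illustrative only.

```python
from collections import defaultdict
from itertools import product
from math import comb

# Number of sub-arrays per (unordered) shape: choose which index values to keep.
faces = defaultdict(int)
for shape in product(range(1, 5), repeat=3):
    key = tuple(sorted(shape, reverse=True))  # 4x1x1 = 1x4x1 = 1x1x4
    faces[key] += comb(4, shape[0]) * comb(4, shape[1]) * comb(4, shape[2])

def dimension(shape):
    """Dimension of the face: a factor with a values is an (a-1)-simplex."""
    return sum(a - 1 for a in shape)

def vertices(shape):
    a, b, c = shape
    return a * b * c

def edges(shape):
    """Each vertex has `dimension(shape)` neighbours inside the sub-array."""
    return vertices(shape) * dimension(shape) // 2

per_dimension = defaultdict(int)
for shape, count in faces.items():
    per_dimension[dimension(shape)] += count

total_faces = sum(faces.values()) + 1  # + 1 for the empty face

for shape in sorted(faces, key=lambda s: (-dimension(s), s)):
    print(dimension(shape), shape, vertices(shape), edges(shape), faces[shape])
print(dict(per_dimension), total_faces)
```

The total can also be seen as $15^{3} + 1$, since each index contributes $2^{4} - 1 = 15$ non-empty subsets of its four values.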

## 4. The Geometric Model: The CodonPolytope

#### 4.1. The Construction and Characterization of the CodonPolytope

_{1} × $\mathrm{Th}_{2} \times \mathrm{Th}_{3}$. The Cartesian product of the vertex sets of the three tetrahedrons produces the 64 (=$4^{3}$) vertices of the polytope. When labeled as above (Figure 9), the vertices of the CodonPolytope are numbered and labeled like those of the CodonArray graph. For example, the product of vertex-3 of $\mathrm{Th}_{1}$, vertex-2 of $\mathrm{Th}_{2}$ and vertex-4 of $\mathrm{Th}_{3}$ generates polytope vertex (3,2,4) = GCU. By construction every polytope vertex is incident on three tetrahedrons and connected via nine equal-length edges with nine adjacent vertices. This geometry corresponds with the closed neighborhood of the CodonGraph vertices, which are incident on three K4-graphs (Figure 6 and Figure 10). Therefore the vertex and edge sets of the CodonPolytope and CodonGraph correspond one-to-one: the graph uniquely represents the polytope, and any graph representing the polytope is isomorphic to the CodonGraph. (Section 2.4 discusses the relation between a graph and the geometric objects it represents.)

**Table 2.** The 3376 faces of the Codon-Polar-Polytope. The 12 vertices of the polytope are located on three tetrahedrons, each residing in a different 3-dimensional subspace of a Euclidian 9-space. The notation P + Q + R stands for P-, Q-, and R-vertices on the different tetrahedrons; e.g., each of the vertices of a 1 + 1 + 1 A-Triangle is incident on a different tetrahedron, while those of a 3 + 0 + 0 C-Triangle are all incident on the same tetrahedron.

Faces of the Codon-Polar-Polytope

| Dimension | Vertex sets | Face Name | Vertices | Edges | Faces | Faces per Dimension |
|---|---|---|---|---|---|---|
| 9 | 4 + 4 + 4 | PolarPolytope | 12 | 66 | 1 | 1 |
| 8 | 3 + 3 + 3 | 8-Facet | 9 | 36 | 64 | 64 |
| 7 | 3 + 3 + 2 | 7-Ridge | 8 | 28 | 288 | 288 |
| 6 | 3 + 2 + 2 | 6A-Face | 7 | 21 | 432 | |
| | 3 + 3 + 1 | 6B-Face | 7 | 21 | 192 | 624 |
| 5 | 2 + 2 + 2 | 5A-Face | 6 | 15 | 216 | |
| | 3 + 2 + 1 | 5B-Face | 6 | 15 | 576 | |
| | 3 + 3 + 0 | 5C-Face | 6 | 15 | 48 | 840 |
| 4 | 2 + 2 + 1 | 4A-Face | 5 | 10 | 432 | |
| | 3 + 1 + 1 | 4B-Face | 5 | 10 | 192 | |
| | 3 + 2 + 0 | 4C-Face | 5 | 10 | 144 | 768 |
| 3 | 2 + 1 + 1 | A-Tetrahedron | 4 | 6 | 288 | |
| | 2 + 2 + 0 | B-Tetrahedron | 4 | 6 | 108 | |
| | 3 + 1 + 0 | C-Tetrahedron | 4 | 6 | 96 | 492 |
| 2 | 1 + 1 + 1 | A-Triangle | 3 | 3 | 64 | |
| | 2 + 1 + 0 | B-Triangle | 3 | 3 | 144 | |
| | 3 + 0 + 0 | C-Triangle | 3 | 3 | 12 | 220 |
| 1 | 1 + 1 + 0 | A-Edge | 2 | 1 | 48 | |
| | 2 + 0 + 0 | B-Edge | 2 | 1 | 18 | 66 |
| 0 | 1 + 0 + 0 | Vertex | 1 | 0 | 12 | 12 |
| −1 | 0 + 0 + 0 | Empty | 0 | 0 | 1 | 1 |

#### 4.2. A Realization of the CodonPolytope

_{1} × $\mathrm{Th}_{2} \times \mathrm{Th}_{3}$ product, 64 vertices with 9-space coordinates, which are concatenations of the relevant sets of 3-space coordinates. To illustrate, polytope vertex (3,2,4) = GCU has coordinates (1,−1,−1,−1,−1,1,−1,1,−1). The realized polytope is centered on the origin of the 9-space, and all its vertices are located on an 8-sphere surface at Euclidian distance 3 from the origin ($\sqrt{1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1} = \sqrt{9} = 3$ for all vertices). The Hamming 1-, 2- and 3-distances between the codon labels of the vertices are in one-to-one correspondence with the Euclidian distances $2\sqrt{2}$, 4 and $2\sqrt{6}$ between vertices and with one, two, or three edges on the shortest path between vertices. These mappings of the intercodon Hamming distances onto the polytope are well defined and invertible, and thus preserve the Hamming metric of the code. The polytope is inscribed in a 9-cube with 512 (=$2^{9}$) vertices whose coordinates are composed of nine ±1 entries. Only one out of every eight cube vertices coincides with a polytope vertex.
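The realization described above can be checked numerically. The sketch below assumes one particular assignment of tetrahedron coordinates: the values for G, C and U are read off the GCU example in the text, while A = (1, 1, 1) is inferred here as the remaining vertex of the tetrahedron inscribed in the ±1 cube and is not stated in this section. All names are illustrative.

```python
from itertools import combinations, product
from math import dist, isclose, sqrt

# Assumed 3-space coordinates of the tetrahedron vertices.
# G, C, U follow from the GCU example; A = (1, 1, 1) is an inference.
TETRA = {"A": (1, 1, 1), "C": (-1, -1, 1), "G": (1, -1, -1), "U": (-1, 1, -1)}
CODONS = ["".join(c) for c in product("ACGU", repeat=3)]

def coordinates(codon):
    """Concatenate the 3-space coordinates of the three letters (9-space)."""
    return tuple(x for letter in codon for x in TETRA[letter])

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

# Euclidian distance expected for each Hamming distance.
EXPECTED = {1: 2 * sqrt(2), 2: 4.0, 3: 2 * sqrt(6)}

origin = (0,) * 9
radii_ok = all(isclose(dist(coordinates(c), origin), 3.0) for c in CODONS)
distances_ok = all(
    isclose(dist(coordinates(p), coordinates(q)), EXPECTED[hamming(p, q)])
    for p, q in combinations(CODONS, 2)
)
n_polytope_vertices = len({coordinates(c) for c in CODONS})
cube_vertices_per_polytope_vertex = 2 ** 9 // n_polytope_vertices

print(coordinates("GCU"))
print(radii_ok, distances_ok, cube_vertices_per_polytope_vertex)
```

Any other labeling of the four tetrahedron vertices gives the same distances, since all six tetrahedron edges have length $2\sqrt{2}$.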

#### 4.3. The Point Symmetry Group of the CodonPolytope

_{3} and $S_{4}$ respectively.

_{1} acts on the 3-subspace spanned by dimensions 1, 2 and 3; $\mathrm{A3}_{2}$ on the space spanned by dimensions 4, 5 and 6; and $\mathrm{A3}_{3}$ on the space spanned by dimensions 7, 8 and 9. Their 18 mirror planes (3 × 6, six per tetrahedron), which are 8-dimensional hyperplanes dividing the 9-space into two 9-dimensional half spaces, generate all 13,824 (=$24^{3}$) polytope symmetries of $\mathrm{A3}_{1} \times \mathrm{A3}_{2} \times \mathrm{A3}_{3}$. Each mirror bisects 16 polytope edges and exchanges the 32 vertices incident on these edges, but does not move the other 32 vertices, as they lie within the 8-plane. For example, one mirror fixes all 32 codon labels {NNA, NNC} (N stands for any letter), but exchanges all 32 other codon labels, NNG ↔ NNU, analogous to the tetrahedron G ↔ U mirror shown in Figure 9. All 18 edges of the three tetrahedrons in Figure 6 (represented by K4 graphs) are bisected by one of the 18 mirrors; e.g., mirror NNG ↔ NNU bisects {AAG, AAU}, but fixes all other vertices of Figure 6. The six mirrors of each A3 are perpendicular to the mirrors of the other two groups, so that reflections of different A3 groups commute; i.e., the order of multiplication in $\mathrm{A3}_{1} \times \mathrm{A3}_{2} \times \mathrm{A3}_{3}$, the direct product of the three A3 symmetry groups, does not matter. The three A3 groups are identical, as are the three 3D-subspaces on which they act. Six transformations (three rotations, including the 0-degree identity rotation, and three reflections) exchange these 3D-subspaces and form a 9-dimensional reflection group isomorphic to D3, the symmetry group of the equilateral triangle. These transformations exchange the six tetrahedron mirror planes between the three 3D-subspaces, and thus exchange or permute the three A3 groups. The symmetry group of the CodonPolytope is thus formed by the product of the 9-dimensional reflection group isomorphic to D3 with the direct product of the three A3 groups. This space symmetry group is isomorphic to the permutation group formed by the wreath product of $S_{3}$ with the direct product of three $S_{4}$ groups: $S_{3} \times_{\mathrm{wreath}} (S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$. In this wreath product $S_{3}$ permutes the three $S_{4}$ groups (as described above for the three A3 groups), or equivalently the 4-sets upon which they act (analogous to the permutations of the 3D-subspaces mentioned above). The actions of the permutation group $S_{3} \times_{\mathrm{wreath}} (S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$ on the codon set are described in Section 5.1 and Appendix B; for wreath products and permutation group theory see [30,31]. The CodonPolytope symmetry group has order 82,944 (=6 × $24^{3}$).

^{9} × 9!) [32]. These transformations correspond with 9 × 9 matrices having only one non-zero ±1 entry per row and column (in particular, all 512 = $2^{9}$ vertices of a unit 9-cube with nine ±1 coordinates are permuted by these transformations). Only those matrices that map the set of coordinates of the 64 vertices of a realization of the polytope (Section 4.2) onto itself are symmetries of the polytope, and a computer scan of the 185,794,560 matrices identified 82,944 such symmetries; one advantage of a geometric model is that it permits such brute-force experimental computation. These 82,944 matrices can be partitioned into six sets of similar block matrices. Each matrix is composed of nine 3 × 3 blocks: three 3 × 3 blocks contain the 24 tetrahedron symmetries for the three 3D-subspaces (the A-entries) and six 3 × 3 blocks are zero matrices (the dot "."-entries). These six matrix sets are: $\left(\begin{array}{ccc}A& .& .\\ .& A& .\\ .& .& A\end{array}\right)$, $\left(\begin{array}{ccc}A& .& .\\ .& .& A\\ .& A& .\end{array}\right)$, $\left(\begin{array}{ccc}.& .& A\\ .& A& .\\ A& .& .\end{array}\right)$, $\left(\begin{array}{ccc}.& A& .\\ A& .& .\\ .& .& A\end{array}\right)$, $\left(\begin{array}{ccc}.& A& .\\ .& .& A\\ A& .& .\end{array}\right)$, $\left(\begin{array}{ccc}.& .& A\\ A& .& .\\ .& A& .\end{array}\right)$.

_{3} that exchange the 3 perpendicular 3-subspaces (replacing the A-blocks with 3 × 3 identity matrices corresponds with a matrix representation of $S_{3}$ in 9-space). Each set of matrices comprises 13,824 (=$24^{3}$) symmetries, and the union of the six sets comprises 82,944 symmetries. The matrix symmetry group of the CodonPolytope is thus an O-(9, Z)-subgroup with index 2240 (=$2^{9}$ × 9!/82,944): only one of every 2240 symmetries of O-(9, Z) is a symmetry of the polytope, but all 185,794,560 transformations of O-(9, Z) are symmetries of the unit 9-cube in which the polytope is inscribed.

_{6} $\times_{\mathrm{wreath}} (S_{2})^{6}$. This group has order 46,080 (=6! × $2^{6}$ = 720 × 64) and thus is smaller than the CodonPolytope symmetry group, but the 6-cube group is not a subgroup of the polytope group, as its would-be index 1.8 (=82,944/46,080) is not an integer. $S_{3} \times_{\mathrm{wreath}} (S_{2} \times S_{2})_{1} \times (S_{2} \times S_{2})_{2} \times (S_{2} \times S_{2})_{3}$ of order 384 (=6 × $4^{3}$) is the largest group that the 6-cube and polytope groups have in common. Each $S_{2}$ permutes the two bits at one of the six positions, each $S_{2} \times S_{2}$ is isomorphic to the Klein four-group, and the $S_{3}$ wreath product exchanges the 2-bit sets at positions {1,2}, {3,4}, and {5,6}; each set corresponds with a codon position. The $S_{2} \times S_{2}$ subgroup of $S_{4}$ corresponds with the three 180-degree rotations of the tetrahedron plus the 0-degree identity rotation of the A3 tetrahedron symmetry group (see Appendix C).
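The group orders and indices quoted in this section are plain arithmetic and can be recomputed exactly. The sketch below only redoes that arithmetic; it does not repeat the matrix scan, and the variable names are illustrative.

```python
from fractions import Fraction
from math import factorial

order_polytope = 6 * 24 ** 3            # wreath product of S3 with three S4
order_direct = 24 ** 3                  # direct product of three S4 (or A3) groups
order_o9z = 2 ** 9 * factorial(9)       # signed 9 x 9 permutation matrices
order_6cube = factorial(6) * 2 ** 6     # symmetry group of the binary 6-cube
order_common = 6 * 4 ** 3               # group common to 6-cube and polytope groups

index_in_o9z = Fraction(order_o9z, order_polytope)
order_ratio = Fraction(order_polytope, order_6cube)

print(order_polytope, order_direct, order_o9z, order_6cube, order_common)
print(index_in_o9z, order_ratio, float(order_ratio))

# By Lagrange's theorem a subgroup's order divides the group's order.
print(order_polytope % order_6cube == 0)      # 6-cube group cannot be a subgroup
print(order_polytope % order_common == 0,     # the common group passes this test
      order_6cube % order_common == 0)
```

Divisibility is a necessary condition only; it rules out the 6-cube group as a subgroup but does not by itself prove that the order-384 group is contained in both groups.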

#### 4.4. Symmetries of the Polytope Faces

_{4}, the triangle group to $S_{3}$, the edge group to $S_{2}$, and the vertex group to $S_{1}$, which is usually omitted. As shown above, the 4 × 4 × 4 polytope is the product of three tetrahedrons and its stabilizer group is isomorphic to $S_{3} \times_{\mathrm{wreath}} (S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$ (Section 4.3). Similarly, the stabilizer of a 2 × 2 × 2 cube face is isomorphic to $S_{3} \times_{\mathrm{wreath}} (S_{2})_{1} \times (S_{2})_{2} \times (S_{2})_{3}$ of order 48 (=6 × $2^{3}$), which is isomorphic to the cube reflection symmetry group. And the stabilizer group of the 6-face corresponding with a 4 × 3 × 2 subarray is isomorphic to the direct product $(S_{4})_{1} \times (S_{3})_{2} \times (S_{2})_{3}$ of order 288 (=24 × 6 × 2); this symmetry group lacks a wreath product permuting the three Symmetric groups because these groups are unequal; equivalently, the tetrahedron, triangle and line segment are not congruent and no Euclidian symmetry exchanges them.

## 5. The Symmetries of the CodonGraph and Codon Set

#### 5.1. The Symmetries of the Codon Set That Preserve Hamming Distances

_{4}, which permutes four objects, just four points of zero dimension in an abstract point space, in correspondence with the four vertices of the K4-graph or those of the tetrahedron (Section 4.3 and Appendix B and Appendix C). Similarly, the symmetry group of the CodonPolytope, a 9-dimensional geometric object, is isomorphic to the symmetry group of the CodonGraph, a set of 64 vertices and 288 edges, and both are isomorphic to the permutation group $S_{3} \times_{\mathrm{wreath}} (S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$, the wreath product of $S_{3}$ with the direct product of three $S_{4}$ (Section 4.3). (Notation: the index i, for i = 1, 2, 3, in $(S_{4})_{i}$ corresponds with the codon position acted on by the $S_{4}$ group, and $S_{3}$ permutes the codon positions; see also Appendix B and [30,31].) It may help to "visualize" this permutation group as the symmetry group of a (large) equilateral triangle with three congruent tetrahedrons, labeled 1, 2 and 3, centered on the vertices of this triangle: each tetrahedron is transformed independently of the two others by its own 24 symmetries in its own 3-space, in correspondence with the actions of $S_{4}$ on the tetrahedron vertex labels {A, C, G, U}, and the tetrahedrons are exchanged by the six symmetries of the triangle, in correspondence with the actions of $S_{3}$ on the three $S_{4}$ groups. With the vertices of the triangle and the tetrahedrons labeled as in Figure 8 and Figure 9, this configuration has 6 × 24 × 24 × 24 = 82,944 differently labeled, but identical geometries. Most importantly, as will be shown below, $S_{3} \times_{\mathrm{wreath}} (S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$ acting on the codon space (64 points in an abstract point space with Hamming metric) preserves the Hamming metric: the Hamming distance between any two codons p, q ∈ [64] is the same before and after any of the 82,944 permutations of the codon set induced by this group. The group induces two kinds of permutations of the 64 codons: the three $S_{4}$ permute the four letters at each of the three codon positions, and the $S_{3}$ wreath product permutes the three indices, or equivalently, the three codon positions (Appendix B). For example, the permutation (1,2,3,4) → (2,1,4,3) ∈ $S_{4}$ induces the permutation (A,C,G,U) → (C,A,U,G), and when acting at the middle codon position, i.e., as an element of $(S_{4})_{2}$, induces a permutation of all 64 codons: AAA ↔ ACA, AGA ↔ AUA, etc., a reordering of the lexicographically ordered codons: (1, 2, 3, 4, 5, 6, 7, 8, …, 61, 62, 63, 64) → (5, 6, 7, 8, 1, 2, 3, 4, …, 57, 58, 59, 60), a [64] → [64] permutation of the codon set. Similarly, the permutation (1,2,3) → (3,2,1) ∈ $S_{3}$ acting on the codon set exchanges the letters at the first and third codon position of all codons: AAA ↔ AAA, AAC ↔ CAA, CAG ↔ GAC, etc., which also reorders the indexed codons. A basic theorem states that every permutation, and hence every Symmetric group, can be generated from transpositions, or inversions (permutations exchanging two elements, (a, b) → (b, a)); the above (1,2,3) → (3,2,1) ∈ $S_{3}$ is a (1,3) → (3,1) inversion, and the (1,2,3,4) → (2,1,4,3) ∈ $S_{4}$ contains two inversions. As can be readily checked, the three inversions contained in $S_{3}$ and the six in $S_{4}$ induce permutations of the codon set that preserve the Hamming metric, and therefore so do all permutations generated by these inversions, that is, all of $S_{3}$ and $S_{4}$, and the 82,944 permutations of $S_{3} \times_{\mathrm{wreath}} (S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$.
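The argument of this paragraph (inversions preserve the Hamming metric, and the group they generate induces 82,944 permutations of the codon set) lends itself to a direct check. The sketch below is one possible way to do it; the representation of a group element as a position permutation plus three letter maps, and every name in the code, are assumptions made for illustration. Distinctness of the induced permutations is tested on the closed neighborhood of AAA only, which suffices because permutations that differ on a subset differ as a whole.

```python
from itertools import combinations, permutations, product

LETTERS = "ACGU"
CODONS = ["".join(c) for c in product(LETTERS, repeat=3)]

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def act(codon, position_perm, letter_maps):
    """Relabel the letter at each position, then permute the positions.

    Output position i receives the relabeled letter of input position
    position_perm[i].
    """
    relabeled = [letter_maps[i][x] for i, x in enumerate(codon)]
    return "".join(relabeled[j] for j in position_perm)

def preserves_metric(position_perm, letter_maps):
    image = {c: act(c, position_perm, letter_maps) for c in CODONS}
    return all(hamming(p, q) == hamming(image[p], image[q])
               for p, q in combinations(CODONS, 2))

IDENTITY = dict(zip(LETTERS, LETTERS))
NO_MOVE = (0, 1, 2)

def swap_letters(a, b):
    m = dict(IDENTITY)
    m[a], m[b] = b, a
    return m

# Generators: six letter inversions at each position, three position inversions.
generators = []
for a, b in combinations(LETTERS, 2):
    for pos in range(3):
        maps = [IDENTITY] * 3
        maps[pos] = swap_letters(a, b)
        generators.append((NO_MOVE, tuple(maps)))
for i, j in combinations(range(3), 2):
    perm = list(NO_MOVE)
    perm[i], perm[j] = perm[j], perm[i]
    generators.append((tuple(perm), (IDENTITY,) * 3))

all_preserve = all(preserves_metric(p, m) for p, m in generators)

# (A,C,G,U) -> (C,A,U,G) at the middle position, as a reordering of [64].
double_swap = dict(zip("ACGU", "CAUG"))
reordered = [CODONS.index(act(c, NO_MOVE, (IDENTITY, double_swap, IDENTITY))) + 1
             for c in CODONS]

# Count distinct induced permutations via their action on AAA and its neighbours.
PROBE = ["AAA"] + sorted(c for c in CODONS if hamming(c, "AAA") == 1)
all_letter_maps = [dict(zip(LETTERS, p)) for p in permutations(LETTERS)]
distinct = {
    tuple(act(c, pos, maps) for c in PROBE)
    for pos in permutations(range(3))
    for maps in product(all_letter_maps, repeat=3)
}

print(len(generators), all_preserve)
print(reordered[:8], reordered[-4:])
print(len(distinct))
```

Since compositions of distance-preserving maps preserve distances, checking the 21 generators covers the whole group without testing all 82,944 elements pair by pair.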

_{64} and any inversion of $S_{64}$, (p, q) → (q, p) ∈ $S_{64}$ for p, q ∈ [64], affects just two codons. Take, say, p = 2 and q = 4: AAC ↔ AAU, which changes their intercodon Hamming distances with many fixed codons, for example with ACC and AUU (AAC and ACC are at 1-HD, but after the permutation AAC → AAU, AAU and ACC are at 2-HD). All 30 other codons ending with C or U have to undergo the same exchange to preserve the Hamming metric of the codon space, while the 32 codons ending in A or G can remain fixed, and the letters at codon positions 1 or 2 of all codons do not change (these findings are easily confirmed by experiment, or by a long proof by cases). Together these 16 inversions of $S_{64}$ preserve the Hamming metric, but their union is identical with the inversion (C, U) → (U, C) induced by (2, 4) → (4, 2) ∈ $(S_{4})_{3}$ acting on the codon set. Some inversions of $S_{64}$ (such as for p = 7 and q = 19: ACG ↔ CAG) result in letter changes in more than one codon position, and when complemented by the relevant 15 inversions at each position to preserve the Hamming metric, their union corresponds with an action of the $S_{4}$ groups acting at each codon position (as above). Alternatively, in some cases, such as for ACG ↔ CAG, inversions of $S_{64}$ that transpose codon positions 1 and 2, affecting all eight codons ACN ↔ CAN, but then also the eight AGN ↔ GAN and eight UCN ↔ CUN codons, etc., can complement the initial inversion to preserve the Hamming metric, and their union then equals the inversion (1,2) → (2,1) ∈ $S_{3}$ acting on the codon set. (The above set of inversions of $S_{64}$ does not include any permutations of the 16 codons {AAN, CCN, GGN, UUN}, but the action of (1,2) → (2,1) ∈ $S_{3}$ on these codons also equals the identity; that is, they remain fixed.) Thus a single inversion of $S_{64}$ acting on the codon space does not preserve the Hamming metric, but this inversion can be complemented with a set of similar inversions, the union of which preserves the metric, but then also equals one of the permutations of the codon set induced by an action of $S_{3} \times_{\mathrm{wreath}} (S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$. This wreath product therefore contains all permutations of the codon set that preserve the intercodon Hamming distances; it is the largest subgroup of $S_{64}$ that does this. Both the wreath product of order 82,944 and the direct product $(S_{4})_{1} \times (S_{4})_{2} \times (S_{4})_{3}$ of order 13,824 are transitive on the codon set: they permute any codon into any other, and the orbit of each codon under these symmetry groups equals the whole set. Both groups are small subgroups of $S_{64}$ with integer index ≈1.53 × $10^{84}$ and ≈9.18 × $10^{84}$ respectively (the index equals the order of $S_{64}$ divided by the order of the subgroup). $S_{64}$ contains all 64! (≈1.27 × $10^{89}$) permutations of the codon set, and permutations preserving the Hamming metric are relatively rare: only about 1 of every $10^{84}$ permutations of the codon set does so.
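The "experiment" mentioned above can be carried out in a few lines: apply the single inversion AAC ↔ AAU, list the codon pairs whose Hamming distance changes, and compare with the full C ↔ U exchange at the third position. The sketch also recomputes the two subgroup indices. Names such as `single`, `full` and `broken_pairs` are illustrative.

```python
from itertools import combinations, product
from math import factorial

CODONS = ["".join(c) for c in product("ACGU", repeat=3)]

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def broken_pairs(mapping):
    """Codon pairs whose Hamming distance changes under the permutation."""
    return [(p, q) for p, q in combinations(CODONS, 2)
            if hamming(p, q) != hamming(mapping[p], mapping[q])]

# A single inversion of S64: exchange codons 2 and 4 only.
single = {c: c for c in CODONS}
single["AAC"], single["AAU"] = "AAU", "AAC"

# The complemented set of 16 inversions: C <-> U at the third position.
cu_swap = {"C": "U", "U": "C"}
full = {c: c[:2] + cu_swap.get(c[2], c[2]) for c in CODONS}
moved = [c for c in CODONS if full[c] != c]

print(len(broken_pairs(single)), ("AAC", "ACC") in broken_pairs(single))
print(len(moved) // 2, len(broken_pairs(full)))

# Indices of the two metric-preserving groups in S64.
index_wreath = factorial(64) // 82944
index_direct = factorial(64) // 13824
print(f"{index_wreath:.2e} {index_direct:.2e} {factorial(64):.2e}")
```

The first printed line shows that the lone inversion changes some distances; the second shows that the 16 inversions taken together change none.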

#### 5.2. The Symmetries of the CodonGraph

_{4} groups. For example, the inversion (1,2,3,4) → (2,1,3,4) ∈ $S_{4}$ acting on the values of index-j induces a relabeling of the graph's vertices (A,C,G,U) → (C,A,G,U) that affects the 32 codons with A or C at the middle position (e.g., AAA ↔ ACA): the 16 vertices of the front of the "cube" of Figure 10 (AAA to UAU) are exchanged with the 16 vertices right behind them (ACA to UCU), and this leaves the structure of the array intact (both the vertex and edge sets are invariant; only the labels change). The same holds for all such inversions for all three $S_{4}$ and thus for the direct product $(S_{4})_{i} \times (S_{4})_{j} \times (S_{4})_{k}$. (Notation: the index p, with p = i, j, or k, in $(S_{4})_{p}$ corresponds with the array index i, j, k acted on by the $S_{4}$ group.) The $S_{3}$ wreath product permutes the three indices (i, j, k), which is a symmetry of the CodonArray, as the array is identical in each of these three directions. Inspection of the array reveals no other symmetries, and exchanging just one vertex (with edges attached) between different rows requires readjustments of other vertices (with edges attached), like those described for codon inversions in Section 5.1, to restore the array. The symmetry group of the graph thus equals $S_{3} \times_{\mathrm{wreath}} (S_{4})_{i} \times (S_{4})_{j} \times (S_{4})_{k}$ of order 82,944. (The CodonGraph symmetry group was derived in a different way in [24].) The graph symmetry group contains stabilizer subgroups for all its 3376 subgraphs; they are isomorphic to the stabilizer groups for the polytope faces, and easily derived from the subarrays, as discussed in Section 4.4. For example, the wreath product contains 216 (=$6^{3}$) copies of $S_{3} \times_{\mathrm{wreath}} (S_{2})_{i} \times (S_{2})_{j} \times (S_{2})_{k}$, isomorphic to the symmetry group of the cube, as each $S_{4}$ contains six $S_{2}$ subgroups; the 216 cube subgraphs are easily seen in the array (take any two vertices of each of the three orthogonal rows and "fill out" the cube). The graph symmetries permute the codon labels of the vertices of the array, and indeed, the array vertices can be labeled in 82,944 ways. Any of the 64 codons can be mapped to vertex (1,1,1), e.g., AAA → (1,1,1) as in Figure 10. This leaves nine labeling choices for a vertex at Hamming 1-distance from AAA, e.g., AAC → (1,1,2), which maps codon position 3 to array index k and leaves only two choices for the remaining two letters in the third position, e.g., (AAG), (AAU) → (1,1,3), (1,1,4). Labeling one of the remaining six vertices at 1-distance from (1,1,1) = AAA, e.g., ACA → (1,2,1), maps the two other codon positions to specific indices, 2 → j and 1 → i, and this restricts the labeling of the other five vertices to these two rows. In total there are 64 × (9 × 2 × 1) × (6 × 2 × 1) × (3 × 2 × 1) = 82,944 labeling choices, a confirmation of the order of the graph's symmetry group. Without the array structure (the 288 edges between the 64 vertices in the 9-regular graph), the 64 vertices can be labeled in 64! ways, the number of [64] → [64] onto mappings of 64 labels onto 64 points.
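The labeling count above can be cross-checked by counting graph automorphisms directly with a small backtracking search. This is a sketch of a generic technique, not the derivation used in the article or in [24]: it counts the automorphisms that fix AAA and then multiplies by 64, which relies on the statement above that any codon can be mapped to vertex (1,1,1). All names are illustrative.

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGU", repeat=3)]

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

NEIGHBORS = {c: {d for d in CODONS if hamming(c, d) == 1} for c in CODONS}

def bfs_order(root):
    """Vertices in breadth-first order, so each has an earlier neighbour."""
    order, seen, frontier = [root], {root}, [root]
    while frontier:
        nxt = []
        for v in frontier:
            for w in sorted(NEIGHBORS[v]):
                if w not in seen:
                    seen.add(w)
                    order.append(w)
                    nxt.append(w)
        frontier = nxt
    return order

def count_automorphisms_fixing(root):
    """Count adjacency-preserving bijections of the graph that fix `root`."""
    order = bfs_order(root)
    image = {root: root}
    used = {root}

    def extend(i):
        if i == len(order):
            return 1
        v = order[i]
        mapped = [w for w in NEIGHBORS[v] if w in image]
        # The image of v must be adjacent to the images of its mapped neighbours.
        candidates = set.intersection(*(NEIGHBORS[image[w]] for w in mapped)) - used
        total = 0
        for c in candidates:
            # Adjacency and non-adjacency with all mapped vertices must agree.
            if all((u in NEIGHBORS[v]) == (image[u] in NEIGHBORS[c]) for u in image):
                image[v] = c
                used.add(c)
                total += extend(i + 1)
                del image[v]
                used.remove(c)
        return total

    return extend(1)

stabilizer_order = count_automorphisms_fixing("AAA")
print(stabilizer_order, 64 * stabilizer_order)
print(64 * (9 * 2 * 1) * (6 * 2 * 1) * (3 * 2 * 1))
```

The search branches only while the nine neighbors of AAA are being placed; after that, every remaining vertex has a single admissible image, which mirrors the (9 × 2 × 1) × (6 × 2 × 1) × (3 × 2 × 1) factorization in the text.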

## 6. Symmetries of the Genetic Code

#### 6.1. Exact and Near Symmetries of the Code

#### 6.2. Conservative and Anti-Conservative Symmetries of the Code

[…] $S_2 \times_{\mathrm{wreath}} (S_4)_1 \times S_2 \times (S_4)_3$; the $S_2$ wreath product permutes the 1st and 3rd codon positions, Section 4.4) are conservative or near conservative code symmetries. A few face symmetries are exact code symmetries, but stop codons, or a minority of codons encoding non-similar amino acids in the right half of the table, cause some dissymmetries as well. Most strikingly, the polytope symmetries that exchange the two 7A-ridges (two different 180° rotations generated by two different 2-mirror systems, {NAN ↔ NCN, NGN ↔ NUN} and {NAN ↔ NUN, NCN ↔ NGN}) are anti-conservative or near anti-conservative symmetries of the code, as they exchange codon assignments between hydrophobic and hydrophilic amino acids for most permuted codons. (An anti-symmetry, also called a black-white, plus-minus, or 0–1 symmetry, exchanges parts that are equivalent but of opposite binary value, for example 010 ↔ 101.) This hydrophobic ↔ hydrophilic anti-symmetry on the amino acid level corresponds with an anti-symmetry R ↔ Y on the codon level (both 180° rotations above map onto NRN ↔ NYN). Therefore we conjecture that at a very early pre-LUCA stage of the code’s evolution the peptide synthesis machinery distinguished between hydrophobic and hydrophilic amino acids (probably just a few of each kind existed), and at the same time only discriminated between purines and pyrimidines at the middle codon position (the other codon positions do not impact the hydrophobic ↔ hydrophilic anti-symmetry). Recognition of codons by anti-codons through wobble pairing at the middle codon position only, analogous to the wobble pairing at the third position in extant organisms, could have been the biological mechanism underlying the NRN ↔ NYN anti-symmetry. This requires at least two anti-codons: one with a G and one with a U at the middle position to recognize, respectively, all 32 NYN and all 32 NRN codons.
The two anti-symmetries, hydrophobic ↔ hydrophilic and R ↔ Y, generate a primitive, initial code capable of controlling to a large extent the hydrophilic/hydrophobic character of synthesized peptides, and thus, among other properties, their 3D-folding and affinity for lipid membranes. This early directed peptide synthesis likely presented a significant selective advantage over a non-directed, random peptide synthesis that probably existed before a genetic code evolved.

#### 6.3. Stronger and Weaker Symmetries of the Code

[…] $S_2$ symmetries of all 16 edges spanned by the two vertices representing the two codons {NNC, NNU} are exact code symmetries due to codon-anticodon wobble pairing at the 3rd codon position. For the small but slightly larger faces, the tetrahedrons, we conjecture that the 24 tetrahedron symmetries, which are exact symmetries for eight tetrahedrons and conservative symmetries for six tetrahedrons in the extant code, are remnants of exact symmetries of an earlier code based on codon-anticodon Watson-Crick pairing at the first and second codon positions, while the third position did not matter. At this stage, the code encodes at most 16 messages, and all tetrahedron symmetries are exact code symmetries (e.g., the tetrahedron of the GAN codons, encoding a single acidic amino acid, has exact symmetries at this stage). This earlier code evolves to the canonical code when wobble base pairing at the 3rd codon position becomes relevant and the protein synthesis machinery develops the capacity to distinguish between very similar amino acids, such as Asp and Glu, so that some exact code symmetries evolve to conservative symmetries. By analogy and continuity going back in time, the larger polytope faces with the weaker code symmetries (the near symmetries and conservative symmetries, and the weaker yet near conservative symmetries) are remnants of exact symmetries of even earlier codes, going back to the earliest code represented by the two 7A-ridges (Section 6.2).

#### 6.4. Code Symmetries are Not Random

[…] $10^6$) computer-generated random codes composed of the same number of codons per amino acid as the canonical code were screened for exact code symmetries, and all numerical data on random codes in this section are results from this simulation. Figure 12 shows codon tables for six random codes with the amino acid color scheme of Figure 2, and visual inspection reveals that the random codes lack most of the symmetries displayed by the canonical code (compare Figure 1 and Figure 12). While the canonical genetic code polytope contains eight tetrahedrons (≈17% of all its 48 tetrahedron faces) displaying exact code symmetries, among the million random code polytopes only 3757 codes contain one such tetrahedron, only two codes contain two, and none has more. Thus, random codes with four codons at Hamming 1-distance encoding the same amino acid are rare (about four per $10^3$ random codes), codes with two such “family boxes” are much rarer (about two per $10^6$ codes), and codes with more than two are extremely rare, so rare that our computer simulations would not stand a chance of generating a random code having eight family boxes like the genetic code (the “trend” suggests only one such code among $10^{24}$ random codes). The canonical code polytope possesses 69 edges (≈24% of 288 edges) and 33 triangles (≈17% of 192 triangles) with exact code symmetries; of these 69 edges, 21 are not contained in the eight tetrahedrons displaying exact symmetries, while the same holds for only one triangle. Random codes contain far fewer of these symmetries: none of the random code polytopes possesses more than 28 edges or more than eight triangles (all contained in two tetrahedrons) with exact code symmetries; most frequently random code polytopes contain 11 or 12 such edges (for ≈135,225 per million, or ≈1/6), and no such triangles (≈2/3 have none, and only ≈1/3 have one or more such triangles). Thus stochastic evolution in a pre-LUCA organism will not generate the observed exact symmetry patterns of the genetic code with any practical probability; instead a single, random path most likely generates a code resembling the most frequently simulated codes described above (if the codon assignments of the canonical code are predetermined).
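The face counts for the canonical code, and the screening of random codes, can be sketched as follows. This is an illustrative reconstruction, not the author's simulation code: it assumes that a face "displays exact code symmetries" when all of its codons carry the same message, a reading that reproduces the 69 edges, 33 triangles and eight tetrahedrons quoted above. Function names and the one-letter message string are assumptions of this sketch.

```python
import random
from itertools import combinations, product

BASES = "UCAG"  # order of the standard codon table (Figure 1)
# One-letter messages for UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CANONICAL = dict(zip(CODONS, AMINO))

def tetrahedrons():
    """The 48 sets of four codons that differ at exactly one position."""
    tets = []
    for pos in range(3):
        for rest in product(BASES, repeat=2):
            tet = []
            for b in BASES:
                letters = list(rest)
                letters.insert(pos, b)
                tet.append("".join(letters))
            tets.append(tet)
    return tets

def count_exact(code):
    """Count edges, triangles and tetrahedrons whose codons share one message."""
    edges = triangles = tets = 0
    for tet in tetrahedrons():
        edges += sum(code[a] == code[b] for a, b in combinations(tet, 2))
        triangles += sum(len({code[c] for c in tri}) == 1
                         for tri in combinations(tet, 3))
        tets += len({code[c] for c in tet}) == 1
    return edges, triangles, tets

def random_code(rng):
    """Random code with the same number of codons per message as the canonical code."""
    messages = list(AMINO)
    rng.shuffle(messages)
    return dict(zip(CODONS, messages))

print(count_exact(CANONICAL))                      # canonical code
print(count_exact(random_code(random.Random(0))))  # one random code
```

Every edge and every triangle lies in exactly one tetrahedron, so looping over the 48 tetrahedrons counts each face once. Screening a million codes as in the text amounts to calling `count_exact(random_code(rng))` in a loop and tallying the results.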

#### 6.5. The CodonPolytope Splitting Model for the Evolution of the Code

**Figure 12.** Codon tables of six random codes. The codes randomly assign the same number of codons to each message as the canonical genetic code. The 64 slots of the codon tables are colored as in Figure 1.

**Figure 13.** Binary codon tree generated by the five-step polytope splitting model. The root node of the tree, labeled 0, represents the 64-codon set, nodes 1 and 2 each represent a 32-codon set, etc. The 32 leaf nodes, labeled 31–62, represent 32 two-codon sets. Each codon set corresponds with the vertices of a polytope face.

## 7. Codes Represented by Colorings of the Codon Polytope

#### 7.1. CodonPolytope Colorings as Code Models

[…] $10^{84}$ different code functions that map the 64 codons onto the 21 messages, the C: [64] → [21] surjections, and only one of them is the canonical genetic code. Each of these codes maps each codon to a single message, C: codon-j → message-m, for indices j ∈ [64] and m ∈ [21] (each code is a function); each code maps at least one codon to every one of the 21 messages (an onto mapping or surjection reaches all targets, Appendix A); and in no two codes is this mapping the same for all 64 codons: the map C: codon-j → message-m differs for at least one codon-j between two different codes. Each code is modeled by the CodonPolytope via the mapping f: [64] vertices → [64] codons → [21] messages. Let the map $f_1$: vertex-i → codon-j, for i, j ∈ [64], be fixed for all i and j, that is, the same for all codes; then f: vertex-i → message-m is uniquely determined by the code C: codon-j → message-m. Each vertex is labeled with a message, and no two codes are represented by the same labeling of the CodonPolytope because at least one vertex is assigned a different message. The mathematical literature with relevance to this section [36] uses colors instead of messages; so let the map g: message-k → color-k for k ∈ [21] be fixed for all k, so that the 21 colors represent the 21 messages one-to-one. Then any code C: [64] → [21] corresponds to a specific and unique coloring of the 64 vertices of the polytope with 21 colors via the map f: vertex-i → codon-j → message-m → color-m, for i, j ∈ [64] and m ∈ [21]. Figure 14 illustrates some elementary coloring concepts; Appendix F contains additional background material.

#### 7.2. The Genetic Code Represents a Class of 41,472 Equivalent Codes

**Figure 14.** All two-colored tetrahedrons. The 16 ways to color the four vertices of a tetrahedron with two colors, Red and Blue: one tetrahedron with all four vertices Red (4R) and one 4B are shown on the left of the top row, the six tetrahedrons 2R2B fill out the top row and second row, the four 1R3B the third, and the four 3R1B the bottom row. These five sets of colorings are contained in Polya’s colorings classes inventory {4R, 3R1B, 2R2B, 1R3B, 4B}; each class contains equivalent colorings under the symmetry group of the tetrahedron.

#### 7.3. Counting all Classes of Codes Conveying 21 Messages

[…] $10^{79}$ colorings classes of [64] → [21] surjections (see Table 3); representatives of different classes are non-equivalent colorings or codes (under the polytope symmetry group). The $\approx 1.51 \times 10^{84}$ [64] → [21] codes thus are partitioned into $\approx 1.82 \times 10^{79}$ code classes with an average class size near the maximum size of 82,944 (the difference is within the rounding error). This shows that almost all codes lack exact symmetries (other than the identity, see Section 7.2), and that the canonical code with two exact symmetries and the mitochondrial code with four exact symmetries have an exceptionally symmetrical structure.

**Table 3.** The number of CodonPolytope colorings classes with up to 21 colors. The first column lists the number of colors, the next two columns the Polya count of colorings classes, and the last three columns the Polya p-onto count of classes containing exactly the number of colors of the row. Numbers composed of powers of 10, $n \times 10^{m}$, are listed as “n m”.

| Number of Colors | Polya Count of Colorings Classes: n | Power of 10 (m) | Polya p-Onto Count of Colorings Classes of Exactly p Colors: n | Power of 10 (m) | % of Polya Count |
|---|---|---|---|---|---|
| 1 | 1.0 | | 1.0 | | |
| 2 | 2.2 | 14 | 2.2 | 14 | 100 |
| 3 | 4.1 | 25 | 4.1 | 25 | 100 |
| 4 | 4.1 | 33 | 4.1 | 33 | 100 |
| 5 | 6.5 | 39 | 6.5 | 39 | 100 |
| 6 | 7.6 | 44 | 7.6 | 44 | 100 |
| 7 | 1.4 | 49 | 1.5 | 49 | 100 |
| 8 | 7.6 | 52 | 7.6 | 52 | 99.8 |
| 9 | 1.4 | 56 | 1.4 | 56 | 99.5 |
| 10 | 1.2 | 59 | 1.2 | 59 | 98.8 |
| 11 | 5.4 | 61 | 5.2 | 61 | 97.6 |
| 12 | 1.4 | 64 | 1.3 | 64 | 95.4 |
| 13 | 2.4 | 66 | 2.2 | 66 | 92.5 |
| 14 | 2.7 | 68 | 2.4 | 68 | 88 |
| 15 | 2.2 | 70 | 1.9 | 70 | 83 |
| 16 | 1.4 | 72 | 1.1 | 72 | 76 |
| 17 | 6.8 | 73 | 4.7 | 73 | 69 |
| 18 | 2.6 | 75 | 1.6 | 75 | 62 |
| 19 | 8.4 | 76 | 4.4 | 76 | 53 |
| 20 | 2.2 | 78 | 9.8 | 77 | 45 |
| 21 | 5.1 | 79 | 1.8 | 79 | 35 |
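Counts of the kind listed in Table 3 can be reproduced, at least in order of magnitude, with the orbit-counting (Burnside) form of Polya's enumeration. The sketch below is an assumption-laden illustration rather than the author's procedure (which is in the appendices): it enumerates the 82,944 symmetries as permutations of the 64 codons, tallies their cycle counts, and derives the p-onto count by inclusion-exclusion over the number of unused colors. All names are hypothetical.

```python
from itertools import permutations, product
from math import comb

WEIGHT = (16, 4, 1)  # codon (x, y, z) with letters 0..3 has index 16x + 4y + z

def cycle_count(perm):
    """Number of cycles of a permutation given as a list of images."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

def cycle_histogram():
    """Map: cycle count -> number of group elements with that many cycles."""
    hist = {}
    letter_perms = list(permutations(range(4)))
    for pos in permutations(range(3)):                       # S_3 on positions
        for p0, p1, p2 in product(letter_perms, repeat=3):   # one S_4 per position
            a = [p0[v] * WEIGHT[pos[0]] for v in range(4)]
            b = [p1[v] * WEIGHT[pos[1]] for v in range(4)]
            c = [p2[v] * WEIGHT[pos[2]] for v in range(4)]
            perm = [a[x] + b[y] + c[z]
                    for x in range(4) for y in range(4) for z in range(4)]
            k = cycle_count(perm)
            hist[k] = hist.get(k, 0) + 1
    return hist

HIST = cycle_histogram()
ORDER = sum(HIST.values())

def polya(colors):
    """Colorings classes with at most `colors` colors (Burnside average)."""
    return sum(n * colors ** k for k, n in HIST.items()) // ORDER

def polya_onto(p):
    """Colorings classes using exactly p colors (inclusion-exclusion)."""
    return sum((-1) ** j * comb(p, j) * polya(p - j) for j in range(p + 1))

print(ORDER)
print(f"{polya(21):.2e}", f"{polya_onto(21):.2e}")
```

The enumeration takes a few seconds in pure Python. Working from a cycle histogram instead of a symbolic cycle index is a simplification chosen for this sketch; it suffices when only the number of colors, not the color inventory, is tracked.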

#### 7.4. Counting All Code Classes that Share the Pattern of the Genetic Code

[…] $10^{64}$ colorings classes with the exact colorings pattern of the canonical genetic code. (For calculations see Appendix H.) Thus, only ≈1.5 of every $10^{15}$ code classes of exactly 21 colors displays this pattern ($\approx 1.5 \times 10^{-15} \approx 2.79 \times 10^{64} / 1.82 \times 10^{79}$). A code with this rare pattern is generated as a random code of 21 messages only with vanishing probability. (Our simulations of Section 6.4 used the canonical code pattern as an input to generate only random codes displaying this pattern.)

#### 7.5. Colorings Classes under Color Symmetries

[…] $S_{21}$, the Symmetric group on [21], contains 21! ($\approx 5.11 \times 10^{19}$) permutations, and the action of $S_{21}$ on 21 colors renders them equivalent, although distinguishable. The colorings pattern of the code (Section 7.4 above) induces a set partition of the 64 codons, $1^2\,2^9\,3^2\,4^5\,6^3$, where in $n^m$ the base n is the block size (the number of similarly colored codons) and m is the number of blocks of size n (for example, $3^2$ stands for the two sets of three codons, Ile and Stop in the canonical code). Since permutations of colors between different blocks of the same size do not alter the colorings pattern, $S_{21}$ induces $M(21; 2, 9, 2, 5, 3) \approx 4.89 \times 10^{10}$ permutations of the colorings pattern (see Appendix A for the multinomial formula), and this renders equivalent as many Polya colorings classes with the same codon set partition but differing in the colorings of its blocks. These $4.89 \times 10^{10}$ different Polya colorings classes “collapse” into just one DeBruin $1^2\,2^9\,3^2\,4^5\,6^3$ colorings class. (Here, for example, $3^2$ stands for two blocks of three codons, each block colored differently, but all codons of a block colored identically. Please note the difference with counting set partitions $1^2\,2^9\,3^2\,4^5\,6^3$: there the sets are not colored differently but are of the same color, only distinguished by size.) Regrettably, an enumeration of all DeBruin colorings using 21 colors is not feasible, so the relative frequency of the DeBruin pattern $1^2\,2^9\,3^2\,4^5\,6^3$ cannot be calculated directly. However, because each of the $\approx 2.79 \times 10^{64}$ DeBruin colorings classes with this canonical code inventory pattern (same as the Polya count for the canonical code inventory, Section 7.4) corresponds with $4.89 \times 10^{10}$ Polya colorings classes, there are $1.36 \times 10^{75}$ ($\approx 2.79 \times 10^{64} \times 4.89 \times 10^{10}$) Polya code classes sharing this DeBruin pattern. Therefore ≈0.75 out of every $10^4$ Polya code classes of [64] → [21] surjections ($0.75 \times 10^{-4} \approx 1.36 \times 10^{75} / 1.82 \times 10^{79}$, Section 7.3) displays this pattern. These classes are not very rare, and a million random [64] → [21] codes are likely to contain on average ≈75 codes that assign the 21 messages in the colorings pattern $1^2\,2^9\,3^2\,4^5\,6^3$ of the genetic code, but not with the codon blocks colored as in the code (Section 7.4), and also lacking the characteristic symmetries of the canonical code (Section 6.4).
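The arithmetic of this section can be retraced in a few lines. This is a sketch for checking the quoted figures only; the two class counts are copied from the text as rounded inputs, and the helper name `multinomial` is an assumption.

```python
from math import factorial, prod

def multinomial(n, parts):
    """n! / (k1! * k2! * ...), for parts summing to n."""
    assert sum(parts) == n
    return factorial(n) // prod(factorial(k) for k in parts)

# Numbers of blocks of sizes 1, 2, 3, 4 and 6 in the pattern 1^2 2^9 3^2 4^5 6^3.
block_counts = [2, 9, 2, 5, 3]
color_perms = multinomial(21, block_counts)       # M(21; 2, 9, 2, 5, 3)

pattern_classes = 2.79e64    # quoted from Section 7.4 (rounded)
all_onto_classes = 1.82e79   # quoted from Section 7.3 (rounded)

polya_classes = color_perms * pattern_classes
fraction = polya_classes / all_onto_classes

print(f"{color_perms:.3e}  {polya_classes:.3e}  {fraction:.2e}")
```

Because the inputs are rounded to three digits, the output agrees with the text only to about that precision.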

## 8. Discussion

[…] $S_3 \times_{\mathrm{wreath}} (S_4)_1 \times (S_4)_2 \times (S_4)_3$ (notation: the index i, for i = 1, 2, 3, in $(S_4)_i$ corresponds with the codon position acted on by the $S_4$ group); $S_4$ permutes {A,C,G,U}, and $S_3$ permutes the three codon positions (Section 4.3 and Section 5). This group contains all 82,944 symmetries of the 64-codon set that preserve the intercodon Hamming distances (the distance between any two codons is the same before and after the permutation). The group induces permutations of the codon set and rearrangements of the codon table; for example, the two $S_4$ groups acting at the first two codon positions induce all 576 codon table layouts mentioned above.

[…] $S_3 \times_{\mathrm{wreath}} (S_2 \times S_2)_1 \times (S_2 \times S_2)_2 \times (S_2 \times S_2)_3$ preserve these distances (Section 4.3); the $(S_2 \times S_2)$ groups are isomorphic to the Klein-4 group, see further below. In general, hypercube models of nucleotide sequences such as those used by Eigen ([45], pp. 354–387) do not preserve the intercodon Hamming distances (as they do not contain triangles or tetrahedrons, Section 2.5). Nonetheless the 6-cubes are frequently used for analysis of code patterns: Jiménez-Montaño et al. [17] hold that the cube structure explains amino acid substitution patterns, and according to Karasev and Stefanov [18] the 6-cube’s topology encodes a 4-amino acid-arc helical protein topology, the “topological nature of the code”. Several investigators [12,15,17,46] derive genetic Gray codes based on the cube’s Hamming distances. The 3D “Genetic Hotels” [16,19,47,48] are projections of the 6-cube onto a “3-cube” in $R^3$ space, a 3D-version of the codon table with {C,U,A,G} mapped to {0,1,2,3} and the 1st, 2nd and 3rd codon positions plotted on the x, y and z axes respectively, so CCC corresponds with (0,0,0) and GGG with (3,3,3). The hotel “cube” resembles the CodonArray graph, Figure 10, but the hotel has only three edges per row or column, while the graph has six edges per row and is not a geometric object. In the hotel, the intercodon Hamming 1-distances vary from one to two to three cube edges; the hotel thus distorts this metric even more than the 6-cube. Moreover, the hotel 3-cube is not Euclidian, as the 0, 1, 2, and 3 coordinates are projections of the values 0, 1, α, and 1 + α of the Galois 4-Field: the distance between 1 and α equals 1 + α (not 1 as the cube suggests), and the distance between 1 and 1 + α actually equals α (not 2). The hotel cube can be manipulated with Galois 4-Field algebra (four additions, three multiplications), but does not possess Euclidian symmetries. Different projections onto the hotel result in different 3D-code geometries; for example, (C,U,A,G) produces hotel-cube edges for amino acids encoded by just two codons [16], but (G,U,A,C) does not [48]. Other three-dimensional geometries, such as a simple tetrahedral construct with 20 lattice points representing 64 codons [49], also do not preserve the Hamming metric. In fact any geometry preserving this metric in Euclidian space has to be isomorphic to the CodonPolytope (by a mathematical theorem, Section 4.1), a simple 9-polytope, and therefore no such object could exist in a Euclidian space of fewer than nine dimensions. This polytope thus provides a unique geometry for the identification of code symmetries by well-defined Euclidian symmetry transformations such as mirror reflections and rotations. The polytope has 82,944 symmetries, so one is bound to find a few interesting non-obvious code patterns and intriguing symmetry subgroups, but their biological significance might not be obvious either.
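The distortion of the Hamming metric by a 6-cube model can be made concrete with a small check. The sketch assumes one particular 2-bit labeling of the nucleotides (C = 00, U = 01, A = 10, G = 11, chosen here only for illustration and not taken from any of the cited models) and asks how far apart on the 6-cube the 288 codon pairs at quaternary Hamming 1-distance end up.

```python
from collections import Counter
from itertools import combinations, product

# Assumed 2-bit labels for the four nucleotides (illustrative choice).
BITS = {"C": "00", "U": "01", "A": "10", "G": "11"}

def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def to_bits(codon):
    """6-bit binary word of a codon under the assumed labeling."""
    return "".join(BITS[n] for n in codon)

codons = ["".join(c) for c in product("CUAG", repeat=3)]

# 6-cube distance of every codon pair at quaternary Hamming 1-distance.
cube_distance = Counter(
    hamming(to_bits(a), to_bits(b))
    for a, b in combinations(codons, 2)
    if hamming(a, b) == 1
)
print(dict(cube_distance))
```

Under this labeling only the pairs that differ in a single bit are neighbors on the cube; the remaining one-point mutations sit two cube edges apart. Any other 2-bit labeling merely changes which nucleotide pairs are affected, since four binary words of length two always include two pairs at binary distance two.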

[…] $S_4$, and Klein-4 is isomorphic to $S_2 \times S_2$, a small subgroup of $S_4$. These isomorphisms show that the symmetries and similarity patterns discussed above can all be expressed as Euclidian symmetries of the tetrahedron and CodonPolytope. The other geometries have drawbacks because the 24 ways to label the four vertices of a square or rectangle with {A,C,G,U} are not equivalent. The non-equivalent ways of labeling correspond with different neighborhoods for each label. Depending on the labeling of the vertices, different nucleotides are non-adjacent (diagonally opposite vertices are separated by two edges, not one) or, if on a rectangle, adjacent via long or short edges. Therefore the edge or Euclidian distance between the vertices in these models cannot coincide with the “one point mutation” distance between the four nucleotides, and this is also why n-cube models cannot preserve the intercodon Hamming distances. In contrast, all ways of labeling the regular tetrahedron are equivalent: all four vertices are at 1-edge and at identical Euclidian distance, in correspondence with the Hamming 1-distance between the nucleotides. Moreover the tetrahedron A3 reflection symmetry group contains subgroups isomorphic to the rectangle and square symmetry groups, and the polytope as product of three tetrahedrons contains all products of these symmetries, such as K × K × K and K × K mentioned above, in its direct product symmetry subgroup A3 × A3 × A3, isomorphic to $(S_4)_1 \times (S_4)_2 \times (S_4)_3$.

[…] $1^2\,2^9\,3^2\,4^5\,6^3$ of the codon set, Section 7). Antoneli et al. [21] claim to have screened all algebraic models within this approach, whether based on Lie groups/Lie algebras, on Lie superalgebras or on finite groups ([21] reviews all earlier work in this area). They identified several Symplectic groups, but foremost Sp(6) and its supersymmetric version as “unique solution”. The Sp(6) model evolves the canonical code from the initially non-coding 64 codons in at most four steps (Figure 5 in [21]). The first step generates six subgroups comprised of 20, 16, 10, 4 and two codons, encoding respectively Gly, Ala, Ser, Val, Asp, and His; in the second step the 20 Gly codons split into eight Gly, six Cys, and six Ile codons (and all other groups other than the two His codons split up as well); and in a third step the six Ile codons split into three Ile and three Stop codons, and most other codon blocks split as well. This splitting scheme generates the codon partition of the canonical code perforce: some subgroups are artificially “frozen” (partial symmetry breaking), as otherwise blocks of synonymous codons are split into smaller sets. The similarity patterns of the early codes do not seem to match those of the codon table at all (for example, the three Ile and three Stop codons do not form one block of six related codons), and the fixation of the two His codons in the earliest code seems peculiar. Only three finite Symplectic groups [23], of orders 103,608; 2,903,040; and 4,245,696, evolve the code under partial symmetry breaking; these groups do not break in a predetermined order and thus generate a variety of early codes. Bashford et al. [22,65] find that the Lie superalgebra A(5,0) ≈ sl(6/1) can evolve the anticodon genetic code (anticodon as opposed to amino acid assignments to codon blocks) through partial symmetry breaking in five to six steps in three different ways, one of which resembles the binary tree models mentioned above ([14,24,61] and the polytope model).
In a different approach, Sciarrino et al. [20,67] assign the four nucleotides to the irreps of the quantum group $U_q(su(2) \oplus su(2))$ in the limit q → 0 (the crystal basis), obtain the codons as tensor products of the nucleotides, and obtain their amino acid assignments as eigenvalues of a codon reading operator; different genetic codes correspond with different operators. A more recent application of this Crystal Basis Model [68] determines codon-anticodon pairing based on the “minimum principle” of the mitochondrial code: anticodon wobble base U is identified for family boxes of four synonymous codons, and G and U for boxes with two pairs of synonymous codons. This quantum group does not evolve codes, but computes similar wobble bases for early codes such as the “Archetypal” code (15 family boxes encoding a single amino acid, one box encoding Tyr and Stop, each by two codons) [69], which closely resembles the 16-tetrahedron stage of the polytope splitting model. These group theoretical models are mathematically ambitious, but none evolves the genetic code ab initio, with complete symmetry breaking, from a primordial non-differentiated codon set (no such group was found). The riddle as to how the code itself evolved in pre-LUCA organisms is thus alive and well.

[…] $10^{89}$ different codes are possible; these codes assign up to 64 messages to the 64 codons. Assuming the 21 messages of the canonical code, the literature sometimes uses $21^{64} \approx 4.2 \times 10^{84}$ as “first approximation,” but there are $1.5 \times 10^{84}$ (≈36% of $21^{64}$) codes that convey exactly 21 messages, the [64] → [21] surjections [24]. If one further assumes the number of codons assigned to each message by the canonical code as fixed, i.e., one codon to Met, six codons to Ser, etc., then the number of codes is given by the multinomial formula $M(64; 1^2\,2^9\,3^2\,4^5\,6^3) \approx 2.3 \times 10^{69}$ (Appendix A; an often quoted formula from ([70], p. 96) is incorrect). None of these calculations takes the 82,944 symmetries of the polytope into account. Polya’s formula enumerates colorings classes of geometric objects; in each class all colorings are equivalent (essentially the same) under the symmetry group of the object [36]. We adapted Polya’s enumeration to count only colorings using exactly 21 colors (and not fewer); the colorings then correspond with codes. When applied to the CodonPolytope with 20 amino acids and one stop signal as 21 colors, the canonical code represents a class of 41,472 equivalent codes, and the mitochondrial code a class of 20,736 codes (Section 7.2). The $\approx 1.5 \times 10^{84}$ [64] → [21] surjections are partitioned into $\approx 1.8 \times 10^{79}$ code classes with an average class size of ≈82,944 (within rounding), the maximum class size, as virtually all codes, unlike the canonical and mitochondrial codes, lack exact symmetries (Section 7.3). The $\approx 2.3 \times 10^{69}$ codes that assign the same number of codons to each message as the canonical code are partitioned into $\approx 2.8 \times 10^{64}$ classes of size ≈82,944 (Section 7.4). These spaces of code classes are too vast to find even a single representative of any of the classes of the few extant codes among billions of randomly generated codes; in other words, there is no chance of generating codes like the genetic code by chance.

## Acknowledgments

## Conflicts of Interest

## Appendix A. Some Discrete Mathematics, Notation and a Few Combinatorial Formulas

[…] $A_1$, $C_2$, $G_3$, $U_4$. Because of this one-to-one correspondence between the index set and object set, mathematical formulas that are valid for the general index set are valid for the specific object set. We refer to Mazur’s book for the notation and the formulas used in this appendix [25].

[…] $1^2\,2^9\,3^2\,4^5\,6^3$ (21 sets), then the number of ways to distribute the [64] codon set over the 21 boxes labeled with the messages equals $M(64; 1^2\,2^9\,3^2\,4^5\,6^3) = 64!/(1!^2 \times 2!^9 \times 3!^2 \times 4!^5 \times 6!^3) \approx 2.316 \times 10^{69}$. (Biologists often quote [70], but his formula (p. 96) and result are mistaken: because he does not count the three stop codons, his formula has one factor 3! in the denominator, but then his numerator should read 61!, not 64!, and his result should be $5.56 \times 10^{64}$, not $1.40 \times 10^{70}$ as in his text.)
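The multinomial above, and the corrected variant without the stop codons, can be evaluated exactly with integer arithmetic. The following is a minimal sketch; the dictionary `sizes` encodes the pattern $1^2\,2^9\,3^2\,4^5\,6^3$ and is the only input, and the variable names are illustrative.

```python
from math import factorial

# Block size -> number of messages with that many codons (1^2 2^9 3^2 4^5 6^3).
sizes = {1: 2, 2: 9, 3: 2, 4: 5, 6: 3}

denominator = 1
for size, count in sizes.items():
    denominator *= factorial(size) ** count

# All 64 codons distributed over the 21 labeled messages.
codes = factorial(64) // denominator

# Variant that leaves out the three stop codons: one factor 3! fewer, 61 codons.
codes_without_stop = factorial(61) // (denominator // factorial(3))

print(f"{codes:.3e}")
print(f"{codes_without_stop:.3e}")
```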

[…] $1^2\,2^9\,3^2\,4^5\,6^3$ is thus given by $M(64; 1^2\,2^9\,3^2\,4^5\,6^3)/(2! \times 9! \times 2! \times 5! \times 3!) \approx 2.22 \times 10^{60}$; the factor $(2! \times 9! \times 2! \times 5! \times 3!) \approx 1.045 \times 10^{9}$ corresponds with the number of ways the genetic code’s sets of equal size can be permuted. The formula counts the number of set partitions of [64] with the sizes of the subsets equal to $1^2\,2^9\,3^2\,4^5\,6^3$ (21 sets). The genetic code’s set partition is rare: fewer than 1 in 10,000 partitions of [64] into 21 blocks are comprised of similar sized blocks. (There are $S(64,21) \approx 2.96 \times 10^{64}$ partitions of [64] in 21 sets, a Stirling number, see below; $2.22 \times 10^{60} / 2.96 \times 10^{64} \approx 0.75 \times 10^{-4}$.)

[…] $10^{84}$, an astronomically large number (in this formula C(21,j) is a binomial coefficient). Nonetheless, these coding functions are just a subset of ≈36% of all $21^{64} \approx 4.18827 \times 10^{84}$ mappings of the 64 codons to the 21 messages; the remaining mappings reach only 20 or fewer targets (i.e., they are functions but not surjections) [24].
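The surjection count and the Stirling number used in this appendix follow from the standard inclusion-exclusion formula, $\sum_j (-1)^j\, C(k,j)\, (k-j)^n$ for the number of [n] → [k] surjections. The sketch below assumes this is the formula meant in the text (the formula itself is not legible in this copy) and uses Python's exact integers; the function name is illustrative.

```python
from math import comb, factorial

def surjections(n, k):
    """Number of onto maps [n] -> [k], by inclusion-exclusion over missed targets."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))

codes = surjections(64, 21)          # codes conveying exactly 21 messages
all_maps = 21 ** 64                  # all maps of 64 codons to 21 messages
stirling = codes // factorial(21)    # S(64, 21): partitions of [64] into 21 blocks

print(f"{codes:.3e}  {codes / all_maps:.1%}  {stirling:.3e}")
```

Dividing the number of surjections by 21! removes the labeling of the blocks, which gives the Stirling number of the second kind quoted for the set partitions.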

## Appendix B. Permutations, Groups and Group Isomorphisms

[…] $S_n$, is made up of the set of all n! permutations of [n] in combination with the composition of these permutations. The size of $S_n$ equals its order. For example, $S_3$, the Symmetric group on three objects, of order six, is made up of the six permutations of [3] and their composition. The Symmetric group is a mathematical group because it is closed (two permutations make another permutation: π1∙π2 = π3 ∈ $S_n$), composition is associative (the grouping of successive compositions does not matter: (π1∙π2)∙π3 = π1∙(π2∙π3)), it contains an identity (i), and every permutation π is invertible by its inverse permutation $\pi^{-1}$ ($\pi \cdot \pi^{-1} = i$).

[…] $S_3$. In addition, the three mirrors intersect in the geometric center of the triangle, and their line of intersection forms a rotation axis for counterclockwise rotations of 0 (= 360), 120, and 240 degrees: $\sigma_0$, $\sigma_{120}$, $\sigma_{240}$. These symmetries of the triangle relabel the vertices as (1,2,3), (3,1,2), and (2,3,1) respectively, in one-to-one correspondence with, respectively, the identity and the two reordering permutations of $S_3$. Each rotation can be generated by two mirror reflections: the angle between the mirrors is 60 degrees, and reflection in one mirror followed by reflection in the next counterclockwise mirror generates a rotation of 120 degrees (e.g., μ1·μ3 = $\sigma_{120}$); a second reflection in the previous counterclockwise mirror generates a rotation of −120 = +240 degrees (e.g., μ1·μ2 = $\sigma_{240}$). The zero degree rotation or identity is obtained by reflecting twice in the same mirror. These six symmetries of the equilateral triangle also form a mathematical group (the group requirements listed above are met). The symmetry group of the equilateral triangle is known as D3 (dihedral 3-group); D3 is a reflection group, as all its symmetries can be generated by mirror reflections, and D3 is a point group, as its geometric center is fixed by all symmetries (the center is unique and always mapped to itself).

[…] $S_3$ are isomorphic: they correspond with the same abstract group of six elements and rules of composition. As shown above, every symmetry of D3 induces a permutation of the vertex labels (1,2,3) that is an element of $S_3$, and the six symmetries of D3 correspond one-to-one with the six permutations of $S_3$ (both groups have the same order). In addition, the composition of two symmetries (such as two reflections) corresponds with the composition of the corresponding permutations (two involutions). The reversible function (bijection) Φ mapping the symmetries to permutations with preservation of composition of the group elements is called an isomorphism; that is, let Φ(μ1) = (1,3,2) = π1 and Φ(μ2) = (3,2,1) = π2; then Φ(μ1∙μ2) = Φ(μ1)∙Φ(μ2) = π1∙π2 = (1,3,2)∙(3,2,1) = (2,3,1) = Φ($\sigma_{240}$) = Φ(μ1∙μ2); and so in general for any two symmetries s1 and s2: Φ(s1∙s2) = Φ(s1)∙Φ(s2).
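The composition in this example can be replayed mechanically. The sketch below assumes one-line notation and the convention (p∙q)(i) = p(q(i)), i.e., the right-hand permutation acts first; this convention is an assumption of the sketch, chosen because it reproduces the products stated in the text. The third involution `pi3` = (2,1,3), taken here to correspond to the reflection μ3, is likewise inferred.

```python
from itertools import permutations

def compose(p, q):
    """Composition in one-line notation, 1-indexed: (p∙q)(i) = p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

pi1 = (1, 3, 2)   # image of reflection mu1
pi2 = (3, 2, 1)   # image of reflection mu2
pi3 = (2, 1, 3)   # remaining involution (assumed image of mu3)

sigma_240 = compose(pi1, pi2)   # two reflections give a rotation
sigma_120 = compose(pi1, pi3)
identity = compose(pi1, pi1)    # reflecting twice in the same mirror

# Closure: composing any two of the six permutations stays inside S_3.
S3 = set(permutations((1, 2, 3)))
closed = all(compose(p, q) in S3 for p in S3 for q in S3)

print(sigma_240, sigma_120, identity, closed)
```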

[…] $S_3$, but equivalently, $S_3$ acts on the three codon positions (1,2,3) as a symmetry group and induces six permutations of the codon positions (group actions on the codon positions, a common way to describe symmetries). D3 is generated by three 2-dimensional plane mirrors, and a group isomorphic to D3 is generated by three 8-dimensional hyperplane mirrors and permutes three 3-dimensional subspaces in 9-space, as described in Section 4.3. This group is a symmetry group of the CodonPolytope and induces the six permutations of the three codon positions by relabeling the 64 vertices of the CodonPolytope, analogously to the relabeling of the three vertices of an equilateral triangle by D3, and with a one-to-one correspondence to the actions of $S_3$ on the three codon positions: these groups are isomorphic.

$S_4$, of order 4! (= 24), acting on the four letters of the alphabet of the genetic code in lexicographical order (A,C,G,U), permutes their order. For example, the transposition (2,1,3,4) induces the reordering (C,A,G,U), that is, the exchange A ↔ C, while fixing G and U. $S_4$ is isomorphic to the symmetry groups of the K4-graph and the tetrahedron (Appendix C): the 24 symmetries of the graph and the tetrahedron leave them invariant but permute the vertex labels and thereby induce the 24 permutations of $S_4$. $S_4$ acts on the codon set by permuting these four letters at a particular codon position; for example, acting on the letters at the 3rd codon position, the exchange A ↔ C induces the exchange of 32 codons NNA ↔ NNC (short for AAA ↔ AAC, ACA ↔ ACC, etc.), while fixing the 16 NNG and 16 NNU codons. The permutations of the codon set, [64] → [64], induced by such actions of $S_4$ are symmetries of the codon set: they fix the set set-wise, so the set itself is invariant (indeed, all 64! permutations of [64] are symmetries of the codon set). A copy of $S_4$ acts at each codon position, and the actions of the three copies are independent of each other: the order of their actions does not matter, the induced permutations of the letters commute, and together they form a direct product. The direct product $(S_4)_1 \times (S_4)_2 \times (S_4)_3$ is a permutation group of order $24^3$ = 13,824 and a symmetry group of the codon set; the indices (1,2,3) indicate the codon positions upon which each $S_4$ acts. The wreath product, $S_3 \times_{\mathrm{wreath}} (S_4)_1 \times (S_4)_2 \times (S_4)_3$, of order 82,944 (= 6 × 13,824), is a permutation group acting on the codon set in which $S_3$ permutes the indices (1,2,3) corresponding with the three codon positions (Section 5.1). The identity of $S_3$ recovers the direct product $(S_4)_1 \times (S_4)_2 \times (S_4)_3$ from the wreath product. The wreath product is not commutative; for example, (1 ↔ 2)·($A_2$ ↔ $C_2$) ≠ ($A_2$ ↔ $C_2$)·(1 ↔ 2): the first permutation exchanges codon positions 1 ↔ 2 and then induces A ↔ C at the second position, resulting in the exchange A ↔ C at the 2nd codon position, while the second permutation induces A ↔ C at the second position and then exchanges codon positions 1 ↔ 2, resulting in the exchange A ↔ C at the 1st codon position.
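The group orders and the non-commuting example above can be checked by brute force. The following is a minimal sketch, not the author's computation: it assumes the codons are indexed lexicographically (codon xyz has index 16x + 4y + z with A, C, G, U = 0, 1, 2, 3), and the helper names `codon_perm`, `compose`, and `name` are hypothetical.

```python
from itertools import permutations, product

LETTERS = "ACGU"
S4 = list(permutations(range(4)))   # 24 letter permutations, as index tuples
S3 = list(permutations(range(3)))   # 6 permutations of the codon positions
WEIGHTS = (16, 4, 1)                # assumed indexing: codon xyz -> 16*x + 4*y + z


def codon_perm(pos, a, b, c):
    """Permutation of [64] as bytes: apply the letter permutations a, b, c at
    codon positions 1, 2, 3, then move the letter at position i to pos[i]."""
    wa = [WEIGHTS[pos[0]] * a[x] for x in range(4)]
    wb = [WEIGHTS[pos[1]] * b[y] for y in range(4)]
    wc = [WEIGHTS[pos[2]] * c[z] for z in range(4)]
    return bytes(i + j + k for i in wa for j in wb for k in wc)


def compose(first, second):
    """Apply `first`, then `second`."""
    return bytes(second[i] for i in first)


def name(index):
    """Codon string for a codon index."""
    return LETTERS[index // 16] + LETTERS[index // 4 % 4] + LETTERS[index % 4]


IDENTITY = (0, 1, 2, 3)
A_C = (1, 0, 2, 3)                  # the exchange A <-> C

# Enumerate both groups as sets of distinct permutations of the 64 codons.
direct = {codon_perm((0, 1, 2), a, b, c) for a, b, c in product(S4, repeat=3)}
wreath = {codon_perm(p, a, b, c) for p in S3 for a, b, c in product(S4, repeat=3)}
print(len(direct), len(wreath))

# The two orders of composition from the text, followed on the codon AGU.
swap_12 = codon_perm((1, 0, 2), IDENTITY, IDENTITY, IDENTITY)
ac_at_2 = codon_perm((0, 1, 2), IDENTITY, A_C, IDENTITY)
swap_then_ac = compose(swap_12, ac_at_2)
ac_then_swap = compose(ac_at_2, swap_12)
agu = 16 * 0 + 4 * 2 + 3            # index of codon AGU
print(name(swap_then_ac[agu]), name(ac_then_swap[agu]))
```

Under these assumptions the enumeration yields 13,824 and 82,944 distinct permutations, and the codon AGU is sent to GCU by the first composition but to GAU by the second, so the two products differ.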

$S_n$ (Cayley’s theorem), and the order of G divides n!, the order of $S_n$ (by Lagrange’s theorem). In particular, $S_{64}$ acts on the 64 codons as an indexed set [64], and $S_{64}$, of order 64!, is isomorphic to the largest symmetry group of the codon set, as it comprises all bijective [64] → [64] maps. Computation confirms that both the direct product $(S_4)_1 \times (S_4)_2 \times (S_4)_3$ and the wreath product $S_3 \times_{\mathrm{wreath}} (S_4)_1 \times (S_4)_2 \times (S_4)_3$ are subgroups of $S_{64}$ with (very large) integer indices (index = 64!/order of the subgroup), i.e., their orders divide 64! exactly.
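The divisibility statement is easy to reproduce with exact integer arithmetic. This is a small sketch; note that divisibility is only the necessary condition from Lagrange's theorem and does not by itself show that a group of that order is a subgroup.

```python
from math import factorial

ORDER_S64 = factorial(64)
SUBGROUP_ORDERS = {"direct product": 24 ** 3, "wreath product": 6 * 24 ** 3}

indices = {}
for label, order in SUBGROUP_ORDERS.items():
    # Exact integer division: a zero remainder means the order divides 64!.
    index, remainder = divmod(ORDER_S64, order)
    assert remainder == 0
    indices[label] = index
    print(f"{label}: order {order:,}, index in S_64 about {index:.3e}")
```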

## Appendix C. The Symmetry Group of the Tetrahedron

$S_4$ ($S_4$ permutes the four labels of the tetrahedron vertices; $S_4$ is discussed in Appendix B). The A3 group is generated by reflections in the six tetrahedron symmetry planes, the perpendicular bisectors of the tetrahedron edges, and A3 is thus a reflection or Coxeter group [27]. The 24 symmetries are illustrated with the tetrahedron of Figure 9: the vertex numbers are fixed as a reference frame and the vertex labels {A,C,G,U} are moved by the symmetries. The initial labeling (1,2,3,4) = ACGU is changed by the reflection in the $M_{34}$-mirror plane of Figure 9, the bisector of the {3,4} edge, to (1,2,3,4) = acUG (fixed labels are in lowercase): acGU → acUG. The six mirror reflections produce CAgu, GcAu, UcgA, aGCu, aUgC, and acUG. Two consecutive reflections in the same mirror give a 0-degree or identity rotation: acgu. The intersections of mirror planes generate rotation axes, as two consecutive reflections in different mirrors equate to a rotation. Three 180 degree rotation axes run through the centers of opposite tetrahedron edges and opposite faces of the cube of Figure 9. The 180 degree rotations, CAUG, GUAC, and UGCA, are generated by reflections in two orthogonal mirrors, such as the bisectors of {1,2} and {3,4}. The four axes of the 120 and 240 degree rotations, which run from the tetrahedron vertices through the centers of the opposite triangular faces, are diagonals of the cube of Figure 9. For example, the 120 degree counterclockwise rotation around the axis from vertex 1 through the center of the opposite face moves aCGU → aUCG and is generated by reflection in the {3,4}-bisector plane, acGU → acUG, followed by reflection in the {2,3}-bisector, aCUg → aUCg. (The four equilateral triangular faces of the tetrahedron possess the symmetries of the triangle, Appendix B and Figure 9.) The eight rotation products are aUCG, aGUC, GcUA, UcAG, UAgC, CUgA, CGAu, and GACu. Lastly, reflections of the three 180 degree rotation products, CAUG, GUAC, and UGCA, in a mirror plane at a 45 degree angle with the rotation axis produce the remaining six symmetries, the rotation-reflections that cycle all four labels: UACG, CGUA, CUAG, GAUC, UGAC, and GUCA. For example, CAUG followed by a reflection in the {1,3}-bisector plane gives UACG, as does UGCA followed by reflection in the {2,4}-bisector. As shown above, the 24 different A3 symmetries induce permutations of the tetrahedron labels in one-to-one correspondence with the 24 $S_4$ permutations of ACGU. Thus, if two A3-symmetries R1, R2 induce permutations p, q, respectively, then R1 followed by R2 induces the permutation p followed by q: A3 and $S_4$ are isomorphic (group isomorphism is defined in Appendix B).
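The enumeration above can be reproduced by repeatedly applying the six mirror reflections to the labeling ACGU. This is a minimal sketch in which a reflection in the bisector of edge {i, j} is modeled as swapping the labels at vertices i and j; the function names and the classification by cycle type are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import combinations

START = "ACGU"
MIRRORS = list(combinations(range(1, 5), 2))   # the six edges {i, j}


def reflect(labeling, i, j):
    """Reflection in the bisector plane of edge {i, j} (1-based vertices)."""
    chars = list(labeling)
    chars[i - 1], chars[j - 1] = chars[j - 1], chars[i - 1]
    return "".join(chars)


def closure(start):
    """All labelings reachable from `start` by sequences of reflections."""
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for i, j in MIRRORS:
            image = reflect(current, i, j)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen


def cycle_type(labeling):
    """Sorted cycle lengths of the label permutation taking START to labeling."""
    perm = [START.index(ch) for ch in labeling]
    seen, lengths = set(), []
    for v in range(4):
        length = 0
        while v not in seen:
            seen.add(v)
            v = perm[v]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths))


KINDS = {
    (1, 1, 1, 1): "identity",
    (1, 1, 2): "mirror reflection",
    (2, 2): "180 degree rotation",
    (1, 3): "120/240 degree rotation",
    (4,): "rotation-reflection",
}

symmetries = closure(START)
counts = Counter(KINDS[cycle_type(s)] for s in symmetries)
print(len(symmetries), dict(counts))
```

The closure contains 24 labelings: the identity, six mirror reflections, three 180 degree rotations, eight rotations about the vertex axes, and six rotation-reflections.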

## Appendix D. The Polytope Product

^{3}cubes.)

## Appendix E. The CodonPolar Polytope

**,** and the sum of vertices (X + Y + Z) in the 2nd column of Table 2 indicates whether these face vertices are incident on one, two or three of the 3-subspace tetrahedrons; e.g., each vertex of an A-triangle (1 + 1 + 1) is incident on a different tetrahedron, while all vertices of a C-triangle (3 + 0 + 0) are incident on the same tetrahedron. The symmetry group of the Polar is identical to the symmetry group of the CodonPolytope and isomorphic with $S_3 \times_{\mathrm{wreath}} (A3)_1 \times (A3)_2 \times (A3)_3$, with each A3 acting on a different tetrahedron and a different 3-subspace of the 9-space, and the $S_3$ actions permuting the three 3-subspaces and the tetrahedrons and A3-group mirrors contained in them (see Section 4.3 for a more detailed description of this 9-space symmetry group).

## Appendix F. Colorings, Colorings Classes, and Color Counting Formulas

There are 16 (= $2^4$) ways to color the four vertices with two colors; the 16 2-colorings of the tetrahedron are shown in Figure 14. Take one coloring (R, B, B, B), with vertex 1 Red and the other vertices Blue; the symmetries of the tetrahedron move R to any other vertex: (B, R, B, B), (B, B, R, B) or (B, B, B, R) (3rd row of Figure 14), so that all four colorings with one vertex R and three B belong to the same colorings equivalence class 1R3B: all four 1R3B colorings are images of each other under the tetrahedron symmetries, that is, they are equivalent, essentially the same coloring. The class colorings pattern 1R3B corresponds with a code equivalence class: one code word for Red, three code words for Blue. Similarly, all six colorings of two vertices Red and two Blue (2nd row and last two tetrahedrons of the 1st row of Figure 14) are equivalent under the tetrahedron symmetries and, thus, form a single colorings class with colorings pattern 2R2B. The two colorings classes 4R and 4B each consist of a single coloring, since symmetry operations do not color any vertices differently (first two tetrahedrons of the 1st row of Figure 14). In summary: the tetrahedron symmetries partition the 16 2-colorings of the tetrahedron into five colorings classes {4R, 3R1B, 2R2B, 1R3B, 4B}, each having a distinct colorings pattern, and containing, respectively, {1, 4, 6, 4, 1} different 2-colorings.
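The partition into five classes can be verified by computing orbits directly. A minimal sketch, assuming the tetrahedron symmetries act on the four vertices as all 24 permutations (Appendix C); the names `orbit` and `class_sizes` are illustrative.

```python
from itertools import permutations, product

VERTEX_GROUP = list(permutations(range(4)))   # the 24 tetrahedron symmetries
COLORINGS = list(product("RB", repeat=4))     # the 16 two-colorings


def orbit(coloring):
    """All images of a coloring under the vertex symmetries."""
    return frozenset(
        tuple(coloring[g[i]] for i in range(4)) for g in VERTEX_GROUP
    )


classes = {orbit(c) for c in COLORINGS}

# Key each class by its number of Red vertices; the value is the class size.
class_sizes = {sorted(o)[0].count("R"): len(o) for o in classes}
for reds in sorted(class_sizes, reverse=True):
    print(f"{reds}R{4 - reds}B: {class_sizes[reds]} colorings")
```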

$M_{34}$ by the 2-cycle in (1)(2)(34), which indicates that vertices 1 and 2 are fixed but vertices 3 and 4 exchanged positions; the rotation of three vertices $R_{243}$ by the 3-cycle in (1)(243), which indicates that vertex 2 moved to the previous position of vertex 4: 2 → 4, 4 → 3, and 3 → 2; and the permutation of four vertices $P_{1324}$ by the 4-cycle (1324), which reads 1 → 3 → 2 → 4 → 1. Let $C_n^m$ indicate m cycles of length n; then these permutations can be abbreviated as I = $C_1^4$, $M_{34} = C_1^2 C_2$, $R_{243} = C_1 C_3$, and $P_{1324} = C_4$, and the sum of all 24 permutations of $S_4$ (Appendix B and Appendix C) is expressed as the symmetry group’s cycle index: $C_1^4 + 6C_1^2C_2 + 3C_2^2 + 8C_1C_3 + 6C_4$. The cycle index of the CodonPolytope symmetry group is given in Appendix G. A permutation leaves a coloring invariant when all vertices permuted by a cycle have the same color, while different cycles can have different colors. This observation forms the basis for Polya’s formula that counts the number of colorings classes: in the group cycle index, substitute the number of colors for each $C_j$ and divide the total by the group order. For the 2-colorings of the tetrahedron Polya’s formula thus reads: $(2^4 + 6 \times 2^2 \times 2 + 3 \times 2^2 + 8 \times 2 \times 2 + 6 \times 2)/24 = 5$ colorings classes. The number of colorings of four tetrahedron vertices using one, two, three, and four colors equals 1, 16, 81, and 256 (= $n^4$, with n = number of colors); by Polya’s formula these colorings are partitioned into, respectively, 1, 5, 15, and 35 colorings classes, but the corresponding number of Polya-n-onto colorings classes (see above) is only 1, 3, 3, and 1. (Here we illustrate the iterative subtraction of Polya-p-onto counts for p = 1, … , n−1 to arrive at the Polya-n-onto count: to calculate the Polya-2-onto count, subtract from the five colorings classes using 2 colors the two 1-color classes: 5 − 2 = 3; to calculate the Polya-3-onto count, subtract from the 15 classes all three 1-color classes and all nine 2-onto classes (C(3,2) = 3 ways to pick two colors from three colors, and there are three Polya-2-onto classes: 3 × 3 = 9): 15 − 12 = 3; and similarly for the Polya-4-onto count: 35 − 4 × 1 − 6 × 3 − 4 × 3 = 1. The notation C(n,m) indicates the number of ways to choose m out of n, see Appendix A.)
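The substitution and the iterative subtraction can be written out in a few lines. A minimal sketch, in which each term of the tetrahedron cycle index is reduced to its coefficient and its total number of cycles (the only data the count needs); the function names are hypothetical.

```python
from math import comb

# Tetrahedron cycle index as (coefficient, number of cycles) per term:
# C1^4, 6 C1^2 C2, 3 C2^2, 8 C1 C3, 6 C4
CYCLE_INDEX = [(1, 4), (6, 3), (3, 2), (8, 2), (6, 1)]
GROUP_ORDER = 24


def polya(colors):
    """Number of colorings classes using at most `colors` colors."""
    return sum(k * colors ** cycles for k, cycles in CYCLE_INDEX) // GROUP_ORDER


def polya_onto(colors):
    """Classes using all `colors` colors: subtract every class that uses
    only p < colors of them, for each of the C(colors, p) color choices."""
    return polya(colors) - sum(
        comb(colors, p) * polya_onto(p) for p in range(1, colors)
    )


for n in range(1, 5):
    print(n, n ** 4, polya(n), polya_onto(n))
```

Each printed row gives the number of colors, the number of colorings, the Polya count, and the Polya-n-onto count, which should match the values 1, 5, 15, 35 and 1, 3, 3, 1 quoted above.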

$C_n^m$, $C_n$ with a color sum (such as R + B for two colors) with each color raised to the power n (so $3C_2^2$ becomes $3(R^2 + B^2)^2$), then expand and add all terms and divide the result by the group order. For example, the tetrahedron symmetry group of order 24 has cycle index $C_1^4 + 6C_1^2C_2 + 3C_2^2 + 8C_1C_3 + 6C_4$, so that Polya’s formula for two colors R and B equals $[(R + B)^4 + 6(R + B)^2(R^2 + B^2) + 3(R^2 + B^2)^2 + 8(R + B)(R^3 + B^3) + 6(R^4 + B^4)]/24 = R^4 + R^3B + R^2B^2 + RB^3 + B^4$, which corresponds with the colorings classes inventory {4R, 3R1B, 2R2B, 1R3B, 4B} above, with only one colorings class per pattern. For comparison, the colorings inventory for a 2-colored square equals $R^4 + R^3B + 2R^2B^2 + RB^3 + B^4$; it has two $R^2B^2$ classes, corresponding with two adjacent or two diagonally opposed vertices of the same color. These two colorings are not equivalent under the symmetry group of the square.
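The expansion can be carried out mechanically with polynomials in R and B stored as dictionaries. A minimal sketch: the tetrahedron cycle index is the one given above, whereas the cycle index used for the square, $C_1^4 + 2C_1^2C_2 + 3C_2^2 + 2C_4$ with group order 8, is the standard one for the dihedral symmetry group of the square and is an assumption not stated in the text.

```python
from collections import Counter


def multiply(p, q):
    """Product of two polynomials stored as {(power of R, power of B): coeff}."""
    out = Counter()
    for (a, b), x in p.items():
        for (c, d), y in q.items():
            out[(a + c, b + d)] += x * y
    return out


def color_sum(n):
    """R^n + B^n, the substitution for one cycle of length n."""
    return Counter({(n, 0): 1, (0, n): 1})


def inventory(cycle_index, order):
    """Pattern inventory: substitute, expand, add, divide by the group order."""
    total = Counter()
    for coeff, cycle_lengths in cycle_index:
        term = Counter({(0, 0): coeff})
        for length in cycle_lengths:
            term = multiply(term, color_sum(length))
        total.update(term)                       # adds coefficients
    return {powers: value // order for powers, value in total.items()}


# Each term: (coefficient, list of cycle lengths)
TETRAHEDRON = [(1, [1, 1, 1, 1]), (6, [1, 1, 2]), (3, [2, 2]), (8, [1, 3]), (6, [4])]
SQUARE = [(1, [1, 1, 1, 1]), (2, [1, 1, 2]), (3, [2, 2]), (2, [4])]  # assumed

tetra_inventory = inventory(TETRAHEDRON, 24)
square_inventory = inventory(SQUARE, 8)
print(sorted(tetra_inventory.items(), reverse=True))
print(sorted(square_inventory.items(), reverse=True))
```

The key (2, 2) stands for the pattern $R^2B^2$; its coefficient is 1 for the tetrahedron and 2 for the square.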

$S_2$ acting on the colors Red and Blue exchanges the two colors, so that they become equivalent (but not the same: they remain distinguishable). The action of $S_2$ on the colorings inventory of the tetrahedron, {4R, 3R1B, 2R2B, 1R3B, 4B}, renders the two classes 4R and 4B equivalent: they become the single class “all four vertices the same color.” Similarly, the classes 3R1B and 1R3B are equivalent under $S_2$ color symmetry as the class “three vertices one color, one vertex the other color,” while the Polya class 2R2B equals the unique DeBruijn class “two vertices one color, two vertices the other color.” Thus, DeBruijn’s colorings pattern inventory equals {4, (3,1), (2,2)} and contains just three classes (instead of Polya’s five classes) that comprise, respectively, {2, 8, 6} 2-colorings of the tetrahedron (which are easily identified in Figure 14). DeBruijn’s counting identifies just two different codes conveying two messages: one code using two code words for each message, and one code using three code words for one message and one code word for the other. With application to the genetic code, the four GAN codons (represented by the vertices of a tetrahedron face of the CodonPolytope) differ only by the last letter {A, C, G, U}; {GAA, GAG} encode Glu, while {GAC, GAU} encode Asp, and these codes thus correspond to a two-color 2R2B code according to Polya’s formula and a (2,2) code according to DeBruijn’s enumeration. Each code represents a unique Polya or DeBruijn code class that contains all mathematically equivalent codes assigning any two GAN codons to Asp and the other two GAN codons to Glu.

## Appendix G. The Cycle Indices of the Direct and Wreath Product Groups

Direct product $(S_4)_1 \times (S_4)_2 \times (S_4)_3$:

$$C_1^{64} + 18\,C_1^{32}C_2^{16} + 108\,C_1^{16}C_2^{24} + 216\,C_1^{8}C_2^{28} + 657\,C_2^{32} + 24\,C_1^{16}C_3^{16} + 192\,C_1^{4}C_3^{20} + 512\,C_1C_3^{21} + 3096\,C_4^{16} + 288\,C_1^{8}C_2^{4}C_3^{8}C_6^{4} + 1152\,C_1^{2}C_2C_3^{10}C_6^{5} + 864\,C_1^{4}C_2^{6}C_3^{4}C_6^{6} + 1224\,C_2^{8}C_6^{8} + 576\,C_2^{2}C_6^{10} + 3744\,C_4^{4}C_{12}^{4} + 1152\,C_4C_{12}^{5}.$$
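The direct-product cycle index can be recomputed by enumerating all 13,824 group elements and recording the cycle type of each induced permutation of the 64 codons. A minimal sketch, assuming lexicographic codon indexing (codon xyz has index 16x + 4y + z); the function names are hypothetical and this is not necessarily how the author obtained the index.

```python
from collections import Counter
from itertools import permutations, product

S4 = list(permutations(range(4)))


def codon_image(a, b, c):
    """Image of every codon index when a, b, c permute the letters at
    codon positions 1, 2, 3 (codon xyz has index 16x + 4y + z)."""
    return [16 * i + 4 * j + k for i in a for j in b for k in c]


def cycle_type(perm):
    """Cycle type as a sorted tuple of (cycle length, number of cycles)."""
    seen = [False] * len(perm)
    lengths = Counter()
    for start in range(len(perm)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length:
            lengths[length] += 1
    return tuple(sorted(lengths.items()))


cycle_index = Counter(
    cycle_type(codon_image(a, b, c)) for a, b, c in product(S4, repeat=3)
)

for monomial, coefficient in sorted(cycle_index.items()):
    term = " ".join(f"C_{length}^{count}" for length, count in monomial)
    print(coefficient, term)
```

Individual coefficients can then be compared with the printed index; for instance, the key `((2, 32),)` corresponds to the term $C_2^{32}$.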

Wreath product $S_3 \times_{\mathrm{wreath}} (S_4)_1 \times (S_4)_2 \times (S_4)_3$:

$$C_1^{64} + 18\,C_1^{32}C_2^{16} + 180\,C_1^{16}C_2^{24} + 648\,C_1^{8}C_2^{28} + 873\,C_2^{32} + 24\,C_1^{16}C_3^{16} + 1344\,C_1^{4}C_3^{20} + 512\,C_1C_3^{21} + 432\,C_1^{8}C_2^{4}C_4^{12} + 2592\,C_1^{4}C_2^{6}C_4^{12} + 1296\,C_2^{8}C_4^{12} + 9576\,C_4^{16} + 288\,C_1^{8}C_2^{4}C_3^{8}C_6^{4} + 1152\,C_1^{2}C_2C_3^{10}C_6^{5} + 1440\,C_1^{4}C_2^{6}C_3^{4}C_6^{6} + 1224\,C_2^{8}C_6^{8} + 576\,C_1^{4}C_3^{4}C_6^{8} + 4608\,C_1C_3^{5}C_6^{8} + 10368\,C_1^{2}C_2C_3^{2}C_6^{9} + 5760\,C_2^{2}C_6^{10} + 6912\,C_8^{8} + 9216\,C_1C_9^{7} + 3456\,C_1^{2}C_2C_3^{2}C_4^{3}C_6C_{12}^{3} + 5472\,C_4^{4}C_{12}^{4} + 11520\,C_4C_{12}^{5} + 3456\,C_8^{2}C_{24}^{2}.$$
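Two simple consistency checks apply to a cycle index of a group acting on 64 points: the coefficients must add up to the group order, and the cycle lengths in every term must add up to 64. The sketch below transcribes the wreath-product index printed above into a list (the data layout is an illustrative choice), runs both checks, and evaluates Polya's count for 21 colors as described in Appendix F.

```python
WREATH_ORDER = 82944

# (coefficient, {cycle length: number of cycles}) for each printed term
WREATH_INDEX = [
    (1, {1: 64}), (18, {1: 32, 2: 16}), (180, {1: 16, 2: 24}),
    (648, {1: 8, 2: 28}), (873, {2: 32}), (24, {1: 16, 3: 16}),
    (1344, {1: 4, 3: 20}), (512, {1: 1, 3: 21}),
    (432, {1: 8, 2: 4, 4: 12}), (2592, {1: 4, 2: 6, 4: 12}),
    (1296, {2: 8, 4: 12}), (9576, {4: 16}),
    (288, {1: 8, 2: 4, 3: 8, 6: 4}), (1152, {1: 2, 2: 1, 3: 10, 6: 5}),
    (1440, {1: 4, 2: 6, 3: 4, 6: 6}), (1224, {2: 8, 6: 8}),
    (576, {1: 4, 3: 4, 6: 8}), (4608, {1: 1, 3: 5, 6: 8}),
    (10368, {1: 2, 2: 1, 3: 2, 6: 9}), (5760, {2: 2, 6: 10}),
    (6912, {8: 8}), (9216, {1: 1, 9: 7}),
    (3456, {1: 2, 2: 1, 3: 2, 4: 3, 6: 1, 12: 3}),
    (5472, {4: 4, 12: 4}), (11520, {4: 1, 12: 5}), (3456, {8: 2, 24: 2}),
]

coefficient_sum = sum(k for k, _ in WREATH_INDEX)
degrees = {sum(length * count for length, count in mono.items())
           for _, mono in WREATH_INDEX}


def polya_count(colors):
    """Substitute the number of colors for every cycle, divide by the order."""
    total = sum(k * colors ** sum(mono.values()) for k, mono in WREATH_INDEX)
    return total / WREATH_ORDER


print(coefficient_sum, degrees)
print(f"{polya_count(21):.3e}")
```

With 21 colors the result is dominated by the identity term $21^{64}$/82,944 and agrees with the figure of about 5.05 × 10^79 colorings classes quoted in Appendix H.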

## Appendix H. Coloring Computations

The cycle index of $S_3 \times_{\mathrm{wreath}} (S_4)_1 \times (S_4)_2 \times (S_4)_3$ is given in Appendix G. Polya’s count of the colorings classes equals substitution of the number of colors in the cycle index, with division of the result by the group order (Appendix F). The results of the iterative computations of Polya-p-onto colorings, for p = 1, … , 21, are given in Table 3. The number of code classes of [64] → [21] codes equals $1.82 \times 10^{79}$ (Section 7.3), about 36% of all $5.05 \times 10^{79}$ Polya colorings classes of the CodonPolytope using 21 (and fewer) colors.

$1^2 2^9 3^2 4^5 6^3$ of [64] (Section 7.5). The formula requires substitution of these 21 colors in the group cycle index, with expansion of the terms and division of the total by the group order (Appendix F). In this calculation the first term of the cycle index, $C_1^{64}$, evaluates as the multinomial coefficient $M(64, 1^2 2^9 3^2 4^5 6^3) = 64!/(1!^2 \times 2!^9 \times 3!^2 \times 4!^5 \times 6!^3) \approx 2.316 \times 10^{69}$ (Appendix A), and this term dominates all other terms of the cycle index: the 2nd term, $18\,C_1^{32}C_2^{16}$, evaluates to a smaller number than $18 \times 32! \times 16! \le 10^{50}$ (because the denominator evaluates to greater than 1), and the remaining terms are even smaller, so that even the sum of these terms can be neglected in the calculation. The number of code classes of [64] → [21] codes with the same code pattern as the genetic code thus equals $2.79 \times 10^{64} = 2.316 \times 10^{69}/82{,}944$, or $M(64, 1^2 2^9 3^2 4^5 6^3)$/group order (Section 7.4). Similar calculations with Polya’s formula were made for codes with colorings patterns corresponding with binary divisions of [64], such as those related with the codon polytope splitting model for the evolution of the code (Section 6.5): two colors for 32 codons each, corresponding with the set partition $32^2$; four colors for 16 codons each, or $16^4$; and likewise $8^8$, $4^{16}$, and $2^{32}$. The number of code classes using these patterns equals, respectively, $2.21 \times 10^{13}$, $7.98 \times 10^{30}$, $2.19 \times 10^{47}$, $1.26 \times 10^{62}$, and $3.56 \times 10^{74}$.
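The dominant-term approximation described above amounts to a multinomial coefficient divided by the group order. A minimal sketch with hypothetical function names; it evaluates only the leading term $C_1^{64}$ of the cycle index, in line with the argument that the remaining terms are negligible.

```python
from math import factorial

GROUP_ORDER = 82944      # order of the wreath product group


def multinomial(pattern):
    """n! / prod(size!^count) for a set partition {block size: block count}."""
    n = sum(size * count for size, count in pattern.items())
    denominator = 1
    for size, count in pattern.items():
        denominator *= factorial(size) ** count
    return factorial(n) // denominator


def code_classes(pattern):
    """Leading-term estimate of the number of code classes with this pattern."""
    return multinomial(pattern) / GROUP_ORDER


GENETIC_CODE = {1: 2, 2: 9, 3: 2, 4: 5, 6: 3}     # 1^2 2^9 3^2 4^5 6^3
BINARY_SPLITS = [{32: 2}, {16: 4}, {8: 8}, {4: 16}, {2: 32}]

print(f"M = {multinomial(GENETIC_CODE):.3e}")
print(f"genetic code pattern: {code_classes(GENETIC_CODE):.2e}")
for pattern in BINARY_SPLITS:
    print(pattern, f"{code_classes(pattern):.2e}")
```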

## References

1. Knight, R.D.; Freeland, S.J.; Landweber, L.F. Rewiring the keyboard: Evolvability of the genetic code. Nat. Rev. Genet. **2001**, 2, 49–58.
2. Koonin, E.V.; Novozhilov, A.S. Origin and evolution of the genetic code: The universal enigma. IUBMB Life **2009**, 61, 99–111.
3. Atkins, J.F.; Gesteland, R.F.; Cech, R. RNA Worlds; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, USA, 2011.
4. Deamer, D.; Szostak, J.W. The Origins of Life; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, USA, 2010.
5. Woese, C.R. Order in the genetic code. Proc. Natl. Acad. Sci. USA **1965**, 54, 71–75.
6. Crick, F.H.C. The origin of the genetic code. J. Mol. Biol. **1968**, 38, 367–379.
7. Woese, C.R.; Dugre, D.H.; Saxinger, W.C.; Dugre, S.A. On the Fundamental Nature and Evolution of the Genetic Code. Cold Spring Harbor Symp. Quant. Biol. **1966**, 31, 723–736.
8. Stephenson, J.D.; Freeland, S.J. Unearthing the root of amino acid similarity. J. Mol. Evol. **2013**, 77, 159–169.
9. Pretzel, O. Error-Correcting Codes and Finite Fields; Oxford University Press: Oxford, UK, 2000.
10. Hamming, R.W. Error detecting and error correcting codes. Bell Lab. Record. **1950**, 28, 193–198.
11. Thompson, T.M. From Error Correcting Codes through Sphere Packing to Simple Groups; The Mathematical Association of America: Washington, DC, USA, 1983.
12. He, M.X.; Petoukhov, S.V.; Ricci, P.E. Genetic code, Hamming Distance and Stochastic Matrices. Bull. Math. Biol. **2004**, 66, 1405–1421.
13. Sánchez, R.; Morgado, E.; Grau, R.A. Genetic code Boolean structure. I. The meaning of Boolean deductions. Bull. Math. Biol. **2005**, 67, 1–14.
14. Jiménez-Montaño, M.A. The fourfold way of the genetic code. BioSystems **2009**, 98, 105–114.
15. Crowder, T.; Li, C.-K. Studying the Genetic Code by a Matrix Approach. Bull. Math. Biol. **2010**, 72, 953–972.
16. José, M.V.; Morgado, E.R.; Govezensky, T. Genetic Hotels for the Standard Genetic Code: Evolutionary Analysis Based upon Novel Three-Dimensional Algebraic Models. Bull. Math. Biol. **2011**, 73, 1443–1476.
17. Jiménez-Montaño, M.A.; de la Mora-Basáñez, C.R.; Pöschel, T. The hypercube structure of the genetic code explains conservative and non-conservative aminoacid substitutions in vivo and in vitro. BioSystems **1996**, 39, 117–125.
18. Karasev, V.A.; Stefanov, V.E. Topological Nature of the Genetic Code. J. Theor. Biol. **2001**, 209, 303–317.
19. José, M.V.; Morgado, E.R.; Govezensky, T. An Extended RNA Code and its Relationship to the Standard Genetic Code: An Algebraic and Geometrical Approach. Bull. Math. Biol. **2007**, 69, 215–243.
20. Frappat, L.; Sciarrino, A.; Sorba, P. A crystal base for the genetic code. Phys. Lett. A **1998**, 250, 214–221.
21. Antoneli, F.; Forger, M.; Gaviria, P.A.; Hornos, J.E.M. On amino acid and codon assignment in algebraic models for the genetic code. Int. J. Modern Phys. B **2010**, 24, 435–463.
22. Bashford, J.D.; Tsohantjis, I.; Jarvis, P.D. A supersymmetric model for the evolution of the genetic code. Proc. Natl. Acad. Sci. USA **1998**, 95, 987–992.
23. Antoneli, F.; Forger, M. Symmetry breaking in the genetic code: Finite Groups. Math. Comput. Model. **2011**, 53, 1469–1488.
24. Lenstra, R. Evolution of the genetic code through progressive symmetry breaking. J. Theor. Biol. **2014**, 347, 95–108.
25. Mazur, D.R. Combinatorics, A guided Tour; The Mathematical Association of America Inc.: Washington, DC, USA, 2010.
26. Liboff, R.L. Primer for Point and Space Groups; Springer Verlag New York Inc.: New York, NY, USA, 2004.
27. Grove, L.C.; Benson, C.T. Finite Reflection Groups. Graduate Texts in Mathematics 99; Springer Verlag New York Inc.: New York, NY, USA, 1985.
28. Robertson, S.A. Polytopes and Symmetry. London Mathematical Society Lecture Note Series 90; Cambridge University Press: Cambridge, UK, 1984.
29. Ziegler, G.M. Lectures on Polytopes. Graduate Texts in Mathematics 152; Springer Verlag New York Inc.: New York, NY, USA, 2007.
30. Passman, D.S. Permutation Groups; Dover Publications Inc.: Mineola, NY, USA, 2012.
31. Rotman, J.J. An Introduction to the Theory of Groups. Graduate Texts in Mathematics 148; Springer-Verlag New York Inc.: New York, NY, USA, 1995.
32. Gilmore, R. Lie Groups, Physics, and Geometry; Cambridge University Press: Cambridge, UK, 2008.
33. Knight, R.D.; Freeland, S.J.; Landweber, L.F. Selection, history and chemistry: The three faces of the genetic code. Trends Biochem. Sci. **1999**, 24, 241–247.
34. Grosjean, H.; de Crécy-Lagard, V.; Marck, C. Review. Deciphering synonymous codons in the three domains of life: Co-evolution with specific tRNA modification enzymes. FEBS Lett. **2010**, 584, 252–264.
35. Graham, J.H.; Raz, S.; Hel-Or, H.; Nevo, E. Fluctuating asymmetry: Methods, theory and applications. Symmetry **2010**, 2, 466–540.
36. Harris, J.M.; Hirst, J.L.; Mossinghoff, M.J. Combinatorics and Graph Theory; Springer: New York, NY, USA, 2008.
37. Jungck, J.R. The genetic code as a periodic table. J. Mol. Evol. **1978**, 11, 211–224.
38. Lehman, J. Physico-chemical constraints connected with the coding properties of the genetic system. J. Theor. Biol. **2000**, 202, 129–144.
39. Tlusty, T. A colorful origin for the genetic code: Information theory, statistical mechanics and the emergence of molecular codes. Phys. Life Rev. **2010**, 7, 362–376.
40. Dragovich, B.; Dragovich, A. p-Adic modeling of the genome and the genetic code. Comput. J. **2010**, 53, 432–441.
41. shCherbak, V.I. Arithmetic inside the universal genetic code. Biosystems **2003**, 70, 187–209.
42. Jungck, J.R. Genetic codes as codes: Towards a theoretical basis for bioinformatics. In BIOMAT 2008; Mondani, R.P., Ed.; World Scientific Publishing: Singapore, 2009; pp. 300–337.
43. Tlusty, T. A model for the emergence of the genetic code as a transition in a noisy information channel. J. Theor. Biol. **2007**, 249, 331–342.
44. Chechetkin, V.R. Block structure and stability of the genetic code. J. Theor. Biol. **2003**, 222, 177–188.
45. Eigen, M. From Strange Simplicity to Complex Familiarity. A Treatise on Matter, Information, Life and Thought; Oxford University Press: Oxford, UK, 2013.
46. He, P.-A.; Li, D.; Zhang, Y.; Wang, X.; Yao, Y. A 3D graphical representation of protein sequences based on the Gray code. J. Theor. Biol. **2012**, 304, 81–87.
47. José, M.V.; Morgado, E.R.; Guimaraes, R.C.; Zamudio, G.S.; de Farias, S.T.; Bobadilla, J.R.; Sosa, D. Three-dimensional algebraic models of the tRNA code and 12 graphs for representing amino acids. Life **2014**, 4, 341–373.
48. Sánchez, R.; Grau, R.; Morgado, E. A novel Lie algebra of the genetic code over the Galois field of four DNA bases. Math. Biosci. **2006**, 202, 156–174.
49. Trainor, L.E.H.; Rowe, G.W.; Szabo, V.L. A tetrahedral representation of poly-codon sequences and a possible origin of codon degeneracy. J. Theor. Biol. **1984**, 108, 459–468.
50. Jestin, J.-L.; Soulé, C. Symmetries by base substitutions in the genetic code predict 2′ or 3′ aminoacylation of tRNAs. J. Theor. Biol. **2007**, 247, 391–494.
51. Jestin, J.-L. Degeneracy in the genetic code and its symmetries by base substitutions. C. R. Biol. **2006**, 329, 168–171.
52. Danckwerts, H.J.; Neubert, D. Symmetries of genetic code-doublets. J. Mol. Evol. **1975**, 5, 327–332.
53. Findley, G.L.; Findley, A.M.; McGlynn, S.P. Symmetry characteristics of the genetic code. Proc. Nat. Acad. Sci. USA **1982**, 79, 7061–7065.
54. Bertman, M.O.; Jungck, J.R. Group graph of the genetic code. J. Hered. **1979**, 70, 379–384.
55. Massey, S.E. A sequential “2-1-3” model of the genetic code evolution that explains codon constraints. J. Mol. Evol. **2006**, 62, 809–810.
56. Trifonov, E.N. Consensus temporal order of amino acids and evolution of the triplet code. Gene **2000**, 261, 139–151.
57. Higgs, P.G. A four-column theory for the origin of the genetic code: Tracing the evolutionary pathways that gave rise to an optimized code. Biol. Direct **2009**, 4.
58. Di Giulio, M. The coevolution theory of the origin of the genetic code. Phys. Life Rev. **2004**, 1, 128–137.
59. Wong, J.T. Coevolution theory of the genetic code at age thirty. BioEssays **2005**, 27, 416–425.
60. De Pouplana, L.R.; Schimmel, P. Aminoacyl-tRNA synthetases: Potential markers of genetic code development. Trends Biochem. Sci. **2001**, 26, 591–596.
61. Delarue, M. An asymmetric underlying rule in the assignment of codons: Possible clue to a quick early evolution of the genetic code via successive binary choices. RNA **2007**, 13, 161–169.
62. Rodin, S.N.; Rodin, A.S. On the origin of the genetic code: Signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity **2008**, 100, 341–355.
63. Santos, J.; Monteagudo, A. Study of the genetic code adaptability by means of a genetic algorithm. J. Theor. Biol. **2010**, 264, 854–865.
64. Buhrman, H.; van der Gulik, P.T.S.; Kelk, S.M.; Koolen, W.M.; Stougie, L. Some mathematical refinements concerning error minimization in the genetic code. IEEE/ACM Trans. Comput. Biol. BioInform. **2011**, 8, 1358–1372.
65. Bashford, J.D.; Jarvis, P.D. Spectroscopy of the genetic code. In Quantum Aspects of Life; Abbott, D., Davies, P.C.W., Pati, A.K., Eds.; Imperial College Press: London, UK, 2008; pp. 147–186.
66. Freeland, S.J.; Wu, T.; Keulmann, N. The case for an error minimizing standard genetic code. Orig. Life Evol. Biosph. **2003**, 33, 457–477.
67. Frappat, L.; Sciarrino, A.; Sorba, P. Crystalizing the genetic code. J. Biol. Phys. **2001**, 27, 1–34.
68. Sciarrino, A.; Sorba, P. A minimum principle in codon-anticodon interaction. BioSystems **2012**, 107, 113–119.
69. Sciarrino, A.; Sorba, P. Codon-anticodon interaction and the genetic code evolution. BioSystems **2013**, 111, 175–180.
70. Yockey, H.P. Information Theory, Evolution, and the Origin of Life; Cambridge University Press: New York, NY, USA, 2005.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lenstra, R.
The Graph, Geometry and Symmetries of the Genetic Code with Hamming Metric. *Symmetry* **2015**, *7*, 1211-1260.
https://doi.org/10.3390/sym7031211
