# Group Theory of Syntactical Freedom in DNA Transcription and Genome Decoding

## Abstract

## 1. Introduction

**Sequence motif** An amino-acid sequence pattern that is related to a biological function or a gene. The motif is sometimes called a ‘consensus sequence’.

**DNA-binding domain** A folded protein domain that contains a structural motif that recognizes double- or single-stranded DNA.

**Transcription factor** A sequence-specific DNA-binding factor, or transcription factor, is a protein that controls the rate of transcription of a gene from DNA to messenger RNA by binding to a specific DNA sequence. There are approximately 1600 binding domains in the human genome that function as transcription factors. Classes of DNA-binding domains of transcription factors also exist. The most common are zinc-coordinating DNA-binding domains, helix-loop-helix or helix-turn-helix motifs, basic leucine zipper domains and homeobox domains (playing critical roles in the regulation of development). A classification of human transcription factors and their structural motifs is in References [4,5,6].

**Exon** A part of a gene that encodes a part of the mature RNA produced by that gene after removing all introns (the non-coding regions of RNA transcript) by RNA splicing.

**Promoter** A sequence of DNA in which proteins initiate the transcription of a single RNA from the DNA downstream of it. The TATA box is a sequence found in the core promoter region of some genes in archaea and eukaryotes.

**Zinc finger** A small protein structural motif containing one or more zinc ions in order to stabilize the protein fold.

**Protein isoform** A set of highly similar proteins may originate from a single gene. This process is regulated by the alternative splicing of mRNA. In this process, particular exons of a gene may be included within or excluded from the final, processed messenger RNA (mRNA) produced from that gene. Alternative splicing and the multi-exonic genes are a common feature in eukaryotes.

## 2. Materials and Methods

#### 2.1. Finitely Generated Groups

#### 2.2. Free Groups and Their Conjugacy Classes

#### 2.3. Content of the Paper

## 3. Results

#### 3.1. The TATA Box, the Hecke Groups and More

#### 3.2. Gilbert’s Syndrome

#### 3.3. Single Nucleotide Polymorphism

#### 3.4. A Few DNA/Protein Complexes and Their Transcription Factors

#### 3.4.1. Immediate Early Genes and Their Motifs

#### 3.4.2. The DNA-Binding Domain Fos

#### 3.4.3. The DNA-Binding Domain EGR1

#### 3.4.4. The DNA-Binding Domain Myc

#### 3.5. Genes Whose Transcription Factors Have a Group Structure Away from a Free Group

#### The DNA-Binding Domain of p53

## 4. Discussion

#### 4.1. Aperiodicity of Substitutions

#### 4.1.1. A Two-Letter Sequence for the Transcription Factor of Gene DBX in Drosophila Melanogaster

#### 4.1.2. A Three-Letter Sequence for the Transcription Factor of Gene EGR1

#### 4.1.3. A Four-Letter Sequence for the Transcription Factor of the Fos Gene

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Irwin, K. The code-theoretic axiom; the third ontology. Rep. Adv. Phys. Sci. **2019**, 3, 39.
2. Planat, M.; Aschheim, R.; Amaral, M.M.; Fang, F.; Irwin, K. Quantum information in the protein codes, 3-manifolds and the Kummer surface. Symmetry **2021**, 13, 1146.
3. Planat, M.; Aschheim, R.; Amaral, M.M.; Fang, F.; Irwin, K. Graph coverings for investigating non local structures in protein, music and poems. Sci **2021**, 3, 39.
4. Lambert, S.A.; Jolma, A.; Campitelli, L.F.; Das, P.K.; Yin, Y.; Albu, M.; Chen, X.; Taipale, J.; Hughes, T.R.; Weirauch, M.T. The human transcription factors. Cell **2018**, 172, 650–665. Available online: http://www.edgar-wingender.de/huTF_classification.html (accessed on 1 September 2021).
5. Wingender, E.; Schoeps, T.; Dönitz, J. TFClass: An expandable hierarchical classification of human transcription factors. Nucleic Acids Res. **2013**, 41, D165–D170.
6. Sandelin, A.; Alkema, W.; Engström, P.; Wasserman, W.W.; Lenhard, B. JASPAR: An open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. **2004**, 32, D91–D94. Available online: https://jaspar.genereg.net/ (accessed on 1 September 2021).
7. Planat, M.; Giorgetti, A.; Holweck, F.; Saniga, M. Quantum contextual finite geometries from dessins d’enfants. Int. J. Geom. Meth. Mod. Phys. **2015**, 12, 1550067.
8. Hall, M., Jr. Subgroups of finite index in free groups. Can. J. Math. **1949**, 1, 187–190.
9. Kwak, J.H.; Nedela, R. Graphs and their coverings. Lect. Notes Ser. **2007**, 17, 118.
10. The Modular Group. Available online: https://en.wikipedia.org/wiki/Modular_group (accessed on 1 October 2021).
11. Suzuki, Y.; Tsunoda, T.; Sese, J.; Taira, H.; Mizushima-Sugano, J.; Hata, H.; Ota, T.; Isogai, T.; Tanaka, T.; Nakamura, Y.; et al. Identification and characterization of the potential promoter regions of 1031 kinds of human genes. Genome Res. **2001**, 11, 677–684.
12. TATA Box. Available online: https://en.wikipedia.org/wiki/TATA_box (accessed on 1 September 2021).
13. Wang, Y.; Jensen, R.C.; Stumph, W.E. Role of TATA box sequence and orientation in determining RNA polymerase II/III transcription specificity. Nucleic Acids Res. **1996**, 24, 3100–3106.
14. Li, Y.; Buckley, D.; Wang, S.; Klaassen, C.D.; Zhong, X.B. Phenobarbital-Responsive Enhancer Module of the UGT1A1. Drug Metab. Disp. **2009**, 37, 1978–1986.
15. Chadaeva, I.V.; Ponomarenko, P.M.; Rasskazov, D.A.; Sharypova, E.B.; Kashina, E.V.; Zhechev, D.A.; Drachkova, I.A.; Arkova, O.V.; Savinkova, L.K.; Ponomarenko, M.P.; et al. Candidate SNP markers of reproductive potential are predicted by a significant change in the affinity of TATA-binding protein for human gene promoters. BMC Genom. **2018**, 19, 16–38.
16. Zerbino, D.R.; Wilder, S.P.; Johnson, N.; Juettemann, T.; Flicek, P.R. The Ensembl Regulatory Build. Genome Biol. **2015**, 16, 56.
17. Hodgson, C.D.; Weeks, J.R. Symmetries, isometries and length spectra of closed hyperbolic three-manifolds. Exp. Math. **1994**, 3, 261–274.
18. Gallo, F.T.; Katche, C.; Morici, J.F.; Medina, J.H.; Weisstaub, N.V. Immediate early genes, memory and psychiatric disorders: Focus on c-Fos, Egr1 and Arc. Front. Behav. Neurosci. **2018**, 12, 79.
19. Glover, J.N.; Harrison, S.C. Crystal structure of the heterodimeric bZIP transcription factor c-Fos-c-Jun bound to DNA. Nature **1995**, 373, 257–261.
20. Hashimoto, H.; Olanrewaju, Y.; Zheng, Y.; Wilson, G.G.; Zhang, X.; Cheng, X. Wilms tumor protein recognizes 5-carboxylcytosine within a specific DNA sequence. Genes Dev. **2014**, 28, 2304–2313.
21. Nair, S.K.; Burley, S.K. X-ray structures of Myc-Max and Mad-Max recognizing DNA: Molecular bases of regulation by proto-oncogenic transcription factors. Cell **2003**, 112, 193–205.
22. Zeeman, E.C. Linking spheres. Abh. Math. Sem. Univ. Hamburg **1960**, 24, 149–153.
23. Rolfsen, D. Knots and Links; AMS Chelsea Publishing: Providence, RI, USA, 2000.
24. Schaeffer, L.N.; Huchet-Dymanus, M.; Changeux, J.P. Implication of a multisubunit Ets-related transcription factor in synaptic expression of the nicotinic acetylcholine receptor. EMBO J. **1998**, 17, 3078–3090.
25. Nguyen, T.T.; Grimm, S.A.; Bushel, P.R.; Li, J.; Li, Y.; Bennett, B.D.; Lavender, C.A.; Ward, J.M.; Fargo, D.C.; Anderson, C.W.; et al. Revealing a human p53 universe. Nucl. Acids Res. **2018**, 46, 8153–8167.
26. Nakamichi, N.; Yoneda, Y. Transcription factors and drugs in the brain. Jpn J. Pharmacol. **2002**, 89, 337–348.
27. Chen, Y.; Zhang, X.; Dantas Machado, A.C.; Ding, Y.; Chen, Z.; Qin, P.Z.; Rohs, R.; Chen, L. Structure of p53 binding to the BAX response element reveals DNA unwinding and compression to accommodate base-pair insertion. Nucleic Acids Res. **2013**, 41, 8368–8376.
28. Baake, M.; Grimm, U. Aperiodic Order, Volume I: A Mathematical Invitation; Cambridge University Press: Cambridge, UK, 2013.
29. Planat, M.; Aschheim, R.; Amaral, M.M.; Fang, F.; Irwin, K. Complete quantum information in the DNA genetic code. Symmetry **2020**, 12, 1993.
30. Planat, M.; Chester, D.; Aschheim, R.; Amaral, M.M.; Fang, F.; Irwin, K. Finite groups for the Kummer surface: The genetic code and quantum gravity. Quantum Rep. **2021**, 3, 68–79.
31. Grandy, J.K. The three neurogenetic phases of human consciousness. J. Conscious Evol. **2018**, 9, 24.
32. Changeux, J.P. Allosteric receptors: From electric organ to cognition. Annu. Rev. Pharmacol. **2010**, 50, 1–38.
33. Feinberg, T.E.; Mallatt, J. The evolutionary and genetic origin of consciousness in the Cambrian Period over 500 million years ago. Front. Psychol. **2013**, 4, 667.
34. Amaral, M.M.; Fang, F.; Hammock, D.; Irwin, K. Geometric state sum models from quasicrystals. Foundations **2021**, 1, 155–168.
35. Amaral, M.M.; Fang, F.; Aschheim, R.; Irwin, K. On the Emergence of Space Time and Matter from Model Sets. Preprint **2021**, 2021110359.

**Figure 1.** The DNA-binding domain of the immediate early gene Fos. The name in the protein data bank is 1FOS.

**Figure 2.** (**Left**) Cartoon representation of the Cys$_{2}$His$_{2}$ zinc finger motif, consisting of an $\alpha$-helix and an antiparallel $\beta$-sheet. The zinc ion (green) is coordinated by two histidine residues and two cysteine residues. (**Right**) Cartoon representation of the protein ZNF268 (blue) containing three zinc fingers in complex with DNA (orange). The coordinating amino acid residues and zinc ions (green) are highlighted. The name of the DNA-binding domain in the protein data bank is 4R2A.

**Figure 3.** (**Up**) Crystal structure of Myc and Max in complex with DNA. (**Down**) The link $L=A\cup B$ (which is supposed to control the binding domain Myc) is attached to the plane $R^{2}$ in the half-space $R_{+}^{3}$. It is not splittable. This can be proven by checking that the fundamental group $\pi=\pi_{2}(L)$ is not free [22] and ([23] p. 90). One gets $\pi_{2}=\langle x,y,z \mid (x,(y,z))=z\rangle$, where (.,.) means the group theoretical commutator. The cardinality sequence of conjugacy classes (cc) of subgroups of $\pi_{2}$ is $[1,3,10,51,164,1365,9422,81594,721305,\cdots ]$.

**Figure 4.** Crystal structure of p53 binding domain. The reference number in the protein data bank is 4HJE.

r | d = 1 | d = 2 | d = 3 | d = 4 | d = 5 | d = 6 | d = 7 |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 3 | 7 | 26 | 97 | 624 | 4163 |
3 | 1 | 7 | 41 | 604 | 13,753 | 504,243 | 24,824,785 |
4 | 1 | 15 | 235 | 14,120 | 1,712,845 | 371,515,454 | 127,635,996,839 |
5 | 1 | 31 | 1361 | 334,576 | 207,009,649 | 268,530,771,271 | 644,969,015,852,641 |
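
The caption of the table above did not survive, so the following is a sketch under an assumption: the rows are read as the number of conjugacy classes of index-$d$ subgroups of the free group of rank $r$ (the row $r=2$ matches the ${F}_{2}$ digits set in bold in Table 3). Under that reading, small entries can be checked by brute force, because conjugacy classes of index-$d$ subgroups correspond to transitive $r$-tuples of permutations of $d$ points, counted up to simultaneous conjugation. The function names are illustrative only, and this is not claimed to be the authors' computation.

```python
from itertools import permutations, product


def inverse(s):
    """Return the inverse of a permutation given as a tuple of images."""
    inv = [0] * len(s)
    for point, image in enumerate(s):
        inv[image] = point
    return tuple(inv)


def conjugate(g, s, s_inv):
    """Return s o g o s^-1, i.e. g with its points relabelled by s."""
    return tuple(s[g[s_inv[point]]] for point in range(len(g)))


def is_transitive(gens, d):
    """Check that the group generated by gens has a single orbit on range(d)."""
    seen = {0}
    stack = [0]
    while stack:
        point = stack.pop()
        for g in gens:
            image = g[point]
            if image not in seen:
                seen.add(image)
                stack.append(image)
    return len(seen) == d


def count_subgroup_classes(r, d):
    """Count conjugacy classes of index-d subgroups of the free group F_r.

    Brute force: enumerate all r-tuples in S_d, keep the transitive ones,
    and count orbits under simultaneous conjugation by S_d.
    """
    perms = list(permutations(range(d)))
    relabelings = [(s, inverse(s)) for s in perms]
    seen = set()
    classes = 0
    for gens in product(perms, repeat=r):
        if gens in seen or not is_transitive(gens, d):
            continue
        classes += 1
        # Mark the whole conjugation orbit so it is counted only once.
        for s, s_inv in relabelings:
            seen.add(tuple(conjugate(g, s, s_inv) for g in gens))
    return classes


if __name__ == "__main__":
    for rank in (1, 2, 3):
        print(rank, [count_subgroup_classes(rank, d) for d in range(1, 5)])
```

The enumeration grows like $(d!)^{r}$, so it is only practical for the top-left corner of the table; larger entries need a dedicated low-index-subgroups routine.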

**Table 2.** Group structure of a TATA box. Column 1 is for the selected consensus sequence (rows 4 to 6 are for the TATA box in the core promoter of the UGT1A1 gene). Column 2 is for the cardinality sequence (card seq) of conjugacy classes (cc) of subgroups in the finitely generated group whose relation (rel) is the consensus sequence (cons seq). Column 3 identifies the Hecke group $H_{q}=\langle A,T \mid A^{2}T^{-q}\rangle$, which is close to the group under consideration (based on its card seq of subgroups). Column 4 refers to some references in the literature. Bold digits feature the fit to a Hecke group.

Rel: Cons Seq | Card Seq of cc of Subgroups | Group | Literature |
---|---|---|---|
TATAAAA | $[\mathbf{1},\mathbf{1},\mathbf{2},\mathbf{3},\mathbf{2},\mathbf{8},\mathbf{7},\mathbf{10},\mathbf{18},\mathbf{28},\cdots ]$ | ${H}_{3}$ | [6] (MA0108.1) |
TATAAAAA | $[\mathbf{1},\mathbf{3},\mathbf{2},\mathbf{8},\mathbf{6},\mathbf{19},\mathbf{16},\mathbf{69},\mathbf{83},\mathbf{238},\cdots ]$ | ${H}_{4}$ | [13] |
A(TA)${}_{5}$TAA | $[\mathbf{1},\mathbf{3},\mathbf{3},\mathbf{7},\mathbf{6},\mathbf{34},\mathbf{42},\mathbf{123},\mathbf{319},\mathbf{706},\cdots ]$ | ${H}_{6}$ | [14] |
A(TA)${}_{6}$TAA | $[\mathbf{1},\mathbf{1},\mathbf{1},\mathbf{1},\mathbf{1},\mathbf{1},\mathbf{34},\mathbf{77},\mathbf{79},\mathbf{51},\cdots ]$ | ${H}_{7}$ | . |
A(TA)${}_{7}$TAA | $[\mathbf{1},\mathbf{3},\mathbf{2},\mathbf{8},\mathbf{6},\mathbf{19},\mathbf{16},171,315,1022\cdots ]$ | $\approx {H}_{4}$ | . |
A(TA)${}_{8}$TAA | $[\mathbf{1},\mathbf{1},\mathbf{2},\mathbf{3},\mathbf{2},\mathbf{8},\mathbf{7},\mathbf{10},308,792\cdots ]$ | $\approx {H}_{3}$ | . |
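
To make the caption of Table 2 concrete, here is a minimal sketch of how a card seq could be obtained for a group whose single relation is a consensus sequence. Each distinct letter is taken as a generator, each generator is assigned a permutation of $d$ points, and assignments are kept when the word evaluates to the identity and the action is transitive; classes are counted up to simultaneous conjugation. This brute-force approach and the names `card_seq` and `evaluate` are assumptions made for illustration, not the procedure used to produce the table, and it only reaches the first few indices.

```python
from itertools import permutations, product


def inverse(s):
    """Return the inverse of a permutation given as a tuple of images."""
    inv = [0] * len(s)
    for point, image in enumerate(s):
        inv[image] = point
    return tuple(inv)


def conjugate(g, s, s_inv):
    """Return s o g o s^-1, i.e. g with its points relabelled by s."""
    return tuple(s[g[s_inv[point]]] for point in range(len(g)))


def is_transitive(gens, d):
    """Check that the group generated by gens has a single orbit on range(d)."""
    seen = {0}
    stack = [0]
    while stack:
        point = stack.pop()
        for g in gens:
            if g[point] not in seen:
                seen.add(g[point])
                stack.append(g[point])
    return len(seen) == d


def evaluate(word, images, d):
    """Multiply the permutations assigned to the letters of word, in order."""
    result = tuple(range(d))
    for letter in word:
        g = images[letter]
        result = tuple(g[point] for point in result)
    return result


def card_seq(word, max_index):
    """Count cc of subgroups of index 1..max_index in <letters | word>."""
    letters = sorted(set(word))
    counts = []
    for d in range(1, max_index + 1):
        perms = list(permutations(range(d)))
        identity = tuple(range(d))
        relabelings = [(s, inverse(s)) for s in perms]
        seen = set()
        classes = 0
        for gens in product(perms, repeat=len(letters)):
            if gens in seen:
                continue
            images = dict(zip(letters, gens))
            if evaluate(word, images, d) != identity:
                continue
            if not is_transitive(gens, d):
                continue
            classes += 1
            # Mark the whole conjugation orbit so it is counted only once.
            for s, s_inv in relabelings:
                seen.add(tuple(conjugate(g, s, s_inv) for g in gens))
        counts.append(classes)
    return counts


if __name__ == "__main__":
    print(card_seq("TATAAAA", 4))
    # A(TA)_6 TAA written out as a plain word.
    print(card_seq("A" + "TA" * 6 + "TAA", 3))
```

For TATAAAA, the substitution $u=TA$ turns the relation into $u^{2}A^{3}=1$, which is why the first terms agree with the ${H}_{3}$ row.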

**Table 3.** Group analysis of a few known and candidate SNP markers (taken from [15]). Column 1 is for the selected gene. Column 2 is for the SNP marker. Column 3 is for the card seq of the finitely generated group $fp$ whose relation (rel) is the marker. Column 4 is for the reference paper; the letter indicates the heuristic confidence level of the candidate SNP marker (in alphabetical order from the best (A) to the worst (E)). The computed closeness of the finitely generated group to the free group ${F}_{2}$ correlates, most of the time, with a lower risk of illness, as described in [15]. The symbol * marks the only two-base SNP marker in the table. Its card seq is the same as the sequence for the fundamental group of the 3-manifold $m002$. The latter manifold is the smallest volume closed 3-manifold and is non-orientable [17].

Gene | Rel: Marker | Card Seq of cc of Subgroups | Literature |
---|---|---|---|
ESR2 | TTAAAAGGAA | $[\mathbf{1},7,17,114,423,4526,30364,293306\cdots ]$ | Table 1 in [15], B |
HSD17B1 | AGCCCAGAGC | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},217,124,18443,219870\cdots ]$ | ., A |
. | CAAGCCCAGA | $[\mathbf{1},7,14,109,396,3347,19758,188940\cdots ]$ | ., A |
PGR | AAAGGAGCCG | $[\mathbf{1},7,17,142,475,4125,23509,225871\cdots ]$ | ., A |
GSTM3 | GGGTATAAAG | $[\mathbf{1},7,14,109,396,3347,19758,188940\cdots ]$ | ., E |
. | CCCCTCCCGC | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},\mathbf{97},\mathbf{624},\mathbf{4163},\mathbf{34470}\cdots ]$ | ., C |
. | CCCTCCCGCT | . | ., C |
IL1B | AAAACAGCGA | $[\mathbf{1},7,14,89,224,1842,10191,86701\cdots ]$ | Table 2 in [15], A |
CYP2A6 | AAAGGCAAC | $[\mathbf{1},7,17,134,683,7077,64225\cdots ]$ | ., A |
DHFR | GGGACGAGGG | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},\mathbf{97},\mathbf{624},\mathbf{4163},\mathbf{34470}\cdots ]$ | ., A |
. | GGACGAGGGG | . | ., A |
LEP | GGGGCGGGA | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},\mathbf{97},\mathbf{624},\mathbf{4163},\mathbf{34470}\cdots ]$ | Table 3 in [15], C |
GCG | TGCGCCTTGG | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},119,816,4865,40489\cdots ]$ | ., B |
GH1 | TATAAAAAGG | $[\mathbf{1},7,14,109,396,3347,19758,188940\cdots ]$ | ., E |
. | GTATAAAAAG | . | ., D |
. | GGTATAAAAA | . | ., E |
. | AGGGCCCACA | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},127,860,5661,45710\cdots ]$ | ., A |
. | AAAGGGCCCC | $[\mathbf{1},\mathbf{3},10,67,266,3458,30653,312237\cdots ]$ | ., A |
. | AAAGGGCCA | . | ., A |
NOS2 | TCTTGGCTGC | | Table 4 in [15], A |
TPI1 | ATATAAGTGG | $[\mathbf{1},\mathbf{3},\mathbf{7},30,125,856,4832,40246\cdots ]$ | ., B |
GJA5 | TATTAAACAC | $[\mathbf{1},\mathbf{3},10,35,140,921,5778,47238\cdots ]$ | ., E |
HBD | AAAAGGCAGG | | Table 5 in [15], A |
F2 | AACCCAGAGG | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},127,860,5661,45710\cdots ]$ | ., A |
F8 | GGAAGAGGGA | $[\mathbf{1},\mathbf{3},2,7,4,18,9,27,36,68\cdots ]$ * | Table 6 in [15], A |
F3 | GCGCGGGGCA | | ., A |
F11 | TTTTTAGTAA | . | ., D |
. | TTTTTAGTAA | $[\mathbf{1},7,17,114,423,4526,30364,293306\cdots ]$ | ., A |
. | AAGGAAATTT | $[\mathbf{1},\mathbf{3},\mathbf{7},\mathbf{26},195,1692,11803,73192\cdots ]$ | ., A |
AR | GTGGAAGATT | $[\mathbf{1},\mathbf{3},\mathbf{7},34,139,931,5208,43867\cdots ]$ | Table 7 in [15], A |
. | CCACGACCCG | $[\mathbf{1},7,20,167,754,7232,60860,683597\cdots ]$ | ., D |
MTHFR | TCCCTCCCA | | ., A |
DNMT1 | TGTGTGGCCCG | . | ., A |
. | GTGTGTGCCC | . | ., A |
. | GACGAGCCCA | $[\mathbf{1},\mathbf{3},\mathbf{7},42,131,912,6011,47322\cdots ]$ | ., A |
NR5A1 | ACAAGAGAAA | | ., A |
. | GGTGTGAGAG | $[\mathbf{1},7,14,89,264,1987,11086,93086\cdots ]$ | ., A |

**Table 4.** Group structure of motifs for transcription factors of the immediate early genes Fos, EGR and Myc. Most of the time, the card seq of the group defined by the relation/motif is that of the free group ${F}_{2}$ (for a 3-letter motif) or ${F}_{3}$ (for a 4-letter motif). There are two exceptions for the EGR1 gene, depending on the selected motif, where the card seq corresponds to the modular group ${H}_{3}$ or the Baumslag–Solitar group $BS(-1,1)$, which is the fundamental group of the Klein bottle. The card seq for ${H}_{3}$ is in Table 2. The card seq for $BS(-1,1)$ is $[1,3,2,5,2,7,2,8,3,8,2,13,2,9,4,\cdots ]$.

Gene | Rel: Motif | Card Seq | Literature |
---|---|---|---|
Fos | TGAGTCA | ${F}_{3}$ | [19] |
. | TGACTCA | ${F}_{3}$ | [6], MA0099.2 |
EGR1 | GCGTGGGCG | ${F}_{2}$ | [6], MA0162.1 |
. | CCGCCCCCG | ${H}_{3}$ | ., MA0162.2 |
. | CCGCCCCCGC | $BS(-1,1)$ | ., . |
. | ACGCCCACGCA | ${F}_{2}$ | ., MA0162.3 |
. | GGCCCACGC | . | ., MA0162.4 |
EGR2 | CCGCCCACGC | . | ., MA0472.1 |
. | ACGCCCACGCA | . | ., MA0472.2 |
EGR3, EGR4 | ACGCCCACGCA | . | ., [MA0732.1, MA0733.1] |
Myc | CACGTG | ${F}_{3}$ | [19] |
. | CGCACGTGGT | . | [6], MA0147.1 |
. | CCCACGTGCTT | . | ., MA0147.2 |
. | CCACGTGC | . | ., MA0147.3 |
Mycn, Max::Myc, etc. | GACCACGTGGT, etc. | . | ., [MA0104.1, etc.] |
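
Several of the free-group entries in Table 4 can be anticipated without any subgroup counting. If some letter occurs exactly once in the motif, the relation can be solved for that generator, which is then eliminated; the remaining generators are subject to no relation, so the group is free of rank one less than the number of distinct letters. The sketch below applies this sufficient (not necessary) test; the function name is illustrative and the check is offered as a reading aid, not as the method behind the table.

```python
from collections import Counter


def free_rank_from_singleton(motif):
    """Return the free-group rank implied by a letter that occurs exactly once.

    If a generator appears once in the single relator, it can be eliminated,
    leaving a free group on the other letters. Returns None when no letter
    occurs exactly once, in which case this test is inconclusive.
    """
    counts = Counter(motif)
    if 1 in counts.values():
        return len(counts) - 1
    return None


if __name__ == "__main__":
    for motif in ("TGAGTCA", "GCGTGGGCG", "CACGTG", "CCGCCCCCG", "ACGCCCACGCA"):
        print(motif, free_rank_from_singleton(motif))
```

The test is inconclusive for CCGCCCCCG, which the table assigns to ${H}_{3}$, and also for ACGCCCACGCA, which the table nevertheless lists with ${F}_{2}$; a `None` result therefore says nothing either way.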

**Table 5.** Group structure of motifs for some transcription factors that do not lead to free groups. The card seq for ${\pi}_{1}$ is $[1,4,1,2,4,2,1,7,2,2,4,2,2,8,1,2,7,2,3,\cdots ]$; for ${\pi}_{1}^{\prime}$, it is $[1,1,1,2,1,3,3,1,2,2,1,1,9,2,14,2,1,\cdots ]$. The card seq for ${\pi}_{2}$ is already in Figure 3 as $[1,3,10,51,164,1365,9422,81594,721305,\cdots ]$. The card seq for ${\pi}_{3}$ is $[1,7,14,89,264,1987,11086,93086,\cdots ]$; for ${\pi}_{3}^{\prime}$, it is $[1,7,50,867,15906,570528,\cdots ]$; for ${\pi}_{3}^{\prime \prime}$, it is $[1,7,50,739,15234,548439,\cdots ]$; for ${\pi}_{3}^{\left(3\right)}$, it is $[1,7,41,668,14969,550675]$. The card seq for ${\pi}_{4}$ is $[15,82,1583,30242\cdots ]$. The index $i$ in ${\pi}_{i}$ refers to the rank of the group under examination. The three sections are for motifs on 2, 3 and 4 letters, respectively.

Gene | Rel: Motif | Card Seq | Literature |
---|---|---|---|
NKX6-2 | TAATTAA | ${H}_{3}$ | [6], [MA0675.1, MA0675.2] |
HoxA1, HoxA2 | TAATTA | ${\pi}_{1}$ | [6], [MA1495.1, MA0900.1] |
POU6F1, Vax | . | . | ., [MA0628.1, MA0722.1] |
RUNX1 | TGTGGT | . | ., MA0511.1 |
RUNX1 | TGTGGTT | ${\pi}_{1}^{\prime}$ | [6], MA0002.2 |
EHF | CCTTCCTC | . | ., MA0598.1 |
POU6F1 | TAATGAG | ${\pi}_{2}$ | [6], MA1549.1 |
PITX2 | TAATCCC | . | ., [MA1547.1, MA1547.2] |
ELK4 | CTTCCGG | . | ., MA0076.2 |
OTX2, Dmbx1 | GGATTA | ${\pi}_{3}$ | [6], [MA0712.2, MA0883.1] |
PitX1, PitX2, PitX3, OTX1 | TAATCC | . | ., [MA0682.1, MA0711.1] |
N-box | TTCCGG | . | [24] |
p53 | CACATGTCCA | ${\pi}_{3}^{\prime}$ | [25] |
GZF1 | TGCGCGTCTATA | . | [4] |
NF-kappa-B | GGGAATTTCC | . | [6], [MA0107.1, MA1911.1] |
STAT1 | TTTCCCGGAA | . | ., MA0137.2 |
. | TTCCAGGAA | . | ., MA0137.3 |
STAT4 | TTCCAGGAAA | . | ., MA0518.1 |
FOSL1::Jun | ATGACGTCAT | ${\pi}_{3}^{\prime \prime}$ | [6], MA1129.1 |
USF2 | GTCATGTGACC | . | ., MA0626.1 |
PAX1 | CGTCACGCATGA | . | ., MA0779.1 |
STAT2 | TTCCAGGAAG | . | ., MA0144.1 |
FOS | GATGACGTCATCA | ${\pi}_{3}^{\left(3\right)}$ | [6], MA1951.1 |
MAFA, MAFF, MAFK | TGCTGAGTCAGCA | . | ., [MA1521.1, MA0495.2, MA0946.2] |
CREB | TGACGTCA | ${\pi}_{4}$ | [6], [MA0018.2, MA0018.3] |
USF2 | GGTCACGTGACC | . | ., MA0526.4 |
SMAD3, SMAD5 | GTCTAGAC | . | ., [MA0795.1, MA1557.1], [26] |

**Table 6.** A short account of the function or dysfunction (through mutations or isoforms) of genes associated with transcription factors and sections in Table 5.

Gene | Type | Function | Dysfunction |
---|---|---|---|
NKX6-2 | homeobox | central nervous system, pancreas | spastic ataxia |
HoxA1 | homeobox | embryonic development of face and heart | autism |
HoxA2 | . | . | cleft palate |
Pou6F1 | . | neuroendocrine system | clear cell adenocarcinoma |
Vax | . | forebrain development | craniofacial malformations |
RunX1 | Runt-related | cell differentiation, pain neurons | myeloid leukemia |
EHF | homeobox | epithelial expression | carcinogenesis, asthma |
PitX2 | . | eye, tooth, abdominal organs | Axenfeld–Rieger syndrome |
ELK4 | Ets-related | serum response for c-Fos | |
OTX1, OTX2 | homeobox | brain and sensory organ development | medulloblastomas |
Dmbx1 | . | . | farsightedness and strabismus |
PitX1 | . | organ development, left–right asymmetry | autism, club foot |
PitX3 | . | lens formation in eye | congenital cataracts |
N-box | Ets-related | synaptic expression | drug sensitivity |
p53 | p53 domain | ‘Guardian of the genome’ | cancers |
GZF1 | Zinc fingers | protein coding | short stature, myopia |
NF-kappa-B | . | DNA transcription, cytokines | apoptosis |
STAT1 | Stat family | signal activator of transcription | immunodeficiency 31 |
STAT4 | Stat family | signal activator of transcription | rheumatoid arthritis |
FOSL1::Jun | leucine zipper | cellular proliferation | marker of cancer |
USF2 | helix-loop-helix | transcription activator | |
PAX1 | paired box | fetal development | Klippel–Feil syndrome |
FOS | leucine zipper | cellular proliferation | cancers |
Maf | . | pancreatic development | congenital cerulean cataract |
CREB | bZIP | neuronal plasticity | Alzheimer’s disease |
USF2 | helix-loop-helix | transcription activator | |
SMAD | homeo domain | cell development and growth | Alzheimer’s disease |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Planat, M.; Amaral, M.M.; Fang, F.; Chester, D.; Aschheim, R.; Irwin, K.
Group Theory of Syntactical Freedom in DNA Transcription and Genome Decoding. *Curr. Issues Mol. Biol.* **2022**, *44*, 1417-1433.
https://doi.org/10.3390/cimb44040095
