# Algebraic Morphology of DNA–RNA Transcription and Regulation


## Abstract


## 1. Introduction

Again since the one face, constant in symmetry, appears sometimes fair and sometimes not, can we doubt that beauty is something more than symmetry, that symmetry itself owes its beauty to a remoter principle? [1] (Ennead I, Sixth Tractate, p. 66).

## 2. Theory

#### 2.1. Finitely Generated Groups, Free Groups and Their Conjugacy Classes

#### 2.2. The $SL(2,\mathbb{C})$ Character Variety of a Finitely Generated Group and a Groebner Basis

#### 2.3. Algebraic Geometry and Topology of DNA/RNA Sequences

#### 2.3.1. Two-Base Sequences

#### 2.3.2. Three-Base Sequences

#### 2.3.3. Four-Base Sequences

## 3. Discussion

## 4. Results

#### 4.1. Algebraic Morphology of the Transcription Factor Prdm1

#### 4.1.1. The Character Variety

#### 4.1.2. The Groebner Basis

#### 4.2. Algebraic Morphology of Homeodomains for Nanog and Xvent

**Table 2.** A few (three-base) transcription factors whose group structure departs from that of a free group, or whose Groebner basis of the $SL(2,\mathbb{C})$ character variety contains a (possibly almost) singular surface. The column gene identifies the transcription factor in the Jaspar database [34]; motif is the consensus sequence of the transcription factor; card seq is the cardinality sequence of conjugacy classes of subgroups of the group whose motif is the generator; simple sing identifies a surface with simple singularities within the Groebner basis; the last column gives a reference paper and the corresponding disease. The group ${F}_{2}$ is the free group of rank two. The card seq for ${\pi}_{2}$ is $[1,3,10,51,164,1230,7829,59835,491145,\cdots ]$, close to the card seq of the group $\left(\right)$. The latter group is found to govern the structure of many transcription factors and is associated with the link found in ([13], Figure 2). The card seq for ${\pi}_{3}$ is $[7,14,89,264,1987,11086,93086,\cdots ]$. The surface ${f}_{b}^{\left({A}_{1}\right)}(x,y,z)={x}^{2}+{y}^{2}-6{z}^{2}+4xyz$ (not defined in the text) is part of the character variety for the genes Pitx1, OTX1, etc.

Gene | Motif | Card Seq | Simple Sing | Ref & Disease |
---|---|---|---|---|
Prdm1 | ACTTTC | ${F}_{2}$ | ${S}_{1},{S}_{2}(x,y,z)$ | [34], MA0508.2 lupus, rheumatoid arthritis |
POU6F1 | TAATGAG | ${\pi}_{2}$ | no | MA1549.1 lung adenocarcinoma |
ELK4 | CTTCCGG | . | no, Fricke | MA0076.2 gastric cancer |
OTX2 | GGATTA | ${\pi}_{3}$ | no | [MA0712.2, MA0883.1] medulloblastomas |
N-box | TTCCGG | . | no, Fricke | [37] drug sensitivity |
Pitx1,OTX1,⋯ | TAATCC | . | ${f}_{H}^{\left(4\right)},{f}_{b}^{\left({A}_{1}\right)}(x,y,z)$ | [34], [MA0682.1,MA0711.1] autism, epilepsy, ⋯ |
Nanog | TAATGG | . | ${f}_{H}^{\left(4\right)},{f}_{a}^{\left({A}_{1}\right)}(x,y,z)$ | [35] cancer cells |
Xvent | CTAATT | ${F}_{2}$ | ${f}_{4,\left\{\right\}}^{\left(2{A}_{1}\right)},{f}^{\left({A}_{2}\right)}(x,y,z)$ | [36] |
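
As a worked check of the superscript $\left({A}_{1}\right)$ on the surface ${f}_{b}^{\left({A}_{1}\right)}$ quoted in the caption of Table 2, the short derivation below locates the singular points of the affine surface. It is only a sketch based on the polynomial as printed in the caption; it assumes that the superscript denotes a simple singularity of type ${A}_{1}$ (an ordinary double point) and it says nothing about points at infinity.

```latex
\begin{aligned}
f(x,y,z) &= x^{2} + y^{2} - 6z^{2} + 4xyz, \\
\nabla f &= \left(2x + 4yz,\; 2y + 4xz,\; -12z + 4xy\right), \\
f(0,0,0) &= 0, \qquad \nabla f(0,0,0) = (0,0,0), \\
\operatorname{Hess} f(0,0,0) &= \operatorname{diag}(2,\,2,\,-12), \qquad \det \operatorname{Hess} f(0,0,0) = -48 \neq 0 .
\end{aligned}
```

The Hessian at the origin is nondegenerate, so the origin is an ordinary double point, consistent with type ${A}_{1}$. The remaining solutions of $\nabla f=0$ have $z=\pm \frac{1}{2}$, $x=\mp y$ and ${y}^{2}=-\frac{3}{2}$, where $f=-\frac{3}{2}\ne 0$, so they do not lie on the surface $f=0$.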

#### 4.3. Algebraic Morphology of microRNAs

**Table 3.** A few human (prefix ‘hsa’) microRNAs whose group structure departs from that of a free group, or whose Groebner basis of the $SL(2,\mathbb{C})$ character variety contains a singular surface. The column mir gives the identification in the Mir database [43]; seed is the seed of the miRNA; card seq is the cardinality sequence of conjugacy classes of subgroups of the group whose seed is the generator; sing identifies a singular surface within the Groebner basis; the last column gives a reference paper and the corresponding disease [40]. The card seqs for ${\pi}_{1}$ and ${\pi}_{1}^{\prime}$ are given in ([4], Table 5). The card seq for ${\pi}_{2}^{\prime}$ is $[1,3,7,34,139,931,5208,43867,\cdots ]$. For hsa-mir-124-1-3p, one encounters the Fricke surface ${f}_{2,\left\{\right\}}^{\left({A}_{1}\right)}=xyz+{x}^{2}+{y}^{2}+{z}^{2}-2y$ in the character variety.

| mir | Seed | Card Seq | Simple Sing | Ref & Disease |
|---|---|---|---|---|
| hsa-mir-193b-5p | GGGGUU | ${\pi}_{1}$ | no | [40,43] lung cancer |
| | GGGGUUU | ${\pi}_{1}^{\prime}$ | no | |
| hsa-mir-155-3p | UCCUAC | ${F}_{2}$ | ${f}_{b}^{\left({A}_{1}\right)}(x,y,z)$ | [40,41,43] multiple sclerosis |
| | UCCUACA | ${\pi}_{2}$ | no | |
| hsa-mir-193a-5p | GGGUCUU | ${F}_{2}$ | ${f}_{b}^{\left({A}_{1}\right)}(x,y,z)$ | [40,43] breast cancer |
| hsa-mir-223-5p | GUGUAUU | . | . | . |
| hsa-mir-133-3p | UUGGUC | ${F}_{2}$ | ${f}_{b}^{\left(3{A}_{1}\right)}(x,y,z)$ | [40,43] atrial fibrillation |
| | UUGGUCC | ${\pi}_{2}^{\prime}$ | no | |
| hsa-mir-124-3p | AAGGCA | ${F}_{2}$ | ${f}_{b}^{\left(3{A}_{1}\right)},{f}_{2,\left\{\right\}}^{\left({A}_{1}\right)}$ | [43,44] |
| | AAGGCAC | . | no sing | Alzheimer’s disease |

**Table 4.** The opposite strand of each microRNA considered in Table 3. The seed sequence is made of 4 distinct bases and the corresponding card seq is that of the free group ${F}_{3}$ of rank 3. The Groebner basis contains 4 copies of the generic collection of surfaces ${\kappa}_{4}(x,y,z)$, ${f}^{\left(3{A}_{1}\right)}(x,y,z)$, ${\kappa}_{3}(x,y,z)$, etc., as shown in Figure 5, except for the -5p strand of mir-133, where there are only 3 copies of the generic surfaces.

mir | Seed | Card Seq | Sing | Ref & Disease |
---|---|---|---|---|
hsa-mir-193b-3p | ACUGGCC | ${F}_{3}$ | $4\times $ generic | [40,43] |
hsa-mir-155-5p | UUAAUGCUA | . | . | [40,41,43] |
hsa-mir-193a-3p | ACUGGCC | . | . | [40,43] |
hsa-mir-223-3p | GUCAGUU | . | . | . |
hsa-mir-124-5p | GUGUUCA | . | . | . |
hsa-mir-133-5p | GCUGGUA | . | $3\times $ generic | [43,44] |

## 5. Conclusions

**Figure 1.** **Left**: the Nanog transcription factor (PDB 9ANT). **Right**: the pre-miR-155 secondary structure [16].

**Figure 2.** (**Up**): Complementary base-pairing between miR-155-3p and the human Irak3 (interleukin-1 receptor-associated kinase 3) mRNA ([16], Figure 5). The requisite ‘seed sequence’ base-pairing is denoted by the bold dashes. (**Down**): the surface ${f}_{b}^{\left({A}_{1}\right)}(x,y,z)={x}^{2}+{y}^{2}-6{z}^{2}+4xyz$.

**Figure 3.** (**Left**): the Cayley cubic ${\kappa}_{4}(x,y,z)$. (**Right**): the surface ${f}_{a}^{\left({A}_{1}\right)}(x,y,z)$.

**Figure 4.** The Fricke surface ${V}_{1,1,1,1}(x,y,z)={f}_{a}^{\left(3{A}_{1}\right)}(x,y,z)$ (with three simple singularities of type ${A}_{1}$).

**Figure 5.** (**Up**): Complementary base-pairing between miR-155-5p and the human Spi1 (spleen focus forming virus proviral integration oncogene) ([16], Figure 4). The requisite ‘seed sequence’ base-pairing is denoted by the bold dashes. (**Down (from left to right)**): the surfaces ${f}_{H}^{\left(4\right)}={\kappa}_{4}(x,y,z)$, ${f}^{\left(3{A}_{1}\right)}(x,y,z)$ and ${\kappa}_{3}(x,y,z)$; four copies of them are contained within the Groebner basis for the character variety.

**Figure 7.** (**Left**): the cubic surface ${f}_{4,\left\{\right\}}^{\left(2{A}_{1}\right)}(x,y,z)$. (**Right**): the cubic surface ${f}_{b}^{\left(3{A}_{1}\right)}(x,y,z)$.

**Table 1.** The number of conjugacy classes of subgroups of index $d$ in the free group ${F}_{r}$ of rank $r=1$ to $3$, listed for $d=1,2,3,\cdots $. The last column is the index of the sequence in the On-Line Encyclopedia of Integer Sequences [18].

r | Card Seq | Sequence Code |
---|---|---|
1 | $[1,1,1,1,1,1,1,1,1,\cdots ]$ | A000012 |
2 | $[1,3,7,26,97,624,4163,34470,314493,\cdots ]$ | A057005 |
3 | $[1,7,41,604,13753,504243,24824785,1598346352,\cdots ]$ | A057006 |
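
The card seq entries of Table 1 can be reproduced with a few lines of code. The sketch below is not taken from the paper, which does not state how the table was computed. It assumes two classical ingredients: Hall's recursion for the number of index-$d$ subgroups of ${F}_{r}$, and a Burnside-type counting formula for conjugacy classes, often attributed to Liskovets and Mednykh, in which each index-$m$ subgroup (free of rank $m(r-1)+1$ by Nielsen–Schreier) contributes its number of epimorphisms onto a cyclic group. The function names `subgroup_counts`, `jordan_totient` and `card_seq` are hypothetical.

```python
from math import factorial


def subgroup_counts(r, n_max):
    """M[n] = number of index-n subgroups of the free group F_r (Hall's recursion)."""
    M = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = n * factorial(n) ** (r - 1)
        for k in range(1, n):
            total -= factorial(n - k) ** (r - 1) * M[k]
        M[n] = total
    return M


def jordan_totient(k, n):
    """J_k(n): number of k-tuples generating Z_n, i.e. epimorphisms F_k -> Z_n."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            # Contribution of the prime power p^e: p^(k(e-1)) * (p^k - 1).
            result *= p ** (k * (e - 1)) * (p ** k - 1)
        p += 1
    if n > 1:  # one remaining prime factor with exponent 1
        result *= n ** k - 1
    return result


def card_seq(r, n_max):
    """Conjugacy classes of index-n subgroups of F_r, for n = 1..n_max."""
    M = subgroup_counts(r, n_max)
    seq = []
    for n in range(1, n_max + 1):
        total = 0
        for m in range(1, n + 1):
            if n % m == 0:
                rank = m * (r - 1) + 1  # rank of an index-m subgroup of F_r
                total += M[m] * jordan_totient(rank, n // m)
        seq.append(total // n)  # the sum is always divisible by n
    return seq
```

For instance, `card_seq(2, 7)` returns `[1, 3, 7, 26, 97, 624, 4163]`, the first seven entries of the row $r=2$ of Table 1. A closed counting formula is used here rather than an enumeration of permutation representations because the latter grows factorially with the index.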

