# Dihedral Reductions of Cyclic DNA Sequences

## Abstract

The data-analytic methodology of dihedral reductions for cyclic orbits of distinct-base codons is described both in terms of Fourier analysis over the dihedral groups and in (algebraically equivalent) terms of canonical projections. Numerical evaluations are presented for discrete and continuous scalar data indexed by cyclic orbits.

## 1. Introduction

The role of group-theoretic arguments in biology has a long history, of which R.A. Fisher’s classification of segregation genotypes in the theory of polysomic inheritance is a classical example [1]. In the theory of experimental designs, it was also Fisher who demonstrated the explicit usefulness of cyclic groups in the theory of confounding in factorial experiments [2,3], now widely used in biology and genetics studies. In recent decades, the theory and applications of algebraic methods in statistics and probability have become a well-established area of interest, e.g., [4,5].

In structural biology as well, applications of symmetry arguments have been used to formulate working hypotheses and to suggest explanation and prediction, e.g., [6–10]. An explicit connection between symmetry arguments and data analytic reasoning in structural biology can be exemplified by the study [11] of the evolutionary importance of purine and pyrimidine content in the human immunodeficiency virus type 1, based on the statistical assessment of the frequency diversity of cyclic sets, defined as the ratio

More specifically, the present communication aims to show that there is a broader group-theoretic and data-analytic framework within which the methodology described in [11] can be identified and further utilized, potentially leading to richer biological interpretations and explanatory narratives.

The framework of interest (Symmetry Studies) was described originally in [12] and is briefly reviewed in [13]. We will also refer to [14] for notions of Fourier analysis over the finite groups relevant to the present applications. See also [15–18] for related discussions, and [19,20] for applications in the field of linear optics.

This paper is divided as follows. The basic definitions, assumptions and notations are introduced in the next section. The cyclic reductions are discussed in Section 3. Numerical evaluations are presented in Section 4. Additional background material is presented in Appendices A, B, and C.

## 2. Definitions, Assumptions and Notation

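Any DNA sequence of length ℓ base pairs (bp) can be represented as a point in the set ${\mathcal{A}}^{L}$ of all mappings

$$s:L\to \mathcal{A}$$

from the set of positions L = {1, …, ℓ} into the set $\mathcal{A}$ = {A, G, C, T} of bases. Permutations τ in subgroups G of S_{ℓ} act on the left according to

$$(\tau ,s)\mapsto s{\tau}^{-1},\qquad (3)$$

where the inverse τ^{−1} is necessary so its defining property η · (τ · s) = (ητ) · s can be verified. Permutations σ in subgroups H of S_{4} act on the right according to

$$(s,\sigma )\mapsto \sigma s,\qquad (4)$$

whereas pairs (τ, σ) in subgroups G × H of S_{ℓ} × S_{4} act bilaterally according to

$$((\tau ,\sigma ),s)\mapsto \sigma s{\tau}^{-1}.\qquad (5)$$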

In what follows, we will consider the orbit generated by the action of the cyclic group C_{ℓ} on a given sequence $S\in {\mathcal{A}}^{L}$. We will refer, generically, to these sets as cyclic orbits or cyclic sets.

Throughout this communication, sequences written in lower case will always indicate the cyclic orbit generated by the corresponding sequence, to be written in upper case. For example,

It follows directly from (4) that two sequences S and F are complementary if

$$F=\sigma S,\qquad \sigma =(AT)(GC)\in {S}_{4},$$

with σ written in cycle notation, representing the standard DNA complementarity of base pairs. For example, ACT and TGA are complementary sequences, and in that case we also say that their corresponding orbits act and tga are complementary. Specifically,

#### 2.1. Injective Sequences

The dihedral reductions to be considered in this communication are obtained for DNA sequences of length three (codons) composed of distinct bases from {A, G, C, T}, that is, for the injective mappings into $\mathcal{A}$ with domain L = {1, 2, 3}. These sequences account for the 24 distinct injective codons, which factor into 8 distinct cyclic orbits, each containing three codons.
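The count of 24 injective codons and 8 cyclic orbits can be checked by direct enumeration. The following Python sketch is only an illustration; the helper name `cyclic_orbit` is introduced here and does not come from the original text.

```python
from itertools import permutations

BASES = "AGCT"

def cyclic_orbit(seq):
    """Return the set of all cyclic rotations of seq."""
    return frozenset(seq[i:] + seq[:i] for i in range(len(seq)))

# Injective codons: length-three sequences built from three distinct bases.
injective_codons = ["".join(p) for p in permutations(BASES, 3)]

# Collect the distinct cyclic orbits; each orbit is a set of rotations.
orbits = {cyclic_orbit(codon) for codon in injective_codons}

print(len(injective_codons), "injective codons in", len(orbits), "cyclic orbits")
for orbit in sorted(orbits, key=sorted):
    print(sorted(orbit))
```

Each printed orbit contains three codons, for example the orbit act = {ACT, CTA, TAC}.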

Although the group actions (3)–(5) are defined for all mappings in ${\mathcal{A}}^{L}$, the resulting data-analytic applications may need to be adapted when non-injective sequences are included, because the resulting actions may no longer be transitive. In that case, the data analysis is carried out piecewise within the transitive parts [12]. In addition, because the actions on the injective sequences are faithful, any experimental results indexed by the points in the orbit are in one-to-one correspondence with the group elements, and can consequently be indexed by the group elements themselves. It is in the resulting group algebra structure that Fourier transforms can naturally be defined.

#### 2.2. Scalar Measurements

Throughout this paper it will be opportune to distinguish the following types of experimental data:

- Data indexed by sequences, $x:{\mathcal{A}}^{L}\to \mathbb{R}$, indicated by x_{S};
- Data indexed by cyclic orbits, $x:\mathcal{O}\to \mathbb{R}$, indicated by x_{s};
- Data indexed by group elements, $x:G\to \mathbb{R}$, indicated by x_{τ}, x_{σ}, ….

For example, the frequency diversity Equation (1) for act in terms of frequency counts x_{S} over a given region of the genome is given by

#### 2.3. Orbit Invariance

Every symmetry orbit has an intrinsic arbitrariness in the choice of its generating point, so that the resulting orbit is the same regardless of its generator. For example, recalling Equation (6),

Therefore, one would want the corresponding data summaries

A class of data summaries with this (orbit) invariance property, as shown in [14], is given precisely by the Fourier transforms

A class of (faithful) group actions on the cyclic orbits that allows us to identify x_{τ} with x_{s} and evaluate the Fourier transforms (or orbit invariants) will be introduced in the next section.

#### 2.4. Dihedral Orbits

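The dihedral groups D_{n}, for n = 3, 4, …, can be realized as the group of rotations and reversals of a regular polygon with n sides, which gives D_{n} a (non-commutative) group structure of order 2n. In addition, when n = 2, D_{2} ≃ C_{2} × C_{2} is commutative, and it is convenient to include D_{1} = {1, h}, where h^{2} = 1, and D_{0} = {1} into the dihedral class as well.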

The action (3) under G = D_{3} gives four distinct dihedral orbits

Similarly, the action (3) under G = D_{4} gives three distinct dihedral orbits
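The two orbit counts can be reproduced by brute force. The sketch below assumes that the action (3) of D_{n} on an injective sequence of length n amounts to taking all rotations of the sequence together with their reversals; the function names are illustrative only.

```python
from itertools import permutations

BASES = "AGCT"

def dihedral_orbit(seq):
    """All rotations of seq and their reversals (positions permuted by D_n)."""
    n = len(seq)
    rotations = [seq[i:] + seq[:i] for i in range(n)]
    return frozenset(rotations + [r[::-1] for r in rotations])

def dihedral_orbits(length):
    """Distinct dihedral orbits of the injective sequences of a given length."""
    sequences = ("".join(p) for p in permutations(BASES, length))
    return {dihedral_orbit(s) for s in sequences}

d3_orbits = dihedral_orbits(3)  # 24 injective codons, orbits of size 6
d4_orbits = dihedral_orbits(4)  # 24 injective sequences, orbits of size 8

print(len(d3_orbits), "orbits under D3;", len(d4_orbits), "orbits under D4")
```

Under this reading, each D_{3} orbit is the union of a cyclic orbit and the cyclic orbit of its reversal, such as act and tca.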

## 3. Invariant Reductions

In Diagrams (10) and (11), D_{3} rotations and reversals are shown sideways along the rows of the diagrams and complementary orbits are shown along columns, so that each box is labeled by a cyclic orbit. We shall refer to the orbits in each of the diagrams simply as conjugated orbits.

In addition, each orbit is labeled by the **polarity** (⊕, ⊖) of the sequence’s strand and by the encoding sense (→) or anti-sense (←) **direction** with which a gene or protein product reads off the sequence. More specifically, following (10), if any point in the act orbit is labeled with a **positive** polarity and with a reading **sense** direction then:

- The corresponding point in the tca orbit has **positive** polarity and the reading is in the **anti-sense** direction;
- The corresponding point in the tga orbit has **negative** polarity and the reading is in the **sense** direction; and
- The corresponding point in the agt orbit has **negative** polarity and the reading is in the **anti-sense** direction.

Diagram (11) shows the conjugated orbits of gct (namely gct, tcg, cga, and agc), with the same polarity and direction interpretation as in Diagram (10).

Figure 1 shows a configuration space for the conjugated cyclic orbits of Diagrams (10) and (11), relative to which Figure 2 shows, on the left and right images respectively, the common-direction and common-polarity configuration subspaces. In this configuration space (which is not unique), same-direction subspaces span two intersecting tetrahedra, whereas same-polarity subspaces span two parallel faces of the configuration space.

#### 3.1. D_{2}-Invariant Reductions

There is a transitive faithful action of C_{2} × C_{2} ≃ D_{2} on the cyclic orbits s of (10), given by

| τ | σ | act | tca | tga | agt |
|---|---|---|---|---|---|
| 1 | 1 | act | tca | tga | agt |
| 1 | (AT)(GC) | tga | agt | act | tca |
| (13) | 1 | tca | act | agt | tga |
| (13) | (AT)(GC) | agt | tga | tca | act |
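The table can be verified mechanically. The sketch below assumes that τ = (13) reverses a codon and that σ = (AT)(GC) replaces each base by its complement; the names `LABEL_OF`, `ELEMENTS`, and `image` are introduced here for illustration.

```python
def cyclic_orbit(seq):
    """Set of all cyclic rotations of seq."""
    return frozenset(seq[i:] + seq[:i] for i in range(len(seq)))

LABELS = ["act", "tca", "tga", "agt"]
# Map each cyclic orbit (a set of codons) back to its lower-case label.
LABEL_OF = {cyclic_orbit(label.upper()): label for label in LABELS}

COMPLEMENT = str.maketrans("ATGC", "TACG")  # sigma = (AT)(GC)

def reverse(seq):
    """tau = (13): swap positions 1 and 3 of a codon."""
    return seq[::-1]

def complement(seq):
    """Apply the base permutation (AT)(GC) to every position."""
    return seq.translate(COMPLEMENT)

# The four elements (tau, sigma) of C2 x C2, in the row order of the table.
ELEMENTS = {
    ("1", "1"): lambda s: s,
    ("1", "(AT)(GC)"): complement,
    ("(13)", "1"): reverse,
    ("(13)", "(AT)(GC)"): lambda s: complement(reverse(s)),
}

def image(element, label):
    """Label of the orbit reached by applying a group element to an orbit."""
    return LABEL_OF[cyclic_orbit(element(label.upper()))]

table = {name: [image(g, label) for label in LABELS] for name, g in ELEMENTS.items()}

# Transitive: the images of act cover all four orbits.
transitive = {row[0] for row in table.values()} == set(LABELS)
# Faithful: distinct group elements give distinct rows.
faithful = len({tuple(row) for row in table.values()}) == len(ELEMENTS)

for name, row in table.items():
    print(name, row)
print("transitive:", transitive, "faithful:", faithful)
```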

As a consequence, any experimental data x_{s} indexed by the orbits (s) in the diagram can be reduced by the tools of dihedral Fourier analysis over D_{2}. We emphasize that the transitivity and faithfulness of the D_{2} action on the set of orbits are necessary to identify the orbits with the group elements and then proceed to the determination of the (D_{2}) orbit invariants using the Fourier transforms. Following [14], these four one-dimensional transforms are simply
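A minimal numerical sketch of four such transforms follows. It assumes the standard characters of C_{2} × C_{2}, namely χ_{a,b}(t, s) = (−1)^{at+bs}, and identifies each orbit with the group element that carries act to it; the data values, `ELEMENT_OF`, `fourier_d2`, and `relabel` are hypothetical names and numbers used for illustration, not the paper's own notation.

```python
# Hypothetical scalar data indexed by the four conjugated orbits.
x = {"act": 12.0, "tca": 7.0, "tga": 9.0, "agt": 4.0}

# Orbit -> group element (t, s): t = 1 if reversed, s = 1 if complemented.
ELEMENT_OF = {"act": (0, 0), "tca": (1, 0), "tga": (0, 1), "agt": (1, 1)}
ORBIT_OF = {element: label for label, element in ELEMENT_OF.items()}

def fourier_d2(data):
    """One transform per character (a, b): sum of (-1)^(a*t + b*s) * data."""
    return {
        (a, b): sum((-1) ** (a * t + b * s) * data[label]
                    for label, (t, s) in ELEMENT_OF.items())
        for a in (0, 1) for b in (0, 1)
    }

def relabel(data, start):
    """Re-index the data as if the orbit `start` were assigned the identity."""
    t0, s0 = ELEMENT_OF[start]
    return {label: data[ORBIT_OF[((t + t0) % 2, (s + s0) % 2)]]
            for label, (t, s) in ELEMENT_OF.items()}

transforms = fourier_d2(x)
print(transforms)

# Changing the starting point changes each transform by a sign at most,
# so the absolute values do not depend on the choice of generator.
for start in ELEMENT_OF:
    shifted = fourier_d2(relabel(x, start))
    print(start, {k: abs(v) for k, v in shifted.items()})
```

In this indexing, the character (0, 1) contrasts act + tca with tga + agt (polarity), (1, 0) contrasts act + tga with tca + agt (direction), and (1, 1) contrasts act + agt with tca + tga.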

#### 3.2. Entropy Invariants

The orbit invariants determined above are functions of any scalars x_{s} obtained over the orbit s, such as its diversity (7), its raw sum (8), or its total molecular weight. When x_{s} are positive integers, such as the sum of frequency counts over the orbit s, then the entropy (Ent) of the observed distributions ($\mathcal{L}$) of frequency counts given by

#### 3.3. D_{4}-Invariant Reductions

There are three non-equivalent transitive right actions σs of D_{4} on the set

- D_{4} ≃ ⟨(ATGC), (AG)⟩,
- D_{4} ≃ ⟨(AGCT), (AC)⟩,
- D_{4} ≃ ⟨(ACTG), (AT)⟩.

The action of D_{4} ≃ ⟨(ATGC), (AG)⟩ is given by:

Any experimental data x_{s} indexed by ${\mathcal{O}}_{3}$ can then be reduced by dihedral Fourier transforms over D_{4} or by the corresponding canonical projection decompositions. These two views, both leading to the identification of the orbit invariants, are outlined next.
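That each of the three groups has order 8 and moves the act orbit onto every one of the eight cyclic orbits exactly once can be checked computationally. The sketch below assumes that a base permutation acts on a codon letter by letter (the right action σs); all function names are illustrative.

```python
BASES = "AGCT"
LABELS = ["act", "tca", "tga", "agt", "gct", "tcg", "cga", "agc"]

def cyclic_orbit(seq):
    """Set of all cyclic rotations of seq."""
    return frozenset(seq[i:] + seq[:i] for i in range(len(seq)))

LABEL_OF = {cyclic_orbit(label.upper()): label for label in LABELS}

def from_cycles(*cycles):
    """Base permutation, as a dict, from cycles such as 'ATGC' (A->T->G->C->A)."""
    perm = {b: b for b in BASES}
    for cycle in cycles:
        for i, b in enumerate(cycle):
            perm[b] = cycle[(i + 1) % len(cycle)]
    return perm

def compose(p, q):
    """Permutation that applies q first and then p."""
    return {b: p[q[b]] for b in BASES}

def generate(generators):
    """Closure of the generators under composition."""
    identity = {b: b for b in BASES}
    group, frontier = [identity], [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            candidate = compose(h, g)
            if candidate not in group:
                group.append(candidate)
                frontier.append(candidate)
    return group

def image(perm, label):
    """Label of the orbit obtained by permuting the bases of an orbit."""
    codon = "".join(perm[b] for b in label.upper())
    return LABEL_OF[cyclic_orbit(codon)]

GENERATORS = {
    "<(ATGC),(AG)>": (from_cycles("ATGC"), from_cycles("AG")),
    "<(AGCT),(AC)>": (from_cycles("AGCT"), from_cycles("AC")),
    "<(ACTG),(AT)>": (from_cycles("ACTG"), from_cycles("AT")),
}

results = {}
for name, gens in GENERATORS.items():
    group = generate(gens)
    images_of_act = [image(g, "act") for g in group]
    # Order 8 and eight distinct images: the action is transitive and faithful.
    results[name] = (len(group), sorted(images_of_act) == sorted(LABELS))
    print(name, results[name])
```

Because the group order equals the number of orbits and the images of act are all distinct, each orbit corresponds to exactly one group element, which is what allows data indexed by orbits to be re-indexed by D_{4}.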

#### 3.4. Canonical Projections

The linear representation of D_{4} in ${\mathbb{R}}^{8}$ defined by (16) is given by the permutation matrices associated with the rotations

Similarly, the action generated by D_{4} ≃ ⟨(AGCT), (AC)⟩ yields a linear representation of

The action generated by D_{4} ≃ ⟨(ACTG), (AT)⟩ yields a linear representation of

The resulting canonical projections ${\mathcal{P}}_{\xi}$, indexed by the irreducible representations ξ of D_{4}, evaluated for the representation of (17), are shown in Appendix C, along with the components ${x}^{\prime}{\mathcal{P}}_{\xi}x$ of the resulting decomposition

$$\|x\|^{2}=\sum_{\xi}{x}^{\prime}{\mathcal{P}}_{\xi}x.$$

#### 3.4.1. Interpretation of the Components ${x}^{\prime}{\mathcal{P}}_{\xi}x$

The particular representation (17) leads to the following interpretation of each of the non-trivial (orbit invariant) components ${x}^{\prime}{\mathcal{P}}_{\xi}x$ of ‖x‖^{2}, in terms of the combinations of polarity (⊕, ⊖) and direction (→, ←) that label x_{s} (the data indexed by those labels).

The projection ${\mathcal{P}}_{\alpha}$ identifies a one-dimensional invariant comparing the overall mean effects

The projection ${\mathcal{P}}_{\gamma -}$ identifies a one-dimensional invariant contrasting the same variation described above. Lastly, the projection ${\mathcal{P}}_{\beta}$ identifies a two-dimensional invariant assessing direction given polarity effects in terms of

#### 3.5. Dihedral Fourier Analysis

Reading from the column under the act orbit in (16), the points in the group algebra $\mathbb{C}{D}_{4}$ are given by

It is opportune to remark here that in the definition of Equation (21) we arbitrarily assigned the identity in D_{4} to x_{act}. Any of the other potential assignments would be precisely a relabeling of the orbit’s starting point. The Fourier transforms, however, would remain orbit invariant, in the sense of Equation (9).

## 4. Numerical Evaluations

In this section we apply the cyclic reductions described in Section 3 to specific complete genomes of the human immunodeficiency virus type 1 and the hepatitis C virus.

#### 4.1. Relative Entropy Study of the HIV1 BRUCG Isolate

Following Section 3.2, the data indexed by the cyclic orbits are simply the sums x_{s} of the frequency counts x_{S} with which the sequence S occurs in a given region of the genome, that is,

$${x}_{s}=\sum_{S\in s}{x}_{S}.$$

The frequency counts were evaluated by scanning the genome one base at a time in the 5′−3′ direction. The sequence in FASTA format was downloaded from the NCBI website (http://www.ncbi.nlm.nih.gov). Computations were carried out using the Symmetry Computing Toolbox (ⓒ M. Viana). This particular HIV1 isolate, used here for numerical illustration only, also appears in the study of the evolutionary properties of HIV1 [11].

The frequency counts were obtained for the complete genome of human immunodeficiency virus type 1, isolate BRU (LAV-1), sequence ID gi:326417, accession number K02013.1, HIVBRUCG. See [23]. The full 9229 bp-long sequence was partitioned into six equal-length adjacent regions numbered 1–6, where the cyclic summaries x_{s} were evaluated. The frequency counts for the conjugated cyclic orbits of act corresponding to Diagram (10) are shown in (22) and (23), whereas (24) and (25) show the frequency counts for the conjugated cyclic orbits of gct corresponding to Diagram (11).

Figure 3 shows the resulting relative entropy invariants, as defined in Section 3.2, both for the act and for the gct conjugated cyclic orbits.

For example, reading from Region 3, in Equations (24) and (25), we have

**Interpretations:**

Reading again from Diagrams (10) and (11), it follows that the invariant $\text{Ent}\,{\mathcal{L}}_{act+tca}$ describes **polarity** uncertainty; the invariant $\text{Ent}\,{\mathcal{L}}_{act+tga}$ describes **direction** uncertainty, whereas the invariant $\text{Ent}\,{\mathcal{L}}_{act+agt}$ describes their interaction:

- Polarity uncertainty: $\text{Ent}\,{\mathcal{L}}_{act+tca}=\text{Ent}(act+tca,tga+agt)$;
- Direction uncertainty: $\text{Ent}\,{\mathcal{L}}_{act+tga}=\text{Ent}(act+tga,tca+agt)$;
- Interaction: $\text{Ent}\,{\mathcal{L}}_{act+agt}=\text{Ent}(act+agt,tca+tga)$.
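The three invariants listed above, together with the entropy of the joint distribution, can be sketched as follows. The counts are hypothetical, and "relative entropy" is taken here to mean the Shannon entropy divided by its maximum log k for a k-cell distribution, which is an assumption about the normalization rather than a definition taken from the text.

```python
from math import log

def relative_entropy(counts):
    """Shannon entropy of the observed proportions divided by log(k)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in probs) / log(len(counts))

# Hypothetical orbit counts x_s for one region of a genome.
x = {"act": 30, "tca": 22, "tga": 28, "agt": 20}

invariants = {
    "joint": relative_entropy([x["act"], x["tca"], x["tga"], x["agt"]]),
    "polarity": relative_entropy([x["act"] + x["tca"], x["tga"] + x["agt"]]),
    "direction": relative_entropy([x["act"] + x["tga"], x["tca"] + x["agt"]]),
    "interaction": relative_entropy([x["act"] + x["agt"], x["tca"] + x["tga"]]),
}

for name, value in invariants.items():
    print(f"{name}: {value:.4f}")
```

Each pooled distribution groups the four orbits into two pairs, so the three pooled entropies are unchanged when the orbits are relabeled by any element of D_{2}.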

#### 4.2. Statistical Assessment

The statistical assessment of the entropy can be obtained by numerically evaluating its sampling distribution based on 10,000 randomly generated observations from the posterior (Beta) distribution conjugate to the binomial likelihood for the data, relative to the uniform prior probability distribution. Based on the resulting sampling distribution, a numerical evaluation of a posterior 95% credibility interval (CI) for the relative entropy can be obtained.
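A minimal sketch of this simulation, using only the Python standard library, is given below. With a uniform prior and x successes out of n binomial trials, the posterior is Beta(x + 1, n − x + 1). The count x = 90 is hypothetical, n = 208 is the sample size quoted for Figure 4, and the normalization of the entropy by log 2 is an assumption.

```python
import random
from math import log

def binary_relative_entropy(p):
    """Entropy of (p, 1 - p) divided by its maximum, log(2)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log(p) + (1 - p) * log(1 - p)) / log(2)

def credibility_interval(x, n, draws=10_000, level=0.95, seed=0):
    """Equal-tailed posterior interval for the relative entropy."""
    rng = random.Random(seed)
    # Uniform prior Beta(1, 1) with a binomial likelihood gives Beta(x + 1, n - x + 1).
    samples = sorted(
        binary_relative_entropy(rng.betavariate(x + 1, n - x + 1))
        for _ in range(draws)
    )
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    return lower, upper

lower, upper = credibility_interval(x=90, n=208)  # x is a hypothetical count
print(f"95% interval for the relative entropy: ({lower:.4f}, {upper:.4f})")
```

The interval is read off the empirical quantiles of the simulated entropies, so its end points vary slightly with the seed and with the number of draws.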

For example, reading from Region 5, in (24) and (25), we have,

#### 4.3. Orbit Diversity Decomposition for the HIV1 Samples

In this section we apply the canonical decomposition introduced in Section 3.4 and evaluated in Appendix C to reduce the diversity data shown in Equation (1) indexed by the joint set of conjugated orbits (the conjugated orbits of act adjoined to the conjugated orbits of gct), using the D_{4} action defined in Section 3.3. The orbit diversity for the joint set of conjugated orbits is shown in (26) for each sequence in the sample of 10 Brazilian sequences referenced in Appendix B.1.

The inclusion of the error (due to sampling variability) term in the canonical decomposition for the sample is obtained by tensoring the decomposition induced by the representation of interest, shown in Appendix C, with the standard canonical decomposition [12] (Chapter 4)

where I_{n} is the n × n identity matrix and $\mathcal{Q}=I-\mathcal{A}$. The canonical decomposition for the sample is then

Because ${\mathcal{A}}^{\prime}=\mathcal{A}$, ${\mathcal{A}}^{2}=\mathcal{A}$ and $\mathcal{A}e=e$, we have

Similarly,

The degrees of freedom in each case are obtained by the traces of the corresponding projections, which are also equal to the dimension of the projecting (invariant) subspaces. Under suitable parametric assumptions the magnitude of the ratios

Under large-sample parametric assumptions and independent dihedral covariance structure it follows that, with the exception of the contrast associated with γ_{−}, all F-ratios are significantly high (statistically distinct from zero).
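The projection properties and the trace rule for degrees of freedom used in this section can be checked with exact arithmetic. The sketch below assumes that $\mathcal{A}$ is the averaging projection $e{e}^{\prime}/n$, with e the vector of ones; this choice is consistent with the stated properties ${\mathcal{A}}^{\prime}=\mathcal{A}$, ${\mathcal{A}}^{2}=\mathcal{A}$ and $\mathcal{A}e=e$, but it is an assumption here.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

n = 10  # number of sequences in each sample of Sections 4.3 and 4.4

identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
A = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]  # assumed A = ee'/n
Q = [[identity[i][j] - A[i][j] for j in range(n)] for i in range(n)]
e = [[Fraction(1)] for _ in range(n)]  # column vector of ones
zero = [[Fraction(0)] * n for _ in range(n)]

print("A is idempotent:", matmul(A, A) == A)
print("Q is idempotent:", matmul(Q, Q) == Q)
print("A and Q are orthogonal:", matmul(A, Q) == zero)
print("Ae = e:", matmul(A, e) == e)
# Degrees of freedom are the traces of the projections.
print("df(A) =", trace(A), "df(Q) =", trace(Q))
```

Under this assumption the mean term carries one degree of freedom and the error term n − 1, and tensoring with a canonical projection multiplies these by the trace of that projection.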

#### 4.4. Orbit Diversity Decomposition for the HCV Samples

This section replicates the methods described in Section 4.3 for a sample of 10 Brazilian hepatitis C sequences. The orbit diversity for the joint set of conjugated orbits for each sequence in the sample is shown in (28). Their accession numbers are referenced in Appendix A.

The corresponding analysis of variance decomposition is shown in (29).

It should be evident, by comparing the magnitude of the F-ratios,

| Virus | ${\mathcal{P}}_{\alpha}$ | ${\mathcal{P}}_{\gamma +}$ | ${\mathcal{P}}_{\gamma -}$ | ${\mathcal{P}}_{\beta}$ |
|---|---|---|---|---|
| HIV1 | 238.342 | 5.889 | 1.882 | 137.962 |
| HCV | 28.846 | 8.653 | 0.009 | 22.115 |

## 5. Summary

In this communication we constructed a dihedral D_{2} reduction of conjugated injective cyclic orbits of length three, a dihedral D_{4} reduction of their combined set, and a dihedral D_{3} reduction of the set of conjugated injective cyclic orbits of length four. In each case, the experimental scalar data can be any summary obtained over the cyclic orbits, such as the sum or an extreme value of the frequency counts over the cyclic orbit, the entropy of a frequency distribution over the orbit, its amino acid content, or, as in [11], the orbit’s frequency diversity. In the case of matrix data, the data-analytic methods of group rings, instead of group algebras, would then be the appropriate methodology [14].

## Acknowledgments

The author is thankful for the referees’ comments and clarifying suggestions.

## Appendix

#### A. HIV1 and HCV Sequences

The following are the accession numbers for the HIV1 and HCV sequences considered in the present study:

#### B. Additional Studies

#### B.1. Relative Entropy Study of 10 Brazilian HIV1 Sequences

The relative entropy evaluations illustrated above in Section 4.1 were replicated for a sample of 10 Brazilian HIV1 sequences, referenced in Appendix A. The raw frequency counts and the corresponding relative entropy profiles for each of 10 sequences are linked in [24].

#### B.2. Relative Entropy Study of 10 Brazilian HCV Sequences

Similarly to the study for the HIV1, a sample of 10 Brazilian hepatitis C sequences was evaluated for their relative entropy. The sequences are referenced in Appendix A. The raw frequency counts and the corresponding relative entropy profiles along each genome are linked in [25]. The relative entropy invariant profiles clearly highlight the structural differences between the two types of viruses.

#### B.3. Relative Entropy Study of Random Reference Sequences

It is statistically useful to compare the cyclic reductions obtained for the HIV1 isolate described above with those from random DNA sequences of comparable lengths. The results, based on 20 random sequences, shown in [26], clearly indicate that the observed variations in relative entropy (invariants) for the conjugated gct orbits, both for HIV1 and HCV sequences, are well below what one would expect to observe for random sequences of comparable lengths.

#### C. Canonical Projections

The following are the canonical projections ${\mathcal{P}}_{\xi}$ and the corresponding components ${x}^{\prime}{\mathcal{P}}_{\xi}x$ of ‖x‖^{2}, with the entries of x indexed in correspondence with (act, tca, tga, agt, gct, tcg, cga, agc). Here:

## Conflicts of Interest

The author declares no conflict of interest.

## References

- Fisher, R.A. The theory of linkage in polysomic inheritance. Philos. Trans. Roy. Soc. London. Ser. B **1947**, 233, 55–87.
- Fisher, R.A. The theory of confounding in factorial experiments in relation to the theory of groups. Ann. Eugen. **1942**, 11, 341–353.
- Fisher, R.A. A system of confounding for factors with more than two alternatives, giving completely orthogonal cubes and higher powers. Ann. Eugen. **1945**, 12, 283–290.
- Viana, M. Algebraic Methods in Statistics and Probability; Contemporary Mathematics; Richards, D., Ed.; American Mathematical Society: Providence, RI, USA, 2001; Volume 287.
- Viana, M. Algebraic Methods in Statistics and Probability II; Contemporary Mathematics; Wynn, H., Ed.; American Mathematical Society: Providence, RI, USA, 2010; Volume 516.
- Findley, G.L.; Findley, A.M.; McGlynn, S.P. Symmetry characteristics of the genetic code. Proc. Natl. Acad. Sci. USA **1982**, 79, 7061–7065.
- Sergienko, I.V.; Gupal, A.M.; Vagis, A.A. Symmetry in encoding genetic information in DNA. Cybern. Syst. Anal. **2011**, 47, 408–414.
- Hornos, J.E.; Braggion, L.; Magini, M.; Forger, M. Symmetry preservation in the evolution of the genetic code. IUBMB Life **2004**, 56, 125–130.
- Zandi, R.; Reguera, D.; Bruinsma, R.F.; Gelbart, W.M.; Rudnick, J.; Reiss, H. Origin of icosahedral symmetry in viruses. Proc. Natl. Acad. Sci. USA **2004**, 101, 15556–15560.
- Finkel, D.L. HIV-1 ancestry primordial expansions of RRE and RRE-related sequences. J. Theor. Biol. **1992**, 3, 285–302.
- Doi, H. Importance of purine and pyrimidine content of local nucleotide sequences (six bases long) for evolution of human immunodeficiency virus type 1. Proc. Natl. Acad. Sci. USA **1991**, 88, 9282–9286.
- Viana, M. Symmetry Studies, an Introduction to the Analysis of Structured Data in Applications; Cambridge University Press: New York, NY, USA, 2008.
- Souza, D.J.; Chaves, L.M.; Viana, M.A.G. Symmetries in symbolic sequences. Rev. Bras. Biom. **2010**, 1, 73–86.
- Viana, M.; Lakshminarayanan, V. Dihedral Fourier Analysis, Data-Analytic Aspects and Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 2013; Volume 206.
- Viana, M. Canonical invariants for three-candidate preference rankings. Can. Appl. Math. Q. **2007**, 15, 203–222.
- Viana, M. Canonical Decompositions and Invariants for Data Analysis; Elsevier: Amsterdam, The Netherlands, 2009.
- Viana, M. Symmetry studies and decompositions of entropy. Entropy **2006**, 8, 88–109.
- Viana, M. Symmetry orbits and their data-analytic properties. Rev. Mat. **2013**, 20, 155–166.
- Viana, M. Dihedral Polynomials. In Mathematical Optics: Classical, Quantum, and Computational Methods; Lakshminarayanan, V., Calvo, M.L., Alieva, T., Eds.; CRC Press: Boca Raton, FL, USA, 2012.
- Viana, M.; Lakshminarayanan, V. Symmetry studies of refraction data. J. Modern Opt. **2014**, 61, 138–146.
- Viana, M. Symmetry-Related Decompositions of Uncertainty. Proceedings of the XI Brazilian Meeting on Bayesian Statistics, Amparo-SP, Brazil, 18–22 March 2012; Stern, J., Lauretto, M., Polpo, A., Diniz, M., Eds.; American Institute of Physics: Melville, NY, USA, 2012.
- Cartan, E. The Theory of Spinors; MIT Press: Cambridge, MA, USA, 1966.
- Human immunodeficiency virus type 1, isolate BRU, complete genome (LAV-1). Available online: http://www.ncbi.nlm.nih.gov/nuccore/K02013.1 (accessed on 23 December 2014).
- Relative entropy study of 10 Brazilian HIV1 sequences. Available online: https://app.box.com/s/ui9nt6uu6pitc6mxl6cg (accessed on 23 December 2014).
- Relative entropy study of 10 Brazilian HCV sequences. Available online: https://app.box.com/s/zgpv0fkpd1pz9c30ql1q (accessed on 23 December 2014).
- Relative entropy study of random reference sequences. Available online: https://www.box.com/s/g7rguk0dy3ben5w7e93x (accessed on 23 December 2014).

**Figure 3.** The top profiles show (respectively to the legends shown top to bottom) the relative entropy for the joint distributions of (act, tca, tga, agt); (act + tca, tga + agt); (act + tga, tca + agt), and (act + agt, tca + tga). The bottom profiles display the corresponding results for the conjugated orbits of gct.

**Figure 4.** Posterior 95% credibility intervals for the relative entropy of binomial distributions of total sample size n = 208.

© 2015 by the authors; licensee MDPI, Basel, Switzerland This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Viana, M.A.G.
Dihedral Reductions of Cyclic DNA Sequences. *Symmetry* **2015**, *7*, 67-88.
https://doi.org/10.3390/sym7010067
