Dihedral Reductions of Cyclic DNA Sequences

Viana, Marlos A.G.

doi:10.3390/sym7010067


Symmetry Studies Laboratory, University of Illinois at Chicago Eye Center, Chicago, IL 60612, USA

Symmetry 2015, 7(1), 67-88; https://doi.org/10.3390/sym7010067

Submission received: 26 January 2014 / Accepted: 18 December 2014 / Published: 16 January 2015


Abstract

The data-analytic methodology of dihedral reductions for cyclic orbits of distinct-base codons is described both in terms of Fourier analysis over the dihedral groups and in (algebraically equivalent) terms of canonical projections. Numerical evaluations are presented for discrete and continuous scalar data indexed by cyclic orbits.

Keywords:

Fourier analysis; group representations; dihedral groups; cyclic orbits; canonical projections; nucleotides; group orbits; invariance

1. Introduction

The role of group-theoretic arguments in biology has a long history, in which R.A. Fisher’s classification of segregation genotypes in the theory of polysomic inheritance is a classical example [1]. In the theory of experimental designs, it was also Fisher who demonstrated the explicit usefulness of cyclic groups in the theory of confounding in factorial experiments [2,3], now widely used in biology and genetics studies. In recent decades, the theory and applications of algebraic methods in statistics and probability have become a well-established area of interest, e.g., [4,5].

In structural biology as well, symmetry arguments have been used to formulate working hypotheses and to suggest explanations and predictions, e.g., [6–10]. An explicit connection between symmetry arguments and data-analytic reasoning in structural biology is exemplified by the study [11] of the evolutionary importance of purine and pyrimidine content in the human immunodeficiency virus type 1, based on the statistical assessment of the frequency diversity of cyclic sets, defined as the ratio

x_{O} = \frac{\max_{O} f}{\min_{O} f}

(1)

of extreme (max/min) frequency counts $f$ in the cyclic set $O$, evaluated over a given region of the genome. In the context of symmetry studies, the frequency diversity is only one of many possible data summaries indexed by cyclic sets or orbits. In the present communication, the frequency diversity, as well as cyclic summaries such as the raw sum

x_{O} = \sum_{O} f

(2)

of frequency counts along the orbit, will be shown to share similar algebraic and data-analytic structures.

More specifically, the present communication aims to show that there is a broader group-theoretic and data-analytic framework within which the methodology described in [11] can be identified and further utilized, potentially leading to richer biological interpretations and explanatory narratives.

The framework of interest (Symmetry Studies) was described originally in [12] and is briefly reviewed in [13]. We will also refer to [14] for notions of Fourier analysis over the finite groups relevant to the present applications. See also [15–18] for related discussions, and [19,20] for applications in the field of linear optics.

This paper is organized as follows. The basic definitions, assumptions and notation are introduced in the next section. The cyclic reductions are discussed in Section 3, and numerical evaluations are presented in Section 4. Additional background material is given in Appendices A, B, and C.

2. Definitions, Assumptions and Notation

Any DNA sequence of length ℓ base pairs (bp) can be represented as a point in the set $\mathcal{A}^{\mathcal{L}}$ of all mappings

S : \mathcal{L} = \{1, \dots, \ell\} \to \mathcal{A} = \{A, G, C, T\},

where, typically, permutations τ in subgroups G of the full symmetric group S_ℓ act on the left according to

S τ^{- 1}

(3)

The inverse of the group element in the action τ · S = Sτ⁻¹ is needed so that the defining property η · (τ · S) = (ητ) · S holds, since (Sτ⁻¹)η⁻¹ = S(ητ)⁻¹. Permutations σ in subgroups H of S₄ act on the right according to

σ S,

(4)

and subgroups G × H of the direct product group S_ℓ × S₄ act bilaterally according to

σ S τ^{- 1} .

(5)

In what follows, we will indicate by

s = \{S \gamma^{-1};\ \gamma \in C_{\ell}\}

(6)

the orbit generated by the left action of the cyclic group C_ℓ on a given sequence $S \in \mathcal{A}^{\mathcal{L}}$. We will refer generically to these sets as cyclic orbits or cyclic sets.

Throughout this communication, sequences written in lower case will always indicate the cyclic orbit generated by the corresponding sequence, to be written in upper case. For example,

act = \{ACT, CTA, TAC\}, \quad agct = \{AGCT, GCTA, CTAG, TAGC\}.

Following (4), two sequences S and F are complementary if

F = σ S,

with

σ = (AT) (GC)

the permutation in S₄, written in cycle notation, representing the standard DNA complementarity of base pairs. For example, ACT and TGA are complementary sequences, and in that case we also say that their corresponding orbits act and tga are complementary. Specifically,

F = \sigma S \Rightarrow f = \{F \gamma^{-1};\ \gamma \in C_{\ell}\} = \{\sigma S \gamma^{-1};\ \gamma \in C_{\ell}\} = \sigma \{S \gamma^{-1};\ \gamma \in C_{\ell}\} = \sigma s.

2.1. Injective Sequences

The dihedral reductions considered in this communication are obtained for DNA sequences of length three (codons) composed of distinct bases from {A, G, C, T}, that is, for the injective mappings into $\mathcal{A}$ with domain $\mathcal{L} = \{1, 2, 3\}$. There are 4 · 3 · 2 = 24 such injective codons, and because no nontrivial rotation fixes an injective codon, they factor into 24/3 = 8 distinct cyclic orbits of length three.

Although the group actions (3)–(5) are defined for all mappings in $\mathcal{A}^{\mathcal{L}}$, the resulting data-analytic applications may need to be adapted when non-injective sequences are included, because the resulting actions may no longer be transitive. In that case, the data analysis is carried out piecewise within the transitive parts [12]. In addition, because the actions on the injective sequences are faithful, any experimental results indexed by the points in the orbit are in one-to-one correspondence with the group elements, and can consequently be indexed by the group elements themselves. It is in the resulting group algebra structure that Fourier transforms can naturally be defined.
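As a quick check of these counts, the following Python sketch (a minimal illustration, not the Symmetry Computing Toolbox used later) enumerates the injective sequences of a given length and groups them into cyclic orbits. The helper names `rotations` and `cyclic_orbits` are hypothetical.

```python
from itertools import permutations

BASES = "AGCT"


def rotations(seq):
    """All cyclic shifts of seq, i.e. its orbit under the cyclic group C_len."""
    return {seq[k:] + seq[:k] for k in range(len(seq))}


def cyclic_orbits(length, bases=BASES):
    """Injective sequences of the given length and their cyclic orbits.

    An injective sequence uses each base at most once; each orbit is
    returned as a frozenset of upper-case sequences.
    """
    seqs = {"".join(p) for p in permutations(bases, length)}
    orbits = {frozenset(rotations(s)) for s in seqs}
    return seqs, orbits


codons, orbits3 = cyclic_orbits(3)
print(len(codons), len(orbits3))  # 24 8
for orbit in sorted(orbits3, key=min):
    print(sorted(orbit))
```

The same function applied to length four gives the 24 injective sequences of length four in 6 cyclic orbits, the setting of Section 2.4.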

2.2. Scalar Measurements

Throughout this paper it will be useful to distinguish the following types of experimental data:

Data indexed by sequences, $x : \mathcal{A}^{\mathcal{L}} \to \mathbb{R}$, indicated by x_S;
Data indexed by cyclic orbits, $x : O \to \mathbb{R}$, indicated by x_s;
Data indexed by group elements, $x : G \to \mathbb{R}$, indicated by x_τ, x_σ, ….

For example, the frequency diversity Equation (1) for act in terms of frequency counts x_S over a given region of the genome is given by

x_{act} = \frac{\max \{x_{ACT}, x_{CTA}, x_{TAC}\}}{\min \{x_{ACT}, x_{CTA}, x_{TAC}\}},

(7)

whereas the raw sum Equation (2) for the same orbit is

x_{a c t} = x_{ACT} + x_{CTA} + x_{TAC} .

(8)
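The following minimal Python sketch evaluates Equations (7) and (8) for a single orbit. The counts are hypothetical and the helper names are illustrative; listing the orbit from each of its three generators gives the same two values, anticipating the invariance requirement discussed next.

```python
# Hypothetical frequency counts x_S for the codons of the orbit act;
# the numbers are invented for illustration, not taken from a genome.
counts = {"ACT": 12, "CTA": 7, "TAC": 9}


def frequency_diversity(counts, orbit):
    """Equation (7): ratio of the largest to the smallest count in the orbit.

    Raises ZeroDivisionError if some codon of the orbit was never observed.
    """
    values = [counts[s] for s in orbit]
    return max(values) / min(values)


def raw_sum(counts, orbit):
    """Equation (8): total count over the orbit."""
    return sum(counts[s] for s in orbit)


# The same orbit listed from its three possible generators: act, cta, tac.
listings = [("ACT", "CTA", "TAC"), ("CTA", "TAC", "ACT"), ("TAC", "ACT", "CTA")]
for orbit in listings:
    print(orbit[0], frequency_diversity(counts, orbit), raw_sum(counts, orbit))
```

Note that the diversity is undefined when a codon of the orbit has a zero count, a case that real data summaries need to handle explicitly.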

2.3. Orbit Invariance

Every symmetry orbit has an intrinsic arbitrariness in the choice of its generating point, so that the resulting orbit is the same regardless of its generator. For example, recalling Equation (6),

a c t = c t a = t a c .

Therefore, one would want the corresponding data summaries

x_{a c t}, x_{c t a}, x_{t a c}

to be stable, or invariant, under different choices of orbit generators. Equations (7) and (8) are both orbit invariants, since the maximum, the minimum and the sum of the counts do not depend on which codon of the orbit is listed first. This is a universal requirement that applies to all summaries obtained from data indexed by symmetry orbits.

A class of data summaries with this (orbit) invariance property, as shown in [14], is given precisely by the Fourier transforms

\langle x, \xi \rangle = \sum_{\tau \in G} x_{\tau} \xi_{\tau},

evaluated at the (irreducible) representations ξ of G. The invariance property says that, regardless of the different orbit relabelings τx, their Fourier transforms $\langle \tau x, \xi \rangle$ stay bound to certain well-defined (irreducible representation) subspaces of the original data module or vector space [14]. That is,

\langle \tau x, \xi \rangle = \xi_{\tau} \langle x, \xi \rangle,

(9)

so that the transforms reduce as the corresponding irreducible characters.
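To make Equation (9) concrete, here is a minimal Python sketch for the simplest case, the cyclic group C_n, whose irreducible representations are the one-dimensional characters ξ_m(r^k) = exp(2πimk/n). It assumes data x indexed by the group elements 1, r, …, r^{n−1} and takes the relabeling τx, for τ = r^j, to be (τx)_η = x_{τ⁻¹η}; these are illustrative choices rather than notation taken from [14].

```python
import cmath


def character(m, n):
    """Irreducible character xi_m of C_n, evaluated at r^k."""
    return lambda k: cmath.exp(2j * cmath.pi * m * k / n)


def fourier(x, xi):
    """<x, xi> = sum over k of x_k * xi(r^k)."""
    return sum(x_k * xi(k) for k, x_k in enumerate(x))


def relabel(x, j):
    """Relabeling by tau = r^j: (tau x)_k = x_{k - j mod n}."""
    n = len(x)
    return [x[(k - j) % n] for k in range(n)]


# Hypothetical data indexed by 1, r, r^2 (e.g. counts of ACT, CTA, TAC).
x = [12.0, 7.0, 9.0]
n = len(x)
for m in range(n):
    xi = character(m, n)
    for j in range(n):
        lhs = fourier(relabel(x, j), xi)
        rhs = xi(j) * fourier(x, xi)
        assert abs(lhs - rhs) < 1e-9, (m, j)
print("Equation (9) holds for every character and every relabeling")
```

Because |ξ_τ| = 1 here, the moduli |⟨x, ξ⟩| do not depend on the chosen starting point of the orbit.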

A class of (faithful) group actions on the cyclic orbits that allows us to identify x_τ with x_s and evaluate the Fourier transforms (or orbit invariants) will be introduced in the next section.

2.4. Dihedral Orbits

The dihedral groups D_n, for n = 3, 4, …, can be realized as the group

C_{n} = \{1, r, r^{2}, \dots, r^{n-1}\}

of rotations of a regular n-sided polygon, adjoined with the corresponding reversals

C_{n} h = \{h, rh, r^{2}h, \dots, r^{n-1}h\},

or h-mirrored rotations, giving D_n a (non-commutative) group structure of order 2n, with r^n = h² = 1 and hrh = r⁻¹. In addition, when n = 2,

D_{2} = \{1, r, h, rh\} \simeq C_{2} \times C_{2},

also known as the Klein four-group, is often considered one of the dihedral groups. This commutative group, when realized as the symmetries of a non-square rectangle, consists of the vertical and horizontal line reflections and their composition, the 180° rotation. For completeness, we also include D₁ = {1, h}, where h² = 1, and D₀ = {1} in the dihedral class.
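As a sanity check of these defining relations, the Python sketch below realizes D_n, for n ≥ 3, as permutations of the vertices 0, …, n − 1 of a regular polygon and verifies the order 2n, h² = 1, hrh = r⁻¹ and non-commutativity. The vertex labelling and the choice of the reflection fixing vertex 0 are assumptions made only for illustration.

```python
def compose(p, q):
    """Composition p∘q of permutations stored as tuples: (p∘q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def dihedral(n):
    """D_n as permutations of the vertices 0, ..., n-1 of a regular n-gon."""
    identity = tuple(range(n))
    r = tuple((i + 1) % n for i in range(n))  # rotation by 360/n degrees
    h = tuple((-i) % n for i in range(n))     # reflection fixing vertex 0
    rotations = [identity]
    for _ in range(n - 1):
        rotations.append(compose(r, rotations[-1]))
    reversals = [compose(rot, h) for rot in rotations]  # the coset C_n h
    return identity, r, h, rotations, reversals


for n in range(3, 9):
    e, r, h, rots, revs = dihedral(n)
    assert len(set(rots) | set(revs)) == 2 * n       # order 2n
    assert compose(h, h) == e                        # h^2 = 1
    assert compose(compose(h, r), h) == rots[-1]     # h r h = r^(n-1) = r^-1
    assert compose(r, h) != compose(h, r)            # non-commutative
print("D_n relations verified for n = 3, ..., 8")
```

For n = 2 this vertex realization is not faithful (the reflection fixing vertex 0 also fixes vertex 1), which is why D₂ is realized above through the symmetries of a rectangle.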

The action (3) under G = D₃ gives four distinct dihedral orbits

act \cup tca, \quad tga \cup agt, \quad gct \cup tcg, \quad cga \cup agc,

where

\begin{matrix} act = \{ACT, CTA, TAC\}, & tca = \{TCA, CAT, ATC\}, \\ tga = \{TGA, GAT, ATG\}, & agt = \{AGT, GTA, TAG\}, \\ gct = \{GCT, CTG, TGC\}, & tcg = \{TCG, CGT, GTC\}, \\ cga = \{CGA, GAC, ACG\}, & agc = \{AGC, GCA, CAG\} \end{matrix}

are the cyclic orbit components, accounting for the 24 distinct injective codons.

Similarly, the action (3) under G = D₄ gives three distinct dihedral orbits

actg \cup gtca, \quad agct \cup tcga, \quad atgc \cup cgta,

where

\begin{matrix} actg = \{ACTG, CTGA, TGAC, GACT\}, & gtca = \{GTCA, TCAG, CAGT, AGTC\}, \\ agct = \{AGCT, GCTA, CTAG, TAGC\}, & tcga = \{TCGA, CGAT, GATC, ATCG\}, \\ atgc = \{ATGC, TGCA, GCAT, CATG\}, & cgta = \{CGTA, GTAC, TACG, ACGT\} \end{matrix}

are the cyclic orbit components, accounting for the 24 distinct injective sequences of length four.
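The orbit counts above can be reproduced with a short Python sketch, under the assumption that the action (3) of D_n on positions amounts to cyclic shifts together with reversal of the sequence (the polygon reflections). The function names are hypothetical.

```python
from itertools import permutations


def dihedral_orbit(seq):
    """Orbit of seq under D_n acting on positions (rotations and reversals)."""
    rots = {seq[k:] + seq[:k] for k in range(len(seq))}
    return frozenset(rots | {s[::-1] for s in rots})


def dihedral_orbits(length, bases="AGCT"):
    """Partition the injective sequences of the given length into D_n orbits."""
    return {dihedral_orbit("".join(p)) for p in permutations(bases, length)}


d3 = dihedral_orbits(3)
d4 = dihedral_orbits(4)
print(len(d3), sorted(len(o) for o in d3))  # 4 [6, 6, 6, 6]
print(len(d4), sorted(len(o) for o in d4))  # 3 [8, 8, 8]
```

Each D₃ orbit is the union of a cyclic orbit and its reversal, e.g. act ∪ tca, as listed above.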

3. Invariant Reductions

In Diagrams (10) and (11), D₃ rotations and reversals are shown along the rows and complementary orbits along the columns, so that each box is labeled by a cyclic orbit. We shall refer to the orbits in each diagram simply as conjugated orbits.

In addition, each orbit is labeled by the polarity (⊕, ⊖) of the sequence’s strand and by the encoding sense (→) or anti-sense (←) direction in which a gene or protein product reads off the sequence. More specifically, following (10), if any point in the act orbit is labeled with positive polarity and a sense reading direction, then:

The corresponding point in the tca orbit has positive polarity and the reading is in the anti-sense direction;
The corresponding point in the tga orbit has negative polarity and the reading is in the sense direction; and
The corresponding point in the agt orbit has negative polarity and the reading is in the anti-sense direction.

(10)

Diagram (11) shows the complementary orbits gct and agc, with the same polarity and direction interpretation as in Diagram (10).

(11)

Figure 1 shows a configuration space for the conjugated cyclic orbits of Diagrams (10) and (11); relative to it, Figure 2 shows, in the left and right images respectively, the common-direction and common-polarity configuration subspaces. In this configuration space (which is not unique), same-direction subspaces span two intersecting tetrahedra, whereas same-polarity subspaces span two parallel faces of the configuration space.

3.1. D₂-Invariant Reductions

There is a transitive faithful action of C₂ × C₂ ≃ D₂ on the cyclic orbits s of (10), given by

σ S τ^{- 1}, S \in s,

with

\sigma \in \{1, (AT)(GC)\} \simeq C_{2}, \quad \tau \in \{1, (13)\} \simeq C_{2}.

Specifically,


τ	σ	act	tca	tga	agt
1	1	act	tca	tga	agt
1	(AT)(GC)	tga	agt	act	tca
(13)	1	tca	act	agt	tga
(13)	(AT)(GC)	agt	tga	tca	act

As a consequence, any experimental data x_s indexed by the orbits s in the diagram can be reduced by the tools of dihedral Fourier analysis over D₂. We emphasize that the transitivity and faithfulness of the D₂ action on the set of orbits are necessary to identify the orbits with the group elements and then proceed to the determination of the (D₂) orbit invariants using the Fourier transforms. Following [14], these four one-dimensional transforms are simply

\begin{array}{l} x_{act} + x_{tca} + x_{t g a} + x_{agt}, \\ x_{act} + x_{tca} - x_{t g a} - x_{agt}, \\ x_{act} - x_{tca} + x_{t g a} - x_{agt}, \\ x_{act} - x_{tca} - x_{t g a} + x_{agt}, \end{array}

and constitute a set of one-dimensional orbit invariants for the data [14]. Similar reductions can then be obtained for (11).
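A minimal Python sketch of the four transforms, assuming hypothetical orbit data. It also illustrates the invariance discussed in Section 2.3: choosing tga instead of act as the starting point (a relabeling by the complementation σ) only changes the signs of the transforms whose characters are −1 at σ, and the sum of squares of the four transforms equals four times ‖x‖².

```python
ORDER = ("act", "tca", "tga", "agt")
# Characters of D2 = C2 x C2 listed on (act, tca, tga, agt), i.e. on the
# group elements (1, 1), ((13), 1), (1, sigma), ((13), sigma).
CHARACTERS = {
    "trivial": (1, 1, 1, 1),
    "sigma": (1, 1, -1, -1),   # sign of the complementation factor
    "tau": (1, -1, 1, -1),     # sign of the reversal factor
    "product": (1, -1, -1, 1),
}


def d2_transforms(x):
    """The four one-dimensional D2 Fourier transforms of orbit data x."""
    return {name: sum(c * x[s] for c, s in zip(chi, ORDER))
            for name, chi in CHARACTERS.items()}


x = {"act": 30, "tca": 25, "tga": 28, "agt": 20}     # hypothetical orbit data
COMPLEMENTARY = {"act": "tga", "tga": "act", "tca": "agt", "agt": "tca"}
relabeled = {s: x[COMPLEMENTARY[s]] for s in ORDER}  # tga chosen as starting point

print(d2_transforms(x))          # {'trivial': 103, 'sigma': 7, 'tau': 13, 'product': -3}
print(d2_transforms(relabeled))  # {'trivial': 103, 'sigma': -7, 'tau': 13, 'product': 3}
```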

3.2. Entropy Invariants

The orbit invariants determined above can be computed from any scalars x_s obtained over the orbit s, such as its diversity (7), its raw sum (8), or its total molecular weight. When the x_s are positive integers, such as sums of frequency counts over the orbit s, the entropies (Ent) of the observed frequency distributions ℒ given by

Ent ℒ_{1} = Ent (x_{act}, x_{tca}, x_{t g a}, x_{agt}),

(12)

Ent ℒ_{a c t + t c a} = Ent (x_{act} + x_{tca}, x_{t g a} + x_{agt}),

(13)

Ent ℒ_{a c t + t g a} = Ent (x_{act} + x_{t g a}, x_{tca} + x_{agt}),

(14)

and

Ent ℒ_{a c t + a g t} = Ent (x_{act} + x_{agt}, x_{tca} + x_{t g a}),

(15)

are also orbit invariants [17,21]. In Section 4 these orbit invariants are evaluated for Diagrams (10) and (11) to describe a specific DNA sequence and query it for potential structural variations along the genome.
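The relative entropies used in Section 4 (entropy divided by its maximum value, not the Kullback-Leibler divergence) can be computed from the four orbit sums with a short Python sketch. The argument names are hypothetical; for Diagram (11) the roles of act, tca, tga and agt are played by gct, tcg, cga and agc, and the example uses the Region 3 orbit sums reported in Section 4.1.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of the distribution proportional to counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def relative_entropy(counts):
    """Entropy divided by its maximum value, log(number of categories)."""
    return entropy(counts) / math.log(len(counts))


def entropy_invariants(x_s, x_rev, x_comp, x_revcomp):
    """Relative versions of Equations (12)-(15).

    For Diagram (10) the arguments are x_act, x_tca, x_tga, x_agt; for
    Diagram (11) they are x_gct, x_tcg, x_cga, x_agc.
    """
    return {
        "(12) total": relative_entropy([x_s, x_rev, x_comp, x_revcomp]),
        "(13) polarity": relative_entropy([x_s + x_rev, x_comp + x_revcomp]),
        "(14) direction": relative_entropy([x_s + x_comp, x_rev + x_revcomp]),
        "(15) interaction": relative_entropy([x_s + x_revcomp, x_rev + x_comp]),
    }


# Region 3 orbit sums for Diagram (11): gct = 33, tcg = 11, cga = 23, agc = 127.
for name, value in entropy_invariants(33, 11, 23, 127).items():
    print(f"{name}: {value:.4f}")
```

The printed values agree with 0.717, 0.772, 0.866 and 0.669 in Section 4.1 up to rounding of the last reported digit; the choice of logarithm base cancels in the ratio.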

3.3. D₄-Invariant Reductions

There are three non-equivalent transitive right actions σs of D₄ on the set

O_{3} = \{act, tca, tga, agt, gct, tcg, cga, agc\}

of all injective cyclic orbits of length three, jointly reducing Diagrams (10) and (11), generated by:

D₄ ≃ ⟨(ATGC), (AG)⟩,
D₄ ≃ ⟨(AGCT), (AC)⟩,
D₄ ≃ ⟨(ACTG), (AT)⟩.

The action of D₄ ≃ ⟨(ATGC), (AG)⟩ is given by:

(16)

thus showing that any experimental data x_s indexed by $O_{3}$ can be reduced by dihedral Fourier transforms over D₄ or, equivalently, by the corresponding canonical projection decompositions. These two views, both leading to the identification of the orbit invariants, are outlined next.
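The permutation representations listed in (17)–(19) can be derived from the three pairs of base permutations. The following Python sketch, which assumes the orbits of O₃ are numbered 1–8 in the order in which they are listed above, computes the permutation of orbit indices induced by s ↦ σs and prints it in cycle notation. The helper names are hypothetical.

```python
# Orbits of O_3 numbered 1..8 in the order used in (17)-(19) and Appendix C.
ORBIT_ORDER = ["act", "tca", "tga", "agt", "gct", "tcg", "cga", "agc"]


def rotations(seq):
    return {seq[k:] + seq[:k] for k in range(len(seq))}


def orbit_index(seq):
    """1-based position in ORBIT_ORDER of the cyclic orbit containing seq."""
    for i, name in enumerate(ORBIT_ORDER, start=1):
        if seq in rotations(name.upper()):
            return i
    raise ValueError(f"{seq} is not an injective codon")


def base_map(cycles):
    """Base permutation from cycles such as ['ATGC'] (A->T->G->C->A)."""
    sigma = {b: b for b in "AGCT"}
    for cycle in cycles:
        for i, b in enumerate(cycle):
            sigma[b] = cycle[(i + 1) % len(cycle)]
    return sigma


def induced(cycles):
    """Permutation of orbit indices induced by s -> sigma s."""
    sigma = base_map(cycles)
    return {i: orbit_index("".join(sigma[b] for b in name.upper()))
            for i, name in enumerate(ORBIT_ORDER, start=1)}


def cycle_notation(perm):
    """Write a permutation of 1..8 in cycle notation, omitting fixed points."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen or perm[start] == start:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(str(i))
            i = perm[i]
        cycles.append("(" + "".join(cycle) + ")")
    return "".join(cycles) or "1"


for r, h in [("ATGC", "AG"), ("AGCT", "AC"), ("ACTG", "AT")]:
    print(f"<({r}), ({h})>:", cycle_notation(induced([r])), cycle_notation(induced([h])))
```

The output reproduces the generators of (17), (18) and (19).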

3.4. Canonical Projections

The linear representation of D₄ in $\mathbb{R}^{8}$ defined by (16), with the orbits numbered 1–8 in the order in which they are listed in O₃, is given by the permutation matrices associated with the rotations

\{1, (1467)(2358), (16)(47)(25)(38), (1764)(2853)\},

and with the reversals

\{(15)(26)(34)(78), (12)(37)(48)(56), (13)(24)(57)(68), (18)(27)(36)(45)\},

and is hence generated by

\langle (1467)(2358), (15)(26)(34)(78) \rangle \simeq D_{4}.

(17)

Similarly, the action generated by D₄ ≃ ⟨(AGCT), (AC)⟩ yields the linear representation

\langle (1485)(2376), (12)(35)(46)(78) \rangle \simeq D_{4},

(18)

whereas the action generated by D₄ ≃ ⟨(ACTG), (AT)⟩ yields the linear representation

\langle (1537)(2648), (12)(34)(58)(67) \rangle \simeq D_{4}.

(19)

The resulting canonical projections $P_{ξ}$, indexed by the irreducible representations

\xi \in \{1, \alpha, \gamma_{+}, \gamma_{-}, \beta\}

of D₄ and evaluated for the representation (17), are shown in Appendix C, along with the components $x' P_{\xi} x$ of the resulting decomposition

\|x\|^{2} = \sum_{\xi} x' P_{\xi} x

(20)

of the sum of squares. More generally,

\|x\|_{\Phi}^{2} = x' \Phi x = \sum_{\xi} x' P_{\xi} \Phi x,

where x′Φx is a Euclidean fundamental form [22]. These decompositions are often used for the statistical analysis of continuous data (analysis of variance).
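A minimal sketch in Python with exact rational arithmetic, assuming the orbit numbering (act, tca, tga, agt, gct, tcg, cga, agc) of Appendix C and the generators of (17). It forms the canonical projections P_ξ = (n_ξ/g) Σ_τ χ^ξ_{τ⁻¹} ρ_τ from the D₄ character table, checks that each is idempotent with trace (rank) 1, 1, 1, 1 and 4, and verifies the decomposition (20) for a hypothetical data vector.

```python
from fractions import Fraction

N = 8  # orbits numbered as (act, tca, tga, agt, gct, tcg, cga, agc)
R = {1: 4, 4: 6, 6: 7, 7: 1, 2: 3, 3: 5, 5: 8, 8: 2}  # (1467)(2358)
H = {1: 5, 5: 1, 2: 6, 6: 2, 3: 4, 4: 3, 7: 8, 8: 7}  # (15)(26)(34)(78)


def compose(p, q):
    """(p∘q)(i) = p(q(i)) for permutations stored as dicts."""
    return {i: p[q[i]] for i in q}


def matrix(perm):
    """Permutation matrix rho with rho[i][j] = 1 when perm sends j+1 to i+1."""
    return [[1 if perm[j + 1] == i + 1 else 0 for j in range(N)] for i in range(N)]


identity = {i: i for i in range(1, N + 1)}
rot = [identity]
for _ in range(3):
    rot.append(compose(R, rot[-1]))
# Group elements as (k, e, permutation) for r^k h^e.
elements = [(k, 0, rot[k]) for k in range(4)] + [(k, 1, compose(rot[k], H)) for k in range(4)]

# (dimension, character at r^k h^e); the characters are real and
# constant on inverses, so chi(tau^-1) = chi(tau).
CHARS = {
    "1": (1, lambda k, e: 1),
    "alpha": (1, lambda k, e: (-1) ** e),
    "gamma+": (1, lambda k, e: (-1) ** k),
    "gamma-": (1, lambda k, e: (-1) ** (k + e)),
    "beta": (2, lambda k, e: 0 if e else (2, 0, -2, 0)[k]),
}


def projection(dim, chi):
    """P_xi = (n_xi / g) * sum over tau of chi(tau^-1) rho(tau), with g = 8."""
    P = [[Fraction(0)] * N for _ in range(N)]
    for k, e, perm in elements:
        M = matrix(perm)
        c = Fraction(dim * chi(k, e), 8)
        for i in range(N):
            for j in range(N):
                P[i][j] += c * M[i][j]
    return P


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(N)) for j in range(N)] for i in range(N)]


def quad(x, M):
    return sum(x[i] * M[i][j] * x[j] for i in range(N) for j in range(N))


P = {name: projection(dim, chi) for name, (dim, chi) in CHARS.items()}
for name, M in P.items():
    assert matmul(M, M) == M  # each P_xi is idempotent
    print(name, "rank =", sum(M[i][i] for i in range(N)))

# Hypothetical orbit data x, in the same order as the rows of P.
x = [3, 1, 4, 1, 5, 9, 2, 6]
components = {name: quad(x, M) for name, M in P.items()}
print(components)
print(sum(components.values()) == sum(v * v for v in x))  # Equation (20)
```

The resulting matrices coincide with those printed in Appendix C, so the components can be compared term by term with the quadratic forms listed there.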

3.4.1. Interpretation of the Components $x^{'} P_{ξ} x$

The particular representation (17) leads to the following interpretation of each of the non-trivial (orbit invariant) components $x' P_{\xi} x$ of ‖x‖², in terms of the combinations of polarity (⊕, ⊖) and direction (→, ←),

x' = (\underbrace{\frac{\oplus}{\to}, \frac{⊖}{\leftarrow}, \frac{\oplus}{\leftarrow}, \frac{⊖}{\to}}_{\text{rotations}}, \underbrace{\frac{\oplus}{\to}, \frac{⊖}{\to}, \frac{\oplus}{\leftarrow}, \frac{⊖}{\leftarrow}}_{\text{reversals}}),

corresponding to the components of

x' = (\underbrace{act, agt, tcg, cga}_{\text{rotations}}, \underbrace{gct, tga, tca, agc}_{\text{reversals}}),

where, for simplicity of notation, each label s stands for the datum x_s indexed by it.

The projection $P_{α}$ identifies a one-dimensional invariant comparing the overall mean effects

(\frac{\oplus}{\to} + \frac{⊖}{\leftarrow} + \frac{\oplus}{\leftarrow} + \frac{⊖}{\to}) \text{ vs. } (\frac{\oplus}{\to} + \frac{⊖}{\to} + \frac{\oplus}{\leftarrow} + \frac{⊖}{\leftarrow})

between rotations and reversals. The projection $P_{γ +}$ identifies a one-dimensional invariant combining the overall within-rotation sensitivity to polarity, given overall direction variation, as assessed by

(\frac{\oplus}{\to} - \frac{⊖}{\leftarrow} + \frac{\oplus}{\leftarrow} - \frac{⊖}{\to})

with the corresponding within-reversal variation assessed by

(\frac{\oplus}{\to} - \frac{⊖}{\to} + \frac{\oplus}{\leftarrow} - \frac{⊖}{\leftarrow}) .

The projection $P_{γ -}$ identifies a one-dimensional invariant contrasting the same two variations described above. Lastly, the component $x' P_{β} x$ identifies a two-dimensional invariant assessing direction-given-polarity effects in terms of

(\frac{\oplus}{\to} - \frac{\oplus}{\leftarrow}), and (\frac{⊖}{\to} - \frac{⊖}{\leftarrow}) .

3.5. Dihedral Fourier Analysis

Reading from the column under the act orbit in (16), the data are identified with the following element of the group algebra $\mathbb{C}D_{4}$:

x = act \cdot 1 + agt \cdot r + tcg \cdot r^{2} + cga \cdot r^{3} + gct \cdot h + tga \cdot rh + tca \cdot r^{2}h + agc \cdot r^{3}h,

(21)

from which we obtain the corresponding Fourier transforms

\begin{array}{l} \langle x, 1 \rangle = act + agt + tcg + cga + gct + tga + tca + agc, \\ \langle x, \alpha \rangle = act + agt + tcg + cga - (gct + tga + tca + agc), \\ \langle x, \gamma_{+} \rangle = (act - agt + tcg - cga) + (gct - tga + tca - agc), \\ \langle x, \gamma_{-} \rangle = (act - agt + tcg - cga) - (gct - tga + tca - agc), \\ \langle x, \beta \rangle = \begin{pmatrix} act + gct - tca - tcg & -agc - agt + cga + tga \\ -agc + agt - cga + tga & act - gct + tca - tcg \end{pmatrix}. \end{array}

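Parseval’s equality

\|x\|^{2} = \sum_{\xi \in \hat{D}_{n}} \frac{n_{\xi}}{2n} \|\langle x, \xi \rangle\|^{2},

where n = 4, n_ξ is the dimension of ξ and, for the two-dimensional β, the norm is the Frobenius norm of the 2 × 2 transform, establishes the correspondence with the decomposition in Equation (20) obtained in terms of the canonical projections.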

Note that in Equation (21) we arbitrarily assigned the identity of D₄ to x_act. Any other assignment would amount precisely to a relabeling of the orbit’s starting point, and the Fourier transforms would remain orbit invariant in the sense of Equation (9).
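The transforms, Parseval’s equality and the relabeling property (9) can be checked numerically. The Python sketch below assumes the realization β(r) = [[0, −1], [1, 0]], β(h) = [[1, 0], [0, −1]], which reproduces the matrix ⟨x, β⟩ displayed above, and encodes r^k h^e as the pair (k, e); the data values are hypothetical.

```python
R2 = [[0, -1], [1, 0]]   # beta(r): rotation by 90 degrees (assumed realization)
H2 = [[1, 0], [0, -1]]   # beta(h): reflection
I2 = [[1, 0], [0, 1]]


def mat2(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def beta(k, e):
    """beta(r^k h^e) = beta(r)^k beta(h)^e."""
    M = I2
    for _ in range(k):
        M = mat2(M, R2)
    return mat2(M, H2) if e else M


ONE_DIM = {
    "1": lambda k, e: 1,
    "alpha": lambda k, e: (-1) ** e,
    "gamma+": lambda k, e: (-1) ** k,
    "gamma-": lambda k, e: (-1) ** (k + e),
}
ELEMENTS = [(k, e) for e in (0, 1) for k in range(4)]  # 1, r, r^2, r^3, h, rh, r^2h, r^3h
LABELS = ["act", "agt", "tcg", "cga", "gct", "tga", "tca", "agc"]  # Equation (21)


def transforms(x):
    """Fourier transforms <x, xi> = sum over tau of x_tau xi(tau)."""
    out = {name: sum(x[g] * chi(*g) for g in ELEMENTS) for name, chi in ONE_DIM.items()}
    out["beta"] = [[sum(x[g] * beta(*g)[i][j] for g in ELEMENTS) for j in range(2)]
                   for i in range(2)]
    return out


def sq_norm(v):
    """Squared modulus, or squared Frobenius norm for the 2x2 beta transform."""
    return sum(a * a for row in v for a in row) if isinstance(v, list) else v * v


def mul(g1, g2):
    """Group law of D4 on pairs (k, e), using h r^b = r^-b h."""
    (a, e), (b, f) = g1, g2
    return ((a + (-1) ** e * b) % 4, (e + f) % 2)


def inverse(g):
    k, e = g
    return g if e else ((-k) % 4, 0)


def relabel(x, tau):
    """(tau x)_eta = x_{tau^-1 eta}: a different choice of starting point."""
    return {eta: x[mul(inverse(tau), eta)] for eta in ELEMENTS}


values = {"act": 3, "agt": 1, "tcg": 4, "cga": 1, "gct": 5, "tga": 9, "tca": 2, "agc": 6}  # hypothetical
x = {g: values[s] for g, s in zip(ELEMENTS, LABELS)}
t = transforms(x)
parseval = sum((2 if name == "beta" else 1) * sq_norm(v) for name, v in t.items()) / 8
print(t)
print(parseval, sum(v * v for v in values.values()))  # both sides of Parseval's equality

# Equation (9): relabeling only multiplies each transform by xi(tau).
for tau in ELEMENTS:
    t_tau = transforms(relabel(x, tau))
    for name, chi in ONE_DIM.items():
        assert t_tau[name] == chi(*tau) * t[name]
    assert t_tau["beta"] == mat2(beta(*tau), t["beta"])
```

Since β(τ) is orthogonal, the Frobenius norm of ⟨x, β⟩ is unchanged by relabeling, as are the moduli of the one-dimensional transforms.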

4. Numerical Evaluations

In this section we apply the cyclic reductions described in Section 3 to specific complete genomes of the human immunodeficiency virus type 1 and the hepatitis C virus.

4.1. Relative Entropy Study of the HIV1 BRUCG Isolate

Following Section 3.2, the data indexed by the cyclic orbits are simply the sums x_s of the frequency counts x_S with which the sequence S occurs in a given region of the genome, that is,

x_{s} = \sum_{S \in s} x_{S} .

The frequency counts were evaluated by scanning the genome one base at a time in the 5′→3′ direction. The sequence in FASTA format was downloaded from the NCBI website (http://www.ncbi.nlm.nih.gov). Computations were carried out with the Symmetry Computing Toolbox (© M. Viana). This particular HIV1 isolate, used here for numerical illustration only, also appears in the study of the HIV1’s evolutionary properties [11].

The frequency counts were obtained for the complete genome of human immunodeficiency virus type 1, isolate BRU (LAV-1), sequence ID gi:326417, accession number K02013.1, HIVBRUCG; see [23]. The full 9229 bp sequence was partitioned into six adjacent regions of equal length, numbered 1–6, in which the cyclic summaries x_s were evaluated. The frequency counts for the conjugated cyclic orbits of act corresponding to Diagram (10) are shown in (22) and (23), whereas (24) and (25) show the frequency counts for the conjugated cyclic orbits of gct corresponding to Diagram (11).

(22)

(23)

(24)

(25)
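A minimal Python sketch of the counting step, assuming a single-record FASTA string and overlapping windows advanced one base at a time. How codons straddling a region boundary were assigned is not stated in the text; here, as an assumption, each codon is counted in the region that contains its first base. The function names are hypothetical and the input is a toy sequence, not the BRU isolate.

```python
from collections import Counter


def read_fasta_sequence(text):
    """Concatenate the sequence lines of a single-record FASTA string."""
    return "".join(line.strip() for line in text.splitlines()
                   if line and not line.startswith(">")).upper()


def region_counts(genome, n_regions=6, k=3):
    """Overlapping k-mer counts per region, scanning one base at a time.

    Assumption (not specified in the text): each k-mer is assigned to the
    region containing its first base, with region index
    floor(start * n_regions / len(genome)).
    """
    length = len(genome)
    counts = [Counter() for _ in range(n_regions)]
    for start in range(length - k + 1):
        counts[start * n_regions // length][genome[start:start + k]] += 1
    return counts


def orbit_sum(counter, generator):
    """x_s: sum of the counts x_S over the cyclic orbit generated by S."""
    rots = {generator[i:] + generator[:i] for i in range(len(generator))}
    return sum(counter[s] for s in rots)


# Toy input (not real genome data).
toy = read_fasta_sequence(">toy\nACTACTGCT\nTGA\n")
per_region = region_counts(toy, n_regions=2)
print([orbit_sum(c, "ACT") for c in per_region])  # [4, 0]
```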

Figure 3 shows the resulting relative entropy invariants, as defined in Section 3.2, both for the act and for the gct conjugated cyclic orbits.

For example, reading from Region 3 in (24) and (25), we have

\begin{matrix} x_{g c t} = 8 + 15 + 10 = 33, & x_{a g c} = 34 + 40 + 53 = 127, \\ x_{t c g} = 2 + 1 + 8 = 11, & x_{c g a} = 1 + 19 + 3 = 23, \end{matrix}

so that the resulting relative entropy invariants, from Equations (12)–(15), are given by

\begin{array}{c} Ent (33, 127, 11, 23) / max Ent = 0.717, \\ Ent (33 + 11, 127 + 23) / max Ent = 0.772, \\ Ent (33 + 127, 11 + 23) / max Ent = 0.669, \\ Ent (33 + 23, 127 + 11) / max Ent = 0.866, \end{array}

as displayed in the profiles shown in the bottom part of Figure 3. The top part of Figure 3 shows the corresponding profiles for the act conjugated cyclic orbits.

Interpretations:

Reading again from Diagrams (10) and (11), it follows that the invariant

Ent ℒ_{a c t + t c a} = Ent (a c t + t c a, t g a + a g t)

introduced in Equation (13) is a measure of polarity uncertainty; the invariant

Ent ℒ_{a c t + t g a} = Ent (a c t + t g a, t c a + a g t)

in Equation (14) is a measure of direction uncertainty, whereas the invariant

Ent ℒ_{a c t + a g t} = Ent (a c t + a g t, t c a + t g a)

introduced in Equation (15) is a measure of the interaction between direction and polarity. In summary:

Polarity uncertainty: $Ent ℒ_{a c t + t c a} = Ent (a c t + t c a, t g a + a g t)$ ;
Direction uncertainty: $Ent ℒ_{a c t + t g a} = Ent (a c t + t g a, t c a + a g t)$ ;
Interaction: $Ent ℒ_{a c t + a g t} = Ent (a c t + a g t, t c a + t g a)$ .

4.2. Statistical Assessment

The statistical assessment of the entropy can be obtained by numerically evaluating its sampling distribution, based on 10,000 random draws from the posterior Beta distribution, which is conjugate to the binomial likelihood for the data, under a uniform prior distribution. From the resulting sampling distribution, a posterior 95% credibility interval (CI) for the relative entropy can be evaluated numerically.

For example, reading from Region 5, in (24) and (25), we have,

x_{g c t} = 71, x_{t c g} = 12, x_{c g a} = 23, x_{a g c} = 102,

so that, from Equations (12)–(15), the polarity relative entropy is

Ent (83, 125) / \max Ent = 0.97, 95 % CI = (0.921518, 0.996971)

the direction relative entropy is,

Ent (94, 114) / \max Ent = 0.99, 95 % CI = (0.962473, 0.999979),

and

Ent (173, 35) / \max Ent = 0.65, 95 % CI = (0.540566, 0.769399)

is the relative entropy for the polarity-direction interaction, or residual, term. The credibility intervals thus suggest that the drop in the interaction relative entropy in Region 5 is statistically distinct from the other two uncertainties and from the relative entropy of a uniform distribution with the same total binomial size, namely (104, 104), whose 95% CI is (0.983009, 0.999997). Figure 4 shows the posterior 95% credibility bands for the relative entropy of binomial distributions of total sample size

n = x_{g c t} + x_{t c g} + x_{c g a} + x_{a g c} = 208,

from which the above credibility intervals can be identified. The range of the nomogram is half the total binomial sample size because the entropy of (k, n − k) equals that of (n − k, k), a consequence of the entropy (orbit) invariance property.
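The Monte Carlo procedure described in this section can be sketched in Python with the standard library’s Beta sampler. The seed, the number of draws and the empirical-quantile convention are our own choices, so the intervals will differ slightly from the reported ones; the function names are hypothetical.

```python
import math
import random


def relative_entropy2(p):
    """Entropy of (p, 1 - p) divided by its maximum, log 2."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p)) / math.log(2)


def entropy_credibility(k, n, draws=10_000, level=0.95, seed=0):
    """Point estimate and Monte Carlo credibility interval for Ent(k, n - k)/max Ent.

    A uniform Beta(1, 1) prior with a binomial likelihood gives the
    Beta(k + 1, n - k + 1) posterior for the proportion p.
    """
    rng = random.Random(seed)
    samples = sorted(relative_entropy2(rng.betavariate(k + 1, n - k + 1))
                     for _ in range(draws))
    lower = samples[int(round((1 - level) / 2 * draws))]
    upper = samples[int(round((1 + level) / 2 * draws)) - 1]
    return relative_entropy2(k / n), (lower, upper)


# Region 5 splits from this section (total n = 208).
for label, k in [("polarity", 83), ("direction", 94), ("interaction", 173)]:
    estimate, (lower, upper) = entropy_credibility(k, 208)
    print(f"{label}: {estimate:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```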

4.3. Orbit Diversity Decomposition for the HIV1 Samples

In this section we apply the canonical decomposition introduced in Section 3.4 and evaluated in Appendix C to reduce the diversity data defined in Equation (1), indexed by the joint set of conjugated orbits (the conjugated orbits of act adjoined to those of gct), using the D₄ action defined in Section 3.3. The orbit diversity for the joint set of conjugated orbits is shown in (26) for each sequence in the sample of 10 Brazilian sequences referenced in Appendix A.

(26)

The error term (due to sampling variability) is included in the canonical decomposition for the sample by tensoring the decomposition induced by the representation of interest, shown in Appendix C, with the standard canonical decomposition [12] (Chapter 4)

I_{n} = A + Q,

where $A$ is the n × n projection matrix with all entries equal to 1/n, I_n is the n × n identity matrix, and $Q = I_{n} - A$. The canonical decomposition for the sample is then

I_{g n} = I_{g} \otimes I_{n} = (P_{1} + P_{α} + P_{γ +} + P_{γ -} + P_{β}) \otimes (A + Q)

from which we obtain the (multivariate) analysis of variance shown in (27), where $x' (P_{ξ} \otimes A) x$ are the sample mean effects, $x' (P_{ξ} \otimes Q) x$ the sampling error terms, and $x' (I_{g} \otimes Q) x$ the total sampling error, g being the group order. More specifically, it is assumed that

x = (x_{τ})_{τ \in G} \in ℝ^{g n}, \quad x_{τ} \sim N(μ_{τ} e, σ_{ττ} I_{n}), \quad x \sim N(μ \otimes e, Σ \otimes I_{n}),

where e′ = (1, …, 1) has n components, μ is the vector of dihedral means, and Σ is the dihedral covariance structure. It then follows, for each projection P = $P_{ξ}$, that

(P \otimes A) x ~ N ((P \otimes A) (μ \otimes e), (P \otimes A) (Σ \otimes I_{n}) (P^{'} \otimes A^{'})) .

Because $A' = A$, $A^{2} = A$ and $Ae = e$, we have

(P \otimes A) x \sim N((P μ) \otimes e, (P Σ P') \otimes A).

Similarly, because $Q' = Q$, $Q^{2} = Q$ and $Qe = 0$,

(P \otimes Q) x \sim N(0, (P Σ P') \otimes Q).

The degrees of freedom in each case are obtained by the traces of the corresponding projections, which are also equal to the dimension of the projecting (invariant) subspaces. Under suitable parametric assumptions the magnitude of the ratios

\frac{(x^{'} (P_{ξ} \otimes A) x) / tr ((P_{ξ} \otimes A))}{(x^{'} (I_{8} \otimes Q) x) / tr (I_{8} \otimes Q)}

can be assessed by (typically non-central) F-distributions with $n_{ξ}^{2}$ and g(n − 1) degrees of freedom. The corresponding underlying parametric hypotheses $μ' P_{ξ} μ = 0$ are those introduced earlier in Section 3.4.

(27)

Under large-sample parametric assumptions and an independent dihedral covariance structure, all F-ratios, with the exception of the one associated with γ₋, are significantly large, indicating effects statistically distinct from zero.
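A minimal pure-Python sketch of the decomposition I_{gn} = (Σ_ξ P_ξ) ⊗ (A + Q), assuming the sample is stored orbit by orbit, x = (x_τ), and writing the projections of Appendix C through their sign vectors and the pairs of orbits swapped by r². The data are simulated from hypothetical means, so the resulting F-ratios are illustrative only; they are formed as in the ratio above, with tr(P_ξ ⊗ A) = n_ξ² and g(n − 1) degrees of freedom.

```python
import random

G = 8  # order of D4
# Sign vectors of the one-dimensional projections in Appendix C, entries
# ordered as (act, tca, tga, agt, gct, tcg, cga, agc).
SIGNS = {
    "1": (1, 1, 1, 1, 1, 1, 1, 1),
    "alpha": (1, -1, -1, 1, -1, 1, 1, -1),
    "gamma+": (1, 1, -1, -1, 1, 1, -1, -1),
    "gamma-": (1, -1, 1, -1, -1, 1, -1, 1),
}
BETA_PAIRS = [(0, 5), (1, 4), (2, 7), (3, 6)]  # pairs swapped by r^2


def projections():
    """The canonical projections P_xi of Appendix C as nested lists."""
    P = {name: [[a * b / G for b in v] for a in v] for name, v in SIGNS.items()}
    beta = [[0.5 if i == j else 0.0 for j in range(G)] for i in range(G)]
    for i, j in BETA_PAIRS:
        beta[i][j] = beta[j][i] = -0.5
    P["beta"] = beta
    return P


def kron(A, B):
    """Kronecker product A ⊗ B of two nested-list matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def quad(x, M):
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def dihedral_anova(data):
    """data[t][i]: i-th sample value for orbit t (Appendix C order).

    Returns {xi: (mean SS, mean df, error SS, error df)} using P_xi ⊗ A
    and P_xi ⊗ Q, with A = J/n and Q = I_n - A.
    """
    n = len(data[0])
    A = [[1.0 / n] * n for _ in range(n)]
    Q = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    x = [v for row in data for v in row]  # x = (x_tau), stacked orbit by orbit
    table = {}
    for name, P in projections().items():
        PA, PQ = kron(P, A), kron(P, Q)
        table[name] = (quad(x, PA), trace(PA), quad(x, PQ), trace(PQ))
    return table


# Hypothetical data: 8 orbit means, n = 10 simulated sequences per orbit.
rng = random.Random(1)
means = [5, 3, 4, 6, 2, 5, 6, 3]
data = [[rng.gauss(m, 1.0) for _ in range(10)] for m in means]
table = dihedral_anova(data)
error_ss = sum(row[2] for row in table.values())
error_df = sum(row[3] for row in table.values())
for name, (ss, df, _, _) in table.items():
    print(f"{name}: SS = {ss:.3f}, df = {df:.0f}, F = {(ss / df) / (error_ss / error_df):.3f}")
```

The total error sum of squares is accumulated over ξ, which equals x′(I_g ⊗ Q)x because the P_ξ sum to the identity.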

4.4. Orbit Diversity Decomposition for the HCV Samples

This section replicates the methods described in Section 4.3 for a sample of 10 Brazilian hepatitis C sequences, whose accession numbers are referenced in Appendix A. The orbit diversity for the joint set of conjugated orbits for each sequence in the sample is shown in (28).

(28)

The corresponding analysis of variance decomposition is shown in (29).

(29)

The F-ratios identified by the decompositions in (27) and (29) are:

Virus	$P_{α}$	$P_{γ +}$	$P_{γ -}$	$P_{β}$
HIV1	238.342	5.889	1.882	137.962
HCV	28.846	8.653	0.009	22.115

Comparing their magnitudes suggests that the two viruses have significantly distinct joint cyclic diversity profiles. Additional numerical studies are referenced in Appendix B.

5. Summary

In this communication we constructed a dihedral D₂ reduction of conjugated injective cyclic orbits of length three, a dihedral D₄ reduction of their combined set, and a dihedral D₃ reduction of the set of conjugated injective cyclic orbits of length four. In each case, the experimental scalar data can be any summary obtained over the cyclic orbits, such as the sum or an extreme value of the frequency counts over the cyclic orbit, the entropy of a frequency distribution over the orbit, its amino acid content, or, as in [11], the orbit’s frequency diversity. For matrix-valued data, the methods of group rings, rather than group algebras, would be the appropriate methodology [14].

Acknowledgments

The author thanks the referees for their comments and clarifying suggestions.

Appendix

A. HIV1 and HCV Sequences

The following are the accession numbers for the HIV1 and HCV sequences considered in the present study:

B. Additional Studies

B.1. Relative Entropy Study of 10 Brazilian HIV1 Sequences

The relative entropy evaluations illustrated above in Section 4.1 were replicated for a sample of 10 Brazilian HIV1 sequences, referenced in Appendix A. The raw frequency counts and the corresponding relative entropy profiles for each of 10 sequences are linked in [24].

B.2. Relative Entropy Study of 10 Brazilian HCV Sequences

As in the HIV1 study, a sample of 10 Brazilian hepatitis C sequences was evaluated for relative entropy. The sequences are referenced in Appendix A. The raw frequency counts and the corresponding relative entropy profiles along each genome are linked in [25]. The relative entropy invariant profiles clearly highlight the structural differences between the two types of viruses.

B.3. Relative Entropy Study of Random Reference Sequences

It is statistically useful to compare the cyclic reductions obtained for the HIV1 isolate described above with those from random DNA sequences of comparable lengths. The results, based on 20 random sequences and shown in [26], indicate that the observed variations in relative entropy (invariants) for the conjugated gct orbits, for both the HIV1 and HCV sequences, are well below what one would expect to observe for random sequences of comparable lengths.

C. Canonical Projections

The following are the canonical projections

P_{ξ} = \frac{n_{ξ}}{g} \sum_{τ} χ_{τ^{- 1}}^{ξ} ρ_{τ},

evaluated for the permutation representation ρ described in Section 3.4, along with the corresponding components $x' P_{ξ} x$ of ‖x‖², with the entries of x indexed in correspondence with (act, tca, tga, agt, gct, tcg, cga, agc). Here:

P_{1} = \frac{1}{8} (\begin{matrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{matrix})

gives an overall mean,

μ^{'} P_{1} μ = \frac{1}{8} {(μ_{act} + μ_{a g c} + μ_{agt} + μ_{c g a} + μ_{g c t} + μ_{tca} + μ_{t c g} + μ_{t g a})}^{2},

P_{α} = \frac{1}{8} (\begin{array}{r} 1 & - 1 & - 1 & 1 & - 1 & 1 & 1 & - 1 \\ - 1 & 1 & 1 & - 1 & 1 & - 1 & - 1 & 1 \\ - 1 & 1 & 1 & - 1 & 1 & - 1 & - 1 & 1 \\ 1 & - 1 & - 1 & 1 & - 1 & 1 & 1 & - 1 \\ - 1 & 1 & 1 & - 1 & 1 & - 1 & - 1 & 1 \\ 1 & - 1 & - 1 & 1 & - 1 & 1 & 1 & - 1 \\ 1 & - 1 & - 1 & 1 & - 1 & 1 & 1 & - 1 \\ - 1 & 1 & 1 & - 1 & 1 & - 1 & - 1 & 1 \end{array})

gives the (handedness) contrast

μ^{'} P_{α} μ = \frac{1}{8} {(μ_{act} - μ_{a g c} + μ_{agt} + μ_{c g a} - μ_{g c t} - μ_{tca} + μ_{t c g} - μ_{t g a})}^{2}

between the cyclic summaries $\{act, agt, tcg, cga\}$ indexed by the rotations and the summaries $\{gct, tga, tca, agc\}$ indexed by the reversals, as a sum of within-rotation and within-reversal variability;

P_{γ +} = \frac{1}{8} (\begin{array}{r} 1 & 1 & - 1 & - 1 & 1 & 1 & - 1 & - 1 \\ 1 & 1 & - 1 & - 1 & 1 & 1 & - 1 & - 1 \\ - 1 & - 1 & 1 & 1 & - 1 & - 1 & 1 & 1 \\ - 1 & - 1 & 1 & 1 & - 1 & - 1 & 1 & 1 \\ 1 & 1 & - 1 & - 1 & 1 & 1 & - 1 & - 1 \\ 1 & 1 & - 1 & - 1 & 1 & 1 & - 1 & - 1 \\ - 1 & - 1 & 1 & 1 & - 1 & - 1 & 1 & 1 \\ - 1 & - 1 & 1 & 1 & - 1 & - 1 & 1 & 1 \end{array})

gives the sum

μ^{'} P_{γ +} μ = \frac{1}{8} {(μ_{act} - μ_{a g c} - μ_{agt} - μ_{c g a} + μ_{g c t} + μ_{tca} + μ_{t c g} - μ_{t g a})}^{2}

of within-rotation and within-reversal variability, whereas

P_{γ -} = \frac{1}{8} (\begin{array}{r} 1 & - 1 & 1 & - 1 & - 1 & 1 & - 1 & 1 \\ - 1 & 1 & - 1 & 1 & 1 & - 1 & 1 & - 1 \\ 1 & - 1 & 1 & - 1 & - 1 & 1 & - 1 & 1 \\ - 1 & 1 & - 1 & 1 & 1 & - 1 & 1 & - 1 \\ - 1 & 1 & - 1 & 1 & 1 & - 1 & 1 & - 1 \\ 1 & - 1 & 1 & - 1 & - 1 & 1 & - 1 & 1 \\ - 1 & 1 & - 1 & 1 & 1 & - 1 & 1 & - 1 \\ 1 & - 1 & 1 & - 1 & - 1 & 1 & - 1 & 1 \end{array})

gives the linear contrast

μ^{'} P_{γ -} μ = \frac{1}{8} {(μ_{act} + μ_{a g c} - μ_{agt} - μ_{c g a} - μ_{g c t} - μ_{tca} + μ_{t c g} + μ_{t g a})}^{2}

between within-rotation and within-reversal variability. Lastly,

P_{β} = \frac{1}{2} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 \\ 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}

gives the sum

μ^{'} P_{β} μ = \frac{1}{2} \left[ {(μ_{act} - μ_{tcg})}^{2} + {(μ_{agc} - μ_{tga})}^{2} + {(μ_{agt} - μ_{cga})}^{2} + {(μ_{gct} - μ_{tca})}^{2} \right]

of four squared pairwise contrasts; since P_{β} has trace 4, these span two components of dimension 2 each.
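The projections above are read as splitting μ'μ into separate components. A minimal sketch in plain Python, using exact fractions, can check the algebra behind that reading: the five printed matrices are idempotent, mutually orthogonal and add up to the 8 × 8 identity, so that μ'μ is the sum of the five quadratic forms μ'P_ξμ. Because the row order of the printed matrices is not stated in this section, the matrix-level part indexes coordinates by position only. The separate helper reads its signs directly from the displayed quadratic forms instead. All function names and sample values are hypothetical, chosen for illustration.

```python
from fractions import Fraction

N = 8  # number of coordinates (codons in the two conjugated cyclic orbits)

# Matrix level: coordinates indexed by row/column position 1..8 of the printed
# matrices. Each rank-one projection equals v v' / 8 for the sign vector v
# read off its first row.
SIGNS = {
    "1": [1, 1, 1, 1, 1, 1, 1, 1],
    "alpha": [1, -1, -1, 1, -1, 1, 1, -1],
    "gamma+": [1, 1, -1, -1, 1, 1, -1, -1],
    "gamma-": [1, -1, 1, -1, -1, 1, -1, 1],
}
# Positions coupled by P_beta, 0-based; (1,6), (2,5), (3,8), (4,7) in 1-based terms.
BETA_PAIRS = [(0, 5), (1, 4), (2, 7), (3, 6)]


def rank_one(v):
    """Return v v' / 8 as a list of rows of exact fractions."""
    return [[Fraction(a * b, N) for b in v] for a in v]


def beta_projection():
    """Place (1/2) [[1, -1], [-1, 1]] on each coupled pair of positions."""
    P = [[Fraction(0)] * N for _ in range(N)]
    for i, j in BETA_PAIRS:
        P[i][i] = P[j][j] = Fraction(1, 2)
        P[i][j] = P[j][i] = Fraction(-1, 2)
    return P


def projections():
    Ps = {name: rank_one(v) for name, v in SIGNS.items()}
    Ps["beta"] = beta_projection()
    return Ps


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def quad_form(P, x):
    """Return x' P x."""
    return sum(x[i] * P[i][j] * x[j] for i in range(N) for j in range(N))


def is_resolution_of_identity(Ps):
    """True if the P's are idempotent, mutually orthogonal and sum to I."""
    zero = [[0] * N for _ in range(N)]
    identity = [[int(i == j) for j in range(N)] for i in range(N)]
    for a in Ps:
        for b in Ps:
            if matmul(Ps[a], Ps[b]) != (Ps[a] if a == b else zero):
                return False
    total = [[sum(P[i][j] for P in Ps.values()) for j in range(N)]
             for i in range(N)]
    return total == identity


# Label level: signs copied from the displayed quadratic forms.
def label_components(mu):
    """mu maps codon labels to values; returns the alpha, gamma+, gamma- components."""
    rot = mu["act"] + mu["agt"] + mu["tcg"] + mu["cga"]        # rotation summaries
    rev = mu["gct"] + mu["tga"] + mu["tca"] + mu["agc"]        # reversal summaries
    c_rot = (mu["act"] + mu["tcg"]) - (mu["agt"] + mu["cga"])  # within rotations
    c_rev = (mu["gct"] + mu["tca"]) - (mu["agc"] + mu["tga"])  # within reversals
    return {
        "alpha": Fraction(rot - rev) ** 2 / 8,
        "gamma+": Fraction(c_rot + c_rev) ** 2 / 8,
        "gamma-": Fraction(c_rot - c_rev) ** 2 / 8,
    }
```

For example, with the hypothetical vector x = (1, 2, ..., 8), `is_resolution_of_identity(projections())` returns `True`, and the five values `quad_form(P, x)` add up to x'x = 204. With the hypothetical labeled values act = 5, agt = 3, tcg = 4, cga = 4, gct = 1, tga = 2, tca = 2 and agc = 3, `label_components` gives 8, 0 and 2 for alpha, gamma+ and gamma-. The last two add up to (c_rot² + c_rev²)/4 = 2, reflecting how the γ± components share the within-rotation and within-reversal variability.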

Conflicts of Interest

The author declares no conflict of interest.

References

1. Fisher, R.A. The theory of linkage in polysomic inheritance. Philos. Trans. Roy. Soc. London Ser. B 1947, 233, 55–87.
2. Fisher, R.A. The theory of confounding in factorial experiments in relation to the theory of groups. Ann. Eugen. 1942, 11, 341–353.
3. Fisher, R.A. A system of confounding for factors with more than two alternatives, giving completely orthogonal cubes and higher powers. Ann. Eugen. 1945, 12, 283–290.
4. Viana, M. Algebraic Methods in Statistics and Probability; Contemporary Mathematics; Richards, D., Ed.; American Mathematical Society: Providence, RI, USA, 2001; Volume 287.
5. Viana, M. Algebraic Methods in Statistics and Probability II; Contemporary Mathematics; Wynn, H., Ed.; American Mathematical Society: Providence, RI, USA, 2010; Volume 516.
6. Findley, G.L.; Findley, A.M.; McGlynn, S.P. Symmetry characteristics of the genetic code. Proc. Natl. Acad. Sci. USA 1982, 79, 7061–7065.
7. Sergienko, I.V.; Gupal, A.M.; Vagis, A.A. Symmetry in encoding genetic information in DNA. Cybern. Syst. Anal. 2011, 47, 408–414.
8. Hornos, J.E.; Braggion, L.; Magini, M.; Forger, M. Symmetry preservation in the evolution of the genetic code. IUBMB Life 2004, 56, 125–130.
9. Zandi, R.; Reguera, D.; Bruinsma, R.F.; Gelbart, W.M.; Rudnick, J.; Reiss, H. Origin of icosahedral symmetry in viruses. Proc. Natl. Acad. Sci. USA 2004, 101, 15556–15560.
10. Finkel, D.L. HIV-1 ancestry primordial expansions of RRE and RRE-related sequences. J. Theor. Biol. 1992, 3, 285–302.
11. Doi, H. Importance of purine and pyrimidine content of local nucleotide sequences (six bases long) for evolution of human immunodeficiency virus type 1. Proc. Natl. Acad. Sci. USA 1991, 88, 9282–9286.
12. Viana, M. Symmetry Studies, an Introduction to the Analysis of Structured Data in Applications; Cambridge University Press: New York, NY, USA, 2008.
13. Souza, D.J.; Chaves, L.M.; Viana, M.A.G. Symmetries in symbolic sequences. Rev. Bras. Biom. 2010, 1, 73–86.
14. Viana, M.; Lakshminarayanan, V. Dihedral Fourier Analysis, Data-Analytic Aspects and Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 2013; Volume 206.
15. Viana, M. Canonical invariants for three-candidate preference rankings. Can. Appl. Math. Q. 2007, 15, 203–222.
16. Viana, M. Canonical Decompositions and Invariants for Data Analysis; Elsevier: Amsterdam, The Netherlands, 2009.
17. Viana, M. Symmetry studies and decompositions of entropy. Entropy 2006, 8, 88–109.
18. Viana, M. Symmetry orbits and their data-analytic properties. Rev. Mat. 2013, 20, 155–166.
19. Viana, M. Dihedral Polynomials. In Mathematical Optics: Classical, Quantum, and Computational Methods; Lakshminarayanan, V., Calvo, M.L., Alieva, T., Eds.; CRC Press: Boca Raton, FL, USA, 2012.
20. Viana, M.; Lakshminarayanan, V. Symmetry studies of refraction data. J. Mod. Opt. 2014, 61, 138–146.
21. Viana, M. Symmetry-Related Decompositions of Uncertainty. In Proceedings of the XI Brazilian Meeting on Bayesian Statistics, Amparo-SP, Brazil, 18–22 March 2012; Stern, J., Lauretto, M., Polpo, A., Diniz, M., Eds.; American Institute of Physics: Melville, NY, USA, 2012.
22. Cartan, E. The Theory of Spinors; MIT Press: Cambridge, MA, USA, 1966.
23. Human immunodeficiency virus type 1, isolate BRU, complete genome (LAV-1). Available online: http://www.ncbi.nlm.nih.gov/nuccore/K02013.1 (accessed on 23 December 2014).
24. Relative entropy study of 10 Brazilian HIV1 sequences. Available online: https://app.box.com/s/ui9nt6uu6pitc6mxl6cg (accessed on 23 December 2014).
25. Relative entropy study of 10 Brazilian HCV sequences. Available online: https://app.box.com/s/zgpv0fkpd1pz9c30ql1q (accessed on 23 December 2014).
26. Relative entropy study of random reference sequences. Available online: https://www.box.com/s/g7rguk0dy3ben5w7e93x (accessed on 23 December 2014).

Figure 1. A configuration space for the conjugated cyclic orbits.

Figure 2. Common-direction (left) and common-polarity (right) configuration subspaces for the conjugated cyclic orbits of Diagrams (10) and (11).

Figure 3. The top profiles show, for the legends from top to bottom respectively, the relative entropy for the joint distributions of (act, tca, tga, agt); (act + tca, tga + agt); (act + tga, tca + agt); and (act + agt, tca + tga). The bottom profiles display the corresponding results for the conjugated orbits of gct.

Figure 4. Posterior 95% credibility intervals for the relative entropy of binomial distributions of total sample size n = 208.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).


Viana, M.A.G. Dihedral Reductions of Cyclic DNA Sequences. Symmetry 2015, 7, 67-88. https://doi.org/10.3390/sym7010067

