Entropy-Variance Curves of Binary Sequences Generated by Random Substitutions of Constant Length

We study properties of binary sequences generated by random substitutions of constant length. Specifically, assuming the alphabet {0,1}, we consider the following asymmetric substitution rule of length k: 0→〈0,0,…,0〉 and 1→〈Y1,Y2,…,Yk〉, where each Yi is a Bernoulli random variable with parameter p∈[0,1]. We obtain by recurrence the discrete probability distribution of the stochastic variable that counts the number of ones in the sequence formed after i substitutions (iterations). We derive its first two statistical moments, mean and variance, and the entropy of the generated sequences as a function of the substitution length k for every iteration i, and characterize the values of p where the maxima of these measures occur. Finally, we obtain the parametric entropy-variance curves for each iteration and substitution length. We find two regimes of dependence between these two variables that, to our knowledge, have not been previously described. Moreover, this representation allows us to compare sequences with the same entropy but different variance, and vice versa.


Introduction
Binary sequences (strings or chains) appear naturally in physical systems when describing growth processes generated from an initial state. Examples are ubiquitous, from spin up-down systems in physics to codification in information theory and its reflection in biomolecules such as DNA and RNA (where two sets of nucleotides exist: purines and pyrimidines). They are also common objects in mathematics, representing, for example, sets of words formed from the letters of an alphabet, or symbolic representations of dynamical systems, where a mapping can be defined to transform real trajectories into two-state series [1]. Binary sequences can also codify fractal sets obtained by an iterative process, as occurs in the classical Cantor set [2,3].
In order to study the properties of systems with fractal geometry, Mandelbrot defined random sets that are generated recursively from a given initial set, as in a percolation process [2,4]. These sets are formed by the successive application of a defined set of rules that either divide the initial set and discard some subsets in each partition, or enlarge the initial set by substituting the existing subsets by multiple replicas of themselves. These sets can be embedded in 1D, 2D and 3D spaces, which conditions some of their characteristics, in particular percolation.
In this paper, we study the properties of the sequences formed by performing random substitutions of constant length (see, for instance, [5]). Specifically, we consider a binary alphabet, which we denote as {0, 1}, and substitutions of length k = 2, 3, …. As in the classical percolation process, we assume that, once a void site is formed, i.e., a 0 occurs at this site, subsequent substitutions of it always yield k zeros. On the other hand, filled sites, i.e., sites with a 1, are substituted by random words of k letters with a (uniform) probability p of inserting a 1. In this substitution, the probability of inserting a 0 is q = 1 − p.
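As a concrete illustration of this generation rule, the whole process can be sketched in a few lines of Python (a minimal sketch; the function names are ours, not from any reference implementation):

```python
import random

def substitute(seq, k, p, rng):
    """One substitution step: 0 -> k zeros; 1 -> k independent Bernoulli(p) digits."""
    out = []
    for digit in seq:
        if digit == 0:
            out.extend([0] * k)
        else:
            out.extend(1 if rng.random() < p else 0 for _ in range(k))
    return out

def generate(i, k, p, seed=None):
    """Iterate the substitution i times, starting from the single digit 1."""
    rng = random.Random(seed)
    seq = [1]
    for _ in range(i):
        seq = substitute(seq, k, p, rng)
    return seq
```

After i iterations the sequence has length k^i; with p = 1 the rule becomes deterministic and fills every site.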
The process defined in this way gives rise to sequences of increasing size. The number of ones (filled sites) in the sequence at iteration i is a stochastic variable that depends on the probability p and the length k. We study the probabilistic properties of these sequences. Specifically, we derive the probability distribution of the number of ones, X, at iteration i. We also obtain the expected values of X and X^2, the corresponding variance VAR, and some of their properties. As an alternative measure of uncertainty, we calculate the mean entropy of these sequences and compare it with their variance [6,7]. Finally, we find the parametric curves that relate both measures for each iteration and substitution length. These curves present two regimes: for low values of p, the entropy-variance dependence is concave (down), whereas for large values of p this dependence is practically lost. Therefore, for each iteration and substitution length, we can generate sequences with the same entropy but different variance, and vice versa.

Sequence Generation
The generation of the sequences studied in this work is a one-dimensional example of the well-known Mandelbrot percolation process [4]. Contrary to the classical formulation, where an initial segment of length L is subdivided into k subsegments of length L/k and, with probability p, some of them are chosen to be further divided, the model we study here considers an initial digit, denoted by 1, that is substituted by a word of length k formed by the digits {0, 1}. As in the classical model, we assume that digits denoted by 0 are substituted by a word of k zeros (see Figure 1), whereas sites occupied by ones are substituted by a word of k letters, each of which is a 1 with independent probability p. This procedure is applied successively to generate a binary sequence of finite length after i iterations or, in the limit, of infinite length. When k = 3, this procedure can be viewed as a probabilistic version of the Cantor set for 0 < p < 1 [2].
These procedures are called substitutions, as they are obtained by replacing a digit by a word of digits [1,8,9]. In this context, we study the following map, applied at each random substitution of length k:

0 → 〈0, 0, …, 0〉,  1 → 〈Y_1, Y_2, …, Y_k〉,  (1)

where Y_j is a Bernoulli random variable with parameter p ∈ [0, 1], i.e., p is the (uniform) probability of inserting a 1 at each position, independently for all j. We note that an alternative description of this substitution process can be given in terms of the Kronecker product [10-13]. The deterministic version of this procedure brings about well-known sequences. For the classical Cantor set, this map reduces to:

0 → 〈0, 0, 0〉,  1 → 〈1, 0, 1〉,

which, starting from a unique 1, yields the sequences:

s_2 = (1, 0, 1, 0, 0, 0, 1, 0, 1)
s_3 = (1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1),

whose limit under iteration is the infinite sequence that corresponds to the classical Cantor set. For this particular case, the length of the i-th vector is L_i = 3^i. In general, for substitutions of constant length k, the i-th iterated vector has length L_i = k^i. There are other well-known sequences generated iteratively by substitutions of non-constant length. For instance, the Fibonacci sequence is generated by applying the following map [14,15]:

0 → 〈1〉,  1 → 〈1, 0〉.

Starting from 0, after six iterations we obtain the binary vector (1, 0, 1, 1, 0, 1, 0, 1). Here, the number of ones of the i-th sequence represents the fertile population of rabbits. Thus, the number of mature pairs of rabbits after six generations is 5. Notice that, in principle, the position in the sequence has no spatial meaning. It is also worth remarking that this sequence can be generated by different maps [14]. Another self-similar sequence generated by a recursive process is the so-called Morse-Thue (MT) sequence, which is obtained by the map:

0 → 〈0, 1〉,  1 → 〈1, 0〉,

starting from 0 (although it could also be initiated by 1) [14,15].
For this example, there is also an alternative way of generating the sequence: simply take the sequence of the preceding step and append its complement. Moreover, this sequence is aperiodic, although completely deterministic. Nevertheless, it can be proven that there are short- and long-range correlations [14].
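The deterministic examples above can be reproduced with a generic substitution iterator; a minimal sketch (the rule dictionaries encode the maps quoted in the text, and iteration counts may be offset by one depending on whether the seed is counted as the first iteration):

```python
def iterate(rules, start, n):
    """Apply a substitution map n times to a starting string of digits."""
    s = start
    for _ in range(n):
        s = "".join(rules[c] for c in s)
    return s

cantor = {"1": "101", "0": "000"}      # Cantor substitution, constant length k = 3
fibonacci = {"0": "1", "1": "10"}      # Fibonacci substitution (non-constant length)
morse_thue = {"0": "01", "1": "10"}    # Morse-Thue substitution
```

For example, `iterate(cantor, "1", 3)` reproduces the sequence s_3 above, and each Cantor iterate has length 3^i.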

Main Statistical Properties
In this section, we study the statistical properties of the sequences generated by applying rule (1) recurrently to the starting sequence formed by a single 1. A classical issue to be considered is the probability distribution of each of the letters of the alphabet. In particular, in the case of binary sequences, this problem reduces to knowing the probability distribution of, for instance, the number of ones in any population of sequences generated from a set of random rules, e.g., random substitutions.
As a starting point, the classical Bernoulli process is illustrative. This process can also be viewed as a random substitution of constant length k where both letters, 0 and 1, are substituted identically according to the rule:

0 → 〈Z_1, Z_2, …, Z_k〉,  1 → 〈Y_1, Y_2, …, Y_k〉,

where Z_j and Y_j are Bernoulli random variables with parameter p ∈ [0, 1]. At each iteration i, the length of the sequence is n = k^i, and the probability distribution of the number of ones, X, is given by the binomial distribution:

P(X_n = j) = C(n, j) p^j q^{n−j},  j = 0, 1, …, n,

where C(n, j) denotes the binomial coefficient. By applying this distribution function, the expected values of the number of ones in a sequence of length n, X_n, and of its square, respectively E_B(X_n) and E_B(X_n^2), can be calculated:

E_B(X_n) = n p,  E_B(X_n^2) = n p q + n^2 p^2,  (8)

for n = 1, 2, 3, …. Using these expressions, the variance of X reads:

VAR_B(X_n) = n p q.

Note that for the binomial distribution the variance-mean ratio is equal to 1 − p for any n.
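These binomial identities are easy to verify numerically; a quick self-contained check (the function names are ours):

```python
from math import comb

def binom_pmf(n, p, j):
    """P(X_n = j) for the binomial distribution."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

def binom_moments(n, p):
    """Mean and second moment computed directly from the distribution."""
    mean = sum(j * binom_pmf(n, p, j) for j in range(n + 1))
    second = sum(j * j * binom_pmf(n, p, j) for j in range(n + 1))
    return mean, second
```

For instance, with n = 8 and p = 0.3 the direct sums reproduce np, npq + n^2 p^2, and the variance-mean ratio q = 1 − p.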
Similarly, we can compute these moments for the sequences generated from random substitutions defined in the previous section. In this case, a recurrent formula provides this distribution for any iteration i and for any value of the substitution length k.
Let us first present the results for a substitution of length k = 2. As stated before, we start the generation of the binary sequences from a sequence with a unique one. Thus, the probability of having this initial sequence is assumed to be 1:

P(X_0 = 1) = 1,

with X_0 = (0, 1). In the first step, a sequence of length two is formed whose distribution of ones depends on the probability p (see Figure 2). To obtain the probability distribution of ones in this first step, P(X_1), we multiply P(X_0) from the left by the substitution matrix of binomial probabilities, giving rise to the vector:

P(X_1) = (q^2, 2 p q, p^2),

where X_1 = (0, 1, 2). In the next substitution, a four-digit sequence is formed and the probability distribution of ones can be computed from the previous one as follows:

P(X_2 = j) = Σ_{m=0}^{2} C(2m, j) p^j q^{2m−j} P(X_1 = m),

where X_2 = (0, 1, 2, 3, 4). The third iteration confirms the formula (see Figure 2), with X_3 = (0, 1, 2, …, 8). The general recurrent expression for the probability distribution of the number of ones in the i-th iteration is:

P(X_i) = M_i P(X_{i−1}),  (18)

where X_i = (0, 1, 2, …, 2^i) and the general substitution matrix M_i, of size (2^i + 1) × (2^{i−1} + 1), has coefficients:

(M_i)_{j,m} = C(2m, j) p^j q^{2m−j},  j = 0, …, 2^i,  m = 0, …, 2^{i−1}.

By recurrence, Formula (18) can be converted into a matrix product chain:

P(X_i) = M_i M_{i−1} ⋯ M_1 P(X_0).

Figure 4 shows both the analytical (black continuous curves) and the corresponding numerical distributions obtained from 1000 realizations. As can be seen, the distributions exhibit a peak at X = 0 whose height decreases as p increases. Besides this peak, the distributions also present other maxima whose heights depend on p, being smaller for lower values of p. This multimodality could be a consequence of the generation rule, which fosters the emergence of clusters of scalable sizes. As a matter of fact, the fractal properties of these sequences are the main subject of a forthcoming paper.
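The recurrence (in the form above) lends itself to direct computation: instead of storing M_i explicitly, each step can push the probability vector forward through the binomial kernel. A sketch under these conventions (helper names are ours):

```python
from math import comb

def next_distribution(P, k, p):
    """One step of P(X_{i+1}) = M P(X_i): given m ones, the next count is Bin(k*m, p)."""
    q = 1 - p
    n_prev = len(P) - 1
    new = [0.0] * (k * n_prev + 1)
    for m, prob in enumerate(P):
        if prob > 0.0:
            for j in range(k * m + 1):
                new[j] += prob * comb(k * m, j) * p**j * q**(k * m - j)
    return new

def distribution(i, k, p):
    """P(X_i), starting from the single digit 1, i.e., P(X_0 = 1) = 1."""
    P = [0.0, 1.0]
    for _ in range(i):
        P = next_distribution(P, k, p)
    return P
```

For i = 1 and k = 2 this returns the vector (q^2, 2pq, p^2) quoted in the text.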
The probability of generating a null sequence, i.e., one formed completely by 0s, after i substitutions, P(X_i = 0), corresponds to the first component of P(X_i). The following recurrence formula can be obtained [16]:

P(X_{i+1} = 0) = (1 − p (1 − P(X_i = 0)))^2,  (22)

for i = 1, 2, …. Note that p (1 − P(X_i = 0)) represents the probability that a single digit yields a non-null subsequence: the digit must be a 1, inserted with probability p, and the subsequent i substitutions applied to it must produce a non-null sequence. Hence 1 − p (1 − P(X_i = 0)) = q + p P(X_i = 0) is the probability that a single digit yields a null subsequence after i further substitutions. Because the substitution has length k = 2 and is formed by independent digits, this factor must be multiplied by itself, resulting in Equation (22).

Figure 4. Probability distributions of the number of ones for p = 0.5 (a), p = 0.8 (b), p = 0.9 (c) and p = 0.99 (d). The distributions for low values of p exhibit a peak at X = 0 that disappears as p tends to 1.

The first terms of this sequence are:

P(X_1 = 0) = q^2,   P(X_2 = 0) = (q + p q^2)^2 = q^2 (1 + p q)^2.

It is worth remarking that the expansions of P(X_i = 0) as polynomials in p and q = 1 − p have as coefficients the triangle read by rows of oeis-A202019 [17].
Assuming that this sequence of probabilities converges to φ, the following equation holds:

φ = (q + p φ)^2,

whose lowest solution,

φ(p) = min{1, (q/p)^2},

depends on p as depicted in Figure 5. Note that, for p below p_c = 1/2, the probability of generating a null infinite sequence is one, i.e., φ(p) = 1 for p < p_c.
Equation (22) can be generalized straightforwardly to any substitution length k using the same argument as above (see [16]). Now, the probability of having a null sequence at iteration i + 1 is obtained after multiplying k times the factor 1 − p (1 − P(X_i = 0)):

P(X_{i+1} = 0) = (1 − p (1 − P(X_i = 0)))^k.

As before, in the limit this equation converges to:

φ = (q + p φ)^k,

which also exhibits a critical value of p:

p_c(k) = 1/k.

As can be observed in Figure 5, as the substitution length k increases, the probability of generating a null sequence decreases for p > p_c(k); so does the range where this probability is one, since p_c(k) → 0 as k → ∞.
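The null-sequence recurrence is a one-liner to iterate; the sketch below also checks it against the k = 2 fixed point (q/p)^2 derived above (a minimal sketch; the function name is ours):

```python
def null_prob(i, k, p):
    """P(X_i = 0) from the recurrence phi_{i+1} = (1 - p*(1 - phi_i))**k."""
    phi = 0.0  # the starting sequence is the single digit 1, hence never null
    for _ in range(i):
        phi = (1.0 - p * (1.0 - phi)) ** k
    return phi
```

Iterating long enough approximates the limit φ(p); below p_c = 1/k the iteration converges to 1, i.e., the infinite sequence is almost surely null.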

Expected Values and Variance
Having the probability distributions of ones at hand, it is straightforward to compute the expected value after i iterations:

E(X_i) = Σ_j j P(X_i = j).

This equation can be written in matrix form as E(X_i) = X_i · P(X_i). Now, by applying Formula (18), it turns out that

E(X_i) = X_i M_i P(X_{i−1}) = E_B(X_i) · P(X_{i−1}),

where by E_B(X_i) we refer to the vector whose m-th entry, m = 0, 1, …, 2^{i−1}, is the expectation of a binomial random variable Y_m ~ Bin(2m, p); that is, E_B(X_i) = 2p X_{i−1}. Then the following recurrence formula holds:

E(X_{i+1}) = 2p E(X_i),

for i = 0, 1, 2, …, with E(X_1) = 2p. This recurrence yields the expression:

E(X_i) = (2p)^i,  (37)

for i = 0, 1, 2, …. This formula can be straightforwardly generalized to any length k:

E(X_i) = (kp)^i.

The mean of the distribution of zeros in the sequence formed after i substitutions of length k is k^i − (kp)^i, which yields the ratio #1/#0 at the i-th iteration:

μ10_i(p) = (kp)^i / (k^i − (kp)^i) = p^i / (1 − p^i).

This ratio tends to 0 as i → ∞ for all 0 ≤ p < 1 and for all k. Furthermore, the limit of μ10_i(p) as p → 1 is infinite for all values of i = 1, 2, …. Note that this ratio is one for p_{μ=1} = 2^{−1/i}, independently of k. Similarly, the expected value of X^2, E(X^2), can be calculated. First, for the case k = 2 and the first iteration we obtain:

E(X_1^2) = 2pq + 4p^2 = 2p(1 + p).

For the second iteration, using the recurrence of the probability distributions for successive iterations, we compute:

E(X_2^2) = 4p^2 E(X_1^2) + 2pq E(X_1),

which simplifies to:

E(X_2^2) = 4p^2 (2p^2 + p + 1).

Here we have used the expected value of X^2 for the binomial distribution of the number of ones in a sequence of length n, Equation (8).
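The closed form for the mean makes the ratio #1/#0 easy to evaluate; the following sketch checks the k-independence of the point where the ratio equals one (function names are ours):

```python
def mean_ones(i, k, p):
    """E(X_i) = (k p)^i for the substitution process."""
    return (k * p) ** i

def ratio_ones_zeros(i, k, p):
    """Expected ones over expected zeros: (kp)^i / (k^i - (kp)^i) = p^i / (1 - p^i)."""
    ones = mean_ones(i, k, p)
    return ones / (k ** i - ones)
```

Evaluating at p = 2^(−1/i) gives a ratio of exactly one for any substitution length k, as stated above.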
In order to calculate the expected value of X^2 for successive iterations, we find the following recursive formula:

E(X^2_{i+1}) = 4p^2 E(X^2_i) + 2pq (2p)^i,  (45)

which can be derived using Equation (8) for the expected value of X^2 in the binomial case. Effectively,

E(X^2_{i+1}) = X^2_{i+1} · P(X_{i+1}) = X^2_{i+1} M_{i+1} P(X_i) = E_B(X^2_{i+1}) · P(X_i),

where X^2_i = (0, 1, 2^2, 3^2, …, (2^i)^2). As before, E_B(X^2_{i+1}) represents the vector with entries E(Y_m^2), where Y_m ~ Bin(2m, p) for m = 0, 1, 2, …, 2^i; its m-th entry is therefore 2mpq + 4m^2 p^2. Since this vector can be split into two vectors:

E_B(X^2_{i+1}) = 2pq X_i + 4p^2 X^2_i,  (49)

Equation (45) holds. It can be proven that the expected value of X^2 for the i-th iteration depends on p as follows:

E(X^2_i) = (2p)^{2i} + q (2p)^i Σ_{j=0}^{i−1} (2p)^j,

for i = 1, 2, 3, …. If we replace the geometric sum by its closed form, we obtain:

E(X^2_i) = (2p)^{2i} + q (2p)^i ((2p)^i − 1)/(2p − 1).

This expression can be applied to calculate the variance of the distribution of the number of ones in a sequence generated after i iterations, VAR(X_i) = E(X^2_i) − E(X_i)^2, which, using the expression of the mean (37), simplifies to:

VAR(X_i) = q (2p)^i ((2p)^i − 1)/(2p − 1).  (53)

These computations can be equally performed for any length k = 3, 4, …. A general formulation for any value of k yields the following functions of p for the expected values of X and X^2 at the i-th iteration:

E(X_i) = (kp)^i,  E(X^2_i) = (kp)^{2i} + q (kp)^i ((kp)^i − 1)/(kp − 1).

Then, the variance is straightforwardly calculated and yields:

VAR_i(p, k) = q (kp)^i ((kp)^i − 1)/(kp − 1).

The limits as i tends to infinity of the mean μ and the variance VAR depend on the value of p. For 0 < p < 1/k, these limits equal 0, whereas for 1/k < p < 1, the limits are infinite. Contrary to the Bernoulli sequences, the variance reaches a maximum at an intermediate value of p that depends on both k and i (see Figure 6). An interesting index that provides information about the stochasticity of the generation process is the variance-mean ratio, also known as the dispersion index, D.
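The closed form for the variance can be cross-checked against the moment recurrences themselves; a sketch, with a guard for the removable singularity at kp = 1 (function names are ours):

```python
def var_closed(i, k, p):
    """Closed form VAR_i = q (kp)^i ((kp)^i - 1) / (kp - 1)."""
    q, kp = 1 - p, k * p
    if abs(kp - 1) < 1e-12:            # the geometric sum degenerates to i terms
        return q * i
    return q * kp**i * (kp**i - 1) / (kp - 1)

def var_recurrence(i, k, p):
    """VAR via E(X^2)_{i+1} = k^2 p^2 E(X^2)_i + k p q (kp)^i and E(X)_i = (kp)^i."""
    q = 1 - p
    S, E = 1.0, 1.0                    # second moment and mean at iteration 0
    for _ in range(i):
        S = k * k * p * p * S + k * p * q * E
        E = k * p * E
    return S - E * E
```

Both routes agree for any i, k and p, and the variance vanishes at p = 1, as it must.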
For iteration i, it can be written as:

D_i(p, k) = VAR_i(p, k) / E(X_i) = q ((kp)^i − 1)/(kp − 1).

By comparison with the Poisson distribution, whose dispersion index equals 1, a process with a D-value lower than 1 is said to be "under-dispersed"; on the contrary, a D-value larger than 1 reflects an "over-dispersed" stochasticity. Since D_i(0, k) = 1, the derivative of D_i(p, k) is positive at p = 0 for all i > 1 and, furthermore, D_i(1, k) = 0, there exists a value p_1 = p_1(k, i) such that D_i(p_1, k) = 1. This value of p is unique because D_i(p, k) exhibits a unique maximum in (0, 1). Moreover, p_1 is very close to 1 for all k and i and tends to 1 as i tends to infinity, meaning that under-dispersion only occurs when the probability of inserting a 0 is very low.
As observed in Figure 6, the standard deviation of X exhibits a maximum at p = p_m. This maximum exists for any k and i, and it is achieved at the value of p such that:

d VAR_i(p, k)/dp = 0.

Writing the variance as the finite sum VAR_i(p, k) = q Σ_{j=i}^{2i−1} (kp)^j, this condition can be written as:

Σ_{j=i}^{2i−1} k^j p^{j−1} (j − (j + 1) p) = 0,

for any k and for each iteration i. Dividing by the lowest-order term, this amounts to finding the roots of the polynomial:

r_{k,i}(p) = Σ_{j=i}^{2i−1} (kp)^{j−i} (j − (j + 1) p).

This equation cannot be explicitly solved for p for arbitrary values of k and i. Nonetheless, the roots of the polynomial r_{k,i}(p) can be obtained numerically. For each substitution length, k = 2, 3, …, we are able to fit the roots of r_{k,i}(p) to a function of i of the form:

p_m(i) ≈ 1 − α(k) i^{−β(k)}.

The values of the fitting parameters α(k) and β(k), as well as the corresponding residual sum of squares, are shown in Table 1. As can be observed, as the k-value increases, it seems that α(k) → 0.5 and β(k) → 1, that is:

p_m(i) → 1 − 1/(2i).

This suggests that the roots of the polynomials, i.e., the values of p where the variance of X achieves its maximum, tend to 1 as i tends to infinity. In addition, the variance of X at the maximum tends to infinity although, by definition, the variance is null at p = 1. We do not have a rigorous explanation for the near independence of the roots of r_{k,i}(p) from k; it seems likely to be related to the self-similar properties of the generation process.
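The location p_m of the variance maximum can also be obtained numerically without forming the polynomial explicitly, e.g., by a grid search over the closed-form variance (a sketch; the helper names and the grid resolution are ours):

```python
def var_i(i, k, p):
    """Closed-form variance q (kp)^i ((kp)^i - 1)/(kp - 1), with the kp = 1 limit."""
    q, kp = 1 - p, k * p
    if abs(kp - 1) < 1e-12:
        return q * i
    return q * kp**i * (kp**i - 1) / (kp - 1)

def p_max(i, k, n=20001):
    """Grid search on [0, 1] for the maximizer p_m of VAR_i(p, k)."""
    return max((j / (n - 1) for j in range(n)), key=lambda p: var_i(i, k, p))
```

Consistently with the fitted roots, p_m(i) increases toward 1 as i grows; for i = 2 and k = 2 the maximizer is the positive root of 8p^2 − 3p − 2 = 0, approximately 0.7215.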

Entropy
In this section, we study the mean entropy of a population of sequences generated from the substitution rules defined in the previous sections. As an illustration, we first consider the entropy of a Bernoulli process of probability p:

H(p) = −p log_2 p − q log_2 q.

This function is symmetric and exhibits a maximum at p = 1/2. If we now consider sequences formed by n stochastic variables that follow a Bernoulli process, we can compute the entropy of a sequence of length n with j ones in terms of the frequency j/n, which coincides, on average, with p:

H(j/n) = −(j/n) log_2(j/n) − (1 − j/n) log_2(1 − j/n).  (67)

Obviously, the sequences with minimum information content, j/n = 0 and j/n = 1, have null entropy, i.e., H(0) = H(1) = 0. In contrast, the maximum entropy is achieved for j = n/2 and takes the value H(1/2) = 1. For a population of Bernoulli sequences of probability p, the mean entropy can be calculated by applying the binomial distribution to the entropies of the sequences generated with probability p:

⟨H⟩_n = H · P = Σ_{j=0}^{n} C(n, j) p^j q^{n−j} H(j/n),

where H is the row vector of entropies for a sequence of length n, whose components H(j/n) are given by Equation (67) for all possible numbers of ones. For instance, for n = 2, this formula reduces to:

⟨H⟩_2 = q^2 H(0) + 2pq H(1/2) + p^2 H(1) = 2pq,

since the entropies of sequences with 0, 1 and 2 ones are H(0) = 0, H(1/2) = log_2 2 = 1 and H(1) = 0. The mean entropy for larger lengths n can be calculated in the same way.

To compute the mean entropy of a population of sequences randomly generated by substitutions according to the rules defined in Section 2, we have to know the probability of appearance of each frequency of ones in order to weight the sequence entropies adequately. As an example, let us start with the substitution length k = 2. In this case, the mean entropy at iteration i is

⟨H⟩_i = H · P(X_i),  (71)

where the components of H are the entropies H(j/2^i), and the difference between the mean entropies of successive iterations, h_i(p) = ⟨H⟩_{i+1} − ⟨H⟩_i, presents an interesting behaviour, as can be seen in Figure 7b. There is a critical value that separates two regimes: for p < p_ch(i), h_i(p) is negative, whereas for p > p_ch(i), h_i(p) is positive.
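The binomially weighted mean entropy is straightforward to evaluate; a sketch that reproduces the ⟨H⟩_2 = 2pq value above (function names are ours):

```python
from math import comb, log2

def H(x):
    """Binary entropy of a frequency x in [0, 1], in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def mean_entropy(n, p):
    """<H>_n = sum_j C(n, j) p^j q^(n-j) H(j/n) for Bernoulli(p) sequences."""
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) * H(j / n) for j in range(n + 1))
```

Note that the mean entropy is strictly below 1 even at p = 1/2, since sequences with j ≠ n/2 contribute entropies smaller than one bit.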
As can be seen, p_ch(i) depends on the iteration: as i → ∞, p_ch(i) → 1. As a consequence, the p-interval where h_i(p) is positive shrinks to 0. Besides, h_i(p) tends to 0 for all p, which means that the entropy converges to the function H_∞(p) of the infinite sequence. Figure 7. Difference h_i(p) between the mean entropies of successive iterations for substitution length k = 2: h_i(p) is positive for p > p_ch(i); as the iteration increases, the interval of p where h_i is positive shrinks and, in the limit, tends to 0; furthermore, h_i tends to 0 as i → ∞, indicating convergence to a limit entropy for the infinite sequence.
Because both variance and entropy are measures of uncertainty, it is interesting to look for a functional dependence between them (see Figures 7 and 8). The curves, shown for the case k = 2, can be parametrized either by p, for each iteration i, or by i, for each p. The parametric equations for each p and i are given by Equations (53) and (71) for the variance and the entropy, respectively. As can be observed, the relationship between H and VAR depends on p for each iteration i. It is important to remark that, contrary to what is known for classical probability distributions [6,7], the dependence of H on VAR is not the graph of a function. Instead, the variation of p ∈ [0, 1] brings about a closed curve. Indeed, for p < p_r, this functional dependence coincides with that of the classical probability distributions [6]. On the contrary, for p > p_r, where both variables decrease with p, we find an almost linear dependence between them. Figure 8 depicts the complete parametric curves for the first iterations, obtained analytically.
An interesting consequence of this dependence is that, for the same variance, we can find a value of p that maximizes the entropy. This can be done for any iteration i > 1 (recall that for i = 1, H(p) = VAR(p) for all p ∈ [0, 1]). Note that the dispersion index D of these sequences would differ, because the mean changes between the two values of p. As an example, we find two values of p with a similar variance, VAR(0.85) ≈ VAR(0.99) ≈ 8700, but with mean entropy values H(0.85) = 0.67 and H(0.99) = 0.41. Obviously, we can also find values of p that give rise to sequences with the same entropy but different variance. For example, a similar mean entropy is achieved for p = 0.79 and p = 0.99, whereas the variances of these populations are very distant: VAR(0.79) ≈ 3368 and VAR(0.99) ≈ 8741. Figure 8. Parametric H-VAR curves in terms of p for the first 10 iterations. All these curves can be split into two regimes: (i) for 0 < p < p_r, where H has a concave (down) dependence on VAR, and (ii) for p > p_r, where both variables decrease with p almost linearly. The value of p_r corresponds to the point of the curve farthest from the origin; the dashed line connects these points. Due to the large value of the variance in comparison with the corresponding entropy values, p_r tends to coincide with the value where the maximum of VAR occurs. For a given iteration i, we can thus find two values of p that yield sequences with the same variance but different entropies, one of them having the maximum value. The substitution length is k = 2.
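The numerical example above can be reproduced from the exact distribution; a self-contained sketch for k = 2 and i = 10, which matches the VAR ≈ 8700 figures quoted (the helper names are ours, and the distribution recurrence is re-derived here for self-containment):

```python
from math import comb, log2

def distribution(i, p):
    """P(X_i) for k = 2: conditioned on m ones, the next count is Bin(2m, p)."""
    P = [0.0, 1.0]                     # start from the single digit 1
    for _ in range(i):
        new = [0.0] * (2 * (len(P) - 1) + 1)
        for m, prob in enumerate(P):
            if prob > 0.0:
                for j in range(2 * m + 1):
                    new[j] += prob * comb(2 * m, j) * p**j * (1 - p)**(2 * m - j)
        P = new
    return P

def entropy_and_var(i, p):
    """Mean entropy H.P(X_i) and the variance of the number of ones."""
    P = distribution(i, p)
    n = len(P) - 1
    ent = mean = second = 0.0
    for j, prob in enumerate(P):
        x = j / n
        if 0.0 < x < 1.0:
            ent += prob * (-x * log2(x) - (1 - x) * log2(1 - x))
        mean += j * prob
        second += j * j * prob
    return ent, second - mean * mean
```

Evaluating at p = 0.85 and p = 0.99 for i = 10 yields comparable variances but clearly different mean entropies, as described in the text.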

Concluding Remarks
Motivated by the classical percolation process defined by Mandelbrot [4], we have studied a class of random substitutions of constant length that map 0 → 〈0, 0, …, 0〉 and 1 → 〈Y_1, Y_2, …, Y_k〉, where Y_j is a Bernoulli random variable with parameter p ∈ [0, 1]. Contrary to the classical Bernoulli process, this is an asymmetric rule that yields infinite sequences with an average ratio #1/#0 that tends to 0 for any 0 ≤ p < 1. By applying this substitution map, we have randomly generated binary sequences whose length increases with the substitution length k as k^i for iterations i = 0, 1, 2, …. Specifically, we have analyzed the combinatorial, statistical and entropic properties of this substitution process starting from a sequence formed by the digit 1.
This asymmetry is reflected in the mass distribution of the number of ones, specifically with regard to its uncertainty. We have computed the variance of this distribution and the entropy of the ensemble of sequences generated at each iteration as a function of p. The relationship between these two magnitudes is depicted in the H-VAR curves of Figure 8. It is interesting to remark the two regimes that appear in these curves, i.e., a concave (down) dependence for p < p_r and an almost linear decrease for p > p_r, which, to the best of our knowledge, have not been shown before [6]. This allows us to compare sequences generated with different values of H but the same variance and, conversely, sequences that have the same entropy but different variance. We leave for a forthcoming paper the study of other properties of the generated sequences such as the distribution of the sizes of the sets of ones and their fractality, measured by the Hausdorff dimension.
The same kind of substitution rules can also be applied in the plane or in space to generate sets with specific properties. In particular, random substitutions have been used to study percolation properties of 2D fractal objects [2,4]. Using an iterated function system [3], random fractals are generated as a function of a probabilistic control parameter, e.g., the probability that any site is occupied, and their percolation characteristics are quantified in terms of this parameter.