Low-Entropy Stochastic Processes for Generating k-Distributed and Normal Sequences, and the Relationship of These Processes with Random Number Generators

Abstract: An infinite sequence x_1 x_2 ... of letters from some alphabet {0, 1, ..., b − 1}, b ≥ 2, is called k-distributed (k ≥ 1) if any k-letter block of successive digits appears with the frequency b^{−k} in the long run. The sequence is called normal (or ∞-distributed) if it is k-distributed for any k ≥ 1. We describe two classes of low-entropy processes that, with probability 1, generate either k-distributed or ∞-distributed sequences. We then show how these processes can be used to build random number generators whose outputs are either k-distributed or ∞-distributed; thus, these generators have mathematically proven statistical properties.


Introduction
In 1909, Borel defined k-distributed and ∞-distributed sequences as follows: a sequence of digits in base b is k-distributed if, for any k-letter word w over the alphabet {0, 1, ..., b − 1},

lim_{t→∞} ν_t(w)/(t − |w|) = b^{−|w|},   (1)

where ν_t(w) is the number of occurrences of w among the words x_1...x_{|w|}, x_2...x_{|w|+1}, ..., x_{t−|w|+1}...x_t. The sequence is normal (or ∞-distributed) if it is k-distributed for any k ≥ 1. Borel called a real number from the interval (0, 1) normal to base b if its expansion in base b is a normal sequence, and showed that almost all real numbers are normal to any base (with respect to the uniform measure) [1,2]. Interestingly, the first explicit construction of an ∞-distributed sequence was given by Champernowne in 1933 [3], who proved that the sequence 0 1 2 ... 9 10 11 12 ... 99 100 101 102 ...
is ∞-distributed. Later, many ∞-distributed sequences were described and investigated in numerous papers (see [4] for a review). Many researchers conjecture that the fractional parts of π, e, √2, and some other "mathematical" constants are normal, but this has not been proven [2,5]. On the other hand, for π, empirical counting over several billion of its digits suggests that this might be true (see [5,6]).
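Borel's condition in Equation (1) is easy to check empirically. The following sketch (the function name, the choice of words, and the length of the Champernowne prefix are our illustrative choices, not taken from the paper) counts overlapping occurrences ν_t(w) in a prefix of Champernowne's base-10 sequence:

```python
def nu(w, x):
    """nu_t(w): number of occurrences of w among the overlapping words of x."""
    return sum(1 for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w)

# Champernowne's sequence in base 10: concatenate 0, 1, 2, ..., 99, 100, ...
champernowne = "".join(str(n) for n in range(100000))
t = len(champernowne)

freq_7 = nu("7", champernowne) / (t - 1)
freq_42 = nu("42", champernowne) / (t - 2)
print(freq_7, freq_42)  # close to 10**-1 and 10**-2, respectively
```

For a finite prefix the frequencies only approximate b^{−|w|}; the convergence in Equation (1) holds in the limit t → ∞.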
One of the reasons for interest in k- and ∞-distributed sequences is that they are closely related to the concept of randomness. If someone tosses a fair coin with sides marked 0 and 1, he/she obtains (almost surely) an ∞-distributed sequence [2,5]. A mathematical model of such an experiment is a sequence of independent and identically distributed (i.i.d.) symbols from {0, 1} generated with probabilities (1/2, 1/2). Note that quite often this i.i.d. process and the sequences it generates are called "true random" [2].
True random sequences are very desirable in cryptography, simulation, and modeling applications. Of course, it is practically impossible to generate them by tossing a coin, and nowadays there are many so-called pseudo-random number generators (PRNGs), whose aim is, informally speaking, to calculate sequences which mimic truly random ones (see [2,7–10]). For brevity, in what follows, we consider the case when a process generates letters from the alphabet {0, 1}, but the obtained results can be extended to any alphabet.
A modern PRNG is a computer program whose input is a short word (a so-called seed) and whose output is a long (compared to the input) word. Assuming the seed is a true random word, the PRNG can be considered an expander of randomness which stretches a short seed into a long word [2,7,10]. A "perfect" PRNG would have to produce a truly random output sequence; however, this is impossible.
To be more precise, we note that a mathematically correct definition of a random sequence was obtained in the framework of algorithmic information theory established by Kolmogorov (see [11–15]). In particular, it is shown there that no algorithm (i.e., Turing machine) can generate an (infinite) random sequence or stretch a short random sequence into a longer one. This means that perfect PRNGs do not exist. The same is true in the framework of Shannon information theory. Indeed, it is known that the Shannon entropy of the true random process (i.e., i.i.d. with probabilities (1/2, 1/2)) is one bit per letter, whereas for all other processes the entropy (per letter) is less than one (see [16]). On the other hand, any PRNG stretches a short true random sequence into a long one. The entropy of the output is not greater than the entropy of the input and, hence, the per-letter entropy of the output is strictly less than 1 bit. Therefore, the demands of true randomness and low entropy are contradictory. Thus, in the framework of algorithmic information theory, as well as in that of Shannon information theory, "perfect" PRNGs do not exist.
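The entropy bound above can be made concrete. A minimal sketch (the function name binary_entropy is ours) computes the per-letter Shannon entropy of an i.i.d. binary source; only the fair coin reaches one bit per letter:

```python
import math

def binary_entropy(p):
    """Shannon entropy (bits per letter) of an i.i.d. binary process with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(binary_entropy(0.5))  # the true random process: 1 bit per letter
print(binary_entropy(0.1))  # a strongly biased source: well below 1 bit
```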
In such a situation, researchers suggest and investigate PRNGs, which meet some "probabilistic" properties of true random sequences [2,17]. In particular, a property that a PRNG generates ∞-distributed sequences is very desirable (see [2]).
Another important type of random number generator (RNG) is the physical random number generator, among which the so-called quantum random number generators (QRNGs) have become very popular in recent decades and are widely used in practice. By definition, physical RNGs are devices whose output is a binary sequence that must be truly random (or at least look truly random) (see [10]). According to M. Herrero-Collantes and J.C. Garcia-Escartin [10], a physical RNG can be divided into two blocks: the entropy source and the post-processing stage. The output of the entropy source is a bit string obtained by measuring a physical random process with subsequent quantization. The goal of post-processing is to translate this bit string into a truly random binary sequence. Nowadays, there are many methods of post-processing [10]; nevertheless, the statistical (probabilistic) properties of many physical RNGs and, in particular, QRNGs, are not proven mathematically and must be tested experimentally [10,18,19]. Even the so-called device-independent QRNGs guarantee only the randomness of their output, and true randomness must either be verified or obtained by post-processing [10]. Thus, transformations that turn the output into a normal sequence are desirable for all types of RNGs.
Here, we describe random processes whose entropy is much less than 1 but for which Equation (1) is valid for the generated sequences, either for all integers k or for k from an interval 1, ..., K, where K is an integer. This shows that there exist low-entropy PRNGs which generate sequences for which Equation (1) is valid (for b = 2). The description of the suggested processes shows that they can be used to develop PRNGs with the property in Equation (1). The described processes are generalizations of the so-called two-faced processes suggested in [20–22].
Specifically, we propose the following two processes. First, we describe a k-order Markov chain, the so-called two-faced process of order k, k ≥ 1, for which, with probability one, for any generated sequence x_1 x_2 ... and all binary words w ∈ {0, 1}^k, the frequency of occurrence of w among the words x_1...x_{|w|}, x_2...x_{|w|+1}, ..., x_{t−|w|+1}...x_t goes to 2^{−|w|} as t → ∞. Secondly, we describe so-called normal two-faced processes, for which this property is true for all k.
We also propose the so-called two-faced transformation, which translates the trajectories of any random process into trajectories of a two-faced process. This transformation is applicable to the creation of PRNGs with proven statistical properties.

K-Distributed Sequences and Two-Faced Markov Chains
First, we consider a pair of examples in order to explain the main idea of the considered Markov chains. Let the matrix of transition probabilities T be as follows:

T = ( α_0  α_1
      α_1  α_0 ),   (2)

where α_0 and α_1 are non-negative and their sum equals 1 (i.e., α_0 + α_1 = 1). For example, let α_0 = 0.9, α_1 = 0.1. Then, a "typical" output sequence consists of long runs of identical letters separated by seldom transitions. On the one hand, such a sequence is clearly not true random. On the other hand, the frequencies of 1s and 0s go to 1/2 due to the symmetry of the matrix in Equation (2). Hence, the output is 1-distributed. Again, based on symmetry, we can build a second-order matrix, whose rows are indexed by the states 00, 01, 10, 11 and which is filled with α_0 and α_1 analogously, whose output is 2-distributed. For α_0 = 0.9, α_1 = 0.1, a "typical" output sequence again consists of long runs separated by seldom transitions, and it can be easily seen that the frequency of any two-letter word goes to 1/4.
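A small simulation may help to illustrate the first example. Assuming the symmetric matrix with α_0 = 0.9 (so the previous letter is repeated with probability 0.9; the function name and sample size are our illustrative choices), the sampled sequence shows long runs while the overall frequency of 1s still approaches 1/2:

```python
import random

def two_faced_order1(alpha0, n, seed=0):
    """Sample n letters from the symmetric first-order chain:
    repeat the previous letter with probability alpha0."""
    rng = random.Random(seed)
    x = [rng.randrange(2)]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < alpha0 else 1 - x[-1])
    return x

x = two_faced_order1(0.9, 200000)
print(sum(x) / len(x))  # close to 1/2 in spite of the long runs
```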
Let us give a formal definition of two-faced Markov chains. First, we define two families of random processes, T_{k,p} and T̄_{k,p}, where the integer k ≥ 1 and p ∈ (0, 1) are parameters. The processes T_{k,p} and T̄_{k,p} are Markov chains of connectivity (memory) k which generate letters from the binary alphabet. We define them inductively. The matrices of T_{1,p} and T̄_{1,p} are defined as follows:

T_{1,p}(0/0) = p, T_{1,p}(1/0) = 1 − p, T_{1,p}(0/1) = 1 − p, T_{1,p}(1/1) = p,   (4)

T̄_{1,p}(0/0) = 1 − p, T̄_{1,p}(1/0) = p, T̄_{1,p}(0/1) = p, T̄_{1,p}(1/1) = 1 − p.   (5)

Let the transition matrices T_{k,p} and T̄_{k,p} be defined; then, T_{k+1,p} and T̄_{k+1,p} are as follows:

T_{k+1,p}(w/0u) = T_{k,p}(w/u), T_{k+1,p}(w/1u) = T̄_{k,p}(w/u),   (6)

T̄_{k+1,p}(w/0u) = T̄_{k,p}(w/u), T̄_{k+1,p}(w/1u) = T_{k,p}(w/u),

for all w ∈ {0, 1} and u ∈ {0, 1}^k (here, au denotes the concatenation of the letter a and the word u). To describe the process completely, the initial probability distribution should be defined. We say that the initial distribution of T_{k,p} and T̄_{k,p} is uniform if, for all w ∈ {0, 1}^k, P{x_1...x_k = w} = 2^{−k}. Sometimes, we consider different initial distributions, which is why, in all cases, the initial distribution is mentioned explicitly.
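The inductive definition can be sketched in code. The following is our reading of the construction of T_{k,p} and T̄_{k,p} (the dictionary representation and the function name build_T are ours): contexts beginning with 0 inherit the order-k matrix, and contexts beginning with 1 inherit its "twin":

```python
def build_T(k, p):
    """Build T_{k,p} and Tbar_{k,p} as dicts mapping (letter, context) -> prob.

    Base case (order 1): T(w/v) = p if w == v else 1 - p; Tbar swaps p and 1 - p.
    Induction: T_{k+1}(w/0u) = T_k(w/u), T_{k+1}(w/1u) = Tbar_k(w/u),
    and vice versa for Tbar_{k+1}.
    """
    T = {(w, v): (p if w == v else 1 - p) for w in "01" for v in "01"}
    Tb = {(w, v): 1 - pr for (w, v), pr in T.items()}
    for _ in range(k - 1):
        T2, Tb2 = {}, {}
        for (w, v), pr in T.items():
            T2[(w, "0" + v)] = pr   # context 0u keeps T_k
            Tb2[(w, "1" + v)] = pr
        for (w, v), pr in Tb.items():
            T2[(w, "1" + v)] = pr   # context 1u takes Tbar_k
            Tb2[(w, "0" + v)] = pr
        T, Tb = T2, Tb2
    return T, Tb

T2, _ = build_T(2, 0.9)
for v in ("00", "01", "10", "11"):
    print(v, T2[("0", v)], T2[("1", v)])
```

Each row of the resulting matrix sums to 1, and each column of transition probabilities contains p and 1 − p equally often, which is the symmetry behind Theorem 1.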
Let µ be a stationary process. Its conditional Shannon entropy of order m, m = 1, 2, ..., is defined as

h_m(µ) = − Σ_{u∈{0,1}^m} P(u) Σ_{a∈{0,1}} P(a/u) log_2 P(a/u),

and the limit entropy is

h_∞(µ) = lim_{m→∞} h_m(µ);

see [16].
The main properties of the Markov chains T_{k,p} and T̄_{k,p}, k ≥ 1, are described by the following theorem.

Theorem 1. Let x_1 x_2 ... be generated by T_{k,p} (or T̄_{k,p}), k ≥ 1, and let w ∈ {0, 1}^k. Then:

(i) If the initial distribution is uniform over {0, 1}^k, then, for any j ≥ 0,

P{x_{j+1}...x_{j+k} = w} = 2^{−k}.   (9)

(ii) For any initial distribution of the Markov chain T_{k,p} (or T̄_{k,p}),

lim_{j→∞} P{x_{j+1}...x_{j+k} = w} = 2^{−k}.   (10)

(iii) With probability one, the Markov chains T_{k,p} and T̄_{k,p} generate k-distributed sequences.
The proof is given in Appendix A.
Having taken into account this theorem, we give the following definition.

Definition 1. If Equation (10) is valid for any w ∈ {0, 1}^k, the process is asymptotically two-faced of order k. The process is two-faced of order k if Equation (9) is true.
It turns out that, in a certain sense, there are many two-faced processes. More precisely, the following theorem is true.

Theorem 2. Let X = x_1 x_2 ... and Y = y_1 y_2 ... be random processes. We define the process Z = z_1 z_2 ... by the equations z_1 = x_1 ⊕ y_1, z_2 = x_2 ⊕ y_2, ..., where a ⊕ b = (a + b) mod 2. If X is a k-order (asymptotically) two-faced process, then Z is also a k-order (asymptotically) two-faced process (k ≥ 1).
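Theorem 2 can be checked empirically for k = 1: XORing a two-faced sequence with an arbitrary, heavily biased sequence leaves the letter frequencies balanced. A minimal sketch (all parameters are our illustrative choices):

```python
import random

rng = random.Random(1)
n = 200000

# a two-faced chain of order 1 (repeat with probability 0.9): 1-distributed
x = [rng.randrange(2)]
for _ in range(n - 1):
    x.append(x[-1] if rng.random() < 0.9 else 1 - x[-1])

# an arbitrary, heavily biased i.i.d. process Y
y = [1 if rng.random() < 0.8 else 0 for _ in range(n)]

# Z = X xor Y (letter-wise), as in Theorem 2
z = [a ^ b for a, b in zip(x, y)]
print(sum(z) / n)  # the frequency of 1s in Z stays near 1/2
```

Note that Y itself is far from 1-distributed; the balance of Z is inherited entirely from X.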
The proof is given in Appendix A.

Two-Faced Transformation
Now, we show that any stochastic process can be transformed into a two-faced one. For this purpose, we describe transformations which transfer random processes into two-faced ones. First, we define matrices M_k and M̄_k, k ≥ 1, which are based on the matrices T_{k,p} and T̄_{k,p}.

Definition 2. The matrix M_k is defined by the following equation:

M_k(w, v) = 0 if T_{k,p}(w/v) = p, and M_k(w, v) = 1 if T_{k,p}(w/v) = 1 − p,

for any k ≥ 1, v ∈ {0, 1}^k, w ∈ {0, 1}. The matrix M̄_k is obtained from T̄_{k,p} analogously. Note that, from Equation (6), we obtain M_{k+1} = (M_k M̄_k).

Definition 3. The two-faced transformation τ_k maps a sequence X = x_1 x_2 ... and an initial word v = y_{−k+1}...y_0 ∈ {0, 1}^k into the sequence Y = y_1 y_2 ..., where

y_{r+1} = M_k(x_{r+1}, y_{r−k+1}...y_r), r ≥ 0.   (12)
Theorem 3. Let k ≥ 1 be an integer, let X = x_1 x_2 ... be generated by a stochastic process, and let τ_k be the two-faced transformation. If v is uniformly distributed on {0, 1}^k, then, for any u ∈ {0, 1}^k and r ≥ 1,

P{y_{r+1}...y_{r+k} = u} = 2^{−k},   (15)

i.e., τ_k(X, v) is a two-faced process of order k. The proof is given in Appendix A.
Consider now the complexity of the described transformation τ_k, which allows transforming any process into a two-faced one. When directly implementing the transformation τ_k, one must store the matrix M_k of 2 rows and 2^k columns, i.e., 2^{k+1} numbers in total. Storing such matrices becomes impossible when k exceeds hundreds. Therefore, the question arises of constructing a simpler algorithm that does not require an exponential growth of memory with increasing k. It turns out that there exists an algorithm which requires O(k) bits of memory and a constant number of operations per output letter.
Define H_r = y_{r−k+1} ⊕ y_{r−k+2} ⊕ ... ⊕ y_r and let

τ*_k(x_{r+1}, y_{r−k+1}y_{r−k+2}...y_r) = H_r ⊕ x_{r+1}.

Similarly, τ̄*_k(x_{r+1}, y_{r−k+1}y_{r−k+2}...y_r) = H̄_r ⊕ x_{r+1}, where H̄_r is defined analogously. From these definitions and Equation (12), we can see that, for any i ≥ 1, the transformations τ_k and τ*_k produce the same output letter y_i. It is important to note that there exists a simple algorithm for carrying out the transformation τ*_k. Indeed, it suffices to store the letters y_{r−k+1}y_{r−k+2}...y_r and the value H_r in the computer's memory. Then, read the letter x_{r+1}, calculate y_{r+1} = H_r ⊕ x_{r+1}, include y_{r+1} and exclude y_{r−k+1}, i.e., store the new word y_{r−k+2}y_{r−k+3}...y_{r+1}. Then, calculate the new value H_{r+1} := H_r ⊕ y_{r+1} ⊕ y_{r−k+1}, read the new letter x_{r+2}, and so on. The proof is given in Appendix A.
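The O(k)-memory algorithm just described can be sketched as follows (the function name tau_star, the choice of a biased i.i.d. input source, and the sample size are our illustrative choices). The empirical frequencies of the k-letter blocks of the output approach 2^{−k}:

```python
import random
from collections import Counter

def tau_star(x_bits, k, seed_word):
    """Sliding-parity transformation: y_{r+1} = H_r xor x_{r+1}, where H_r is
    the parity of the last k output letters; only O(k) bits are stored."""
    window = list(seed_word)        # the last k output letters
    H = 0
    for b in window:
        H ^= b                      # H_r = parity of the window
    out = []
    for xb in x_bits:
        yb = H ^ xb                 # y_{r+1} = H_r xor x_{r+1}
        out.append(yb)
        H ^= yb ^ window[0]         # H_{r+1} = H_r xor y_{r+1} xor y_{r-k+1}
        window.pop(0)
        window.append(yb)
    return out

rng = random.Random(7)
k = 3
seed_word = [rng.randrange(2) for _ in range(k)]          # uniform initial word
x = [1 if rng.random() < 0.8 else 0 for _ in range(100000)]  # biased input
y = tau_star(x, k, seed_word)

counts = Counter(tuple(y[i:i + k]) for i in range(len(y) - k + 1))
print(sorted(round(c / (len(y) - k + 1), 3) for c in counts.values()))  # near 2**-3
```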

Definition 4. If, for any binary word v, Equation (10) (respectively, Equation (9)) is valid, the process is called asymptotically normal two-faced (respectively, normal two-faced).

Let X_1 = x^1_1 x^1_2 ..., X_2 = x^2_1 x^2_2 ..., ... be binary stochastic processes. We define the process Z = z_1 z_2 ... by z_i = x^1_i ⊕ x^2_i ⊕ ... and denote it as ⊕^∞_{i=1} X_i.

Theorem 5. Let X_1, X_2, ... be stochastic processes such that, for every i, X_i is an (asymptotically) two-faced process of order n_i, where n_i → ∞ as i → ∞. Then, ⊕^∞_{i=1} X_i is an (asymptotically) normal two-faced process.

The proof is given in Appendix A. From this and Theorem 2, we can derive the following.
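The combination ⊕^∞_{i=1} X_i can be imitated by a finite sketch: here we XOR three processes of orders 1, 2, and 3, each obtained by applying the sliding-parity transformation to a low-entropy i.i.d. source (the function name, the parameter p = 0.9, and the truncation to three processes are our illustrative choices, not the construction of the paper itself). The pair frequencies of the combined sequence are nearly uniform:

```python
import random
from collections import Counter

def two_faced_seq(k, p, n, rng):
    """Order-k two-faced sequence: sliding-parity transformation applied to
    i.i.d. bits with P(1) = p, seeded with a uniform k-bit initial word."""
    window = [rng.randrange(2) for _ in range(k)]
    H = sum(window) % 2
    out = []
    for _ in range(n):
        y = H ^ (1 if rng.random() < p else 0)
        out.append(y)
        H ^= y ^ window.pop(0)
        window.append(y)
    return out

rng = random.Random(3)
n = 100000
streams = [two_faced_seq(k, 0.9, n, rng) for k in (1, 2, 3)]
z = [a ^ b ^ c for a, b, c in zip(*streams)]

pairs = Counter(tuple(z[i:i + 2]) for i in range(n - 1))
print({w: round(c / (n - 1), 3) for w, c in sorted(pairs.items())})  # near 0.25
```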

Corollary 1. If X = x_1 x_2 ... and Y = y_1 y_2 ... are stochastic processes and X is normal two-faced, then the process Z = z_1 z_2 ..., where z_i = x_i ⊕ y_i, is normal two-faced.

Note that the entropies of the processes X_1, X_2, ... can be small; hence, the entropy of the process ⊕^∞_{i=1} X_i can be arbitrarily small. On the other hand, this process looks truly random.

Experiments
Here, we present some experiments with the two-faced processes for different parameters. We compared the obtained sequences with truly random ones by applying the χ² test [23]. For this purpose, an N-letter sequence x_1 x_2 ...x_N, N = 1000, was generated, whereas the initial part x_{−k+1}...x_0 was uniformly distributed. The sequence x_1 x_2 ...x_N was split into the non-overlapping blocks x_1 x_2 ...x_k, x_{k+1} x_{k+2} ...x_{2k}, ..., and the frequency of occurrence of all words from {0, 1}^k was estimated. Then, the statistic

x² = Σ_{w∈{0,1}^k} (ν(w) − n 2^{−k})² / (n 2^{−k})

was calculated, where N = 1000, n = N/k is the number of blocks, and ν(w) is the number of occurrences of w among the blocks x_1 x_2 ...x_k, x_{k+1} x_{k+2} ...x_{2k}, .... (Note that x² estimates the deviation of the frequencies from the uniform distribution.) Then, x² was compared with the quantile χ²_{d, 0.99}, where d = 2^k − 1 (see [23]). If x² > χ²_{d, 0.99}, we rejected H_0. Table 1 contains the results of the calculations. (The entropy is equal to −(p log_2 p + (1 − p) log_2 (1 − p)).) Thus, we can see that two-faced processes can be obtained from low-entropy ones.
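The experiment can be reproduced along the following lines (a sketch with our own illustrative parameters: p = 0.7, seed 5, and the order-2 sequence is generated via the sliding-parity transformation rather than the chain itself; the quantile χ²_{3,0.99} ≈ 11.34 is the standard table value):

```python
import random
from collections import Counter
from itertools import product

def chi_square_uniform(seq, k):
    """x^2 statistic over non-overlapping k-blocks against the uniform
    distribution on {0,1}^k (d = 2**k - 1 degrees of freedom)."""
    blocks = [tuple(seq[i:i + k]) for i in range(0, len(seq) - k + 1, k)]
    expected = len(blocks) / 2 ** k
    counts = Counter(blocks)
    return sum((counts[w] - expected) ** 2 / expected
               for w in product((0, 1), repeat=k))

# an order-2 two-faced sequence from a low-entropy source (p = 0.7), N = 1000
rng = random.Random(5)
k, N, p = 2, 1000, 0.7
window = [rng.randrange(2) for _ in range(k)]
H = sum(window) % 2
seq = []
for _ in range(N):
    y = H ^ (1 if rng.random() < p else 0)
    seq.append(y)
    H ^= y ^ window.pop(0)
    window.append(y)

x2 = chi_square_uniform(seq, k)
print(x2)  # compare with the 0.99 quantile chi^2_{3,0.99} ~ 11.34
```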

Conclusions
In this paper, we describe low-entropy processes which mimic truly random ones: their output is either ∞-distributed or k-distributed for some integer k. In addition, we show how those processes can be directly used in order to construct (or "improve") PRNGs.
Funding: This work was supported by Russian Foundation for Basic Research (grant 18-29-03005).

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Proofs of Theorems
Proof of Theorem 1. We prove that the uniform distribution P{x_1...x_k = w} = 2^{−k}, w = (x_1...x_k) ∈ {0, 1}^k, is a limit (or stationary) distribution for the processes T_{k,p} and T̄_{k,p}. For this purpose, we show that it satisfies the system of equilibrium equations (Equation (A1)). Having taken into account the definitions and Equations (4) and (5), we can see that the equality T_{k,p}(x_k/0x_1...x_{k−1}) + T_{k,p}(x_k/1x_1...x_{k−1}) = 1 is valid for all (x_1...x_k) ∈ {0, 1}^k. From the law of total probability and the latter equality, we derive Equation (A1). Taking into account that the initial distribution is uniform and, hence, is the limiting one, we derive the first claim, Equation (9). Any transition probability is either p or 1 − p; hence, all of them are greater than 0. Thus, T_{k,p} is ergodic, and Equation (10) is true due to ergodicity. Let us prove Statement (iii). All transition probabilities of T_{k,p} are nonzero; hence, this Markov chain is a stationary ergodic process. Therefore, for any w ∈ {0, 1}^k, the limit lim_{t→∞} ν_t(w)/(t − |w|) equals E(P{x_{j+1}...x_{j+k} = w}) (see [24]). From this and Statement (ii), we obtain Statement (iii).
Proof of Theorem 3. We prove Equation (15) by induction on r. By the condition of the theorem, y_{−k+1}y_{−k+2}...y_0 obeys the uniform distribution on {0, 1}^k; hence, Equation (15) is true for r = 1. Supposing the equation is proven for r, let us prove it for r + 1. The matrix M_k(·, ·) has 2^k columns, each of which contains 0s and 1s. For any x, half of the elements of the row M_k(x, ·) are 0, whereas the others are 1. By induction, y_{r−k+1}y_{r−k+2}...y_r obeys the uniform distribution; hence, with probability 1/2, M_k(x_{r+1}, y_{r−k+1}y_{r−k+2}...y_r) = 0, and, therefore, the word y_{r−k+2}...y_{r+1} is uniformly distributed on {0, 1}^k. The theorem is proven.
Proof of Theorem 5. Suppose w ∈ {0, 1}^k. There exists an integer n_i such that n_i ≥ k. Define D = ⊕_{j≠i} X_j. Clearly, ⊕^∞_{j=1} X_j = X_i ⊕ D. X_i is (asymptotically) n_i-order two-faced and, from Theorem 2, we derive that ⊕^∞_{j=1} X_j is (asymptotically) n_i-order two-faced. Taking into account that k ≤ n_i, we can see that ⊕^∞_{j=1} X_j is (asymptotically) k-order two-faced. Thus, Equation (9) (respectively, Equation (10)) is true, and the theorem is proven.