A Novel Method for Increasing the Entropy of a Sequence of Independent, Discrete Random Variables

In this paper, we propose a novel method for increasing the entropy of a sequence of independent, discrete random variables with arbitrary distributions. The method uses an auxiliary table and a novel theorem that concerns the entropy of a sequence in which the elements are a bitwise exclusive-or sum of independent discrete random variables.


Introduction
Sequences of random variables play an important role in many fields of science; examples include dynamical systems, cryptography, simulations, and statistics. If we have a sequence of k random variables, then a natural and important question is as follows: what is the entropy of the sequence, and how does it grow with k? Usually, we require that the value of the entropy be maximal or at least close to the maximum.
The literature devoted to sequences includes many books, journals and conference papers. The authors consider various aspects of sequences and their applications [1], but most of these papers concentrate on the mathematical description of sequences, e.g., as a ring of integers, the correlation between sequence elements, measures of sequence complexity, whether a sequence can be modeled as a sequence of independent, identically distributed random numbers, and so on. The problem of maximizing the entropy of a sequence of independent, discrete random variables is considered mainly by scientists involved in the theory and practice of random numbers. Because high entropy of a random sequence is a necessary condition of its use in cryptography, several general methods that increase the sequence entropy have been proposed [2][3][4][5][6][7][8][9][10][11]. These methods include both very simple correctors and complicated hash functions or ciphers. Examples of simple correctors are combining two or more neighboring bits produced by a random source with exclusive-or, feeding a linear feedback shift register (LFSR) with a sequence with small entropy, or using the von Neumann corrector. Examples of complicated correctors are the hash functions SHA-1 and SHA-2 and ciphers such as the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES). Solutions using extractor algorithms and resilient functions that "filter out" any deterministic bits from a raw sequence with a deterministic function also exist [5][6][7][8][9][10]. A good review of post-processing methods is given in [6].
In this paper, we define a bitwise exclusive-or sum of discrete random variables and prove formally that, under certain assumptions, an infinite bitwise exclusive-or sum of independent discrete random variables with values encoded by l bits has a uniform distribution. We propose to use this property, the symbols that have already been produced, and an auxiliary table to produce a sequence that has almost maximal entropy using a single random source. Algorithm XOR-B, proposed in this paper, differs from existing algorithms in two respects: the uniformity of the output distribution is proven formally, and the algorithm does not need an auxiliary deterministic circuit. Known algorithms are based on heuristic arguments and empirical experiments or introduce deterministic bits, as do methods using LFSRs. The exception is the von Neumann corrector. Unfortunately, the von Neumann corrector produces sequences with variable, difficult-to-predict bit rates, which significantly limits its applications. Consequently, the most frequently used post-processing without an external auxiliary deterministic circuit and significant bit reduction uses a hash function or a cipher implemented in a field programmable gate array (FPGA) or in an application specific integrated circuit (ASIC) [11].
The organization of this paper is as follows. Section 2 contains the basic definitions. The proof of the proposed theorem and the algorithms for producing a sequence with almost maximal entropy with the use of a sequence with smaller entropy and an auxiliary table are contained in Section 3. The same section contains an example that illustrates a practical use of the proposed theorem. The paper ends with the conclusions in Section 4.

Bitwise Exclusive-or Sum of Random Variables
The mathematical environment for the method and theorem proposed in this paper is the probability space and operations on its elements. The probability space is a measurable space that is defined by a triple $(\Omega, F, P)$, where [12][13][14]:
(1) $\Omega$ is any non-empty set called the sample space.
(2) $F$ is a $\sigma$-algebra or $\sigma$-field, i.e., a collection of subsets of $\Omega$ that satisfies four postulates: $\Omega \in F$; $A \in F$ implies $A^c \in F$; for any countable collection of subsets $\{A_i\} \subseteq F$, $\bigcup_i A_i \in F$; and for any countable collection of subsets $\{A_i\} \subseteq F$, $\bigcap_i A_i \in F$.
(3) $P$ is a countably additive measure, i.e., a mapping from $F$ to the closed interval $[0, 1]$ such that $P(\Omega) = 1$.
The sample space $\Omega$ is the set of all possible random outcomes of some experiment. A random variable assigns a numerical value to each of those outcomes [15]:
Definition 1. Given a probability space $(\Omega, F, P)$, a random variable is a function $X$ from $\Omega$ to the real numbers $\mathbb{R}$ such that
$$\{\omega \in \Omega : X(\omega) \le x\} \in F \quad \text{for all } x \in \mathbb{R}. \tag{1}$$
Formula (1) states that the function $X$ must be measurable. For a continuous probability space, the sample space $\Omega$ is uncountable, and the random variable $X$ can assume an uncountable number of values. For the discrete probability space, the sample space $\Omega$ is finite or countable [12][13][14][15].
Definition 2. We say that $X$ is a discrete random variable if the function $X$ assumes only a finite or countable number of values.
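In this paper, we assume that a random source $S_X$ produces sequences of symbols $x_k$, $k = 1, 2, \ldots$, with values from $[0, 2^l - 1]$ encoded by $l$ bits. The process of emitting symbols is modeled as a sequence $\{X(k)\}$ of discrete random variables.
Definition 3. The entropy of the source $S_X$ is
$$H(S_X) = \lim_{k \to \infty} \frac{1}{k} H(X(1), X(2), \ldots, X(k)) \tag{2}$$
when the limit exists.
Formula (2) determines the average rate of the asymptotic growth of the entropy with increasing $k$ if the limit exists [16]. The entropy $H(S_X)$ is also termed the entropy rate. The value of $H(S_X)$ is in bits per time-step (bit/time-step). If $\{X(k)\}$ is a sequence of independent and identically distributed (i.i.d.) random variables, then [16]
$$H(S_X) = \lim_{k \to \infty} \frac{H(X(1), X(2), \ldots, X(k))}{k} = \lim_{k \to \infty} \frac{k H(X(1))}{k} = H(X(1)).$$
If $\{X(k)\}$ is a sequence of independent but not identically distributed random variables, then [16]
$$H(S_X) = \lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} H(X(i)).$$
Following Cover and Thomas, it is worth noting that we can choose sequences of distributions on $X(1), X(2), \ldots$ such that the limit of $\frac{1}{k} \sum_{i=1}^{k} H(X(i))$ does not exist [16].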
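The entropy per time-step for independent variables can be illustrated numerically. The following is a minimal sketch, not part of the original method: the function names and the example distributions are assumptions chosen only for illustration, with $l = 2$.

```python
import math

def entropy_bits(pmf):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def average_entropy(pmfs):
    """(1/k) * sum of H(X(i)) for k independent variables with the given distributions."""
    return sum(entropy_bits(p) for p in pmfs) / len(pmfs)

l = 2
uniform = [1 / 2 ** l] * 2 ** l       # independent, uniform l-bit symbols
biased = [0.7, 0.1, 0.1, 0.1]         # hypothetical non-uniform source

print(entropy_bits(uniform))              # maximal value, l bits per time-step
print(entropy_bits(biased))               # strictly smaller than l
print(average_entropy([uniform, biased])) # average over two different distributions
```

For the uniform distribution the result equals $l$ bits, and any bias lowers the per-symbol entropy, which is the gap the algorithms in Section 3 try to close.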
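The greatest value of $H(S_X)$ is equal to $l$ bits per time-step. This value is obtained when $\{X(k)\}$ is a sequence of independent and uniformly distributed random variables. Thus, the key point for the greatest entropy is the independence and uniformity of the distributions on $X(1), X(2), \ldots$. Because the values of $X(k)$ are encoded by $l$ bits, the random variable $X(k)$ can be written unequivocally as a sequence of $l$ binary random variables $X_i(k)$, $i = 0, 1, \ldots, l-1$. Let $S_Y$ be another random source that produces sequences of symbols $y_k$, modeled by random variables $Y(k)$ with binary components $Y_i(k)$, $i = 0, 1, \ldots, l-1$.
Definition 4. A bitwise exclusive-or sum $Z(k)$ of independent random variables $X(k)$ and $Y(k)$ is the random variable $Z(k) = X(k) \oplus Y(k)$ with binary components $Z_i(k) = X_i(k) \oplus Y_i(k)$, $i = 0, 1, \ldots, l-1$.
The probability that $X(k)$ assumes a given value $x$ from $[0, 2^l - 1]$ is equal to $P(X(k) = x)$. Similarly, the probability that $Y(k)$ assumes a given value $y$ from $[0, 2^l - 1]$ is equal to $P(Y(k) = y)$. For the random variable $Z(k)$, the independence of $X(k)$ and $Y(k)$ gives
$$P(Z(k) = z) = \sum_{x=0}^{2^l - 1} P(X(k) = x) \, P(Y(k) = x \oplus z).$$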

Increasing the Entropy of a Sequence of Discrete Random Variables with an Auxiliary Table
As noted in the previous section, the entropy of a source emitting symbols from $\{0, 1, \ldots, 2^l - 1\}$ is greatest when $X(k)$ are independent and uniformly distributed random variables. In this section, we assume that $X(k)$ are independent but not uniformly distributed. They can also have different distributions for different $k$. We search for a computationally efficient algorithm for processing a sequence of symbols $x_k$ emitted by $S_X$ that provides at the output a sequence that can be modeled as a sequence of independent and uniformly distributed random variables. The proposed algorithm, XOR-A, uses an auxiliary table $T$ with $L$ cells and has the following form:

Initialization:
Choose the size $L$ of an auxiliary table $T$; Fill all of the cells of $T$ with the $L$ subsequent $l$-bit words produced by $S_X$;
Computations:
For each $n = 1, 2, \ldots, N$: take the new word $x_{n+L}$ as $t_0$; compute $j = n \bmod L$ and $\hat{s}_n = 1 + s_n$; read the words $t_1, t_2, \ldots, t_R$ from $T$ at positions determined by $j$ and $\hat{s}_n$; output $z_n = t_0 \oplus t_1 \oplus \ldots \oplus t_R$; write $T[j] := x_{n+L}$.

The subsequent $l$-bit words $x_k$ are written cyclically into $T$. Algorithm XOR-A uses the property that a sequence formed from a sequence of independent random numbers is also random. This property has been used by MacLaren and Marsaglia [17] to permute sequences from a linear congruential pseudorandom generator with an auxiliary, simpler pseudorandom number generator. They noticed that the shuffled sequence might have better statistical properties and a longer period. This method is known in the literature as Algorithm M [18]. Another heuristic approach used in algorithm XOR-A is the observation that combining random words with XOR can also provide sequences with better statistical properties, in particular with higher entropy rates [19]. No exact theory explaining why combined generators can significantly improve the properties of sequences has been published yet. We know only the heuristic arguments. For example, Deng and George, and later Deng et al., provided arguments based on probability densities [20,21]. Some heuristic arguments come from Marsaglia [22]. One of the intuitive arguments for why combining improves the statistical properties of sequences and why it is better than shuffling is that a combined generator can produce new numbers, whereas shuffling only permutes existing elements.
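The steps of algorithm XOR-A can be sketched in code. This is a hedged illustration rather than the authors' implementation: the read pattern `T[(j + r * s_hat) % L]` is an assumption consistent with the operations the paper lists (multiplication, addition, division with rest) but is not confirmed by the surviving text, and `biased_source`, `uniform_aux`, and all parameter values are hypothetical.

```python
import random

def biased_source(l, seed, counter):
    """Hypothetical biased source S_X of l-bit words; counter[0] counts emitted words."""
    rng = random.Random(seed)
    values = range(2 ** l)
    weights = [1.0 / (1 + v) for v in values]  # non-uniform, every value possible
    while True:
        counter[0] += 1
        yield rng.choices(values, weights=weights)[0]

def uniform_aux(m, seed):
    """Hypothetical auxiliary source of uniformly distributed m-bit numbers s_n."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(2 ** m)

def xor_a(source, aux, N, L, R, m):
    """Sketch of XOR-A: XOR the new word t_0 with R words read from table T."""
    if L < (2 ** m) * R:
        raise ValueError("table too small: need L >= 2^m * R")
    T = [next(source) for _ in range(L)]      # initialization: fill the table
    out = []
    for n in range(1, N + 1):
        t0 = next(source)                     # x_{n+L}
        j = n % L
        s_hat = 1 + next(aux)                 # distance in [1, 2^m]
        z = t0
        for r in range(1, R + 1):
            z ^= T[(j + r * s_hat) % L]       # ASSUMED read pattern for t_r
        T[j] = t0                             # cyclic write of the new word
        out.append(z)
    return out

l, m, R, N = 8, 4, 10, 1000
L = (2 ** m) * R
emitted = [0]
z = xor_a(biased_source(l, 1, emitted), uniform_aux(m, 2), N, L, R, m)
print(len(z), emitted[0])  # number of outputs and number of source words consumed
```

Under these assumptions the source emits $N + L$ words in total, and the condition $L \ge 2^m \cdot R$ keeps the $R$ read positions distinct within one iteration.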
An important factor in combined generators is the choice of mathematical operation that combines independent digital words. Because combining should not significantly increase the computational effort, only simple operations are considered: addition, addition modulo $p$, where $p$ is an integer, and bitwise exclusive-or. The latter has a mixing property that is commonly used, e.g., in ciphers. Examples include the Data Encryption Standard, the Advanced Encryption Standard, and the secure hash algorithms SHA-1 and SHA-2 [11,23,24]. The XOR operation applied to sequences of bits produced by independent source generators also reduces the correlation between adjacent bits [3,6]. To use this property, we can divide a sequence into, e.g., $l$-bit disjoint blocks and compute the XOR function for each block. As a result, a sequence with a lower correlation between adjacent bits is obtained, but this method reduces the bit rate $l$ times. Combining source streams according to algorithm XOR-A decreases the output speed only slightly because of the fast operations used (this problem will be discussed at the end of Section 3). The distance between the elements that form the $i$-th source stream, $i = 1, 2, \ldots, R$, changes randomly because $s_n$ is random. Let us emphasize that this distance cannot be constant, because the source streams would then be shifted versions of the same sequence. It also cannot be deterministic, because elements of the source streams may then repeat periodically, producing periodic patterns in the output sequence. This limits the use of algorithm XOR-A to sequences with a length not greater than the period of the auxiliary generator when the source of numbers $\{s_n\}$ is deterministic. However, if $\{s_n\}$ is produced by an entropy source, then this limitation does not exist.
Algorithm XOR-A provides a sequence $\{z_n\}$ of $l$-bit words. Because a bitwise exclusive-or sum of random variables is also a random variable, each $z_n$ is a value of a certain discrete random variable $Z(n)$.
Theorem 1. If the independent random variables $X(k)$ assume every value from $[0, 2^l - 1]$ with a nonzero and smaller than unity probability, then the distribution of $Z(n)$ approaches the uniform distribution on $[0, 2^l - 1]$ as $R \to \infty$.
Proof. Independently of $k$ and the value of $\hat{s}_n$, the random variable $Z(n)$, $n = 1, 2, \ldots$, is a bitwise exclusive-or sum of $R + 1$ independent random variables with theoretically arbitrary, the same or different distributions. Consequently, proving the distribution of $Z(n)$ can be reduced to proving the distribution of the random variable
$$Z = U(0) \oplus U(1) \oplus \ldots \oplus U(R),$$
where $U(0), U(1), \ldots, U(R)$ are independent with arbitrary, the same or different distributions. Because the values of $U(r)$, $r = 0, 1, \ldots, R$, are encoded by $l$ bits, the random variable $U(r)$ can also be written unequivocally as a sequence of $l$ binary random variables $U_i(r)$, $i = 0, 1, \ldots, l-1$. Similarly, the values of $Z$ are also encoded by $l$ bits, and $Z$ can be written unequivocally as a sequence of $l$ binary random variables $Z_i$, $i = 0, 1, \ldots, l-1$, where $Z_i = U_i(0) \oplus U_i(1) \oplus \ldots \oplus U_i(R)$. To prove that $Z$ has a uniform distribution, we first show that $P(Z_i = 0) = P(Z_i = 1) = 1/2$ in the limit, where $P$ denotes the probability and $i = 0, 1, \ldots, l-1$. The key to this proof is a calculation trick and the methodology introduced by R. B. Davies in a private paper [3]. The link to this page can also be found at the National Institute of Standards and Technology (NIST) webpage [25]. If $U_i$ takes the values of 0 and 1, then $1 - 2U_i$ takes the values of 1 and −1. Moreover,
$$1 - 2Z_i = \prod_{r=0}^{R} \left(1 - 2U_i(r)\right).$$
Assuming that $E(Z_i)$ is the expected value (mean) of the random variable $Z_i$, we obtain the following: if $U(0), U(1), \ldots, U(R)$ are independent, then
$$E(1 - 2Z_i) = \prod_{r=0}^{R} E\left(1 - 2U_i(r)\right).$$
Because $E(1 - 2Z_i) = 1 - 2E(Z_i)$, we obtain
$$1 - 2E(Z_i) = \prod_{r=0}^{R} \left(1 - 2E(U_i(r))\right)$$
or
$$E(Z_i) = \frac{1}{2} + \Delta_i,$$
where
$$\Delta_i = -\frac{1}{2} \prod_{r=0}^{R} \left(1 - 2E(U_i(r))\right).$$
Considering the sign of $\Delta_i$,
$$\left| E(Z_i) - \frac{1}{2} \right| = \frac{1}{2} \prod_{r=0}^{R} \left| 1 - 2E(U_i(r)) \right|.$$
Based on the assumption, the random variables can assume any value from the interval $[0, 2^l - 1]$ with a nonzero probability, and no value from $[0, 2^l - 1]$ is "certain," i.e., it cannot be assumed with a probability equal to unity. Consequently, any subsequence of an $l$-bit sequence also appears with a nonzero and smaller than unity probability, and the variables $U_i(r)$, $r = 0, 1, \ldots, R$, $i = 0, 1, \ldots, l-1$, assume both zero and one for each $i$. In this case, the expected value $E(U_i(r))$ is always nonzero and is smaller than unity, i.e., $0 < E(U_i(r)) < 1$ for $i = 0, 1, \ldots, l-1$ and $r = 0, 1, \ldots, R$, so that $\left| 1 - 2E(U_i(r)) \right| < 1$. Because the product of the numbers with values in the interval $(0, 1)$ decreases to zero as the quantity of multiplied numbers increases, we obtain
$$\lim_{R \to \infty} E(Z_i) = \frac{1}{2}.$$
Variable $Z_i$ takes only two values, zero or one. It leads to the following equality:
$$E(Z_i) = 0 \cdot P(Z_i = 0) + 1 \cdot P(Z_i = 1) = P(Z_i = 1).$$
Because the random variables $U(r)$, $r = 0, 1, \ldots, R$, assume all values from $[0, 2^l - 1]$ with a nonzero and smaller than unity probability and $P(Z_i = 0) = P(Z_i = 1) = 1/2$, the random variable $Z$ has a uniform distribution on $[0, 2^l - 1]$ for $R$ approaching infinity. This completes the proof.
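The relation between the bit probabilities of the summands and of the sum used in the proof can be checked exactly for a small case. The sketch below is illustrative only: the distribution `pmf`, the choice $l = 2$, $R = 10$, and the helper names are assumptions. It computes the exact distribution of the XOR of $R + 1$ independent, identically distributed words and compares each bit with $E(Z_i) = \frac{1}{2} - \frac{1}{2}\prod_r (1 - 2E(U_i(r)))$.

```python
from functools import reduce

def xor_convolve(p, q):
    """Exact distribution of A xor B for independent A ~ p and B ~ q."""
    out = [0.0] * len(p)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a ^ b] += pa * qb
    return out

def bit_prob_one(pmf, i):
    """Probability that bit i of a word with distribution pmf equals one."""
    return sum(p for v, p in enumerate(pmf) if (v >> i) & 1)

l, R = 2, 10
pmf = [0.4, 0.3, 0.2, 0.1]                    # hypothetical biased 2-bit source
z = reduce(xor_convolve, [pmf] * (R + 1))     # distribution of U(0) xor ... xor U(R)

for i in range(l):
    predicted = 0.5 - 0.5 * (1 - 2 * bit_prob_one(pmf, i)) ** (R + 1)
    print(i, bit_prob_one(z, i), predicted)   # the two values should agree

print(max(abs(p - 2 ** -l) for p in z))       # distance from the uniform distribution
```

In this example every value has a nonzero probability, so the assumption of the theorem holds and the distribution of the sum is already very close to uniform for $R = 10$.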
Let us emphasize that the assumption that $U(r)$, $r = 0, 1, \ldots, R$, assumes values from $[0, 2^l - 1]$ with a nonzero and smaller than unity probability is necessary to draw the conclusion that equal probabilities of assuming zeros and ones by $Z_i$ result in a uniform distribution of $Z$. This assumption eliminates all of the specific situations, i.e., combining variables with fixed (probability equal to unity) values or the lack of some numbers (zero probability). As an example, let us consider the case of $l = 2$. If $U(r)$ assumes, e.g., only two values, 3 and 0, each with a probability of 1/2 (the numbers 1 and 2 are assumed to have zero probability), then the result of the combination is the number 3 or 0, independent of $R$. The number 3 is obtained if we combine an odd number of 3s, and the number 0 is obtained for an even number of 3s, although $P(Z_i = 0) = P(Z_i = 1) = 1/2$ for both bits.
Words $t_1, t_2, \ldots, t_R$ are chosen from the words that were already produced by the random source $S_X$.
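The counterexample for $l = 2$ is easy to reproduce. The following sketch, with an arbitrarily chosen $R$ and number of trials (both assumptions), combines words drawn only from $\{0, 3\}$ and collects the values that appear at the output.

```python
import random
from functools import reduce
from operator import xor

def xor_sum(words):
    """Bitwise exclusive-or sum of an iterable of integers."""
    return reduce(xor, words, 0)

rng = random.Random(0)
R = 7
results = {
    xor_sum(rng.choice([0, 3]) for _ in range(R + 1))
    for _ in range(1000)
}
print(results)  # only the values 0 and 3 can appear; 1 and 2 never do
```

Each output bit is still balanced, but the two bits are always equal, so the values 1 and 2 are never produced regardless of $R$.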
Theorem 1 shows that perfect uniformity (the entropy equal to $l$ bits/time-step) can only be obtained in the limit $R \to \infty$, i.e., for table $T$ with $L \to \infty$ cells. In a practical system, $L$ is finite, and consequently, the entropy can only be close to the theoretical limit. Assuming a fixed $R$, which results from the assumed acceptable value of the entropy of the output sequence, the size of table $T$, $L \ge 2^m \cdot R$, depends on the number $m$ of bits that encode $s_n$. A small $m$ enables some of the words $t_1, t_2, \ldots, t_R$ chosen in step $n$ to repeat for the next several iterations with a large probability. A greater value for $m$ reduces this probability but increases the size of table $T$. The smallest acceptable $m$ should ensure that the smallest distance, i.e., $\hat{s}_n = 1$, repeats statistically no more frequently than every $R$ iterations when we use $x_{n+L}$ as $t_0$. This condition can be written as
$$P(\hat{s}_n = 1) \le \frac{1}{R}, \tag{21}$$
where $P$ is the probability. Because $\{\hat{s}_n\}$ has a uniform distribution, Condition (21) takes the following form:
$$2^{-m} \le \frac{1}{R} \tag{22}$$
or
$$m \ge \log_2 R. \tag{23}$$
Because $m$ must be an integer, inequality (23) takes the form
$$m = \lceil \log_2 R \rceil, \tag{24}$$
where $\lceil \alpha \rceil$ is the smallest integer number that is greater than or equal to $\alpha$, $\alpha \ge 0$.
The basic weakness of algorithm XOR-A is the necessity of using an additional source of randomness that provides random numbers $s_n$ with a uniform distribution. Because the proof of the proposed theorem indicates that such numbers are available at the output of algorithm XOR-A, they can be used to produce the numbers $s_n$. We exploit here the fact that any subsequence of digits of a random number can be used to form another random number. Therefore, if $z_n$ is random, then any sequence formed from bits of $z_n$ is also random [19]. Assuming that $\hat{s}_{n+1} := 1 + \mathrm{trunc}_m(z_n)$, where the truncation operation leaves the $m$ higher-order bits of number $z_n$, the algorithm XOR-A can be modified in the following way:

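Initialization:
Choose the size $L$ of an auxiliary table $T$; Fill all of the cells of $T$ with the $L$ subsequent $l$-bit words produced by $S_X$; $\hat{s}_1 := 1$
Computations:
For each $n = 1, 2, \ldots, N$: take the new word $x_{n+L}$ as $t_0$; compute $j = n \bmod L$; read the words $t_1, t_2, \ldots, t_R$ from $T$ at positions determined by $j$ and $\hat{s}_n$; output $z_n = t_0 \oplus t_1 \oplus \ldots \oplus t_R$; write $T[j] := x_{n+L}$; set $\hat{s}_{n+1} := 1 + \mathrm{trunc}_m(z_n)$.
A limitation for algorithm XOR-B is the number $R$ of words $t_1, t_2, \ldots, t_R$ that are read from $T$.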
Parameter $R$ must satisfy the condition $2^m \cdot R \le L$. One of the factors determining the utility of a proposed algorithm is its computational complexity. The framework for the complexity of computations is contained in a classical paper by M. O. Rabin [26]. In algorithm XOR-B, the following operations on $l$-bit words can be extracted during the production of a single $l$-bit word $z_n$: (i) division with rest, (ii) multiplication, and (iii) addition, where operation (i) can be reduced to multiplication by an inverse number. Assuming that the elementary operation is addition, the complexity of computations of (i) and (ii) can be equal to $O(l^{\log_2 3})$ or $O(l \log l \log\log l)$, where $O$ is one of the family of Bachmann-Landau notations known as "Big O" [27,28]. The concrete value depends on the assumed multiplication algorithm. The complexity of the computation of (iii) is $O(l)$ or $O(\log l)$. As previously, the details depend on the addition algorithm assumed. Algorithm XOR-B also uses the logic operation XOR on $l$-bit words. The computational complexity of this operation can be reduced to $l$ additions of one-bit numbers in the Galois field GF(2). The complexity of this operation is $O(l)$. The dominating operation in algorithm XOR-B is:
$$z_n := t_0 \oplus t_1 \oplus \ldots \oplus t_R. \tag{27}$$
The assignment from Equation (27) can be factored into $R$ XOR operations of the form
$$z_n := z_n \oplus t_r, \quad r = 1, 2, \ldots, R, \tag{28}$$
where each $t_r$ is read from $T$ at an index computed from $j$, $r$, and $\hat{s}_n$ modulo $L$. The complexity of Equation (28) is equal to:
$$f(l) = C + O(l) + (\text{multiplication}) + (\text{addition}) + (\text{division with rest}), \tag{29}$$
where $C$ is the constant time of reading elements from table $T$, $O(l)$ is the complexity of the $l$-bit XOR, and the remaining terms are the complexities of the named operations on $l$-bit words. Omitting the constant $C$, Equation (29) is reduced to the sum of the four remaining terms. Because the dominating operation is multiplication, the computational complexity of algorithm XOR-B can be assessed as $O(l^{\log_2 3})$ or $O(l \log l \log\log l)$, depending on the multiplication algorithm. The total computational effort depends on both $l$ and parameter $R$.
Parameter $R$ does not depend on $l$ and is assumed to be a fixed value for algorithm XOR-B. Computational effort determines the total time of computations and is usually measured in the clock cycles necessary to execute an algorithm. In our case, it is the production of a single number $z_n$. It requires: $R$ XOR operations defined by Equation (28), one load of an $l$-bit word, one computation of $j = n \bmod L$, one truncation, and one substitution $T[j] := x$. A detailed comparison of XOR-B with SHA-1 and SHA-2 is reliable only when the same software and hardware platform is used for all algorithms. We limit our considerations to a general comment, because it is not a subject of this paper. For very large $R$, greater than 80 rounds for SHA-1 and SHA-512/384 and 64 rounds for SHA-256/224, algorithm XOR-B can theoretically be slower. However, a high value of $R$ indicates that some numbers $x_n$ occur with very small probability. There is no proof that SHA-1 and SHA-2 always provide a uniform distribution at the output for biased raw sequences [29].
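Algorithm XOR-B can be sketched in the same way as XOR-A, with the auxiliary source replaced by the $m$ higher-order bits of the previous output. This is a hedged illustration, not the authors' code: the read pattern `T[(j + r * s_hat) % L]` is an assumption, and `biased_source` together with all parameter values is hypothetical.

```python
import random

def biased_source(l, seed, counter):
    """Hypothetical biased source S_X of l-bit words; counter[0] counts emitted words."""
    rng = random.Random(seed)
    values = range(2 ** l)
    weights = [1.0 / (1 + v) for v in values]  # non-uniform, every value possible
    while True:
        counter[0] += 1
        yield rng.choices(values, weights=weights)[0]

def xor_b(source, N, L, R, m, l):
    """Sketch of XOR-B: like XOR-A, but s_hat comes from the previous output word."""
    if m > l or L < (2 ** m) * R:
        raise ValueError("need m <= l and L >= 2^m * R")
    T = [next(source) for _ in range(L)]      # initialization: fill the table
    s_hat = 1                                 # fixed distance in the first iteration
    out = []
    for n in range(1, N + 1):
        t0 = next(source)                     # x_{n+L}
        j = n % L                             # one division with rest
        z = t0
        for r in range(1, R + 1):             # R XOR operations
            z ^= T[(j + r * s_hat) % L]       # ASSUMED read pattern for t_r
        T[j] = t0                             # one substitution
        s_hat = 1 + (z >> (l - m))            # truncation: m higher-order bits of z_n
        out.append(z)
    return out

l, m, R, N = 8, 4, 10, 1000
L = (2 ** m) * R
emitted = [0]
z = xor_b(biased_source(l, 3, emitted), N, L, R, m, l)
print(len(z), emitted[0])  # number of outputs and number of source words consumed
```

The only randomness consumed comes from the single source: each output word costs one new source word plus $R$ table reads, which matches the operation count given above.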

Example
Theorem 1 indicates that the simplest method for increasing the entropy of a sequence is to group its elements into blocks with $R + 1$ elements and compute the bitwise exclusive-or sum of all of the elements for each block. This method is numerically inefficient because to obtain an $N$-element sequence with a greater entropy, a random source $S_X$ must produce $(R + 1) \cdot N$ symbols, whereas algorithm XOR-B requires $N + L$ symbols. For example, if $N = 1000$ and $R = 10$, then the source $S_X$ must produce 11,000 symbols for the first method and 1160 symbols when algorithm XOR-B is used.
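The two symbol counts in this example can be reproduced with a few lines. The sketch assumes a table of size $L = 2^m \cdot R$ with $m = 4$; this value of $m$ is an inference from the figure of 1160 symbols rather than something the text states at this point, and the function names are hypothetical.

```python
def symbols_block_xor(N, R):
    """Source symbols needed when each output is the XOR of a fresh block of R + 1 words."""
    return (R + 1) * N

def symbols_xor_b(N, L):
    """Source symbols needed by XOR-B: L words to fill the table plus one per output."""
    return N + L

N, R, m = 1000, 10, 4        # m = 4 is an assumption consistent with the quoted figures
L = 2 ** m * R
print(symbols_block_xor(N, R), symbols_xor_b(N, L))  # 11000 1160
```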
To compare different methods and assess the uniformity of the distribution of numbers obtained using the proposed theorem, a pseudorandom number generator with a Gaussian distribution of generated numbers was used. The Gaussian distribution occurs in many physical phenomena and plays a key role in the Shannon model of communication. As a source of independent numbers, we used a pseudorandom number generator from the Mathematica package [30]. In algorithm XOR-B, the distance between the words $t_1, t_2, \ldots, t_R$ read from $T$ changes randomly for each iteration. This additional randomness comes from the higher-order bits of the output number $z_n$, excluding the first iteration, when the distance is fixed and equal to unity. In the experiment, the number $l_{frac}$ of bits that encode the fractional part is fixed and equal to 16 bits. The number $l_{int}$ of bits that encode the integer part is changed to obey the three exemplary ranges of the Gaussian distribution: $\pm 3$, $\pm 5$, and $\pm 7$.
To assess the uniformity of the obtained distributions, a chi-square test was used. This test measures the agreement between the empirical distribution and the theoretical distribution for a given degree of freedom (number of categories) and significance level. The value of the statistic $\chi^2$ is computed using the formula [18,19]
$$\chi^2 = \sum_{i=1}^{H} \frac{(N_i - N P_i)^2}{N P_i}.$$
The results of the tests for the ranges $\pm 3$, $\pm 5$, $\pm 7$, and the two methods of producing uniformly distributed random numbers are summarized in Tables 1-3.
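The chi-square statistic can be computed as in the sketch below. The binning of $l$-bit words into $H$ equal-width categories, the word length, and the use of Python's generator as a stand-in for the tested sequence are all assumptions made for illustration; the paper does not describe how the 50 categories were formed.

```python
import random

def chi_square(counts, probs):
    """Sum over categories of (N_i - N * P_i)^2 / (N * P_i)."""
    total = sum(counts)
    return sum((n_i - total * p_i) ** 2 / (total * p_i)
               for n_i, p_i in zip(counts, probs))

def categorize(samples, l, H):
    """Count l-bit words falling into H equal-width categories (assumed binning)."""
    counts = [0] * H
    for v in samples:
        counts[v * H // 2 ** l] += 1
    return counts

H, l, N = 50, 16, 100_000
# Exact category probabilities under a uniform distribution of l-bit words.
probs = [c / 2 ** l for c in categorize(range(2 ** l), l, H)]

rng = random.Random(0)
samples = [rng.randrange(2 ** l) for _ in range(N)]  # stand-in for the sequence {z_n}
statistic = chi_square(categorize(samples, l, H), probs)
print(statistic)  # compare with the critical value 74.92 for 50 categories, 0.01 level
```

A value of the statistic below the critical value means that the hypothesis of a uniform distribution is not rejected at the chosen significance level.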
In the formula for $\chi^2$, $H$ is the number of categories, $N_i$ is the number of samples from the $i$-th category, $P_i$ is the probability that each sample falls into category $i$, and $N$ is the total number of samples. The critical value $\chi_c$ of the chi-square test for 50 categories and the significance level $\beta = 0.01$, chosen in the numerical experiment, is equal to 74.92 (we do not assess the parameters of the distribution). The tests were performed for $N = 10^5$.

The instruction NormalDistribution[μ, σ] provides numbers with a Gaussian distribution with an average $\mu$ and standard deviation $\sigma$. Because the obtained numbers are both positive and negative, we first compute their absolute values. Next, we consider numbers from the interval encoded by $l = l_{int} + l_{frac}$ bits, where $l_{int}$ is the number of bits that encode the integer part and $l_{frac}$ is the number of bits that encode the fractional part. If a generated number does not belong to the interval (numbers can, theoretically, assume values from minus to plus infinity), then we repeat the generation to obtain a number from this interval. The output is the bitwise exclusive-or sum of the $R + 1$ words $t_0, t_1, \ldots, t_R$.

The results shown in Table 1 indicate that we require only four numbers from the interval $\pm 3$ to form uniformly distributed numbers from the numbers with Gaussian distributions with $\mu = 1$ and $\sigma = 2$ for both methods. When the probability of producing certain numbers from ...

Table 1. The values of the statistic $\chi^2$ for the range $\pm 3$.

Table 2. The values of the statistic $\chi^2$ for the range $\pm 5$.

Table 3. The values of the statistic $\chi^2$ for the range $\pm 7$.

If $X_i(k)$ and $Y_i(k)$ are binary random variables, then $Z_i(k)$ is also a binary random variable.

In algorithm XOR-A, an auxiliary source provides the random numbers $s_n$ with a uniform distribution. The elements of the $R + 1$ vectors $t_0, t_1, \ldots, t_R$ are next summed modulo 2, which forms a single output $z_n$ with values from $[0, 2^l - 1]$. To avoid multiple summing of the same number, the size of $T$ satisfies the condition $L \ge 2^m \cdot R$, where $R$ is an integer. For the same reason, we use $\hat{s}_n = 1 + s_n$ instead of $s_n$. Producing an $N$-element sequence $\{z_n\}$ requires the emission of $(N + L)$ $l$-bit words $x_k$.

The set of words summed in each iteration is formed randomly and independently of the previous sets. Consequently, $Z(n)$ and $Z(n + j)$ are also independent for any $j \ne 0$. The open problem is the distribution of variables $Z(n)$ when $X(k)$ can have arbitrary, the same or different distributions.

The complexity of the multiplication of $l$-bit numbers is $O(l^{\log_2 3})$ with Karatsuba's algorithm and $O(l \log l \log\log l)$ for the Schönhage-Strassen algorithm. When algorithm XOR-B is implemented in hardware supporting multiplication of $l$-bit numbers, the time of multiplication can be the same as for the elementary operation. Consequently, the complexity of Equation (28) can be reduced to $O(l)$. It is the same as the complexity of hash algorithms with the same word length ($l = 32$ for SHA-1 and SHA-256/224 and $l = 64$ for SHA-512/384).