On Nonlinear Complexity and Shannon's Entropy of Finite Length Random Sequences

Pseudorandom binary sequences have important uses in many fields, such as spread spectrum communications, statistical sampling and cryptography. There are two kinds of methods for evaluating the properties of sequences: one is based on probability measures, and the other on deterministic complexity measures. However, the relationship between these two kinds of measures remains an interesting open problem. In this paper, we focus on the widely used nonlinear complexity of random sequences and study its distribution, expectation and variance for memoryless sources. Furthermore, the relationship between nonlinear complexity and Shannon's entropy is also established here. The results show that Shannon's entropy is strictly monotonically decreasing in nonlinear complexity.


Introduction
Pseudorandom binary sequences have important uses in many fields, such as error control coding, spread spectrum communications, statistical sampling and cryptography [1][2][3]. A good random number generator helps to improve the results in these applications. In the beginning, most existing methods

for generating pseudorandom bit sequences were based on the mid-square method, the linear congruential method, linear and nonlinear feedback shift registers, etc. These kinds of pseudorandom bit generators (PRBGs) are not secure enough because of their fixed inner linear structure. A chaotic system is highly sensitive to its initial conditions and parameters; together with its random-like behavior and unpredictability, this makes it very useful for improving security, and gives it better cryptographic properties than the linear feedback shift register. Therefore, in recent years, chaotic systems have been regarded as important pseudorandom sources in the design of random bit generators [4][5][6][7][8][9].
There are two kinds of methods for evaluating the properties of sequences; the first is based on probability measures. To date, many information-theoretic studies of pseudorandom sequences have been carried out. In 1949, Shannon first introduced the concept of "entropy" from thermodynamics into information science, and proposed it as an uncertainty measure of random variables [10]. Kohda et al. [11] studied the statistical properties of binary sequences generated by a class of ergodic maps with certain symmetric properties, and gave a simple sufficient condition for these maps to produce a sequence of independent and identically distributed binary random variables. They also evaluated the dependence of a chaotic real-valued trajectory generated by the Chebyshev map of degree k by using the N-th-order dependency moments, and found that such a trajectory has the k-th-order correlation property [12]. Visweswariah et al. [13] showed that, under general conditions, optimal variable-length source codes asymptotically achieve optimal variable-length random bit generation in a rather strong sense. Beirami et al. indicated that the entropy rate plays a key role in the performance and robustness of chaotic-map truly random number generators [14], and provided converse and achievable bounds on the binary metric entropy [15], which is the highest rate at which information can be extracted from any given map using the optimal bit-generation function.
Besides using probability measures to evaluate a random sequence, many researchers have proposed so-called deterministic complexity measures. Among all these complexity measures, the following four may be the most important: linear complexity, Lempel-Ziv complexity, the eigenvalue, and nonlinear complexity [16][17][18][19][20][21][22][23][24][25][26]. Of these four measures, linear complexity and Lempel-Ziv complexity have been widely studied. However, nonlinear complexity has not been studied to the same extent.
Both kinds of measures are used to assess the properties of random sequences; however, the relationship between them has received little study [20,27,28]. Lempel et al. [20] established the relation between Lempel-Ziv complexity and the normalized entropy of random binary sequences. The relationship between T-complexity and the KS entropy of the one-dimensional logistic map is shown in [27]. Reference [28] gives the relationship between the eigenvalue and Shannon's entropy of random sequences. In this paper, we study the nonlinear complexity of random sequences generated by memoryless sources. Furthermore, we also show that nonlinear complexity is inversely correlated with Shannon's entropy.
The rest of this paper is organized as follows. The expectation and variance of the nonlinear complexity of random sequences are discussed in Section 2. In Section 3, we establish the relationship between nonlinear complexity and Shannon's entropy. Section 4 concludes the paper.

Nonlinear Complexity of Random Binary Sequences
First, we give some basic definitions. Let s = s0, s1, s2, … be a sequence and s_i^j = s_i, …, s_j, with i ≤ j, be one of its tuples. If s has finite length N, then s^N := s_0^{N−1} denotes the whole sequence. Any ultimately periodic sequence can be generated by a feedback shift register (FSR), satisfying a recurring relation

s_{i+n} = h(s_i, s_{i+1}, …, s_{i+n−1}),  i ≥ 0,

where n > 0 equals the length of the FSR. The function h is called the nonlinear feedback function. The nonlinear complexity of s^N, denoted c(s^N), is defined as the minimum order of an FSR that can generate s^N. The sequence c(s^1), …, c(s^N) is called the nonlinear complexity profile.
The minimal FSR of a sequence is not unique [26]. Therefore, it is not convenient to calculate the nonlinear complexity directly from its definition. Usually, the following proposition, taken from [25,26], is used to determine the nonlinear complexity of a given sequence.
Proposition 1 ([25,26]): Let l be the length of the longest tuple in a sequence s^N that occurs at least twice with different successors. Then c(s^N) = l + 1.
This proposition is valid if the constant term of the feedback function of the FSR is allowed to be nonzero. If we are confined to zero constant terms, then c(s^N) = max{l + 1, m + 1}, where m is the length of the longest non-ending run of zeros in s^N. Clearly m ≤ l + 1, since the existence of a run of m zeros followed by any nonzero element directly implies that l ≥ m − 1.
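Proposition 1 suggests a direct brute-force procedure: for every tuple length, record the symbols that follow each tuple, and keep the longest length at which some tuple is followed by two different symbols. The Python sketch below is our own illustration of this idea (the helper name `nonlinear_complexity` is ours, not from [25,26]); it assumes the nonzero-constant-term convention of Proposition 1.

```python
def nonlinear_complexity(s):
    """Nonlinear complexity c(s^N) via Proposition 1: c = l + 1, where l is
    the length of the longest tuple occurring at least twice in s with
    different successors (brute force; fine for short sequences)."""
    N = len(s)
    l = -1  # stays -1 if no tuple (not even the empty one) has two successors
    for length in range(N):
        successors = {}  # tuple -> set of symbols that follow it
        for i in range(N - length):
            t = tuple(s[i:i + length])
            successors.setdefault(t, set()).add(s[i + length])
        if any(len(v) > 1 for v in successors.values()):
            l = length
    return l + 1

print(nonlinear_complexity([0, 0, 1, 0, 1, 1]))  # -> 3
```

For s = 001011, the tuple 01 occurs twice with successors 0 and 1, while no length-3 tuple repeats, so l = 2 and c(s^N) = 3; indeed, an order-2 FSR cannot generate the sequence because the state 01 would need two different successors.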
Proposition 1 shows that the nonlinear complexity of a sequence equals l if and only if the following two conditions hold.
(1) There exists a tuple of length l − 1 that occurs at least twice.
(2) No tuple of length l occurs more than once.

Consider a random binary sequence s^N = s0, s1, …, s_{N−1} with nonlinear complexity c(s^N) = l, and let Pr(0) = p be the probability of the symbol "0". Two arbitrary tuples of length 1 (that is, two symbols) are the same with probability p^2 + (1 − p)^2 = 2p^2 − 2p + 1, so two arbitrary tuples of length l are the same with probability (2p^2 − 2p + 1)^l. Since C(N − l + 1, 2) is the number of pairs of positions at which a tuple of length l may occur simultaneously, the probability that no tuple of length l occurs more than once can be written as

Q(l) = (1 − (2p^2 − 2p + 1)^l)^{C(N − l + 1, 2)}.

Therefore, according to conditions (1) and (2), the probability of c(s^N) = l can be calculated as

P(c(s^N) = l) = Q(l) − Q(l − 1).   (1)

When p = 0.5 the sequence is uniform, 2p^2 − 2p + 1 = 1/2, and the probability of c(s^N) = l can be simplified to

P(c(s^N) = l) = (1 − 2^{−l})^{C(N − l + 1, 2)} − (1 − 2^{−(l−1)})^{C(N − l + 2, 2)}.   (2)

Let p = 0.3 and 0.5, respectively, with length N = 1000; the probability distributions of the nonlinear complexity of random binary sequences are shown in Figures 1 and 2. From these two figures we can see that, theoretically, the nonlinear complexity of a random binary sequence may be any integer smaller than the sequence's length; however, the probability is almost zero outside a relatively narrow interval. According to the probability distribution (1), the expectation of the nonlinear complexity can be written as

E(c(s^N)) = Σ_{l=1}^{N} l · P(c(s^N) = l).   (3)

If the sequence is uniformly distributed, with p = 0.5, the expectation simplifies, by substituting (2) into (3), to

E(c(s^N)) = Σ_{l=1}^{N} l · [(1 − 2^{−l})^{C(N − l + 1, 2)} − (1 − 2^{−(l−1)})^{C(N − l + 2, 2)}].   (4)

Jansen et al. [26] derived the expectation of the nonlinear complexity for sequences over an alphabet of any cardinality. For binary sequences, the expectation approximately equals 2log2N. Figure 3 compares our result with the 2log2N line, which shows that they are approximately the same for large N.
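As a numerical check, the distribution (1) can be evaluated directly. The sketch below is our own illustration (the names `q` and `nlc_pmf` are ours); `q(l)` is the probability that no length-l tuple repeats, and since q(0) = 0 and q(N) = 1, the probabilities P(c(s^N) = l) = q(l) − q(l − 1) telescope and sum to 1.

```python
from math import comb

def q(l, N, p):
    """Probability that no tuple of length l occurs more than once."""
    r = 2 * p * p - 2 * p + 1  # two random symbols agree with prob p^2 + (1-p)^2
    return (1 - r ** l) ** comb(N - l + 1, 2)

def nlc_pmf(N, p):
    """P(c(s^N) = l) for l = 1..N, following Equation (1)."""
    return [q(l, N, p) - q(l - 1, N, p) for l in range(1, N + 1)]

pmf = nlc_pmf(1000, 0.5)
print(sum(pmf))  # telescopes to q(N) - q(0) = 1
# The mass concentrates in a narrow interval, consistent with Figures 1 and 2.
```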
By calculating the value of E(c(s^N)) in Equation (4) and 2log2N for different lengths N, we obtain the error

e = E(c(s^N)) − 2log2N.

The value of e for different lengths N is shown in Figure 4. From Figure 4 we see that, for moderately large N, the error remains almost constant, at about 0.3019. For quite large N, the value 0.3019 is rather small and can be ignored. Therefore, for moderately large N, the expectation of the nonlinear complexity can be approximately written as

E(c(s^N)) ≈ 2log2N.   (5)

Correspondingly, for a general random sequence with p ≠ 0.5 and moderately large N, the expectation can be approximately written as

E(c(s^N)) ≈ 2 log_{1/(2p^2 − 2p + 1)} N.   (6)
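The approximation E(c(s^N)) ≈ 2log2N is easy to check numerically. The sketch below (our own; `expected_nlc` is a hypothetical name) computes E(c(s^N)) exactly from the uniform distribution (2) and compares it with 2log2N; for moderately large N the gap is a small constant, consistent with Figure 4.

```python
from math import comb, log2

def q(l, N):
    # Uniform binary case (p = 0.5): two random bits agree with probability 1/2.
    return (1 - 0.5 ** l) ** comb(N - l + 1, 2)

def expected_nlc(N):
    """E(c(s^N)) = sum over l of l * [q(l) - q(l-1)], as in Equation (4)."""
    return sum(l * (q(l, N) - q(l - 1, N)) for l in range(1, N + 1))

N = 1000
e = expected_nlc(N)
print(e, 2 * log2(N), e - 2 * log2(N))  # the gap stays below one bit
```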
Let p = 0.1, 0.2, 0.3, 0.4 and 0.5, respectively; the expectations of the nonlinear complexity of random sequences are compared in Figure 5. From Figure 5 we find that the expectations of the nonlinear complexity profile have a clear hierarchy: the more uniform the sequence, the smaller the expectation of its nonlinear complexity.

Figure 5. The expectation of the nonlinear complexity profile with p = 0.5 (blue line with star markers), p = 0.4 (red line with diamond markers), p = 0.3 (black line with circle markers), p = 0.2 (green line with square markers) and p = 0.1 (yellow line with triangle markers), respectively.
Next, for moderately large N, we can derive the variance of the nonlinear complexity of random sequences as

D(c(s^N)) = Σ_{l=1}^{N} l^2 · P(c(s^N) = l) − E(c(s^N))^2,   (7)

where P(c(s^N) = l) is given by (1). If the sequence is uniformly distributed, with p = 0.5, then the variance is simplified by taking P(c(s^N) = l) from (2) instead.   (8)

Now we take p = 0.5 as an example. Figure 6 shows that the variance of the nonlinear complexity becomes stable, showing no dependence on the length N once N is large enough (about N > 400). Furthermore, we can extend our results to sequences over finite alphabets. Let the alphabet size be M, and let the symbol probabilities be p1, p2, …, pM, with p1 + p2 + … + pM = 1. Two arbitrary tuples of length 1 are then the same with probability p1^2 + p2^2 + … + pM^2, so the probability that no tuple of length l occurs more than once can be written as

Q(l) = (1 − (p1^2 + p2^2 + … + pM^2)^l)^{C(N − l + 1, 2)}.

Thus, the probability of c(s^N) = l can be calculated as

P(c(s^N) = l) = Q(l) − Q(l − 1),   (9)

and, for moderately large N, the expectation of the nonlinear complexity of a random finite-alphabet sequence can be approximately written as

E(c(s^N)) ≈ 2 log_{1/(p1^2 + p2^2 + … + pM^2)} N.   (10)
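The variance and the M-ary distribution (9) can be checked with the same telescoping construction. The sketch below is our own illustration (the names are ours); it parameterizes everything by the symbol-collision probability p1^2 + … + pM^2 and computes the mean and variance of c(s^N).

```python
from math import comb

def q(l, N, s2):
    """Prob. that no length-l tuple repeats; s2 = sum of squared symbol probs."""
    return (1 - s2 ** l) ** comb(N - l + 1, 2)

def nlc_mean_var(N, probs):
    """Mean and variance of c(s^N) from the distribution in (9)."""
    s2 = sum(p * p for p in probs)
    pmf = [q(l, N, s2) - q(l - 1, N, s2) for l in range(1, N + 1)]
    mean = sum(l * x for l, x in zip(range(1, N + 1), pmf))
    var = sum(l * l * x for l, x in zip(range(1, N + 1), pmf)) - mean * mean
    return mean, var

# Uniform binary source: the variance settles to a near-constant value for N > 400.
for N in (400, 700, 1000):
    print(N, nlc_mean_var(N, [0.5, 0.5]))
```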
If the M-ary sequence is uniform, with p1 = p2 = … = pM = 1/M, then conclusions (9) and (10) can be simplified to

P(c(s^N) = l) = (1 − M^{−l})^{C(N − l + 1, 2)} − (1 − M^{−(l−1)})^{C(N − l + 2, 2)}   (11)

and

E(c(s^N)) ≈ 2 log_M N.   (12)

From (12) we know that, for the same length, the larger the alphabet size, the smaller the expectation of the nonlinear complexity.
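To see the alphabet-size effect in (12) concretely, the following sketch (our own illustration) computes the exact expectation from the uniform M-ary distribution for several alphabet sizes M at the same length N, alongside the approximation 2 log_M N; the expectation shrinks as M grows.

```python
from math import comb, log

def q(l, N, M):
    # Uniform M-ary source: two random symbols agree with probability 1/M.
    return (1 - (1 / M) ** l) ** comb(N - l + 1, 2)

def expected_nlc(N, M):
    """E(c(s^N)) for a uniform M-ary source, by direct summation."""
    return sum(l * (q(l, N, M) - q(l - 1, N, M)) for l in range(1, N + 1))

N = 1000
for M in (2, 4, 8, 16):
    print(M, expected_nlc(N, M), 2 * log(N) / log(M))  # E(c) tracks 2*log_M(N)
```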

The Relationship between Nonlinear Complexity and Shannon's Entropy
In this section, we reveal the relationship between nonlinear complexity and Shannon's entropy. From Figure 6 we know that the variance of the nonlinear complexity is stable for moderately large N; thus, the nonlinear complexity of a random binary sequence of length N approximately equals its expectation. According to (6), the probabilities of "0" and "1" in a random binary sequence with c(s^N) = c satisfy 2p1^2 − 2p1 + 1 = N^{−2/c} with p1 + p2 = 1, which gives

p1 = (1 + (2N^{−2/c} − 1)^{1/2})/2,  p2 = 1 − p1.

Then the function between Shannon's entropy and the nonlinear complexity is

H(s) = −p1 log2 p1 − p2 log2 p2.   (13)

The relationship between nonlinear complexity and Shannon's entropy is shown in Figure 7, from which we can see that Shannon's entropy is inversely correlated with the nonlinear complexity. Correspondingly, for a uniformly distributed random M-ary sequence, we can also establish the relationship between nonlinear complexity and Shannon's entropy as

H(s) = log2 M ≈ 2 log2 N / c(s^N).   (14)

From (14) we also have that Shannon's entropy is strictly monotonically decreasing in the nonlinear complexity.
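The inverse relation shown in Figure 7 can be sketched numerically. Assuming moderately large N so that c(s^N) ≈ E(c(s^N)), inverting (6) gives 2p^2 − 2p + 1 = N^{−2/c}, hence p = (1 + sqrt(2N^{−2/c} − 1))/2, valid for c ≥ 2log2N; the binary entropy of that p then decreases as c grows. The code below is our own illustration (the helper name is hypothetical); the uniform M-ary relation (14) is simply H = 2log2N / c.

```python
from math import log2, sqrt

def binary_entropy_from_nlc(c, N):
    """Invert E(c) = 2*log_{1/(2p^2-2p+1)}(N) (Eq. (6)) to recover p >= 1/2,
    then return H(p) = -p*log2(p) - (1-p)*log2(1-p). Needs c >= 2*log2(N)."""
    r = N ** (-2.0 / c)             # r = 2p^2 - 2p + 1
    p = (1 + sqrt(2 * r - 1)) / 2   # the root with p >= 1/2
    return -p * log2(p) - (1 - p) * log2(1 - p)

N = 1000
hs = [binary_entropy_from_nlc(c, N) for c in (20, 25, 30, 40, 60)]
print(hs)  # strictly decreasing: entropy falls as nonlinear complexity rises
```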

Conclusions
In this paper, we have studied the nonlinear complexity of random sequences from memoryless sources. The statistical properties of the nonlinear complexity of random sequences, including the probability distribution, expectation and variance, are provided. Furthermore, we also establish its relationship to Shannon's entropy; the result shows that these two measures are inversely related. In future work, we will study the relationship between other complexity measures and probability (entropy) measures.

Figure 4. The error between the expectation of the nonlinear complexity and 2log2N.

Figure 6. The variance of the nonlinear complexity of a random binary sequence with p = 0.5.

Figure 7. The relationship between nonlinear complexity and Shannon's entropy.