The Eigenvalue Complexity of Sequences in the Real Domain

The eigenvalue is one of the important cryptographic complexity measures for sequences. However, the eigenvalue can only evaluate sequences over a finite set of symbols; it is not applicable to real number sequences. Recently, chaos-based cryptography has received widespread attention for its perfect dynamical characteristics. However, dynamical complexity does not completely equate to cryptographic complexity. The security of a chaos-based cryptographic algorithm is not fully guaranteed unless it can be proven or measured by cryptographic standards. Therefore, in this paper, we extended the eigenvalue complexity measure from the finite field to the real number field to make it applicable to the complexity measurement of real number sequences. The probability distribution, expectation, and variance of the eigenvalue of real number sequences are discussed both theoretically and experimentally. With the extension of the eigenvalue, we can evaluate the cryptographic complexity of real number sequences, which is a great advantage for cryptographic usage, especially for chaos-based cryptography.


Introduction
Sequence complexity can be regarded as a series of measures that depicts the different characteristics of sequences. For cryptographic uses, the most important complexity measures of sequences are linear complexity, Lempel-Ziv (LZ) complexity, eigenvalue, and nonlinear complexity. The nonlinear complexity of a sequence y is an important measure, and it is defined as the length of the shortest Feedback Shift Register (FSR) that generates y. For the shortest Linear Feedback Shift Register (LFSR), it is referred to as the linear complexity. These two measures have been studied for many decades [1][2][3][4][5][6]. In addition, Lempel and Ziv proposed another well known complexity measure for a given sequence, which is called the LZ complexity [7]. The complexity is related to the number of distinct phrases and the rate of their occurrence along the sequence. In the same study, the eigenvalue was provided from a similar aspect as well, while the eigenvalue profile more closely reflected the rate of vocabulary growth than the LZ complexity. The relationship between LZ complexity and nonlinear complexity was studied in [8], which shows that these two complexity measures are converse in a sense.
For all these complexities, there is a premise: the measured sequences must be defined over a finite field. This implies that the cardinality of the state set of the sequences must be finite. For linear complexity and nonlinear complexity, the cardinality of the state set is typically set to two, which corresponds to a binary sequence. The Lempel-Ziv complexity and the eigenvalue can measure sequences with N symbols, where N is finite. The authors of [9] studied the relationship between the eigenvalue and Shannon's entropy of finite symbol sequences. The authors of [10] studied the relationship between the nonlinear complexity and Shannon's entropy of random binary sequences. Moreover, the authors of [11] investigated a method to construct finite length sequences with large nonlinear complexity over the finite field. In addition, the authors of [12] used the Lempel-Ziv complexity as a nonlinear analysis tool to characterize the effects of sleep deprivation on the electroencephalogram. To the best of our knowledge, most of the studies of these complexity measures, including theoretical analyses and practical applications, are subject to this constraint.
Nowadays, many physical systems can be used in cryptography for their complex dynamical properties, as in chaos-based cryptography [13][14][15][16][17][18][19][20]. However, chaotic systems are defined on the real number field $\mathbb{R}^n$, and the cardinality of the state variable is infinite. Currently, the security of a chaos-based cryptographic algorithm is usually argued from its high dynamical complexity. However, the dynamical complexity is not completely equal to the cryptographic complexity. Thus, the security of chaos-based cryptographic algorithms is not regarded as guaranteed by cryptography researchers [21].
In order to overcome this weakness of chaos-based cryptography, we should evaluate the complexity of chaotic sequences in a cryptographic way. However, all the cryptographic complexity measures currently used are only available for finite symbol sequences. Thus, we should extend the cryptographic complexity from the finite field to the real number field. In this paper, we mainly focused on the eigenvalue complexity, extending this measure from the finite field to the real number field to evaluate the cryptographic properties of real number sequences. The probability distribution, expectation, and variance of the eigenvalue of real number sequences are discussed both theoretically and experimentally.
The rest of this paper is organized as follows. In Section 2, a brief introduction to the eigenvalue is presented and its extension to the real number field is proposed. The eigenvalues of two kinds of real number sequences are discussed in Section 3, namely the uniformly distributed random sequence and the logistic chaotic sequence. In Section 4, four kinds of chaotic sequences are evaluated and compared by using the extended eigenvalue measure. Finally, Section 5 concludes the paper.

Eigenvalue for Binary Sequences
The eigenvalue was first proposed by Lempel and Ziv in [7]. It describes the number of words arising from a particular parsing procedure of the sequence. Here, we briefly summarize the definition of the eigenvalue.
Let $F_2$ denote the binary field and let $x^N = x_0 x_1 x_2 \ldots x_{N-1}$ be a binary sequence of length N. We denote by $x_i^j$ the tuple $x_i \ldots x_j$ in the sequence, $i \le j$. A prefix and a suffix of the sequence $x^N$ are defined as $x_0^j$ and $x_j^{N-1}$, respectively. A prefix with $j < N-1$ is called a proper prefix, and a suffix with $j > 0$ is called a proper suffix.
The vocabulary of a sequence $x^N$ is the set consisting of all tuples $x_i^j$ of the sequence. If a tuple $x_i^j$ does not belong to the vocabulary of any proper prefix of $x^N$, it is called an eigenword. The eigenvalue of a sequence equals the total number of its eigenwords. The eigenvalue profile of $x^N$ is the integer-valued sequence $k(x^i)$, i = 1, 2, . . . , N.

Proposition 1 ([7]). The eigenvalue $k(y^N)$ of sequence $y^N$ equals the least $l$ such that $y^N$ is reproducible by $y^l$.
Based on this definition, the eigenvalue of $x^N$ equals $k$ if, and only if, the following two conditions hold: (1) the tuple $x_k x_{k+1} \ldots x_{N-1}$ belongs to the vocabulary of the proper prefix $x_0^{N-2}$; and (2) the tuple $x_{k-1} x_k \ldots x_{N-1}$ does not belong to the vocabulary of the proper prefix $x_0^{N-2}$. Obviously, this definition of the eigenvalue is available for binary sequences, and can be extended to finite symbol sequences at most. However, chaotic signals are defined on the real number domain and cannot be measured by this index. Therefore, extending the eigenvalue measure to the real number domain is beneficial to chaos-based cryptography and to many other applications as well.
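The two conditions above can be turned into a direct search: the eigenvalue is the least k for which the suffix starting at index k already occurs inside the proper prefix. The following is a minimal sketch of that reading, not the authors' implementation; the function names `occurs_in`, `eigenvalue` and `eigenvalue_profile` are hypothetical, and the search is deliberately naive rather than efficient.

```python
from typing import List, Sequence


def occurs_in(word: Sequence, text: Sequence) -> bool:
    """Return True if `word` appears as a contiguous tuple inside `text`."""
    m = len(word)
    # The empty word trivially occurs in any text (including the empty one).
    return any(tuple(text[p:p + m]) == tuple(word)
               for p in range(len(text) - m + 1))


def eigenvalue(seq: Sequence) -> int:
    """Least k such that seq[k:] belongs to the vocabulary of seq[:-1]."""
    n = len(seq)
    proper_prefix = seq[:-1]
    for k in range(1, n + 1):
        if occurs_in(seq[k:], proper_prefix):
            return k
    return n  # only reached for the empty sequence


def eigenvalue_profile(seq: Sequence) -> List[int]:
    """Eigenvalues of the prefixes of length 1, 2, ..., len(seq)."""
    return [eigenvalue(seq[:i]) for i in range(1, len(seq) + 1)]
```

For instance, `eigenvalue("0101")` returns 2 because the suffix "01" already occurs in the proper prefix "010", whereas the suffix "101" does not.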

Eigenvalue of Sequences in the Real Domain
As we know, the eigenvalue measures the rate of growth of the vocabulary of a sequence. However, strictly speaking, for a real-valued random or chaotic sequence, no tuple will occur more than once. Thus, we cannot decide by exact comparison whether a tuple belongs to the vocabulary of a proper prefix.
In the real number field, the Euclidean distance is commonly used to judge whether two points are close or not. Furthermore, in dynamics analysis, many measures are based on the Euclidean distance, such as the Lyapunov exponent, Kolmogorov entropy, and embedding dimension. Therefore, in this paper, we use the Euclidean distance to judge whether a newly generated state repeats an earlier one. Assume that $y^N = y_0 y_1 y_2 \ldots y_{N-1}$ is a sequence in the real domain, $y_i \in \mathbb{R}$. The state $y_j$ is regarded as identical to $y_i$ if $|y_j - y_i| < d$, where $i < j$ and $y_i$ is a state in the prefix $y_0^j$. Here $d$ is the undifferentiated distance, the threshold used to judge whether two real numbers can be regarded as the same. On this basis, we can judge whether a tuple in a sequence belongs to the vocabulary of a proper prefix or not, so the eigenvalue can be applied to real number sequences.
Consider a random real number sequence $x^N$ whose states lie in $(a, b)$, and assume that the distribution function of this sequence is $p(x)$. The probability $P(d)$ that the distance between two states $x_i$ and $x_j$ is larger than $d$ can be calculated as
$$P(d) = \int_a^{a+d} p(x_i) \int_{x_i+d}^{b} p(x_j)\,dx_j\,dx_i + \int_{b-d}^{b} p(x_i) \int_a^{x_i-d} p(x_j)\,dx_j\,dx_i + \int_{a+d}^{b-d} p(x_i) \left[ \int_a^{x_i-d} p(x_j)\,dx_j + \int_{x_i+d}^{b} p(x_j)\,dx_j \right] dx_i. \tag{1}$$
As shown in Equation (1), the probability $P(d)$ is the sum of three probabilities: the probability of $x_i \in (a, a+d)$ and $x_j \in (x_i+d, b)$; the probability of $x_i \in (b-d, b)$ and $x_j \in (a, x_i-d)$; and the probability of $x_i \in (a+d, b-d)$ and $x_j \in (a, x_i-d)$ or $(x_i+d, b)$. Obviously, the probability $P(d)$ is influenced by the undifferentiated distance $d$. The states $x_i$ and $x_j$ in the sequence are regarded as identical with probability $1 - P(d)$. Thus, the probability that the tuple $x_i x_{i+1} \ldots x_{N-1}$ belongs to the vocabulary $M$ of its proper prefix is given by Equation (2). According to conditions (1) and (2), the probability of $k(x^N) = k$ can be written as
$$P(k(x^N) = k) = P\left(x_k^{N-1} \in M,\; x_{k-1}^{N-1} \notin M\right). \tag{3}$$
As we know, if the tuple $x_{k-1} x_k \ldots x_{N-1} \in M$, the tuple $x_k x_{k+1} \ldots x_{N-1}$ must belong to $M$ as well. Furthermore, once the tuple $x_k x_{k+1} \ldots x_{N-1}$ does not belong to $M$, the tuple $x_{k-1} x_k \ldots x_{N-1}$ will not belong to $M$ either. In other words, the event $x_{k-1}^{N-1} \in M$ is contained in the event $x_k^{N-1} \in M$. These facts give Equations (4)-(6), according to which Equation (3) can be simplified as
$$P(k(x^N) = k) = P\left(x_k^{N-1} \in M\right) - P\left(x_{k-1}^{N-1} \in M\right). \tag{7}$$
When Equation (2) is substituted into Equation (7), the probability of $k(x^N) = k$ is obtained as Equation (8). Based on Equation (8), the expectation and variance of the eigenvalue of the real number random sequence $x^N$ can be written as
$$E[k(x^N)] = \sum_{k=1}^{N} k \, P(k(x^N) = k) \tag{9}$$
and
$$\mathrm{Var}[k(x^N)] = \sum_{k=1}^{N} \left(k - E[k(x^N)]\right)^2 P(k(x^N) = k), \tag{10}$$
respectively. Next, we use the extended eigenvalue to measure the complexity of uniformly distributed random sequences and logistic chaotic sequences.
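The extension described in this section can be sketched in code by replacing exact symbol equality with the test $|y_j - y_i| < d$, applied componentwise to tuples. This is only an illustration under that assumed reading of "belongs to the vocabulary"; the names `matches_within` and `extended_eigenvalue` are hypothetical, and the naive search is meant for short sequences.

```python
from typing import Sequence


def matches_within(word: Sequence[float], text: Sequence[float], d: float) -> bool:
    """True if some window of `text` is within distance d of `word`, state by state."""
    m = len(word)
    for p in range(len(text) - m + 1):
        if all(abs(text[p + t] - word[t]) < d for t in range(m)):
            return True
    return False


def extended_eigenvalue(seq: Sequence[float], d: float) -> int:
    """Least k such that seq[k:] is matched, up to d, inside the proper prefix seq[:-1]."""
    n = len(seq)
    proper_prefix = seq[:-1]
    for k in range(1, n + 1):
        if matches_within(seq[k:], proper_prefix, d):
            return k
    return n  # only reached for the empty sequence
```

With a very small d the measure degenerates to the sequence length (no state ever repeats), and with a very large d it collapses to 1, which illustrates why the choice of d matters.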

Eigenvalue of Uniformly Distributed Random Sequence
Consider a uniformly distributed random sequence, whose distribution function is
$$p(x) = \frac{1}{b-a}, \quad a < x < b. \tag{11}$$
Without loss of generality, we can restrict the region $(a, b)$ to $(0, 1)$. The corresponding distribution function is $f(x) = 1$, $0 < x < 1$. When this distribution function is substituted into Equation (1), we obtain
$$P(d) = (1-d)^2. \tag{12}$$
Therefore, the probability of $k(x^N) = k$ is given by Equation (13). To give a more intuitive understanding, the probability distribution of the eigenvalue is depicted in Figure 1 for different undifferentiated distances. The length N is set to 1000.
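For the uniform density on (0, 1), evaluating the three integrals of Equation (1) gives $P(d) = (1-d)^2$, so two states are regarded as identical with probability $2d - d^2$. The short Monte Carlo sketch below checks this value numerically; the function name `estimate_p_far`, the sample count and the seed are illustrative assumptions.

```python
import random


def estimate_p_far(d: float, samples: int = 200_000, seed: int = 0) -> float:
    """Estimate P(|x_i - x_j| > d) for two independent uniform(0, 1) states."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    far = 0
    for _ in range(samples):
        if abs(rng.random() - rng.random()) > d:
            far += 1
    return far / samples


if __name__ == "__main__":
    for d in (0.1, 0.2929):
        print(d, estimate_p_far(d), (1 - d) ** 2)
```

The estimate should agree with $(1-d)^2$ up to sampling noise of roughly $10^{-3}$ at this sample size.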
In Figure 1, we can see that most of the sequences' eigenvalues are located in a relatively narrow interval. With different distances d, the probabilities of $k(x^N) = k$ are different: the peak moves left as d grows, and the peak value gradually decreases. Thus, to evaluate the eigenvalue of a real number sequence, the choice of the distance d is crucial. The eigenvalue probability of uniformly distributed random sequences over n symbols has been studied in [9]. To stay consistent with that work, we should choose $2d - d^2 = 1/n$, so the distance d should be chosen as
$$d = 1 - \sqrt{1 - \frac{1}{n}}. \tag{14}$$
Therefore, we can compare with the eigenvalue of a binary random sequence by choosing d = 0.2929, with the eigenvalue of a 3-symbol random sequence by choosing d = 0.1835, with the eigenvalue of a 4-symbol random sequence by choosing d = 0.1340, etc.
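Solving $2d - d^2 = 1/n$ for the root in (0, 1) gives $d = 1 - \sqrt{1 - 1/n}$, which reproduces the three distances quoted above. A small sketch of this calculation follows; the helper name `distance_for_symbols` is hypothetical.

```python
import math


def distance_for_symbols(n: int) -> float:
    """Undifferentiated distance d in (0, 1) that satisfies 2d - d^2 = 1/n."""
    return 1.0 - math.sqrt(1.0 - 1.0 / n)


if __name__ == "__main__":
    for n in (2, 3, 4):
        d = distance_for_symbols(n)
        # The second column should equal 1/n up to rounding error.
        print(n, round(d, 4), 2 * d - d * d)
```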
Based on the distribution of eigenvalues, the expectation of the eigenvalue for random real number sequences can be approximately calculated by Equation (15) for moderately large N. Setting N = 10,000, a uniformly distributed random sequence is generated, and Figure 2 shows the eigenvalue of this sequence. In Figure 2, we can see that all the numerical results are near the theoretical curve derived in Equation (15), which indicates that the derived expectation of the eigenvalue of random real number sequences is correct. Based on Equation (10), the variance of the eigenvalue of random real number sequences can be approximately written as Equation (16). The length N is then varied from 1000 to 50,000. Figure 3 shows that, for different distances d, the variances of the eigenvalue are all quite stable as the length N grows.

Eigenvalue of Logistic Chaotic Sequence
Consider the logistic chaotic map of Equation (17), where $y_i \in (-1, 1)$ is the state variable. For an initial condition $y_0$, we can generate a chaotic sequence $y_0 y_1 \ldots y_n$ by iterating the map. The distribution function of Equation (17) is [22]
$$f(y) = \frac{1}{\pi \sqrt{1 - y^2}}, \quad -1 < y < 1. \tag{18}$$
According to Equation (1), the probability $P(d)$ that the distance between two states $y_i$ and $y_j$ is larger than d is given by Equation (19). When Equation (19) is substituted into Equation (8), the probability of $k(x^N) = k$ for the logistic chaotic sequence can be calculated. Figure 4 depicts the probability distribution of the eigenvalue of logistic chaotic sequences for different values of d. In Figure 4, we can see that, as with random sequences, the eigenvalues are located in a relatively narrow interval, the peak moves left as d grows, and the peak value gradually decreases as well.
Based on Equation (9), the expectation of the eigenvalue of the logistic chaotic sequence is given by Equation (20) for moderately large N. For a randomly selected initial condition, Figure 5 shows the eigenvalue of the generated logistic sequence. The eigenvalues of this sequence all lie close to the theoretical curve derived in Equation (20).
In Figure 6, we can see that there is almost no difference between the expected eigenvalues of the logistic sequences and those of the random sequences. After enlarging, we can see that the eigenvalue of the logistic sequences is just a little lower than that of the random sequences, which implies that the logistic sequence cannot be regarded as a perfect random sequence in this sense. For other undifferentiated distances, the results are similar. Therefore, we omit them here to avoid redundancy.
Correspondingly, the variance of the eigenvalue of logistic sequences can be approximately written as Equation (21). The variances of the eigenvalue of logistic sequences for different distances d are depicted in Figure 7. This figure indicates that, for every distance, the variance of the eigenvalue of logistic sequences is stable as the length N grows.
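As an illustration of the sequences discussed in this section, the sketch below assumes that the logistic map takes the common form $y_{i+1} = 1 - 2y_i^2$ on (-1, 1), whose invariant density is the arcsine law $1/(\pi\sqrt{1-y^2})$. The map form, the function names and the initial condition 0.3 are assumptions made for the example, not details taken from the text. It generates a sequence and compares the empirical fraction of states in an interval with the arcsine prediction.

```python
import math
from typing import List


def logistic_sequence(y0: float, length: int) -> List[float]:
    """Iterate the assumed map y -> 1 - 2*y*y starting from y0 in (-1, 1)."""
    seq = []
    y = y0
    for _ in range(length):
        seq.append(y)
        y = 1.0 - 2.0 * y * y
    return seq


def arcsine_mass(lo: float, hi: float) -> float:
    """Probability of (lo, hi) under the density 1 / (pi * sqrt(1 - y^2))."""
    return (math.asin(hi) - math.asin(lo)) / math.pi


if __name__ == "__main__":
    seq = logistic_sequence(0.3, 100_000)
    empirical = sum(1 for y in seq if abs(y) < 0.5) / len(seq)
    print(empirical, arcsine_mass(-0.5, 0.5))
```

Under these assumptions the two printed numbers should both be close to 1/3, which is the mass the arcsine law assigns to (-0.5, 0.5).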

Measure the Complexity of Chaotic Sequences
With the extension of eigenvalue, we can use this complexity measure to evaluate the cryptographic characteristics of different chaotic sequences. Here, the following four kinds of 1-D chaotic sequences are generated and compared.

A. Chebyshev map. The Chebyshev map can be written as Equation (22), where $x_i \in (-1, 1)$ is the state variable and a is the control coefficient. The Chebyshev map is chaotic when a ≥ 2. In this test, we always set a = 3.
B. Sine map. The Sine map can be mathematically described by Equation (23), where r ∈ (0, 1] is the control parameter. In this test, we set r = 2 to make the Sine map chaotic.
C. Tent map. The Tent map is a piecewise function, which can be described by Equation (24), where p ∈ (0, 1) is the control parameter. In particular, when p = 0.5, the generated sequence quickly falls into a short cycle. Therefore, we always set p = 0.49 in this test.
D. Logistic map. The Logistic map has already been described in Equation (17), so we omit it here to avoid redundancy.
Since the state variables of these four maps lie in different domains, for consistency, we first compressed them to the identical interval (0, 1). When d = 0.1, the eigenvalues of these four kinds of chaotic sequences are depicted in Figure 8. Figure 8 shows that the chaotic sequences generated by the Sine map have the largest eigenvalue, whereas the chaotic sequences generated by the Tent map have the lowest eigenvalue. For other distances d, the results are similar. Thus, it can be seen that with the extended eigenvalue, we can evaluate the cryptographic complexity of real number sequences effectively. However, it should be noted that this result does not imply that the Sine map is better than other chaotic maps in cryptographic applications. On the one hand, the eigenvalue is only one of the cryptographic complexity measures; on the other hand, the eigenvalue is influenced by the control parameter of the chaotic map. For example, the eigenvalue of the Sine chaotic sequence will be lower than the eigenvalue of the Chebyshev chaotic sequence when r = 1.
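The comparison procedure (generate a sequence, compress it to (0, 1), then evaluate the extended eigenvalue with d = 0.1) can be sketched as follows. Everything specific here is an assumption made for illustration: the Chebyshev map is taken in its textbook form $x_{i+1} = \cos(a \arccos x_i)$, the Tent map in the skew form $x/p$ for $x < p$ and $(1-x)/(1-p)$ otherwise, the compression is taken to be an affine rescaling, and the function names, initial condition and length are hypothetical. The forms used in Equations (22) and (24) may differ.

```python
import math
from typing import List, Sequence


def chebyshev_sequence(x0: float, a: float, length: int) -> List[float]:
    """Assumed textbook Chebyshev map on (-1, 1): x -> cos(a * arccos(x))."""
    seq, x = [], x0
    for _ in range(length):
        seq.append(x)
        x = math.cos(a * math.acos(x))
    return seq


def tent_sequence(x0: float, p: float, length: int) -> List[float]:
    """Assumed skew tent map on (0, 1) with peak at p."""
    seq, x = [], x0
    for _ in range(length):
        seq.append(x)
        x = x / p if x < p else (1.0 - x) / (1.0 - p)
    return seq


def rescale_to_unit(seq: Sequence[float], lo: float, hi: float) -> List[float]:
    """Affine compression of states from (lo, hi) to (0, 1)."""
    return [(x - lo) / (hi - lo) for x in seq]


def extended_eigenvalue(seq: Sequence[float], d: float) -> int:
    """Least k such that seq[k:] is matched, up to d, at a start position p < k."""
    n = len(seq)
    for k in range(1, n + 1):
        m = n - k  # length of the suffix seq[k:]
        for p in range(k):  # windows that stay inside the proper prefix
            if all(abs(seq[p + t] - seq[k + t]) < d for t in range(m)):
                return k
    return n


if __name__ == "__main__":
    cheb = rescale_to_unit(chebyshev_sequence(0.3, 3.0, 300), -1.0, 1.0)
    tent = tent_sequence(0.3, 0.49, 300)  # already in (0, 1)
    print(extended_eigenvalue(cheb, 0.1), extended_eigenvalue(tent, 0.1))
```

A single short run like this only illustrates the workflow; comparing maps in a meaningful way would need longer sequences and many initial conditions.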

Conclusions
In order to evaluate the cryptographic complexity of real number sequences, in this paper, we extended the so-called eigenvalue from the binary field to the real number field. The extended eigenvalue is influenced by the undifferentiated distance, and we gave an exact value of this distance corresponding to n-symbol sequences. Both uniformly distributed random sequences and logistic sequences were used as examples. The probability distribution, expectation, and variance of the eigenvalue of these two kinds of real number sequences were discussed both theoretically and experimentally. With the extension of the eigenvalue, we can evaluate the cryptographic complexity of real number sequences, which is a great advantage for cryptographic usage, especially for chaos-based cryptography. Furthermore, four kinds of chaotic sequences were evaluated by this extended complexity measure, which indicates that our study is effective and of great interest.
