A Novel Method to Identify Initial Values of Chaotic Maps in Cybersecurity

Chaos theory has applications in several disciplines and focuses on the behavior of dynamical systems that are highly sensitive to initial conditions. Chaotic dynamics are the seemingly random behavior displayed by some nonlinear dynamical systems and have been used as a source of diffusion in cybersecurity for more than two decades. As recent proposals show, adding chaos can increase the overall strength of communication security systems. However, using chaos in such systems has a major drawback: chaotic communication security systems rely on private keys, which are the initial values and parameters of the chaotic systems. This paper shows that these systems can be broken by identifying those initial values through statistical analysis of standard deviation and variance. The proposed analyses are performed on chaotic sequences of the Lorenz chaotic system and the Logistic chaotic map and show that the initial values and parameters, which serve as security keys, can be retrieved in short computation times. Furthermore, the proposed model for identifying the initial values can be applied to other chaotic maps as well.


Introduction
In physics, jerk is the third derivative of position with respect to time. It has been shown that a jerk equation, which is equivalent to a system of three first-order, ordinary, nonlinear differential equations, is in a certain sense the minimal setting for solutions showing chaotic behavior. Therefore, chaos [1][2][3][4] has many applications in physics. With the exponential increase in information technology devices, the need for reliable authentication solutions has also increased [5,6]. Traditional passive authentication systems that require the indirect involvement of users prove to be ineffective; e.g., the loss or wrongful acquisition of a username and password by an unauthorized user results in the loss of confidential data [7]. In contrast, a biometric authentication system, which is based on physical attributes of users, can be much more effective and secure [8]. The physical attributes can be fingerprint, face, palm veins, palm print, DNA, hand geometry, retina, iris and odor/scent [9]. Biometrics cannot easily be stolen, lost, duplicated, hacked, or shared; additionally, they are resistant to social engineering attacks, since users must be present to use a biometric component. In recent years, there has been a rapid increase in the publication of new biometric and security algorithms that apply nonlinear dynamics, such as chaos theory [10][11][12][13][14][15][16][17][18][19].
Chaos [20][21][22][23] is a branch of science that deals with parts of the world that are unpredictable, apparently random (though not necessarily random), disorderly, erratic and irregular. A class of models is identified in [24] that can represent two-phase microfluidic flow in different experimental conditions. The identification procedure adopted there is based on the synchronization theory of nonlinear systems.
Formally, chaos theory was introduced by a meteorologist, Edward N. Lorenz [25], who examined the weather system and found it to be a chaotic system. He also coined the term "the butterfly effect" for chaos theory, as an analogy for sensitive dependence on initial conditions. Physically, the sensitive dependence of weather on initial conditions means that a small puff of wind can cause a storm a few months later. In other words, a hurricane's formation is contingent on whether a distant butterfly flapped its wings several weeks before. This effect is the main reason for the application of chaos theory in biometric security, along with other applications in the natural sciences, engineering, stock exchange and so on [26,27]. In [28], an experimental robust synchronization of hyperchaotic circuits is proposed. Based on the concept of the master stability function, the two circuits are coupled through a unique scalar signal. Experimental results obtained from two hyperchaotic circuits are presented to show that synchronization occurs widely in the range of electronic component tolerances.
Chaotic cryptosystems [29][30][31][32] were first proposed in the early 1990s and gained popularity instantly. Numerous specialists [33] have observed that there is a close relationship between chaos and cryptography; many properties of chaotic systems have counterparts in conventional cryptosystems. Chaotic systems have several features well suited to secure communications, such as sensitivity to initial conditions, ergodicity, control parameters and random-like behavior, which can be associated with traditional cryptographic properties of good ciphers, for example, the confusion and diffusion proposed by Shannon [34]. On the other hand, various publications demonstrate flaws, particularly in the early analog chaotic security approaches, which can be broken in short computation times [35]. Likewise, performance analysis and security issues were not considered when these schemes were proposed, which left them weak against differential attacks. This manuscript highlights another flaw in chaotic biometric systems and attempts to predict the chaotic sequences of the Lorenz and Logistic chaotic maps based on their long-term sequences, thereby pointing out a major drawback in chaotic secure systems.
The rest of the paper is organized as follows: Section 2 presents the preliminaries, related work and an explanation of different chaotic maps. Section 3 presents the auto-correlation functions. Section 4 gives the detailed simulation of the standard deviation and variance analysis for two chaotic maps. Section 5 concludes the paper.

Preliminaries and Related Work
In symmetric cryptography, the algorithm is public and only the key is private. The communication between the transmitter and the receiver, along with the transmission of the cryptographic algorithm, takes place over an insecure channel. The key, which is more crucial, is sent over a secure and expensive channel, as shown in Figure 1. It is assumed that the eavesdropper knows the cryptographic algorithm and has access to the ciphertext. Based on these data, the eavesdropper can launch different differential attacks to recover the original plaintext, which is usually a difficult task. However, if the eavesdropper can get access to the key, then retrieving the original plaintext is simple and straightforward.

Related Work
In this paper, the focus is on the retrieval of the keys involved in chaotic cryptography, which are usually the initial values and parameters of the chaotic maps being used. Before the keys can be retrieved, it is essential to know which chaotic system is used in the cryptographic algorithm. In [35], it is stated that anyone with access to the long-term sequence of a chaotic map can identify the map by analyzing the auto-correlation of that sequence, as the auto-correlation of each chaotic map's sequence is different.
Behnia et al. [10] proposed a symmetric chaotic cryptosystem based on coupled maps. The keys used in the cryptosystem are the control parameters of those maps. Liu [11] presented a color image encryption scheme based on one-time keys and robust chaotic systems, in which the keys are given by the piecewise linear chaotic map (PWLCM). Hussain [36] worked extensively on image encryption schemes based on nonlinear dynamical systems. The different techniques presented for cryptography used keys based on the initial values and parameters of these nonlinear dynamical systems. Jamal et al. [37] proposed a watermarking scheme for the copyright protection of digital images based on different sequences of logistic maps; the keys used are the initial values and parameters involved. One of the approaches to constructing substitution boxes (S-boxes), which are the only nonlinear component in the Advanced Encryption Standard (AES) [38], is based on chaotic sequences. The sequences are generated from the initial values and parameters of different chaotic maps, which serve as keys. Khan et al. [39] presented a similar technique for the construction of S-boxes based on the Lorenz chaotic system.
Numerous other chaotic security systems in the literature also use the initial values and parameters of chaotic maps as security keys; a detailed overview of chaotic cryptography can be found in [40]. The present manuscript attempts to reduce the keyspace of these schemes, or to break the keys as far as possible, via the statistical analysis of standard deviation and variance.

Lorenz System
The Lorenz attractor was first derived from a simple model of convection in the Earth's atmosphere, given as [25]:

$$\frac{dx}{dt} = a\,(y - x), \qquad \frac{dy}{dt} = r\,x - y - x\,z, \qquad \frac{dz}{dt} = x\,y - b\,z. \tag{1}$$

The above set of equations is a nonlinear dynamical system with two nonlinearities, xz and xy. The inputs a, b and r are constant physical characteristics of the air flow, x represents the amplitude of convective currents in the air cell, y corresponds to the temperature difference between rising and falling currents and z to the deviation of the temperature from the normal temperature in the cell.
For the numerical solution, the system is first transformed into iterative form and the numerical solution is then computed. The numerical solution of the Lorenz system shows that, for 0 < r < 1, the overall system is stable with a steady response; for 1 < r ≤ 24, the system is also stable with a periodic response; and, for r > 24, the system yields a chaotic response. The sequences used in the proposed technique are from this chaotic region. Figure 2 shows the numerical chaotic solution of the Lorenz system for the x − y, x − z and x − y − z structures. Furthermore, Figure 3a shows the individual x-sequence against the number of iterations with initial conditions of (x_0, y_0, z_0) → (−8, 8, 27). To illustrate the sensitive dependence on initial conditions, Figure 3b shows the x-sequence with initial conditions of (x_0, y_0, z_0) → (−8.00000000, 8, 27) together with the x-sequence with initial conditions of (x_0, y_0, z_0) → (−8.00000001, 8, 27). It can be seen that the two sequences diverge completely after 1300 iterations, despite a difference of only 0.00000001 in one of the initial conditions. This is one of the prime reasons for the application of chaotic maps in cybersecurity.
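The sensitivity experiment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not state the integration scheme, the step size, or the values of a and b, so the fixed-step fourth-order Runge-Kutta scheme, `dt = 0.01`, `a = 10`, `b = 8/3` and `r = 28` are all assumptions, and the function names are hypothetical. The iteration count at which the two sequences separate depends on these choices and need not match the 1300 iterations reported for Figure 3b.

```python
def lorenz_rhs(x, y, z, a, b, r):
    """Right-hand side of the Lorenz equations."""
    return a * (y - x), r * x - y - x * z, x * y - b * z


def lorenz_sequence(x0, y0, z0, n_steps, dt=0.01, a=10.0, b=8.0 / 3.0, r=28.0):
    """Return the x-, y- and z-sequences from a fixed-step RK4 iteration.

    dt, a, b and r are assumed values; the text does not specify them.
    """
    xs, ys, zs = [x0], [y0], [z0]
    x, y, z = x0, y0, z0
    for _ in range(n_steps):
        k1 = lorenz_rhs(x, y, z, a, b, r)
        k2 = lorenz_rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1],
                        z + 0.5 * dt * k1[2], a, b, r)
        k3 = lorenz_rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1],
                        z + 0.5 * dt * k2[2], a, b, r)
        k4 = lorenz_rhs(x + dt * k3[0], y + dt * k3[1], z + dt * k3[2], a, b, r)
        x += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        z += dt * (k1[2] + 2.0 * k2[2] + 2.0 * k3[2] + k4[2]) / 6.0
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


# Two runs whose initial x differs by 1e-8, as in the text.
x_ref, _, _ = lorenz_sequence(-8.0, 8.0, 27.0, 5000)
x_pert, _, _ = lorenz_sequence(-8.00000001, 8.0, 27.0, 5000)

# Pointwise separation of the two x-sequences.
gap = [abs(u - v) for u, v in zip(x_ref, x_pert)]
```

Plotting `gap` on a logarithmic axis shows the separation growing roughly exponentially before it saturates at the size of the attractor.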

Logistic Map
The logistic map is a model of population growth first proposed in [41]. It is derived from the continuous differential equation defined as [41]:

$$\frac{dm}{dt} = \frac{r\,m\,(k - m)}{k}, \tag{2}$$

where r is the Malthusian parameter (the maximum rate of population growth) and k is the carrying capacity (i.e., the maximum sustainable population). Dividing both sides by k and defining x = m/k then gives the differential equation:

$$\frac{dx}{dt} = r\,x\,(1 - x). \tag{3}$$

The discrete version, in the form of a difference equation, is

$$x_{n+1} = r\,x_n\,(1 - x_n), \tag{4}$$

where the parameter is r ∈ (0, 4).

Figure 3. (a) The individual x-sequence against the number of iterations with initial conditions of (x_0, y_0, z_0) → (−8, 8, 27). (b) Sensitive dependence on initial conditions: the x-sequence with initial conditions of (x_0, y_0, z_0) → (−8.00000000, 8, 27) together with the x-sequence with initial conditions of (x_0, y_0, z_0) → (−8.00000001, 8, 27). It can be seen that the two sequences diverge completely after 1300 iterations, despite a difference of only 0.00000001 in one of the initial conditions.

The parameter r, as mentioned above, is the rate of population growth or, in physical terms, characterizes the rate of heating in a convection equation or the speed of a fluid in a mechanically rotating convection loop. The behavior of the logistic equation depends heavily on the parameter r. May [42][43][44] analyzed at length the behavior of the logistic equation as a function of r. After plotting the iterate x_n as a function of r, it was observed that, when r is low, the map settles to a steady state after a few iterations. When r is higher, the stable state bifurcates into a two-state periodic structure; this structure further splits into a four-state periodic structure and then into eight. With a further increase in r, the map sequence enters a region of unpredictable behavior, the chaotic region.
The logistic map is used extensively in cryptography. Its long-term, seemingly random sequences, along with its sensitivity to initial conditions, are the reason for its application in cryptography as well as in steganography and watermarking. Figure 4a shows the individual x-sequence against the number of iterations with initial conditions of (x_0, r) → (0.5, 3.7). To illustrate the sensitive dependence on initial conditions, Figure 4b shows the x-sequence with initial conditions of (x_0, r) → (0.5, 3.700000000) together with the x-sequence with initial conditions of (x_0, r) → (0.5, 3.700000001). It can be seen that the two sequences diverge completely after 75 iterations, despite a difference of only 0.000000001 in one of the initial conditions. As is the case with the Lorenz system, this sensitivity is one of the prime reasons for the application of chaotic maps in cybersecurity.
For the proposed work, it is assumed that we have access to the long-term sequence of the chaotic map being employed in cybersecurity. From that long sequence, the second step is to identify the type of chaotic map using the auto-correlation function described in the following section, and the last step is to identify the initial values using different statistical analyses. The proposed framework is shown in Figure 5.
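The sensitivity experiment for the logistic map can be reproduced with a few lines of code. The sketch below iterates the logistic difference equation, x_{n+1} = r x_n (1 − x_n), for the two parameter values quoted in the text; the function name `logistic_sequence` and the threshold of 0.1 used to call two sequences "different" are illustrative assumptions.

```python
def logistic_sequence(x0, r, n):
    """Return [x_1, ..., x_n] generated by x_{k+1} = r * x_k * (1 - x_k)."""
    xs = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


# Same initial value, parameters differing by 1e-9 as in the text.
seq_a = logistic_sequence(0.5, 3.700000000, 300)
seq_b = logistic_sequence(0.5, 3.700000001, 300)

# Pointwise separation and the first iteration at which it exceeds 0.1
# (the threshold is an arbitrary choice for illustration).
gap = [abs(u - v) for u, v in zip(seq_a, seq_b)]
first_large = next(i + 1 for i, g in enumerate(gap) if g > 0.1)
```

The value of `first_large` gives a rough counterpart to the divergence point visible in Figure 4b, although the exact iteration depends on the threshold chosen.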

Auto-Correlation of Chaotic Maps
To identify the initial conditions of the chaotic maps, the first step is to recognize the chaotic map used in the secure communication system. To do so, we first performed the auto-correlation analysis to show that the auto-correlation function of each chaotic map is unique, demonstrating that anyone with access to the long-term sequence of a chaotic map can tell which chaotic map it belongs to.
Auto-correlation is the mathematical function used to measure the similarity of a signal with itself. Informally, it is the similarity between observations as a function of the time lag between them. The auto-correlation function for the sequence of a chaotic map is defined as [45]:

$$\hat{r}_x(l) = \sum_{n=0}^{N-l-1} x(n+l)\,x^{*}(n),$$

where x represents the chaotic sequence, N is the total number of iterations, * denotes the complex conjugate and l is the lag such that l ∈ L = [0, 1, 2, ..., N − 1]. If the sequence of N iterations can be regarded as a realization of a wide-sense stationary random process, the auto-correlation can be stated as an estimate of the theoretical γ_x(l), given as [45]:

$$\gamma_x(l) = E\{x(n+l)\,x^{*}(n)\},$$

where E{.} is the mean operator. The unity-at-zero-lag normalization divides each sequence value by the auto-correlation estimate at zero lag, such that [45]:

$$\hat{r}_{x,\mathrm{coeff}}(l) = \frac{\hat{r}_x(l)}{\hat{r}_x(0)}.$$

The biased estimate of the theoretical auto-correlation defined for a wide-sense stationary random process is stated as [45]:

$$\hat{r}_{x,\mathrm{biased}}(l) = \frac{1}{N}\,\hat{r}_x(l).$$

The above mathematical structures are applied to long-term sequences of different chaotic maps. The analyses are based on 1000 iterations of each map with their respective initial values and parameters, resulting in 2000 correlation values. It is shown that, given the long-term sequence of any chaotic map, the identity of that map can be observed by applying auto-correlation functions, as shown in Figures 6 and 7. It can be seen that the graph for every chaotic map is different from the others; thus, in the cryptanalysis of chaotic security algorithms, the identity of the chaotic map can be revealed as depicted. The chaotic map can be identified from the auto-correlation graph by its visual structure and/or its numerical values. For instance, the auto-correlation graph of the Logistic map shown in Figure 6e is visually different from the auto-correlation graphs of the other chaotic maps and can thus easily be identified. In contrast, the auto-correlation graphs of the Henon x-sequence and Henon y-sequence are visually the same as each other (Figure 6a,b, respectively). However, the numerical values of these two auto-correlation graphs differ, so they can also be distinguished.
It is to be noted that the auto-correlation graphs remain the same for a given chaotic map despite the use of different initial values and parameters. Once the specific chaotic map is known, the next task is to obtain the initial values and parameters that served as keys. This is achieved by applying statistical analysis of standard deviation and variance, as explained in the next section.
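As an illustration of the auto-correlation step, the sketch below computes the raw auto-correlation of a real-valued sequence over positive and negative lags and applies the unity-at-zero-lag normalization. It is a plain-Python sketch under stated assumptions rather than the authors' MATLAB code: the function names are hypothetical, the sequence is assumed real (so the complex conjugate drops out), and the logistic map with (x_0, r) = (0.5, 3.7) is used only as sample input.

```python
def logistic_sequence(x0, r, n):
    """Return [x_1, ..., x_n] generated by x_{k+1} = r * x_k * (1 - x_k)."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def autocorrelation(x):
    """Raw auto-correlation of a real sequence for lags -(N-1), ..., N-1."""
    n = len(x)
    # Sum of lagged products for the non-negative lags 0, ..., N-1.
    positive = [sum(x[k + lag] * x[k] for k in range(n - lag)) for lag in range(n)]
    # A real sequence has a symmetric auto-correlation, so mirror the result.
    return positive[:0:-1] + positive


def normalise_unity(acf):
    """Divide by the zero-lag value so that the central value equals 1."""
    zero_lag = acf[len(acf) // 2]
    return [v / zero_lag for v in acf]


sequence = logistic_sequence(0.5, 3.7, 1000)
acf = normalise_unity(autocorrelation(sequence))
```

For N = 1000 samples this yields 2N − 1 = 1999 lag values, with the zero-lag value at the centre of the list.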

Identification of Initial Values
In this paper, we attempt to analyze the randomness of long-term chaotic sequences from a different perspective by performing statistical analysis. We have taken two chaotic maps for the sake of demonstration only, and the proposed set of analyses can also be applied to other maps. The details of the proposed algorithm are shown as a block diagram in Figure 8. The long-term chaotic sequence is obtained from the chaotic cryptosystem. As mentioned above, we assume access to this long-term chaotic sequence. Although obtaining the long-term sequence of the chaotic map in use would be a much harder task, it is outside the scope of the present work. The auto-correlation of the chaotic sequence is calculated and plotted in a graph. Then, this graph is matched against a database of auto-correlation graphs of all chaotic maps. If there is no match, the algorithm stops. However, if one of the maps matches, the different statistical analyses are applied to identify the initial values and parameters of that specific chaotic map. The simulations and analyses were done using MATLAB R2017a software on a Windows 10 platform with an Intel Core i5-6200U CPU, 2.30 GHz and 8.00 GB of RAM. It is worth mentioning that there is a phenomenon known as "dynamical degradation of chaotic properties" when chaotic systems are implemented or simulated on a finite-precision machine, for instance a digital computer, which limits the precision and accuracy of the proposed model for identifying the initial values and parameters of chaotic maps. However, we simulated our results using at least 15 decimal places for the initial parameters (e.g., in the case of the Logistic map, we considered the initial value of r as 3.854632547852415), which is more than enough for cryptographic applications, as the initial values used in those systems have at most 15 decimal places. Moreover, the goal of this work was not to identify the "exact" initial values used in cryptography, but to significantly reduce the keyspace to an extent where a brute force attack becomes practically possible.

Initial Values for Logistic Map
As mentioned in Equation (4), the Logistic map sequence depends upon the initial values of two parameters, i.e., r and x_0, and our goal is to identify these initial values of the Logistic chaotic map. However, as shown in Figure 5, if we have access to the long-term chaotic sequence, e.g., 1000 values of the sequence, i.e., x_1, x_2, ..., x_1000, it is a simple and straightforward task to get x_0 using Equation (4) with the help of x_1. Therefore, the focus is only on identifying the value of r employed.
To do so, the mean, median and mode analyses were first performed on the long-term sequences of the logistic map; it was observed that the mean is almost the same despite different values of parameter r, as were the median and mode. To illustrate this effect, Figure 9 shows the median values of the logistic map for different initial values of x against r in the range of 3.6-4.0. Although the median values fluctuate in the range of 0.45-0.75 against r, there is no linear trend (increasing or decreasing), making it difficult to assign distinct median values to distinct values of r. Figure 10 shows the mode values of the logistic map for different initial values of x against r in the range of 3.6-4.0. The situation is even worse in this case: compared to the median analysis, not only is there no linear trend, but the mode values are also not constant for different initial values of x. The variance and the standard deviation are the only two analyses that were found to give somewhat different and partially unique values for different values of parameter r.
The term variance was first coined by Fisher [46]. The variance of a group of numeric data tells how far the data are spread out. Mathematically, it is defined as:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2,$$

where x_i is the value of the logistic map iteration at the ith index, µ is the mean and n is the total number of numeric data, or iterations in this case.
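For reference, recovering x_0 from x_1 amounts to inverting the logistic difference equation for one step. The worked equation below is an added illustration, not part of the original derivation; it assumes that r is already known and that x_1 ≤ r/4.

```latex
x_1 = r\,x_0\,(1 - x_0)
\;\Longrightarrow\;
r\,x_0^{2} - r\,x_0 + x_1 = 0
\;\Longrightarrow\;
x_0 = \frac{1 \pm \sqrt{1 - 4\,x_1 / r}}{2}
```

The two roots are symmetric about 1/2, so a single value of x_1 yields two candidates for x_0, and the inversion can only be carried out once r has been identified.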

Setting Number of Iterations n
The variance analysis was applied to the chaotic sequences of the logistic map. The simulation results show that the variance of a long chaotic sequence for a fixed value of parameter r is the same despite different initial values of x. This effect is demonstrated in Figure 11a, which plots the variance of logistic sequences of 5000 iterations for r = 3.7 against different initial values of x. It can be seen that the variance remains almost the same for every initial value of x.
However, if the number of iterations decreases, then the variance values vary for different initial values of x. Figure 11b plots the variance of logistic sequences of 50 iterations for r = 3.7 against different initial values of x. It can be seen that there is a variation between the variance values. It was observed through different simulation results that at least 1200 iterations are needed to obtain an almost constant value of variance.
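The effect of the sequence length on the variance can be checked with a short experiment. The sketch below is an illustration under assumptions: the grid of nine initial values 0.1, 0.2, ..., 0.9 and the function names are hypothetical choices, while r = 3.7 and the lengths 5000 and 50 follow the text. It uses the population variance (division by n).

```python
def logistic_sequence(x0, r, n):
    """Return [x_1, ..., x_n] generated by x_{k+1} = r * x_k * (1 - x_k)."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def variance_spread(r, n, initial_values):
    """Variance for each initial value, plus the max-min spread across them."""
    variances = [variance(logistic_sequence(x0, r, n)) for x0 in initial_values]
    return max(variances) - min(variances), variances


initial_values = [0.1 * k for k in range(1, 10)]  # assumed grid of x_0 values
spread_long, var_long = variance_spread(3.7, 5000, initial_values)
spread_short, var_short = variance_spread(3.7, 50, initial_values)
```

Comparing `spread_long` with `spread_short` shows how much more the variance depends on the initial value when only 50 iterations are available.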

Variance against Parameter r
After setting a fixed number of iterations for which the variance is the same across different initial values of x, the variance for different values of parameter r is calculated as: The variance against r was calculated, as shown in Figure 12. The variance values of logistic sequences of 1200 iterations for an initial value of x = 0.5 are plotted against different values of parameter r. It can be seen that the variances for different values of r are different and partially unique, as depicted below: Based on the above equation, we can categorize the variance values of the logistic sequence against the parameter r, as shown in Table 1.
In Table 1, if the value of parameter r lies within 3.60-3.73, the variance of a logistic sequence of 1200 iterations generated from this range of r with any initial value of x will lie within 0.04-0.05. Thus, anyone who can extract 1200 iterations can easily estimate the value of r just by examining the variance. In cryptography, as stated above, the cryptographic algorithm is public and only the keys are private; in a cryptographic algorithm involving the logistic map, the keys are the initial value of x and the value of parameter r. The initial value of x does not matter that much, as it changes on each iteration. Assuming that someone has access to 1200 iterations, the current iteration value of x is itself a key, so the only remaining key is the value of parameter r. In Table 1, it can be seen that, by examining the variance, one can determine the range of r that is used as a key. Although this might not reveal the exact value of r used (e.g., the value used might be r = 3.645321908), it can certainly reduce the key space by a large margin, after which a brute force attack can break the cryptographic algorithm.
In Table 1, there are four major regions of variance. The first two regions are unique. If the variance lies within either of these two regions, the corresponding range of r is unique and can be determined. However, the last two regions overlap with each other. Both regions have almost the same variance range of 0.085-0.110. If the variance of 1200 iterations lies within this range, it is difficult to tell which value of r it corresponds to.
The third region is a small one with the range r = 3.825-3.860. It is an interesting observation that the logistic sequence for r = 3.825-3.860 behaves periodically, while the other region does not. Figure 13 shows the bifurcation diagram of the logistic map for r = 3.825-3.860. In addition, the periodicity of the logistic map for this range is shown in Figure 14a-c for r = 3.83, 3.84 and 3.85, respectively. Thus, if the variance lies within either Region 3 or Region 4, the decision is based on the periodicity of the sequence. If the sequence is periodic, it belongs to Region 3; otherwise, it belongs to Region 4.
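The key-space reduction described above can be illustrated by matching the variance of an observed sequence against variances simulated over a grid of r values. This is a sketch of the idea, not the lookup in Table 1: the grid spacing of 0.005, the tolerance of 0.015, the reference initial value 0.5 and all function names are assumptions, and the "observed" sequence is simply generated here with r = 3.7 and x_0 = 0.3 to stand in for an intercepted one.

```python
def logistic_sequence(x0, r, n):
    """Return [x_1, ..., x_n] generated by x_{k+1} = r * x_k * (1 - x_k)."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def candidate_r_values(observed, r_grid, n=1200, x_ref=0.5, tol=0.015):
    """Keep the grid values of r whose simulated variance is within tol
    of the variance of the observed sequence (tol is an assumed value)."""
    target = variance(observed)
    return [r for r in r_grid
            if abs(variance(logistic_sequence(x_ref, r, n)) - target) <= tol]


# Candidate keys r = 3.600, 3.605, ..., 3.995 (assumed search grid).
r_grid = [(3600 + 5 * k) / 1000 for k in range(80)]

# Stand-in for an intercepted sequence; its r and x_0 are unknown to the attacker.
observed = logistic_sequence(0.3, 3.7, 1200)

candidates = candidate_r_values(observed, r_grid)
```

The surviving `candidates` form the reduced search range for r; a finer grid or a brute force search would then be applied only within that range.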
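The periodicity test used to separate Region 3 from Region 4 can be sketched as a search for a short repeat in the tail of the sequence. The detection rule below (tail length 200, maximum period 16, tolerance 1e-6) and the function names are assumptions made for illustration; the text does not specify how periodicity is detected.

```python
def logistic_sequence(x0, r, n):
    """Return [x_1, ..., x_n] generated by x_{k+1} = r * x_k * (1 - x_k)."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def detect_period(seq, max_period=16, tail=200, tol=1e-6):
    """Return the smallest p <= max_period such that x[i] is within tol of
    x[i - p] for every index in the last `tail` samples, or None if no such
    p exists. Earlier samples are ignored so that transients do not matter."""
    if len(seq) < tail + max_period:
        raise ValueError("sequence too short for the requested tail and period")
    start = len(seq) - tail
    for p in range(1, max_period + 1):
        if all(abs(seq[i] - seq[i - p]) < tol for i in range(start, len(seq))):
            return p
    return None


def resolve_overlap_region(seq):
    """Region 3 if the tail is periodic, otherwise Region 4 (labels as in the text)."""
    return 3 if detect_period(seq) is not None else 4


period_383 = detect_period(logistic_sequence(0.5, 3.83, 1200))
period_390 = detect_period(logistic_sequence(0.5, 3.9, 1200))
```

With these settings, r = 3.83 falls in the period-3 window of the logistic map, whereas r = 3.9 produces no short repeat and is treated as chaotic.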

Initial Values of Lorenz System
The term standard deviation was introduced by Pearson; it is the square root of the variance defined by Fisher [46]. The standard deviation of a group of numeric data tells how far the data are spread out. Mathematically, it is defined as:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2},$$

where x_i is the value of the Lorenz system iteration at the ith index, µ is the mean and n is the total number of numeric data, or iterations in this case. The above analyses were applied to the x, y and z sequences of the Lorenz system for different values of r. As stated above, for r > 24, the Lorenz system shows chaotic behavior; cryptographers use a value of r in this chaotic region as a key, along with the initial values of x, y and z. Assuming that someone has access to 1000 or more iterations of the sequences, the current iteration value is itself a key with respect to the initial values of x, y and z; thus, the only key that matters is the value of r being used. It was observed that the standard deviation for 1000 iterations of any sequence of the Lorenz system is different for different values of r, as shown in Table 2. Figures 15-17 show the standard deviation of the Lorenz system for the x sequence, y sequence and z sequence, respectively. The standard deviation is the same for a single sequence despite different initial conditions. The standard deviation is computed from r = 24 to r = 64 with an interval of 0.5. Mathematically, the standard deviation of the sequences of the Lorenz system from r = 24 to r = 64 can be expressed as: In addition, the value of the standard deviation for all sequences of the Lorenz system can be approximately written as a function of parameter r, such that where α is a constant. Thus, by looking at the standard deviation values and having the long-term sequence of the Lorenz system, one can easily estimate the value of r being used, indirectly reducing the key space or even breaking the key in a shorter computation time.
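The dependence of the standard deviation on r can be explored with the sketch below, which computes the standard deviation of the Lorenz x-sequence for a few values of r in the chaotic region. Several choices here are assumptions rather than details from the text: the fixed-step fourth-order Runge-Kutta scheme, `dt = 0.01`, `a = 10`, `b = 8/3`, the three sample values of r, and the use of 20000 steps instead of the 1000 iterations mentioned above (chosen so that the estimate is stable for this step size). The function names are hypothetical.

```python
import math


def lorenz_x_sequence(r, n_steps, x=-8.0, y=8.0, z=27.0,
                      dt=0.01, a=10.0, b=8.0 / 3.0):
    """x-sequence of the Lorenz system from a fixed-step RK4 iteration.

    dt, a and b are assumed values; the text does not specify them.
    """
    def rhs(x, y, z):
        return a * (y - x), r * x - y - x * z, x * y - b * z

    xs = []
    for _ in range(n_steps):
        k1 = rhs(x, y, z)
        k2 = rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], z + 0.5 * dt * k1[2])
        k3 = rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], z + 0.5 * dt * k2[2])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1], z + dt * k3[2])
        x += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        z += dt * (k1[2] + 2.0 * k2[2] + 2.0 * k3[2] + k4[2]) / 6.0
        xs.append(x)
    return xs


def standard_deviation(values):
    """Population standard deviation: square root of the population variance."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


# Standard deviation of the x-sequence for three sample values of r.
std_by_r = {r: standard_deviation(lorenz_x_sequence(r, 20000))
            for r in (28.0, 44.0, 60.0)}
```

Repeating the computation over a finer grid of r and for the y and z sequences gives curves comparable in spirit to Figures 15-17, although the numerical values depend on the assumed a, b and step size.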

Conclusions
Based on the results and analysis, it is concluded that there is a major loophole in chaotic symmetric communication security systems. The security algorithm is public and only the key is private, and the keys involved in chaotic cryptosystems are the initial values and parameters of chaotic maps. By applying the proposed method of statistical analysis, the keyspace can be reduced or the keys can even be broken in short computation times. Security engineers and mathematical cryptographers should take this effect into consideration when proposing new schemes based on chaotic maps.
As with all new proposals, we strongly encourage the analysis of our framework before its deployment for cryptanalysis. The proposed work is a general framework intended for the identification of initial values and parameters. Although the proposed analyses are performed only on the Lorenz system and the Logistic map, they can be applied to other chaotic maps as well to demonstrate the same effects. Moreover, future research can be conducted to obtain and assess the long-term chaotic sequence from cryptosystems.

Figure 1. The illustration of the application considered in this work; basic communication model of symmetric cryptography.

Figure 4. (a) The individual x-sequence against the number of iterations with initial conditions of (x_0, r) → (0.5, 3.7). (b) Sensitive dependence on initial conditions: the x-sequence with initial conditions of (x_0, r) → (0.5, 3.700000000) together with the x-sequence with initial conditions of (x_0, r) → (0.5, 3.700000001). It can be seen that the two sequences diverge completely after 75 iterations, despite a difference of only 0.000000001 in one of the initial conditions.

Figure 5. Proposed framework for identifying the initial values of chaotic maps using three steps: accessing the long-term chaotic sequence, identifying the type of chaotic map using auto-correlation graphs and identifying the initial values using different statistical analyses.

Figure 6. Auto-correlation graphs of different chaotic maps with their respective initial conditions and parameters taken on 1000 iterations: (a) Henon chaotic map x-sequence; (b) Henon chaotic map y-sequence; (c) Ikeda chaotic map q-sequence; (d) Ikeda chaotic map w-sequence; (e) Logistic map; and (f) Quadratic map.

Figure 8. The top-level block diagram of the proposed work.

Figure 9. (a)-(l) Median values of the logistic map for different initial values of x against r in the range of 3.6-4.0. Although the median values fluctuate in the range of 0.45-0.75 against r, there is no linear trend (increasing or decreasing), making it difficult to assign distinct median values to distinct values of r.

Figure 10. (a)-(l) Mode values of the logistic map for different initial values of x against r in the range of 3.6-4.0. The situation is even worse in this case than in the median analysis; not only is there no linear trend, but the mode values are also not constant for different initial values of x.

Figure 11. (a) Plot of variance values of logistic sequences of 5000 iterations for r = 3.7 against different initial values of x, where the variance remains almost the same for every initial value of x. (b) Plot of variance values of logistic sequences of 50 iterations for r = 3.7 against different initial values of x. It can be seen that there is a variation between the variance values in this case.

Figure 12. (a)-(l) Plot of variance values of logistic sequences of 1200 iterations for an initial value of x = 0.5 against different values of parameter r. The variances for different values of r are different and partially unique.

Table 1. Variance range against parameter r range.