A Generic Model of the Pseudo-Random Generator Based on Permutations Suitable for Security Solutions in Computationally-Constrained Environments

Symmetric cryptography methods have an important role in the design of security solutions for data protection. In that context, symmetric cryptography algorithms and the pseudo-random generators connected with them have a strong influence on the designed security solutions. In computationally constrained environments, the efficiency of the security mechanisms is also important. In this paper we propose the design of a new efficient pseudo-random generator parameterized by two pseudo-random sequences. Using probabilistic, information-theoretic and number-theoretic methods, we analyze the characteristics of the generator. The analysis produced several results. We derive sufficient conditions on the parameterizing sequences under which the output sequence has a uniform distribution. Sufficient conditions under which there is no correlation between the parameterizing sequences and the output sequence are also derived. Moreover, it is shown that the mutual information between the output sequence and the parameterizing sequences tends to zero when the length of the generated output sequence tends to infinity. Regarding periodicity, it is shown that, with appropriately selected parameterizing sequences, the period of the generated sequence is significantly longer than the periods of the parameterizing sequences. All these characteristics are desirable in security applications. The efficiency of the proposed construction can be achieved by selecting the parameterizing sequences from the set of efficient pseudo-random number generators, for example, multiple linear feedback shift registers.


Introduction
The expansion of communication and network technologies, together with technological advances in the design and implementation of microprocessor devices, has made it possible to interconnect different devices and to create intelligent systems capable of monitoring and managing complex processes. Communication devices utilize the Internet infrastructure and protocols to create a world of connected devices, such as Wireless Sensor Networks (WSN) and the Internet of Things (IoT). This technological advancement enables progress in many technological and everyday processes, bringing us smart cities, autonomous vehicles, robotization and intelligent robot behavior [1][2][3][4]. In that context, information security has a very important role, since compromising the integrity and privacy of data in such an integrated world can cause serious damage, even to the level of a general disaster [5][6][7][8][9]. Therefore, in addition to the security mechanisms incorporated into Internet protocols, additional security mechanisms incorporated into devices and systems are used to prevent unintended behavior. Moreover, a huge number of such devices (sensors, cameras, surveillance systems) need to work in real time, so the security mechanisms must be easy to implement in both hardware and software and must not disrupt system behavior, i.e., they must be efficient [10,11].
There are various IoT applications, connected with different types of sensors, that have become an integral part of our lives, and most of them can be classified into common areas such as smart healthcare services, smart home, intelligent transportation, smart grid, etc. However, as a consequence of mass deployment, many IoT challenges have arisen, such as limited processing capability and memory resources, large amounts of data to transmit, different operating characteristics of hardware, and heterogeneous data and network types [12][13][14][15][16]. Moreover, personal privacy, data confidentiality and integrity are also a great challenge of IoT that must be overcome, particularly for devices with limited resources and heterogeneous technologies [13,14,17,18,19]. Cryptography can be used to protect the confidentiality (or secrecy) of data and communication. It can also be used to ensure the integrity (or accuracy) of information as well as for authentication (and non-repudiation) services [20]. An important point in the IoT world is that most IoT solutions have a "closed design", so it is often very difficult or even impossible to incorporate additional security mechanisms after the production process is completed. On the other hand, as a consequence of the limited software and hardware resources of IoT devices, the suite of cryptographic algorithms that can be implemented is narrowing, so a balance must be found between the desired level of security and implementation capabilities, which makes the security issue even more challenging [13,16]. Different cryptographic algorithms that offer roughly the same level of security may require different power and resource consumption, so the right one must be chosen subject to the limitations of the specific IoT application and the deployed hardware [19].
Given that public-key crypto algorithms, compared to symmetric crypto algorithms, have far greater power and resource consumption due to their high processing time [21], it is a natural choice to use a symmetric algorithm in IoT security solution design. Detailed analysis and comparison of symmetric block-type algorithms such as AES, RC6, Twofish, SPECK128, LEA, and ChaCha20-Poly1305 algorithms in IoT devices are given in [16]. On the other hand, stream or sequential symmetric key ciphers are typically faster than block-type. Block ciphers, in general, require more memory resources to encrypt/decrypt larger chunks (block) of data, while sequential ciphers usually take only one or a few bits at a time, they have relatively low memory requirements and therefore are suitable to implement in limited scenarios. Stream cryptography algorithms, as a subgroup of symmetric cryptography algorithms, are among the most common cryptography data protection techniques. The idea comes from Shannon's one-time pad system, where instead of a random sequence of encryption bits, a series of bits obtained from a pseudo-random generator is used [20]. The sequence generated by the pseudo-random generator is used for plain-text encryption and its properties determine the security of the protected data. Therefore, the basic cryptography goal in stream cryptography systems is to design pseudo-random bit/symbol generators with good cryptographic characteristics. Many ideas have been implemented in the last fifty years, with more or less success.
One of the most popular and widely used pseudo-random sequence generators is the RC4 generator, designed in 1987 by Ron Rivest. The description of the algorithm was revealed by reverse engineering of the RSA INC software [22], and the correctness of the resulting description was confirmed by Rivest himself [23,24].
The RC4 algorithm owes its popularity to its simplicity and ease of implementation in both software and hardware. Its high popularity and applicability have attracted the attention of the cryptanalytic community. A deep and thorough analysis of this algorithm led to the detection of a number of weaknesses. A comprehensive review of the identified weaknesses is given in [25], where the empirically detected weaknesses are theoretically proved and original results of its authors are presented. The compromise of this algorithm was further aggravated by the way it was implemented in security protocols, so its use in security protocols has not been recommended since 2015 [26].
On the other hand, the beauty and elegance of the idea itself suggest the possibility of its exploitation.
Our work aims to define a low-complexity, efficient generic model of a pseudo-random generator that does not suffer from the weaknesses inherent in RC4 and that is suitable for implementing security solutions in computationally constrained microprocessor environments, e.g., WSN and IoT. This paper defines a pseudo-random generator that can, in some way, be considered a generalization of the ideas behind RC4, because it uses time-varying permutations, a sequence that changes the permutation, and a sequence that addresses the output element in the current generator state. To establish the cryptographic properties of the proposed pseudo-random generator, different mathematical techniques are used to analyze the probability distribution of the output sequence, its correlation properties, the information leakage between the state of the generator and the output sequence, and its periodicity.
The paper is organized in four parts. The first part contains the motivation for this work and the introduction. In the second part we introduce the necessary notation, describe the proposed pseudo-random generator and briefly discuss its relationship to RC4. The third part contains the analysis of the generator, with some comments and remarks. The fourth part contains a summary of the paper's results.

Notation and Generating Algorithm Description
Let $I_k = \{0, 1, \ldots, k-1\}$. We denote by $P_k$ the set of all bijections from $I_k$ to $I_k$; as usual, its elements are called permutations. The set $P_k$ is totally ordered by the so-called lexicographic order and has exactly $k!$ elements, where $!$ denotes the factorial operation. The elements of $P_k$ are denoted by $\Pi_1, \Pi_2, \ldots, \Pi_{k!}$.
By $P\{X\}$ we denote the probability of the event $X$.
Then we define a pseudo-random sequence $\{Z_n\}_{n=1}^{\infty}$ by the equations

$$g_n = f_{C_n} \circ g_{n-1}, \qquad Z_n = g_n(A_n), \qquad n \ge 1, \tag{1}$$

where, in the notation used throughout the rest of the paper, $S = \{f_0, \ldots, f_{m-1}\} \subseteq P_k$ is a set of permutations that generates $P_k$, $g_0 \in P_k$ is the initial permutation, and $\{C_n\}_{n=1}^{\infty}$ with $C_n \in I_m$ and $\{A_n\}_{n=1}^{\infty}$ with $A_n \in I_k$ are the two parameterizing sequences. From (1) the generating algorithm for the sequence $\{Z_n\}_{n=1}^{\infty}$ is obvious. As a first step we construct the sequence of permutations $\{g_n\}_{n=0}^{\infty}$, $g_n \in P_k$, using the sequence $\{C_n\}_{n=1}^{\infty}$; the element $Z_n$ is then computed as the value of the function $g_n$ at the point $A_n$. A graphical presentation of the generating process is given in Figure 1.

The defined generator applies time-varying permutations, as RC4 does, but RC4 uses a fixed set of permutations, the set of transpositions. A graphical presentation of the RC4 algorithm is given in Figure 2. In RC4 all index additions are reduced modulo 256. In the defined generator, any set $S$ which generates $P_k$ can be used. The sequences that RC4 uses to determine the applied transposition and to address a position in the permutation table correspond to the sequences $\{C_n\}_{n=1}^{\infty}$ and $\{A_n\}_{n=1}^{\infty}$ in our algorithm, respectively. While these sequences are precisely defined in RC4, in our case $\{C_n\}_{n=1}^{\infty}$ and $\{A_n\}_{n=1}^{\infty}$ are arbitrarily chosen, and the generator is parameterized by those two sequences. Later in the paper we define sufficient conditions on $\{C_n\}_{n=1}^{\infty}$ and $\{A_n\}_{n=1}^{\infty}$ for achieving good pseudo-random and security properties.
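The generating process described above (update the permutation with the element of $S$ selected by $C_n$, then read the output at position $A_n$) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `compose` and `generator`, the choice $k = 3$, $m = 2$, the set `S`, the identity initial permutation and the short sequences `C` and `A` are all assumptions made for the example, not parameters taken from the paper.

```python
def compose(f, g):
    """Return the permutation f∘g as a tuple: (f∘g)(x) = f[g[x]]."""
    return tuple(f[g[x]] for x in range(len(g)))


def generator(S, g0, C, A):
    """Yield Z_n = g_n(A_n), where g_n = f_{C_n} ∘ g_{n-1}."""
    g = tuple(g0)
    for c, a in zip(C, A):
        g = compose(S[c], g)   # permutation update selected by C_n
        yield g[a]             # output element addressed by A_n


# Illustrative parameters (assumptions): k = 3, m = 2.
S = [(1, 0, 2), (1, 2, 0)]     # a transposition and a 3-cycle generate P_3
C = [0, 1, 1, 0, 1]            # first five values of the sequence C_n
A = [0, 2, 1, 1, 0]            # first five values of the sequence A_n
Z = list(generator(S, (0, 1, 2), C, A))
print(Z)  # [1, 0, 2, 2, 2]
```

In a constrained implementation the lists `C` and `A` would be replaced by streams from inexpensive generators, for example linear feedback shift registers, and the permutation would be updated in place rather than rebuilt as a tuple.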

Analysis of the Generator
For every pseudo-random generator it is necessary to analyze its properties with regard to the possibility of predicting the output sequence or reconstructing the initial state of the generator. In that sense, the desirable properties are a uniform distribution of the output sequence, absence of correlation between the output sequence and the elements of the generator, absence of auto-correlation in the output sequence, and a long period of the output sequence. These features are especially important for generators used in security solutions, and the lack of any of them usually has serious consequences for the security of the system. Different examples can be found in [20,27].

Distribution of the Generated Sequence
Intuitively, one can expect that $\{Z_n\}_{n=1}^{\infty}$ has a uniform distribution, but the relatively weak constraints imposed on the sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$ call for a formal proof of this expectation. The next theorem shows that $\{Z_n\}_{n=1}^{\infty}$ has an asymptotically uniform distribution.
then the pseudo-random sequence $\{Z_n\}_{n=1}^{\infty}$ has an asymptotically uniform distribution, i.e.,

$$\lim_{n\to\infty} P\{Z_n = l\} = \frac{1}{k} \quad \text{for all } l \in I_k.$$

The proof of Theorem 1 is derived in two steps. First, using the theory of Markov chains, we show that the sequence $\{g_n\}_{n=0}^{\infty}$ has an asymptotically uniform distribution; in the second step, using that result, we prove the statement of the theorem.
Proof. First we analyze the sequence of functions $\{g_n\}_{n=0}^{\infty}$. It is obvious that $g_n \in P_k$, as a consequence of $(P_k, \circ)$ being a group. Next we show that

$$\lim_{n\to\infty} P\{g_n = \Pi_i\} = \frac{1}{k!}, \qquad i = 1, 2, \ldots, k!. \tag{2}$$

To prove (2) we regard the sequence $\{g_n\}_{n=0}^{\infty}$ as a stationary Markov chain over the set of states $P_k$. Indeed, according to the definition of $\{g_n\}_{n=0}^{\infty}$, the transition from $g_n$ to $g_{n+1}$ does not depend on the history of $g_n$, but only on the current state $g_n$ and the value of the next element of $\{C_n\}_{n=1}^{\infty}$.
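Denote by $G_n = [P\{g_n = \Pi_i\}]_{1\times k!}$ the row matrix whose elements are the probabilities that after $n$ steps the chain is in the state $\Pi_i$. Let $t_{ij}$ be the probability that the chain changes state from $\Pi_i$ to $\Pi_j$ in one step, and let $T = [t_{ij}]_{k!\times k!}$ be the one-step transition matrix of the Markov chain. Denote by $T_n = [t^{(n)}_{ij}]_{k!\times k!}$ the $n$-step transition matrix, where $t^{(n)}_{ij}$ is the probability that the system starting at the state $\Pi_i$ changes to the state $\Pi_j$ after exactly $n$ steps. It is well known (see [28,29]) that $T_n = T^n$ and that

$$\lim_{n\to\infty} G_n = G_0 \cdot \lim_{n\to\infty} T^n \tag{3}$$

when the limit values in (3) exist. To show that $\lim_{n\to\infty} T^n$ exists it is sufficient to show that there exists $n_0 \in \mathbb{N}$ for which $t^{(n_0)}_{ij} > 0$ for all $i, j \in \{1, 2, \ldots, k!\}$ (see [28,29]). Let us define the numbers $n_{ij}$ as

$$n_{ij} = \min\{\, n \in \mathbb{N} : t^{(n)}_{ij} > 0 \,\}. \tag{4}$$

Because the set $S$ generates $P_k$, it is clear that $n_{ij}$ is well defined and $n_{ij} > 0$. Let $n_0 = \max_{i,j \in \{1,2,\ldots,k!\}} n_{ij}$; we show that $t^{(n_0)}_{ij}$ is greater than zero. Because $(P_k, \circ)$ is a group, the equation $x \circ \Pi_i = \Pi_j$ has exactly one solution, $x = \Pi_j \circ \Pi_i^{-1}$, so $t_{ij}$ is the probability that the permutation selected from $S$ equals $\Pi_j \circ \Pi_i^{-1}$. By the Chapman-Kolmogorov equation,

$$t^{(n_0)}_{ij} = \sum_{i_1, \ldots, i_{n_0-1}} t_{i i_1}\, t_{i_1 i_2} \cdots t_{i_{n_0-1} j}, \tag{5}$$

so to show that $t^{(n_0)}_{ij} > 0$ it is sufficient to show that at least one summand in (5) is greater than zero, see [28,29]. Because $n_{ij} > 0$ we can find a set of indices $i_1, i_2, \ldots, i_{n_{ij}}$, $n_{ij} \le n_0$, such that the corresponding product of one-step transition probabilities is positive. Because the index of the identical permutation is 1, the summand which corresponds to this set of indices is evidently greater than zero, and we have shown that $\lim_{n\to\infty} T^n$ exists. Because the convergence is component-wise, $\lim_{n\to\infty} t^{(n)}_{ij} = t^*_j$ exists. The limit value $\lim_{n\to\infty} T^n$ can be determined as a solution of the system of equations for the stationary distribution,

$$t^*_j = \sum_{i=1}^{k!} t^*_i\, t_{ij}, \quad j = 1, \ldots, k!, \qquad \sum_{j=1}^{k!} t^*_j = 1. \tag{6}$$

It is easy to check, by substitution, that $t^*_j = \frac{1}{k!}$ satisfies (6), since every column of $T$ sums to one.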
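From now on the proof is straightforward. Let $l \in I_k$ be arbitrary; then

$$P\{Z_n = l\} = \sum_{i=1}^{k!} \sum_{j \in I_k,\ \Pi_i(j) = l} P\{g_n = \Pi_i,\ A_n = j\}. \tag{7}$$

Because $\{g_n = \Pi_i\}$ and $\{A_n = j\}$ are independent events, it follows from (7) that

$$P\{Z_n = l\} = \sum_{i=1}^{k!} \sum_{j \in I_k,\ \Pi_i(j) = l} P\{g_n = \Pi_i\}\, P\{A_n = j\}. \tag{8}$$

For each $j \in I_k$ exactly $(k-1)!$ permutations map $j$ to $l$, so taking the limit of both sides of (8) and using (2) it follows that

$$\lim_{n\to\infty} P\{Z_n = l\} = \frac{(k-1)!}{k!} \sum_{j \in I_k} P\{A_n = j\} = \frac{1}{k}, \tag{9}$$

which proves the theorem.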
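The Markov-chain argument can be checked numerically for a small case. The sketch below builds the one-step transition matrix for $k = 3$ under the assumption that the values $C_n$ are independent and identically distributed with law `p`, verifies that the uniform row vector is stationary, and searches for the smallest power of $T$ with all entries positive. The names `transition_matrix`, `matmul` and `first_positive_power`, the set `S` and the law `p` are illustrative assumptions; the example set is chosen to contain both an odd and an even permutation so that the chain is aperiodic.

```python
from itertools import permutations


def compose(f, g):
    """Return the permutation f∘g as a tuple."""
    return tuple(f[g[x]] for x in range(len(g)))


def transition_matrix(S, p, k):
    """T[i][j] = P{g_{n+1} = Pi_j | g_n = Pi_i} for i.i.d. C_n with law p."""
    states = list(permutations(range(k)))          # lexicographic order
    index = {pi: i for i, pi in enumerate(states)}
    T = [[0.0] * len(states) for _ in states]
    for i, pi in enumerate(states):
        for f, pc in zip(S, p):
            T[i][index[compose(f, pi)]] += pc
    return T


def matmul(X, Y):
    """Plain square matrix product."""
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def first_positive_power(T, max_power=100):
    """Smallest n0 with every entry of T^n0 positive, or None if not found."""
    P = T
    for n in range(1, max_power + 1):
        if all(x > 0 for row in P for x in row):
            return n
        P = matmul(P, T)
    return None


S = [(1, 0, 2), (1, 2, 0)]      # assumed generating set of P_3
p = [0.3, 0.7]                  # assumed non-uniform law of C_n
T = transition_matrix(S, p, 3)
uniform = [1.0 / len(T)] * len(T)
# Image of the uniform row vector under T; it should be uniform again.
image = [sum(uniform[i] * T[i][j] for i in range(len(T))) for j in range(len(T))]
n0 = first_positive_power(T)
print(n0)  # 3
```

For this particular `S`, every permutation of three elements can be written as a product of exactly three elements of `S`, which is why the third power of `T` is the first one without zero entries.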

Remark 1.
The asymptotically uniform distribution of the sequence $Z_n$, $n = 1, 2, \ldots$, as we have shown, is a consequence of the asymptotically uniform distribution of $\{g_n\}_{n=0}^{\infty}$.
it is easy to verify that $\{g_n\}_{n=0}^{\infty}$ has a uniform distribution and, as a consequence, $Z_n$, $n = 1, 2, \ldots$, has a uniform distribution too.

Theorem 2.
If the random variables $A_n$, $n = 1, 2, \ldots$, are uniformly distributed, then for all $z \in I_k$

$$P\{Z_n = z\} = \frac{1}{k}.$$

Proof. By the generator definition we have

$$P\{Z_n = z\} = \sum_{i=1}^{k!} P\{g_n = \Pi_i\}\, P\{A_n = \Pi_i^{-1}(z)\}. \tag{10}$$

Because $A_n$ is uniformly distributed, from (10) it follows that

$$P\{Z_n = z\} = \frac{1}{k} \sum_{i=1}^{k!} P\{g_n = \Pi_i\} = \frac{1}{k}.$$

By these theorems we have shown that the values in the generator output sequence have at least an asymptotically uniform distribution. This is an important security feature, because it rules out prediction of the output sequence based solely on the probability distribution of its values.
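The statement of Theorem 2 can be verified exactly on a small example. The sketch below computes the law of $Z_n$ from an arbitrary law of $g_n$ and a law of $A_n$, assuming that $g_n$ and $A_n$ are independent. The function name `output_distribution` and the concrete weights are assumptions chosen for illustration; exact rational arithmetic is used so that the equality with $1/k$ is not blurred by rounding.

```python
from fractions import Fraction
from itertools import permutations


def output_distribution(G, PA):
    """Law of Z = g(A) for independent g (law G over permutations) and A (law PA)."""
    k = len(PA)
    PZ = [Fraction(0)] * k
    for pi, pg in G.items():
        for a in range(k):
            PZ[pi[a]] += pg * PA[a]
    return PZ


perms = list(permutations(range(3)))
# A deliberately non-uniform law of g_n: weights 1/21, 2/21, ..., 6/21.
G = {pi: Fraction(w, 21) for w, pi in enumerate(perms, start=1)}

uniform_A = [Fraction(1, 3)] * 3
print(output_distribution(G, uniform_A))   # every entry equals Fraction(1, 3)

# With a non-uniform A_n and g_n fixed to the identity, Z_n simply copies A_n.
skewed_A = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(output_distribution({(0, 1, 2): Fraction(1)}, skewed_A))
```

The second call illustrates why a non-uniform $A_n$ has to rely on the mixing of $g_n$, which is the situation covered by Theorem 1.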

Correlation Properties
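Theorem 3. If the random variables $A_n$, $n = 1, 2, \ldots$, are uniformly distributed, then for all $a, b \in I_k$ and every lag $s \ge 1$ (written $s$ to avoid a clash with the size $k$ of $I_k$):

1. $P\{Z_{n+s} = b,\ Z_n = a\} = \frac{1}{k^2}$,
2. $P\{Z_{n+s} = b \mid Z_n = a\} = \frac{1}{k}$.

Proof. 1. By the generator definition we have

$$P\{Z_{n+s} = b,\ Z_n = a\} = \sum_{i=1}^{k!} \sum_{j=1}^{k!} P\{g_{n+s} = \Pi_i \mid g_n = \Pi_j\}\, P\{g_n = \Pi_j\}\, P\{A_{n+s} = \Pi_i^{-1}(b),\ A_n = \Pi_j^{-1}(a)\}. \tag{11}$$

Now, using the notation from Theorem 1, $P\{g_{n+s} = \Pi_i \mid g_n = \Pi_j\}$ is equal to $t^{(s)}_{ji}$, and putting it in (11) it follows that

$$P\{Z_{n+s} = b,\ Z_n = a\} = \sum_{i=1}^{k!} \sum_{j=1}^{k!} t^{(s)}_{ji}\, P\{g_n = \Pi_j\}\, P\{A_{n+s} = \Pi_i^{-1}(b),\ A_n = \Pi_j^{-1}(a)\}. \tag{12}$$

Using that $A_{n+s}$ and $A_n$ are independent random variables, from (12) it follows that

$$P\{Z_{n+s} = b,\ Z_n = a\} = \sum_{i=1}^{k!} \sum_{j=1}^{k!} t^{(s)}_{ji}\, P\{g_n = \Pi_j\}\, P\{A_{n+s} = \Pi_i^{-1}(b)\}\, P\{A_n = \Pi_j^{-1}(a)\}. \tag{13}$$

Now, using that $A_{n+s}$ and $A_n$ are uniformly distributed, it follows from (13) that

$$P\{Z_{n+s} = b,\ Z_n = a\} = \frac{1}{k^2} \sum_{j=1}^{k!} P\{g_n = \Pi_j\} \sum_{i=1}^{k!} t^{(s)}_{ji} = \frac{1}{k^2},$$

and the statement is proved.
2. Using the statement of Theorem 2 that $P\{Z_n = a\} = \frac{1}{k}$, by the definition of conditional probability it follows that

$$P\{Z_{n+s} = b \mid Z_n = a\} = \frac{P\{Z_{n+s} = b,\ Z_n = a\}}{P\{Z_n = a\}} = \frac{1/k^2}{1/k} = \frac{1}{k}, \tag{14}$$

which proves the statement.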

Information Leakage
Information leakage means the existence of a correlation between the generator output sequence and elements of its inner state. Such a correlation may be the basis for reconstructing some state of the generator from its generation history. Knowledge of one state during the operation of the generator, together with knowledge of the generator algorithm, allows prediction of future elements of the output sequence, which is undesirable in security applications. In this part, the correlation of the state element sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$ with $\{Z_n\}_{n=1}^{\infty}$ is considered. where $z \in I_k$ and $c \in I_m$. Proof.

By the definition
Because $\{f_{C_n} \circ g_{n-1} = \Pi_i\}$ and $\{f_{C_n} \circ g_{n-1} = \Pi_j\}$ are disjoint events for $i, j \in \{1, \ldots, k!\}$ and $i \ne j$, from (15) it follows that Now, using Bayes theorem from (16) it follows that (17) it follows that Grouping summands which depend on $j$, from (18) it follows that Taking the limit of both sides in (19) it follows which proves the statement.

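By the definition of mutual information we have that

$$I(Z_n, C_n) = H(Z_n) - H(Z_n \mid C_n). \tag{20}$$

We will start with computing $H(Z_n)$:

$$H(Z_n) = -\sum_{z \in I_k} P\{Z_n = z\} \log_2 P\{Z_n = z\}. \tag{21}$$

Taking the limit of both sides in (21) and using Theorem 1 we have

$$\lim_{n\to\infty} H(Z_n) = \log_2 k. \tag{22}$$

In the same way it follows that

$$H(Z_n \mid C_n) = -\sum_{c \in I_m} P\{C_n = c\} \sum_{z \in I_k} P\{Z_n = z \mid C_n = c\} \log_2 P\{Z_n = z \mid C_n = c\}. \tag{23}$$

Taking the limit of both sides in (23) and using part 1 of Theorem 4 we obtain

$$\lim_{n\to\infty} H(Z_n \mid C_n) = \log_2 k. \tag{24}$$

Using (22) and (24) in (20) it follows that $\lim_{n\to\infty} I(Z_n, C_n) = 0$, which proves the statement.
Proof.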

By the definition of conditional probability it follows that
Because $\{g_n = \Pi_i \wedge \Pi_i(A_n) = z \wedge A_n = a\}$ and $\{g_n = \Pi_j \wedge \Pi_j(A_n) = z \wedge A_n = a\}$ are disjoint events when $i \ne j$, from (26) it follows that Using that $P(A \wedge B) = P(A \mid B) \cdot P(B)$, from (27) it follows that Because $\{g_n = \Pi_i\}$ and $\{\Pi_i(A_n) = z \wedge A_n = a\}$ are independent events, from (28) it follows that Taking the limits of both sides in (29) it follows In the same way as in Theorem 4 it follows that $\lim_{n\to\infty} H(Z_n) = \log_2 k$. By the definition of conditional entropy it follows that Taking the limit of both sides in (30) and using the statement of part 1 we obtain And finally, using (22) and (31) in the definition of $I(Z_n, A_n)$, it follows that which proves the statement.
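The decay of the leakage can be observed numerically on a toy instance. The following sketch computes the mutual information between $Z_n$ and $C_n$ in bits, under the assumptions that the values $C_n$ are independent with law `p`, the values $A_n$ are independent with law `q`, the two sequences are independent of each other, and $g_n = f_{C_n} \circ g_{n-1}$ with a fixed known $g_0$. The function name `mutual_information_Z_C` and all numeric parameters are illustrative assumptions.

```python
from math import log2


def compose(f, g):
    """Return the permutation f∘g as a tuple."""
    return tuple(f[g[x]] for x in range(len(g)))


def mutual_information_Z_C(S, p, q, g0, n):
    """I(Z_n; C_n) in bits for i.i.d. C_n ~ p and A_n ~ q (independent)."""
    k = len(q)
    G = {tuple(g0): 1.0}                 # law of g_0
    for _ in range(n - 1):               # advance to the law of g_{n-1}
        nxt = {}
        for pi, pg in G.items():
            for f, pc in zip(S, p):
                h = compose(f, pi)
                nxt[h] = nxt.get(h, 0.0) + pg * pc
        G = nxt
    joint = [[0.0] * len(S) for _ in range(k)]   # joint[z][c] = P{Z_n=z, C_n=c}
    for pi, pg in G.items():
        for c, (f, pc) in enumerate(zip(S, p)):
            h = compose(f, pi)                   # g_n when C_n = c
            for a in range(k):
                joint[h[a]][c] += pg * pc * q[a]
    pz = [sum(row) for row in joint]
    return sum(joint[z][c] * log2(joint[z][c] / (pz[z] * p[c]))
               for z in range(k) for c in range(len(S)) if joint[z][c] > 0)


S = [(1, 0, 2), (1, 2, 0)]       # assumed generating set of P_3
p = [0.3, 0.7]                   # assumed law of C_n
q = [0.5, 0.3, 0.2]              # assumed non-uniform law of A_n
for n in (1, 5, 20, 60):
    print(n, mutual_information_Z_C(S, p, q, (0, 1, 2), n))
```

For these parameters the first output symbol leaks roughly a hundredth of a bit about $C_1$, while the value for large $n$ is at the level of floating-point rounding, in line with the practical advice of discarding an initial segment of the output.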

Periodicity
Every pseudo-random generator can be viewed as a finite automaton with output over a finite set of states and symbols. Because the automaton transition function is deterministic, it follows that the output sequence must be periodic. So, $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$ are periodic; denote their periods by $A$ and $C$, respectively. It is easy to verify that $\{g_n\}_{n=0}^{\infty}$ and $\{Z_n\}_{n=1}^{\infty}$ are periodic too; denote their periods by $G$ and $Z$, respectively. In this part the relations between $A$, $C$, $G$ and $Z$ are considered, and some sufficient conditions on $A$, $C$ and $G$ are defined which increase the value of $Z$. For that we need a few lemmas.
To find out period of {Z n } ∞ n=1 we will first determine the period of the {g n } ∞ n=0 . Proof. First we have to prove that lC is a period, but it is straightforward.
Next step is to prove that lC is the fundamental period i.e., that every other period is divisible by lC.
Suppose contrary, that lC isn't the fundamental period i.e., that the fundamental period is d, d | lC and d < lC. From g k+λd = g k we have where I is the identical permutation. Multiplying (32) with f C k from the left we have and applying (32) we have f C k+λd = f C k , k ≥ 1.
From this we have C k+λd = C k, k ≥ 1 which means that d is a period of {C n } ∞ n=1 and that C |d i.e., d = rC, r < l. Now, look at g 1+λd and we conclude that and conclude that l|r which is in contradiction with r < l so we proved that lC is the fundamental period of {g n } ∞ n=0 . Proof. By straightforward computation we can easily check that GA is a period of the {Z n } ∞ n=1 . We need only to prove that GA is fundamental period of {Z n } ∞ n=1 . Suppose that the fundamental period isn't GA but it is d. Then d | GA and because (G, for every λ ≥ 1, λ ∈ N. We can set λ = λ 1 · G d 1 , λ 1 ≥ 1 in the equation above which transform it to and having in mind that G is a period of {g n } ∞ n=0 we obtain From the fact that g k is bijection it follows that A k+λ 1 Gd 2 = A k which means that Gd 2 is a period of {A n } ∞ n=1 and consequently that A|Gd 2 . Using that (G, A) = 1 we have that A|d 2 and with d 2 |A we conclude that A = d 2 . According to former observations we have that d must be of the form d 1 A and rewriting of (33) yields This equation we can simplify to n=1 . Now we put our attention to g k+nG+λd 1 A A k+nG+λd 1 A .
so we have that functions g k+λd 1 A , g k are equal on the set {A k+nG | n ∈ N}. Set {x | k + nG ≡ A x, n ∈ N} is equal I A , because (G, A) = 1, and from the periodicity of {A n } ∞ n=1 follows {A k+nG | n ∈ N} = I k . Functions g k+λd 1 A , g k are equal on their domain so we have that g k+λd 1 A = g k , λ ≥ 1, λ ∈ N which means that d 1 A is a period of the {g n } ∞ n=0 . In the same way as above we have that d 1 = G and we have that the fundamental period of the {Z n } ∞ n=1 is GA.
The following corollary is a trivial consequence of Lemma 2. This corollary shows that with a suitably chosen pseudo-random sequence $\{A_n\}_{n=1}^{\infty}$ the output sequence will have a period greater than or equal to the period of $\{A_n\}_{n=1}^{\infty}$. The next theorem, the main result of this section, is a straightforward application of the preceding lemmas. Its statement is a stronger variant of Corollary 1, and it shows that with appropriately selected pseudo-random sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$ the period of the generator output sequence is significantly longer than the periods of the sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$.
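The period relations can be checked by brute force on a toy instance. The sketch below runs the generator with a periodic $\{C_n\}$ of period 5 and a periodic $\{A_n\}$ of period 7 and measures the fundamental periods of $\{g_n\}$ and $\{Z_n\}$ directly. The names `run` and `fundamental_period` and all concrete sequences are assumptions made for the example. In this instance the product of the selected permutations over one period of $\{C_n\}$ is a 3-cycle, so the proofs above give $G = 3 \cdot 5 = 15$ and, since $(G, A) = 1$ and $\{A_n\}$ takes every value of $I_3$, $Z = GA = 105$.

```python
def compose(f, g):
    """Return the permutation f∘g as a tuple."""
    return tuple(f[g[x]] for x in range(len(g)))


def run(S, g0, C, A, N):
    """Return the first N states g_n and outputs Z_n for periodic C and A."""
    g, gs, zs = tuple(g0), [], []
    for n in range(N):
        g = compose(S[C[n % len(C)]], g)
        gs.append(g)
        zs.append(g[A[n % len(A)]])
    return gs, zs


def fundamental_period(xs, max_period):
    """Smallest d <= max_period such that xs[n + d] == xs[n] on the whole prefix."""
    for d in range(1, max_period + 1):
        if all(xs[n] == xs[n + d] for n in range(len(xs) - d)):
            return d
    return None


S = [(1, 0, 2), (1, 2, 0)]       # assumed generating set of P_3
C = [0, 1, 1, 0, 1]              # one period of C_n (period 5)
A = [0, 2, 1, 1, 0, 2, 2]        # one period of A_n (period 7)
# Three times the upper bound 105, so a match on the prefix is a true period.
gs, zs = run(S, (0, 1, 2), C, A, 3 * 105)
G = fundamental_period(gs, 105)
Zp = fundamental_period(zs, 105)
print(G, Zp)  # 15 105
```

Two input sequences with periods 5 and 7 over a three-element alphabet thus yield an output of period 105, which illustrates on a small scale the period amplification discussed in this section.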

Conclusions
Confidentiality in sensor networks is a very serious requirement, since such networks cannot fully achieve their purpose without the necessary security. Namely, for various IoT applications, which typically have limited processing capabilities, restricted memory capacity and power constraints, one of the key challenges is to design an efficient and reliable cryptographic generator that meets the desired security requirements. In this paper, we defined a pseudo-random generator that can, in some way, be considered a generalization of the ideas behind RC4, because it uses time-varying permutations, while the sequences that change the permutation and address the output element in the current permutation are treated in a general fashion. In the paper, we analyzed the properties of the proposed generator. The proposed pseudo-random generator can be implemented efficiently in software and hardware, for example by using the output of multiple linear feedback shift registers as the input sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$ of the generator. The security characteristics considered in the paper support the application of the generator in security solutions for computationally constrained environments.
In the first part of the analysis, the probability distribution of the generator output sequence is considered. Theorem 1 establishes sufficient conditions for the generator output sequence to have an asymptotically uniform probability distribution. Moreover, sufficient conditions are established for the output sequence to have an exactly uniform distribution (Remark 1 and Theorem 2). The uniform distribution of the output sequence indicates resistance of the generator to attacks based on predicting output sequence elements.
The second part of the analysis deals with the correlation properties of the generator output sequence. It was shown, by Theorem 3, that the output sequence elements are asymptotically independent, and accordingly no inherent long-range correlations are detected, unlike in the RC4 generator (see [25]). This property indicates resistance to autocorrelation-type attacks.
The third part of the analysis relates to the possibility of information about the internal state of the generator, i.e., the sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$, leaking through the output sequence $\{Z_n\}_{n=1}^{\infty}$. Theorems 4 and 5 show that the amount of information that flows through the output sequence tends to zero when the length of the sequence tends to infinity. In practical terms, this means that, by discarding an initial segment of the generated sequence of a suitable length, the amount of information about the state of the generator flowing into the output sequence can be made arbitrarily small.
In the last part of the analysis, the period length of the output sequence is analyzed. It has been shown that if the sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$ are chosen in a suitable manner and their periods satisfy the conditions of Theorem 6, the output sequence has a significantly longer period than the sequences $\{A_n\}_{n=1}^{\infty}$ and $\{C_n\}_{n=1}^{\infty}$. According to the results of the analysis, the proposed generator has provably good security characteristics.
The complexity and implementation cost of this proposal are determined by the generation and complexity of the sequences $\{C_n\}_{n=1}^{\infty}$ and $\{A_n\}_{n=1}^{\infty}$. The relatively weak constraints that Theorem 1 imposes on the probability distributions of $\{C_n\}_{n=1}^{\infty}$ and $\{A_n\}_{n=1}^{\infty}$ allow the use of efficiently generated sequences, for example sequences generated by multiple linear feedback shift registers.
The method described in this paper makes it possible to obtain a pseudo-random sequence with an asymptotically uniform distribution and a longer period using two pseudo-random sequences with irregular (non-uniform) probability distributions. The required initial conditions for the two pseudo-random sequences are not serious limitations for this method, because they describe natural requirements for pseudo-random sequences, i.e., that the values of their elements exhaust the set on which they are defined. An interesting question arising in this context is the speed of convergence in Theorem 1, i.e., the number of steps after which the sequence $\{Z_n\}_{n=1}^{\infty}$ can be used as uniformly distributed. It is not possible to answer this question in general, because the matrix $T$ is determined by the chosen set $S$ and the probability distribution of the random variable $C$. Consequently, each set $S$ and random variable $C$ has to be analyzed separately. In practice, this does not restrict the application of the proposed generator, because in every concrete case it is possible to compute the number of transition steps needed to approximate the limit values with the desired accuracy.
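For a concrete choice of $S$ and of the distribution of $C$, the number of steps needed to reach a given accuracy can be found by iterating the transition matrix. The sketch below does this for a toy case with $k = 3$, assuming independent, identically distributed values $C_n$ with law `p`. The function name `steps_to_accuracy`, the set `S`, the law `p` and the tolerances are illustrative assumptions rather than recommendations.

```python
from itertools import permutations


def compose(f, g):
    """Return the permutation f∘g as a tuple."""
    return tuple(f[g[x]] for x in range(len(g)))


def transition_matrix(S, p, k):
    """One-step transition matrix of the chain g_n for i.i.d. C_n with law p."""
    states = list(permutations(range(k)))
    index = {pi: i for i, pi in enumerate(states)}
    T = [[0.0] * len(states) for _ in states]
    for i, pi in enumerate(states):
        for f, pc in zip(S, p):
            T[i][index[compose(f, pi)]] += pc
    return T


def matmul(X, Y):
    """Plain square matrix product."""
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def steps_to_accuracy(T, eps, max_steps=1000):
    """Smallest n with every entry of T^n within eps of 1/k!, or None."""
    target = 1.0 / len(T)
    P = T
    for n in range(1, max_steps + 1):
        if max(abs(x - target) for row in P for x in row) < eps:
            return n
        P = matmul(P, T)
    return None


S = [(1, 0, 2), (1, 2, 0)]       # assumed generating set of P_3
p = [0.3, 0.7]                   # assumed law of C_n
T = transition_matrix(S, p, 3)
for eps in (1e-3, 1e-6, 1e-9):
    print(eps, steps_to_accuracy(T, eps))
```

The returned step count is a natural candidate for the length of the initial output segment to discard. For realistic $k$ the matrix has $k! \times k!$ entries, so this direct computation is feasible only for small $k$, and larger cases would need a bound on the second-largest eigenvalue modulus of $T$ instead.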