Measuring Avalanche Properties on RC4 Stream Cipher Variants

Abstract: In the last three decades, RC4 has been the most cited stream cipher, owing to the large amount of research carried out on its operation. Diverse works have been presented on its performance, security, and usability. One of its most distinguishing features is the sheer number of RC4 variants that have been proposed. Recently, a weakness has been reported regarding the existence of statistical dependence between the inputs and outputs of RC4, detected by means of the strict avalanche criterion and the bit independence criterion. This work analyzes the presence of this weakness in some RC4 variants, relative to RC4 itself. The five best-known variants of RC4 were compared experimentally and classified into two groups according to the presence or absence of this weakness.


Introduction
A stream cipher is a method widely used to encrypt large volumes of information per unit of time. Scenarios such as mobile telephony, wireless networks, and cloud computing therefore use this method as an encryption option [1]. Examples reported in the literature include E0 in Bluetooth [2], A5 in GSM [3], and RC4 in Wired Equivalent Privacy (WEP), Wi-Fi Protected Access (WPA), and WPA2 [4].
In this respect, it is impossible not to highlight the RC4 stream cipher, which has been mentioned on several occasions as the most used stream cipher in practice [4], mostly due to its inclusion in other scenarios such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Microsoft Windows, Lotus Notes, Apple Open Collaboration Environment (AOCE), and Oracle Secure SQL [4].
However, in the last decade, some applications [5,6] avoided the RC4 encryption, given some weaknesses found [7]. Although it is not considered safe [8], RC4 continues to motivate a lot of research [8][9][10], and it is cited to measure the effectiveness of methods that analyze weaknesses in stream ciphers [11][12][13] or the performance of applications that make use of cryptography [14][15][16].
Many of these investigations propose modifications to the RC4 stream cipher, which has resulted in a considerable number of reported variants; without much effort, it is possible to find more than 20 variants in the literature. All of these variants seek to increase the performance, usability, or security of the cipher.
However, a good number of works have reported that weaknesses of RC4 remain in these variants, or have reported new weaknesses in them [4].
While studies on the RC4 stream cipher continue, research on the proposed variants has also increased, in order to verify the security of these variants relative to RC4. This work analyzes the influence of one of the weaknesses reported in the literature on RC4 in its five best-known (and most referenced) variants.

Motivation
In the international literature, up until the first decade of the 21st century, it was customary to analyze stream ciphers and random number generators by applying statistical tests to their outputs in order to measure their degree of randomness [41][42][43]. Numerous such tests are grouped in the batteries of NIST [44], Diehard [45], TestU01 [46], and Knuth [47], among others [48].
These tests have become standard, but since the first decade of this century a considerable number of studies have shown that they fail to detect other weaknesses, such as related keys, weak keys that produce patterns in the outputs, statistical dependence between inputs and outputs, and statistical dependence between the internal state and the outputs. Such weaknesses can lead to attacks [49,50] and, whether or not those attacks are practical, they are taken into account when deciding whether to keep a cipher in the suite of a computer application.
It is therefore essential to avoid these weaknesses and to have methods to detect them at the design and evaluation stages of an algorithm. In particular, statistical tests are needed that can detect significant statistical dependencies between the inputs and outputs of stream ciphers. Very few such tests have been reported, so designing them is highly important in cryptography. In [49,50], Algorithms 1 and 2 were presented to extend the strict avalanche criterion and the bit independence criterion, respectively, to stream ciphers.
Algorithm 1 (fragment). Output: whether f satisfies the SAC. 1: T = 0; 2: for i = 1 → n do; 3: for r = 1 → l do; ...
In applying both criteria, the RC4 stream cipher was used as a test case to evaluate the effectiveness of the proposed algorithms experimentally [49,50]. As a result, it was shown that RC4 presents a weakness of statistical dependence when large input keys are used. This weakness is not new, as it has been reported in previous works; what is of interest here is the use of these two algorithms to determine its presence.
As mentioned, there are numerous proposals for modifying the RC4 stream cipher. The main motivation of this work is to evaluate whether some of the RC4 variants present this weakness. For this, both algorithms will be applied to the five best-known RC4 variants: RC4+ [27], VMPC [25], RC4A [26], NGG [23], and GGHN [24]. The results will be compared experimentally with those obtained for RC4.

SAC and BIC
Let $D = \{X_r \mid 1 \le r \le l\}$ be the set of $n$-bit inputs $X_r$ of a function (cipher) $f : \mathbb{F}_2^n \to \mathbb{F}_2^m$, and let $Y_r = f(X_r)$ be the $m$-bit output corresponding to the input $X_r$ for each $1 \le r \le l$, where $n, m \in \mathbb{N}$. In [51], the difference between the outputs $Y_r = f(X_r)$ and $Y_r^i = f(X_r^i)$ corresponding to the inputs $X_r$ and $X_r^i$ is called the avalanche vector $V_r^i = Y_r \oplus Y_r^i$, where $X_r^i = X_r \oplus e_i$, with $e_i$, $1 \le i \le n$, denoting the unit vectors $e_1 = (1, 0, \ldots, 0, 0, 0), \ldots, e_n = (0, 0, \ldots, 0, 0, 1)$.
The avalanche vector $V_r^i = (v_{r1}^i, \ldots, v_{rm}^i)$, where each $v_{rj}^i \in \mathbb{F}_2$, with $1 \le j \le m$ and $1 \le i \le n$, is called an avalanche variable, represents row $r$ of the matrix $H^i$ defined in Table 1 in [50].
Based on its definition, the strict avalanche criterion (SAC) [51] checks whether changing any input bit implies changing approximately half of the output bits. It is said that $f$ satisfies the SAC if, for all $i$, every avalanche variable $v_{\cdot j}^i$, with $1 \le j \le m$ and $1 \le i \le n$, follows a binomial distribution with parameters $l$ and $1/2$, that is, $v_{\cdot j}^i \sim B(l, 1/2)$. On the other hand, the bit independence criterion (BIC) [51] measures the degree of independence between each pair $v_{\cdot j}^i, v_{\cdot k}^i$ of avalanche variables, with $1 \le j, k \le m$. The two criteria thus measure different characteristics of the effect that changing one input bit has on the output bits: the SAC verifies uniformity in the distribution of each output bit, while the BIC measures the degree of independence between the output bits. An important question is whether these two criteria are correlated [52]. In [53], it was concluded, using the absolute correlation coefficient, that the two tests are largely uncorrelated.
Both algorithms have in common the construction of the avalanche matrix. This matrix constitutes the basis for detecting statistical dependence between the inputs and outputs of $f$. The BIC has a longer execution time because it compares all pairs of avalanche variables.
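The construction of the avalanche matrix can be illustrated with a minimal sketch. It is not Algorithm 1 or 2 from [49,50]: the helper names are hypothetical, `toy_f` (a truncated SHA-256) is only a stand-in for a cipher, and the sketch stops at the raw counts that a SAC or BIC test would examine, without reproducing the statistical tests themselves.

```python
import hashlib
import random
from itertools import combinations

def flip_bit(x, i):
    """Return x XOR e_i: flip bit i of x (0-based, most significant bit first)."""
    out = bytearray(x)
    out[i // 8] ^= 0x80 >> (i % 8)
    return bytes(out)

def to_bits(y):
    """Expand a byte string into a list of 0/1 values."""
    return [(byte >> (7 - k)) & 1 for byte in y for k in range(8)]

def avalanche_matrix(f, inputs, i):
    """Row r is the avalanche vector V_r^i = f(X_r) XOR f(X_r XOR e_i)."""
    rows = []
    for x in inputs:
        y, y_i = f(x), f(flip_bit(x, i))
        rows.append(to_bits(bytes(a ^ b for a, b in zip(y, y_i))))
    return rows

def column_sums(matrix):
    """SAC view: how often each output bit j changed over the l inputs."""
    return [sum(col) for col in zip(*matrix)]

def pair_tables(matrix):
    """BIC view: a 2x2 contingency table for every pair of columns (j, k)."""
    m = len(matrix[0])
    tables = {}
    for j, k in combinations(range(m), 2):
        counts = [[0, 0], [0, 0]]
        for row in matrix:
            counts[row[j]][row[k]] += 1
        tables[(j, k)] = counts
    return tables

def toy_f(x):
    """ASSUMPTION: stand-in function with n = 32 input bits, m = 16 output bits."""
    return hashlib.sha256(x).digest()[:2]

rng = random.Random(0)
inputs = [bytes(rng.getrandbits(8) for _ in range(4)) for _ in range(256)]
H = avalanche_matrix(toy_f, inputs, i=0)
sums = column_sums(H)      # each entry should be near l / 2 = 128
tables = pair_tables(H)    # m * (m - 1) / 2 = 120 pairs for m = 16
```

The number of pair tables grows quadratically with m, which is consistent with the longer execution time of the BIC noted above.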
In both the SAC and the BIC, the random variable $T$ measures the number of rejections of the null hypothesis $H_0$ presented in [49,50]. For both criteria, $T$ can be approximated by a normal distribution with the mean and variance presented in [50].

RC4 Stream Cipher Description
Ron Rivest designed the RC4 algorithm in 1987 for the company RSA Data Security [4]. Its implementation is straightforward and fast and aims to generate sequences in units of one byte and allow keys of different lengths. The internal state of RC4 consists of a permutation S of the numbers 0, . . . , N − 1 and two indices i, j ∈ {0, . . . , N − 1}. The index i is known, while j and the permutation S remain secret.
RC4 comprises two components: the key scheduling algorithm (KSA) and the pseudo-random generation algorithm (PRGA). The KSA generates an initial state from the input key K. It starts with the array {0, 1, . . . , N − 1}, where N = 256 by default, and ends with an initial state $S_N$ (see Algorithm 3). There are a variety of results on the weaknesses of the RC4 cipher [4], especially on the non-pseudorandomness of the permutation resulting from the KSA [11,54,55] and on the reflection of input patterns in the outputs and the permutation [12,56,57].
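As an illustration of the two components, the following is a minimal Python sketch of the standard RC4 KSA and PRGA. The function names are chosen here for illustration and are not taken from the paper's algorithm listings; the check values are the widely published test vectors for the keys "Key" and "Wiki".

```python
def rc4_ksa(key, n=256):
    """Key scheduling: build the initial permutation S from the key bytes."""
    if not key:
        raise ValueError("key must not be empty")
    s = list(range(n))
    j = 0
    for i in range(n):
        j = (j + s[i] + key[i % len(key)]) % n
        s[i], s[j] = s[j], s[i]
    return s

def rc4_prga(s, length, n=256):
    """Keystream generation: one output byte per step, updating i, j and S."""
    s = list(s)  # work on a copy so the caller's state is not modified
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % n
        j = (j + s[i]) % n
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % n])
    return bytes(out)

def rc4_encrypt(key, data):
    """XOR the data with the keystream (decryption is the same operation)."""
    keystream = rc4_prga(rc4_ksa(key), len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))

print(rc4_encrypt(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
print(rc4_encrypt(b"Wiki", b"pedia").hex())     # 1021bf0420
```

In the experiments described later, the input to the avalanche tests corresponds to the key and the output to the first bytes of the keystream.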

RC4 Stream Cipher Variants Description
Numerous variants of the RC4 stream cipher have been proposed in the literature, many of them rarely cited. Each variant seeks to strengthen the RC4 algorithm against reported weaknesses or to improve its practical performance. In this section, we present the best-known and most referenced variants.

RC4+ Algorithm
The stream cipher RC4+ was proposed in [27] to exploit the qualities of the RC4 algorithm while obtaining a greater security margin by adding some operations to its structure. Modifications to both algorithms of the RC4 cipher are proposed, called KSA+ and PRGA+ (see Algorithms 5 and 6).
The KSA+ consists of three layers: basic scrambling, scrambling with IV, and zigzag scrambling. The initialization and the first-layer basic scrambling are the same as in the original RC4 KSA. The second layer shuffles the permutation using the IV and the secret key K, and in the third layer more shuffling is achieved in a zigzag way (see Algorithm 5).
In Algorithm 6, ≫ and ≪ denote right shifting and left shifting, respectively, and L is the number of output bytes. The proposed design aims to prevent each output byte from depending on only one entry of the permutation. To this end, it incorporates the exclusive-or operation ⊕ and two new indices, t′ and t″.

RC4A Algorithm
The stream cipher RC4A was proposed in [26] with the aim of reducing the correlations between output bytes and internal variables by enlarging the internal state (see Algorithms 7 and 8).

The difference between the RC4A KSA (see Algorithm 7) and the RC4 KSA is that the former generates two state tables by using two different keys. Thus, the secret internal state of RC4A consists of two permutations $S_1$, $S_2$ and three variables $i$, $j_1$, $j_2$.

In each round, two output bytes are generated: the first byte is taken from $S_2$ and the second from $S_1$. In this way, the cipher speeds up RC4 encryption.
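The round structure described above can be sketched as follows. This is a minimal illustration rather than a transcription of Algorithms 7 and 8: the function names are hypothetical, and building the two tables from two independently supplied keys with the plain RC4 KSA is an assumption made here, since the text only states that two different keys are used.

```python
def ksa(key, n=256):
    """Plain RC4 key scheduling, used here to build each of the two tables."""
    s = list(range(n))
    j = 0
    for i in range(n):
        j = (j + s[i] + key[i % len(key)]) % n
        s[i], s[j] = s[j], s[i]
    return s

def rc4a_prga(s1, s2, length, n=256):
    """Generate `length` bytes; each round yields one byte from S2, then one from S1."""
    s1, s2 = list(s1), list(s2)  # copies, so the caller's tables stay intact
    i = j1 = j2 = 0
    out = bytearray()
    while len(out) < length:
        i = (i + 1) % n
        # First half-round: update S1, read the output byte from S2.
        j1 = (j1 + s1[i]) % n
        s1[i], s1[j1] = s1[j1], s1[i]
        out.append(s2[(s1[i] + s1[j1]) % n])
        # Second half-round: update S2, read the output byte from S1.
        j2 = (j2 + s2[i]) % n
        s2[i], s2[j2] = s2[j2], s2[i]
        out.append(s1[(s2[i] + s2[j2]) % n])
    return bytes(out[:length])

# ASSUMPTION: two independent keys, purely for illustration.
table1, table2 = ksa(b"first key"), ksa(b"second key")
keystream = rc4a_prga(table1, table2, 16)
```

The index i advances once per pair of output bytes, which is where the speed gain over RC4 comes from.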

VMPC Algorithm
The VMPC algorithm was proposed in [25] (see Algorithms 9 and 10). Its name comes from "Variably Modified Permutation Composition", and it is based on the transformation defined in [25].

In the KSA, the main difference from RC4 is that the loop in which the permutation S is updated runs six times as long as in RC4: first three times with the key, and then three more times with the initialization vector IV. The second update is an optional loop in the KSA, used by the designer to improve diffusion by means of an initialization vector. However, the designer states that good diffusion can be obtained without using the initialization vector.

NGG Algorithm
In [24], the authors proposed some modifications to the RC4 algorithm to extend it to 32-bit and 64-bit words with a table size significantly smaller than $2^{32}$ or $2^{64}$. The proposed algorithm uses different word and table sizes. The authors tried to keep the original structure of RC4 as much as possible, but the proposed changes affect some of the underlying design principles on which the security of RC4 is based (see Algorithms 11 and 12).

The new algorithm was denoted by RC4(n, m), where $N = 2^n$ is the size of the array S in words, m is the word size in bits, and n ≤ m. Note that the contents of the array S do not constitute a complete permutation of 32-bit or 64-bit words.

GGHN Algorithm
A version of NGG, named the GGHN cipher, was introduced in [23]. A third variable k is used in the KSA and PRGA to increase the security of the cipher; k is initialized in the KSA and is key dependent (see Algorithms 13 and 14). The value of the parameter r is given in Table 1 in [23]; for example, for n = 8 and m = 32, r = 20.

Measuring Avalanche Properties in RC4 Variants
Both statistical tests depend on three parameters: $n$, the size of the inputs; $m$, the size of the outputs; and $l$, the number of inputs. In [50], we discuss and give recommendations for selecting these parameters to guarantee the effectiveness of the tests. The values of $n$, $m$, and $l$ in this work were selected to conform to these recommendations, given the chosen significance levels $\alpha_1$ and $\alpha_2$.

Selecting Parameters n, m, l for Experiments
It is known [11,12,56] that, when using large input keys in the RC4, it is possible to find the so-called "related keys", which provide outputs whose values are correlated with high probability. Therefore, the maximum possible value of the size of the inputs n = 2048 bits was selected. According to [50], the increase of the m parameter has the most significant influence on the increase in the execution time of the tests.
Furthermore, it is known [54,55] that the first four RC4 output values have the most significant bias. Taking both considerations into account, m = 32 bits was selected. For the number $l$ of inputs, the same value used in [50] was chosen, that is, l = 65,538. For the two significance levels $\alpha_1$ and $\alpha_2$, values were selected that minimize both type I and type II errors. According to the results in [50], for $\alpha_1 = 0.01$ and $\alpha_2 = 0.001$ the observed values agree with the theoretically expected values, so these were the values chosen.

Experiments on the SAC Criterion
When evaluating the SAC criterion, $E(T_i) = m \cdot \alpha_1 = 32 \times 0.01 = 0.32$ rejections of avalanche variable uniformity are expected for each changed bit $i$, with $1 \le i \le n$, and $E(T) = n \cdot m \cdot \alpha_1 = 2048 \times 32 \times 0.01 = 655.36 \approx 656$ rejections (rounded up) over all changes. Table 1 shows the expected value and the observed values for each cipher, while Figure 1 shows the distribution of the failures per changed bit when evaluating the SAC.
It is observed that the RC4A and NGG ciphers do not satisfy the SAC; they even perform worse than the RC4 stream cipher. This result is not surprising, since neither variant adds greater randomness to the operation of RC4. In RC4A, the internal state is enlarged by using two permutation tables instead of one. In NGG, the internal state is also enlarged, but in this case by increasing the size in bits of each component of the permutation. Based on the results obtained, neither of these two modifications eliminates this weakness of RC4; it could even be said that they make it more detectable.
On the other hand, the RC4+, VMPC, and GGHN ciphers do satisfy the SAC criterion, since their observed number of failures is very close to the expected value for the chosen significance level $\alpha_2$. Table 2 shows the $Z_T$ values for each cipher against the critical value $Z_{1-\alpha_2} = 3.09$. Remarkably, the NGG and GGHN algorithms are very similar; the difference is that GGHN adds an input-dependent variable k to the internal state. To measure the effect of this variable, the same arrangement of initial random values was used in both cases for the permutation used in the KSA. The results show experimentally that this variable accounts for the difference in behavior between these two variants of RC4.

Experiments on the BIC Criterion
In the application of the BIC to the ciphers, $E(T_i) = \binom{m}{2} \cdot \alpha_1 = 496 \times 0.01 = 4.96$ rejections of independence between pairs of avalanche variables are expected for each changed bit $i$, with $1 \le i \le n$, and $E(T) = n \cdot \binom{m}{2} \cdot \alpha_1 = 2048 \times 496 \times 0.01 \approx 10{,}158$ rejections over all changes. Table 3 shows the expected value and the observed values for each cipher, while Figure 2 shows the distribution of the failures per changed bit when evaluating the BIC.
In this case, RC4A and NGG do not satisfy the BIC and again behave worse than the RC4 stream cipher. However, it is worth highlighting that, from approximately bit i = 1150 onward, RC4A performs slightly better than NGG. This is due to the rapid trend of NGG towards the maximum possible number of failures.
The RC4+, VMPC, and GGHN ciphers satisfy the BIC criterion, showing stable behavior for the chosen significance level $\alpha_2$. Table 4 shows the $Z_T$ values for each cipher against the critical value $Z_{1-\alpha_2} = 3.09$. These results show that a cipher can behave differently under the two criteria: the GGHN cipher behaves better than the VMPC and RC4+ ciphers in the BIC, the opposite of what was observed for the SAC criterion.
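The expected rejection counts and the critical value used in both experiments can be checked with a short sketch. The means follow the formulas in the text; the variance used in `z_statistic` is an assumption made here (a binomial count of independent rejections), since the variance from [50] is not reproduced in this section, and the observed count in the example is hypothetical rather than a result from Tables 1 to 4.

```python
from math import comb, sqrt
from statistics import NormalDist

def expected_rejections(n_tests, alpha1):
    """E(T): number of individual tests times the per-test significance level."""
    return n_tests * alpha1

def z_statistic(t_observed, n_tests, alpha1):
    """Standardized T under a normal approximation.

    ASSUMPTION: variance n_tests * alpha1 * (1 - alpha1), i.e. T is treated as
    a binomial count; the variance given in [50] may differ.
    """
    mean = expected_rejections(n_tests, alpha1)
    variance = n_tests * alpha1 * (1 - alpha1)
    return (t_observed - mean) / sqrt(variance)

n, m = 2048, 32                 # input and output sizes in bits
alpha1, alpha2 = 0.01, 0.001    # significance levels chosen in the text

sac_tests = n * m               # one uniformity test per (i, j)
bic_tests = n * comb(m, 2)      # one independence test per (i, {j, k})

sac_expected = expected_rejections(sac_tests, alpha1)   # 655.36
bic_expected = expected_rejections(bic_tests, alpha1)   # 10158.08
critical = NormalDist().inv_cdf(1 - alpha2)             # about 3.09

# Hypothetical observed count, NOT a value from the paper.
t_hypothetical = 700
passes = z_statistic(t_hypothetical, sac_tests, alpha1) <= critical
```

Under this sketch, a cipher is taken to satisfy a criterion when its standardized count does not exceed the critical value, mirroring the comparison of $Z_T$ with $Z_{1-\alpha_2}$.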
Using the SAC and BIC criteria extended to stream ciphers, one of the weaknesses detected in RC4 was the existence of statistical dependence between its outputs and inputs (avalanche weaknesses). The RC4A and NGG variants were experimentally shown not to satisfy the SAC and BIC tests. Their behavior is even worse than that of RC4: these variants do not eliminate the statistical dependence between the outputs and inputs of RC4; on the contrary, they increase it. The VMPC, RC4+, and GGHN variants substantially reduce the statistical dependence between inputs and outputs and pass the SAC and BIC tests. It is recommended not to use the RC4A and NGG variants, as they do not eliminate the avalanche-type weaknesses present in RC4.

Conclusions
RC4 has been one of the most studied stream ciphers in the literature in recent decades, and dozens of variants have been proposed through various modifications. In this work, five variants of RC4 were analyzed using the SAC and BIC criteria extended to stream ciphers. The VMPC, RC4+, and GGHN variants meet both criteria, while the RC4A and NGG variants meet neither. The NGG variant even had worse results than RC4 itself. Future work should apply these criteria to other stream ciphers outside the RC4 family.