Bit Independence Criterion Extended to Stream Ciphers

Abstract: The bit independence criterion was proposed to evaluate the security of the S-boxes used in block ciphers. This paper proposes an algorithm that extends this criterion to evaluate the degree of independence between the bits of the inputs and outputs of stream ciphers. The effectiveness of the algorithm is experimentally confirmed in two scenarios: random outputs independent of the input, in which it detects no dependence, and the RC4 cipher, where it detects significant dependencies related to some known weaknesses. The complexity of the algorithm is estimated in terms of the number of inputs $l$ and the dimensions $n$ and $m$ of the inputs and outputs, respectively.


Introduction
Randomness is an essential component in the security of cryptographic algorithms [1,2]. In particular, stream ciphers are built from pseudo-random number generators and base their security on the statistical characteristics of these generators [1]. Several stream ciphers can be found in the literature whose description is based on different methods for the generation of pseudo-random numbers [3].
In practice, to determine whether a generator is suitable for cryptographic purposes, several statistical tests are usually applied to it to measure the randomness of its outputs [4-6]. There are numerous statistical tests of this kind, among them those grouped in the batteries of NIST [7], Diehard [8], TestU01 [9], and Knuth [10], among others [2]. However, despite the large number of statistical tests in these batteries, none of them measures the correlation between the inputs and outputs of the stream cipher; they only measure the randomness of the outputs, which is a necessary, but not sufficient, condition for the generator to be considered for use in cryptography.
For a stream cipher to be considered secure, there must be no statistically significant correlation between the structure of its inputs and outputs. If "patterns" that depend on the structure of the cipher input appear in the output, they could provide information about the input used. The literature reports cryptanalysis based on this type of weakness [11,12]. It is therefore essential to avoid this weakness and to have methods to detect it at the design and evaluation stage of the algorithm; in particular, it is necessary to have statistical tests capable of detecting significant statistical dependencies between the inputs and outputs of stream ciphers.
In general, very few statistical tests have been reported for detecting statistical dependencies between the outputs and inputs of a stream cipher. Therefore, the design of statistical tests that allow stream ciphers to be evaluated in this sense is highly important in cryptography.
The strict avalanche criterion (SAC) and the bit independence criterion (BIC) were proposed in [13] to evaluate the strength of the S-boxes used in block ciphers [14]. These two criteria measure different characteristics of the effect that changing an input bit has on the output bits; while the SAC verifies uniformity in the distribution of each output bit, the BIC measures the degree of independence between the output bits [15]. The SAC has been extended to stream ciphers [16-22]. In [22], the RC4 stream cipher [23] was evaluated through the SAC, and statistical dependence between the input bits and the outputs of RC4 was detected for inputs of large size. This confirms the results obtained in [24-27], where the existence of related inputs in RC4 was reported. The idea developed in [22] was to determine the behavior of the distribution of the output bits when any bit of the input is changed. In the design of stream ciphers, the output elements must be uniformly distributed, regardless of the input bit that is changed [5]. Otherwise, the outputs could provide information on the input bits, which constitutes a weakness that, in the worst-case scenario, could lead to an attack. A discussion of attacks on stream ciphers can be found in [28]. However, to the best of our knowledge, the BIC has not been applied to assess the degree of statistical independence between the output bits of stream ciphers when a bit of the input is changed. In this paper, we propose an algorithm that extends this criterion to evaluate the degree of independence between the input bits and the outputs of stream ciphers. The effectiveness of the algorithm was experimentally confirmed in two scenarios: random outputs independent of the input, in which it does not detect dependence, and the RC4 cipher, where it detects significant dependencies related to some known weaknesses [22,24-26].

Preliminaries
A stream cipher can be viewed as a function $f : \mathbb{F}_2^n \to \mathbb{F}_2^m$ that transforms a binary input vector $X = (x_1, \ldots, x_n)$ of $n$ bits into a binary output vector $Y = f(X) = (y_1, \ldots, y_m)$ of $m$ bits, where $n, m \in \mathbb{N}$. In [13], the difference between the outputs $Y = f(X)$ and $Y^i = f(X^i)$, corresponding to the inputs $X$ and $X^i$, is called the avalanche vector and denoted by $V^i = Y \oplus Y^i$, where $X^i = X \oplus e_i$, with $1 \le i \le n$ and $e_i$ the unit vector with 1 in the $i$-th component. The notation used in this work is summarized in Table A1 (Appendix A). Given the set $D = \{X_1, \ldots, X_l\}$ of $l$ inputs $X_r$ of $n$ bits, with $1 \le r \le l$, a binary matrix $H^i$ is constructed for each $e_i$, $1 \le i \le n$. To construct the matrix $H^i$, the avalanche vectors $V^i_r = Y_r \oplus Y^i_r = (v^i_{r1}, \ldots, v^i_{rm})$, with $Y_r = f(X_r)$ and $Y^i_r = f(X_r \oplus e_i)$, are computed for $1 \le r \le l$ and placed as the rows of $H^i$; the columns of $H^i$, denoted by $v^i_{\bullet j}$ with $1 \le j \le m$, are the avalanche variables. It is said that $f$ satisfies the BIC if, by changing any bit $i$ in the $l$ inputs $X_r \in D$, every pair of avalanche variables $v^i_{\bullet j}$ and $v^i_{\bullet k}$ is independent, with $1 \le j, k \le m$ and $j \ne k$. The matrix $H^i$ will be called the SAC matrix associated with the vector $e_i$ and is shown in Table 1.
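To measure the degree of independence between the pairs of avalanche variables, Webster and Tavares [13] used Pearson's correlation coefficient $\rho$. In [29], the maximum value of these coefficients was used as a test statistic, denoted here by

$$\mathrm{BIC}_{\mathrm{Pearson}}(f) = \max_{1 \le i \le n,\; 1 \le j < k \le m} \left| \rho\left(v^i_{\bullet j}, v^i_{\bullet k}\right) \right|.$$

If all pairs of avalanche variables $v^i_{\bullet j}$ and $v^i_{\bullet k}$ are independent, then ideally $\mathrm{BIC}_{\mathrm{Pearson}}(f) = 0$. Therefore, in practice, when $\mathrm{BIC}_{\mathrm{Pearson}}(f) \approx 0$, it is concluded that $f$ satisfies the BIC.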
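As an illustration of the Pearson-based statistic, the following minimal sketch computes the largest absolute correlation between pairs of columns over a list of SAC matrices. The function names, the representation of each matrix as a list of 0/1 rows, and the convention of returning 0 for a constant column (whose correlation is undefined) are assumptions of this sketch, not part of [13] or [29].

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences of 0/1 values."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / (norm_u * norm_v)


def bic_pearson(sac_matrices):
    """Maximum |rho| over all column pairs (j < k) of all SAC matrices."""
    worst = 0.0
    for H in sac_matrices:
        columns = list(zip(*H))  # avalanche variables are the columns of H
        for col_j, col_k in combinations(columns, 2):
            worst = max(worst, abs(pearson(col_j, col_k)))
    return worst
```

A value close to 0 is what the criterion hopes for; the sketch gives no threshold for "close", which is exactly the limitation of this statistic discussed later in the paper.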

The SAC [13] verifies whether each output bit changes approximately half of the time when an input bit is changed. Using the SAC matrix $H^i$, it is said that $f$ satisfies the SAC if, for all $i$ and every avalanche variable $v^i_{\bullet j}$, with $1 \le j \le m$ and $1 \le i \le n$, $HW(v^i_{\bullet j})$ is binomially distributed with $l$ trials and success probability $p = 1/2$, i.e., $HW(v^i_{\bullet j}) \sim B(l, 1/2)$, where $HW(\cdot)$ is the Hamming weight. On the other hand, the BIC [13] measures the degree of independence between each pair $v^i_{\bullet j}$, $v^i_{\bullet k}$ of avalanche variables. Thus, the two criteria measure different characteristics of the effect produced on the output bits by changing an input bit: the SAC verifies uniformity in the distribution of each output bit, while the BIC measures the degree of independence between the output bits.
In [30], a new method based on mutual information was presented to assess the correlation between statistical randomness tests, using some test statistics and p-values of the tests. This tool can be used to determine the degree of correlation between the SAC and the BIC. In [29], the independence between these two tests is assessed through the absolute correlation coefficient, concluding that the tests are quite uncorrelated.

Stream Ciphers and RC4
Stream ciphers perform encryption by converting plaintext into ciphertext bit by bit through the use of a keystream and the XOR operation. A keystream is a sequence of numbers generated in a pseudo-random way by a pseudo-random number generator. The sequence of pseudo-random numbers used must meet certain statistical properties to be considered suitable for cryptographic use. In many applications (see [4,31]), ciphers of this type have become very important tools, since they are very fast and their implementation is simpler than that of other ciphers, e.g., block ciphers. In these scenarios, the problem is the transmission of a large amount of data over communication networks in a short time.
There is a wide variety of design proposals [32] for building pseudo-random number generators. Among these, the RC4 algorithm [23] stands out for its wide use in different applications and protocols. The RC4 stream cipher [23] is optimized for 8-bit processors, being extremely fast and exceptionally simple. It was included in network protocols such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Wired Equivalent Privacy (WEP), Wi-Fi Protected Access (WPA), and in various applications used in Microsoft Windows, Lotus Notes, Apple Open Collaboration Environment (AOCE), and Oracle Secure SQL [23]. In the last decade, some applications [33,34] avoided RC4 encryption given some weaknesses found [35]. However, although it is not considered very secure [36], RC4 is still one of the most widely used stream ciphers [37], and continues to motivate research nowadays [36-38]. Furthermore, this cipher is a good option to measure the effectiveness of methods that analyze weaknesses in stream ciphers related to those already known in RC4 [22,24-26], or to check the performance of hardware or software schemes that make use of cryptography [39-41].
RC4 has two main components: the key scheduling and the pseudo-random number generator. The key scheduling generates an internal random permutation $S$ of the values from 0 to 255 from an initial permutation, a (random) key $K$ of $l$ bytes, and two pointers $i$ and $j$. The maximal key length is $l = 256$ bytes (see Algorithm 1).
Algorithm 1 RC4 key-scheduling
for i = 0 to 255 do
    S[i] = i
end for
j = 0
for i = 0 to 255 do
    j = (j + S[i] + K[i mod l]) mod 256
    Swap S[i] and S[j]
end for

The main part of the algorithm is the pseudo-random number generator, which produces a one-byte output in each step. As usual for stream ciphers, encryption is an XOR of the pseudo-random sequence with the message (see Algorithm 2).

Algorithm 2 RC4 pseudo-random generator
i = 0
j = 0
while generating output do
    i = (i + 1) mod 256
    j = (j + S[i]) mod 256
    Swap S[i] and S[j]
    Output S[(S[i] + S[j]) mod 256]
end while

The weaknesses found can be classified according to the theme they exploit, some of which are:
1. Weak keys.
2. Key recovery from the state.
3. Key recovery from the key-stream.
While the fifth point is the most studied subject in the literature, the third point is the most serious attack made on RC4. The theme exploited in this paper has been deeply studied; in particular, Grosul and Wallach [24] demonstrated that certain related key pairs generate similar output bytes in RC4. Later, Matsui [25] reported colliding key pairs for RC4 for the first time, and stronger key collisions were then found in [26]. Several modifications of the RC4 stream cipher have been proposed; while some modify only certain components or operations, others change the algorithm completely (see [42]). It is important to note that even RC4 variants have received a lot of attention in the scientific community (see [43]).
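As a concrete reference for the RC4 description in this section, the key scheduling and the pseudo-random generator can be sketched in Python as follows. The function names and the choice of returning the keystream as `bytes` are assumptions of this sketch; RC4 is shown here only as an object of study, not as a recommendation for use.

```python
def rc4_key_schedule(key):
    """Key scheduling: build the permutation S of 0..255 from the key bytes."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 keys are 1 to 256 bytes long")
    S = list(range(256))  # initial (identity) permutation
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S


def rc4_keystream(key, n_bytes):
    """Pseudo-random generator: produce n_bytes keystream bytes for the key."""
    S = rc4_key_schedule(key)
    i = j = 0
    out = bytearray()
    for _ in range(n_bytes):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def rc4_encrypt(key, message):
    """Encryption (and decryption): XOR of the message with the keystream."""
    stream = rc4_keystream(key, len(message))
    return bytes(a ^ b for a, b in zip(message, stream))
```

Viewed as the function $f$ of the preliminaries, the key plays the role of the $n$-bit input and the first $m$ keystream bits play the role of the output; that mapping is how this sketch would be plugged into a BIC evaluation.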

BIC Algorithm in Stream Ciphers
In this section, an algorithm is proposed to extend the bit independence criterion (BIC) to stream ciphers, and its effectiveness is confirmed experimentally. The two main differences that arise in this scenario with respect to its application to S-boxes are discussed.
Let $f$ be the function to be evaluated by the BIC, $D = \{X_1, \ldots, X_l\}$ the set of $l$ randomly generated inputs $X_r$ of $n$ bits, and $m$ the number of bits of the outputs of $f$. The proposed method consists of the following steps:

Step 1. Construct the $n$ SAC matrices $H^i$ ($i = 1, \ldots, n$) of dimension $l \times m$.
1. Evaluate $Y_r = f(X_r)$, and generate the output $Y_r$ of size $m$.
2. Evaluate $Y^i_r = f(X_r \oplus e_i)$, and generate the output $Y^i_r$ of size $m$, where $e_i$ is the canonical vector.
3. Compute the avalanche vector $V^i_r = Y_r \oplus Y^i_r$, which forms row $r$ of $H^i$.

Step 2. Evaluate the independence between the avalanche variables $v^i_{\bullet j}$ and $v^i_{\bullet k}$.
1. For each pair $(j, k)$, with $1 \le j, k \le m$ and $j \ne k$, measure the independence between the avalanche variables $v^i_{\bullet j}$ and $v^i_{\bullet k}$ by means of a test statistic.
2. Set a significance level $\alpha_1$ and decide, using a statistical criterion, whether the observed value of the test statistic allows the hypothesis of independence between $v^i_{\bullet j}$ and $v^i_{\bullet k}$ to be rejected.
3. Count the number $T_i$ of rejections among the $C^m_2$ pairs of the matrix $H^i$.

Step 3. Decide whether the BIC is satisfied.
1. Count the total number $T$ of rejections over the $n$ matrices $H^i$.
2. Decide, using a statistical criterion, whether the observed value of $T$ allows BIC compliance to be rejected.
The following sections describe each of these steps and end with the proposal of an algorithm to evaluate the BIC in stream ciphers.

Building the SAC Matrix
First difference. When evaluating the BIC in S-boxes, it is possible to go through the entire space of $l = 2^n$ inputs, since $n$ usually takes small values; however, this is impractical in stream ciphers, where the input space can have $2^{128}$ elements or more. To solve this problem, it is proposed to use the same approach applied when assessing the randomness of the outputs of pseudo-random generators through statistical tests [2]. This approach consists of generating a sample of $l$ inputs, with $l \ll 2^n$, and determining the strength of the cipher from the results obtained on this sample.
The $l$ inputs are chosen randomly from the space of $2^n$ possible inputs. This is the main difference: while the BIC test on S-boxes works over the entire input space, on stream ciphers it works with a randomly selected subset of that space.
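The construction of one SAC matrix from a random sample of inputs can be sketched as follows. This is a minimal illustration under stated assumptions: bit vectors are Python lists of 0/1, the bit index `i` is 0-based (the text counts from 1), and `toy_cipher` is a hypothetical stand-in for $f$ chosen only because its avalanche vectors are easy to predict.

```python
import random


def random_inputs(l, n, seed=None):
    """Draw l inputs of n bits each, uniformly at random."""
    rng = random.Random(seed)
    return [[rng.getrandbits(1) for _ in range(n)] for _ in range(l)]


def sac_matrix(f, inputs, i):
    """Return H^i as a list of rows; row r is f(X_r) XOR f(X_r XOR e_i)."""
    rows = []
    for x in inputs:
        x_flipped = list(x)
        x_flipped[i] ^= 1  # X_r XOR e_i
        y, y_flipped = f(x), f(x_flipped)
        rows.append([a ^ b for a, b in zip(y, y_flipped)])
    return rows


def toy_cipher(x):
    """Hypothetical f: output bit j is x[j] XOR x[j+1], indices taken cyclically."""
    n = len(x)
    return [x[j] ^ x[(j + 1) % n] for j in range(n)]


D = random_inputs(l=8, n=6, seed=1)
H = sac_matrix(toy_cipher, D, i=2)
for row in H:
    print(row)
```

Because `toy_cipher` is linear, every row of `H` is the same vector regardless of the sampled inputs, which is the kind of input-output structure the criterion is meant to expose; a cipher with good diffusion would instead give rows that look like independent random bits.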

Test of Independence between Two Avalanche Variables $v^i_{\bullet j}$ and $v^i_{\bullet k}$
Second difference. In [13], Pearson's correlation coefficient $\rho$ was used to measure the degree of independence between the pairs of avalanche variables. The use of such a coefficient in [13,29] has two main disadvantages: the first is that it only detects linear correlations, and the second is that the critical region for the rejection of the null hypothesis is not explicitly defined, i.e., no threshold is defined below which $\mathrm{BIC}_{\mathrm{Pearson}}(f) \approx 0$ is decided. This can lead to imprecise decisions when dealing with small coefficient values. To address the first disadvantage, mutual information can be applied to measure the degree of independence between pairs of avalanche variables [44], but in this case it is important to determine which estimator to use, since there are no unbiased minimum-variance estimators of entropy; the second disadvantage can be solved by defining the critical region using a transformation of the correlation coefficient of the type $t = \rho \sqrt{(N - 2)/(1 - \rho^2)}$, where $t$ follows a Student's t-distribution with $N - 2$ degrees of freedom [45].
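Another approach is based on the fact that, when $v^i_{\bullet j}$ and $v^i_{\bullet k}$ are independent, each bit of $s^i_{jk} = v^i_{\bullet j} \oplus v^i_{\bullet k}$ takes the value 1 with probability $1/2$ [46]. In this work, independence will be evaluated by measuring the fit of $HW(s^i_{jk})$ to the binomial distribution $B(l, 1/2)$, where $HW(\cdot)$ is the Hamming weight. This allows setting a threshold for the decision criterion on independence between $v^i_{\bullet j}$ and $v^i_{\bullet k}$. Since $H^i$ is a binary matrix, the fit to the binomial distribution will be measured by the $\chi^2$-test with 1 degree of freedom, with the test hypotheses given by:

$$H_0 : HW(s^i_{jk}) \sim B(l, 1/2), \qquad H_1 : HW(s^i_{jk}) \nsim B(l, 1/2).$$

That is, $H_0$ states that $v^i_{\bullet j}$ and $v^i_{\bullet k}$ are independent, and $H_1$ that they are not. The test statistic used is

$$\chi^2 = \frac{\left(HW(s^i_{jk}) - l/2\right)^2}{l/2} + \frac{\left(l - HW(s^i_{jk}) - l/2\right)^2}{l/2}.$$

As usual [2], the value $\alpha_1$ is such that, if $\chi^2 > \chi^2_{1-\alpha_1, 1}$ (the $1 - \alpha_1$ quantile of the $\chi^2$ distribution with 1 degree of freedom), the null hypothesis $H_0$ is rejected. Comparing the effectiveness of these three criteria (Pearson's correlation, mutual information, and the $\chi^2$ fit) for evaluating independence between the avalanche variables is left for future work.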
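The column-pair test can be illustrated with a short sketch. It assumes that the tested sequence is the bitwise XOR of the two columns and that the statistic is the usual two-category $\chi^2$ goodness-of-fit statistic against $B(l, 1/2)$; the function names are hypothetical. With 1 degree of freedom, the p-value has the closed form $\mathrm{erfc}(\sqrt{\chi^2/2})$, so no external library is needed.

```python
from math import erfc, sqrt


def xor_weight(col_j, col_k):
    """Hamming weight of the XOR of two equal-length 0/1 columns."""
    return sum(a ^ b for a, b in zip(col_j, col_k))


def chi2_statistic(hw, l):
    """Two-category chi-square statistic for hw ones among l bits vs. B(l, 1/2)."""
    expected = l / 2
    return ((hw - expected) ** 2 + ((l - hw) - expected) ** 2) / expected


def chi2_pvalue_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


def reject_independence(col_j, col_k, alpha1=0.01):
    """True if H0 (independence of the two columns) is rejected at level alpha1."""
    stat = chi2_statistic(xor_weight(col_j, col_k), len(col_j))
    return chi2_pvalue_1df(stat) < alpha1
```

For example, two identical columns give an XOR of weight 0 and are rejected for any reasonable `l`, while two columns whose XOR has exactly `l/2` ones give a statistic of 0 and are never rejected.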

BIC Acceptance Test
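To decide whether the stream cipher $f$ satisfies the BIC, it is necessary to take into account the number of rejections of $H_0$ over the $n$ matrices; for this, a random variable $T$, which counts the total number of rejections over the $n$ matrices, is defined:

$$T = \sum_{i=1}^{n} T_i,$$

where

$$T_i = \sum_{1 \le j < k \le m} t\left(v^i_{\bullet j}, v^i_{\bullet k}, \alpha_1\right)$$

and

$$t\left(v^i_{\bullet j}, v^i_{\bullet k}, \alpha_1\right) = \begin{cases} 1 & \text{if } H_0 \text{ is rejected with significance } \alpha_1, \\ 0 & \text{otherwise.} \end{cases}$$

The variable $T_i$ counts the number of rejections of the null hypothesis $H_0$ in the matrix $H^i$.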
Expected number of rejections of $H_0$. In each of the $n$ SAC matrices $H^i$, $C^m_2 = m(m-1)/2$ pairs of columns are formed; thus, the number of rejections $T$ satisfies

$$0 \le T \le n \cdot C^m_2.$$

When $T = 0$, we have the ideal case for compliance with the BIC, since all the pairs of columns are independent, while as $T \gg 0$, the number of non-independent column pairs increases. Under the hypothesis test above, with a significance level $\alpha_1$, the expected number of rejections of $H_0$ is

$$E(T_i \mid H_0) = \alpha_1 \cdot C^m_2$$

for each matrix $H^i$. In total, among the $n$ SAC matrices,

$$E(T \mid H_0) = \alpha_1 \cdot n \cdot C^m_2$$

rejections of $H_0$ are expected. The random variable $T$ follows a binomial distribution $B(n \cdot C^m_2, \alpha_1)$. Taking into account that generally $\alpha_1 < 0.1$, this distribution can be approximated by the Poisson distribution with parameter $\lambda = \alpha_1 \cdot n \cdot C^m_2$. Since $\lambda$ is large, due to the large values of $n \cdot C^m_2$, the Poisson distribution can in turn be approximated by the Normal distribution with mean and variance

$$E(T \mid H_0) = \sigma^2(T \mid H_0) = \lambda.$$

Thus

$$Z_T = \frac{T - \lambda}{\sqrt{\lambda}} \sim N(0, 1).$$

Decision criteria. To compare the $Z_T$ value with the $N(0, 1)$ distribution, a significance level $\alpha_2$ is selected. Then, it is decided that $f$ does not satisfy the BIC, with a significance level $\alpha_2$, if $Z_T > Z_{1-\alpha_2}$. It can be seen that if $0 \le T \le E(T \mid H_0)$, then $Z_T \le 0 < Z_{1-\alpha_2}$, so $Z_T > Z_{1-\alpha_2}$ is not satisfied and the BIC is considered fulfilled. On the other hand, if $T \gg E(T \mid H_0)$, the value of $Z_T$ grows as $T$ increases, so $Z_T > Z_{1-\alpha_2}$ is satisfied and BIC compliance is rejected.
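The acceptance rule can be written down in a few lines. The sketch below assumes the Normal approximation described above, with $\lambda = \alpha_1 \cdot n \cdot C^m_2$ as both mean and variance of $T$ under $H_0$; the function names and the default significance levels are illustrative choices, not prescriptions.

```python
from math import comb, sqrt
from statistics import NormalDist


def expected_rejections(n, m, alpha1):
    """lambda = alpha1 * n * C(m, 2): expected number of rejections under H0."""
    return alpha1 * n * comb(m, 2)


def z_statistic(T, n, m, alpha1):
    """Standardized number of rejections Z_T = (T - lambda) / sqrt(lambda)."""
    lam = expected_rejections(n, m, alpha1)
    return (T - lam) / sqrt(lam)


def rejects_bic(T, n, m, alpha1=0.01, alpha2=0.001):
    """True if Z_T exceeds the one-sided critical value Z_{1 - alpha2}."""
    z_crit = NormalDist().inv_cdf(1 - alpha2)
    return z_statistic(T, n, m, alpha1) > z_crit
```

As a usage example, for $n = m = 64$ and $\alpha_1 = 0.01$ the expected number of rejections is about 1290, so `rejects_bic(1290, 64, 64)` is false, whereas a count of 1500 rejections lies several standard deviations above the mean and leads to rejection.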
Normality of the test statistic $T$. The expression of $T$ involves $n \cdot C^m_2$ variables $t(v^i_{\bullet j}, v^i_{\bullet k}, \alpha_1)$, whose distributions under $H_0$ and $H_1$ are different. Under $H_0$, all variables $t(v^i_{\bullet j}, v^i_{\bullet k}, \alpha_1)$ are independent, identically distributed and take the value 1 with probability $\alpha_1$, so $T$ follows the binomial distribution $B(n \cdot C^m_2, \alpha_1)$. Under $H_1$, the variables $t(v^i_{\bullet j}, v^i_{\bullet k}, \alpha_1)$ that appear in the expression of $T$ are not identically distributed, since the rejection of the BIC means that there are several matrices $H^i$ for which the hypothesis $H_0$ of independence between $v^i_{\bullet j}$ and $v^i_{\bullet k}$ is rejected. In this case, the rejection probability satisfies $p^i_{jk} \ne \alpha_1$ and may be different when $i, j, k$ vary. For this reason, a binomial does not appear directly as the distribution of $T$. However, it is still possible to approximate the distribution of $T$ by the Normal distribution. For this, it is sufficient to calculate the mean $\bar{P}_{n \cdot C^m_2}$ of the probabilities of all the variables $t(v^i_{\bullet j}, v^i_{\bullet k}, \alpha_1)$, so that the distribution of $T$ can be approximated by the binomial distribution $B(n \cdot C^m_2, \bar{P}_{n \cdot C^m_2})$. This distribution, in turn, can be approximated by the Normal distribution, taking into account the high values of $n \cdot C^m_2$. The precision of this approximation depends on the differences between the probabilities $p^i_{jk}$ involved in $\bar{P}_{n \cdot C^m_2}$; therefore, the variance of these probabilities can be a measure of the quality of the approximation.

When comparing the distribution of $T$ under $H_0$ and $H_1$, similarities and differences are observed. They are similar in that in both cases $T$ approximately follows a Normal distribution, but there are two differences: the first and most important is between the expected values of the two distributions (higher under $H_1$), and the second refers to the quality of the fit to this distribution (possibly lower under $H_1$). In the rest of this work, the proposed method to evaluate the BIC in stream ciphers will be called the BIC test.

BIC Test Algorithm
Given a set $D = \{X_1, \ldots, X_l\}$ of $l$ randomly chosen $n$-bit inputs to the function $f$, the algorithm constructs, for each binary vector $e_i$ ($1 \le i \le n$), its associated SAC matrix $H^i$ and, for all $j, k$ with $j \ne k$, checks whether $HW(s^i_{jk})$ follows the $B(l, 1/2)$ distribution (see the proposed Algorithm 3).

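Algorithm 3 BIC stream ciphers algorithm
Input: $f$ function to evaluate, $n$ size of the inputs of $f$, $m$ size of the outputs of $f$, $\alpha_1$ and $\alpha_2$ levels of significance, $D$ set of $l$ inputs to the function $f$.
Output: decision on whether $f$ satisfies the BIC.
1: T = 0
2: for i = 1 to n do
3:     for r = 1 to l do
4:         row r of H^i = f(X_r) ⊕ f(X_r ⊕ e_i)
5:     end for
6:     for each (j, k) with j < k do
7:         Independence check between v^i_{•j} and v^i_{•k} (χ²-test on HW(s^i_{jk}) at level α_1)
8:         if H_0 is rejected then T = T + 1
9:         end if
10:     end for
11: end for
12: if Z_T > Z_{1−α_2} then f does not satisfy the BIC
13: else f satisfies the BIC
14: end if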
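A compact end-to-end sketch of the BIC test follows. It is one possible reading of the algorithm under explicit assumptions: `f` maps an `n`-bit integer to an `m`-bit integer, each avalanche variable (column) is packed into a single Python integer so that the XOR and Hamming weight of a column pair are cheap, the per-pair test is the two-category $\chi^2$ test with 1 degree of freedom, and the critical value is obtained from the identity $\chi^2_{1-\alpha_1,1} = z^2_{1-\alpha_1/2}$. All names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def bic_test(f, n, m, inputs, alpha1=0.01, alpha2=0.001):
    """Return (T, Z_T, rejected) for f: n-bit int -> m-bit int on the given inputs."""
    l = len(inputs)
    chi2_crit = NormalDist().inv_cdf(1 - alpha1 / 2) ** 2
    base = [f(x) for x in inputs]  # Y_r = f(X_r), reused for every i
    T = 0
    for i in range(n):
        # Build H^i column-wise: bit r of cols[j] is the avalanche bit v^i_{rj}.
        cols = [0] * m
        for r, x in enumerate(inputs):
            v = base[r] ^ f(x ^ (1 << i))
            for j in range(m):
                if (v >> j) & 1:
                    cols[j] |= 1 << r
        # Test every pair of columns of H^i.
        for j in range(m):
            for k in range(j + 1, m):
                hw = bin(cols[j] ^ cols[k]).count("1")
                chi2 = (2 * hw - l) ** 2 / l  # same value as the two-term formula
                if chi2 > chi2_crit:
                    T += 1
    lam = alpha1 * n * comb(m, 2)
    z_t = (T - lam) / sqrt(lam)
    return T, z_t, z_t > NormalDist().inv_cdf(1 - alpha2)
```

As a sanity check on a deliberately bad function, taking `f` as the identity on 8-bit values makes every avalanche vector constant, so all $8 \cdot C^8_2 = 224$ column pairs are rejected and the test rejects BIC compliance. For realistic parameters, $n$ and $m$ should be large enough for the Normal approximation of $T$ to hold.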

Complexity of the Algorithm
In steps 3-5 of the algorithm, $f$ is used to generate $m$ output bits. Assuming that the stream cipher $f$ generates each output bit at a constant cost, $O(lm)$ operations are performed in these steps, since $l$ times $m$ output bits are generated from $f$. In steps 6-10 of the algorithm, $O(m^2 l)$ operations are performed, due to computing $C^m_2$ times the Hamming weight of a sequence of $l$ bits. Thus, the algorithm performs $O(n \max(lm, lm^2)) = O(n l m^2)$ operations, and the number of operations depends on the number $n$ of input bits, the number $m$ of output bits, and the number $l$ of inputs used. It can be seen that increasing $m$ has a greater influence than increasing $n$ or $l$ on the number of operations of the algorithm. In the particular case $m = n = l$, $O(m^4)$ operations are performed.

Parameter Selection
As seen in the previous section, the number of operations of the BIC algorithm depends on three parameters, the number l of inputs, the number n of bits of each input, and the number m of bits of each output.
Selection of $l$ such that $\hat{p} \approx 0.5$ and $HW(s^i_{jk})$ fits the binomial distribution $B(l, 1/2)$. The number $l$ of inputs influences the effectiveness of the $\chi^2$-test in determining whether two columns are independent. Increasing $l$ guarantees a better fit of $HW(s^i_{jk})$ to the binomial distribution $B(l, 1/2)$; however, it also increases the number of operations. In practice, the idea is to balance cost and effectiveness by using a value of $l$ that maintains the fit while keeping the number of operations practical.
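Using the confidence interval for proportions [47], it is possible to obtain a value $l_0$ such that fixing $l > l_0$ achieves a good fit. This confidence interval is given by

$$\hat{p} - z_{1-\alpha/2} \sqrt{\frac{pq}{l}} \le p \le \hat{p} + z_{1-\alpha/2} \sqrt{\frac{pq}{l}}.$$

Solving for $l$ we get

$$l = \frac{z^2_{1-\alpha/2} \, p \, q}{e^2},$$

where $e = \hat{p} - p$ is the deviation of the estimate $\hat{p}$ from $p$, and $q = 1 - p$.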
Example 1. Calculation of the lower bound $l_0$ for $l$. A value $l_0$ is needed from which, with high probability, $\hat{q} \approx \hat{p} \approx 0.5$ is satisfied. Then, substituting $p = q = 0.5$, a significance level $\alpha_1 = 0.01$ and a deviation $e$ whose absolute value satisfies $|e| = |\hat{p} - 0.5| \le 0.03$, we get the lower bound $l_0$. In this way, for the significance level $\alpha_1$ and the deviation $e$ selected, it is concluded that $l$ must be chosen such that $l > l_0 = 2189$.
Example 2. Convergence of $\hat{p}$ and deviation $e$. Table 2 shows the behavior of the deviation $e$ observed for several values of $l$, $l > l_0 = 2189$, with $n = 64$ and $m = 32$. It can be seen that, for most of the estimated values of $e$, the imposed condition $|e| \le 0.03$ is met.

Selection of $n$, $m$ under the null hypothesis $H_0$. The number $n$ of input bits and the number $m$ of output bits influence the sample size for the calculation of the number $T$ of rejections of $H_0$. In general, there are $d = n \cdot C^m_2$ pairs of columns to check, each of which is rejected with probability $\alpha_1$ under $H_0$, so $\lambda = \alpha_1 \cdot d$ pairs of columns are expected to be rejected.
Let $\lambda_0 = \alpha_1 \cdot d_0$ be some default value of $\lambda$ from which the distribution of $Z_T$ can be approximated by $N(0, 1)$. It is necessary to select $n$ and $m$ such that $d > d_0$ is satisfied, so that a value of $\lambda$ with $\lambda > \lambda_0$ is obtained. It is advisable to select a high value of $\lambda_0$ that avoids the use of corrections and provides a good fit.
It is known that increasing $\lambda_0$ improves the precision of the Normal approximation to the Poisson distribution. To obtain $d_0$, we can use the confidence interval for proportions [47], this time in a one-tailed approximation to the Normal distribution. So, we have

$$\hat{p} - p \le z_{1-\alpha_2} \sqrt{\frac{pq}{d}}.$$

Solving for $d$ we get

$$d = \frac{z^2_{1-\alpha_2} \, p \, q}{e^2}.$$

Example 3. Calculation of the lower bound $d_0$ for $d$. Substituting $p = 0.01$, $q = 0.99$, with a significance level $\alpha_2 = 0.001$ and a deviation $|e|$ of 0.003, we obtain $d_0 \approx 1.05 \times 10^4$ and hence $\lambda_0 = \alpha_1 \cdot d_0 \approx 105$. Then, for the values of $\alpha_1$ and $e$ chosen, it is enough to select values of $n$ and $m$ such that $\lambda > \lambda_0 \approx 105$. In Table 3, for $\alpha_1 = 0.01$, the values of $n$ and $m$ for which $\lambda > \lambda_0 = 105$ are highlighted in italics.

To select $n$, $m$ and $l$, the trade-off between reducing computational cost and maximizing effectiveness can be taken into account. However, care is needed when selecting these values, since minimizing computational cost could limit the effectiveness of the BIC method and lead to overestimating the quality of the stream cipher. It is advised to prioritize effectiveness.
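The two lower bounds of this section follow from the same sample-size formula $z^2 p q / e^2$ and can be checked numerically with the sketch below. The Normal quantile level is left as an explicit argument because it is an assumption of this sketch: the one-tailed level $1 - \alpha_2 = 0.999$ reproduces $\lambda_0 \approx 105$ from Example 3, while the value $l_0 = 2189$ quoted in Example 1 is reproduced with the level 0.9975. The function names are hypothetical.

```python
from math import ceil, comb
from statistics import NormalDist


def min_sample_size(p, e, quantile_level):
    """Sample size z^2 * p * (1 - p) / e^2 for the given Normal quantile level."""
    z = NormalDist().inv_cdf(quantile_level)
    return z * z * p * (1 - p) / (e * e)


def lam(n, m, alpha1=0.01):
    """Expected number of rejections lambda = alpha1 * n * C(m, 2)."""
    return alpha1 * n * comb(m, 2)


# Example 3: bound on the number d of column pairs and on lambda.
d0 = min_sample_size(p=0.01, e=0.003, quantile_level=1 - 0.001)
lambda0 = 0.01 * d0

# Example 1: bound on the number l of inputs (quantile level assumed, see above).
l0 = ceil(min_sample_size(p=0.5, e=0.03, quantile_level=0.9975))

print(round(d0), round(lambda0), l0)
print(lam(16, 32), lam(32, 32))
```

With $\alpha_1 = 0.01$, the pair $n = 16$, $m = 32$ gives $\lambda$ below the bound, while $n = m = 32$ gives $\lambda$ above it, which is the kind of comparison Table 3 summarizes.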

Experiments and Discussion of the Results
In this section, experiments are carried out in two different scenarios. In the first scenario, the behavior of the $Z_T$ test statistic is investigated under the hypothesis $H_0$ of compliance with the BIC test, evaluating the test on random $H^i$ matrices. The second scenario shows the behavior of the $Z_T$ test statistic when it is evaluated on a stream cipher that does not meet this criterion.

Scenario 1 (BIC in Random SAC Matrices)
It is expected that, under $H_0$, we obtain $E(Z_T \mid H_0) = 0$, $\sigma^2(Z_T \mid H_0) = 1$ and $Z_T \sim N(0, 1)$. The experiments in this scenario were carried out on SAC matrices generated uniformly and independently at random, to evaluate whether $Z_T$ follows the $N(0, 1)$ distribution under $H_0$.
Taking into account Table 3, four sets of parameters were selected, two for $n = m$ and two for $n \ne m$. The values $l \in \{4096, 8192, 16{,}384, 32{,}768\}$ will be varied, in order to verify the influence of the parameters $n$, $m$ and $l$ on the fit of $Z_T$. The values of $n$ and $m$ with the lowest computational cost were selected, that is, the values of $n$ and $m$ that provide the lowest values of $\lambda$ such that $\lambda > \lambda_0 = 105$.
The values of $n$ and $m$ are taken as powers of two, since current ciphers work with inputs and outputs of such sizes; $l$ is also taken as a power of two to speed up the computation of the BIC method in terms of execution time. However, it is important to note that the BIC method can be used for any values of $n$, $m$ and $l$, as long as the requirements outlined in the previous section are met.
Normality of $Z_T$ in random $H^i$ matrices. Tables 4 and 5 show the values $E(Z_T \mid H_0)$ and $\sigma^2(Z_T \mid H_0)$, respectively, observed in each sample, for each value of $n$, $m$ and $l$. The analysis of Figure 1 and Tables 4 and 5 suggests that the hypothesis $H_0$ about the distribution $Z_T \sim N(0, 1)$ holds for all the selected values of the parameters $l$, $n$, $m$. As can be seen in Tables 4 and 5, when $l$, $n$, $m$ vary, the values $E(Z_T \mid H_0)$ and $\sigma^2(Z_T \mid H_0)$ of the observed distribution of $Z_T$ keep matching the parameters $\mu = 0$ and $\sigma^2 = 1$ expected in a distribution $N(0, 1)$. Figure 1 shows the bell shape and approximate symmetry of the obtained distributions.

Normality Test. The Shapiro-Wilk [48] test for normality was applied to all selected parameter sets. The results are shown in Figure 2 and Table 6. Figure 2 shows how the observed distribution of $Z_T$, for all the values of $l$, $n$, $m$, fits the distribution $N(0, 1)$. Table 6 shows the p-values of the Shapiro-Wilk normality test for each of the chosen parameter sets. In all cases, the p-values are greater than the usual values assumed for $\alpha$, such as 0.01 or 0.05, and are consistent with the assumed normality hypothesis. The higher the value of $n = m$, the higher the p-value, which corresponds to the influence of these parameters on the value of $\lambda$ (see Table 3).
BIC test application on random $H^i$ matrices. To evaluate the behavior of the BIC test on random matrices, each $Z_T$ was compared with the critical value $Z_{1-\alpha_2}$, and the number of rejections of $H_0$ was counted. Tables 7 and 8 show the results for various levels of significance $\alpha_2$ and $l = 16{,}384$. The observed number of rejections is expected to match the number expected for the selected level $\alpha_2$, which allows $\alpha_2$ to be chosen so as to obtain zero rejections in this scenario. For $\alpha_2 = 0.0001$, located in the last row of both tables, no statistical dependence is detected, as expected for random matrices, confirming the effectiveness of the criterion and illustrating the importance of a proper selection of $\alpha_2$ according to the number $d = n \cdot C^m_2$ of pairs of columns whose independence is evaluated. For the values of $l$, $n$, $m$, $\alpha_1$, $\alpha_2$ for which no Type I error is made, the probability of making a Type II error must be calculated, and the values that minimize it must be chosen. To this end, experiments are carried out in the second scenario on a stream cipher.

Scenario 2 (BIC in Stream Cipher)
For this scenario, it is convenient to apply the test to a stream cipher that violates the BIC. RC4 was chosen because dependencies between the inputs and outputs of this cipher have been reported [22-25]. Experiments were performed with the parameters $n = m \in \{32, 64, 128, 160, 256\}$, and 1000 sets $D$ of $l = 16{,}384$ inputs each were built. In each set, $Z_T$ was calculated and compared with the critical value $Z_{1-\alpha_2}$, varying $\alpha_2$. Figure 3 shows the distribution of the 1000 values of $Z_T$ obtained. Table 9 shows the values $E(Z_T)$ and $\sigma^2(Z_T)$ observed in each sample, for each value of $n$, $m$, and $l$. To verify the normality of the data, the Shapiro-Wilk [48] normality test was applied to all the selected parameter sets. The results are shown in Figure 4 and Table 10. Figure 4 shows that, as the values of $m = n$ increase, the Normal distribution $N(\mu, 1)$ of the statistic $Z_T$ is maintained; however, the value of $\mu$ increases (see Figure 3 and Table 9).
It is observed that in all cases the p-values are greater than the usual values assumed for α, such as 0.01 or 0.05 and the samples maintain normality.
Table 11 shows that, in RC4, the effectiveness of the criterion increases as the values of $n$ and $m$ increase. That is, increasing the values $m = n$ increases the number of correct decisions to reject $H_0$. As mentioned, it is known that increasing the value of $n$ in RC4 increases the probability of finding very similar, or even identical, outputs for inputs that differ in a few bits [22,24-26]. This experiment confirms the effectiveness of the BIC test in detecting dependence between the inputs and outputs of RC4, and allows us to conclude that, in RC4, the effectiveness is an increasing function of the value of the parameters $n = m$. All cases in which the observed number of rejections exceeds the expected value are indicated in italics.
An important feature of statistical tests is the determination of type I and type II errors [2]. Under $H_0$, $v^i_{\bullet j}$ and $v^i_{\bullet k}$ are independent; the type I error then consists in rejecting independence when the variables are in fact independent, and therefore deciding that the cipher has a weakness when it does not. Conversely, not rejecting $H_0$ when there is a dependency means deciding that the cipher passes the BIC when in fact it does not, which is a type II error. Table 12 shows the proportion of type I and type II errors committed by the BIC test for some parameter sets. It can be seen that for $\alpha_2 = 0.0001$ no type I or type II errors are made. The outputs of RC4 [23] are known to pass numerous statistical tests [49]; however, they do not satisfy the BIC statistical test proposed in this work. This shows that the BIC statistical test complements the classic randomness tests, and it therefore constitutes a tool to consider when evaluating stream ciphers.

Notation (from Table A1, Appendix A):
$Y^i_r = f(X^i_r)$: binary output vector of $m$ bits corresponding to input $X^i_r$.
$V^i_r = Y_r \oplus Y^i_r = (v^i_{r1}, v^i_{r2}, \ldots, v^i_{rm})$: avalanche vector associated with vector $e_i$ and input $X_r$.
$v^i_{rj} \in \mathbb{F}_2$: avalanche variable associated with vector $e_i$ and input $X_r$, with $1 \le j \le m$.

Figure 1 corresponds to the observed distribution of 1000 values of $Z_T$, for each pair of parameters $n$ and $m$, and each value of $l$.

Figure 1. Observed distribution of 1000 values of $Z_T$ in random $H^i$ matrices for various values of $n$, $m$, and $l$.

Figure 2. Adjustment of the observed distribution of $Z_T$ to $N(0, 1)$ in random $H^i$ matrices, which satisfy the BIC.

Table 1. SAC matrix $H^i = (v^i_{rj})$ of dimension $l \times m$ for the change of bit $i$ over the set $D$ of $l$ inputs.

Table 2. Values of the deviation $|e|$ for several $l$, $l > l_0 = 2189$, with $n = 64$ and $m = 32$.

Table 3. $\lambda$ values for multiple values of $n$ and $m$ with $\alpha_1 = 0.01$. The values of $n$ and $m$ for which $\lambda > \lambda_0 = 105$ are highlighted in italics.

Table 4. Observed $E(Z_T \mid H_0)$ values for each selected $n$, $m$, $l$ value.

Table 5. Observed $\sigma^2(Z_T \mid H_0)$ values for each selected $n$, $m$, $l$ value.

Table 6. p-values of the Shapiro-Wilk test of normality for samples of $Z_T$ in random $H^i$ matrices, which satisfy the BIC.

Table 9. Expected value $E(Z_T)$ and variance $\sigma^2(Z_T)$ of $Z_T$ for SAC matrices generated with RC4.

Table 10. p-values of the Shapiro-Wilk test of normality on samples of $Z_T$ for SAC matrices generated with RC4 with $n = m \in \{32, 64, 128, 160, 256\}$.

Table 11. Expected $E(\#[Z_T > Z_{1-\alpha_2}] \mid H_0)$ and observed $\#[Z_T > Z_{1-\alpha_2}]$ number of rejections in 1000 repetitions of the BIC test on SAC matrices generated with RC4. All cases in which the observed number of rejections exceeds the expected value are indicated in italics.

Table 12. Proportion of type I and II errors made by the BIC test.