Study of Parameters in the Genetic Algorithm for the Attack on Block Ciphers

Abstract: In recent years, the use of Genetic Algorithms (GAs) in symmetric cryptography, in particular in the cryptanalysis of block ciphers, has increased. In this work, certain parameters that intervene in GAs were studied, such as the time it takes to execute a given number of iterations, so that the number of generations that can be carried out in an available time can be estimated. Accordingly, the size of the set of individuals that constitute admissible solutions for the GA can be chosen. In addition, several fitness functions were introduced, and it was analyzed which ones led to better results. The experiments were performed with the block ciphers AES(t), for t ∈ {3, 4, 7}.


Introduction
There are several methods and tools that are used as optimization methods and predictive tools. Several heuristic algorithms have been used in the context of cryptography; in [1], the Ant Colony Optimization (ACO) heuristic method was used, and a methodology was tested against the S-AES block cipher using two plaintext/ciphertext pairs. In [2], a combination of GA and ACO methods was used for the cryptanalysis of stream ciphers. In [3][4][5], the possibilities of combining and designing these analyses using machine learning and deep learning tools were shown. In [6][7][8], Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Gene-Expression Programming (GEP) were used as predictive tools in other contexts.
The Genetic Algorithm (GA) is an optimization method used in recent years in cryptography for various purposes, mainly to carry out attacks on various encryption types. Some of the research conducted in this direction is mentioned next. In [9], the authors presented a combination of the GA with particle swarm optimization (another heuristic method based on evolutionary techniques); they called their method genetic swarm optimization and applied it to attack the block cipher Data Encryption Standard (DES). Their experimental results showed that their combined method obtained better results than either method applied separately. The proposal presented in [10] provided a preliminary exploration of the use of GAs over a Substitution-Permutation Network (SPN) cipher. The purpose of the exploration was to determine how to find weak keys. Both works [9,10] used a known plaintext attack, i.e., given a plaintext T and the corresponding ciphertext C, one is interested in finding the key K. In [10], the fitness function evaluates the bitwise difference (Hamming distance) between C and the ciphertext of T obtained with a candidate key, whereas in [9], on the contrary, the Hamming distance between T and the decryption of C with a candidate key is measured. In [11], a ciphertext-only attack on simplified DES was shown, obtaining better results than brute force. The authors used a fitness function that combined the relative frequencies of monograms, digrams, and trigrams (for a particular language). Since the key length was very small, they were able to use this kind of function. The approach in [12] was similar to [11]; it used essentially the same fitness function, but with different parameters. It was also more detailed regarding the experiments and compared the results with brute force and random search. For more details on the area of cryptanalysis using GAs, see [13][14][15].
As in all evolutionary algorithms, it is always a difficulty in the GA that, as the number of individuals in the space of admissible solutions grows, in this case, the set of keys, it is necessary to perform a greater number of generations in order to obtain the best results. It is clear that the greater the number of generations, the more time the algorithm consumes, so it is important to be able to estimate the time that may be necessary to execute a certain number of desired generations. On the other hand, it is necessary to analyze fitness functions that allow obtaining better results with the fittest individuals obtained.
Symmetry is omnipresent in the universe; in particular, it is present in symmetric cryptography, where the secret key is known to both authorized parties in the communication channel, essentially by symmetry. We worked with block ciphers, an important primitive of symmetric cryptography, where the key space (the population of admissible solutions for the GA in this case) is exponentially large, making it impossible in many cases to traverse that space exhaustively.
In the present work, we followed the ideas for dividing the key space introduced in [16,17]. Both methodologies for dividing the key space allow the GA search space to be reduced to a subset of individuals. In this setting, we studied the behavior of time and introduced various fitness functions. The structure of the work is as follows. In Section 2, the general ideas of the GA and two methodologies for partitioning the key space are presented; in Section 3, several parameters of the cryptanalysis of block ciphers using the GA are studied; in Section 3.1, the time it takes to execute a certain number of iterations is analyzed, so that the number of generations that can be carried out in an available time can be estimated; and in Section 3.2, other fitness functions are proposed. Finally, Section 4 gives the conclusions.

The Genetic Algorithm
The GA is a heuristic optimization method. We assume that the reader knows the general ideas of how the GA works; see Algorithm 1. In this section, we briefly describe the GA scheme used in this work.

Algorithm 1 Genetic algorithm.
Input: m (quantity of individuals in the population), F (fitness function), g (number of generations).
Output: the individual with the highest fitness as the best solution.
1: Randomly generate an initial population P_i with m individuals (possible solutions).
2: Compute the fitness of each individual of P_i with F.
3: while the solution is not found and the g generations are not reached do
4:   Select parent pairs in P_i.
5:   Perform the crossover of the selected parents, generating a pair of offspring.
6:   Mutate each of the resulting descendants.
7:   Compute the fitness of each descendant and its mutation with F.
8:   By binary tournament on the fitness of the parents and descendants, decide the new population P_i for the next generation (selecting two individuals at random each time and keeping the one with the highest fitness).
9: end while

The individuals of the populations are elements of the key space represented as binary blocks. For crossover, two-point crossing was used, and the crossover probability was fixed at 0.6. The mutation operation consisted of interchanging the values of the bits of at most three random components of the binary block, with a mutation rate of 0.2. The values 0.6 and 0.2 were fixed for all experiments; the effect of varying these values on the behavior of the GA was not addressed in this paper. An individual x is better adapted than another y if it has greater fitness, i.e., if F(x) > F(y). Fitness functions are studied in more detail in Section 3.2. For the specification of the GA for block ciphers, see Section 3 of [16].
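The scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the fitness function is a stand-in (the cipher-based functions are discussed in Section 3.2), while the two-point crossover with probability 0.6, the mutation of at most three bits with rate 0.2, and the binary tournament follow the fixed settings described in the text.

```python
import random

def two_point_crossover(p1, p2, pc=0.6):
    """Two-point crossover with probability pc (the paper fixes pc = 0.6)."""
    if random.random() > pc or len(p1) < 3:
        return p1[:], p2[:]
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def mutate(ind, pm=0.2):
    """With rate pm, flip at most three random bit positions (paper: pm = 0.2)."""
    ind = ind[:]
    if random.random() < pm:
        for pos in random.sample(range(len(ind)), k=random.randint(1, min(3, len(ind)))):
            ind[pos] ^= 1
    return ind

def tournament(pool, fitness):
    """Binary tournament: pick two individuals at random, keep the fitter one."""
    a, b = random.sample(pool, 2)
    return a if fitness(a) >= fitness(b) else b

def genetic_algorithm(fitness, n_bits, m=100, g=200):
    """GA skeleton following Algorithm 1: init, crossover, mutation, tournament."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(m)]
    for _ in range(g):
        best = max(pop, key=fitness)
        if fitness(best) == 1.0:        # a consistent key was found
            return best
        offspring = []
        for _ in range(m // 2):
            c1, c2 = two_point_crossover(*random.sample(pop, 2))
            offspring += [mutate(c1), mutate(c2)]
        pool = pop + offspring
        pop = [tournament(pool, fitness) for _ in range(m)]
    return max(pop, key=fitness)
```

As a toy usage, a fitness that counts matching bits against a hidden target plays the role of the key-comparison functions of Section 3.2.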

Key Space Partition Methodologies
The methodologies introduced in [16,17] allow GAs to work on a certain subset of the set of admissible solutions as if it were the complete set. The importance of this fact is that it reduces the size of the search space and gives the heuristic method a greater chance of success, assuming that the fittest individuals are found in the selected subset. Let F_2^{k_1} be the key space of length k_1 ∈ Z_{>0}. It is known that F_2^{k_1} has cardinality 2^{k_1}, and therefore, there is a one-to-one correspondence between F_2^{k_1} and the range [0, 2^{k_1} − 1]. If an integer k_2 is set (1 < k_2 ≤ k_1), then each key can be represented as

K = q · 2^{k_2} + r, (1)

where q ∈ [0, 2^{k_1 − k_2} − 1] and r ∈ [0, 2^{k_2} − 1]. In this way, the key space is divided into 2^{k_1 − k_2} blocks (determined by the quotient q of the division by 2^{k_2}), and within each block, the corresponding key is determined by its position, which is given by the remainder r. The main idea is to stay in a block (given by q) and move within this block through its elements (given by r) using the GA. Note that in this methodology, first q is set to choose a block, and then r varies to move through the elements of the block; the complete key in F_2^{k_1} is obtained from Expression (1). We refer to this methodology as BBM. For more details on the connection with GAs, see [16].
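The BBM decomposition can be illustrated with a short Python sketch. The toy sizes k1 = 8 and k2 = 3 are chosen only for the example (the paper works with key lengths from 48 to 128 bits):

```python
def bbm_key(q, r, k2):
    """BBM: rebuild the full key from the block index q and the in-block
    position r, i.e., key = q * 2**k2 + r (Expression (1))."""
    return q * 2**k2 + r

# Toy sizes: k1 = 8, k2 = 3 give 2**(8 - 3) = 32 blocks of 2**3 = 8 keys each;
# the GA fixes q to choose a block and evolves r over 0 .. 2**k2 - 1.
k1, k2 = 8, 3
keys_in_block_5 = [bbm_key(5, r, k2) for r in range(2**k2)]  # keys 40 .. 47
```

Every key in [0, 2^{k_1} − 1] decomposes uniquely as (q, r), so the blocks form a partition of the key space.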
The following methodology is based on the definition of a quotient group of keys G_K, whose objective is to partition F_2^{k_1} into equivalence classes. It is known that F_2^{k_1}, as an additive group, is isomorphic to Z_{2^{k_1}}. Let h be the reduction homomorphism

h: Z_{2^{k_1}} → Z_{2^{k_2}}, h(x) = x mod 2^{k_2},

where k_2 ∈ Z_{>0} and 0 < k_2 < k_1. We denote by N the kernel of h, i.e., N = {x ∈ Z_{2^{k_1}} : h(x) = 0}. Then, by the definition of h, N is composed of the elements of Z_{2^{k_1}} that are multiples of 2^{k_2}. It is known that N is a normal (invariant) subgroup; therefore, the main objective is to compute the quotient group of Z_{2^{k_1}} by N, and in this way, the key space is divided into 2^{k_2} equivalence classes. We denote by G_K the quotient group Z_{2^{k_1}}/N. Now, N can be described explicitly, taking into account that its elements are multiples of 2^{k_2}. Taking Q = {0, 1, 2, . . . , 2^{k_1 − k_2} − 1}, we have

N = {q · 2^{k_2} : q ∈ Q}.

In this way, Z_{2^{k_1}} is partitioned into 2^{k_2} classes given by N, and G_K is called the quotient group of keys. Let

E: F_2^{k_1} × F_2^n → F_2^n (7)

be a block cipher, T a plaintext, K a key, and C the corresponding ciphertext, i.e., C = E(K, T); a key K′ is said to be consistent with E, T, and C if C = E(K′, T) (see [16]).
The idea here is also to traverse, within the total space, the elements that are in one class and then find one (or several) consistent keys of that class. To traverse the elements of each class, note that Z_{2^{k_2}} is isomorphic to G_K, with the isomorphism assigning to each r ∈ Z_{2^{k_2}} its equivalence class r + N in G_K; thus, selecting a class amounts to setting an element r ∈ Z_{2^{k_2}}. On the other hand, the elements of N are of the form q · 2^{k_2} (q ∈ Q); therefore, the elements of the class r + N are of the form

K = r + q · 2^{k_2}, q ∈ Q. (8)

Then, the problem of looping through each element of each equivalence class consists of first setting an element of Z_{2^{k_2}} and then looping through each element of the set Q, obtaining a key of G_K using Equation (8). The elements of the set Q have block length k_d = k_1 − k_2, and each class has 2^{k_d} elements. We refer to this methodology as TBB. Note that the TBB methodology is a kind of dual of the BBM methodology: one first stays in the same class (given by r) and then moves within this class through its elements (given by q) using the GA. In this case, the size of the blocks is 2^{k_d} instead of 2^{k_2}.
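The dual TBB traversal can be sketched in the same way; again, k1 = 8 and k2 = 3 are toy sizes for illustration only:

```python
def tbb_key(r, q, k2):
    """TBB: element of the class r + N, i.e., key = r + q * 2**k2
    (Equation (8)); r selects the class, q the element within it."""
    return r + q * 2**k2

# Toy sizes: k1 = 8, k2 = 3 give 2**k2 = 8 classes, each with
# 2**(k1 - k2) = 32 elements; the GA fixes r and evolves q over Q.
k1, k2 = 8, 3
kd = k1 - k2
Q = range(2**kd)
class_of_r3 = [tbb_key(3, q, k2) for q in Q]  # 3, 11, 19, ..., 251
```

Note that the keys of one TBB class are spread across the whole space in steps of 2^{k_2}, whereas a BBM block is one contiguous interval; this difference is revisited in Section 3.2.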
The main difficulty in these methodologies is the choice of k_2, since it is the parameter that determines the number of equivalence classes and, therefore, the number of elements within each. If k_2 increases in G_K, the classes have fewer elements, but there are more classes; on the contrary, if it decreases, so does the number of classes, but the number of elements of each increases. Something similar happens in the first methodology. The operations of partitioning the space and traversing the elements of each class are done in the decimal representation, and the specific operations of the GA in the binary representation. For more details, see [16,17].
In Figure 1, the relationship between the content of the subsections and the attack on block ciphers is shown in a flowchart.

Time Estimation
In GAs, the less complex operations, such as mutation and crossover, are performed within each class, where the elements have block length k_2 ≤ k_1 or k_d ≤ k_1, depending on the way of partitioning the space. However, despite the variation of these two parameters, the computation of the fitness function, which is the most complex operation within the GA, is carried out with the complete key of length k_1 (reconstructed via (1) or (8)), and not with only the part of it found in the class. This means that a variation in the number of elements in a class does not affect the cost of the fitness function. Moreover, if all the other parameters remain the same, the GA's time per generation must be quite similar, even if k_2 varies. To check this, experiments were run on a PC with an Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz (four CPUs) and 4 GB of RAM. The AES(t) ciphers were used, a parametric version of AES, where t ∈ {3, 4, 5, 6, 7, 8} and AES(8) = AES (see [18,19]). The experiment consisted of executing the GA with the BBM methodology and measuring the time (in minutes) that one generation took for different values of k_2 (keeping the other parameters fixed), and then verifying whether these data could be used to forecast the time it would take over n generations. The size of the population was m = 100 in all cases.
Tables 1-3 summarize the results corresponding to AES(3), AES(4), and AES(7), respectively. The first column has the different values that were given to k_2. The second column is the average time t_{k_2} obtained for one generation over 10 executions for each k_2. The general mean over all the k_2 values is approximately t_m = 0.0435571 minutes in Table 1, t_m = 0.0519393 in Table 2, and t_m = 0.1900297 in Table 3. The third column contains the number of generations (n_g). The real time that the algorithm takes, t_r, appears in the fourth column. The fifth column is the estimated time t_e that the run should take, computed as

t_e = t_m · n_g. (9)
Finally, the last column is the error of the prediction, E_p = |t_r − t_e|. With these experiments, we wanted to check whether, for a specific value of k_2 and a number n_g of generations, the approximate time t that the GA takes to complete those generations satisfies t ≈ t_e.
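The estimate and its error are direct to compute; the numeric figures below are the ones quoted in the text for AES(3):

```python
def estimated_time(t_m, n_g):
    """t_e = t_m * n_g: predicted running time for n_g generations,
    given the mean per-generation time t_m (in minutes)."""
    return t_m * n_g

def prediction_error(t_r, t_e):
    """E_p = |t_r - t_e|, the absolute error of the prediction."""
    return abs(t_r - t_e)

# Laptop run quoted in the text: t_m = 0.2340212 min, n_g = 215,
# measured t_r = 48.14715 min.
t_e_215 = estimated_time(0.2340212, 215)   # ~50.3146 min
error_215 = prediction_error(48.14715, t_e_215)
```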
With one generation, or very few, the average time per generation of the GA was slightly higher, decreasing and tending to stabilize toward a limit as more iterations were performed. This was due to the probabilistic functions that intervene in the GA and the set of operations needed to randomly create an initial population. Therefore, the criterion for calculating the average time t_{k_2} was to let the GA finish executing a certain number of generations, either because it found the key or because it reached the last iteration without finding it, and then compute the average. Consequently, calculating t_{k_2} over a few generations, or over a single one, would yield longer times; however, doing so would be valid if the intention were to overestimate the time the algorithm consumes. In the case of AES(7) (Table 3), we only experimented with the values 17 and 18 of k_2, since considering all the remaining (lower or higher) values would take considerably more time (given the greater strength of AES(7)). Similar results were obtained when more values of k_2 were chosen to calculate t_m. For example, using a laptop with an Intel(R) Celeron(R) CPU N3050 @ 1.60GHz (two CPUs), ~1.6 GHz, and 4 GB of RAM, and going through all the values of k_2 from 10 to 48 (the AES(3) key length), t_m = 0.2340212 was obtained. Now, for n_g = 215, we had t_r = 48.14715 and t_e = t_m · n_g ≈ 50.3145. In another test, n_g = 150 and t_r = 34.9565, while t_e ≈ 35.1032. Note that the PC used in this case had different characteristics and less computational capacity than the one used for the experiments in Tables 1-3. The interesting thing is that, under these conditions, the results were as expected as well.
In a similar way, the GA was executed with the TBB methodology for the search in G_K, for values of k_d equal to those of k_2 and different numbers of generations (n_g). It was observed that the time estimates behaved similarly to the results presented above for the BBM methodology. Note that in the AES(t) family of ciphers, the key length increases from 48 for AES(3) to 128 for AES(8); however, regardless of the key length, the same behavior was seen in all of them. These experiments also show another application of this study on time estimation. In the GA scheme with the BBM methodology, the total number of generations (iterations) to perform for a given value of k_2 is

g = ⌊2^{k_2}/m⌋. (10)

Taking n_g = g and using t_e, we can make an a priori estimate, for a given value of k_2, of the total time the GA will take to perform all the generations, or a desired percentage of them. For example, in AES(3), for k_2 = 16, Expression (10) gives g = 655; since t_m = 0.0435571 in Table 1, the approximate time the GA will consume to perform 655 generations is t_e ≈ 655 · 0.0435571 ≈ 28.5299, as can be seen in the table. Another example can be seen in Table 2, also for k_2 = 16.
On the other hand, suppose that we have an available time t_e to carry out the attack with this model; we may then use (9) and (10) to compute an approximate value of k_2, which determines the corresponding partition of the space and the number of generations to perform in that time t_e. In this sense, setting n_g = g in (9), we have

k_2 ≈ log_2(t_e · m / t_m). (11)

We remark that the above is also valid in the TBB methodology, only that k_d is used instead of k_2.
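The reconstructed Expressions (10) and (11), g = ⌊2^{k_2}/m⌋ and k_2 ≈ log_2(t_e · m / t_m), can be checked against the AES(3) example (k_2 = 16, m = 100, g = 655); the sketch below assumes that form of the two expressions:

```python
import math

def generations_for_k2(k2, m):
    """Total generations to sweep one block of 2**k2 keys with population m
    (reconstruction of Expression (10): g = floor(2**k2 / m))."""
    return 2**k2 // m

def k2_for_available_time(t_avail, t_m, m):
    """Approximate k2 so that sweeping one block fits in t_avail minutes
    (reconstruction of Expression (11): k2 ~ log2(t_avail * m / t_m))."""
    return math.log2(t_avail * m / t_m)

g = generations_for_k2(16, 100)   # 655, as in the AES(3) example
```

Conversely, feeding the ~28.53 minutes of the example back into `k2_for_available_time` with t_m = 0.0435571 recovers k_2 ≈ 16.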
As can be observed, the results on the estimation of time were favorable. In this sense, the following points can be summarized:

1. Taking into account the estimate t_e and its observed closeness to the real value t_r, the number of generations that can be carried out in an available or desired time can be estimated (using Expression (9)), which can be taken as a starting point for a proper choice of k_2, or of k_d in G_K (see Section 2). In this way, it is possible to adapt the size of the search space (choosing a proper value of k_2 using (11)) to the number of generations that can be executed in a given time.

2. The time t_{k_2} could be used to perform the time estimation for its own k_2, but as can be seen in the tables, it sometimes yields predictions with smaller errors than t_m and sometimes with greater ones. Another drawback is that it cannot be used for other values of k_2. On the contrary, the main advantage of using t_m is that it can be calculated from a few sparse values of k_2 and then be used to estimate the time even for values of this parameter whose t_{k_2} has not been calculated.

Proposal of Other Fitness Functions
In the context of the BBM and TBB methodologies used in this work with the GA, we study in this section which fitness functions provide a better response, in the sense that consistent keys are obtained as solutions on a greater percentage of occasions. Let E be a block cipher with plaintext and ciphertext length n, defined as in Expression (7), T a plaintext, K a key, and C the corresponding ciphertext, that is, C = E(K, T). Let

D: F_2^{k_1} × F_2^n → F_2^n

be the decryption function of E, such that T = D(K, C). Then, the fitness function with which we have been working, based on the Hamming distance d_H, is, for an individual X of the population,

F_1(X) = 1 − d_H(C, E(X, T))/n,

which measures the closeness between the ciphertext C and the text obtained by encrypting T with the probable key X (see [16]). A similar function is the one that measures the closeness between plaintexts:

F_2(X) = 1 − d_H(T, D(X, C))/n.

Another function that follows the idea of comparing texts in binary with d_H is the weighting of F_1 and F_2. Let α, β ∈ [0, 1] ⊂ R, with α + β = 1; then this function is defined as

F_3(X) = α F_1(X) + β F_2(X).

It is interesting to note that F_3 is more time consuming than each function separately, but the idea is to be more efficient in searching for the key.
The fitness functions proposed next are based on measuring the closeness of the plaintexts and ciphertexts in decimal. Let Y_d be the decimal conversion of the binary block Y. The first function is defined as

F_4(X) = 1 − |C_d − E(X, T)_d| / (2^n − 1).

Note that if the ciphertexts are equal, C_d = E(X, T)_d, then |C_d − E(X, T)_d| = 0, which implies F_4(X) = 1, i.e., if they are equal, the fitness function takes its highest value. On the contrary, the greatest possible difference occurs when they are as far apart as possible, i.e., C_d = 2^n − 1 and E(X, T)_d = 0, and in that case, F_4(X) = 0. The following is a weighting of the functions F_1 and F_4:

F_5(X) = α F_1(X) + β F_4(X).

Both functions have in common that they measure the closeness between ciphertexts. This is not redundant since, for example, if C and E(X, T) differ in two bits, the function F_1 will always take the same value no matter which two bits they are; on the contrary, F_4 changes depending on whether the bits are more or less significant, since the numbers are not the same in their decimal representation. The following function measures the closeness in decimal of the plaintexts:

F_6(X) = 1 − |T_d − D(X, C)_d| / (2^n − 1).

Finally, the functions F_7, F_8, and F_9 are defined with respect to the previous ones as

F_7(X) = α F_2(X) + β F_6(X),
F_8(X) = α F_4(X) + β F_6(X),
F_9(X) = α_1 F_1(X) + α_2 F_2(X) + α_3 F_4(X) + α_4 F_6(X),

where α_i ∈ [0, 1] ⊂ R, i ∈ {1, 2, 3, 4}, and ∑_{i=1}^{4} α_i = 1. This guarantees that, in general, each F_j(X) ∈ [0, 1] ⊂ R, j ∈ {1, . . . , 9}. The idea behind the introduction of these functions lies mainly in the fact that there are changes that the Hamming distance does not detect, as opposed to the decimal distance. For example, suppose the key is a = (1, 1, 1, 1, 1, 1)_2 and the possible key is b = (0, 0, 0, 0, 0, 1)_2, both in binary. The Hamming distance is five, and the distance in decimal is 62, since a = 63 and b = 1; the fitness functions take the values 1 − 5/6 ≈ 0.17 for the binary version and 1 − 62/63 ≈ 0.016 for the decimal version.
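The binary versus decimal contrast in the example can be checked with a short sketch. These are F_1-style and F_4-style functions operating directly on blocks represented as integers; the function names are ours, not the paper's:

```python
def hamming_fitness(c_ref, c_cand, n):
    """Binary fitness (F1-style): 1 - d_H(c_ref, c_cand)/n,
    with the n-bit blocks given as integers."""
    return 1 - bin(c_ref ^ c_cand).count("1") / n

def decimal_fitness(c_ref, c_cand, n):
    """Decimal fitness (F4-style): 1 - |c_ref - c_cand| / (2**n - 1)."""
    return 1 - abs(c_ref - c_cand) / (2**n - 1)

# The worked example from the text: a = (1,1,1,1,1,1)_2 = 63 vs
# b = (0,0,0,0,0,1)_2 = 1, with n = 6 bits.
a, b = 0b111111, 0b000001
f_bin = hamming_fitness(a, b, 6)   # 1 - 5/6  ~ 0.17
f_dec = decimal_fitness(a, b, 6)   # 1 - 62/63 ~ 0.016
```

Moving the single set bit of b (b = 8, b = 32) leaves `hamming_fitness` unchanged at 1 − 5/6, while `decimal_fitness` changes each time, which is exactly the point made in the text.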
Now, if b = (0, 0, 1, 0, 0, 0)_2, the binary fitness function is still 0.17, since there are still five differing bits; on the other hand, b = 8, so the decimal fitness function takes the value 1 − 55/63 ≈ 0.13. Finally, if we take b = (1, 0, 0, 0, 0, 0)_2 = 32, the distance in binary keeps the same value, but the decimal one continues to change, and with it the fitness function, which takes the value 0.49. This shows that a change of b is always detected by the decimal distance, unlike the binary distance, which remains the same under certain changes. AES(3) encryption attack experiments were carried out for the two methodologies of partitioning the key space in order to compare these functions. The main idea is to find the key, and not to perform a component-wise percentage match analysis between keys, for which the fitness functions with the Hamming distance would be more useful. A PC with an Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz (four CPUs) and 4 GB of RAM was used. For the results, we took into account the average time it took to find the key, the average number of generations in which it was found, the percentage of failures (over many attacks carried out), and a parameter called efficiency, E_{F_i}, defined as a weighting of the three previous criteria.

Definition 1 (Fitness functions' efficiency). Let μ_1, μ_2, μ_3 ∈ [0, 1] ⊂ R with μ_1 + μ_2 + μ_3 = 1; let t_{F_i}, i = 1, . . . , k, be the average time it takes the GA to find the key with F_i over g_{F_i} generations on average; and let p_{F_i} be the percentage of attempts in which the GA did not find the key with F_i. Then, the efficiency E_{F_i} of the fitness function F_i with respect to the other k − 1 functions F_j, j ≠ i, is defined as

E_{F_i} = μ_1 (1 − t_{F_i} / ∑_{j=1}^{k} t_{F_j}) + μ_2 (1 − g_{F_i} / ∑_{j=1}^{k} g_{F_j}) + μ_3 (1 − p_{F_i} / ∑_{j=1}^{k} p_{F_j}).

Note that the number of generations and the failure percentage are inversely proportional to the efficiency E_{F_i}: the higher these parameters, the lower the efficiency of the fitness function. Table 4 presents the results of the comparison of the different fitness functions for the BBM space partitioning methodology; in this case, k = 9. We took α = β = 0.5 and each α_i = 0.25. To calculate E_{F_i}, the values μ_1 = 0.33, μ_2 = 0.33, and μ_3 = 0.34 were taken for t_{F_i}, g_{F_i}, and p_{F_i}, respectively. Sorting the F_i with respect to efficiency, the first five are F_6, F_8, F_4, F_5, and F_2. It is noteworthy that, of the first three functions, which use only the Hamming distance, only F_2 appears. For the comparison of these functions under the TBB methodology of partitioning the key space and searching in G_K, the experimental results are presented in Table 5. In this case, ordering the functions by their efficiency, the first five are F_1, F_4, F_5, F_8, and F_6. Again, a single function from the first three appears, in this case F_1, and the others repeat. Note in particular that F_8 (the weighting of the decimal functions) is better than F_3 (the weighting of the binary functions) in each of the measured parameters in both methodologies. It is interesting to see what happens if the values of the weights are changed in the functions F_5, F_7, and F_9, which combine functions with decimal and binary distances, keeping μ_1, μ_2, and μ_3 fixed for the calculation of E_{F_i}.
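The efficiency of Definition 1 can be sketched in a few lines. Since the displayed formula is not fully recoverable from the text, the normalization used here (each criterion divided by its sum over the k functions) is an assumption; it matches the stated properties (weights summing to 1, efficiency decreasing as time, generations, or failures grow, and dependence on k):

```python
def efficiency(times, gens, fails, i, mu=(0.33, 0.33, 0.34)):
    """Efficiency E_Fi of fitness function i among k functions.
    ASSUMED form: each criterion (mean time, mean generations, failure
    percentage) is normalized by its sum over all k functions, so larger
    values lower the efficiency. The paper's exact normalization may differ."""
    def norm(x, xs):
        s = sum(xs)
        return 0.0 if s == 0 else x / s
    mu1, mu2, mu3 = mu
    return (mu1 * (1 - norm(times[i], times))
            + mu2 * (1 - norm(gens[i], gens))
            + mu3 * (1 - norm(fails[i], fails)))

# Toy data for two hypothetical functions: the faster, more reliable one wins.
times, gens, fails = [1.0, 2.0], [10, 30], [0.1, 0.3]
e0 = efficiency(times, gens, fails, 0)
e1 = efficiency(times, gens, fails, 1)
```

Under this form, recalculating E_{F_i} over a different set of functions (as done later for k = 15) simply changes the sums in the denominators.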
In this sense, in the following group of experiments, the weights 0.2 and 0.8 were assigned as follows for each methodology. First, in each of these three functions, the binary subfunctions were favored, taking α = 0.8, β = 0.2 (in F_5 and F_7) and α_1 = α_2 = 0.4, α_3 = α_4 = 0.1 (in F_9; note that this function has two subfunctions with the binary distance and two with the decimal one); in this case, we denote the functions F_5b, F_7b, and F_9b. Then, the order of these same weights was reversed, giving the largest ones to the subfunctions with the decimal distance; we denote the functions for this case F_5d, F_7d, and F_9d.
For the BBM methodology, the results are presented in Table 6. Note that, according to E_{F_i}, the first is F_7d, followed by F_5d and F_9d. In Figure 2, these results are compared, according to E_{F_i}, with those of Table 4, also including the values of F_5, F_7, and F_9. Sorting the functions according to their efficiency, the first five are F_7d, F_6, F_8, F_5d, and F_9d. Notice how the best results prevail in the functions with the decimal distance. In this sense, F_7 and F_9 (now as F_7d and F_9d) join the leading group, together with three functions that were already in it in the previous experiments: F_5 (as F_5d), F_6, and F_8. In the case of the TBB methodology, the results are presented in Table 7. According to efficiency, the first is F_7b, followed by F_5d and F_5b. In Figure 3, these results are compared with those of all the functions of Table 5. The first five are now F_1, F_5, F_4, F_7b, and F_5d; notice how the functions that contain the decimal distance, alone or combined with the binary one, prevail. The experiments verify the better global behavior of the functions with the decimal distance, specifically in the BBM methodology, where the keys are grouped into intervals according to their decimal position in the space, contrary to the other methodology, where the keys of each class are spread throughout the space. Note that when comparing Figures 2 and 3, the values of E_{F_i} in the tables are not compared directly; rather, it is necessary to recalculate E_{F_i} taking into account that there are 15 functions, i.e., E_{F_{δ_i}}, where δ_i ∈ {1, . . . , 9, 5b, 5d, 7b, 7d, 9b, 9d}, i = 1, . . . , k, and k = 15.

Conclusions
In this article, various aspects of some parameters of the GA for the attack on block ciphers were studied. In the first place, a way of estimating the time the GA takes over a given number of generations was proposed, based on the average time the algorithm takes in one generation. This study is important for jointly evaluating different parameters and making the best decisions according to the computational capacity, the available time, and an adequate selection of the size of the search space when using the BBM and TBB methodologies. On the other hand, several fitness functions were proposed, with favorable experimental results with respect to the fitness functions using only the Hamming distance. In this sense, it was found that the fitness functions that use the decimal distance are, in general, more efficient than those that use only the Hamming distance, especially in the BBM methodology.
As future work, several directions are possible. Similar studies can be carried out with the GA using other parameters, such as varying the crossover probability and the mutation rate, and making comparisons regarding the success rate of the method. It is also recommended to explore other heuristic techniques and to evaluate other space partitioning methods under which the GA operations remain closed on the subsets. In the same way, it is recommended to investigate the combined use of the GA with other tools such as machine learning, deep learning, ANNs, SVMs, and GEP.