Study of Parameters in the Genetic Algorithm for the Attack on Block Ciphers

Tito-Corrioso, Osmani; Borges-Trenard, Miguel Angel; Borges-Quintana, Mijail; Rojas, Omar; Sosa-Gómez, Guillermo

doi:10.3390/sym13050806


by Osmani Tito-Corrioso ¹,*, Miguel Angel Borges-Trenard ², Mijail Borges-Quintana ³, Omar Rojas ⁴ and Guillermo Sosa-Gómez ⁴,*

¹ Departamento de Matemática, Facultad de Ciencias de la Educación, Universidad de Guantánamo, Av. Che Guevara km 1.5 Carr. Jamaica, Guantánamo 95100, Cuba

² Doctorate in Mathematics Education, Universidad Antonio Nariño, Bogotá 111321, Colombia

³ Departamento de Matemática, Facultad de Ciencias Naturales y Exactas, Universidad de Oriente, Av. Patricio Lumumba s/n, Santiago de Cuba 90500, Cuba

⁴ Facultad de Ciencias Económicas y Empresariales, Universidad Panamericana, Álvaro del Portillo 49, Zapopan, Jalisco 45010, Mexico

* Authors to whom correspondence should be addressed.

Symmetry 2021, 13(5), 806; https://doi.org/10.3390/sym13050806

Submission received: 8 April 2021 / Revised: 21 April 2021 / Accepted: 27 April 2021 / Published: 5 May 2021

(This article belongs to the Special Issue Theoretical Computer Science and Discrete Mathematics)


Abstract: In recent years, the use of Genetic Algorithms (GAs) in symmetric cryptography, in particular in the cryptanalysis of block ciphers, has increased. This work studies several parameters that intervene in GAs. The first is the time needed to execute a given number of iterations, from which the number of generations that fit in an available time can be estimated and, accordingly, the size of the set of individuals that constitute admissible solutions for the GA can be chosen. In addition, several fitness functions are introduced, and those that lead to better results are identified. The experiments were performed with the block ciphers AES(t), for $t \in \{3, 4, 7\}$.

Keywords: genetic algorithm; cryptanalysis; AES(t); optimization; heuristics

1. Introduction

Many methods and tools serve both as optimization methods and as predictive tools, and several heuristic algorithms have been used in the context of cryptography. In [1], the Ant Colony Optimization (ACO) heuristic was used, and a methodology was tested on the S-AES block cipher using two plaintext-ciphertext pairs. In [2], a combination of the GA and ACO methods was used for the cryptanalysis of stream ciphers. In [3,4,5], the possibilities of combining and designing these analyses using machine learning and deep learning tools were shown. In [6,7,8], Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Gene-Expression Programming (GEP) were used as predictive tools in other contexts.

The Genetic Algorithm (GA) is an optimization method that has been used in cryptography in recent years for various purposes, mainly to carry out attacks on various types of ciphers. Some of the research conducted in this direction is mentioned next. In [9], the authors presented a combination of the GA with particle swarm optimization (another heuristic method based on evolutionary techniques); they called their method genetic swarm optimization and applied it to attack the block cipher Data Encryption Standard (DES). Their experimental results showed that the combined method performed better than either method used separately. The proposal presented in [10] provided a preliminary exploration of the use of the GA on a Substitution-Permutation Network (SPN) cipher. The purpose of the exploration was to determine how to find weak keys. Both works [9,10] used a known-plaintext attack, i.e., given a plaintext T and the corresponding ciphertext C, one is interested in finding the key K. In [10], the fitness function evaluates the bitwise difference (Hamming distance) between C and the encryption of T under a candidate key, whereas in [9] the Hamming distance between T and the decryption of C under a candidate key is measured. In [11], a ciphertext-only attack on simplified DES was shown, obtaining better results than brute force. The authors used a fitness function that combined the relative frequencies of monograms, digrams, and trigrams (for a particular language). They were able to use this kind of function because the key length was very small. The approach in [12] was similar to that of [11]; it used essentially the same fitness function, but with different parameters. It also described the experiments in more detail and compared them with brute force and random search. For more details on the area of cryptanalysis using GAs, see [13,14,15].

As in all evolutionary algorithms, a recurring difficulty in the GA is that, as the number of individuals in the space of admissible solutions (in this case, the set of keys) grows, a greater number of generations is needed to obtain the best results. The greater the number of generations, the more time the algorithm consumes, so it is important to be able to estimate the time needed to execute a desired number of generations. It is also necessary to analyze which fitness functions lead to better results, i.e., to fitter individuals.

Symmetry is omnipresent in the universe; in particular, it is present in symmetric cryptography, where the secret key is known to both authorized parties in the communication channel, essentially by symmetry. We worked with block ciphers, an important primitive of symmetric cryptography, where the key space (the population of admissible solutions for the GA in this case) is exponentially large, making it impossible in many cases to traverse that space completely.

The present work follows the ideas for dividing the key space that were introduced in [16,17]. Both methodologies for dividing the key space allow the GA search to be restricted to a subset of individuals. In this setting, we studied the behavior of the running time and the introduction of various fitness functions. The structure of the work is as follows. In Section 2, the general ideas of the GA and two methodologies for partitioning the key space are presented; in Section 3, several parameters of the cryptanalysis of block ciphers using the GA are studied; in Section 3.1, the time it takes to execute a certain number of iterations is analyzed, so that the number of generations that can be carried out in an available time can be estimated; and in Section 3.2, other fitness functions are proposed. Finally, Section 4 gives the conclusions.

2. Preliminaries

2.1. The Genetic Algorithm

The GA is a heuristic optimization method. We assume that the reader knows the general ideas of how the GA works; see Algorithm 1. In this section, we briefly describe the GA scheme used in this work.

Algorithm 1. Genetic algorithm.

Input: m (number of individuals in the population), F (fitness function), g (number of generations).
Output: the individual with the highest fitness as the best solution.
1: Randomly generate an initial population $P_{i}$ with m individuals (possible solutions).
2: Compute the fitness of each individual of $P_{i}$ with F.
3: while the solution has not been found and the g generations have not been reached do
4: Select parent pairs in $P_{i}$.
5: Perform the crossover of the selected parents, and generate a pair of offspring.
6: Mutate each of the resulting descendants.
7: Compute with F the fitness of each of the descendants and of their mutations.
8: By the tournament method between two, based on the fitness of the parents and descendants, decide the new population $P_{i}$ for the next generation (selecting two individuals at random each time and choosing the one with the highest fitness).
9: end while

The individuals of the populations are elements of the key space taken as binary blocks. For Crossover, two-point crossover was used, and the crossover probability was fixed to $0.6$. The Mutate operation consisted of interchanging the values of the bits of at most three random components of the binary block, with a mutation rate of $0.2$. The values $0.6$ and $0.2$ were fixed for all experiments; the effect of varying them on the behavior of the GA is not addressed in this paper. An individual x is better adapted than another y if it has greater fitness, i.e., if $F(x) > F(y)$. Fitness functions are studied in more detail in Section 3.2. For the specification of the GA to block ciphers, see Section 3 of [16].
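The scheme of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: individuals are lists of bits, "interchanging the values of the bits" is read here as flipping between one and three randomly chosen bits, parents are paired by shuffling the population, and the fitness is assumed to be normalized so that the value 1.0 signals a solution. All function names are hypothetical.

```python
import random

def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points of two equal-length bit lists."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(bits, rng, rate=0.2, max_flips=3):
    """With probability `rate`, flip between 1 and `max_flips` randomly chosen bits."""
    bits = bits[:]
    if rng.random() < rate:
        count = rng.randint(1, min(max_flips, len(bits)))
        for pos in rng.sample(range(len(bits)), count):
            bits[pos] ^= 1
    return bits

def tournament(pool, fitness, size, rng):
    """Binary tournament: draw two individuals at random and keep the fitter one."""
    return [max(rng.sample(pool, 2), key=fitness) for _ in range(size)]

def genetic_algorithm(fitness, length, m=100, g=50, p_cross=0.6, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(m)]
    for _ in range(g):
        best = max(population, key=fitness)
        if fitness(best) == 1.0:          # assumption: fitness 1.0 means "solution found"
            return best
        parents = population[:]
        rng.shuffle(parents)              # assumption: parents are paired at random
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_cross:
                a, b = two_point_crossover(a, b, rng)
            offspring += [mutate(a, rng, p_mut), mutate(b, rng, p_mut)]
        population = tournament(population + offspring, fitness, m, rng)
    return max(population, key=fitness)
```

For instance, `genetic_algorithm(lambda x: sum(x) / len(x), length=16)` searches for the all-ones block under a toy fitness. In the attack setting, the fitness would instead compare ciphertexts or plaintexts obtained with the candidate key.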

2.2. Key Space Partition Methodologies

The methodologies introduced in [16,17] allow GAs to work on a certain subset of the set of admissible solutions as if it were the complete set. This matters because it reduces the size of the search space and gives the heuristic method a greater chance of success, assuming that the most suitable individuals are found in the selected subset. Let $\mathbb{F}_{2}^{k_{1}}$ be the key space of length $k_{1} \in \mathbb{Z}_{> 0}$. It is known that $\mathbb{F}_{2}^{k_{1}}$ has cardinality $2^{k_{1}}$, and therefore, there is a one-to-one correspondence between $\mathbb{F}_{2}^{k_{1}}$ and the range $[0, 2^{k_{1}} - 1]$. If an integer $k_{2}$ is set ($1 < k_{2} \leq k_{1}$), then the key space can be represented by the numbers

$$q 2^{k_{2}} + r, \tag{1}$$

where $q \in [0, 2^{k_{1} - k_{2}} - 1]$ and $r \in [0, 2^{k_{2}} - 1]$. In this way, the key space is divided into $2^{k_{1} - k_{2}}$ blocks (determined by the quotient in the division algorithm when dividing by $2^{k_{2}}$), and within each block, the corresponding key is determined by its position, which is given by the remainder r. The main idea is to stay in a block (given by q) and move within this block through the elements (given by r) using the GA. Note that in this methodology q is first fixed to choose a block, and then r varies to move through the elements of the block; the complete key in $\mathbb{F}_{2}^{k_{1}}$ is, however, obtained from Expression (1). We refer to this methodology as BBM. For more details on the connection with GAs, see [16].

The second methodology is based on the definition of the quotient group of keys $G_{K}$, whose objective is to partition $\mathbb{F}_{2}^{k_{1}}$ into equivalence classes. Identifying each binary block with the integer it represents puts $\mathbb{F}_{2}^{k_{1}}$ in one-to-one correspondence with $\mathbb{Z}_{2^{k_{1}}}$, whose additive group structure is used in what follows. Let h be the homomorphism defined as follows:

$$\begin{array}{rcl} h : \mathbb{Z}_{2^{k_{1}}} & \longrightarrow & \mathbb{Z}_{2^{k_{2}}} \\ n & \longmapsto & n \pmod{2^{k_{2}}}, \end{array} \tag{2}$$

where $k_{2} \in \mathbb{Z}_{> 0}$ and $0 < k_{2} < k_{1}$. We denote by N the kernel of h, i.e.,

$$N = \{x \in \mathbb{Z}_{2^{k_{1}}} \mid h(x) = 0 \in \mathbb{Z}_{2^{k_{2}}}\}. \tag{3}$$

Then, by the definition of h, N is composed of the elements of $\mathbb{Z}_{2^{k_{1}}}$ that are multiples of $2^{k_{2}}$. It is known that N is an invariant subgroup; therefore, the main objective is to calculate the quotient group of $\mathbb{Z}_{2^{k_{1}}}$ by N, and in this way, the key space is divided into $2^{k_{2}}$ equivalence classes. We denote by $G_{K}$ the quotient group of $\mathbb{Z}_{2^{k_{1}}}$ by N ($G_{K} = \mathbb{Z}_{2^{k_{1}}} / N$). By Lagrange's theorem, $o(G_{K}) = o(\mathbb{Z}_{2^{k_{1}}}) / o(N)$, but $o(G_{K}) = o(\mathbb{Z}_{2^{k_{2}}}) = 2^{k_{2}}$; hence,

$$o(N) = o(\mathbb{Z}_{2^{k_{1}}}) / o(\mathbb{Z}_{2^{k_{2}}}) = 2^{k_{1} - k_{2}}. \tag{4}$$

Now, N can be described by taking into account that its elements are multiples of $2^{k_{2}}$. For this, we take $Q = \{0, 1, 2, \dots, 2^{k_{1} - k_{2}} - 1\}$; then:

$$\begin{aligned} N &= \langle 2^{k_{2}} \rangle = \{x \in \mathbb{Z}_{2^{k_{1}}} \mid \exists q \in Q,\ x = q 2^{k_{2}}\} \\ &= \{0, 2^{k_{2}}, 2 \cdot 2^{k_{2}}, 3 \cdot 2^{k_{2}}, \dots, (2^{k_{1} - k_{2}} - 1) \cdot 2^{k_{2}}\}. \end{aligned} \tag{5}$$

On the other hand,

$$G_{K} = \{N, 1 + N, 2 + N, \dots, (2^{k_{2}} - 2) + N, (2^{k_{2}} - 1) + N\}. \tag{6}$$

In this way, $\mathbb{Z}_{2^{k_{1}}}$ is divided into a partition of $2^{k_{2}}$ classes given by N. $G_{K}$ is called the quotient group of keys. Let

$$E : \{0, 1\}^{m} \times \{0, 1\}^{n} \to \{0, 1\}^{n}, \quad m, n \in \mathbb{N},\ m \geq n, \tag{7}$$

be a block cipher, T a plaintext, K a key, and C the corresponding ciphertext, i.e., $C = E(K, T)$; $K^{'}$ is said to be a key consistent with E, T, and C if $C = E(K^{'}, T)$ (see [16]). The idea here is also to go through, out of the total space, only the elements that are in one class and then find one (or several) consistent keys in that class. To be able to go through the elements of each class, note that $\mathbb{Z}_{2^{k_{2}}}$ is isomorphic to $G_{K}$, and the isomorphism maps each $r \in \mathbb{Z}_{2^{k_{2}}}$ to its equivalence class $r + N$ in $G_{K}$; thus, selecting a class amounts to fixing an element $r \in \mathbb{Z}_{2^{k_{2}}}$. On the other hand, the elements of N are of the form $q 2^{k_{2}}$ ($q \in Q$); therefore, the elements of the class $r + N$ are of the form

$$q 2^{k_{2}} + r, \quad q \in Q. \tag{8}$$

Then, the problem of looping through the elements of an equivalence class consists of first fixing an element of $\mathbb{Z}_{2^{k_{2}}}$ and then looping through each element of the set Q to find a key of $G_{K}$ using Equation (8). The elements of the set Q have block length $k_{d} = k_{1} - k_{2}$, and each class has $2^{k_{d}}$ elements. We refer to this methodology as TBB. Note that the TBB methodology is a kind of dual of the BBM methodology, i.e., one first stays in the same class (given by r) and then moves within this class through the elements (given by q) using the GA. In this case, the size of the blocks is $2^{k_{d}}$ instead of $2^{k_{2}}$.
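The class $r + N$ of Expression (8) can be enumerated directly. The sketch below is illustrative only, with hypothetical function names and keys represented as integers; it lists one class and recovers the class of a given key through the homomorphism h.

```python
def tbb_class(r, k1, k2):
    """Yield the elements of the class r + N, i.e. q * 2**k2 + r for every q in Q."""
    for q in range(2 ** (k1 - k2)):
        yield q * 2 ** k2 + r

def class_of(key, k2):
    """Representative r of the class of a key: its image under h, key mod 2**k2."""
    return key % 2 ** k2
```

With the toy sizes $k_{1} = 6$ and $k_{2} = 4$, `list(tbb_class(5, 6, 4))` is `[5, 21, 37, 53]`: the $2^{k_{d}} = 4$ elements of the class are spread over the whole key space at a stride of $2^{k_{2}}$, whereas a BBM block is an interval of consecutive keys.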

The main difficulty in these methodologies is the choice of $k_{2}$, since it is the parameter that determines the number of equivalence classes and, therefore, the number of elements within them. If $k_{2}$ increases in $G_{K}$, there are more classes, each with fewer elements; conversely, if it decreases, so does the number of classes, but the number of elements in each increases. Something similar happens in the first methodology. The operations of partitioning the space and going through the elements of each class are done with the decimal representation, and the specific operations of the GA with the binary representation. For more details, see [16,17].

In Figure 1, the relationship of the content by subsections and the attack on block ciphers are shown in a flowchart.

3. Study of Parameters in the GA

3.1. Time Estimation

In GAs, the less complex operations, such as mutation and crossover, are performed within each class, where the elements have block length $k_{2} \leq k_{1}$ or $k_{d} \leq k_{1}$, depending on the way of partitioning the space. However, regardless of these two parameters, the fitness function, which is the most complex function within the GA, is computed using (8), i.e., with the complete key of length $k_{1}$, and not with the part of it found in the class. This means that a variation in the number of elements in a class does not affect the cost of the fitness function. Moreover, if all the other parameters remain the same, the GA's time per generation should be quite similar, even if $k_{2}$ varies. To check this, experiments were done on a PC with an Intel(R) Core(TM) i3-4160 CPU @ 3.60 GHz (four CPUs) and 4 GB of RAM. The cipher used was AES(t), a parametric version of AES, where $t \in \{3, 4, 5, 6, 7, 8\}$ and AES(8) = AES (see [18,19]). The experiment consisted of executing the GA with the BBM methodology and measuring the time (in minutes) taken per generation for different values of $k_{2}$ (keeping the other parameters fixed), and then verifying whether these data could be used to forecast the time it would take for n generations. The population size was $m = 100$ in all cases.

Table 1, Table 2 and Table 3 summarize the results corresponding to AES(3), AES(4), and AES(7), respectively. The first column contains the different values given to $k_{2}$. The second column is the average time $t_{k_{2}}$ obtained for one generation over 10 executions for each $k_{2}$. The general mean over all the $k_{2}$ values is approximately $t_{m} = 0.0435571$ minutes in Table 1, $t_{m} = 0.0519393$ in Table 2, and $t_{m} = 0.1900297$ in Table 3. The third column gives the number of generations ($n_{g}$). The real time that the algorithm takes, $t_{r}$, appears in the fourth column. The fifth column is the estimated time, $t_{e}$, that the algorithm should take, calculated as:

$$t_{e} = t_{m} n_{g}. \tag{9}$$

Finally, the last column is the prediction error,

$$E_{p} = | t_{r} - t_{e} |.$$

With these experiments, we wanted to check whether, for a specific value of $k_{2}$ and $n_{g}$ generations, the approximate time t that the GA would take to complete those generations was

$$t \approx t_{e}.$$

With one generation, or very few, the average time per generation was slightly longer; it decreased and tended to stabilize at a limit as more iterations were performed. This was due to the probabilistic functions that intervene in the GA and to the set of operations that randomly create the initial population. Therefore, the criterion for calculating the average time $t_{k_{2}}$ was to let the GA finish executing a certain number of generations, either because it found the key or because it reached the last iteration without finding it, and then calculate the average. Calculating $t_{k_{2}}$ over a few generations, or over a single one, would yield longer times; doing so would nevertheless be valid if the intention were to overestimate the time that the algorithm consumes.

In the case of AES(7) (Table 3), we only experimented with the values 17 and 18 of $k_{2}$, since considering all the previous (or higher) values would take considerably longer (given the greater strength of AES(7)).

Similar results were obtained when more values of $k_{2}$ were chosen to calculate $t_{m}$. For example, using a laptop with an Intel(R) Celeron(R) CPU N3050 @ 1.60 GHz (two CPUs) and 4 GB of RAM, and going through all the values of $k_{2}$ from 10 to 48 (the AES(3) key length), $t_{m} = 0.2340212$ was obtained. For $n_{g} = 215$, we had $t_{r} = 48.14715$ and $t_{e} = t_{m} n_{g} \approx 50.3145$. In another test, $n_{g} = 150$ and $t_{r} = 34.9565$, so $t_{e} \approx 35.1032$. Note that the PC used in this case had different characteristics and less computational capacity than the one used for the experiments in Table 1, Table 2 and Table 3. Notably, the results were as expected under these conditions as well.
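The estimate of Expression (9) and the prediction error $E_{p}$ can be reproduced for the two laptop runs quoted above. This is a minimal sketch; the function names are hypothetical.

```python
def estimated_time(t_m, n_g):
    """Expression (9): estimated time (in minutes) for n_g generations."""
    return t_m * n_g

def prediction_error(t_r, t_e):
    """Prediction error E_p = |t_r - t_e|."""
    return abs(t_r - t_e)

# Laptop figures quoted in the text: t_m = 0.2340212 minutes per generation.
for n_g, t_r in ((215, 48.14715), (150, 34.9565)):
    t_e = estimated_time(0.2340212, n_g)
    print(n_g, round(t_e, 4), round(prediction_error(t_r, t_e), 4))
```

The resulting errors are roughly 2.17 and 0.15 minutes, i.e., a few percent of the measured times.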

In a similar way, the GA was executed with the TBB methodology for the search in $G_{K}$, for values of $k_{d}$ equal to those of $k_{2}$ and different numbers of generations ($n_{g}$). The time estimates behaved similarly to the results presented previously for the BBM methodology. Note that in the AES(t) family of ciphers, the key length increases from 48 for AES(3) to 128 for AES(8); however, regardless of the key length, the same behavior was seen in all of them.

These experiments also suggest another application of this study of time estimation. In the GA scheme with the BBM methodology, the total number of generations (iterations) to perform for a given value of $k_{2}$ is:

$$g = \left\lfloor \frac{2^{k_{2}}}{m} \right\rfloor. \tag{10}$$

Taking $n_{g} = g$ and using $t_{e}$, we can estimate a priori, for a given value of $k_{2}$, the total time the GA will take to perform all the generations or a desired percentage of them. For example, in AES(3), for $k_{2} = 16$, Expression (10) gives $g = 655$; since $t_{m} = 0.0435571$ in Table 1, the approximate time that the GA will consume to perform 655 generations is $t_{e} \approx 655 \cdot 0.0435571 \approx 28.5299$, as can be seen in the table. Another example can be seen in Table 2, also for $k_{2} = 16$.
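The worked example for $k_{2} = 16$ can be checked with a short sketch of Expressions (9) and (10). The function names and the `fraction` parameter (the desired share of the generations) are hypothetical, introduced only for illustration.

```python
def total_generations(k2, m):
    """Expression (10): floor(2**k2 / m) generations for population size m."""
    return 2 ** k2 // m

def estimated_total_time(k2, m, t_m, fraction=1.0):
    """Expression (9) applied to a fraction of the generations given by (10)."""
    return t_m * int(fraction * total_generations(k2, m))
```

Here `total_generations(16, 100)` is 655, and `estimated_total_time(16, 100, 0.0435571)` is about 28.53 minutes, in agreement with the value computed in the text.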

On the other hand, suppose we have an available time $t_{e}$ to carry out the attack with this model. We may then use (9) and (10) to compute an approximate value of $k_{2}$, which determines the corresponding partition of the space and the number of generations to perform in this time. Setting $n_{g} = g$ in (9), we have:

$$k_{2} \approx \left\lfloor \log_{2} \frac{m\, t_{e}}{t_{m}} \right\rfloor. \tag{11}$$

We remark that the above is also valid in the TBB methodology, with $k_{d}$ used instead of $k_{2}$.
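Expression (11) follows from substituting $g \approx 2^{k_{2}} / m$ into $t_{e} = t_{m} g$ and solving for $k_{2}$. A minimal sketch of the computation, with a hypothetical function name, is:

```python
import math

def k2_for_time(t_e, t_m, m):
    """Expression (11): largest k2 whose generations fit in the available time t_e."""
    return math.floor(math.log2(m * t_e / t_m))
```

With the AES(3) mean $t_{m} = 0.0435571$ and $m = 100$, a budget of 30 minutes gives $k_{2} = 16$ and a budget of 60 minutes gives $k_{2} = 17$. Because Expression (10) discards the fractional part of $2^{k_{2}} / m$, a budget of exactly 28.53 minutes (the estimate for 655 generations) falls just short of the threshold and yields 15, so the result is best read as a conservative starting point.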

As can be observed, the results on the estimation of time were favorable. In this sense, the following points can be summarized:

Taking into account the time estimate $t_{e}$ and its observed closeness to the real value $t_{r}$, the number of generations that can be carried out in an available or desired time can be estimated (using Expression (9)), which can be taken as a starting point for the proper choice of $k_{2}$, or of $k_{d}$ in $G_{K}$ (see Section 2). In this way, it is possible to adapt the size of the search space (choosing a proper value of $k_{2}$ using (11)) to the number of generations that are estimated to be executable in a given time.
The time $t_{k_{2}}$ could be used to estimate the time for its own $k_{2}$, but as can be seen in the tables, its predictions sometimes have smaller errors and sometimes larger errors than those made with $t_{m}$. Another drawback is that it cannot be used for other values of $k_{2}$. In contrast, the main advantage of $t_{m}$ is that it can be calculated from a few sparse values of $k_{2}$ and then used to estimate the time even for values of this parameter whose $t_{k_{2}}$ has not been calculated.

3.2. Proposal of Other Fitness Functions

In the context of the BBM and TBB methodologies used in this work with the GA, this section studies which fitness functions give a better response, in the sense that consistent keys are obtained as solutions in a greater percentage of cases. Let E be a block cipher with plaintext and ciphertext length n, defined as in Expression (7), T a plaintext, K a key, and C the corresponding ciphertext, that is, $C = E(K, T)$. Let:

$$D : \{0, 1\}^{m} \times \{0, 1\}^{n} \to \{0, 1\}^{n} \tag{12}$$

be the decryption function of E, such that $T = D(K, C)$. Then, the fitness function with which we have been working, based on the Hamming distance $d_{H}$, for a certain individual X of the population, is:

$$F_{1}(X) = \frac{n - d_{H}(C, E(X, T))}{n}, \tag{13}$$

which measures the closeness between the ciphertext C and the text obtained by encrypting T with the candidate key X (see [16]). A similar function is the one that measures the closeness between plaintexts:

$$F_{2}(X) = \frac{n - d_{H}(T, D(X, C))}{n}. \tag{14}$$

Another function that follows the idea of comparing texts in binary with $d_{H}$ is the weighting of $F_{1}$ and $F_{2}$. Let $\alpha, \beta \in [0, 1] \subset \mathbb{R}$ such that $\alpha + \beta = 1$; then this function is defined as follows:

$$F_{3}(X) = \alpha F_{1}(X) + \beta F_{2}(X). \tag{15}$$

Note that $F_{3}$ is more time consuming than each function separately, but the idea is to make the search for the key more efficient.
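The Hamming-based functions $F_{1}$, $F_{2}$, and $F_{3}$ can be sketched as follows. Since AES(t) is not reproduced here, the example uses a deliberately simple, invertible 16-bit toy cipher (XOR with the key followed by multiplication by an odd constant modulo $2^{16}$). This toy cipher, the constants, and all names are assumptions made only so that the sketch runs; it has no cryptographic value and is not the cipher used in the experiments.

```python
N_BITS = 16
MASK = (1 << N_BITS) - 1
A = 40503                        # odd, hence invertible modulo 2**16 (toy constant)
A_INV = pow(A, -1, 1 << N_BITS)  # modular inverse, requires Python 3.8+

def toy_encrypt(key, plaintext):
    """Hypothetical stand-in for E(K, T): XOR with the key, then multiply by A."""
    return ((plaintext ^ key) * A) & MASK

def toy_decrypt(key, ciphertext):
    """Hypothetical stand-in for D(K, C): inverse of toy_encrypt."""
    return ((ciphertext * A_INV) & MASK) ^ key

def hamming(x, y):
    """Hamming distance d_H between two blocks given as integers."""
    return bin(x ^ y).count("1")

def f1(x, t, c, n=N_BITS):
    """Expression (13): closeness between C and E(X, T)."""
    return (n - hamming(c, toy_encrypt(x, t))) / n

def f2(x, t, c, n=N_BITS):
    """Expression (14): closeness between T and D(X, C)."""
    return (n - hamming(t, toy_decrypt(x, c))) / n

def f3(x, t, c, alpha=0.5, beta=0.5):
    """Expression (15): weighting of F1 and F2, with alpha + beta = 1."""
    return alpha * f1(x, t, c) + beta * f2(x, t, c)
```

For a pair with `c = toy_encrypt(key, t)`, all three functions return 1.0 at `x = key`. Evaluating `f3` costs one encryption and one decryption per candidate, which is the extra time mentioned above.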

The fitness functions proposed next are based on measuring the closeness of the plaintexts and ciphertexts, but in decimals. Let $Y_{d}$ be the conversion to decimal of the binary block Y. The first function is defined as follows:

$$F_{4}(X) = \frac{2^{n} - 1 - | C_{d} - E(X, T)_{d} |}{2^{n} - 1}. \tag{16}$$

Note that if the encrypted texts are equal, $C_{d} = E(X, T)_{d}$, then $| C_{d} - E(X, T)_{d} | = 0$, which implies that $F_{4}(X) = 1$, i.e., the fitness function takes its highest value. Conversely, the difference is greatest when the two values are as far apart as possible, e.g., $C_{d} = 2^{n} - 1$ and $E(X, T)_{d} = 0$, and therefore $F_{4}(X) = 0$. The following is a weighting of the functions $F_{1}$ and $F_{4}$:

$$F_{5}(X) = \alpha F_{1}(X) + \beta F_{4}(X). \tag{17}$$

Both functions measure the closeness between ciphertexts, but they are not redundant: for example, if C and $E(X, T)$ differ by two bits, $F_{1}$ always has the same value no matter which two bits they are. In contrast, $F_{4}$ takes different values depending on whether the differing bits are more or less significant, since the decimal representations of the numbers differ. The following function measures the closeness in decimals of plaintexts:

$$F_{6}(X) = \frac{2^{n} - 1 - | T_{d} - D(X, C)_{d} |}{2^{n} - 1}. \tag{18}$$

Finally, the functions $F_{7}$, $F_{8}$, and $F_{9}$ are defined in terms of the previous ones as follows:

$$F_{7}(X) = \alpha F_{2}(X) + \beta F_{6}(X), \tag{19}$$

$$F_{8}(X) = \alpha F_{4}(X) + \beta F_{6}(X), \tag{20}$$

$$F_{9}(X) = \alpha_{1} F_{1}(X) + \alpha_{2} F_{2}(X) + \alpha_{3} F_{4}(X) + \alpha_{4} F_{6}(X), \tag{21}$$

where $\alpha_{i} \in [0, 1] \subset \mathbb{R}$, $i \in \{1, 2, 3, 4\}$, and $\sum_{i = 1}^{4} \alpha_{i} = 1$. This guarantees that each $F_{j}(X) \in [0, 1] \subset \mathbb{R}$, $j \in \{1, 2, \dots, 9\}$.

The idea behind the introduction of these functions lies mainly in the fact that there are changes that the Hamming distance does not detect but the decimal distance does. For example, suppose the key is $a = (1, 1, 1, 1, 1, 1)_{2}$ and $b = (0, 0, 0, 0, 0, 1)_{2}$ is the candidate key, both in binary. The Hamming distance is five, and the distance in decimals is 62, since $a = 63$ and $b = 1$; the fitness functions take the values $1 - 5/6 \approx 0.17$ for the binary version and $1 - 62/63 \approx 0.016$ for the decimal version. Now, if $b = (0, 0, 1, 0, 0, 0)_{2}$, the binary fitness function is still 0.17, since there are still five different bits; on the other hand, $b = 8$, so the decimal fitness function takes the value $1 - 55/63 \approx 0.13$. Finally, if we take $b = (1, 0, 0, 0, 0, 0)_{2} = 32$, the distance in binary remains the same, but the decimal distance changes to 31, so the decimal fitness function takes the value $1 - 31/63 \approx 0.51$. This shows that these changes of b are always detected by the decimal distance, unlike the binary distance, which remains the same.
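The comparison between the two distances for $a = 63$ and the three candidates $b \in \{1, 8, 32\}$ can be reproduced with a short sketch. The function names are hypothetical, and the blocks are compared directly, without any cipher.

```python
def hamming_fitness(a, b, n):
    """Fitness based on the Hamming distance between two n-bit blocks."""
    return 1 - bin(a ^ b).count("1") / n

def decimal_fitness(a, b, n):
    """Fitness based on the distance between the decimal values of two n-bit blocks."""
    return 1 - abs(a - b) / (2 ** n - 1)

a = 0b111111
for b in (0b000001, 0b001000, 0b100000):
    print(f"{b:06b}", round(hamming_fitness(a, b, 6), 3), round(decimal_fitness(a, b, 6), 3))
```

The Hamming-based value is $1 - 5/6 \approx 0.167$ for all three candidates, while the decimal-based value is $1 - 62/63 \approx 0.016$, $1 - 55/63 \approx 0.127$, and $1 - 31/63 \approx 0.508$, respectively.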

Attack experiments on AES(3) were carried out with the two methodologies for partitioning the key space in order to compare these functions. The main goal is to find the key, not to analyze the percentage of matching components between keys, for which the fitness functions based on the Hamming distance would be more useful. A PC with an Intel(R) Core(TM) i3-4160 CPU @ 3.60 GHz (four CPUs) and 4 GB of RAM was used. For the results, we took into account the average time taken to find the key, the average number of generations in which it was found, the percentage of failures (over many attacks), and a parameter called efficiency, $E_{F_{i}}$, which is a weighting of the three previous criteria.

Definition 1 (Fitness functions' efficiency). Let $\mu_{1}, \mu_{2}, \mu_{3} \in [0, 1] \subset \mathbb{R}$ with $\mu_{1} + \mu_{2} + \mu_{3} = 1$. For $i = 1, \dots, k$, let $t_{F_{i}}$ be the time it takes the GA to find the key with $F_{i}$, on average in $g_{F_{i}}$ generations, and $p_{F_{i}}$ the percentage of attempts in which the GA did not find the key with $F_{i}$. Then, the efficiency $E_{F_{i}}$ of the fitness function $F_{i}$ with respect to the other $k - 1$ functions $F_{j}$, $j \neq i$, is defined as

$$E_{F_{i}} = 1 - \left( \mu_{1} \frac{t_{F_{i}}}{\sum_{\gamma = 1}^{k} t_{F_{\gamma}}} + \mu_{2} \frac{g_{F_{i}}}{\sum_{\gamma = 1}^{k} g_{F_{\gamma}}} + \mu_{3} \frac{p_{F_{i}}}{\sum_{\gamma = 1}^{k} p_{F_{\gamma}}} \right). \tag{22}$$
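Definition 1 can be sketched as a single function. The function name and the sample measurements used below are made up purely for illustration and are not taken from the tables; the default weights are the values of $\mu_{1}$, $\mu_{2}$, and $\mu_{3}$ used in the experiments.

```python
def efficiency(times, generations, failures, mu=(0.33, 0.33, 0.34)):
    """Expression (22): efficiency of each fitness function relative to the others.

    Each argument holds one value per fitness function. The sums in the
    denominators must be nonzero (e.g. at least one function must have failed
    at least once), otherwise the ratio is undefined.
    """
    t_sum, g_sum, p_sum = sum(times), sum(generations), sum(failures)
    return [
        1 - (mu[0] * t / t_sum + mu[1] * g / g_sum + mu[2] * p / p_sum)
        for t, g, p in zip(times, generations, failures)
    ]
```

For example, `efficiency([10, 20, 30], [100, 200, 300], [5, 10, 35])` ranks the first hypothetical function highest. Because each criterion is normalized by its sum over all functions, the efficiencies of k functions always add up to $k - 1$, so a value is only meaningful relative to the set of functions being compared.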

Note that the efficiency $E_{F_{i}}$ decreases as the number of generations and the failure percentage grow: the higher these parameters, the lower the efficiency of the fitness function. Table 4 presents the results of the comparison of the different fitness functions for the BBM space partitioning methodology; in this case, $k = 9$. We took $\alpha = \beta = 0.5$ and each $\alpha_{i} = 0.25$. To calculate $E_{F_{i}}$, the values $\mu_{1} = 0.33$, $\mu_{2} = 0.33$, and $\mu_{3} = 0.34$ were taken for $t_{F_{i}}$, $g_{F_{i}}$, and $p_{F_{i}}$, respectively. Sorting the $F_{i}$ by efficiency, the first five are $F_{6}$, $F_{8}$, $F_{4}$, $F_{5}$, and $F_{2}$. It is noteworthy that, of the first three functions ($F_{1}$, $F_{2}$, and $F_{3}$), which use only the Hamming distance, only $F_{2}$ appears.

For the TBB methodology of partitioning the key space and searching in $G_{K}$, the experimental results are presented in Table 5. In this case, ordering the functions by their efficiency, the first five are $F_{1}$, $F_{4}$, $F_{5}$, $F_{8}$, and $F_{6}$. Again, only one of the first three functions appears, in this case $F_{1}$, and the others repeat. Note in particular that $F_{8}$ (the weighting of the functions in decimals) is better than $F_{3}$ (the weighting of the functions in binary) in each of the parameters measured in both methodologies.

It is interesting to see what happens if the weights are changed in the functions $F_{5}$, $F_{7}$, and $F_{9}$, which combine the functions with distance in decimals and in binary, keeping $\mu_{1}$, $\mu_{2}$, and $\mu_{3}$ fixed for the calculation of $E_{F_{i}}$. In the following group of experiments, the weights 0.2 and 0.8 were assigned as follows for each methodology. First, in each of these three functions, the subfunctions in binary were favored, i.e., $\alpha = 0.8$, $\beta = 0.2$ (in $F_{5}$, $F_{7}$), and $\alpha_{1} = \alpha_{2} = 0.4$, $\alpha_{3} = \alpha_{4} = 0.1$ (in $F_{9}$; note that this function has two subfunctions with the distance in binary and two in decimals); in this case, we identified the functions as $F_{5 b}$, $F_{7 b}$, and $F_{9 b}$. Then, we swapped these weights so that the largest were given to the subfunctions whose distance was in decimals, and we identified the functions for this case as $F_{5 d}$, $F_{7 d}$, and $F_{9 d}$.

For the BBM methodology, the results are presented in Table 6. Note that according to $E_{F_{i}}$, the first is $F_{7 d}$, followed by $F_{5 d}$ and $F_{9 d}$.

In Figure 2, these results are compared, according to $E_{F_{i}}$, with those of Table 4, also including the values of $F_{5}$, $F_{7}$, and $F_{9}$. Sorting the functions according to their efficiency, the first five are $F_{7 d}$, $F_{6}$, $F_{8}$, $F_{5 d}$, and $F_{9 d}$.

Notice that the best results are obtained by the functions with the distance in decimals. In this sense, $F_{7}$ and $F_{9}$ (now as $F_{7 d}$ and $F_{9 d}$) join the top five, together with three functions that were already in this group in the previous experiments: $F_{5}$ (as $F_{5 d}$), $F_{6}$, and $F_{8}$.

In the case of the TBB methodology, the results are presented in Table 7. According to efficiency, the first is $F_{7 b}$, followed by $F_{5 d}$ and $F_{5 b}$.

In Figure 3, these results are compared with those of all the functions of Table 5. The first five are now $F_{1}$, $F_{5}$, $F_{4}$, $F_{7 b}$, and $F_{5 d}$; notice that the functions containing the decimal distance, alone or combined with the binary one, prevail. The experiments confirm the best overall behavior of the functions with the decimal distance, specifically in the BBM methodology, where the keys are grouped into intervals according to their decimal position in the space, in contrast to the other methodology, where the keys of each class are spread throughout the space.

Note that in the comparisons of Figure 2 and Figure 3, the values of $E_{F_{i}}$ that appear in the tables are not compared directly; rather, $E_{F_{i}}$ must be recalculated taking into account that there are 15 functions. That is,

$$E_{F_{\delta_{i}}} = 1 - \left( \mu_{1} \frac{t_{F_{\delta_{i}}}}{\sum_{\gamma = 1}^{k} t_{F_{\delta_{\gamma}}}} + \mu_{2} \frac{g_{F_{\delta_{i}}}}{\sum_{\gamma = 1}^{k} g_{F_{\delta_{\gamma}}}} + \mu_{3} \frac{p_{F_{\delta_{i}}}}{\sum_{\gamma = 1}^{k} p_{F_{\delta_{\gamma}}}} \right),$$

(23)

where $\delta_{i} \in \{1, \dots, 9, 5 b, 5 d, 7 b, 7 d, 9 b, 9 d\}$, $i = \overline{1, \dots, k}$, and $k = 15$.
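As an illustration of this recalculation, the following Python sketch evaluates Equation (23) over the 15 functions of the BBM methodology, using the times, generations, and failure percentages listed in Tables 4 and 6. The weights are defined earlier in the article; here equal weights of 1/3 are assumed purely for illustration, and the names `BBM` and `efficiency` are hypothetical helpers, not part of the original experiments.

```python
from typing import Dict, Tuple

# (mean time, mean generations, failures in %) for each fitness function
# under BBM, copied from Tables 4 and 6.
BBM: Dict[str, Tuple[float, float, float]] = {
    "F1": (5.233, 121.2, 60), "F2": (5.402, 108.4, 50),
    "F3": (11.101, 117.4, 50), "F4": (4.764, 109.2, 40),
    "F5": (9.451, 109.8, 30), "F6": (3.126, 63.4, 20),
    "F7": (12.424, 121.3, 50), "F8": (7.054, 77.1, 10),
    "F9": (15.811, 87.7, 30),
    "F5b": (10.247, 115.0, 50), "F7b": (9.131, 90.6, 40),
    "F9b": (20.053, 107.4, 50), "F5d": (7.276, 83.3, 10),
    "F7d": (5.921, 61.3, 0), "F9d": (13.799, 77.5, 10),
}


def efficiency(data, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Evaluate Equation (23) for every function in `data`.

    `weights` stands for (mu_1, mu_2, mu_3); equal weights are an
    assumption made for this sketch only.
    """
    mu1, mu2, mu3 = weights
    total_t = sum(t for t, _, _ in data.values())
    total_g = sum(g for _, g, _ in data.values())
    total_p = sum(p for _, _, p in data.values())
    return {
        name: 1 - (mu1 * t / total_t + mu2 * g / total_g + mu3 * p / total_p)
        for name, (t, g, p) in data.items()
    }


E = efficiency(BBM)
# Sort from most to least efficient.
ranking = sorted(E, key=E.get, reverse=True)
print(ranking[:5])
```

Because every term is normalized by a sum over all $k$ functions, adding the six new functions changes each value of $E_{F_{i}}$, which is why the table values cannot be reused directly. Under the equal-weight assumption, the five best-ranked functions coincide with those reported for Figure 2.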

4. Conclusions

In this article, several parameters of the GA for the attack on block ciphers were studied. First, a way of estimating the time that the GA takes for a given number of generations was proposed, based on the average time that the algorithm takes for one generation. This study is important for jointly evaluating different parameters and making the best decisions according to the computational capacity, the available time, and an adequate choice of the size of the search space when using the BBM and TBB methodologies. In addition, several fitness functions were proposed, with favorable results in the experiments compared with the fitness functions that use only the Hamming distance. In particular, it was found that the fitness functions that use the decimal distance are, in general, more efficient than those that use only the Hamming distance, especially in the BBM methodology.

As future work, several directions are possible. Similar studies can be carried out with the GA working with other parameters, for example by varying the crossover probability and the mutation rate and comparing the success rate of the method. It is also recommended to explore other heuristic techniques and to evaluate methods that partition the whole key space, so that the heuristics operate in a closed way on each subset. Likewise, it is recommended to investigate combining the GA with other tools such as machine learning, deep learning, artificial neural networks (ANN), support vector machines (SVM), and gene expression programming (GEP).

Author Contributions

Conceptualization, O.T.-C. and M.B.-Q.; methodology, O.T.-C., M.B.-Q., M.A.B.-T., and G.S.-G.; software, O.T.-C.; validation, O.T.-C., M.B.-Q., M.A.B.-T., O.R., and G.S.-G.; formal analysis, M.A.B.-T., M.B.-Q., O.R., and G.S.-G.; investigation, O.T.-C., M.B.-Q., M.A.B.-T., and G.S.-G.; writing—original draft preparation, O.T.-C. and M.B.-Q.; writing—review and editing, O.T.-C., M.B.-Q., M.A.B.-T., O.R., and G.S.-G.; visualization, O.T.-C.; supervision, M.A.B.-T., M.B.-Q., O.R., and G.S.-G. All authors read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Grari, H.; Azouaoui, A.; Zine-Dine, K. A cryptanalytic attack of simplified-AES using ant colony optimization. Int. J. Electr. Comput. Eng. 2019, 9, 4287.
Jawad, R.N.; Ali, F.H. Using Evolving Algorithms to Cryptanalysis Nonlinear Cryptosystems. Baghdad Sci. J. 2020, 17, 0682.
Blackledge, J.; Mosola, N. Applications of Artificial Intelligence to Cryptography. Trans. Mach. Learn. Artif. Intell. 2020.
Lee, T.R.; Teh, J.S.; Yan, J.L.S.; Jamil, N.; Yeoh, W.Z. A Machine Learning Approach to Predicting Block Cipher Security. In Proceedings of the Cryptology and Information Security Conference, Seoul, Korea, 2–4 December 2020; p. 122.
So, J. Deep Learning-Based Cryptanalysis of Lightweight Block Ciphers. Secur. Commun. Netw. 2020, 2020, 3701067.
You, L.; Yan, K.; Liu, N. Assessing artificial neural network performance for predicting interlayer conditions and layer modulus of multi-layered flexible pavement. Front. Struct. Civ. Eng. 2020, 14, 487–500.
Qiu, X.; Xu, J.X.; Tao, J.Q.; Yang, Q. Asphalt Pavement Icing Condition Criterion and SVM-Based Prediction Analysis. J. Highw. Transp. Res. Dev. 2018, 12, 1–9.
Leon, L.P.; Gay, D. Gene expression programming for evaluation of aggregate angularity effects on permanent deformation of asphalt mixtures. Constr. Build. Mater. 2019, 211, 470–478.
Vimalathithan, R.; Valarmathi, M.L. Cryptanalysis of DES using computational intelligence. Eur. J. Sci. Res. 2011, 55, 237–244.
Brown, J.A.; Houghten, S.; Ombuki-Berman, B. Genetic algorithm cryptanalysis of a substitution permutation network. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Cyber Security, Nashville, TN, USA, 30 March–2 April 2009; pp. 115–121.
Garg, P.; Varshney, S.; Bhardwaj, M. Cryptanalysis of Simplified Data Encryption Standard Using Genetic Algorithm. Am. J. Netw. Commun. 2015, 4, 32.
Al Adwan, F.; Al Shraideh, M.; Saleem Al Saidat, M.R. A genetic algorithm approach for breaking of simplified data encryption standard. Int. J. Secur. Appl. 2015, 9, 295–304.
Delman, B. Genetic Algorithms in Cryptography. Master’s Thesis, Rochester Institute of Technology, New York, NY, USA, 2004.
Baragada, S.R.; Reddy, P.S. A Survey of Cryptanalytic Works Based on Genetic Algorithms. Int. J. Emerg. Trends Technol. Comput. Sci. (IJETTCS) 2013, 2, 18–22.
Khan, A.H.; Lone, A.H.; Badroo, F.A. The Applicability of Genetic Algorithm in Cryptanalysis: A Survey. Int. J. Comput. Appl. 2015, 130, 42–46.
Borges-Trenard, M.; Borges-Quintana, M.; Monier-Columbié, L. An application of genetic algorithm to cryptanalysis of block ciphers by partitioning the key space. J. Discret. Math. Sci. Cryptogr. 2019.
Tito, O.; Borges-Trenard, M.A.; Borges-Quintana, M. Ataques a cifrados en bloques mediante búsquedas en grupos cocientes de las claves. Rev. Cienc. Mat. 2019, 33, 71–74.
Monier-Columbié, L. Sobre los Ataques Lineal y Genético a Cifrados en Bloques. Master’s Thesis, Universidad de la Habana, Habana, Cuba, 2018.
Nakahara, J.; de Freitas, D.S. Mini-ciphers: A reliable testbed for cryptanalysis? In Dagstuhl Seminar Proceedings. 09031. Symmetric Cryptography; Schloss Dagstuhl-Leibniz-Zentrum für Informatik: Wadern, Germany, 2009.

Figure 1. Flowchart of the relationship between content by subsections and the attack on block ciphers.

Figure 2. Efficiency of all fitness functions in the BBM methodology.

Figure 3. Efficiency of all fitness functions in the TBB methodology.

Table 1. Time estimation in AES(3).

$k_{2}$	1 Gen	n Gen	$t_{r}$	$t_{e}$	$E_{p}$
10	0.0355	2	0.0962	0.0871	0.0091
11	0.0518	20	0.7134	0.8711	0.1577
12	0.0429	27	1.0491	1.1760	0.1269
13	0.042	81	3.3475	3.5281	0.1806
14	0.0429	49	2.0863	2.1343	0.0481
15	0.0454	71	3.1606	3.0926	0.0680
16	0.0444	655	28.9312	28.5299	0.4012
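
The estimated times in Table 1 appear to follow from multiplying the number of generations by the average time of one generation. The sketch below checks this reading against the tabulated values. It assumes that $t_{r}$ is the measured time, that $t_{e}$ is the estimate $n$ times the mean of the "1 Gen" column, and that $E_{p} = |t_{r} - t_{e}|$; the names `TABLE_1` and `estimate_times` are hypothetical.

```python
from statistics import mean

# Rows of Table 1 (AES(3)): (k_2, time of 1 generation, n generations, t_r).
TABLE_1 = [
    (10, 0.0355, 2, 0.0962),
    (11, 0.0518, 20, 0.7134),
    (12, 0.0429, 27, 1.0491),
    (13, 0.042, 81, 3.3475),
    (14, 0.0429, 49, 2.0863),
    (15, 0.0454, 71, 3.1606),
    (16, 0.0444, 655, 28.9312),
]


def estimate_times(rows):
    """Return (k_2, t_e, E_p) per row under the assumptions stated above."""
    # Average time of a single generation over all rows of the table.
    avg_gen = mean(one_gen for _, one_gen, _, _ in rows)
    result = []
    for k2, _, n, t_r in rows:
        t_e = n * avg_gen          # estimated time for n generations
        e_p = abs(t_r - t_e)       # absolute difference to the measured time
        result.append((k2, t_e, e_p))
    return result


for k2, t_e, e_p in estimate_times(TABLE_1):
    print(f"k2={k2}: t_e={t_e:.4f}, E_p={e_p:.4f}")
```

Under these assumptions the computed estimates agree with the $t_{e}$ column of Table 1 to about three decimal places, which supports the reading but does not replace the definitions given in Section 3.1.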

Table 2. Time estimation in AES(4).

$k_{2}$	1 Gen	n Gen	$t_{r}$	$t_{e}$	$E_{p}$
10	0.0519	9	0.3739	0.4675	0.0936
11	0.0553	8	0.3838	0.4155	0.0318
12	0.0465	5	0.2756	0.2597	0.0159
13	0.0564	2	0.1303	0.1039	0.0264
14	0.0506	81	4.1554	4.2071	0.0517
15	0.0510	98	4.9621	5.0900	0.1279
16	0.0519	655	34.1330	34.0202	0.1128

Table 3. Time estimation in AES(7).

$k_{2}$	1 Gen	n Gen	$t_{r}$	$t_{e}$	$E_{p}$
17	0.1895	373	69.1909	70.8811	1.6902
18	0.1905	932	178.069	177.108	0.9610

Table 4. Comparison of fitness functions, with BBM.

$F_{i}$	Times	Generations	Failures (%)	$E_{F_{i}}$
$F_{1}$	5.233	121.2	60	0.8731
$F_{2}$	5.402	108.4	50	0.8870
$F_{3}$	11.101	117.4	50	0.8584
$F_{4}$	4.764	109.2	40	0.8995
$F_{5}$	9.451	109.8	30	0.8885
$F_{6}$	3.126	63.4	20	0.9433
$F_{7}$	12.424	121.3	50	0.8511
$F_{8}$	7.054	77.1	10	0.9309
$F_{9}$	15.811	87.7	30	0.8682

Table 5. Comparison of fitness functions, with TBB.

$F_{i}$	Times	Generations	Failures (%)	$E_{F_{i}}$
$F_{1}$	3.688	83.1	20	0.9278
$F_{2}$	5.353	109.1	60	0.8633
$F_{3}$	11.403	122.9	40	0.8536
$F_{4}$	3.226	67.8	30	0.9240
$F_{5}$	7.147	83.4	10	0.9235
$F_{6}$	4.871	96.2	40	0.8939
$F_{7}$	10.694	113.1	20	0.8840
$F_{8}$	8.354	92	20	0.9029
$F_{9}$	16.876	95.7	50	0.8270

Table 6. Comparison of functions $F_{5}$, $F_{7}$, and $F_{9}$, with BBM.

$F_{i}$	Times	Generations	Failures (%)	$E_{F_{i}}$
$F_{5 b}$	10.247	115	50	0.772
$F_{7 b}$	9.131	90.6	40	0.814
$F_{9 b}$	20.053	107.4	50	0.728
$F_{5 d}$	7.276	83.3	10	0.891
$F_{7 d}$	5.921	61.3	0	0.933
$F_{9 d}$	13.799	77.5	10	0.862

Table 7. Comparison of functions $F_{5}$, $F_{7}$, and $F_{9}$, with TBB.

$F_{i}$	Times	Generations	Failures (%)	$E_{F_{i}}$
$F_{5 b}$	9.987	111.5	40	0.845
$F_{7 b}$	8.578	86.7	10	0.909
$F_{9 b}$	22.500	119.1	50	0.777
$F_{5 d}$	8.341	96.9	10	0.905
$F_{7 d}$	13.623	141.8	80	0.754
$F_{9 d}$	22.183	114.8	30	0.811

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

