Breaking Data Encryption Standard with a Reduced Number of Rounds Using Metaheuristics Differential Cryptanalysis

This article presents the author's own metaheuristic cryptanalytic attack that combines differential cryptanalysis (DC) with memetic algorithms (MA), in which the local search process is improved through simulated annealing (SA). The attack is verified on a set of ciphertexts generated with the well-known DES (Data Encryption Standard) reduced to six rounds. Its aim is to guess the last encryption subkey for each of the two characteristics Ω. Knowing the last subkey, it is possible to recreate the complete encryption key and thus decrypt the cryptogram. The suggested approach automatically rejects solutions (keys) with the worst fitness values, which significantly reduces the attack search space. The resulting memetic algorithm (MASA) is compared with other metaheuristic techniques suggested in the literature, in particular with the genetic algorithm (NGA), and with the classical differential cryptanalysis attack, in terms of memory consumption and the time needed to guess the key. The article also investigates the entropy of the MASA and NGA attacks.


Introduction
The growing popularity of computerisation, and of the Internet in particular, creates a demand for increasingly advanced security methods. Measures such as individual user access control or basic authentication are no longer sufficient. For several decades, engineers working on information security have designed dedicated cryptographic algorithms that meet the most important security requirements.
The main goal of cryptography is not to hide the existence of information, but to keep its actual content secret. The message is transformed in such a way that it is readable only to its author and its intended recipient [1,2].
Contemporary symmetric block ciphers transform the plain text using either a Feistel network or a generalized substitution-permutation network [2]. In 1990, a completely new cryptanalytic method was made public, namely differential cryptanalysis [3]. Against the most modern and advanced encryption algorithms, differential cryptanalysis alone turns out to be ineffective. To improve the performance of the attack, it was proposed to combine metaheuristic algorithms with differential cryptanalysis.
In general, metaheuristic algorithms are used to obtain approximate solutions. In cryptanalysis, however, the exact decryption key must be found; an approximate solution is unacceptable. Because of the avalanche effect present in modern encryption algorithms, changing a single input bit changes, on average, about half of the output bits, which in practice yields a completely different ciphertext [1]. The developed algorithm automatically sifts out the keys with the worst fitness values, which significantly reduces the set of potential solutions.
In addition, the local search performed by memetic algorithms refines candidate solutions so that a good solution is reached in the shortest possible time.
Metaheuristic algorithms are increasingly used in computer science, including computer security. The literature describes metaheuristic attacks on classical ciphers, contemporary symmetric block ciphers and stream ciphers alike; a review of these publications is presented in Table 1. In [4], the authors focused on evolutionary cryptanalysis of DES4 ciphers using a GA that compares corresponding bits of the original and the obtained ciphertexts. Tadros [5] presented another GA used to break the FEAL8 and DES4 ciphers. Garg [6] compared MA and GA in the cryptanalysis of the SDES encryption algorithm, relying on n-gram statistics and frequency analysis. Another approach was presented by Hu [7], who applied a quantum-inspired GA to break TEA. Abd-Elmonim [8] described an attack based on the PSO algorithm aimed at breaking the full 16-round DES cipher. Vimalathithan and Valarmathi combined GA and PSO into a new Generic Swarm Optimization algorithm to attack the SDES cipher. In 2012, Jadon [10], as well as Pandey and Mishra, published approaches based on the Binary PSO and the original PSO algorithms for attacking the DES cipher.
The next chapter briefly introduces symmetric block ciphers and the DES cipher. The third chapter presents the basic assumptions of differential cryptanalysis on which the design of the MASA algorithm is based. Chapter four contains a detailed description of the developed metaheuristic attack carried out with the use of MA. The next chapter describes the runtime environment, including all the parameters selected for each attack, and presents the results of the experiments, including the entropy studies for the MASA and NGA algorithms. The penultimate chapter presents a detailed analysis of the effectiveness of the presented attacks, both in terms of the number of checked solutions and the time needed to decrypt the cryptogram. The article concludes with a brief summary of the various stages of the research and suggests further research directions. Appendix A, attached to this article, details the results for the $\Omega_2$ characteristic.

Symmetric Block Ciphers
Symmetric ciphers are still among the most popular encryption algorithms. In this type of cipher, a single key serves as both the encryption and the decryption key, which can be written as $K_E = K_D$. In block ciphers, each message is divided into blocks of the same length, for example 64-bit blocks, which are then passed to the encryption function. Exactly one block of ciphertext is generated from each block of plain text. If the message length is not a multiple of the block length, the last, incomplete fragment of data is placed in an additional block, which is then padded with default values or zeros.
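The block division and zero padding described above can be sketched in a few lines of Python. This is a minimal illustration assuming 8-byte (64-bit) blocks and plain zero padding; the function name `split_into_blocks` is hypothetical and not part of the described attack.

```python
def split_into_blocks(message: bytes, block_size: int = 8) -> list:
    """Split a message into block_size-byte blocks, zero-padding the last one."""
    blocks = [message[i:i + block_size] for i in range(0, len(message), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        # The last, incomplete fragment is filled up with zero bytes.
        blocks[-1] = blocks[-1] + b"\x00" * (block_size - len(blocks[-1]))
    return blocks

print(split_into_blocks(b"ATTACK AT DAWN"))  # [b'ATTACK A', b'T DAWN\x00\x00']
```

Zero padding cannot be removed unambiguously if the message itself may end with zero bytes; padding schemes such as PKCS#7 were introduced to address this.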
These algorithms are well suited to encrypting large volumes of stored data, for example in data warehouses or databases. The most popular block ciphers include DES and AES.

Data Encryption Standard
The DES cipher was designed so that the avalanche effect appears from the very beginning of the algorithm [1]: changing a single input bit changes, on average, about half of the output bits, and the state of every output bit depends on every input bit [29].
The basic version of the cipher converts 64-bit plain text blocks into 64-bit ciphertext blocks using a 64-bit encryption key $K$ [2,30]. The key is first reduced to 56 bits by discarding every eighth bit, which serves as a parity bit. The 56-bit key is then used to derive the 48-bit round subkeys $K_1, \dots, K_6$, one for each round of the reduced cipher; the key schedule is described in detail in [1,2,29-32]. Figure 1 shows the 6-round DES algorithm. The plain text block first undergoes the initial permutation IP and is then divided into two 32-bit halves, $L_0$ and $R_0$. Six identical rounds follow. In round $i$ ($i = 1, \dots, 6$), the right half $R_{i-1}$ is passed to the round function $f$ together with the subkey $K_i$; the result is XORed with the left half $L_{i-1}$, giving the new right half $R_i = L_{i-1} \oplus f(R_{i-1}, K_i)$, while the new left half is a copy of the previous right half, $L_i = R_{i-1}$.
After all rounds have been completed, $L_6$ and $R_6$ are combined into a 64-bit block, which undergoes the final transformation by the inverse permutation $IP^{-1}$. The result of this bit transposition is the 64-bit ciphertext block.
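The round structure above can be illustrated with a small Feistel sketch. It assumes a hypothetical toy round function `toy_f` in place of the real DES function $f$ (no expansion, S-blocks or permutation $P$) and omits IP and $IP^{-1}$; it only shows how $L_i = R_{i-1}$ and $R_i = L_{i-1} \oplus f(R_{i-1}, K_i)$ are chained over six rounds, and why the structure is invertible whatever $f$ is.

```python
MASK32 = 0xFFFFFFFF

def toy_f(r: int, k: int) -> int:
    """Hypothetical stand-in for the DES round function f (NOT the real f)."""
    x = (r ^ k) & MASK32
    x = ((x << 7) | (x >> 25)) & MASK32   # 32-bit left rotation by 7
    return (x * 0x9E3779B1) & MASK32      # multiplicative mixing mod 2^32

def feistel_encrypt(block: int, subkeys: list) -> int:
    left, right = block >> 32, block & MASK32          # L_0, R_0
    for k in subkeys:                                  # round i uses K_i
        # L_i = R_{i-1},  R_i = L_{i-1} xor f(R_{i-1}, K_i)
        left, right = right, left ^ toy_f(right, k)
    return (left << 32) | right                        # L_6 || R_6

def feistel_decrypt(block: int, subkeys: list) -> int:
    left, right = block >> 32, block & MASK32
    for k in reversed(subkeys):
        # Undo one round: R_{i-1} = L_i,  L_{i-1} = R_i xor f(L_i, K_i)
        left, right = right ^ toy_f(left, k), left
    return (left << 32) | right
```

Decryption applies the same rounds with the subkeys in reverse order; $f$ itself never has to be inverted.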
The round function $f$ is visualized in Figure 2. Its input is a 32-bit data block, which is first expanded to 48 bits by the expansion permutation $E$. This transformation aligns the length of the block with the size of the subkey by duplicating selected bits; because one bit can then influence two substitutions, the avalanche effect is increased [1]. The expanded sequence is XORed (added modulo two) with the subkey bits and then divided into eight 6-bit blocks $B_1, \dots, B_8$. Each block $B_j$ is passed to the corresponding substitution matrix, called an S-block $S_j$. The main aim of this transformation is to compress the data: each 6-bit block is converted into a 4-bit block. Each $S_j$ is a matrix of four rows and sixteen columns containing integers between 0 and 15. The first and last bits of the 6-bit sequence $B_j$ determine the row number, and the remaining four bits determine the column from which the output value is taken [1,2,30].
The S-blocks $S_j$ are the only nonlinear element of the DES standard. Changing one bit of an input sequence can change several of the output bits, and changes to the contents of the S-blocks have a significant impact on how difficult the cryptanalysis of the entire cipher is. At the end of the $f$ function, the eight 4-bit outputs are combined into one 32-bit block, which is passed to the permutation $P$; it maps each input bit to exactly one output bit, without duplicating or omitting any of them [1].
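The row/column addressing of an S-block can be made concrete with S-block $S_1$ of DES, as published in FIPS 46-3. The helper `sbox_lookup` is an illustrative name; bit $b_1$ is taken as the most significant bit of the 6-bit input.

```python
# DES S-block S1 (4 rows x 16 columns), as given in FIPS 46-3.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox_lookup(sbox, b: int) -> int:
    """Map a 6-bit input b1..b6 (b1 = MSB) to a 4-bit output."""
    row = ((b >> 4) & 0b10) | (b & 0b1)   # outer bits b1 and b6
    col = (b >> 1) & 0b1111               # inner bits b2..b5
    return sbox[row][col]

print(sbox_lookup(S1, 0b011011))  # 5
```

For example, the input 011011 has outer bits 01 (row 1) and inner bits 1101 (column 13), so $S_1$ returns 5 (0101).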

Differential Cryptanalysis
The suggested algorithm is based on a chosen-plaintext attack. It is assumed that the cryptanalyst has continuous access to the encryption algorithm, which allows him to choose pairs of plain texts and analyse the resulting ciphertexts. The plain texts in each pair must differ from each other in a prescribed way. For most symmetric block ciphers, this difference is defined by the symmetric difference (XOR) operation, $P' = P \oplus P^*$, where $P$ and $P^*$ are the two crafted plain texts. The pairs may be generated pseudorandomly; the only strict requirement is that every pair has the prescribed difference $P'$. The cryptanalyst then traces how this difference changes in the subsequent phases of the cipher. Using the differences observed in the individual rounds, for a sufficiently large number of pairs, it is possible to assign probabilities to candidate subkeys [3]: when successive pairs of plain texts and ciphertexts are analysed, one key turns out to be more probable than the others.
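Generating chosen-plaintext pairs with a fixed difference, as described above, is straightforward. The sketch below assumes 64-bit blocks, an arbitrary hypothetical difference `DELTA` and a hypothetical key; it also checks the property exploited later, namely that XOR-ing both texts with the same key leaves their difference unchanged.

```python
import random

DELTA = 0x0000000400000000  # hypothetical input difference P', for illustration only

def make_pairs(delta: int, count: int, bits: int = 64, seed: int = 1) -> list:
    """Return `count` pseudorandom pairs (P, P*) with P xor P* == delta."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        p = rng.getrandbits(bits)
        pairs.append((p, p ^ delta))
    return pairs

pairs = make_pairs(DELTA, count=5)
key = 0x133457799BBCDFF1  # hypothetical key
# (P xor K) xor (P* xor K) == P xor P*: the key cancels out of the difference.
assert all(((p ^ key) ^ (q ^ key)) == DELTA for p, q in pairs)
```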
Modern ciphers are non-linear, which means that there is no simple rule by which the value of the function for the next argument can be determined [3]. In DES, this nonlinearity is introduced by the round function $f$. Each pair of input and output differences occurs with a certain probability, which determines how often the $f$ function produces the expected output difference for a given input difference [3]. Such patterns of difference propagation, together with their probabilities, are called characteristics $\Omega$. The possible input/output difference pairs can be tabulated in a difference distribution table, whose rows correspond to all possible symmetric differences of the input blocks and whose columns correspond to all possible symmetric differences of the output blocks [1]. Each entry counts how many input pairs with the selected input difference produce the selected output difference.
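The difference distribution table can be computed by brute force for a single S-block. The sketch below does this for $S_1$ (table from FIPS 46-3); the function names are illustrative. Each row sums to 64, and the uneven rows are what differential cryptanalysis exploits.

```python
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1(b: int) -> int:
    """Look up S1: outer bits b1b6 select the row, inner bits b2..b5 the column."""
    return S1[((b >> 4) & 0b10) | (b & 0b1)][(b >> 1) & 0b1111]

def difference_distribution_table(sbox, in_bits: int = 6, out_bits: int = 4) -> list:
    """table[dx][dy] = number of inputs x with sbox(x) xor sbox(x xor dx) == dy."""
    table = [[0] * (1 << out_bits) for _ in range(1 << in_bits)]
    for x in range(1 << in_bits):
        for dx in range(1 << in_bits):
            table[dx][sbox(x) ^ sbox(x ^ dx)] += 1
    return table

ddt = difference_distribution_table(s1)
```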
By analysing the diagram shown in Figure 2, and assuming that $E = E(R_{i-1})$ and $E^* = E(R^*_{i-1})$, the input symmetric difference $B'$ of the S-blocks can be determined as
$$B' = B \oplus B^* = (E \oplus K_i) \oplus (E^* \oplus K_i) = E \oplus E^* = E' = B'_1 \,\|\, B'_2 \,\|\, \dots \,\|\, B'_8, \tag{1}$$
where the symbol $\|$ stands for the concatenation of successive data blocks. Expression (1) shows that $B'$ does not depend on the subkey. When the value of each $B'_j$ is known, the set of all ordered pairs $(B_j, B^*_j)$ with this input symmetric difference can be determined as suggested in [31]:
$$\Delta(B'_j) = \{(B_j,\; B_j \oplus B'_j) : B_j \in \{0,1\}^6\}. \tag{2}$$
Knowing the output difference $C'_j = S_j(B_j) \oplus S_j(B^*_j)$, it becomes possible to determine the distribution of all input differences over all output differences, following [31]:
$$IN_j(B'_j, C'_j) = \{B_j \in \{0,1\}^6 : S_j(B_j) \oplus S_j(B_j \oplus B'_j) = C'_j\}. \tag{3}$$
Most often, this distribution is fairly even; the cryptanalyst's task is to find distributions that are as uneven as possible. Based on expression (3), an additional test set can be determined using the following formula [31]:
$$test_j(E_j, E^*_j, C'_j) = \{B_j \oplus E_j : B_j \in IN_j(E'_j, C'_j)\}. \tag{4}$$
The set $test_j$ has exactly as many elements as $IN_j$, and it must contain the bits $K_{ij}$ of the subkey that enter the S-block $S_j$ [31].
Applied to the full 16-round DES, this method recovers the correct key using $2^{47}$ chosen plain texts and the corresponding ciphertexts.

Metaheuristics Differential Cryptanalysis
From the point of view of the developed attack, the IP and $IP^{-1}$ permutations may be omitted, since they do not affect the differential analysis. The algorithm begins by selecting the two most probable 3-round characteristics, $\Omega^1_P$ and $\Omega^2_P$, described in [31,32] and presented in Figure 3, where $P$ denotes the characteristics for plaintexts and $C$ those for ciphertexts.
Each characteristic holds with probability $P_\Omega = 1/16$. In the fourth round of the encryption algorithm, the S-blocks $S_2, S_5, S_6, S_7, S_8$ for $\Omega^1_P$ and $S_1, S_2, S_4, S_5, S_6$ for $\Omega^2_P$ receive an input symmetric difference $B'_j$ for which the output symmetric difference $C'_j$ is equal to zero.
This makes it possible to define the sets $I_1 = \{2, 5, 6, 7, 8\}$ for $\Omega^1_P$ and $I_2 = \{1, 2, 4, 5, 6\}$ for $\Omega^2_P$. The rest of the attack is identical for both characteristics, so it is described for a generic set $I$, which stands for either $I_1$ or $I_2$.
The next step is to generate a set of plain text pairs, together with the corresponding ciphertexts, whose symmetric differences correspond to the characteristics $\Omega_1$ and $\Omega_2$. The number of pairs needed is calculated using the signal-to-noise ratio [3]:
$$S/N = \frac{m \cdot p}{m \cdot \alpha \cdot \beta / 2^{k}} = \frac{2^{k} \cdot p}{\alpha \cdot \beta},$$
where:
• $m$: the number of pairs generated (it cancels out and has no effect on S/N);
• $p$: the probability of the selected characteristic $\Omega$;
• $k$: the number of subkey bits being sought;
• $\alpha$: the average number of subkeys suggested by one pair;
• $\beta$: the ratio of the analysed pairs to all pairs.
As suggested in [3], for $S/N = 2^{16}$, 7-8 right pairs are needed for each characteristic. Since $P_\Omega = 1/16$, a minimum of 150-200 plain text pairs should be generated [3].
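The arithmetic behind these numbers can be checked with a short sketch. The function names are illustrative, and the values of $\alpha$ and $\beta$ used in the checks are hypothetical placeholders rather than values from the article. With $p = 1/16$, about $m \cdot p$ of $m$ pairs are right pairs, so 7-8 right pairs require roughly 112-128 pairs; the 150-200 pairs suggested in [3] exceed this expected minimum.

```python
import math

def signal_to_noise(p: float, k: int, alpha: float, beta: float) -> float:
    """S/N = (m*p) / (m*alpha*beta / 2**k) = 2**k * p / (alpha*beta); m cancels out."""
    return (2 ** k) * p / (alpha * beta)

def pairs_needed(right_pairs: int, p: float) -> int:
    """Expected number of generated pairs needed to obtain `right_pairs` right pairs."""
    return math.ceil(right_pairs / p)

print(pairs_needed(7, 1 / 16), pairs_needed(8, 1 / 16))  # 112 128
```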
Additionally, the set $test_j$ is determined for each $j \in I$, which makes it possible to filter out part of the pairs: if the test set is empty for at least one element of $I$, the pair cannot be a right pair and may be rejected:
$$\exists\, j \in I : |test_j| = 0 \;\Rightarrow\; \text{the pair is rejected}.$$
The aim of the suggested attack is to guess the last encryption subkey $K_6$. If the output difference $C'$ and the relevant part of $R_5$ are known, the candidate subkeys can be analysed closely by comparing all output bits of the S-blocks with $C'$. Since the five S-blocks in $I$ are fed by $5 \times 6 = 30$ subkey bits, a brute-force attack would need to check all $2^{30}$ candidates. MA can be used as an optimization tool that finds the correct solution in a much shorter time.
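The filtering rule can be sketched for a single S-block. Because $test_j$ is obtained from $IN_j$ by XOR with a constant, it is empty exactly when $IN_j(E'_j, C'_j)$ is empty, i.e., when the corresponding entry of the difference distribution table is zero. The sketch assumes, for illustration only, that $I$ contains just S-block $S_1$; `keep_pair` and `SBOXES` are hypothetical names.

```python
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1(b: int) -> int:
    return S1[((b >> 4) & 0b10) | (b & 0b1)][(b >> 1) & 0b1111]

SBOXES = {1: s1}  # hypothetical: only S1 is modelled in this sketch

def keep_pair(e_diffs: dict, c_diffs: dict, index_set) -> bool:
    """Reject the pair if IN_j(E'_j, C'_j), and hence test_j, is empty for some j in I."""
    for j in index_set:
        sbox = SBOXES[j]
        if not any(sbox(b) ^ sbox(b ^ e_diffs[j]) == c_diffs[j] for b in range(64)):
            return False
    return True
```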
Each individual is represented by a 30-bit candidate subkey $K_j$. The fitness function $F_f$ is built from the following quantities:
• $H$: the Hamming distance;
• $L$: the length of the subkey.
Thanks to the characteristic, the value of $L'_3$ can be predicted (the prediction holds with probability $P_\Omega$), while $R'_6$ can be obtained by analysing a pair of generated ciphertexts. $F_f$ counts the number of matching bits between the difference obtained from the S-blocks for the candidate subkey and the expected difference $C'$.
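The following sketch illustrates the idea of counting matching bits between the S-block output difference obtained for a candidate subkey and the expected difference $C'$. It is a simplified, hypothetical variant, not the exact formula of the article: it models only S-block $S_1$ (6 subkey bits instead of 30), assumes that the S-block inputs $E$, $E^*$ and the expected differences are already known for each pair, and sums the number of matching bits, $4 - H$, over all pairs.

```python
import random

S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1(b: int) -> int:
    return S1[((b >> 4) & 0b10) | (b & 0b1)][(b >> 1) & 0b1111]

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def fitness(candidate: int, pairs) -> int:
    """Sum over pairs of matching output-difference bits (4 - Hamming distance)."""
    score = 0
    for e, e_star, c_expected in pairs:
        c_computed = s1(e ^ candidate) ^ s1(e_star ^ candidate)
        score += 4 - hamming(c_computed, c_expected)
    return score

# Hypothetical noise-free data generated with a known secret subkey.
rng = random.Random(3)
secret = 0b011010
pairs = []
for _ in range(10):
    e, e_star = rng.randrange(64), rng.randrange(64)
    pairs.append((e, e_star, s1(e ^ secret) ^ s1(e_star ^ secret)))
best = max(range(64), key=lambda k: fitness(k, pairs))
```

With noise-free pairs, as here, the true subkey reaches the maximum score; in the real attack some pairs are wrong pairs, which is why the maximum value of the fitness function is never reached.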
The algorithm uses standard one-point crossover, with the locus selected pseudorandomly from 1 to 30. The newly created subkeys can be modified by a mutation operator, which replaces two pseudorandomly selected bits. Individuals are selected by tournament selection: a leader is chosen from the set of subkeys and passed to the crossover operator.
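The three operators can be sketched on 30-bit integers as follows. This is an illustrative implementation under stated assumptions: the mutation is read as flipping two distinct random bits (one possible reading of "replacing"), the tournament returns the fittest of `t_size` randomly drawn individuals, and all function names are hypothetical.

```python
import random

KEY_BITS = 30  # length of a candidate subkey (individual)

def one_point_crossover(a: int, b: int, rng: random.Random) -> tuple:
    """Swap the tails of two 30-bit individuals after a random locus in 1..30."""
    locus = rng.randint(1, KEY_BITS)               # number of leading bits kept
    tail = (1 << (KEY_BITS - locus)) - 1           # mask of the bits after the locus
    head = ((1 << KEY_BITS) - 1) ^ tail            # mask of the leading bits
    return (a & head) | (b & tail), (b & head) | (a & tail)

def mutate(x: int, rng: random.Random) -> int:
    """Flip two distinct, randomly chosen bits (assumed reading of 'replacing')."""
    for pos in rng.sample(range(KEY_BITS), 2):
        x ^= 1 << pos
    return x

def tournament(population: list, fitness, t_size: int, rng: random.Random) -> int:
    """Return the fittest of t_size randomly drawn individuals."""
    return max(rng.sample(population, t_size), key=fitness)
```

A locus of 30 leaves both parents unchanged, matching the range 1 to 30 given above.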
The algorithm includes an additional local search process, performed using simulated annealing. Because of its complexity, the pseudocode of the MASA attack for the $\Omega_P$ characteristic is divided into two parts:
• Algorithm 1 generates a set of filtered pairs of plain texts and ciphertexts and determines the set $test_j$ for each index $j$;
• Algorithm 2 describes the memetic algorithm, including selection, crossover, mutation and exploitation, the last of which is based on the basic simulated annealing algorithm.
Running the MASA algorithm for $\Omega^1_P$ makes it possible to guess 30 of the 48 bits of the subkey $K_6$. Re-running it for $\Omega^2_P$ yields another 12 bits (those entering $S_1$ and $S_4$, the S-blocks in $I_2$ but not in $I_1$). The remaining 6 bits of $K_6$, which enter the S-block $S_3$, can be found by brute force. Once $K_6$ is known, 48 of the 56 bits of the key can be recovered by reversing the key schedule, and the remaining 8 bits can again be found by exhaustive search over $2^8 = 256$ candidates.
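The local search step can be illustrated with a textbook simulated annealing routine for bit-string individuals. This is a generic sketch, not the article's Algorithm 2: the single-bit-flip neighbourhood, the geometric cooling schedule and the parameter values `t0`, `cooling` and `steps` are assumptions chosen for illustration, and the target used in the example is hypothetical.

```python
import math
import random

def simulated_annealing(start: int, fitness, n_bits: int, t0: float = 1.0,
                        cooling: float = 0.95, steps: int = 200, rng=None) -> tuple:
    """Maximise `fitness` over n_bits-bit integers; return (best, best_fitness)."""
    if rng is None:
        rng = random.Random(0)
    current, current_fit = start, fitness(start)
    best, best_fit = current, current_fit
    t = t0
    for _ in range(steps):
        neighbour = current ^ (1 << rng.randrange(n_bits))   # flip one random bit
        neighbour_fit = fitness(neighbour)
        delta = neighbour_fit - current_fit                   # > 0 means improvement
        # Metropolis rule: accept improvements, accept worse moves with prob. exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = neighbour, neighbour_fit
            if current_fit > best_fit:
                best, best_fit = current, current_fit
        t *= cooling                                          # geometric cooling
    return best, best_fit

TARGET = 0b1011001110  # hypothetical target for the demonstration

def match(x: int) -> int:
    """Number of bits of x that agree with TARGET (10-bit OneMax-style fitness)."""
    return 10 - bin(x ^ TARGET).count("1")

best, best_fit = simulated_annealing(0, match, n_bits=10, rng=random.Random(1))
```

Accepting some worse moves while the temperature is high is what allows the search to leave local optima; as the temperature drops, the routine behaves more and more like greedy hill climbing.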

Experimental Results
This chapter analyses the proposed memetic attack MASA and the NGA algorithm in terms of the quality and number of solutions obtained [22]. It was important to check whether the suggested algorithms shorten the time needed to find the correct subkey, and whether the MASA memetic algorithm enables a more effective, and therefore more successful, differential cryptanalysis.

Selecting Parameters
As part of the experiments, the impact of the following parameters on the convergence of each attack and on the quality of the obtained solutions was examined:
• number of iterations for the MASA and NGA algorithms;
• population size for the MASA and NGA algorithms;
• number of plaintext and ciphertext pairs γ for the MASA and NGA algorithms;
• probability of the heuristic negation $P_n$ for the NGA algorithm.
In the experiments, the parameter values were tested in various combinations, and the values giving the best running time were used in the subsequent experiments. The parameters of the MASA memetic algorithm are listed in Table 2. The parameters of the NGA algorithm are described in detail in [19]; Table 3 presents the most important of them, including:
• tournament size $T_{SIZE}$: 10;
• crossover probability $P_c$: 0.9;
• mutation probability $P_m$: 0.02;
• heuristic operator probability $P_n$: 0.25.
As mentioned before, a simplified version of the DES cipher was used for the tests, with the number of rounds reduced from 16 to 6. All other processes of the encryption algorithm, such as subkey generation and S-block compression, remained unchanged.

Comparative Study
Each algorithm was run 30 times for each characteristic Ω. Table 4 shows the value of the fitness function $F_f$ for the MASA and NGA algorithms for the first characteristic $\Omega_1$; the results for $\Omega_2$ are given in Table A1 in Appendix A. Experiments in which the correct decryption key could not be guessed are marked in bold in Table 4.
The characteristics of this cipher do not hold with 100% probability. Consequently, although the algorithm strives for the maximum value of the fitness function, that value is never achieved, and the algorithm cannot be stopped before all predetermined iterations have been completed. In many cases, the MASA attack finds the correct subkey within the first 25 iterations. In about 6-7 cases, it needed half of the available iterations, while in two cases (tests #3 and #21, marked in red in the figure) the attack failed to break the given ciphertext. The algorithm found the correct decryption subkey in 28 of 30 cases (93%), marked in green in the figure. The NGA algorithm failed to break the cipher 11 times, i.e., in nearly 37% of all runs (red bars in the figure). In the remaining 63% of the tests, it recovered the subkey (green bars); in most of these cases, the correct subkey was guessed within 30-40 iterations. Tests #1 and #26 deserve special attention: they required a very large number of iterations (over 80), which means that the NGA algorithm found the correct solution at the very end of its run.
The bar plots show that the MASA algorithm is considerably more effective: it found the correct subkey in almost every test, whereas the NGA attack succeeded in only 63% of the experiments. This suggests that simulated annealing, used as an additional exploitation step of the MA, is more effective than the heuristic negation operator used in the NGA attack.
The next stage of the experiments was to analyse how the fitness function value evolves, using the convergence diagrams for $\Omega_1$ presented in Figure 6 for the MASA attack and in Figure 7 for the NGA algorithm. The convergence diagrams for $\Omega_2$ are presented in Figure A3 (MASA) and Figure A4 (NGA) in Appendix A. The diagrams show tests #3 and #4, with the minimum, maximum, median and mean values of the fitness function, as well as the mean increased and decreased by one standard deviation. The tests were selected so as to show both a positive case, in which the correct subkey was guessed, and a negative one.
In both MASA tests, the maximum value of $F_f$ rises rapidly at the very beginning of the run. In further iterations, this value occasionally drops, then stabilizes and increases again. The median remains similar for about 60% of the running time and decreases only at the very end. In the diagram for case #4, a rapid increase in the median can be observed already in the first iterations: most individuals in the population have a similar fitness value. This may indicate that the algorithm fell into a local optimum which it did not manage to leave.
The next stage of the tests was to review the distribution of the fitness function values in the last iteration of each attack; the distribution is presented in Figure 8 for the MASA algorithm and in Figure 9 for the NGA attack. Boxplots for the $\Omega_2$ characteristic are presented in Figure A5 (MASA) and Figure A6 (NGA) in Appendix A.
In the case of the MASA algorithm, some tests, for example #18, #20 and #22, are highly homogeneous, which means that the population has a low diversity of individuals. Across the attacks, a large degree of variability between individuals can be observed, as indicated by the median, whose position varies between the first and the third quartile. In some NGA experiments, an unexpected increase in the fitness function value can be observed at the very end of the run, as evidenced by an outlier at the maximum value.
Both the MASA and NGA attacks are stochastic, as they rely on pseudorandom choices. To verify the results statistically, the non-parametric Wilcoxon test was used to compare them. The null hypothesis $H_0$ assumed no difference between the samples, and the alternative hypothesis $H_1$ assumed a difference between them. The test was performed for the following criteria:
• the value of the fitness function, for the best subkeys found in each run;
• the number of subkeys checked.
Both criteria were given equal weights of 0.5. For the analyses performed, $H_0$ was rejected at $p < 0.05$, indicating statistically significant differences between the best results obtained. The results obtained by the MASA algorithm are significantly better than those of the NGA attack.

Entropy Study
A highly diverse population may improve the algorithm's ability to avoid getting stuck in local optima. To estimate the degree of disorder in the system, the Shannon entropy was used:
$$H(X) = -\sum_{i} p(x_i) \log_2 p(x_i).$$
The entropy was computed by comparing the respective bits of each subkey with the corresponding bits of the best-adapted individual (the leader). An example for the population $P = \{11101, 10101, 11011, 11110\}$, where the last individual, 11110, is the leader, is presented in Table 5, where:
• $p(x_1)$: the probability that the bit at a given position is identical in an individual and in the leader;
• $p(x_2)$: the probability that the bit at a given position differs between an individual and the leader;
• $H(x_1)$: the entropy term for the probability $p(x_1)$ at a given position;
• $H(x_2)$: the entropy term for the probability $p(x_2)$ at a given position.
Based on the example in Table 5, the entropy of the entire system can then be computed. The entropy of the MASA and NGA algorithms is visualized in Figures 10 and 11, respectively; the charts show the maximum, minimum and average values. In addition, the average entropy of both attacks is shown on one graph in Figure 12. The entropy value was computed in every iteration of each of the 30 runs of the MASA and NGA attacks. All tests used identical pairs of plain texts and corresponding ciphertexts, as well as the same encryption key, which made the comparison as reliable as possible.
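The per-position entropy for the example population can be reproduced with a short sketch. It rests on two assumptions, since the article's aggregation formula is not reproduced here: the leader is compared with every individual, itself included, and the entropy of the whole population is taken as the mean of the per-position values. The function names are illustrative.

```python
import math

def binary_entropy(p_same: float) -> float:
    """H = -p*log2(p) - (1-p)*log2(1-p), with 0*log2(0) taken as 0."""
    h = 0.0
    for p in (p_same, 1.0 - p_same):
        if p > 0:
            h -= p * math.log2(p)
    return h

def population_entropy(population: list, leader: str) -> tuple:
    """Per-position entropy of agreement with the leader, and its mean (assumed aggregation)."""
    per_position = []
    for i in range(len(leader)):
        same = sum(1 for ind in population if ind[i] == leader[i])
        per_position.append(binary_entropy(same / len(population)))
    return sum(per_position) / len(per_position), per_position

mean_h, per_pos = population_entropy(["11101", "10101", "11011", "11110"], "11110")
```

For this population, the first position agrees with the leader in every individual (entropy 0), the fourth position agrees in exactly half of them (entropy 1), and the remaining positions agree in three of four (entropy of about 0.811), giving a mean of about 0.687.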
Figures 10 and 11 show that the entropy decreases from the very beginning of each algorithm's run. In the last iterations, the system gradually stabilizes, which would probably be more noticeable with a larger number of iterations. Comparing the average courses in Figure 12, the entropy of the MASA attack is lower from the very beginning. Only from about the thirtieth iteration does the NGA algorithm reach a similar, and sometimes even lower, value. Eventually, the entropy of the NGA algorithm stabilizes at around the sixtieth iteration, while that of the MASA attack continues to decrease, so that by the end of the runs a clear difference in entropy between the attacks becomes visible.
The experiments described above confirm the effectiveness of the suggested MASA attack, based on memetic algorithms and simulated annealing. Monitoring the entropy may also be useful while the algorithm is running: as long as the population remains diverse, the probability of leaving a local optimum is higher, and thus the final results are likely to be better.

Conclusions
The article presents the results for the NGA genetic algorithm, enriched with an additional heuristic negation operator, and for the MASA memetic algorithm, which performs local search through simulated annealing. Both algorithms improve the differential cryptanalysis attack against ciphertexts generated with the DES standard. An important aspect is the reduction of the number of verified subkeys, which is summarised in Table 6. The developed algorithms improve the effectiveness and efficiency of the attack, which is extremely important from the point of view of a cryptanalyst. The presented metaheuristic cryptanalysis, based on the differential cryptanalysis approach, can help raise the security level of already deployed IT systems, and it can also be used to strengthen ciphers at the design level. The proposed attacks, verified on the DES cipher, could also be tested on more complex modern encryption algorithms, such as AES or GOST.
Based on the tests presented in the previous section and in Table 6, the MASA attack and the NGA algorithm are clearly superior to the classic differential cryptanalysis attack, both in how often the subkey is guessed correctly and in the number of checked solutions. Many parameters influence the quality of the solutions obtained. To analyse the importance of individual parameters, we intend to remove some of them or replace them with simplified versions, without losing the quality of the solutions. Such an approach (an ablation study) is very common when estimating the costs of deep learning solutions, and we hope that it will also be very effective here.
Work is currently underway on modifications of the developed attack, which would enable an even faster exploration of the solution space. In the future, an adaptive version of the memetic algorithm is expected to be developed to automatically adjust the attack parameters. A parallel implementation is also planned, which should be much more effective.
Simplified and original versions of DES are commonly used by cryptanalysts as a starting point for research and experimental studies in this field, as the literature review in Table 1 in the introduction shows. The authors therefore decided to use a reduced DES cipher to develop the new metaheuristic attacks described in this paper, since starting the experiments with modern ciphers could be too complicated and would significantly extend the research process. Having verified the approach on DES, we can now test the proposed algorithms against more advanced symmetric block ciphers, such as Twofish, AES or GOST, which will be the next step of future work.

Conflicts of Interest:
The authors declare no conflict of interest.
