1. Introduction
Since its introduction by Biham and Shamir in 1990 [1], differential cryptanalysis has become one of the fundamental techniques for the cryptanalysis of block ciphers. The main idea of differential cryptanalysis is to distinguish a block cipher from a random permutation by checking whether the ciphertext difference is related to the plaintext difference.
The primary goal of differential cryptanalysis is to find a pair of plaintext and ciphertext differences that occurs with high probability. Many methods have been proposed to find such differential pairs [2,3,4], and automatic search that models the problem for off-the-shelf solvers [5,6,7,8] has recently received attention.
Most block ciphers are published together with an analysis of their resistance to differential cryptanalysis. However, this resistance is sometimes argued by computing an upper bound on the probability of a single differential characteristic rather than of a differential. Since differential cryptanalysis exploits a differential while ignoring the intermediate values, the resistance may be overestimated if there is a significant gap between the differential probability and the differential characteristic probability. This phenomenon is noticeable in the lightweight block ciphers Midori64 [9], SKINNY64 [10], and CRAFT [11].
Finding a differential and computing its probability accurately is a fundamental goal of differential cryptanalysis. A basic method to compute the differential probability is to find all the differential characteristics and sum their probabilities. However, finding all differential characteristics for a given differential is generally infeasible. In our research, instead of trying to find all the differential characteristics, we use a multistage graph to find the differential characteristics that follow a fixed activity pattern. For a given differential, we construct a multistage graph in which each path from the source to the sink corresponds to a differential characteristic. Using the graph, we can count the number of paths from the source to the sink, that is, the number of differential characteristics, and thereby compute a more accurate differential probability.
1.1. Previous Work
Several algorithms have been proposed to find a differential characteristic with high probability. Matsui's branch-and-bound algorithm [4], which finds differential characteristics of DES-like ciphers, recursively searches for the best characteristic of $n$ rounds based on the best characteristics of fewer rounds. Building on the branch-and-bound algorithm, an improved algorithm [2] that reduces the search space using a pre-search technique and a related-key differential characteristic search algorithm [3] have been proposed.
Recently, several automatic methods have been proposed that find differential characteristics using off-the-shelf solvers for Mixed Integer Linear Programming (MILP), the Boolean satisfiability problem (SAT), and Satisfiability Modulo Theories (SMT). Starting from the method of Mouha et al. [8], which counts the number of active S-boxes using an MILP solver, many solver-based methods have been proposed, such as [5,7,12], and they have been used to show the resistance of the block ciphers Midori, SKINNY, and CRAFT to differential cryptanalysis [9,10,11]. These methods model differential propagation in a block cipher as a specific problem instance and solve it with MILP, SAT, or SMT solvers. A detailed comparison of differential cryptanalysis with MILP, SAT, and SMT is presented in [13].
There are also studies that go beyond a single differential characteristic and consider the differential of a block cipher. Methods using off-the-shelf solvers can easily be extended to find a collection of differential characteristics with fixed plaintext and ciphertext differences. With such methods, the differential probabilities of Salsa20 [14], SIMON [15], SPECK and LEA [6], and PRINCE and QARMA [16] have been computed. In addition, one study uses an SMT solver to find the differential probabilities of SKINNY and Midori [17].
Other methods first find a truncated differential characteristic and then compute the differential probability by multiplying transition probability matrices. These methods have been used to find the differential probabilities of TWINE [18], PRINCE [19], LAC [20], and CRAFT [21]. In addition, a method using a subgraph of a multistage graph that represents all differential characteristics [22] and the meet-in-the-middle method [23,24], which uses a cluster built by forward and backward search, have also been proposed.
1.2. Contributions
In this paper, we present practical algorithms to compute differential probabilities using a multistage graph, based on the meet-in-the-middle method using a cluster in [23]. We consider differential characteristics with a fixed activity pattern in block ciphers with a Substitution-Permutation Network (SPN) structure. We construct a multistage graph from the vertex of the plaintext difference to the vertex of the ciphertext difference. Using this graph, we can count the exact number of maximum-probability differential characteristics that follow a given activity pattern. Moreover, we present generalized algorithms that also consider characteristics of lower probability than the maximum. The algorithms presented in this paper run in practical time on a personal computer, even for the full number of rounds of a block cipher.
Using our method, we present a differential distinguisher for 9-round Midori64 with probability , 9-round SKINNY64 with probability and 14-round CRAFT with probability . Compared to previous works, we present differential distinguishers for Midori64 and SKINNY64 that are one round longer. Furthermore, we present the related-tweakey differential distinguisher with probability and the related-tweak differential distinguisher with probability . We find that the gap between the differential probability and the differential characteristic probability is for Midori64, for SKINNY64 and for CRAFT. These large gaps indicate that it is not appropriate for block cipher designers to consider only a single differential characteristic to show resistance to differential cryptanalysis.
Our algorithms are applicable to block ciphers with S-boxes and word-based diffusion layers. However, the gaps between the differential and characteristic probabilities are noticeably large only in Midori64, SKINNY64, and CRAFT. We also explain these large gaps by relating them to the weak diffusion layers and the S-boxes of these block ciphers.
1.3. Outline
In Section 2, we introduce some background on differential cryptanalysis. In Section 3, we use a SAT solver to show that there are many characteristics with fixed input and output differences. In Section 4, we present algorithms for computing the differential probability by counting characteristics using graphs. The results of our algorithms and a discussion are given in Section 5, and Section 6 concludes the paper.
2. Preliminaries
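A block cipher $E$ is a function from $\{0,1\}^k \times \{0,1\}^n$ to $\{0,1\}^n$ given by $(K, P) \mapsto E_K(P) = C$, where $E_K(\cdot)$ is a permutation of $\{0,1\}^n$. $K$ is called the $k$-bit key, and $P$ and $C$ are called the $n$-bit plaintext and ciphertext, respectively. Most block ciphers are iterated block ciphers, which are decomposed into $r$ simple round functions.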
In many ciphers, the $n$-bit input of the round function is divided into $n/\ell$ words of $\ell$ bits each. In this paper, we focus on ciphers in which the nonlinear part of the round function consists of 4-bit S-boxes, i.e., $\ell = 4$. We denote by $X_i$ the output of the $i$-th round function $f_i$, which is also the input of $f_{i+1}$, and by $X_i[j]$ the $j$-th word of $X_i$. We also denote by $S_i$ the S-layer of the $i$-th round, by $Y_i$ the output of $S_i$, and by $L_i$ the linear layer of the $i$-th round.
2.1. Description of Midori64
Midori [9] is a family of lightweight block ciphers proposed in 2015 and designed for efficient hardware implementations. There are two variants: Midori64 and Midori128. Midori64 employs a 64-bit block, and Midori128 employs a 128-bit block; both employ a 128-bit key. Since we focus on Midori64, we omit the description of Midori128.
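In Midori64, the 64-bit plaintext $P = p_0 \| p_1 \| \cdots \| p_{15}$ is represented as a $4 \times 4$ array of 4-bit cells, filled column by column:
$$P = \begin{pmatrix} p_0 & p_4 & p_8 & p_{12} \\ p_1 & p_5 & p_9 & p_{13} \\ p_2 & p_6 & p_{10} & p_{14} \\ p_3 & p_7 & p_{11} & p_{15} \end{pmatrix}.$$
The encryption process of Midori64 is described in Algorithm 1.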
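Algorithm 1. Encryption process of Midori64
Input: 64-bit plaintext $P$, 128-bit key $K = K_0 \| K_1$
Output: 64-bit ciphertext $C$
1. $WK \leftarrow K_0 \oplus K_1$
2. $S \leftarrow P \oplus WK$
3. for $i = 0$ to $14$:
3-1. $S \leftarrow \mathrm{SubCell}(S)$
3-2. $S \leftarrow \mathrm{ShuffleCell}(S)$
3-3. $S \leftarrow \mathrm{MixColumn}(S)$
3-4. $S \leftarrow S \oplus K_{i \bmod 2} \oplus \alpha_i$
4. $S \leftarrow \mathrm{SubCell}(S)$
5. $C \leftarrow S \oplus WK$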
The S-box of Midori64 is given in Table 1. The round constants used in the key addition have no impact on differential cryptanalysis, so we do not describe them further.
2.2. Description of SKINNY64
SKINNY [10] is a tweakable lightweight block cipher proposed in 2016. Like Midori, SKINNY has two variants: SKINNY64 and SKINNY128, which employ 64-bit and 128-bit blocks, respectively. Since SKINNY follows the tweakey framework [25], it takes a unified tweakey input that integrates the key and the tweak without distinguishing between them. The tweakey can be constructed flexibly: for a block size $n$ (64 or 128), there are three options for the tweakey size, $t = n$, $2n$, or $3n$. Since we focus on SKINNY64, we omit the description of SKINNY128.
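In SKINNY, the 64-bit plaintext $P = p_0 \| p_1 \| \cdots \| p_{15}$ is represented as a $4 \times 4$ array of 4-bit cells, filled row by row:
$$P = \begin{pmatrix} p_0 & p_1 & p_2 & p_3 \\ p_4 & p_5 & p_6 & p_7 \\ p_8 & p_9 & p_{10} & p_{11} \\ p_{12} & p_{13} & p_{14} & p_{15} \end{pmatrix}.$$
The encryption process of SKINNY64 is described in Algorithm 2.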
Algorithm 2. Encryption process of SKINNY64 |
Input: 64-bit , , Output: 64-bit
4-1.
4-2.
4-3.
4-4.
4-5.
4-6.
4-7.
4-8.
|
The S-box of SKINNY64 is given in Table 2. In this paper, we omit the detailed description of the tweakey schedule and the round constants.
2.3. Description of CRAFT
CRAFT [11] is a tweakable lightweight block cipher proposed in 2019. CRAFT is designed so that its implementations can be efficiently protected against Differential Fault Analysis (DFA), a fault-injection attack. CRAFT takes a 64-bit plaintext block, a 128-bit key, and a 64-bit tweak. The plaintext of CRAFT is represented in the same way as in SKINNY.
The encryption process of CRAFT is described in Algorithm 3. The linear layer of CRAFT is very simple: it involves only XORing the third row to the first and the third and fourth rows to the second.
The S-box of CRAFT is the same as the Midori64 S-box given in Table 1. We omit the detailed description of the round constants, because they have no impact on differential cryptanalysis.
Algorithm 3. Encryption process of CRAFT |
Input: 64-bit , 128-bit , 64-bit Output: 64-bit
5-1. 5-2. 5-3. 5-4. 5-5.
5-6. 5-7.
|
2.4. Differential Cryptanalysis
Differential cryptanalysis is one of the fundamental cryptanalytic techniques for block ciphers, and it has many variants, such as higher-order differential [26], impossible differential [27], and boomerang [28] cryptanalysis. Below, we introduce some notation and briefly describe differential cryptanalysis.
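The differential of a function $f$ is a pair of input and output differences $(\alpha, \beta)$, and the differential probability is the probability that the difference between $f(x)$ and $f(x \oplus \alpha)$ is equal to $\beta$, i.e.,
$$\mathrm{DP}_f(\alpha, \beta) = \Pr_x\left[f(x) \oplus f(x \oplus \alpha) = \beta\right].$$
We say that $\alpha$ propagates to $\beta$ through $f$ with probability $\mathrm{DP}_f(\alpha, \beta)$.
The attacker tries to distinguish whether an oracle $\mathcal{O}$ is the fixed-key block cipher $E_K$ or a random permutation $\Pi$ using the differential $(\alpha, \beta)$. The attacker randomly chooses a plaintext $P$ and checks whether the following equation holds:
$$\mathcal{O}(P) \oplus \mathcal{O}(P \oplus \alpha) = \beta.$$
If $\mathrm{DP}_{E_K}(\alpha, \beta)$ is sufficiently larger than $2^{-n}$, the attacker can distinguish the block cipher from a random permutation, because $\mathrm{DP}_{\Pi}(\alpha, \beta)$ is approximately $2^{-n}$. In general, it is computationally infeasible to compute the differential probability of a block cipher exactly. Therefore, the differential characteristic probability is often used to approximate the differential probability.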
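To make the definition of the differential probability concrete for a single 4-bit S-box, the following Python sketch builds the difference distribution table (DDT), in which the probability of $a \to b$ is the entry $\mathrm{DDT}[a][b]$ divided by 16. The S-box values are the Midori64 S-box Sb0 as we recall it from the Midori specification and should be checked against Table 1 before reuse; the helper names `ddt` and `differential_probability` are our own.

```python
# Midori64 S-box Sb0 (also used by CRAFT), as we recall it from the Midori specification.
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def ddt(sbox):
    """Difference distribution table: ddt(sbox)[a][b] = #{x : S(x) ^ S(x ^ a) == b}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def differential_probability(table, a, b):
    """DP_S(a, b): the DDT entry divided by the number of inputs (16 for 4-bit S-boxes)."""
    return table[a][b] / len(table)


TABLE = ddt(MIDORI_SB0)
# Largest entry over nonzero input differences (the differential uniformity).
MAX_ENTRY = max(TABLE[a][b] for a in range(1, 16) for b in range(16))
print(MAX_ENTRY, differential_probability(TABLE, 1, 2))  # 4 0.25
```

A maximum entry of 4 corresponds to the maximum transition probability $2^{-2}$ of a single active S-box.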
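The differential characteristic of an $r$-round block cipher is a sequence of differences $(\Delta_0, \Delta_1, \ldots, \Delta_r)$, and the differential characteristic probability is the probability that, in every round $i$, the output difference of the $i$-th round function for $P$ and $P \oplus \Delta_0$ is equal to $\Delta_i$, i.e.,
$$\mathrm{DCP}(\Delta_0, \ldots, \Delta_r) = \Pr_P\left[X_i \oplus X'_i = \Delta_i \text{ for all } 1 \le i \le r\right],$$
where $X_i$ and $X'_i$ denote the outputs of the $i$-th round function for the plaintexts $P$ and $P \oplus \Delta_0$, respectively.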
Under the assumption that the round keys of a block cipher are independent [29], the probability of a differential characteristic is easy to compute by multiplying the propagation probabilities of the S-boxes. We say that an S-box is active if its input difference is nonzero, and we define the activity pattern $\delta$ as a vector in which each position indicates whether the corresponding S-box is active. We define a maximum differential characteristic as a differential characteristic in which every difference propagates through each active 4-bit S-box with probability $2^{-2}$, the maximum for the S-boxes considered in this paper.
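Because the characteristic probability is a product of S-box transition probabilities under the independent-round-key assumption, it is convenient to work with $\log_2$ values that add up. The sketch below assumes a hypothetical 4-word state and the Midori64 S-box as we recall it from the Midori specification (to be checked against Table 1). Each round is given as the input and output differences of its S-layer; the deterministic linear layer between S-layers is not checked here.

```python
import math

# Midori64 S-box Sb0, as we recall it from the Midori specification.
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
DDT = [[sum(MIDORI_SB0[x] ^ MIDORI_SB0[x ^ a] == b for x in range(16))
        for b in range(16)] for a in range(16)]


def activity_pattern(words):
    """1 for each active (nonzero-input) S-box, 0 otherwise."""
    return tuple(int(w != 0) for w in words)


def s_layer_log2_prob(x_words, y_words):
    """log2 Pr[x -> y] through parallel 4-bit S-boxes; -inf if impossible."""
    total = 0.0
    for x, y in zip(x_words, y_words):
        if DDT[x][y] == 0:
            return -math.inf
        total += math.log2(DDT[x][y] / 16)  # an inactive word contributes log2(16/16) = 0
    return total


def characteristic_log2_prob(s_layer_transitions):
    """log2 of the product of per-round S-layer probabilities."""
    return sum(s_layer_log2_prob(x, y) for x, y in s_layer_transitions)


# Hypothetical 2-round example: (input, output) differences of each S-layer.
char = [((1, 0, 0, 1), (2, 0, 0, 2)), ((0, 2, 0, 2), (0, 1, 0, 1))]
print(characteristic_log2_prob(char))          # -8.0
print([activity_pattern(x) for x, _ in char])  # [(1, 0, 0, 1), (0, 1, 0, 1)]
```

Every active S-box in this example uses a transition of probability $2^{-2}$, so it is a maximum characteristic with four active S-boxes.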
3. Computing Differential Probability Using SAT Solver
Before computing the differential probability, we need to find some differential characteristics. To find differential characteristics with a SAT solver, we model the differential propagation of the block cipher as a SAT problem and solve it with an off-the-shelf SAT solver. The method for modeling the differential propagation of the block cipher is given in [5,7]. We use the totalizer encoding [30] rather than the Sinz encoding [31] to model the cardinality constraint, because the resulting instances are empirically solved faster. All results in this work were obtained on a personal computer equipped with an AMD Ryzen 5 1600 CPU and 16 GB of RAM running Ubuntu 18.04. To solve SAT problems, we used the off-the-shelf SAT solver cryptominisat5 [32].
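The totalizer encoding can be sketched as follows, assuming DIMACS-style integer literals. This is the textbook construction (a binary tree of unary counters) restricted to the clauses needed for an at-most-$k$ bound; it is an illustration, not the encoding code used to produce the results in this paper. The brute-force check at the end stands in for a SAT solver such as cryptominisat5 and is only meant for tiny instances.

```python
from itertools import product


def totalizer(inputs, next_var):
    """Unary counter over input literals (positive ints, DIMACS style).

    Returns (outputs, clauses, next_var); outputs[c - 1] is forced true whenever
    at least c inputs are true. Only this 'upward' half is needed for at-most-k.
    """
    if len(inputs) == 1:
        return list(inputs), [], next_var
    mid = len(inputs) // 2
    left, left_clauses, next_var = totalizer(inputs[:mid], next_var)
    right, right_clauses, next_var = totalizer(inputs[mid:], next_var)
    outputs = list(range(next_var, next_var + len(inputs)))
    next_var += len(inputs)
    clauses = left_clauses + right_clauses
    for i in range(len(left) + 1):
        for j in range(len(right) + 1):
            if i + j == 0:
                continue
            # (left count >= i) and (right count >= j)  =>  (total count >= i + j)
            clause = [outputs[i + j - 1]]
            if i:
                clause.append(-left[i - 1])
            if j:
                clause.append(-right[j - 1])
            clauses.append(clause)
    return outputs, clauses, next_var


def at_most_k(inputs, k, next_var):
    outputs, clauses, next_var = totalizer(inputs, next_var)
    if k < len(outputs):
        clauses.append([-outputs[k]])  # forbid "at least k + 1 inputs are true"
    return clauses, next_var


def brute_force_check(n, k):
    """True iff every input assignment extends to a model exactly when <= k inputs are true."""
    inputs = list(range(1, n + 1))
    clauses, next_var = at_most_k(inputs, k, n + 1)
    aux = list(range(n + 1, next_var))
    for bits in product([False, True], repeat=n):
        satisfiable = False
        for extra in product([False, True], repeat=len(aux)):
            value = dict(zip(inputs + aux, bits + extra))
            if all(any((lit > 0) == value[abs(lit)] for lit in c) for c in clauses):
                satisfiable = True
                break
        if satisfiable != (sum(bits) <= k):
            return False
    return True


print(all(brute_force_check(4, k) for k in range(5)))  # True
```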
To compute the differential probability, we first obtain the best differential characteristic, with probability $2^{-q}$. Next, we add constraints to the SAT problem that fix the plaintext and ciphertext differences. Since finding all differential characteristics is infeasible, we also add constraints that restrict the characteristic probability to lie between $2^{-q}$ and $2^{-(q+c)}$ for some $c$.
For block cipher PRESENT [
33], we obtain the best 15-round differential characteristic with probability
whose plaintext and ciphertext difference
are
Then, we find all the differential characteristics that satisfy the above difference with probability from
to
. The result with a running time of 5.06 h on a personal computer is given in
Table 3. The lower bound of the 15-round PRESENT differential probability is
by summing all the differential characteristic probabilities.
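The lower bound obtained by summing characteristic probabilities is a sum of many tiny powers of two, so it is convenient to compute it in the $\log_2$ domain. In the sketch below, a histogram maps a weight $w$ (a characteristic of probability $2^{-w}$) to the number of characteristics found with that weight; the counts shown are hypothetical and are not the data of Table 3.

```python
import math


def log2_total_probability(histogram):
    """log2 of sum_w N_w * 2^(-w) for a histogram {weight w: number of characteristics N_w}.

    The smallest weight is factored out so that large weights do not underflow.
    """
    w_min = min(histogram)
    scaled = sum(count * 2.0 ** (w_min - w) for w, count in histogram.items())
    return -w_min + math.log2(scaled)


# Hypothetical counts for illustration only.
example = {62: 1, 63: 4, 64: 12}
print(log2_total_probability(example))  # -62 + log2(6), about -59.415
```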
For the block cipher SKINNY64, we obtain the best 10-round differential characteristic with probability
whose plaintext and ciphertext difference
are
To compute the differential probability, we find all the characteristics of
with probability
. We found 670,000 such characteristics, and we obtained the same number even after fixing the activity pattern. For a 12-round differential characteristic whose plaintext and ciphertext differences
are
with probability
, there are more than 1,000,000 characteristics. The result for SKINNY64 differs significantly from that for PRESENT. It also differs significantly from [17], which reports only 62,382 characteristics for the 12-round differential. This discrepancy motivated us to count the differential characteristics using the graphs described in the next section.
4. Computing Differential Probability Using Graph
Since finding all characteristics using the SAT solver is time-consuming, we take another approach that uses a graph, based on the meet-in-the-middle approach in [23]. In this section, we focus on block ciphers in which the linear layer of the round function is a word-based diffusion layer. Since a linear layer that is not preceded by an S-layer has no effect in differential cryptanalysis, we consider a round function that applies the S-layer first, that is, $f_i = L_i \circ S_i$, where $S_i$ and $L_i$ are the S-layer and the linear layer of the $i$-th round.
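First, we find a maximum differential characteristic by automatic search and obtain its activity pattern $\delta = (\delta_1, \ldots, \delta_r)$, where $\delta_i$ is the activity pattern of the $i$-th S-layer. We focus on the characteristics of the differential that follow the same activity pattern $\delta$.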
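We introduce some graph notation. A directed graph $G = (V, E)$ is a pair of sets, where $V$ is a set of vertices and $E \subseteq V \times V$ is a set of directed edges. A path in $G$ is a nonempty subgraph of $G$ of the form $(\{v_0, v_1, \ldots, v_k\}, \{(v_0, v_1), (v_1, v_2), \ldots, (v_{k-1}, v_k)\})$, where the $v_i$ are all distinct. We denote a path from a vertex $u$ to a vertex $v$ by $u \rightsquigarrow v$. We focus on a special type of directed graph called a multistage graph. A multistage graph is a directed graph $G = (V, E)$ whose vertex set is partitioned into stages $V_0, V_1, \ldots, V_k$ such that, if $(u, v) \in E$, then $u \in V_i$ and $v \in V_{i+1}$ for some $i$, and $|V_0| = |V_k| = 1$. The vertex in $V_0$ is called the source $s$, and the vertex in $V_k$ is called the sink $t$.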
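Counting the paths from the source to the sink of a multistage graph requires only one pass over the stages, because every edge goes from one stage to the next. The following Python sketch illustrates this dynamic program on a small hypothetical graph. Edges are grouped by stage so that the same vertex label may appear in different stages, just as the same difference can occur in different rounds.

```python
from collections import defaultdict


def count_paths(source, sink, layer_edges):
    """Number of source-to-sink paths in a multistage graph.

    layer_edges[i] lists the edges (u, v) from stage i to stage i + 1.
    """
    paths_to = {source: 1}          # stage 0 holds only the source
    for edges in layer_edges:
        nxt = defaultdict(int)
        for u, v in edges:
            if u in paths_to:       # u is reachable from the source
                nxt[v] += paths_to[u]
        paths_to = nxt              # earlier stages are no longer needed
    return paths_to.get(sink, 0)


layers = [
    [("s", "x"), ("s", "y")],
    [("x", "u"), ("x", "v"), ("y", "u")],
    [("u", "t"), ("v", "t")],
]
print(count_paths("s", "t", layers))  # 3
```

The three paths are s, x, u, t; s, x, v, t; and s, y, u, t.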
4.1. Counting the Maximum Differential Characteristics
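We present our method for counting the maximum differential characteristics that follow an activity pattern $\delta$ for a given differential $(\alpha, \beta)$. If the number of such characteristics can be counted, we obtain a lower bound on the differential probability: every maximum differential characteristic following $\delta$ has the same probability $2^{-2a}$, where $a$ is the total number of active S-boxes in $\delta$, so the lower bound is this probability multiplied by the number of characteristics. The main idea is to construct a multistage graph $G$ whose stages consist of vertices representing the differences after each round and to count the number of paths $s \rightsquigarrow t$.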
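For given plaintext and ciphertext differences $\alpha$ and $\beta$ and an activity pattern $\delta$, we initialize a multistage graph $G = (V, E)$ whose first stage contains only the source vertex $\alpha$ and whose last stage contains only the sink vertex $\beta$; all intermediate stages and the edge set $E$ are initially empty. The graph construction shown in Figure 1 consists of three parts: forward generation, backward generation, and match.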
4.1.1. Forward Generation
In the forward generation part, the vertices of stage $V_i$ are output differences of the round function $f_i$. We generate the vertices of $V_{i+1}$ that are reached from a vertex in $V_i$ through $f_{i+1}$ with probability $2^{-2}$ for every active S-box and that satisfy the activity pattern $\delta$. We also generate an edge $(u, v)$ from $u \in V_i$ to $v \in V_{i+1}$ if $u$ propagates to $v$ with the maximum probability. In addition, we check whether a vertex $v$ cannot propagate to any vertex of the next round with probability $2^{-2}$ per active S-box. If so, there is no path from the source to the sink through $v$, and we discard $v$. The forward generation algorithm is described in Algorithm 4. The set $\Omega(x)$ is the set of output differences reached from an input difference $x$ through the S-box $S$ with probability $2^{-2}$, i.e.,
$$\Omega(x) = \left\{ y : \Pr_z\left[S(z) \oplus S(z \oplus x) = y\right] = 2^{-2} \right\}.$$
Algorithm 4. Forward generation from to |
Input: round , graph , activity pattern Output: graph Procedure 1. for to 1-1. for 1-1-1. 2. return Procedure 1. zero vertex 2. The indices of nonzero word in 3. for 3-1. 3-2. if does not satisfy or nonzero s.t. 3-2-1. Go back to for loop 3-3. if 3-3-1. 3-4. 4. return |
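One forward generation step can be sketched in Python under stated assumptions: a hypothetical 4-word state (real ciphers use 16 words), the Midori64 S-box as we recall it from the Midori specification, and a made-up XOR-based linear layer `toy_linear` standing in for the cipher's diffusion layer. Instead of storing edges, each generated vertex carries the number of paths reaching it from the source. The helper names are ours, and the sketch does not reproduce Algorithm 4 line by line.

```python
from collections import defaultdict
from itertools import product

# Midori64 S-box Sb0, as we recall it from the Midori specification (check Table 1).
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
DDT = [[sum(MIDORI_SB0[x] ^ MIDORI_SB0[x ^ a] == b for x in range(16))
        for b in range(16)] for a in range(16)]
# OMEGA[a]: outputs reached from a with probability 2^-2; an inactive word stays 0.
OMEGA = {a: ([b for b in range(16) if DDT[a][b] == 4] if a else [0]) for a in range(16)}


def toy_linear(y):
    """Hypothetical 4-word XOR diffusion layer, for illustration only."""
    y0, y1, y2, y3 = y
    return (y0 ^ y2 ^ y3, y1 ^ y3, y2, y3)


def pattern(words):
    return tuple(int(w != 0) for w in words)


def forward_step(stage, next_pattern):
    """stage: {Delta X_{i-1}: number of paths from the source}.

    Returns {Delta X_i: count} for differences reached through the S-layer
    (2^-2 per active S-box) and the linear layer, that match next_pattern and
    that can still propagate with 2^-2 per active S-box (pruning)."""
    nxt = defaultdict(int)
    for x, n_paths in stage.items():
        for y in product(*(OMEGA[w] for w in x)):  # empty if a word has no 2^-2 output
            z = toy_linear(y)
            if pattern(z) == next_pattern and all(OMEGA[w] for w in z):
                nxt[z] += n_paths
    return dict(nxt)


stage1 = forward_step({(1, 0, 0, 1): 1}, (0, 1, 0, 1))
stage2 = forward_step(stage1, (1, 0, 0, 1))
print(stage1)  # {(0, 2, 0, 2): 1}
print(stage2)  # four vertices (a, 0, 0, a) for a in {1, 4, 9, 12}, one path each
```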
4.1.2. Backward Generation
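In backward generation, the vertices of stage $V_i$ are output differences of the S-layer $S_i$ in the $i$-th round function. We generate the vertices of $V_{i-1}$ that can propagate to a vertex in $V_i$ through the linear layer $L_{i-1}$ followed by the S-layer $S_i$, with probability $2^{-2}$ for every active S-box, and that satisfy $\delta$. We also generate the edge $(u, v)$ from $u \in V_{i-1}$ to $v \in V_i$ in the same way as in forward generation. As in forward generation, we discard a vertex $u$ if it cannot be reached from any difference of the previous round with probability $2^{-2}$ per active S-box. The backward generation algorithm is described in Algorithm 5. The set $\Omega^{-1}(y)$ is the set of input differences that propagate to an output difference $y$ through the S-box $S$ with probability $2^{-2}$, i.e.,
$$\Omega^{-1}(y) = \left\{ x : \Pr_z\left[S(z) \oplus S(z \oplus x) = y\right] = 2^{-2} \right\}.$$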
Algorithm 5. Backward generation from to |
Input: round , graph , activity pattern Output: graph Procedure 1. for to 1-1. for 1-1-1. → 2. return Procedure 1. zero vertex 2. The indices of nonzero word in 3. for 3-1. 3-2. if does not satisfy or nonzero s.t. 3-2-1. Go back to for loop 3-3. if 3-3-1. 3-4. 4. return |
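A corresponding sketch of one backward generation step, under similar assumptions: a hypothetical 4-word state, the Midori64 S-box as we recall it, and the made-up XOR layer $(y_0 \oplus y_2 \oplus y_3,\ y_1 \oplus y_3,\ y_2,\ y_3)$, which happens to be an involution, so its inverse `toy_linear_inv` has the same formula. Each vertex stores the number of paths from it to the sink.

```python
from collections import defaultdict
from itertools import product

# Midori64 S-box Sb0, as we recall it from the Midori specification (check Table 1).
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
DDT = [[sum(MIDORI_SB0[x] ^ MIDORI_SB0[x ^ a] == b for x in range(16))
        for b in range(16)] for a in range(16)]
# OMEGA_INV[b]: inputs a that reach b with probability 2^-2; an inactive word stays 0.
OMEGA_INV = {b: ([a for a in range(16) if DDT[a][b] == 4] if b else [0]) for b in range(16)}


def toy_linear_inv(x):
    """Inverse of the hypothetical toy layer (which is its own inverse)."""
    x0, x1, x2, x3 = x
    return (x0 ^ x2 ^ x3, x1 ^ x3, x2, x3)


def pattern(words):
    return tuple(int(w != 0) for w in words)


def backward_step(stage, prev_pattern):
    """stage: {Delta Y_i (output of S_i): number of paths to the sink}.

    Inverts S_i with probability 2^-2 per active S-box, then inverts L_{i-1},
    and returns {Delta Y_{i-1}: count} for vertices matching prev_pattern that
    can themselves be reached through S_{i-1} (pruning)."""
    prev = defaultdict(int)
    for y, n_paths in stage.items():
        for x in product(*(OMEGA_INV[w] for w in y)):  # candidate Delta X_{i-1}
            y_prev = toy_linear_inv(x)
            if pattern(y_prev) == prev_pattern and all(OMEGA_INV[w] for w in y_prev):
                prev[y_prev] += n_paths
    return dict(prev)


stage = backward_step({(2, 0, 0, 2): 1}, (0, 1, 0, 1))
print(stage)                                 # four vertices (0, a, 0, a), one path each
print(backward_step(stage, (1, 0, 0, 1)))    # {(2, 0, 0, 2): 4}
```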
4.1.3. Match
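After forward generation and backward generation, the multistage graph $G$ is complete except for the edges between $V_m$, the last stage produced by forward generation, and $V_{m+1}$, the first stage produced by backward generation. In the match part, we connect $u \in V_m$ and $v \in V_{m+1}$ if $u$ propagates to $v$ through the S-layer $S_{m+1}$ with probability $2^{-2}$ for every active S-box. The match algorithm is described in Algorithm 6.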
Algorithm 6. Match two vertex sets and |
Input: round , graph Output: graph Procedure 1. for 1-1. 1-2. The indices of nonzero word in 1-3. for 1-3-1. if 1-3-1-1. 2. return |
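The match step only has to test, for each vertex $u$ of the last forward stage, which S-layer outputs reachable with probability $2^{-2}$ per active S-box are present in the first backward stage. A minimal Python sketch, assuming 4-word differences and the Midori64 S-box as we recall it from the Midori specification; `match` is our own name.

```python
from itertools import product

# Midori64 S-box Sb0, as we recall it from the Midori specification (check Table 1).
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
DDT = [[sum(MIDORI_SB0[x] ^ MIDORI_SB0[x ^ a] == b for x in range(16))
        for b in range(16)] for a in range(16)]
OMEGA = {a: ([b for b in range(16) if DDT[a][b] == 4] if a else [0]) for a in range(16)}


def match(forward_stage, backward_stage):
    """Edges (u, v): u in the last forward stage, v in the first backward stage,
    with u -> v through the S-layer at probability 2^-2 per active S-box."""
    targets = set(backward_stage)
    edges = []
    for u in forward_stage:
        for v in product(*(OMEGA[w] for w in u)):
            if v in targets:
                edges.append((u, v))
    return edges


forward_stage = [(0, 2, 0, 2)]
backward_stage = [(0, a, 0, a) for a in (1, 4, 9, 12)] + [(0, 8, 0, 8)]
print(match(forward_stage, backward_stage))  # four edges; (0, 8, 0, 8) is not reachable
```

Enumerating the outputs of $u$ and looking them up in a set avoids comparing every pair of vertices from the two stages.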
In the constructed multistage graph $G$, each path from the source to the sink corresponds to exactly one differential characteristic. Thus, we can count the differential characteristics by counting the paths in $G$.
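Since we are interested only in the number of paths, we do not need to store the whole graph $G$. In forward generation, the number $N_f(v)$ of paths from the source $s$ to a vertex $v \in V_{i+1}$ is the sum of the numbers of paths to the vertices of $V_i$ connected to $v$, i.e.,
$$N_f(v) = \sum_{u \in V_i,\ (u, v) \in E} N_f(u).$$
Therefore, once the number of paths is recorded at each vertex of $V_{i+1}$, we no longer need to store $V_i$. Backward generation works in the same way with the number $N_b(u)$ of paths from $u$ to the sink $t$. Thus, we store only the last forward stage $V_m$ and the first backward stage $V_{m+1}$. In the match part, we count the number of paths from $s$ to $t$ by
$$N(s \rightsquigarrow t) = \sum_{(u, v) \in E_m} N_f(u) \cdot N_b(v),$$
where $E_m$ is the subset of $E$ that contains the edges between $V_m$ and $V_{m+1}$.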
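Putting the pieces together, the following sketch counts maximum differential characteristics while storing only path counts, as described above. Assumptions: a hypothetical 4-word state, the Midori64 S-box as we recall it from the Midori specification, the made-up involutory XOR layer `toy_linear` (so the same function serves as $L$ and $L^{-1}$), and $\beta$ taken as the output difference of the last S-layer. As a sanity check, the count should not depend on the split point $m$ between forward and backward generation.

```python
from collections import defaultdict
from itertools import product

# Midori64 S-box Sb0, as we recall it from the Midori specification (check Table 1).
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
DDT = [[sum(MIDORI_SB0[x] ^ MIDORI_SB0[x ^ a] == b for x in range(16))
        for b in range(16)] for a in range(16)]
# Probability-2^-2 transitions forward (OMEGA) and backward (OMEGA_INV); 0 stays 0.
OMEGA = {a: ([b for b in range(16) if DDT[a][b] == 4] if a else [0]) for a in range(16)}
OMEGA_INV = {b: ([a for a in range(16) if DDT[a][b] == 4] if b else [0]) for b in range(16)}


def toy_linear(y):
    """Hypothetical 4-word XOR layer; it is an involution, so it is also L^{-1}."""
    y0, y1, y2, y3 = y
    return (y0 ^ y2 ^ y3, y1 ^ y3, y2, y3)


def pattern(words):
    return tuple(int(w != 0) for w in words)


def step(stage, target_pattern, moves):
    """One forward (moves=OMEGA) or backward (moves=OMEGA_INV) generation step.

    Each new vertex accumulates the path counts of its predecessors, so the
    previous stage can be discarded afterwards."""
    nxt = defaultdict(int)
    for d, n_paths in stage.items():
        for s in product(*(moves[w] for w in d)):
            z = toy_linear(s)
            if pattern(z) == target_pattern and all(moves[w] for w in z):
                nxt[z] += n_paths
    return nxt


def count_characteristics(alpha, beta, delta, m):
    """Maximum characteristics from alpha (input of S_1) to beta (output of S_r)
    following delta = (delta_1, ..., delta_r), split after forward stage m."""
    r = len(delta)
    if pattern(alpha) != delta[0] or pattern(beta) != delta[-1]:
        return 0
    forward = {alpha: 1}                   # Delta X_0
    for i in range(1, m + 1):              # Delta X_{i-1} -> Delta X_i
        forward = step(forward, delta[i], OMEGA)
    backward = {beta: 1}                   # Delta Y_r
    for i in range(r, m + 1, -1):          # Delta Y_i -> Delta Y_{i-1}
        backward = step(backward, delta[i - 2], OMEGA_INV)
    # Match Delta X_m with Delta Y_{m+1} across S_{m+1}: sum of N_f(u) * N_b(v).
    return sum(n_u * backward.get(v, 0)
               for u, n_u in forward.items()
               for v in product(*(OMEGA[w] for w in u)))


delta = [(1, 0, 0, 1), (0, 1, 0, 1), (1, 0, 0, 1)]
print([count_characteristics((1, 0, 0, 1), (2, 0, 0, 2), delta, m) for m in range(3)])  # [4, 4, 4]
```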
For AES-like ciphers such as Midori, SKINNY, and CRAFT, we can simplify the process through precomputation. Since AES-like ciphers use MixColumns or MixRows operations, groups of words propagate differences independently of each other through two consecutive S-layers.
Figure 2 shows the independent words in the block cipher SKINNY. Although the differences in same-colored words may change through the S-layer and the MixColumns (MixRows) operation, they remain independent of those in words of different colors.
Using this property, we build a table $T$ whose entries contain the number of paths from a four-word difference to a four-word difference through two S-layers and one linear layer. Then, we can match the vertices of $V_{m-1}$ directly with those of $V_{m+1}$, instead of matching $V_m$ and $V_{m+1}$. In the match part, instead of connecting two vertices, we compute the number of paths between $u \in V_{m-1}$ and $v \in V_{m+1}$ by multiplying the numbers of paths between the corresponding four-word differences recorded in table $T$. This precomputation gives a dramatic speedup when there are many active S-boxes in the middle round.
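The precomputation idea can be checked on a toy example: if the state splits into groups that do not interact across two S-layers, the number of paths between two full-state differences is the product of per-group counts. The sketch below assumes an 8-word state made of two hypothetical 4-word groups, each mixed by the made-up XOR layer `toy_linear` (in a real AES-like cipher the groups are determined by the cell permutation and the column mixing, as in Figure 2), and the Midori64 S-box as we recall it. The cached function `group_table` plays the role of the precomputed table and also fixes the middle-round activity pattern.

```python
from functools import lru_cache
from itertools import product
from math import prod

# Midori64 S-box Sb0, as we recall it from the Midori specification (check Table 1).
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
DDT = [[sum(MIDORI_SB0[x] ^ MIDORI_SB0[x ^ a] == b for x in range(16))
        for b in range(16)] for a in range(16)]
OMEGA = {a: ([b for b in range(16) if DDT[a][b] == 4] if a else [0]) for a in range(16)}
GROUP = 4  # words per independent group (hypothetical toy layout)


def toy_linear(y):
    """Hypothetical XOR layer acting on one 4-word group."""
    y0, y1, y2, y3 = y
    return (y0 ^ y2 ^ y3, y1 ^ y3, y2, y3)


def pattern(words):
    return tuple(int(w != 0) for w in words)


@lru_cache(maxsize=None)
def group_table(x_group, mid_pattern):
    """{4-word output: number of paths} for S-layer -> toy_linear -> S-layer,
    every active S-box at 2^-2, middle difference restricted to mid_pattern."""
    table = {}
    for y in product(*(OMEGA[w] for w in x_group)):
        z = toy_linear(y)
        if pattern(z) != mid_pattern:
            continue
        for v in product(*(OMEGA[w] for w in z)):
            table[v] = table.get(v, 0) + 1
    return table


def paths_between(u, v, mid_pattern):
    """Number of paths u -> v as a product of per-group table entries."""
    return prod(
        group_table(tuple(u[g:g + GROUP]), tuple(mid_pattern[g:g + GROUP])).get(tuple(v[g:g + GROUP]), 0)
        for g in range(0, len(u), GROUP))


def paths_direct(u, v, mid_pattern):
    """Same count by enumerating the whole state (for checking only)."""
    count = 0
    for y in product(*(OMEGA[w] for w in u)):
        z = sum((toy_linear(y[g:g + GROUP]) for g in range(0, len(y), GROUP)), ())
        if pattern(z) == tuple(mid_pattern):
            count += sum(1 for w in product(*(OMEGA[x] for x in z)) if w == tuple(v))
    return count


u = (1, 0, 0, 1, 2, 0, 0, 0)
v = (0, 1, 0, 1, 2, 0, 0, 0)
mid = (0, 1, 0, 1, 1, 0, 0, 0)
print(paths_between(u, v, mid), paths_direct(u, v, mid))  # 4 4
```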
If the key (tweak) schedule of a block cipher is linear, i.e., the key (tweak) difference in each round is determined by the master key (tweak) difference, our algorithms are applicable to related-key (related-tweak) differential cryptanalysis by XORing the round key (tweak) differences during graph construction. In this case, however, the precomputation method cannot be used.
4.2. Generalized Counting Differential Characteristics
In this subsection, we present a generalized method using a weighted graph. We introduce some additional notation for weighted graphs. A weighted graph is a graph in which each edge is assigned a weight. We denote the weight of an edge $e$ by $w(e)$ and define the weight of a path $p$ as $w(p) = \prod_{e \in p} w(e)$.
In the construction of the multistage graph, we assign to each edge $(u, v)$ of round $i$ the probability of propagation from $u$ to $v$ scaled by $2^{2\,\mathrm{hw}(\delta_i)}$, i.e.,
$$w(u, v) = \frac{\Pr[u \to v]}{2^{-2\,\mathrm{hw}(\delta_i)}}.$$
Here, $\mathrm{hw}(\delta_i)$ is the Hamming weight of $\delta_i$, i.e., the number of active S-boxes in the $i$-th round. Then, we construct a weighted multistage graph in the same way as in Section 4.1. A path with weight 1 corresponds to one maximum differential characteristic, and a path with weight $1/2$ counts as half of one. To prevent the graph from becoming too large, we set a lower bound $\omega$ on the edge weight and generate a vertex $v$ only if there exists a vertex $u$ with $w(u, v) \ge \omega$. Since an S-box transition with probability $2^{-3}$ has half the maximum probability, setting $\omega = 0.5$ means allowing one word to propagate with probability $2^{-3}$. The forward generation algorithm for constructing a weighted graph is given in Algorithm 7. The set $\Omega'(x)$ is the set of output differences reached from $x$ through the S-box $S$ with nonzero probability, i.e.,
$$\Omega'(x) = \left\{ y : \Pr_z\left[S(z) \oplus S(z \oplus x) = y\right] > 0 \right\}.$$
Algorithm 7. Forward generation from to (weighted graph) |
Input: round , graph , activity pattern , weight lower bound Output: graph Procedure 1. for to 1-1. for 1-1-1. → 2. return Procedure 1. zero vertex 2. The indices of nonzero word in 3. for 3-1. 3-2. 3-3. if does not satisfy or 3-3-1. Go back to for loop 3-4. if 3-4-1. 3-5. 3-4. 4. return |
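The weighted variant changes only how S-layer transitions are enumerated. For a 4-bit S-box whose maximum DDT entry is 4, the scaled weight of a transition is the product of $\mathrm{DDT}[x_j][y_j]/4$ over the active words, which is 1 for a maximum transition and $1/2$ for each word that propagates with probability $2^{-3}$. The Python sketch below assumes the Midori64 S-box as we recall it from the Midori specification, a hypothetical 4-word state, and the made-up XOR layer `toy_linear`; `omega_lb` stands for the lower bound $\omega$.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Midori64 S-box Sb0, as we recall it from the Midori specification (check Table 1).
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
DDT = [[sum(MIDORI_SB0[x] ^ MIDORI_SB0[x ^ a] == b for x in range(16))
        for b in range(16)] for a in range(16)]


def toy_linear(y):
    """Hypothetical 4-word XOR diffusion layer, for illustration only."""
    y0, y1, y2, y3 = y
    return (y0 ^ y2 ^ y3, y1 ^ y3, y2, y3)


def pattern(words):
    return tuple(int(w != 0) for w in words)


def weighted_s_layer(x, omega_lb):
    """S-layer outputs y of x whose edge weight is at least omega_lb.

    weight = Pr[x -> y] / 2^(-2 * hw) = product of DDT[x_j][y_j] / 4 over the
    active words (1 for a 2^-2 transition, 1/2 for a 2^-3 transition)."""
    choices = [[(y, DDT[w][y] / 4) for y in range(16) if DDT[w][y]] if w else [(0, 1.0)]
               for w in x]
    result = []
    for combo in product(*choices):
        weight = prod(c for _, c in combo)
        if weight >= omega_lb:
            result.append((tuple(y for y, _ in combo), weight))
    return result


def weighted_forward_step(stage, next_pattern, omega_lb):
    """stage: {difference: total weight of source paths}; weights multiply along a path."""
    nxt = defaultdict(float)
    for x, path_weight in stage.items():
        for y, edge_weight in weighted_s_layer(x, omega_lb):
            z = toy_linear(y)
            if pattern(z) == next_pattern:
                nxt[z] += path_weight * edge_weight
    return dict(nxt)


print([len(weighted_s_layer((1, 1, 0, 0), lb)) for lb in (1.0, 0.5, 0.25)])  # [1, 13, 49]
```

With $\omega = 1$ only maximum transitions survive, which reproduces the unweighted construction; lowering $\omega$ quickly enlarges the number of candidate outputs per vertex.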
The graph constructed in Section 4.1 is the special case of a weighted graph with $\omega = 1$. With $\omega < 1$, we can consider not only maximum differential characteristics but also differential characteristics with lower probability.
As in the graph construction of Section 4.1, we do not need to store the whole graph $G$, and we can simplify the process by precomputation. When we implement precomputation in the weighted graph construction, the lower bound on the edge weight cannot be applied to the edges between the matched stages, because we precompute path weights from four words to four words rather than between whole vertices. Instead, we apply the lower bound $\omega$ to each 4-word propagation through the two S-layers. This allows us to consider more edges between the matched stages whose weights are less than $\omega$.
5. Results on Midori64, SKINNY64 and CRAFT
In this section, we present the results for the lightweight block ciphers Midori64, SKINNY64, and CRAFT. At the end of this section, we discuss why the gap between the differential and characteristic probabilities is larger for Midori64, SKINNY64, and CRAFT than for other block ciphers. In most cases, the execution time was only about 10 s. However, generating the weighted graph for SKINNY64 took considerably more time: for 12-round SKINNY64, it took 3.28 h with the weight lower bound $\omega$ set to 0.5 and 61.71 h with $\omega$ set to 0.25.
For each block cipher, we first find about 1000 maximum differential characteristics using a SAT solver and compute the corresponding differential probabilities using the graph and the precomputed table. We report the differential with the highest probability.
Table 4 compares the differential probabilities of Midori64, SKINNY64, and CRAFT with those from previous work.
Figure 3 shows the round-wise differential probabilities of Midori64, SKINNY64, and CRAFT. The black line indicates the probability of a single differential characteristic. The green line indicates the differential probability obtained by counting the maximum differential characteristics as described in Section 4.1. The blue and red lines indicate the probabilities obtained by generalized counting as described in Section 4.2 with $\omega = 0.5$ and $\omega = 0.25$, respectively. The red dotted line indicates the block size.
We apply our algorithms to related-tweak differentials without precomputation. For SKINNY64, the gap between the related-tweakey differential probability and the single characteristic probability is not as large as in the single-key setting: in the TK1 related-tweakey setting of 15-round SKINNY64, there are only 216 characteristics with fixed plaintext, ciphertext, and tweakey differences. In CRAFT, however, the gap is large. We present the related-tweak differential probabilities of SKINNY64 and CRAFT in Figure 4.
The detailed results are given in Tables 5, 6, 7, 8, and 9. These tables list the single differential characteristic probability, the differential probability obtained by counting the maximum differential characteristics, and the probabilities obtained by generalized counting with $\omega = 0.5$ and $\omega = 0.25$.
For
rounds of CRAFT, we found an activity pattern that provides higher differential probability than presented in [
21]. It is given in
Appendix A.
Our algorithm is applicable to all block ciphers using 4-bit S-boxes and word-based diffusion layers, such as LED [34], TWINE [35], and LBlock [36]. However, for these ciphers the gaps between the differential and characteristic probabilities are small compared with the results we present. Here, we explain why the gaps are large for Midori64, SKINNY64, and CRAFT. The first reason is the weak diffusion layer, which is composed only of XORs and has a low branch number. However, this does not explain why the gap is large only in Midori64, SKINNY64, and CRAFT. TWINE and LBlock, although they have a Feistel network structure, also have diffusion layers composed only of XORs with low branch numbers. Yet, with fixed input and output differences and a fixed activity pattern, there is only one maximum differential characteristic for 11-round TWINE and no more than three for 14-round LBlock. We therefore identified a second reason, related to the S-box. Because the weak diffusion layer consists only of XORs, differences rarely change through it, so the output difference of an S-layer mostly becomes the input difference of the next S-layer. Thus, the number of paths from the plaintext difference to the ciphertext difference through the block cipher is related to the number of word-difference paths through the S-boxes.
From the difference distribution table (DDT) of an S-box, we generate a directed graph $G_S$ whose vertices are the nonzero differences, and edges $(a, b)$
exist if
We also obtain the adjacency matrix $A$ of this graph, a square matrix in which each element indicates whether the corresponding pair of vertices is connected by an edge.
Figure 5 shows the graph and the adjacency matrix derived from the difference distribution table of the SKINNY64 S-box.
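We can also calculate the $k$-th power $A^k$ of the adjacency matrix $A$. The entry $(a, b)$ of $A^k$ is the number of paths of length $k$ from vertex $a$ to vertex $b$ in the graph, that is, the number of word-difference paths from difference $a$ to difference $b$ through the S-box applied $k$ times. We compute the average value of the entries of $A^k$ for several S-boxes, as shown in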
Table 10. The S-box
is the optimal 4-bit S-box presented in [
37].
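The averages in Table 10 can be computed with a few lines of Python. The sketch assumes that an edge $a \to b$ exists when $\mathrm{DDT}[a][b]$ reaches `min_count`, set by default to 4 (a probability-$2^{-2}$ transition, matching maximum differential characteristics); setting it to 1 uses every possible transition instead. The S-box tables are the Midori64 and PRESENT S-boxes as we recall them from their specifications, and we do not claim that the printed averages reproduce Table 10.

```python
# S-boxes as we recall them from the Midori and PRESENT specifications.
MIDORI_SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
              0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
PRESENT_SB = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
              0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def ddt(sbox):
    n = len(sbox)
    return [[sum(sbox[x] ^ sbox[x ^ a] == b for x in range(n)) for b in range(n)]
            for a in range(n)]


def adjacency(sbox, min_count=4):
    """A[a-1][b-1] = 1 iff ddt[a][b] >= min_count, over nonzero differences a, b."""
    table = ddt(sbox)
    n = len(sbox)
    return [[int(table[a][b] >= min_count) for b in range(1, n)] for a in range(1, n)]


def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(A, k):
    """A^k for k >= 1; entry (a, b) counts length-k walks from a to b."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result


def average_entry(M):
    return sum(map(sum, M)) / (len(M) * len(M[0]))


for name, sbox in [("Midori64", MIDORI_SB0), ("PRESENT", PRESENT_SB)]:
    A = adjacency(sbox)
    print(name, [round(average_entry(mat_pow(A, k)), 3) for k in (1, 2, 4, 8)])
```

Because the entries of $A^k$ count walks through repeated S-box applications, their average grows quickly with $k$ for an S-box with many maximum-probability transitions, which is the effect discussed in the text.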
According to Table 10, the average number of paths from a nonzero difference to a nonzero difference is the highest for the Midori64 S-box, followed by the SKINNY64 S-box. To verify that the number of length-$k$ paths in the graph is significantly related to the gap between the differential and characteristic probabilities, we count the maximum differential characteristics of SKINNY64 with its S-box replaced by other S-boxes. The results for 10 rounds, given in Table 11, show that combining the Midori64 S-box with a weak diffusion layer makes the cipher more vulnerable to differential cryptanalysis than using the other S-boxes. The differential probability with the Midori64 S-box is at least
. On the other hand, the probability is
when using the PRESENT S-box. This explains why the gap between the differential and characteristic probabilities is the largest in CRAFT, which combines the Midori64 S-box with a very weak diffusion layer.
6. Conclusions
In this work, we presented an approach to computing a lower bound on the differential probability of block ciphers with a word-based diffusion layer and an S-layer. Our approach obtains the best differential characteristic using an off-the-shelf SAT solver and then counts the maximum differential characteristics with given plaintext and ciphertext differences and a given activity pattern. To count them, we constructed a multistage graph in which each path from the source to the sink corresponds to a differential characteristic. Moreover, we presented a generalized approach that uses a weighted graph to consider more characteristics beyond the maximum ones. Using our method, we provided more accurate differential probabilities for Midori64, SKINNY64, and CRAFT.
Furthermore, we argue that, in a block cipher with weak diffusion, the gap between the differential and characteristic probabilities is significantly related to the S-box. We explain this relation using the adjacency matrix of the graph derived from the difference distribution table of the S-box. The average value of the $k$-th power of the adjacency matrix is significantly related to the gap between the differential and characteristic probabilities when the S-box is used with a weak diffusion layer, in which differences passing through rarely change. We claim that combining the Midori64 S-box, which has the highest average value of the $k$-th power of the adjacency matrix, with a weak diffusion layer makes a cipher vulnerable to differential cryptanalysis, as seen in the results for CRAFT.