symmetry

Abstract: Differential fault analysis (DFA) was introduced by Biham and Shamir. It is a powerful technique for retrieving the secret key by injecting a fault into an internal state and exploiting the differences between the correct and the faulty ciphertexts. Based on the meet-in-the-middle idea, some differential characteristics can help to recover the key of some symmetric ciphers. At CHES 2011, this technique was used to analyze AES. In this article, we propose several DFA schemes, based on the meet-in-the-middle idea, on ITUbee, a software-oriented symmetric block cipher for resource-constrained devices. Our attacks are efficient and more powerful than previous works. Furthermore, the attacks in this article break the existing protection countermeasure, which means that the protection method for ITUbee on devices has to be revisited.


Introduction
With the popularization and development of computer network technology, cryptographic techniques have been widely used to ensure the confidentiality or integrity of messages and the authenticity of communicating parties. However, many resource-constrained devices, such as mobile phones, public transport systems, smart cards, RFID tags, and Internet of Things devices, the majority of which employ lightweight cryptographic algorithms [1,2], are convenient yet vulnerable. The sensitive information within them might be easily exposed by adversaries through side-channel analysis because their physical information is vulnerable. Among side-channel attacks, fault analysis (FA) is a renowned attack, first introduced by Boneh et al. [3] in 1997, which enables the attacker to obtain additional side-channel information and achieve key recovery in practical time. In the same year, Biham and Shamir proposed differential fault analysis (DFA) on DES [4]. This was the first time DFA was applied to key recovery for symmetric block ciphers. By inducing an error that disturbs the actual execution of the encryption process and collecting differential information between correct and faulty ciphertext pairs, DFA recovers the correct key efficiently. The key point of DFA is that it allows the adversary to analyze only a small number of rounds of a block cipher. DFA has been widely applied to attacks on DES [5,6], AES [7][8][9][10][11][12][13][14], PRESENT [15][16][17], and others [18,19]. The countermeasures against DFA include cipher- or mode-level countermeasures (e.g., FRIET [20], CRAFT [21], DEFAULT [22], and others [23][24][25]) and implementation-level countermeasures [26]. A widely used implementation-level countermeasure against DFA is to perform the computation twice and check whether the same result is obtained [27][28][29][30].

Description of the Block Cipher ITUbee
ITUbee [37] is a software-oriented lightweight Feistel-like block cipher. Both the key length and the block size are 80 bits. It consists of 20 rounds, and a whitening key is added before the first round and after the last round.
The definitions of the components and the notations used in the encryption and decryption procedures are as follows [40]:
- [...], where a, b, c, d, and e are 8-bit values.
- The F-function used in the round is defined as F(X) = S(L(S(X))).
The encryption procedure is shown in Algorithm 1 and Figure 1, and the decryption procedure is the same as the encryption except for the order of the keys and the round constants.

Observations of ITUbee
In this section, several observations of ITUbee are given, which are the bases of our DFA.

Differential Property of S-box
As the only non-linear operation in most block ciphers, the S-box and its properties should be well studied when applying DFA. Let $s(\cdot)$ be the S-box used in ITUbee and consider the following equation:

$$s(x) \oplus s(x \oplus \alpha) = \beta, \tag{1}$$

where $\alpha$ is an input difference and $\beta$ is an output difference. Given $\alpha$ and $\beta$, Equation (1) can be solved by trying all possible values of $x$. For more details, please refer to [37].

Observations on the ITUbee Block Cipher
In this section, we review several observations of ITUbee, which are the bases of our schemes.

Differential Property of S-Box
Let $S(\cdot)$ be the S-box used in ITUbee, and consider the following equation:

$$S(x) \oplus S(x \oplus \alpha) = \beta, \tag{1}$$

where $\alpha$ and $\beta$ are the input and output differences, respectively. Equation (1) can be solved by trying all possible values of $x$. Depending on the specific $\alpha$ and $\beta$, the number of solutions of Equation (1) can only be 0, 2, or 4, with probabilities 129/256, 126/256, and 1/256, respectively. Hence, for a randomly chosen pair of input and output differences, one solution $x$ is expected on average. A lookup table indexed by $\alpha$ and $\beta$ simplifies this process.
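The solution counts above can be reproduced with a short script. This is a minimal sketch that assumes the S-box of ITUbee is the AES S-box (the text above does not restate the S-box, so this is an assumption taken from the cipher specification); the names `table` and `solution_count_histogram` are illustrative only.

```python
from collections import Counter, defaultdict

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0

def rotl8(x, k):
    """Rotate an 8-bit value left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF

# Assumption: the S-box is the AES S-box (field inversion plus affine map).
SBOX = []
for x in range(256):
    b = gf_inv(x)
    SBOX.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)

# table[(alpha, beta)] lists every x with S(x) ^ S(x ^ alpha) == beta.
table = defaultdict(list)
for alpha in range(1, 256):
    for x in range(256):
        table[(alpha, SBOX[x] ^ SBOX[x ^ alpha])].append(x)

def solution_count_histogram(alpha):
    """For a fixed alpha, count how many beta values have 0, 2, or 4 solutions."""
    return dict(Counter(len(table.get((alpha, beta), [])) for beta in range(256)))

if __name__ == "__main__":
    print(solution_count_histogram(0x01))
```

Once `table` is built, solving Equation (1) for a given pair of differences is a single dictionary lookup, which is the role the lookup table plays in the attacks.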

Property of F-Function
Consider a 40-bit input difference with a single nonzero byte; the resulting output difference of $F(\cdot)$ always has three nonzero bytes. Given the input difference $\alpha$ and the output difference $\beta$, the equation $F(x) \oplus F(x \oplus \alpha) = \beta$ has one solution on average. To speed up solving this equation, it is practical to build a lookup table indexed by $\alpha$ and $\beta$, which brings the cost of solving the equation down to a single lookup. However, precomputing this table requires nearly $2^{40}$ evaluations of $F(\cdot)$ along with a matching memory requirement.
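The three-byte pattern comes from the linear layer. The sketch below assumes the linear layer of the ITUbee specification, L(a, b, c, d, e) = (e⊕a⊕b, a⊕b⊕c, b⊕c⊕d, c⊕d⊕e, d⊕e⊕a); this definition is not restated in the text above, so treat it as an assumption. Because S(·) is a bytewise bijection, it keeps the set of active bytes unchanged, and the activity pattern of F = S∘L∘S is therefore decided by L alone.

```python
def L(state):
    """Assumed ITUbee linear layer on five bytes (a, b, c, d, e)."""
    a, b, c, d, e = state
    return (e ^ a ^ b, a ^ b ^ c, b ^ c ^ d, c ^ d ^ e, d ^ e ^ a)

def active_bytes(diff):
    """Indices of the nonzero bytes of a difference."""
    return [i for i, v in enumerate(diff) if v]

# L is linear over XOR, so L(x) ^ L(x ^ delta) == L(delta) for every x.
# It is therefore enough to push the difference itself through L.
patterns = {}
for pos in range(5):
    delta = [0] * 5
    delta[pos] = 0xA7  # arbitrary nonzero byte difference
    patterns[pos] = active_bytes(L(tuple(delta)))

if __name__ == "__main__":
    for pos, out in patterns.items():
        print(f"active input byte {pos} -> active output bytes {out}")
```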

Previous DFA Scheme
In [39], Fu et al. proposed two fault injection schemes on ITUbee and suggested countermeasures to protect the encryption devices. In this section, we review their analysis schemes and countermeasures, and we will show that their countermeasures are not strong enough by proposing our improved DFA schemes in Section 4.

Fault Model
- The adversary can inject a byte fault into a selected state of the block cipher; for example, the adversary could inject a random byte fault into the output of the last S(·) in the last-but-two round.
- The location of the fault in the word is known to the adversary. Moreover, the case of an unknown location of the injected fault is also discussed.
- The adversary can obtain the ciphertexts of both the correct and the faulty execution.

DFA Schemes
Scheme 1
Randomly choose a plaintext P and ask for the corresponding ciphertext C, inject a random byte fault at a certain position of the second S(·) layer of the last round, and obtain the faulty ciphertext C*. For every possible difference generated by the injection, compute the input difference and output difference of the last S(·) operation in the last round and filter the values of the input and output states. For the remaining candidates, compute the pairs backwards and filter them with the injection position. If more than one candidate remains, repeat the steps until the secret key is recovered.

Scheme 2
Randomly choose a plaintext P and obtain the corresponding ciphertext C under the unknown key. Then, inject a random byte fault at a certain position in the output of the last S(·) layer of the round preceding the last, and obtain the faulty ciphertext C*. From the input difference and output difference of the second F-function of the last round, deduce the possible values of the internal state before and after the F-function according to the property in Section 2.2.2. Further filter the candidates according to the differential in the last round. If more than one candidate remains, repeat the steps until the secret key is recovered.

Countermeasures
For efficiency, Fu et al. proposed to protect the encryption devices by running the double-check mechanism only in the last two rounds. However, if a key recovery attack can be achieved by injecting a fault before the last two rounds, this countermeasure is ineffective.

Improved DFA Schemes on ITUbee
In this section, our DFA on ITUbee is described in detail. Under the same assumptions as defined in Section 3.1, we give three schemes in which the fault is injected before the last two rounds. For a better understanding of our method, we first introduce some notations used in this section. $X_i$ denotes the input state of round $i$, and $C_L$ and $C_R$ are the two halves of the ciphertext. For each internal state noted in Figure 2, for example, $e_0$, we use $e_0[i]$ to denote the $i$-th byte of the state. $e_0^*[i]$ denotes the corresponding byte of the state with the fault, and $\Delta e_0^*[i]$ is the difference between the correct and the faulty value of the $i$-th byte of the state.

Scheme 1: Differential Fault Attack with Exhaustive Search
We assume that the fault $\Delta$ is injected at a selected byte of state $n_2$. The major steps of the attack are as follows:
Step 1. Obtain the correct and faulty ciphertexts. Randomly choose a plaintext $P$ and obtain the corresponding ciphertext $C$ under the unknown key. As shown in Figure 2, inject a random byte fault $\Delta$ into the state $n_2$ to obtain the faulty ciphertext $C^*$.
Step 2. Deduce the differences of the internal states. For each 8-bit value of $\Delta$, the corresponding difference $\Delta n_0$ can be determined in reverse order. Note that [...]. Further, using the 40-bit $K_R$, we can compute forward the corresponding difference $\Delta j_0$, namely, [...]. Thus, $\Delta n_0$ and $\Delta j_0$ can be deduced from the ciphertexts, which means that both the differences before and after the last $F(\cdot)$ operation are known. According to the observations in Section 2.2.2, for every fixed pair of input and output differences of $F(\cdot)$, we obtain one matching solution on average. Thus, for each of the $2^{48}$ possible values of the pair $(\Delta n_0, \Delta j_0)$, there is on average one corresponding value of $(n_0, j_0)$.
Step 3. Exhaustively search to recover the whole key. Looking up the table indexed by the input and output differences of $F(\cdot)$, we obtain $2^{48}$ possible values of $(n_0, j_0)$. Computing $K_L = n_0 \oplus C_R$ and $K_R = F^{-1}(L^{-1}(j_0)) \oplus C_L$, we obtain $2^{48}$ possible keys $(K_L, K_R)$. Exhaustively search all the possible values to recover the whole key.
Complexity. As $F(\cdot)$ consists of $L(\cdot)$ and $S(\cdot)$ operations only, it is a 40-bit permutation. This kind of permutation can be viewed as a Super S-box [41]. To build a lookup table indexed by the input difference $\alpha$ and the output difference $\beta$, we need $2^{40}$ precomputation time and $2^{40}$ bits of memory. To recover the whole key, we need $2^{48}$ forward and backward computations and $2^{48}$ decryptions.

Scheme 2: Meet-in-the-Middle Fault Attack with $2^{40}$ Complexity
Assume that a random fault $\Delta$ is injected at a selected byte of state $n_2$. Without loss of generality, we assume that $n_2[0]$ is the position where the fault is injected. Note that [...]. The corrupted value of the internal state $j_0$ can be obtained in the same way. According to the property of $F(\cdot)$, we can always compute two bytes in the input of $F(\cdot)$, though one byte in the output is unknown. For example, if $n_0[1,2,3,4]$ is known and only the 0-th byte of $n_0$ is unknown, we have [...]. Furthermore, $\Delta j_0[4]$ can be computed in the same way. Moreover, we can obtain the state $\Delta j_0$ for all possible values of $K_R$ along the following computational path: [...]. With the obtained $C$ and $C^*$, $\Delta j_0[1,4]$ can thus be obtained along two computational paths, and each computing direction involves several uncorrelated key bytes. Thus, we can carry out an MITM attack. The major steps of an attack making use of $\Delta j_0[1]$ are shown below:
Step 1. Obtain the correct and faulty ciphertexts with the same plaintext. Randomly choose a plaintext $P$ and obtain the corresponding ciphertext $C$ under the unknown key. As depicted in Figure 3, inject a random difference $\Delta$ into the 0-th byte of the state $n_2$ and ask for the corresponding faulty ciphertext four times. Then, we obtain four pairs of correct and faulty ciphertexts $(C, (C^*)_i)$ and six pairs of faulty ciphertexts $((C^*)_i, (C^*)_j)$, where $0 \le i, j \le 3$ and $i \neq j$. Under the correct key guess, the values of $X_{19}[1,2,3,4]$ are the same for these 10 pairs; only the values of $X_{19}[0]$ differ.
Step 2. Compute $\Delta j_0[1]$ in two computational directions and filter the candidates. To obtain a valid candidate, execute the following:

1. For each pair, guess all $2^{24}$ [...]
Step 3. Recover the correct key. We consider the computation procedure without an injected fault, namely, $(P, C)$. With the $K_R$ filtered by the above process, we can compute the state $g_0$. Moreover, for each of the $2^{16}$ possible values of $(X_{19} \oplus K_L)[0,2]$, we can compute all possible states $h_0$ with the $(X_{19} \oplus K_L)[1,3,4]$ obtained before. Finally, we can deduce all possible $K_L$ and recover and validate the whole key by exhaustive search.
The above is the attack procedure when the fault is injected into the state $n_2[0]$. Likewise, for any other known location of the injected fault, we can carry out an MITM attack in a similar way. If the location of the fault is unknown, we can still carry out three MITM attacks, one on each of any three given bytes of $\Delta j_0$. By the pigeonhole principle, there exists at least one byte whose computation is unaffected by the fault. As a result, in this case, we need three times the computational complexity of the known-location case.
Complexity. For a fault with a known injection location, we have only one MITM episode, in which $2^{40+24}$ different elements of $\{K_R, (X_{19} \oplus K_L)[1,3,4]\}$ are tested. In theory, only the correct one will pass the filter. Next, we exhaustively search the corresponding key for every possible value of $(X_{19} \oplus K_L)[0,2]$, which costs $2^{16}$. Therefore, the overall time complexity can be estimated as $2^{40} + 2^{24} + 2^{16}$ encryptions or decryptions, which is approximately $2^{40}$. In addition, we need $2^{40}$ memory to store the candidates. If the injection location is unknown, we suppose that the same byte is hit by the different faults. As a consequence, the complexity is multiplied by three.

Scheme 3: Meet-in-the-Middle Attack with $2^{32}$ Complexity
Assume that the fault $\Delta$ is injected at a selected byte of state $n_2$. Without loss of generality, we assume that $n_2[0]$ is the position where the fault is injected. Consequently, the difference is not totally diffused in state $i_1$, namely, $\Delta i_1[2,3] = 0$. Note that [...]. Thus, we have $\Delta j_1[0] \oplus \Delta j_1[2] = \Delta j_1[4]$, and [...]. As we can see from Equation (8), $K_R \oplus X_{18}$ can be computed in two separate parts, so that we can carry out an MITM attack as described in the following.
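The byte relation used for the filter can be checked numerically. The sketch below rests on two assumptions that are not stated in the surviving text: that $j_1$ is obtained from $i_1$ through the linear layer, and that the linear layer is the one from the ITUbee specification, L(a, b, c, d, e) = (e⊕a⊕b, a⊕b⊕c, b⊕c⊕d, c⊕d⊕e, d⊕e⊕a). The helper name `relation_holds` is illustrative.

```python
from itertools import product

def L(state):
    """Assumed ITUbee linear layer on five bytes (a, b, c, d, e)."""
    a, b, c, d, e = state
    return (e ^ a ^ b, a ^ b ^ c, b ^ c ^ d, c ^ d ^ e, d ^ e ^ a)

def relation_holds(delta_i1):
    """Check dj1[0] ^ dj1[2] == dj1[4] for dj1 = L(di1)."""
    dj1 = L(delta_i1)
    return dj1[0] ^ dj1[2] == dj1[4]

# Differences with bytes 2 and 3 equal to zero, as in the text (di1[2,3] = 0).
# A coarse grid over the three free bytes is enough for a sanity check.
grid = range(0, 256, 17)
all_ok = all(relation_holds((a, b, 0, 0, e)) for a, b, e in product(grid, repeat=3))

if __name__ == "__main__":
    print("relation holds on the sampled differences:", all_ok)
    # A difference that is active in byte 2 generally breaks the relation.
    print("with di1[2] != 0:", relation_holds((1, 2, 3, 0, 4)))
```

Under these assumptions the relation is an identity: with the middle bytes inactive, output byte 0 is e⊕a⊕b, byte 2 is b, and byte 4 is e⊕a.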
Step 1. Obtain the correct and faulty ciphertexts with the same plaintext. Randomly choose a plaintext $P$ and obtain the corresponding ciphertext $C$ under the unknown key. As shown in Figure 4, inject a random byte fault $\Delta$ into a certain byte of the state $n_2$ in the last two rounds and store the faulty ciphertext. Inject different faults three times, so that we obtain six ciphertext pairs: three pairs of correct and faulty ciphertexts $(C, (C^*)_i)$ and three pairs of faulty ciphertexts $((C^*)_i, (C^*)_j)$.
Step 2. Compute the values for all candidates and filter them. For all $2^{16}$ candidates of $(K_R \oplus X_{18})[2,4]$, perform the following operations:

1. Compute the values of $\Delta j_1[0] \oplus \Delta j_1[2]$ according to the left side of Equation (8) for all candidates of $(K_R \oplus X_{18})[0,3]$ and for five of the six pairs obtained before, and store the values of $(K_R \oplus X_{18})[0,3]$ in a table indexed by the vector of $\Delta j_1[0] \oplus \Delta j_1[2]$ over these five pairs.
2. Compute the vector of $\Delta j_1[4]$ following the right side of Equation (8) for all candidates of $(K_R \oplus X_{18})[1]$ and for the same five pairs, and store the values in another table.
3. Sort the two tables and find collisions between the index values. If there is a collision between the two tables, the corresponding $K_R \oplus X_{18}$ is stored as a valid candidate.
With the five-pair filter, just one candidate remains in theory.
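The matching in the three operations above follows the usual meet-in-the-middle pattern: tabulate one side, then look up the other. The following toy sketch illustrates only that pattern; the 8-bit key parts, the random permutations `P` and `Q`, and the five observations are hypothetical stand-ins for the two sides of Equation (8), not the ITUbee computation itself. A hash table replaces the sorted tables, which serves the same purpose of finding collisions.

```python
import random
from collections import defaultdict

def mitm_match(left_keys, right_keys, left_fn, right_fn):
    """Return every (kl, kr) whose independently computed vectors collide."""
    table = defaultdict(list)
    for kl in left_keys:                      # tabulate the left side
        table[left_fn(kl)].append(kl)
    return [(kl, kr)                          # look up the right side
            for kr in right_keys
            for kl in table.get(right_fn(kr), [])]

# Toy instance (all values hypothetical).
rng = random.Random(2023)
P = list(range(256))
rng.shuffle(P)
Q = list(range(256))
rng.shuffle(Q)
Q_INV = [0] * 256
for i, v in enumerate(Q):
    Q_INV[v] = i

KL, KR = 0x3A, 0xC5                           # secret key parts
us = [rng.randrange(256) for _ in range(5)]   # five observations, left side
vs = [Q_INV[P[u ^ KL]] ^ KR for u in us]      # consistent right-side data

def left_fn(kl):
    """Vector of middle values computed from the left key part only."""
    return tuple(P[u ^ kl] for u in us)

def right_fn(kr):
    """Vector of middle values computed from the right key part only."""
    return tuple(Q[v ^ kr] for v in vs)

matches = mitm_match(range(256), range(256), left_fn, right_fn)

if __name__ == "__main__":
    print([(hex(kl), hex(kr)) for kl, kr in matches])
```

The toy search evaluates 2 × 256 vectors instead of 256² joint guesses, and using a vector over five observations as the index is what makes wrong key pairs unlikely to survive.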
Step 3. Recover the remaining key bits. We consider the computational procedure of one pair of correct and faulty executions $\{(P, C), (P, C^*)\}$. With the $K_R \oplus X_{18}$ filtered by the above process, we can compute the state $h_1$ and the difference $\Delta g_1$. Moreover, for each of the $2^8$ possible differences $\Delta$, which is the input difference of the first function $F(\cdot)$ in the last-but-one round, all possible states $g_1$ can be deduced with the property of the function $F(\cdot)$. Consequently, we can recover the possible $K_R$ and the states $X_{20}$ and $g_0$. Similarly, we can also obtain the differences $\Delta j_0$ and $\Delta n_0$, which are the input and output differences of the last function in the last round. Likewise, we can obtain the state $j_0$ to recover the state $h_0$ and the key $K_L$. Finally, we can obtain all possible keys and validate the correct key using an exhaustive search.
The above is the attack procedure when the fault is injected into the state $n_2[0]$. Likewise, for every known location of the injected fault, we can carry out an MITM attack in a similar way. If the location of the fault is unknown, we can analyze a byte in state $i_1$ that has not been affected by the fault and carry out MITM attacks. Because the fault can be injected into any one of the five bytes of state $n_2$, we need five times the computational complexity of the known-location case.
Complexity. For a fault with a known injection location, we have $2^{16}$ MITM episodes for $(K_R \oplus X_{18})[2,4]$, and, in each episode, the elements of $(K_R \oplus X_{18})[0,1,3]$ are tested. Only one element will pass the filter with high probability. Next, we calculate the corresponding key for every possible value of $\Delta$, which costs $2^8$. Therefore, the overall time complexity can be estimated as $2^{16}(2^{16} + 2^{8}) + 2^{8}$, which is approximately $2^{32}$; meanwhile, $2^{40}$ precomputation and $2^{40}$ memory are needed for storing the values and differences of the F-function. If the injection location is unknown, we suppose that the same byte is hit by the different faults. As a consequence, the computational complexity is multiplied by five.
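As a quick arithmetic check of the estimates for Schemes 2 and 3, the sums quoted in the text can be evaluated directly. The variable names are illustrative; the expressions are copied from the two complexity paragraphs.

```python
from math import log2

# Scheme 2: 2^40 + 2^24 + 2^16 encryptions or decryptions.
scheme2 = 2**40 + 2**24 + 2**16
# Scheme 3: 2^16 * (2^16 + 2^8) + 2^8.
scheme3 = 2**16 * (2**16 + 2**8) + 2**8

# Unknown fault location: factor 3 for Scheme 2 and factor 5 for Scheme 3.
scheme2_unknown = 3 * scheme2
scheme3_unknown = 5 * scheme3

if __name__ == "__main__":
    for name, value in [("Scheme 2", scheme2), ("Scheme 3", scheme3),
                        ("Scheme 2, unknown location", scheme2_unknown),
                        ("Scheme 3, unknown location", scheme3_unknown)]:
        print(f"{name}: about 2^{log2(value):.2f}")
```

In both sums the leading term dominates, which is why the estimates round to $2^{40}$ and $2^{32}$; the unknown-location factors add less than two and three bits, respectively.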

Simulation Results
In this section, we give some simulation results of our schemes. As Scheme 1 requires too much time and memory, only the simulations of Schemes 2 and 3 are given. We implemented our schemes on a PC with an Intel Core i7 processor whose frequency is 2.5 GHz. In the simulation of Schemes 2 and 3, respectively, 500 and 1000 samples were recorded with randomly selected keys.
Scheme 2. Owing to the huge computational complexity of Scheme 2, we chose 500 samples simulated in Figure 5, where the y-axis represents the number of key candidates filtered after Step 2 in Scheme 2 and the x-axis represents the sample number. A data point with a value above 1 on the y-axis indicates that the key candidates for that sample have not been completely filtered. In that case, we can filter one more time using the remaining pairs, or we have to endure extra computational complexity in Step 3. As depicted in Figure 5, with the injection of 4 faults, the average number of remaining key candidates is 1.546. This result confirms that 4 faults are sufficient for Scheme 2 to be effective.
Scheme 3. As shown in Figure 5, we simulated Scheme 3 with 1000 samples and collected the number of candidates filtered after Step 2 in Scheme 3. Similarly, we could still filter again in the case that the number of candidates was over one. Moreover, in Step 3, the exhaustive search has a really low probability of causing a situation in which we cannot recover the correct key due to the property of function F. A simple solution to this problem is just changing the correct and faulty pair used in Step 3. When injecting 3 faults, the average number of remaining key candidates is 1.397. This result provides evidence that 3 faults are sufficient for the effectiveness of Scheme 3.

Further Countermeasures
In [39], the authors proposed a countermeasure based on the double-check mechanism. For efficiency, they ran only the crucial operations twice and checked whether the two executions matched each other. However, according to their analysis, they only ran the last two rounds twice. This countermeasure is ineffective against our attack schemes, so we suggest extending the number of rounds involved in the double-check mechanism by at least one. Namely, run ITUbee as described in Algorithm 2. Thus, the countermeasure could protect the devices from our attacks. The random delay is introduced to prevent the adversary from injecting faults twice in one execution to mount a successful attack.
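The idea of the extended double check can be sketched as follows. This is only an illustration of the mechanism described above, not a reproduction of Algorithm 2: the round function `toy_round`, the fault wrapper `make_faulty`, the exception `FaultDetected`, and the parameter names are all hypothetical, and a real implementation would use the ITUbee round function and a hardware-appropriate delay.

```python
import secrets
import time

class FaultDetected(Exception):
    """Raised when the two executions of the protected rounds disagree."""

def protected_encrypt(state, round_fn, n_rounds=20, checked_rounds=3):
    """Run the last `checked_rounds` rounds twice and compare the results."""
    split = n_rounds - checked_rounds
    for r in range(split):
        state = round_fn(state, r)

    def tail(s):
        for r in range(split, n_rounds):
            s = round_fn(s, r)
        return s

    first = tail(state)
    time.sleep(secrets.randbelow(50) / 100_000)  # random delay between runs
    second = tail(state)
    if first != second:
        raise FaultDetected("mismatch between the two executions")
    return first

def toy_round(s, r):
    """Hypothetical bijective 40-bit round, standing in for an ITUbee round."""
    return (s * 0x9E3779B1 + r + 1) & ((1 << 40) - 1)

def make_faulty(round_fn, target_round):
    """Wrap a round function so that one bit flips once, in `target_round`."""
    fired = [False]
    def wrapped(s, r):
        out = round_fn(s, r)
        if r == target_round and not fired[0]:
            fired[0] = True
            out ^= 0x01
        return out
    return wrapped

PLAINTEXT = 0x0123456789
correct = protected_encrypt(PLAINTEXT, toy_round)

# Fault in the third-to-last round (index 17 of 0..19), three rounds checked.
try:
    protected_encrypt(PLAINTEXT, make_faulty(toy_round, 17), checked_rounds=3)
    detected = False
except FaultDetected:
    detected = True

# Same fault, but only the last two rounds are checked: it goes unnoticed.
undetected_output = protected_encrypt(
    PLAINTEXT, make_faulty(toy_round, 17), checked_rounds=2)
```

In this toy model, a fault in the third-to-last round is caught only when three rounds are duplicated; with two duplicated rounds, both executions start from the same corrupted state and agree on a faulty ciphertext.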

Discussions
In this section, we compare the work of Fu et al. [39] with our own work on the DFA of ITUbee. Fu et al. induced single-byte random faults in a state located in the second-to-last round, and they suggested implementing ITUbee with a double-check mechanism in the last two rounds to defend against fault analysis. In this paper, we proposed three schemes to achieve DFA attacks with faults induced in the third-to-last round. Our attacks prove that their countermeasures do not protect against all DFA attacks. These distinctions are summarized in Table 2.

Conclusions
This article presents several differential fault analyses on ITUbee based on the meet-in-the-middle idea. Our attacks make use of the properties of faulty values and of the differences between faulty and correct intermediate values. Our attack schemes combine differential fault analysis with meet-in-the-middle methods, an approach that can also be extended to other block ciphers. In addition, we defeated the countermeasures given in previous works and revisited the protection schemes for the ITUbee block cipher on devices.
Author Contributions: Conceptualization, Y.K. and Q.Y.; validation, Y.K.; formal analysis, Q.Y. and L.Q.; writing-original draft preparation, Y.K. and Q.Y.; writing-review and editing, Q.Y. and L.Q.; supervision, G.Z. All authors have read and agreed to the published version of the manuscript.