A White-Box Implementation of IDEA

Abstract: IDEA is a classic symmetric encryption algorithm proposed in 1991 and widely used in many applications. However, there has been little research into white-box IDEA. Traditional white-box implementations of block ciphers convert the S-boxes into encoded lookup tables, but IDEA has no S-boxes. Its security rests on algebraic operations, which makes a white-box implementation far from straightforward. We propose a white-box implementation of IDEA that splits each algebraic operation into encoded lookup tables, and we verify its security against algebraic analysis and BGE-like attacks. Our white-box implementation requires an average of about 2800 ms to encrypt a 64-bit plaintext 50,000 times, about 60 times as long as the original algorithm, which is acceptable for practical applications. Its storage requirement is only about 10 MB. To our knowledge, this is the first public white-box IDEA solution, and its splitting-based design can be applied to similar algebraic encryption structures.


Introduction
Among the many classic symmetric encryption algorithms, the International Data Encryption Algorithm (IDEA) was jointly developed by Lai and Massey and proposed in 1991 [1][2][3]. The main encryption process of IDEA consists of group operations, a multiplication-addition (MA) structure, and an output transformation. Its core operations are modular multiplication, modular addition, and XOR. Whereas ciphers with Feistel or SPN structures typically use S-boxes as their core non-linear component, IDEA combines three different algebraic operations in place of the S-box. Because of this special structure, IDEA differs from these symmetric ciphers, and it has been selected as the encryption primitive in products such as VLSI hardware and SMS message encryption [4][5][6].
In 2002, Chow et al. proposed the concepts of the black-box, gray-box, and white-box models in conjunction with their white-box implementations of AES [7] and DES [8]. As the color lightens, the attacker is given more powerful capabilities. In the white-box model, an attacker can fully monitor and trace the contents of memory and cache during static and dynamic execution [9] (for a fixed key), modify the internal operations and memory contents of the algorithm at any point in time [10], and so on. Traditional implementations of block ciphers are insecure in this setting, because a round key that is simply XORed in the round function is exposed directly.
More seriously, if an attacker can mount white-box attacks on legitimate terminal devices that protect information such as SMS messages, all of the high-value messages become available to the attacker. This poses a serious security risk to both symmetric and asymmetric encryption. White-box cryptography aims to preserve the security level of the black-box model in this setting, and it has practical significance in applications such as wireless sensor networks, mobile agents, and digital rights management [11][12][13][14][15][16]. Many companies have adopted white-box encryption solutions, such as the secure chip of a downloadable conditional access system (DCAS) terminal [17] and the whiteCryption Secure Key Box (SKB). The latter, launched by Intertrust at the RSA Conference on 24 February 2020, is described as the first and only enterprise-ready white-box cryptography solution for web applications.
In white-box research on symmetric encryption, much work has been done on the design of white-box implementations [18][19][20][21][22], mostly concentrating on DES, AES, SM4, and their variants. These schemes provide a certain degree of protection but are still breakable [23][24][25][26][27][28]. Since the only non-linear component of these block ciphers is the S-box, a common approach is to hide the S-box in lookup tables and to use random bijections to obscure the inputs and outputs of the tables, thereby protecting the secret key. As mentioned above, IDEA instead uses several non-linear algebraic operations in place of an S-box, so the key enters multiple non-linear operations. This special behavior poses new challenges for the design of white-box implementations.
In 2019, Lu et al. [29] proposed white-box KMAC, which converts modular operations into lookup tables. Because its lookup tables are relatively large and its algebraic operations differ from those of IDEA, that approach cannot be used directly to implement a white-box IDEA while preserving IDEA's throughput. We know of no other solutions for transforming algebraic operations into lookup tables, and, as far as we know, no white-box IDEA implementation has been proposed. Research on this topic is scant.
In this paper, we observe the algebraic structure of IDEA's group operations and reduce the storage cost of the lookup tables by splitting the blocks into reasonably sized chunks. We split every arithmetic operation, multiplication modulo $2^{16}+1$ and addition modulo $2^{16}$, into two parts and use an affine transformation to obscure them. A Type I lookup table is created for each part, so that the tables cannot be analyzed effectively. The outputs of the Type I lookup tables are added modulo the corresponding modulus. A Type II lookup table is then generated by encoding the result of this addition with a randomly chosen mask, to eliminate leakage of key-related information from unbalanced encodings. We reorganize the MA structure as four arithmetic operations, each processed in the same way. The result is a working white-box IDEA with adequate performance. The total storage is 10.06 MB, with an average encryption time of 2786 ms (based on 50,000 encryptions of a 64-bit plaintext using Java 8.0), about 60 times that of the original IDEA. Compared with the other white-box implementations in Table 1, our solution offers satisfactory efficiency. The method of splitting blocks and applying encodings and masks can also be used with other cryptographic primitives based on algebraic operations. The rest of the paper is organized as follows. In Section 2, we review the IDEA cipher along with encoding and masking techniques. We present our white-box implementation in Section 3. In Section 4, we present a performance analysis. We analyze the security of the white-box IDEA in Section 5. Section 6 contains additional discussion and our conclusions.

Notation
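We use the following notation in this paper: $\odot$ denotes multiplication modulo $2^{16}+1$ of 16-bit integers, $\boxplus$ denotes addition modulo $2^{16}$ of 16-bit integers, and $\oplus$ denotes bitwise XOR of 16-bit blocks. In the multiplication of IDEA, the all-zero block is never used as the value zero; it represents $2^{16}$ (which is congruent to $-1$ modulo $2^{16}+1$).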

Description of IDEA
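We now describe the encryption process of IDEA. The 64-bit plaintext P is divided into four 16-bit blocks, P = (P_1 P_2 P_3 P_4). The algorithm consists of eight encryption rounds followed by an output transformation. Each encryption round r = 1, 2, ..., 8 uses six sub-keys $Z^{(r)}_i$, i = 1, 2, ..., 6, and the output transformation (r = 9) uses four sub-keys $Z^{(9)}_1, \dots, Z^{(9)}_4$, for 8 × 6 + 4 = 52 sub-keys in total. The sub-keys are generated from the 128-bit master key: the key is divided from left to right into eight 16-bit sub-keys, then cyclically shifted left by 25 bits and divided again, and this is repeated until all 52 sub-keys have been produced. With round inputs $X^{(r)}_1, \dots, X^{(r)}_4$ (where $X^{(1)}_i = P_i$), the eight encryption rounds are

$$
\begin{aligned}
&Y^{(r)}_1 = X^{(r)}_1 \odot Z^{(r)}_1,\quad Y^{(r)}_2 = X^{(r)}_2 \boxplus Z^{(r)}_2,\quad Y^{(r)}_3 = X^{(r)}_3 \boxplus Z^{(r)}_3,\quad Y^{(r)}_4 = X^{(r)}_4 \odot Z^{(r)}_4,\\
&Y^{(r)}_5 = Y^{(r)}_1 \oplus Y^{(r)}_3,\quad Y^{(r)}_6 = Y^{(r)}_2 \oplus Y^{(r)}_4,\\
&Y^{(r)}_7 = Y^{(r)}_5 \odot Z^{(r)}_5,\quad Y^{(r)}_8 = Y^{(r)}_6 \boxplus Y^{(r)}_7,\quad Y^{(r)}_9 = Y^{(r)}_8 \odot Z^{(r)}_6,\quad Y^{(r)}_{10} = Y^{(r)}_7 \boxplus Y^{(r)}_9,\\
&\big(X^{(r+1)}_1, X^{(r+1)}_2, X^{(r+1)}_3, X^{(r+1)}_4\big) = \big(Y^{(r)}_1 \oplus Y^{(r)}_9,\ Y^{(r)}_3 \oplus Y^{(r)}_9,\ Y^{(r)}_2 \oplus Y^{(r)}_{10},\ Y^{(r)}_4 \oplus Y^{(r)}_{10}\big),
\end{aligned}
$$

so the two middle blocks are swapped at the end of every round.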
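The output transformation is

$$
C_1 = X^{(9)}_1 \odot Z^{(9)}_1,\quad C_2 = X^{(9)}_3 \boxplus Z^{(9)}_2,\quad C_3 = X^{(9)}_2 \boxplus Z^{(9)}_3,\quad C_4 = X^{(9)}_4 \odot Z^{(9)}_4,
$$

which undoes the swap of the middle blocks made in the eighth round; the ciphertext is C = (C_1 C_2 C_3 C_4). The arithmetic operations used in the group operations, integer multiplication modulo $2^{16}+1$ (1) and integer addition modulo $2^{16}$ (2), are the core operations of IDEA:

$$
X \odot Z = (X \cdot Z) \bmod (2^{16}+1), \tag{1}
$$
$$
X \boxplus Z = (X + Z) \bmod 2^{16}. \tag{2}
$$

Computing the multiplication modulo $2^{16}+1$ directly requires a division, which is expensive. Instead, the low-high algorithm can be used. For operands $X, Z \in \{1, \dots, 2^{16}\}$, write the product as $X \cdot Z = h \cdot 2^{16} + l$ with $0 \le l < 2^{16}$; since $2^{16} \equiv -1 \pmod{2^{16}+1}$,

$$
X \odot Z = \begin{cases} l - h, & l \ge h,\\ l - h + 2^{16} + 1, & l < h. \end{cases}
$$

The low-high algorithm needs only a 16 × 16-bit multiplication, a subtraction, and a comparison, which are available on most CPUs. Because $2^{16}+1$ is a Fermat prime, every operand has a multiplicative inverse, so the multiplication is invertible. The two arithmetic operations, together with XOR, provide both confusion and diffusion. The three operations on 16-bit blocks are mutually incompatible: pairs of them are non-associative, non-distributive, or non-isotopic.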
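To make the data flow concrete, the following minimal Python sketch implements plain (non-white-box) IDEA encryption: the key schedule, the eight rounds, the output transformation, and multiplication via the low-high algorithm. It is our own reference sketch for checking intermediate values, not the authors' Java 8.0 code; the function names and the representation of blocks and keys as Python integers are assumptions.

```python
"""Reference (non-white-box) IDEA encryption using low-high multiplication."""

MASK16 = 0xFFFF
P = 0x10001  # 2^16 + 1, a Fermat prime


def mul(a, b):
    """Multiplication modulo 2^16 + 1 via the low-high algorithm.

    A 16-bit 0 represents 2^16 on input, and a result of 2^16 is stored as 0.
    """
    a, b = a or 0x10000, b or 0x10000
    prod = a * b
    lo, hi = prod & MASK16, prod >> 16  # prod = hi * 2^16 + lo
    r = lo - hi                         # because 2^16 = -1 (mod 2^16 + 1)
    if r < 0:                           # the case lo < hi
        r += P
    return r & MASK16


def add(a, b):
    """Addition modulo 2^16."""
    return (a + b) & MASK16


def subkeys(key):
    """Derive the 52 encryption sub-keys from a 128-bit integer key."""
    out = []
    while len(out) < 52:
        out += [(key >> (112 - 16 * i)) & MASK16 for i in range(8)]
        key = ((key << 25) | (key >> 103)) & ((1 << 128) - 1)  # rotate left by 25
    return out[:52]


def encrypt_block(block, key):
    """Encrypt one 64-bit integer block under a 128-bit integer key."""
    z = subkeys(key)
    x1, x2, x3, x4 = [(block >> s) & MASK16 for s in (48, 32, 16, 0)]
    for r in range(8):
        k = z[6 * r:6 * r + 6]
        # group operations
        y1, y2, y3, y4 = mul(x1, k[0]), add(x2, k[1]), add(x3, k[2]), mul(x4, k[3])
        # MA structure
        t1 = mul(y1 ^ y3, k[4])
        t2 = mul(add(y2 ^ y4, t1), k[5])
        t3 = add(t1, t2)
        # final XORs; the middle blocks end up swapped
        x1, x2, x3, x4 = y1 ^ t2, y3 ^ t2, y2 ^ t3, y4 ^ t3
    k = z[48:52]
    # output transformation, which undoes the last swap
    c1, c2, c3, c4 = mul(x1, k[0]), add(x3, k[1]), add(x2, k[2]), mul(x4, k[3])
    return (c1 << 48) | (c2 << 32) | (c3 << 16) | c4
```

Because the same `mul` is used everywhere, cross-checking it against a direct `% P` computation also validates the low-high shortcut for the whole cipher.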

White-Box Cryptography Techniques and Terminology
White-box cryptography is an obfuscation technique that hides information about the key inside a cryptographic implementation. In recent developments, multilinear mapping technology [30] and masking technology [31] have provided meaningful security guarantees.

Internal Encodings
A popular method for handling a fixed key is to embed it in lookup tables encoded with random bijection [7,8]. Randomness of the bijection makes it difficult for the adversary to recover the key from the encoding lookup tables.

1. Encoding: let X be a transformation from m bits to n bits. Its encoded transformation is

$$
E(X) = G \circ X \circ F^{-1},
$$

where F is a randomly selected m-bit to m-bit bijection called the input encoding and G is a randomly selected n-bit to n-bit bijection called the output encoding. The main purpose of constructing E(X) is to obfuscate the input and output of X.

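2. Networked encoding: a networked encoding of the compound transformation Y ∘ X (transformation Y applied after transformation X) is

$$
E(Y \circ X) = \big(H \circ Y \circ M^{-1}\big) \circ \big(M \circ X \circ N^{-1}\big) = H \circ (Y \circ X) \circ N^{-1},
$$

where N, M, and H are all bijections. The networked encoding obscures the inputs and outputs of X and Y while ensuring that the combined transformations are functionally equivalent to the original compound transformation, up to the input encoding N and the output encoding H.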
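Both encoding notions can be illustrated on 8-bit values. In this sketch, which is a toy illustration rather than part of the white-box IDEA itself, `X` and `Y` are hypothetical bijections, random permutations play the roles of N, M, and H, and each encoded transformation is stored as a 256-entry lookup table.

```python
import random

BITS = 8
SIZE = 1 << BITS


def random_bijection(rng):
    """A random permutation of {0, ..., SIZE - 1}, stored as a lookup table."""
    perm = list(range(SIZE))
    rng.shuffle(perm)
    return perm


def inverse(perm):
    """Inverse permutation: inv[perm[x]] == x."""
    inv = [0] * SIZE
    for x, y in enumerate(perm):
        inv[y] = x
    return inv


rng = random.Random(2020)
X = [(3 * v + 7) % SIZE for v in range(SIZE)]  # hypothetical toy transformation
Y = [v ^ 0x5A for v in range(SIZE)]             # hypothetical toy transformation
N, M, H = random_bijection(rng), random_bijection(rng), random_bijection(rng)
N_inv, M_inv = inverse(N), inverse(M)

# Encoded tables M o X o N^-1 and H o Y o M^-1 (each has the form G o X o F^-1).
X_table = [M[X[N_inv[v]]] for v in range(SIZE)]
Y_table = [H[Y[M_inv[v]]] for v in range(SIZE)]


def networked(v):
    """Chaining the two tables cancels M internally: H o (Y o X) o N^-1."""
    return Y_table[X_table[v]]
```

Only `X_table` and `Y_table` would be shipped; the intermediate encoding M never appears in the clear, yet feeding `N[x]` through the chain yields `H[Y[X[x]]]`.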

External Encodings
If an adversary extracts the entire implementation and copies it to another device, the adversary effectively possesses the encryption function of the white-box implementation [25]. One possible solution to this problem is to use external encodings, which assume that the encryption function $E_k$ is part of a larger containing system and implement instead

$$
E'_k = B \circ E_k \circ A^{-1},
$$

where A and B are randomly selected bijective encodings applied elsewhere in the system. Because of A and B, the attacker cannot compute $E_k$ directly by extracting $E'_k$.

Masking Technology
When generating a lookup table Y for a plaintext P and secret key K, one idea [31] is to XOR the output of E with a randomly selected mask α before encoding it, and then to encode the result with a linear encoding L followed by a different non-linear encoding N. Using the same linear encoding for two masked values protects the encryption operations and allows the mask to be cancelled with a simple XOR operation, because $L(x \oplus \alpha) \oplus L(y \oplus \alpha) = L(x \oplus y)$.
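A small sketch of why a shared linear encoding lets a plain XOR cancel the mask: we represent a random invertible 16 × 16 matrix over GF(2) by its columns, mask two 16-bit values with the same α, and encode both with the same L plus an optional affine constant c. The non-linear encoding N is omitted because the cancellation relies only on the linearity of L; the matrix representation and helper names are our own assumptions.

```python
import random


def apply_linear(cols, x):
    """Multiply the GF(2) matrix whose columns are `cols` by the 16-bit vector x."""
    y = 0
    for i in range(16):
        if (x >> i) & 1:
            y ^= cols[i]
    return y


def rank_gf2(vectors):
    """Rank over GF(2) of a list of 16-bit vectors, by Gaussian elimination."""
    rows, rank = list(vectors), 0
    for bit in range(16):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def random_invertible_linear(rng):
    """Sample random 16x16 GF(2) matrices until one is invertible."""
    while True:
        cols = [rng.getrandbits(16) for _ in range(16)]
        if rank_gf2(cols) == 16:
            return cols


rng = random.Random(31)
L = random_invertible_linear(rng)  # shared linear encoding
alpha = rng.getrandbits(16)        # shared random mask
c = rng.getrandbits(16)            # optional affine constant


def masked_encode(v):
    """Mask v with alpha, then apply the affine encoding x -> L(x) ^ c."""
    return apply_linear(L, v ^ alpha) ^ c
```

For any a and b, `masked_encode(a) ^ masked_encode(b)` equals `apply_linear(L, a ^ b)`: both α and c drop out, so a table that XORs two such values never sees the mask.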

White-Box Implementation of IDEA
Our strategy is to split IDEA into five parts, each of which is obfuscated and represented as a number of lookup tables. Specifically, we use randomly chosen group members and pairs of additive inverses in $\mathbb{F}_{2^{16}+1}$ and $\mathbb{F}_{2^{16}}$ to transform the group operations into Type I lookup tables, add different masks to generate the Type II-V tables, and eliminate the masks via XOR operations with the Type II-E tables. Finally, external encodings protect the initial inputs and the final outputs.

Generating the Lookup Tables for Group Operations
A 64-bit initial plaintext P is split into four blocks, P = (P_1 P_2 P_3 P_4). Before the encryption operations, the plaintext blocks are encoded with the encoding blocks $L_{(0,1)}$, $L_{(0,2)}$, $L_{(0,3)}$, and $L_{(0,4)}$, which are randomly chosen group members of $\mathbb{F}_{2^{16}+1}$ or $\mathbb{F}_{2^{16}}$; the encoded blocks form the four input blocks of ROUND-1. These encoding blocks are the external encodings that protect the initial blocks. We denote the inputs of encryption round r by $X^{(r)}_1, X^{(r)}_2, X^{(r)}_3, X^{(r)}_4$. In each encryption round, we select four group members $Q_{(r,a)}$, a = 1, 2, 3, 4, from $\mathbb{F}_{2^{16}+1}$ and four group members $P_{(r,a)}$ from $\mathbb{F}_{2^{16}}$. We now describe the computations concretely, using $X^{(r)}_1$ and $X^{(r)}_2$ as examples. $X^{(r)}_1$ is split into two parts, which are encoded by $L^{-1}_{(r-1,1)}$, the multiplicative inverse of $L_{(r-1,1)}$, and by $Q_{(r,1)}$, resulting in $X^{(r)}_{1-1}$ and $X^{(r)}_{1-2}$. Because multiplication modulo $2^{16}+1$ distributes over addition modulo $2^{16}+1$, the multiplication (1) can be split into two calculations, $Y^{(r)}_{1-1}$ and $Y^{(r)}_{1-2}$, one for each part. We then randomly select a pair of additive inverses $(a_1, a'_1)$ from $\mathbb{F}_{2^{16}+1}$ and add them to $Y^{(r)}_{1-1}$ and $Y^{(r)}_{1-2}$, respectively. The resulting calculations are implemented as the lookup tables Type I-HB1 and Type I-LB1, and the final result $Y^{(r)}_1$ (encoded by $Q_{(r,1)}$) is obtained by adding their outputs modulo $2^{16}+1$, which cancels $(a_1, a'_1)$. We perform the addition modulo $2^{16}$ (2) in the same way: $X^{(r)}_2$ is split into two parts, the sub-key $Z^{(r)}_2$ is incorporated, and a further pair of additive inverses $(a_2, a'_2)$ from $\mathbb{F}_{2^{16}}$ is added to the two parts before they are stored as Type I tables. In the same way, we obtain the results $Y^{(r)}_3$ and $Y^{(r)}_4$. Following this strategy, we transform the group operations of IDEA into eight Type I lookup tables.
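The splitting idea can be sketched as follows. We assume, as the table names Type I-HB and Type I-LB suggest, that a 16-bit operand is split into a high byte and a low byte; the function names, the 257-entry high table (needed because the all-zero block stands for $2^{16} = 256 \cdot 2^8$), and the exact placement of the encodings are our own assumptions, not the authors' construction. Each table is offset by one element of an additive-inverse pair, and adding the two outputs cancels the pair.

```python
import random

P = 0x10001    # 2^16 + 1
MOD = 0x10000  # 2^16


def as_int(v):
    """IDEA convention for multiplication: a 16-bit 0 represents 2^16."""
    return v or MOD


def build_mul_tables(z, l_inv, q, rng):
    """Hypothetical HB/LB pair computing v * l_inv * z * q (mod 2^16 + 1).

    l_inv removes the multiplicative input encoding; q is the new output encoding.
    """
    c = as_int(l_inv) * as_int(z) * as_int(q) % P
    a = rng.randrange(P)  # (a, P - a) is a pair of additive inverses
    hb = [(h * 256 * c + a) % P for h in range(257)]  # h = 256 only when v = 2^16
    lb = [(l * c - a) % P for l in range(256)]
    return hb, lb


def mul_lookup(v, hb, lb):
    n = as_int(v)
    r = (hb[n >> 8] + lb[n & 0xFF]) % P  # the additive shares cancel here
    return r & 0xFFFF                     # a result of 2^16 is stored as 0


def build_add_tables(z, enc_in, enc_out, rng):
    """Hypothetical HB/LB pair computing v - enc_in + z + enc_out (mod 2^16)."""
    b = rng.randrange(MOD)  # (b, -b) is a pair of additive inverses mod 2^16
    hb = [(h * 256 + b) % MOD for h in range(256)]
    lb = [(l - enc_in + z + enc_out - b) % MOD for l in range(256)]
    return hb, lb


def add_lookup(v, hb, lb):
    return (hb[v >> 8] + lb[v & 0xFF]) % MOD
```

Note that differences between entries of a single table still expose the combined constant c; in this sketch, what hides the sub-key is that c also contains the random encodings `l_inv` and `q`.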
Second, we randomly select $MASK_{N1}, MASK_{N2} \in \{0,1\}^{16}$ to perform XOR operations and encode the outputs using $S^{-1}_{(r)}$, where $S_{(r)}$, $A_{(r,1)}$, and $A_{(r,2)}$ are all randomly selected invertible 16 × 16 affine mappings. The overall process is shown in Figure 3. Because the calculation of $\Psi_{(r,b)}$ depends on the keys, it must be invisible to users, and the application of masks is similar to adding round keys. Thus, we use Type II lookup tables to include these two transformations.
We take the first block as an example. In $CA_{(r,1)}$, we compute $[Q_{(r,1)} - 1]\,\Gamma_{(r,1)}$ and then find $\Psi_{(r,1)}$ in $\mathbb{F}_{2^{16}+1}$, which is used to offset part of $Q_{(r,1)}\Gamma_{(r,1)}$. The mask $MASK_{N1} \in \{0,1\}^{16}$ is selected randomly and XORed with $Y^{(r)}_1$, and the composite affine mapping $S^{-1}_{(r)}$ encodes the output to obtain the result. The Type II-V1 lookup table includes these operations. Using the same method, we use the Type II-V lookup tables to obtain the masked and encoded versions of $Y^{(r)}_2$, $Y^{(r)}_3$, and $Y^{(r)}_4$. Since the subsequent XOR operations of IDEA eliminate $MASK_{N1}$ and $MASK_{N2}$ added in this step, the inputs of the MA structure carry no masks.

Generating the Lookup Tables of the MA Structure and Adding Further Masks
Although the operations in the MA structure differ from the group operations, its four arithmetic calculations are handled similarly, except for the number of sub-keys (the group operations use four sub-keys, whereas the MA structure uses two). The resulting tables are also of Type I. To encode the outputs of the MA structure, we use $CA_{(r,c)}$, c = 5, 6, to offset the confusion caused by the random group members, and we add two more masks to generate the Type II lookup tables. The whole process of the MA structure is shown in Figure 5.
The inputs of the MA structure are $Y^{(r)}_5$ and $Y^{(r)}_6$. $Y^{(r)}_5$ is split into two parts, and $Q_{(r,3)}$ is used to encode them. The multiplication modulo $2^{16}+1$ is performed with the sub-key $Z^{(r)}_5$, and the additive inverses $(a_5, a'_5)$ from $\mathbb{F}_{2^{16}+1}$ are added, yielding $Y^{(r)}_7$ encoded by $Q_{(r,3)}$. $T_{(r)}$ is also stored in the system. In particular, the following addition modulo $2^{16}$ is performed with $Y^{(r)}_7$ rather than with a sub-key. We split $Y^{(r)}_6$ into two parts and use $P_{(r,3)}$ to encode $Y^{(r)}_{6-1}$ and $Y^{(r)}_{6-2}$. $\Phi_{(r,1)}$, the multiplicative inverse of $Q_{(r,3)}$ in $\mathbb{F}_{2^{16}+1}$, is used to remove the encoding $Q_{(r,3)}$ from $Y^{(r)}_7$, after which $P_{(r,3)}$ encodes $Y^{(r)}_7$. The additive inverses $(a_6, a'_6)$ from $\mathbb{F}_{2^{16}}$ are incorporated to obtain $MY^{(r)}$, and these calculations are implemented as Type I lookup tables. Using the same method, the second multiplication modulo $2^{16}+1$ in the MA structure uses $\Phi_{(r,2)}$ to eliminate $P_{(r,3)}$, encodes its output with $Q_{(r,4)}$, and incorporates the additive inverses $(a_8, a'_8)$. From this, we obtain $Y^{(r)}_9$.
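The MA step in which the addition uses $Y^{(r)}_7$ instead of a sub-key mixes the two groups: $Y^{(r)}_7$ arrives multiplicatively encoded by $Q_{(r,3)}$, whereas the addition works modulo $2^{16}$. One possible way to realize this is sketched below under our own assumptions (a full 16-bit-indexed table for the encoded $Y^{(r)}_7$, a byte split of $Y^{(r)}_6$, and hypothetical names): the table for $Y^{(r)}_7$ applies $\Phi_{(r,1)} = Q^{-1}_{(r,3)}$ and re-expresses the value additively, and a random additive share is cancelled by the $Y^{(r)}_6$ tables.

```python
import random

P = 0x10001    # 2^16 + 1
MOD = 0x10000  # 2^16


def build_ma_add_tables(q3, p3, rng):
    """Hypothetical tables for (Y6 + Y7 + p3) mod 2^16, with Y7 given as Y7 * q3 mod P."""
    phi = pow(q3 or MOD, -1, P)  # multiplicative inverse of the encoding q3
    s = rng.randrange(MOD)       # additive share, cancelled by the Y6 low-byte table
    # 16-bit-indexed table: decode Y7 to its 16-bit word, then add the share
    t7 = [((((v or MOD) * phi) % P & 0xFFFF) + s) % MOD for v in range(MOD)]
    t6_hb = [(h * 256 + p3) % MOD for h in range(256)]  # p3 is the output encoding
    t6_lb = [(l - s) % MOD for l in range(256)]
    return t7, t6_hb, t6_lb


def ma_add_lookup(y6, y7_enc, tables):
    """Sum of three table outputs = Y6 + Y7 + p3 (mod 2^16)."""
    t7, hb, lb = tables
    return (t7[y7_enc] + hb[y6 >> 8] + lb[y6 & 0xFF]) % MOD
```

The table `t7` needs $2^{16}$ entries, which illustrates why operands that move between the two groups cost far more storage than the byte-split tables.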

Eliminating the Masks
At this point, we have six blocks: four containing the outputs of the group operations and two containing the outputs of the MA structure. We perform four XOR operations to complete the original IDEA round. Each of the four results carries two different masks, and because both operands of each XOR are encoded with the same affine layers, the masks are easy to eliminate with the Type II-M tables. The outputs are then encoded by $L_{(r,1)}$, $L_{(r,3)}$, $L_{(r,2)}$, and $L_{(r,4)}$, with the intermediate values encoded by $T_{(r)}$ and $S_{(r)}$. We take $Y^{(r)}_{11}$ as an example (see Figure 9). After its two masks have been eliminated, $S_{(r)}$ is used to encode the output, producing $EY^{(r)}_{11}$, and $L_{(r,1)}$ is then applied to $EY^{(r)}_{11}$; these operations generate the Type II-E1 lookup table.
Thus far, we have finished one encryption round. The other seven rounds have the same sequence of operations.

Generating the Lookup Tables for Output Transformation
The output transformation consists only of group operations, so we again convert these computations into eight Type I lookup tables, two for each of $EC_1$, $EC_2$, $EC_3$, and $EC_4$. The ciphertext is obtained from these encoded values. In particular, we regard the inverses of $Q_{(9,1)}$, $P_{(9,1)}$, $P_{(9,2)}$, and $Q_{(9,2)}$ in their respective groups as external encodings embedded in other components of the computer. After these external encodings are removed, the outputs are the same as those of the original IDEA implementation.

Performance Analysis
The white-box IDEA requires 68 modular additions, 48 XORs, 216 lookups, and eight composite affine mappings.
We used an ordinary laptop with an Intel® Core(TM) i7-3630QM CPU at 2.40 GHz to test our white-box IDEA implementation, written in Java 8.0. Encrypting a 64-bit plaintext 50,000 times takes about 47 ms on average with the original IDEA and 2786 ms on average with our white-box IDEA, about 60 times longer, which is acceptable. Since ours is the first white-box IDEA implementation, we can only compare its efficiency with white-box implementations of other algorithms, such as KMAC [29] and AES [7]. As shown in Table 1, our solution offers competitive computing efficiency.

Security Analysis
Since the function of the S-box is replaced by algebraic operations in IDEA, the core idea of our solution is the transformation of the two arithmetic operations into lookup tables. Therefore, we conduct our security analysis on the lookup table structure of our white-box IDEA against algebraic attacks and BGE-like attacks.

Analysis of the Lookup Tables
We first analyze the Type I lookup tables, using $X^{(r)}_1$ as an example. In a single table, the sub-key is combined with $(a_1, a'_1)$, $Q_{(r,1)}$, and $L^{-1}_{(0,1)}$. Alternatively, an attacker can combine the two Type I lookup tables to eliminate the influence of $(a_1, a'_1)$, but the sub-key then remains combined with $Q_{(r,1)}$ and $L^{-1}_{(0,1)}$. Here $(a_1, a'_1)$ are random additive inverses, and $Q_{(r,1)}$ and $L^{-1}_{(0,1)}$ are random group members from $\mathbb{F}_{2^{16}+1}$; they provide $2^{48}$ different values in (3) and $2^{32}$ possibilities in (4). Thus, the secret key cannot be effectively computed or analyzed from these tables.
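To illustrate the ambiguity argument for a single multiplication, suppose the attacker has recovered a combined constant $c = L^{-1}_{(0,1)} \cdot Z \cdot Q_{(r,1)} \bmod (2^{16}+1)$ from the two Type I tables. The following sketch is our own simplified model with hypothetical names: it shows that, for any guessed value of $L^{-1}_{(0,1)}$, every candidate sub-key Z is consistent with exactly one $Q_{(r,1)}$, so c alone does not single out the key.

```python
P = 0x10001  # 2^16 + 1


def matching_q(c, z, l_inv):
    """The unique Q in [1, 2^16] with l_inv * z * Q = c (mod 2^16 + 1)."""
    return c * pow(l_inv * z % P, -1, P) % P


def consistent_keys(c, l_inv):
    """Count candidate sub-keys z in [1, 2^16] that some Q explains."""
    return sum(1 for z in range(1, P)
               if l_inv * z * matching_q(c, z, l_inv) % P == c)
```

For any nonzero c and any guess of `l_inv`, `consistent_keys` counts all $2^{16}$ candidates, because the nonzero residues modulo the prime $2^{16}+1$ form a group.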
Another approach is the algorithm proposed by Biryukov et al. [24] in 2003, which solves the affine equivalence problem for two given substitutions (S-boxes). $S_1$ and $S_2$ are affine equivalent if they satisfy

$$
S_2(x) = \Lambda_2 \, S_1(\Lambda_1 x \oplus \alpha) \oplus \beta,
$$

where $\Lambda_1$ and $\Lambda_2$ are n × n invertible matrices and α and β are n-bit column vectors. Because $\Lambda_1$ and $\Lambda_2$ are linear, the attacker can try all α and β and solve for the linear parts, recovering the key with time complexity $O(n^3 2^{2n})$. The core idea of this attack is to reduce key recovery to an affine equivalence problem.
Since the algebraic operations replace the S-boxes, it is not possible to find the S-boxes and the linear relationship directly. However, we can regard the multiplication modulo (or addition modulo) operation as a T-box to try to find a structure equivalent to the S-box.
We then use the idea of the affine equivalence problem to attack our solution, using $X^{(r)}_1$ as an example. Here, we regard $\Lambda_1$ as a random group member from $\mathbb{F}_{2^{16}+1}$, α as a 16-bit column vector, and the $T_1$ box as a type of S-box representing the multiplication modulo $2^{16}+1$ together with the operation of the $CA_{(r,1)}$ box. We regard the added masks as a $T_2$ box. When the resulting equation is rewritten, $S^{-1}_{(r)}$ plays the role of $\Lambda_2$, a randomly selected invertible composite affine mapping, and $S^{-1}_{(r)} \circ MASK_{N1}$ plays the role of β, which is in fact realized by the Type II-M1 lookup tables. Thus, we have constructed an instance of the affine equivalence problem.
In our scheme, however, the above equation only resembles the affine equivalence problem in form, because the T-boxes operate differently from S-boxes. Even if the adversary combines the Type I and Type II tables to obtain the encoded masks' lookup tables, they still need to determine all possible $\Lambda_1$, $\Lambda_2$, and α to obtain the key information hidden in the $T_1$ box.

BGE-Like Attacks
Billet et al. [23] proposed the BGE attack on the white-box AES designed by Chow et al. [7]. The core idea of the BGE attack is to transform the non-linear structure into a linear one, construct a linear relationship, and then use that relationship to determine the affine or linear encodings. The time complexity of this attack is currently $2^{22}$.
The group operations of IDEA, multiplication modulo $2^{16}+1$ and addition modulo $2^{16}$, can be treated separately as non-linear $T_1$- and $T_2$-boxes that play the role of non-linear S-boxes. Writing out the group operations of one encryption round and the operations of the MA structure ($MY^{(r)}$) in this form gives eight sets of equations. From these equations, we can see that the $T_1$ and $T_2$ boxes are computed in different ways, so a linear relationship between them, in which K denotes random group members from $\mathbb{F}_{2^{16}+1}$ or $\mathbb{F}_{2^{16}}$ and σ is a constant, does not necessarily exist. If no such linear relationship can be established, a BGE-like method cannot recover the keys.
In addition, the number of invertible 16 × 16 matrices over GF(2) is $\prod_{i=0}^{15}(2^{16}-2^{i}) \approx 2^{254.2}$. Given this white-box diversity and white-box ambiguity [7], our scheme has enough randomness to resist brute-force attacks.
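The count of invertible matrices follows from choosing the columns one at a time: the i-th column can be any vector outside the span of the previous i columns, giving $2^n - 2^i$ choices. A short sketch that evaluates this product (the helper name is our own) gives roughly $2^{254.2}$ for n = 16:

```python
import math


def count_invertible_gf2(n):
    """Number of invertible n x n matrices over GF(2): prod_{i<n} (2^n - 2^i)."""
    count = 1
    for i in range(n):
        count *= (1 << n) - (1 << i)
    return count


print(f"log2 |GL(16, 2)| = {math.log2(count_invertible_gf2(16)):.1f}")
```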
In summary, although our scheme is a completely new implementation of a cryptographic primitive, our analysis indicates that it is sufficiently secure to resist ordinary algebraic analysis and BGE-like attacks.

Conclusions and Discussion
In this paper, we proposed a scheme to implement white-box IDEA. We focused on the computation of the two arithmetic operations in IDEA, developed a method to transform them into lookup tables (Type I), and embedded four different masks (Type II) to increase resistance against white-box attacks. Our implementation presents a new approach for transforming algebraic operations into lookup tables, and it can be applied to other encryption systems with similar algebraic structures while resisting algebraic attacks, including BGE attacks.
Since the security evaluation methods for the white-box model are not as complete as those for the black-box model, our future work will follow two avenues. One is to test other new white-box implementation methods; for example, we want to design a completely new white-box cipher with much smaller storage costs. The other is to optimize the security and robustness of this white-box IDEA implementation so that it can provide long-term protection. Since the structures of IDEA's three operations are completely different from those of other block ciphers, we used relatively large lookup tables (Type II) to convert these calculations. Without a traditional S-box, optimizing the storage costs and security of the lookup tables remains difficult and challenging.
Author Contributions: Conceptualization, S.P.; methodology, S.P. and T.L.; software, S.P.; formal analysis, S.P. and T.L.; writing-original draft preparation, S.P.; Writing review and editing, T.L., X.L. and Z.G. All authors have read and agreed to the published version of the manuscript.