3.1.1. Constant Roundkey Pre-Computation
Because the AES round function used in Simpira takes a round constant instead of the expanded roundkey of the original AES, the value used as the roundkey can be calculated in advance. In Algorithm 3, the roundkey is computed before the AES round function is entered, and the AES round function is then performed. Our implementation targets the Simpira permutation with b = 1 (b is the number of blocks), so we fix the b parameter to 1 and every roundkey is derived with this fixed value of b. The roundkeys can therefore be computed once in advance rather than recalculated every round. In other words, the operations performed in lines 1 to 4 of Algorithm 3 (the computation of the roundkey) can be omitted.
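As a rough illustration of this pre-computation, the sketch below builds all six roundkeys for b = 1 once, before any round is executed. The constant layout (bytes 0, 4, 8, and 12 set to 0x00, 0x10, 0x20, and 0x30, each XORed with the round counter c and with b) is taken from the Simpira v2 specification and is an assumption here, since Algorithm 3 is not reproduced in this section. The names round_constant and ROUNDKEYS are hypothetical.

```python
# Sketch only: pre-computing the Simpira roundkeys for b = 1.
# ASSUMPTION: constant layout follows the Simpira v2 specification, i.e.
# four 32-bit little-endian words 0x00^c^b, 0x10^c^b, 0x20^c^b, 0x30^c^b.

B = 1        # number of blocks, fixed to 1 as in the text
ROUNDS = 6   # Simpira rounds for b = 1, as stated in Section 3.1.2


def round_constant(c, b=B):
    """Return the 16-byte roundkey for round counter c (hypothetical helper)."""
    key = bytearray(16)                  # all other bytes stay 0x00
    for word in range(4):
        key[4 * word] = (0x10 * word) ^ c ^ b
    return bytes(key)


# Computed once, before the permutation runs; no per-round recomputation.
ROUNDKEYS = [round_constant(c) for c in range(1, ROUNDS + 1)]

if __name__ == "__main__":
    for c, key in enumerate(ROUNDKEYS, start=1):
        print(c, key.hex())
```

On a microcontroller the same table would typically be stored as constants in memory, so that the round function only has to load bytes from it.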
3.1.2. Omitting AddRoundkey Function
Simpira runs 6 rounds, and each Simpira round performs two AES round functions. Of the two Addroundkey operations in a Simpira round, one uses a constant roundkey and the other uses Z (a roundkey whose bytes are all 0x00). Two roundkeys are therefore used per round, for a total of 12 roundkeys, and 6 of these 12 roundkeys are 0x00. The Addroundkey function XORs the State with the roundkey, so when the roundkey is 0x00 the State value does not change.
The existing Simpira implementation uses AES-NI. With AES-NI instructions, the Addroundkey function cannot be omitted, because each AES-NI round instruction performs the roundkey addition as part of the round. However, we do not use AES-NI and instead implement each AES function individually, so we can omit every Addroundkey operation that uses Z, where all roundkey values are 0x00. As a result, we reduce the number of Addroundkey operations from 12 to 6.
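The argument can be checked with a small sketch: XORing the State with an all-zero roundkey returns the State unchanged, so half of the 12 Addroundkey calls do no work. The function name add_round_key and the schedule labels are hypothetical and only serve the illustration.

```python
# Sketch only: Addroundkey with the all-zero roundkey Z is a no-op.

def add_round_key(state, roundkey):
    """XOR a 16-byte State with a 16-byte roundkey."""
    return bytes(s ^ k for s, k in zip(state, roundkey))


Z = bytes(16)                       # roundkey whose bytes are all 0x00
state = bytes(range(16))            # arbitrary example State
unchanged = add_round_key(state, Z)

# Each of the 6 Simpira rounds uses one constant roundkey and one Z roundkey.
schedule = [kind for _ in range(6) for kind in ("constant", "Z")]
needed = [kind for kind in schedule if kind != "Z"]

if __name__ == "__main__":
    print(unchanged == state, len(schedule), len(needed))
```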
3.1.3. Optimizing InvMixColumn
In line 6 of Algorithm 2, the InvMixColumns operation is performed. In AES, MixColumns is not performed in the last round. However, since Simpira's round function is implemented using AES-NI, MixColumns is included in the last round as well. So, after the round function ends, InvMixColumns, the inverse of MixColumns used in decryption, is additionally applied. Performing InvMixColumns at the end of the round gives the same result as omitting MixColumns once in the AES round function. Omitting the MixColumns operation once is therefore more efficient than implementing InvMixColumns separately. Since we implement the AES round function directly, we omit one MixColumns operation and one InvMixColumns operation.
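The cancellation that this optimization relies on can be illustrated on a single State column. The sketch below uses the standard AES MixColumns and InvMixColumns matrices over GF(2^8); the helper names are hypothetical, and the code is a plain reference model, not the AVR implementation.

```python
# Sketch only: InvMixColumns undoes MixColumns, so a MixColumns that is
# immediately followed by InvMixColumns can be dropped together with it.

def xtime(a):
    """Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def gmul(a, b):
    """Multiply two field elements with shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def circulant_mul(column, first_row):
    """Multiply a 4-byte column by the circulant matrix built from first_row."""
    out = []
    for r in range(4):
        acc = 0
        for j in range(4):
            acc ^= gmul(column[j], first_row[(j - r) % 4])
        out.append(acc)
    return out


def mix_column(column):
    return circulant_mul(column, (2, 3, 1, 1))


def inv_mix_column(column):
    return circulant_mul(column, (14, 11, 13, 9))


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(column)
restored = inv_mix_column(mixed)

if __name__ == "__main__":
    print([hex(x) for x in mixed], restored == column)
```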
3.1.4. Optimized Addroundkey Function
The Addroundkey step in the existing AES performs an XOR operation on the expanded roundkey and the current block bit by bit. However, the Addroundkey of AES used in Simpira has the characteristic of using a fixed roundkey value: the result of an XOR operation on the round constant, roundkey, and the number of blocks b (i.e., 1) is used as the roundkey. As mentioned in Section 3.1.1, this characteristic makes it possible to pre-compute the roundkey from the round constant, roundkey, and number of blocks b (i.e., 1).
Figure 2 summarizes the values for each roundkey. Among the 16 roundkey bytes, only the four bytes that Algorithm 4 applies to registers R0, R4, R8, and R12 are the result of XOR operations with the round constant. Using this property, only these four bytes are XORed with the corresponding bytes of the current block. Through this process, different roundkey values can be generated for each round. All other roundkey bytes are fixed at 0x00.
Therefore, except for the operations on the four bytes held in R0, R4, R8, and R12, the results of the remaining operations are the same as when no operation is performed. The algorithm applying the optimized Addroundkey is given in Algorithm 4.
Algorithm 4: Optimized Addroundkey in AVR microcontrollers (.macro round); R0, R4, R8, R12: input register, R18: temporary register, Y: indirect address register.
Input:
Output:
1: ld R18, Y+
2: eor R0, R18
3: ld R18, Y+
4: eor R4, R18
5: ld R18, Y+
6: eor R8, R18
7: ld R18, Y+
8: eor R12, R18
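The effect of Algorithm 4 can be modelled at byte level as follows. This is a sketch under the assumption that State bytes 0, 4, 8, and 12 correspond to registers R0, R4, R8, and R12 and that the Y pointer walks over a table holding only the four non-zero roundkey bytes. The function names and the example key values are hypothetical.

```python
# Sketch only: a byte-level model of the 4-byte Addroundkey of Algorithm 4,
# compared against the full 16-byte XOR.

def add_round_key_full(state, roundkey):
    """Reference version: XOR all 16 bytes."""
    return bytes(s ^ k for s, k in zip(state, roundkey))


def add_round_key_sparse(state, key_bytes):
    """Model of Algorithm 4: load one key byte, XOR it into one State byte."""
    out = bytearray(state)
    for position, key_byte in zip((0, 4, 8, 12), key_bytes):
        out[position] ^= key_byte      # models: ld R18, Y+ / eor Rn, R18
    return bytes(out)


# Illustrative roundkey: non-zero only at positions 0, 4, 8, and 12.
full_key = bytes([0x03, 0, 0, 0, 0x13, 0, 0, 0, 0x23, 0, 0, 0, 0x33, 0, 0, 0])
sparse_key = full_key[0::4]            # the four bytes a table would store
state = bytes(range(0x40, 0x50))

full_result = add_round_key_full(state, full_key)
sparse_result = add_round_key_sparse(state, sparse_key)

if __name__ == "__main__":
    print(full_result == sparse_result)
```

Storing only the four non-zero bytes per round also shrinks the roundkey table to a quarter of its full size, which matters on a memory-constrained device.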
As a result, we omit all operations except those on the four bytes whose values change (R0, R4, R8, and R12 in Algorithm 4), reducing Addroundkey from the 16 operations of the existing implementation to 4 operations by exploiting Simpira's characteristics. Comparison results are shown in Table 4. The Addroundkey operation took 48 cycles when performed as before, whereas it takes 12 cycles in this work, a performance improvement of 4.0×.
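One way to account for these figures is to count cycles per processed byte. The sketch below assumes that each byte costs one LD with post-increment (2 cycles) and one EOR (1 cycle), as on classic ATmega-class AVR cores. These per-instruction costs are an assumption, not taken from Table 4, and the helper name is hypothetical.

```python
# Sketch only: cycle accounting for Addroundkey under assumed instruction costs.

CYCLES = {"ld": 2, "eor": 1}   # ASSUMPTION: classic AVR core timings


def addroundkey_cycles(num_bytes):
    """Cycles needed when every processed byte takes one LD and one EOR."""
    return num_bytes * (CYCLES["ld"] + CYCLES["eor"])


baseline = addroundkey_cycles(16)   # all 16 State bytes
optimized = addroundkey_cycles(4)   # only the 4 bytes with non-zero key bytes
speedup = baseline / optimized

if __name__ == "__main__":
    print(baseline, optimized, speedup)
```

Under these assumptions the model reproduces the 48-cycle and 12-cycle figures quoted above.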
We implemented Subbytes, Shiftrows, MixColumns, and Addroundkey of Simpira as separate modules that are called as needed. This modular structure makes the code easier to manage.
The three optimization techniques of Sections 3.1.1, 3.1.2, and 3.1.3 are equally applicable to 32-bit RISC-V processors. However, the technique in Section 3.1.4 does not apply to the 32-bit RISC-V processor, because it takes advantage of the 8-bit register size of the AVR microcontroller.
3.1.5. Using Optimized AES Implementation of AVR
For the optimal implementation of Simpira on the AVR microcontroller, an optimized implementation of the AES algorithm is needed first. We implemented Simpira by modifying Johannes Feichtner's [26] optimized code. Feichtner's implementation integrates the key expansion step and AddRoundKey into one step. In this case, 4 LDD operations, 1 LDI operation, 4 ADD operations, and 16 EOR operations are required, for a total of 25 operations.
As mentioned in Section 3.1.1, Simpira uses a round constant instead of an expanded round key, unlike AES, so the value used as the round key can be calculated in advance. We therefore omit the key expansion step. As mentioned in Section 3.1.2, an XOR operation between the State and a roundkey of 0x00 does not change the State value, so we omit 6 of the total 12 Addroundkeys. As a result, as in Algorithm 4, 4 LD operations and 4 EOR operations are required, for a total of 8 operations.
MixColumns is the most computationally expensive operation in AES. To implement MixColumns efficiently, Feichtner reduced the multiplication and XOR operations required for MixColumns to a minimum. Multiplication by 2 is implemented as an ADD operation, a BRCC operation, and an EOR operation performed in order. As a result, MixColumns is implemented with 16 ADD operations, 16 BRCC operations, 64 EOR operations, and 36 MOV operations.
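The ADD, BRCC, EOR sequence for multiplication by 2 can be modelled as follows. This is a sketch of the general technique, assuming the standard AES reduction constant 0x1B; it is not a transcription of Feichtner's code, and the function name is hypothetical.

```python
# Sketch only: multiplication by 2 in GF(2^8), modelled on ADD / BRCC / EOR.

REDUCTION = 0x1B   # low byte of the AES polynomial x^8 + x^4 + x^3 + x + 1


def xtime_avr_style(value):
    """Double an 8-bit field element the way an 8-bit register would."""
    doubled = value + value          # ADD Rn, Rn: shifts left, sets carry
    carry = doubled > 0xFF           # carry flag = bit shifted out
    result = doubled & 0xFF          # the register keeps only 8 bits
    if carry:                        # BRCC skips the next step if carry is clear
        result ^= REDUCTION          # EOR with the reduction constant
    return result


if __name__ == "__main__":
    print(hex(xtime_avr_style(0x57)), hex(xtime_avr_style(0xAE)))
```

Multiplication by 3, the other coefficient MixColumns needs, is then the doubled value XORed with the original byte.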
Consequently, our Addroundkey implementation requires 17 fewer operations than Feichtner's, and we implemented Simpira using this optimized MixColumns. The code size with our optimization techniques applied is shown as our work * in Table 5.
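The operation counts quoted in this subsection can be tallied in a few lines. The dictionaries below simply restate the numbers given in the text; the variable names are hypothetical.

```python
# Sketch only: tallying the instruction counts stated in Section 3.1.5.

feichtner_addroundkey = {"LDD": 4, "LDI": 1, "ADD": 4, "EOR": 16}
optimized_addroundkey = {"LD": 4, "EOR": 4}
mixcolumns = {"ADD": 16, "BRCC": 16, "EOR": 64, "MOV": 36}

feichtner_total = sum(feichtner_addroundkey.values())
optimized_total = sum(optimized_addroundkey.values())
saved = feichtner_total - optimized_total
mixcolumns_total = sum(mixcolumns.values())

if __name__ == "__main__":
    print(feichtner_total, optimized_total, saved, mixcolumns_total)
```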