## 1. Introduction

There is an increasing demand for secure data communication between embedded devices in many areas, including automotive, industrial, and smart-home applications. To enable cryptography in resource-constrained devices, researchers have studied lightweight cryptography, which is designed to perform well when implemented under such constraints. Lightweight cryptography emerged from block cipher design [1] and now covers a larger area of cryptography, including authenticated encryption (AE). In particular, NIST is running a standardization process for lightweight AE algorithms (NIST LWC) [2].

Side-channel attacks (SCA) [3,4] are a considerable security risk for the main targets of lightweight cryptography: embedded devices in a hostile environment, in which a device owner with physical possession attacks the device. Consequently, NIST LWC considers the grey-box security model with side-channel leakage [5]. In addition to security, the cost of implementing SCA countermeasures in resource-constrained devices is a major issue because SCA countermeasures multiply the implementation cost.

Threshold implementation (TI) [6] is an SCA countermeasure based on multi-party computation (MPC) [7]. TI is popular for hardware implementations because it can provide security in the presence of glitches, i.e., transient signal propagation through a combinational circuit, which is inevitable in common hardware design. Consequently, an increasing number of papers report authenticated encryption schemes with TI [8,9,10]. Researchers are even optimizing algorithms for TI: TI-friendly S-boxes [11,12] and TI-friendly modes of operation [13,14].

SAEAES is an instantiation of the SAEB mode of operation [15] with the standard block cipher AES [16], and is a NIST LWC candidate. Choosing AES is a practical decision that provides backward compatibility with the numerous AES accelerators and coprocessors in which the industry has invested so far. However, few NIST LWC candidates chose AES (COMET [17], mixFeed [18], and SAEAES [19] out of the 32 candidates) because newer lightweight primitives outperform AES in lightweight implementations. The impact of using AES is even larger with TI. Many lightweight algorithms, such as GIFT [20] and SKINNY [21], use an S-box for which an efficient, i.e., 3-share and uniform, TI is available [21]. In contrast, this is not the case for AES [22], which was standardized before TI became popular. Early AES TIs compensated for this disadvantage by refreshing the output shares with fresh randomness [23,24,25], but this raised another implementation challenge: generating fresh randomness at a high rate. Daemen’s changing of the guards [26] in 2017 opened the door to uniform TI for a larger class of functions, and its generalization enabled the first 3-share TI for AES without fresh randomness in 2019 [27].
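To make the notions of sharing, non-completeness, and uniformity concrete, the following is a minimal sketch using the textbook direct 3-share TI of a single AND gate. It is an illustration only, not the AES S-box and not the design in [27]; the function names `share`, `ti_and`, and `output_distribution` are hypothetical. The sketch enumerates every input sharing, checks that the output shares always recombine to the correct value, and counts how often each output sharing occurs.

```python
from collections import Counter
from itertools import product


def share(x, r1, r2):
    """Split bit x into three Boolean shares using two random bits."""
    return (r1, r2, x ^ r1 ^ r2)


def ti_and(xs, ys):
    """Direct 3-share TI of z = x AND y (illustrative, hypothetical helper).

    Non-completeness: each output share omits one input share index.
    """
    x1, x2, x3 = xs
    y1, y2, y3 = ys
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)  # independent of x1, y1
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)  # independent of x2, y2
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)  # independent of x3, y3
    return (z1, z2, z3)


def output_distribution(x, y):
    """Count output sharings over all 16 input sharings of (x, y)."""
    counts = Counter()
    for r1, r2, r3, r4 in product((0, 1), repeat=4):
        zs = ti_and(share(x, r1, r2), share(y, r3, r4))
        # Correctness: the output shares recombine to x AND y.
        assert zs[0] ^ zs[1] ^ zs[2] == (x & y)
        counts[zs] += 1
    return counts


if __name__ == "__main__":
    for x, y in product((0, 1), repeat=2):
        print(x, y, dict(output_distribution(x, y)))
```

A uniform output sharing would produce each of the four valid output sharings 4 times out of 16. For x = y = 0, this gate instead produces (0,0,0) 7 times and each of the other three valid sharings 3 times, so it is correct and non-complete but not uniform. This is the kind of gap that refreshing with fresh randomness, or a construction such as the changing of the guards, is meant to close.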

#### 1.1. Purpose and Approach

The question that naturally arises is the cost of the backward compatibility: how many more gates do we need by choosing AES instead of other lightweight algorithms with TI? The question has remained unanswered because of the gap between the conventional works on lightweight AE and efficient TI implementation: the conventional SAEAES implementations are all without TI [15,19,28]. The purpose of this paper is to present the first threshold implementation of SAEAES and to evaluate the cost we pay for the backward compatibility.

Our approach is to extend the recent AES implementation with the 3-share and uniform TI using the generalized changing of the guards [27], but we redesign the AES circuit architecture to satisfy the additional requirements imposed by the mode of operation. Then, we evaluate our design’s performance and compare it with the previous implementation of SAEB-GIFT [13]: the same mode of operation instantiated with the state-of-the-art lightweight block cipher GIFT [20].

#### 1.2. Contributions

Here we summarize our key contributions.

**(I)** **Identification of design challenges in extending AES implementation to SAEAES (Section 4)** Our design is based on the 3-share and uniform TI of AES using the generalized changing of the guards [27]. We identify that the mode of operation enforces the byte order, making the conventional row-oriented serialization inefficient [23]. Also, the mode of operation should preserve the secret key, which the on-the-fly key schedule overwrites.

**(II)** **Column-oriented AES implementation (Section 5.2)** We propose a new AES circuit architecture that uses the column-oriented data serialization to address the aforementioned incompatibility with the row-oriented serialization.

**(III)** **The first SAEAES implementation with threshold implementation (Section 5)** We show the first TI of SAEAES that uses the column-oriented serialization and the 3-share and uniform AES S-box. The design has an independent key store for preserving the secret key until the next AES call.

**(IV)** **Improved TI of key array (Section 5.5)** We show the concrete realization of the key array for TI that reduces the register size by 216 bits or 32% from the original design [27].

**(V)** **Performance evaluation and comparison (Section 6)** We synthesize our design using a standard cell library to evaluate its circuit area in GE (gate equivalents). We show that our design uses 18,288 GE with TI, composed of AES (14,256 GE, 78%), the key store (3422 GE, 19%), and the mode of operation (610 GE, 3%). Compared with the conventional SAEB-GIFT implementation that uses 6229 GE [13], the SAEAES implementation is roughly three times larger. We identify the non-linear key schedule and the extended states for satisfying uniformity as the major factors for this difference.
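As a quick sanity check of the area figures quoted in contribution (V), the following sketch recomputes the total, the per-component percentages, and the ratio to SAEB-GIFT from the numbers stated above. The variable names are hypothetical; only the GE values come from the text.

```python
# Area figures in gate equivalents (GE), taken from contribution (V).
AREA_GE = {"AES": 14256, "key store": 3422, "mode of operation": 610}
SAEB_GIFT_GE = 6229  # conventional SAEB-GIFT implementation [13]

# Total area of the SAEAES design with TI.
total = sum(AREA_GE.values())

# Share of each component, rounded to the nearest percent.
percent = {name: round(100 * ge / total) for name, ge in AREA_GE.items()}

# Area ratio of SAEAES to SAEB-GIFT.
ratio = total / SAEB_GIFT_GE

if __name__ == "__main__":
    print("total GE:", total)
    print("percent:", percent)
    print("SAEAES / SAEB-GIFT: %.2f" % ratio)
```

The components sum to 18,288 GE, the rounded percentages match 78%, 19%, and 3%, and the ratio is slightly below 3, consistent with "roughly three times larger".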

#### 1.3. Organization

This paper is organized as follows. We begin by reviewing the algorithm of SAEAES in Section 2 and the previous TI of AES in Section 3. Then, we state the design challenges we address in the paper in Section 4. We describe our proposed design in Section 5, followed by the performance evaluation in Section 6. Section 7 is the conclusion.