1. Introduction
The Substitution box (S-Box) is a core element of block ciphers, responsible for introducing nonlinearity that obscures the relationship between the plaintext, the ciphertext, and the secret key [1]. This property makes encryption algorithms more resistant to cryptanalytic techniques such as differential cryptanalysis and linear cryptanalysis [2]. In hardware, an S-Box can be implemented either with lookup tables (LUTs) or by direct computation using logic circuits. Furthermore, S-Box operations are repeated across multiple encryption rounds. For example, in AES-128 [3], each 128-bit data block undergoes 10 encryption rounds with 16 S-Box substitutions per round. Consequently, the resource consumption of the S-Box significantly affects the hardware cost and overall performance of the encryption algorithm.
In embedded systems and IoT devices, where lightweight cryptography with minimal resource consumption is required [4], designing an efficient S-Box from the outset, rather than merely optimizing existing implementations, is crucial. An 8-bit S-Box is chosen because it is the standard size used in many block cipher algorithms. S-Boxes of 4 bits, 5 bits, or fewer could further reduce hardware costs but would significantly degrade nonlinearity, making the cipher more vulnerable to differential and linear attacks. Thus, an 8×8 S-Box provides a favorable balance between security and hardware efficiency, satisfying cryptographic criteria while remaining practical for resource-constrained devices. Resource optimization methods based on SAT solvers are often effective for small S-Boxes, as shown in a previous study [5]. For larger sizes such as 8 bits, however, they typically struggle because of the complexity of optimizing 8-variable Boolean functions. Therefore, when resource optimization is the goal, it is better to optimize from the design stage.
Many studies have focused on improving the hardware implementation of existing S-Boxes rather than developing new S-Box architectures that inherently minimize resource usage [6,7,8]. Other studies optimize S-Box resources, but the resulting S-Boxes score poorly on cryptographic criteria [9,10].
The choice of $GF(2^4)$ for constructing an 8×8 S-Box is driven by both mathematical efficiency and practical implementation considerations. Since $GF(2^4)$ naturally supports arithmetic on 4-bit elements, an 8-bit S-Box can be structured efficiently as a combination of two 4-bit elements, which makes $GF(2^4)$ arithmetic well suited to this size.
For 4×4 S-Boxes, using $GF(2^4)$ is unnecessary and inefficient, as the field size exceeds the required number of input bits; if a finite field approach is used, $GF(2^2)$ would be a more appropriate choice for 4-bit designs. However, 4×4 S-Boxes generally consume very few hardware resources, and their implementation can be optimized efficiently with logic minimization techniques. In contrast, an 8×8 S-Box consists of Boolean functions of eight variables, which are significantly more complex and harder to optimize directly in hardware. Therefore, optimizing the S-Box at the design stage is crucial to achieving an efficient implementation.
Meanwhile, applying $GF(2^4)$ to S-Boxes of sizes such as 5×5 or 6×6 is challenging, as these sizes do not align naturally with the structure of the finite fields commonly used in cryptographic implementations. Unlike 8-bit designs, where each byte can be decomposed into two 4-bit elements, a 5-bit or 6-bit S-Box would require a different field representation, increasing complexity and reducing efficiency.
Several studies have demonstrated that multiplication in $GF(2^4)$ can be implemented efficiently with logic circuits using a minimal number of gates [11,12]. Moreover, even when implementing existing S-Boxes such as the AES S-Box, researchers often map operations from $GF(2^8)$ to $GF(2^4)$ to reduce computational costs in hardware. This highlights the potential of $GF(2^4)$ for designing S-Boxes that remain secure while consuming fewer resources, as also demonstrated in [13]. The authors of [14] constructed an S-Box on a subset of the multiplicative group of a Galois field smaller than $GF(2^8)$, instead of the entire field, to reduce size, but the results are not optimal.
In this research, we propose a novel method for designing an 8-bit S-Box using multiplication in $GF(2^4)$ with modular reduction by an irreducible polynomial. The goal is an S-Box generation architecture that is hardware-optimized while effectively meeting cryptographic criteria.
The following points summarize this paper’s main contributions:
Propose an architecture for generating 8-bit S-Boxes based on multiplication in $GF(2^4)$.
Experiment with modular multiplication using different irreducible polynomials and evaluate the generated S-Boxes. The selected S-Box exhibits significantly lower hardware resource consumption than those of previous studies.
Conduct a comprehensive analysis of the S-Box, demonstrating that its cryptographic properties remain robust compared with related works.
Assess the hardware implementation efficiency of the selected S-Box and evaluate its resilience to side-channel attacks (SCAs).
Estimate the number of CNOT, Toffoli, and NOT gates required for implementing the proposed S-Box in a quantum computing environment.
The structure of this paper is outlined as follows.
Section 2 reviews related works.
Section 3 introduces the proposed method and compares its logic gate count with previous designs.
Section 4 analyzes the cryptographic properties of the designed S-Box.
Section 5 evaluates the FPGA and quantum implementations of the proposed S-Box.
Section 6 examines the parameters and experimental results to assess its resistance against SCAs. Finally, the last section summarizes the results.
2. Related Works
There are various methods for designing S-Boxes, each with distinct characteristics in terms of security, computational effectiveness, and hardware implementation feasibility.
A common approach to constructing S-Boxes uses mathematical transformations over finite fields. This method is employed in AES [3], whose S-Box combines the multiplicative inverse in $GF(2^8)$ with an affine transformation to ensure nonlinearity and uniform distribution. The advantage of this approach lies in its strong cryptographic properties; however, it requires complex arithmetic operations, which makes hardware implementation more challenging. A recently popular approach uses chaotic maps to generate S-Boxes [15,16,17,18,19,20]. Chaotic dynamical systems are highly sensitive to initial conditions, enabling the creation of highly nonlinear and unpredictable S-Boxes. This method suits applications that require flexible, dynamic S-Box generation instead of a fixed substitution structure. However, a notable drawback is that the nonlinearity achievable for 8-bit S-Boxes remains limited: as reported in [18], only a few cases reach a nonlinearity of 112.
Many other methods have been explored with the goal of finding S-Boxes that satisfy criteria such as high nonlinearity and resistance to linear and differential cryptanalysis [21]. As cryptographic attacks continue to evolve, research on S-Box generation methods remains a crucial area of cryptography. This section analyzes research findings specifically related to hardware optimization of 8-bit S-Boxes.
The study by Canright et al. [6] proposes a compact AES S-Box design that uses subfield arithmetic in $GF(2^4)$ and $GF(2^2)$, reducing the logic gate count. The resulting implementation requires 195 logic gates for the standalone design and 253 logic gates for the merged design, a greater reduction than previous approaches.
The studies in [7,8,22,23] focus on optimizing the resource consumption of the AES S-Box, for example by implementing multiplication operations at the logic circuit level to reduce the number of logic gates. The most efficient AES S-Box implementation to date is by Maximov [23], which uses 64 XOR/NOR gates, 23 NAND/OR gates, 4 AND gates, and 6 multiplexers (MUXs) [13,23].
The studies by Rashidi [13,24] propose a lightweight 8-bit S-Box design based on inversion in $GF(2^8)$ combined with an optimized affine transformation. To reduce hardware resource consumption, the inversion is performed in a composite field instead of directly in $GF(2^8)$. Key aspects of this approach are resource sharing in the subfield multiplications, rewriting the inversion equations to minimize computational complexity, and integrating the computational blocks of the S-Box into a unified structure. This optimization reduces circuit area and latency compared with previous designs while maintaining a level of security equivalent to that of the AES S-Box. The most optimized implementation has a hardware footprint of 60 XOR/XNOR gates and 80 NAND/NOR gates.
Reference [25] presents a low-cost 8-bit S-Box design using inversion in $GF(2^8)$ combined with an affine transformation. To optimize hardware resources, the authors apply composite field arithmetic, reducing the computational cost of the inversion. The S-Box construction consists of two main steps: performing the inversion in the composite field and then applying a low-area affine transformation. Although the paper does not report a detailed gate count, the architecture still requires multiple multiplications, squaring operations, and inversions, so it is not entirely resource-efficient.
Kuznetsov et al. [26] proposed a method for generating more nonlinear 8-bit S-Boxes by combining the hill climbing algorithm with a new cost function. The goal is to reduce the number of iterations of the S-Box search while preserving important cryptographic properties such as high nonlinearity and resistance to cryptanalytic attacks. Although the method significantly improves search efficiency, it achieves a maximum nonlinearity of 104, which is lower than that of some optimally designed S-Boxes. This limitation may weaken its robustness against linear and differential cryptanalysis in practical applications.
Reference [27] describes a new way to find S-Box circuits with optimal multiplicative complexity (MC-optimal) by combining the A* pathfinding algorithm with a detailed study of MC computation. The method extends the search space beyond existing tools such as SAT solvers and LIGHTER, enabling the construction of optimized circuits for small S-Boxes, including bijective S-Boxes, almost perfect nonlinear (APN) S-Boxes, and certain quadratic permutations. Furthermore, the study establishes new lower bounds on the multiplicative complexity of the AES and MISTY S-Boxes. However, while the research improves the theoretical MC bounds for these larger S-Boxes, it does not directly apply the MC-optimal search method to construct optimized circuits for them.
Many studies generate 8-bit S-Boxes from smaller S-Boxes using Lai-Massey, Feistel, and MISTY structures [10,28,29,30,31]. However, these methods are limited in cryptographic security and efficiency because they rely on predefined structures and on the properties of the small S-Boxes. In [9], the authors construct S-Boxes by composing bitwise operations starting from the identity function, a marked departure from earlier approaches. The construction is formulated as a Markov decision process with reinforcement learning (RL) as the solver, and the goal is to train an RL agent to generate S-Boxes to which masking schemes can be applied efficiently. Studies [9,10,28,29,30,31] focus only on resource optimization, resulting in S-Boxes with rather poor cryptographic properties, i.e., with nonlinearity below 100.
In summary, there are various approaches to designing S-Boxes, each optimizing different criteria. However, the most crucial aspect is to design an S-Box that ensures both strong cryptographic properties and hardware efficiency. This research presents an effective solution to achieve this design requirement.
3. Proposed Method
3.1. Proposed Architecture
The proposed method constructs the S-Box by splitting the 8-bit input into two 4-bit parts, a and b, and applying transformations in $GF(2^4)$. Specifically, the upper part a consists of the four most significant bits of the input, while the lower part b consists of the four least significant bits. The output is formed by computing A and B, concatenating them, and applying the XOR operation specified in Equation (1). This approach reduces computational complexity and simplifies hardware implementation while preserving the essential nonlinear properties of the S-Box.
Data processing within the S-Box occurs in two consecutive steps. First, the value B is computed from a and b: if b is zero, B retains the value of a; if b is nonzero, B is the product of a and b in $GF(2^4)$. Once B is determined, the value A is computed in a similar manner: if B is zero, A retains the value of b; if B is nonzero, A is the product of b and B in $GF(2^4)$. In compact form, where ⊗ denotes multiplication in $GF(2^4)$ performed modulo an irreducible polynomial of degree 4, the two steps are:
$$B = \begin{cases} a, & b = 0 \\ a \otimes b, & b \neq 0 \end{cases} \qquad A = \begin{cases} b, & B = 0 \\ b \otimes B, & B \neq 0 \end{cases}$$
These steps, together with the final XOR, are expressed mathematically in Equation (1), and Equation (2) is applied to compute the inverse S-Box.
Figure 1 illustrates the hardware architecture of the proposed S-Box and inverse S-Box. In this design, $GF(2^4)$ multiplications generate the transformed values of A and B. To implement the zero checks, the design combines OR gates, which detect whether a 4-bit operand is zero, with multiplexer (MUX) units. Specifically, if the checked operand (b in the first step, B in the second) is zero, the MUX selects the unchanged value; otherwise, it selects the result of the $GF(2^4)$ multiplication.
Upon completing these steps, the two 4-bit parts, A and B, are concatenated to form the 8-bit output of the S-Box. Through this data-splitting approach, the proposed architecture reduces the number of operations required in $GF(2^4)$ and significantly reduces hardware resource consumption.
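The two steps can be summarized in a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the reduction polynomial is assumed to be $x^4 + x + 1$ (one degree-4 irreducible polynomial, chosen here only for illustration), A is assumed to occupy the upper nibble of the output, and the final XOR of Equation (1) is left out because its operand is not restated in the text. The `b == 0` and `B == 0` tests play the role of the OR-based zero detectors and MUX units of Figure 1, and the inverse follows the case analysis implied by the forward steps, using the extra $GF(2^4)$ inversion required by Part (b) of Figure 1. All function names are hypothetical.

```python
# Minimal sketch of the two-step GF(2^4) construction (Section 3.1).
# Assumptions (not from the paper): reduction polynomial x^4 + x + 1,
# A in the upper output nibble, and the final XOR of Equation (1) omitted.

POLY = 0b10011  # hypothetical choice: x^4 + x + 1


def gf16_mul(x: int, y: int, poly: int = POLY) -> int:
    """Multiply two 4-bit elements of GF(2^4), reducing modulo `poly`."""
    result = 0
    for i in range(4):
        if (y >> i) & 1:
            result ^= x << i  # carry-less (XOR) partial products
    for deg in (6, 5, 4):     # clear degrees 6..4 with shifted copies of poly
        if (result >> deg) & 1:
            result ^= poly << (deg - 4)
    return result


def gf16_inv(x: int, poly: int = POLY) -> int:
    """Inverse of a nonzero element as x^14 (the multiplicative group has order 15)."""
    if x == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^4)")
    r = 1
    for _ in range(14):
        r = gf16_mul(r, x, poly)
    return r


def sbox_core(x: int) -> int:
    """Forward mapping up to the concatenation A || B (final XOR omitted)."""
    a, b = x >> 4, x & 0xF                  # upper and lower nibbles
    B = a if b == 0 else gf16_mul(a, b)     # step 1: MUX on "b is zero"
    A = b if B == 0 else gf16_mul(b, B)     # step 2: MUX on "B is zero"
    return (A << 4) | B


def inv_sbox_core(y: int) -> int:
    """Inverse of sbox_core, derived from the case analysis of the two steps."""
    A, B = y >> 4, y & 0xF
    if B == 0:        # B = 0 forces a = 0, and then A = b
        a, b = 0, A
    elif A == 0:      # B != 0 and A = 0 force b = 0, and then B = a
        a, b = B, 0
    else:             # A = b*B and B = a*b with all factors nonzero
        b = gf16_mul(A, gf16_inv(B))
        a = gf16_mul(B, gf16_inv(b))
    return (a << 4) | b


SBOX = [sbox_core(x) for x in range(256)]
INV_SBOX = [inv_sbox_core(y) for y in range(256)]
assert sorted(SBOX) == list(range(256))            # the mapping is a bijection
assert all(INV_SBOX[SBOX[x]] == x for x in range(256))
```

Without the final XOR, this sketch maps 0 to 0 (a fixed point), so its properties should not be read as those of the S-Box in Table 4; it only illustrates why the two-step mapping is invertible and why the inverse needs a field inversion.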
3.2. S-Box Construction
This section does not treat the mathematical theory of Galois field multiplication in detail, nor does it explore hardware optimization of these multiplications, which many previous studies have already addressed [11,12]. Its primary objective is to select and use $GF(2^4)$ multiplication effectively for S-Box construction. Multiplying two elements of $GF(2^4)$, represented as polynomials, requires reduction modulo an irreducible polynomial of degree four. Multiplication in $GF(2^4)$ is defined as Equation (3):
$$c(x) = a(x) \cdot b(x) \bmod m(x) \qquad (3)$$
where $m(x)$ denotes the irreducible reduction polynomial of degree four.
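In this context, $a(x)$, $b(x)$, and $c(x)$ are elements of the Galois field $GF(2^4)$ expressed in polynomial form, as in Equation (4):
$$a(x) = \sum_{i=0}^{3} a_i x^i, \quad b(x) = \sum_{i=0}^{3} b_i x^i, \quad c(x) = \sum_{i=0}^{3} c_i x^i, \qquad a_i, b_i, c_i \in GF(2) \qquad (4)$$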
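To make Equation (3) concrete at the gate level, the sketch below expands the product for one assumed reduction polynomial, $m(x) = x^4 + x + 1$, into AND and XOR operations and checks it exhaustively against a shift-and-add reference. The polynomial choice and the function names are illustrative assumptions, and the gate counts of Table 1 are not reproduced here. In this schoolbook form the multiplier uses 16 AND and 15 XOR gates, so two multipliers need 32 AND and 30 XOR gates, which appears consistent with the AND and XOR totals reported for the design in Table 5 once one additional XOR is counted.

```python
# Bit-level (AND/XOR) form of GF(2^4) multiplication, assuming the reduction
# polynomial m(x) = x^4 + x + 1 (an illustrative choice).


def gf16_mul_gates(a: int, b: int) -> int:
    a0, a1, a2, a3 = ((a >> i) & 1 for i in range(4))
    b0, b1, b2, b3 = ((b >> i) & 1 for i in range(4))
    # Schoolbook product d(x) = a(x) b(x): 16 AND gates, 9 XOR gates.
    d0 = a0 & b0
    d1 = (a0 & b1) ^ (a1 & b0)
    d2 = (a0 & b2) ^ (a1 & b1) ^ (a2 & b0)
    d3 = (a0 & b3) ^ (a1 & b2) ^ (a2 & b1) ^ (a3 & b0)
    d4 = (a1 & b3) ^ (a2 & b2) ^ (a3 & b1)
    d5 = (a2 & b3) ^ (a3 & b2)
    d6 = a3 & b3
    # Reduction with x^4 = x + 1, x^5 = x^2 + x, x^6 = x^3 + x^2: 6 XOR gates.
    c0 = d0 ^ d4
    c1 = d1 ^ d4 ^ d5
    c2 = d2 ^ d5 ^ d6
    c3 = d3 ^ d6
    return c0 | (c1 << 1) | (c2 << 2) | (c3 << 3)


def gf16_mul_ref(a: int, b: int, poly: int = 0b10011) -> int:
    """Reference shift-and-add multiplication for cross-checking."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for deg in (6, 5, 4):
        if (r >> deg) & 1:
            r ^= poly << (deg - 4)
    return r


# Exhaustive check over all 256 operand pairs.
assert all(gf16_mul_gates(a, b) == gf16_mul_ref(a, b)
           for a in range(16) for b in range(16))
```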
The inverse $a^{-1}(x)$ of a nonzero element $a(x)$ is found by solving Equation (5),
$$a(x) \otimes a^{-1}(x) = 1, \qquad (5)$$
which yields Equation (6).
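Because $GF(2^4)$ has only 15 nonzero elements, Equation (5) can be solved directly by exhaustive search. The sketch below does this and cross-checks the result against the identity $a^{-1} = a^{14}$, which holds because the multiplicative group has order 15; this exponentiation is a standard alternative and not necessarily the closed form of Equation (6). The reduction polynomial $x^4 + x + 1$ and the function names are assumptions for illustration.

```python
# Inverses in GF(2^4) by solving a * a_inv = 1 (Equation (5)) exhaustively,
# cross-checked against a^14. Assumed reduction polynomial: x^4 + x + 1.

POLY = 0b10011  # hypothetical choice


def gf16_mul(a: int, b: int, poly: int = POLY) -> int:
    """Shift-and-add multiplication in GF(2^4) modulo `poly`."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for deg in (6, 5, 4):
        if (r >> deg) & 1:
            r ^= poly << (deg - 4)
    return r


def gf16_inv_search(a: int) -> int:
    """Return the unique c with a * c = 1, found by trying all candidates."""
    for c in range(1, 16):
        if gf16_mul(a, c) == 1:
            return c
    raise ZeroDivisionError("0 has no multiplicative inverse")


def gf16_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation in GF(2^4)."""
    result = 1
    while e:
        if e & 1:
            result = gf16_mul(result, a)
        a = gf16_mul(a, a)
        e >>= 1
    return result


INV_TABLE = [0] + [gf16_inv_search(a) for a in range(1, 16)]  # 0 -> 0 by convention
assert all(gf16_pow(a, 14) == INV_TABLE[a] for a in range(1, 16))
```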
The results of the resource evaluation for the multiplication operations are detailed in Table 1. They show that either of the two irreducible polynomials considered yields an identical count of logic operations when used for modular reduction in the multiplication.
Computing the inverse S-Box requires an additional inversion operation, as illustrated in Part (b) of Figure 1. Therefore, the implementation of this operation is also evaluated in detail in Table 2, which shows that the number of logic operations remains similar.
Based on these findings, the study generates S-Boxes using both irreducible polynomials and evaluates them on key cryptographic properties to determine the most suitable polynomial for the design.
Since each S-Box involves two multiplications in $GF(2^4)$, the choice of irreducible polynomial affects both computations. Assigning either of the two irreducible polynomials to each multiplication yields four distinct S-Box implementations, whose selection is presented in Table 3. Evaluating these four cases showed that S-Boxes using the same irreducible polynomial for both multiplications have better cryptographic properties than those using different polynomials, with a highest nonlinearity of 112. Based on this result, the S-Box of Case S-Box 1, presented in Table 4, was selected for detailed evaluation. The inverse S-Box is evaluated in the same manner, so its results are not presented in this section.
3.3. Comparison
Table 5 compares the number of logic gates used in different studies. The design in this work uses 31 XOR gates, 32 AND gates, 6 OR gates, and 2 MUX21s, and no NAND/NOR gates, giving a relatively simple structure with fewer gates in total than the other studies. In particular, its 31 XOR gates are far fewer than in all other designs: Zhang [22] uses 154 XOR gates, Canright [6] uses 91, and even Rashidi [13], the previous lowest, still uses 57. Since XOR gates cost more area than most other basic logic gates, this substantial reduction in XOR usage saves considerable hardware resources.
Here, we compare only with studies that report detailed logic gate counts for their S-Box designs. The area of logic gates varies across technologies, leading to corresponding differences in Gate Equivalent (GE) counts. Based on STM 65 nm technology [8], the GE count of this design is 115.00, the lowest among all studies compared. Compared with the previous best result in [23] (168.00 GE), this design saves 53 GE, a reduction of about 31.5% (53/168), demonstrating a significant improvement in hardware resource efficiency. The resource utilization of the proposed S-Box is also only a fraction of that reported in [13].
This section has evaluated and compared hardware resources at the design stage. Next, the cryptographic characteristics of the selected S-Box are assessed in detail.
4. Security Analysis
Numerous critical attributes of a cryptographically robust S-Box have been presented in [1,2,34,35]. Because the formulas for these criteria are widely available in the S-Box literature, we present only the analytical results. The program used to analyze all S-Box parameters is available at https://github.com/dpp291187/S-Box-Cryptanalysis (accessed on 1 April 2025).
4.1. Nonlinearity
The nonlinearity (NL) of an S-Box is crucial in assessing its resistance to linear cryptanalysis [1]. Higher nonlinearity provides better security, strengthening resistance against both linear and differential cryptanalysis.
There is no strict lower bound on the required nonlinearity of an S-Box, but a value of 100 is commonly considered sufficient. The theoretical upper bound for the nonlinearity of an 8-variable Boolean function is 120, which is attained only by bent functions, and these are unbalanced. In practice, the Boolean functions that constitute a bijective S-Box are balanced, meaning that their outputs contain equal numbers of 0s and 1s. For balanced 8-variable functions, the highest nonlinearity found so far is 116 [36], so 116 is the best value currently known to be attainable by the component functions of an 8-bit S-Box. A nonlinearity of 112, which is also the value achieved by the AES S-Box, is generally regarded as highly effective for practical cryptographic applications.
As shown in Table 6, the selected S-Box has a uniform nonlinearity of 112 across all eight Boolean functions, indicating strong resistance to linear approximations: each output function agrees with its best affine approximation on at most 256 − 112 = 144 of the 256 inputs.
Additionally, the algebraic degree (AD) is determined to be 7, further reinforcing its resistance against algebraic cryptanalysis by complicating the process of deriving algebraic relations. These features collectively illustrate the resilience of the S-Box in cryptographic applications requiring both high security and efficient implementation.
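The nonlinearity values in Table 6 can be checked with a fast Walsh–Hadamard transform, using the standard relation $NL(f) = 2^{n-1} - \tfrac{1}{2}\max_{\alpha} |W_f(\alpha)|$, where $W_f(\alpha) = \sum_x (-1)^{f(x) \oplus \alpha \cdot x}$. The Python sketch below is a generic illustration, not the authors' tool from the repository cited above; it computes the nonlinearity of each output (coordinate) function of an S-Box given as a list of integers. Applied to the 256 entries of Table 4, it should reproduce Table 6 if the same definition is used.

```python
# Coordinate-function nonlinearity via the fast Walsh-Hadamard transform:
# NL(f) = 2^(n-1) - max_a |W_f(a)| / 2.


def walsh_spectrum(truth_table: list[int]) -> list[int]:
    """Return W_f(a) = sum_x (-1)^(f(x) XOR a.x) for every mask a."""
    w = [1 - 2 * bit for bit in truth_table]  # f(x) = 0 -> +1, f(x) = 1 -> -1
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for j in range(start, start + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def nonlinearity(truth_table: list[int]) -> int:
    """Hamming distance from f to the nearest affine Boolean function."""
    return len(truth_table) // 2 - max(abs(v) for v in walsh_spectrum(truth_table)) // 2


def coordinate_nonlinearities(sbox: list[int], n_out: int) -> list[int]:
    """Nonlinearity of each output bit f_j(x) = bit j of S(x)."""
    return [nonlinearity([(y >> j) & 1 for y in sbox]) for j in range(n_out)]


# Example: the 4-variable bent function x0x1 XOR x2x3 reaches 2^3 - 2^1 = 6.
bent = [(x & 1) * ((x >> 1) & 1) ^ ((x >> 2) & 1) * ((x >> 3) & 1) for x in range(16)]
```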
4.2. Strict Avalanche Criterion
The Strict Avalanche Criterion (SAC) is a crucial characteristic of cryptographic S-Boxes: flipping a single input bit should change each output bit with a probability of approximately 50%. An S-Box is therefore considered to behave sufficiently randomly when its SAC value is close to 0.5. Computing the SAC of each output function with the equation in [28] gives the results presented in Table 7. The average SAC value is 0.501, which is nearly optimal.
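One common way to compute the SAC is to build an $n \times n$ matrix whose entry $(i, j)$ is the fraction of inputs for which flipping input bit $i$ flips output bit $j$, and then to average the entries. The sketch below follows that formulation; it is an assumption that it matches the equation of [28] exactly, and the function names are hypothetical.

```python
def sac_matrix(sbox: list[int], n: int) -> list[list[float]]:
    """Entry [i][j]: fraction of x for which flipping input bit i flips output bit j."""
    size = 1 << n
    matrix = []
    for i in range(n):
        diffs = [sbox[x] ^ sbox[x ^ (1 << i)] for x in range(size)]
        matrix.append([sum((d >> j) & 1 for d in diffs) / size for j in range(n)])
    return matrix


def sac_average(sbox: list[int], n: int) -> float:
    """Mean of all SAC matrix entries; 0.5 is the ideal value."""
    m = sac_matrix(sbox, n)
    return sum(sum(row) for row in m) / (n * n)
```

As a sanity check, the identity mapping on 4 bits gives an identity SAC matrix and an average of 0.25, far from the ideal 0.5.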
4.3. Bit Independence Criterion
The Bit Independence Criterion (BIC) measures the independence between pairs of output bits of an S-Box when individual input bits are flipped. It is assessed through two properties: the Strict Avalanche Criterion applied to pairs of output bits (BIC-SAC) and the nonlinearity of the XOR of pairs of output functions (BIC-NL).
Computing these properties yields the BIC-NL and BIC-SAC values shown in Table 8 and Table 9, respectively. On average, BIC-NL is 107.14 and BIC-SAC is 0.478, which is considered a good result for the BIC-SAC criterion, whose ideal value is 0.5.
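A sketch of one common formulation of both BIC measures follows: BIC-NL averages the nonlinearity of $f_j \oplus f_k$ over output-bit pairs $j < k$, and BIC-SAC averages, over input bits $i$ and pairs $j < k$, the probability that flipping bit $i$ changes $f_j \oplus f_k$. Averaging conventions differ between papers, so this is an illustrative assumption rather than the exact procedure behind Tables 8 and 9; the helper names are hypothetical.

```python
from itertools import combinations


def _max_abs_walsh(truth_table: list[int]) -> int:
    """max_a |W_f(a)| via an in-place fast Walsh-Hadamard transform."""
    w = [1 - 2 * bit for bit in truth_table]
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for j in range(start, start + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return max(abs(v) for v in w)


def bic_nl(sbox: list[int], n: int) -> float:
    """Average nonlinearity of f_j XOR f_k over output-bit pairs j < k."""
    values = []
    for j, k in combinations(range(n), 2):
        tt = [((y >> j) ^ (y >> k)) & 1 for y in sbox]
        values.append(len(tt) // 2 - _max_abs_walsh(tt) // 2)
    return sum(values) / len(values)


def bic_sac(sbox: list[int], n: int) -> float:
    """Average probability that flipping input bit i flips f_j XOR f_k (j < k)."""
    size = 1 << n
    values = []
    for i in range(n):
        diffs = [sbox[x] ^ sbox[x ^ (1 << i)] for x in range(size)]
        for j, k in combinations(range(n), 2):
            values.append(sum(((d >> j) ^ (d >> k)) & 1 for d in diffs) / size)
    return sum(values) / len(values)
```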
4.4. Differential Approximation Probability
The security of an S-Box against differential cryptanalysis is measured by its differential uniformity, which indicates how uniformly the output differences ($\Delta y$) are distributed for given input differences ($\Delta x$). The detailed method for computing it is provided in [2,35]. The XOR distribution table is defined in Equation (7):
$$DDT(\Delta x, \Delta y) = \#\{\, x \in \{0,1\}^n : S(x) \oplus S(x \oplus \Delta x) = \Delta y \,\} \qquad (7)$$
where $S(x)$ is the S-Box output for input $x$ and $n = 8$.
The XOR distribution table of the proposed S-Box is given in Table 10. Rows represent $\Delta x$ values, columns represent $\Delta y$ values, and each cell shows how often the corresponding ($\Delta x$, $\Delta y$) pair occurs. The differential uniformity $\delta$ is the largest entry for a nonzero input difference, as given by Equation (8):
$$\delta = \max_{\Delta x \neq 0,\, \Delta y} DDT(\Delta x, \Delta y) \qquad (8)$$
A lower $\delta$ indicates better resistance to differential attacks. From Table 10, the differential uniformity of the proposed S-Box is $\delta = 18$.
The Differential Approximation Probability (DP) measures the probability of predicting the output difference corresponding to a given input difference in an S-Box. It is important for assessing resistance to differential cryptanalysis, since lower DP values mean that the S-Box is safer from such attacks. The DP is calculated as $DP = \delta / 2^n = 18/256 \approx 0.070$.
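Equations (7) and (8) and the DP formula translate directly into code. The sketch below is a generic illustration with hypothetical function names; for an 8-bit S-Box whose largest nonzero-row entry is 18, it would return $DP = 18/256 \approx 0.070$.

```python
def ddt(sbox: list[int], n: int) -> list[list[int]]:
    """XOR distribution table: ddt[dx][dy] = #{x : S(x) ^ S(x ^ dx) = dy}."""
    size = 1 << n
    table = [[0] * size for _ in range(size)]
    for dx in range(size):
        for x in range(size):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table


def differential_uniformity(sbox: list[int], n: int) -> int:
    """Largest DDT entry over nonzero input differences (Equation (8))."""
    table = ddt(sbox, n)
    return max(max(row) for row in table[1:])


def differential_probability(sbox: list[int], n: int) -> float:
    """DP = delta / 2^n."""
    return differential_uniformity(sbox, n) / (1 << n)
```

Every row of the table sums to $2^n$, and the row $\Delta x = 0$ is excluded from $\delta$ because it trivially contains $2^n$ in column $\Delta y = 0$.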
4.5. Linear Approximation Probability
The Linear Approximation Probability (LP) measures how likely it is that certain input and output bits of an S-Box are linearly related. It is very important for assessing resistance to linear cryptanalysis, since lower LP values mean that linear approximations are less likely to succeed. The LP of the proposed S-Box is 0.125. The steps for computing LP are explained in [2,35], which allow a thorough check of the S-Box's defenses against linear attacks.
A related concept is the linear structure of an S-Box, which refers to the existence of input–output pairs that satisfy a linear equation (Equation (9)). If an S-Box has a strong linear structure, there exist certain values for which the output transformation remains predictable under XOR operations, which can significantly weaken the S-Box against linear cryptanalysis. Generally, a lower LP value suggests that the S-Box has little linear structure, making it more resistant to attacks. The LP of 0.125 achieved by the proposed S-Box suggests that no strong linear structure exists, further enhancing its security.
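The LP can be computed from the Walsh spectra of all component functions $\beta \cdot S(x)$ with nonzero output mask $\beta$. The sketch below uses the common definition $LP = \max_{\alpha,\, \beta \neq 0} \left| \#\{x : \alpha \cdot x = \beta \cdot S(x)\}/2^n - 1/2 \right|$, which equals $\max |W|/2^{n+1}$. Normalizations vary across the literature, so the absolute scale of its output may differ from the value reported in this section; the helper names are hypothetical.

```python
def _walsh(truth_table: list[int]) -> list[int]:
    """Fast Walsh-Hadamard transform of (-1)^f(x)."""
    w = [1 - 2 * bit for bit in truth_table]
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for j in range(start, start + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def linear_probability(sbox: list[int], n: int) -> float:
    """max over input masks a and nonzero output masks b of
    |#{x : a.x = b.S(x)} / 2^n - 1/2|, computed as max|W| / 2^(n+1)."""
    size = 1 << n
    best = 0
    for b in range(1, size):
        component = [bin(b & y).count("1") & 1 for y in sbox]  # b . S(x)
        best = max(best, max(abs(v) for v in _walsh(component)))
    return best / (2 * size)
```

For a linear mapping such as the identity, some component equals a linear function exactly, so the sketch returns the worst possible value of 0.5.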
The data in Table 11 indicate that the proposed S-Box attains a nonlinearity (NL) of 112.00, equal to that of the best-performing S-Boxes in the comparison, including those from studies [3,13,37], which indicates good resistance against linear cryptanalysis. Compared with other studies such as [15,16,19,20,38,39,40,41], the proposed S-Box has higher nonlinearity, highlighting its cryptographic strength.
Although the BIC-NL value of this work does not reach the maximum value of 112.00, it still outperforms several other S-Boxes, including those from studies [15,16,19,20,38,39,40,41], indicating that the S-Box satisfies this criterion effectively.
Concerning the Strict Avalanche Criterion, the proposed S-Box attains a value of 0.5009. This value is nearly equivalent to the optimal value of 0.5000. While it does not exactly match the optimal value, the deviation is minimal compared to the remaining S-Boxes. This indicates that the S-Box maintains good balance, ensuring that each output bit has an almost optimal probability of changing when a small input modification occurs. Furthermore, the LP value of this study is relatively low (0.125), enhancing its resistance against linear attacks, although it is not the lowest in the comparison.
A key advantage of the proposed S-Box is the absence of fixed points and opposite fixed points. A fixed point is a value x for which $S(x) = x$, and an opposite fixed point is a value x for which $S(x) = \bar{x}$, where $\bar{x}$ denotes the bitwise complement of x. Both counts are zero for the proposed S-Box: $FP = 0$ and $OFP = 0$. Fixed points or opposite fixed points can weaken cryptographic systems by increasing vulnerability to certain attacks. Since the proposed S-Box has neither, it does not create these predictable input–output patterns, which strengthens its security.
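Counting fixed points and opposite fixed points is a direct scan of the S-Box table; a small sketch with hypothetical helper names follows.

```python
def fixed_points(sbox: list[int]) -> int:
    """Number of x with S(x) = x."""
    return sum(1 for x, y in enumerate(sbox) if y == x)


def opposite_fixed_points(sbox: list[int], n: int) -> int:
    """Number of x with S(x) equal to the bitwise complement of x."""
    mask = (1 << n) - 1
    return sum(1 for x, y in enumerate(sbox) if y == x ^ mask)
```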
However, one limitation of the proposed S-Box is its DP value of 0.070, which is higher than the other DP values in the comparison. This may weaken its resistance against differential attacks, although its high nonlinearity partially compensates for this drawback.
The research results indicate that the suggested S-Box is designed for efficient hardware implementation while preserving robust cryptographic features. Specifically, it exhibits high nonlinearity, near-ideal SAC, and no fixed points, contributing to its robustness in cryptographic applications. Although there is room for improvement in the DP criterion, the overall performance indicates that this S-Box remains a strong candidate compared to existing designs.
5. Implementation
5.1. FPGA-Based Implementation
The proposed S-Box is implemented on an FPGA platform, specifically the Kintex-7 XC7K160T, where it replaces the Rijndael S-Box in the AES encryption algorithm specified in [3]. The original S-Box is implemented with a lookup table (LUT)-based approach. To ensure efficient execution, the AES-128 encryption process uses a loop-unrolled design for the full 10-round encryption. The hardware implementation results in Table 12 provide a comparative analysis of resource utilization. Notably, replacing the AES S-Box with the proposed S-Box does not increase hardware resource consumption on the FPGA, which confirms its feasibility for integration into AES-based cryptographic systems.
Based on the implementation results presented in Table 12, the following detailed observations can be made.
First, the standalone implementation of the proposed S-Box requires only 15 LUTs. This is significantly lower than the 32 LUTs needed for the Rijndael S-Box. The reduction corresponds to approximately 53.1% fewer LUTs. This demonstrates a significant reduction in hardware resource utilization compared to the traditional lookup table (LUT)-based approach. Second, when integrating the proposed S-Box into the AES-128 encryption algorithm, the total number of LUTs is reduced by 31.4%, from 1386 to 950. Similarly, the number of Flip-Flops (FFs) decreases by 35.0%, from 406 to 264. Additionally, the proposed S-Box eliminates the need for BRAM, reducing memory usage from 1 to 0, which is particularly beneficial for FPGA platforms with limited memory resources.
However, a notable drawback is the lower maximum operating frequency ($f_{max}$). The AES implementation using the proposed S-Box achieves an $f_{max}$ of 311.487 MHz, which is 9.6% lower than the 344.697 MHz achieved with the Rijndael S-Box, and this may affect overall system performance. Correspondingly, the AES implementation with the proposed S-Box achieves a throughput of 3.98 Gbps, compared with 4.40 Gbps using the Rijndael S-Box, the same 9.6% reduction. Thus, while the proposed S-Box significantly reduces hardware resource consumption, it may slightly decrease computational performance.
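The throughput figures can be related to $f_{max}$ with a short calculation. The sketch assumes, as an inference not stated in the text, that one 128-bit block is completed every 10 clock cycles; under that assumption the results are close to the reported 3.98 and 4.40 Gbps. Because throughput is proportional to $f_{max}$ when the number of cycles per block is fixed, the relative throughput loss equals the relative frequency loss, which is why both are 9.6%.

```python
# Throughput = f_max * block_bits / cycles_per_block.
# CYCLES_PER_BLOCK = 10 is an assumption inferred from the reported numbers.
BLOCK_BITS = 128
CYCLES_PER_BLOCK = 10


def throughput_gbps(fmax_mhz: float) -> float:
    return fmax_mhz * 1e6 * BLOCK_BITS / CYCLES_PER_BLOCK / 1e9


proposed = throughput_gbps(311.487)   # AES with the proposed S-Box
rijndael = throughput_gbps(344.697)   # AES with the Rijndael S-Box
drop = 1 - proposed / rijndael
print(f"{proposed:.3f} Gbps vs {rijndael:.3f} Gbps ({drop:.1%} lower)")
# prints: 3.987 Gbps vs 4.412 Gbps (9.6% lower)
```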
In summary, the proposed S-Box offers substantial advantages in reducing LUT, FF, and BRAM usage compared to the traditional lookup table-based Rijndael S-Box implementation. However, the decrease in maximum operating frequency and throughput should be carefully considered when selecting an implementation approach for high-performance applications.
5.2. Quantum Implementation
In recent years, implementing S-Boxes as quantum circuits has become a significant research direction related to post-quantum security, since estimating the cost of quantum attacks on symmetric ciphers, such as Grover-based key search, requires the cipher to be realized as a quantum circuit. Numerous studies optimize quantum S-Box circuits by reducing the number of quantum gates, minimizing circuit depth, or decreasing the number of qubits, in order to mitigate errors and use quantum hardware efficiently. However, each approach entails trade-offs among gate count, circuit depth, and qubit requirements. Recent studies on gate optimization of AES S-Box quantum circuits can be found in [44,45,46]. In this study, we do not apply any optimization techniques but instead evaluate the resource cost of implementing the S-Box as a quantum circuit. Specifically, we analyze the qubit count, the number of quantum gates (CNOT, Toffoli, and NOT gates), and the Toffoli depth. These basic gates are illustrated in Figure 2.
In the proposed S-Box design as in
Figure 1, the circuit is composed of the following components: two multiplications in
, six two-input OR gates, two 2:1 multiplexers (MUX21s), and one XOR gate. These components are implemented following the approach of Li [45], using 20 qubits. The quantum circuit components of the proposed S-Box are illustrated in Figure 3. Each multiplication requires 23 CNOT gates and 9 Toffoli gates, so the two multiplications consume 46 CNOT gates and 18 Toffoli gates. Each OR gate is realized with 5 NOT gates and 1 Toffoli gate, which amounts to 30 NOT gates and 6 Toffoli gates for all six OR gates. Each MUX21 requires 3 CNOT gates and 1 Toffoli gate, giving 6 CNOT gates and 2 Toffoli gates for the two MUX21 units, and the XOR gate is equivalent to one CNOT gate. Summing these contributions, the proposed S-Box circuit requires 26 Toffoli gates (18 + 6 + 2), 53 CNOT gates (46 + 6 + 1), and 30 NOT gates. These estimates may increase slightly if additional uncomputation steps are needed, but they remain within the same order of magnitude, reflecting the trade-off between qubit count and gate cost in the approach proposed by Li [45].
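The gate totals above can be reproduced by simulating the OR and MUX21 blocks on classical bit values. The constructions below are one standard way to meet the stated per-block costs (an OR via De Morgan's law with 5 NOT gates and 1 Toffoli gate; a MUX21 with 3 CNOT gates and 1 Toffoli gate) and are not necessarily the exact circuits of Li [45]. The multiplier cost of 23 CNOT and 9 Toffoli gates is taken from the text rather than simulated, and the gate-list format and the `run` helper are hypothetical.

```python
from collections import Counter
from itertools import product


def run(circuit, bits):
    """Apply a gate list to a copy of `bits`; return (final bits, gate counts)."""
    s = list(bits)
    counts = Counter()
    for gate, *q in circuit:
        if gate == "X":
            s[q[0]] ^= 1
        elif gate == "CNOT":
            s[q[1]] ^= s[q[0]]
        elif gate == "TOFFOLI":
            s[q[2]] ^= s[q[0]] & s[q[1]]
        counts[gate] += 1
    return s, counts


# OR into a clean target (a=0, b=1, out=2): out = NOT(NOT a AND NOT b), inputs restored
OR_CIRCUIT = [("X", 0), ("X", 1), ("TOFFOLI", 0, 1, 2), ("X", 2), ("X", 0), ("X", 1)]
# MUX21 (s=0, a=1, b=2, out=3): out = a XOR s(a XOR b), i.e. b if s else a
MUX_CIRCUIT = [("CNOT", 1, 3), ("CNOT", 2, 1), ("TOFFOLI", 0, 1, 3), ("CNOT", 2, 1)]

for a, b in product((0, 1), repeat=2):
    assert run(OR_CIRCUIT, [a, b, 0])[0] == [a, b, a | b]
for s, a, b in product((0, 1), repeat=3):
    assert run(MUX_CIRCUIT, [s, a, b, 0])[0] == [s, a, b, b if s else a]

_, or_cost = run(OR_CIRCUIT, [0, 0, 0])
_, mux_cost = run(MUX_CIRCUIT, [0, 0, 0, 0])
mult_cost = Counter({"CNOT": 23, "TOFFOLI": 9})  # per-multiplier figure quoted in the text
xor_cost = Counter({"CNOT": 1})

total = Counter()
for cost, copies in ((mult_cost, 2), (or_cost, 6), (mux_cost, 2), (xor_cost, 1)):
    for gate, k in cost.items():
        total[gate] += copies * k

print(dict(total))  # {'CNOT': 53, 'TOFFOLI': 26, 'X': 30}
```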
The results in Table 13 highlight significant differences between the proposed S-Box circuit and previous AES S-Box implementations. The proposed design requires 20 qubits, which is comparable to some previous works.
In terms of Toffoli gate count, the proposed S-Box circuit requires only 26 gates, significantly fewer than the AES S-Box implementations. Similarly, the number of CNOT gates in this design (53) is drastically lower than in previous studies, which require between 196 and 314 CNOT gates. This suggests that the proposed circuit achieves a more resource-efficient quantum realization, reducing both gate complexity and execution cost on quantum hardware.
However, the circuit requires considerably more X (NOT) gates: 30, compared with only 4 in the AES S-Box designs. Its Toffoli depth of 29 is also higher than in previous works.
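Toffoli depth counts the Toffoli layers on the longest dependency path through a circuit; NOT and CNOT gates add no depth but still order the gates that share qubits with them. The sketch below computes this metric for a gate list under as-soon-as-possible scheduling in the given gate order. The gate-list format and the function name `toffoli_depth` are hypothetical, and the example circuits are toy inputs, not the proposed S-Box circuit.

```python
def toffoli_depth(circuit, n_qubits):
    """Toffoli depth of a gate list [(name, qubit, ...), ...].

    Only Toffoli gates add a layer; X and CNOT gates add none but still
    propagate timing dependencies between the qubits they touch.
    """
    level = [0] * n_qubits
    for name, *qubits in circuit:
        start = max(level[q] for q in qubits)
        depth = start + 1 if name == "TOFFOLI" else start
        for q in qubits:
            level[q] = depth
    return max(level, default=0)


# Two Toffolis on disjoint qubits share one layer; a CNOT linking them forces two.
parallel = [("TOFFOLI", 0, 1, 2), ("TOFFOLI", 3, 4, 5)]
chained = [("TOFFOLI", 0, 1, 2), ("CNOT", 2, 3), ("TOFFOLI", 3, 4, 5)]
print(toffoli_depth(parallel, 6), toffoli_depth(chained, 6))  # 1 2
```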
Overall, the results show that the proposed S-Box circuit reduces the number of Toffoli and CNOT gates, making it a promising candidate for resource-efficient quantum implementations. However, our primary focus in this study is not on optimizing the quantum S-Box design itself but on providing a fundamental evaluation of its resource requirements, specifically the numbers of CNOT, Toffoli, and NOT gates used in the proposed implementation. Other critical aspects of quantum circuit performance, such as overall circuit depth, qubit connectivity, error propagation, and fault-tolerance overhead, are not addressed in the present work. Although these factors are important for practical quantum computing, they are beyond the scope of the current evaluation and will be the subject of future investigations.
6. Side-Channel Attack Analysis
In modern cryptographic research, resistance to side-channel attacks (SCAs) is a key factor in evaluating security levels [47]. Correlation Power Analysis (CPA) [48] is a well-known side-channel attack that exploits the power consumption leakage of cryptographic hardware: it uses the relationship between a device’s power consumption and the data being processed to retrieve secret keys. CPA is considered one of the most effective techniques for recovering cryptographic keys from hardware encryption implementations, including FPGAs, microcontrollers, and smart cards.
In this experiment, we conducted a CPA attack targeting the final round of AES-128 implemented on the Sakura X FPGA board. The main objective was to evaluate the S-Box’s resilience to SCAs. Rather than revisiting the theoretical principles of CPA, this section emphasizes the practical aspects, including the experimental setup and attack methodology.
The attack specifically targeted the last round of AES-128, which was implemented in its standard form without any countermeasures against SCAs. The system model used for acquiring power traces is depicted in Figure 4.
The AES-128 encryption algorithm was implemented on the FPGA using a single-cycle-per-round architecture, in which each encryption round is completed in one clock cycle. This facilitates efficient data processing and power measurement.
The attack was based on the Hamming Distance power consumption model [49,50]. This model assumes that power consumption correlates with the number of bit transitions (0 to 1 or 1 to 0) that occur in a register during the cryptographic computation.
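As a concrete illustration of the Hamming Distance model, the leakage predicted for one clock cycle is the number of bits that toggle when a register is overwritten. The helper below (a hypothetical name, assuming the 16-byte AES state register is the leaking element in the single-cycle-per-round design) computes that count; CPA then assumes the measured power is proportional to it, up to an offset and noise.

```python
def hamming_distance(before: bytes, after: bytes) -> int:
    """Number of bits that toggle when a register changes from `before` to `after`."""
    if len(before) != len(after):
        raise ValueError("register values must have the same width")
    return sum(bin(x ^ y).count("1") for x, y in zip(before, after))


# One byte flipping every bit, and an unchanged 16-byte AES state register
print(hamming_distance(b"\x3a", b"\xc5"))      # 8
print(hamming_distance(bytes(16), bytes(16)))  # 0
```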
To conduct the attack, plaintexts were randomly generated on a computer, while the encryption key remained fixed as a 16-byte sequence: [01, 02, 03, 04, 05, 06, 07, 08, 09, 0A, 0B, 0C, 0D, 0E, 0F] (expressed in hexadecimal notation).
Each 16-byte plaintext-key pair was transmitted from the computer to the FPGA, where the encryption process was executed. The corresponding ciphertexts were then computed and stored. Simultaneously, an oscilloscope was used to measure the power traces associated with each encryption operation. The computer was connected to both the FPGA and the oscilloscope to collect and synchronize the ciphertexts with their respective power traces.
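A total of 30,000 power traces were collected during the attack. The analysis focused on the final round of AES, where the last SubBytes operation and the final key addition occur and no MixColumns operation is applied. This round is particularly vulnerable because each ciphertext byte depends on only one byte of the last round key: given a known ciphertext byte and a guess for the corresponding key byte, the attacker can undo the key addition and the S-Box to obtain the state byte that entered the final SubBytes, without modeling any earlier round.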
The attack process involved the following steps:
Trace Acquisition: The oscilloscope captured power consumption data corresponding to each encryption operation.
Hypothesis Generation: Possible key byte values were hypothesized, and their expected power consumption patterns were computed using the Hamming Distance model.
Correlation Computation: The correlation coefficient between the observed power traces and the anticipated power consumption values was computed for each potential key hypothesis.
Key Recovery: The key hypothesis that exhibited the highest correlation was identified as the most likely key value.
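The four steps above can be sketched for a single key byte as follows. This is a simplified simulation under stated assumptions, not the experimental code: the S-Box is a hypothetical random 8-bit permutation standing in for either the AES S-Box or the proposed S-Box, ShiftRows is ignored so that each register byte is overwritten in place by its own ciphertext byte, and each "trace" is one synthetic sample (Hamming distance plus Gaussian noise) rather than oscilloscope data. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-bit S-box: a random permutation used as a stand-in for
# the AES S-Box or the proposed S-Box (neither table is reproduced here).
SBOX = rng.permutation(256).astype(np.uint8)
INV_SBOX = np.argsort(SBOX).astype(np.uint8)
HW = np.array([bin(v).count("1") for v in range(256)], dtype=np.float64)


def simulate_traces(n_traces, key_byte, noise_std=1.0):
    """Trace acquisition (simulated): register byte x is overwritten by
    c = S(x) XOR k, and the leakage is HD(x, c) plus Gaussian noise."""
    x = rng.integers(0, 256, n_traces, dtype=np.uint8)  # state byte before final SubBytes
    c = SBOX[x] ^ np.uint8(key_byte)                     # ciphertext byte
    leak = HW[x ^ c] + rng.normal(0.0, noise_std, n_traces)
    return c, leak


def cpa_last_round(c, leak):
    """Hypothesis generation and correlation: one Pearson coefficient per key guess."""
    corr = np.empty(256)
    for k in range(256):
        x_hat = INV_SBOX[c ^ np.uint8(k)]  # undo the key addition, then the S-box
        model = HW[x_hat ^ c]              # predicted Hamming distance
        corr[k] = np.corrcoef(model, leak)[0, 1]
    return corr


c, leak = simulate_traces(n_traces=2000, key_byte=0x3C)
corr = cpa_last_round(c, leak)
best = int(np.argmax(np.abs(corr)))  # key recovery: highest absolute correlation
print(hex(best))
```

In a real attack, the same correlation is computed for every time sample of each trace, and the full 16-byte key is recovered byte by byte.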
To evaluate how well the proposed S-Box resists CPA, we examined the number of traces required for successful key recovery in two scenarios:
For the standard AES S-Box, 9000 power traces were sufficient to recover 14/16 (87.5%) of the key bytes. In this case, key byte 11 required the most traces (11,000), while key byte 8 required the fewest (4000).
Figure 5 plots the correlation coefficient against the number of power traces for all 16 key bytes of the AES S-Box.
For the proposed S-Box, 16,000 power traces were sufficient to recover 81.25% (13/16) of the key bytes. Across all 16 key bytes, key byte 6 required the most traces (27,000), whereas key byte 9 required the fewest (7000). Figure 6 shows the correlation coefficients for key byte 6; the hypothesis with the highest absolute correlation corresponds to the correct key.
Figure 7 and Figure 8 show the relationship between the correlation coefficient and the number of power traces for key byte 6 and for all 16 key bytes, respectively.
Figure 9 shows the number of traces needed to recover all 16 key bytes using CPA, comparing AES implementations with the original S-Box and with the proposed S-Box. No countermeasures against CPA were applied in this evaluation, and all parameters, programs, and attack setups were kept identical. The attack used a total of 30,000 traces, and evaluations were performed at intervals of 1000 traces. The trace counts in the chart are therefore rounded up to the next evaluation point; for instance, a success reported at 16,000 traces means that the actual number of traces required lies between 15,001 and 16,000.
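The rounding described above arises from evaluating the attack only at fixed checkpoints. A minimal sketch of that bookkeeping follows, assuming that a key byte counts as recovered at the first checkpoint from which its correct value stays ranked first (the criterion actually used in the experiment is not stated); the function names are hypothetical, and the rank sequence in the example is illustrative, not measured data.

```python
def traces_to_disclosure(ranks, step=1000):
    """ranks[i]: rank of the correct key byte (0 = best) after (i + 1) * step traces.

    Returns the first checkpoint from which the correct value stays ranked
    first, or None if that never happens within the collected traces.
    """
    for i in range(len(ranks)):
        if all(r == 0 for r in ranks[i:]):
            return (i + 1) * step
    return None


def traces_for_full_key(per_byte_ranks, step=1000):
    """All key bytes are recovered once the slowest byte is recovered."""
    counts = [traces_to_disclosure(r, step) for r in per_byte_ranks]
    return None if None in counts else max(counts)


# Hypothetical ranks of one key byte at 1000, 2000, ..., 7000 traces
print(traces_to_disclosure([5, 2, 0, 1, 0, 0, 0]))  # 5000
```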
The number of traces needed to recover all 16 key bytes with the proposed S-Box is nearly 2.5 times that needed with the AES S-Box (27,000 versus 11,000 traces). For recovering approximately 80% of the key bytes, the ratio exceeds 1.7 (16,000 versus 9000 traces). These practical FPGA evaluations indicate that integrating the proposed S-Box into AES increases its resistance to CPA. Future work will extend the evaluation to ASIC implementations.
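The ratios can be checked against the per-byte maxima reported earlier: 27,000 traces for the proposed S-Box and 11,000 for the AES S-Box to recover all 16 key bytes, and 16,000 versus 9000 traces for the approximately 80% case. Because each count is rounded up to the next 1000-trace evaluation point, the short calculation below (the function name is hypothetical) also reports the range of ratios consistent with that rounding.

```python
STEP = 1000  # evaluation interval used in the experiment


def ratio_bounds(n_new, n_ref, step=STEP):
    """Ratio of two reported trace counts, plus the range allowed by rounding.

    Each reported count n stands for a true count in [n - step + 1, n].
    Returns (point_ratio, lowest_ratio, highest_ratio).
    """
    point = n_new / n_ref
    low = (n_new - step + 1) / n_ref
    high = n_new / (n_ref - step + 1)
    return point, low, high


# All 16 key bytes: proposed S-Box 27,000 traces vs AES S-Box 11,000 traces
print([round(r, 2) for r in ratio_bounds(27_000, 11_000)])  # [2.45, 2.36, 2.7]
# About 80% of the key bytes: 16,000 vs 9000 traces
print([round(r, 2) for r in ratio_bounds(16_000, 9_000)])   # [1.78, 1.67, 2.0]
```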
The purpose of this evaluation is to analyze and compare the susceptibility of the proposed S-Box and the AES S-Box to side-channel attacks under the same unprotected conditions. This study does not claim that the proposed S-Box inherently resists side-channel attacks. However, the experimental results show that, without countermeasures such as masking or hiding, the proposed S-Box leaks less exploitable information than the AES S-Box under identical conditions. This suggests that the design of the proposed S-Box can mitigate information leakage more effectively, improving security in systems that do not yet implement side-channel countermeasures.
7. Conclusions
In this study, an 8-bit S-Box was constructed using a multiplication-based approach in the Galois Field , achieving both cryptographic strength and efficient hardware implementation. The proposed S-Box maintains a nonlinearity of 112, equivalent to the AES S-Box, while satisfying key security criteria such as the SAC, BIC, DP, and LP within secure thresholds.
When implemented as a logic circuit, the proposed S-Box demonstrates significantly improved efficiency. FPGA synthesis results indicate that the circuit complexity is reduced by more than 50%, leading to an overall decrease of over 30% in resource utilization across the AES algorithm. Furthermore, both the theoretical analysis and the experimental results support the improved resistance of the proposed S-Box to SCAs. Notably, the number of traces required for a successful CPA attack on AES-128 with the proposed S-Box is nearly 2.5 times that required with the standard S-Box, indicating enhanced security against power analysis attacks in unprotected implementations.
Moreover, the quantum implementation of the proposed S-Box was evaluated in terms of quantum gate complexity. The results indicate that it requires 20 qubits, 26 Toffoli gates, 53 CNOT gates, and 30 NOT gates, with a Toffoli depth of 29.
The results validate the practical applicability of the proposed S-Box in real-world cryptographic systems. Its implementation as a standalone module demonstrates minimal hardware resource consumption, while its integration into AES confirms a significant reduction in overall resource usage without compromising security. These characteristics are particularly crucial for IoT devices and resource-constrained environments, where hardware efficiency is a primary concern. Additionally, such devices are often deployed in untrusted settings, making them vulnerable to physical attacks, including side-channel analysis. The proposed S-Box, with its optimized resource usage and its demonstrated resistance to CPA in unprotected implementations, presents a practical and secure solution for lightweight encryption in embedded systems.
Overall, the proposed approach balances hardware efficiency, strong compliance with key cryptographic criteria, and practical applicability. It contributes to the development of secure and efficient designs for block cipher algorithms that can be applied in real-world data security and encryption systems.