Article

Multi-Dimensional Fusion Deep Learning for Side Channel Analysis

1
School of Physics and Electronic Science, Hunan University of Science and Technology, Xiangtan 411201, China
2
Hunan Province Key Laboratory of Intelligent Sensors and Advanced Sensing Materials, Xiangtan 411201, China
3
School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(23), 4728; https://doi.org/10.3390/electronics12234728
Submission received: 22 September 2023 / Revised: 28 October 2023 / Accepted: 4 November 2023 / Published: 22 November 2023
(This article belongs to the Section Artificial Intelligence)

Abstract

The rapid advancement of deep learning has significantly heightened the threat posed by Side-Channel Attacks (SCAs) to information security, making them several orders of magnitude more effective than conventional signal processing techniques. However, the majority of existing Deep-Learning Side-Channel Attacks (DLSCAs) primarily focus on the classification accuracy of the trained model at the attack stage, often assuming that adversaries have unlimited computational and time resources during the profiling stage. This can result in an inflated assessment of the trained model’s fitting capability in a real attack scenario. In this paper, we present a novel DLSCA model, called a Multi-Dimensional Fusion Convolutional Residual Dendrite (MD_CResDD) network, which enhances and speeds up the feature extraction process by incorporating a multi-scale feature fusion mechanism. By testing the proposed model on two software implementations of AES-128, we show that it improves profiling speed by at least 34% compared to other existing deep-learning models for DLSCAs, while also achieving a certain level of improvement in attack accuracy (8.4% and 0.8% for the two implementations). Furthermore, we investigate how different fusion approaches, fusion times, and residual blocks affect attack efficiency on the same two datasets.

1. Introduction

With the advent of 5G and AI technology, intelligent Internet of Things (IoT) edge devices have gained widespread usage in cities, workplaces, households, and personal settings [1]. As a result, many cryptographic algorithms are deployed in smart terminal products to ensure secure communication and data storage among these devices. However, the security of cryptographic algorithms at the design stage does not guarantee the same level of security when they are implemented on physical devices. Side-Channel Attacks (SCAs) have emerged as a realistic threat for extracting and retrieving secret keys of different cryptographic implementations. In general, the purpose of an SCA is to analyze inadvertent physical leakage (e.g., power consumption) during the execution of cryptographic implementations to bypass the theoretical strength of the cryptographic design. This passive and non-intrusive attack was first introduced by Kocher in 1996 [2]. With the progress of side-channel attack techniques over the past decade, side-channel resistance has become a crucial criterion used by prominent international cryptographic product evaluation institutions to assess the security of devices and chips.
In recent years, deep learning techniques have seen increasingly widespread adoption across various domains. The rise of deep learning, as discussed in Goodfellow et al. [3], has made side-channel attacks more feasible and perilous to our information security, given the ability of deep learning models to effectively extract features from raw data. Utilizing neural networks in SCAs can not only significantly reduce the complexity of data processing [4], but also improve the attack efficiency with several orders of magnitude [5].
In 2016, Maghrebi et al. initially explored the potential of Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) in assisting Side-Channel Attacks (SCAs) for secret key recovery, leveraging power consumption as a side channel. To further explore power analysis, Cagli et al. went one step further and applied a CNN model with a data augmentation approach [6] to bypass the Random Delay Interrupt (RDI) countermeasure and to overcome clock instability. Afterwards, in 2020, the authors of [7] investigated the effect of changing the hyper-parameters of MLP and CNN models for SCA. While the majority of attacks reported before 2019 did not consider board diversity, the authors of [8,9] proposed a multi-source training approach for DLSCAs to mitigate the impact of board variety, which significantly enhanced attack efficiency in realistic attack scenarios. Additionally, neural networks were also utilized to assist Template Attacks (TAs) in reference [10], which significantly improves the classification accuracy of conventional template attacks [11]. In summary, existing works have indicated that DLSCAs offer advantages over traditional attack methods (such as TAs) across various attack scenarios, albeit at the cost of training resources and preparation time in the profiling phase.
However, most of the presented DLSCAs focus on the extent to which the attack can be more efficient during the attack stage and may overestimate the computational and time resources available to some attackers during the profiling stage. A challenge commonly encountered with deep learning algorithms is their susceptibility to local optima and overfitting at the profiling stage, given their black-box nature. Consequently, the resulting model structure and performance may not always represent the global optimal solution and adversaries may not have unlimited computational and time resources for keeping searching as assumed in the majority of reported DLSCAs. Based on this, Liu et al. proposed a special neural network model containing a dendrite structure, called Dendrite Network (DD network), which is a polynomial neural network with adjustable accuracy and human-readable topology [12,13]. This ‘transparent’ network has strong interpretation, high system recognition, and low computational complexity, which makes SCAs based on the DD network outperform other deep learning models, such as CNN, in terms of model parameter size, attack accuracy, and training time.
Therefore, in this paper, we propose a novel DLSCA model, called the Multi-Dimensional Fusion Convolutional Residual Dendrite (MD_CResDD) network, by collaboratively applying both the fusion mechanism and the dendrite structure to CNN. We show that, at the profiling stage, the collaboration of the fusion mechanism and the dendrite structure makes the model fit faster. Furthermore, it can also effectively mitigate the occurrence of overfitting, while guaranteeing a certain level of improvements in classification accuracy at the attack stage. The word ’multi-dimensional’ refers to both the feature dimension and the network dimension. In essence, in the feature dimension, we employ parallel multi-scale feature fusion to eliminate redundant features and capture diverse information from multiple original features involved in the fusion process. This approach aims to enhance the overall feature extraction capability. Likewise, in the network dimension, the fusion technique combines the distinct learning preferences of each sub-network, leading to a fusion network with improved performance. Specifically, the Residual Dendrite (ResDD) network can enhance the data processing capability of the DD network through the use of residual modules, effectively mitigating the risk of overfitting [14,15].
In the following experiments, we test the proposed framework on two software implementations of Advanced Encryption Standard (AES) by using power consumption as the side channel. Our experimental results exhibit that the proposed MD_CResDD network enhances the profiling rate by at least 34% compared to other existing deep-learning models for DLSCAs, while achieving notable improvements in attack accuracy (8.4% and 0.8% for two implementations). Furthermore, we explore the impact of various fusion approaches, fusion times, and residual blocks on the attack efficiency using the same two datasets.
The structure of the paper is as follows. Section 2 provides a background on AES and presents an overview of profiling SCAs, along with a review of Points of Interest (PoIs) and commonly used evaluation metrics for SCAs. In Section 3, we introduce how the multi-dimensional fusion mechanism can help DLSCAs, while Section 4 and Section 5 present our experimental setups, datasets, and the results of the attacks. Finally, in Section 6, we conclude the paper by discussing open problems and offering concluding remarks.

2. Background

This section begins by reviewing the AES-128. Subsequently, it introduces the concepts of profiling SCAs and elaborates on how the side-channel security of cryptographic implementations can be evaluated.

2.1. Advanced Encryption Standard

In the mid-to-late twentieth century, when computer science was developing rapidly, the security of the Data Encryption Standard (DES) was seriously threatened by the superior information processing power of computers, which led to the emergence of a new encryption standard, the Advanced Encryption Standard [16]. AES has a block length of 128 bits and three key lengths of 128, 192, and 256 bits, respectively. In this paper, we focus on the AES-128 encryption algorithm. The AES encryption process requires a total of 10 rounds, and each iteration includes four operations (see Figure 1): AddRoundKey, SubBytes, ShiftRows, and MixColumns, while the tenth iteration does not include MixColumns. In software implementations, the SubBytes operation uses a pre-stored lookup table, called the SBox, held in the internal registers of the encryption chip for byte substitution, and loads the generated intermediate state value onto the data bus. This makes the output of the SBox substitution in the first round of AES a common attack point for adversaries.
$S_{out}^{0} = \mathrm{Sbox}(k_0 \oplus p_0)$ (1)
Formula (1) shows how the 0th byte of the SBox output in the first round of AES-128 is generated, wherein $p_0$ represents the first byte of the plaintext $p$ and $k_0$ represents the first byte of the initial key $k$.
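As a minimal sketch of this attack point, the ID-model label of a trace follows directly from Formula (1). The function name `trace_label` is ours; the lookup table is the standard AES S-box:

```python
import numpy as np

# Standard AES S-box (SubBytes lookup table), 16 rows of 16 entries.
SBOX = np.array([
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
    0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
    0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
    0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
    0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
    0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
    0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
    0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
    0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
    0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
    0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
    0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
    0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
    0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
    0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
    0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16,
], dtype=np.uint8)

def trace_label(p0: int, k0: int) -> int:
    """ID-model label of a trace: the 0th SBox output byte of AES round 1, per Formula (1)."""
    return int(SBOX[p0 ^ k0])
```

In the ID leakage model this yields one of 256 possible labels per trace, which is exactly the classification target the deep-learning models below are trained on.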

2.2. Profiling SCA

SCA is a method that exploits the leakage of physical characteristics from implementations of cryptographic algorithms. By analyzing the non-sensitive information generated during the device’s operation, such as power consumption, electromagnetic radiation, acoustic information, and other observable signals, adversaries may have a chance to extract sensitive data from victim devices. Among the existing SCA methods, Power Analysis (PA) has demonstrated notable effectiveness [17]. PA exploits the correlation between the instantaneous power consumption and the data being processed, along with the operations performed by the device, to execute successful attacks. Within the domain of PA, three widely employed leakage models, namely the Identity (ID), the Hamming weight (HW), and the Hamming distance (HD) model [18,19], are utilized to characterize the captured power consumption traces. Specifically, we adopt the ID model in the experiments, which encompasses a total of 256 labels, as it does not suffer from the data imbalance problem [20]. The attack strategy for DLSCAs, using power consumption as the side channel, consists of the following steps:
  • Capture profiling traces. Collect a set of N profiling traces, denoted as $T = \{T_i \mid i = 0, 1, \ldots, N\}$, by conducting AES-128 encryption operations on N random plaintexts and N corresponding random keys using the profiling device.
  • Build leakage profile. The captured traces are first labeled according to the selected attack point. Afterwards, we train deep-learning models on these labeled traces to learn a leakage profile between traces and the key-dependent sensitive value.
  • Measure traces from victim devices. Repeat the AES-128 encryption operation on the victim device(s) with an unknown key $k$ to obtain a set of M power traces denoted as $T = \{T_i \mid i = 0, 1, \ldots, M\}$.
  • Extract the secret key. The pre-trained deep-learning model classifies the traces $T_i, i \in \{0, 1, \ldots, M\}$ captured from the victim device(s) to acquire the score vector $S_i = (s_{i,0}, s_{i,1}, \ldots, s_{i,255})$. From the obtained score vector and the selected attack point, we obtain the subkey candidate $k_i$ with the highest classification probability for trace $T_i$ as
    $k_i = \mathrm{SBox}^{-1}(\arg\max(S_i)) \oplus p_i$, (2)
    wherein $p_i$ is the first byte of the corresponding plaintext for trace $T_i$ and $\mathrm{SBox}^{-1}(\cdot)$ denotes the inverse of the SubBytes operation. The attack is considered successful when the guessed subkey value with the highest classification probability matches the real subkey, i.e., $k_i = k$.
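The key-extraction step can be sketched as follows. The helper name is ours, and a short stand-in permutation replaces the real 256-entry AES S-box purely to keep the example compact; a real attack substitutes the actual table:

```python
import numpy as np

# Stand-in bijection over 0..255 used in place of the real AES S-box
# (7 is coprime to 256, so this is a permutation).
SBOX = np.array([(7 * x + 3) % 256 for x in range(256)], dtype=np.uint8)
INV_SBOX = np.argsort(SBOX)  # inverse permutation: INV_SBOX[SBOX[x]] == x

def per_trace_subkey_guesses(scores: np.ndarray, plaintexts: np.ndarray) -> np.ndarray:
    """Formula (2): k_i = SBox^{-1}(argmax S_i) XOR p_i for every attack trace.

    scores:     M x 256 array of classifier score vectors S_i
    plaintexts: length-M array of first plaintext bytes p_i
    """
    best_labels = np.argmax(scores, axis=1)   # most likely SBox output per trace
    return INV_SBOX[best_labels] ^ plaintexts  # undo SubBytes, then undo AddRoundKey
```

Each entry of the returned array is one per-trace guess $k_i$; the single-trace attack succeeds for trace $T_i$ exactly when that entry equals the real subkey $k$.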

2.3. Points of Interest

In the context of SCA, PoIs are specific locations or instances in the measurements where information related to the cryptographic operation or key leakage can be observed. Identifying and analyzing these points is crucial for extracting relevant information and conducting successful attacks. In this study, the Signal-to-Noise Ratio (SNR) is employed to identify PoIs in the power traces. Specifically, we show the process of recovering the first subkey; the procedure for the other subkeys is the same. The SNR serves as a measure of the information leakage at different points in the traces: higher SNR values indicate greater leakage. Formula (3) shows how the SNR is calculated:
$\mathrm{SNR} = \mathrm{Var}(P_{exp}) / \mathrm{Var}(P_{sw.noise} + P_{el.noise})$, (3)
where $P_{exp}$ denotes the specific portion of the power consumption that is accessible to the analyzer and contains the relevant information. In addition, $P_{sw.noise} + P_{el.noise}$ represents the noise component. We summarize the procedure for calculating the SNR of the traces as follows.
  • First, derive the intermediate value by using the known information of the profiling traces according to the selected attack point (SBox output in the first round of AES).
  • Calculate the SNR between the power consumption trace data and the obtained intermediate value. Afterwards, select the trace segments with the highest SNR values as PoIs. Figure 2 represents the plot of a captured power trace and the SNR results of the corresponding traces.
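The two steps above can be sketched in NumPy, assuming `traces` is an N×L array of power traces and `labels` holds the intermediate value derived for each trace (the function names and the `n_poi` parameter are ours):

```python
import numpy as np

def snr(traces: np.ndarray, labels: np.ndarray, n_classes: int = 256) -> np.ndarray:
    """Per-sample SNR: variance (across classes) of the class-mean signal,
    divided by the mean (across classes) of the within-class noise variance."""
    means = np.array([traces[labels == v].mean(axis=0) for v in range(n_classes)])
    noise = np.array([traces[labels == v].var(axis=0) for v in range(n_classes)])
    return means.var(axis=0) / noise.mean(axis=0)

def select_pois(snr_values: np.ndarray, n_poi: int) -> np.ndarray:
    """Indices of the n_poi samples with the highest SNR, in trace order."""
    return np.sort(np.argsort(snr_values)[-n_poi:])
```

Restricting the model input to these PoIs keeps the informative samples while discarding points dominated by switching and electronic noise.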

2.4. Metrics

Partial Guessing Entropy (PGE) is a common evaluation metric in the side-channel domain for multi-trace attacks. Single-trace attack accuracy is employed instead when the attack is efficient enough not to require multiple attempts to achieve a certain attack capability; it applies to attack scenarios in which the attacker uses only one trace per attack. With a single trace as input, the attack is successful if the predicted score of the actual key is the largest among all candidate outputs. The single-trace attack accuracy therefore measures how often the model classifies correctly using a single trace, and is defined in Formula (4).
$\mathrm{acc}(T) = \dfrac{|\{T_i \in T \mid k_i = k\}|}{|T|}$ (4)
In general, we consider an attack to be more efficient when it achieves higher accuracy in classifying traces.
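Formula (4) amounts to a one-liner; a sketch, with `guesses` holding the per-trace subkey candidates $k_i$ (the function name is ours):

```python
import numpy as np

def single_trace_accuracy(guesses: np.ndarray, true_subkey: int) -> float:
    """Formula (4): fraction of attack traces whose guess k_i equals the real subkey k."""
    return float(np.mean(np.asarray(guesses) == true_subkey))
```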

3. Multi-Dimensional Fusion for DLSCAs

In this section, we introduce how the proposed MD_CResDD Network works in DLSCAs, along with the multi-scale feature fusion and ResDD network.

3.1. MD_CResDD Network

As we have mentioned above, the proposed MD_CResDD Network collaboratively applies both the fusion mechanism and the dendrite structure to CNN. In the following subsections, we provide further details about the extent to which the multi-scale feature fusion can help DLSCAs and explain the concept of a ResDD network.

3.1.1. Multi-Scale Feature Fusion

CNN is widely recognized for its superior feature extraction capabilities [21]. By utilizing convolutional kernels to automatically extract informative features from input data and incorporating local connectivity and shared weights, CNN enables the continuous optimization of hyper-parameters. In this paper, multi-scale feature fusion refers to the use of convolutional kernels of different sizes in the same layer to obtain features with different perceptual fields before passing them on to the next layer, allowing for a more flexible balance between computational effort and generalization capability. Among these approaches, the early fusion mechanism can generate a more integrated and comprehensive feature representation that emphasizes detailed features and possesses a stronger capability to represent information. In contrast, late fusion takes advantage of higher-level features extracted from previous levels to enhance the expressive power of the model. In this paper, we combine the early fusion with the late fusion approach and experimentally compare the impact on SCAs of different fusion methods, including Add, Subtract, Multiply, and Average.
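To make the idea concrete, the sketch below passes one trace through two parallel branches with different kernel sizes and merges them element-wise. The kernel sizes (3 and 11) and the plain averaging kernels are illustrative choices, not the paper's trained filters, and the merge shown is the Multiply method; Add, Subtract, and Average are analogous:

```python
import numpy as np

def conv1d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # 'same'-length 1-D convolution standing in for a Conv1D layer
    return np.convolve(x, kernel, mode="same")

def multi_scale_features(trace: np.ndarray) -> np.ndarray:
    """Early-fusion sketch: two receptive fields over the same trace.
    A short kernel captures fine detail; a long kernel captures coarser
    context; the branches are merged element-wise before the next layer."""
    fine = conv1d_same(trace, np.ones(3) / 3)      # small perceptual field
    coarse = conv1d_same(trace, np.ones(11) / 11)  # large perceptual field
    return fine * coarse                           # 'Multiply'-style fusion
```

In the trained network these branches are learned Conv1D filters and the merge point is a fusion layer; the key property, preserved here, is that both branches see the same input and produce equal-length feature maps so they can be combined element-wise.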

3.1.2. ResDD Network

Existing studies have demonstrated that dendritic networks (DD) are capable of effectively classifying linearly indistinguishable inputs [22,23,24]. These networks achieve the classification task by generating logical expressions that incorporate logical relations (with or without) for the respective classes. In contrast, ResDD is a derivative of the dendritic network that utilizes the residual model. It successfully addresses the degeneracy issue that arises when the model becomes deeper. Moreover, ResDD exhibits superior generalization ability and controlled accuracy compared to the dendritic network. Additionally, its mapping capability can be adjusted by varying the number of modules. This paper conducts a comparative analysis of the impacts of different residual blocks on the experimental results, denoted as ResDD-X, ResDD-WX, ResDD-A, and ResDD-WA, respectively (as described in Formula (5)).
$A_L = (W^{l,l-1} A_{l-1}) \circ X + W^{l,l-1} A_{l-1}$ (ResDD-X)
$A_L = (W^{l,l-1} A_{l-1}) \circ X + W_R^{l,l-1} A_{l-1}$ (ResDD-WX)
$A_L = (W^{l,l-1} A_{l-1}) \circ A_{l-1} + W^{l,l-1} A_{l-1}$ (ResDD-A)
$A_L = (W^{l,l-1} A_{l-1}) \circ A_{l-1} + W_R^{l,l-1} A_{l-1}$ (ResDD-WA) (5)
where $\circ$ denotes the Hadamard product, i.e., the element-wise product of matrices of the same shape. $A_{l-1}$ and $A_L$ are the input and output of the module, respectively, $X$ is the original input of ResDD, and $W^{l,l-1}$ and $W_R^{l,l-1}$ are two different weight matrices.
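A single block of Formula (5) is easy to express directly; the sketch below implements the ResDD-WX variant (function and argument names are ours):

```python
import numpy as np

def resdd_wx_block(a_prev: np.ndarray, x: np.ndarray,
                   w: np.ndarray, w_r: np.ndarray) -> np.ndarray:
    """One ResDD-WX block from Formula (5):
    A_l = (W A_{l-1}) ∘ X + W_R A_{l-1},
    where ∘ is the element-wise (Hadamard) product, x is the original
    ResDD input, and w, w_r are two independent weight matrices."""
    return (w @ a_prev) * x + w_r @ a_prev
```

The Hadamard product with `x` is what injects the polynomial (dendritic) interaction between features, while the `w_r @ a_prev` term plays the role of the residual connection.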

4. DLSCAs Based on MD_CResDD Network

In this section, we first provide detailed information regarding the equipment used in the experiments. Additionally, we describe the two distinct datasets tested in our experiments. Next, this section describes the process involved in constructing the MD_CResDD network, along with an evaluation of the influence of various fusion times and different residual approaches on DLSCAs.

4.1. Experimental Setups

In this paper, all the experimental models were constructed and trained using the Keras 2.3.1 deep learning framework and TensorFlow-GPU 2.2.0. The experiments were conducted on a computer system equipped with an Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz processor and an NVIDIA GeForce 940MX GPU.
Next, we introduce the two datasets used in our experiments.

4.1.1. CW Dataset

The first database used in the experiment, the ChipWhisperer (CW) dataset, is not publicly available. The traces in the CW dataset are collected from the CW308T_STM32F3 microcontroller implementation of AES-128, equipped with a 32-bit Arm Cortex-M4 CPU. In the experiment, we collect 10,000 traces in total, with each trace containing 3000 sampling points. The traces are split into a training set, a validation set, and a test set in a ratio of 5:1:4.

4.1.2. Xmega Dataset

The Xmega public dataset consists of traces captured from an 8-bit AVR Atmel ATXmega128D4 microcontroller using a LeCroy WaveRunner 104MXi DSO equipped with a ZS1000 active probe and connected to the host PC via Ethernet. In the Xmega public dataset [25], each power trace contains 1700 sampling points. The dataset includes 100,000 training traces and 50,000 testing traces in total.

4.1.3. Software vs. Hardware Implementations of AES

Software implementations of AES are executed on general-purpose processors such as CPUs (Central Processing Units), with the encryption and decryption processes carried out using software instructions. Hardware implementations of AES involve dedicated hardware components, such as ASICs (Application-Specific Integrated Circuits) or FPGAs (Field-Programmable Gate Arrays), designed specifically for AES encryption and decryption.
In general, the side-channel leakage of software-based AES implementations is time-sensitive and exhibits lower noise levels. This is due to the sequential execution of instructions, which enables deep learning models to more easily investigate the characteristics associated with each subkey using a divide-and-conquer approach. In contrast, in hardware-based AES implementations, captured traces often intertwine characteristics related to a greater number of operations because instructions are executed concurrently. Consequently, this inherently increases the complexity of side-channel attacks, particularly when advanced process technologies are involved. However, software implementations continue to hold great significance across numerous applications and scenarios due to the following advantages when compared to hardware implementations of AES:
  • Flexibility: Software implementations are highly flexible and can be run on a wide range of devices, from computers to embedded systems. They are not tied to specific hardware, making them versatile for various applications.
  • Ease of Updates and Maintenance: Software implementations can be easily updated and maintained. If there are improvements or security patches for the AES algorithm, they can be applied to the software without changing the hardware.
  • Portability: AES software implementations are highly portable, allowing the same code to be used on different platforms, including different operating systems and architectures.

4.2. SCA Flow Based on MD_CResDD Network

  • Data acquisition and pre-processing. Pre-processing includes splitting the dataset and selecting the PoIs.
  • Build leakage profile. The established leakage model is continuously trained and optimised to learn a leakage profile between traces and the key-dependent sensitive value.
  • Key recovery. In the AES-128 encryption algorithm, the key consists of 16 bytes. To recover each byte of the key, we follow a sequential approach. We derive the intermediate value state associated with the byte of the key that needs to be recovered using the trained model. Finally, we recover the target subkey from the derived intermediate value based on the known plaintext information. The complete process of the MD_CResDD based deep-learning side channel attack is illustrated in Figure 3.

4.3. Impact of Different Fusion Methods on SCA

When designing the MD_CResDD network, both the fusion time and fusion method play crucial roles. For the fusion time, we use a combination of early fusion and late fusion. In terms of fusion methods, four approaches are compared in the experiments: Add, Subtract, Multiply, and Average. The control variables method is utilized to ensure accurate and effective experiments. Figure 4 illustrates the accuracy of the model across 100 epochs on the CW dataset, employing various fusion methods. Figure 4a–c illustrate the training process of the models with different fusion methods, while Figure 4d demonstrates a comparison between models with and without feature fusion.
From Figure 4, we can find that the MD_CResDD network exhibits a faster fitting rate when employing the Mul_Add fusion method. Table 1 shows the cases of the highest testing accuracy achieved using different fusion methods for early and late fusion, using 300 training epochs. Notably, the model demonstrates the highest testing accuracy when the fusion method employed is Mul_Add.
The fusion mechanism involves combining different fusion times and methods. Early fusion integrates features from diverse sources or levels in the initial network stages, enriching the data and enabling the simultaneous consideration of multiple features during learning and decision-making. Late fusion reduces noise propagation from early-level features, aiding in better feature filtering and extraction for the task. Different fusion methods have a significant impact on the essential features. MD_CResDD employs a blend of early and late fusion, utilizing the Mul_Add fusion method to introduce sensitive information features initially and eliminate disruptive ones later.

4.4. Impact of Different Residual Blocks on SCA

An experimental comparison of model testing accuracy was conducted using different residual blocks, called ResDD-X, ResDD-WX, ResDD-A, and ResDD-WA. The results of this comparison are illustrated in Figure 5.
The experimental results indicate that the model achieves the best performance when utilizing the ResDD-WX residual module. Formula (6) represents a single ResDD-WX block, while stacking two residual modules further enhances the model’s performance.
$A_l = (W^{l,l-1} A_{l-1}) \circ X + W_R^{l,l-1} A_{l-1}$ (6)
$f(X) = \big(W^{2,1}((W^{1,0}X) \circ X + W_0^{1,0}X)\big) \circ X + W_1^{2,1}\big((W^{1,0}X) \circ X + W_0^{1,0}X\big)$
Assume that the original input data $X$ contains two features $x_1$ and $x_2$. Expanding the function for two stacked ResDD modules yields terms such as $x_1^2$, $x_1 x_2$, and a constant $c$; expanding three ResDD modules yields terms such as $x_1^3$, $x_2^3$, $x_1^2 x_2$, $x_2^2 x_1$, and a constant $c$. From this expansion, it can be seen that $f(X)$ achieves its fit through non-linear terms, and terms such as $x_1^2 x_2$ and $x_2^2 x_1$ capture the relationships between the features (feature logic relationships). Figure 6 shows the learning strategy of the ResDD network module.
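The degree growth described above can be checked numerically. With identity weights (an illustrative simplification of the trained matrices), one module produces a degree-2 polynomial mapping and stacking a second raises it to degree 3:

```python
import numpy as np

def resdd_module(a_prev: np.ndarray, x: np.ndarray) -> np.ndarray:
    # ResDD-style module with identity weights, kept weight-free so the
    # polynomial degree of the overall mapping is easy to read off
    return a_prev * x + a_prev

x = np.array([2.0, 3.0])
a1 = resdd_module(x, x)    # element-wise x^2 + x        (degree 2)
a2 = resdd_module(a1, x)   # element-wise x^3 + 2x^2 + x (degree 3)
```

Each additional module multiplies the running activation by the original input once more, which is how the stacked network builds the higher-order feature products (e.g., $x_1^2 x_2$) without explicit polynomial feature engineering.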

4.5. MD_CResDD Network Structure

The initial phase of the MD_CResDD network, as described before, entails extracting features at multiple scales from the input data using convolutional kernels of varying sizes. These features, obtained at different scales, are then fused through an early fusion layer. Subsequently, the fused features are passed to the ResDD network layer, where further fusion operations occur in the final layer of the ResDD module. The structure of the MD_CResDD network built in this paper is as shown in Figure 7. The MD_CResDD model introduces feature fusion and network fusion mechanisms in contrast to typical DLSCA models. The feature fusion mechanism enhances information representation and fortifies the model’s robustness, while the network fusion mechanism mitigates the risk of model overfitting. Moreover, building on the ResDD model’s strengths, the MD_CResDD model is adept at quicker learning.

5. Experimental Results

This section provides an extensive appraisal of the performance of the MD_CResDD network on two distinct datasets. Furthermore, a comparative analysis is conducted with various DLSCAs, highlighting the strengths and effectiveness of the MD_CResDD network.

5.1. Experimental Results and Attacks for CW Datasets

After building the model, it is further optimized by adjusting hyperparameters, including the learning rate, batch_size, the number of convolutional layers, and the number of ResDD layers, to achieve the optimal attack performance. The optimizer used for the optimized model is Adam, with a learning rate of 0.0001. The softmax function is employed as the activation function for the output layer. The specific hyperparameters are listed in Table 2.
Figure 8 shows the performance of model training on the CW dataset. Figure 8a illustrates the model training accuracy, while Figure 8b depicts the loss during model training.
The model exhibits rapid convergence within the initial 100 epochs of the training process, followed by a gradual decrease in the fitting rate to reach its optimal state. In the training set, the loss decreased to 1.009 with an accuracy of 66.19%, and in the test sets, the loss decreased to 0.986 with an accuracy of 65.1%, with no overfitting.

5.2. Experimental Results on Xmega Dataset

Figure 9 shows the model training results on the public Xmega dataset. Figure 9a is for the model training accuracy and Figure 9b shows the loss of the model during the training process.
The model exhibits exceptional performance, reaching near-optimal levels within 50 epochs. The final model achieved an accuracy of 0.9987 on the test set, with a minimal loss of 0.0045. These results highlight the model’s high accuracy, fast convergence, and resilience against overfitting.

5.3. Comparison of Several Deep Learning SCA

In order to assess the performance of the MD_CResDD network in side channel attacks, this paper conducts a comparative experiment involving three other deep learning SCA models trained on CW datasets and Xmega public datasets. Specifically, the CNN model, DD network model, MLP network model, and MD_CResDD network model were selected for training and attack purposes. It is worth noting that the structures and model parameters of the CNN, DD network, and MLP models are well-established and known [14,26]. To ensure the validity and accuracy of the comparative experiments, the division between training and testing data in the datasets was maintained consistently. This ensures a fair and reliable comparison between the MD_CResDD network and the other SCA models, allowing for a comprehensive evaluation of their respective performances.

5.3.1. Comparison Experiments on the CW Datasets

After training 600 epochs, the performance of the four models on the CW datasets is shown in Table 3. The ’Compare’ column in the table indicates the number of epochs required for the MD_CResDD network to reach the optimal value of the other models.
Table 3 shows that the MD_CResDD network performed better in terms of test accuracy, reaching 0.651 with low model loss, while the model reached its highest accuracy after only 270 epochs of training. Also, the model improved the fitting speed by 45.4% over the best-performing models in CNN, MLP, and DD.
Figure 10 shows the test accuracy and the loss of each model over 500 training epochs on the CW dataset. It can be seen that the multi-dimensional deep-learning network MD_CResDD proposed in this paper achieves a faster fitting speed and higher accuracy compared with the other three models.

5.3.2. Comparison Experiments on Xmega Datasets

After training for 400 epochs, the performance of the four models on the Xmega datasets is presented in Table 4. The ’Compare’ column in the table indicates the number of epochs required for the MD_CResDD network to reach the optimal value of the other models.
The results in Table 4 demonstrate that the MD_CResDD network achieved a remarkable test accuracy of 0.998 on the Xmega dataset, coupled with a minimal model loss. Impressively, it attained the highest accuracy after only 243 epochs of training. Furthermore, the MD_CResDD network exhibited a 34.0% improvement in fitting speed compared to the best-performing models in CNN, MLP, and DD.
Figure 11 illustrates the test accuracy and loss of each model after 100 training rounds on the Xmega dataset. The results indicate that all models exhibit commendable performance on the public dataset, highlighting their excellent generalization ability.
In summary, the proposed MD_CResDD network incorporates both a fusion mechanism and the dendrite structure into a CNN. We investigated the impact of the Add, Subtract, Multiply, and Average fusion methods, as well as of the ResDD-X, ResDD-WX, ResDD-A, and ResDD-WA residual blocks, on DLSCA, and compared the proposed network against three existing DLSCA models on two software implementations of AES-128. Across both datasets, the MD_CResDD model performs strongly: it improves fitting speed by 45.4% and 34.0%, respectively, over the best-performing of the CNN, MLP, and DD models, shows no signs of overfitting, and raises the test accuracy by 8.4% and 0.8% on the respective datasets.
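The four element-wise fusion methods compared above operate point-wise on feature maps of equal shape. A minimal numpy sketch (array values are illustrative; per the conclusions, a combined mode such as Mul_Add would apply Multiply at the early fusion stage and Add at the late stage, though that two-stage pipeline is not shown here):

```python
import numpy as np

# Element-wise feature-fusion operations: two feature maps of identical shape
# are combined point-wise. The vectors below are illustrative stand-ins for
# the feature maps of two network branches.
a = np.array([0.2, 0.5, 0.8])
b = np.array([0.1, 0.4, 0.2])

fused = {
    "Add":      a + b,
    "Subtract": a - b,
    "Multiply": a * b,
    "Average":  (a + b) / 2.0,
}
```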

5.3.3. Discussion

In this study, we encountered a trade-off between improving the efficiency of SCA, maintaining accuracy, and managing cost. A series of experiments comparing the profiling efficiency and test accuracy of different DLSCAs shows that the MD_CResDD network improves profiling efficiency without compromising test accuracy. However, because the model is more complex than traditional deep-learning attack methods, it incurs a slight increase in cost. We also take into account the challenges of deploying DLSCA models in practical scenarios, including memory utilization, latency, and real-time processing. Resource limitations are another crucial aspect: when resources are constrained, attackers are limited in constructing and running complex neural networks, which decreases attack efficiency.
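The memory concerns above can be made concrete with a back-of-the-envelope cost model: counting parameters and the float32 memory footprint of a fully connected profiling model. The layer widths below are hypothetical, not the actual MD_CResDD configuration:

```python
# Rough deployment-cost model: parameter count and float32 memory footprint of
# a stack of fully connected layers. Layer widths are hypothetical examples
# (trace length -> hidden layers -> 256 key-byte classes).

def dense_param_count(layer_sizes):
    """Weights + biases of consecutive fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layers = [1000, 512, 256, 256]
params = dense_param_count(layers)
mem_mb = params * 4 / (1024 ** 2)   # float32 = 4 bytes per parameter
```

Convolutional and dendritic layers change the per-layer formula, but the same accounting gives a first estimate of whether a model fits a constrained attack platform.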

6. Conclusions and Future Works

This section summarizes the paper and outlines possible directions for future work.

6.1. Conclusions

We aim to accelerate DLSCA profiling without compromising attack effectiveness. To this end, we propose the MD_CResDD network, which applies both a fusion mechanism and the dendrite structure to a CNN. We investigate the impact of the Add, Subtract, Multiply, and Average fusion methods, as well as of the ResDD-X, ResDD-WX, ResDD-A, and ResDD-WA residual blocks, on DLSCA, and compare the proposed model against three existing DLSCA models on two software implementations of AES-128, evaluating its attack efficiency against other existing neural networks for DLSCAs. The experimental results demonstrate that, with a fusion mechanism combining early and late fusion, the fusion mode set to Mul_Add, and the ResDD-WX residual block, the proposed MD_CResDD network improves the profiling rate by at least 34% over other existing deep-learning models for DLSCAs while achieving notable improvements in attack accuracy (8.4% and 0.8% for the two implementations). Two research directions remain for MD_CResDD. First, exploring the attack capability of the model on other datasets, such as wireless side-channel datasets, to investigate its resilience to noise and interference in long-distance attack scenarios. Second, further studying the impact of different architectures and fusion strategies on side-channel analysis.

6.2. Future Works

  • Investigate and assess the performance of the MD_CResDD network in various cryptographic implementations. Our ongoing exploration of MD_CResDD on new datasets, such as the wireless side channel, includes assessing its performance under different software and hardware implementations of encryption algorithms. These studies will provide valuable insight into the applicability and effectiveness of the MD_CResDD network in different cryptographic scenarios.
  • The development of effective countermeasures against DLSCAs is of utmost importance in safeguarding devices. Future efforts should focus on strategies to protect devices from DLSCAs. Promising design directions include reducing the physical leakage produced during the implementation of cryptographic algorithms and increasing the noise that interferes with that leakage. By minimizing the extent of physical leakage and increasing the complexity of the side-channel information, such countermeasures aim to harden devices against DLSCAs.
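As one concrete example of the leakage-reducing direction, first-order Boolean masking recomputes a masked substitution table so that the device never manipulates the unmasked intermediate values that DLSCAs target. A minimal sketch with a toy 4-bit S-box (the PRESENT S-box, standing in here for the AES S-box):

```python
import secrets

# First-order Boolean masking of an S-box lookup. The masked table satisfies
# S'(x ^ m_in) = S(x) ^ m_out, so every value the device handles is blinded
# by a fresh random mask; the result is unmasked only at the end.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # toy 4-bit S-box (PRESENT)

def masked_sbox(m_in, m_out):
    """Recompute the table for input mask m_in and output mask m_out."""
    table = [0] * 16
    for x in range(16):
        table[x ^ m_in] = SBOX[x] ^ m_out
    return table

x = 0x7
m_in, m_out = secrets.randbelow(16), secrets.randbelow(16)
table = masked_sbox(m_in, m_out)
masked_y = table[x ^ m_in]   # computed on masked data only
y = masked_y ^ m_out         # unmask at the very end: y == SBOX[x]
```

Higher-order masking and hiding techniques follow the same principle of decorrelating the measurable leakage from the secret intermediate.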

Author Contributions

T.D. contributed to data acquisition, data analysis, and manuscript writing. H.W. played an active role in the conception and design of the work, accepting accountability for the project. D.H. participated in data acquisition and analysis. N.X. and W.L. granted final approval for the version intended for publication. J.W. contributed to the design of the work, engaged in data interpretation, and also provided final approval for the published version. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61973109).

Data Availability Statement

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

SCAs       Side-Channel Attacks
DLSCAs     Deep-Learning Side-Channel Attacks
MD_CResDD  Multi-Dimensional Fusion Convolutional Residual Dendrite
MLP        Multilayer Perceptrons
CNN        Convolutional Neural Networks
DD         Dendrite Network
ResDD      Residual Dendrite Network
AES        Advanced Encryption Standard
DES        Data Encryption Standard
POIs       Points of Interest
PA         Power Analysis
SNR        Signal-to-Noise Ratio
CPU        Central Processing Unit
GPU        Graphics Processing Unit
FD         Failure Detector

References

  1. Hu, W.J.; Fan, J.; Du, Y.X.; Li, B.S.; Xiong, N.; Bekkering, E. MDFC–ResNet: An agricultural IoT system to accurately recognize crop diseases. IEEE Access 2020, 8, 115287–115298. [Google Scholar] [CrossRef]
  2. Kocher, P.C. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Proceedings of the Advances in Cryptology—CRYPTO’96: 16th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 1996; Springer: Berlin/Heidelberg, Germany, 1996; pp. 104–113. [Google Scholar]
  3. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: http://www.deeplearningbook.org (accessed on 8 November 2022).
  4. Cagli, E.; Dumas, C.; Prouff, E. Convolutional neural networks with data augmentation against jitter-based countermeasures: Profiling attacks without pre-processing. In Proceedings of the Cryptographic Hardware and Embedded Systems–CHES 2017: 19th International Conference, Taipei, Taiwan, 25–28 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 45–68. [Google Scholar]
  5. Wang, R.; Wang, H.; Dubrova, E.; Brisfors, M. Advanced far field EM side-channel attack on AES. In Proceedings of the 7th ACM on Cyber-Physical System Security Workshop, Virtual, 7 June 2021; pp. 29–39. [Google Scholar]
  6. Wong, S.C.; Gatt, A.; Stamatescu, V.; McDonnell, M.D. Understanding data augmentation for classification: When to warp? In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; pp. 1–6. [Google Scholar]
  7. Benadjila, R.; Prouff, E.; Strullu, R.; Cagli, E.; Dumas, C. Deep learning for side-channel analysis and introduction to ASCAD database. J. Cryptogr. Eng. 2020, 10, 163–188. [Google Scholar] [CrossRef]
  8. Das, D.; Golder, A.; Danial, J.; Ghosh, S.; Raychowdhury, A.; Sen, S. X-DeepSCA: Cross-device deep learning side channel attack. In Proceedings of the 56th Annual Design Automation Conference 2019, Las Vegas, NV, USA, 2–6 June 2019; pp. 1–6. [Google Scholar]
  9. Wang, H.; Forsmark, S.; Brisfors, M.; Dubrova, E. Multi-source training deep-learning side-channel attacks. In Proceedings of the 2020 IEEE 50th International Symposium on Multiple-Valued Logic (ISMVL), Miyazaki, Japan, 9–11 November 2020; pp. 58–63. [Google Scholar]
  10. Wu, L.; Perin, G.; Picek, S. The best of two worlds: Deep learning-assisted template attack. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 413–437. [Google Scholar] [CrossRef]
  11. Chari, S.; Rao, J.R.; Rohatgi, P. Template attacks. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Redwood Shores, CA, USA, 13–15 August 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 13–28. [Google Scholar]
  12. Wu, X.E.; Mel, B.W. Capacity-enhancing synaptic learning rules in a medial temporal lobe online learning model. Neuron 2009, 62, 31–41. [Google Scholar] [CrossRef] [PubMed]
  13. Liu, G.; Wang, J. Dendrite net: A white-box module for classification, regression, and system identification. IEEE Trans. Cybern. 2021, 52, 13774–13787. [Google Scholar] [CrossRef] [PubMed]
  14. Wang, J.; Wang, W.; Wenxin, Y. Side channel attacks based on dendritic networks. J. Xiangtan Univ. (Natural Sci. Ed.) 2021, 2, 16–30. [Google Scholar]
  15. Liu, G. It may be time to improve the neuron of artificial neural network. TechRxiv 2023. [Google Scholar] [CrossRef]
  16. Daemen, J.; Rijmen, V. The Design of Rijndael; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2. [Google Scholar]
  17. Crocetti, L.; Baldanzi, L.; Bertolucci, M.; Sarti, L.; Carnevale, B.; Fanucci, L. A simulated approach to evaluate side-channel attack countermeasures for the Advanced Encryption Standard. Integration 2019, 68, 80–86. [Google Scholar] [CrossRef]
  18. Bookstein, A.; Kulyukin, V.A.; Raita, T. Generalized hamming distance. Inf. Retr. 2002, 5, 353–375. [Google Scholar] [CrossRef]
  19. Ngai, C.K.; Yeung, R.W.; Zhang, Z. Network generalized hamming weight. IEEE Trans. Inf. Theory 2011, 57, 1136–1143. [Google Scholar] [CrossRef]
  20. Picek, S.; Heuser, A.; Jovic, A.; Bhasin, S.; Regazzoni, F. The Curse of Class Imbalance and Conflicting Metrics with Machine Learning for Side-channel Evaluations. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 1, 209–237. [Google Scholar] [CrossRef]
  21. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  22. Gidon, A.; Zolnik, T.A.; Fidzinski, P.; Bolduan, F.; Papoutsi, A.; Poirazi, P.; Holtkamp, M.; Vida, I.; Larkum, M.E. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 2020, 367, 83–87. [Google Scholar] [CrossRef] [PubMed]
  23. Mel, B.W. Information processing in dendritic trees. Neural Comput. 1994, 6, 1031–1085. [Google Scholar] [CrossRef]
  24. London, M.; Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 2005, 28, 503–532. [Google Scholar] [CrossRef] [PubMed]
  25. Kizhvatov, I. Side channel analysis of AVR XMEGA crypto engine. In Proceedings of the 4th Workshop on Embedded Systems Security, Grenoble, France, 15 October 2009; pp. 1–7. [Google Scholar]
  26. Wang, H.; Brisfors, M.; Forsmark, S.; Dubrova, E. How diversity affects deep-learning side-channel attacks. In Proceedings of the 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), Helsinki, Finland, 29–30 October 2019; pp. 1–7. [Google Scholar]
Figure 1. Block diagram of AES-128 [16].
Figure 2. A plot of an example trace captured from the CW308T_STM32F3 target board and the corresponding SNR analysis results on traces captured from the same implementation of AES-128. The red line represents the leakage range of the target byte.
Figure 3. The flowchart of SCAs based on the proposed MD_CResDD.
Figure 4. Comparative results of experiments with different fusion methods.
Figure 5. Comparative results of different residual blocks.
Figure 6. ResDD module learning strategy.
Figure 7. The structure of the proposed MD_CResDD model.
Figure 8. MD_CResDD model’s training performance on the CW datasets.
Figure 9. MD_CResDD model’s training performance on the Xmega datasets.
Figure 10. The training performance of four models on the CW datasets.
Figure 11. The training performance of four models on the Xmega datasets.
Table 1. The highest testing accuracy of the model training.
            Add     Subtract  Multiply  Average
Add         0.621   0.643     0.641     0.613
Subtract    0.004   0.004     0.004     0.004
Multiply    0.651   0.629     0.005     0.606
Average     0.636   0.637     0.635     0.623
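Reading Table 1 programmatically, the argmax over the grid recovers the best-performing combination, 0.651 for Multiply followed by Add, consistent with the Mul_Add mode named in the conclusions (the row-equals-first-step orientation is our reading of the table):

```python
# Table 1 as a grid: rows are the first fusion method, columns the second,
# entries the highest test accuracy for that combination.

methods = ["Add", "Subtract", "Multiply", "Average"]
grid = {
    "Add":      [0.621, 0.643, 0.641, 0.613],
    "Subtract": [0.004, 0.004, 0.004, 0.004],
    "Multiply": [0.651, 0.629, 0.005, 0.606],
    "Average":  [0.636, 0.637, 0.635, 0.623],
}

# Argmax over all (first, second) combinations:
best = max(((row, col, acc)
            for row, accs in grid.items()
            for col, acc in zip(methods, accs)),
           key=lambda t: t[2])
```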
Table 2. Hyperparameter setting.
Optimizer        Adam
Learning rate    0.0001
Loss function    Cross-Entropy
Mini-batch       128
Epochs           400
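To make Table 2 concrete, the settings can be collected in one place and the cross-entropy loss illustrated framework-free. In a typical framework these values are simply passed to the Adam optimizer; the helper below is only a single-sample illustration:

```python
import math

# Hyperparameters from Table 2, gathered as a configuration dict.
HYPERPARAMS = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "loss": "cross-entropy",
    "mini_batch": 128,
    "epochs": 400,
}

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(probs[label])

# Sanity check: a uniform prediction over 256 key-byte classes yields a loss
# of ln(256) ~ 5.545, a useful reference point when reading training curves.
uniform = [1.0 / 256] * 256
loss = cross_entropy(uniform, label=42)
```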
Table 3. The performance of each model on the CW datasets at epochs = 600.
Models       Testing Accuracy   Training Loss   Epochs   Compare
CNN [26]     0.599              1.261           487      65 epochs
DD [14]      0.607              1.092           512      67 epochs
MLP [26]     0.573              1.058           492      53 epochs
MD_CResDD    0.651              0.986           270      -
Table 4. The performance of each model on the Xmega datasets at epochs = 600.
Models       Testing Accuracy   Training Loss   Epochs   Compare
CNN [26]     0.9884             0.0140          371      47 epochs
DD [14]      0.9919             0.0235          492      80 epochs
MLP [26]     0.9762             0.0305          368      26 epochs
MD_CResDD    0.9987             0.0045          243      -