Enhancing Time-Frequency Analysis with Zero-Mean Preprocessing

Side-channel analysis is a critical threat to cryptosystems on Internet of Things and embedded devices, and appropriate side-channel countermeasures are required for physical security. Combining first-order masking with desynchronization is a general and cost-efficient approach to counteracting side-channel analysis. As side-channel countermeasures have developed, many advanced attacks have been introduced to defeat them. At CARDIS 2013, Belgarric et al. first proposed time-frequency analysis, an attack that is promising in terms of computation and memory complexity compared to other attacks, such as conventional second-order side-channel analysis after synchronization. Nevertheless, their time-frequency analysis performs worse than expected against some datasets protected by combined countermeasures. It is therefore necessary to study the factors that affect the performance of time-frequency analysis. In this paper, we investigate Belgarric et al.'s time-frequency analysis and conduct a mathematical analysis of the preprocessing of frequency information for second-order side-channel analysis. Based on this analysis, we claim that zero-mean preprocessing enhances the performance of time-frequency analysis. We verify our analysis through experimental results from two datasets, which are different types of first-order masked Advanced Encryption Standard (AES) software implementations. The experimental results show that time-frequency analysis with zero-mean preprocessing has an enhanced or complementary performance compared to the analysis without preprocessing.


Introduction
Cryptosystems play an important role in modern electronic systems, providing security mechanisms such as data encryption and authentication. Cryptographic algorithms are designed to be mathematically secure, and they are then implemented and operated on semiconductors. However, when cryptographic algorithms operate, semiconductors unintentionally leak information related to internal instructions and data, such as execution time [1], power consumption [2], electromagnetic radiation [3,4], sound [5], and temperature [6]. Side-channel analysis (SCA) is cryptanalysis that uses these side-channel leakages. Because the leakages include direct or indirect information on the secret key, and because SCA follows a divide-and-conquer strategy that vastly decreases its computation cost, SCA is a practical threat that is more efficient than other methods of algorithmic cryptanalysis, such as differential/linear-like statistical cryptanalysis. Furthermore, although SCA was first introduced by Kocher in the late 1990s [7], the rapid growth of edge devices, such as embedded devices and the Internet of Things (IoT), keeps SCA an actively ongoing research field for physical security.
[...] hiding combined countermeasure, neutralizing the desynchronization, for example through static trace alignment or dynamic time warping-based alignment, must be conducted before the higher-order CPA. If any step of the attack against the combined countermeasure fails, the attack ultimately fails.
As an alternative approach to higher-order CPA against the combined countermeasure, at CARDIS 2013, Belgarric et al. proposed time-frequency analysis (TFA), which is an improved variant of FFT-2DPA [46] for CPA [49]. TFA is a CPA on transformed traces, which are preprocessed through the discrete Fourier transform (DFT) or the discrete Hartley transform (DHT) and then postprocessed through auto-correlation (auto-corr), cross-correlation (x-corr), or absolute value. Compared to the aforementioned higher-order CPA, the frequency-based transform plays the role of neutralizing desynchronization [50], and the postprocessing plays the role of mixing the shares [49]. From the perspective of attacker ability, a TFA attacker needs less ability to synchronize traces than a higher-order CPA attacker because a time shift changes only the phase of the Fourier transform, not its magnitude. Furthermore, although there is a tradeoff regarding SNR, a TFA attacker requires less capability to conduct point-of-interest (PoI) selection because the Fourier transform is a linear operation, and a rough PoI range selection still yields correct first-order leakage after the postprocessing. The Fourier transform also preserves the trace length, which is a major strength in terms of computation and memory complexity. Thus, TFA is a promising tool for side-channel attackers and evaluators. Nevertheless, TFA occasionally fails to analyze some datasets protected by the combined countermeasure approach.
We thus investigate what causes the difference between success and failure when TFA is utilized. Similar to the latest work [51] that studies how to overcome poor performance due to the ghost peak in DPA using mathematical analysis and by employing normalized variance-based pre/post-processing, we study the reason for TFA's poor performance through mathematical analysis.
In this paper, we first mathematically investigate Belgarric et al.'s TFA against first-order masking and hiding countermeasures and how TFA can remove the masking and hiding countermeasures. We explore the mathematical analysis of the equations for the postprocessing of TFA. Based on this analysis, we claim that applying zero-mean preprocessing before the frequency transform is necessary to enhance second-order SCA against cryptosystems protected by the combined countermeasure approach. We propose using standardization and min-max normalization as zero-mean preprocessing. We then validate our claim through experimental results on two datasets of different first-order masked AES software implementations. The experimental results show that TFA with zero-mean preprocessing demonstrates either an enhanced or a complementary performance compared to TFA without the zero-mean preprocessing.

Preliminaries
We use the following notations: Let the capital letter X denote random variables, and the lowercase x denote their realizations. The j-th entry of a vector x is defined as x[j]. Let $T \in \mathbb{R}^{D \times 1}$ denote side-channel traces of length D. The targeted sensitive variable is Z = f(P, K), where f denotes a part of a cryptographic primitive, P denotes a public variable such as (partial) plaintext or ciphertext, and K denotes a (partial) secret key.

Correlation Power Analysis
The CPA exploits the correlation between a device's power consumption and the hypothesis of the data generated during computation. The correlation coefficient ρ between the actual power consumption and the values of the predicted leakage model (e.g., Hamming weight) of the hypothesis is calculated as below:

$$\rho(T, H) = \frac{\mathrm{Cov}(T, H)}{\sqrt{\mathrm{Var}(T)\,\mathrm{Var}(H)}} \tag{1}$$

where variable T represents a set of real power consumption values, H is a set of predicted Hamming weight values, Cov(X, Y) is the covariance of X and Y, and Var(X) is the variance of X. If the key guess is correct, then the hypothesis is calculated correctly, and the correlation coefficient is higher than in the cases of incorrect key guesses.
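The correlation computation described above can be sketched as follows. This is a minimal illustration on simulated data, not the evaluation code of this paper: the helper names `hamming_weight` and `cpa_correlation`, the toy key, and the leakage position are all assumptions made for the example.

```python
import numpy as np

def hamming_weight(values):
    """Hamming weight of each 8-bit value in an integer array."""
    values = np.asarray(values, dtype=np.uint8)
    return np.unpackbits(values[:, None], axis=1).sum(axis=1).astype(np.float64)

def cpa_correlation(traces, hypothesis):
    """Pearson correlation between one hypothesis vector and every time sample.

    traces: (N, D) array of N traces with D samples each.
    hypothesis: (N,) array of predicted leakage values.
    Returns a (D,) array of correlation coefficients.
    """
    t = traces - traces.mean(axis=0)
    h = hypothesis - hypothesis.mean()
    cov = h @ t / len(h)
    return cov / (t.std(axis=0) * h.std())

# Simulated toy data (an assumption): sample 3 leaks HW(plaintext XOR key).
rng = np.random.default_rng(0)
key = 0x3C
plaintexts = rng.integers(0, 256, size=2000)
traces = rng.normal(0.0, 1.0, size=(2000, 8))
traces[:, 3] += hamming_weight(plaintexts ^ key)

# Score each key guess by its largest signed correlation. The signed value is
# used here because, for a plain XOR target, the bitwise-complemented key
# yields exactly the negated correlation.
scores = [cpa_correlation(traces, hamming_weight(plaintexts ^ k)).max()
          for k in range(256)]
best_guess = int(np.argmax(scores))
```

In a real attack the hypothesis would typically target a nonlinear intermediate such as an S-box output, which removes the complemented-key symmetry noted in the comment.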

Masking Countermeasure
CPA is an attack to find a secret key by analyzing the correlation between power consumption traces and the intermediate values of a cryptographic algorithm. Such an attack can be counteracted by removing the relationship between the two data groups. The masking countermeasure is a mainstream and unique means to provide provable security against CPA.
Masking is implemented by using uniform random values, i.e., the mask values, to conceal the intermediate values. For every execution of the cryptographic algorithm, new mask values are generated, and these random values are employed to hide all cryptographically calculated intermediate values. Algorithm 1 is a first-order masked AES-128 example using Boolean masking [18]. Note that the cryptographic algorithm still operates correctly, although the intermediate values are concealed by the mask values.
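To make the idea of Boolean masking concrete, the following sketch shows one masked table lookup with fresh masks. It is a simplified illustration and not a transcription of Algorithm 1: the random permutation stands in for the AES S-box, and the function names `masked_sbox_table` and `masked_lookup` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the AES S-box (an assumption): any fixed byte permutation
# is enough to illustrate the masking principle.
sbox = rng.permutation(256)

def masked_sbox_table(sbox, m_in, m_out):
    """Recompute a table S' with S'(x ^ m_in) = S(x) ^ m_out for every byte x."""
    x = np.arange(256)
    table = np.empty(256, dtype=np.int64)
    table[x ^ m_in] = sbox[x] ^ m_out
    return table

def masked_lookup(plaintext, key, sbox, rng):
    """One masked S-box evaluation with fresh masks; returns the unmasked result."""
    m_in, m_out = (int(v) for v in rng.integers(0, 256, size=2))
    table = masked_sbox_table(sbox, m_in, m_out)
    # The plaintext is masked first, so plaintext ^ key is never handled directly.
    masked_input = (plaintext ^ m_in) ^ key
    masked_output = int(table[masked_input])   # equals sbox[plaintext ^ key] ^ m_out
    return masked_output ^ m_out
```

Every intermediate value handled between masking and unmasking is XORed with a fresh uniform mask, while the final result still equals the unmasked S-box output.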

Hiding Countermeasure
The hiding scheme is another approach for counteracting SCA by decreasing the signal-to-noise ratio (SNR). The hiding scheme weakens the relation between the traces (the side-channel leakages) and the data (the intermediate values) during cryptosystem operations. Hiding is a traditional countermeasure to SCA, and it sufficiently increases attack complexity while being cheaper than a masking scheme. However, if the attacker can use more traces or employ advanced preprocessing methodologies, such as alignment, noise reduction, or even the latest deep learning-based side-channel preprocessing, hiding countermeasures can be neutralized. Therefore, hiding is not currently used alone, but is widely used as a secondary means together with other countermeasures.

Second-Order Correlation Power Analysis
Second-order CPA (SO-CPA) is an attack method capable of analyzing first-order masking countermeasures. SO-CPA is equal to the original CPA, except for a preprocessing step that combines the two shares related to a single sensitive intermediate value. In this paper, we adhere to the assumptions and notations of Prouff et al. for SO-CPA [45]. Among the various preprocessing methods for SO-CPA, we concentrate on the product combining method to explore TFA.
Let a sensitive value $x \in \mathbb{Z}_m$ be separated into two shares $x_0, x_1$ using a uniform random value $r \in \mathbb{Z}_m$ as follows:

$$x_0 = x \oplus r, \qquad x_1 = r \tag{2}$$

The leakage of each share is defined as follows:

$$L_i = \delta_i + H(x_i) + b_i, \qquad i \in \{0, 1\} \tag{3}$$

where $\delta_i$ is a constant term of the leakage, $b_i$ is an instance of a Gaussian random variable $b_i \sim \mathcal{N}(0, \sigma)$, and H is a hypothesis, e.g., Hamming weight or Hamming distance. Next, product combining, a representative preprocess for SO-CPA, can be applied to traces as follows:

$$L = L_0 \cdot L_1 \tag{4}$$

$$L' = (L_0 - \mathbb{E}[L_0]) \cdot (L_1 - \mathbb{E}[L_1]) \tag{5}$$

We denote the two versions of product combining in Equations (4) and (5) as prod1 and prod2, respectively. After such product combining, CPA might succeed against the preprocessed traces $L$ or $L'$ targeting the sensitive intermediate value x.
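The difference between the two product combining variants can be illustrated with a small simulation. The sketch below assumes that prod1 is the raw product of the two share leakages and prod2 is the product of the mean-centered leakages, and it assumes Boolean shares with a Hamming weight leakage; the constant term `delta`, the noise level, and the trace count are arbitrary toy values, not measurements from this paper.

```python
import numpy as np

def hw(values):
    """Hamming weight of each 8-bit value."""
    values = np.asarray(values, dtype=np.uint8)
    return np.unpackbits(values[:, None], axis=1).sum(axis=1).astype(np.float64)

rng = np.random.default_rng(2)
n_traces, delta, sigma = 20000, 10.0, 1.0    # toy parameters (assumptions)

x = rng.integers(0, 256, size=n_traces)      # sensitive value
r = rng.integers(0, 256, size=n_traces)      # fresh uniform mask per trace
l0 = delta + hw(x ^ r) + rng.normal(0.0, sigma, n_traces)   # leakage of share x ^ r
l1 = delta + hw(r) + rng.normal(0.0, sigma, n_traces)       # leakage of share r

prod1 = l0 * l1                                   # raw product
prod2 = (l0 - l0.mean()) * (l1 - l1.mean())       # centered product

target = hw(x)
rho1 = np.corrcoef(prod1, target)[0, 1]
rho2 = np.corrcoef(prod2, target)[0, 1]
```

Under these assumptions the centered product correlates with HW(x) much more strongly in magnitude than the raw product, because the non-zero mean of each leakage adds large terms to the raw product that carry no information about x.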

Time-Frequency Analysis for Second-Order Side-Channel Analysis
TFA defeats the hiding countermeasure by applying a frequency transform, either the DFT (Definition 1) or the DHT (discrete Hartley transform), and defeats the masking countermeasure by applying postprocessing, such as auto-correlation, cross-correlation, or absolute value, to combine the shares' leakage points. TFA can thus analyze the two countermeasures, masking and hiding, at once.
Definition 1. Given a sequence $X \in \mathbb{R}^n$, the discrete Fourier transform of X is calculated as follows:

$$\mathrm{DFT}(X)[f] = \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} X[t]\, e^{-2\pi i f t / n} \tag{6}$$

where $0 \le f < n$ and $i \in \mathbb{C}$ is the imaginary unit, i.e., $i^2 = -1$.
Belgarric et al. propose five analysis methods that combine the DFT or DHT with auto-corr or x-corr, and exploit other correlations using max-corr as a heuristic method. Belgarric et al. finally asserted that future studies should investigate TFA's potential for higher-order SCA because the postprocessing can be naturally adjusted for it. In the next section, we investigate how Belgarric et al.'s TFA works to defeat the combined masking and hiding countermeasures.
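The absolute-value variant of the transform can be sketched in a few lines. The function name `tfa_abs` is hypothetical, and the example uses a circular shift, for which the magnitude spectrum is exactly invariant; a real random delay is not circular, so the invariance only holds approximately for windowed traces.

```python
import numpy as np

def tfa_abs(traces):
    """Magnitude of the DFT of each trace window (one trace per row).

    norm="ortho" applies the 1/sqrt(n) scaling of a unitary DFT.
    """
    return np.abs(np.fft.fft(traces, axis=1, norm="ortho"))

rng = np.random.default_rng(3)
window = rng.normal(size=(1, 64))          # toy trace window (an assumption)
shifted = np.roll(window, 7, axis=1)       # circular time shift by 7 samples

same_spectrum = np.allclose(tfa_abs(window), tfa_abs(shifted))
```

The shift changes only the phase of each frequency bin, which the absolute value discards; this is why the transform can stand in for an explicit alignment step.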

Mathematical Analysis on Time-Frequency Analysis with Zero-Mean Preprocessing
In this section, we explicitly investigate Belgarric et al.'s TFA and explore detailed mathematical formulas for TFA postprocessing. Then, based on this mathematical analysis, we claim that applying zero-mean preprocessing before TFA is necessary to improve the performance of the attack.
To simplify the presentation of our mathematical analysis, we make some assumptions. First, we assume that the leakages of both shares are included in a single time interval; thus, we only explore the absolute value postprocessing method, since the other cases can be easily analyzed in the same manner. Second, we focus on the DFT rather than the DHT because the DHT can also be easily analyzed in the same manner.

Mathematical Analysis
By Euler's formula, DFT(X)[f] in Equation (6) can be represented as follows:

$$\mathrm{DFT}(X)[f] = \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} X[t] \left( \cos\frac{2\pi f t}{n} - i \sin\frac{2\pi f t}{n} \right) \tag{7}$$

Since DFT(X) is a linear combination of the entries of X, the leakage parts of $L_0$ and $L_1$ are not yet mixed multiplicatively in DFT(X) for SO-SCA. Thus, a step that mixes the leakage parts of the two shares multiplicatively is required. For such a mixing step, the absolute value |DFT(X)[f]| is computed as follows:

$$|\mathrm{DFT}(X)[f]| = \frac{1}{\sqrt{n}} \sqrt{ \left( \sum_{t=0}^{n-1} X[t] \cos\frac{2\pi f t}{n} \right)^{2} + \left( \sum_{t=0}^{n-1} X[t] \sin\frac{2\pi f t}{n} \right)^{2} } \tag{8}$$

For simplicity, we only consider the indices t = 0, 1, representing the time samples of $L_0$ and $L_1$, and the index t = 2, a time sample irrelevant to the sensitive value; i.e., one sample for each share, $X[0] = L_0$ and $X[1] = L_1$, and an independent sample $X[2]$. We skip the constant term $1/\sqrt{n}$. Finally, the absolute value can be simplified as follows:

$$|\mathrm{DFT}(X)[f]| = \sqrt{ L_0^2 + L_1^2 + X[2]^2 + 2 L_0 L_1 \cos\frac{2\pi f}{n} + 2 L_0 X[2] \cos\frac{4\pi f}{n} + 2 L_1 X[2] \cos\frac{2\pi f}{n} } \tag{9}$$

From Equation (9), we can observe the prod1 term shown in Equation (4). However, in many cases in the SCA domain, the constant part δ of the leakage model in Equation (3) is not zero. In these cases, prod2 is, in general, superior to prod1 for SO-CPA. Since applying zero-mean preprocessing turns prod1 terms into prod2 terms, applying zero-mean preprocessing will improve the performance of TFA for SO-CPA. We next introduce two candidates for the zero-mean preprocessing.
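The expansion used in this analysis can be checked numerically. The sketch below compares the DFT magnitude computed directly with the expanded form, in which the squared magnitude equals the sum of squared samples plus cosine-weighted pairwise products; the constant scaling is omitted, and the function names and the three toy sample values are assumptions made for illustration.

```python
import numpy as np

def abs_dft_direct(x, f):
    """|sum_t x[t] * exp(-2*pi*i*f*t/n)|, the DFT magnitude without scaling."""
    n = len(x)
    t = np.arange(n)
    return float(np.abs(np.sum(x * np.exp(-2j * np.pi * f * t / n))))

def abs_dft_expanded(x, f):
    """sqrt(sum_t x[t]^2 + 2 * sum_{t<s} x[t] * x[s] * cos(2*pi*f*(s-t)/n))."""
    n = len(x)
    total = float(np.sum(x ** 2))
    for t in range(n):
        for s in range(t + 1, n):
            total += 2.0 * x[t] * x[s] * np.cos(2.0 * np.pi * f * (s - t) / n)
    return float(np.sqrt(max(total, 0.0)))   # guard against tiny negative rounding

# Toy values (assumptions): x[0] and x[1] play the role of the two share
# leakages, and x[2] is a sample unrelated to the sensitive value.
x = np.array([13.2, 15.7, 9.1])
pairs = [(abs_dft_direct(x, f), abs_dft_expanded(x, f)) for f in range(len(x))]
```

The pairwise product of the first two samples is the raw product of the share leakages, so subtracting the mean from each sample before the transform replaces it with a centered product.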

Zero-Mean Preprocessing
To improve the performance of TFA, we applied zero-mean preprocessing to the traces before converting to the frequency domain. We suggest the following two well-known methods for zero-mean preprocessing.

Definition 2. (Min-max Normalization) Given $X_j \in \mathbb{R}^n$ ($j = 0, 1, 2, \ldots, N-1$), the min-max normalization is calculated as follows:

$$X_j' = \frac{X_j - \min(X_j)}{\max(X_j) - \min(X_j)}$$

Definition 3. (Standardization) Given $X_j \in \mathbb{R}^n$ ($j = 0, 1, 2, \ldots, N-1$), standardization is calculated as follows:

$$X_j' = \frac{X_j - \mu(X_j)}{\sigma(X_j)}$$

where $\mu(\cdot)$ and $\sigma(\cdot)$ denote the mean and the standard deviation, respectively.
First, standardization performs zero-mean preprocessing and then scales the result to unit variance. Because the scaling to unit variance does not affect the CPA performance, we can use standardization as zero-mean preprocessing for enhancing TFA. We chose standardization because it is included in many signal processing libraries. Second, although min-max normalization cannot create an exact zero mean, we expect it to bring the mean close to zero, especially in desynchronized traces, and thus to enhance TFA. It is also included in many signal processing libraries. The next section shows that the experimental results validate our claim.
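The two preprocessing candidates, followed by the absolute-value transform, can be sketched as follows. The axis over which the statistics are taken is an assumption of this sketch (each trace is normalized on its own), and the function names are hypothetical rather than taken from any particular library.

```python
import numpy as np

def standardize(traces):
    """Zero mean and unit variance per trace (rows are traces; axis is an assumption)."""
    mean = traces.mean(axis=1, keepdims=True)
    std = traces.std(axis=1, keepdims=True)
    return (traces - mean) / std

def min_max_normalize(traces):
    """Rescale each trace to the range [0, 1] (rows are traces; axis is an assumption)."""
    lo = traces.min(axis=1, keepdims=True)
    hi = traces.max(axis=1, keepdims=True)
    return (traces - lo) / (hi - lo)

def tfa_with_preprocessing(traces, preprocess=None):
    """Optionally preprocess the trace windows, then take the DFT magnitude."""
    if preprocess is not None:
        traces = preprocess(traces)
    return np.abs(np.fft.fft(traces, axis=1))

rng = np.random.default_rng(4)
windows = 5.0 + rng.normal(size=(4, 32))    # toy windows with a non-zero offset
spec_plain = tfa_with_preprocessing(windows)
spec_std = tfa_with_preprocessing(windows, standardize)
```

With standardization the zero-frequency bin vanishes because the per-trace mean has been removed, whereas the unprocessed windows keep a large zero-frequency component from the offset.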

Experimental Results
In this section, we describe the experiments we conducted and verify our claim by analyzing how the performance of TFA changes depending on whether zero-mean preprocessing is applied. We consider three different methodologies, which are the same except for their preprocessing parts. They are denoted as follows in the rest of this paper:
• TFA represents Belgarric et al.'s time-frequency analysis without any zero-mean preprocessing.
• TFAwPS represents a time-frequency analysis with standardization.
• TFAwPN represents a time-frequency analysis with min-max normalization.
We applied each methodology to two datasets that were protected by different first-order masking and a random delay. We repeated each methodology 50 times against each dataset to estimate the guessing entropy for a varying number of traces.
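Guessing entropy is commonly estimated as the average rank of the correct key over repeated attacks. The following sketch assumes that each repetition produces one score per key guess (for example, the maximum absolute correlation); the function names and the toy scores are illustrative only.

```python
import numpy as np

def key_rank(scores, correct_key):
    """Rank of the correct key among all guesses (0 means ranked first)."""
    order = np.argsort(scores)[::-1]          # key guesses sorted by descending score
    return int(np.where(order == correct_key)[0][0])

def guessing_entropy(score_runs, correct_key):
    """Average rank of the correct key over repeated attack runs."""
    return float(np.mean([key_rank(s, correct_key) for s in score_runs]))

# Toy example with three key guesses and two repetitions (made-up scores).
runs = [np.array([0.1, 0.9, 0.5]), np.array([0.2, 0.1, 0.8])]
ge = guessing_entropy(runs, correct_key=2)
```

In the toy example the correct key is ranked second in the first run and first in the second run, so the estimate is 0.5; a value that approaches zero as traces are added indicates a successful attack.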
In this section, we show only three methodologies (first-order CPA in the time domain, TFA, and TFAwPS) because we prioritized observing the degree to which zero-mean preprocessing changes performance. Please refer to Appendix A to compare the experimental results on TFAwPS and TFAwPN.

Introduction to the AES-M Datasets
AES-M is a dataset of collected power consumption traces for the simplest form of first-order masked AES with 8-bit software implementation [18]. We acquired 60,000 power traces of AES operation, as shown in Algorithm 1, operating on a CW308T-STM32F target board [52] at 7.37 MHz using a CW1173 ChipWhisperer-Lite [53] with a sampling rate of 29.54 MHz. Note that the power traces acquired from the ChipWhisperer have a high SNR. We then attempted a first-order CPA and confirmed that there was no first-order leakage.
To compare the performance of each methodology in more challenging situations, we simulate desynchronization using only a random shift. In short, as with the ASCAD dataset [43], we experimented on three cases of the AES-M dataset: synchronized traces and two kinds of desynchronization-simulated traces, randomly shifted within 50 and 100 points, respectively. We denote these cases as AES-M-sync, AES-M-desync50, and AES-M-desync100. In all cases, we chose the same two time intervals corresponding to lines 4 and 12 in Algorithm 1 for the points of interest.
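A random-shift desynchronization of this kind can be simulated as in the sketch below. It assumes one independent, uniformly distributed offset per trace and crops every trace to a common length; the function name `simulate_desync` and the cropping strategy are assumptions, not necessarily the exact procedure used for the datasets.

```python
import numpy as np

def simulate_desync(traces, max_shift, rng):
    """Shift each trace by an independent uniform offset in [0, max_shift].

    Returns the cropped traces, each of length D - max_shift, and the offsets.
    """
    n_traces, length = traces.shape
    shifts = rng.integers(0, max_shift + 1, size=n_traces)
    out_len = length - max_shift
    shifted = np.stack([trace[s:s + out_len] for trace, s in zip(traces, shifts)])
    return shifted, shifts

rng = np.random.default_rng(5)
aligned = np.arange(40, dtype=np.float64).reshape(4, 10)   # toy aligned traces
desync, shifts = simulate_desync(aligned, max_shift=3, rng=rng)
```

Calling the function with `max_shift=50` or `max_shift=100` on aligned traces would produce datasets analogous to the desync50 and desync100 cases.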

Experimental Results on AES-M-Sync
We first investigated the power traces of AES-M-sync, as shown in Figure 1, and then selected the target time intervals, as shown in Figure 1c. We then transformed the concatenation of the two time intervals for each trace by TFA and TFAwPS, as shown in Figure 2a,b.
Unlike a first-order CPA with raw traces, we succeeded at guessing the correct key using CPA against waveforms that were transformed by both TFA and TFAwPS. This result implies that a transformation based on FFT and absolute value is valid for analyzing first-order masking countermeasures.
We investigated the change in guessing entropy for a varying number of traces in three cases: basic CPA with raw traces, TFA, and TFAwPS. Figure 3a,b presents the best cases of TFA and TFAwPS, respectively, while Figure 4a,b presents the worst cases for TFA and TFAwPS, respectively. These results show that TFAwPS performed better than TFA and CPA. Even in the best TFA case, TFAwPS and TFA demonstrate a similar performance.

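Table 1. The comparison of results for TFA and TFAwPS against AES-M-sync.

| Metric | TFA | TFAwPS |
|---|---|---|
| Average max peak of absolute correlation coefficients | 0.036785 | 0.042720 |
| Average confidence (1st max peak/2nd max peak) | 1.591291 | 1.840527 |
| The number of found key bytes | 16 | 16 |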
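The peak and confidence figures reported in the table can be computed along the following lines. This sketch assumes that the "1st max peak" and "2nd max peak" are the largest and second-largest absolute correlation peaks across key guesses; that reading, the function name, and the toy correlation values are assumptions.

```python
import numpy as np

def peak_metrics(corr_by_key):
    """Best key guess, its peak, and the ratio of the best to the second-best peak.

    corr_by_key: (n_keys, n_samples) array of correlation traces, one row per guess.
    """
    peaks = np.abs(corr_by_key).max(axis=1)   # one absolute peak per key guess
    ranked = np.sort(peaks)[::-1]             # peaks in descending order
    best_guess = int(np.argmax(peaks))
    return best_guess, float(ranked[0]), float(ranked[0] / ranked[1])

# Toy correlation traces for three key guesses (made-up values).
corr = np.array([[0.01, -0.02],
                 [0.04, 0.01],
                 [-0.03, 0.00]])
guess, max_peak, confidence = peak_metrics(corr)
```

A confidence value close to 1 means that the best guess is barely distinguishable from the runner-up, so larger ratios indicate a more reliable key recovery.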

Experimental Results on AES-M-Desync50
We conducted experiments on AES-M-desync50. Figure 5a presents the traces for AES-M-desync50. As in the previous section, we transformed the concatenation of the two time intervals for each trace using TFA and TFAwPS. As shown in Figure 5b, each transformation by TFA and TFAwPS generates a very similar waveform because the transformation is based on the FFT.
Likewise, we investigated the change in guessing entropy for a varying number of traces in three cases: a basic CPA with raw traces, TFA, and TFAwPS. Figure 6a,b presents the best cases for TFA and TFAwPS, respectively, while Figure 7a,b presents the worst cases for TFA and TFAwPS, respectively. In the best case, TFAwPS performs better than TFA, and in the other cases, even the worst case, TFAwPS shows a similar performance against AES-M-desync50. Table 2 shows that TFAwPS found one more key byte than TFA did.
However, the results on all bytes, which are provided in Table A1 in Appendix A, show that TFA and TFAwPS are mutually complementary with respect to which key bytes each finds. We thus conclude that both analyses should be performed together for this dataset.

Experimental Results on AES-M-Desync100
We also experimented on AES-M-desync100. Figure 8a presents the traces for AES-M-desync100. As in the previous section, we transformed the concatenation of the two time intervals for each trace using TFA and TFAwPS. As shown in Figure 8b, each transformation by TFA and TFAwPS also generates a very similar waveform because the transformation is based on the FFT.
We investigated the change in guessing entropy for a varying number of traces in three cases: a basic CPA with raw traces, TFA, and TFAwPS. Figure 9 presents the best cases for TFA and TFAwPS, while Figure 10a,b presents the worst cases for TFA and TFAwPS, respectively. From the best and worst results, we recognize that TFA and TFAwPS achieve similar performances. Due to the use of fixed time intervals and the severe random delay, TFA and TFAwPS succeeded at finding only four key bytes (see Table 3).

Table 2. The comparison of results for TFA and TFAwPS against AES-M-desync50.

| Metric | TFA | TFAwPS |
|---|---|---|
| Average max peak of absolute correlation coefficients | 0.028241 | 0.027701 |
| Average confidence (1st max peak/2nd max peak) | 1.234980 | 1.200778 |
| The number of found key bytes | 12 | 13 |

Table 3. The comparison of results for TFA and TFAwPS against AES-M-desync100.

| Metric | TFA | TFAwPS |
|---|---|---|
| Average max peak of absolute correlation coefficients | 0.024567 | 0.025794 |
| Average confidence (1st max peak/2nd max peak) | 1.083455 | 1.123166 |
| The number of found key bytes | 4 | 4 |

This poor performance is related to the efficiency of the analysis; there is a tradeoff between performance and efficiency. If we choose longer time interval windows, we may obtain better or worse results because additional noisy time samples are included. The attacker must choose appropriate time intervals for a successful analysis. Nevertheless, as with the results in the previous section, TFA and TFAwPS are mutually complementary in terms of found key bytes (see Table A1 in Appendix A).

Experimental Results on ASCAD Dataset
ASCAD is a well-known public dataset in the SCA domain for the reproducible study of profiling SCA. It includes a set of electromagnetic (EM) traces obtained by measuring the emissions of an 8-bit MCU (ATMEGA8515) on which a first-order masked AES operates [43]. The target of leakage is the 3rd SubByte in the 1st round of AES. ASCAD has three kinds of trace cases: ASCAD-sync, ASCAD-desync50, and ASCAD-desync100, as shown in Figure 11. The original traces are well aligned and are used as the ASCAD-sync case. ASCAD-desync50 and ASCAD-desync100 are made by simulating uniform random shifting within 50 and 100 points, respectively, from the original synchronized traces. ASCAD includes 50,000 and 10,000 traces in each case for the profiling and attack phases, respectively, since ASCAD is a dataset for benchmarking profiling attacks. We used all 60,000 traces together, including profiling traces and attack traces, because we only investigated TFA and TFA with zero-mean preprocessing.
Just as in the previous section, we conducted the same preprocessing and analysis methodologies on these datasets, employing the basic CPA, TFA, and TFAwPS against the ASCAD datasets. We present the results against each dataset according to varying the number of traces in Figures 12-14. Note that TFA fails to find the key byte in all cases except for ASCAD-desync100. We cannot hypothesize why TFA can find the key against ASCAD-desync100. However, all results show that TFAwPS always succeeds in finding the key byte, and that TFAwPS outperforms TFA. Figure 15 illustrates the average guessing entropy on all ASCAD datasets per the number of traces. From these results, we again easily confirm that TFAwPS outperforms TFA for all ASCAD datasets. Therefore, all of the results on ASCAD datasets clearly support our claim.

Conclusions
We explored Belgarric et al.'s TFA. While their TFA can be used for SO-CPA with roughly chosen time intervals of leakages and has the potential to scale to higher-order attacks, it still performs poorly on some datasets. Our mathematical analysis of TFA shows that zero-mean preprocessing yields a more suitable leakage term when the constant term of the leakage is not zero, and an equivalent leakage term otherwise; we therefore claim that zero-mean preprocessing can enhance TFA performance. We conducted experiments and provided results supporting this claim. In the case of synchronized datasets, TFA with zero-mean preprocessing achieves a better result than TFA without preprocessing. In addition, in the case of desynchronized datasets, TFA with zero-mean preprocessing and TFA without preprocessing produce mutually complementary results. Such complementary results might be caused by a non-zero constant term. However, it is difficult to determine whether the constant term is zero or not in desynchronization situations. Therefore, based on both the mathematical analysis and our experimental results, we recommend using TFA both with and without zero-mean preprocessing for datasets in which masking and desynchronization countermeasures, such as random delay, are applied concurrently.