Side-Channel Evaluation Methodology on Software



Introduction
Many papers deal with side-channel attacks ([1][2][3][4]). However, digital systems have become so complex that one cannot speak of only one side-channel, but of many, such as: protocol-level leakage (e.g., error rates in post-quantum cryptography (PQC), code-based), cache attacks, simple power analysis (SPA), differential power analysis (DPA), etc. They are generally addressed individually. For example, a scientific paper will show how to best break one given countermeasure using one attack, in a precise context. However, seldom is there a big picture of evaluating a fully-fledged implementation end-to-end. Now, from a designer's or an evaluator's standpoint, the goal is to get rid of all the leakages, and/or to have full coverage. In practice, leakage detection comes in two flavors: 1. Formal verification of the absence of leakage. The analysis outcome is binary, and therefore unambiguous to interpret. 2. Trace-driven leakage detection. In this case, the tests feature true/false positive/negative probabilities. Detection metrics shall therefore be analyzed in great detail, as, for instance, dramatized in [5].
One apparent drawback of this approach is that once the control flow has been balanced, the traces are already well aligned for subsequent statistical analysis of the data leakage. However, this issue shall rather be considered an advantage: from the developer's standpoint, the burden of trace alignment is relieved, and the developer can therefore focus on the real activity, namely, leakage analysis, in ideal conditions. From the lab evaluator's standpoint, the alignment is indeed an issue, but it is not the core protection: resynchronization techniques do exist (cross-correlation, dynamic time warping (DTW), etc.), and/or some analyses are invariant to time offsets (frequency-domain analysis, convolutional neural networks (CNN) [7], recurrent neural networks with connectionist temporal classification loss [8], etc.). Therefore, in the rest of the paper, we assume a progression of the analysis in two steps: horizontal and then vertical.

Illustrations
We illustrate the article briefly with symmetric cryptography (AES), and in more detail with asymmetric cryptography (RSA). The RSA implementation is protected with masking, namely, exponent blinding and base blinding (illustrated in Figure 1). Figure 1. Top: insecure RSA. Bottom: plaintext masking of RSA, built on top of the insecure RSA (note that this countermeasure was initially proposed against timing attacks [9], at a time when vertical attacks were not known; however, it fits the purpose of protecting against such attacks).

Contributions
As our contributions in this paper, we propose: • A methodology to analyze leakages based on a partitioning of the studied algorithm; • Symbolic (in a whitebox context) and dynamic (in a blackbox context) horizontal leakage detection, and their repair (a topic that is seldom addressed); • A new strategy for vertical leakage detection (in the case of aligned traces), which does not require setting sensitive variables to some constant value. Additionally, this strategy leverages a two-step algorithm which first selects the points of interest (those that depend on the key), and second checks whether they are properly masked.
This paper basically revisits comprehensive side-channel analysis of software implementations, showing how to detect, diagnose and then repair leakages in an interactive manner. Typically, we show that the number of fixes to apply to the mbedTLS RSA implementation is such that the final performance overhead is about +40% in clock cycles. Only after these fixes are applied can vertical analyses follow (indeed, vertical analyses assume that traces are aligned). When traces are not aligned, blinding is ineffective. In this respect, the novel method we put forward allows one to detect sensitive samples which are unmasked, with an algorithm that is universal, in that it does not require the tester to set input parameters to some arbitrary constant values (which is the state of the art in ISO/IEC 17825).

Related Works
There have been some works in this direction. In [2], the authors explain how it is possible to automatically fix detected timing and cache-timing vulnerabilities in order to reach a constant time implementation of the code-under-test through a series of transformations that operate on the basic blocks. This approach seems interesting; however, the sensitivity propagation method would inevitably catch false positives, for which fixes will be automatically deployed and will add unnecessary overhead to the code.
In [3], the authors present a tool, which they call SLEAK, whose goal is to automate the analysis against side-channel attack (SCA) vulnerabilities of software implementations. They present a case study on a symmetric algorithm (AES) against vertical attacks. The paper, however, does not address the constant-time feature of the algorithm under test or how to deal with non-constant time implementations (which is challenging, e.g., for the implementations that use shuffling countermeasures). Besides, the presented approach is based on iterations that consider leakages related to each bit of the secret. This may decrease the performance of the evaluation.
In [4], the authors present the "DATA" framework, whose goal is to detect attacks that exploit caches, DRAM and branch prediction. Their approach consists of recording address access patterns in software runs with different inputs, and performing a differential analysis in order to find dependencies on the secret.

Scope of This Paper
The goal of this paper is to present an extensive methodology to evaluate cryptographic software against horizontal and vertical attacks. Those are threats to software that is designed to conceal secrets. Notice that we aim to detect vulnerabilities in such a way that the implementation can be fixed. Therefore, we consider an iterative approach whereby the evaluation results allow one to fix the identified vulnerabilities. We are not interested in attacks per se, but rather in a methodology to either pinpoint issues or prove that the software is free from flaws.

Assumptions
Firstly, we assume that the studied code is correct, i.e., that it contains no bugs. For instance, OpenSSL has several CVEs (common vulnerabilities and exposures), including buffer overflows, etc. (327 CVEs are found at https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=openssl). Even post-quantum cryptography is prone to bugs, such as the underflow in the BIKE decapsulation algorithm.
Secondly, we assume that the secrets to be protected are the inputs to the algorithm, and not the outputs. In some (rare) cases, the secrets are the outputs, which makes the analysis more complex (the tainting especially cannot be achieved). Examples are the key generation algorithms, or even the encapsulation algorithms, which yield a so-called shared secret.

Methodology
The search for vulnerabilities unfolds in three steps: 1. Partitioning of the inputs into four classes: user inputs, algorithm constants, private keys and randomness. 2. Identification of horizontal leakages, which enables their correction. 3. Identification of vertical leakages, which enables their correction.

Input Partitioning
The inputs of any cryptographic algorithm can be classified into four categories, depending on whether they are public or private, and whether they are fixed or variable. The taxonomy is provided in Table 1. Let us notice that the randomness (or masks) can be part of the algorithm's specification, as in digital signatures (which shall not yield the same signature twice, even when signing the same message twice). However, the randomness can also be a means to implement the algorithm in an unpredictable way, so that the attacker fails to correlate a (secret-dependent) model to internal values. Such randomness is also referred to as random masking or blinding. It is usually considered only marginally effective against horizontal attacks (which are better thwarted by balancing the operations). On the contrary, masking is the preferred technique to protect against vertical attacks, where intermediate values shall be protected.
Examples of parameters are provided in Table 2.

Description of the Leak
Horizontal leakage consists of temporal variations which can be monitored while the algorithm is running. The observation can be external, e.g., by monitoring the time or even the power profile. Alternatively, it can be internal, by checking whether a line of cache is required by the (victim) cryptographic program, which can be asserted by concurrently trying to access the same line of cache.
The time taken by the attacker's probe depends upon whether the victim is actually using that cache line or not. Notice that cache-related attacks are preferably executed on a platform with an operating system, since the attacker can deploy, in parallel, one or several processes to probe the shared cache (at least its timing behavior). However, this situation is not required. Indeed, the attacker can measure externally whether the cryptographic code evicts itself (or not) while trying to load far (or close) data/code, relative to the current position.
There are two reasons for horizontal leakage: conditional code and conditional data access (read or write). Control-flow (resp. data-flow) leakage can be prevented by disabling the instruction (resp. data) cache. Indeed, without cache, there is no longer any observable hit/miss pattern in terms of time, nor is there the possibility for an attacker to flush lines of cache to test the time it takes to access any address.

Identification of the Leak
In a whitebox scenario, leaks are identified by a traversal of the source code's abstract syntax tree (AST). The AST vertices are tainted: instructions are termed sensitive if they manipulate a sensitive variable s, that is, a variable which depends on some secret k.
The tainting algorithm analyzes only a dependence relationship, but it can be refined to analyze values. For instance, when a sensitive variable is overwritten with a constant or a non-sensitive variable, it is no longer sensitive. Some user-level annotations can also help, for instance, removing the sensitivity from a variable which is the output of a hash function, since there is no way (computationally speaking) to recover a preimage. Still, such a variable equal to a hash value shall remain sensitive if the attack can be perpetrated knowing only the hash value (it is then useless for the attacker to know the preimage). Such a situation occurs while analyzing HMAC (hashed message authentication code) functions [13].
Vulnerabilities are merely identified as the encounter of a sensitive variable s with: • A conditional instruction, such as if(s), or • A conditional indirection, such as tab[s].
In the context of blackbox analysis, the binary code is exercised under a debugger (GNU Debugger or GDB in our case) under constants p, n and m, but varying k. Then, the detection occurs as follows: • A conditional instruction is revealed by varying the instruction pointer; • A conditional indirection is revealed by varying the address in an indirect load or store operation.
We refer to this as the GDB methodology. Notice that this methodology is similar to one already employed by practitioners using valgrind. In the valgrind methodology, instead of varying k, it is left uninitialized, which is possible in the C language. In practice, a compiler might happen to zero-initialize k, but the C standard leaves the value of such a variable undefined. The code is not executed in the nominal environment but is handed over to valgrind. The tool specifically tags uninitialized variables, and precisely reports a warning ("Conditional jump or move depends on uninitialised value(s)") upon any branch whose condition is uninitialized. Thus, assuming that the only uninitialized variables in the code are the secrets k intentionally left unset, the warnings reported by valgrind exactly coincide with those emitted by the proposed GDB methodology.

AES
It is well-known that the vanilla AES exhibits both control-flow and data-flow leakages. Namely: • The xtime function contains an if(s) statement on the MSB (most significant bit) of the output of SubBytes. This leak is traditionally fixed by replacing the test with a Boolean selection; • The SubBytes look-up can be resolved by an exhaustive access.
Vulnerable code and repaired code (constant-time code) are demonstrated in Algorithms A1 and A2 (Appendix A).

RSA
Two variants of RSA are shown in Table 2: either with or without CRT (Chinese remainder theorem). We note that with more secret variables, more vulnerabilities are found.
The initial list of vulnerabilities is fairly large, as shown in Figure 2 (using a representation like the one already introduced in [14]). The leakage graph in the figure reads as follows. The entry point is the function represented at the top of the tree. Internal sub-function calls are indicated as ovals below it. The annotations on the edges represent the propagation path of the master secret k to the sensitive variable s which triggers the non-constant-time issue. The rectangular boxes contain all the vulnerable lines of code within one function. The static analysis tool allows one to detect lines of code that might leak horizontally and might thus be exploited by SPA or cache-timing attacks. Fixing those vulnerabilities results in a constant-time implementation.
In order to illustrate this, we take the example of an RSA signature in which the target security-sensitive function is the modular exponentiation. Tagging the exponent (respectively, the base) uncovers potential vulnerabilities that induce a non-constant-time behavior of the modular exponentiation depending on the exponent (respectively, the base). The non-constant-time vulnerabilities revealed by the static analysis tool applied to the sliding-window modular exponentiation of mbedTLS for the exponent are illustrated in Figure 3 and are summarized as follows: • V1: Conditional branches that depend on the length of the exponent. The exponent length is used to compute the width of the window used in the computation. • V2: A conditional branch (while loop) that depends upon the length of the exponent. • V3: Conditional branches that depend upon the ith bit of the exponent in order to skip its leading zeros (leading zeros do not have to be processed). This optimizes the execution time of the modular exponentiation by simply skipping all MSBs set to zero. • V4: Conditional branches that depend upon the ith bit of the exponent in order to slide the window and always start the window with an MSB set to one. This approach makes two optimizations available: a faster execution of the modular exponentiation, and a cheaper precomputation of the windows (since a window always starts with an MSB set to one, only half of the windows need to be precomputed). • V5: A table access that depends upon the window value. • V6: A conditional branch in the processing of the remaining (out-of-window) bits in a square-and-multiply fashion. In order to make the mbedTLS implementation of the modular exponentiation constant time, we address these vulnerabilities one by one (in the above order).
• Fix for V1 and V2: In order to make the modular exponentiation constant time, the solution is to fix the length of the exponent. • Fix for V3 and V4: The solution for V3 and V4 is to process the leading zeros and the sliding of the window, and to process all windows in the same way. This amounts to dropping the aforementioned optimizations (which make the implementation non-constant-time) in favor of a fixed exponent length and a fixed-window implementation. The costs of the fix are the time spent processing windows that do not impact the final result, and the precomputation of all windows (as a window value no longer necessarily begins with an MSB equal to one). • Fix for V5: In order to make the table access indistinguishable to an attacker, the solution is to access all the elements and keep only the desired one. This comes at a high cost, as one has to access all precomputed windows before performing the multiplication. • Fix for V6: The square-and-multiply algorithm is used to deal with the remaining bits. From one SPA trace, an attacker can deduce the value of fewer than wsize bits of the exponent. Knowing only wsize bits of the exponent is not critical. However, if the attacker repeats the attack many times, he will collect a set of n equations that give some information about the secret. With exponent blinding, the ith blinded exponent is d + r_i · φ(N), so the relations are of the form k_i = (d + r_i · φ(N)) mod 2^{w_i}, where r_i is the unknown random value (of 224 bits) and k_i is the exponent part recovered at the step where the remaining bits are processed. To the best of our knowledge, no algebraic attack has been published in this sense.
Not all those vulnerabilities are reached the same number of times when the code is executed dynamically. A count of all occurrences, obtained by a concrete evaluation under a debugger, is represented in Figure 4. An analysis of the vulnerabilities has been conducted, and the results are summarized in Table 3. Our selected repair methods are also indicated. They can be divided into two categories: 1. Automatable countermeasures, which apply a stereotyped strategy (here: Boolean selection); 2. Non-automatable countermeasures, which require an algorithmic change (contrast Algorithm A5 with Algorithm A6).
The asterisk (*) in Table 3 indicates that this is not our preferred option. Indeed, if the code still searches for the position of the secret exponent's MSB, then the traces are not aligned for the subsequent vertical leakage analysis. Therefore, we opt to make the exponentiation fixed/constant-time. Regarding the carries, the repair options listed in Table 3 are to remove them, to protect them by Boolean selection, or to process them in constant time using assembly language instructions. In Table 3, (*) and (**) indicate non-preferred options.
The double asterisk (**) in Table 3 indicates that the test is simply removed. Indeed, it serves to determine the sign of the result in the case where the input is negative. Now, in RSA, all the computations can be carried out on positive numbers; hence, the elimination of the test is harmless.
Alternatively, keeping the test would also have been fine, as our testbenches never call RSA on a negative message (i.e., base).
The vulnerabilities listed in Table 3 are classified as pertaining either to "exponentiation" or to "arithmetic". Big-number computation is indeed structured as a stack, where the exponentiation is built on top of some basic arithmetic operations. The leaks occurring in the exponentiation are the most straightforward for the attacker to flag (with attacks including SPA, machine learning, etc.). Attacks at the arithmetic level are more complex and require a precise analysis of the underlying mathematics. The leakage in the arithmetic code is only indirectly linked to the leakage of the secret exponent, although some exploits are known, such as the extra-reduction [18]. Still, it is an open problem whether the number of carries in a multiplication allows one to recover information about the secret exponent.

Performance
In this section, we study the impact of fixing the non-constant-time vulnerabilities on performance. It is illustrated in Table 4. Table 4. Impacts of fixing vulnerabilities on the mbedTLS modular exponentiation implementation, measured in mean clock cycles (10,000 runs).

In Table 4, the impact on the execution time of the modular exponentiation is shown (in mean clock cycles). The impact was measured and compared across six different versions, each implementing more protections than the previous one (F_i refers to a version implementing fixes against vulnerability V_i). The exception is F_1: fixing vulnerability V_1 allows one to skip a call to the function "mbedtls_mpi_bitlen", which results in a faster implementation. All subsequent versions of the modular exponentiation were more time consuming. This came as no surprise, as fixing some of the vulnerabilities (e.g., V_5 and V_6) requires useless accesses to some elements of a table (for which we observe the largest impact).

Description of the Leak
Vertical leakages are exploited by side-channel attacks which attempt to collect information about the internal values of the software code. Such side-channels are typically power or electromagnetic traces. However, for the sake of whitebox analysis, they can also consist of any execution trace obtained by simulation.
Implicitly, they assume that traces are well aligned, in time, so that statistics about the values can be collected in a consistent manner. This assumption was fulfilled as, in the previous section, we exposed not only the vulnerabilities but also the means to plug them.

Identification of the Leak
We leverage the two-step algorithm described below (Algorithm 1). As an example, on RSA, it outputs: the number of sensitive samples among the T time samples, and the leakage as detected by a t-test, or even an improved t-test, as per [19].
The leakage is denoted as l_v[t], where v is the leaking resource and t is the time index (1 ≤ t ≤ T). Some examples of leakage functions are depicted in Figure 5: • At the lowest possible level, namely the hardware level, the quantum of information is the bit. Bits are carried either by a memory element (such as a register) or by a logic gate (termed a combinational resource). • At the software level, the information is represented by values in registers, which consist of fixed-size arrays of bits (e.g., registers typically named ax, bx, cx, dx, etc.). • At the concrete level, a leakage can only be captured by a probe, and consists of a real-valued signal, typically sampled by an ADC (analog-to-digital converter) such as an oscilloscope. For the sake of leakage detection, one needs a correlation function (Corr) which correlates bits, words or real values (depending on the three situations represented in Figure 5). Notice that words can be exploded into bits; therefore, the first two analyses (termed pre-silicon) follow the same modus operandi.
Algorithm 1 operates in two steps: • Lines 4-6: Selection of sensitive samples. The remaining values depend on the key. After this "collapse", the remaining points are the so-called "points of interest" (PoIs). As an example, in an AES, the computation and the update of the round counter shall be removed from the PoIs. • Lines 7-9: The leakage detection can operate globally on the remaining time samples. However, it is also possible to perform the Corr test for each selected sample t, which allows for an attribution of the leakage (which instruction is leaking, and by how much).
Notice that those two steps could as well be executed in the reverse order: first, all non-masked samples are listed, and second, this list is narrowed down to the samples which, in addition, depend on the key. On secure designs, this final list is empty. Otherwise, it gathers all leaking samples. Interestingly, the two steps consist of non-interference tests, as coined historically by Goguen and Meseguer [20]. A distinctive feature of this vertical leakage detection methodology is that it gets rid of the "constants" required by specific tests (such as the fixed-versus-random t-test): Algorithm 1 does not need to select constants, so we evade this question.
The Corr function can specifically be a t-test (a basic though efficient linear two-class metric). However, the formulation in Algorithm 1 also opens the opportunity to perform the detection in one go across all key-sensitive samples (belonging to the set SensitiveSamples), for instance by leveraging machine learning, which is naturally proficient in handling vectorial datasets. The error condition of Algorithm 1 reads: an unmasked key-dependent leakage v at time t implies a flaw.

Lab Evaluator View
Then, in practice, we must perform the analysis on real traces. The ISO/IEC 17825 standard [21] provides thorough guidelines to mitigate non-invasive attacks on cryptographic modules. It is currently in its update phase, one reason being the questions raised by [19], which aimed at security levels 3 and 4 of the standard. The major concern raised in that challenger article was that the Test Vector Leakage Assessment (TVLA) guidelines provided in the standard target first-order leakage detection as the only required measure for testing against differential side-channel attacks on symmetric-key cryptosystems. Additionally, α (the significance level, or false-positive threshold) is defined in ISO/IEC 17825 to be 0.05, which is significantly higher than the original value implied in the TVLA design by Goodwill et al., namely 0.00001, with a t-value threshold of 4.5.
To gain some evidence for the proposed side-channel evaluation methodology, we performed a simple machine-learning experiment based on binary classification on a trace set of SM4 encryption, for both leaking and protected software implementations. The non-leakage dataset was recorded with a fixed key and a fixed plaintext for half of the recordings (500,000 traces), and with a fixed key and random plaintexts for the other half (fixed-versus-random). An auxiliary study, based on the guidelines provided in [19], was conducted to correct the significance criterion α so as to minimize the false-positive rate over multiple tests in the TVLA-based approach of ISO/IEC 17825; the initial value of α was kept at 0.00001. This study validated the proposed methodology. In the machine-learning approach, we divided the non-leakage dataset into two sets, one with random plaintexts and the other with a fixed plaintext, and then performed supervised learning to classify the two sets. However, the ML classification does not reach sufficient accuracy, even with the full-size dataset, because the effect size calculated on this trace set using the TVLA approach with Bonferroni correction is 0.0068, which falls into the "very small" category of [22]. By contrast, the t-test statistics reach a maximum value of 6.2449 in one iteration out of 100 in the multiple Welch's t-test evaluation when the Bonferroni correction is applied to the significance criterion. Also, the recommendations enacted in that sense [19] take into consideration the attack setup (cf. ISO/IEC 20085-1 [1]).

Discussion
The existing CVEs regarding the mbedTLS implementation mostly target the protocol stack, and not the cryptographic implementation at the algorithm level. The vulnerabilities uncovered in this paper are not new in the sense that they existed before; however, to the best of the authors' knowledge, they have not been reported until now, as CVEs or in any other format, and consequently they have not been formally fixed in recent releases of mbedTLS. This gives more relevance to this work, which can be considered as presenting a new CVE, with a fix, for the algorithm-level implementation of mbedTLS RSA.

Conclusions and Perspectives
We have demonstrated a comprehensive flow for cryptographic software evaluation, mostly in a whitebox context, but also applicable in blackbox scenarios.
Such a method is needed to test the numerous PQC algorithms for leakages, and to do so on an equal footing. Our approach enables this assessment.