Key Bit-Dependent Side-Channel Attacks on Protected Binary Scalar Multiplication †

Abstract: Binary scalar multiplication, the main operation of elliptic curve cryptography, is vulnerable to side-channel analysis, especially analysis that uses power consumption and electromagnetic emission patterns. Various countermeasures have been reported, but they focus on eliminating patterns of conditional branches, statistical characteristics of intermediate values, or data inter-relationships. Even though secret scalar bits are directly loaded during the check phase, countermeasures for this phase have not been considered. In this paper, we therefore show that there is side-channel leakage associated with secret scalar bit values. We experimented with hardware and software implementations. For hardware, the experiments focused on the Montgomery–López–Dahab ladder algorithm protected by scalar randomization, and we could extract secret key bits with a 100% success rate using a single trace. Moreover, our attack did not require sophisticated preprocessing and could defeat existing countermeasures using a single trace. For software, we focused on the key bit identification functions of mbedTLS and OpenSSL. The success rate was over 94%, so a brute-force search could still recover the entire secret scalar. We propose a countermeasure and demonstrate experimentally that it can be effectively applied.


Introduction
Blockchain and fast identity online (FIDO), which are emerging as key technologies of the Fourth Industrial Revolution, authenticate users with the elliptic-curve digital signature algorithm (ECDSA). However, scalar multiplication, which is the core operation of ECDSA, is vulnerable to side-channel analysis (SCA). SCAs were first proposed by Paul Kocher in 1996 [1]; they exploit the physical leakage produced while cryptographic algorithms run on embedded systems. Various side-channel attacks against elliptic-curve cryptography (ECC) have been researched [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]. Among them, power analysis, which uses the power patterns consumed during algorithm operations, is known as the most powerful. Electromagnetic analysis, which uses emitted electromagnetic patterns, is similar to power analysis, but there is a difference in usable side-channel information. Therefore, in this paper, we focus on power analysis.
As SCAs become more powerful, various countermeasures to resist them have been studied [17][18][19][20][21][22]. However, only countermeasures that eliminate patterns of data-dependent conditional branches, statistical characteristics of intermediate values, or data inter-relationships have been studied. No countermeasure has addressed the secure design of the key bit identification phase, even though secret scalar bits are directly loaded during that phase. Since the secret scalar bit value is extracted and stored in a variable, the secret scalar can be exposed if this vulnerability is exploited.
Our Contributions. In this paper, we analyzed the power-consumption properties of the key bit identification phase (throughout the paper we also consider information leakage via electromagnetic emanation) and experimentally showed that attacks based on these properties can recover secret scalar bits. Our proposed attacks require only a single power-consumption or electromagnetic trace. They also do not require any knowledge of input or output values; thus, they can defeat any combination of existing countermeasures. Two implementations (i.e., hardware and software) were targeted, and we could recover secret scalar bits by applying SPA-VI (SPA based on visual inspection) and a k-means clustering algorithm. Among various scalar multiplication algorithms, we focused on binary scalar multiplication algorithms. The first set of experiments is based on a hardware implementation of the Montgomery–López–Dahab ladder algorithm protected by scalar randomization. Experimental results show that the secret scalar bits can be recovered with a 100% success rate using only a single power-consumption or electromagnetic trace. In the second set of experiments, on a software implementation, we targeted algorithms built on the key bit identification functions of mbedTLS and OpenSSL. Here, secret scalar bits could be recovered with a success rate of over 94%. When we attacked the power-consumption trace using the leakage associated with referenced register addresses, the success rate was 100%. We propose two kinds of countermeasures, one each for hardware and software implementations. Their effectiveness is experimentally demonstrated.
Extension. This paper is an extended version of our paper published in ISPEC 2017 [23]. In that paper, we showed key bit-dependent attack results using only a single power-consumption trace. In this paper, we show new key bit-dependent attack results using a single electromagnetic trace and a low-pass filter. Thus, we show four experimental results using a power-consumption trace, a power-consumption trace passed through a low-pass filter, an electromagnetic trace, and an electromagnetic trace passed through a low-pass filter. Measuring electromagnetic traces is not an easy task because it depends heavily on the angle and position of the probe. Moreover, in the case of the hardware implementation, our latest results using electromagnetic traces have a higher success rate than the previous results.
Organization. The rest of this paper is organized as follows. In Section 2, we describe SCAs on scalar multiplication algorithms. In Section 3, we define the leakage properties of the attack targets; in Section 4, we establish the attack framework. Experimental results are described in Section 5. We discuss countermeasures in Section 6, and conclusions are presented in Section 7.

Simple-Power Analysis
Simple-power analysis (SPA) is a method of directly analyzing a secret scalar using only one trace or a few traces collected during cryptographic operations [9]. Because the power-consumption pattern of a cryptographic algorithm depends on the processor instructions being executed, the secret scalar or the instruction executed at a given moment can be inferred from these patterns. For instance, consider a binary scalar multiplication algorithm that always performs a point-doubling operation and performs a point-addition operation only when the secret key bit value is 1. The secret key can be found if the point-doubling and point-addition operations have different power-consumption patterns. That is, as per Figure 1a, this irregular sequence of instructions according to the secret scalar bit (i.e., the data-dependent conditional branch) leads to a serious security problem.
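The SPA argument above can be illustrated with a minimal sketch. It assumes an idealized observer who can tell a point doubling ("D") from a point addition ("A") in the trace; the function names are hypothetical and no curve arithmetic is performed.

```python
def double_and_add_ops(d: int) -> str:
    """Operation sequence of left-to-right double-and-add for scalar d >= 1.

    'D' marks a point doubling, 'A' a point addition. The most significant
    bit only initializes the accumulator and produces no operation.
    """
    bits = bin(d)[2:]
    ops = []
    for b in bits[1:]:
        ops.append("D")        # a doubling is executed for every bit
        if b == "1":
            ops.append("A")    # an addition is executed only for 1-bits
    return "".join(ops)


def recover_scalar(ops: str) -> int:
    """Rebuild the scalar from an observed D/A sequence."""
    bits = "1"                 # the leading bit is 1 by construction
    i = 0
    while i < len(ops):
        # ops[i] is a 'D'; an 'A' directly after it reveals a 1-bit
        if i + 1 < len(ops) and ops[i + 1] == "A":
            bits += "1"
            i += 2
        else:
            bits += "0"
            i += 1
    return int(bits, 2)
```

For example, `recover_scalar(double_and_add_ops(0b1011001))` returns the original scalar, which is why regular algorithms that execute the same operation sequence for every bit are used instead.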

Differential Power Analysis (DPA)
DPA is a statistical analysis method that analyzes multiple power-consumption traces to find the secret scalar [9]. Typically, DPA is based on the fact that power consumption depends on data values being manipulated. To perform DPA, input or output values of cryptographic algorithms have to be known. Similarly, there is an address-bit DPA based on the fact that power consumption depends on the address value of the register that loads or stores data during the operation. Thus, even if an SPA countermeasure [19,21,22], which has a regular power-consumption sequence, as shown in Figure 1b, is applied, it is vulnerable to DPA. To cope with this, randomization techniques that eliminate association between all possible intermediate values and power consumption are generally used [17,18,20].

Sophisticated Power Analysis
SPA- and DPA-resistant countermeasures can be defeated by sophisticated attacks, such as a template attack (TA) [6,10,12] or a collision attack (CA) [7,11,13]. A TA characterizes power-consumption traces by a multivariate normal distribution to build templates, and matches power-consumption leakage to the templates to find a secret scalar value. A CA is a kind of higher-order DPA based on the inter-relationships among intermediate data (i.e., collisions of two intermediate values). So far, no theoretically perfect countermeasures against TAs and CAs have been presented. However, these attacks have the disadvantage that they require precise preprocessing, such as decapsulation, localization, and a multiprobe setup, to obtain a power-consumption trace with a high signal-to-noise ratio [6,7,11,13]. Decapsulation in particular requires physically modifying the target device, and numerous traces are required to build templates.
To thwart previous attacks, various countermeasures that eliminate patterns of data-dependent conditional branches, statistical characteristics of intermediate values, or data inter-relationships have been studied. However, no countermeasure has addressed the secure design of the key bit identification phase, although secret scalar bits are directly loaded during that phase. Since the secret scalar bit value is extracted and stored in a variable, the secret scalar can be exposed if this vulnerability is exploited. Thus, in this paper, we verify that this vulnerability is sufficient to find a secret scalar.

Key Bit Identification Phase
Elliptic-curve scalar multiplication is a method for computing $dP$, where $d$ is a secret scalar and $P$ is a point on an elliptic curve. It is an elementary operation of ECC, so it has been used in numerous PKCs. It basically consists of iterative operations determined according to the $i$-th bit $d_i$ of the secret scalar $d$, where $d$ is a $\lambda$-bit scalar, so $d = (d_{\lambda-1}, d_{\lambda-2}, \cdots, d_1, d_0)_2$ and $0 \le i < \lambda$ [19,21,22,24]. For instance, in the algorithms shown in Figure 2, while performing Steps 2 to 5, the addresses of the registers $R_x$ ($x = 0$ or $1$) to be referenced are determined by the $d_i$ value.
Thus, at the beginning of the $i$-th iterative operation, the $i$-th secret scalar bit value $d_i$ is extracted from the $\lambda$-bit scalar string and stored in a variable. This phase exists in almost all elliptic-curve scalar multiplication algorithms because they are composed of iterative operations based on the value of $d_i$. We define this step, in which the secret scalar bit $d_i$ is extracted at the beginning of each iterative operation, as the key bit identification phase.
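To make the position of the key bit identification phase concrete, the following is a minimal sketch of a generic Montgomery ladder. It is not one of the algorithms of Figure 2: as an assumption for illustration, the elliptic-curve group is replaced by the additive group of integers modulo `n`, so that "point addition" is addition mod `n` and the result can be checked against `d * P % n`. All names are hypothetical.

```python
def montgomery_ladder(d: int, P: int, n: int, bit_len: int) -> int:
    """Compute d*P in the toy group (Z_n, +) with a regular ladder."""
    R = [0, P % n]                        # invariant: R[1] - R[0] == P
    for i in range(bit_len - 1, -1, -1):
        d_i = (d >> i) & 1                # key bit identification phase
        # d_i selects which register is read and written in both steps
        R[1 - d_i] = (R[0] + R[1]) % n    # "point addition"
        R[d_i] = (2 * R[d_i]) % n         # "point doubling"
    return R[0]
```

Every iteration executes one addition and one doubling regardless of `d_i`, so the operation sequence is regular. The line that extracts `d_i` and the register indices derived from it are the parts that depend on the secret bit.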

Key Bit-Dependent Properties
Binary scalar multiplication consists of iterative operations determined according to the $i$-th bit $d_i$ of the secret scalar $d$ (Figure 2). Therefore, there exists a key bit identification phase in which the $i$-th scalar bit value is extracted from the $\lambda$-bit scalar string $d = (d_{\lambda-1}, d_{\lambda-2}, \cdots, d_1, d_0)_2$ and stored in a $d_i$ variable at the beginning of each $i$-th iteration. Thus, power consumption associated with the $d_i$ value occurs. We can categorize these properties according to the Hamming distance (HD) and Hamming weight (HW), which are the main power-consumption models, as follows.

Figure 2 (fragment). Left-to-right algorithm. Input: P is a point on an elliptic curve, [...]

Property 1.
In hardware implementations, power consumption in the key bit identification phase is affected by the Hamming distance between two consecutive bits $d_{i+1}$ and $d_i$, i.e., $d_{i+1} \oplus d_i$ ($0 \le i < \lambda - 1$). Thus, if two consecutive bits are the same, i.e., $d_{i+1} = d_i$, power consumption related to $d_{i+1} \oplus d_i = 0$ occurs. Otherwise, power consumption related to $d_{i+1} \oplus d_i = 1$ occurs.

Property 2.
In software implementations, power consumption in the key bit identification phase is affected by the Hamming weight of $d_i$ ($0 \le i \le \lambda - 1$). Thus, if the value of the $i$-th secret bit is 0, i.e., $d_i = 0$, power consumption related to 0 occurs. Otherwise, power consumption related to 1 occurs.
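As a small worked example of Properties 1 and 2, the sketch below computes the idealized (noise-free) leakage value of each iteration under the two models. The helper names are hypothetical, and bits are listed from the most significant bit $d_{\lambda-1}$ down to $d_0$.

```python
def scalar_bits(d: int, bit_len: int) -> list[int]:
    """Bits of d, most significant first: [d_{λ-1}, ..., d_1, d_0]."""
    return [(d >> i) & 1 for i in range(bit_len - 1, -1, -1)]


def hd_leakage(bits: list[int]) -> list[int]:
    """Property 1: d_{i+1} XOR d_i for each pair of consecutive bits."""
    return [hi ^ lo for hi, lo in zip(bits, bits[1:])]


def hw_leakage(bits: list[int]) -> list[int]:
    """Property 2: the Hamming weight of a single bit is the bit itself."""
    return list(bits)
```

For the 4-bit scalar `0b1011`, the bits are `[1, 0, 1, 1]`, the HD leakage is `[1, 1, 0]`, and the HW leakage is `[1, 0, 1, 1]`. Under either model, an observer who can separate the two leakage values learns the scalar once one bit is fixed.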

Key Bit-Dependent Properties of SPA-Resistant Regular Algorithms
The binary scalar multiplication algorithm (Reference [25], Algorithms 3.26 and 3.27) can be easily broken by SPA. Therefore, various SPA-resistant regular algorithms, as shown in Figure 2, have been used. In regular algorithms, the referenced register addresses $\mathrm{RegAddr}_{d_i}$ differ depending on the $d_i$ value, and these influence power consumption. Since hardware and software have different operating structures, the effect on power consumption differs between them.
In hardware implementations, operations are executed in parallel. Thus, at the same time as the secret scalar bit $d_i$ is extracted at the beginning of each iterative operation, the register address $\mathrm{RegAddr}_{d_i}$ to be referenced is also determined. Because of this characteristic, power consumption when the secret scalar bit $d_i$ is determined is also influenced by the HD between the register addresses used in two successive loops. In software implementations, by contrast, operations are executed sequentially. Hence, the register address $\mathrm{RegAddr}_{d_i}$ to be referenced does not affect power consumption at the same time as the secret scalar value $d_i$. In the following, we describe additional power-consumption properties of SPA-resistant regular algorithms. Note that $\mathrm{RegAddr}_0$ is different from $\mathrm{RegAddr}_1$.

Property 3.
In hardware implementations, power consumption in the key bit identification phase is simultaneously affected by: (a) the Hamming distance between two consecutive bits $d_{i+1}$ and $d_i$, i.e., $d_{i+1} \oplus d_i$ [...]
We can classify power-consumption traces into two groups, $G_0$ and $G_1$, using the properties. $G_0$ includes power-consumption traces when leakage is zero, and $G_1$ includes traces when leakage is nonzero. Once the traces are classified into two groups, we can recover the respective bit $d_i$, since the most significant bit is always 1. We define a study exploiting Properties 1 and 2 as Case Study 1 (Figures 3a,c,

Key Bit-Dependent Attack Framework
In this paper, we consider binary scalar multiplication algorithms that are resistant against SPA and DPA. In particular, we targeted regular algorithms protected by intermediate data randomization. Therefore, we suppose that an attacker is obliged to use a single trace rather than numerous traces. In addition, we assumed that the attacker could distinguish the iterative structure in the traces of regular algorithms. We categorized the attack framework in four steps as follows. Note that we did not consider side-channel atomicity algorithms that are SPA-resistant, since it is impossible to distinguish the starting point of their iterative loop operations.

Preprocessing
The attacker can divide trace $T$ into $\lambda$ subtraces, $O_i$, corresponding to each iteration ($0 \le i \le \lambda - 1$). As shown in Figure 6, trace $T$ is described as a series of $\lambda$ subtraces. Since $\lambda$ iterative operations are performed when the secret scalar is $\lambda$ bits long, we divide trace $T$ into $\lambda$ subtraces and align them.
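The division step can be sketched as follows. This is a simplified illustration that assumes every iteration occupies the same number of samples and that the trace starts at the first iteration; real traces usually need an additional alignment step, which is not shown. The function name is hypothetical.

```python
def split_trace(trace: list[float], num_iterations: int) -> list[list[float]]:
    """Cut one trace into equally long subtraces, one per loop iteration.

    Samples left over at the end (when the length is not a multiple of
    num_iterations) are discarded.
    """
    length = len(trace) // num_iterations
    return [trace[k * length:(k + 1) * length] for k in range(num_iterations)]
```

The subtraces are returned in acquisition order, so for a left-to-right algorithm the first subtrace corresponds to the most significant processed bit.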

Select Points of Interest (PoIs)
If the attacker can use the same device as the target and acquire a trace with a known key, it is easy to find PoIs. The attacker can calculate the sum of squared pairwise t-differences (SOST) [26] of the subtraces classified based on the properties described in Sections 3.1.1 and 3.1.2. Then, the PoIs are the points that have high SOST values. SOST is calculated as follows:
$$\mathrm{SOST} = \sum_{a > b} \left( \frac{m_a - m_b}{\sqrt{\dfrac{\sigma_a^2}{n_a} + \dfrac{\sigma_b^2}{n_b}}} \right)^2$$
where $m_a$ denotes the mean, $\sigma_a$ is the standard deviation, and $n_a$ is the number of elements of group $a$. If it is not possible to use the same device, the attacker must know how the target algorithm is implemented to find PoIs. Moreover, the key bit identification phase must be recognized in the trace. In general, since the $d_i$ value must be decided before each loop operation, the target phase is positioned near the beginning of each subtrace $O_i$. We denote by $p_i$ the PoIs of each subtrace $O_i$ ($0 \le i \le \lambda - 1$).
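A minimal sketch of PoI selection with SOST is given below. It follows the usual pairwise definition of SOST from [26]; as assumptions, it uses the sample variance, expects at least two subtraces per group, and expects that no pair of groups has zero variance at the same point. Function names are hypothetical.

```python
from statistics import mean, variance


def sost(groups: list[list[float]]) -> float:
    """SOST of one time sample, given the sample values of each group."""
    stats = [(mean(g), variance(g), len(g)) for g in groups]
    total = 0.0
    for a in range(len(stats)):
        for b in range(a):
            m_a, v_a, n_a = stats[a]
            m_b, v_b, n_b = stats[b]
            # squared t-difference between group a and group b
            total += (m_a - m_b) ** 2 / (v_a / n_a + v_b / n_b)
    return total


def sost_curve(subtraces: list[list[float]], labels: list[int]) -> list[float]:
    """SOST value at every time sample for subtraces labeled by leakage."""
    classes = sorted(set(labels))
    curve = []
    for t in range(len(subtraces[0])):
        groups = [[s[t] for s, l in zip(subtraces, labels) if l == c]
                  for c in classes]
        curve.append(sost(groups))
    return curve


def select_pois(curve: list[float], count: int) -> list[int]:
    """Indices of the `count` samples with the highest SOST values."""
    return sorted(range(len(curve)), key=lambda t: curve[t], reverse=True)[:count]
```

With a known key, `labels` would hold the leakage value of each subtrace (for example $d_{i+1} \oplus d_i$ under Property 1), and the returned indices are candidate PoIs.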

Classify into Two Groups and Extract Secret Scalar Bits
The attacker can separate the $p_i$ into two groups, $G_0$ and $G_1$, by applying SPA-VI or a clustering algorithm (e.g., k-means, fuzzy k-means, or the EM algorithm [27,28]). Because the most significant bit is always 1, the attacker can set $d_{\lambda-1}$ to 1 and find the respective scalar bit $d_i$ based on the power model and the properties described in Sections 3.1.1 and 3.1.2. For instance, when power consumption complies with the HD model, the attacker can recover the secret scalar bits $d_i$ as follows. If the $d_i$ variable is initialized to zero, loading $d_{\lambda-1} = 1$ changes its value, so the group that contains $p_{\lambda-1}$ can be assumed to be the group whose leakage is nonzero. Consequently, by Property 1, if $p_i$ is in the same group as $p_{\lambda-1}$, then $d_i = d_{i+1} \oplus 1$; otherwise, $d_i = d_{i+1}$ ($0 \le i < \lambda - 1$). Similarly, when power consumption complies with the HW model, the group that includes $p_{\lambda-1}$ indicates that leakage is nonzero, and the other group indicates that leakage is zero. Consequently, if $p_i$ is in the same group as $p_{\lambda-1}$, then $d_i$ is one; otherwise, $d_i$ is zero ($0 \le i < \lambda - 1$).
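The classification and extraction step can be sketched as below. This is an illustration under several assumptions: each subtrace is reduced to one PoI value, values are ordered from $p_{\lambda-1}$ down to $p_0$, a plain one-dimensional 2-means replaces a full k-means implementation, and under the HD model nonzero leakage is read as a bit flip relative to the previous bit, as Property 1 implies, with the bit register starting at zero. Function names are hypothetical.

```python
def two_means(values: list[float], max_iter: int = 100) -> list[int]:
    """Cluster 1-D values into two groups; returns a 0/1 label per value."""
    c0, c1 = min(values), max(values)     # deterministic initial centroids
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        if not g0 or not g1:
            break                         # degenerate case: a single cluster
        n0, n1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (n0, n1) == (c0, c1):
            break                         # centroids stopped moving
        c0, c1 = n0, n1
    return labels


def recover_bits_hw(labels: list[int]) -> list[int]:
    """HW model: same group as the first PoI (known bit 1) means bit 1."""
    ref = labels[0]
    return [1 if l == ref else 0 for l in labels]


def recover_bits_hd(labels: list[int]) -> list[int]:
    """HD model: same group as the first PoI means the bit flipped."""
    ref = labels[0]
    bits, prev = [], 0                    # register assumed to start at 0
    for l in labels:
        bit = prev ^ (1 if l == ref else 0)
        bits.append(bit)
        prev = bit
    return bits
```

Note that a misclassified PoI causes one wrong bit under the HW model but inverts all following bits under the HD model until the next error, which matters when estimating the cost of a brute-force correction.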

Experiment Environments
The first experimental platform is a VHDL implementation on a SASEBO-GII FPGA board, as shown in Figure 7. We measured traces using a Teledyne Lecroy HDO6104A oscilloscope at a sampling rate of 2.5 GS/s. Electromagnetic traces were recorded using a Langer LF-R 400. Additionally, we used a Mini Circuit BLP (low-pass filter) to increase the signal-to-noise ratio. The second experimental platform was a software implementation on an Atmel AVR XMEGA 128D4 microcontroller mounted on a CW-Lite XMEGA target board, as shown in Figure 7. We measured power-consumption traces using the CW-Lite main board at a sampling rate of 29.5 MS/s. Electromagnetic traces were recorded using a Teledyne Lecroy HDO6104A oscilloscope at a sampling rate of 2.5 GS/s, using a Langer LF-U 5 and a Mini Circuit BLP (low-pass filter) to increase the signal-to-noise ratio.

Experimental Results
In this section, we demonstrate that a key bit-dependent attack can extract secret scalar bits using a single trace.

Key Bit-Dependent Power/Electromagnetic Attack on Hardware Implementation
Our target binary scalar multiplication algorithm was the Montgomery–López–Dahab ladder algorithm [24] protected by scalar randomization [18]. Therefore, the attacker is restricted to using a single trace. To attack algorithms operating on the first experimental platform, we focused on Properties 1 and 3, described in Sections 3.1.1 and 3.1.2, respectively. However, in hardware implementations, operations are executed in parallel. Thus, at the same time as the secret scalar bits $d_i$ are extracted at the beginning of each iterative operation, the addresses of the registers $R_x$ ($x = 0$ or $1$) to be referenced are also determined. Thus, there is no SPA-resistant regular algorithm that only satisfies Property 1. Our target was an SPA-resistant regular algorithm. Hence, we modified the code as shown in Figure 8a to identify how much information was present according to Property 1. The code shown in Figure 8b is a general implementation that satisfies Property 3.
Version October 19, 2018 submitted to Journal Not Specified
Preprocessing
Operations for the most significant bit $d_{\lambda-1}$ do not exist in the Montgomery–López–Dahab ladder algorithm, as shown in Algorithm A1 and Appendix A. Accordingly, trace $T$ is composed of $\lambda - 1$ subtraces for a $\lambda$-bit scalar, so we divided trace $T$ into $\lambda - 1$ subtraces $O_i$ and aligned them ($0 \le i \le \lambda - 2$). Figure 9 (top) shows one of the subtraces, consisting of six finite-field multiplications, captured from the first experimental platform.

Select Points of Interest
The key bit identification phase is executed in the second clock cycle of each subtrace of the target algorithm. We also confirmed that the points of the second clock cycle of each subtrace are the PoIs $p_i$, since the SOST value is greatest at these points, as shown in Figure 9 (bottom) ($0 \le i \le \lambda - 2$). When we calculated the SOST value, we classified the PoIs $p_i$ of the subtraces into two groups according to Property 1 (or 3).

Classify into Two Groups and Extract Secret Scalar Bits
(1) When we targeted Case Study 1 and exploited the power-consumption trace, it was impossible to clearly split the PoIs into two groups through SPA-VI, because the two distributions overlap, as shown in Figure 3a. Nevertheless, we could extract secret scalar bits $d_i$ with a 96.75% success rate when we classified the $p_i$ into two groups based on the differences from an average trace. We could also extract secret scalar bits $d_i$ with a 96.74% success rate when we applied the k-means clustering algorithm to classify the $p_i$ into two groups (i.e., 8 errors). Consequently, a brute-force attack to recover the entire secret scalar could be viable, because the error rate is sufficiently small. Therefore, it was confirmed that the key bit-dependent leakage based on Property 1 was sufficiently large to recover the secret scalar bits.
(2) By using the low-pass filter to increase the signal-to-noise ratio, the success rate slightly improved to 97.17% when we applied the k-means clustering algorithm. The PoIs could not be classified into two groups through SPA-VI because the distributions overlapped, as shown in Figure 3c.
[...] Table 1; thus, we could acquire all the secret scalar bits $d_i$. From this result, we noticed that changing the referenced register leaks more significant information than changing the secret scalar bits. Moreover, we demonstrated that we could recover all secret scalar bits based on Property 3 using only one power-consumption trace. We define attacks such as in Steps (1) to (4) as key bit-dependent power attacks (KBPA).
(5) Figure 4a shows the PoIs chosen from electromagnetic subtraces when we targeted Case Study 1. Although it was not easy to clearly divide them into two groups via SPA-VI, we could classify the $p_i$ into two groups with a 100% success rate using the differences from an average trace. The classification success rate based on the k-means clustering algorithm was also 100%; thus, we could find all the secret scalar bits $d_i$ based on Property 1.
(6) Moreover, if we could use the low-pass filter to increase the signal-to-noise ratio, we could extract whole secret scalar bits through SPA-VI as shown in Figure 4c.
(7) Figure 4b shows the PoIs chosen from electromagnetic subtraces when we targeted Case Study 2. Unlike the result of (3, 4), it was not easy to clearly divide them into two groups via SPA-VI. However, it was possible to divide the $p_i$ into two groups based on the differences from an average trace. The classification success rate based on the k-means clustering algorithm was also 100%; therefore, we could find all the secret scalar bits $d_i$ based on Property 3 ($0 \le i \le \lambda - 1$).
(8) Moreover, if we could use the low-pass filter to increase the signal-to-noise ratio, we could extract whole secret scalar bits through SPA-VI as shown in Figure 4d.
To sum up, we showed that we could also recover all secret scalar bits using only one electromagnetic trace. Unlike the key bit-dependent power attack, the electromagnetic attack recovered the secret scalar bits with a 100% success rate even when based on Property 1. We define attacks such as in Steps (5) to (8) as key bit-dependent electromagnetic attacks (KBEA).
diff: clustering based on the difference from an average trace; k-means: clustering using the k-means clustering algorithm.

Key Bit-Dependent Power/Electromagnetic Attack on Software Implementation
In this section, we focus on the key bit identification function of mbedTLS (PolarSSL), shown in Figure 10. mbedTLS is a TLS/SSL library that is widely used in embedded systems. It should be noted that capturing an entire binary scalar multiplication trace using the CW-Lite main board is impossible; thus, we used the modified algorithm shown in Figure 11, based on the function in Figure 10, to identify how much information exists. In Appendix B, we describe the key bit identification function of OpenSSL, shown in Figure A1, and report the experimental results obtained with it. To attack algorithms operating on the second experimental platform, we focused on Properties 2 and 4, described in Sections 3.1.1 and 3.1.2, respectively.

Select Points of Interest
In software implementations, operations are executed sequentially. Hence, unlike in hardware implementations, we targeted two positions. The first came immediately after the & 0x01 operation was performed, as shown in Figure 10. The second was where the register was referenced. The register addresses to be referenced were determined according to the secret scalar bit $d_i$, so there was information associated with $d_i$. Thus, we targeted the point where the register LOAD operation was performed for a long integer operation. Points with high SOST values are located where the key bit identification function is performed (see Figure 12). The second target points were located after the key bit identification function. Here, we chose points with high SOST values as PoIs.
When we calculated SOST values, we classified the PoIs $p_i$ of the subtraces into two groups according to Property 2 (or 4).
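For readers without access to Figure 10, the following sketch shows the general shape of a limb-based key bit identification function that ends in an `& 0x01` operation. It is a Python analogue written for illustration, not the mbedTLS source; the limb width, the little-endian limb order, and all names are assumptions.

```python
LIMB_BITS = 32  # assumed limb width; the real value depends on the platform


def get_bit(limbs: list[int], pos: int) -> int:
    """Return bit `pos` of a big integer stored as little-endian limbs."""
    limb_index = pos // LIMB_BITS
    if limb_index >= len(limbs):
        return 0                                   # beyond the stored limbs
    # shift the wanted bit to position 0, then isolate it
    return (limbs[limb_index] >> (pos % LIMB_BITS)) & 0x01
```

The result of the final mask is exactly the secret bit $d_i$, so under Property 2 the value handled right after this operation has Hamming weight 0 or 1 depending on the key bit. This is the first of the two positions targeted above.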

Classify into Two Groups and Extract Secret Scalar Bits
(1) When we targeted Case Study 1 and exploited the power-consumption trace, we could not clearly split the PoIs into two groups via SPA-VI, because the two distributions overlapped as shown in Figure 5a, so we applied the k-means clustering algorithm to classify the $p_i$ into two groups. Approximately 97.60% of the secret scalar bits $d_i$ could be extracted, as shown in Table 2. There are misclassified bits, but the number of error bits is sufficiently small. Hence, it is possible to recover all secret scalar bits with a brute-force attack. Consequently, we confirmed that the key bit-dependent leakage based on Property 2 was sufficiently large to recover the secret scalar bits.
(2) By using the low-pass filter to increase the signal-to-noise ratio, the success rate slightly improved to 98.24% when we applied the k-means clustering algorithm. The PoIs could not be classified into two groups through SPA-VI because the distributions overlapped, as shown in Figure 5c.
(3, 4) We investigated leakage associated with the referenced register addresses determined according to $d_i$ in Case Study 2. When we exploited the power-consumption trace, the $p_i$ could be divided into two groups through SPA-VI with a 100% success rate; see Figure 5b,d.
(5) Figure 5e shows the PoIs chosen from electromagnetic subtraces when we targeted Case Study 1. They could not be clearly divided into two groups via SPA-VI; thus, the k-means clustering algorithm was needed. The secret scalar bit recovery rate was 94.17%, as shown in Table 2. This was slightly higher (by 0.17%) than the success rate when we divided the $p_i$ into two groups based on the differences from an average trace.
(6) Unlike the result of (2), the PoIs could not be perfectly split into two groups by SPA-VI, as shown in Figure 5f. Thus, we applied the k-means clustering algorithm and could find approximately 95.96% of the secret scalar bits. This was slightly better than the 93.72% success rate when we divided the $p_i$ into two groups based on differences from an average trace.
Here, we demonstrated that single-trace KBPA and KBEA can also defeat binary scalar multiplication algorithms that are resistant against SPA and DPA in software implementations.
diff: clustering based on the difference from an average trace; k-means: clustering using the k-means clustering algorithm.

Countermeasures
We have shown that single-trace KBPA and KBEA could recover whole secret scalar bits. Here, we discuss countermeasures against KBPA and KBEA. We propose two kinds of countermeasures, one each for hardware and software implementations.

Countermeasure for Hardware Implementations
For hardware implementations, we suggest random initialization, which initializes the $d_i$ variable with a random bit before each key bit identification phase, as per Algorithm 1. We verified that the leakage based on Properties 1 and 3 could be efficiently eliminated. The result of the classification of the $p_i$ is shown in Figures 13a,b and 14 (top). The success rate of the attack was approximately 50%, which is the same as randomly guessing each secret scalar bit with a probability of 1/2.

Algorithm 1: ECC Scalar Multiplication (initialized by random bit)
Input: P is a point on an elliptic curve, a λ-bit [...]
[...]
$d_i$ ← random bit
9: end for
10: Return $R_0$
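The effect of random initialization on the Property 1 leakage can be illustrated with a small simulation. This sketch is not Algorithm 1 itself: it models only the noise-free Hamming-distance transition of the bit register and ignores register-address leakage (Property 3). All names are hypothetical.

```python
import random


def hd_unprotected(bits: list[int]) -> list[int]:
    """Register keeps the previous key bit, so the HD is d_{i+1} XOR d_i."""
    prev, out = 0, []
    for b in bits:
        out.append(prev ^ b)
        prev = b
    return out


def hd_random_init(bits: list[int], rng: random.Random) -> list[int]:
    """Register is overwritten with a fresh random bit before each load."""
    out = []
    for b in bits:
        r = rng.getrandbits(1)   # random initialization of the d_i variable
        out.append(r ^ b)        # transition seen when the real bit is loaded
    return out


def match_rate(a: list[int], b: list[int]) -> float:
    """Fraction of positions where two leakage sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Because each observed transition is the key bit XORed with an independent uniform bit, the protected leakage agrees with the unprotected leakage only about half of the time, which matches the roughly 50% attack success rate reported above.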

Countermeasure for Software Implementations
As a countermeasure for software implementations, we propose bit masking to remove the leakage of Properties 2 and 4, as in Algorithm 2. This method is a type of address-bit randomization [29,30]. However, there is an important difference, in that bit masking must be performed before the loop operation begins, as shown in Step 2 of Algorithm 2. The result of the classification of the $p_i$ is shown in Figures 13c,d and 14 (bottom). The success rate of the attack is also approximately 50%, which is the same as randomly guessing each secret scalar bit with a probability of 1/2.

Conclusions
In this paper, we proposed attacks that use the leakage occurring in the key bit identification phase and demonstrated that such attacks can extract secret scalar bits using a single trace without profiling. The attacks can be mounted not only with power-consumption traces, but also with electromagnetic traces. Compared with previous attacks that required sophisticated preprocessing and multiple traces, this represents a significant advantage. There is no need to apply preprocessing, and we could recover all secret scalar bits through SPA-VI. Since the proposed KBPA and KBEA attacks could defeat existing countermeasures, this leads to a very robust attack model. Although we focused on ECC binary scalar multiplication algorithms, our proposed attacks are also applicable to RSA binary modular exponentiation algorithms. We proposed countermeasures and experimentally verified that the leakage was removed.
