An Evaluation of Power Side-Channel Resistance for RNS Secure Logic

In this paper, residue number system (RNS) based logic is proposed as a protection against power side-channel attacks. Every input to RNS logic is encrypted as a share of the original input in the residue domain through modulus values. Most existing countermeasures enhance side-channel privacy by making the power trace statistically indistinguishable. The proposed RNS logic provides cryptographic privacy that also offers side-channel resistance. It also offers side-channel privacy by mapping different input bit values into similar bit encodings for the shares. This property is also captured as a symmetry measure in the paper. This side-channel resistance of the RNS secure logic is evaluated analytically and empirically. An analytical metric is developed to capture the conditional probability of the input bit state given the residue state visible to the adversary, but derived from hidden cryptographic secrets. The transition probability, normalized variance, and Kullback–Leibler (KL) divergence serve as side-channel metrics. The results show that our RNS secure logic provides better resistance against high-order side-channel attacks both in terms of power distribution uniformity and success rates of machine learning (ML)-based power side-channel attacks. We performed SPICE simulations on Montgomery modular multiplication and Arithmetic-style modular multiplication using the FreePDK 45 nm Technology library. The simulation results show that the side-channel security metrics using KL divergence are 0.0204 for Montgomery and 0.0020 for the Arithmetic-style implementation. This means that Arithmetic-style implementation has better side-channel resistance than the Montgomery implementation. In addition, we evaluated the security of the AES encryption with RNS secure logic on a Spartan-6 FPGA Board. Experimental results show that the protected AES circuit offers 79% higher resistance compared to the unprotected AES circuit.


Introduction
Side-channel attacks (SCA) are hardware cryptanalytic techniques used to reveal a secret data value, such as a key embedded into an algorithm by exploiting the implementation vulnerabilities. If two different values for a key or a subkey result in different measurements of a physical attribute, such as power, timing, electromagnetic radiation, or even acoustics, the privacy is lost through this physical leakage.
We differentiate power analysis attacks into two broad classes. When a secret is revealed through a strong correlation between power samples and the secret data value, we consider it to be a loss of side-channel privacy. If the secret is encrypted with a cryptographic technique and is revealed through traditional cryptanalytic techniques, it is labeled as a violation of cryptographic privacy. Most of the known techniques target side-channel privacy. This paper targets both side-channel and cryptographic privacy. Residue number systems (RNS) allow one to create multiple shares of a secret. Each of these shares can be computed independently. The resulting shares can be combined into a single result. This is akin to traditional multiparty computation. RNS enables one form of multiparty computation. Any homomorphic multiparty computation technique can be used within [...] analysis uses higher-order statistical moments to recover the secret value of a cryptographic algorithm [13]. Most of the existing countermeasures are still vulnerable to such higher-order power analysis attacks for two reasons. First, the leakage of intermediate values is distributed over shares, which is the primary SCA mitigation technique rather than masking the share values. Further, these shares utilize a linear function to reconstruct the original data. Hence, it is relatively easy for an adversary to model the leakage of the shared secret implementation. Second, if the shares are processed together with common Vdd and ground pins, the combined power consumption of such a susceptible implementation leaks information about the actual intermediate values. Further, if the secure implementation is still in Boolean space, then the adversary can model the leakage with a hypothetical secret value, along with some additional mask bits, to correlate with the target implementation leakage.
Logic design styles that make power consumption independent of data values with dual-rail logic include Sense Amplifier Based Logic (SABL) [14,15] and Wave Dynamic Differential Logic (WDDL) [16]. Similarly, there are other techniques, such as asynchronous logic design [17], clock randomization [18], and power distribution design through a decoupling unit [19], that hide the data-dependent leakage within the hardware. These design styles offer power side-channel privacy, but not cryptographic privacy. The data is in an open, nonencrypted form. The more robust countermeasure techniques, such as the t-private scheme, provide both power side-channel privacy and limited cryptographic privacy. A cryptographic adversary needs to observe t + 1 shares in order to decrypt the original values. Out of practical considerations, the value of t cannot be very large. This opens up space, within the design space for secure system implementations, for a secure design style that is both power side-channel private and cryptographically private. Our proposed RNS secure logic fills this need. The residue number system is a well-studied number-theoretic system, utilized in computer arithmetic [20] and digital signal processing [21,22] applications to achieve performance gains through parallel computation.

Proposed Approach
In this paper, we discuss a new secure design style based on [23,24]. Our approach is to transform a bit in the Boolean domain into multiple encrypted shares derived from residues in a residue number system. These residue shares exhibit homomorphism for bit-wise operations such as AND and XOR. Our proposed scheme is well suited for a multi-core platform, where an application can exploit parallelism in security-related applications. Each encrypted share can be processed independently in a separate core. There are many variations to the base schema for residue generation, depending on the adversary model and the desired resource overhead. We explore this schema space and present three secure design styles with varying characteristics based on adversarial complexity and resource overhead. We evaluate the resistance of RNS secure circuits against various side-channel adversary models. Further, we implement the RNS secure logic and report its power side-channel resistance through power-uniformity-based metrics and success rates of power side-channel attacks. Side-channel power analysis attacks typically deploy machine learning (ML) classifiers such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and naive Bayes (NB). RNS secure logic exhibits lower success rates for machine learning-based attacks than t-private logic.
The switching uniformity can be evaluated either analytically or through a distance metric such as KL divergence [25]. A natural conclusion seems to be that as switching gets more uniform, or as the KL divergence of the power distribution over various values of the secret decreases, the success rate of power side-channel attacks should go down. However, we have observed that the power side-channel success rate has gone down even with an increase in the KL divergence for power. We speculate that cryptographic privacy, even if it does not directly address power uniformity, thwarts power side-channel attacks. There is an interesting trade-off between power side-channel privacy and cryptographic privacy in minimizing the success rate of a power side-channel adversary, which we explored and discussed in [26] (the cited paper is the conference version of the current work with preliminary results, published in the IEEE 31st International Conference on VLSI Design and 17th International Conference on Embedded Systems, VLSID 2018).
Additionally, we develop an analytical metric for the RNS encoder to quantify the conditional probability of the input bit state given the residue state. We analyze the RNS secure circuit with respect to switching uniformity and propose some enhancement techniques to achieve better uniformity. Further, we investigate the implementation of RNS logic with public and private moduli and study the side-channel resistance of these implementations. We also evaluate the security of our implementation on real power traces captured with a specialized side-channel board. The results confirm that it provides good security against higher-order power side-channel attacks.

Motivation
RNS logic supports distributed computation over multiple shares while simultaneously retaining cryptographic and side-channel privacy. It enhances side-channel security where the computation pertaining to a secret is performed by multiple devices or sensors. Sensor and IoT arrays that monitor or control an environment can benefit from the RNS logic. A computation $C(S, x)$ involving a secret $S$ and a parameter $x$ can be performed on multiple ($k$) sensors or devices as $C(S_i, x)$ with the share $S_i$ for $0 < i \le k$. This hardens the computation $C$ against side-channel leakage.

Paper Organization
This paper is organized as follows. In Section 2, the basic principles of the RNS secure circuit are described. Section 3 discusses the resilience characteristics of proposed techniques with respect to switching uniformity and symmetry property. The adversary models and hybrid schemes for better side-channel resistance are discussed in Section 4. Section 5 presents the practical implementation of different circuits and their results. Finally, Section 6 summarizes and concludes the paper.

Basic Principles
In this section, some basic principles for our approach are discussed. Our proposed scheme maps from the message space to the residue code space. Message space consists of binary values ("0" or "1") and corresponding bit-level operations/gates. Residue code space consists of residue values represented with l-bits. These residues use modulo operations such as modular addition and modular multiplication.
In message space, we use $\oplus$ and $\&$ to denote the logical addition (XOR) and multiplication (AND) operations over $\mathbb{Z}_2$. Similarly, we use $+$ for addition and $\cdot$ for multiplication in residue space over $\mathbb{Z}_n$. A q-bit vector $m = (x_1, x_2, x_3, \ldots, x_q)$, denoted by $x$, represents data in message space, and its equivalent residue code $(X_{1,m}, X_{2,m}, X_{3,m}, \ldots, X_{q,m})$ is denoted by $X$.
RNS secure logic is based on a combination of homomorphic encryption and residue number system. We use homomorphic encryption to create encrypted shares. The binary input values are transformed from message space to residue code space. Additionally, the homomorphism preserves the mathematical integrity of binary message space in the residual value space. An input encoding stage, which need not be on the chip implemented with the RNS secure logic shares, performs the binary message space to residue space conversion. Any computing host can perform this conversion and transmit the residue shares over any link including a network. The binary gates have equivalent modulo operations which are applied over the encrypted shares. Once the results in residue space are computed, they are decoded into the binary space. Once again, decoding need not occur in the secure chip. The residue shares can be transmitted back to a client over a link, where the decoding can be performed. We start by describing the construction of the RNS secret sharing scheme. Our approach comprises three stages, an input encoder, an RNS circuit, and an output decoder.
Input encoder: The homomorphic secret sharing scheme encodes the input message using a function called the input encoder (Enc). The encoder Enc maps each binary input $x$ to an l-bit residue code denoted by $X_{m_i}$, where $m_i$ is the chosen modulus. The choice of modulus plays an important role in recovering the binary output from the residue code space, as described under the output decoder function. The variable $l$ defines the size of the residue space. We first choose an l-bit random value $r_x$ and a modulus $m_i$ from the relatively prime moduli set $M = \{m_1, m_2, m_3, \ldots, m_n\}$, where $m_n$ is equal to $2^l - 1$. The encoding function is the modulo addition of the random value $r_x$ and the binary input $x$ over $m_i$, as given in Equation (1).
$$X_{m_i} = \mathrm{Enc}(x) = (x + r_x) \bmod m_i \qquad (1)$$
The security of the RNS secret shares depends fully on the random value $r_x$ and the modulus $m_i$. Note that without the random value $r_x$, the input binary bit $x$ is exposed in the residue domain. The modulus $m_i$ is typically chosen per chip implementation, whereas the random values $r_x$ are assumed to be refreshed for every instantiation. They are generated by a statistically tested random number generator.
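As a minimal sketch of the encoder described above, the following Python fragment draws a fresh l-bit mask and produces one share per modulus. The function name `rns_encode`, the use of Python's `secrets` module as the random source, and the moduli set {5, 6, 7} for l = 3 are illustrative assumptions, not details taken from the implementation evaluated in this paper.

```python
import secrets

def rns_encode(x, moduli, l):
    """Encode one bit x as residue shares X_{m_i} = (x + r_x) mod m_i.

    Returns the list of shares and the mask r_x (the mask is needed
    later by the decoder and must stay hidden from the adversary).
    """
    if x not in (0, 1):
        raise ValueError("x must be a single bit")
    r_x = secrets.randbits(l)  # fresh l-bit random value for every encoding
    shares = [(x + r_x) % m for m in moduli]
    return shares, r_x

# Hypothetical parameters: l = 3, pairwise co-prime moduli, largest is 2^l - 1.
MODULI = [5, 6, 7]
shares, r_x = rns_encode(1, MODULI, l=3)
print("shares:", shares, "mask:", r_x)
```

The same mask is reused across all moduli within one encoding, and a new mask is drawn on every call, which mirrors the refresh assumption stated above.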

Switching Uniformity
Note that the two main goals of a secure logic family are (1) uniform switching or power distribution, so that it is not data dependent, and (2) the removal of any correlation between intermediate values. The t-private logic achieves both of these goals. In an induction-based proof, the base case establishes that the input encoder output has these properties. Let 1-prob(x) denote the probability that the state of node x is 1. 1-prob(x) is a fairly good indicator of the node's switching probability: 2 · 1-prob(x) · (1 − 1-prob(x)). Note that 1-prob($r_i$) of a random bit is 0.5, since a random bit holds state 1 with probability 1/2. Additionally, when an input bit x with arbitrary 0 ≤ 1-prob(x) ≤ 1 is exclusive-ORed with a random bit $r_i$, 1-prob($x \oplus r_i$) = 0.5. These two facts establish that all of the (t + 1) shares output by an encoder have 1-prob equal to 0.5. The entropy of any two random bits $r_i$ and $r_j$ is 2 bits, since they are not correlated (the distribution of states 00, 01, 10, 11 is uniform). By the same token, the entropy of t random bits is t, establishing the other property. For the inductive step, consider the t-private gate for AND (&). The two incoming vectors $X = (x_0, x_1, \ldots, x_t)$ and $Y = (y_0, y_1, \ldots, y_t)$ have these two properties by the inductive hypothesis. If each row of the shift-and-add multiplication of X and Y forms a share, then 1-prob($x_i y_j \oplus x_i y_k$) is 0.5, given that each share has 1-prob equal to 0.5. However, there is a correlation between rows, which reduces their entropy. In fact, all the shares of X are revealed within a row, along with one share of Y, namely $y_i$, which loses cryptographic privacy. By using an additional random bit per row, the entropy is restored to t + 1. We aim to show similar analytical uniform switching for the secure RNS logic style.
Proof. Let P denote the plaintext in the binary space and X the encrypted share in the residue space, where the residue space is $M_X = \{0, 1, \ldots, m_i - 1\}$ and $X = \mathrm{Enc}_{r_x}(x)$ with the random value $r_x$ uniformly distributed over $M_X$. The statement to prove is that $\Pr[\mathrm{Enc}_{r_x}(x) = X] = 1/\alpha$ for all $X \in M_X$, where $\alpha = |M_X|$. For a fixed input x, the map $r_x \mapsto (x + r_x) \bmod m_i$ is a bijection on $M_X$, so a uniformly distributed $r_x$ yields a uniformly distributed X. Thus, the input encoder function maps the binary input onto the residue code space without any bias. For a given message, the output of the input encoder is equiprobable for the chosen modulus $m_i$. The same encoder function can be used to generate different shares by choosing different moduli $m_i$ with the same random value $r_x$.
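The equiprobability argument can be checked exhaustively for a small modulus. The sketch below follows the assumption used in the proof, namely that the mask is uniform over $\{0, \ldots, m_i - 1\}$; the helper name `encoder_distribution` and the modulus 7 are hypothetical choices made only for illustration.

```python
from collections import Counter
from fractions import Fraction

def encoder_distribution(x, m):
    """Exact distribution of (x + r) mod m when r is uniform over {0, ..., m-1}."""
    counts = Counter((x + r) % m for r in range(m))
    return {X: Fraction(c, m) for X, c in counts.items()}

m_i = 7  # hypothetical modulus
for x in (0, 1):
    dist = encoder_distribution(x, m_i)
    # Every residue should appear with probability 1/m_i for both input bits.
    print(x, sorted(dist.items()))
```

Note that this check covers only the case where the mask range matches the modulus. When the mask is drawn from all l-bit values and then reduced by a smaller modulus, some residues occur more often than others.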
RNS circuit: Our goal is to transform the binary operators, such as AND and XOR, into equivalent residue operators using the composition of modulo multiplication and modulo addition in order to perform the operation securely. We constructed an RNS circuit that computes the residue space equivalent of a Boolean AND as shown in Figure 1. The size of this circuit is independent of the number of shares. It depends only on the modulus size (l).
Consider an AND gate in the binary space with inputs x, y and output z, i.e., $z = x \,\&\, y$. In our model, the Boolean AND operation is performed as the modular multiplication of $X_{m_i}$ and $Y_{m_i}$ over the modulus $m_i$, that is, $Z_{m_i} = (X_{m_i} \cdot Y_{m_i}) \bmod m_i$.
The perfect privacy of our proposed scheme requires that the intermediate values or the output values be uniformly distributed with respect to the moduli $m_i$. This also leads to a uniform switching distribution, with the 1-prob of each output bit of a residue output equal to 0.5.

Theorem 2 (Uniformity). Let f be any modulo function over $m_i$ with inputs $X_{m_i}$, $Y_{m_i}$ and output $Z_{m_i}$. Then, the output $Z_{m_i}$ is uniformly distributed over the residue code space, given that the inputs are generated by an input encoder (Enc).
Output Decoder: Each output share is computed independently for the given input vector for each modulus m i . The output residue code Z is defined as linear congruence to the output of binary value z with respect to modulus m i . To compute the resultant binary output bit, we apply Chinese remainder theorem (CRT) on the output shares obtained from the RNS circuit.
Theorem 3 (Chinese Remainder Theorem). Suppose that $\{m_1, m_2, \ldots, m_k\} \subset M$, where all the elements are pairwise co-prime. Let $Z_{m_1}, Z_{m_2}, \ldots, Z_{m_k}$ be integers. Then the system of congruences $z \equiv Z_{m_i} \pmod{m_i}$ for $1 \le i \le k$ has a unique solution modulo $M = m_1 \times m_2 \times \cdots \times m_k$, which is given by:
$$z = \left( \sum_{i=1}^{k} Z_{m_i} \, M_i \, \left(M_i^{-1} \bmod m_i\right) \right) \bmod M, \qquad M_i = M / m_i .$$
To apply the Chinese remainder theorem, the modulus values $m_i$ used to create the shares have to be relatively prime to each other. Further, in order to remove the mask, the value $e$ has to be subtracted from the output of the CRT, followed by a mod 2 operation. For this example, the value $e$ is calculated as $e = r_x y + x r_y + r_x r_y$.
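The three stages can be put together in a short sketch for a single AND gate: encode both bits, multiply the shares lane by lane, recombine with the CRT, subtract $e$, and reduce mod 2. This is an illustration under stated assumptions rather than the evaluated design. The names `crt` and `secure_and` and the moduli {5, 6, 7} for l = 3 are hypothetical, and the product of the moduli is assumed to exceed the largest possible masked product $(x + r_x)(y + r_y)$ so that the CRT returns it exactly.

```python
import secrets
from math import prod

def crt(residues, moduli):
    """Solve z = residues[i] (mod moduli[i]) for pairwise co-prime moduli."""
    M = prod(moduli)
    z = 0
    for Z_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        z += Z_i * M_i * pow(M_i, -1, m_i)  # modular inverse, Python 3.8+
    return z % M

def secure_and(x, y, moduli, l):
    """Compute x & y through residue shares (sketch, single gate)."""
    # Assumption: prod(moduli) > max (x + r_x)(y + r_y) = 2^l * 2^l.
    assert prod(moduli) > (2 ** l) ** 2
    r_x, r_y = secrets.randbits(l), secrets.randbits(l)
    X = [(x + r_x) % m for m in moduli]                  # input encoder
    Y = [(y + r_y) % m for m in moduli]
    Z = [(a * b) % m for a, b, m in zip(X, Y, moduli)]   # one lane per modulus
    e = r_x * y + x * r_y + r_x * r_y                    # mask term from the text
    return (crt(Z, moduli) - e) % 2                      # output decoder

for x in (0, 1):
    for y in (0, 1):
        print(x, y, secure_and(x, y, [5, 6, 7], l=3))
```

Each lane touches only the shares for its own modulus, and the lanes are combined only inside `crt`, which is the point where the decoder would run off-chip in the setting described above.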
The RNS secret sharing scheme follows a variant of the (k, t, n)-threshold scheme [27]. Our threshold scheme is defined in Definition 1. The RNS secret sharing scheme requires a minimum of 2 shares to decode the resulting residue shares into the binary output. Additionally, the shares chosen for decoding must be computed with moduli that are co-prime.
Definition 1 ((2, k, n)-threshold secret sharing scheme). Let n be an integer, $n \ge 3$, and $3 \le k \le n$. A (2, k, n)-threshold secret sharing scheme is a method for generating shares for x as $P = \{X_{m_1}, X_{m_2}, \ldots, X_{m_n}\}$ such that

• For any $A \subset P$ such that $|A| < 2$, learning the element x should be difficult.

• For any $A \subset P$ such that $|A| = 2$, reconstruction of the element x is possible, given that $\gcd(m_i, m_j) = 1$.

• For any $A \subset P$ such that $|A| \ge k$, reconstruction of the element x becomes easier, given that the moduli of the shares $\{X_{m_i} \mid i \in A\}$ are relatively prime.

RNS Logic Resilience Characteristics
In this section, we discuss the resilience characteristics of our proposed scheme. We first review the more general definition of the masking technique and then we will show how our proposed approach is resilient to side-channel attacks.

Symmetry Property
In RNS secure logic, the encoding scheme converts all the binary inputs to residue space using Equation (1). The random value $r_x$ masks the binary value through the modulo addition operation. Unlike other side-channel countermeasures, the shares are created by modular addition. This has the potential to reveal the relationship between a residue and the corresponding input bit through the distribution of bits within the residue. Over the space of all the hidden parameters, namely the modulus $m_i$ and the random value $r_x$, an input bit 0 maps to a set of residues $R_0$. Similarly, an input bit 1 maps to a set of residues $R_1$. Ideally, the two sets $R_0$ and $R_1$ should not be distinguishable to the adversary. This property, called residue indistinguishability or symmetry, hides the input-to-residue relationship. A sample RNS residue encoding is shown in Table 1 for a 2-bit residue space over all possible $r_x$ and moduli $m_i$. The valid moduli set for 2-bit encoding is {2, 3}. The columns $X_{m_1}$ and $X_{m_2}$ are the outputs of the RNS encoding computed with the modulus values 2 and 3, respectively. Based on the input binary values, the residue output shares are organized into two sets, one for binary "0" and another for binary "1". In Table 1, the blue circle shows the share values for binary "0" and the red circle shows the share values for binary "1". For x = 0, $X_{m_1} \cup X_{m_2}$ contains {00, 01, 10}. Similarly, for x = 1, $X_{m_1} \cup X_{m_2}$ contains {00, 01, 10}. The residue sets for x = 0 and x = 1 therefore contain the same residue values. Given the residue share values, it is difficult for an adversary to infer the binary input value without knowing the random secret $r_x$ and the moduli $m_i$.
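The residue sets described for the 2-bit case can be enumerated directly. The following sketch assumes the encoder $(x + r_x) \bmod m_i$ with $r_x$ ranging over all 2-bit values and the moduli {2, 3} named above; the function name `residue_sets` is a hypothetical helper.

```python
def residue_sets(l, moduli):
    """Map each input bit to the set of l-bit residue encodings it can produce."""
    sets = {0: set(), 1: set()}
    for x in (0, 1):
        for m in moduli:
            for r in range(2 ** l):
                sets[x].add(format((x + r) % m, f"0{l}b"))
    return sets

R = residue_sets(l=2, moduli=[2, 3])
print("R_0 =", sorted(R[0]))
print("R_1 =", sorted(R[1]))
```

Both sets come out as {00, 01, 10}, so the set of observable encodings alone does not separate the two input values.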
We extend this observation into a quantitative measure called symmetry. Symmetry is the probability that the adversary fails to distinguish the input bit state given the residue value distribution. In a realistic attack, the adversary does not have access to the residue values and instead infers a residue state through a power side-channel. Traditionally, these power models are Hamming-weight-driven. Hence, the primary differentiating characteristic between different residues is their Hamming weight difference. The adversary attempts to gain incremental information about the input bit state, 0 or 1, by measuring infinitesimal differences in the average Hamming weight associated with the residues of input bits 0 and 1. If these average Hamming weights are identical, perfect symmetry exists, denying the adversary this information. Equation (2) models this intuition for a fixed modulus value $m_i$. The targeted chip operates with a fixed $m_i$, and hence the uncertainty/averaging space for the adversary comes from the random mask $r_x$. The average Hamming weight distributions are plotted in Figure 2 for various values of the residue size.
where i varies from 1 to M and r is a random value. As reported in [25], we use the KL divergence SCA metric to study the symmetry of residue shares with respect to binary values. We computed the KL divergence metric to find the distance between the two distributions for input bits 0 and 1 for l = 3, 4, 5. Smaller KL divergence values indicate that the 0 and 1 distributions are close to each other, and hence less differentiable and more symmetric. Table 2 reports both the KL divergence and the symmetry values over the residue space sizes. Machine learning offers a powerful model-building technique that lets an adversary correlate the Hamming weight of the residue, as reflected in the measured power trace, with the input bit state. We assess the machine learning classifiers to validate the symmetry metric and to demonstrate that higher symmetry results in lower correlation. These results are also reported in Table 2. The success rate column should be interpreted within the context of a random decision. Since the decision in this context is guessing the input bit state for a given residue Hamming weight, a random coin toss has a success probability of 0.5. Any higher success probability indicates an advantage for machine learning. The key observation is that as symmetry increases, the success rate of ML gets closer to a random guess. Note that we gave an advantage to the ML classifier by having it guess the input bit state from the actual residue state rather than from the Hamming weight of the residue. These results indicate that the cryptographic privacy of the RNS encoding scheme helps mitigate power side-channel attacks.
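To make the KL divergence comparison concrete, the sketch below builds the Hamming-weight distributions of the residues for input bits 0 and 1 and computes the divergence between them. Several choices here are assumptions made for illustration and are not taken from the paper: the modulus is fixed at $2^l - 1$, the mask ranges uniformly over all l-bit values, logarithms are base 2, and a small floor `eps` guards against empty bins. The numbers it prints are therefore not expected to reproduce Table 2.

```python
from collections import Counter
from math import log2

def hw_distribution(x, m, l):
    """Distribution of the Hamming weight of (x + r) mod m over all l-bit masks r."""
    counts = Counter(bin((x + r) % m).count("1") for r in range(2 ** l))
    total = sum(counts.values())
    return [counts.get(w, 0) / total for w in range(l + 1)]

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in bits; q is floored at eps to avoid division by zero."""
    return sum(pi * log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

for l in (3, 4, 5):
    m = 2 ** l - 1  # assumed fixed modulus for this illustration
    p0 = hw_distribution(0, m, l)
    p1 = hw_distribution(1, m, l)
    print(l, kl_divergence(p0, p1))
```

A value close to zero means the two Hamming-weight distributions are hard to tell apart, which is the sense in which smaller KL divergence corresponds to higher symmetry.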

Symmetry in a Software Implementation of RNS
Our proposed encoding scheme is based on homomorphic encryption, which can be used to provide security to cloud-based applications as well. Recall that residue sizes were limited to a small number, such as five, by practical circuit implementation constraints. In software, however, the residue sizes l can be scaled to a large number. We extended our symmetry analysis to software implementations with larger values for l. Due to processor implementation characteristics, l would need to be a multiple of byte size. We experimented with l equal to 16 and 32 bits (2 and 4 bytes). This gave us an asymptotic view of symmetry metric effectiveness.
Computing the symmetry values for l = 16, 32 is tedious and requires $2^l$ iterations based on Equation (3). For l = 5, the symmetry value is already at 0.99. With higher l values, it would converge towards 1, with the error converging to 0. Hence, we did not compute the actual symmetry values. We did, however, apply the three machine learning classifiers (LDA, QDA, and NB) to predict the input binary values from the residue share values. The x-axis in Figure 3 denotes the ratio of the training dataset size to the test dataset size.
Note that a higher ratio should make machine inference converge to a truer success rate. The y-axis captures the success rate of the ML classifier. Once again, the ML classifier has an advantage only if its success rate is better than the random 50%. We expect symmetry to be better with l = 32 than with l = 16. This is reflected in Figure 3 by a tighter band around the 50% success rate line for the l = 32 classifier results compared to the l = 16 classifier results. The differences between the classifiers have to do with their native characteristics. These results indicate that the adversary does not gain an advantage even with model-based attacks.
Over the unknowns r, m, let $f^0_X = |\{(r, m) : \mathrm{Enc}(0, r, m) = X\}|$ for $0 \le r \le 2^l - 1$ and $2 \le m \le 2^l - 1$. The quantity $f^0_X$ gives the frequency with which a 0 is encoded into the residue X for uniformly distributed unknowns r, m. We can similarly define $f^1_X$, the frequency with which a 1 is encoded into the residue X. Ideally, $f^0_X = f^1_X$ for all X. A weaker symmetry allows $f^0_X \ne f^1_X$, but then insists on $f^0_X = f^0_{X'}$ and $f^1_X = f^1_{X'}$ for all residues $X$ and $X'$. However, in reality, there often exist $X, X'$ such that $f^0_X \ne f^0_{X'}$ or $f^1_X \ne f^1_{X'}$. These differences create a skew in the transition probability of residue bits, potentially targetable by an adversary. We investigate the transition probability distribution and discuss in Figure 4 a technique to achieve a more balanced transition probability for RNS secure circuits.
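The frequencies $f^0_X$ and $f^1_X$ can be tabulated by brute force for a small residue size. The sketch below assumes $\mathrm{Enc}(x, r, m) = (x + r) \bmod m$, consistent with the encoder description, and counts over the ranges $0 \le r \le 2^l - 1$ and $2 \le m \le 2^l - 1$ given in the text; the helper name `residue_frequencies` and the choice l = 3 are illustrative.

```python
from collections import Counter

def residue_frequencies(x, l):
    """f^x_X: number of (r, m) pairs with Enc(x, r, m) = X,
    for 0 <= r <= 2^l - 1 and 2 <= m <= 2^l - 1."""
    freq = Counter()
    for m in range(2, 2 ** l):
        for r in range(2 ** l):
            freq[(x + r) % m] += 1
    return freq

l = 3
f0 = residue_frequencies(0, l)
f1 = residue_frequencies(1, l)
for X in range(2 ** l - 1):
    print(f"X={X:0{l}b}  f0={f0[X]}  f1={f1[X]}")
```

Comparing the two columns row by row shows where the ideal condition holds, and comparing entries within one column shows the weaker condition. Small residues are reachable under every modulus while large ones are not, which is one source of the skew discussed above.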

Multi-Lane Computation
In addition to the residue indistinguishability property, our proposed scheme has an interesting characteristic called multi-lane computation. The RNS encoding function creates encrypted shares with respect to the moduli m i . These share values are congruent with each other, which allows the hardware designer to implement separate hardware for each share. This is a unique characteristic of RNS secure circuit.
In ideal scenarios, the secret sharing scheme splits the input data at the primary inputs and combines the primary output shares at the end of the computation. For side-channel countermeasures, a secret sharing scheme such as t-private circuits divides the input data into t + 1 shares. Each original gate is replaced by special gates capable of handling t + 1 shares for each input. An input x is encoded with t random shares $x_0, x_1, \ldots$ In contrast, for RNS secure circuits, the share values support homomorphic computation on each share. We convert a Boolean circuit into its equivalent RNS circuit, where the computation with respect to the shares derived from modulus $m_i$ can proceed in its own computation lane. This gives rise to t independent computation lanes, which need only be combined at the output stage at the end of the computation lanes. Each share can be processed in its own hardware lane, as shown in Figure 4. The RNS secure circuits for t shares are denoted as $C_{m_1}, C_{m_2}, \ldots, C_{m_t}$. The power consumption data captured from each RNS lane are denoted as $P_{m_1}, P_{m_2}, \ldots, P_{m_t}$, respectively. The ith RNS lane takes the inputs $X_i$ and $Y_i$ and generates the residue output $S_i$.
In this setup, the power side-channel adversary attacks each share independently, which we call singular attack mode. For this attack mode, we assume that the adversary controls some of the binary inputs to the circuit before the lane shares are created in an encoder and observes the power consumption for all the relevant lanes. In power analysis, higher-order attacks (HOA) have been shown to be more powerful. A corresponding attack could use the correlations between multiple lanes to extract the mapping from binary inputs to residue shares. There are no existing methods for combining the leakage data of different share computations. Since we do not have a good mathematical model, lane correlation-based attacks are difficult to perform against the RNS secure circuit. In singular attack mode, we partition the Lane i binary primary inputs into $\langle X_i, Y_i \rangle$, where the inputs $\langle X_i \rangle$ are controllable by the adversary, but $\langle Y_i \rangle$ are private. We assume that there are $b_X$ binary input bits in the set $\langle X_i \rangle$ and $b_Y$ binary input bits in the set $\langle Y_i \rangle$, with $N = b_X + b_Y$. The adversary's goal is to retrieve the binary secret using the power leakage $P_{m_i}$ that is captured during the execution of a given function on input data $\langle X_i, Y_i \rangle$ in the ith lane.
The power leakage of the RNS secure circuit computing a function f in Lane i is given in Equation (3).
For a successful attack, $L_{Y_i}(f)$ needs to be significant, which means that there should be a distinct difference between the probability density functions $P(L \mid Y_i = 0)$ and $P(L \mid Y_i = 1)$. However, the symmetry property of the RNS circuit ensures that the distance between these probability density functions is minimal. Recall that each of the N input bits in $\langle X_i, Y_i \rangle$ results in an l-bit residue, which is further exclusive-ORed with an l-bit random word within the encoder. Hence, the unknown search space consists of $N \cdot l$ random bits that are statistically uncorrelated. This increases the unknown search space for the key hypotheses to $2^{N \cdot l}$. RNS secure circuits with different hardware lanes also allow the designer to operate each lane at a different clock frequency, which affects the temporal alignment of the leakages from each lane.
For probing side-channel attacks, the adversary has to probe all the shares of the same value in each lane at the same time. This requires a large amount of resources with high precision, which is currently impractical. Even if the adversary is able to extract the residue shares, inferring the corresponding binary input x is still difficult without knowing the hidden parameters, namely the random value $r_x$ and the modulus $m_i$, as discussed before. In a multicore device, each encrypted share could be processed independently on separate cores with a staggered, unpredictable schedule. Moreover, the power pins associated with different cores are isolated. The adversary will have to observe and capture the leakages of each share/lane/core separately.

Power Side-Channel Adversary
In this section, we will discuss the strength of the RNS secret sharing scheme and introduce hybrid schemes to achieve better resistance against any power side-channel attacks. We first define our basic assumptions for SCA target circuit. We assume that the adversary can control only binary input values. The encrypted shares are not exposed to an adversary which is in line with commonly accepted adversary models for a countermeasure technique. A power analysis attack is a type of a side-channel attack that exploits the leakage obtained in the form of power consumption from the target circuit. Masking techniques are used to randomize power consumption to make sure that the measured leakage is independent of any processed data. RNS secret sharing scheme is also a type of a masking scheme, which uses homomorphic encryption to mask the intermediate values. RNS secure circuits are highly resistant to power analysis because of their resilience characteristics defined in Section 3. More formally, we could describe the strength of RNS secret sharing scheme as follows.
Definition 3 says that the adversary can successfully model the leakage in order to distinguish the intermediate values between 0 and 1. This can be achieved only if the input value strongly determines the intermediate value.

Definition 3 [1]. Let C be a circuit under investigation with secret values $\dot{y}$. The differential power analysis is defined by where D is a function for the key hypotheses.
Our RNS secret sharing scheme applies homomorphic encryption to the input values using the random value r_x and the moduli m_i. By creating encrypted shares, our encoding scheme greatly weakens the control of the input binary values over the intermediate values. Hence, the power analysis adversary is unable to model the leakage D(X_i, y_j) for a successful attack. Additionally, the residue indistinguishability characteristics of the RNS secure circuit approximately equalize the power consumption values C_ẏ(X_i) across all transitions. Hence, the adversary is not able to distinguish the leakages with respect to the output binary level transitions. We believe that the cryptographic privacy of our proposed scheme also makes it difficult to distinguish secret values based on power leakage.
In order to study the power leakage characteristics of our RNS circuit, we computed the switching probability for each output bit of the RNS encoding scheme, as defined in Equation (1) with l = 3. The input signal probabilities were propagated through a gate-level description of the encoding scheme, and the results are plotted in Figure 5. The input values are single-bit binary values, which are exclusive-ORed with the least significant bit of a random value. The carry chain of this computation is built from logical AND gates and propagates through the following bits of the random value up to the most significant bit. The modular function truncates the overflow with respect to a chosen modulus value, which perturbs the uniformity of our scheme. The result shows that the output transition probability of our encoding scheme is skewed by the input signal probability. We found that the modulus reduction weakens the effect of the random value r_x and makes the transition probability biased. Hence, larger circuits are more likely to be vulnerable to power analysis attacks.
To make the transition probability of the RNS circuit unbiased, we introduced a random renewal scheme, as in an AND gate of t-private logic. In the random renewal scheme, we perform a bitwise exclusive-OR between a random value R_{i,j} and the output of the encoder. The indices i and j refer to the input and the circuit stage, respectively. The random values R_{x,j} and R_{y,j} are l bits wide, with each bit distributed independently and uniformly. This makes the output transition probability of the RNS encoder equal to 0.5, that is, unbiased. The modified secure RNS circuit is shown in Figure 6. The random renewal exclusive-OR operation maintains homomorphism over the residue values only with true multiplication. Therefore, no modulus reduction is performed. Once the recovery exclusive-OR operation has been done, the residue values are obtained by modulo reduction with the appropriate modulus value m_i.
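The bias caused by modular reduction, and its removal by random renewal, can be checked by exhaustive enumeration. The sketch below is illustrative only: Equation (1) is not reproduced in this section, so the encoder is assumed to be share = (x + r) mod m with a uniform l-bit r, and m = 7 is a hypothetical choice. It reports the probability that each output bit equals 1, a simpler quantity than the gate-level switching probability plotted in Figure 5.

```python
from fractions import Fraction
from itertools import product

L_BITS = 3      # residue width l used in the text
MODULUS = 7     # hypothetical modulus m_i, chosen only for illustration


def encode(x, r, m=MODULUS):
    """Assumed stand-in for the encoder: add the random value, then reduce."""
    return (x + r) % m


def bit_one_probability(values, bit):
    """Fraction of the values whose given bit is 1."""
    ones = sum((v >> bit) & 1 for v in values)
    return Fraction(ones, len(values))


def encoder_bias(x):
    """Per-bit probability of a 1 at the encoder output for input bit x."""
    shares = [encode(x, r) for r in range(2 ** L_BITS)]
    return [bit_one_probability(shares, b) for b in range(L_BITS)]


def renewed_bias(x):
    """Same measurement after XOR with a fresh uniform l-bit mask R."""
    renewed = [encode(x, r) ^ mask
               for r, mask in product(range(2 ** L_BITS), repeat=2)]
    return [bit_one_probability(renewed, b) for b in range(L_BITS)]


for x in (0, 1):
    print(x, encoder_bias(x), renewed_bias(x))
```

Under these assumptions the per-bit probabilities before renewal depend on x, while after the XOR with a uniform mask every bit is exactly balanced.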
To maintain the unbiased transition probabilities, random renewal techniques should be applied at each stage with independent random values. A hybrid logic family that merges t-private circuits with RNS circuits could have additional advantages. In this hybrid logic, the RNS shares are still created in the usual manner. However, the residue output bits are further encoded for t-private logic. For 2-hybrid logic, each share bit is split into two additional bit shares in the usual 2-private scheme: x, x ⊕ R_x. We used t-private logic gates to implement the equivalent RNS secure circuits, as shown in Figure 7. The t-private logic gate uses the secret random values to maintain the uniform switching property throughout the design. This technique provides higher security against side-channel attacks in terms of both secret search space and randomization of side-channel leakage, as we show experimentally in Section 5.
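For reference, a 2-share private AND gate in the style of the Ishai-Sahai-Wagner construction can be sketched as follows. This is the generic private-circuit gadget for t = 1, not the gate netlist of Figure 7; the share ordering (R, x ⊕ R) and all function names are assumptions made for illustration.

```python
import secrets
from itertools import product


def split(x, r):
    """Split bit x into two shares (r, x XOR r) using mask bit r."""
    return (r, x ^ r)


def recombine(shares):
    """XOR the two shares to recover the bit."""
    return shares[0] ^ shares[1]


def private_and(a, b, z):
    """AND of two shared bits a and b; z must be a fresh random bit."""
    a0, a1 = a
    b0, b1 = b
    # The random bit z masks the cross terms, which are folded in one at a time.
    z10 = (z ^ (a0 & b1)) ^ (a1 & b0)
    c0 = (a0 & b0) ^ z
    c1 = (a1 & b1) ^ z10
    return (c0, c1)


def random_bit():
    """One uniformly random bit from the OS entropy source."""
    return secrets.randbits(1)


# Exhaustive functional check over all inputs and all mask values.
for x, y, rx, ry, z in product((0, 1), repeat=5):
    out = private_and(split(x, rx), split(y, ry), z)
    assert recombine(out) == (x & y)

# Typical use with fresh randomness for every share and every gate.
shared_out = private_and(split(1, random_bit()), split(1, random_bit()), random_bit())
print(recombine(shared_out))
```

The exhaustive loop checks functional correctness only; it says nothing about the power behavior of a physical implementation.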

Results
We implemented an RNS circuit for a Boolean AND gate with l = 3 using the 45 nm FreePDK standard cell library and the Cadence analog simulator (Spectre). We conducted exhaustive simulations over the residue space using OCEAN scripts, and measured the peak current and power consumption of all possible input transitions (2^12). We performed two styles of analysis on the simulated data: one uses a single random value for encoding the input variables, and the other uses different random values for encoding the different input variables. First, we computed the average values for each class of output transition with respect to the binary values. Then, we calculated the coefficient of variation and the Kullback-Leibler divergence for each logic scheme, as shown in Table 3. In addition to the base RNS scheme, we evaluated the SCA metrics for random renewal techniques with separate, per-share random variables. The hybrid scheme that includes t-private logic also uses a separate, per-share random variable in the encoder. With increased randomness due to the random variables in the t-private logic and the additional random variables per encoder, we expected to see a lower standard deviation and KL divergence. Intuitively, increased randomness results in increased uniformity in the switching distribution.
The coefficient of variation is a well-known SCA metric used to quantify the effectiveness of countermeasures. The lower the value, the better the resistance against power analysis attacks. In our scheme, the coefficient of variation is likely to converge towards lower values for larger circuits. The probability density function of the peak current was calculated for each output transition with respect to the binary values, and the results are plotted in Figure 8.
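As a small worked illustration, the coefficient of variation over per-class averages could be computed as below. The use of the population standard deviation and the example numbers are assumptions; the values are hypothetical placeholders, not measurements from Table 3.

```python
import statistics


def coefficient_of_variation(samples):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return statistics.pstdev(samples) / abs(mean)


# Hypothetical average peak currents (arbitrary units), one per class of
# output transition: 0->0, 0->1, 1->0, 1->1.
class_means = [1.02, 0.98, 1.01, 0.99]
print(coefficient_of_variation(class_means))
```

A value near zero means the class averages are nearly identical, which is the behavior a countermeasure aims for.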
Additionally, we computed another SCA metric based on the Kullback-Leibler (KL) divergence, which defines the failure probability of the attack. KL divergence is a measure of how far apart, and hence how distinguishable, two probability distributions are. We compute the KL divergence between all pairs of transitions and take the maximum value to identify the transition pair with the highest deviation. Table 3 indicates that the base RNS scheme is the least SCA-resistant, followed by random renewal, with random renewal with t-private as the most resistant. Even when we use multiple, separate random values per share, the relative SCA resistance follows the same order: base RNS < random renewal < random renewal with t-private. Furthermore, observing Tables 3 and 4 together, each of the schemes, (1) base RNS, (2) random renewal, and (3) random renewal with t-private, shows higher resistance with a per-share random value than with a single shared random value. The total SCA-resistance order among these six schemes appears to be base RNS-single random < random renewal-single random < base RNS-multiple random < random renewal with t-private-single random < random renewal-multiple random < random renewal with t-private-multiple random.
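The maximum pairwise KL divergence described above can be sketched as follows. The histogram binning, the small smoothing floor, and the use of natural logarithms are assumptions made for this illustration, since the text does not specify them.

```python
import math
from itertools import permutations


def histogram(samples, lo, hi, bins, eps=1e-9):
    """Normalized histogram with a small floor so that no bin is exactly zero."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        idx = int((s - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    probs = [c / len(samples) + eps for c in counts]
    norm = sum(probs)
    return [p / norm for p in probs]


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def max_pairwise_kl(distributions):
    """distributions maps a transition label to a probability list.

    Returns the largest D_KL(p || q) over all ordered pairs and that pair.
    """
    best = max(
        permutations(distributions, 2),
        key=lambda pair: kl_divergence(distributions[pair[0]],
                                       distributions[pair[1]]),
    )
    return kl_divergence(distributions[best[0]], distributions[best[1]]), best
```

Because KL divergence is not symmetric, the sketch evaluates both orderings of every pair before taking the maximum.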
As shown in Table 4, the KL divergence SCA metric value is 0.1620 for the random renewal-multiple random scheme, which corresponds to about an 80% failure probability [25], leading to an expected machine learning success rate of 20%. Similarly, the KL divergence SCA metric is 0.0688 for the random renewal with t-private-multiple random scheme. This corresponds to a failure probability of 90%.
In order to validate the KL divergence-driven metric and its anticipated failure probability, we also applied machine learning-based classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and naive Bayes (NB). We recorded the peak current of the RNS secure circuit and its variants for 5000 randomly generated inputs. For each classifier, we split the measured leakage data into a training set and a validation set with a ratio of 4:1. The success rate was then computed for each classifier, and the results are given in Table 5. The random renewal schemes are more resistant than either the base t-private or the base RNS scheme. Generally, the SCA resistance order base t-private < base RNS < random renewal < random renewal with t-private is maintained for all the classifiers, with a few exceptions. In t-private logic, each AND and OR gate requires t additional random variables. This, however, would complicate SPICE simulations significantly. Hence, we ended up using weaker versions of t-private logic in which all AND gates share the same single random variable, and so do all OR gates. This explains some of the unexpected results in Table 5.
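The classification procedure with a 4:1 training/validation split can be illustrated with a minimal single-feature Gaussian naive Bayes classifier. This is a sketch in pure Python, not the tooling behind Table 5; the function names are hypothetical, and the data generator produces synthetic stand-in samples rather than measured peak currents.

```python
import math
import random
import statistics


def train_gaussian_nb(features, labels):
    """Fit one Gaussian (mean, std, prior) per class on a scalar feature."""
    model = {}
    for cls in set(labels):
        xs = [f for f, y in zip(features, labels) if y == cls]
        model[cls] = (statistics.fmean(xs),
                      max(statistics.pstdev(xs), 1e-12),
                      len(xs) / len(labels))
    return model


def predict(model, x):
    """Return the class with the largest log posterior for feature x."""
    def log_posterior(cls):
        mean, std, prior = model[cls]
        return math.log(prior) - math.log(std) - 0.5 * ((x - mean) / std) ** 2
    return max(model, key=log_posterior)


def success_rate(features, labels, seed=0):
    """Shuffle, split 4:1 into training and validation, return accuracy."""
    rng = random.Random(seed)
    order = list(range(len(features)))
    rng.shuffle(order)
    cut = (4 * len(order)) // 5
    train, valid = order[:cut], order[cut:]
    model = train_gaussian_nb([features[i] for i in train],
                              [labels[i] for i in train])
    hits = sum(predict(model, features[i]) == labels[i] for i in valid)
    return hits / len(valid)


def synthetic_leakage(n, separation, seed=1):
    """Synthetic peak-current samples for a secret bit; NOT measured data."""
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]
    features = [rng.gauss(1.0 + separation * y, 0.1) for y in labels]
    return features, labels


leaky = synthetic_leakage(5000, separation=1.0)
uniform = synthetic_leakage(5000, separation=0.0)
print(success_rate(*leaky), success_rate(*uniform))
```

When the two classes share the same leakage distribution, the classifier has nothing to learn from and its success rate stays close to a random guess.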

Modular Multiplication
For RNS logic, addition and multiplication can serve as a basis for arithmetic functions. These adders and multipliers operate on l-bit values. Overflow can occur in both X + Y and X * Y. For X + Y, modular reduction is achieved by simply ignoring the carry bit. However, modular reduction is much more expensive for X * Y. Moreover, for an RNS circuit to perform the modular reduction, it has to know the modulus m_i. This creates yet another vulnerability, wherein the RNS circuit has to protect m_i. For an adversary model where m_i must be kept secret at an encoder/client cloud node, it would be beneficial not to have to perform modular reduction with respect to m_i on-chip. The modular reduction can be delayed significantly through the use of Montgomery multiplication [28]. In summary, Montgomery reduction is performed modulo a power of two, so that a processor can perform it efficiently. This defers the actual reduction modulo m_i to the circuit boundaries. We evaluated Montgomery reduction-based RNS circuits for machine learning-based secret leakage and for the effectiveness of correlation power attacks (CPA). We implemented a Montgomery reduction scheme on the 3-bit residue shares with the auxiliary modulus 2^3. The architecture was designed based on the idea proposed in [29], and the required area is 253 GE (gate equivalents). We measured the peak current and the power consumption for 25,000 randomly generated inputs and studied the SCA metrics.
In Montgomery multiplication, the reduction modulus is required to be an integral power of two, which forces the modulus m_i to be odd, given that the reduction modulus needs to be co-prime with the original modulus. Montgomery reduction therefore shrinks the available modulus set (m_i) for a given l-bit representation by eliminating even moduli. In order to maximize this set, we constructed a hardware structure for modular reduction called Arithmetic modular reduction. In Section 3, we stated that for practical hardware circuits, the residue size is limited to small values, such as 3 or 5. This allows us to compute the canonical form of the modular reduction function for a residue size of three. The circuit was implemented using the FreePDK 45 nm standard cell library, and its area is 582 GE. We performed circuit simulations to capture the peak current values for the Montgomery reduction and Arithmetic reduction schemes.
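The textbook form of Montgomery reduction with the auxiliary modulus 2^3 can be sketched in software as follows. This is a functional model only, not the hardware architecture of [29]; the function names are hypothetical, and the conversion into the Montgomery domain is assumed to happen at the circuit boundary. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
from math import gcd

L_BITS = 3
R = 1 << L_BITS     # auxiliary modulus 2^3 from the text


def montgomery_setup(m):
    """Return m' = -m^(-1) mod R; needs gcd(m, R) = 1, so m must be odd."""
    if gcd(m, R) != 1:
        raise ValueError("modulus must be odd to be co-prime with 2^l")
    return (-pow(m, -1, R)) % R


def redc(t, m, m_prime):
    """Montgomery reduction: t * R^(-1) mod m for 0 <= t < R * m."""
    q = ((t & (R - 1)) * m_prime) & (R - 1)   # reduction modulo R is a bit mask
    u = (t + q * m) >> L_BITS                 # t + q*m is divisible by R
    return u - m if u >= m else u


def to_montgomery(a, m):
    """Map a into the Montgomery domain (assumed to be done off-chip)."""
    return (a * R) % m


def modmul(a, b, m):
    """Compute a * b mod m through the Montgomery domain."""
    m_prime = montgomery_setup(m)
    product_bar = redc(to_montgomery(a, m) * to_montgomery(b, m), m, m_prime)
    return redc(product_bar, m, m_prime)      # leave the Montgomery domain


print(modmul(3, 5, 7))
```

Inside `redc`, only masking and shifting by l bits are needed, which is why an even modulus such as 4 or 6 is rejected by `montgomery_setup`.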
We compare Arithmetic modular reduction schemes against Montgomery reduction schemes for power side-channel attack resistance [25] in Table 6. The KL divergence value for Montgomery reduction is 0.0204, which corresponds to a 90% failure probability. The SCA metric with KL divergence for Arithmetic modular reduction is 0.0024, and the corresponding failure probability is close to 99%. Note that the power consumption of the Arithmetic reduction scheme is higher than that of the Montgomery reduction scheme, with a more uniform peak current profile. We also applied ML classifiers to determine the secret bit. The success rates of various classifiers are given in Table 7. The success rates for Montgomery reduction and Arithmetic modular reduction are around 35%, showing protection compared to the random guess success rate of 50%. The result clearly shows that the adversary does not have any significant advantage over a random guess of the secret values. We also evaluated the side-channel security of both Montgomery reduction and Arithmetic modular reduction using the CPA tool, to determine the minimum number of samples needed to reveal the secret. The objective of a CPA adversary is to infer the secret input value by correlating the measured power consumption with the power model derived for the target implementation. We generated 25,000 random values for the control inputs. The corresponding residue shares were then created using RNS encoding with random value r_x = 3 and modulus m_i = 7. The secret input y = 1 was also encoded with the random value r_y = 5 using the RNS encoder. The residue share values were input to the Montgomery reduction block, whose power was captured. The hypothetical power model was derived by targeting the output of the Montgomery reduction using a Hamming distance power model. The hypothetical matrix was generated for an unknown space of size 2^7, i.e., secret input (1 bit) + random value r_x (3 bits) + random value r_y (3 bits) = 7 bits in total.
With the modulus value included, the unknown search space size increases to 2^10, which we are unable to process with our computational resources. We correlated the measured power consumption with the hypothetical power model, and the results are reported in Figure 9. In Figure 9, the black trace represents the correct key hypothesis, and the wrong key hypotheses are highlighted in gray. The wrong keys envelop the correct key hypothesis, and there is no distinct peak in the correlation. Hence, the adversary does not have any advantage in distinguishing the correct secret value from the raw search space. We also believe that adding the modulus value to the search space will increase the complexity for the adversary. We conducted a similar experiment on an Arithmetic-style implementation, and the results are given in Figure 10. We used the same set of random values for both experiments. The results remained the same for the Arithmetic multiplier: the secret value has low correlation values compared to the wrong key hypotheses. Thus, the CPA adversary failed to recover the secret key values from the Montgomery reduction and Arithmetic reduction circuits.
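The correlation step of a CPA attack with a Hamming distance power model can be sketched generically as below. The toy target, its 3-bit secret, and the noise level are all hypothetical and deliberately unprotected, so that the correct hypothesis stands out; this is the opposite of the behavior reported for the RNS circuits in Figures 9 and 10. It is not the CPA tool used in the experiments.

```python
import math
import random


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cpa(traces, inputs, guesses, model):
    """Map each guess to |correlation| between traces and model(input, guess)."""
    return {g: abs(pearson([model(x, g) for x in inputs], traces))
            for g in guesses}


def toy_model(x, g):
    """Hypothetical leakage: a register holding x is overwritten by x + g mod 8."""
    return hamming_distance(x, (x + g) % 8)


rng = random.Random(0)
SECRET = 3                                    # hypothetical 3-bit secret
inputs = [rng.randrange(8) for _ in range(2000)]
traces = [toy_model(x, SECRET) + rng.gauss(0.0, 0.3) for x in inputs]
scores = cpa(traces, inputs, range(8), toy_model)
print(max(scores, key=scores.get))
```

For a protected circuit, the expectation is that no hypothesis shows a distinct peak, so the ranking produced by `cpa` carries no usable information.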

FPGA Evaluation
To evaluate the security of our scheme on a physical platform, we implemented AES encryption on the Sakura-G board using Xilinx ISE 14.6. The board consists of two Spartan-6 FPGAs: the XC6SLX9 device implements a communication protocol to send/receive data between the analysis PC and the victim FPGA (XC6SLX75). The board interface was based on OpenADC to capture the power traces using the ChipWhisperer software [30]. We implemented both a base design with no protections and the RNS secure design of AES encryption in the victim FPGA. The unprotected AES was implemented in a round-based architecture, which takes a 128-bit plaintext and a 128-bit key as input and generates the 128-bit ciphertext. The state array updates intermediate round results with a 128-bit register. A key choice in the RNS circuit design is the modulus size, which affects the RNS circuit design complexity and security. In this design, we picked 3-bit moduli. With this design choice, we needed three RNS shares. RNS encoding converts each plaintext bit into three 3-bit RNS shares using the modulo values 3, 4, and 5. The modulo values were chosen such that they are co-prime to each other. The RNS circuit for each share/lane was implemented on the victim FPGA separately. Resource utilization is given in Table 8. The RNS-protected AES circuit requires 6063 FPGA slices, which is six times the slice count of the unprotected implementation. The RNS-protected implementation takes RNS shares as input and computes the output represented in RNS shares. The key expansion and AES core round functions were constructed using RNS logic with a 384-bit state register (S_i) to store the intermediate values, where i represents the round number. The configurations and experimental setup details are listed in Table 9. We measured 100,000 power traces of the victim FPGA during the ten rounds of the secure RNS AES circuit as the voltage drop across a 1 Ω resistor, as shown in Figure 11.
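The role of the co-prime moduli 3, 4, and 5 can be illustrated with a plain residue mapping, lane-wise multiplication, and recovery through the Chinese remainder theorem. This sketch omits the random values that the RNS encoder mixes into each share, so it shows only the underlying number system, not the protected encoding; all function names are hypothetical. It requires Python 3.8 or later.

```python
from itertools import combinations
from math import gcd, prod

MODULI = (3, 4, 5)      # modulo values used in the FPGA design


def pairwise_coprime(moduli):
    """True if every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def to_residues(value, moduli=MODULI):
    """One residue per lane."""
    return tuple(value % m for m in moduli)


def lane_mul(xs, ys, moduli=MODULI):
    """Multiply lane by lane; the lanes never interact."""
    return tuple((x * y) % m for x, y, m in zip(xs, ys, moduli))


def from_residues(residues, moduli=MODULI):
    """Chinese remainder reconstruction, valid for values below prod(moduli)."""
    total = prod(moduli)
    acc = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        acc += r * partial * pow(partial, -1, m)
    return acc % total


assert pairwise_coprime(MODULI)
# A Boolean AND of two bits maps to a multiplication in every lane.
for a in (0, 1):
    for b in (0, 1):
        assert from_residues(lane_mul(to_residues(a), to_residues(b))) == (a & b)
print(to_residues(42), from_residues(to_residues(42)))
```

Because each lane works modulo its own m_i, the three lanes can be placed and processed separately, as in the per-lane FPGA implementation described above.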
Evaluation of the RNS-protected AES implementation using machine learning classifiers was based on features extracted from the power traces. The feature vector was created from peak power consumption values and the Hamming distance value of the state registers. The state register update causes significant power consumption synchronized with the clock cycles. The Hamming distance between the RNS output values of rounds S_10 and S_9 was calculated and added to the feature arrays to perform byte-level classification. The feature vector array was labeled with the corresponding RNS key byte values from the key expansion of the tenth round. The machine learning classifiers LDA, QDA, and naive Bayes were used to predict the key values from the feature vector array. The success rates of key byte prediction for the various classifiers and implementations are shown in Figure 12. The continuous lines represent the success rates for the unprotected AES, and the dashed lines represent the success rates for the RNS-protected implementation. The results show that the protected implementation has a success rate that is 79.66% lower than that of the base implementation. In RNS circuits, the input encoding and output decoding functions are off-chip computations. This makes it difficult for a traditional CPA attack to reveal the binary secrets.

Conclusions
IoT nodes in a cyber-physical system are attractive targets for physical side-channel attacks. Physical side-channel attacks benefit from physical possession of the device or from being in its vicinity. This paper has presented a novel logic design style based on residue number systems that offers increased resistance to power side-channel attacks.
We have developed a new secure logic based on secret sharing and the residue number system. We illustrated the transformation of a Boolean function representation into residue operations, such as modular multiplication and modular addition. Several variants of the secure RNS logic family, varying in the encoder design and the number of independent random variables, were presented. We developed the resilience characteristics of the RNS secure circuits against a power analysis attack. KL divergence captures the statistical differentiability of the power trace distribution for various secret values. A low KL divergence value signifies that the differentiability is very low, making the circuit resistant to side-channel leakage. Our results show a KL divergence value of 0.1165 for 3-bit residue designs. For residue representations with small modulus values, an adversary has significant cryptanalytic capability to model the relationship between a primary input bit and its residues. The enhancement techniques, such as random renewal and the hybrid scheme, restore the switching uniformity in the RNS residues and increase the entropy of the moduli space.
The resistance of the RNS secure logic family was studied for a Boolean AND gate. It was quantified using normalized variance and KL divergence as SCA metrics. We also studied the success rates of common machine learning classifiers such as QDA, LDA, and naive Bayes. The SPICE simulations for standard RNS circuits resulted in a KL divergence value of 4.539, whereas the random renewal scheme and the hybrid scheme with t-private logic exhibit much lower KL divergence values of 1.8409 and 0.7312, respectively. This attests to the increased side-channel resistance of the random renewal and hybrid schemes. The SCA metric based on the machine learning success rate shows that these enhancements improve the resistance of the targeted design.
The RNS secure logic can be supported with both public and private moduli. We incorporated Montgomery reduction-based multiplication and its variant, Arithmetic reduction, to enable private moduli. The KL divergence values for the Montgomery and Arithmetic reductions were 0.0204 and 0.0024, respectively. This paper also presented a protected AES implementation using RNS secure logic on an FPGA platform. The side-channel security was evaluated using ML classifier success rates on the real signals collected from this FPGA. The protected implementation resulted in a 79.66% lower success rate (higher resistance) compared to the unprotected AES circuit. These results collectively show that the RNS logic exhibits high resistance to power analysis attacks.

Abbreviations
The following abbreviations are used in this manuscript: