Single Trace Side Channel Analysis on NTRU Implementation

As research on quantum computers has progressed rapidly, interest in post-quantum cryptography has greatly increased. NTRU is one of the best-known algorithms thanks to its practical key sizes and fast performance, along with its resistance to quantum adversaries. Although NTRU has withstood various algebraic attacks, its side-channel resistance must also be considered for secure implementation. In this paper


Introduction
Currently used public key cryptography (PKC) such as RSA and Elliptic Curve Cryptography (ECC) will no longer be secure once a quantum computer capable of running Shor's algorithm is developed [1][2][3]. Owing to recent advances in quantum computing, post-quantum cryptography (PQC) is an active area of research. Moreover, the National Institute of Standards and Technology (NIST) announced a project to define new standards for PKC [4]. Five categories are studied for PQC: lattice-based, multivariate-based, hash-based, code-based, and isogeny-based cryptography. Among these, lattice-based cryptography is one of the prominent candidates because of its fast performance with a practical key size. Accordingly, the largest number of candidates submitted to the NIST project belong to lattice-based cryptography. One well-known lattice-based cryptosystem is NTRU, an abbreviation of N-th degree truncated polynomial ring.
NTRU, proposed in 1996 by Hoffstein et al. [5], is an encryption algorithm based on the shortest vector problem. NTRU has attracted much attention from researchers because it is faster than classical cryptosystems by more than two orders of magnitude at the same security level. Regarding the implementation, only the encryption code was open to the public until 2017. After the patent was released in 2017, all of the source code became available. Currently, two implementations exist: one released on GitHub in 2017 and the other submitted to the NIST standardization project. We refer to the former as NTRU Open Source, to the latter as NTRUEncrypt, and to the algorithm itself as NTRU. Although NTRU has withstood various mathematical attacks, the security of its implementation is an open question.
Since side channel analysis (SCA) was proposed by Kocher et al. [6] in 1996, resistance to SCA has become a de facto requirement for most cryptosystems. Additionally, since resistance to SCA is included as a requirement in FIPS 140-2 (Federal Information Processing Standard publication 140-2), NTRU must resist SCA if it is to replace RSA and ECC. Moreover, the NIST standardization project recommends SCA resistance for the submitted candidates. SCA is an attack that uses additional information, such as time, sound, and power consumption, leaked during the operation of a cryptographic device. Among these methods, power analysis attacks such as differential power analysis (DPA) and simple power analysis (SPA) are known to be the most practical. SPA analyzes a single power consumption trace of the device. DPA is a statistical approach that exploits a number of power traces related to secret data. Even if a cryptographic algorithm is theoretically safe, the private key or secret message can be exposed by side-channel leakage when the algorithm is executed. In this regard, there are previous studies of SCA on NTRU by Lee et al. [7] and Zheng et al. [8]. They performed DPA on NTRU and revealed the secret key. However, whether other types of power analysis can be performed has not been analyzed so far.

Our Contribution
In this paper, we propose the first single trace side channel analysis (STA) on both NTRU Open Source and NTRUEncrypt, with experimental results, and propose countermeasures. Previous SCA on NTRU [7][8][9] targeted the polynomial multiplication between the cipher-text and the secret key. However, since NTRU was patented until 2017, existing SCAs on NTRU rest on the assumption that publicly known polynomial multiplications are used in the decryption process. In this paper, we perform STA on the decryption algorithm used in [10] and on the version submitted to the NIST standardization project [11].
Based on the proposed analysis, every NTRU-based cryptosystem may be vulnerable to our attack. Moreover, as PKC is mostly used to exchange a session key between two parties, there may be cases where a power consumption trace can be obtained from only one execution of the algorithm. Since we recover the private key from a single power consumption trace, our attack is a real threat to these implementations, whereas existing DPA cannot be applied in these circumstances. We implemented the algorithm on the ATmega128 8-bit processor of the KLA-SCARF AVR [12] and applied the proposed attack. The details of our attack are presented in Section 3.
We propose two countermeasures against the proposed analysis, one for each implementation. Although the previous DPA attacks target a different implementation, their methods can still be applied to NTRU Open Source and NTRUEncrypt. In this paper, we therefore propose countermeasures that prevent not only our proposed attack but also DPA. The proposed countermeasure for NTRU Open Source does not increase the computational cost. Furthermore, the proposed countermeasure for NTRUEncrypt reduces the computational cost by approximately half. The description of our countermeasures is presented in Section 4.

Organization
This paper is organized as follows. In Section 2, we describe NTRU, its implementation, and previous SCAs on NTRU. In Section 3, we propose our single trace attack and show experimental results. The proposed countermeasures and a comparison of their computational cost are given in Section 4. We conclude in Section 5.

Algorithm of NTRU
NTRU is a PKC based on the shortest vector problem, whose computational complexity is exponential even in the presence of a quantum computer. The encryption and decryption schemes use the polynomial ring $R = \mathbb{Z}[x]/(x^N - 1)$, which consists of all polynomials with degree less than $N$ and coefficients in $\mathbb{Z}$. Thus, an element $f \in R$ can be written as $f = \sum_{i=0}^{N-1} f_i x^i$. The polynomial multiplication in $R$ is denoted as $\bullet$ and is performed as in Equation (1).
$$f \bullet g = \sum_{k=0}^{N-1} \Big( \sum_{i+j \equiv k \pmod{N}} f_i\, g_j \Big) x^k \qquad (1)$$
Let $L_f$ be the set of $f \in R$ with $d_f + 1$ coefficients equal to 1 and $d_f$ coefficients equal to −1, and let $B_g$ be the set of $g \in R$ with $d_g$ coefficients equal to 1 and $d_g$ coefficients equal to −1, where $d_f$ and $d_g$ are fixed parameters. We call the polynomials in $L_f$ and $B_g$ trinary polynomials because their coefficients take only three values. The integer moduli $p$ and $q$ satisfy the conditions $\gcd(p, q) = 1$ and $p \ll q$. We define $f_p$ as the polynomial in $\mathbb{Z}_p[x]/(x^N - 1)$ obtained by reducing the coefficients of $f \in R$ modulo $p$. The inverse of $f_p$ in $\mathbb{Z}_p[x]/(x^N - 1)$ is denoted as $f_p^{-1}$. The polynomials $f_q$ and $f_q^{-1}$ are defined in the same manner.
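The multiplication in $R$ can be illustrated with a minimal Python sketch. It is only an illustration of the cyclic convolution described above, not code from either analyzed implementation; the function name `ring_mul` and the dense list representation (index = degree) are assumptions made for the example.

```python
def ring_mul(f, g, N, q=None):
    """Multiply f and g in Z[x]/(x^N - 1).

    f and g are coefficient lists of length N (index = degree).
    If q is given, every coefficient of the result is reduced modulo q.
    """
    h = [0] * N
    for i, fi in enumerate(f):
        if fi == 0:
            continue  # zero coefficients contribute nothing
        for j, gj in enumerate(g):
            # x^(i+j) wraps around because x^N = 1 in the ring
            h[(i + j) % N] += fi * gj
    if q is not None:
        h = [c % q for c in h]
    return h


# (1 + x) * (1 + x^2) = 1 + x + x^2 + x^3 = 2 + x + x^2 when N = 3
print(ring_mul([1, 1, 0], [1, 0, 1], 3))
```

The index `(i + j) % N` is where the reduction modulo $x^N - 1$ happens; the implementations discussed later realize the same wrap-around with explicit loops.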

Key Generation
The private key $f$ is a trinary polynomial selected from $L_f$, and the public key $h$ satisfies $h = p f_q^{-1} \bullet g \pmod{q}$ with $g \in B_g$. The public key is used for encryption and the private key is used for decryption.

Encryption
The purpose of encryption is to transport data by converting a message with the public key of the receiver, so that only the owner of the matching private key can decrypt it. To encrypt a plain-text $m \in B_m$, we first choose a random polynomial $r$ in $B_r$ and compute the cipher-text $e$ as in Equation (2).
$$e = r \bullet h + m \pmod{q} \qquad (2)$$
The modulus $q$ in the above equation means that each coefficient of the polynomial is reduced modulo $q$.

Decryption
Decryption recovers the message sent by the sender. The received data is usually called the cipher-text. The cipher-text $e$ is decrypted by computing the following equations.
$$a = f \bullet e \pmod{q} \qquad (3)$$
$$m = f_p^{-1} \bullet a \pmod{p} \qquad (4)$$
The correctness of the decryption is confirmed by Equations (5) and (6).
Note that by choosing the private key $f$ as $pF + 1$, where $F \in L_f$, $f_p^{-1}$ is equal to 1, so Equation (4) can be omitted [13]. Both targets of this paper (NTRUEncrypt and NTRU Open Source) use this optimization.
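As a sketch of why decryption is correct, the following derivation assumes the standard NTRU forms $e = r \bullet h + m \pmod{q}$ and $a = f \bullet e \pmod{q}$ together with the key relation $h = p f_q^{-1} \bullet g$ given above. The layout and numbering are assumptions and may not match the original Equations (5) and (6) exactly.

```latex
\begin{align*}
a &\equiv f \bullet e \equiv f \bullet (r \bullet h + m) \pmod{q} \\
  &\equiv f \bullet r \bullet \bigl(p\, f_q^{-1} \bullet g\bigr) + f \bullet m \pmod{q} \\
  &\equiv p\, r \bullet g + f \bullet m \pmod{q}, \\
f_p^{-1} \bullet a &\equiv f_p^{-1} \bullet \bigl(p\, r \bullet g + f \bullet m\bigr) \equiv m \pmod{p}.
\end{align*}
```

The last line is valid only if the coefficients of $p\, r \bullet g + f \bullet m$ fit in an interval of width $q$, so that the reduction modulo $q$ does not change them; this is the reason $q$ is chosen much larger than $p$. With $f = pF + 1$ we have $f \equiv 1 \pmod{p}$, so reducing $a$ modulo $p$ already yields $m$.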

Side Channel Analysis and Related Work
Even when an algorithm is mathematically secure, a naive implementation can make the cryptosystem vulnerable to various attacks. The most important considerations in implementation are the random number generator and the leakage of secret information. In most cryptosystems, the quality of the random numbers directly determines the security of the system. Therefore, a predictable random value (i.e., a low-entropy source) may weaken the system. Randomness has been studied with respect to entropy [14][15][16][17][18][19]. However, the analysis herein discusses the implementation from the SCA point of view.
SCA was first introduced by Kocher et al. in 1996 [6]. Subsequently, power analysis attacks such as simple power analysis (SPA), differential power analysis (DPA), and correlation power analysis (CPA) [20,21] have been proposed. Nowadays, any attack that exploits information gained from the implementation is considered SCA. This includes cache attacks, EM analysis, and attacks that exploit hardware vulnerabilities [22][23][24][25]. However, we mainly focus on power analysis attacks. Power analysis attacks rely on the dependency between the power consumption of the device and the data processed during the execution of an algorithm. SCA is a real threat since it can recover the private key of a cryptosystem in practical time. To prevent these types of attack, masking and hiding have been studied [26]. Masking combines secret information with random values, so that the actual value is never used directly during encryption and decryption. Hiding removes the relationship between the power consumption and the data. Hiding is a hardware-level countermeasure that focuses on security during operation.
Additionally, as Internet of Things (IoT) devices become more advanced, security for low-power designs is essential [27]. However, conventional PKC is difficult to implement in resource-constrained environments. Therefore, the physical unclonable function (PUF) has been researched as a light-weight authentication primitive. For example, side-channel resistant PUFs were studied intensively in [28].

Previous Side Channel Analysis on NTRU
The first SCA studied on NTRU was a timing attack in 2007 [29]. In 2010, Lee et al. introduced SPA and CPA on NTRU and proposed countermeasures against these attacks [7]. The idea behind the SPA in [7] is that the power consumption differs between adding a non-zero value to zero and adding a non-zero value to a non-zero value. They also performed CPA on the multiplication using 1000 traces, and proposed a second-order CPA using 10,000 traces. For the description of the attacks, please refer to [7]. To prevent the SPA, they proposed initializing the temporary buffer with a non-zero value and randomizing the order of computation and data. They also provided countermeasures against CPA, such as masking and shuffling.
In 2013, a first-order collision attack was proposed in [8] with the purpose of defeating the countermeasure proposed in [7]. The attack in [8] against the first-order countermeasure is an improvement, since it is performed with 5,000 traces. The attack targets the moments when the same registers are loaded during the multiplication. Overall, the decryption code used for the analysis in [7] and [8] was not an official implementation. Although the proposed attacks can be applied to the official implementations, the attack environment is restricted to the case where multiple executions of NTRU with the same key are possible.

Proposed Single Trace Side Channel Analysis on NTRU Implementation
In this section, we propose our STA on the two NTRU implementations. For each case, we first describe the implementation and then present our STA. Lastly, we present the experimental results of our attack. The purpose of our attack is to recover the private key. Therefore, only the implementation of decryption is introduced in this paper.

NTRU Open Source
The integral parts of the NTRU implementation are the way polynomials are stored and the polynomial multiplication.

Representing Polynomials
To store a private key polynomial $f$, NTRU Open Source stores the degrees of the indeterminate $x$ whose coefficients are −1 or 1. Because the additions are driven only by the degrees with coefficient −1 or 1, the degrees with coefficient 0 are not needed. Thus, the private key array first stores all degrees whose coefficient is 1 and then all degrees whose coefficient is −1. For example, if $f = x^3 - x + 1$, the array of $f$ is {0, 3, 1}. A general polynomial is stored such that the coefficient of degree $i$ is the $i$-th element of an array. For example, the polynomial $e = 3x^4 - x^2 + 9x - 5$ is represented as {−5, 9, −1, 0, 3}.
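The index representation described above can be sketched in a few lines of Python. The helper name `to_index_array` is hypothetical and is not taken from the NTRU Open Source code; it only reproduces the ordering rule stated in the text (degrees with coefficient 1 first, then degrees with coefficient −1).

```python
def to_index_array(coeffs):
    """Convert a dense trinary polynomial (index = degree) to the index form.

    Returns the degrees with coefficient +1 in ascending order, followed by
    the degrees with coefficient -1 in ascending order. Degrees with
    coefficient 0 are not stored.
    """
    plus = [d for d, c in enumerate(coeffs) if c == 1]
    minus = [d for d, c in enumerate(coeffs) if c == -1]
    return plus + minus


f = [1, -1, 0, 1]          # f = x^3 - x + 1
print(to_index_array(f))   # [0, 3, 1], matching the example in the text
```

Note that the boundary between the +1 part and the −1 part is not stored in the array itself; it follows from the fixed parameters of the scheme.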

Polynomial Multiplication
For efficiency, the private key is set as $f = pF + 1$, and $F$ is divided into three trinary polynomials $F_1$, $F_2$, and $F_3 \in L_F$. The advantage of splitting $F$ is that it lowers the Hamming weight of the polynomials, so that the multiplication can be sped up [13,30]. Consequently, the decryption of NTRU Open Source is performed as in Equation (7), taking the order of multiplication into account.
The computation of Equation (7) is given in Algorithm 1, and the algorithm for polynomial multiplication is given in Algorithm 2.

Algorithm 1 Decryption in NTRU Open Source
Require: The trinary polynomials $F_1$, $F_2$, $F_3$ and $e \in R$ with degree $N$   ▷ $F_1$, $F_2$, $F_3$ form the private key
…   ▷ add t and u
6: end for
7: for 0 ≤ i < N do
8: …   ▷ * is a word multiplication
9: end for
10: return m

The input $b$ of Algorithm 2 is formed such that the degrees with coefficient 1 are stored in ascending order, followed by the degrees with coefficient −1. The polynomial multiplication starts with the smallest degree whose coefficient equals −1 and adds the cipher-text to the initialized array. Since the result must be reduced modulo $(x^N - 1)$, the implementation performs the additions from the starting position up to index $N - 1$ and then restarts from the 0th element of the array once the degree exceeds $N$. After all −1 terms have been processed in this way, the sign is reversed and the same steps are repeated for the degrees with coefficient 1. Lastly, the (mod $q$) operation is performed by AND (∧) with $(q - 1)$, since $q$ is set to a power of 2.

Algorithm 2 Polynomial Multiplication during NTRU Open Source Decryption
Require: Polynomial $e \in R$ with degree $N$ and private key array $b$   ▷ $b$ holds the information of the private key $F$
Ensure: $H = F \bullet e \pmod{q}$
1: for i = 0; i < N; i++ do
2: …
6: for i = 0; k < N; i++, k++ do
7: …
8: end for
9: for k = 0; i < N; i++, k++ do
10: …
11: end for
12: end for
13: for i = 0; i < N; i++ do
14: …   ▷ this step is needed because the above process is for −1
15: end for
…
24: end for
25: for i = 0; i < N; i++ do
26: …   ▷ when q is a power of 2, ∧(q − 1) works as mod q
27: end for
28: return H
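Because several steps of Algorithm 2 did not survive in this copy, the following Python sketch reconstructs the described behaviour from the surrounding prose only: two consecutive loops per stored degree (one up to index N − 1, one restarting at index 0), a sign reversal after the −1 terms, and a final AND with q − 1. The names `sparse_mul`, `_accumulate`, and `d_plus` (the number of +1 entries in `b`) are assumptions, and the code should not be read as the exact published algorithm.

```python
def _accumulate(t, e, deg):
    """Add x^deg * e to t using the two-loop wrap-around described in the text."""
    N = len(e)
    i, k = 0, deg
    while k < N:          # first loop: positions deg .. N-1
        t[k] += e[i]
        i += 1
        k += 1
    k = 0
    while i < N:          # second loop: restart at position 0
        t[k] += e[i]
        i += 1
        k += 1


def sparse_mul(e, b, d_plus, q):
    """Compute F * e mod (x^N - 1, q) from the index array b of F.

    b holds the degrees with coefficient +1 (first d_plus entries),
    then the degrees with coefficient -1. q must be a power of two.
    """
    N = len(e)
    t = [0] * N
    for deg in b[d_plus:]:        # terms with coefficient -1
        _accumulate(t, e, deg)
    t = [-c for c in t]           # reverse the sign once for all -1 terms
    for deg in b[:d_plus]:        # terms with coefficient +1
        _accumulate(t, e, deg)
    return [c & (q - 1) for c in t]   # mod q via AND, valid for q = 2^k


# F = x^3 - x + 1 (index array {0, 3, 1}), e = 1 + 2x + 3x^2 + 4x^3, N = 4
print(sparse_mul([1, 2, 3, 4], [0, 3, 1], 2, 8))
```

The point relevant to the attack is visible in `_accumulate`: the number of iterations of the first loop is N − deg, so the moment at which execution passes from the first loop to the second depends directly on the secret degree.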

Proposed Method
The idea behind the attack is that the correlation between power consumption traces obtained while performing the same operation is higher than the correlation between traces obtained while performing different operations. Let the power trace obtained during one addition operation be taken as a reference trace R, and let O denote the subtraces of the power consumption trace of Algorithm 2. Sliding R across the trace and computing the correlation between R and each O yields a sequence of correlation coefficients. Plotting these coefficients gives a graph like Figure 1. It contains peaks, called high peaks herein, which signify the similarity between R and O. We then recover the private key polynomial by calculating the distance between the high peaks.
As in Algorithm 2, the additions in steps 4 to 12 and steps 16 to 24 depend on the private value. For example, suppose N = 11 and let 5 be the smallest degree whose coefficient equals −1. Then steps 6 to 8 are repeated 6 times and steps 9 to 11 are repeated 5 times. At the moment when execution passes from the first loop to the second, the distance between high peaks differs from the usual distance. Thus, if the real value is x, the interval between the (N − x)th and the (N − x + 1)th high peak is different from the others. Therefore, we can recover every stored degree by applying the same steps to the coefficients −1 and 1.
The first step of the analysis is to find a reference trace R by SPA (Figure 3). The length of R is calculated by dividing the full trace length by the total number of operations. After that, the correlation coefficients can be calculated from the trace using the reference. Figure 1 shows a part of the result containing the high peaks and the intervals between them. Two indices are tagged on each peak: one is the order of the high peak and the other is the distance from the previous high peak. The 31st peak has a different distance than the others, so the first degree whose coefficient is −1 is 50 − 31 = 19 = 0x13. With this process, we can recover $F_1$, $F_2$, $F_3$, and hence the private key.
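The peak-distance idea can be illustrated on a synthetic trace. The sketch below is not the authors' analysis code: the sample values in `ADD` (one addition) and `OVERHEAD` (the loop transition), the threshold of 0.99, and all function names are made up for the illustration, and a real trace would be noisy and need filtering first.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sample lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0            # a flat window carries no shape information
    return cov / math.sqrt(va * vb)


def high_peaks(trace, ref, threshold=0.99):
    """Offsets at which the window of the trace matches the reference."""
    L = len(ref)
    return [s for s in range(len(trace) - L + 1)
            if pearson(trace[s:s + L], ref) >= threshold]


def recover_degree(trace, ref, N, threshold=0.99):
    """Find the irregular gap between high peaks and map it to a degree."""
    peaks = high_peaks(trace, ref, threshold)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    if not gaps:
        return None
    usual = max(set(gaps), key=gaps.count)
    odd = [i for i, g in enumerate(gaps) if g != usual]
    if not odd:
        return None
    # gap index i lies between peak i+1 and peak i+2, i.e. N - x = i + 1
    return N - (odd[0] + 1)


ADD = [1.0, 4.0, 9.0, 4.0, 2.0]      # hypothetical shape of one addition
OVERHEAD = [1.0, 1.0, 1.0]           # hypothetical loop-transition samples
N, secret_degree = 11, 5
trace = ADD * (N - secret_degree) + OVERHEAD + ADD * secret_degree
print(recover_degree(trace, ADD, N))
```

With N = 11 and a secret degree of 5, the first loop contributes six additions and the second five, so the irregular gap follows the sixth high peak, in line with the example in the text.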

NTRUEncrypt

Representing Polynomials
In NTRUEncrypt, a polynomial is represented by its coefficients in order. For example, $F(x) = x^3 + x - 1$ is stored as F = {−1, 1, 0, 1}. Before the polynomial multiplication of the cipher-text and the private key, there are steps that compute $f = pF + 1$.

Polynomial Multiplication
Equation (3) is computed using grade school multiplication. Unlike in NTRU Open Source, the polynomial multiplication is performed as a separate step. These steps are described in Algorithm 3.

Proposed Method
The proposed method exploits the power consumption of steps 1 to 3 and steps 5 to 13 in Algorithm 3 to recover the trinary polynomial $F$. Once $F$ is recovered, the private key polynomial $f$ is computed as $f = pF + 1$, where $p$ is a public value. The relative positions of the coefficients −1 are discovered by analyzing steps 1 to 3. Because $F$ is a trinary polynomial, the constant $p$ is multiplied by one of three values: −1, 0, or 1. Since most processors use two's complement to represent negative values, the Hamming weight of −1 is larger than that of the others. Thus, we can observe high peaks in the power consumption trace when −1 is processed. Note that the proposed analysis depends on how the processor operates. If the processor uses another method to represent negative values, the analysis must take that into account.
In the next step, the relative positions of the coefficients 0 are found from steps 5 to 13, which perform the polynomial multiplication of the cipher-text $e$ and the private key $f$. The power consumption when a cipher-text coefficient is multiplied by 0 is lower than in the other calculations. The portion where the power consumption is low is referred to as a low peak. Therefore, after separating the positions of −1 from those of 0 and 1, and combining this result with the positions of the 0s, $F$ is completely recovered. Finally, we obtain $f$ by computing $pF + 1$.
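The Hamming weight argument above can be checked numerically. The following sketch assumes a 16-bit two's complement word (the word size the paper attributes to NTRUEncrypt) and p = 3; the helper name `hamming_weight` is introduced only for this example.

```python
def hamming_weight(value, bits=16):
    """Number of set bits of value in a two's complement word of the given width."""
    return bin(value & ((1 << bits) - 1)).count("1")


p = 3
for coeff in (-1, 0, 1):
    # weight of the loaded coefficient and of the product p * coeff
    print(coeff, hamming_weight(coeff), hamming_weight(p * coeff))
```

In a 16-bit word, −1 is 0xFFFF (weight 16) and −3 is 0xFFFD (weight 15), whereas 0, 1, and 3 have weights 0, 1, and 2. Under a Hamming weight leakage model, this gap is what makes the −1 coefficients stand out as high peaks.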

Experiment
Figure 4a is a full trace of NTRUEncrypt ported to the KLA-SCARF AVR, captured with a Lecroy HDO6104A oscilloscope at a 250 M sampling rate [12,31]. The parameters for the experiment are N = 49, p = 3, q = 2048, and a private key is as follows.
The values of p and q follow the proposed parameters, but N is smaller than the standard value because of the experimental environment.
Figure 4c depicts the power consumption of steps 1 to 3 in Algorithm 3. As mentioned above, the high peaks represent the moments when p is multiplied by −1. Figure 4c also contains low peaks related to the coefficients 0 and 1. Thus, the relative positions of −1 and of the other coefficients can be recovered by analyzing Figure 4c.
The next step is to recover the coefficients 0. For each coefficient of the cipher-text, there are N multiplications with the private key. During these N operations, the multiplications by the private key coefficient 0 appear in the same order, so the low peaks appear regularly over the whole power trace (Figure 4a). To recover the degrees, we first identify the individual sets of multiplications in the trace by SPA. The multiplication between the cipher-text and the private key occurs after computing pF, and the total number of multiplications is $N^2$. To reduce the noise, one can average multiple power consumption traces. Figure 5 illustrates the average of 10 traces. Figure 4b is an enlarged plot of four low peaks, showing that the peaks can be identified. Lastly, with the three coefficient values recovered from the analysis, the private key f is obtained.
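The averaging step mentioned above can be sketched as a point-wise mean. This is only an illustration; it assumes the traces are already aligned and equally long, which in practice requires a separate alignment step, and the function name `average_traces` is hypothetical.

```python
def average_traces(traces):
    """Point-wise mean of equally long, already aligned power traces."""
    if not traces:
        raise ValueError("need at least one trace")
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("traces must have equal length")
    # zip(*traces) groups the samples taken at the same time index
    return [sum(samples) / len(traces) for samples in zip(*traces)]


print(average_traces([[1, 2, 3], [3, 4, 5]]))
```

Averaging leaves the data-dependent part of the signal unchanged while uncorrelated noise shrinks as more traces are combined.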

Countermeasure
In this section, we propose a countermeasure for each of the two implementations. The proposed analysis on NTRU Open Source and NTRUEncrypt does not depend on the processed data. Since we use a single power consumption trace, countermeasures designed against DPA, such as adding dummy operations and shuffling, cannot prevent our attack.

Countermeasure against NTRU Open Source Implementation
Since the advantage of the original implementation is that it computes the reduction modulo $(x^N - 1)$ and the polynomial multiplication simultaneously, the countermeasure we propose also processes both operations at the same time. Furthermore, the modified implementation has the same number of polynomial coefficient additions as the original implementation. Algorithm 4 is a countermeasure for the polynomial multiplication described in Algorithm 2.
Algorithm 4 precomputes the index i at which the cipher-text coefficient $e_i$ will be added. For example, let N = 9 and let 2 be a degree stored in the private key array; then the original cipher-text coefficient additions are performed as in Figure 6a. In the original iteration, the additions for i = 0 to 6 run in a different loop from the additions for i = 7 to 8. In the proposed method, however, all additions run in the same loop, so there is no leakage for side channel analysis. The proposed method first finds the index of the cipher-text coefficient that is added to the middle index of the result array, and then the additions are performed simultaneously as in Figure 6b.
1: for i = 0; i < N; i++ do
2: $t_i \leftarrow r$   ▷ r is a random value
3: end for
4: for j = $d_f$ + 1; j < 2$d_f$ + 1; j++ do
5: …
x ← x + 1
9: …
x ← x + 1
22: …
end for
26: end for
27: for i = 0; i < N; i++ do
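Since most steps of Algorithm 4 are missing from this copy, the following Python sketch shows only the general idea stated in the prose: the start index into the cipher-text is precomputed, every stored degree is processed by one uniform loop of N additions, and the result array is initialized with a random value r that is removed at the end. The modular index update, the non-zero choice of r, and the names `masked_sparse_mul` and `d_plus` are assumptions, not the authors' algorithm.

```python
import secrets


def masked_sparse_mul(e, b, d_plus, q):
    """F * e mod (x^N - 1, q) with a single uniform loop per stored degree.

    b holds the degrees with coefficient +1 (first d_plus entries),
    then the degrees with coefficient -1. q must be a power of two.
    """
    N = len(e)
    r = secrets.randbelow(q - 1) + 1      # random non-zero mask (assumption)
    t = [r] * N                           # result array starts at r, not 0
    for pos, deg in enumerate(b):
        sign = 1 if pos < d_plus else -1
        x = (N - deg) % N                 # precomputed start index into e
        for k in range(N):                # always exactly N additions
            t[k] += sign * e[x]
            x = (x + 1) % N               # wrap without switching loops
    return [(c - r) & (q - 1) for c in t]  # remove the mask, reduce mod q


# F = x^3 - x + 1 (index array {0, 3, 1}), e = 1 + 2x + 3x^2 + 4x^3, N = 4
print(masked_sparse_mul([1, 2, 3, 4], [0, 3, 1], 2, 8))
```

Because the loop structure no longer changes at a secret-dependent position, the distance between consecutive additions is constant in this sketch; whether a concrete device leaks through the index arithmetic instead would have to be checked on real traces.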

Countermeasure against NTRUEncrypt Implementation
The proposed countermeasure uses three tables initialized with a random number. The countermeasure prevents not only the attack proposed in this paper but also the previous attacks, with fewer computations than the submitted implementation. NTRUEncrypt first calculates the private key $f$ (= $pF + 1$) and then computes $f \bullet e$. The proposed countermeasure instead uses the trinary polynomial $F$, computes $p \times F \bullet e$, and adds $e$ to decrypt the cipher-text. This is expressed in Equation (8).
$$f \bullet e = (pF + 1) \bullet e = p \times (F \bullet e) + e \pmod{q} \qquad (8)$$
To compute $f \bullet e$, the proposed countermeasure first computes $p \times e$ and temporarily saves the value. After that, it accumulates these values in the three tables according to the coefficients −1, 0, and 1 of the trinary polynomial $F$. Then the table for the coefficient −1 is subtracted from the table for the coefficient 1. During the accumulation, the start index is obtained as $(i + j) \pmod{N}$, so that the reduction modulo $(x^N - 1)$ is processed at the same time for increased efficiency.
For side channel resistance, the trinary polynomial of the private key is stored in encoded form, F = enc(F), which removes the difference in power consumption between loading −1 and loading 0 or 1. The encoding function enc is chosen by considering the physical properties of the device. The proposed countermeasure uses the encoding enc(−1) = 1, enc(0) = 2, and enc(1) = 4, so that all encoded values have the same Hamming weight. The trinary polynomial is then represented with 1, 2, and 4 in the proposed algorithm. Algorithm 5 describes the above procedure.
2: $PE_i \leftarrow p \times e_i$
3: $T_i[1] \leftarrow r$   ▷ r is a random value
4: $T_i[2] \leftarrow r$
5: $T_i[4] \leftarrow r$
6: end for
7: for 0 ≤ i < N do
8: …
10: end for
11: end for
12: for 0 ≤ i < N do
13: … (mod q)
14: end for
15: return m
In the final step 13, the subtraction of the −1 table from the 1 table, the addition of $e$, and the (mod $q$) reduction can be processed at once. Since $q$ is a power of 2 in the proposed parameters, (mod $q$) can be computed as AND (∧). In Algorithm 5, different tables are accessed according to the coefficient of $F$. However, since the same operation is performed regardless of the coefficient, there is no difference in power consumption that depends on −1, 0, and 1. Furthermore, as the three tables are initialized with the same random number (steps 3 to 5), the algorithm also prevents the SPA and CPA proposed in [7]. SPA is prevented by initializing the result array with non-zero values, which removes the operations that add a non-zero value to zero. Also, because the non-zero initial value is chosen at random, the algorithm is protected from CPA, since the intermediate values cannot be guessed. When the three tables are initialized with the same random non-zero value, the random value is removed without additional computation, as in step 13.
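The table-based procedure can be sketched in Python as follows. The sketch follows the prose description (precompute $p \times e_i$, three tables per position indexed by the encoded coefficient, the same random initial value r in all tables, final subtraction plus addition of e), but the lost steps of Algorithm 5 are filled in by assumption, and the names `ENC` and `table_decrypt_mul` are hypothetical.

```python
import secrets

# Encoding given in the text: all three code words have Hamming weight 1
ENC = {-1: 1, 0: 2, 1: 4}


def table_decrypt_mul(e, F, p, q):
    """Compute (p*F + 1) * e mod (x^N - 1, q) with three masked tables.

    e: cipher-text coefficients, F: trinary coefficients (index = degree).
    q must be a power of two.
    """
    N = len(e)
    enc_f = [ENC[c] for c in F]               # done once when the key is stored
    r = secrets.randbelow(q - 1) + 1          # same random non-zero value for all tables
    pe = [p * ei for ei in e]                 # only N word multiplications
    tables = [{1: r, 2: r, 4: r} for _ in range(N)]
    for i in range(N):
        for j in range(N):
            # the same add is executed whatever the coefficient is;
            # only the table that receives it differs
            tables[(i + j) % N][enc_f[j]] += pe[i]
    # (table of +1) - (table of -1) cancels r; adding e gives (pF + 1) * e
    return [(tables[k][4] - tables[k][1] + e[k]) & (q - 1) for k in range(N)]


# F = x^3 - x + 1, e = 1 + 2x + 3x^2 + 4x^3, p = 3, q = 2048, N = 4
print(table_decrypt_mul([1, 2, 3, 4], [1, -1, 0, 1], 3, 2048))
```

The table for the coefficient 0 is written but never read in the final step; it exists so that every coefficient triggers the same load, add, and store sequence.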

Comparison of the NTRUEncrypt
The computational cost of the proposed countermeasure on NTRU Open Source is similar to that of the unprotected version, with only additional precomputations. Moreover, it is hard to compare NTRU Open Source and NTRUEncrypt because they multiply by the private key in different ways. Thus, we only compare the computational cost and memory of the protected and unprotected versions of NTRUEncrypt. Table 1 lists the numbers of initializations, additions, and multiplications when the degree is N. Since the computational cost of a subtraction is similar to that of an addition, subtractions are counted together with additions. The computation to find the start index (i + j) (mod N) is not included. As shown in Table 1, the total number of computational steps necessary to calculate $f \bullet e$ is reduced when applying our countermeasure. Moreover, the number of multiplications is reduced to the square root of the original count. The comparison of memory usage is presented in Table 2. Since NTRUEncrypt uses a 16-bit word size, Table 2 reports the product of the word size and the number of arrays used. Although the number of arrays is doubled compared to the unprotected NTRUEncrypt, the total computational cost (number of computational steps) is reduced from $2N^2 + 2N$ to $N^2 + 6N$, where N is at least 443. Consequently, considering the computational cost and the memory size, Algorithm 5 is more efficient than Algorithm 3 and has side channel resistance.
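The step counts quoted above can be evaluated for a concrete degree. The two formulas are taken from the text; the function names are introduced only for this illustration.

```python
def unprotected_steps(N):
    """Total computational steps of the unprotected NTRUEncrypt: 2N^2 + 2N."""
    return 2 * N * N + 2 * N


def protected_steps(N):
    """Total computational steps with the countermeasure: N^2 + 6N."""
    return N * N + 6 * N


N = 443  # smallest degree mentioned in the text
before, after = unprotected_steps(N), protected_steps(N)
print(before, after, round(after / before, 3))
```

For N = 443 the counts are 393,384 and 198,907 steps, a ratio of about 0.506, which is consistent with the claim that the cost is reduced by approximately half.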

Result of the Countermeasure Implementation
Figure 7a is the full trace of Algorithm 5 implemented in the same environment as the analyzed traces. Since the countermeasure uses a table for every coefficient value of the private key, the power consumption difference that depends on the private key is not exposed, as observed in Figure 7b.

Conclusions
Even when a cryptosystem is proven to be secure, the security of its implementation must be considered, as devices may expose side channel leakage. Since PQC cryptosystems are implemented on classical computers, side channels must be considered for them as well. In this paper, we analyzed two NTRU implementations: NTRUEncrypt and NTRU Open Source. By using a single power consumption trace obtained during decryption, we were able to recover the private key on both implementations. Our attack is practical and powerful since it does not require multiple executions with the same key. We also proposed countermeasures for our attack. Our countermeasures prevent not only our proposed attack but also the previous attacks. Moreover, our countermeasures do not degrade performance. In addition, as the NIST standardization project is still in progress, every algorithm, including NTRU, may provide an updated, optimized implementation. The proposed analysis is based on the implementations available so far, and more optimized versions might appear in the future.

Figure 1. Result: High peaks of the addition and the distance between the peaks.

Figure 2. Full trace of NTRU Open Source: The top figure is a raw trace, the middle figure is a filtered trace, and the bottom two are enlarged views of the filtered trace.

Figure 3. Finding the reference trace: Three sets of an addition operation.

Figure 5. The Average of 10 Power Consumption Traces of Polynomial Multiplication using the Grade School Multiplication Method.

Algorithm 4 Countermeasure of NTRU Open Source Project
Require: cipher-text polynomial $e \in R$ and coefficient location indices of private key $b$
Ensure: $H = F \bullet e \pmod{q}$
28: $H_i \leftarrow t_i - r \pmod{q}$
29: end for
30: return H

Figure 6. (a) Original Implementation Iteration. (b) Proposed Implementation Iteration.

Algorithm 5 Countermeasure Applied Decryption of NTRUEncrypt
Require: Trinary polynomial F (an encoding of $F \in L_f$), cipher-text $e \in R$
Ensure: message $m = f \bullet e \pmod{q}$
1: for 0 ≤ i < N do

Table 1. Comparison of Operation between Unprotected and Protected NTRUEncrypt.

Table 2. Comparison of the Memory Usage between Unprotected and Protected NTRUEncrypt.