Next Article in Journal
Bitcoin’s APIs in Open-Source Projects: Security Usability Evaluation
Next Article in Special Issue
Area-Efficient Vision-Based Feature Tracker for Autonomous Hovering of Unmanned Aerial Vehicle
Previous Article in Journal
Multi-Objective Fuzzy Logic-Based Energy Management System for Microgrids with Battery and Hydrogen Energy Storage System
Previous Article in Special Issue
Design and Analysis of an Approximate Adder with Hybrid Error Reduction
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

High Efficiency Ring-LWE Cryptoprocessor Using Shared Arithmetic Components

1
Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 758307, Vietnam
2
Department of Information and Communication Engineering, Inha University, Incheon 22212, Korea
*
Author to whom correspondence should be addressed.
Electronics 2020, 9(7), 1075; https://doi.org/10.3390/electronics9071075
Submission received: 17 May 2020 / Revised: 24 June 2020 / Accepted: 28 June 2020 / Published: 30 June 2020
(This article belongs to the Special Issue System-on-Chip (SoC) Design and Its Applications)

Abstract

:
A high efficiency architecture for ring learning with errors (ring-LWE) cryptoprocessor using shared arithmetic components is presented in this paper. By applying a novel approach for sharing number theoretic transform (NTT) polynomial multiplier and polynomial adder in encryption and decryption operations, the total number of polynomial multipliers and polynomial adders used in the proposed ring-LWE cryptoprocessor are reduced. In addition, the processing time of NTT polynomial multiplier is speeded up by employing multiple-path delay feedback (MDF) architecture and deploying pipelined technique between all stages of NTT processes. As a result, the proposed architecture offers a great reduction in terms of the hardware complexity and computation latency compared with existing works. The implementation result for the proposed ring-LWE cryptoprocessor on Virtex-7 FPGA board using Xilinx VIVADO shows a significant decrease in the number of slices and LUTs compared with previous works. Moreover, the proposed ring-LWE cryptoprocessor offers higher throughput and efficiency than its predecessors.

1. Introduction

Cryptographic algorithms are grouped into two categories named symmetric algorithms and public key (or asymmetric) algorithms. The former uses a single key between two parties to enable a secure communication, where the key is kept private from all other parties. Symmetric algorithms are widely used because of its simplicity. However, a symmetric key algorithm requires an agreement between sender and receiver on the secret key. Asymmetric, or public key cryptosystems use two different keys called private keys and public keys. Whereas the private key is kept secret for the decryption process, the public key is used for encryption and can be revealed to all other parties. The encryption operation is conducted using a public key, and the encrypted message can only be decrypted using the corresponding private key. The security of Rivest, Shamir, and Adleman (RSA) cryptosystems [1] and elliptic curve cryptosystems (ECC) [2,3] are based on the difficulty of solving some number theoretic problems and the difficulty of solving the elliptic curve discrete logarithm problems, respectively. However, these problems can be solved by the algorithm proposed by Shor [4] in polynomial time with quantum computers. Therefore, stronger security systems or post-quantum cryptosystems have been proposed, and the National Institute of Standards and Technology (NIST) is standardizing them. Among lattice-based cryptosystems for the post-quantum era, ring learning with errors (LWE) is a promising candidate because its security proofs are based on the worst-case hardness of lattice problems that there is no known quantum algorithm can efficiently solve [5]. A typical block diagram of a typical ring-LWE cryptoprocessor is shown in Figure 1. Input message m is encrypted into ciphertext ( c 1 , c 2 ) using arithmetic computation on public key (a, p) and error polynomials e 1 , e 2 , and e 3 . Original message m can be recovered from ciphertext ( c 1 , c 2 ) and private key r 2 using the decryption operation.
The ring-LWE problem has been discussed in recent studies, both in software and hardware [5,7,8,9]. In Reference [5], high-throughput ring-LWE cryptoprocessors are designed to perform ring-LWE encryption and decryption operations. In Reference [7], an approach of integrating ring-LWE cryptography into existing fingerprint authentication systems to fully protect the fingerprint data are introduced. Authors in [9] present the implementation of ring-LWE encryption on IoT processors.
In the ring-LWE cryptosystem, lattices with an algebraic structure like polynomial multiplication and addition are performed over a polynomial ring, typically R q = Z q [ x ] / f ( x ) . Among these operations, polynomial multiplication is the most complex one that can be efficiently performed using number theoretic transform (NTT)-based polynomial multiplication [5]. NTT multiplier is a modified version of fast Fourier transform (FFT) to work in a finite field without inaccurate floating point or complex arithmetic to compute polynomial multiplication efficiently [10]. There are several NTT multiplier architectures that deploy single-path delay feedback (SDF) or multiple path delay commutator (MDC) structures in literature. For example, a high throughput multiplier using NTT cores with radix-2 SDF architecture is presented in [11]. In Reference [5], authors introduce radix-2 and radix-8 MDC architecture-based NTT cores for ring-LWE cryptoprocessors to obtain the encryption throughput of gigabits per seconds and decryption throughput of megabits per second. However, these architectures require large hardware resources and high computation time since the NTT polynomial multipliers work separately and NTT operations are not fully optimized.
In this paper, a novel approach to efficiently use arithmetic components in ring-LWE cryptoprocessors to achieve a high efficiency is presented. Specifically, the polynomial multiplier and polynomial adder that participate in the ring-LWE encryption operation are reused in decryption phase to reduce hardware complexity. Additionally, the NTT polynomial multiplier is designed using MDF architecture and deploying pipeline technique among all stages of NTT and INTT transforms to mitigate hardware complexity and speed up multiplication operations. Our contributions of this article are summarized as follows:
  • We propose a ring-LWE cryptoprocessor architecture in which the same arithmetic components, including one polynomial multiplier and one polynomial adder, are used in both encryption and decryption operations to reduce hardware complexity. As a result, the proposed ring-LWE cryptoprocessor requires less hardware resource than existing architectures to perform encryption and decryption operation.
  • We deploy the polynomial multiplier using NTT multiplier with parallel based MDF architecture to enhance the polynomial multiplication. Furthermore, the pipeline technique is applied in the proposed design to reduce the system latency.
  • We implement the proposed ring-LWE cryptoprocessor architecture on Xilinx Virtex-7 FPGA board and compare the obtained results with its predecessors. Performance evaluation results show that the proposed architecture offers a higher throughput and a better efficiency than others.
The remaining of this paper is structured as follows. In Section 2, brief discussions on the ring-LWE cryptosystems are carried out. Section 3 presents the proposed algorithm and architecture for ring-LWE cryptoprocessor. Section 4 provides the results of implementation and comparison, and Section 5 includes conclusions.

2. Background

2.1. Ring-LWE Cryptosystem

The ring-LWE problem introduced by Regev [12] in 2005 is a machine learning problem that is equivalent to the worst-case lattice problems. The ring-LWE cryptosystem operates over a ring R q = Z q [ x ] / f ( x ) , where f ( x ) is the irreducible polynomial of degree n, n is a power-of-two number, and q is a prime number such that q 1 mod 2 n . The common case of irreducible polynomial is f ( x ) = x n + 1 that is presented in [11].
Polynomial multiplication and polynomial addition are employed to carry out the cryptographic primitives of the ring-LWE cryptosystem. The procedures for key generation, encryption, and decryption of ring-LWE cryptosystem are described as follows:
Key generation: This process generates the public key ( a , p ) and the private key r 2 . Polynomial a is chosen uniformly and two polynomials r 1 and r 2 are selected from the discrete Gaussian distribution χ σ to compute the public key polynomial:
p r 1 a × r 2
Encryption: The input message m is encoded to get the polynomial m e . If the i t h coefficient of m is 1, it is mapped to ( q 1 ) / 2 ; otherwise, it is converted to 0. The cipher-text c 1 and c 2 are computed from the given polynomials and three error polynomials e 1 , e 2 , and e 3 that sampled from the Gaussian distribution:
c 1 a × e 1 + e 2 c 2 p × e 1 + e 3 + m e
Decryption: The input message m is recovered from the pre-decoded polynomial m d :
m d c 1 × r 2 + c 2
Depending on the value of each coefficient of the message m d , the decoder maps it to a corresponding binary bit to recover the original message m.

2.2. Arithmetic Operations over Ring

The arithmetic operations over the ring R q = Z q [ x ] / f ( x ) include modulo reduction, polynomial addition, and polynomial multiplication. Given coefficients a i and b i in R q , two polynomials a ( x ) and b ( x ) over the ring can be defined as follows:
a ( x ) = a 0 + a 1 x + a 2 x 2 + + a n 1 x n 1 b ( x ) = b 0 + b 1 x + b 2 x 2 + + b n 1 x n 1
As mentioned previously, the polynomial multiplication requires the most processing time, and the NTT-based algorithm is an efficient algorithm for performing this multiplication. Assume that ω is a primitive n t h root of unity, the NTT of each coefficient of a ( x ) is defined as:
A i = j = 0 n 1 a j ω i j mod p
The inverse NTT (INTT) is calculated as:
a i = n 1 j = 0 n 1 A j ω i j mod p
Let α and β be extended vectors of a ( x ) and b ( x ) by filling n zero elements. The multiplication of two polynomials a ( x ) and b ( x ) can be computed based on NTT and INTT:
c ( x ) = a ( x ) × b ( x ) = I N T T ω 2 n ( N T T ω 2 n ( α ) N T T ω 2 n ( β ) )
where ⊙ is the point-wise multiplication.
To avoid zero padding in NTT polynomial multiplication, the negative wrapped convolution is used. Assume that c is the negative convolution of a ( x ) and b ( x ) , the negative wrapped convolution can be expressed as:
c i = j = 0 i a j b i j j = i + 1 n 1 a j b n + i j
Define a = ( a 0 , ψ a 1 , , ψ a n 1 ) , b = ( b 0 , ψ b 1 , , ψ b n 1 ) , and c = ( c 0 , ψ c 1 , , ψ c n 1 ) , where ψ 2 ω mod p , the NTT polynomial multiplication becomes
c = a × b = I N T T ω n ( N T T ω n ( a ) N T T ω n ( b ) )
By using the negative wrapped convolution, the NTT multiplication can be calculated using only n-coefficients.
The polynomial addition of two polynomials a ( x ) and b ( x ) is simply adding corresponding coefficients of two polynomials and then doing modulo reduction (MR). In MR operation, the coefficients of the resulting polynomial are reduced by modulus q. To execute this operation, a few MR algorithms are presented in [5,13]. Since the parameters used in this paper are n = 512 and q = 12 , 289 , the SAMS2 algorithm for q = 12 , 289 that is presented in [5] is selected.

2.3. Discrete Gaussian Sampler

In ring-LWE cryptography, error polynomials sampled from a discrete Gaussian distribution χ σ with a standard deviation σ are required. This distribution uses the probability function described as follows:
P r ( E = i ) = 1 S e i 2 2 σ 2
where E is the random variable on Z , S is the normalization factor, and i is an integer. By approximating S to the probability density function can also be described using
P r ( E = i ) = 1 S e π i 2 S 2
In References [5,14,15], authors present several methods for performing discrete Gaussian sampling. Among these methods, rejection sampling and inversion sampling are the popular ones [14]. In practice, rejection sampling for a discrete Gaussian distribution is slow due to the high rejection rate for the sampled values, which are far from the center of the distribution. The inversion method first generates a random probability and then selects a sample value such that the cumulative distribution up to that sample point is just larger than the randomly generated probability. Since the random probability should be of high precision, this method also requires a large number of random bits. The Knuth–Yao algorithm [14,15] uses a random walk model for sampling from any non-uniform distribution. However, the output of Knuth–Yao algorithm is generated in an unpredictable time; then, it is not a reliable sampler [5]. In this work, we deploy the linear feedback shift registers (LFSRs) approach that was proposed in [16]. This approach offers a low-complexity with an approximated uniform pseudo-random distribution; hence, it can be exploited to generate an accurate approximation of a Gaussian distribution with low maximum auto-correlation.

3. Proposed Ring-LWE Cryptoprocessor Architecture Using Shared Arithmetic Components

3.1. Proposed Algorithm for the Ring-LWE Cryptoprocessor

The proposed shared arithmetic components based ring-LWE cryptography algorithm is described in Algorithm 1. Two additional parameters e n c and d e c are used to control the encryption and decryption operations. Encryption operation is enabled when e n c = 1 , and the multiplier Mult2 and adder Add3 participate in encryption phase. Input message m is encoded to get polynomial m e . This encoded message is then added with error polynomial e 3 and stored in c 21 . Polynomials’ multiplications c 10 = a × e 1 and c 20 = p × e 1 are calculated by Mult1 and Mult2, respectively. Ciphertext c 1 is computed by adding two polynomials c 10 and e 2 using polynomial addition function Add2, while c 2 = c 20 + c 21 is conducted by addition function Add3. Ciphertext ( c 1 , c 2 ) is then successfully carried out.
Algorithm 1: Proposed ring-LWE cryptography algorithm using shared arithmetic components
Input a , p Z q n , m { 0 , 1 } n , ω Z q
Output:  Ciphertext ( c 1 , c 2 ) Z q n , or original messsage m
   r 1 , r 2 Gaussian sampler ( s , n ) ;
   e 1 , e 2 , e 3 Gaussian sampler ( s , n ) ;
  while e n c = 1 do
   for i = 0 to n 1 do
    if m [ i ] = 1 then
      m e [ i ] q 2 × m [ i ] ;
    else
      m e [ i ] 0 ;
    end if;
   end for
    c 10 Mult 1 ( a , e 1 ) ; c 20 Mult 2 ( p , e 1 ) ;
    c 21 Add 1 ( m e , e 3 ) ; c 1 Add 2 ( c 10 , e 2 ) ;
    c 2 Add 3 ( c 20 , c 21 ) ;
  end while
  Return ( c 1 , c 2 )
  while d e c = 1 do
    m d 1 Mult 2 ( c 1 , r 2 ) ; m d Add 3 ( m d 1 , c 2 ) ;
   for i = 0 to n 1 do
    if ( q 4 m d [ i ] 3 × q 4 ) then
      m [ i ] = 1 ;
    else
      m [ i ] = 0 ;
    end if;
   end for;
  end while;
  Return ( m )
The decryption operation is enabled by the signal d _ e n . In this phase, multiplier Mult2 and adder Add3 that participate in the encryption phase are reconfigured to perform operations over the ring. Specifically, multiplier Mult2 calculates the multiplication between the cipher-text c 1 and the polynomial r 2 . The result of this multiplication is then transferred to adder Add3, where it is added to the cipher-text polynomial c 2 to return the pre-decoded polynomial m d = c 1 × r 2 + c 2 . Finally, the original message m is recovered from the pre-decoded message m d using a decoder. The original message m is recovered from the pre-decoded polynomial m d using a decoder. If the i-th coefficient of m d satisfies the condition q / 4 m d [ i ] 3 q / 4 , the corresponding i-coefficient of message m ( m [ i ] ) is decoded to 1; otherwise, it is converted to 0. The detailed timing diagram of the proposed ring-LWE crytoprocessors is shown in Figure 2.
When d e c = 1 , polynomial multiplication m d 1 = c 1 × r 2 is computed using the same multiplication resource Mult2 in encryption phase. Similarly, polynominal addition Add3 is reused to compute polynomial addition m d between polynomial m d 1 and ciphertex c 2 . Finally, the original message m is retrieved by decoding message m d .

3.2. Ring-LWE Cryptoprocessor Architecture Using Shared Arithmetic Components

The proposed ring-LWE cryptoprocessor architecture using shared NTT polynomial multiplier and polynomial adder is illustrated in Figure 3, which consists of an encoder, a Gaussian sampler, polynomial adders, polynomial multipliers, a decoder, and a controller. As can be seen, Multiplier 2 and Adder 3 are deployed to participate in both encryption and decryption operations. The detailed architecture is described in Figure 4.
The encryption operation computes the cipher-text ( c 1 , c 2 ) . This operation is enabled by the control signal e n c . Initially, the input information m is encoded using an encoder. Each bit of the message m works as the control signal of the corresponding MUX, where its inputs are 0 or ( q 1 ) / 2 . The encoded message m e is constructed by the outputs of these MUXs. The encoded message is then added with the error polynomial e 3 using the adder Add1, controlled by signals a 1 _ e and a 1 _ d , to get the value ( m e + e 3 ) . Multiplier Mult1 computes the product of the polynomial a and the error vector e 1 , while multiplier Mult2 calculates the multiplication of the public key p and e 1 . These multipliers are triggered by the control signals m 1 _ e and m 2 _ e , respectively. In the proposed architecture, we use different architectures of NTT multiplier, which are discussed in the following part. Two control signals m 1 _ d and m 2 _ d are assigned to 1 indicating that the multiplications at multipliers Mult1 and Mult2 are completely executed. The output of multiplier Mult1 becomes the input of adder Add 2, where it is added to the error vector e 2 to form the ciphertext c 1 . Concurrently, the output of multiplier Mult2 is added with the polynomial ( e 3 + m e ) to generate ciphertext c 2 , performed by adder Add3. Upon control signals, a 2 _ d and a 3 _ d are equal to 1, and the encryption operation is fully accomplished. The ciphertext ( c 1 , c 2 ) is carried out.

3.3. Proposed NTT Polynomial Multiplier Using MDF Architecture

To speed up the computation time and reduce the complexity of ring-LWE cryptoprocessors, a novel NTT polynomial multiplier using MDF architecture is proposed. The MDF architecture can provide a higher throughput rate with minimal hardware cost by combining the features of MDC and SDF. In MDF architecture, the SDF architecture is extended by using a multi-path approach. In order to achieve higher throughput rate, the number of data-paths can be increased to eight or even sixteen.
Theoretically, a NTT-based polynomial multiplier consists of three bit-reverse processes, two NTT processes, one point-wise multiplication, and one INTT process. By using the reverse Cooley–Tukey algorithm [17] in the NTT-based polynomial multiplication operation, three bit-reverse operations are eliminated, as described in Figure 5a. Therefore, the computation time and hardware complexity are greatly reduced. In addition, two NTT calculations for input polynomials are executed concomitantly to mitigate the multiplication latency. The pipeline technique is also applied between all stages of NTT multipliers to decrease critical path delay. In this work, the 8-parallel MDF architecture-based NTT multiplier is deployed. This multiplier is employed in the ring-LWE cryptoprocessors to conduct the encryption and decryption operations. The detail architecture of NTT core using 8-parallel MDF architecture is illustrated in Figure 5b. In Figure 5b, 512-coefficients of input polynomial are divided into eight parallel paths, each path consists of 64 coefficients. For example, path a 1 of polynomial a ( x ) in Figure 5b consists of coefficients a 1 , a 9 , a 17 , and so on. Two input polynomials a ( x ) and b ( x ) are processed using NTT transform architecture. After obtaining the NTT transform of two polynomials a ( x ) and b ( x ) , the point-wise multiplication operation is calculated, followed by the INTT transform to return the value of the multiplication operation. The architecture of INTT core is similar with NTT core architecture, which consists of processing elements PE1 and PE2, as presented in Figure 5b. The proposed PE1 and PE2 architectures for the NTT core are detailed in Figure 6.

4. Implementation Results and Comparison

The proposed architecture for ring-LWE cryptoprocessor is modeled in Verilog HDL, synthesized, and implemented using Xilinx VIVADO 2017.4 on a Virtex-7 FPGA platform. The implementation results of ring-LWE cryptoprocessor are summarized in Table 1.
As can be seen from Table 1, the proposed architecture requires less hardware resources, calculated in number of slices and LUTs, to conduct a completed ring-LWE encryption–decryption operation compared with similar parallel multiplier-based ring-LWE architectures presented in [5,6]. Specifically, to perform ring-LWE encryption and decryption operations, the proposed architecture uses only 23,707 slices and 61,258 LUTs, which is about 42% and 67.83% of that in [5], respectively. Additionally, the encryption and decryption throughput of the proposed architecture are higher than that of architecture in [6] and R8M architecture in [5]. The architectures in [18,19] require a small number of slices and LUTs to perform encryption and decryption; however, these architectures require high latency to complete ring-LWE encryption and decryption operations. Therefore, the values of throughput offered by these architectures are smaller than 150 Mbps, as described in Table 1.
Efficiency presented in [20] can be used as a parameter to evaluate the performance of different designs on various FPGA platforms. In detail, the efficiency parameter represents the throughput value that one hardware unit (LUT) of an architecture can offer. As can be seen from Table 1, the proposed ring-LWE cryptoprocessor architecture can offer a better value of efficiency compared with that of other architectures. Specifically, the obtained efficiency value of the proposed architecture is about two and seven times larger than that of architectures in [5,18], respectively. Comparison in encryption time, decryption time, and efficiency is described in Figure 7.

5. Conclusions

A novel approach to improve encryption and decryption operation of ring-LWE cryptoprocessors has been presented in this paper. By sharing the hardware resources, including polynomial multiplier and polynomial adder, in both encryption and decryption phases, the hardware complexity of the proposed architecture can be significantly reduced. Furthermore, the polynomial multiplication is greatly enhanced by deploying an efficient NTT multipliers using MDF architecture and the pipeline technique between all stages of NTT multipliers. As a result, the proposed ring-LWE cryptoprocessor offers higher throughput and efficiency compared with that of existing works. Therefore, it can be applied in hardware resource-limited systems that require high throughput and low latency.

Author Contributions

T.N.T. conceptualized the idea of this research, conducted experiments, collected data, and prepared the original version. T.T.B.N. reviewed, analyzed data, and updated the manuscript. H.L. supervised, validated, reviewed, and supported the research with funding. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Basic Science Research Program (2019R1F1A1061926) through the NRF funded by the MSIT and by Inha University, Incheon, Korea.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ochoa-Jiménez, E.; Rivera-Zamarripa, L.; Cruz-Cortés, N.; Rodríguez-Henríquez, F. Implementation of RSA Signatures on GPU and CPU Architectures. IEEE Access 2020, 8, 9928–9941. [Google Scholar] [CrossRef]
  2. Nguyen, T.T.; Lee, H. Efficient Algorithm and Architecture for Elliptic Curve Cryptographic Processor. J. Semicond. Technol. Sci. 2016, 16, 118–125. [Google Scholar] [CrossRef] [Green Version]
  3. Basu Roy, D.; Mukhopadhyay, D. High-Speed Implementation of ECC Scalar Multiplication in GF(p) for Generic Montgomery Curves. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 27, 1587–1600. [Google Scholar] [CrossRef]
  4. Shor, P.W. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM J. Comput. 1997, 26, 1484–1509. [Google Scholar] [CrossRef] [Green Version]
  5. Rentería-Mejía, C.P.; Velasco-Medina, J. High-Throughput Ring-LWE Cryptoprocessors. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25, 2332–2345. [Google Scholar] [CrossRef]
  6. Nguyen Tan, T.; Lee, H. Efficient-Scheduling Parallel Multiplier-Based Ring-LWE Cryptoprocessors. Electronics 2019, 8, 413. [Google Scholar] [CrossRef] [Green Version]
  7. Nguyen Tan, T.; Lee, H. High-Secure Fingerprint Authentication System Using Ring-LWE Cryptography. IEEE Access 2019, 7, 23379–23387. [Google Scholar] [CrossRef]
  8. Nguyen Tan, T.; Lee, H. High-Performance Ring-LWE Cryptography Scheme for Biometric Data Security. IEIE Trans. Smart Process. Comput. 2018, 7, 97–106. [Google Scholar] [CrossRef]
  9. Liu, Z.; Azarderakhsh, R.; Kim, H.; Seo, H. Efficient Software Implementation of Ring-LWE Encryption on IoT Processors. IEEE Trans. Comput. 2017. [Google Scholar] [CrossRef] [Green Version]
  10. Liu, D.; Zhang, C.; Lin, H.; Chen, Y.; Zhang, M. Resource-Efficient and Side-Channel Secure Hardware Implementation of Ring-LWE Cryptographic Processor. IEEE Trans. Circuits Syst. I Reg. Pap. 2019, 66, 1474–1483. [Google Scholar] [CrossRef]
  11. Rentería-Mejía, C.P.; Velasco-Medina, J. Hardware Design of an NTT-Based Polynomial Multiplier. In Proceedings of the 2014 Southern Conference on Programmable Logic (SPL), Buenos Aires, Argentina, 5–7 November 2014; pp. 1–5. [Google Scholar]
  12. Regev, O. On Lattices, Learning with Errors, Random Linear Codes, and Cryptography. In Proceedings of the ACM symposium on Theory of Computing, Baltimore, MD, USA, 22–24 May 2005; pp. 84–93. [Google Scholar]
  13. Cao, Z.; Wu, X. An Improvement of the Barrett Modular Reduction Algorithm. Int. J. Computer Mathematics 2014, 91, 1874–1879. [Google Scholar] [CrossRef]
  14. Roy, S.S.; Vercauteren, F.; Verbauwhede, I. High Precision Discrete Gaussian Sampling on FPGAs. In Proceedings of the Selected Areas in Cryptography, Burnaby, BC, Canada, 14–16 August 2013; pp. 383–401. [Google Scholar]
  15. Du, C.; Bai, G. Towards Efficient Discrete Gaussian Sampling for Lattice-Based Cryptography. In Proceedings of the 2015 International Conference on Field Programmable Logic and Applications (FPL), London, UK, 2–4 September 2015; pp. 1–6. [Google Scholar]
  16. Condo, C.; Gross, W.J. Pseudo-Random Gaussian Distribution Through Optimised LFSR Permutations. Electron. Lett. 2015, 51, 2098–2100. [Google Scholar] [CrossRef]
  17. Cooley, J.W.; Tukey, J.W. An Algorithm for the Machine Calculation of Complex Fourier Series. Math. Comput. 1965, 19, 297–301. [Google Scholar] [CrossRef]
  18. Roy, S.S.; Vercauteren, F.; Mentens, N.; Chen, D.D.; Verbauwhede, I. Compact Ring-LWE Cryptoprocessor. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems (CHES), Busan, South Korea, 23–26 September 2014; pp. 371–391. [Google Scholar]
  19. Pöppelmann, T.; Güneysu, T. Towards Practical Lattice-Based Public-Key Encryption on Reconfigurable Hardware. In Proceedings of the International Conference on Selected Areas in Cryptography, Burnaby, BC, Canada, 14–16 August 2013; pp. 68–85. [Google Scholar]
  20. Mahdizadeh, H.; Masoumi, M. Novel Architecture for Efficient FPGA Implementation of Elliptic Curve Cryptographic Processor over GF(2163). IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2013, 21, 2330–2333. [Google Scholar] [CrossRef]
Figure 1. Block diagram of a typical ring-LWE cryptoprocessor [6].
Figure 1. Block diagram of a typical ring-LWE cryptoprocessor [6].
Electronics 09 01075 g001
Figure 2. Timing diagram of the proposed ring-LWE cryptoprocessor architecture.
Figure 2. Timing diagram of the proposed ring-LWE cryptoprocessor architecture.
Electronics 09 01075 g002
Figure 3. Proposed top-level ring-LWE cryptoprocessor architecture.
Figure 3. Proposed top-level ring-LWE cryptoprocessor architecture.
Electronics 09 01075 g003
Figure 4. Proposed ring-LWE cryptoprocessor architecture using shared arithmetic components.
Figure 4. Proposed ring-LWE cryptoprocessor architecture using shared arithmetic components.
Electronics 09 01075 g004
Figure 5. (a) NTT multiplier, and (b) proposed number theoretic transform (NTT) core architecture.
Figure 5. (a) NTT multiplier, and (b) proposed number theoretic transform (NTT) core architecture.
Electronics 09 01075 g005
Figure 6. Proposed PE1 and PE2 architectures for the NTT core.
Figure 6. Proposed PE1 and PE2 architectures for the NTT core.
Electronics 09 01075 g006
Figure 7. (a) comparison in encryption and decryption time, and (b) comparison in system efficiency.
Figure 7. (a) comparison in encryption and decryption time, and (b) comparison in system efficiency.
Electronics 09 01075 g007
Table 1. Performance comparison of different ring-LWE cryptoprocessors.
Table 1. Performance comparison of different ring-LWE cryptoprocessors.
ProposedR8M [5][6][18][19]
DevicesVirtex-7Stratix IVVirtex-7Virtex-6Virtex-6
LUTs (enc./dec.)61,258/–62,994/27,31361,514/251605595/55951536/1536
Slices (enc./dec.)23,707/–56,435/32,01942,374/23,4954760/4760953/953
Frequency (enc./dec.) (MHz)284/330226/216269/315250/251277/276
Clock cycles (enc./dec.)242/235391/225240/22413,769/888313,300/5800
Time (enc./dec.) ( μ s)0.85/0.711.73/1.040.89/0.7154.86/35.3947.90/21.00
Throughput a (enc./dec.) (Mbps)8432/7214465/4928054/720130/14150/24
Efficiency b (Kbps/LUT)137661302395
a Throughput = (Frequency × Number of bits)/Number of clock cycles. b Efficiency = Throughput/Number of LUTs.

Share and Cite

MDPI and ACS Style

Nguyen Tan, T.; Thi Bao Nguyen, T.; Lee, H. High Efficiency Ring-LWE Cryptoprocessor Using Shared Arithmetic Components. Electronics 2020, 9, 1075. https://doi.org/10.3390/electronics9071075

AMA Style

Nguyen Tan T, Thi Bao Nguyen T, Lee H. High Efficiency Ring-LWE Cryptoprocessor Using Shared Arithmetic Components. Electronics. 2020; 9(7):1075. https://doi.org/10.3390/electronics9071075

Chicago/Turabian Style

Nguyen Tan, Tuy, Tram Thi Bao Nguyen, and Hanho Lee. 2020. "High Efficiency Ring-LWE Cryptoprocessor Using Shared Arithmetic Components" Electronics 9, no. 7: 1075. https://doi.org/10.3390/electronics9071075

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop