# Efficient-Scheduling Parallel Multiplier-Based Ring-LWE Cryptoprocessors

## Abstract

## 1. Introduction

## 2. Ring-LWE Cryptography

### 2.1. Operations in Ring-LWE Cryptography

#### 2.1.1. Key Generation

#### 2.1.2. Encryption

#### 2.1.3. Decryption

### 2.2. Arithmetic Operations over a Ring

Algorithm 1: Number theoretic transform (NTT)-based polynomial multiplication using negative wrapped convolution.
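Algorithm 1 is given here only by its title. As a hedged illustration of the technique it names, the Python sketch below multiplies two polynomials in $\mathbb{Z}_q[x]/(x^n+1)$ by negative wrapped convolution. It pre-weights the coefficients by powers of a primitive $2n$-th root of unity $\psi$, applies forward NTTs with $\omega = \psi^2$, multiplies point-wise, applies the inverse NTT, and post-weights by $\psi^{-i}$. It assumes $n$ is a power of two and $q$ is a prime with $q \equiv 1 \pmod{2n}$. The function names, the recursive radix-2 transform, and the toy parameters $n = 256$, $q = 7681$ are illustrative choices, not the radix-2 or radix-8 hardware schedules proposed in this paper.

```python
import random


def find_psi(n, q):
    """Return a primitive 2n-th root of unity psi modulo the prime q (n a power of two)."""
    if (q - 1) % (2 * n) != 0:
        raise ValueError("q must satisfy q = 1 (mod 2n)")
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)  # psi^(2n) = 1 by Fermat's little theorem
        if pow(psi, n, q) == q - 1:          # psi^n = -1, so the order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, omega, q):
    """Recursive radix-2 NTT: A[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q                   # twiddle factor times odd-index half
        out[k] = (even[k] + t) % q           # butterfly: upper output
        out[k + n // 2] = (even[k] - t) % q  # lower output, since omega^(n/2) = -1
        w = w * omega % q
    return out


def ntt_multiply(a, b, q):
    """Product of a and b in Z_q[x]/(x^n + 1) via negative wrapped convolution."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q                    # primitive n-th root of unity
    psi_inv, omega_inv, n_inv = pow(psi, -1, q), pow(omega, -1, q), pow(n, -1, q)
    # Pre-weight by psi^i so the cyclic NTT product becomes negacyclic.
    a_w = [x * pow(psi, i, q) % q for i, x in enumerate(a)]
    b_w = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    # Forward NTTs, then point-wise multiplication.
    c_hat = [x * y % q for x, y in zip(ntt(a_w, omega, q), ntt(b_w, omega, q))]
    # Inverse NTT (transform with omega^-1, scale by n^-1), then post-weight by psi^-i.
    c_w = ntt(c_hat, omega_inv, q)
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c_w)]


def negacyclic_schoolbook(a, b, q):
    """O(n^2) reference product modulo x^n + 1 (x^n wraps around as -1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return [x % q for x in c]


if __name__ == "__main__":
    n, q = 256, 7681                         # toy parameters (assumption): 2n divides q - 1
    rng = random.Random(0)
    a = [rng.randrange(q) for _ in range(n)]
    b = [rng.randrange(q) for _ in range(n)]
    print(ntt_multiply(a, b, q) == negacyclic_schoolbook(a, b, q))
```

The schoolbook routine serves only as an $O(n^2)$ reference for checking the $O(n \log n)$ NTT path. In hardware designs, the pre- and post-weighting by powers of $\psi$ is often folded into the NTT twiddle factors, leaving only the forward and inverse transforms and the point-wise products.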

### 2.3. Discrete Gaussian Sampler

### 2.4. Ring-LWE Encryption and Decryption Algorithm

Algorithm 2: Ring learning with errors (Ring-LWE) encryption algorithm.

Algorithm 3: Ring-LWE decryption algorithm.
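Algorithms 2 and 3 appear here only as titles. The sketch below assumes the LPR-style Ring-LWE public-key scheme that is common in Ring-LWE hardware work; the exact steps of the paper's algorithms may differ (for example, operands may be kept in the NTT domain). Key generation samples $r_1, r_2$ from the error distribution and publishes $(a, p = r_1 - a r_2)$. Encryption computes $c_1 = a e_1 + e_2$ and $c_2 = p e_1 + e_3 + \bar{m}$ with $\bar{m} = \lfloor q/2 \rfloor m$. Decryption decodes each coefficient of $c_1 r_2 + c_2$ to 1 if it lies in $(q/4, 3q/4)$. All function names are illustrative. For brevity, polynomial products use schoolbook arithmetic instead of an NTT, and the hypothetical `sample_error` merely rounds a continuous Gaussian. It is neither a true discrete Gaussian sampler nor constant time, so this is not secure code.

```python
import random


def poly_mul(a, b, q):
    """Schoolbook product in Z_q[x]/(x^n + 1); stands in for the NTT multiplier."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                c[i + j] += ai * bj
            else:                                  # x^n = -1 in this ring
                c[i + j - n] -= ai * bj
    return [x % q for x in c]


def poly_add(a, b, q):
    return [(x + y) % q for x, y in zip(a, b)]


def poly_sub(a, b, q):
    return [(x - y) % q for x, y in zip(a, b)]


def sample_error(n, sigma, q, rng):
    """Hypothetical error sampler: rounded continuous Gaussian (NOT a constant-time
    discrete Gaussian sampler), coefficients reduced into [0, q)."""
    return [round(rng.gauss(0.0, sigma)) % q for _ in range(n)]


def keygen(n, q, sigma, rng):
    a = [rng.randrange(q) for _ in range(n)]       # uniform public polynomial
    r1 = sample_error(n, sigma, q, rng)
    r2 = sample_error(n, sigma, q, rng)            # secret key
    p = poly_sub(r1, poly_mul(a, r2, q), q)        # p = r1 - a*r2
    return (a, p), r2


def encrypt(pk, bits, q, sigma, rng):
    a, p = pk
    n = len(a)
    e1, e2, e3 = (sample_error(n, sigma, q, rng) for _ in range(3))
    m_bar = [bit * (q // 2) for bit in bits]       # encode each bit as 0 or floor(q/2)
    c1 = poly_add(poly_mul(a, e1, q), e2, q)                      # c1 = a*e1 + e2
    c2 = poly_add(poly_add(poly_mul(p, e1, q), e3, q), m_bar, q)  # c2 = p*e1 + e3 + m_bar
    return c1, c2


def decrypt(r2, ct, q):
    c1, c2 = ct
    noisy = poly_add(poly_mul(c1, r2, q), c2, q)   # = e1*r1 + e2*r2 + e3 + m_bar
    # Threshold decoding: values near q/2 decode to 1, values near 0 (or q) to 0.
    return [1 if q // 4 < x < 3 * q // 4 else 0 for x in noisy]


if __name__ == "__main__":
    n, q, sigma = 256, 7681, 2.0                   # toy parameters (assumption)
    rng = random.Random(2019)
    pk, sk = keygen(n, q, sigma, rng)
    msg = [rng.randrange(2) for _ in range(n)]
    print(decrypt(sk, encrypt(pk, msg, q, sigma, rng), q) == msg)
```

Decryption works because $c_1 r_2 + c_2 = e_1 r_1 + e_2 r_2 + e_3 + \bar{m}$: each coefficient is $\bar{m}_i$ plus small noise, and a bit is decoded wrongly only if that noise exceeds roughly $q/4$. The demo's $\sigma = 2$ is an assumption chosen to make such failures negligible, not a vetted security parameter.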

## 3. Proposed Ring-LWE Cryptoprocessor Architectures

### 3.1. Proposed Ring-LWE Encryption and Decryption Architectures

### 3.2. Proposed NTT Multiplier Architecture

## 4. Simulation Results and Comparison

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 3.** Proposed efficient-scheduling parallel multiplier-based ring-LWE cryptoprocessor architecture.

**Figure 5.** (**a**) Data flow of the proposed number theoretic transform (NTT) multiplier, (**b**) proposed radix-8 multiple delay feedback (MDF)-architecture based NTT multiplier, and (**c**) data structure of the proposed radix-8 MDF-architecture based multiplier.

**Table 1.** Implementation results and performance comparison of the proposed ring learning with errors (ring-LWE) cryptoprocessors.

| Metric | Proposed Radix-2S | R2S [7] | Proposed Radix-2M | R2M [7] | Proposed Radix-8M | R8M [7] | [13] | [16] |
|---|---|---|---|---|---|---|---|---|
| Device | Virtex-7 | Stratix IV | Virtex-7 | Stratix IV | Virtex-7 | Stratix IV | Virtex-6 | Virtex-6 |
| LUTs (enc.) | 23,015 | 28,977 | 29,802 | 31,890 | 61,154 | 62,994 | 5,595 | 1,536 |
| LUTs (dec.) | 6,623 | 6,761 | 7,252 | 7,272 | 25,160 | 27,313 | – | – |
| Slices (enc.) | 13,588 | 29,290 | 18,933 | 31,540 | 42,374 | 56,435 | 4,760 | 953 |
| Slices (dec.) | 6,354 | 7,616 | 7,657 | 8,641 | 23,495 | 32,019 | – | – |
| Cycles (enc.) | 1,832 | 2,207 | 651 | 1,194 | 240 | 391 | 13,769 | 13,300 |
| Cycles (dec.) | 1,754 | 1,145 | 612 | 644 | 224 | 225 | 8,883 | 5,800 |
| Time (enc.) (μs) | 4.58 | 9.33 | 1.97 | 5.16 | 0.89 | 1.73 | 54.86 | 47.90 |
| Time (dec.) (μs) | 4.35 | 4.59 | 1.82 | 2.78 | 0.71 | 1.04 | 35.39 | 21.00 |
| Thr. (enc.)${}^{a}$ (Mbps) | 1,565.07 | 824.03 | 3,638.58 | 1,491.26 | 8,053.93 | 4,465.12 | 130.66 | 149.64 |
| Thr. (dec.)${}^{a}$ (Mbps) | 117.70 | 111.55 | 281.32 | 184.17 | 721.13 | 492.31 | 14.47 | 24.38 |
| Eff. (enc.)${}^{b}$ (Kbps/LUT) | 68.00 | 28.41 | 122.09 | 46.69 | 131.70 | 70.88 | 23.35 | 97.42 |
| Eff. (dec.)${}^{b}$ (Kbps/LUT) | 17.77 | 16.50 | 38.79 | 25.33 | 28.66 | 18.02 | 2.29 | 15.87 |

${}^{a}$ Throughput (Thr.) = (working frequency × number of bits) / number of clock cycles.

${}^{b}$ Efficiency (Eff.) = throughput / number of LUTs.
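The two footnote formulas can be checked against Table 1 with a few lines of Python. A working frequency in MHz times bits per operation divided by clock cycles yields Mbit/s, and the table reports efficiency in Kbps/LUT, so the efficiency helper multiplies by 1,000. The helper names are illustrative. The throughput call uses hypothetical inputs (250 MHz, 512 bits, 1,000 cycles) because the working frequencies and bit counts are not listed in the table, while the efficiency check reuses the throughput and LUT entries of Table 1.

```python
def throughput_mbps(freq_mhz, bits, cycles):
    """Footnote a: Thr. = (working frequency x number of bits) / number of clock cycles.
    With the frequency in MHz, the result is in Mbit/s."""
    return freq_mhz * bits / cycles


def efficiency_kbps_per_lut(thr_mbps, luts):
    """Footnote b: Eff. = throughput / number of LUTs, converted from Mbps to Kbps."""
    return thr_mbps * 1000 / luts


if __name__ == "__main__":
    # Hypothetical design: 250 MHz, 512 bits per operation, 1,000 cycles -> 128.0 Mbps
    print(throughput_mbps(250, 512, 1000))
    # Table 1, Proposed Radix-2S encryption: 1,565.07 Mbps over 23,015 LUTs
    print(round(efficiency_kbps_per_lut(1565.07, 23015), 2))  # 68.0 (reported as 68.00)
```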

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

