# Improved Sum of Residues Modular Multiplication Algorithm

## Abstract


## 1. Introduction

## 2. Background

#### 2.1. Efficient RNS Modular Reduction

#### 2.2. Calculation of $\alpha $

Algorithm 1: Calculation of $\alpha$

input: $\{\gamma_1, \dots, \gamma_N\}$, where $\gamma_i = \langle z_i \cdot M_i^{-1} \rangle_{m_i}$, $i \in \{1, \dots, N\}$.

input: $q$, $\mathsf{\Delta}$.

output: $\alpha$.

$A \leftarrow 2^{q} \cdot \mathsf{\Delta}$;

$\alpha \leftarrow \left\lfloor \frac{A}{2^{q}} \right\rfloor$;
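As a rough illustration of how such an accumulator can recover $\alpha$, the sketch below uses the eight 66-bit moduli listed in the moduli table and $\mathsf{\Delta}=\frac{1}{16}$. The accumulation step `A += gamma_i >> (n - q)` and the choice $q=8$ are assumptions made for this example; the listing above shows only the initialisation of $A$ and the final shift. The helper names are hypothetical.

```python
from math import prod

n = 66
# Moduli as listed in the paper's moduli table (N = 8).
MODULI = [2**n - 1] + [2**n - 2**t - 1 for t in (2, 3, 4, 5, 6, 8, 9)]
M = prod(MODULI)
Q = 8                         # assumed number of fractional bits q
DELTA_NUM, DELTA_DEN = 1, 16  # Delta = 1/16


def gammas(X):
    """gamma_i = <x_i * M_i^{-1}>_{m_i} for every channel."""
    return [(X % m) * pow(M // m, -1, m) % m for m in MODULI]


def alpha_approx(g):
    """Estimate alpha from the top q bits of each gamma_i."""
    A = (2**Q * DELTA_NUM) // DELTA_DEN   # A <- 2^q * Delta
    for gi in g:
        A += gi >> (n - Q)                # assumed step: add floor(gamma_i / 2^(n-q))
    return A >> Q                         # alpha <- floor(A / 2^q)


def alpha_exact(g):
    """Reference value: alpha = floor(sum_i gamma_i * M_i / M)."""
    return sum(gi * (M // m) for gi, m in zip(g, MODULI)) // M
```

With these parameters the truncation error summed over the eight channels stays below $\mathsf{\Delta}$, so the estimate should agree with the exact value for every $0\le X<\widehat{M}$.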

## 3. Improved Sum of Residues (SOR) Algorithm

#### 3.1. Calculation of $\kappa $

Algorithm 2: Improved sum of residues reduction

Require: $p$, $\mathsf{\Delta}$, $q$, $\mathcal{B}=\{m_1,\cdots,m_N\}$, $m_1>m_2>\cdots>m_N$, $n=\lceil \log_2 m_1\rceil$, $W=\lceil \log_2 p\rceil$, $T$, $N\ge \lceil \frac{2W}{n}\rceil$

Require: $M=\prod_{i=1}^{N} m_i$, $\widehat{M}=(1-\mathsf{\Delta})M$, $M_i=\frac{M}{m_i}$ for $i=1$ to $N$

Require: pre-computed tables $\left[\begin{array}{c}\langle M_1^{-1}\rangle_{m_1}\\ \langle M_2^{-1}\rangle_{m_2}\\ \vdots \\ \langle M_N^{-1}\rangle_{m_N}\end{array}\right]$, $\left[\begin{array}{c}\langle -p\rangle_{m_1}\\ \langle -p\rangle_{m_2}\\ \vdots \\ \langle -p\rangle_{m_N}\end{array}\right]$, and $\left[\begin{array}{c}\left\lfloor \frac{\langle M_1\rangle_p}{2^{W-T}}\right\rfloor \\ \vdots \\ \left\lfloor \frac{\langle M_N\rangle_p}{2^{W-T}}\right\rfloor \end{array}\right]$

Require: pre-computed table $\left[\begin{array}{c}\langle \langle M_i\rangle_p\rangle_{m_1}\\ \langle \langle M_i\rangle_p\rangle_{m_2}\\ \vdots \\ \langle \langle M_i\rangle_p\rangle_{m_N}\end{array}\right]$ for $i=1$ to $N$

Require: pre-computed table $\left[\begin{array}{c}\langle \alpha \cdot \langle -M\rangle_p\rangle_{m_1}\\ \langle \alpha \cdot \langle -M\rangle_p\rangle_{m_2}\\ \vdots \\ \langle \alpha \cdot \langle -M\rangle_p\rangle_{m_N}\end{array}\right]$ for $\alpha=1$ to $N-1$

input: Integers $X$ and $Y$, $0\le X,Y<\widehat{M}$, in RNS form: $\{x_1,\cdots,x_N\}$ and $\{y_1,\cdots,y_N\}$.

output: Representation of $Z=X\cdot Y \bmod p$ in RNS: $\{z_1,\cdots,z_N\}$.
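To make the role of the pre-computed tables more concrete, the following sketch builds three of them for $p=2^{255}-19$ and the paper's eight 66-bit moduli, then checks the congruence $\sum_i \gamma_i\langle M_i\rangle_p+\alpha\langle -M\rangle_p\equiv X\cdot Y \pmod p$ channel by channel. It is an illustration under assumptions, not the algorithm body: $\alpha$ is computed exactly rather than estimated, the $\kappa\cdot p$ correction is left out, and all function and table names are hypothetical.

```python
from math import prod

n, N = 66, 8
MODULI = [2**n - 1] + [2**n - 2**t - 1 for t in (2, 3, 4, 5, 6, 8, 9)]
p = 2**255 - 19
M = prod(MODULI)
M_i = [M // m for m in MODULI]

# Pre-computed tables named in the Require lines.
INV_MI = [pow(Mi, -1, m) for Mi, m in zip(M_i, MODULI)]                # <M_i^{-1}>_{m_i}
MI_P = [[(Mi % p) % mj for mj in MODULI] for Mi in M_i]                # <<M_i>_p>_{m_j}
ALPHA_M = [[(a * (-M % p)) % mj for mj in MODULI] for a in range(N)]   # <alpha*<-M>_p>_{m_j}


def to_rns(X):
    return [X % m for m in MODULI]


def from_rns(res):
    """Chinese Remainder Theorem reconstruction (used for checking only)."""
    return sum(r * inv % m * Mi for r, inv, m, Mi in zip(res, INV_MI, MODULI, M_i)) % M


def sor_reduce(xs, ys):
    z = [x * y % m for x, y, m in zip(xs, ys, MODULI)]           # channel-wise product
    g = [zi * inv % m for zi, inv, m in zip(z, INV_MI, MODULI)]  # gamma_i
    # Exact alpha for clarity (assumption); hardware would estimate it from top bits.
    alpha = sum(gi * Mi for gi, Mi in zip(g, M_i)) // M
    return [
        (sum(g[i] * MI_P[i][j] for i in range(N)) + ALPHA_M[alpha][j]) % mj
        for j, mj in enumerate(MODULI)
    ]
```

The value returned by `sor_reduce` is congruent to $X\cdot Y$ modulo $p$ but is only partially reduced: it can be several hundred bits long, which is why a further correction by a multiple of $p$ is needed before the result is fed back as an operand.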

## 4. New SOR Algorithm Implementation and Performance

#### 4.1. Comparison

## 5. Conclusions

## Funding

## Conflicts of Interest

## Appendix A

Notation | Description |
---|---|
$p$ | Field modulus. In this work, either the 256-bit prime $p_S=2^{256}-2^{32}-977$ or the 255-bit prime $p_E=2^{255}-19$. |
$m_i$ | RNS channel modulus. $m_i=2^{n}-2^{t_i}-1$, $t_i\in \{0,2,3,4,5,6,8,9\}$. |
$n$ | Bit-length of modulus $m_i$ ($n=\max \lceil \log_2 m_i\rceil$, $i\in \{1,\dots,N\}$). |
$n'$ | Maximum bit-length of $\left\lfloor \frac{\langle M_i\rangle_p}{2^{W-T}}\right\rfloor$, $i\in \{1,\dots,N\}$. |
$\mathcal{B}$ | Set of RNS moduli: $\mathcal{B}=\{m_1,m_2,\dots,m_N\}$. |
$N$ | Number of moduli in $\mathcal{B}$ (size of $\mathcal{B}$). |
$B$ | A $2n$-bit integer, the product of two RNS channels. |
$B_H$ | The $n$ most significant bits of $B$, i.e., $B_H=\left\lfloor \frac{B}{2^{n}}\right\rfloor$. |
$B_L$ | The $n$ least significant bits of $B$, i.e., $B_L=B \bmod 2^{n}$. |
$B_{HH_i}$ | The $t_i$ most significant bits of $2^{t_i}B_H$, i.e., $B_{HH_i}=\left\lfloor \frac{B_H}{2^{n-t_i}}\right\rfloor$. |
$B_{HL_i}$ | The $n$ least significant bits of $2^{t_i}B_H$, i.e., $B_{HL_i}=2^{t_i}B_H \bmod 2^{n}$. |
$A$ | Accumulator in Algorithm 1 and in Figures A1, A2, and A3. |
$X,Y$ | Integers that meet the condition $0\le X\cdot Y<M$. |
$Z$ | An integer, the product of $X$ and $Y$. |
$x_i$ | The residue of integer $X$ in channel $m_i$, i.e., $x_i=X \bmod m_i$. |
$\langle Z\rangle_p$ | Mod operation $Z \bmod p$. |
$RNS(X)$ | The RNS function. Returns the RNS representation of integer $X$. |
$\{x_1,x_2,\dots,x_N\}$ | RNS representation of integer $X$. |
$\left[\begin{array}{c}x_1\\ x_2\\ \vdots \\ x_N\end{array}\right]$ | RNS representation of integer $X$. |
$(b_{n-1}b_{n-2}\dots b_0)$ | Binary representation of an $n$-bit integer $B$ ($b_i\in \{0,1\}$). |
$\Vert$ | Bit concatenation operation. |
$\lceil u\rceil$ | The function $ceil(u)$. |
$\lfloor u\rfloor$ | The function $floor(u)$. |
$W$ | Bit-length of modulus $p$, i.e., $W=\lceil \log_2 p\rceil$. |
$M$ | The dynamic range of the RNS moduli. $M=\prod_{i=1}^{N} m_i$. |
$M_i$ | Defined as $M_i=\frac{M}{m_i}$. |
$\widehat{M}$ | The effective dynamic range. $\widehat{M}=M(1-\mathsf{\Delta})$. |
$\mathsf{\Delta}$ | Correction factor used to calculate $\alpha$. In our design $\mathsf{\Delta}=\frac{1}{2^{4}}$. |
$Gi$ | $\gamma_i=\langle z_i\cdot M_i^{-1}\rangle_{m_i}$, $i\in \{1,\dots,8\}$. |
$Li$ | $\{\langle Gi\cdot \langle M_i\rangle_p\rangle_{m_1},\dots,\langle Gi\cdot \langle M_i\rangle_p\rangle_{m_N}\}$. |
$Ki$ | $\left\lfloor \frac{1}{2^{T}}\gamma_i\left\lfloor \frac{\langle M_i\rangle_p}{2^{W-T}}\right\rfloor \right\rfloor$. |
$K$ | The $\kappa$ accumulator. |
$AL$ | $\{\langle \alpha \cdot \langle -M\rangle_p\rangle_{m_1},\dots,\langle \alpha \cdot \langle -M\rangle_p\rangle_{m_N}\}$. |
$KP$ | $\{\kappa \cdot (-p_1),\dots,\kappa \cdot (-p_N)\}$. |
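The definitions of $B_H$, $B_L$, $B_{HH_i}$ and $B_{HL_i}$ suggest a shift-and-add reduction in each channel. Since $2^{n}\equiv 2^{t_i}+1 \pmod{m_i}$, applying this identity first to $B$ and then to $2^{t_i}B_H$ gives $B\equiv 2^{t_i}B_{HH_i}+B_{HH_i}+B_{HL_i}+B_H+B_L \pmod{m_i}$. The sketch below is one possible software rendering of that identity; the function name and the final subtraction loop are assumptions, not the paper's circuit.

```python
def reduce_channel(B, n, t):
    """Reduce a 2n-bit product B modulo m = 2^n - 2^t - 1 using shifts and adds."""
    m = (1 << n) - (1 << t) - 1
    mask = (1 << n) - 1
    B_H = B >> n                # n most significant bits of B
    B_L = B & mask              # n least significant bits of B
    B_HH = B_H >> (n - t)       # t most significant bits of 2^t * B_H
    B_HL = (B_H << t) & mask    # n least significant bits of 2^t * B_H
    # 2^n = 2^t + 1 (mod m), applied once to B and once to 2^t * B_H.
    s = (B_HH << t) + B_HH + B_HL + B_H + B_L
    while s >= m:               # the sum is only a few multiples of m at most
        s -= m
    return s
```

For example, `reduce_channel(a * b, 66, 9)` should equal `(a * b) % (2**66 - 2**9 - 1)` for any residues `a` and `b` below the modulus.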

## References

1. Svoboda, A.; Valach, M. Circuit operators. Inf. Process. Mach. **1957**, 3, 247–297.
2. Garner, H.L. The Residue Number System. In Proceedings of the Western Joint Computer Conference, San Francisco, CA, USA, 3–5 March 1959.
3. Mohan, P.V.A. Residue Number Systems: Theory and Applications; Springer: New York, NY, USA, 2016.
4. Rivest, R.; Shamir, A.; Adleman, L. A method for obtaining digital signatures and public key cryptosystems. Comm. ACM **1978**, 21, 120–126.
5. Bajard, J.C.; Imbert, L. A full RNS implementation of RSA. IEEE Trans. Comput. **2004**, 53, 769–774.
6. Fadulilahi, I.R.; Bankas, E.K.; Ansuura, J.B.A.K. Efficient Algorithm for RNS Implementation of RSA. Int. J. Comput. Appl. **2015**, 127, 975–8887.
7. Posch, K.C.; Posch, R. Modulo reduction in residue number systems. IEEE Trans. Parallel Distrib. Syst. **1995**, 6, 449–454.
8. Montgomery, P. Modular Multiplication Without Trial Division. Math. Comput. **1985**, 44, 519–521.
9. Bajard, J.C.; Didier, L.S.; Kornerup, P. An RNS Montgomery modular multiplication algorithm. IEEE Trans. Comput. **1998**, 47, 766–776.
10. Shenoy, P.P.; Kumaresan, R. Fast base extension using a redundant modulus in RNS. IEEE Trans. Comput. **1989**, 38, 292–297.
11. Bajard, J.C.; Didier, L.S.; Kornerup, P. Modular Multiplication and Base Extensions in Residue Number Systems. In Proceedings of the 15th IEEE Symposium on Computer Arithmetic, Vail, CO, USA, 11–13 June 2001.
12. Kawamura, S.; Koike, M.; Sano, F.; Shimbo, A. Cox-Rower Architecture for Fast Parallel Montgomery Multiplication. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Bruges, Belgium, 14–18 May 2000.
13. Bajard, J.C.; Merkiche, N. Double Level Montgomery Cox-Rower Architecture, New Bounds. In Proceedings of the 13th Smart Card Research and Advanced Application Conference, Paris, France, 5–7 November 2014.
14. Bajard, J.C.; Eynard, J.; Merkiche, N. Montgomery reduction within the context of residue number system arithmetic. J. Cryptogr. Eng. **2018**, 8, 189–200.
15. Esmaeildoust, M.; Schinianakis, D.; Javashi, H.; Stouraitis, T.; Navi, K. Efficient RNS implementation of elliptic curve point multiplication over GF(p). IEEE Trans. Very Large Scale Integr. Syst. **2012**, 21, 1545–1549.
16. Guillermin, N. A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over Fp. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Santa Barbara, CA, USA, 17–20 August 2010.
17. Schinianakis, D.; Stouraitis, T. A RNS Montgomery Multiplication Architecture. In Proceedings of the IEEE International Symposium of Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, 15–18 May 2011.
18. Kawamura, S.; Komano, Y.; Shimizu, H.; Yonemura, T. RNS Montgomery reduction algorithms using quadratic residuosity. J. Cryptogr. Eng. **2018**, 1, 1–19.
19. Phillips, B.; Kong, Y.; Lim, Z. Highly parallel modular multiplication in the residue number system using sum of residues reduction. Appl. Algebra Eng. Commun. Comput. **2010**, 21, 249–255.
20. Asif, S.; Kong, Y. Highly Parallel Modular Multiplier for Elliptic Curve Cryptography in Residue Number System. Circuits Syst. Signal Process. **2017**, 36, 1027–1051.
21. Standards for Efficient Cryptography SEC2: Recommended Elliptic Curve Domain Parameters. Version 2.0 CERTICOM Corp. 27 January 2010. Available online: https://www.secg.org/sec2-v2.pdf (accessed on 1 May 2019).
22. Ed25519: High-Speed High-Security Signatures. Available online: https://ed25519.cr.yp.to/ (accessed on 1 May 2019).
23. Bajard, J.C.; Kaihara, M.E.; Plantard, T. Selected RNS bases for modular multiplication. In Proceedings of the 19th IEEE Symposium on Computer Arithmetic, Portland, OR, USA, 8–10 June 2009.
24. Molahosseini, A.S.; de Sousa, L.S.; Chang, C.H. Embedded Systems Design with Special Arithmetic and Number Systems; Springer: New York, NY, USA, 2017.
25. Barrett, P. Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor. In Proceedings of the Conference on the Theory and Application of Cryptographic Techniques, Linkoping, Sweden, 20–22 May 1986.
26. Asif, S.; Hossain, M.S.; Kong, Y.; Abdul, W. A Fully RNS based ECC Processor. Integration **2018**, 61, 138–149.
27. Asif, S. High-Speed Low-Power Modular Arithmetic for Elliptic Curve Cryptosystems Based on the Residue Number System. Ph.D. Thesis, Macquarie University, Sydney, Australia, 2016.

**Figure 3.** Implementation of ${\langle \kappa \langle -p\rangle \rangle}_{{m}_{i}}$ in architectures SOR_1M_N and SOR_1M_P (top) and in architecture SOR_2M (bottom).

${2}^{66}-1$ | ${2}^{66}-{2}^{2}-1$ | ${2}^{66}-{2}^{3}-1$ | ${2}^{66}-{2}^{4}-1$ |

${2}^{66}-{2}^{5}-1$ | ${2}^{66}-{2}^{6}-1$ | ${2}^{66}-{2}^{8}-1$ | ${2}^{66}-{2}^{9}-1$ |

Curve | Modulus $p$ | $N$ | $n$ | $\frac{2^{W-n}}{N}$ | $\epsilon$ |
---|---|---|---|---|---|
ED25519 | $2^{255}-19$ | 8 | 66 | $2^{186}$ | 19 |
SECP160K1 | $2^{160}-2^{32}-21389$ | 5 | 66 | $\frac{2^{94}}{5}$ | $2^{32}+21389$ |
SECP160R1 | $2^{160}-2^{31}-1$ | 5 | 66 | $\frac{2^{94}}{5}$ | $2^{31}+1$ |
SECP192K1 | $2^{192}-2^{32}-4553$ | 6 | 66 | $\frac{2^{125}}{3}$ | $2^{32}+4553$ |
SECP192R1 | $2^{192}-2^{64}-1$ | 6 | 66 | $\frac{2^{125}}{3}$ | $2^{64}+1$ |
SECP224K1 | $2^{224}-2^{32}-6803$ | 7 | 66 | $\frac{2^{158}}{7}$ | $2^{32}+6803$ |
SECP224R1 | $2^{224}-2^{96}+1$ | 7 | 66 | $\frac{2^{158}}{7}$ | $2^{96}-1$ |
SECP256K1 | $2^{256}-2^{32}-977$ | 8 | 66 | $2^{187}$ | $2^{32}+977$ |
SECP384R1 | $2^{384}-2^{128}-2^{96}+2^{32}-1$ | 12 | 66 | $\frac{2^{316}}{3}$ | $2^{128}+2^{96}-2^{32}+1$ |
SECP521R1 | $2^{521}-1$ | 16 | 66 | $2^{451}$ | 1 |
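The table entries can be recomputed from $p$ alone. The sketch below assumes that $N$ is the smallest value allowed by $N\ge \lceil \frac{2W}{n}\rceil$ and that $\epsilon$ is defined by $p=2^{W}-\epsilon$; both readings are inferred from the table rather than stated in it, and the function name is hypothetical. Only a subset of the curves is shown.

```python
from fractions import Fraction

CURVES = {
    "ED25519": 2**255 - 19,
    "SECP192R1": 2**192 - 2**64 - 1,
    "SECP224R1": 2**224 - 2**96 + 1,
    "SECP256K1": 2**256 - 2**32 - 977,
    "SECP521R1": 2**521 - 1,
}


def parameters(p, n=66):
    """Return (W, N, 2^(W-n)/N, epsilon) for a prime p and channel width n."""
    W = p.bit_length()        # equals ceil(log2 p) because p is not a power of two
    N = -(-2 * W // n)        # ceil(2W / n): smallest admissible number of moduli
    eps = 2**W - p            # assumed definition: p = 2^W - eps
    return W, N, Fraction(2**(W - n), N), eps


for name, p in CURVES.items():
    W, N, ratio, eps = parameters(p)
    print(name, W, N, ratio, eps)
```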

Unit | Device | Max. Logic Delay (ns) | Max. Net Delay (ns) | Max. Achieved Core Freq. (MHz) |
---|---|---|---|---|
RNS Multiplier | ARTIX 7 | 16.206 | 5.112 | 109.00 |
RNS Adder | ARTIX 7 | 6.017 | 2.303 | 109.00 |
RNS Multiplier | VIRTEX 7 | 11.525 | 3.793 | 125.00 |
RNS Adder | VIRTEX 7 | 3.931 | 1.469 | 125.00 |
RNS Multiplier | VIRTEX UltraScale+ | 5.910 | 4.099 | 185.18 |
RNS Adder | VIRTEX UltraScale+ | 2.139 | 2.454 | 185.18 |
RNS Multiplier | KINTEX 7 | 11.964 | 4.711 | 116.27 |
RNS Adder | KINTEX 7 | 4.613 | 1.599 | 116.27 |
RNS Multiplier | KINTEX UltraScale+ | 5.789 | 4.099 | 187.13 |
RNS Adder | KINTEX UltraScale+ | 2.018 | 2.454 | 187.13 |

Architecture | FPGA Platform | Clk Frequency (MHz) | Latency (ns) | Area (KLUTs),(FFs),(DSPs) | Throughput (Mbps) |
---|---|---|---|---|---|
SOR_1M_N | ARTIX 7 | 92.5 | 335 | (8.17),(3758),(140) | 1671 |
SOR_1M_N | VIRTEX 7 | 128.8 | 241 | (8.17),(3758),(140) | 2323 |
SOR_1M_N | KINTEX 7 | 117.67 | 263 | (8.29),(3758),(140) | 2129 |
SOR_1M_N | VIRTEX US+ ^{1} | 192 | 157 | (8.14),(3758),(140) | 3567 |
SOR_1M_N | KINTEX US+ | 198 | 156.5 | (8.29),(3758),(140) | 3578 |
SOR_1M_P | ARTIX 7 | 92.5 | 259.5 | (8.73),(4279),(140) | 2158 |
SOR_1M_P | VIRTEX 7 | 138.8 | 173 | (8.73),(4279),(140) | 3237 |
SOR_1M_P | KINTEX 7 | 117.6 | 204 | (8.89),(4279),(140) | 2745 |
SOR_1M_P | VIRTEX US+ | 185.18 | 130 | (8.71),(4279),(140) | 4307 |
SOR_1M_P | KINTEX US+ | 187.13 | 128.3 | (8.89),(4279),(140) | 4364 |
SOR_2M | ARTIX 7 | 92.5 | 194.6 | (10.11),(4797),(280) | 2877 |
SOR_2M | VIRTEX 7 | 128.5 | 140 | (10.11),(4797),(280) | 3998 |
SOR_2M | KINTEX 7 | 121.9 | 147.6 | (10.27),(4797),(280) | 3794 |
SOR_2M | VIRTEX US+ | 185.18 | 97.3 | (10.11),(4797),(280) | 5761 |
SOR_2M | KINTEX US+ | 187.13 | 96.3 | (10.26),(4797),(280) | 5821 |

^{1} US+: UltraScale+™.
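The latency and clock columns can be related through the number of clock cycles per modular multiplication, since latency is roughly cycles divided by frequency. The sketch below derives an implied cycle count from the ARTIX 7 rows; the cycle counts are an inference from the tabulated numbers, not figures reported in the text, and the helper name is hypothetical.

```python
# (architecture, clock in MHz, latency in ns) copied from the ARTIX 7 rows above.
ROWS = [
    ("SOR_1M_N", 92.5, 335.0),
    ("SOR_1M_P", 92.5, 259.5),
    ("SOR_2M", 92.5, 194.6),
]


def implied_cycles(freq_mhz, latency_ns):
    """Clock cycles implied by a latency at a given clock frequency."""
    return latency_ns * freq_mhz / 1000.0


for name, freq, latency in ROWS:
    print(f"{name}: about {implied_cycles(freq, latency):.2f} cycles")
```

On these rows the implied counts come out close to whole numbers, with fewer cycles for the pipelined and two-multiplier architectures, which is consistent with their lower latency at the same clock.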

Design | Platform | Clk Frequency (MHz) | Latency (ns) | Area (KLUT),(DSP) | Throughput (Mbps) |
---|---|---|---|---|---|
MM_PA_P [20] | VIRTEX 6 | 71.40 | 14.20 | (36.5),(2016) ^{1} | 14798 |
MM_PA_N [20] | VIRTEX 6 | 21.16 | 47.25 | (34.34),(2016) ^{1} | 5120 |
MM_PA_P [27] | VIRTEX 7 | 62.11 | 48.3 | (29.17),(2799) | 15900 |
MM_SPA [27] | VIRTEX 7 | 54.34 | 239.2 | (11.43),(512) | 1391 |
(Ours) SOR_1M_P | VIRTEX 7 | 138.8 | 173 | (8.73),(140) | 3237 |
(Ours) SOR_2M | VIRTEX 7 | 128.5 | 140 | (10.11),(280) | 3998 |
sQ-RNS ^{2} | KINTEX US+ | 139.5 | 107.53(150.53) | (4.247),(84) | 4835 ^{2} |
dQ-RNS [18] | KINTEX US+ | 142.7 | 126.14(168.18) ^{2} | (4.076),(84) | 4122 ^{2} |
(Ours) SOR_1M_P | KINTEX US+ | 187.13 | 128.3 | (8.89),(140) | 4364 |
(Ours) SOR_2M | KINTEX US+ | 187.13 | 96.3 | (10.26),(280) | 5821 |

^{1} Area reported in [27].

^{2} Our estimation.

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mehrabi, M.A.
Improved Sum of Residues Modular Multiplication Algorithm. *Cryptography* **2019**, *3*, 14.
https://doi.org/10.3390/cryptography3020014
