Review of Modular Multiplication Algorithms over Prime Fields for Public-Key Cryptosystems
Abstract
1. Introduction
- (1) The commonly used multiplication algorithms are systematically analyzed and classified into four categories according to their implementation principles, and the advantages and disadvantages of each algorithm are summarized.
- (2) Modular reduction algorithms for modular multiplication are summarized in two categories according to the form of the modulus, with a focus on reduction algorithms for specific prime numbers, and the advantages and disadvantages of each reduction algorithm are discussed.
- (3) Four types of modular multiplication algorithms are analyzed and reviewed, and the research progress for each is outlined. Their hardware implementations are then analyzed and compared to provide guidance for the analysis, design, and future research of modular multiplication algorithms.
2. Multiplication Algorithms
2.1. Basic Algorithms
2.1.1. Schoolbook Multiplication
2.1.2. Comba Multiplication
Algorithm 1. Comba multiplication
Input: …, … Output: …
1: for … to … do
2:   if … then
3:     …
4:   else
5:     …
6:   end if
7: end for
8: return …
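As a hedged illustration of the column-wise accumulation that Comba multiplication relies on, the following Python sketch shows one common product-scanning formulation. It is not the authors' exact listing: the word size `w`, the little-endian word lists, and the helper names `to_words` and `from_words` are assumptions introduced here.

```python
def to_words(x, n, w=32):
    """Split a non-negative integer into n little-endian w-bit words (illustrative helper)."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def from_words(words, w=32):
    """Recombine little-endian w-bit words into an integer (illustrative helper)."""
    return sum(word << (w * i) for i, word in enumerate(words))


def comba_mul(a, b, w=32):
    """Product-scanning multiplication of two n-word operands."""
    n = len(a)
    assert len(b) == n
    mask = (1 << w) - 1
    c = [0] * (2 * n)
    acc = 0  # column accumulator; also carries the overflow into the next column
    for k in range(2 * n - 1):
        # Columns in the lower half start at word 0; upper-half columns start later.
        if k < n:
            lo, hi = 0, k
        else:
            lo, hi = k - n + 1, n - 1
        for i in range(lo, hi + 1):
            acc += a[i] * b[k - i]
        c[k] = acc & mask  # each result word is written exactly once
        acc >>= w
    c[2 * n - 1] = acc
    return c
```

Each output word is stored once after its whole column has been summed, which is where the reduction in carry propagation and memory accesses comes from.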
2.2. Algorithms Based on Divide and Conquer
2.2.1. Karatsuba Multiplication
Algorithm 2. Karatsuba multiplication
Input: …, …, … with … Output: …
1: …
2: …
3: …
4: …
5: …
6: …
7: return …
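The three-multiplication idea behind Karatsuba multiplication can be sketched in Python as follows. This is a minimal illustration on non-negative integers, not a reproduction of Algorithm 2: the split point `h`, the base-case threshold `cutoff_bits`, and the intermediate names `z0`, `z1`, `z2` are assumptions chosen for readability.

```python
def karatsuba(a, b, cutoff_bits=64):
    """Multiply non-negative integers using three half-size products per level."""
    if a.bit_length() <= cutoff_bits or b.bit_length() <= cutoff_bits:
        return a * b  # base case: small operands go to the native multiplier
    h = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << h) - 1
    a1, a0 = a >> h, a & mask  # a = a1 * 2^h + a0
    b1, b0 = b >> h, b & mask  # b = b1 * 2^h + b0
    z2 = karatsuba(a1, b1, cutoff_bits)
    z0 = karatsuba(a0, b0, cutoff_bits)
    # The middle term reuses z2 and z0 instead of computing two more products.
    z1 = karatsuba(a0 + a1, b0 + b1, cutoff_bits) - z2 - z0
    return (z2 << (2 * h)) + (z1 << h) + z0
```

The extra additions and the recursion bookkeeping visible here are the overhead that offsets part of the saved multiplication, which is why a base-case threshold is normally used.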
2.2.2. Toom–Cook Multiplication
2.2.3. NTT Multiplication
Algorithm 3. NTT-based polynomial multiplication
Input: …, …, … with … Output: …
1: …, …
2: …, …
3: …
4: return …
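A small Python sketch can illustrate the transform, pointwise multiply, and inverse transform pattern of NTT-based polynomial multiplication. It assumes a toy modulus `q = 17` with `omega = 2` as a primitive 8th root of unity, a power-of-two length, and cyclic convolution (reduction modulo x^n - 1); these parameters and the function names are illustrative choices, not values from the paper. The inverse uses three-argument `pow` with a negative exponent, which needs Python 3.8 or later.

```python
def ntt(a, omega, q):
    """Recursive radix-2 NTT of a list whose length is a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % q
        out[i] = (even[i] + t) % q           # butterfly: upper output
        out[i + n // 2] = (even[i] - t) % q  # butterfly: lower output
        w = w * omega % q
    return out


def ntt_mul(a, b, omega, q):
    """Cyclic product of two length-n polynomials over Z_q via the NTT."""
    n = len(a)
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    fc = [x * y % q for x, y in zip(fa, fb)]  # pointwise multiplication
    c = ntt(fc, pow(omega, -1, q), q)         # inverse transform uses omega^-1
    n_inv = pow(n, -1, q)
    return [x * n_inv % q for x in c]
```

To obtain an ordinary (non-cyclic) product, the inputs can be zero-padded so that the degree of the result stays below n.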
2.3. Partial Product Optimization Techniques
2.3.1. Booth Encoding
2.3.2. RSD Representation
2.4. Comparative Analysis of Multiplication Algorithms
3. Modular Reduction Algorithms
3.1. Modular Reduction for General Moduli
3.1.1. Barrett Reduction
Algorithm 4. Barrett reduction
Input: …, …, …, … Output: …
1: …
2: …
3: if … then
4:   …
5: end if
6: while … do …
7: return …
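To make the quotient-estimation idea behind Barrett reduction concrete, here is a minimal Python sketch. It assumes radix 2, a precomputed constant `mu = floor(2^(2k) / m)` with `k` the bit length of `m`, and an input below `2^(2k)`; the function names are illustrative, not taken from the paper. The remainder is computed exactly rather than modulo a power of the radix, so the negative-result branch of the classical formulation is not needed in this variant.

```python
def barrett_setup(m):
    """Precompute k (bit length of m) and mu = floor(2^(2k) / m)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_reduce(x, m, k, mu):
    """Return x mod m for 0 <= x < 2^(2k) without dividing by m."""
    # Quotient estimate: two shifts and one multiplication replace the division.
    q = ((x >> (k - 1)) * mu) >> (k + 1)
    r = x - q * m
    # The estimate can undershoot the true quotient, so correct by subtraction.
    while r >= m:
        r -= m
    return r
```

The correction loop is the "multiple result corrections" cost of the method: the estimate is never too large, but it can be slightly too small.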
3.1.2. Montgomery Reduction
Algorithm 5. Montgomery reduction
Input: …, … Output: …
1: …
2: …
3: if … then
4:   …
5: end if
6: return …
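The following Python sketch illustrates Montgomery reduction under stated assumptions: an odd modulus `n`, `R = 2^k` with `R > n`, and a precomputed `n_prime = -n^(-1) mod R`. The names `montgomery_setup` and `montgomery_reduce` are hypothetical, and the code is a sketch of the textbook method rather than the paper's listing. It relies on three-argument `pow` with a negative exponent (Python 3.8 or later).

```python
def montgomery_setup(n, k):
    """Return R = 2^k and n_prime = -n^(-1) mod R for an odd modulus n < R."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def montgomery_reduce(t, n, k, n_prime):
    """Return t * R^(-1) mod n for 0 <= t < n * R, using only shifts and masks for R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R is a shift
    if u >= n:                         # u < 2n, so one subtraction suffices
        u -= n
    return u
```

A typical use maps operands into the Montgomery domain (`a * R mod n`), multiplies and reduces there, and applies one more reduction to map the result back, which is the domain-transformation overhead of the method.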
3.2. Modular Reduction for Specific Prime Numbers
3.2.1. Fast Modular Reduction for Mersenne Primes
Algorithm 6. Fast modular reduction for …
Input: … Output: …
1: …
2: …
3: …
4: …
5: …
6: …
7: …
8: …
9: …
10: return …
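The prime targeted by Algorithm 6 is not recoverable from the listing, so the sketch below uses a plain Mersenne prime p = 2^k - 1 as a stand-in to show how reduction turns into shifts and additions. The function name `mersenne_reduce` and the choice of k are assumptions for illustration; generalized Mersenne primes such as the NIST primes follow the same principle with more terms.

```python
def mersenne_reduce(x, k):
    """Return x mod (2^k - 1) for x >= 0 using only masks, shifts, and additions."""
    p = (1 << k) - 1
    # Since 2^k = 1 (mod p), the high part can simply be folded onto the low part.
    while x > p:
        x = (x & p) + (x >> k)
    return 0 if x == p else x
```

For a double-length product the loop runs only a couple of times, so in hardware it is usually unrolled into a fixed number of adder stages.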
3.2.2. Modular Reduction in NTT Multiplication
Algorithm 7. K-RED reduction
Input: … Output: …
1: …
2: …
3: return …
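As a hedged illustration of K-RED as described by Longa and Naehrig, the Python sketch below assumes a prime of the form q = k * 2^m + 1 and uses the Kyber modulus 3329 = 13 * 2^8 + 1 as an example. The function name and default parameters are assumptions made here. Note that the result is congruent to k * c modulo q, not to c itself, and it is not fully reduced into [0, q).

```python
def k_red(c, k=13, m=8):
    """Return a small value congruent to k * c modulo q = k * 2^m + 1."""
    c0 = c & ((1 << m) - 1)  # low m bits of c
    c1 = c >> m              # remaining high bits of c
    # Because k * 2^m = -1 (mod q), k*c = k*c0 + k*2^m*c1 = k*c0 - c1 (mod q).
    return k * c0 - c1
```

In NTT implementations the extra factor k is usually compensated for in precomputed constants such as the twiddle factors, so no separate correction step is needed.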
3.3. Comparative Analysis of Modular Reduction Algorithms
4. Modular Multiplication Algorithms
4.1. Blakley-Based Interleaved Modular Multiplication
Algorithm 8. Interleaved modular multiplication
Input: … Output: …
1: …
2: for … down to 0 do
3:   …
4:   …
5:   …
6:   if … then
7:     …
8:   else if … then
9:     …
10:   else
11:   end if
12:   …
13: end for
return …
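The interleaving of shift, add, and reduce steps can be sketched in Python as below. This is a radix-2 illustration in the spirit of Blakley's method under the assumption 0 <= a, b < m; the exact statements of Algorithm 8 are not reproduced, and the function name is hypothetical.

```python
def interleaved_modmul(a, b, m):
    """Return a * b mod m by scanning a from its most significant bit."""
    assert 0 <= a < m and 0 <= b < m
    p = 0
    for i in range(m.bit_length() - 1, -1, -1):
        # Shift the partial result and add b when the current bit of a is set.
        p = 2 * p + ((a >> i) & 1) * b
        # p was below m before this step, so p < 3m now: at most 2m must be removed.
        if p >= 2 * m:
            p -= 2 * m
        elif p >= m:
            p -= m
    return p
```

Only additions, subtractions, and comparisons are used, but each iteration depends on the previous one and operates on full-width values, which is the source of the long path delay noted for this method.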
4.2. Barrett Modular Multiplication
Algorithm 9. Barrett modular multiplication
Input: … Output: …
1: …
2: …
3: …
4: …
7: if … then
8:   …
9: else if … then
10:   …
11: else
12: end if
return …
4.3. Montgomery Modular Multiplication
Algorithm 10. Radix-2 Montgomery modular multiplication
Input: … Output: …
1: …
2: for … to … do
3:   …
4:   …
5: end for
6: return …
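A bit-serial Montgomery multiplication can be sketched in Python as follows. The sketch assumes an odd modulus `m`, operands below `m`, and an iteration count `n` of at least the bit length of `m`; it ends with one conditional subtraction, which some variants omit. The function name is an assumption, and the check at the end of the usage relies on `pow` with a negative exponent (Python 3.8 or later).

```python
def mont_mul_radix2(a, b, m, n):
    """Return a * b * 2^(-n) mod m for odd m, 0 <= a, b < m, n >= m.bit_length()."""
    s = 0
    for i in range(n):
        s += ((a >> i) & 1) * b  # add b when bit i of a is set
        if s & 1:                # make s even by adding the odd modulus
            s += m
        s >>= 1                  # exact division by 2
    if s >= m:                   # s stays below 2m throughout the loop
        s -= m
    return s
```

For example, with `m = 13` and `n = 4`, `mont_mul_radix2(a, b, 13, 4)` equals `a * b * pow(2, -4, 13) % 13` for all operands below 13.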
Algorithm 11. Multi-word radix-2 Montgomery modular multiplication
Input: … Output: …, …
1: …
2: for … to … do
3:   …
4:   …
5:   for … to … do
6:     …
7:     …
8:     …
9:   end for
10: end for
return …
Algorithm 12. High-radix Montgomery modular multiplication
Input: … Output: …, …
1: …
2: for … to … do
3:   …
4:   …
5:   for … to … do
6:     …
7:     …
8:   end for
9:   …
10: end for
return …
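The step from radix 2 to a higher radix can be illustrated with the Python sketch below, which processes `k` bits of one operand per iteration. It is a simplified single-loop version (the inner word loop of a multi-word design is folded into Python's big-integer arithmetic), and the names `k`, `d`, and `m_prime` are assumptions: `k` is the digit width, `d` the number of digits, and `m_prime = -m^(-1) mod 2^k`. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
def mont_mul_high_radix(a, b, m, k, d):
    """Return a * b * 2^(-k*d) mod m for odd m, 0 <= a, b < m, and a < 2^(k*d)."""
    mask = (1 << k) - 1
    m_prime = (-pow(m, -1, 1 << k)) & mask  # -m^(-1) mod 2^k
    s = 0
    for i in range(d):
        a_i = (a >> (k * i)) & mask         # i-th radix-2^k digit of a
        s += a_i * b
        q = ((s & mask) * m_prime) & mask   # digit that clears the low k bits
        s = (s + q * m) >> k                # exact division by 2^k
    if s >= m:                              # s stays below 2m throughout the loop
        s -= m
    return s
```

Compared with the bit-serial form, the iteration count drops from about n to n / k at the cost of k-bit multiplications for the quotient digit and the partial products.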
4.4. Fast Modular Multiplication for Specific Prime Numbers
4.5. Comparative Analysis of Modular Multiplication
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- van Deursen, A. Learning from Apple’s #gotofail Security Bug. Available online: https://avandeursen.com/2014/02/22/gotofail-security/ (accessed on 11 June 2025).
- Nemec, M.; Sys, M.; Svenda, P.; Klinec, D.; Matyas, V. The return of Coppersmith’s attack: Practical factorization of widely used RSA moduli. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 30 October–3 November 2017; pp. 1631–1648.
- Saiyed, A.I. Hybrid Quantum-Classical Cryptographic Protocols: Enhancing Security in the Era of Quantum Supremacy. Spectr. Res. 2025, 5.
- Comba, P.G. Exponentiation cryptosystems on the IBM PC. IBM Syst. J. 1990, 29, 526–538.
- Karatsuba, A.A.; Ofman, Y.P. Multiplication of many-digital numbers by automatic computers. In Proceedings of the Doklady Akademii Nauk, Moscow, Russia, 18 March 1962; Russian Academy of Sciences: Moscow, Russia, 1962; pp. 293–294.
- Toom, A.L. The complexity of a scheme of functional elements simulating the multiplication of integers. In Proceedings of the Doklady Akademii Nauk, Moscow, Russia, 18 March 1963; Russian Academy of Sciences: Moscow, Russia, 1963; pp. 496–498.
- Cook, S.A.; Aanderaa, S.O. On the minimum computation time of functions. Trans. Am. Math. Soc. 1969, 142, 291–314.
- Cooley, J.W.; Tukey, J.W. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 1965, 19, 297–301.
- Booth, A.D. A signed binary multiplication technique. Q. J. Mech. Appl. Math. 1951, 4, 236–240.
- Avizienis, A. Signed-digit number representations for fast parallel arithmetic. IRE Trans. Electron. Comput. 1961, EC-10, 389–400.
- Barrett, P. Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In Proceedings of the Conference on the Theory and Application of Cryptographic Techniques, Berlin/Heidelberg, Germany, August 1986; Springer: Berlin/Heidelberg, Germany, 1986; pp. 311–323.
- Montgomery, P.L. Modular multiplication without trial division. Math. Comput. 1985, 44, 519–521.
- Ananyi, K.; Alrimeih, H.; Rakhmatov, D. Flexible hardware processor for elliptic curve cryptography over NIST prime fields. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2009, 17, 1099–1112.
- Blakley, G.R. A computer algorithm for calculating the product AB modulo M. IEEE Trans. Comput. 1983, 100, 497–500.
- Zoni, D.; Galimberti, A.; Fornaciari, W. Flexible and scalable FPGA-oriented design of multipliers for large binary polynomials. IEEE Access 2020, 8, 75809–75821.
- Kang, B.; Cho, H. Flexka: A flexible Karatsuba multiplier hardware architecture for variable-sized large integers. IEEE Access 2023, 11, 55212–55222.
- Rafferty, C.; O’Neill, M.; Hanley, N. Evaluation of large integer multiplication methods on hardware. IEEE Trans. Comput. 2017, 66, 1369–1382.
- Weimerskirch, A.; Paar, C. Generalizations of the Karatsuba algorithm for efficient implementations. Cryptol. ePrint Arch. 2006. Available online: http://eprint.iacr.org/2006/224 (accessed on 11 June 2025).
- Wong, Z.Y.; Wong, D.C.K.; Lee, W.K.; Mok, K.M.; Yap, W.S.; Khalid, A. KaratSaber: New speed records for Saber polynomial multiplication using efficient Karatsuba FPGA architecture. IEEE Trans. Comput. 2023, 72, 1830–1842.
- Heidarpur, M.; Mirhassani, M. An efficient and high-speed overlap-free Karatsuba-based finite-field multiplier for FPGA implementation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 667–676.
- Bodrato, M. Towards optimal Toom-Cook multiplication for univariate and multivariate polynomials in characteristic 2 and 0. In Proceedings of the Arithmetic of Finite Fields: First International Workshop, Madrid, Spain, 21–22 June 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 116–133.
- Bermudo Mera, M.; Karmakar, A.; Verbauwhede, I. Time-memory trade-off in Toom-Cook multiplication: An application to module-lattice based cryptography. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020, 2020, 222–244.
- Wang, J.; Yang, C.; Zhang, F.; Meng, Y.; Xiang, S.; Su, Y. A high-throughput Toom-Cook-4 polynomial multiplier for lattice-based cryptography using a novel Winograd-schoolbook algorithm. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 71, 359–372.
- Li, Y.; Zhu, J.; Huang, Y.; Liu, Z.; Tang, M. Single-trace side-channel attacks on the Toom-Cook: The case study of Saber. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022, 2022, 285–310.
- Zhou, X.; Chen, X.; He, Y.; Mou, X. A flexible-channel MDF architecture for pipelined radix-2 FFT. IEEE Access 2023, 11, 38023–38033.
- Yang, C.; Wu Xiang, S.; Liang, L.; Geng, L. A high-throughput and flexible architecture based on a reconfigurable mixed-radix FFT with twiddle factor compression and conflict-free access. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2023, 31, 1472–1485.
- Liu, M.; Zhao, P.; Wu, T.; Parhi, K.K.; Zeng, X.; Chen, Y. A low-power twiddle factor addressing architecture for split-radix FFT processor. Microelectron. J. 2021, 117, 105276.
- Zhang, N.; Yang, B.; Chen, C.; Yin, S.; Wei, S.; Liu, L. Highly efficient architecture of NewHope-NIST on FPGA using low-complexity NTT/INTT. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020, 2020, 49–72.
- Xing, Y.; Li, S. A compact hardware implementation of CCA-secure key exchange mechanism CRYSTALS-KYBER on FPGA. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 328–356.
- Guo, W.; Li, S. Split-radix based compact hardware architecture for CRYSTALS-Kyber. IEEE Trans. Comput. 2023, 73, 97–108.
- Li, D.; Pakala, A.; Yang, K. MeNTT: A compact and efficient processing-in-memory number theoretic transform (NTT) accelerator. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2022, 30, 579–588.
- Mu, J.; Ren, Y.; Wang, W.; Hu, Y.; Chen, S.; Chang, C.-H.; Fan, J.; Ye, J.; Cao, Y.; Li, H.; et al. Scalable and conflict-free NTT hardware accelerator design: Methodology, proof, and implementation. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 42, 1504–1517.
- Chen, X.; Yang, B.; Yin, S.; Wei, S.; Liu, L. CFNTT: Scalable radix-2/4 NTT multiplication architecture with an efficient conflict-free memory mapping scheme. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022, 2022, 94–126.
- Su, Y.; Yang, B.L.; Yang, C.; Yang, Z.P.; Liu, Y.W. A highly unified reconfigurable multicore architecture to speed up NTT/INTT for homomorphic polynomial multiplication. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2022, 30, 993–1006.
- Liu, S.H.; Kuo, C.Y.; Mo, Y.N.; Su, T. An area-efficient, conflict-free, and configurable architecture for accelerating NTT/INTT. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2023, 32, 519–529.
- RamaLakshmi, B.V.; Noorbasha, F. FPGA Implementation of Optimized Radix 4 and Radix 8 Booth Algorithm. Int. J. Perform. Eng. 2021, 17, 552.
- Zhu, D.; Zhang, R.; Ou, L.; Tian Wang, Z. Low-latency design and implementation of the squaring in class groups for verifiable delay function using redundant representation. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2023, 2023, 438–462.
- Bernstein, D.J. Curve25519: New Diffie-Hellman speed records. In Proceedings of the Public Key Cryptography-PKC 2006: 9th International Conference on Theory and Practice in Public-Key Cryptography, New York, NY, USA, 24–26 April 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 207–228.
- Huang, H.; Liu, Z. Design and implementation of high-speed scalar multiplier for multi-elliptic curve. J. Commun. 2020, 41, 100–109.
- Choi, P.; Lee, M.K.; Kim, H.; Kim, D.K. Low-complexity elliptic curve cryptography processor based on configurable partial modular reduction over NIST prime fields. IEEE Trans. Circuits Syst. II Express Briefs 2017, 65, 1703–1707.
- Choi, P.; Lee, M.K.; Kim, D.K. ECC coprocessor over a NIST prime field using fast partial Montgomery reduction. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 68, 1206–1216.
- Yaman, F.; Mert, A.C.; Öztürk, E.; Savaş, E. A hardware accelerator for polynomial multiplication operation of CRYSTALS-KYBER PQC scheme. In Proceedings of the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 1–5 February 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1020–1025.
- Aikata, A.; Mert, A.C.; Imran, M.; Pagliarini, S.; Roy, S.S. KaLi: A crystal for post-quantum security using Kyber and Dilithium. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 70, 747–758.
- Longa, P.; Naehrig, M. Speeding up the number theoretic transform for faster ideal lattice-based cryptography. In Proceedings of the International Conference on Cryptology and Network Security, Milan, Italy, 14–16 November 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 124–139.
- Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari-Kermani, M. High-speed NTT-based polynomial multiplication accelerator for post-quantum cryptography. In Proceedings of the 2021 IEEE 28th Symposium on Computer Arithmetic (ARITH), Lyngby, Denmark, 14–16 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 94–101.
- Li, L.; Tian, Q.; Qin, G.; Chen, S.; Wang, W. Compact Instruction Set Extensions for Dilithium. ACM Trans. Embed. Comput. Syst. 2024, 23, 1–21.
- Hossain, M.S.; Kong, Y.; Saeedi, E.; Vayalil, N.C. High-performance elliptic curve cryptography processor over NIST prime fields. IET Comput. Digit. Tech. 2017, 11, 33–42.
- Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access 2020, 8, 73898–73906.
- Javeed, K.; El-Moursy, A.; Gregg, D. EC-crypto: Highly efficient area-delay optimized elliptic curve cryptography processor. IEEE Access 2023, 11, 56649–56662.
- Rahman, M.S.; Halder, K.K. Area-Time Effective Modular Multiplication for Elliptic Curve Cryptography. In Proceedings of the 2023 International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM), Gazipur, Bangladesh, 16–17 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
- Kudithi, T.; Potdar, M.; Sakthivel, R. Radix-4 interleaved modular multiplication for cryptographic applications. In Proceedings of the 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), Vellore, India, 30–31 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5.
- Lin, L.; Zheng, P.Y.; Chao, P.C.P. A new ECC implemented by FPGA with favorable combined performance of speed and area for lightweight IoT edge devices. Microsyst. Technol. 2024, 30, 1537–1546.
- Madani, B.; Azzaz, M.S.; Sadoudi, S.; Kaibou, R. High-Speed FPGA Implementation of Modular Multiplication Over Prime Field. In Proceedings of the 2024 1st International Conference on Electrical, Computer, Telecommunication and Energy Technologies (ECTE-Tech), Oum El Bouaghi, Algeria, 17–18 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
- Dhem, F.; Quisquater, J. Recent results on modular multiplications for smart cards. In Proceedings of the International Conference on Smart Card Research and Advanced Applications, Amsterdam, The Netherlands, 16–18 September 1998; Springer: Berlin/Heidelberg, Germany, 1998; pp. 336–352.
- Hao, Y.; Wang, W.; Dang, H.; Wang, G. Efficient Barrett modular multiplication based on Toom-Cook multiplication. IEEE Trans. Circuits Syst. II Express Briefs 2023, 71, 862–866.
- Yu, B.; Huang, H.; Liu, Z.; Zhao, S.; Na, N. High-performance hardware architecture design and implementation of Ed25519 algorithm. J. Electron. Inf. Technol. 2021, 43, 1821–1827.
- Agrawal, R.; Yang Javaid, H. Efficient FPGA-based ECDSA verification engine for permissioned blockchains. In Proceedings of the 2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP), Gothenburg, Sweden, 12–14 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 148–155.
- Zhang, B.; Cheng, Z.; Pedram, M. A high-performance low-power Barrett modular multiplier for cryptosystems. In Proceedings of the 2021 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Boston, MA, USA, 26–28 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
- Zhang, B.; Cheng, Z.; Pedram, M. Design of a high-performance iterative Barrett modular multiplier for crypto systems. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2024, 32, 897–910.
- Zhang, Q.; He, W.; Yang, R. Efficient configurable modular multiplier for RNS. In Proceedings of the 2023 8th International Conference on Integrated Circuits and Microsystems (ICICM), Nanjing, China, 20–23 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 224–228.
- Zhang, B.; Yan, S. Area-efficient Barrett modular multiplication with optimized Karatsuba algorithm. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2024, 43, 4626–4639.
- Xu, T.; Cui, Y.; Liu, D.; Wang, C.; Liu, W. Lightweight and efficient hardware implementation for Saber using NTT multiplication. In Proceedings of the 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Shenzhen, China, 11–13 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 601–605.
- Guo, W.; Li, S. Highly-efficient hardware architecture for CRYSTALS-Kyber with a novel conflict-free memory access pattern. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 70, 4505–4515.
- Koc, C.K.; Acar, T.; Kaliski, B.S. Analyzing and comparing Montgomery multiplication algorithms. IEEE Micro 1996, 16, 26–33.
- Mrabet, A.; El-Mrabet, N.; Lashermes, R.; Rigaud, B.; Bouallegue, B.; Mesnager, S.; Machhout, M. A scalable and systolic architectures of Montgomery modular multiplication for public key cryptosystems based on DSPs. J. Hardw. Syst. Secur. 2017, 1, 219–236.
- Gallin, G.; Tisserand, A. Generation of finely-pipelined GF (P) multipliers for flexible curve based cryptography on FPGAs. IEEE Trans. Comput. 2019, 68, 1612–1622.
- Botrel, G.; El Housni, Y. Faster Montgomery multiplication and multi-scalar-multiplication for SNARKs. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2023, 2023, 504–521.
- Buhrow, B.; Gilbert, B.; Haider, C. Parallel modular multiplication using 512-bit advanced vector instructions: RSA fault-injection countermeasure via interleaved parallel multiplication. J. Cryptogr. Eng. 2022, 12, 95–105.
- Walter, C.D. Montgomery exponentiation needs no final subtractions. Electron. Lett. 1999, 35, 1831–1832.
- Kuang, S.R.; Wu, K.Y.; Lu, R.Y. Low-cost high-performance VLSI architecture for Montgomery modular multiplication. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2015, 24, 434–443.
- Coliban, R.M. Fast Radix-2 Montgomery modular multiplication on FPGA using ternary adder. In Proceedings of the 2022 International Conference on Computing, Electronics & Communications Engineering (iCCECE), Southend, UK, 17–18 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
- Abirami, T.; Saravanan, S.; Rajeshkumar, A.; Santhosh, K.M. FPGA–based Optimized Design of Montgomery Modular Multiplier using Karatsuba Algorithm. In Proceedings of the 2023 Second International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 2–4 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 131–135.
- Tenca, A.F.; Koç, Ç.K. A scalable architecture for modular multiplication based on Montgomery’s algorithm. IEEE Trans. Comput. 2003, 52, 1215–1221.
- Li, H.; Ren, S.; Wang, W.; Zhang Wang, X. A low-cost high-performance Montgomery modular multiplier based on pipeline interleaving for IoT devices. Electronics 2023, 12, 3241.
- Mert, A.C.; Karabulut, E.; Öztürk, E.; Savaş, E.; Becchi, M.; Aysu, A. A flexible and scalable NTT hardware: Applications from homomorphically encrypted deep learning to post-quantum cryptography. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 9–13 March 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 346–351.
- Satoh, A.; Takano, K. A scalable dual-field elliptic curve cryptographic processor. IEEE Trans. Comput. 2003, 52, 449–460.
- Wu, T. Radix-16 CSA-based low-latency non-Montgomery modular multiplier. J. Eng. 2022, 2022, 244–248.
- Xiao, H.; Yu, S.; Cheng, B.; Liu, G. FPGA-based high-throughput Montgomery modular multipliers for RSA cryptosystems. IEICE Electron. Express 2022, 19, 20220101.
- Kolagatla, V.R.; Desalphine, V.; Selvakumar, D. Area-time scalable high radix Montgomery modular multiplier for large modulus. In Proceedings of the 2021 25th International Symposium on VLSI Design and Test (VDAT), Surat, India, 16–18 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4.
- Zhang, B.; Cheng, Z.; Pedram, M. High-radix design of a scalable Montgomery modular multiplier with low latency. IEEE Trans. Comput. 2021, 71, 436–449.
- Wu, R.; Xu, M.; Yang, Y.; Tian, G.; Yu, P.; Zhao, Y.; Lian, B.; Ma, L. Efficient high-radix GF (p) Montgomery modular multiplication via deep use of multipliers. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 5099–5103.
- Liu, Z.; Liu, L.; Huang, H.; Zhang, Q.; Yu, B.; Zhao, S.; Cui, J. Multi-curve-oriented general high-performance ECC processor design. Acta Electronica Sin. 2023, 51, 1562–1571.
- Li, B.; Wang, J.; Ding, G.; Fu, H.; Lei, B.; Yang, H.; Bi, J.; Lei, S. A high-performance and low-cost Montgomery modular multiplication based on redundant binary representation. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 2660–2664.
- Zhang, Z.; Zhang, P. A scalable Montgomery modular multiplication architecture with low area-time product based on redundant binary representation. Electronics 2022, 11, 3712.
- Zhang, S.; Li, S. An implementation of Montgomery modular multiplier based on KO-3 multiplication. In Proceedings of the 2022 4th International Conference on Communications, Information System and Computer Engineering (CISCE), Shenzhen, China, 27–29 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 596–600.
- Gu, Z.; Li, S. A division-free Toom–Cook multiplication-based Montgomery modular multiplication. IEEE Trans. Circuits Syst. II Express Briefs 2018, 66, 1401–1405.
- Zhao, S.; Huang, H.; Liu, Z.; Yu, B.; Yu, B. An efficient signed digit Montgomery modular multiplication algorithm. Microelectron. J. 2021, 114, 105099.
- Zhao, S.; Zheng, J.; Shao, Y.; Huang, H.; Liu, Z.; Yu, B.; Zhang, Z. RSD-based high-performance radix-4 Montgomery Modular Multiplication for Elliptic Curve Cryptography. Microelectron. J. 2024, 153, 106433.
- Wang, J.; Wang, X.; Liu, W.; Xing, Q.; Tang, X.; Deng, T.; Cao, R.; Huang, M. A parallel and pipelined high speed Montgomery modular multiplier for IoT devices. Comput. Netw. 2025, 265, 111282.
- Marzouqi, H.; Al-Qutayri, M.; Salah, K.; Schinianakis, D.; Stouraitis, T. A high-speed FPGA implementation of an RSD-based ECC processor. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2015, 24, 151–164.
- Ding, J.; Li, S. A reconfigurable high-speed ECC processor over NIST primes. In Proceedings of the 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, NSW, Australia, 1–4 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1064–1069.
- Ding, J.; Li, S.; Gu, Z. High-speed ECC processor over NIST prime fields applied with Toom–Cook multiplication. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 66, 1003–1016.
- Park, D.W.; Hong, S.; Chang, N.S.; Cho, S.M. Efficient implementation of modular multiplication over 192-bit NIST prime for 8-bit AVR-based sensor node. J. Supercomput. 2021, 77, 4852–4870.
- Hu, X.; Li, X.; Zheng, X.; Liu, Y.; Xiong, X. A high speed processor for elliptic curve cryptography over NIST prime field. IET Circuits Devices Syst. 2022, 16, 350–359.
- Hu, X.; Zheng, X.; Zhang, S.; Li, W.; Cai, S.; Xiong, X. A high-performance elliptic curve cryptographic processor of SM2 over GF (p). Electronics 2019, 8, 431.
- Liu, Z.; Zhang, Q.; Huang, H.; Yang, X.; Chen, G.; Zhao, S.; Yu, B. Design of high area efficiency elliptic curve scalar multiplier based on fast modulo reduction of bit reorganization. J. Electron. Inf. Technol. 2024, 46, 344–352.
- Zhang, C.; Liu, D.; Liu, X.; Zou, X.; Niu, G.; Liu, B.; Jiang, Q. Towards efficient hardware implementation of NTT for Kyber on FPGAs. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
- Nguyen, H.; Tran, L. Design of polynomial NTT and INTT accelerator for post-quantum cryptography CRYSTALS-Kyber. Arab. J. Sci. Eng. 2023, 48, 1527–1536.
- Hu, X.; Tian Li, M.; Wang, Z. AC-PM: An area-efficient and configurable polynomial multiplier for lattice based cryptography. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 70, 719–732.
- Plantard, T. Efficient word size modular arithmetic. IEEE Trans. Emerg. Top. Comput. 2021, 9, 1506–1518.
- Huang, J.; Zhang, J.; Zhao, H.; Liu, Z.; Cheung, R.C.C.; Koç, Ç.K.; Chen, D. Improved Plantard arithmetic for lattice-based cryptography. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022, 2022, 614–636.
- Huang, J.; Zhao, H.; Zhang, J.; Dai, W.; Zhou, L.; Cheung, R.C.C.; Koç, Ç.K.; Chen, D. Yet another improvement of Plantard arithmetic for faster Kyber on low-end 32-bit IoT devices. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3800–3813.
Category | Multiplication | Advantages | Disadvantages
---|---|---|---
Basic algorithms | Schoolbook multiplication | The hardware implementation is straightforward. | Computational complexity is high and parallel computation is difficult.
Basic algorithms | Comba multiplication | The number of carry propagations and memory accesses is reduced. | Computational complexity is high and parallel computation is difficult.
Algorithms based on divide and conquer | Karatsuba multiplication | Binary recursion is used to reduce computational complexity. | Additional recursion and addition operations are introduced.
Algorithms based on divide and conquer | Toom–Cook multiplication | Dynamic divide-and-conquer recursion is used to further reduce computational complexity. | The division operations are relatively challenging to implement in hardware.
Algorithms based on divide and conquer | NTT multiplication | The lowest computational complexity is achieved through the point-value representation. | The length of the input operands and the modulus are restricted.
Partial product optimization techniques | Booth encoding | The number of partial products is decreased. | Efficiency is reduced for multipliers containing non-contiguous 1s or 0s.
Partial product optimization techniques | RSD representation | Carry propagation in accumulation operations is decreased. | Redundancy is prone to arise.
Modular Reduction | Advantages | Disadvantages
---|---|---
Barrett reduction | Division operations are converted into multiplication operations through quotient estimation. One modular reduction requires two multiplications, one shift, and one subtraction. | Multiple result corrections need to be performed.
Montgomery reduction | Modular operations are converted into shift operations by constructing residue systems. One modular reduction requires two multiplications, one shift, one addition, and two modular and subtraction operations. | A domain transformation is required.
Fast modular reduction for Mersenne primes | Modular operations are converted into addition and subtraction operations after data reorganization. | Only applicable to specific types of moduli.
Modular reduction in NTT multiplication | Modular operations are converted into addition, subtraction, and shift operations. | Only applicable to specific types of moduli.
Reference | Year | Algorithm | Platform | Bit-Width | Clock Cycles | Frequency (MHz) | Area (Slices) | Area (LUTs) | Area (DSPs) | Time (µs) | AT (Slices × ms) | AT (LUTs × ms) | Throughput (Mbps)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
[47] | 2017 | Radix-2 IMM | Kintex-7 | 224 | 225 | 130.49 | 365 | - | - | 1.71 | 0.63 | - | 130.99
[47] | 2017 | Radix-2 IMM | Kintex-7 | 256 | 257 | 135.89 | 397 | - | - | 1.88 | 0.76 | - | 136.17
[48] | 2020 | Radix-2 IMM | Virtex-7 | 192 | 193 | 207.1 | 386 | 1151 | - | 0.93 | 0.36 | 1.07 | 206.0
[48] | 2020 | Radix-2 IMM | Virtex-7 | 224 | 225 | 190.7 | 490 | 1409 | - | 1.18 | 0.58 | 1.66 | 189.9
[48] | 2020 | Radix-2 IMM | Virtex-7 | 256 | 257 | 177.3 | 514 | 1491 | - | 1.45 | 0.75 | 2.16 | 176.6
[48] | 2020 | Radix-2 IMM | Virtex-7 | 384 | 385 | 137.6 | 820 | 2355 | - | 2.80 | 2.30 | 6.59 | 137.2
[48] | 2020 | Radix-2 IMM | Virtex-7 | 521 | 522 | 111.2 | 975 | 2496 | - | 4.69 | 4.57 | 11.71 | 111.0
[53] | 2024 | Radix-4 Booth-IMM | Virtex-7 | 192 | 97 | 238 | 586 | 1367 | - | 0.41 | 0.24 | 0.56 | 471
[53] | 2024 | Radix-4 Booth-IMM | Virtex-7 | 256 | 129 | 210 | 741 | 2292 | - | 0.62 | 0.46 | 1.42 | 417
[53] | 2024 | Radix-4 Booth-IMM | Virtex-7 | 384 | 193 | 168 | 1071 | 2527 | - | 1.15 | 1.23 | 2.91 | 334
[53] | 2024 | Radix-4 Booth-IMM | Virtex-7 | 521 | 261 | 142 | 1268 | 2986 | - | 1.84 | 2.33 | 5.50 | 284
[59] | 2024 | Radix- BMM | Virtex-7 | 256 | 41 | - | - | 6459 | - | 0.13 | - | 0.84 | 80,757
[59] | 2024 | Radix- BMM | Virtex-7 | 1024 | 137 | - | - | 25,034 | - | 0.50 | - | 12.52 | 279,019
[61] | 2024 | Karatsuba-BMM | Virtex-7 | 32 | 7 | - | - | 3862 | - | 0.023 | - | 0.09 | 9726
[61] | 2024 | Karatsuba-BMM | Virtex-7 | 64 | 8 | - | - | 10,032 | - | 0.027 | - | 0.28 | 18,659
[71] | 2022 | Radix-2 MMM | Virtex-7 | 192 | - | 310.0 | - | 717 | - | 0.62 | - | 0.45 | 308.47
[71] | 2022 | Radix-2 MMM | Virtex-7 | 224 | - | 284.17 | - | 835 | - | 0.79 | - | 0.66 | 282.9
[71] | 2022 | Radix-2 MMM | Virtex-7 | 256 | - | 271.29 | - | 955 | - | 0.94 | - | 0.90 | 270.24
[71] | 2022 | Radix-2 MMM | Virtex-7 | 384 | - | 253.35 | - | 1499 | - | 1.51 | - | 2.26 | 252.69
[71] | 2022 | Radix-2 MMM | Virtex-7 | 521 | - | 238.6 | - | 2414 | - | 2.18 | - | 5.26 | 238.14
[78] | 2022 | Radix- MMM | UltraScale-XCKU115 | 256 | 47 | 285.7 | - | 1223 | 39 | 0.17 | - | 0.20 | 24,899.7
[78] | 2022 | Radix- MMM | UltraScale-XCKU115 | 512 | 168 | 285.7 | - | 2348 | 71 | 0.59 | - | 1.38 | 13,931.9
[78] | 2022 | Radix- MMM | UltraScale-XCKU115 | 1024 | 345 | 285.7 | - | 4209 | 131 | 1.21 | - | 5.08 | 13,562.9
[81] | 2022 | Radix- MMM | Virtex-7 | 256 | - | 345 | - | 2900 | - | 0.32 | - | 0.93 | 802
[81] | 2022 | Radix- MMM | Virtex-7 | 512 | - | 299 | - | 3700 | - | 1.04 | - | 3.84 | 494
[81] | 2022 | Radix- MMM | Virtex-7 | 256 | - | 290 | - | 5500 | - | 0.21 | - | 1.18 | 1196
[81] | 2022 | Radix- MMM | Virtex-7 | 512 | - | 290 | - | 9500 | - | 0.45 | - | 4.26 | 1143
[89] | 2025 | PPMMM | Virtex-7 | 256 | 28 | 228 | - | 2800 | 32 | 0.12 | - | 0.34 | 2098.36
[89] | 2025 | PPMMM | Virtex-7 | 384 | 28 | 187 | - | 5164 | 48 | 0.15 | - | 0.77 | 2565.13
[89] | 2025 | PPMMM | Virtex-7 | 512 | 28 | 147 | - | 5302 | 128 | 0.19 | - | 1.01 | 2688.09
Reference | Year | Algorithm | Technology | Bit-Width | Clock Cycles | Frequency (MHz) | Area | Time (ns) | AT
---|---|---|---|---|---|---|---|---|---
[47] | 2017 | Radix-2 IMM | 65 nm | 224 | 225 | 549.45 | 12.8 KGates | 409 | 5.23 KGates × µs
[47] | 2017 | Radix-2 IMM | 65 nm | 256 | 257 | 549.45 | 13.3 KGates | 468 | 6.21 KGates × µs
[55] | 2023 | Toom–Cook-BMM | 40 nm | 256 | - | 613 | 64,877 µm² | 57.05 | 3701 µm² × µs
[55] | 2023 | Toom–Cook-BMM | 65 nm | 256 | - | 581 | 113,565 µm² | 60.2 | 6836 µm² × µs
[55] | 2023 | Toom–Cook-BMM | 40 nm | 1024 | - | 555 | 397,457 µm² | 63 | 25,040 µm² × µs
[83] | 2021 | RBR-MMM | 65 nm | 256 | 130 | 1000 | 21.5 KGates | 130 | 2.80 KGates × µs
[83] | 2021 | RBR-MMM | 65 nm | 1024 | 258 | 685 | 121.8 KGates | 377 | 45.92 KGates × µs
[83] | 2021 | RBR-MMM | 65 nm | 2048 | 514 | 654 | 224.8 KGates | 786 | 176.70 KGates × µs
[83] | 2021 | RBR-MMM | 65 nm | 8192 | 4098 | 870 | 568.7 KGates | 4713 | 2680.29 KGates × µs
[84] | 2022 | RBR-MMM | 65 nm | 1024 | 114 | 617 | 222.4 KGates | 185 | 41.09 KGates × µs
[84] | 2022 | RBR-MMM | 65 nm | 2048 | 114 | 485 | 707.1 KGates | 235 | 166.22 KGates × µs
[84] | 2022 | RBR-MMM | 65 nm | 8192 | 788 | 428 | 1493.8 KGates | 1842 | 2751.53 KGates × µs
[85] | 2022 | Karatsuba-MMM | 65 nm | 256 | 17 | - | 145,281 µm² | 37.4 | 5430 µm² × µs
Modular Multiplication | Analysis |
---|---|
Blakley-based interleaved modular multiplication | Modular multiplication is transformed into a series of addition operations, but the low parallelism and large-bit-width addition operations result in a high path delay. This method is suitable for low-power scenarios. |
Barrett modular multiplication | Modular multiplication is transformed into multiplication and shift operations, but estimation errors may result in multiple reduction operations to correct the results. This method is suitable for small-bit-width scenarios. |
Montgomery modular multiplication | By utilizing residue systems, division can be converted into shift operations, but more resources are required to control the domain transformation. This method is suitable for large-bit-width scenarios. |
Fast modular multiplication for specific prime numbers | Adopting a specially designed fast modular reduction algorithm results in high computational efficiency, but the hardware structure lacks generality. This method is suitable for specific modulus scenarios. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huang, H.; Zheng, J.; Chen, Z.; Zhao, S.; Wu, H.; Yu, B.; Liu, Z. Review of Modular Multiplication Algorithms over Prime Fields for Public-Key Cryptosystems. Cryptography 2025, 9, 46. https://doi.org/10.3390/cryptography9020046