An Efficient and Low-Cost Design of Modular Reduction for CRYSTALS-Kyber
Abstract
1. Introduction
- A highly efficient modular reduction algorithm for CRYSTALS-Kyber is proposed. By rederiving the bit tree of the bitwise modular reduction algorithm and introducing three general methods for eliminating redundant bits, we obtain a more streamlined algorithm that transfers readily to other moduli. For the Kyber modulus q = 3329, the computational effort is reduced by 21.5% compared with the original algorithm.
- A reduction unit based on a Dadda tree compression array is designed. A customized compression path reduces both resource consumption and latency. Compared with related implementations, the area-time product (ATP) is reduced by 16.5%.
- We evaluate the reduction unit at the polynomial operation level. The experimental results show that the design reduces the ATP of the polynomial operation unit by 16.04% and increases the operating frequency by 9.56%.
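
To make the idea behind the first contribution concrete, the following minimal Python sketch shows plain bitwise modular reduction for the Kyber modulus 3329 (the value specified in FIPS 203): every input bit at position 12 or higher is replaced by the precomputed constant 2^i mod q, and the results are summed. It is an illustration under stated assumptions, not the optimized bit tree proposed in this work. The 24-bit input width and the names `FOLD`, `fold_once`, and `bitwise_reduce` are hypothetical, and the redundant-bit elimination methods are not reproduced.

```python
Q = 3329        # Kyber modulus (FIPS 203)
WIDTH = 12      # bit length of Q, since 2**11 < Q < 2**12
IN_BITS = 24    # assumed input width: product of two 12-bit coefficients

# Precomputed constants 2^i mod Q for every high bit position.
FOLD = {i: pow(2, i, Q) for i in range(WIDTH, IN_BITS)}


def fold_once(x):
    """Replace each set bit at position >= WIDTH by 2^i mod Q and sum."""
    acc = x & ((1 << WIDTH) - 1)   # low 12 bits pass through unchanged
    high = x >> WIDTH
    i = WIDTH
    while high:
        if high & 1:
            acc += FOLD[i]         # congruent to 2^i modulo Q
        high >>= 1
        i += 1
    return acc


def bitwise_reduce(x):
    """Return x mod Q for 0 <= x < 2**IN_BITS using bit folding only."""
    assert 0 <= x < (1 << IN_BITS)
    # Each fold strictly shrinks x while any high bit is set,
    # because 2^i mod Q < 2^i for i >= WIDTH.
    while x >= (1 << WIDTH):
        x = fold_once(x)
    # Now x < 4096 < 2*Q, so one conditional subtraction suffices.
    return x - Q if x >= Q else x
```

For example, `bitwise_reduce(3328 * 3328)` returns 1, in agreement with 3328 ≡ −1 (mod 3329). In hardware the summation inside `fold_once` would be an adder or compressor tree rather than a loop; the loop is only a software stand-in.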
2. Bitwise Modular Reduction Algorithm
2.1. Bitwise Modular Reduction
2.2. Algorithm Discussion
3. Overall Architecture
3.1. Design of Bitwise Modular Reduction
3.2. DTHCA Compression Process
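
As general background for this section, a compression array sums many addends by repeatedly applying 3:2 compressors (full adders) until only two rows remain, which a single carry-propagate adder then combines. The Python sketch below uses a simple row-wise carry-save schedule; it is neither a true Dadda schedule nor the customized compression path of this design. The names `compress_3_2`, `compress_to_two`, and `addends_for` are hypothetical, and the example addends assume the Kyber modulus 3329 and a 24-bit input.

```python
Q = 3329  # Kyber modulus (assumed here; see FIPS 203)


def compress_3_2(a, b, c):
    """Bitwise full adders: a + b + c == s + carry."""
    s = a ^ b ^ c                                # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, shifted
    return s, carry


def compress_to_two(addends):
    """Reduce a list of non-negative addends to at most two rows."""
    rows = list(addends)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3   # rows that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])            # leftover rows pass through
        rows = nxt
    return rows


def addends_for(x):
    """Low 12 bits of x plus one constant 2^i mod Q per set high bit."""
    rows = [x & 0xFFF]
    rows += [pow(2, i, Q) for i in range(12, 24) if (x >> i) & 1]
    return rows


# Usage: the two remaining rows still sum to a value congruent to x mod Q.
rows = compress_to_two(addends_for(0xABCDEF))
assert sum(rows) % Q == 0xABCDEF % Q
```

The point of the carry-save form is that no carry ripples across bit positions until the final addition, which is why such arrays are attractive for short critical paths.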
4. Experimental Results and Comparison
4.1. Experimental Results
4.2. Comparison with Related Works
Ref | Technique | LUT | DSP | Freq (MHz) | Power (W) | Latency (ns) | ATP* (LUT × ns) |
---|---|---|---|---|---|---|---|
DATE’21 [23] | bitwise | 156 | 0 | 202.27 | 0.162 | 4.94 | 771.25 |
TCAS’22 [22] | bitwise | 279 | 0 | 167.48 | 0.172 | 5.97 | 1665.87 |
CISCE’21 [19] | Barrett | 37 | 2 | 102.88 | 0.150 | 9.72 | 3431.18 |
Integration’21 [18] | Barrett | 126 | 0 | 129.87 | 0.147 | 7.70 | 970.2 |
TCHES’21 [20] | Barrett | 94 | 0 | 138.89 | 0.151 | 7.20 | 676.79 |
TCAD’23 [21] | Barrett | 79 | 0 | 156.25 | 0.153 | 6.40 | 505.6 |
ours | bitwise | 91 | 0 | 215.38 | 0.162 | 4.64 | 422.51 |
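
The ATP column appears to equal the LUT count multiplied by the clock period in nanoseconds. This relation is inferred from the tabulated values rather than stated here, so treat it as an assumption. A short Python check, using the hypothetical helper `atp`, reproduces the entries for the DSP-free rows; the row that uses DSPs is left out because its area conversion is not given.

```python
def atp(luts, freq_mhz):
    """Area-time product, assuming area = LUTs and time = clock period (ns)."""
    period_ns = 1000.0 / freq_mhz
    return luts * period_ns


# (LUTs, frequency in MHz, ATP as tabulated), keyed by reference label.
ROWS = {
    "[23]": (156, 202.27, 771.25),
    "[22]": (279, 167.48, 1665.87),
    "[18]": (126, 129.87, 970.2),
    "[20]": (94, 138.89, 676.79),
    "[21]": (79, 156.25, 505.6),
    "ours": (91, 215.38, 422.51),
}


def max_deviation():
    """Largest gap between the recomputed and the tabulated ATP."""
    return max(abs(atp(l, f) - t) for l, f, t in ROWS.values())


# Every recomputed value matches the table to within rounding.
assert max_deviation() < 0.01
```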
4.3. System-Level Evaluation and Comparison
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Shor, P.W. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM J. Comput. 1997, 26, 1484–1509.
- [2] Chen, L.; Jordan, S.; Liu, Y.-K.; Moody, D.; Peralta, R.; Perlner, R.; Smith-Tone, D. Report on Post-Quantum Cryptography; NIST IR 8105; National Institute of Standards and Technology: Washington, DC, USA, 2016.
- [3] National Institute of Standards and Technology. Module-Lattice-Based Key-Encapsulation Mechanism Standard; NIST FIPS 203; National Institute of Standards and Technology: Washington, DC, USA, 2024.
- [4] Montgomery, P.L. Modular Multiplication Without Trial Division. Math. Comput. 1985, 44, 519–521.
- [5] Matteo, S.D.; Sarno, I.; Saponara, S. CRYPHTOR: A Memory-Unified NTT-Based Hardware Accelerator for Post-Quantum CRYSTALS Algorithms. IEEE Access 2024, 12, 25501–25511.
- [6] Plantard, T. Efficient Word Size Modular Arithmetic. IEEE Trans. Emerg. Top. Comput. 2021, 9, 1506–1518.
- [7] Longa, P.; Naehrig, M. Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography. In Cryptology and Network Security; Foresti, S., Persiano, G., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 10052, pp. 124–139. ISBN 978-3-319-48964-3.
- [8] Barrett, P. Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor. In Advances in Cryptology—CRYPTO ’86; Odlyzko, A.M., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 263, pp. 311–323. ISBN 978-3-540-18047-0.
- [9] Zhang, N.; Yang, B.; Chen, C.; Yin, S.; Wei, S.; Liu, L. Highly Efficient Architecture of NewHope-NIST on FPGA Using Low-Complexity NTT/INTT. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020, 2020, 49–72.
- [10] Ding, J.; Li, S. A Low-Latency and Low-Cost Montgomery Modular Multiplier Based on NLP Multiplication. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 1319–1323.
- [11] Huang, J.; Zhang, J.; Zhao, H.; Liu, Z.; Cheung, R.C.C.; Koç, Ç.K.; Chen, D. Improved Plantard Arithmetic for Lattice-Based Cryptography. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022, 2022, 614–636.
- [12] Huang, J.; Zhao, H.; Zhang, J.; Dai, W.; Zhou, L.; Cheung, R.C.C.; Koç, Ç.K.; Chen, D. Yet Another Improvement of Plantard Arithmetic for Faster Kyber on Low-End 32-Bit IoT Devices. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3800–3813.
- [13] Li, L.; Qin, G.; Yu, Y.; Wang, W. Compact Instruction Set Extensions for Kyber. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2024, 43, 756–760.
- [14] Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari-Kermani, M. High-Speed NTT-Based Polynomial Multiplication Accelerator for Post-Quantum Cryptography. In Proceedings of the 2021 IEEE 28th Symposium on Computer Arithmetic (ARITH), Lyngby, Denmark, 14–16 June 2021; pp. 94–101.
- [15] Bertels, J.; Norga, Q.; Verbauwhede, I. A Better Kyber Butterfly for FPGAs. In Proceedings of the 2024 34th International Conference on Field-Programmable Logic and Applications (FPL), Torino, Italy, 2–6 September 2024; pp. 171–177.
- [16] Li, M.; Tian, J.; Hu, X.; Cao, Y.; Wang, Z. High-Speed and Low-Complexity Modular Reduction Design for CRYSTALS-Kyber. In Proceedings of the 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Shenzhen, China, 11–13 November 2022; pp. 1–5.
- [17] Nguyen, D.N.; Tran, V.D.; Pham, H.L.; Duong Le, V.T.; Lam, D.K.; Tran, T.H.; Nakashima, Y. HyperNTT: A Fast and Accurate NTT/INTT Accelerator with Multi-Level Pipelining and an Improved K2-RED Module. In Proceedings of the 2024 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), Okinawa, Japan, 2–5 July 2024; pp. 1–6.
- [18] Chen, Z.; Ma, Y.; Chen, T.; Lin, J.; Jing, J. High-Performance Area-Efficient Polynomial Ring Processor for CRYSTALS-Kyber on FPGAs. Integration 2021, 78, 25–35.
- [19] Ma, L.; Wu, X.; Bai, G. Parallel Polynomial Multiplication Optimized Scheme for CRYSTALS-KYBER Post-Quantum Cryptosystem Based on FPGA. In Proceedings of the 2021 International Conference on Communications, Information System and Computer Engineering (CISCE), Beijing, China, 14–16 May 2021; pp. 361–365.
- [20] Xing, Y.; Li, S. A Compact Hardware Implementation of CCA-Secure Key Exchange Mechanism CRYSTALS-KYBER on FPGA. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 328–356.
- [21] Li, M.; Tian, J.; Hu, X.; Wang, Z. Reconfigurable and High-Efficiency Polynomial Multiplication Accelerator for CRYSTALS-Kyber. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2023, 42, 2540–2551.
- [22] Guo, W.; Li, S.; Kong, L. An Efficient Implementation of KYBER. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 1562–1566.
- [23] Yaman, F.; Mert, A.C.; Ozturk, E.; Savas, E. A Hardware Accelerator for Polynomial Multiplication Operation of CRYSTALS-KYBER PQC Scheme. In Proceedings of the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 1–5 February 2021; pp. 1020–1025.
- [24] Kavand, N.; Darjani, A.; Rai, S.; Kumar, A. Design of Energy-Efficient RFET-Based Exact and Approximate 4:2 Compressors and Multipliers. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 3644–3648.
- [25] Guo, W.; Li, S. Split-Radix Based Compact Hardware Architecture for CRYSTALS-Kyber. IEEE Trans. Comput. 2024, 73, 97–108.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huang, Z.; Chen, S.; Sun, P.; Deng, D.; Sun, G. An Efficient and Low-Cost Design of Modular Reduction for CRYSTALS-Kyber. Electronics 2025, 14, 2309. https://doi.org/10.3390/electronics14112309