MDPI - Publisher of Open Access Journals

16 pages, 1263 KiB

Open Access Article

Accelerating CRYSTALS-Kyber: High-Speed NTT Design with Optimized Pipelining and Modular Reduction

by Omar S. Sonbul, Muhammad Rashid and Amar Y. Jaffar

Electronics 2025, 14(11), 2122; https://doi.org/10.3390/electronics14112122 - 23 May 2025

Viewed by 801

The Number Theoretic Transform (NTT) is a cornerstone of efficient polynomial multiplication, which is fundamental to lattice-based cryptographic algorithms such as CRYSTALS-Kyber, a leading candidate in post-quantum cryptography (PQC). However, existing NTT accelerators often rely on integer multiplier-based modular reduction techniques, such as Barrett or Montgomery reduction, which introduce significant computational overhead and hardware resource consumption. These accelerators also lack optimization in unified architectures for the forward (FNTT) and inverse (INTT) transforms. To address these research gaps, this paper introduces a novel, high-speed NTT accelerator tailored specifically to CRYSTALS-Kyber. The proposed design employs a shift-add modular reduction mechanism that eliminates the need for integer multipliers, thereby reducing the critical path delay and increasing the circuit frequency. A unified pipelined butterfly unit, capable of performing FNTT and INTT operations through Cooley–Tukey and Gentleman–Sande configurations, is integrated into the architecture. Additionally, an efficient data handling mechanism based on register banks supports seamless memory access, ensuring continuous and parallel processing. The complete architecture, implemented in Verilog HDL, has been evaluated on FPGA platforms (Virtex-5, Virtex-6, and Virtex-7). Post-place-and-route results show a maximum operating frequency of 261 MHz on Virtex-7, achieving a throughput of 290.69 Kbps, which is 1.45× and 1.24× higher than its performance on Virtex-5 and Virtex-6, respectively. Furthermore, the design achieves a throughput-per-slice of 111.63, underscoring its resource efficiency. With a 1.27× reduction in computation time compared to state-of-the-art NTT accelerators based on a single butterfly unit, this work sets a new benchmark for secure and scalable cryptographic hardware.
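
A minimal Python sketch of the two ideas named above, under stated assumptions: Kyber's modulus q = 3329 satisfies 2^12 ≡ 767 = 2^9 + 2^8 - 1 (mod q), so a 24-bit product can be folded using only shifts, additions, and subtractions. The function names and the folding loop are illustrative and are not taken from the paper, whose reduction circuit may differ. The two butterflies show why a single unit can serve both FNTT and INTT: a Cooley–Tukey butterfly with twiddle w followed by a Gentleman–Sande butterfly with w^(-1) returns (2a, 2b) mod q.

```python
Q = 3329  # Kyber modulus; 2**12 = 4096 ≡ 767 = 2**9 + 2**8 - 1 (mod Q)


def shift_add_reduce(x: int) -> int:
    """Reduce 0 <= x < 2**24 modulo Q using only shifts, adds, and subtractions."""
    while x >= 1 << 12:
        hi, lo = x >> 12, x & 0xFFF
        # Replace hi * 2**12 by hi * 767 = (hi << 9) + (hi << 8) - hi
        x = lo + (hi << 9) + (hi << 8) - hi
    # Now x < 4096 < 2 * Q, so one conditional subtraction suffices
    return x - Q if x >= Q else x


def ct_butterfly(a: int, b: int, w: int) -> tuple[int, int]:
    """Cooley-Tukey (forward NTT) butterfly: (a + w*b, a - w*b) mod Q."""
    t = shift_add_reduce(w * b)
    return (a + t) % Q, (a - t) % Q


def gs_butterfly(a: int, b: int, w_inv: int) -> tuple[int, int]:
    """Gentleman-Sande (inverse NTT) butterfly: (a + b, (a - b) * w_inv) mod Q."""
    return (a + b) % Q, shift_add_reduce(((a - b) % Q) * w_inv)


w = 17  # example twiddle factor (17 is the root of unity used in Kyber)
a, b = ct_butterfly(1000, 2000, w)
print(gs_butterfly(a, b, pow(w, -1, Q)))  # (2000, 671), i.e. (2*1000, 2*2000) mod Q
```

The factor of 2 per stage is why an inverse NTT ends with a scaling by n^(-1) mod q.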

16 pages, 299 KiB

Open Access Article

An Efficient Implementation of Montgomery Modular Multiplication Using a Minimally Redundant Residue Number System

by Mikhail Selianinau and Bożena Woźna-Szcześniak

Appl. Sci. 2025, 15(10), 5332; https://doi.org/10.3390/app15105332 - 10 May 2025

Viewed by 398

Abstract

This paper presents an implementation of modular multiplication based on Montgomery’s scheme within the Residue Number System (RNS). The key innovation of the proposed approach lies in using minimally redundant residue arithmetic, in which the rank of a number serves as the primary positional characteristic of the residue code. Additionally, integers are represented in rank form during base extension operations. Owing to the low computational complexity of rank calculation in minimally redundant RNS and the specific constraints imposed on the RNS moduli sets, the proposed modular multiplication method is up to 1.5 times faster than its non-redundant RNS counterparts. This approach is particularly suited to public-key cryptosystems.
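
To make the notion of rank concrete, here is a small sketch of classical RNS arithmetic (channel-wise multiplication and CRT reconstruction), assuming a hypothetical moduli set {7, 11, 13, 15}. The rank is the integer r in the CRT identity sum_i |x_i * M_i^(-1)|_(m_i) * M_i = X + r*M. The paper's minimally redundant encoding, its rank computation, and its Montgomery base extension are not reproduced here.

```python
from math import prod

# Hypothetical pairwise-coprime moduli, chosen only for illustration
MODULI = (7, 11, 13, 15)
M = prod(MODULI)  # dynamic range: 15015


def to_rns(x: int) -> tuple[int, ...]:
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    # Multiplication is carry-free: each residue channel works independently
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def crt_with_rank(residues):
    """Return (X, rank) with sum_i |x_i * M_i^-1|_{m_i} * M_i = X + rank * M."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += (r * pow(Mi, -1, m)) % m * Mi  # each term is < m * Mi = M
    return total % M, total // M  # so 0 <= rank < len(MODULI)


x, rank = crt_with_rank(rns_mul(to_rns(100), to_rns(120)))
print(x, rank)  # x == 12000
```

Positional information such as magnitude comparison or overflow is exactly what the rank carries, which is why a cheap rank computation matters for base extension.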

(This article belongs to the Special Issue Novel Insights into Cryptography and Network Security)

19 pages, 4017 KiB

Open Access Article

Efficient Large-Width Montgomery Modular Multiplier Design Based on Toom–Cook-5

by Kuanhao Liu, Xiaohua Wang, Yue Hao, Jingqi Zhang and Weijiang Wang

Electronics 2025, 14(7), 1402; https://doi.org/10.3390/electronics14071402 - 31 Mar 2025

Viewed by 408

Abstract

Toom–Cook-n multiplication is an efficient large-width multiplication algorithm based on a divide-and-conquer strategy and is widely used in modular multiplication for cryptographic algorithms. In theory, as the degree n increases, Toom–Cook-n can split the multiplicands into more sub-terms and further improve multiplier performance. However, because the interpolation matrix grows with the degree and makes the interpolation step increasingly costly, current research focuses predominantly on Toom–Cook-3 and Toom–Cook-4. This paper proposes a Montgomery modular multiplication design based on Toom–Cook-5, which eases the interpolation step by introducing an interpolation-matrix pre-simplification strategy. The design also incorporates and optimizes carry–save adders and Karatsuba multiplication, making Toom–Cook-5 multiplication practical for efficient hardware implementation. ASIC implementation results for the architecture in a 90 nm process demonstrate superior performance compared with previous works.
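
Toom–Cook-5 is too long for a short example, so the sketch below shows one level of Toom–Cook-3 instead; it follows the same evaluate, multiply pointwise, interpolate pattern whose interpolation cost grows with n. The evaluation points {0, 1, -1, -2, ∞} and the exact-division interpolation sequence are choices made for this illustration, sub-products use Python's built-in multiplication, and the paper's pre-simplified Toom–Cook-5 matrix is not shown.

```python
def toom3(a: int, b: int) -> int:
    """One level of Toom-Cook-3 for non-negative integers."""
    k = (max(a.bit_length(), b.bit_length()) + 2) // 3 or 1  # limb width
    mask = (1 << k) - 1
    a0, a1, a2 = a & mask, (a >> k) & mask, a >> (2 * k)
    b0, b1, b2 = b & mask, (b >> k) & mask, b >> (2 * k)
    # Evaluate both degree-2 polynomials at 0, 1, -1, -2, infinity; multiply pointwise
    r0 = a0 * b0
    r1 = (a0 + a1 + a2) * (b0 + b1 + b2)
    rm1 = (a0 - a1 + a2) * (b0 - b1 + b2)
    rm2 = (a0 - 2 * a1 + 4 * a2) * (b0 - 2 * b1 + 4 * b2)
    rinf = a2 * b2
    # Interpolate the degree-4 product; every division below is exact
    c3 = (rm2 - r1) // 3
    c1 = (r1 - rm1) // 2
    c2 = rm1 - r0
    c3 = (c2 - c3) // 2 + 2 * rinf
    c2 = c2 + c1 - rinf
    c1 = c1 - c3
    # Recombine by evaluating the product polynomial at x = 2**k
    return r0 + (c1 << k) + (c2 << (2 * k)) + (c3 << (3 * k)) + (rinf << (4 * k))


print(toom3(3**500, 7**300) == 3**500 * 7**300)  # True
```

Five pointwise products replace the nine of schoolbook 3-way splitting; the price is the interpolation, and with five sub-terms that price grows to a 9×9 system.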

18 pages, 669 KiB

Open Access Article

Lazy Modular Reduction for NTT

by Geumtae Kim, Eunyoung Seo, Yongwoo Lee, Young-Sik Kim and Jong-Seon No

Electronics 2024, 13(24), 4887; https://doi.org/10.3390/electronics13244887 - 11 Dec 2024

Cited by 1 | Viewed by 1504

Abstract

The number theoretic transform (NTT) is a fundamental operation in cryptography, especially in lattice-based cryptographic schemes. This paper introduces LazyNTT, a novel method that reduces the number of Montgomery multiplications required in the NTT by replacing some of them with standard multiplications without modular reduction. This approach speeds up the NTT and modular polynomial multiplication in lattice-based cryptographic schemes. LazyNTT can be generalized by increasing the number of standard multiplications. Experimental results show that LazyNTT improves the cycle counts of the NTT by up to 28% and 9% when allowing two and one standard multiplications, respectively.
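
The lazy-reduction principle can be illustrated outside the NTT with a Kyber-style Montgomery reduction (R = 2^16 and q = 3329 are assumptions for this sketch): instead of reducing every product, plain products are accumulated and reduced once, which is safe as long as the accumulator stays below q*R. This is a general illustration of deferring reductions on a dot product, not the LazyNTT algorithm itself.

```python
Q = 3329
R_BITS = 16
R = 1 << R_BITS
Q_NEG_INV = (-pow(Q, -1, R)) % R  # -Q^{-1} mod R


def redc(t: int) -> int:
    """Montgomery reduction: returns t * R^{-1} mod Q for 0 <= t < Q * R."""
    m = ((t & (R - 1)) * Q_NEG_INV) & (R - 1)
    u = (t + m * Q) >> R_BITS  # exact: t + m*Q ≡ 0 (mod R), and u < 2Q
    return u - Q if u >= Q else u


def to_mont(b: int) -> int:
    return (b * R) % Q


def eager_dot(a, b_mont):
    # One Montgomery reduction per product
    return sum(redc(x * y) for x, y in zip(a, b_mont)) % Q


def lazy_dot(a, b_mont):
    # Plain products are accumulated and reduced once at the end. This stays
    # valid for at most 19 products of values < Q, since 19 * (Q - 1)**2 < Q * R.
    acc = sum(x * y for x, y in zip(a, b_mont))
    assert acc < Q * R, "too many unreduced products"
    return redc(acc)


a, b = [3000, 17, 1234, 3328], [5, 2900, 3328, 1]
bm = [to_mont(x) for x in b]
print(eager_dot(a, bm) == lazy_dot(a, bm))  # True
```

Four reductions become one here; the bound check makes explicit that lazy reduction trades reductions for headroom in the accumulator.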

(This article belongs to the Special Issue Security and Privacy for Modern Wireless Communication Systems, 2nd Edition)

19 pages, 662 KiB

Open Access Article

Optimization of SM2 Algorithm Based on Polynomial Segmentation and Parallel Computing

by Hongyu Zhu, Ding Li, Yizhen Sun, Qian Chen, Zheng Tian and Yubo Song

Electronics 2024, 13(23), 4661; https://doi.org/10.3390/electronics13234661 - 26 Nov 2024

Cited by 1 | Viewed by 1495

Abstract

The SM2 public-key cryptographic algorithm is widely used for secure communication and data protection because of its strong security and compact key size. However, the intensive large-integer operations it requires pose significant computational challenges that can limit the performance of Internet of Things (IoT) terminal devices. This paper introduces an optimized implementation of the SM2 algorithm designed specifically for IoT contexts. By segmenting large integers as polynomials within a modified Montgomery modular multiplication algorithm, the proposed method enables parallel modular multiplication and reduction, addressing storage constraints and reducing computational redundancy. For scalar multiplication, a Co-Z Montgomery ladder algorithm is combined with Single Instruction Multiple Data (SIMD) instructions to increase parallelism and improve efficiency. Experimental results show that the proposed scheme reduces the computation time of SM2 digital signatures by approximately 20% and improves encryption and decryption efficiency by about 15% over existing methods, a substantial performance gain for IoT applications.
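
The polynomial-segmentation step can be pictured as follows: a large integer is split into fixed-width limbs, the limb vectors are multiplied as polynomials (their coefficient products are independent of one another, which is what SIMD lanes can exploit), and evaluating the result at 2^32 with carry propagation recovers the integer product. The 32-bit limb width and the helper names are hypothetical, and the SM2-specific modified Montgomery reduction is omitted.

```python
LIMB_BITS = 32  # hypothetical segment width
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, n: int) -> list[int]:
    """Segment x into n limbs so that x = sum(limbs[i] * 2**(32*i))."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def poly_mul(a: list[int], b: list[int]) -> list[int]:
    # Schoolbook product of limb polynomials; no carries are propagated here
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def from_coeffs(c: list[int]) -> int:
    """Evaluate the coefficient polynomial at 2**32 by propagating carries."""
    out, carry = [], 0
    for coeff in c:
        t = coeff + carry
        out.append(t & MASK)
        carry = t >> LIMB_BITS
    x = carry
    for limb in reversed(out):
        x = (x << LIMB_BITS) | limb
    return x


a, b = 2**255 - 19, 3**150  # two sample 256-bit-or-smaller operands
print(from_coeffs(poly_mul(to_limbs(a, 8), to_limbs(b, 8))) == a * b)  # True
```

Deferring carries until the end is what keeps the coefficient products independent and therefore parallelizable.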

(This article belongs to the Special Issue Knowledge Information Extraction Research)

14 pages, 553 KiB

Open Access Article

HAETAE on ARMv8

by Minjoo Sim, Minwoo Lee and Hwajeong Seo

Electronics 2024, 13(19), 3863; https://doi.org/10.3390/electronics13193863 - 29 Sep 2024

Cited by 1 | Viewed by 1240

Abstract

In this work, we present a highly optimized implementation of the HAETAE algorithm on 64-bit ARMv8 embedded processors. HAETAE was submitted to the second round of the Korean Post-Quantum Cryptography (KpqC) competition and to the first round of NIST’s additional post-quantum standardization for digital signatures. To the best of our knowledge, this is the first optimized implementation of HAETAE on 64-bit ARMv8 embedded processors. We apply various optimization techniques to accelerate the multiplication operations in HAETAE, using parallel operation techniques based on the vector registers and NEON (the Advanced SIMD technology of ARM processors) instructions of ARMv8 embedded processors. In particular, we achieve the best performance of HAETAE on ARMv8 embedded processors by applying all state-of-the-art NTT (Number Theoretic Transform) implementation techniques. By applying these state-of-the-art techniques, including the proposed ones, we confirmed performance improvements of up to 3.07×, 3.63×, and 9.15× for the NTT, inverse NTT, and pointwise Montgomery operations (Montgomery multiplication used in modular arithmetic), respectively. As a result, we achieved performance improvements of up to 1.16× for the key generation algorithm, up to 1.14× for the signature algorithm, and up to 1.25× for the verification algorithm.

(This article belongs to the Special Issue Recent Advances in Information Security and Data Privacy)

24 pages, 5436 KiB

Open Access Article

An Efficient SM9 Aggregate Signature Scheme for IoV Based on FPGA

by Bolin Zhang, Bin Li, Jiaxin Zhang, Yuanxin Wei, Yunfei Yan, Heru Han and Qinglei Zhou

Sensors 2024, 24(18), 6011; https://doi.org/10.3390/s24186011 - 17 Sep 2024

Viewed by 1286

Abstract

With the rapid development of the Internet of Vehicles (IoV), the demand for secure and efficient signature verification is becoming increasingly urgent. To meet this need, we propose an efficient SM9 aggregate signature scheme implemented on a Field-Programmable Gate Array (FPGA). The scheme includes both fault-tolerant and non-fault-tolerant aggregate signature modes, designed to address challenges in various network environments. We provide security proofs for both signature verification modes based on the hardness of the K-ary Computational Additive Diffie–Hellman (K-CAA) problem. To handle the many parallelizable elliptic curve point multiplications required during verification, we exploit the FPGA’s parallel processing capabilities to design an efficient parallel point multiplication architecture. Using the Montgomery point multiplication algorithm and the Barrett modular reduction algorithm, we optimize the single point multiplication unit, achieving 70,776 point multiplications per second. Finally, the overall scheme was simulated and analyzed on an FPGA platform. The experimental results indicate that, under error-free conditions, the proposed non-fault-tolerant aggregate mode reduces verification time by up to 97.1% compared with other schemes. In fault-tolerant scenarios, the proposed fault-tolerant aggregate mode reduces verification time by up to 77.2% compared with other schemes. Compared with other fault-tolerant aggregate schemes, its verification time is only 28.9% of theirs, and even in the non-fault-tolerant aggregate mode, verification time is reduced by at least 39.1%. The proposed scheme therefore shows significant advantages in both error-free and fault-tolerant scenarios.
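
For reference, a textbook Barrett reduction (binary variant) is sketched below; only the precomputed constant mu = floor(2^(2k) / n) depends on the modulus, so the per-reduction work is two multiplications, shifts, and at most two corrective subtractions. The test modulus 2^255 - 19 is a stand-in, not the SM9 field prime, and the paper's hardware datapath is not reproduced.

```python
def barrett_setup(n: int) -> tuple[int, int]:
    k = n.bit_length()
    mu = (1 << (2 * k)) // n  # precomputed once per modulus
    return k, mu


def barrett_reduce(x: int, n: int, k: int, mu: int) -> int:
    """Barrett reduction of 0 <= x < 2**(2k) modulo n (k = bit length of n)."""
    # q never exceeds floor(x / n) and falls short of it by at most 2
    q = ((x >> (k - 1)) * mu) >> (k + 1)
    r = x - q * n
    while r >= n:  # at most two corrections
        r -= n
    return r


n = 2**255 - 19  # stand-in modulus for illustration
k, mu = barrett_setup(n)
x = (n - 1) * (n - 2)
print(barrett_reduce(x, n, k, mu) == x % n)  # True
```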

(This article belongs to the Section Vehicular Sensing)

19 pages, 406 KiB

Open Access Article

Symmetry-Enabled Resource-Efficient Systolic Array Design for Montgomery Multiplication in Resource-Constrained MIoT Endpoints

by Atef Ibrahim and Fayez Gebali

Symmetry 2024, 16(6), 715; https://doi.org/10.3390/sym16060715 - 9 Jun 2024

Viewed by 863

Abstract

In today’s interconnected world, the security of 5G Medical IoT networks is of paramount concern. The increasing number of connected devices and the transmission of vast amounts of data necessitate robust measures to protect information integrity and confidentiality. However, securing Medical IoT edge nodes poses unique challenges because of their limited resources, which makes implementing cryptographic protocols a complex task. Within these protocols, modular multiplication plays a crucial role, so its implementation requires careful consideration. This study focuses on developing a resource-efficient hardware implementation of the Montgomery modular multiplication algorithm over GF(2^l), a critical operation in cryptographic algorithms. The proposed solution introduces a bit-serial systolic array layout with a modular structure and local connectivity between processing elements. This design, inspired by the principles of symmetry, allows efficient use of resources and optimization of area and delay, making it well suited to compact Medical IoT edge nodes with limited resources. The proposed bit-serial processor structure was evaluated through ASIC implementation, which demonstrated substantial improvements over competing designs: an average area reduction of 24.5% and savings of 26.2% in the area–time product.
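
The bit-serial Montgomery multiplication over GF(2^m) that such systolic arrays map onto processing elements can be sketched as follows; each loop iteration consumes one bit of the multiplier a, which is what a bit-serial cell would do per step. This is the textbook bit-level algorithm, not the paper's array layout, and field elements are assumed to be stored as integers whose bit i is the coefficient of x^i. The AES field polynomial used in the example is only a convenient small test case.

```python
def gf2m_montmul(a: int, b: int, f: int, m: int) -> int:
    """Bit-serial Montgomery multiplication in GF(2^m): returns a*b*x^(-m) mod f.

    f is the irreducible field polynomial (degree m, constant term 1).
    """
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b   # c += a_i * b (addition in GF(2) is XOR)
        if c & 1:
            c ^= f   # add f to make the constant term zero
        c >>= 1      # exact division by x
    return c


def gf2m_mulmod(a: int, b: int, f: int, m: int) -> int:
    """Ordinary shift-and-add multiplication modulo f, used here for checking."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return r


F, M_DEG = 0x11B, 8      # x^8 + x^4 + x^3 + x + 1, a small example field
XM = F ^ (1 << M_DEG)    # x^m mod F, used to undo the x^(-m) factor
mont = gf2m_montmul(0x57, 0x83, F, M_DEG)
print(gf2m_mulmod(mont, XM, F, M_DEG) == gf2m_mulmod(0x57, 0x83, F, M_DEG))  # True
```

As with integer Montgomery multiplication, operands are normally kept in the Montgomery domain so the x^(-m) factor cancels across a whole computation.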

(This article belongs to the Special Issue Security and Privacy Challenges in 5G Networks)

29 pages, 986 KiB

Open Access Article

Hardware Implementations of Elliptic Curve Cryptography Using Shift-Sub Based Modular Multiplication Algorithms

by Yamin Li

Cryptography 2023, 7(4), 57; https://doi.org/10.3390/cryptography7040057 - 10 Nov 2023

Cited by 3 | Viewed by 6040

Abstract

Elliptic curve cryptography (ECC) over prime fields relies on scalar point multiplication, realized through point addition and point doubling. These operations consist of many modular multiplications of large operands (e.g., 256 bits), especially in projective and Jacobian coordinates, which eliminate the modular inversion that affine coordinates require for every point addition or point doubling. Accelerating modular multiplication is therefore important for high-performance ECC. This paper presents hardware implementations of several modular multiplication algorithms: (1) interleaved modular multiplication (IMM), (2) Montgomery modular multiplication (MMM), (3) shift-sub modular multiplication (SSMM), (4) SSMM with advance preparation (SSMMPRE), and (5) SSMM with CSAs and sign detection (SSMMCSA). It evaluates their execution time (number of clock cycles and clock frequency) and required hardware resources (ALMs and registers). Experimental results show that SSMM is 1.80 times faster than IMM, and SSMMCSA is 3.27 times faster than IMM. We also present ECC hardware implementations based on the secp256k1 curve in affine, projective, and Jacobian coordinates using the IMM, SSMM, SSMMPRE, and SSMMCSA algorithms, and investigate their cost and performance. Our ECC implementations can be applied to the design of hardware security module systems.
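
Of the five algorithms listed, interleaved modular multiplication (IMM) is the baseline; a sketch over the secp256k1 prime p = 2^256 - 2^32 - 977 follows. It scans a from the most significant bit, doubling and conditionally adding b, with one conditional subtraction after each of those two steps. The shift-sub variants (SSMM, SSMMPRE, SSMMCSA) are specific to the paper and are not reproduced here.

```python
P_SECP256K1 = 2**256 - 2**32 - 977


def interleaved_modmul(a: int, b: int, n: int) -> int:
    """Left-to-right interleaved modular multiplication; requires 0 <= a, b < n."""
    p = 0
    for i in reversed(range(n.bit_length())):
        p <<= 1                # p = 2p, now p < 2n
        if p >= n:
            p -= n
        if (a >> i) & 1:       # p = p + a_i * b, again p < 2n
            p += b
            if p >= n:
                p -= n
    return p


a, b = pow(3, 200, P_SECP256K1), pow(5, 150, P_SECP256K1)
print(interleaved_modmul(a, b, P_SECP256K1) == a * b % P_SECP256K1)  # True
```

Each iteration needs a full-width comparison before the next can start, which is the serial dependency that faster hardware variants try to shorten.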

(This article belongs to the Special Issue Feature Papers in Hardware Security II)

16 pages, 1095 KiB

Open Access Article

Efficient Hardware Implementation of Elliptic-Curve Diffie–Hellman Ephemeral on Curve25519

by Hung Nguyen, Trang Hoang and Linh Tran

Electronics 2023, 12(21), 4480; https://doi.org/10.3390/electronics12214480 - 31 Oct 2023

Cited by 3 | Viewed by 3037

Abstract

Designing a hardware architecture optimized for elliptic-curve Diffie–Hellman ephemeral (ECDHE) on 256-bit Montgomery elliptic curves presents unique challenges, particularly for resource-constrained IoT and mobile devices. This work provides an efficient hardware implementation of ECDHE on Curve25519, including a dedicated finite state machine (FSM) that handles point multiplication and ECDHE operations using constant-time algorithms and a unified memory block for resource management. Additionally, we introduce an optimized modular computation unit that covers modular addition, subtraction, multiplication, and inversion. The proposed hardware architecture improves the efficiency of ECDHE operations while maintaining low resource utilization, considerably reduced latency, and low power consumption. Synthesized on the Xilinx Artix-7 platform, our design uses 64,000 slices at a clock speed of 102 MHz and computes an ECDHE scalar multiplication in 1.1 ms while consuming 117 mW. The proposed hardware design can be applied to various platforms, including mobile devices and IoT systems.
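
A software reference for the operation this hardware accelerates, X25519 scalar multiplication on Curve25519, is sketched below following the Montgomery ladder in RFC 7748, Section 5. It is meant only for checking results: the Python `if` used for the conditional swap is not constant-time, whereas the abstract stresses constant-time algorithms, and the final inversion uses Fermat's little theorem rather than any particular inversion unit.

```python
P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, as in RFC 7748


def _decode_scalar(k: bytes) -> int:
    b = bytearray(k)
    b[0] &= 248   # clear the three low bits
    b[31] &= 127  # clear bit 255
    b[31] |= 64   # set bit 254
    return int.from_bytes(b, "little")


def _decode_u(u: bytes) -> int:
    b = bytearray(u)
    b[31] &= 127  # ignore the most significant bit
    return int.from_bytes(b, "little") % P


def x25519(k: bytes, u: bytes) -> bytes:
    """X25519 via the Montgomery ladder (RFC 7748, Section 5); not constant-time."""
    scalar, x1 = _decode_scalar(k), _decode_u(u)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        kt = (scalar >> t) & 1
        swap ^= kt
        if swap:  # a hardware or production design uses a constant-time cswap
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = kt
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


base = bytes([9]) + bytes(31)  # u = 9, the Curve25519 base point
alice, bob = bytes(range(32)), bytes(range(32, 64))  # toy private keys
shared_a = x25519(alice, x25519(bob, base))
shared_b = x25519(bob, x25519(alice, base))
print(shared_a == shared_b)  # True: both sides derive the same ECDHE secret
```

Every ladder step performs the same field operations regardless of the key bit, which is the property a constant-time FSM preserves in hardware.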

(This article belongs to the Section Circuit and Signal Processing)

17 pages, 591 KiB

Open Access Article

A Low-Cost High-Performance Montgomery Modular Multiplier Based on Pipeline Interleaving for IoT Devices

by Hongshuo Li, Shiwei Ren, Weijiang Wang, Jingqi Zhang and Xiaohua Wang

Electronics 2023, 12(15), 3241; https://doi.org/10.3390/electronics12153241 - 27 Jul 2023

Cited by 5 | Viewed by 2068

Abstract

Modular multiplication is a crucial operation in public-key cryptosystems such as RSA and ECC. In this study, we analyze and improve the iteration steps of the classic Montgomery modular multiplication (MMM) algorithm and propose an interleaved pipeline (IP) structure that meets the high-performance and low-cost requirements of Internet of Things devices. Compared with the classic pipeline structure, the IP does not require a multiplexing processing element (PE), which helps shorten the data path of intermediate results. We further break the critical path so that one iterative step of the MMM algorithm completes in two clock cycles. The proposed hardware architecture is implemented on a Xilinx Virtex-7 series FPGA, using DSP48E1 slices to realize the multiplier. Implementation results show that 1024-bit and 2048-bit modular multiplications require 1.03 μs and 2.13 μs, respectively. Moreover, our area–time product analysis shows favorable results compared with state-of-the-art designs for both 1024-bit and 2048-bit moduli.
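
The classic radix-2 Montgomery modular multiplication loop, whose iteration steps the paper analyzes and pipelines, can be sketched as follows; it returns a·b·2^(-k) mod n for an odd modulus n of k bits. The interleaved pipeline and the DSP48E1 mapping are not modeled, and the test modulus 2^1024 - 105 is an arbitrary odd number chosen for illustration.

```python
def mmm_radix2(a: int, b: int, n: int) -> int:
    """Radix-2 Montgomery multiplication: a*b*2^(-k) mod n, with k = n.bit_length().

    Requires n odd and 0 <= a, b < n.
    """
    s = 0
    for i in range(n.bit_length()):
        s += ((a >> i) & 1) * b  # add a_i * b
        if s & 1:                # make s even by adding n (n is odd)
            s += n
        s >>= 1                  # exact division by 2; s stays below 2n
    return s - n if s >= n else s


N = 2**1024 - 105  # arbitrary odd 1024-bit modulus for illustration
K = N.bit_length()
a, b = pow(3, 600, N), pow(7, 350, N)
print(mmm_radix2(a, b, N) * pow(2, K, N) % N == a * b % N)  # True
```

Each iteration depends on the parity of the previous sum, so the loop is inherently sequential; pipelining schemes overlap iterations by splitting the wide additions into word-level stages.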

(This article belongs to the Special Issue Computer-Aided Design for Hardware Security and Trust)

15 pages, 6731 KiB

Open Access Article

FPGA Implementation for Elliptic Curve Cryptography Algorithm and Circuit with High Efficiency and Low Delay for IoT Applications

by Deming Wang, Yuhang Lin, Jianguo Hu, Chong Zhang and Qinghua Zhong

Micromachines 2023, 14(5), 1037; https://doi.org/10.3390/mi14051037 - 12 May 2023

Cited by 15 | Viewed by 3246

Abstract

The Internet of Things requires greater attention to network security and privacy. Compared with other public-key cryptosystems, elliptic curve cryptography can provide better security and lower latency with shorter keys, making it more suitable for IoT security. This paper presents a high-efficiency, low-delay elliptic curve cryptographic architecture based on the NIST P-256 prime field for IoT security applications. A modular square unit uses a fast partial Montgomery reduction algorithm and requires only four clock cycles to complete a modular square operation. The modular square unit can operate simultaneously with the modular multiplication unit, which speeds up point multiplication. Synthesized on the Xilinx Virtex-7 FPGA platform, the proposed architecture completes one point multiplication (PM) in 0.08 ms using 23.1 k LUTs at 105.3 MHz. These results show significantly better performance than previous works.
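
The abstract does not detail its fast partial Montgomery reduction. As a related illustration only, the sketch below shows why the P-256 prime is Montgomery-friendly: since p ≡ -1 (mod 2^64), the usual constant -p^(-1) mod 2^64 equals 1, so each word-level quotient is simply the low word of the accumulator and needs no multiplication. The 64-bit word size and the function names are assumptions for this sketch.

```python
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
W = 64                 # assumed word size
MASK = (1 << W) - 1
R = 1 << 256           # Montgomery radix


def mont_reduce_p256(t: int) -> int:
    """Word-serial Montgomery reduction: t * 2^(-256) mod P256 for 0 <= t < P256 * R.

    Because P256 ≡ -1 (mod 2^64), the factor -P256^(-1) mod 2^64 is 1, so the
    quotient word m is just the low 64 bits of t.
    """
    for _ in range(256 // W):
        m = t & MASK
        t = (t + m * P256) >> W  # the low 64 bits cancel exactly
    return t - P256 if t >= P256 else t


def mont_sqr(a_mont: int) -> int:
    """Montgomery-domain squaring: (aR)^2 * R^(-1) = a^2 * R (mod P256)."""
    return mont_reduce_p256(a_mont * a_mont)


a = pow(3, 150, P256)
a_mont = a * R % P256
print(mont_sqr(a_mont) == a * a * R % P256)  # True
```

Because squaring and multiplication share this reduction, a dedicated square unit can run alongside the multiplier without needing its own quotient multiplier.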

(This article belongs to the Special Issue FPGA Applications and Future Trends)

17 pages, 48424 KiB

Open Access Article

A Unified Point Multiplication Architecture of Weierstrass, Edward and Huff Elliptic Curves on FPGA

by Muhammad Arif, Omar S. Sonbul, Muhammad Rashid, Mohsin Murad and Mohammed H. Sinky

Appl. Sci. 2023, 13(7), 4194; https://doi.org/10.3390/app13074194 - 25 Mar 2023

Cited by 3 | Viewed by 1819

Abstract

This article presents an area-aware unified hardware accelerator for Weierstrass, Edward, and Huff curves over GF(2^233) for the point multiplication step in elliptic curve cryptography (ECC). The target implementation platform is a field-programmable gate array (FPGA). To explore the design space between processing time and various protection levels, this work employs two point multiplication algorithms: the Montgomery point multiplication algorithm for the Weierstrass and Edward curves, and the Double and Add algorithm for the binary Huff curve. Area complexity is reduced by efficiently replacing storage elements, which yields a 1.93-fold decrease in the required memory size. An efficient Karatsuba modular multiplier is implemented to compute polynomial multiplications. A squaring unit placed after the Karatsuba multiplier executes the quad-block variant of modular inversion, which keeps hardware resource usage low and also reduces clock cycles. Finally, an efficient controller is implemented to support the three curves. Our unified architecture operates at up to 294 MHz and uses 7423 slices on a Virtex-7 FPGA. It requires less computation time than most recent state-of-the-art implementations. Thus, combining different security curves (Weierstrass, Edward, and Huff) in a single design is practical for applications that demand different reliability/security levels.
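
A sketch of Karatsuba multiplication of binary polynomials followed by modular reduction. The abstract does not state the field polynomial, so the NIST trinomial x^233 + x^74 + 1 is assumed here; the 32-bit recursion threshold and the function names are likewise arbitrary choices for illustration.

```python
M = 233
F = (1 << 233) | (1 << 74) | 1  # assumed trinomial x^233 + x^74 + 1


def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less (GF(2)[x]) multiplication."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_gf2(a: int, b: int, n: int = M) -> int:
    """Recursive Karatsuba for binary polynomials of degree < n."""
    if n <= 32:
        return clmul(a, b)
    h = n // 2
    lo_mask = (1 << h) - 1
    a0, a1 = a & lo_mask, a >> h
    b0, b1 = b & lo_mask, b >> h
    z0 = karatsuba_gf2(a0, b0, h)
    z2 = karatsuba_gf2(a1, b1, n - h)
    # (a0 + a1)(b0 + b1) - z0 - z2; in GF(2), + and - are both XOR
    z1 = karatsuba_gf2(a0 ^ a1, b0 ^ b1, n - h) ^ z0 ^ z2
    return z0 ^ (z1 << h) ^ (z2 << (2 * h))


def reduce_trinomial(c: int) -> int:
    """Reduce modulo x^233 + x^74 + 1 using x^233 ≡ x^74 + 1."""
    while c.bit_length() > M:
        hi = c >> M
        c = (c & ((1 << M) - 1)) ^ hi ^ (hi << 74)
    return c


def gf2_233_mul(a: int, b: int) -> int:
    return reduce_trinomial(karatsuba_gf2(a, b))


a, b = 3**140 & ((1 << M) - 1), 5**100 & ((1 << M) - 1)
print(karatsuba_gf2(a, b) == clmul(a, b))  # True
```

Each Karatsuba level replaces four half-size products with three, at the cost of a few extra XORs, which maps well to area-constrained FPGA multipliers.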

20 pages, 483 KiB

Open Access Article

Word-Based Processor Structure for Montgomery Modular Multiplier Suitable for Compact IoT Edge Devices

by Atef Ibrahim and Fayez Gebali

Mathematics 2023, 11(2), 328; https://doi.org/10.3390/math11020328 - 8 Jan 2023

Cited by 1 | Viewed by 1754

Abstract

The Internet of Things (IoT) is an emerging technology that forms a huge network of diverse objects and intelligent devices. IoT security is becoming more important because of the exchange of sensitive sensor data and the potential for integrating the virtual and real worlds. IoT edge devices create serious security threats to network systems, yet their limited resources make it challenging to implement the cryptographic protocols needed to secure them. Addressing this problem requires compact implementations of cryptographic algorithms on these devices. At the heart of most cryptographic algorithms is the modular multiplication operation, so an efficient implementation of this operation has a great impact on the implementation of the whole cryptographic protocol. In this paper, we focus on a resource- and energy-efficient hardware implementation of the adopted Montgomery modular multiplication algorithm over GF(2^m). The main building block of the proposed word-based processor structure is a processor array with a modular structure and local connectivity between its processing elements. The main benefit of the proposed hardware structure is the ability to control the savings in area, delay, and consumed energy. We implemented the proposed word-based processor structure in ASIC technology. The final results show an average area reduction of 86.3% compared with competing word-based multiplier structures. Additionally, the proposed design achieves significant average savings in area–time product, power, and consumed energy of 53.7%, 83.2%, and 72.6%, respectively, over the competing designs. These results show that the proposed processor structure is best suited to compact IoT edge devices with limited resources.

(This article belongs to the Special Issue Codes, Designs, Cryptography and Optimization, 2nd Edition)

14 pages, 684 KiB

Open Access Article

A Scalable Montgomery Modular Multiplication Architecture with Low Area-Time Product Based on Redundant Binary Representation

by Zhaoji Zhang and Peiyong Zhang

Electronics 2022, 11(22), 3712; https://doi.org/10.3390/electronics11223712 - 13 Nov 2022

Cited by 8 | Viewed by 2305

Abstract

Montgomery modular multiplication is an integral operation unit in public-key cryptographic algorithms. Previous work achieved good performance at low input widths by combining Redundant Binary Representation (RBR) with Montgomery modular multiplication, but it is difficult to strike a good balance between area and time as the input bit width increases. To solve this problem, we build on redundant Montgomery modular multiplication and propose a flexible, pipelined hardware implementation of it. Our structure guarantees a single-cycle delay between two-stage pipeline units and shortens the critical path by redistributing the data paths between the pipelines and preprocessing the input in the loop. Our structural analysis and comparison with related work show that the structure achieves a lower area-time product while keeping area consumption small and controllable. Comprehensive results under different Taiwan Semiconductor Manufacturing Company (TSMC) processes demonstrate the advantages of our structure in terms of flexibility and area-time product.

(This article belongs to the Section Electronic Materials, Devices and Applications)

Search Results (24)