Article

An Efficient Implementation of Montgomery Modular Multiplication Using a Minimally Redundant Residue Number System

by Mikhail Selianinau and Bożena Woźna-Szcześniak *
Department of Mathematics and Computer Science, Faculty of Science and Technology, Jan Dlugosz University in Czestochowa, al. Armii Krajowej 13/15, 42-200 Czestochowa, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(10), 5332; https://doi.org/10.3390/app15105332
Submission received: 14 March 2025 / Revised: 24 April 2025 / Accepted: 8 May 2025 / Published: 10 May 2025
(This article belongs to the Special Issue Novel Insights into Cryptography and Network Security)

Abstract

This paper presents an implementation of modular multiplication based on Montgomery’s scheme within the Residue Number System (RNS). The key innovation of the proposed approach lies in utilizing minimally redundant residue arithmetic, where the rank of a number serves as the primary positional characteristic of the residue code. Additionally, integer numbers are represented in rank form during base extension operations. Due to the low computational complexity of rank calculation in minimally redundant RNS and the specific constraints imposed on the RNS moduli sets, the proposed modular multiplication method achieves up to a 1.5 times performance improvement over non-redundant RNS counterparts. This approach is particularly suited for applications in public key cryptosystems.

1. Introduction

Since the mid-1950s, RNS arithmetic has attracted significant attention from researchers and practitioners across various domains, including number-theoretic methods, computer technology, digital signal processing, cryptography, and communications [1,2,3,4,5,6,7,8,9].
RNS arithmetic is particularly valuable in parallel processing systems, where it enables fast and accurate computations. The inherent parallelism of modular computational structures provides several key advantages over positional numeral systems, especially for operations involving large numbers. These advantages include the following:
  • execution time of parallel modular operations that does not depend on the number of moduli and, consequently, on the length of the residue code;
  • high adaptability of modular arithmetic algorithms to tabular calculations and pipeline processing;
  • flexibility in using lookup table techniques for reconfiguring modular computational structures, among others.
A well-known application of modular arithmetic is in cryptographic systems, where multiplicative operations over large moduli form the foundation for various information security mechanisms [10,11,12,13,14,15,16,17,18,19,20,21,22,23]. These operations are extensively utilized in electronic signature schemes, public key cryptosystems based on RSA and Rabin’s method, and other cryptographic tasks. Therefore, advancing computational technologies that enhance modular multiplication and exponentiation efficiency over large-number ranges remains a crucial research direction.
One of the most promising approaches for achieving high-performance modular arithmetic is using RNS-based modular computational structures. As the degree of modularity in computational processes increases, the efficiency of these structures improves significantly. This effect is particularly evident when handling large-number arithmetic operations.
A notable example is the Montgomery multiplication (MM) algorithm implemented using RNS [10,11,12,13,22]. Montgomery multiplication replaces general division by the modulus with division without remainder by a specially chosen auxiliary modulus, which is well suited for RNS configurations [24]. This property makes MM in RNS highly efficient.
However, a principal computational bottleneck in modular arithmetic is the residue code extension required in MM implementations. Optimizing these operations is a key challenge in developing high-performance cryptographic hardware and software for large-number computations. One promising solution is to use minimally redundant RNS, which allows for more efficient execution of fundamental non-modular operations such as base extension and reverse conversion based on the Chinese Remainder Theorem (CRT) [5,6,25,26,27,28].
This paper aims to exploit the fundamental advantages of minimally redundant RNS arithmetic to optimize multiplication algorithms based on Montgomery’s scheme. By refining the base extension and conversion processes, we enhance the overall efficiency of MM in large-number cryptographic applications.
This paper provides a rigorous theoretical framework for optimizing modular multiplication in minimally redundant Residue Number Systems (RNSs), offering new insights into the computational efficiency of cryptographic operations. Using minimally redundant arithmetic, the study improves the performance of modular computations, which are fundamental in public-key cryptography, secure computing, and high-performance arithmetic.
Rather than focusing on software or hardware implementation, this work contributes by presenting formal algorithmic descriptions, mathematical proofs, and correctness guarantees. These theoretical foundations serve as a basis for future advances in secure computing, number-theoretic algorithms, and parallel processing architectures.
For clarity and reproducibility, the paper provides detailed algorithmic methodologies and mathematical derivations, making it accessible to researchers developing efficient cryptographic protocols, modular arithmetic techniques, and computational number theory applications. The structured formal approach to RNS-based Montgomery multiplication supports interdisciplinary applications in computer science, digital signal processing, and cryptographic engineering.
The remainder of this paper is structured as follows. Section 2 and Section 3 introduce the theoretical foundations of the research. Section 4 presents the mathematical background of the proposed method. Section 5 describes the MM algorithm and analyzes the impact of RNS moduli selection. Section 6 provides a discussion of the results. Section 7 concludes the paper.

2. Basic Principles of RNS Arithmetic

Residue Number System (RNS) arithmetic is based on abstract algebra and number theory [29,30].
In the set of integers $\mathbb{Z}$, an RNS is defined by a set of pairwise coprime natural moduli $m_1, m_2, \ldots, m_k$, where $m_i > 2$ for $i = 1, 2, \ldots, k$ and $k \geq 2$. In this system, an integer $X \in \mathbb{Z}$ is represented by the $k$-tuple $(\chi_1, \chi_2, \ldots, \chi_k)$, where $\chi_i = |X|_{m_i}$ is the residue of $X$ modulo $m_i$, i.e., $\chi_i$ belongs to the set $\mathbb{Z}_{m_i} = \{0, 1, \ldots, m_i - 1\}$, ensuring $X \equiv \chi_i \pmod{m_i}$ for all $i = 1, 2, \ldots, k$.
An RNS allows the unique representation of up to $M_k = \prod_{i=1}^{k} m_i$ integers through their residue codes. The number range is typically defined by $\mathbb{Z}_{M_k} = \{0, 1, \ldots, M_k - 1\}$.

2.1. Arithmetic Operations in RNS

Due to the carry-free nature of RNS arithmetic, addition, subtraction, and multiplication can be executed in parallel, independently for each modulus, following the rule below:
$$A \circ B = (\alpha_1, \alpha_2, \ldots, \alpha_k) \circ (\beta_1, \beta_2, \ldots, \beta_k) = \left( |\alpha_1 \circ \beta_1|_{m_1}, |\alpha_2 \circ \beta_2|_{m_2}, \ldots, |\alpha_k \circ \beta_k|_{m_k} \right),$$
where $\circ \in \{+, -, \times\}$ denotes a modular arithmetic operation, with $\alpha_i = |A|_{m_i}$ and $\beta_i = |B|_{m_i}$ for $i = 1, 2, \ldots, k$.
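The channel-wise rule above can be illustrated with a minimal Python sketch. The moduli set and the helper names `to_rns` and `rns_op` are illustrative assumptions, not part of the paper; each channel is reduced modulo its own $m_i$ with no carries between channels.

```python
from math import gcd, prod
from operator import add, sub, mul

def to_rns(x, moduli):
    """Residue code (chi_1, ..., chi_k) of the integer x."""
    return tuple(x % m for m in moduli)

def rns_op(op, a, b, moduli):
    """Apply op (add, sub or mul) independently in every channel."""
    return tuple(op(ai, bi) % m for ai, bi, m in zip(a, b, moduli))

# Illustrative pairwise coprime moduli (an assumption of this sketch), M_k = 1001.
moduli = (7, 11, 13)
assert all(gcd(m, n) == 1 for i, m in enumerate(moduli) for n in moduli[i + 1:])
M = prod(moduli)

A, B = 25, 31
a, b = to_rns(A, moduli), to_rns(B, moduli)
sum_code = rns_op(add, a, b, moduli)    # residue code of |A + B|_M
diff_code = rns_op(sub, a, b, moduli)   # residue code of |A - B|_M
prod_code = rns_op(mul, a, b, moduli)   # residue code of |A * B|_M
```

The results are valid residue codes of the sum, difference, and product reduced modulo $M_k$; recovering the integer itself is a non-modular operation.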

2.2. Determining the Integer Value from Residue Representation

In RNS arithmetic, reconstructing the integer $X$ from its residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ is required for performing non-modular operations. This reconstruction relies on the so-called positional characteristics of the residue code.
There are two primary approaches to constructing RNS arithmetic based on these characteristics: (1) rank-based computation and (2) mixed-radix representation [7,8,9].
According to the Chinese Remainder Theorem [29,31], for a given RNS defined by the set $\{m_1, m_2, \ldots, m_k\}$ of pairwise relatively prime moduli, the integer $X$ and its residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ are related by the following equation:
$$X = \sum_{i=1}^{k} M_{i,k} \chi_{i,k} - \rho_k(X) M_k, \tag{1}$$
where $M_{i,k} = M_k / m_i$, and $\chi_{i,k} = \left| M_{i,k}^{-1} \chi_i \right|_{m_i}$ is a normalized residue modulo $m_i$ for $i = 1, 2, \ldots, k$, with $|Y^{-1}|_m$ denoting the multiplicative inverse of $Y$ modulo $m$.
The integer $\rho_k(X)$ and Equation (1) define the rank and rank form of $X$, respectively [2,3]. Equation (1) establishes an isomorphic mapping between $X \in \mathbb{Z}_{M_k}$ and its residue code $(\chi_1, \chi_2, \ldots, \chi_k) \in \mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_k}$. In this context, computing the rank $\rho_k(X)$ enables the realization of this isomorphic mapping.
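As a small illustration of the rank form (1), the following Python sketch evaluates the sum of the terms $M_{i,k}\chi_{i,k}$ and obtains the rank by exact integer division. The helper names are hypothetical, and the big-integer division is only a reference computation; it is not the table-based rank calculation discussed in the cited works.

```python
from math import prod

def rank_form_sum(residues, moduli):
    """Sum of M_{i,k} * chi_{i,k} over all moduli, before subtracting rho * M_k."""
    M_k = prod(moduli)
    total = 0
    for chi_i, m_i in zip(residues, moduli):
        M_ik = M_k // m_i
        chi_ik = (pow(M_ik, -1, m_i) * chi_i) % m_i  # normalized residue
        total += M_ik * chi_ik
    return total

def rank(residues, moduli):
    """rho_k(X) for X in [0, M_k): how many multiples of M_k the sum overshoots X by."""
    return rank_form_sum(residues, moduli) // prod(moduli)

def from_rank_form(residues, moduli):
    """Reconstruct X according to Equation (1)."""
    return rank_form_sum(residues, moduli) - rank(residues, moduli) * prod(moduli)

moduli = (3, 5, 7)                   # illustrative moduli, M_k = 105
X = 52
code = [X % m for m in moduli]
rho = rank(code, moduli)
restored = from_rank_form(code, moduli)
```

Because each normalized residue is smaller than its modulus, the sum is below $k M_k$, so the rank always lies in $\{0, 1, \ldots, k-1\}$.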

2.3. Mixed-Radix Representation

In RNS arithmetic, the rank of a number is a positional characteristic of primary importance since it enables one to perform all the non-modular operations. In a Mixed Radix System (MRS) defined by a set $\{m_1, m_2, \ldots, m_k\}$ of pairwise relatively prime moduli, an integer $X \in \mathbb{Z}_{M_k}$ is represented by the $k$-tuple $(x_k, x_{k-1}, \ldots, x_1)$ of mixed-radix digits, resulting in
$$X = \sum_{i=1}^{k} x_i M_{i-1},$$
where $x_i \in \mathbb{Z}_{m_i}$ $(i = 1, 2, \ldots, k)$, $M_0 = 1$, $M_{i-1} = \prod_{j=1}^{i-1} m_j$ $(i = 2, 3, \ldots, k)$ [8,9].
An MRS is advantageous over an RNS for non-modular operations such as magnitude comparison, sign determination, and overflow detection.
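The following Python sketch shows why the MRS is convenient for magnitude comparison. It uses the sequential textbook conversion from residues to mixed-radix digits, which is an assumption of this example and not the parallel procedure referenced later in the paper; the function names are hypothetical.

```python
def to_mixed_radix(residues, moduli):
    """Digits [x_1, ..., x_k] (least significant first) with X = sum x_i * M_{i-1}."""
    r = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        x_i = r[i]
        digits.append(x_i)
        # Subtract the digit and divide by m_i in every remaining channel.
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            r[j] = ((r[j] - x_i) * pow(m_i, -1, m_j)) % m_j
    return digits

def from_mixed_radix(digits, moduli):
    """Evaluate X = sum x_i * M_{i-1} from least-significant-first digits."""
    value, weight = 0, 1
    for x_i, m_i in zip(digits, moduli):
        value += x_i * weight
        weight *= m_i
    return value

def mrs_less(digits_a, digits_b):
    """Magnitude comparison: lexicographic order starting from the most significant digit."""
    return digits_a[::-1] < digits_b[::-1]

moduli = (3, 5, 7)
d52 = to_mixed_radix([52 % m for m in moduli], moduli)
d17 = to_mixed_radix([17 % m for m in moduli], moduli)
smaller = mrs_less(d17, d52)
```

Since the weights $M_{i-1}$ are positional, comparing two numbers reduces to comparing their digit tuples, which is not possible directly on residue codes.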

2.4. Computational Complexity and Optimization

In the conventional non-redundant RNS, calculating both the rank $\rho_k(X)$ and the digits $(x_k, x_{k-1}, \ldots, x_1)$ of the mixed-radix representation of the number $X \in \mathbb{Z}_{M_k}$ from its residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ reduces to the parallel summation of sets of small-bit-length residues corresponding to the RNS moduli $m_1, m_2, \ldots, m_k$, while accounting for the number of overflows that occur in the rings $\mathbb{Z}_{m_1}, \mathbb{Z}_{m_2}, \ldots, \mathbb{Z}_{m_k}$ during the modular addition operations.
At the same time, according to [25,27], the computational complexity of calculating the rank $\rho_k(X)$ is $O(k^2)$ in terms of modular addition operations and the number of required lookup tables. Consequently, this becomes computationally expensive as $k$ increases, for instance, when implementing non-modular operations based on the rank form of integer representation (1), where operands and computational results belong to large numerical ranges.
One possible way to improve modular computational structures is to employ more efficient and optimized variants of RNS arithmetic compared to the traditional approach. As is well known, the use of code redundancy can improve the arithmetic properties of an RNS [5,6].
As demonstrated in [25,27], the use of a minimally redundant RNS enables the optimization and acceleration of the rank calculation. To this end, an additional redundant residue, $\chi_0 = |X|_{m_0}$, corresponding to an additional modulus $m_0 = 2$, is added to the original residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ of the number $X \in \mathbb{Z}_{M_k}$. Thus, the non-redundant residue code is extended by just one bit, representing the parity of the number $X$.
This modification reduces the computational complexity of rank calculation from $O(k^2)$ to $O(k)$ [25,27].
Consequently, using minimally redundant RNS arithmetic not only optimizes but also improves the efficiency of non-modular operations that rely on the rank form of integers (1), such as the base extension operation.

2.5. Base Extension Operations

In applications involving modular computing structures, base extension operations play a fundamental role, as they serve as the foundation for synthesizing algorithms for nearly all non-modular operations, including general division, scaling, fractional multiplication, code transformations, and more. In information security systems, for instance, computationally intensive operations on large number ranges—such as Montgomery multiplication, modular exponentiation, and others—are implemented using base extension techniques [8,9].
The advantages of minimally redundant modular arithmetic in simplifying non-modular procedures are particularly evident in residue code extension operations.
Let an RNS be defined by the moduli set $\{m_1, m_2, \ldots, m_k\}$, and suppose that the residue code $(\chi_1, \chi_2, \ldots, \chi_l)$ of a number $X$ is given with respect to the basis $\mathcal{M}_1 = \{m_1, m_2, \ldots, m_l\}$, where $1 < l < k$. The task is to determine the residues corresponding to the basis $\mathcal{M}_2 = \{m_{l+1}, m_{l+2}, \ldots, m_k\}$, i.e., the residue code $(\chi_{l+1}, \chi_{l+2}, \ldots, \chi_k)$. This process constitutes the essence of the residue code extension operation from the moduli in $\mathcal{M}_1$ to the remaining moduli in $\mathcal{M}_2$. For this operation, we use the conventional notation $(\chi_{l+1}, \chi_{l+2}, \ldots, \chi_k) = \mathrm{BEX}(X; \{m_1, m_2, \ldots, m_l\}, \{m_{l+1}, m_{l+2}, \ldots, m_k\})$, or, in a simplified form, $\mathrm{BEX}(X; \mathcal{M}_1, \mathcal{M}_2)$.
A necessary condition for the correctness of the base extension is the uniqueness of the mapping $\mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_l} \to \mathbb{Z}_{M_l}$. This means that the code $(\chi_1, \chi_2, \ldots, \chi_l)$ must uniquely determine the integer $X$. When this condition is satisfied, the number $X$ can be represented in rank form with respect to the moduli $m_1, m_2, \ldots, m_l$, by analogy with Equation (1).
Determining the rank $\rho_l(X)$ is significantly simplified when the RNS defined over the basis $\mathcal{M}_1$ is minimally redundant. In comparison to non-redundant residue coding, this approach yields an approximately $l/2$-fold reduction in the number of required modular addition operations and the size of lookup tables [25,27]. This advantage becomes especially important when working with large numbers.
Therefore, for the number $X = (\chi_1, \chi_2, \ldots, \chi_l)$, the base extension operation to the moduli $m_{l+1}, m_{l+2}, \ldots, m_k$ reduces to determining the rank $\rho_l(X)$ of the number $X$, and subsequently applying to its rank form the procedure for computing the residues with respect to the moduli in the basis $\mathcal{M}_2$:
$$\chi_j = \left| \sum_{i=1}^{l} \left| M_{i,l} \chi_{i,l} \right|_{m_j} - \left| \rho_l(X) M_l \right|_{m_j} \right|_{m_j} \quad (j = l+1, l+2, \ldots, k).$$
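A minimal Python sketch of base extension through the rank form is given below. The function name `base_extend` and the moduli are illustrative assumptions, and the rank is obtained here by exact integer division as a stand-in for the rank computation of the minimally redundant RNS.

```python
from math import prod

def base_extend(residues, base1, base2):
    """Residues of X modulo base2, given its residues modulo base1 and X in [0, M_l)."""
    M_l = prod(base1)
    # Normalized residues chi_{i,l} = |M_{i,l}^{-1} chi_i|_{m_i}.
    norm = [(pow(M_l // m, -1, m) * r) % m for r, m in zip(residues, base1)]
    # Rank of X, found here by big-integer division (reference computation only).
    rho = sum((M_l // m) * c for m, c in zip(base1, norm)) // M_l
    extended = []
    for m_j in base2:
        # Every term is reduced modulo m_j, so only small residues are combined.
        acc = sum(((M_l // m) % m_j) * c for m, c in zip(base1, norm)) % m_j
        extended.append((acc - (rho * M_l) % m_j) % m_j)
    return tuple(extended)

base1, base2 = (3, 5, 7), (11, 13)   # illustrative moduli sets
X = 52
chi_ext = base_extend([X % m for m in base1], base1, base2)
```

Once the rank is known, each new residue needs only one pass over the $l$ normalized residues, which is why the cost of the rank calculation dominates the operation.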

3. Basic Computational Scheme for Montgomery Modular Multiplication

Let $A$ and $B$ be the operands of a multiplication operation modulo a large number $p$. The fundamental concept proposed by Montgomery [24] consists of an additive transformation of the product $C = AB$, allowing the division without remainder by a specially chosen auxiliary modulus $S > p$.
To achieve this, the following equation is established:
$$C' = C + \left| -C p^{-1} \right|_S \, p. \tag{3}$$
Given that $S$ and $p$ are coprime, i.e., $\gcd(S, p) = 1$, Equation (3) guarantees that the integer $C'$ is a multiple of the modulus $S$ for any values of the operands $A$ and $B$.
This property is at the core of Montgomery’s approach to constructing an efficient modular multiplication scheme [24]. The divisibility of $C'$ by $S$ follows from the equality:
$$|C'|_S = \left| C + \left| -C p^{-1} \right|_S \, p \right|_S = \left| C - C p^{-1} p \right|_S = 0. \tag{4}$$
Therefore, the number
$$C'' = \frac{C'}{S} = \frac{C + \left| -C p^{-1} \right|_S \, p}{S} \tag{5}$$
is an integer.
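Let $\lfloor x \rfloor$ denote the greatest integer less than or equal to $x$. Following Euclid’s Division Lemma [29,30], the number $C''$ can be represented as
$$C'' = |C''|_p + \left\lfloor \frac{C''}{p} \right\rfloor p. \tag{6}$$
Then, by reducing modulo $p$, Equation (5) simplifies to
$$|C''|_p = \left| \frac{C'}{S} \right|_p = \left| \frac{C}{S} \right|_p = \left| \frac{AB}{S} \right|_p. \tag{7}$$
Thus, taking into account (6) and (7), the integer number
$$\hat{C} = \left| A B S^{-1} \right|_p = C'' - Q p$$
represents the expected computed result. Here, the number
$$Q = \left\lfloor \frac{C''}{p} \right\rfloor$$
is a uniquely determined integer factor for given operands $A$ and $B$.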
For $A, B \in \mathbb{Z}_p = \{0, 1, \ldots, p-1\}$, according to Equation (3), the following inequality holds:
$$C' \leq (p-1)^2 + (S-1)p < p^2 + Sp.$$
Since $S > p$, Equation (5) implies the following estimate for the number $C''$:
$$C'' < 2p. \tag{9}$$
Thus, according to (9), the integer factor $Q$ can take on only two values: 0 or 1. Hence, to determine $Q$, it is necessary to compare the integers $C''$ and $p$.
Therefore, $Q = \mathrm{CMP}(C'', p)$, where
$$\mathrm{CMP}(x, y) = \begin{cases} 0, & \text{if } x < y, \\ 1, & \text{if } x \geq y, \end{cases} \tag{10}$$
denotes the result of comparing two integers $x$ and $y$.
Based on this, the main steps of Montgomery Modular Multiplication (MM) can be summarized as follows:
$$C = AB, \quad X = |CY|_S, \quad C' = C + Xp, \quad C'' = C'/S, \quad \hat{C} = C'' - Qp, \tag{11}$$
where $Y = |-p^{-1}|_S$ and $Q \in \{0, 1\}$.
As seen from Equation (11), when MM is implemented in a positional number system, the main complexity arises from the multiplication of large numbers such as $AB$, $CY$, and $Xp$. The same holds for calculating the multiplicative inverse $Y$. However, its cost is not significant in practice because this constant is a long-term parameter obtained during preliminary calculations.
In RNS arithmetic, all operations specified in (11) are modular, so they are much simpler and faster to perform than in conventional positional arithmetic. The inherent parallelism of RNS is the primary reason for preferring modular arithmetic for MM implementation.
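The steps of the scheme (11) can be traced on ordinary integers with the Python sketch below. The function name and the parameters $p = 97$, $S = 105$ are illustrative assumptions; the sketch only demonstrates the arithmetic of the scheme, not its RNS realization.

```python
from math import gcd

def montgomery_multiply(A, B, p, S):
    """Return |A * B * S^{-1}|_p for A, B in [0, p), following the five steps of (11)."""
    assert S > p and gcd(S, p) == 1
    Y = (-pow(p, -1, S)) % S   # Y = |-p^{-1}|_S, a constant that can be precomputed
    C = A * B
    X = (C * Y) % S
    C1 = C + X * p             # C' is a multiple of S by construction
    assert C1 % S == 0
    C2 = C1 // S               # C'' < 2p because S > p and A, B < p
    Q = 0 if C2 < p else 1     # Q = CMP(C'', p)
    return C2 - Q * p

p, S = 97, 105                 # illustrative values; S = 3 * 5 * 7 is coprime to p
result = montgomery_multiply(20, 30, p, S)
```

The only division in the scheme is the exact division by $S$, and the final reduction is a single conditional subtraction of $p$.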

4. Realization of Montgomery Modular Multiplication in Minimally Redundant RNS

Let a primary Residue Number System (RNS) be defined on the set $\mathcal{M} = \{m_1, m_2, \ldots, m_k\}$ of pairwise relatively prime odd moduli $m_i$ $(m_i > 2,\ i = 1, 2, \ldots, k;\ k \geq 2)$, with an additional modulus $m_0 = 2$. Consequently, the moduli set $\{m_0, m_1, \ldots, m_k\}$ forms a minimally redundant RNS. The effective number range is given by the ring $\mathbb{Z}_{M_k} = \{0, 1, \ldots, M_k - 1\}$.
Let the operands $A$, $B$, and modulus $p$ be represented by their minimally redundant residue codes:
$$A = (\alpha_0, \alpha_1, \ldots, \alpha_k), \quad B = (\beta_0, \beta_1, \ldots, \beta_k), \quad p = (\pi_0, \pi_1, \ldots, \pi_k),$$
where $\alpha_i = |A|_{m_i}$, $\beta_i = |B|_{m_i}$, $\pi_i = |p|_{m_i}$, and $i = 0, 1, \ldots, k$.
The complete moduli set $\mathcal{M}$ is the union of the moduli sets $\mathcal{M}_1 = \{m_1, m_2, \ldots, m_l\}$ and $\mathcal{M}_2 = \{m_{l+1}, m_{l+2}, \ldots, m_k\}$. Here, the auxiliary modulus $S$ is defined as the product of the moduli in the set $\mathcal{M}_1$:
$$S = M_l = \prod_{i=1}^{l} m_i, \quad \text{where } 1 < l < k.$$
Since the multiplicative inverse $Y = |-p^{-1}|_{M_l}$ (see (11)) belongs to the ring $\mathbb{Z}_{M_l} = \{0, 1, \ldots, M_l - 1\}$, it is uniquely represented in a minimally redundant residue code as follows:
$$(\xi_0, \xi_1, \ldots, \xi_l),$$
where $\xi_i = \left| -\pi_i^{-1} \right|_{m_i}$ for $i = 0, 1, \ldots, l$.
Similarly, the number $X = |CY|_{M_l}$ (see (11)) is represented in a minimally redundant residue code with respect to $\mathcal{M}_1$:
$$X = (\chi_0, \chi_1, \ldots, \chi_l) = \left( |\gamma_0 \xi_0|_{m_0}, |\gamma_1 \xi_1|_{m_1}, \ldots, |\gamma_l \xi_l|_{m_l} \right), \tag{12}$$
where $\chi_i = |X|_{m_i}$, $\gamma_i = |\alpha_i \beta_i|_{m_i}$ $(i = 0, 1, \ldots, l)$.
Following (11), we compute the residue code of $C' = C + Xp$ with respect to the complete moduli set $\mathcal{M}$. Before proceeding, it is necessary to extend the residue code (12) to the moduli in $\mathcal{M}_2$.
Base extension is one of the most difficult non-modular operations in RNS arithmetic. Its implementation usually uses the rank form (1) [12,13,14,15,16].
Following (1), the number $X$ with respect to $\mathcal{M}_1$ is represented by the following:
$$X = \sum_{i=1}^{l} M_{i,l} \chi_{i,l} - \rho_l(X) M_l, \tag{13}$$
where, based on the theorem for rank calculation in minimally redundant RNS [25,27],
$$\rho_l(X) = \hat{\rho}_l(X) + \delta_l(X), \tag{14}$$
with $\hat{\rho}_l(X)$ as the inexact rank of the number $X$, and $\delta_l(X)$ as a two-valued rank correction:
$$\hat{\rho}_l(X) \in \mathbb{Z}_l = \{0, 1, \ldots, l-1\}, \quad \delta_l(X) \in \mathbb{Z}_2 = \{0, 1\}.$$
The inexact rank $\hat{\rho}_l(X)$ is calculated as follows:
$$\hat{\rho}_l(X) = \left\lfloor \frac{1}{m_l} \sum_{i=1}^{l} R_{i,l}(\chi_i) \right\rfloor, \tag{15}$$
where
$$R_{i,l}(\chi_i) = \left| -\frac{1}{m_i} \left| M_{i,l-1}^{-1} \chi_i \right|_{m_i} \right|_{m_l} \quad (i \neq l), \tag{16}$$
$$R_{l,l}(\chi_l) = \left| M_{l-1}^{-1} \chi_l \right|_{m_l}, \tag{17}$$
with $M_{l-1} = M_l / m_l$ and $M_{i,l-1} = M_{l-1} / m_i$ $(i = 1, 2, \ldots, l-1)$.
Similarly, the rank correction $\delta_l(X)$ is computed using trivial addition operations modulo $m_0 = 2$:
$$\delta_l(X) = \left| \chi_0 + \sum_{i=1}^{l} \chi_{i,l}^{0} + \hat{\rho}_l^{0} \right|_{m_0}, \tag{18}$$
where $\chi_{i,l}^{0} = |\chi_{i,l}|_{m_0}$, $\hat{\rho}_l^{0} = |\hat{\rho}_l(X)|_{m_0}$, and $i = 1, 2, \ldots, l$ [25,27].
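The rank calculation (14)-(18) can be checked numerically with the Python sketch below. It is a direct transcription under stated assumptions: the function name is hypothetical, the term $1/m_i$ in (16) is read as the inverse of $m_i$ modulo $m_l$ taken with a minus sign, and modular inverses are computed on the fly, whereas a table-driven implementation would read the values $R_{i,l}(\chi_i)$ from precomputed lookup tables.

```python
from math import prod

def rank_min_redundant(chi0, residues, moduli):
    """rho_l(X) = inexact rank + parity correction for odd pairwise coprime moduli.

    chi0 is the redundant residue |X|_2; residues[i] is X modulo moduli[i]."""
    m_l = moduli[-1]
    M_l = prod(moduli)
    M_prev = M_l // m_l                       # M_{l-1}
    R = []
    for chi_i, m_i in zip(residues[:-1], moduli[:-1]):
        inner = (pow(M_prev // m_i, -1, m_i) * chi_i) % m_i
        R.append((-pow(m_i, -1, m_l) * inner) % m_l)       # Equation (16)
    R.append((pow(M_prev, -1, m_l) * residues[-1]) % m_l)  # Equation (17)
    rho_hat = sum(R) // m_l                                # Equation (15)
    # Only the parities of the normalized residues are needed for Equation (18).
    norm = [(pow(M_l // m, -1, m) * r) % m for r, m in zip(residues, moduli)]
    delta = (chi0 + sum(c % 2 for c in norm) + rho_hat % 2) % 2
    return rho_hat + delta

moduli = (3, 5, 7)                            # illustrative odd moduli, M_l = 105
X = 7
rho = rank_min_redundant(X % 2, [X % m for m in moduli], moduli)
```

For $X = 7$ the inexact rank is 0 and the parity bit supplies the correction $\delta_l(X) = 1$, which shows how the single redundant bit resolves the remaining ambiguity.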
Hence, according to (13) and (14), the required extension of the residue code (12) to the moduli of the set $\mathcal{M}_2$ is carried out by the following rule:
$$\chi_j = \left| \sum_{i=1}^{l} M_{i,l} \chi_{i,l} - \left( \hat{\rho}_l(X) + \delta_l(X) \right) M_l \right|_{m_j} \quad (j = l+1, l+2, \ldots, k), \tag{19}$$
where $\hat{\rho}_l(X)$ and $\delta_l(X)$ are calculated according to (15)–(17) and (18), respectively.
Let us denote this base extension operation as $\mathrm{BEX}(X; \mathcal{M}_1, \mathcal{M}_2)$.
The number $C' = C + Xp$ (see (11)) is represented by its minimally redundant residue code $(\gamma'_0, \gamma'_1, \ldots, \gamma'_k)$ with respect to the moduli of the complete set $\mathcal{M}$.
Since $S = M_l$, then, according to (4), the integer $C'$ is divisible by $M_l$ without a remainder. Hence, its residues with respect to the moduli of the set $\mathcal{M}_1$ are equal to 0:
$$\gamma'_i = 0, \quad i = 1, 2, \ldots, l.$$
At the same time, we compute the remaining residues using the following rule:
$$\gamma'_j = \left| \gamma_j + |\chi_j \pi_j|_{m_j} \right|_{m_j}, \quad j = 0, l+1, l+2, \ldots, k.$$
To obtain the residue code $(\gamma''_0, \gamma''_1, \ldots, \gamma''_k)$ of the number
$$C'' = \frac{C'}{M_l} = \frac{C + Xp}{M_l}, \tag{21}$$
two computational stages are necessary:
  • At the first stage, the residues $\gamma''_j$ $(j = l+1, l+2, \ldots, k)$ with respect to the moduli of the set $\mathcal{M}_2$ are calculated;
  • At the second stage, the truncated residue code $(\gamma''_{l+1}, \gamma''_{l+2}, \ldots, \gamma''_k)$ is extended to the moduli of the set $\mathcal{M}_1$.
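As follows from (21), at the first stage, the residues of the integer $C''$ are calculated according to the following rule:
$$\gamma''_j = \left| \frac{1}{M_l} \gamma'_j \right|_{m_j} = \left| \frac{1}{M_l} \left( \gamma_j + |\chi_j \pi_j|_{m_j} \right) \right|_{m_j} \quad (j = l+1, l+2, \ldots, k). \tag{22}$$
At the same time, the two redundant residues $\gamma''_0$ and $\gamma'_0$ are equal. Since $m_1, m_2, \ldots, m_l$ are relatively prime odd moduli, we have $|m_i|_{m_0} = 1$ for $i = 1, 2, \ldots, l$, which implies $|M_l|_{m_0} = 1$. Therefore,
$$\gamma''_0 = \left| \frac{1}{M_l} \left( \gamma_0 + |\chi_0 \pi_0|_{m_0} \right) \right|_{m_0} = \left| \gamma_0 + |\chi_0 \pi_0|_{m_0} \right|_{m_0} = \gamma'_0. \tag{23}$$
At the second stage, the residue code $(\gamma''_{l+1}, \gamma''_{l+2}, \ldots, \gamma''_k)$ is extended to the moduli $m_1, m_2, \ldots, m_l$ using a rank form of the number $C''$ with respect to the moduli $m_{l+1}, m_{l+2}, \ldots, m_k$:
$$C'' = \sum_{j=l+1}^{k} M_{j,n} \left| M_{j,n}^{-1} \gamma''_j \right|_{m_j} - \left( \hat{\rho}_n(C'') + \delta_n(C'') \right) M_n, \tag{24}$$
where $n = k - l$, $M_n = M_k / M_l = \prod_{j=l+1}^{k} m_j$, $M_{j,n} = M_n / m_j$, and $j = l+1, l+2, \ldots, k$. The integers $\hat{\rho}_n(C'')$ and $\delta_n(C'')$ are the inexact rank and rank correction of the number $C''$, respectively.
In this case, similar to (15)–(17), the inexact rank $\hat{\rho}_n(C'')$ is calculated as follows:
$$\hat{\rho}_n(C'') = \left\lfloor \frac{1}{m_k} \sum_{j=l+1}^{k} R_{j,k}(\gamma''_j) \right\rfloor, \tag{25}$$
where
$$R_{j,k}(\gamma''_j) = \left| -\frac{1}{m_j} \left| M_{j,n-1}^{-1} \gamma''_j \right|_{m_j} \right|_{m_k} \quad (j \neq k), \tag{26}$$
$$R_{k,k}(\gamma''_k) = \left| M_{n-1}^{-1} \gamma''_k \right|_{m_k}, \tag{27}$$
while $M_{n-1} = M_n / m_k$ and $M_{j,n-1} = M_{n-1} / m_j$ for $j = l+1, l+2, \ldots, k-1$.
At the same time, the rank correction $\delta_n(C'') \in \{0, 1\}$ is calculated as follows:
$$\delta_n(C'') = \left| \gamma''_0 + \sum_{j=l+1}^{k} \left| \gamma''_{j,n} \right|_{m_0} + \left| \hat{\rho}_n(C'') \right|_{m_0} \right|_{m_0}, \tag{28}$$
where $\gamma''_{j,n} = \left| M_{j,n}^{-1} \gamma''_j \right|_{m_j}$ for $j = l+1, l+2, \ldots, k$.
Therefore, according to (24), the residues $\gamma''_1, \gamma''_2, \ldots, \gamma''_l$ of the number $C''$ with respect to the set $\mathcal{M}_1$ are calculated as follows:
$$\gamma''_i = \left| \sum_{j=l+1}^{k} M_{j,n} \gamma''_{j,n} - \left( \hat{\rho}_n(C'') + \delta_n(C'') \right) M_n \right|_{m_i} \quad (i = 1, 2, \ldots, l). \tag{29}$$
Let us denote this base extension operation as $\mathrm{BEX}(C''; \mathcal{M}_2, \mathcal{M}_1)$.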
Thus, as a result of the two-stage calculations, we have obtained the minimally redundant residue code $(\gamma''_0, \gamma''_1, \ldots, \gamma''_k)$ of the number $C'' \in \mathbb{Z}_{2p} = \{0, 1, \ldots, 2p-1\}$.
Further, following the computational scheme (11), we compute the integer $\hat{C} = C'' - Qp$, which represents the result of MM, with $\hat{C} \in \mathbb{Z}_p = \{0, 1, \ldots, p-1\}$. Its minimally redundant residue code $(\hat{\gamma}_0, \hat{\gamma}_1, \ldots, \hat{\gamma}_k)$ is computed according to the following rule:
$$\hat{\gamma}_i = \begin{cases} \gamma''_i, & \text{if } Q = 0, \\ \left| \gamma''_i - \pi_i \right|_{m_i}, & \text{if } Q = 1, \end{cases} \tag{30}$$
for $i = 0, 1, \ldots, k$.
To determine the integer factor $Q \in \{0, 1\}$ according to (10), a magnitude comparison between the integers $C''$ and $p$ is performed using a mixed-radix system (MRS).
For this purpose, the number $C''$ is converted into its mixed-radix representation, denoted as $C''_{mrs} = (c_l, c_{l-1}, \ldots, c_1)$. We refer to this operation as $\mathrm{RC}(C'')$. Its implementation relies on the parallel procedure proposed in [28].
Additionally, during a preliminary stage, we compute the mixed-radix digits $(p_l, p_{l-1}, \ldots, p_1)$ for the modulus $p$.

5. Basic Algorithm of Montgomery Multiplication in Minimally Redundant RNS

According to the above, the implementation of the Montgomery multiplicative scheme (11) using minimally redundant RNS results in the following operational sequence:
$$\begin{aligned}
& C = AB = (\gamma_0, \gamma_1, \ldots, \gamma_k); \\
& X = |CY|_{M_l} = (\chi_0, \chi_1, \ldots, \chi_l); \\
& (\chi_{l+1}, \chi_{l+2}, \ldots, \chi_k) = \mathrm{BEX}(X; \mathcal{M}_1, \mathcal{M}_2); \\
& C' = C + Xp = (\gamma'_0, \gamma'_{l+1}, \ldots, \gamma'_k); \\
& C'' = C' / M_l = (\gamma''_0, \gamma''_{l+1}, \ldots, \gamma''_k); \\
& (\gamma''_1, \gamma''_2, \ldots, \gamma''_l) = \mathrm{BEX}(C''; \mathcal{M}_2, \mathcal{M}_1); \\
& \hat{C} = C'' - \mathrm{CMP}(C'', p)\, p = (\hat{\gamma}_0, \hat{\gamma}_1, \ldots, \hat{\gamma}_k).
\end{aligned} \tag{31}$$
A necessary and sufficient correctness condition for the two-stage process of calculating $C'' = (\gamma''_0, \gamma''_1, \ldots, \gamma''_k)$, according to (22), (23) and (29), for $C = AB \in \mathbb{Z}_{M_k}$, is that the numbers $X$ and $C''$ should belong to the range $\mathbb{Z}_{M_n} = \{0, 1, \ldots, M_n - 1\}$ with respect to the moduli $m_{l+1}, m_{l+2}, \ldots, m_k$.
Since the inequality $0 \leq C'' = C'/M_l < M_n$ implies the inequality $0 \leq C' < M_k$, the condition $C'' \in \mathbb{Z}_{M_n}$ guarantees the fulfillment of the condition $C' \in \mathbb{Z}_{M_k}$.
Now, let us explore the necessary conditions for $m_1, m_2, \ldots, m_k$ and $p$ to ensure that $X, C'' \in \mathbb{Z}_{M_n}$.
Let $A, B \in \mathbb{Z}_p = \{0, 1, \ldots, p-1\}$. Since, according to (12), $X < M_l$, then from (21) we obtain the inequality
$$C'' < (p^2 + p M_l) / M_l = p (1 + p / M_l). \tag{32}$$
From (32), it follows that $C''$ is guaranteed to lie in $\mathbb{Z}_{2p} = \{0, 1, \ldots, 2p-1\}$ when $p / M_l < 1$.
On the other hand, the numbers $X$ and $C''$ should be elements of the range $\mathbb{Z}_{M_n}$ with respect to the moduli $m_{l+1}, m_{l+2}, \ldots, m_k$. Therefore, taking into account that $X < M_l$ and $C'' < 2p$, this requirement leads to the following inequalities:
$$M_l < M_k / M_l \tag{33}$$
and
$$2p < M_k / M_l. \tag{34}$$
Thus, in the case when $A, B \in \mathbb{Z}_p$ and $C'' \in \mathbb{Z}_{2p}$, the correctness of the proposed MM scheme is ensured by the conditions (33) and (34).
As already noted, the magnitude comparison of the integers $C''$ and $p$ is performed using MRS, resulting in $\mathrm{CMP}(C'', p)$ (see (10)). Therefore, the correctness of the reverse conversion of $C'' < 2p$ is ensured by using an RNS number range that includes the range $\mathbb{Z}_{2p}$. As follows from (33) and (34), the number range $\mathbb{Z}_{M_k}$ significantly exceeds the range $\mathbb{Z}_{2p}$. To determine the integer factor $Q$, according to (10), it is sufficient to use the truncated RNS defined by the moduli set $\mathcal{M}_1$ with the number range $\mathbb{Z}_{M_l}$.
Based on the above, the following statement is true.
Theorem 1.
Let the moduli of the sets $\mathcal{M}_1 = \{m_1, m_2, \ldots, m_l\}$ and $\mathcal{M}_2 = \{m_{l+1}, m_{l+2}, \ldots, m_k\}$, together with the modulus $p$, meet the conditions
$$M_l < M_k / M_l, \quad 2p < M_k / M_l, \tag{35}$$
and let the operands $A$, $B$ of the multiplicative operation $\hat{C} = \left| A B M_l^{-1} \right|_p$ belong to the ring $\mathbb{Z}_p = \{0, 1, \ldots, p-1\}$. Then, the integer $C''$ (see (21)) is an element of the ring $\mathbb{Z}_{2p} = \{0, 1, \ldots, 2p-1\}$. At the same time, the integer $\hat{C} \in \mathbb{Z}_p$ is calculated using the equality
$$\hat{C} = C'' - \mathrm{CMP}(C'', p)\, p. \tag{36}$$
A similar result can also be obtained in the case when $A$ and $B$ belong to the range $\mathbb{Z}_{2p}$. Therefore, by analogy to (32), the following inequality
$$C'' < \frac{4p^2 + p M_l}{M_l} = p \left( 1 + 4p / M_l \right) \tag{37}$$
leads to the condition
$$M_l > 4p. \tag{38}$$
Hence, the following statement is true.
Theorem 2.
Let the moduli of the sets $\mathcal{M}_1 = \{m_1, m_2, \ldots, m_l\}$ and $\mathcal{M}_2 = \{m_{l+1}, m_{l+2}, \ldots, m_k\}$, together with the modulus $p$, meet the conditions
$$4p < M_l, \quad 2p < M_k / M_l, \tag{39}$$
and let the operands $A$, $B$ of the multiplicative operation $\hat{C} = \left| A B M_l^{-1} \right|_p$ belong to the ring $\mathbb{Z}_{2p} = \{0, 1, \ldots, 2p-1\}$. Then, the integer $C''$ (see (21)) also belongs to the ring $\mathbb{Z}_{2p}$. At the same time, the integer $\hat{C}$ is calculated according to (36).
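The range claims of Theorems 1 and 2 can be checked by brute force for small parameters, as in the Python sketch below. The moduli sets and $p = 11$ are illustrative assumptions, and the condition $p < M_l$ used for the first case is taken from the requirement $S > p$ of Section 3.

```python
from math import prod

def max_c2(p, M_l, operand_bound):
    """Largest C'' = (A*B + X*p) / M_l over all operands A, B below operand_bound."""
    Y = (-pow(p, -1, M_l)) % M_l
    worst = 0
    for A in range(operand_bound):
        for B in range(A, operand_bound):
            C = A * B
            X = (C * Y) % M_l
            worst = max(worst, (C + X * p) // M_l)
    return worst

p = 11

# Setting of Theorem 1: operands in Z_p, with p < M_l < M_n and 2p < M_n.
base1, base2 = (3, 5), (7, 13)
M_l, M_n = prod(base1), prod(base2)
assert p < M_l < M_n and 2 * p < M_n
worst_t1 = max_c2(p, M_l, p)

# Setting of Theorem 2: operands in Z_2p, with 4p < M_l and 2p < M_n.
base1, base2 = (7, 9), (5, 13)
M_l, M_n = prod(base1), prod(base2)
assert 4 * p < M_l and 2 * p < M_n
worst_t2 = max_c2(p, M_l, 2 * p)
```

In both settings the largest value of $C''$ stays below $2p$, so a single comparison with $p$ is enough to obtain the final result.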
Based on the optimized Montgomery multiplicative scheme (31) in minimally redundant RNS, an algorithm for multiplication over a large modulus is synthesized. This algorithm has a tabular configuration and possesses all the fundamental advantages of minimally redundant residue arithmetic, which are most evident when operating on large number ranges. The developed algorithm is suitable for both software realization and implementation using parallel modular computational structures with a multiprocessor architecture.
Considering the aforementioned discussion, the implementation of MM in a minimally redundant RNS can be formally presented as Algorithm 1.
This algorithm’s key distinguishing feature is its inherent suitability for lookup table techniques. In this case, the required set of lookup tables is generated during the preliminary stage, which minimizes the computational cost of the real-time MM implementation.
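Algorithm 1: Montgomery multiplication in minimally redundant RNS
Input: $A = (\alpha_0, \alpha_1, \ldots, \alpha_k)$, $B = (\beta_0, \beta_1, \ldots, \beta_k)$,
    $p = (\pi_0, \pi_1, \ldots, \pi_k) = (p_l, p_{l-1}, \ldots, p_1)$
Output: $\hat{C} = \left| A B M_l^{-1} \right|_p$
  1  $C = AB = (\gamma_0, \gamma_1, \ldots, \gamma_k)$, $\gamma_i = |\alpha_i \beta_i|_{m_i}$ $(i = 0, 1, \ldots, k)$
  2  $X = |CY|_{M_l} = (\chi_0, \chi_1, \ldots, \chi_l)$, $\chi_i = \left| \gamma_i \left| -\pi_i^{-1} \right|_{m_i} \right|_{m_i}$ $(i = 0, 1, \ldots, l)$
  3  $\rho_l(X) = \hat{\rho}_l(X) + \delta_l(X)$
  4  $(\chi_{l+1}, \chi_{l+2}, \ldots, \chi_k) = \mathrm{BEX}(X; \mathcal{M}_1, \mathcal{M}_2)$
  5  $C' = C + Xp = (\gamma'_0, \gamma'_{l+1}, \ldots, \gamma'_k)$, $\gamma'_j = \left| \gamma_j + |\chi_j \pi_j|_{m_j} \right|_{m_j}$ $(j = 0, l+1, l+2, \ldots, k)$
  6  $C'' = C' / M_l = (\gamma''_0, \gamma''_{l+1}, \ldots, \gamma''_k)$, $\gamma''_j = \left| M_l^{-1} \gamma'_j \right|_{m_j}$ $(j = l+1, l+2, \ldots, k)$
  7  $\rho_n(C'') = \hat{\rho}_n(C'') + \delta_n(C'')$
  8  $(\gamma''_1, \gamma''_2, \ldots, \gamma''_l) = \mathrm{BEX}(C''; \mathcal{M}_2, \mathcal{M}_1)$
  9  $(c_l, c_{l-1}, \ldots, c_1) = \mathrm{RC}(C'')$
 10  $Q = \mathrm{CMP}(C'', p)$
 11  $\hat{C} = C'' - Qp = (\hat{\gamma}_0, \hat{\gamma}_1, \ldots, \hat{\gamma}_k)$, $\hat{\gamma}_j = \left| \gamma''_j - Q \pi_j \right|_{m_j}$ $(j = 0, 1, \ldots, k)$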

6. Discussion

Let us now discuss the theoretical and practical aspects of the proposed approach.
An analysis of well-established approaches to implementing Montgomery multiplication in RNS reveals that two primary methods for computing the rank—an essential step in performing the critical base extension operations—are commonly employed [8,9,12,14,15,16,17].
The first method enables the determination of the integer value of an RNS number by computing an integer correction factor, which essentially corresponds to the rank of the number. In this case, the implementation of the CRT yields an exact integer reconstruction. This approach involves the inclusion of an additional redundant modulus $m_{k+1}$, which depends on the number $k$ of primary RNS moduli, into the original moduli set $\{m_1, m_2, \ldots, m_k\}$.
As a result, this method requires an additional modular channel with a width of $\log_2 k + 1$ bits, along with redundant lookup tables.
The main idea of an alternative approach is to represent the rank as the integer part of a sum of at most $k$ proper fractions of the form $\chi_{i,k} / m_i$, where $i = 1, 2, \ldots, k$. The rank value is then estimated recursively by approximating these fractions. To eliminate division by the modulus $m_i$, each denominator $m_i$ is replaced by a power of 2, and the corresponding numerator $\chi_{i,k}$ is approximated using its most significant bits. Since division by powers of 2 is equivalent to simple bit shifts, the rank computation is reduced to a sequence of additions.
The main drawbacks of this method are as follows. First, it relies on recursive computations, as the algorithm evaluates the rank value in a sequential, bit-by-bit manner. Moreover, the number of required iterations depends on the bit length used for the approximation. As a result, this method leads to significantly prolonged computation times when dealing with large numbers. Additionally, due to the approximate nature of the algorithm, it does not always produce the correct rank value.

6.1. Computational Costs of Proposed MM Algorithm

In RNS arithmetic, the implementation of MM (see (31)) reduces to performing modular operations on small bit-length residues concerning the moduli of the sets M 1 and M 2 using the lookup tables technique.
To assess computational efficiency, we compare the computational costs presented in Table 1, where N mo and N mo * represent the number of required modular operations at the corresponding steps of Algorithm 1 both in minimally redundant RNS and conventional non-redundant RNS, respectively. Additionally, C m r s and p m r s denote the mixed-radix representations of the integers C and p, respectively.
Computing the modular code of the number C = A B requires k + 1 modular operations in a minimally redundant RNS, and k operations in a conventional RNS.
Computing the modular code of the number X = | C Y | M l with respect to the moduli m 1 , m 2 , , m l in the set M 1 requires l + 1 modular operations in a minimally redundant RNS, and l operations in a conventional RNS.
The costs of computing the rank $\rho_l(X)$ of the number $X$ with respect to the moduli in the set $M_1$ are based on the results presented in [27] and amount to $l$ modular operations in a minimally redundant RNS and $(l^2 + 5l - 10)/2$ modular operations in a conventional RNS (see Equations (14)–(18)).
The base extension operation $\mathrm{BEX}(X; M_1, M_2)$, which extends the modular code of the number $X$ from the moduli set $M_1$ to $M_2$, requires $l(k - l)$ modular operations in both minimally redundant and conventional RNS implementations (see Equation (19)).
Computing the modular code of the number $C = (C + Xp)/M_l$ with respect to the moduli set $M_2$, while taking into account the redundant modulus $m_0$, requires $k - l + 1$ modular operations in a minimally redundant RNS and $k - l$ operations in a conventional RNS (see Equations (19)–(23)).
Similarly, computing the rank $\rho_n(C)$ of the number $C$ with respect to the moduli set $M_2$ requires $k - l$ modular operations in a minimally redundant RNS, and $((k - l)^2 + 5(k - l) - 10)/2$ operations in a conventional RNS (see Equations (25)–(28) and [27]).
The base extension operation $\mathrm{BEX}(C; M_2, M_1)$, which maps the modular code of the number $C$ from $M_2$ back to $M_1$, also requires $(k - l)l$ modular operations in both minimally redundant and conventional RNS implementations (see Equation (29)).
Calculating the mixed-radix representation $C_{\mathrm{mrs}}$ of the number $C$ requires $(l^2 + 3l - 8)/2$ modular operations, as shown in [28].
Comparing the numbers $C_{\mathrm{mrs}}$ and $p_{\mathrm{mrs}}$ in the MRS requires $l$ modular operations.
In the final step, computing the modular code of the number $\hat{C} = C - Qp$, taking into account the redundant modulus $m_0$, requires $k + 1$ modular operations in a minimally redundant RNS, and $k$ operations in a conventional RNS (see Equation (29)).
Without loss of generality and for simplicity, let us assume that the moduli of the sets $M_1 = \{m_1, m_2, \ldots, m_l\}$ and $M_2 = \{m_{l+1}, m_{l+2}, \ldots, m_k\}$ are selected such that $k = 2l$.
Under these assumptions, when employing a minimally redundant RNS, the total number of required modular operations, as derived from Table 1, is given by
$$N_1(l) = (5l^2 + 21l)/2. \tag{40}$$
In contrast, for a conventional non-redundant RNS, the required number of modular operations is
$$N_1^{*}(l) = (7l^2 + 27l - 28)/2. \tag{41}$$
Hence, when $A$, $B$, and $\hat{C}$ belong to $\mathbb{Z}_p$ (see Theorem 1), the reduction factor in computational complexity for implementing MM in a minimally redundant RNS, compared to a conventional non-redundant RNS, is given by
$$C_1(l) = \frac{N_1^{*}(l)}{N_1(l)} = 1 + \frac{2l^2 + 6l - 28}{5l^2 + 21l}. \tag{42}$$
For instance, the computed values of $C_1(l)$ for specific values of $l$ are as follows: $C_1(10) \approx 1.33$, $C_1(15) \approx 1.36$, and $C_1(20) \approx 1.37$. Furthermore, as $l$ increases, the factor $C_1(l)$ continues to grow, asymptotically approaching a limiting value of 1.40.
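The totals above can be reproduced by summing the per-step costs of Table 1 with $k = 2l$. The Python sketch below does this; the function names are illustrative, and the per-step expressions are transcribed from Table 1.

```python
def step_costs(k, l):
    """Per-step costs from Table 1 as (minimally redundant, conventional) pairs."""
    n = k - l

    def rank_conv(t):
        # Rank cost in a conventional RNS; t*t + 5*t is always even.
        return (t * t + 5 * t - 10) // 2

    mrs = (l * l + 3 * l - 8) // 2   # mixed-radix conversion, same in both systems
    return [
        (k + 1, k),                  # C = A * B
        (l + 1, l),                  # X = |C * Y|_{M_l}
        (l, rank_conv(l)),           # rank of X over M_1
        (l * n, l * n),              # BEX(X; M_1, M_2)
        (n + 1, n),                  # C = (C + X * p) / M_l
        (n, rank_conv(n)),           # rank of C over M_2
        (n * l, n * l),              # BEX(C; M_2, M_1)
        (mrs, mrs),                  # C_mrs = RC(C)
        (l, l),                      # Q = CMP(C_mrs, p_mrs)
        (k + 1, k),                  # final correction C - Q * p
    ]

def totals(l):
    """Total operation counts (N_1, N_1*) for the balanced case k = 2l."""
    costs = step_costs(2 * l, l)
    return sum(c[0] for c in costs), sum(c[1] for c in costs)

for l in (10, 15, 20):
    n1, n1_star = totals(l)
    print(l, n1, n1_star, round(n1_star / n1, 2))
# 10 355 471 1.33
# 15 720 976 1.36
# 20 1210 1656 1.37
```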

6.2. Basic Exponentiation Algorithm by Large Modulus

It is well known that MM is primarily used for computing powers of natural numbers with respect to a large modulus p:
$$F(X, E, p) = \left| X^{E} \right|_p, \tag{43}$$
where the numbers $X$ and $E$ are represented by their residue and binary codes, respectively: $X = (\chi_1, \chi_2, \ldots, \chi_k)$ and $E = (e_{b-1}, e_{b-2}, \ldots, e_0)_2$, where $e_{b-1} = 1$ and $b$ denotes the bit-length of $E$.
The well-known method for computing (43) relies on the multiplicative decomposition of the exponentiation function:
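$$F(X, E, p) = \left| X^{\sum_{j=0}^{b-1} e_j 2^j} \right|_p = \left| X^{e_0} \left( X^{e_1} \left( \cdots \left( X^{e_{b-2}} X^{2 e_{b-1}} \right)^2 \cdots \right)^2 \right)^2 \right|_p. \tag{44}$$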
All multiplicative operations in (44) are performed within the framework of scheme (11), which allows the reuse of previously computed results as subsequent operands. In this case, condition (39) ensures that the number $C$ belongs to the ring $\mathbb{Z}_{2p}$ whenever $A, B \in \mathbb{Z}_{2p}$. Consequently, the correction operation (36) needs to be applied only at the final step of the computation in accordance with (44).
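To make the evaluation order in (44) concrete, here is a minimal Python sketch of left-to-right binary exponentiation built on a Montgomery product. It is not the RNS algorithm of this paper: `mont_mul` is a hypothetical stand-in that uses ordinary big-integer Montgomery reduction with $R = 2^r$, and the bound $R > 4p$ is an assumption chosen so that operands below $2p$ yield a result below $2p$, in the spirit of the $\mathbb{Z}_{2p}$ property described above. The correction is applied once, at the very end.

```python
def mont_setup(p):
    """Pick R = 2**r > 4p and Y = -p^(-1) mod R (p must be odd)."""
    r = p.bit_length() + 2
    R = 1 << r
    Y = (-pow(p, -1, R)) % R
    return r, R, Y

def mont_mul(a, b, p, r, Y):
    """Montgomery product a*b*R^(-1) mod p, left unreduced in [0, 2p)."""
    mask = (1 << r) - 1
    c = a * b
    x = (c * Y) & mask           # X = |C * Y|_R
    return (c + x * p) >> r      # exact division by R

def mont_pow(x, e, p):
    """Compute x**e mod p for odd p and e >= 1 by square-and-multiply."""
    if e < 1 or p % 2 == 0:
        raise ValueError("expects e >= 1 and odd p")
    r, R, Y = mont_setup(p)
    r2 = (R * R) % p
    xm = mont_mul(x % p, r2, p, r, Y)     # x in Montgomery form
    acc = xm                              # leading bit e_{b-1} = 1
    for bit in bin(e)[3:]:                # bits e_{b-2}, ..., e_0
        acc = mont_mul(acc, acc, p, r, Y)
        if bit == "1":
            acc = mont_mul(acc, xm, p, r, Y)
    out = mont_mul(acc, 1, p, r, Y)       # leave Montgomery form
    return out - p if out >= p else out   # single final correction

print(mont_pow(3, 65537, 1000003) == pow(3, 65537, 1000003))  # True
```

Intermediate values are never reduced below $p$ inside the loop; only the last line compares against $p$, mirroring the single correction step described above.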
In a minimally redundant RNS, the implementation of MM when $A$, $B$, and $C$ belong to $\mathbb{Z}_{2p}$ requires
$$N_2(l) = 2l^2 + 6l + 3 \tag{45}$$
modular operations (see Table 1).
In contrast, in a conventional non-redundant RNS, the number of required modular operations is given by
$$N_2^{*}(l) = 3l^2 + 9l - 10. \tag{46}$$
Thus, in the considered case, based on (45) and (46), the computational complexity reduction factor is expressed as
$$C_2(l) = \frac{N_2^{*}(l)}{N_2(l)} = 1 + \frac{l^2 + 3l - 13}{2l^2 + 6l + 3}. \tag{47}$$
For example, the values of $C_2(l)$ for specific values of $l$ are as follows: $C_2(10) \approx 1.46$, $C_2(15) \approx 1.48$, and $C_2(20) \approx 1.49$. Furthermore, as $l$ increases, the reduction factor $C_2(l)$ (see (47)) continues to grow, asymptotically approaching the value 1.50.
It is important to note that the same reduction factor of computational complexity is also achieved when implementing the exponentiation procedure (44).
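The totals (45) and (46) appear to be the sums of the first seven rows of Table 1 with $k = 2l$, that is, without the mixed-radix conversion, comparison, and final correction steps. This reading is an inference from the table rather than a statement in the text; under it, the sums and the limiting value work out as follows:

```latex
\begin{align*}
N_2(l)     &= (2l+1) + (l+1) + l + l^2 + (l+1) + l + l^2 = 2l^2 + 6l + 3, \\
N_2^{*}(l) &= 2l + l + \tfrac{l^2 + 5l - 10}{2} + l^2 + l + \tfrac{l^2 + 5l - 10}{2} + l^2 = 3l^2 + 9l - 10, \\
\lim_{l \to \infty} C_2(l) &= \lim_{l \to \infty} \frac{3l^2 + 9l - 10}{2l^2 + 6l + 3} = \frac{3}{2}.
\end{align*}
```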

7. Conclusions

This paper presents an efficient implementation of MM in a minimally redundant RNS. The core feature of the proposed approach is the use of an advanced rank calculation method introduced by the first author in [25,27].
The key contributions of this work are as follows:
  • The use of minimally redundant RNS significantly reduces the computational cost of MM implementation compared to conventional non-redundant RNS. Moreover, the complexity of rank computation for the moduli sets $M_1$ and $M_2$ is reduced by a factor of $l/2$ and $(k - l)/2$, respectively. This optimization improves the efficiency of base extension operations $\mathrm{BEX}(X; M_1, M_2)$ and $\mathrm{BEX}(C; M_2, M_1)$, as well as the overall MM procedure;
  • The conditions used to choose the moduli sets $M_1$ and $M_2$ with respect to the operating modulus $p$ are established (see Theorems 1 and 2). These conditions ensure the correctness of multiple accesses to the MM procedure when both operands and results reside in the ring $\mathbb{Z}_{2p}$. This property is particularly beneficial for implementing modular exponentiation procedures based on MM;
  • A novel MM algorithm for minimally redundant RNS is introduced. This algorithm is well suited for efficient implementation using lookup table techniques. Moreover, the most computationally demanding operations required for generating lookup tables are organized as a preprocessing step, executed independently before the main computational procedure;
  • The computational costs of the proposed MM scheme in a minimally redundant RNS are analyzed and compared with those of a non-redundant RNS. The results indicate that the minimally redundant RNS implementation requires up to 1.5 times fewer modular operations when operands and results belong to the ring $\mathbb{Z}_{2p}$, with the factor approaching 1.5 as $l$ grows. This advantage also extends to the modular exponentiation procedure based on MM.
In summary, the intrinsic parallelism and tabular nature of minimally redundant RNS arithmetic, combined with the simplicity of rank computation and base extension operations, significantly enhance the performance of Montgomery multiplication-based algorithms.

Author Contributions

The authors have equally contributed to this work and have read and improved the final version of the manuscript. Conceptualization, M.S.; investigation, B.W.-S.; methodology, M.S.; writing—original draft preparation, M.S.; writing—review and editing, B.W.-S.; funding acquisition, B.W.-S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the authors.

Acknowledgments

We sincerely thank the anonymous reviewers for their valuable suggestions and opinions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Szabo, N.S.; Tanaka, R.I. Residue Arithmetic and Its Application to Computer Technology; McGraw-Hill: New York, NY, USA, 1967.
  2. Akushskii, I.Y.; Juditskii, D.I. Machine Arithmetic in Residue Classes; Soviet Radio: Moscow, Russia, 1968. (In Russian)
  3. Amerbayev, V.M. Theoretical Foundations of Machine Arithmetic; Nauka: Alma-Ata, Kazakhstan, 1976. (In Russian)
  4. Soderstrand, M.A.; Jenkins, W.K.; Jullien, G.A.; Taylor, F.J. (Eds.) Residue Number System Arithmetic: Modern Applications in Digital Signal Processing; IEEE Press: New York, NY, USA, 1986.
  5. Kolyada, A.A.; Pak, I.T. Modular Structures of Pipeline Digital Information Processing; Belarusian State University: Minsk, Belarus, 1992. (In Russian)
  6. Chernyavsky, A.F.; Danilevich, V.V.; Kolyada, A.A.; Selyaninov, M.Y. High-Speed Methods and Systems of Digital Information Processing; Belarusian State University: Minsk, Belarus, 1996. (In Russian)
  7. Omondi, A.R.; Premkumar, B. Residue Number Systems: Theory and Implementation; Imperial College Press: London, UK, 2007.
  8. Ananda Mohan, P.V. Residue Number Systems. Theory and Applications; Springer: Cham, Switzerland, 2016.
  9. Molahosseini, A.S.; de Sousa, L.S.; Chang, C.H. (Eds.) Embedded Systems Design with Special Arithmetic and Number Systems; Springer: Cham, Switzerland, 2017.
  10. Posch, K.S.; Posch, R. Modulo reduction in residue number systems. IEEE Trans. Parallel Distrib. Syst. 1995, 6, 449–454.
  11. Schwemmlein, J.; Posch, K.S.; Posch, R. RNS-modulo reduction upon a restricted base value set and its applicability to RSA cryptography. Comput. Secur. 1998, 17, 637–650.
  12. Bajard, J.-C.; Didier, L.-S.; Kornerup, P. An RNS Montgomery modular multiplication algorithm. IEEE Trans. Comput. 1998, 47, 766–776.
  13. Hiasat, A.A. New efficient structure for a modular multiplier for RNS. IEEE Trans. Comput. 2000, 49, 170–174.
  14. Kawamura, S.; Koike, M.; Sano, F.; Shimbo, A. Cox-rower architecture for fast parallel Montgomery multiplication. In Proceedings of the EUROCRYPT’00: 19th International Conference on Theory and Application of Cryptographic Techniques, Bruges, Belgium, 14–18 May 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 523–538.
  15. Nozaki, H.; Motoyama, M.; Shimbo, A.; Kawamura, S. Implementation of RSA Algorithm Based on RNS Montgomery Multiplication. In Proceedings of the 3rd International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2001), Paris, France, 14–16 May 2001; LNCS. Springer: Berlin/Heidelberg, Germany, 2001; Volume 2162, pp. 364–376.
  16. Bajard, J.-C.; Didier, L.-S.; Kornerup, P. Modular multiplication and base extensions in residue number systems. In Proceedings of the 15th IEEE Symposium on Computer Arithmetic (ARITH 2001), Vail, CO, USA, 11–13 June 2001; pp. 59–65.
  17. Bajard, J.-C.; Imbert, L. A full RNS implementation of RSA. IEEE Trans. Comput. 2004, 53, 769–774.
  18. Lim, Z.; Phillips, B.J. An RNS-Enhanced Microprocessor Implementation of Public Key Cryptography. In Proceedings of the ACSSC 2007: Conference Record of the Forty-First Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 4–7 November 2007; pp. 143–1434.
  19. Shieh, M.-D.; Chen, J.-H.; Wu, H.-H.; Lin, W.C. A New Modular Exponentiation Architecture for Efficient Design of RSA Cryptosystem. IEEE Trans. VLSI Syst. 2008, 16, 1151–1161.
  20. Gandino, F.; Lamberti, F.; Paravati, G.; Bajard, J.-C.; Montuschi, P. An algorithmic and architectural study on Montgomery exponentiation in RNS. IEEE Trans. Comput. 2012, 61, 1071–1083.
  21. Schinianakis, D.; Stouraitis, T. Multifunction residue architectures for cryptography. IEEE Trans. Circuits Syst. 2014, 61, 1156–1169.
  22. Bajard, J.-C.; Eynard, J.; Merkiche, N. Montgomery reduction within the context of residue number system arithmetic. J. Cryptogr. Eng. 2018, 8, 189–200.
  23. Omondi, A.R. Cryptography Arithmetic. Algorithms and Hardware Architectures; Springer: Cham, Switzerland, 2020.
  24. Montgomery, P.L. Modular multiplication without trial division. Math. Comput. 1985, 44, 519–521.
  25. Selianinau, M. An efficient implementation of the Chinese Remainder Theorem in minimally redundant Residue Number System. Comput. Sci. 2020, 21, 237–252.
  26. Selianinau, M. An efficient implementation of the CRT algorithm based on an interval-index characteristic and minimum redundancy residue code. Int. J. Comput. Meth. 2020, 17, 2050004.
  27. Selianinau, M. Computationally efficient approach to implementation of the Chinese Remainder Theorem algorithm in minimally redundant Residue Number System. Theory Comput. Syst. 2021, 65, 1117–1140.
  28. Selianinau, M. An efficient parallel reverse conversion of residue code to mixed-radix representation based on the Chinese Remainder Theorem. Entropy 2022, 24, 242.
  29. Shoup, V. A Computational Introduction to Number Theory and Algebra, 2nd ed.; University Press: Cambridge, UK, 2005.
  30. Hardy, G.H.; Wright, E.M. An Introduction to the Theory of Numbers, 6th ed.; Oxford University Press: London, UK, 2008.
  31. Ding, C.; Pei, D.; Salomaa, A. Chinese Remainder Theorem: Applications in Computing, Coding, Cryptography; World Scientific Publishing: Singapore, 1996.
Table 1. The number of required modular operations.

| Step | $N_{\mathrm{mo}}$ | $N_{\mathrm{mo}}^{*}$ |
|---|---|---|
| $C = AB$ | $k + 1$ | $k$ |
| $X = \lvert CY \rvert_{M_l}$ | $l + 1$ | $l$ |
| $\rho_l(X) = \hat{\rho}_l(X) + \delta_l(X)$ | $l$ | $(l^2 + 5l - 10)/2$ |
| $\mathrm{BEX}(X; M_1, M_2)$ | $l(k - l)$ | $l(k - l)$ |
| $C = (C + Xp)/M_l$ | $k - l + 1$ | $k - l$ |
| $\rho_n(C) = \hat{\rho}_n(C) + \delta_n(C)$ | $k - l$ | $((k - l)^2 + 5(k - l) - 10)/2$ |
| $\mathrm{BEX}(C; M_2, M_1)$ | $(k - l)l$ | $(k - l)l$ |
| $C_{\mathrm{mrs}} = \mathrm{RC}(C)$ | $(l^2 + 3l - 8)/2$ | $(l^2 + 3l - 8)/2$ |
| $Q = \mathrm{CMP}(C_{\mathrm{mrs}}, p_{\mathrm{mrs}})$ | $l$ | $l$ |
| $\hat{C} = C - Qp$ | $k + 1$ | $k$ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Selianinau, M.; Woźna-Szcześniak, B. An Efficient Implementation of Montgomery Modular Multiplication Using a Minimally Redundant Residue Number System. Appl. Sci. 2025, 15, 5332. https://doi.org/10.3390/app15105332

