1. Introduction
Since the mid-1950s, residue number system (RNS) arithmetic has attracted significant attention from researchers and practitioners across various domains, including number-theoretic methods, computer technology, digital signal processing, cryptography, and communications [1,2,3,4,5,6,7,8,9].
RNS arithmetic is particularly valuable in parallel processing systems, where it enables fast and accurate computations. The inherent parallelism of modular computational structures provides several key advantages over positional numeral systems, especially for operations involving large numbers. These advantages include the following:
- the execution time of parallel modular operations does not depend on the number of moduli or, consequently, on the length of the residue code; 
- high adaptability of modular arithmetic algorithms to tabular calculations and pipeline processing; 
- flexibility in using lookup table techniques for reconfiguring modular computational structures, among others. 
A well-known application of modular arithmetic is in cryptographic systems, where multiplicative operations over large moduli form the foundation for various information security mechanisms [10,11,12,13,14,15,16,17,18,19,20,21,22,23]. These operations are extensively used in electronic signature schemes, public key cryptosystems based on RSA and Rabin’s method, and other cryptographic tasks. Therefore, advancing computational technologies that enhance modular multiplication and exponentiation efficiency over large-number ranges remains a crucial research direction.
One of the most promising approaches for achieving high-performance modular arithmetic is using RNS-based modular computational structures. As the degree of modularity in computational processes increases, the efficiency of these structures improves significantly. This effect is particularly evident when handling large-number arithmetic operations.
A notable example is the Montgomery multiplication (MM) algorithm implemented using RNS [10,11,12,13,22]. Montgomery multiplication replaces general division by the operating modulus with an exact (remainder-free) division by a specially chosen auxiliary modulus, which is well suited for RNS configurations [24]. This property makes MM in RNS highly efficient.
However, a principal computational bottleneck in modular arithmetic is the residue code extension required in MM implementations. Optimizing these operations is a key challenge in developing high-performance cryptographic hardware and software for large-number computations. One promising solution is to use minimally redundant RNS, which allows for more efficient execution of fundamental non-modular operations such as base extension and reverse conversion based on the Chinese Remainder Theorem (CRT) [5,6,25,26,27,28].
This paper aims to exploit the fundamental advantages of minimally redundant RNS arithmetic to optimize multiplication algorithms based on Montgomery’s scheme. By refining the base extension and conversion processes, we enhance the overall efficiency of MM in large-number cryptographic applications.
This paper provides a rigorous theoretical framework for optimizing modular multiplication in minimally redundant Residue Number Systems (RNSs), offering new insights into the computational efficiency of cryptographic operations. Using minimally redundant arithmetic, the study improves the performance of modular computations, which are fundamental in public-key cryptography, secure computing, and high-performance arithmetic.
Rather than focusing on software or hardware implementation, this work contributes by presenting formal algorithmic descriptions, mathematical proofs, and correctness guarantees. These theoretical foundations serve as a basis for future advances in secure computing, number-theoretic algorithms, and parallel processing architectures.
For clarity and reproducibility, the paper provides detailed algorithmic methodologies and mathematical derivations, making it accessible to researchers developing efficient cryptographic protocols, modular arithmetic techniques, and computational number theory applications. The structured formal approach to RNS-based Montgomery multiplication supports interdisciplinary applications in computer science, digital signal processing, and cryptographic engineering.
The remainder of this paper is structured as follows. Sections 2 and 3 introduce the theoretical foundations of the research. Section 4 presents the mathematical background of the proposed method. Section 5 describes the MM algorithm and analyzes the impact of RNS moduli selection. Section 6 provides a discussion of the results. Section 7 concludes the paper.
  2. Basic Principles of RNS Arithmetic
Residue Number System (RNS) arithmetic is based on abstract algebra and number theory [29,30].
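In the set of integers $\mathbb{Z}$, an RNS is defined by a set of pairwise coprime natural moduli $\{m_1, m_2, \ldots, m_k\}$, where $\gcd(m_i, m_j) = 1$ for $i \neq j$ and $i, j = 1, 2, \ldots, k$. In this system, an integer $X$ is represented by the k-tuple $(\chi_1, \chi_2, \ldots, \chi_k)$, where $\chi_i$ is the residue of X modulo $m_i$, i.e., $\chi_i$ belongs to the set $\mathbb{Z}_{m_i} = \{0, 1, \ldots, m_i - 1\}$, ensuring $\chi_i = |X|_{m_i}$ for all $i = 1, 2, \ldots, k$.
An RNS allows the unique representation of up to $M_k = \prod_{i=1}^{k} m_i$ integers through their residue codes. The number range is typically defined by $\mathbb{Z}_{M_k} = \{0, 1, \ldots, M_k - 1\}$.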
  2.1. Arithmetic Operations in RNS
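Due to the carry-free nature of RNS arithmetic, addition, subtraction, and multiplication can be executed in parallel, independently for each modulus, following the rule below:
$$X \circ Y = \left( |\chi_1 \circ \gamma_1|_{m_1},\ |\chi_2 \circ \gamma_2|_{m_2},\ \ldots,\ |\chi_k \circ \gamma_k|_{m_k} \right),$$
where $\circ \in \{+, -, \times\}$ denotes modular arithmetic operations, with $\chi_i = |X|_{m_i}$ and $\gamma_i = |Y|_{m_i}$ for $i = 1, 2, \ldots, k$.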
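The channel-wise rule can be illustrated with a minimal Python sketch. The moduli (7, 11, 13) and the helper names `to_rns` and `rns_apply` are assumptions chosen for illustration and do not come from the paper.

```python
import operator

MODULI = (7, 11, 13)  # assumed example moduli (pairwise coprime), range M = 1001

def to_rns(x, moduli=MODULI):
    """Residue code of x: one small residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_apply(op, a, b, moduli=MODULI):
    """Apply op channel by channel; no carries cross between channels."""
    return tuple(op(ai, bi) % m for ai, bi, m in zip(a, b, moduli))

a, b = 123, 7
ra, rb = to_rns(a), to_rns(b)
# Each comparison checks the RNS result against the residue code of the
# ordinary integer result.
print(rns_apply(operator.add, ra, rb) == to_rns(a + b))
print(rns_apply(operator.sub, ra, rb) == to_rns(a - b))
print(rns_apply(operator.mul, ra, rb) == to_rns(a * b))
```

Each channel works only with residues smaller than its modulus, which is what makes the operations independent and parallelizable.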
  2.2. Determining the Integer Value from Residue Representation
In RNS arithmetic, reconstructing the integer X from its residue code  is required for performing non-modular operations. This reconstruction relies on the so-called positional characteristics of the residue code.
There are two primary approaches to constructing RNS arithmetic based on these characteristics: (1) rank-based computation and (2) mixed-radix representation [7,8,9].
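According to the Chinese Remainder Theorem [29,31], for a given RNS defined by the set $\{m_1, m_2, \ldots, m_k\}$ of pairwise relatively prime moduli, the integer X and its residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ are related by the following equation:
$$X = \sum_{i=1}^{k} M_{i,k}\,\chi_{i,k} - \rho_k(X)\,M_k, \qquad (1)$$
where $M_k = \prod_{i=1}^{k} m_i$, $M_{i,k} = M_k / m_i$, and $\chi_{i,k} = \left| M_{i,k}^{-1}\,\chi_i \right|_{m_i}$ is a normalized residue modulo $m_i$ for $i = 1, 2, \ldots, k$, with $|Y^{-1}|_m$ denoting the multiplicative inverse of Y modulo m.
The integer $\rho_k(X)$ and Equation (1) define the rank and rank form of X, respectively [2,3]. Equation (1) establishes an isomorphic mapping between $X \in \mathbb{Z}_{M_k}$ and its residue code $(\chi_1, \chi_2, \ldots, \chi_k)$. In this context, computing the rank $\rho_k(X)$ enables the realization of this isomorphic mapping.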
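As a hedged illustration of the rank form, the sketch below reconstructs an integer from its residues and returns the rank alongside it. The function name `crt_rank_form` and the example moduli are assumptions. The rank is obtained here by plain integer division, which is exactly the costly step that the rank-computation methods discussed in this paper aim to avoid.

```python
from math import prod

def crt_rank_form(residues, moduli):
    """Return (X, rank) such that X = sum(M_i * xi_i) - rank * M.

    M_i = M / m_i and xi_i = |x_i * M_i^{-1}|_{m_i} is the normalized residue.
    Requires Python 3.8+ for pow(a, -1, m) (modular inverse).
    """
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        xi = (x_i * pow(M_i, -1, m_i)) % m_i   # normalized residue
        total += M_i * xi
    rank = total // M            # number of times M must be subtracted
    return total - rank * M, rank

moduli = (7, 11, 13)             # assumed example moduli
X = 500
value, rank = crt_rank_form([X % m for m in moduli], moduli)
print(value, rank)
```

Because every term `M_i * xi` is smaller than `M`, the rank never exceeds `k - 1` for `k` moduli.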
  2.3. Mixed-Radix Representation
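In RNS arithmetic, the rank of a number is a positional characteristic of primary importance since it enables one to perform all the non-modular operations. In a Mixed Radix System (MRS) defined by a set $\{m_1, m_2, \ldots, m_k\}$ of pairwise relatively prime moduli, an integer $X \in \mathbb{Z}_{M_k}$ is represented by the k-tuple $(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_k)$ of mixed-radix digits, resulting in
$$X = \sum_{i=1}^{k} \tilde{x}_i\,M_{i-1},$$
where $\tilde{x}_i \in \mathbb{Z}_{m_i}$, $M_0 = 1$, $M_{i-1} = \prod_{j=1}^{i-1} m_j$ [8,9].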
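The following sketch shows the textbook sequential conversion from a residue code to mixed-radix digits. It is not the parallel procedure the paper relies on later. The function names and the example moduli are assumptions made for illustration.

```python
def rns_to_mixed_radix(residues, moduli):
    """Return mixed-radix digits d_1..d_k with X = d_1 + d_2*m_1 + d_3*m_1*m_2 + ..."""
    r = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        d = r[i] % m_i
        digits.append(d)
        # Subtract the digit and divide by m_i in every remaining channel.
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            r[j] = ((r[j] - d) * pow(m_i, -1, m_j)) % m_j
    return digits

def mixed_radix_value(digits, moduli):
    """Evaluate the positional (mixed-radix) form back to an integer."""
    value, weight = 0, 1
    for d, m in zip(digits, moduli):
        value += d * weight
        weight *= m
    return value

moduli = (7, 11, 13)                      # assumed example moduli
digits = rns_to_mixed_radix([500 % m for m in moduli], moduli)
print(digits, mixed_radix_value(digits, moduli))
```

Since the digits are positional, two numbers can be compared by comparing their digit tuples starting from the most significant digit, which is what makes the MRS convenient for magnitude comparison.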
An MRS is advantageous over an RNS for non-modular operations such as magnitude comparison, sign determination, and overflow detection.
  2.4. Computational Complexity and Optimization
In the conventional non-redundant RNS, calculating both the rank  and the digits  of the mixed-radix representation of the number  from its residue code  reduces to the parallel summation of sets of small-bit-length residues corresponding to the RNS moduli , while accounting for the number of overflows that occur in the rings  during the modular addition operations.
At the same time, according to [
25,
27], the computational complexity of calculating the rank 
 is 
 in terms of modular addition operations and the number of required lookup tables. Consequently, this becomes computationally expensive as 
k increases, for instance, when implementing non-modular operations based on the rank form of integer representation (
1), where operands and computational results belong to large numerical ranges.
One possible way to improve modular computational structures is to employ more efficient and optimized variants of RNS arithmetic compared to the traditional approach. As is well known, the use of code redundancy can improve the arithmetic properties of an RNS [5,6].
As demonstrated in [
25,
27], the use of a minimally redundant RNS enables the optimization and acceleration of the rank calculation. To this end, an additional redundant residue, 
, corresponding to an additional modulus 
, is added to the original residue code 
 of the number 
. Thus, the non-redundant residue code is extended by just one bit, representing the parity of the number 
X.
This modification reduces the computational complexity of rank calculation from 
 to 
 [
25,
27].
Consequently, minimally redundant RNS arithmetic improves the efficiency of non-modular operations that rely on the rank form of integers (1), such as the base extension operation.
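One consequence of the parity residue can be checked directly. Assuming all primary moduli are odd and the redundant modulus is 2 (the one-bit parity described above), reducing the rank form modulo 2 shows that the parity of the rank equals the parity of the sum of the normalized residues plus the redundant residue. The sketch below verifies only this parity identity; it is not the rank-calculation method of [25,27]. The function name and moduli are assumptions.

```python
from math import prod

def rank_and_predicted_parity(X, moduli):
    """For odd pairwise coprime moduli and 0 <= X < M, return
    (true rank, rank parity predicted from the parity residue X mod 2)."""
    M = prod(moduli)
    # Normalized residues xi_i = |x_i * M_i^{-1}|_{m_i}
    xis = [((X % m) * pow(M // m, -1, m)) % m for m in moduli]
    total = sum((M // m) * xi for m, xi in zip(moduli, xis))
    rank = total // M                  # exact rank, by big-integer division
    x0 = X % 2                         # redundant residue for the modulus 2
    # All M_i and M are odd, so X = sum(xi_i) - rank (mod 2).
    parity = (sum(xi % 2 for xi in xis) + x0) % 2
    return rank, parity

moduli = (3, 5, 7)                     # assumed odd example moduli
print(all(r % 2 == p for r, p in
          (rank_and_predicted_parity(X, moduli) for X in range(prod(moduli)))))
```

Knowing the parity of the rank means that an estimate of the rank that is off by at most one can be corrected with a single extra bit.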
  2.5. Base Extension Operations
In applications involving modular computing structures, base extension operations play a fundamental role, as they serve as the foundation for synthesizing algorithms for nearly all non-modular operations, including general division, scaling, fractional multiplication, code transformations, and more. In information security systems, for instance, computationally intensive operations on large number ranges, such as Montgomery multiplication and modular exponentiation, are implemented using base extension techniques [8,9].
The advantages of minimally redundant modular arithmetic in simplifying non-modular procedures are particularly evident in residue code extension operations.
Let an RNS be defined by the moduli set , and suppose that the residue code  of a number X is given with respect to the basis , where . The task is to determine the residues corresponding to the basis , i.e., the residue code . This process constitutes the essence of the residue code extension operation from the moduli  to the remaining moduli in . For this operation, we use the conventional notation , or, in a simplified form, .
A necessary condition for the correctness of the base extension is the uniqueness of the mapping 
. This means that the code 
 must uniquely determine the integer 
X. When this condition is satisfied, the number 
X can be represented in rank form with respect to the moduli 
, by analogy with Equation (
1).
Determining the rank 
 is significantly simplified when the RNS defined over the basis 
 is minimally redundant. In comparison to non-redundant residue coding, this approach yields an approximately 
-fold reduction in the number of required modular addition operations and the size of lookup tables [
25,
27]. This advantage becomes especially important when working with large numbers.
Therefore, for the number 
, the base extension operation to the moduli 
 reduces to determining the rank 
 of the number 
X, and subsequently applying to its rank form the procedure for computing the residues with respect to the moduli in the basis 
:
  3. Basic Computational Scheme for Montgomery Modular Multiplication
Let A and B be the operands of a multiplication operation modulo a large number p. The fundamental concept proposed by Montgomery [24] consists of an additive transformation of the product $A \cdot B$, allowing the division without remainder by a specially chosen auxiliary modulus $S$.
To achieve this, the following equation is established:
Given that 
S and 
p are coprime, i.e., 
, Equation (
3) guarantees that the integer 
 is a multiple of the modulus 
S for any values of the operands 
A and 
B.
This property is at the core of Montgomery’s approach to constructing an efficient modular multiplication scheme [
24]. The modularity of Equation (
3) follows from the equality:
	  Therefore, the number
      is an integer.
Let 
 denote the greatest integer less than or equal to 
x. Following Euclid’s Division Lemma [
29,
30], the number 
 can be represented as
	  Then, by reducing modulo 
p, Equation (
5) simplifies to
	  Thus, taking into account (
6) and (
7), the integer number
      represents the expected computed result. Here, the number
      is a uniquely determined integer factor for given operands 
A and 
B.
For 
, according to Equation (
3), the following inequality holds:
Since 
, Equation (
5) implies the following estimation for the number 
:
Thus, according to (
9), the integer factor 
Q can take on only two values: 0 or 1. Hence, to determine 
Q, it is necessary to compare the integers 
 and 
p.
Therefore, 
, where
      denotes the result of comparing two integers 
x and 
y.
Based on this, the main steps of Montgomery Modular Multiplication (MM) can be summarized as follows:
      where 
 and 
.
As seen from Equation (
11), when MM is implemented in a positional number system, the main complexity arises from the multiplication of large numbers such as 
, 
, and 
. The same holds for calculating the multiplicative inverse 
Y. However, the cost of computing it is not significant, since this constant is a long-term parameter obtained during preliminary calculations.
In RNS arithmetic, all operations specified in (
11) are modular, so they are much simpler and faster to perform than in conventional positional arithmetic. The inherent parallelism of RNS is the main reason for preferring modular arithmetic for MM implementation.
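For orientation, the positional (plain-integer) form of the scheme can be sketched as follows. The names `R_hat` and `montgomery_multiply` are assumptions. The constant Y is taken to be the negated inverse of p modulo S, as in the standard Montgomery construction, and S > p is assumed so that the intermediate result stays below 2p.

```python
def montgomery_multiply(A, B, p, S):
    """Return A*B*S^{-1} mod p, assuming gcd(S, p) == 1, 0 <= A, B < p, S > p."""
    Y = (-pow(p, -1, S)) % S          # long-term constant, precomputed in practice
    X = (A * B * Y) % S               # chosen so that A*B + X*p is a multiple of S
    R_hat = (A * B + X * p) // S      # exact division, no remainder
    Q = 1 if R_hat >= p else 0        # R_hat < 2p, so one comparison suffices
    return R_hat - Q * p

p, S = 97, 3 * 5 * 7                  # assumed example values, gcd(S, p) = 1
A, B = 45, 76
print(montgomery_multiply(A, B, p, S) == (A * B * pow(S, -1, p)) % p)
```

The only division left is by S, which in an RNS whose moduli multiply to S becomes a channel-wise multiplication by an inverse in the remaining moduli.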
  4. Realization of Montgomery Modular Multiplication in Minimally Redundant RNS
Let a primary Residue Number System (RNS) be defined on the set  of pairwise relatively prime odd moduli  , with an additional modulus . Consequently, the moduli set  forms a minimally redundant RNS. The effective number range is given by the ring .
Let the operands 
A, 
B, and modulus 
p be represented by their minimally redundant residue codes:
      where 
   and 
The complete moduli set 
 is the union of the moduli sets 
 and 
. Here, the auxiliary modulus 
S is defined as the product of the moduli in the set 
:
Since the multiplicative inversion 
 (see (
11)) belongs to the ring 
, it is uniquely represented in a minimally redundant residue code as follows:
      where 
 for 
.
Similarly, the number 
 (see (
11)) is represented in a minimally redundant residue code with respect to 
:
      where 
, 
.
Following (
11), we compute the residue code of 
 with respect to the complete moduli set 
. Before proceeding, it is necessary to extend the residue code (
12) to the moduli in 
.
Base extension is one of the most difficult non-modular operations in RNS arithmetic. Its implementation usually uses the rank form (1) [12,13,14,15,16].
Following (
1), the number 
X with respect to 
 is represented by the following:
      where, based on the theorem for rank calculation in minimally redundant RNS [
25,
27],
      with 
 as the inexact rank of the number 
X, and 
 as a two-valued rank correction:
The inexact rank 
 is calculated as follows:
      where
      with 
 and 
.
Similarly, the rank correction 
 is computed using trivial addition operations modulo 
:
      where 
, 
, and 
 [
25,
27].
Hence, according to (
13) and (
14), the required extension of the residue code (
12) to the moduli of the set 
 is carried out by the following rule:
      where 
 and 
 are calculated according to (
15)–(
17) and (
18), respectively.
Let us denote this base extension operation as .
The number 
 (see (
11)) is represented by its minimally redundant residue code 
 with respect to the moduli of the complete set 
.
Since 
, then, according to (
4), the integer 
 is divisible by 
 without a remainder. Hence, its residues with respect to the moduli of the set 
 are equal to 0:
At the same time, we compute the remaining residues using the following rule:
To obtain residue code 
 of the number,
      two computational stages are necessary:
- At the first stage, the residues  with respect to the moduli of the set  are calculated; 
- At the second stage, the truncated residue code  is extended to the moduli of the set . 
As follows from (
21), at the first stage, the residues of the integer 
 are calculated according to the following rule:
At the same time, the two redundant residues 
 and 
 are equal. Since 
 are relatively prime odd moduli, we have 
 for 
, which implies 
. Therefore,
At the second stage, the residue code 
 is extended to the moduli 
 using a rank form of the number 
 with respect to the moduli 
:
      where 
, 
, 
, and 
. The integers 
 and 
 are the inexact rank and rank correction of the number 
, respectively.
In this case, similar to (
15)–(
17), the inexact rank 
 is calculated as follows:
      where
      while 
 and 
 for 
.
At the same time, the rank correction 
 is calculated as follows:
      where 
 for 
.
Therefore, according to (
24), the residues 
 of the number 
 with respect to the set 
 are calculated as follows:
Let us denote this base extension operation as .
Thus, as a result of two-stage calculations, we have obtained the minimally redundant residue code  of the number .
Further, following the computational scheme (
11), we compute the integer 
 representing the result of MM, where 
. At the same time, its minimally redundant residue code 
 is computed according to the following rule:
      for 
.
To determine the integer factor 
 according to (
10), a magnitude comparison between the integers 
 and 
p is performed using a mixed-radix system (MRS).
For this purpose, the number 
 is converted into its mixed-radix representation, denoted as 
. We refer to this operation as 
. Its implementation relies on the parallel procedure proposed in [
28].
Additionally, during a preliminary stage, we compute the mixed-radix digits  for the modulus p.
  5. Basic Algorithm of Montgomery Multiplication in Minimally Redundant RNS
According to the above, the implementation of the Montgomery multiplicative scheme (
11) using minimally redundant RNS results in the following operational sequence:
A necessary and sufficient correctness condition for the two-stage process of calculating 
, according to (
22), (
23) and (
29), for 
, is that the numbers 
X and 
 should belong to the range 
 with respect to the moduli 
.
Since the inequality  implies the inequality , the condition  guarantees the fulfillment of the condition .
Now, let us explore the necessary conditions for  and p to ensure that .
Let 
. Since, according to (
12), 
, then from (
21) we obtain the inequality
From (
32), it follows that 
 only when 
.
On the other hand, the numbers 
X and 
 should be the elements of the range 
 with respect to the moduli 
. Therefore, taking into account that 
 and 
, this requirement leads to the following inequalities:
      and
Thus, in the case when 
 and 
, the correctness of the proposed MM scheme is ensured by the conditions (
33) and (
34).
As already noted, the magnitude comparison of the integers 
 and 
p is performed using MRS, resulting in 
 (see (
10)). Therefore, the correctness of the reverse conversion 
 is ensured by using the RNS number range that includes the range 
. As follows from (
33) and (
34), the number range 
 significantly exceeds the range 
. To determine the integer factor 
Q, according to (
10), it is sufficient to use the truncated RNS defined by the moduli set 
 with the number range 
.
Based on the above, the following statement is true.
Theorem 1. Let the moduli of the sets  and , together with the modulus p, meet the conditions  and let the operands A, B of the multiplicative operation  belong to the ring . Then, the integer  (see (21)) is an element of the ring . At the same time, the integer  is calculated using the equality 
A similar result can also be obtained in the case when 
A and 
B belong to the range 
. Therefore, by analogy to (
32), the following inequality
      leads to the condition
Hence, the following statement is true.
Theorem 2. Let the moduli of the sets  and , together with the modulus p, meet the conditions  and let the operands A, B of the multiplicative operation  belong to the ring . Then, the integer  (see (21)) also belongs to the ring . At the same time, the integer  is calculated according to (36).
Based on the optimized Montgomery multiplicative scheme (31) in minimally redundant RNS, an algorithm for multiplication over a large modulus is synthesized. This algorithm has a tabular configuration and possesses all the fundamental advantages of minimally redundant residue arithmetic, which are most apparent when operating on large number ranges. The developed algorithm is suitable for both software realization and implementation using parallel modular computational structures with a multiprocessor architecture.
Considering the aforementioned discussion, the implementation of MM in a minimally redundant RNS can be formally presented as Algorithm 1.
This algorithm’s key distinguishing feature is its inherent suitability for lookup table techniques. The required set of lookup tables is generated during the preliminary stage, which minimizes the run-time cost of the MM implementation.
      
| Algorithm 1: Montgomery multiplication in minimally redundant RNS | 
| Input: , ,     Output:    1 ,   2 , ,   3   4   5 ,   6 ,   7   8   9 10 11 , 
 | 
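The overall data flow (residue products, base extension, exact division by S in the second moduli set, base extension back, final comparison with p) can be sketched in Python as below. This is an illustrative simplification under stated assumptions, not a transcription of Algorithm 1. The moduli sets `base_s` and `base_t`, all function names, and the omission of the redundant modulus are assumptions. The rank inside `base_extend` and the comparison with p are computed with big-integer arithmetic as stand-ins for the minimally redundant rank calculation and the mixed-radix comparison described in the text.

```python
from math import prod

def to_rns(x, moduli):
    return [x % m for m in moduli]

def crt(residues, moduli):
    """Plain CRT reconstruction (used only as a stand-in and for checking)."""
    M = prod(moduli)
    return sum((M // m) * ((r * pow(M // m, -1, m)) % m)
               for r, m in zip(residues, moduli)) % M

def base_extend(residues, src, dst):
    """Extend a residue code from moduli src to moduli dst via the rank form."""
    M = prod(src)
    terms = [(M // m, (r * pow(M // m, -1, m)) % m) for r, m in zip(residues, src)]
    rank = sum(Mi * xi for Mi, xi in terms) // M      # stand-in: exact rank
    return [(sum((Mi % t) * xi for Mi, xi in terms) - rank * (M % t)) % t
            for t in dst]

def rns_montgomery(A, B, p, base_s, base_t):
    """Residues of A*B*S^{-1} mod p over base_s + base_t, with S = prod(base_s).
    Assumes p < S, 2p < prod(base_t), and all moduli coprime to p and each other."""
    S = prod(base_s)
    a_s, b_s = to_rns(A, base_s), to_rns(B, base_s)
    a_t, b_t = to_rns(A, base_t), to_rns(B, base_t)
    # X = |A*B*Y|_S with Y = |-p^{-1}|_S, computed channel by channel.
    x_s = [(a * b * (-pow(p, -1, m))) % m for a, b, m in zip(a_s, b_s, base_s)]
    x_t = base_extend(x_s, base_s, base_t)            # first base extension
    # (A*B + X*p) / S in the channels of base_t, where S is invertible.
    r_t = [((a * b + x * p) * pow(S, -1, m)) % m
           for a, b, x, m in zip(a_t, b_t, x_t, base_t)]
    r_s = base_extend(r_t, base_t, base_s)            # second base extension
    Q = 1 if crt(r_t, base_t) >= p else 0             # stand-in for MRS comparison
    moduli = list(base_s) + list(base_t)
    return [(r - Q * p) % m for r, m in zip(r_s + r_t, moduli)]

p = 1009                                              # assumed example modulus
base_s, base_t = (13, 17, 19), (23, 29, 31)           # assumed example moduli sets
A, B = 123, 456
code = rns_montgomery(A, B, p, base_s, base_t)
expected = (A * B * pow(prod(base_s), -1, p)) % p
print(crt(code, list(base_s) + list(base_t)) == expected)
```

In a table-driven implementation, the inverses and the per-channel products with constants would be precomputed, so only small-residue additions and table lookups remain at run time.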
  6. Discussion
Let us now discuss the theoretical and practical aspects of the proposed approach.
An analysis of well-established approaches to implementing Montgomery multiplication in RNS reveals that two primary methods for computing the rank, an essential step in performing the critical base extension operations, are commonly employed [8,9,12,14,15,16,17].
The first method enables the determination of the integer value of an RNS number by computing an integer correction factor, which essentially corresponds to the rank of the number. In this case, the implementation of the CRT yields an exact integer reconstruction. This approach involves the inclusion of an additional redundant modulus , which depends on the number k of primary RNS moduli, into the original moduli-set .
As a result, this method requires an additional modular channel with a width of  bits, along with redundant lookup tables.
The main idea of an alternative approach is to represent the rank as the integer part of a sum of at most k proper fractions of the form , where . The rank value is then estimated recursively by approximating these fractions. To eliminate division by the modulus , each denominator  is replaced by a power of 2, and the corresponding numerator  is approximated using its most significant bits. Since division by powers of 2 is equivalent to simple bit shifts, the rank computation is reduced to a sequence of additions.
The main drawbacks of this method are as follows. First, it relies on recursive computations, as the algorithm evaluates the rank value in a sequential, bit-by-bit manner. Moreover, the number of required iterations depends on the bit length used for the approximation. As a result, this method leads to significantly prolonged computation times when dealing with large numbers. Additionally, due to the approximate nature of the algorithm, it does not always produce the correct rank value.
  6.1. Computational Costs of Proposed MM Algorithm
In RNS arithmetic, the implementation of MM (see (
31) reduces to performing modular operations on small bit-length residues with respect to the moduli of the sets 
 and 
 using the lookup tables technique.
To assess computational efficiency, we compare the computational costs presented in 
Table 1, where 
 and 
 represent the number of required modular operations at the corresponding steps of Algorithm 1 both in minimally redundant RNS and conventional non-redundant RNS, respectively. Additionally, 
 and 
 denote the mixed-radix representations of the integers 
 and 
p, respectively.
Computing the modular code of the number  requires  modular operations in a minimally redundant RNS, and k operations in a conventional RNS.
Computing the modular code of the number  with respect to the moduli  in the set  requires  modular operations in a minimally redundant RNS, and l operations in a conventional RNS.
The modular operation costs required to compute the rank 
 of the number 
X with respect to the moduli in the set 
 are based on the results presented in [
27], and amount to 
l and 
 modular operations, respectively (see Equations (
14)–(
18)).
The base extension operation 
, which extends the modular code of the number 
X from the moduli set 
 to 
, requires 
 modular operations in both minimally redundant and conventional RNS implementations (see Equation (
19)).
Computing the modular code of the number 
 with respect to the moduli set 
, while taking into account the redundant modulus 
, requires 
 modular operations in a minimally redundant RNS and 
 operations in a conventional RNS (see Equations (
19)–(
23)).
Similarly, computing the rank 
 of the number 
 with respect to the moduli set 
 requires 
 modular operations in a minimally redundant RNS, and 
 operations in a conventional RNS (see Equations (
25)–(
28) and [
27]).
The base extension operation 
, which maps the modular code of the number 
 from 
 back to 
, also requires 
 modular operations in both minimally redundant and conventional RNS implementations (see Equation (
29)).
Calculating the mixed-radix representation 
 of the number 
 requires 
 modular operations, as shown in [
28].
Comparing the numbers  and  in the MRS requires l modular operations.
In the final step, computing the modular code of the number 
, taking into account the redundant modulus 
, requires 
 modular operations in a minimally redundant RNS, and 
k operations in a conventional RNS (see Equation (
29)).
Without loss of generality and for simplicity, let us assume that the moduli of the sets  and  are selected such that .
Under these assumptions, when employing a minimally redundant RNS, the total number of required modular operations, as derived from 
Table 1, is given by
In contrast, for a conventional non-redundant RNS, the required number of modular operations is
Hence, when 
A, 
B and 
 belong to 
 (see Theorem 1), the reduction factor in computational complexity for implementing MM in a minimally redundant RNS, compared to a conventional non-redundant RNS, is given by
For instance, the computed values of  for specific values of l are as follows: , , and . Furthermore, as l increases, the factor  continues to grow, asymptotically approaching a limiting value of 1.40.
  6.2. Basic Exponentiation Algorithm by Large Modulus
It is well known that MM is primarily used for computing powers of natural numbers with respect to a large modulus 
p:
        where the numbers 
X and 
E are represented by their residue and binary codes, respectively: 
 and 
, where 
=1 and 
b denotes the bit-length of 
E.
The well-known method for computing (
43) relies on the multiplicative decomposition of the exponentiation function:
All multiplicative operations in (
44) are performed within the framework of scheme (
11), which allows the reuse of previously computed results as subsequent operands. In this case, the fact that the number 
 belongs to the ring 
 when 
 is ensured by condition (
39). Consequently, the correction operation (
36) needs to be applied only at the final step of the computation in accordance with (
44).
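The deferred-correction idea can be illustrated with plain integers. In the sketch below, intermediate results are kept below 2p and the subtraction of p is applied only once at the end. The bound S >= 4p used here is an assumption derived from the elementary estimate (4p^2 + Sp)/S <= 2p and is not quoted from condition (39). The function names are likewise assumptions, and the exponent is scanned from its leading 1 bit.

```python
def mont_step(A, B, p, S, Y):
    """(A*B + X*p) / S without the final subtraction.
    If A, B < 2p and S >= 4p, the result is again below 2p."""
    X = (A * B * Y) % S
    return (A * B + X * p) // S

def mont_pow(x, e, p, S):
    """x**e mod p by square-and-multiply built on mont_step; requires e >= 1."""
    assert S >= 4 * p and e >= 1
    Y = (-pow(p, -1, S)) % S
    S2 = (S * S) % p                        # precomputed |S^2|_p
    base = mont_step(x % p, S2, p, S, Y)    # x*S mod p, possibly plus p
    acc = base
    for bit in bin(e)[3:]:                  # skip '0b' and the leading 1 bit
        acc = mont_step(acc, acc, p, S, Y)  # squaring
        if bit == '1':
            acc = mont_step(acc, base, p, S, Y)
    acc = mont_step(acc, 1, p, S, Y)        # leave the Montgomery domain
    return acc - p if acc >= p else acc     # single correction at the end

p, S = 1009, 13 * 17 * 19                   # assumed example values, S >= 4p
print(mont_pow(5, 117, p, S) == pow(5, 117, p))
```

Skipping the per-step comparison removes one magnitude comparison from every multiplication in the exponentiation loop.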
In a minimally redundant RNS, the implementation of MM when 
 and 
 belong to 
 requires
        modular operations (see 
Table 1).
In contrast, in a conventional non-redundant RNS, the number of required modular operations is given by
Thus, in the considered case, based on (
45) and (
46), the computational complexity reduction factor is expressed as
For example, the values of 
 for specific values of 
l are as follows: 
, 
, and 
. Furthermore, as 
l increases, the reduction factor 
 (see (
47)) continues to grow, asymptotically approaching the value 1.50.
It is important to note that the same reduction factor of computational complexity is also achieved when implementing the exponentiation procedure (
44).
  7. Conclusions
This paper presents an efficient implementation of MM in a minimally redundant RNS. The core feature of the proposed approach consists of using an advanced rank calculation method introduced by the author in [25,27].
The key contributions of this work are as follows:
- The use of minimally redundant RNS significantly reduces the computational cost of MM implementation compared to conventional non-redundant RNS. Moreover, the complexity of rank computation for the moduli sets  and  is reduced by a factor of  and , respectively. This optimization improves the efficiency of base extension operations  and , as well as the overall MM procedure; 
- The conditions used to choose the moduli sets  and  with respect to the operating modulus p are established (see Theorems 1 and 2). These conditions ensure the correctness of multiple accesses to the MM procedure when both operands and results reside in the ring . This property is particularly beneficial for implementing modular exponentiation procedures based on MM; 
- A novel MM algorithm for minimally redundant RNS is introduced. This algorithm is well suited for efficient implementation using lookup table techniques. Moreover, the most computationally demanding operations required for generating lookup tables are organized as a preprocessing step, executed independently before the main computational procedure; 
- The computational costs of the proposed MM scheme in a minimally redundant RNS are analyzed and compared with those of a non-redundant RNS. The results demonstrate that the minimally redundant RNS implementation requires up to 1.5 times fewer modular operations (the asymptotic value of the reduction factor) when operands and results belong to the ring . This advantage also extends to the modular exponentiation procedure based on MM. 
In summary, the intrinsic parallelism and tabular nature of minimally redundant RNS arithmetic, combined with the simplicity of rank computation and base extension operations, significantly enhance the performance of Montgomery multiplication-based algorithms.