1. Introduction
Since the mid-1950s, RNS arithmetic has attracted significant attention from researchers and practitioners across various domains, including number-theoretic methods, computer technology, digital signal processing, cryptography, and communications [1,2,3,4,5,6,7,8,9].
RNS arithmetic is particularly valuable in parallel processing systems, where it enables fast and accurate computations. The inherent parallelism of modular computational structures provides several key advantages over positional numeral systems, especially for operations involving large numbers. These advantages include the following:
the execution time of parallel modular operations is independent of the number of moduli and, consequently, of the length of the residue code;
high adaptability of modular arithmetic algorithms to tabular calculations and pipeline processing;
flexibility in using lookup table techniques for reconfiguring modular computational structures, among others.
A well-known application of modular arithmetic is in cryptographic systems, where multiplicative operations over large moduli form the foundation for various information security mechanisms [10,11,12,13,14,15,16,17,18,19,20,21,22,23]. These operations are extensively utilized in electronic signature schemes, public key cryptosystems based on RSA and Rabin’s method, and other cryptographic tasks. Therefore, advancing computational technologies that enhance modular multiplication and exponentiation efficiency over large-number ranges remains a crucial research direction.
One of the most promising approaches for achieving high-performance modular arithmetic is using RNS-based modular computational structures. As the degree of modularity in computational processes increases, the efficiency of these structures improves significantly. This effect is particularly evident when handling large-number arithmetic operations.
A notable example is the Montgomery multiplication (MM) algorithm implemented using RNS [10,11,12,13,22]. Montgomery multiplication replaces division by the working modulus with an exact (remainder-free) division by a specially chosen auxiliary modulus, an operation that is well suited to RNS configurations [24]. This property makes MM in RNS highly efficient.
However, a principal computational bottleneck in modular arithmetic is the residue code extension required in MM implementations. Optimizing these operations is a key challenge in developing high-performance cryptographic hardware and software for large-number computations. One promising solution is to use minimally redundant RNS, which allows for more efficient execution of fundamental non-modular operations such as base extension and reverse conversion based on the Chinese Remainder Theorem (CRT) [5,6,25,26,27,28].
This paper aims to exploit the fundamental advantages of minimally redundant RNS arithmetic to optimize multiplication algorithms based on Montgomery’s scheme. By refining the base extension and conversion processes, we enhance the overall efficiency of MM in large-number cryptographic applications.
This paper provides a rigorous theoretical framework for optimizing modular multiplication in minimally redundant Residue Number Systems (RNSs), offering new insights into the computational efficiency of cryptographic operations. Using minimally redundant arithmetic, the study improves the performance of modular computations, which are fundamental in public-key cryptography, secure computing, and high-performance arithmetic.
Rather than focusing on software or hardware implementation, this work contributes by presenting formal algorithmic descriptions, mathematical proofs, and correctness guarantees. These theoretical foundations serve as a basis for future advances in secure computing, number-theoretic algorithms, and parallel processing architectures.
For clarity and reproducibility, the paper provides detailed algorithmic methodologies and mathematical derivations, making it accessible to researchers developing efficient cryptographic protocols, modular arithmetic techniques, and computational number theory applications. The structured formal approach to RNS-based Montgomery multiplication supports interdisciplinary applications in computer science, digital signal processing, and cryptographic engineering.
The remainder of this paper is structured as follows. Section 2 and Section 3 introduce the theoretical foundations of the research. Section 4 presents the mathematical background of the proposed method. Section 5 describes the MM algorithm and analyzes the impact of RNS moduli selection. Section 6 provides a discussion of the results. Section 7 concludes the paper.
2. Basic Principles of RNS Arithmetic
Residue Number System (RNS) arithmetic is based on abstract algebra and number theory [29,30].
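In the set of integers $\mathbb{Z}$, an RNS is defined by a set of pairwise coprime natural moduli $\{m_1, m_2, \ldots, m_k\}$, where $\gcd(m_i, m_j) = 1$ for $i \neq j$ and $i, j = 1, 2, \ldots, k$. In this system, an integer $X$ is represented by the k-tuple $(\chi_1, \chi_2, \ldots, \chi_k)$, where $\chi_i = |X|_{m_i}$ is the residue of X modulo $m_i$, i.e., $\chi_i$ belongs to the set $\mathbb{Z}_{m_i} = \{0, 1, \ldots, m_i - 1\}$, ensuring $X \equiv \chi_i \pmod{m_i}$ for all $i = 1, 2, \ldots, k$.
An RNS allows the unique representation of up to $M_k = \prod_{i=1}^{k} m_i$ integers through their residue codes. The number range is typically defined by $\mathbb{Z}_{M_k} = \{0, 1, \ldots, M_k - 1\}$.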
2.1. Arithmetic Operations in RNS
Due to the carry-free nature of RNS arithmetic, addition, subtraction, and multiplication can be executed in parallel, independently for each modulus, following the rule below:
where
denotes modular arithmetic operations, with
and
for
.
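The carry-free, channel-by-channel rule described above can be illustrated with a minimal Python sketch. The moduli set and the helper names `to_rns` and `rns_op` are hypothetical choices made for illustration only; they do not come from the paper.

```python
from math import gcd, prod
from operator import add, sub, mul

# Hypothetical pairwise coprime moduli, chosen only for illustration.
MODULI = (7, 11, 13, 15)

def to_rns(x, moduli=MODULI):
    """Residue code of x: one small residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_op(a, b, op, moduli=MODULI):
    """Apply op independently in every channel; no carries cross channels."""
    return tuple(op(ai, bi) % m for ai, bi, m in zip(a, b, moduli))

# Pairwise coprimality is what makes the residue code unique.
assert all(gcd(u, v) == 1 for i, u in enumerate(MODULI) for v in MODULI[i + 1:])

M = prod(MODULI)          # size of the number range
x, y = 1234, 567
sum_code = rns_op(to_rns(x), to_rns(y), add)
diff_code = rns_op(to_rns(x), to_rns(y), sub)
prod_code = rns_op(to_rns(x), to_rns(y), mul)
```

Each result is the residue code of the corresponding integer result reduced modulo the product of the moduli, so the product above silently wraps around once it leaves the number range.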
2.2. Determining the Integer Value from Residue Representation
In RNS arithmetic, reconstructing the integer X from its residue code is required for performing non-modular operations. This reconstruction relies on the so-called positional characteristics of the residue code.
There are two primary approaches to constructing RNS arithmetic based on these characteristics: (1) rank-based computation and (2) mixed-radix representation [7,8,9].
According to the Chinese Remainder Theorem [
29,
31], for a given RNS defined by the set
of pairwise relatively prime moduli, the integer
X and its residue code
are related by the following equation:
where
, and
is a normalized residue modulo
for
, with
denoting the multiplicative inverse of
Y modulo
m.
The integer
and Equation (
1) define the
rank and
rank form of
X, respectively [
2,
3]. Equation (
1) establishes an isomorphic mapping between
and its residue code
. In this context, computing the rank
enables the realization of this isomorphic mapping.
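As an illustration of the rank form, the following Python sketch reconstructs an integer from its residue code with the textbook CRT and returns the rank as the number of times the CRT sum wraps around the product of the moduli. The symbol names (`M_i`, `xi_i`) and the example moduli are assumptions, and the rank is obtained here by a big-integer division purely for clarity; this is not the table-based rank computation the paper relies on. The three-argument `pow` with exponent `-1` requires Python 3.8 or later.

```python
from math import prod

def rank_form(residues, moduli):
    """Return (X, rank) for a residue code, using the CRT.

    X = sum(M_i * xi_i) - rank * M, where M is the product of the moduli,
    M_i = M // m_i, and xi_i = (x_i * M_i^{-1}) mod m_i is the
    normalized residue.
    """
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        xi_i = (x_i * pow(M_i, -1, m_i)) % m_i   # normalized residue
        total += M_i * xi_i
    rank = total // M        # how many multiples of M must be subtracted
    return total - rank * M, rank

moduli = (7, 11, 13, 15)     # hypothetical example moduli
X = 9876
value, rank = rank_form([X % m for m in moduli], moduli)
```

Because every normalized residue is smaller than its modulus, the rank is always smaller than the number of moduli, which is why it can be handled as a small integer.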
2.3. Mixed-Radix Representation
In RNS arithmetic, the rank of a number is a positional characteristic of primary importance since it enables one to perform all the non-modular operations. In a Mixed Radix System (MRS) defined by a set
of pairwise relatively prime moduli, an integer
is represented by the
k-tuple
of mixed-radix digits, resulting in
where
,
,
[
8,
9].
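The mixed-radix representation can be sketched with the classical sequential conversion, shown below in Python. The function names and the example moduli are hypothetical, and this sequential procedure is only a stand-in for illustration; it is not the parallel procedure cited later in the paper.

```python
def to_mixed_radix(residues, moduli):
    """Mixed-radix digits (least significant first) of a residue code."""
    r = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        d = r[i] % m_i
        digits.append(d)
        # Subtract the digit and divide by m_i in every remaining channel.
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            r[j] = ((r[j] - d) * pow(m_i, -1, m_j)) % m_j
    return digits

def from_mixed_radix(digits, moduli):
    """Evaluate d_1 + d_2*m_1 + d_3*m_1*m_2 + ... ."""
    value, weight = 0, 1
    for d, m in zip(digits, moduli):
        value += d * weight
        weight *= m
    return value

def mrs_less_than(a_digits, b_digits):
    """Magnitude comparison: compare digits from the most significant one."""
    return tuple(reversed(a_digits)) < tuple(reversed(b_digits))

moduli = (7, 11, 13, 15)     # hypothetical example moduli
X, Z = 9876, 9877
dx = to_mixed_radix([X % m for m in moduli], moduli)
dz = to_mixed_radix([Z % m for m in moduli], moduli)
```

Since the digits are positional, comparing two numbers reduces to a lexicographic comparison of their digit tuples, which is what makes the MRS convenient for magnitude comparison.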
An MRS is advantageous over an RNS for non-modular operations such as magnitude comparison, sign determination, and overflow detection.
2.4. Computational Complexity and Optimization
In the conventional non-redundant RNS, calculating both the rank and the digits of the mixed-radix representation of the number from its residue code reduces to the parallel summation of sets of small-bit-length residues corresponding to the RNS moduli , while accounting for the number of overflows that occur in the rings during the modular addition operations.
At the same time, according to [
25,
27], the computational complexity of calculating the rank
is
in terms of modular addition operations and the number of required lookup tables. Consequently, this becomes computationally expensive as
k increases, for instance, when implementing non-modular operations based on the rank form of integer representation (
1), where operands and computational results belong to large numerical ranges.
One possible way to improve modular computational structures is to employ more efficient and optimized variants of RNS arithmetic compared to the traditional approach. As is well known, the use of code redundancy can improve the arithmetic properties of an RNS [5,6].
As demonstrated in [
25,
27], the use of a minimally redundant RNS enables the optimization and acceleration of the rank calculation. To this end, an additional redundant residue,
, corresponding to an additional modulus
, is added to the original residue code
of the number
. Thus, the non-redundant residue code is extended by just one bit, representing the parity of the number
X.
This modification reduces the computational complexity of rank calculation from
to
[
25,
27].
Consequently, using minimally redundant RNS arithmetic improves the efficiency of non-modular operations that rely on the rank form of integers (1), such as the base extension operation.
2.5. Base Extension Operations
In applications involving modular computing structures, base extension operations play a fundamental role, as they serve as the foundation for synthesizing algorithms for nearly all non-modular operations, including general division, scaling, fractional multiplication, and code transformations. In information security systems, for instance, computationally intensive operations on large number ranges, such as Montgomery multiplication and modular exponentiation, are implemented using base extension techniques [8,9].
The advantages of minimally redundant modular arithmetic in simplifying non-modular procedures are particularly evident in residue code extension operations.
Let an RNS be defined by the moduli set , and suppose that the residue code of a number X is given with respect to the basis , where . The task is to determine the residues corresponding to the basis , i.e., the residue code . This process constitutes the essence of the residue code extension operation from the moduli to the remaining moduli in . For this operation, we use the conventional notation , or, in a simplified form, .
A necessary condition for the correctness of the base extension is the uniqueness of the mapping
. This means that the code
must uniquely determine the integer
X. When this condition is satisfied, the number
X can be represented in rank form with respect to the moduli
, by analogy with Equation (
1).
Determining the rank
is significantly simplified when the RNS defined over the basis
is minimally redundant. In comparison to non-redundant residue coding, this approach yields an approximately
-fold reduction in the number of required modular addition operations and the size of lookup tables [
25,
27]. This advantage becomes especially important when working with large numbers.
Therefore, for the number
, the base extension operation to the moduli
reduces to determining the rank
of the number
X, and subsequently applying to its rank form the procedure for computing the residues with respect to the moduli in the basis
:
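A minimal Python sketch of base extension through the rank form is given below. It assumes the textbook CRT rank form, and it computes the rank exactly with a big-integer division as a shortcut; the paper instead determines the rank from an inexact rank plus a correction in a minimally redundant RNS. The names `base_extend`, `src`, and `dst` and the example moduli are hypothetical.

```python
from math import prod

def base_extend(residues, src, dst):
    """Residues of X modulo each modulus in dst, given its code over src.

    Requires 0 <= X < prod(src) so that the code over src determines X.
    """
    M = prod(src)
    terms = []
    for x_i, m_i in zip(residues, src):
        M_i = M // m_i
        xi_i = (x_i * pow(M_i, -1, m_i)) % m_i     # normalized residue
        terms.append((M_i, xi_i))
    # Exact rank, obtained here by big-integer division for illustration.
    rank = sum(M_i * xi_i for M_i, xi_i in terms) // M
    extended = []
    for m_j in dst:
        # Evaluate the rank form channel-wise modulo the new modulus m_j.
        acc = sum((M_i % m_j) * xi_i for M_i, xi_i in terms)
        extended.append((acc - rank * (M % m_j)) % m_j)
    return tuple(extended)

src, dst = (7, 11, 13), (17, 19)   # hypothetical source and target moduli
X = 900                            # must lie below 7 * 11 * 13 = 1001
new_residues = base_extend([X % m for m in src], src, dst)
```

Once the rank is known, every target residue is obtained by small modular multiplications and additions with precomputable constants, which is the part that maps naturally onto lookup tables.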
3. Basic Computational Scheme for Montgomery Modular Multiplication
Let
A and
B be the operands of a multiplication operation modulo a large number
p. The fundamental concept proposed by Montgomery [
24] consists of an additive transformation of the product
, allowing the division without remainder by a specially chosen auxiliary modulus
.
To achieve this, the following equation is established:
Given that
S and
p are coprime, i.e.,
, Equation (
3) guarantees that the integer
is a multiple of the modulus
S for any values of the operands
A and
B.
This property is at the core of Montgomery’s approach to constructing an efficient modular multiplication scheme [
24]. The modularity of Equation (
3) follows from the equality:
Therefore, the number
is an integer.
Let
denote the greatest integer less than or equal to
x. Following Euclid’s Division Lemma [
29,
30], the number
can be represented as
Then, by reducing modulo
p, Equation (
5) simplifies to
Thus, taking into account (
6) and (
7), the integer number
represents the expected computed result. Here, the number
is a uniquely determined integer factor for given operands
A and
B.
For
, according to Equation (
3), the following inequality holds:
Since
, Equation (
5) implies the following estimation for the number
:
Thus, according to (
9), the integer factor
Q can take on only two values: 0 or 1. Hence, to determine
Q, it is necessary to compare the integers
and
p.
Therefore,
, where
denotes the result of comparing two integers
x and
y.
Based on this, the main steps of Montgomery Modular Multiplication (MM) can be summarized as follows:
where
and
.
As seen from Equation (11), when MM is implemented in a positional number system, the main complexity arises from the multiplications of large numbers. The same holds for calculating the multiplicative inverse Y. However, the cost of determining Y is not fundamentally significant, as this constant is a long-term parameter obtained during preliminary calculations.
In RNS arithmetic, all operations specified in (11) are modular, so they are performed much more simply and quickly than in conventional arithmetic. The inherent parallelism of RNS is the primary reason for preferring modular arithmetic for MM implementation.
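The basic Montgomery scheme described in this section can be sketched with ordinary Python integers. The sketch assumes the standard formulation: Y is the negated inverse of p modulo S, X is the product A·B·Y reduced modulo S, and the sum A·B + X·p is divided exactly by S. The names `R` and `R_hat` and the numeric values are assumptions introduced for illustration.

```python
from math import gcd

def montgomery_multiply(A, B, p, S):
    """Return A * B * S^{-1} mod p without dividing by p."""
    assert gcd(S, p) == 1 and p < S
    assert 0 <= A < p and 0 <= B < p
    Y = (-pow(p, -1, S)) % S      # precomputed long-term constant
    X = (A * B * Y) % S
    R = A * B + X * p             # additive transformation of the product
    assert R % S == 0             # R is a multiple of S by construction
    R_hat = R // S                # exact division, 0 <= R_hat < 2p
    Q = 1 if R_hat >= p else 0    # single comparison with p
    return R_hat - Q * p

p, S = 97, 128                    # hypothetical small example values
A, B = 45, 76
result = montgomery_multiply(A, B, p, S)
```

The result carries an extra factor of the inverse of S modulo p, which is why Montgomery-based exponentiation keeps its operands in a scaled form and removes the scaling once at the end.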
4. Realization of Montgomery Modular Multiplication in Minimally Redundant RNS
Let a primary Residue Number System (RNS) be defined on the set of pairwise relatively prime odd moduli , with an additional modulus . Consequently, the moduli set forms a minimally redundant RNS. The effective number range is given by the ring .
Let the operands
A,
B, and modulus
p be represented by their minimally redundant residue codes:
where
and
The complete moduli set
is a sum of moduli sets
and
. Here, the auxiliary modulus
S is defined as the product of the moduli in the set
:
Since the multiplicative inversion
(see (
11)) belongs to the ring
, it is uniquely represented in a minimally redundant residue code as follows:
where
for
.
Similarly, the number
(see (
11)) is represented in a minimally redundant residue code with respect to
:
where
,
.
Following (
11), we compute the residue code of
with respect to the complete moduli set
. Before proceeding, it is necessary to extend the residue code (
12) to the moduli in
.
Base extension is one of the most difficult non-modular operations in RNS arithmetic. Its implementation usually uses the rank form (1) [12,13,14,15,16].
Following (
1), the number
X with respect to
is represented by the following:
where, based on the theorem for rank calculation in minimally redundant RNS [
25,
27],
with
as the inexact rank of the number
X, and
as a two-valued rank correction:
The inexact rank
is calculated as follows:
where
with
and
.
Similarly, the rank correction
is computed using trivial addition operations modulo
:
where
,
, and
[
25,
27].
Hence, according to (
13) and (
14), the required extension of the residue code (
12) to the moduli of the set
is carried out by the following rule:
where
and
are calculated according to (
15)–(
17) and (
18), respectively.
Let us denote this base extension operation as .
The number
(see (
11)) is represented by its minimally redundant residue code
concerning the moduli of the complete set
.
Since
, then, according to (
4), the integer
is divisible by
without a remainder. Hence, its residues concerning the moduli of the set
are equal to 0:
At the same time, we compute the remaining residues using the following rule:
To obtain residue code
of the number,
two computational stages are necessary:
In the first stage, the residues with respect to the moduli of the set are calculated;
In the second stage, the truncated residue code is extended to the moduli of the set .
As follows from (
21), at the first stage, the residues of the integer
are calculated according to the following rule:
At the same time, the two redundant residues
and
are equal. Since
are relatively prime odd moduli, we have
for
, which implies
. Therefore,
At the second stage, the residue code
is extended to the moduli
using a rank form of the number
concerning the moduli
:
where
,
,
, and
. The integers
and
are the inexact rank and rank correction of the number
, respectively.
In this case, similar to (
15)–(
17), the inexact rank
is calculated as follows:
where
while
and
for
.
At the same time, the rank correction
is calculated as follows:
where
for
.
Therefore, according to (
24), the residues
of the number
with respect to the set
are calculated as follows:
Let us denote this base extension operation as .
Thus, as a result of two-stage calculations, we have obtained the minimally redundant residue code of the number .
Further, following the computational scheme (
11), we compute the integer
representing a result of MM, at that
. At the same time, its minimally redundant residue code
is computed according to the following rule:
for
.
To determine the integer factor
according to (
10), a magnitude comparison between the integers
and
p is performed using a mixed-radix system (MRS).
For this purpose, the number
is converted into its mixed-radix representation, denoted as
. We refer to this operation as
. Its implementation relies on the parallel procedure proposed in [
28].
Additionally, during a preliminary stage, we compute the mixed-radix digits for the modulus p.
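The overall flow of this section (channel-wise computation of X, base extension, channel-wise exact division by S, extension back, and a final comparison with p) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the redundant modulus and the inexact-rank theorem are omitted, base extension and the comparison with p are done through a big-integer CRT stand-in rather than through rank computation and the MRS, and the names `base1`, `base2`, `crt_value`, `encode`, and `rns_montgomery`, as well as the numeric values, are hypothetical.

```python
from math import prod

def crt_value(residues, moduli):
    """Integer in [0, prod(moduli)) with the given residues (stand-in)."""
    M = prod(moduli)
    total = sum((M // m) * ((x * pow(M // m, -1, m)) % m)
                for x, m in zip(residues, moduli))
    return total % M

def base_extend(residues, src, dst):
    """Big-integer stand-in for rank-based base extension."""
    value = crt_value(residues, src)
    return tuple(value % m for m in dst)

def encode(x, base1, base2):
    return tuple(x % m for m in base1), tuple(x % m for m in base2)

def rns_montgomery(a, b, p, base1, base2):
    """Residue codes of A * B * S^{-1} mod p, with S = prod(base1)."""
    (a1, a2), (b1, b2) = a, b
    S = prod(base1)
    # X = |A * B * Y|_S, computed channel-wise over base1 (Y = -p^{-1}).
    x1 = tuple((ai * bi * (-pow(p, -1, m))) % m
               for ai, bi, m in zip(a1, b1, base1))
    x2 = base_extend(x1, base1, base2)
    # (A * B + X * p) / S, computed channel-wise over base2.
    r2 = tuple(((ai * bi + xi * p) * pow(S, -1, m)) % m
               for ai, bi, xi, m in zip(a2, b2, x2, base2))
    r1 = base_extend(r2, base2, base1)
    # Comparison with p (the paper performs it in a mixed-radix system).
    Q = 1 if crt_value(r2, base2) >= p else 0
    z1 = tuple((r - Q * p) % m for r, m in zip(r1, base1))
    z2 = tuple((r - Q * p) % m for r, m in zip(r2, base2))
    return z1, z2

base1, base2, p = (7, 11, 13), (17, 19, 23), 211
A, B = 123, 200
z1, z2 = rns_montgomery(encode(A, base1, base2), encode(B, base1, base2),
                        p, base1, base2)
```

The example values satisfy the conditions this sketch relies on: p is coprime to every modulus, p is smaller than the product of `base1`, and 2p is smaller than the product of `base2`, so the intermediate quotient is uniquely determined by its code over `base2`.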
5. Basic Algorithm of Montgomery Multiplication in Minimally Redundant RNS
According to the above, the implementation of the Montgomery multiplicative scheme (
11) using minimally redundant RNS results in the following operational sequence:
A necessary and sufficient correctness condition for the two-stage process of calculating
, according to (
22), (
23) and (
29), for
, is that the numbers
X and
should belong to the range
with respect to the moduli
.
Since the inequality implies the inequality , the condition guarantees the fulfillment of the condition .
Now, let us explore the necessary conditions for and p to ensure that .
Let
. Since, according to (
12),
, then from (
21) we obtain the inequality
From (
32), it follows that
only when
.
On the other hand, the numbers
X and
should be the elements of the range
concerning moduli
. Therefore, taking into account that
and
, this requirement leads to the following inequalities:
and
Thus, in the case when
and
, the correctness of the proposed MM scheme is ensured by the conditions (
33) and (
34).
As already noted, the magnitude comparison of the integers
and
p is performed using MRS, resulting in
(see (
10)). Therefore, the correctness of the reverse conversion
is ensured by using the RNS number range that includes the range
. As follows from (
33) and (
34), the number range
significantly exceeds the range
. To determine the integer factor
Q, according to (
10), it is sufficient to use the truncated RNS defined by the moduli set
with the number range
.
Based on the above, the following statement is true.
Theorem 1. Let the moduli of the sets and , together with the modulus p, meet the conditions
and let the operands A, B of the multiplicative operation belong to the ring . Then, the integer (see (21)) is an element of the ring . At the same time, the integer is calculated using the equality
A similar result can also be obtained in the case when A and B belong to the range . Therefore, by analogy to (32), the following inequality
leads to the condition
Hence, the following statement is true.
Theorem 2. Let the moduli of the sets and , together with the modulus p, meet the conditions
and let the operands A, B of the multiplicative operation belong to the ring . Then, the integer (see (21)) also belongs to the ring . At the same time, the integer is calculated according to (36).
Based on the optimized Montgomery multiplicative scheme (31) in minimally redundant RNS, an algorithm for multiplication over a large modulus is synthesized. This algorithm has a tabular configuration and possesses all the fundamental advantages of minimally redundant residue arithmetic, which are most apparent when operating on large number ranges. The developed algorithm is suitable for both software realization and implementation using parallel modular computational structures with a multiprocessor architecture.
Considering the aforementioned discussion, the implementation of MM in a minimally redundant RNS can be formally presented as Algorithm 1.
This algorithm’s key distinguishing feature is its inherent suitability for lookup table techniques. The required set of lookup tables is generated during the preliminary stage, which minimizes the computational effort of the real-time MM implementation.
Algorithm 1: Montgomery multiplication in minimally redundant RNS |
Input: , , Output: - 1
, - 2
, , - 3
- 4
- 5
, - 6
, - 7
- 8
- 9
- 10
- 11
,
|
6. Discussion
Let us now discuss the theoretical and practical aspects of the proposed approach.
An analysis of well-established approaches to implementing Montgomery multiplication in RNS reveals that two primary methods are commonly employed for computing the rank, an essential step in performing the critical base extension operations [8,9,12,14,15,16,17].
The first method enables the determination of the integer value of an RNS number by computing an integer correction factor, which essentially corresponds to the rank of the number. In this case, the implementation of the CRT yields an exact integer reconstruction. This approach involves the inclusion of an additional redundant modulus , which depends on the number k of primary RNS moduli, into the original moduli-set .
As a result, this method requires an additional modular channel with a width of bits, along with redundant lookup tables.
The main idea of an alternative approach is to represent the rank as the integer part of a sum of at most k proper fractions of the form , where . The rank value is then estimated recursively by approximating these fractions. To eliminate division by the modulus , each denominator is replaced by a power of 2, and the corresponding numerator is approximated using its most significant bits. Since division by powers of 2 is equivalent to simple bit shifts, the rank computation is reduced to a sequence of additions.
The main drawbacks of this method are as follows. First, it relies on recursive computations, as the algorithm evaluates the rank value in a sequential, bit-by-bit manner. Moreover, the number of required iterations depends on the bit length used for the approximation. As a result, this method leads to significantly prolonged computation times when dealing with large numbers. Additionally, due to the approximate nature of the algorithm, it does not always produce the correct rank value.
6.1. Computational Costs of Proposed MM Algorithm
In RNS arithmetic, the implementation of MM (see (31)) reduces to performing modular operations on small bit-length residues with respect to the moduli of the sets and using the lookup table technique.
To assess computational efficiency, we compare the computational costs presented in
Table 1, where
and
represent the number of required modular operations at the corresponding steps of Algorithm 1 both in minimally redundant RNS and conventional non-redundant RNS, respectively. Additionally,
and
denote the mixed-radix representations of the integers
and
p, respectively.
Computing the modular code of the number requires modular operations in a minimally redundant RNS, and k operations in a conventional RNS.
Computing the modular code of the number with respect to the moduli in the set requires modular operations in a minimally redundant RNS, and l operations in a conventional RNS.
The modular operation costs required to compute the rank
of the number
X with respect to the moduli in the set
are based on the results presented in [
27], and amount to
l and
modular operations, respectively (see Equations (
14)–(
18)).
The base extension operation
, which extends the modular code of the number
X from the moduli set
to
, requires
modular operations in both minimally redundant and conventional RNS implementations (see Equation (
19)).
Computing the modular code of the number with respect to the moduli set , while taking into account the redundant modulus , requires modular operations in a minimally redundant RNS and operations in a conventional RNS (see Equations (19)–(23)).
Similarly, computing the rank
of the number
with respect to the moduli set
requires
modular operations in a minimally redundant RNS, and
operations in a conventional RNS (see Equations (
25)–(
28) and [
27]).
The base extension operation
, which maps the modular code of the number
from
back to
, also requires
modular operations in both minimally redundant and conventional RNS implementations (see Equation (
29)).
Calculating the mixed-radix representation
of the number
requires
modular operations, as shown in [
28].
Comparing the numbers and in the MRS requires l modular operations.
In the final step, computing the modular code of the number
, taking into account the redundant modulus
, requires
modular operations in a minimally redundant RNS, and
k operations in a conventional RNS (see Equation (
29)).
Without loss of generality and for simplicity, let us assume that the moduli of the sets and are selected such that .
Under these assumptions, when employing a minimally redundant RNS, the total number of required modular operations, as derived from
Table 1, is given by
In contrast, for a conventional non-redundant RNS, the required number of modular operations is
Hence, when
A,
B and
belong to
(see Theorem 1), the reduction factor in computational complexity for implementing MM in a minimally redundant RNS, compared to a conventional non-redundant RNS, is given by
For instance, the computed values of for specific values of l are as follows: , , and . Furthermore, as l increases, the factor continues to grow, asymptotically approaching a limiting value of 1.40.
6.2. Basic Exponentiation Algorithm by Large Modulus
It is well known that MM is primarily used for computing powers of natural numbers with respect to a large modulus
p:
where the numbers
X and
E are represented by their residue and binary codes, respectively:
and
, where
=1 and
b denotes the bit-length of
E.
The well-known method for computing (
43) relies on the multiplicative decomposition of the exponentiation function:
All multiplicative operations in (
44) are performed within the framework of scheme (
11), which allows the reuse of previously computed results as subsequent operands. In this case, the fact that the number
belongs to the ring
when
is ensured by condition (
39). Consequently, the correction operation (
36) needs to be applied only at the final step of the computation in accordance with (
44).
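The way MM results are reused as operands during exponentiation can be sketched in Python with plain integers. The sketch uses the common left-to-right binary method and an explicit conversion to and from the scaled (Montgomery) form; both choices are assumptions made for illustration and may differ from the decomposition (44) used in the paper. The function names and numeric values are hypothetical.

```python
def mont_mul(A, B, p, S):
    """A * B * S^{-1} mod p for 0 <= A, B < p < S with gcd(S, p) = 1."""
    Y = (-pow(p, -1, S)) % S          # long-term precomputed constant
    X = (A * B * Y) % S
    R_hat = (A * B + X * p) // S      # exact division by S
    return R_hat - p if R_hat >= p else R_hat

def mont_pow(X, E, p, S):
    """X**E mod p using only Montgomery multiplications in the loop."""
    x_bar = (X * S) % p               # scaled form of X
    acc = S % p                       # scaled form of 1
    for bit in bin(E)[2:]:            # exponent bits, most significant first
        acc = mont_mul(acc, acc, p, S)
        if bit == "1":
            acc = mont_mul(acc, x_bar, p, S)
    return mont_mul(acc, 1, p, S)     # remove the scaling factor

p, S = 97, 128                        # hypothetical small example values
power = mont_pow(5, 117, p, S)
```

Every intermediate value stays below p here because each call applies the final comparison; the paper's conditions instead allow operands and results in a wider ring so that the correction is needed only once, at the final step.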
In a minimally redundant RNS, the implementation of MM when
and
belong to
requires
modular operations (see
Table 1).
In contrast, in a conventional non-redundant RNS, the number of required modular operations is given by
Thus, in the considered case, based on (
45) and (
46), the computational complexity reduction factor is expressed as
For example, the values of
for specific values of
l are as follows:
,
, and
. Furthermore, as
l increases, the reduction factor
(see (
47)) continues to grow, asymptotically approaching the value 1.50.
It is important to note that the same reduction factor of computational complexity is also achieved when implementing the exponentiation procedure (
44).
7. Conclusions
This paper presents an efficient implementation of MM in a minimally redundant RNS. The core feature of the proposed approach consists of using an advanced rank calculation method introduced by the author in [25,27].
The key contributions of this work are as follows:
The use of minimally redundant RNS significantly reduces the computational cost of MM implementation compared to conventional non-redundant RNS. Moreover, the complexity of rank computation for the moduli sets and is reduced by a factor of and , respectively. This optimization improves the efficiency of base extension operations and , as well as the overall MM procedure;
The conditions used to choose the moduli sets and with respect to the operating modulus p are established (see Theorems 1 and 2). These conditions ensure the correctness of multiple accesses to the MM procedure when both operands and results reside in the ring . This property is particularly beneficial for implementing modular exponentiation procedures based on MM;
A novel MM algorithm for minimally redundant RNS is introduced. This algorithm is well suited for efficient implementation using lookup table techniques. Moreover, the most computationally demanding operations required for generating lookup tables are organized as a preprocessing step, executed independently before the main computational procedure;
The computational costs of the proposed MM scheme in a minimally redundant RNS are analyzed and compared with those of a non-redundant RNS. The results demonstrate that, when operands and results belong to the ring , the minimally redundant RNS implementation requires fewer modular operations by a factor that approaches 1.5 as the number of moduli grows. This advantage also extends to the modular exponentiation procedure based on MM.
In summary, the intrinsic parallelism and tabular nature of minimally redundant RNS arithmetic, combined with the simplicity of rank computation and base extension operations, significantly enhance the performance of Montgomery multiplication-based algorithms.