1. Introduction
Nowadays, high-performance computing is progressing extremely rapidly. This places qualitatively new demands on number-theoretic methods and computational algorithms. That is why creating fundamentally new and efficient computing tools for fast and reliable parallel data processing is especially important. Modular computational structures occupy a special place among them. Modular arithmetic, i.e., the arithmetic of the residue number system (RNS), forms their mathematical basis.
The inherent parallelism and carry-free properties of RNS provide a high potential for accelerating arithmetic operations compared with conventional weighted number systems (WNS). The main advantage of RNS lies in its ability to decompose large integers into a set of small residues and to process them in parallel in independent modular channels.
A steadily growing interest in RNS arithmetic as a means of high-speed computation stimulates developments aimed at a fundamentally new performance level for large volumes of time-consuming calculations. Modular arithmetic has attracted considerable attention from researchers and developers in number-theoretic methods [1,2,3], computer technology [4,5], digital signal and image processing [3,5,6,7,8], cryptography [5,8,9,10], computer networks and communication systems [3,5], and other areas [9].
In this regard, one of the most promising directions in this area is the development of high-speed parallel modular computational structures, together with the enhancement of their functionality and their optimization. Here, the main optimization criteria are minimum redundancy of data coding, minimum execution time of the implemented computational procedures, and maximum throughput of the corresponding computational structures.
As is known, unlike a WNS code, the residue code of a number does not explicitly contain information about its integer value. Therefore, in RNS arithmetic, operations that require estimating the integer value of a number from its residue code, i.e., evaluating the position of the number in the operating range, encounter specific difficulties. Such operations, in contrast to modular ones, are called non-modular.
The positional characteristics of the residue code, such as the core function, the rank of a number, and the interval index, together with the associated forms of number representation, are of great importance for designing algorithms of non-modular operations [1,5,7,8]. The computational complexity of calculating the chosen positional characteristic ultimately determines the efficiency of the corresponding configuration of RNS arithmetic.
Division is one of the most complex arithmetic operations. Even in computers operating in a positional system, this operation stands apart, and its execution requires much more time than most elementary operations. In RNS arithmetic, the difficulties with division stem from its non-modular character: the residue of the quotient with respect to a primary RNS modulus is not determined by the dividend and divisor residues alone, so additional information about the integer values of the dividend and divisor is needed in one form or another [1,7]. It is no coincidence that many publications are devoted to the problem of modular division, for example [11,12,13,14,15,16,17,18,19].
Along with general division, modular scaling, i.e., the division of an RNS number by a constant, is a commonly used operation [3,5,8,10]. This operation plays a fundamental role in constructing residue arithmetic algorithms and is of great practical importance. Scaling is required in several tasks, for example, to round floating-point numbers whose mantissas are in residue representation and to reduce the dynamic range in digital signal processing and long-word-length cryptography. In addition, scaling by a power of two is often an integral step of more complex non-modular operations, for example, in the method of general modular division [19,20].
Thus, developing novel approaches and methods for fast scaling is highly important in high-performance computing based on parallel algorithmic structures of RNS, especially for high-speed implementation of digital signal processing applications and public-key cryptosystems. This should make it possible to use modular arithmetic widely in various priority areas of science and technology.
In this paper, we present a new approach to power-of-two scaling based on the minimally redundant residue code, the rank form of a number, and fast calculation of the rank characteristic at each iteration of the scaling procedure. Compared with conventional non-redundant RNS, the proposed method makes it possible to optimize and speed up the non-modular scaling operation while substantially reducing its computational complexity.
The paper is structured as follows. Section 2 discusses the basic theoretical concepts of the research. Section 3 presents the known approaches to rank calculation. Section 4 and Section 5 describe, respectively, the RNS scaling algorithms and the mathematical basis of the rank calculation in the bisection scaling method. Section 6 presents a novel power-of-two scaling algorithm and a numerical example. Section 7 provides a discussion, and Section 8 concludes the paper.
  2. The Basic Concepts of RNS Arithmetic
Abstract algebra and number theory [21,22] constitute the theoretical foundation of RNS arithmetic. Traditionally, the apparatus of congruences is used for the mathematical formalization of an RNS with integer ranges. Euclid’s Division Lemma plays a fundamental role in building an RNS of this type. For the ring $\mathbb{Z}$ of integers, it is formulated as follows.
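Lemma 1 (Euclid’s Division Lemma). For any $X \in \mathbb{Z}$ and a positive integer $m$, there exists a unique pair of integers $q, r$ such that
$$X = qm + r, \tag{1}$$
where $0 \le r < m$; the remainder $r$ is denoted $|X|_m$.
On the set $\mathbb{Z}$ of integers, a non-redundant RNS is defined using pairwise prime moduli $m_1, m_2, \ldots, m_k$ by the mapping $\Phi: \mathbb{Z} \to \mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_k}$, which assigns to each $X \in \mathbb{Z}$ the $k$-tuple $(\chi_1, \chi_2, \ldots, \chi_k)$ of least nonnegative residues $\chi_i = |X|_{m_i}$ of dividing $X$ by $m_i$ ($i = 1, 2, \ldots, k$). The notation $X = (\chi_1, \chi_2, \ldots, \chi_k)$ is used.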
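The residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ corresponds to the set of all integers $X$ satisfying the system of simultaneous linear congruences
$$X \equiv \chi_i \pmod{m_i}, \quad i = 1, 2, \ldots, k. \tag{2}$$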
The following statement is true [9,23,24].
Theorem 1 (Chinese Remainder Theorem). Let the moduli $m_1, m_2, \ldots, m_k$ be pairwise prime, and let $M_k = \prod_{i=1}^{k} m_i$, $M_{i,k} = M_k/m_i$, $\mu_{i,k} = |M_{i,k}^{-1}|_{m_i}$ ($i = 1, 2, \ldots, k$). Then the system of congruences (2) has a unique solution, the class of residues modulo $M_k$, defined by the congruence
$$X \equiv \sum_{i=1}^{k} M_{i,k}\,\mu_{i,k}\,\chi_i \pmod{M_k}. \tag{3}$$
The practical application of an RNS assumes that each residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ corresponds to exactly one integer. Therefore, certain sets of representatives of residue classes are used as the number range to ensure the required one-to-one correspondence. Since the given RNS can represent $M_k$ integers, the set $\mathbb{Z}_{M_k} = \{0, 1, \ldots, M_k - 1\}$ is usually used in computer applications as the RNS operating range.
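Because of the above, we define modular coding as a mapping $\Phi: \mathbb{Z}_{M_k} \to \mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_k}$, which assigns a residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ to each $X \in \mathbb{Z}_{M_k}$.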
The decoding mapping $\Phi^{-1}$ based on the CRT (3) executes according to the rule
$$X = \left| \sum_{i=1}^{k} M_{i,k}\,\mu_{i,k}\,\chi_i \right|_{M_k}. \tag{4}$$
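Applying Euclid’s Division Lemma (1), we can write
$$\mu_{i,k}\,\chi_i = \left\lfloor \frac{\mu_{i,k}\,\chi_i}{m_i} \right\rfloor m_i + \chi_{i,k}, \tag{5}$$
where $\chi_{i,k}$ is a normalized residue modulo $m_i$:
$$\chi_{i,k} = |\mu_{i,k}\,\chi_i|_{m_i}, \tag{6}$$
and $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.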
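Substituting (5) into (4), and taking into consideration (6) and $M_{i,k}\,m_i = M_k$, we have
$$X = \left| \sum_{i=1}^{k} M_{i,k}\,\chi_{i,k} + M_k \sum_{i=1}^{k} \left\lfloor \frac{\mu_{i,k}\,\chi_i}{m_i} \right\rfloor \right|_{M_k},$$
which is equivalent to
$$X = \left| \sum_{i=1}^{k} M_{i,k}\,\chi_{i,k} \right|_{M_k}. \tag{7}$$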
Since the summands in (7) have narrower ranges of variation, the use of (7), which is a normalized analog of (4), is preferable for constructing RNS arithmetic.
Equation (7) is called the CRT-form of representing the integer $X$ from the RNS number range $\mathbb{Z}_{M_k}$.
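As an illustration of (3)–(7), the following minimal Python sketch implements the forward mapping and decoding through the normalized residues $\chi_{i,k}$. The helper names (`crt_constants`, `encode`, `normalized_residues`, `decode`) and the moduli 3, 5, 7, 11 are our own illustrative choices, not part of the original presentation. The final reduction modulo $M_k$ is done here with Python's big integers; in hardware, this is exactly the costly step that the rank form is meant to avoid.

```python
from math import prod

def crt_constants(moduli):
    """M_k, the cofactors M_{i,k} = M_k / m_i, and mu_{i,k} = |M_{i,k}^{-1}|_{m_i}."""
    M = prod(moduli)
    Mi = [M // m for m in moduli]
    mu = [pow(a, -1, m) for a, m in zip(Mi, moduli)]  # modular inverse (Python 3.8+)
    return M, Mi, mu

def encode(X, moduli):
    """Forward mapping: least nonnegative residues chi_i = |X|_{m_i}."""
    return [X % m for m in moduli]

def normalized_residues(chi, moduli):
    """Normalized residues chi_{i,k} = |mu_{i,k} * chi_i|_{m_i}, as in (6)."""
    _, _, mu = crt_constants(moduli)
    return [(u * c) % m for u, c, m in zip(mu, chi, moduli)]

def decode(chi, moduli):
    """CRT-form (7): X = |sum_i M_{i,k} * chi_{i,k}|_{M_k}."""
    M, Mi, _ = crt_constants(moduli)
    nrm = normalized_residues(chi, moduli)
    return sum(a * n for a, n in zip(Mi, nrm)) % M

moduli = [3, 5, 7, 11]                 # illustrative odd, pairwise prime moduli
print(encode(1000, moduli))            # [1, 0, 6, 10]
print(decode([1, 0, 6, 10], moduli))   # 1000
```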
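The mapping $\Phi$ is an isomorphism with respect to the basic arithmetic operations. The operation $\circ \in \{+, -, \times\}$ on arbitrary elements $A$ and $B$ given by their residue codes $(\alpha_1, \alpha_2, \ldots, \alpha_k)$ and $(\beta_1, \beta_2, \ldots, \beta_k)$ is carried out by the rule
$$A \circ B = \left( |\alpha_1 \circ \beta_1|_{m_1}, |\alpha_2 \circ \beta_2|_{m_2}, \ldots, |\alpha_k \circ \beta_k|_{m_k} \right), \tag{8}$$
where $\alpha_i = |A|_{m_i}$ and $\beta_i = |B|_{m_i}$ ($i = 1, 2, \ldots, k$).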
In RNS, according to (8), modular addition, subtraction, and multiplication are performed independently for each modulus $m_i$. Note that (8) is correct only if the result $A \circ B$ of the arithmetic operation does not go beyond the RNS number range, i.e., if $0 \le A \circ B < M_k$.
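Rule (8) and its range condition can be checked with a short Python sketch. The moduli 3, 5, 7 (so $M_k = 105$) and the function names are illustrative assumptions; the plain CRT decoder is included only to inspect the results.

```python
import operator
from math import prod

MODULI = [3, 5, 7]          # illustrative pairwise prime moduli (our choice)
M = prod(MODULI)            # RNS range [0, M) = [0, 105)

def encode(x):
    return [x % m for m in MODULI]

def decode(chi):
    # plain CRT reconstruction, used here only to check the results
    total = 0
    for c, m in zip(chi, MODULI):
        Mi = M // m
        total += Mi * ((c * pow(Mi, -1, m)) % m)
    return total % M

def rns_op(a, b, op):
    # rule (8): every channel is processed independently, with no carries between them
    return [op(x, y) % m for x, y, m in zip(a, b, MODULI)]

A, B = encode(20), encode(4)
print(decode(rns_op(A, B, operator.add)))  # 24
print(decode(rns_op(A, B, operator.sub)))  # 16
print(decode(rns_op(A, B, operator.mul)))  # 80
# 20 * 6 = 120 leaves [0, 105): the residues now represent 120 - 105 = 15
print(decode(rns_op(A, encode(6), operator.mul)))  # 15
```

The last line shows why the condition $0 \le A \circ B < M_k$ matters: the channels themselves never signal the overflow.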
The inherent code parallelism of RNS illustrated by (8), which consists in decomposing arithmetic operations on integers $A$ and $B$ into independent operations of small word length on the like digits $\alpha_i$ and $\beta_i$ of the residue codes, is the main advantage of modular arithmetic over the arithmetic of weighted number systems (WNS). Realizing this advantage to the fullest extent is a key strategic goal of all computer applications of RNS.
As is known, in contrast to the positional code, the residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ of the number $X$ does not explicitly contain information about its value. Therefore, implementing in RNS arithmetic the operations that require calculating so-called positional characteristics, which give information about the location of numbers in the RNS range $\mathbb{Z}_{M_k}$, encounters specific difficulties. Such procedures, in contrast to modular ones, are called non-modular.
The efficiency of RNS arithmetic is determined, to a decisive extent, by the optimality of the non-modular procedures applied. The factor with the greatest impact on the quality of algorithms for non-modular operations is the computational complexity of calculating the positional characteristics of the residue code and the related forms of integer representation.
As for Equation (7), its direct application as the general form of integers for building non-modular procedures is practically impossible owing to the complexity of a straightforward implementation, especially for large $M_k$. At the same time, specific positional characteristics enable us to obtain from (7) relevant forms of integer representation that have good implementation properties and make it possible to overcome the problem of time-consuming addition modulo $M_k$.
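As follows from (7), the difference
$$\sum_{i=1}^{k} M_{i,k}\,\chi_{i,k} - X$$
is a multiple of $M_k$. Hence, the following equality holds:
$$X = \sum_{i=1}^{k} M_{i,k}\,\chi_{i,k} - \rho_k(X)\,M_k. \tag{9}$$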
The positional characteristic $\rho_k(X)$ is called the rank of the number $X$. In essence, the rank $\rho_k(X)$ is a CRT reconstruction coefficient that indicates how many times the upper bound $M_k$ of the number range is exceeded when the integer value of $X$ is calculated from its residue code $(\chi_1, \chi_2, \ldots, \chi_k)$.
Equation (9) is called a rank form of the integer X.
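From (9), it also follows that the rank $\rho_k(X)$ is the quotient of the integer division of $\sum_{i=1}^{k} M_{i,k}\,\chi_{i,k}$ by $M_k$; since $0 \le X < M_k$,
$$\rho_k(X) = \left\lfloor \sum_{i=1}^{k} \frac{\chi_{i,k}}{m_i} \right\rfloor. \tag{10}$$
Therefore, since $0 \le \chi_{i,k} \le m_i - 1$, the inequality $0 \le \rho_k(X) \le k - 1$ holds.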
Compared with (7), Equation (9) does not contain the time-consuming reduction modulo $M_k$. Therefore, designing non-modular procedures in RNS arithmetic on the basis of the rank form has a substantial advantage over the canonical CRT implementation.
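The rank form can be checked exhaustively on a small RNS. The Python sketch below, with illustrative moduli 3, 5, 7, 11 and a hypothetical helper `rank_and_normalized`, computes $\rho_k(X)$ both as the integer quotient by $M_k$ and as the integer part of the sum of proper fractions, then verifies (9) and the bound $0 \le \rho_k(X) \le k-1$ for every $X$ in the range. It is a reference check built on big-integer arithmetic, not an efficient rank algorithm.

```python
from fractions import Fraction
from math import floor, prod

def rank_and_normalized(X, moduli):
    """Return (rho_k(X), [chi_{i,k}]) computed directly from definitions (9)-(10)."""
    M = prod(moduli)
    Mi = [M // m for m in moduli]
    nrm = [((X % m) * pow(a, -1, m)) % m for a, m in zip(Mi, moduli)]  # (6)
    rho = sum(a * n for a, n in zip(Mi, nrm)) // M        # quotient by M_k
    # same value as the integer part of the sum of proper fractions (10)
    assert rho == floor(sum(Fraction(n, m) for n, m in zip(nrm, moduli)))
    return rho, nrm

moduli = [3, 5, 7, 11]
M = prod(moduli)
for X in range(M):
    rho, nrm = rank_and_normalized(X, moduli)
    # rank form (9): once rho is known, no reduction modulo M_k is needed
    assert X == sum((M // m) * n for m, n in zip(moduli, nrm)) - rho * M
    assert 0 <= rho <= len(moduli) - 1
print(rank_and_normalized(1000, moduli))   # (1, [1, 0, 5, 9])
```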
  3. The Approaches Currently Used to Calculate the Rank of a Number
The rank of a number as the main integral characteristic of RNS was first studied in [1] and later in [2]. The rank evaluation algorithm there is a slow $k$-step iterative procedure of sequential additions, modulo a large number, of specific constants defined by the chosen RNS moduli set $\{m_1, m_2, \ldots, m_k\}$. Moreover, the upper bound of the rank depends on the values of the weights $M_{i,k}\,\mu_{i,k}$ (see (4)) and can be rather large for most moduli sets suitable for practical use. If we assume that processing such long $L$-bit numbers is comparable in time with $k$ operations on small residues, then the complexity of this method is of the order of $k^2$ operations on small residues. Because of that, this approach to rank calculation is time-consuming and practically unacceptable for high-performance computing, especially for large dynamic ranges $M_k$.
The so-called “extra modulus method” for rank calculation was proposed in [25]. It rearranges the canonical CRT implementation into an exact integer equation of the same form as (9). To retrieve the value of the CRT reconstruction coefficient, i.e., the rank of a number, the extra modulus $m_e$ must satisfy the following conditions: $m_e \ge k$, and $m_e$ is any integer coprime to $M_k$. In this way, the slow and challenging addition modulo $M_k$ in the straightforward CRT implementation is replaced by subtraction and multiplication modulo $m_e$, so that an extra modular channel is used for rank calculation. This method works correctly when the proper redundant residue $|X|_{m_e}$ is available; hence, it is suitable for the base extension operation. However, when the number under consideration results from a modular addition or subtraction [26], the method cannot be used owing to possible overflow or underflow, respectively, since the exact value of $|X|_{m_e}$ is then not available. Therefore, this method is not applicable to sign determination and magnitude comparison of two numbers in RNS.
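The idea of the extra modulus method can be sketched in a few lines of Python: reading (9) modulo $m_e$ gives $\rho_k(X)\,M_k \equiv \sum_i M_{i,k}\chi_{i,k} - |X|_{m_e} \pmod{m_e}$, and since $0 \le \rho_k(X) \le k-1 < m_e$, the rank is recovered exactly. The function name, the moduli 3, 5, 7, 11, and $m_e = 4$ are illustrative assumptions; this is a simplified rendering of the idea, not the exact procedure of [25].

```python
from math import gcd, prod

def rank_extra_modulus(chi, chi_e, moduli, m_e):
    """Rank from the redundant residue chi_e = |X|_{m_e} (extra-modulus idea).
    Requires m_e >= k and gcd(m_e, M_k) == 1, so that 0 <= rank < m_e."""
    M = prod(moduli)
    assert m_e >= len(moduli) and gcd(m_e, M) == 1
    s = 0
    for c, m in zip(chi, moduli):
        Mi = M // m
        nrm = (c * pow(Mi, -1, m)) % m          # chi_{i,k}: small-residue work only
        s = (s + (Mi % m_e) * nrm) % m_e        # accumulate modulo m_e, not modulo M_k
    # (9) read modulo m_e: rank * M_k = sum - X (mod m_e)
    return ((s - chi_e) * pow(M % m_e, -1, m_e)) % m_e

moduli, m_e = [3, 5, 7, 11], 4
X = 1000
print(rank_extra_modulus([X % m for m in moduli], X % m_e, moduli, m_e))   # 1
```

If `chi_e` came from an addition whose true result exceeded $M_k$, it would be the residue of the unreduced sum rather than of the wrapped value, and the returned rank would be wrong; this is the limitation described above.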
A different approach to evaluating the CRT reconstruction coefficient is proposed in [27,28,29]. The main idea of the so-called “fractional domain method” is to represent the reconstruction coefficient $r$ as the integer part of a sum of at most $k$ proper fractions (see (10)). The value $r$ is recursively estimated by approximating the fractions $\chi_{i,k}/m_i$. To avoid division by the modulus $m_i$, the denominator $m_i$ is replaced by a power of two, while the numerator is approximated by its most significant bits. Since division by a power of 2 is equivalent to a simple shift, $r$ can be calculated using additions only.
The main drawbacks of this method are as follows. First, full-precision fractional computations are required. Such calculations are slower than operations on smaller word lengths, and the full-precision fractional bits require substantial storage. Second, the number of iterations required is of the order of the bit length needed for the approximation. For example, the method employing a fractional interpretation of the CRT [27] needs very high precision. The method proposed in [28,29] evaluates the reconstruction coefficient $r$ bit by bit, and this iterative structure makes it very slow for large word-length numbers.
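The precision issue can be seen in a simplified Python sketch of the fractional idea: each fraction $\chi_{i,k}/m_i$ is truncated to $t$ fractional bits, the truncated values are added, and the integer part is taken by a shift. Because each truncation loses less than $2^{-t}$ and the fractional part of the exact sum equals $X/M_k$, the estimate is $r$ or $r-1$ when $2^t \ge k$, and it is exact whenever $X \cdot 2^t \ge k M_k$. The function name, moduli, and $t = 8$ are illustrative assumptions; this is not the exact algorithm of [27,28,29].

```python
from math import prod

def rank_fractional(chi, moduli, t):
    """Estimate r = floor(sum chi_{i,k} / m_i) using t-bit truncated fractions.
    Simplified illustration of the fractional approach, not the method of [27-29]."""
    M = prod(moduli)
    acc = 0
    for c, m in zip(chi, moduli):
        nrm = (c * pow(M // m, -1, m)) % m      # chi_{i,k}
        acc += (nrm << t) // m                  # floor(chi_{i,k} * 2^t / m_i), precomputable
    return acc >> t                             # integer part by a shift, no division

moduli = [3, 5, 7, 11]
print(rank_fractional([1000 % m for m in moduli], moduli, 8))   # 1 (exact rank is 1)
print(rank_fractional([1 % m for m in moduli], moduli, 8))      # 0, while the exact rank of 1 is 1
```

For small $X$ the fractional part $X/M_k$ of the exact sum is tiny, so only a precision of roughly $\log_2(k M_k)$ bits guarantees exactness over the whole range.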
There are also approaches to reconstructing the integer value of an RNS number based on the CRT that use special moduli sets with a limited number of moduli [5,8]. The main drawback of these methods is the small number of moduli, typically from three to five. Such moduli sets are suitable for efficient implementations of digital signal processing algorithms but are not applicable to the processing of the large numbers widely used in cryptography.
In recent decades, the CRT algorithm, corresponding forms of number representation, and the methods of integer reconstruction by residue code have been intensively studied, especially concerning their application in high-performance computing. The major efforts are aimed at reducing the computational complexity of calculating the main integral characteristics of residue code.
There are some new approaches for calculating an approximate value of the rank of a number that reduce the computational complexity of complicated non-modular operations in RNS arithmetic [30,31,32]. The method proposed in [30] is based on the so-called interval floating-point characteristic, which provides information about the range of variation of the relative value of an RNS representation. In general, it enables efficient magnitude comparison, sign determination, and overflow detection. The concept of an approximate value of the rank of a number is introduced in [31]. This approach reduces the computational complexity of decoding from residue code to binary representation and decreases the size of the required coefficients. Based on the properties of the approximate value and the arithmetic properties of RNS, a new method for error detection, correction, and control of computational results has been proposed. In [32], an original general-purpose technique for CRT basis extension and scaling in RNS, using floating-point arithmetic for rank estimation, is proposed for a homomorphic encryption scheme. The main algorithmic improvements focus on optimizing decryption and homomorphic multiplication in RNS, using the CRT to represent and manipulate the large coefficients of the ciphertext polynomials.
The rank positional characteristic has been thoroughly investigated in [33,34]. As shown there, the rank $\rho_k(X)$ has a simple structure, high modularity of calculation, and a small range of variation. Moreover, the rank $\rho_k(X)$ is the sum of two small numbers, namely, the inexact rank $\hat{\rho}_k(X)$ and the two-valued rank correction $\Delta_k(X) \in \{0, 1\}$:
$$\rho_k(X) = \hat{\rho}_k(X) + \Delta_k(X), \tag{11}$$
      where
      
      and
      
In conventional non-redundant RNS, as follows from (11)–(14), calculating the inexact rank $\hat{\rho}_k(X)$ reduces to summing $k$ small residues modulo a fixed modulus while taking into account the number of overflows occurring during the modular additions. At the same time, as demonstrated in [34], the main computational cost is associated with estimating the rank correction $\Delta_k(X)$. Its evaluation requires concurrent modular addition operations in all independent modular channels corresponding to the primary RNS moduli $m_1, m_2, \ldots, m_k$
. These computations can be implemented easily by the pre-computation and lookup table techniques. As a result, the total number of required modular addition operations and lookup tables for rank 
 calculation are 
 and 
, respectively.
As shown in [34], the minimally redundant residue code enables optimization of the rank calculation. It extends the non-redundant residue code $(\chi_1, \chi_2, \ldots, \chi_k)$ of the number $X$ by the redundant residue $\chi_0 = |X|_2$ with respect to the extra modulus 2, i.e., by adding the parity of $X$ to its residue representation. Therefore, in minimally redundant RNS, the number $X$ is represented by its minimally redundant residue code, consisting of $\chi_1, \chi_2, \ldots, \chi_k$ and $\chi_0$, so the total residue code length increases by only one bit.
The main advantage of minimally redundant RNS over non-redundant analogs is a significant simplification of calculating the rank correction $\Delta_k(X)$ and, accordingly, the rank $\rho_k(X)$.
The use of the minimally redundant residue code makes it possible to replace the rank correction $\Delta_k(X)$ in (11), whose evaluation is time-consuming and requires addition operations in all modular channels, with a trivially calculated binary attribute. At the same time,
      
      and
      
      where 
, 
, and 
.
Compared with non-redundant analogs, the use of minimally redundant RNS enables us to reduce significantly the complexity of the rank  calculation both in terms of required modular addition operations and lookup tables. At the same time, the corresponding computational cost is k modular addition operations and k one-input lookup tables. The time complexity depends only on the number of primary RNS moduli and equals  modular clock cycles.
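Why the parity bit makes the correction trivial can be checked numerically. Because all primary moduli are odd, every $M_{i,k}$ and $M_k$ is odd, so reading (9) modulo 2 gives $\rho_k(X) \equiv \chi_0 + \sum_{i} \chi_{i,k} \pmod 2$; an estimate known to equal either $\rho_k(X)-1$ or $\rho_k(X)$ can therefore be corrected by one parity comparison. The Python sketch below assumes this two-valued property and uses a floor-based inexact rank of our own (each $\chi_{i,k}/m_i$ rounded down to a multiple of $1/m_k$, which keeps the total error below 1 when $m_k \ge k-1$). It is a hypothetical stand-in and need not coincide with (12)–(16).

```python
from math import prod

def rank_min_redundant(chi, chi0, moduli):
    """Rank rho_k(X) from the minimally redundant code (chi_1..chi_k, chi0 = |X|_2).

    Assumes odd, pairwise prime moduli with m_k >= k - 1. The inexact rank used
    here is a hypothetical floor-based estimate, not the paper's (12)-(14).
    """
    k, M, mk = len(moduli), prod(moduli), moduli[-1]
    assert all(m % 2 == 1 for m in moduli) and mk >= k - 1
    nrm = [(c * pow(M // m, -1, m)) % m for c, m in zip(chi, moduli)]  # chi_{i,k}
    # Inexact rank: each chi_{i,k}/m_i rounded down to a multiple of 1/m_k, so the
    # estimate is rho_k(X) - 1 or rho_k(X) (total rounding error below 1).
    acc = sum((n * mk) // m for n, m in zip(nrm, moduli))
    rho_hat = acc // mk                       # overflows of the sum modulo m_k
    # M_{i,k} and M_k are odd, so (9) modulo 2 gives rho = chi0 + sum(chi_{i,k}) (mod 2)
    parity = chi0
    for n in nrm:
        parity ^= n & 1                       # least significant bits only
    delta = (rho_hat & 1) ^ parity            # 1 exactly when rho_hat has the wrong parity
    return rho_hat + delta

moduli = [3, 5, 7, 11]
print(rank_min_redundant([1000 % m for m in moduli], 1000 & 1, moduli))  # 1
print(rank_min_redundant([1 % m for m in moduli], 1 & 1, moduli))        # 1
```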
As shown in [
34], the transition to the minimum redundant residue code enables a decrease in the computational complexity of the rank calculation from the order 
 to 
 concerning required modular addition operations and lookup tables. Thus, the computational complexity reduction factor increases with the number 
k of non-redundant moduli 
 and asymptotically approaches the threshold 
.
The use of minimally redundant RNS ensures significant optimization of calculating the rank $\rho_k(X)$ of the number $X$. This also applies to the implementation of the CRT algorithm and, correspondingly, to the execution of various non-modular operations based on it. This is primarily due to the extremely simple evaluation of the two-valued characteristic and to the modular structure of the main equation for the inexact rank $\hat{\rho}_k(X)$ (see (12)). This circumstance radically simplifies the calculation of the rank $\rho_k(X)$ in minimally redundant RNS compared with conventional non-redundant RNS and, consequently, makes it possible to construct faster variants of RNS arithmetic that are optimal with respect to computational complexity.
Therefore, minimally redundant residue representation is preferable to conventional non-redundant RNS arithmetic for implementing scaling procedures based on the rank form of a number.
  4. The Main Types of Scaling Algorithms in RNS Arithmetic
In conventional WNS, power-of-two scaling is performed simply by right shifting. In RNS, this procedure is substantially more difficult, because the non-positional nature of the residue code makes it hard to implement.
The classical power-of-two scaling method consists of the residue code conversion to binary representation, scaling in the conventional WNS, and converting the result back to the RNS.
Unlike the WNS code, the residue code does not contain explicit information about the integer value of the represented number. Therefore, in addition to its usual purpose of limiting the undesirable growth of calculation results, scaling in RNS is also used to detect the position of integers in a particular range (i.e., to evaluate their values), for rounding, and for solving other similar tasks. This operation is often used in more complex non-modular procedures such as general modular division. Many scaling algorithms that do not require RNS-to-binary conversion have been presented in the literature. A detailed review of the known modular scaling methods is given in [8].
The essence of the modular scaling operation is to obtain an integer approximation to the fraction $X/S$, where $X$ is an arbitrary element of the RNS number range $\mathbb{Z}_{M_k}$, and $S$ is a constant factor (scale). The fraction $X/S$ is usually approximated by the integers $\lfloor X/S \rfloor$ and $\lceil X/S \rceil$ ($\lfloor x \rfloor$ and $\lceil x \rceil$ are the floor and ceiling functions of $x$, respectively).
The most important aspect of the scaling problem in RNS is to ensure high flexibility of the created algorithmic tools. This implies adopting a set of scales, which is usually chosen according to the criterion of minimum calculation error under a given constraint on the number of scaling factors.
All known scaling techniques can be classified into four main categories:
1. scaling by the product of some RNS moduli [32,35,36,37,38];
2. scaling by an integer from the RNS number range [39,40];
3. scaling by a common fraction [41];
4. scaling by a power of two [42,43,44].
In the first group, many scaling methods take the scaling factor $S$ as a product of $l$ moduli, e.g., of the form $S = m_1 m_2 \cdots m_l$ [35,36,37,38]. This makes it easier to obtain the residues of the integer approximation to the fraction $X/S$ modulo $m_{l+1}, \ldots, m_k$. The remaining residues, modulo $m_1, \ldots, m_l$, can be calculated relatively easily with one of the base extension algorithms [2,35,45]. Owing to the small word length of the residues, pre-computation and lookup-table techniques are well suited to modular scaling.
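A minimal Python sketch of scaling by a product of moduli follows, assuming that $S$ is the product of the first $l$ moduli. The residues outside $S$ follow from $\lfloor X/S \rfloor = (X - |X|_S)/S$, and the missing residues are recovered by base extension through mixed-radix conversion, which is one of the options mentioned above (cf. [35]). The function names and moduli are our own illustration, not a specific cited design.

```python
from math import prod

def base_extend(res, mods, target):
    """Residue modulo `target` of the number Y < prod(mods) given by `res`,
    via mixed-radix conversion followed by Horner evaluation."""
    digits = []
    for j, m in enumerate(mods):
        v = res[j]
        for i in range(j):
            v = ((v - digits[i]) * pow(mods[i], -1, m)) % m
        digits.append(v)
    y = 0
    for a, m in zip(reversed(digits), reversed(mods)):
        y = (y * m + a) % target
    return y

def scale_by_moduli(chi, moduli, l):
    """Residues of floor(X / S) with S = m_1 * ... * m_l."""
    low, high = moduli[:l], moduli[l:]
    S = prod(low)
    x_mod_S = base_extend(chi[:l], low, S)        # |X|_S from the first l residues
    # channels outside S: (chi_i - |X|_S) * |S^{-1}|_{m_i}, exact since S divides X - |X|_S
    y_high = [((c - x_mod_S) * pow(S, -1, m)) % m for c, m in zip(chi[l:], high)]
    # Y < m_{l+1} * ... * m_k, so the missing residues follow by base extension
    y_low = [base_extend(y_high, high, m) for m in low]
    return y_low + y_high

moduli = [3, 5, 7, 11]
print(scale_by_moduli([1000 % m for m in moduli], moduli, 2))   # [0, 1, 3, 0], the residues of 1000 // 15 = 66
```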
In [35], the base extension algorithm uses the reverse conversion of the residue code to mixed-radix representation. The method proposed in [36] requires a redundant modulus to evaluate the CRT reconstruction coefficient, i.e., the rank of a number, to complete the base extension procedure. In [38], the suggested approach is entirely based on lookup tables, all of which have two inputs; however, the memory costs become too high when the number of moduli is sufficiently large. The method proposed in [37] performs base extension and exact scaling without any system redundancy, using only additional lookup tables.
A CRT-based technique for modular scaling by an integer has been suggested in [39]. Its main idea is to approximate the CRT relation for reconstructing the integer value of RNS numbers. This enables substituting smaller word-length modular additions for the large modulo-$M_k$ addition in the canonical CRT decoding scheme. In [40], the proposed method uses minimum redundancy for modular scaling by arbitrary positive scales. The distinctive feature of that algorithm is the use of the interval index as a positional characteristic of the residue code. The interval index can be calculated quickly and easily by modular addition of small residues in the $k$th modular channel, corresponding to the modulus $m_k$ of the RNS moduli set $\{m_1, m_2, \ldots, m_k\}$.
In the case of an arbitrary rational scale $S$, an efficient basis for modular scaling is the approach presented in [41]. Its main feature is that, for rational scales, the integers $p$ and $q$ forming the scale can take any values for which the resulting fraction does not exceed the upper bound of the RNS number range. In addition, the intermediate quantities arising in the calculation are allowed to violate this requirement.
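The scaling methods in the fourth group implement division by constants of the form $S = 2^l$, $l = 1, 2, \ldots$ [7,42,43]. General approaches to this task are based mainly on the bisection method. It consists of calculating the recurrence relation $X_j = \lfloor X_{j-1}/2 \rfloor$ for $j = 1, 2, \ldots, l$. In this case, $X_0 = X$ and $X_l = \lfloor X/2^l \rfloor$. The residue $\chi_i^{(j)} = |X_j|_{m_i}$ of the approximation $X_j$ is determined as
$$\chi_i^{(j)} = \left| \left( \chi_i^{(j-1)} - |X_{j-1}|_2 \right) \cdot |2^{-1}|_{m_i} \right|_{m_i}, \quad i = 1, 2, \ldots, k, \tag{17}$$
where all the primary moduli $m_1, m_2, \ldots, m_k$ are coprime odd numbers. The last condition ensures that 2 and $m_i$ are relatively prime and, correspondingly, that the modular multiplicative inverse of 2, i.e., the number $|2^{-1}|_{m_i} = (m_i + 1)/2$, exists. As follows from (17), scaling by 2 requires the parity detection of the number $X_{j-1}$, $j = 1, 2, \ldots, l$. So, a base extension operation to the extra modulus 2 is needed.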
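The bisection recurrence (17) in a non-redundant RNS can be sketched as follows. Here the parity $|X_{j-1}|_2$ is obtained by base extension to the modulus 2 through mixed-radix conversion; this choice of base extension is our own assumption for illustration (the algorithm of [42] uses the method of [25]). The sketch makes visible that every halving step needs a full base extension, which is the cost the minimally redundant approach avoids.

```python
from math import prod

def base_extend(res, mods, target):
    """Residue modulo `target` of Y < prod(mods), via mixed-radix conversion."""
    digits = []
    for j, m in enumerate(mods):
        v = res[j]
        for i in range(j):
            v = ((v - digits[i]) * pow(mods[i], -1, m)) % m
        digits.append(v)
    y = 0
    for a, m in zip(reversed(digits), reversed(mods)):
        y = (y * m + a) % target
    return y

def scale_pow2_nonredundant(chi, moduli, l):
    """floor(X / 2^l) by l bisection steps (17); parity found by base extension to 2."""
    assert all(m % 2 == 1 for m in moduli)          # |2^{-1}|_{m_i} = (m_i + 1) / 2
    for _ in range(l):
        p = base_extend(chi, moduli, 2)             # |X_{j-1}|_2: the expensive part
        chi = [((c - p) * ((m + 1) // 2)) % m for c, m in zip(chi, moduli)]
    return chi

moduli = [3, 5, 7, 11]
print(scale_pow2_nonredundant([1000 % m for m in moduli], moduli, 3))  # [2, 0, 6, 4], the residues of 125
```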
An iterative algorithm for scaling by the factor $2^l$ proposed in [42] is implemented in $l$ steps. The parity of the intermediate results is checked at each iteration using the base extension operation suggested in [25]. In [43], the power-of-two scaling technique is applied to realize a digital filter in quadratic RNS. The scaling algorithm presented in [44] focuses on arbitrary moduli sets with large dynamic ranges and requires only machine-precision integer and floating-point operations. It is used for the software implementation of rounding and exponent alignment procedures in a multiple-precision RNS-based arithmetic library for parallel CPU-GPU systems.
Many modular scaling algorithms use special moduli sets with a limited number of moduli. A detailed review of some of these methods is given in [8]. Efficient RNS scalers built on such special moduli sets are described, among others, in [46,47,48,49,50,51]. The main drawback of these approaches is that they impose very restrictive constraints on the moduli sets. They are certainly suitable for scaling tasks in digital signal processing, but they are unsuitable for scaling and other non-modular operations on numbers from the large dynamic ranges widely used in long-word-length cryptography.
  6. A Novel Power-of-Two Modular Scaling Based on the Rank Positional Characteristic in Minimally Redundant RNS
Theorem 2 implies the following step-by-step algorithm for power-of-two scaling in minimally redundant RNS with pairwise prime odd primary moduli $m_1, m_2, \ldots, m_k$, the extra modulus 2, and scales of the form $S = 2^l$.
S.1. Based on the minimum redundant residue code  of the original number X, the rank  is calculated following to (12)–(16). In addition, it is assumed that , , , and .
S.2. For the residue number 
, the integer
      
      is calculated, where
      
 and 
 are obtained by formulas similar to (19) and (20), namely:
.
S.3. The digits $\chi_1^{(j)}, \chi_2^{(j)}, \ldots, \chi_k^{(j)}$ of the minimally redundant residue code and the rank $\rho_k(X_j)$ of the number $X_j$ are determined, respectively, according to the rules
      
S.4. The redundant residue $\chi_0^{(j)}$ of the number $X_j$ is calculated according to the following equation, which follows from the rank form (9):
      
      where 
 and 
. In essence, it determines the parity of the number $X_j$.
If $j = l$, then the number $X_l = \lfloor X/2^l \rfloor$ is the required number, and the scaling process ends. Otherwise, the variable $j$ is incremented by one ($j := j + 1$), and the algorithm returns to step S.2.
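The steps S.1–S.4 can be illustrated by the following Python sketch. It does not reproduce the formulas (25)–(31); instead, the rank update is our own derivation from the rank form (9) for odd, pairwise prime moduli. If $X_j = (X_{j-1} - c)/2$ with $c = |X_{j-1}|_2$, then writing (9) for both numbers and using $\sum_i M_{i,k}\,\mu_{i,k} = 1 + \rho_k(1)\,M_k$ gives $2\rho_k(X_j) = \rho_k(X_{j-1}) + \sum_i f_i - c\,\rho_k(1)$, where each channel flag count $f_i \in \{0, 1, 2\}$ records whether subtracting $c\,\mu_{i,k}$ wrapped around and whether the result was odd. The parity of $X_j$ then follows from (9) modulo 2. For brevity, S.1 is replaced by a direct big-integer computation of the rank; the function name and moduli are hypothetical, and the sketch is not Algorithm 1 itself.

```python
from math import prod

def scale_pow2_min_redundant(chi, chi0, moduli, l):
    """Return (residues of X_l, parity of X_l, rank of X_l), where X_l = floor(X / 2^l).

    chi = (chi_1, ..., chi_k) and chi0 = |X|_2 form the minimally redundant code of X.
    Sketch only: the rank update is our own derivation, not formulas (25)-(31).
    """
    assert all(m % 2 == 1 for m in moduli)
    M = prod(moduli)
    Mi = [M // m for m in moduli]                         # M_{i,k}
    mu = [pow(a, -1, m) for a, m in zip(Mi, moduli)]      # mu_{i,k}
    rho_one = (sum(a * u for a, u in zip(Mi, mu)) - 1) // M    # rank of the number 1
    nrm = [(c * u) % m for c, u, m in zip(chi, mu, moduli)]    # chi_{i,k}, cf. (6)
    # S.1 stand-in: exact rank by (9); the fast minimally redundant method would go here
    rho = sum(a * n for a, n in zip(Mi, nrm)) // M
    c = chi0
    for _ in range(l):                                    # S.2-S.4, once per halving
        flags, nxt = 0, []
        for n, u, m in zip(nrm, mu, moduli):
            w = n - c * u                                 # normalized residue of X_{j-1} - c
            if w < 0:
                w += m
                flags += 1                                # wrap-around flag
            odd = w & 1
            flags += odd                                  # odd-value flag
            nxt.append((w + odd * m) // 2)                # w * |2^{-1}|_{m_i} without multiplying
        rho = (rho + flags - c * rho_one) // 2            # exact: the numerator is always even
        nrm = nxt
        c = (sum(nrm) + rho) & 1                          # parity of X_j from (9) modulo 2
    chi = [(n * a) % m for n, a, m in zip(nrm, Mi, moduli)]    # back to ordinary residues
    return chi, c, rho

moduli = [3, 5, 7, 11]
print(scale_pow2_min_redundant([1000 % m for m in moduli], 0, moduli, 3))
# ([2, 0, 6, 4], 1, 2): the code of 125 = 1000 // 8, which is odd and has rank 2
```

Each iteration uses only small per-channel operations and one short sum of flags, which is consistent with the claim that an iteration can fit in one modular clock cycle, although this sketch makes no timing claims of its own.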
For hardware implementation, the most important feature of the above recursive scaling algorithm is that the operations of steps S.2, S.3, and S.4 can be combined in time and carried out within one modular clock cycle. Owing to this, once the rank $\rho_k(X)$ has been obtained, each iteration of scaling the RNS number by 2, i.e., each one-bit right shift of its integer value, is performed in one modular clock cycle.
Since the calculation of the rank $\rho_k(X)$ has a pipeline structure, with an appropriate organization of computations the described scaling procedure provides reasonably high speed at low hardware cost.
It follows from the above that all the necessary calculations within the scaling algorithm can be implemented using tabular computational structures.
For example, the calculation of the inexact rank  of the initial number X is reduced to a summation of the sets of small residues  modulo . Simultaneously, we take into account the number of occurred overflows when performing these modular addition operations (see (12)–(14)). Therefore, we need k one-input lookup tables to store the given set, while the bit length of recorded residues is . At the same time, the estimation of two-valued rank correction  (see (16)) requires the set  of least significant bits of normalized residues  of the number X (see (6)).
Similarly, the sets of binary flags ,  and also  (see (26)–(28)) enable us to obtain the integer  required for rank calculating in the corresponding iterations of scaling procedure . All these binary sets can also be recorded in the appropriate lookup tables.
Thus, the content of the ith lookup table corresponding to the input residue  has the form  .
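To illustrate how one iteration can be reduced to table reads, the Python sketch below precomputes, for each channel, a hypothetical one-input table indexed by the current parity $c$ and normalized residue $\chi_{i,k}$. Each entry returns the next normalized residue and the channel's flag count $f_i$ for the rank update $2\rho_k(X_j) = \rho_k(X_{j-1}) + \sum_i f_i - c\,\rho_k(1)$, which we derived from (9) for odd, pairwise prime moduli. The table layout and function names are our own assumptions and are not the table contents specified above.

```python
from math import prod

def build_step_tables(moduli):
    """Hypothetical per-channel tables for one halving step.

    tables[i][c][y] = (z, f): for parity c and normalized residue y = chi_{i,k}
    of X_{j-1}, z is the normalized residue of X_j and f in {0, 1, 2} is the
    channel's contribution to the rank update.
    """
    M = prod(moduli)
    tables = []
    for m in moduli:
        u = pow(M // m, -1, m)                # mu_{i,k}
        t = ([], [])
        for c in (0, 1):
            for y in range(m):
                w = y - c * u
                wrap = 1 if w < 0 else 0
                w += wrap * m
                odd = w & 1
                t[c].append(((w + odd * m) // 2, wrap + odd))
        tables.append(t)
    return tables

def table_step(nrm, c, rho, tables, rho_one):
    """One iteration: k independent table reads, then a small sum and a halving."""
    reads = [tables[i][c][y] for i, y in enumerate(nrm)]
    z = [r[0] for r in reads]
    rho = (rho + sum(r[1] for r in reads) - c * rho_one) // 2
    return z, (sum(z) + rho) & 1, rho

moduli = [3, 5, 7, 11]
tables = build_step_tables(moduli)
print([len(t[0]) + len(t[1]) for t in tables])   # [6, 10, 14, 22] entries per channel
```

Under these assumptions, the memory cost is $2m_i$ small entries per channel, and the per-iteration logic outside the tables is a sum of $k$ small flag counts and a one-bit shift.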
Below we present the proposed scaling method in the form of a pseudo-code algorithm.
Let us evaluate the computational complexity of the proposed iterative power-of-two scaling method. As follows from the above, the total time of Algorithm 1 is the time needed to calculate the rank $\rho_k(X)$ of the initial number $X$ plus $l$ times the duration of one iteration, measured in modular clock cycles. According to [33,34], in minimally redundant RNS, the time complexity of calculating the rank $\rho_k(X)$ depends only on the number $k$ of primary RNS moduli. At the same time, all calculations within each iteration, namely obtaining both the minimally redundant residue code and the rank $\rho_k(X_j)$ of the number $X_j$, can be performed in one modular clock cycle by using the lookup table technique. Hence, the time complexity of the algorithm equals the rank calculation time plus $l$ modular clock cycles.
      
Algorithm 1: Power-of-two scaling in minimally redundant RNS.
To illustrate the power-of-two scaling of a number $X$ based on the rank form (9) in the proposed minimally redundant RNS, we present a numerical example below.
Let us consider the RNS with the primary moduli , , , and , taking into account the excess modulus 
Example 1. Suppose we wish to scale the number $X$, given by its minimally redundant residue code, by the constant $S = 2^3 = 8$.
Therefore, the number of required iterations is $l = 3$.
Before applying the proposed scaling algorithm, we list the required primitive constants of the RNS under consideration. So, we have
,
,
, , , ,
.
S.1. The rank calculation of the initial number.
First, having the non-redundant residue code  of the number X, by using lookup tables, we obtain the following sets of residues and least-significant bits, respectively,
,
.
Let us show in more detail how these values were obtained, according to (6), (19), and (13), (14), respectively, before storing in the lookup tables:
, ,
, ,
, ,
, ,
,
,
,
.
Further, using the set of residues , according to (12), we calculate the inexact rank and also take its parity bit. Then, taking into account that , using the set  and , according to (16), we find the two-valued correction. As a result, according to (16), we get the exact rank of the initial number $X$. To verify the obtained result, we find $X$ using the rank form (9). In addition, it is assumed that , , , , .
Iteration 1.
S.2.1. Since , using the sets of binary flags (see (27) and (28))
,
,
according to (25), we calculate the quantity .
S.3.1. We calculate the non-redundant residue code and the rank of the number  according to (29) and (30), respectively:
,
,
.
S.4.1. Using the set  corresponding to the non-redundant residue code  and taking into account that , according to (31), we find
.
Hence, as a result of Iteration 1, we have the minimally redundant residue code  of the number .
Iteration 2.
S.2.2. Since , using the following sets of binary flags
,
,
S.3.2. We calculate the non-redundant residue code and the rank of the number :
,
,
.
S.4.2. Using the set  corresponding to the non-redundant residue code  and taking into account that , we obtain
.
Hence, as a result of Iteration 2, we have the minimally redundant residue code  of the number .
Iteration 3.
S.2.3. Since , using the set
,
according to (25), we find .
S.3.3. We calculate the non-redundant residue code and the rank of the number :
,
,
.
S.4.3. Using the set  corresponding to the non-redundant residue code  and taking into account that , we get
.
Hence, as a result of Iteration 3, we have the minimally redundant residue code  of the number .
Since , the scaling procedure ends, and the number  is the desired result.
To verify the obtained result, we apply the rank form (9) and find . The result is correct.
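For comparison, the sketch below shows canonical bisection scaling in a conventional non-redundant RNS. Each iteration detects the parity of the current number and then halves every residue independently using |2^{-1}|_{m_i}. Parity is obtained here by a full CRT reconstruction. That is simple, but it is exactly the costly step that rank-based techniques try to avoid. The sketch assumes odd, pairwise coprime moduli and 0 ≤ X < M. The names `crt_value` and `scale_pow2` and the moduli are hypothetical.

```python
from math import prod


def crt_value(residues, moduli):
    """Reference reconstruction of X in [0, M) via the CRT rank form."""
    M = prod(moduli)
    total = sum((M // m) * ((pow((M // m) % m, -1, m) * x) % m)
                for x, m in zip(residues, moduli))
    return total - (total // M) * M  # subtract rank * M


def scale_pow2(residues, moduli, s):
    """Residues of floor(X / 2**s), obtained by s halving steps.

    Assumes odd, pairwise coprime moduli and 0 <= X < M.
    """
    inv2 = [pow(2, -1, m) for m in moduli]  # |2^{-1}|_{m_i}
    for _ in range(s):
        p = crt_value(residues, moduli) & 1  # parity detection (the costly step)
        # floor(X / 2) = (X - p) / 2, computed independently in every channel
        residues = [((x - p) * h) % m for x, m, h in zip(residues, moduli, inv2)]
    return residues


moduli = [3, 5, 7, 11]  # hypothetical moduli
print(scale_pow2([1000 % m for m in moduli], moduli, 3))  # [2, 0, 6, 4], residues of 125
```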
The above example shows that, compared with conventional non-redundant RNS, minimally redundant RNS considerably simplifies and speeds up the power-of-two scaling procedure. This is primarily due to the simplicity of calculating the inexact rank  and the two-valued characteristic  of the initial number X, and to the trivial operations needed to obtain the rank  at each iteration of the scaling procedure (see Theorem 2).
Therefore, the proposed minimally redundant residue representation is preferable to its non-redundant analogs for optimizing and speeding up scaling and other non-modular procedures based on the CRT implementation using a rank characteristic.
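The following sketch illustrates why the rank of the halved number can be cheap to update. It works in a conventional non-redundant RNS with odd moduli. The relation it uses is derived here from the standard rank form for illustration only; it is not the statement of Theorem 2, which applies to the minimally redundant code.

Let Y = (X − p)/2 with p = |X|_2 and μ_i = |M_i^{-1}|_{m_i}, and define χ'_i = |2^{-1}(χ_i − p μ_i)|_{m_i}. Then each integer t_i = (2χ'_i − χ_i + p μ_i)/m_i lies in {0, 1, 2}, and substituting into the rank form gives ρ(Y) = (ρ(X) + Σ t_i − p·ρ(1))/2. Each pair (χ'_i, t_i) depends only on (χ_i, p). The function names and moduli are hypothetical.

```python
from math import prod


def chi_and_rank(residues, moduli):
    """Reference: chi_i = |M_i^{-1} x_i|_{m_i} and the rank of X."""
    M = prod(moduli)
    chi = [(pow((M // m) % m, -1, m) * x) % m for x, m in zip(residues, moduli)]
    return chi, sum((M // m) * c for m, c in zip(moduli, chi)) // M


def halve_chi_and_rank(chi, rho, p, moduli):
    """Return (chi', rho') of Y = (X - p) / 2 from (chi, rho) of X and p = |X|_2,
    without reconstructing X. Assumes odd, pairwise coprime moduli."""
    M = prod(moduli)
    mu = [pow((M // m) % m, -1, m) for m in moduli]
    rho_one = (sum((M // m) * u for m, u in zip(moduli, mu)) - 1) // M  # rank of 1
    new_chi, t_sum = [], 0
    for c, m, u in zip(chi, moduli, mu):
        c2 = ((c - p * u) * pow(2, -1, m)) % m  # chi'_i
        t_sum += (2 * c2 - c + p * u) // m       # t_i in {0, 1, 2}; exact division
        new_chi.append(c2)
    # 2 * rho' = rho + sum(t_i) - p * rho(1); the right-hand side is always even
    return new_chi, (rho + t_sum - p * rho_one) // 2


moduli = [3, 5, 7, 11]                          # hypothetical moduli
chi, rho = chi_and_rank([1, 0, 6, 10], moduli)  # X = 1000
print(halve_chi_and_rank(chi, rho, 0, moduli))  # ([2, 0, 6, 10], 2), i.e. Y = 500
```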
7. Discussion
Let us now discuss the theoretical and practical aspects of the approach proposed in this paper.
As follows from (17), the power-of-two scaling algorithm based on the bisection method requires parity detection of the number  at each iteration. Therefore, fast calculation of the residue with respect to the extra modulus  is a critically important task.
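For reference, the halving step repeated by the bisection method rests on the standard identity below. It is written for odd primary moduli m_i, so that |2^{-1}|_{m_i} exists. The notation is illustrative and may differ from that of (17).

```latex
\left\lfloor \frac{X}{2} \right\rfloor = \frac{X - |X|_2}{2},
\qquad
\left| \left\lfloor \frac{X}{2} \right\rfloor \right|_{m_i}
  = \left| \left( x_i - |X|_2 \right) \left| 2^{-1} \right|_{m_i} \right|_{m_i},
  \quad i = 1, 2, \ldots, k.
```

Applying the identity s times yields ⌊X/2^s⌋. The only non-modular ingredient is the parity |X|_2.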
In conventional non-redundant RNS, parity detection of the number  is usually based on estimating the integer value of  by means of specific positional characteristics. The generally accepted ones are the digits of the mixed-radix representation, the core function, the rank of a number, and the interval index [1, 2, 3, 5, 8].
In RNS arithmetic, the parity check of a number is a complicated non-modular operation with a high computational cost. Its complexity is comparable to that of the reverse conversion from the residue code into the mixed-radix representation or to that of calculating the rank of a number.
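The sketch below makes the comparison with mixed-radix conversion concrete. The classical MRC computes the digits a_i stage by stage. When all moduli are odd, every mixed-radix weight m_1⋯m_{j−1} is odd, so |X|_2 = |Σ a_i|_2. The k − 1 dependent stages show why parity detection through positional characteristics is costly. The function names and moduli are hypothetical.

```python
def mixed_radix_digits(residues, moduli):
    """Classical mixed-radix conversion: return digits a_1..a_k with
    X = a_1 + a_2*m_1 + a_3*m_1*m_2 + ... (moduli pairwise coprime)."""
    r = list(residues)
    digits = []
    for i, mi in enumerate(moduli):
        a = r[i] % mi
        digits.append(a)
        # Remove digit a and divide by m_i in all remaining channels.
        for j in range(i + 1, len(moduli)):
            mj = moduli[j]
            r[j] = ((r[j] - a) * pow(mi, -1, mj)) % mj
    return digits


def parity_mrc(residues, moduli):
    """|X|_2 from the mixed-radix digits; valid only if every modulus is odd,
    since then every mixed-radix weight is odd."""
    return sum(mixed_radix_digits(residues, moduli)) % 2


moduli = [3, 5, 7, 11]                            # hypothetical moduli
print(mixed_radix_digits([1, 0, 6, 10], moduli))  # [1, 3, 3, 9]: 1 + 3*3 + 3*15 + 9*105 = 1000
print(parity_mrc([1, 0, 6, 10], moduli))          # 0
```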
Generally, in non-redundant RNS, a parallel parity check algorithm requires  modular addition operations [33, 34], so it becomes computationally expensive for large values of k. Thus, an efficient implementation of the power-of-two scaling algorithm based on the bisection method requires a faster and optimized RNS parity detection technique.
The approach to power-of-two scaling proposed in this article uses the rank of a number as the main RNS positional characteristic. Therefore, in our case, obtaining the residue modulo  reduces to calculating the rank  and then applying the rank form (9).
Thus, in terms of the required numbers of modular addition operations  and lookup tables , determining the parity of a number has the same computational complexity as calculating its rank. In addition, obtaining the residue code  of the number  requires k additional lookup tables (see (17)).
Therefore, the computational cost of the iterative procedure of scaling by  consists of  modular addition operations and  lookup tables, whereas the time complexity is  modular clock cycles, where  is the execution time of one iteration of the bisection method.
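Assume the rank form (9) is X = Σ M_i χ_i − ρ M and all primary moduli are odd. Then every M_i and M is odd, so reducing (9) modulo 2 gives |X|_2 = |Σ χ_i + ρ|_2. Only the least-significant bits of the χ_i and of the rank are needed. The sketch below checks this in a conventional non-redundant RNS. For brevity it computes the rank directly. The function name and moduli are hypothetical.

```python
from math import prod


def parity_from_rank(residues, moduli):
    """|X|_2 via the rank form, assuming all moduli are odd."""
    M = prod(moduli)
    chi = [(pow((M // m) % m, -1, m) * x) % m for x, m in zip(residues, moduli)]
    rho = sum((M // m) * c for m, c in zip(moduli, chi)) // M  # direct rank, for brevity
    lsb_sum = sum(c & 1 for c in chi)  # only the least-significant bits of chi_i matter
    return (lsb_sum + rho) & 1


moduli = [3, 5, 7, 11]                          # hypothetical moduli
print(parity_from_rank([1, 0, 6, 10], moduli))  # 0, since X = 1000
```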
Thus, in conventional non-redundant RNS, the computational cost of the canonical power-of-two scaling procedure based on the bisection method (17) and the rank calculation method described in [34] is estimated as
The main advantage of the proposed approach to power-of-two scaling over existing ones lies in the use of minimally redundant RNS together with the novel method for calculating the rank of the number resulting from division by two (see Theorem 2) at each iteration of the scaling algorithm. This significantly reduces the computational complexity of the scaling algorithm.
As follows from [34], the corresponding computational cost of calculating the rank  of the initial number X is  and  in terms of the required modular addition operations and lookup tables, respectively. Furthermore, the rank calculation takes  modular clock cycles (see Section 3).
Note that all calculations at each iteration are implemented using the lookup table technique and simple combinational logic circuits.
As shown above, the minimally redundant residue code of the number  is obtained in only one modular clock cycle using  additional lookup tables. The first k of these lookup tables yield the residue code , while the last one gives the rank  of the number  (see (29) and (30)). Hence, no additional modular operations are needed at each iteration.
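The sketch below is a simplified model of per-channel halving tables in a conventional RNS, assuming odd moduli. Each table is indexed by the parity bit and the residue and returns the residue of the halved number, so all k channels update in parallel with a single read each. The table that yields the rank (see (30)) is not modeled. The function names and moduli are hypothetical.

```python
def build_halving_tables(moduli):
    """One table per odd modulus m: table[p][x] = |(x - p) * 2^{-1}|_m, p in {0, 1}."""
    tables = []
    for m in moduli:
        h = pow(2, -1, m)
        tables.append([[((x - p) * h) % m for x in range(m)] for p in (0, 1)])
    return tables


def halve_by_lookup(residues, p, tables):
    """Residues of (X - p) / 2: one independent table read per channel."""
    return [t[p][x] for x, t in zip(residues, tables)]


moduli = [3, 5, 7, 11]                            # hypothetical moduli
tables = build_halving_tables(moduli)             # 2 * m_i entries per channel
print(halve_by_lookup([1, 0, 6, 10], 0, tables))  # [2, 0, 3, 5], residues of 500
```

Each table holds only 2m_i entries, which is what keeps the per-iteration update within a single modular clock cycle in a lookup-table design.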
The total numbers of required modular addition operations and lookup tables are estimated, respectively, as
      
      and
      
The time complexity of the novel power-of-two scaling algorithm is  modular clock cycles.
Thus, the use of minimally redundant RNS and the novel approach to rank calculation at each iteration of power-of-two scaling (see Theorem 2) significantly decreases the computational complexity. The corresponding reduction factors of the computational complexity, in terms of the required modular addition operations (see (32) and (34)) and lookup tables (see (33) and (35)), are
      
It should be noted that using the novel method for calculating the rank  of the number  at each iteration of the scaling procedure (see Theorem 2) in non-redundant RNS gives the following computational cost
      
In this case, the time complexity is  modular clock cycles.
Hence, the reduction factors of the computational complexity of power-of-two scaling based on Theorem 2 in minimally redundant RNS, compared with conventional non-redundant RNS, are given by the following ratios
      
In this case, as follows from (40), the reduction factor  does not depend on the value  . At the same time, .
The dependence of the reduction factors  and  on the number of primary RNS moduli k is presented in Table 3.
Thus, the use of minimally redundant RNS and the novel approach to calculating the rank of a number at each iteration of the bisection method radically simplifies power-of-two scaling compared with conventional non-redundant RNS. This makes it possible to construct faster and computationally cheaper RNS-oriented procedures that rely heavily on scaling algorithms.