Next Article in Journal
A New Deep Learning Method with Self-Supervised Learning for Delineation of the Electrocardiogram
Next Article in Special Issue
EXK-SC: A Semantic Communication Model Based on Information Framework Expansion and Knowledge Collision
Previous Article in Journal
An Image Encryption Algorithm Using Cascade Chaotic Map and S-Box
Previous Article in Special Issue
DIR-Net: Deep Residual Polar Decoding Network Based on Information Refinement
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

An Efficient CRT-Base Power-of-Two Scaling in Minimally Redundant Residue Number System

by
Mikhail Selianinau
and
Yuriy Povstenko
*
Department of Mathematics and Computer Sciences, Faculty of Science and Technology, Jan Dlugosz University in Czestochowa, al. Armii Krajowej 13/15, 42-200 Czestochowa, Poland
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(12), 1824; https://doi.org/10.3390/e24121824
Submission received: 23 November 2022 / Revised: 9 December 2022 / Accepted: 10 December 2022 / Published: 14 December 2022
(This article belongs to the Special Issue Advances in Information and Coding Theory)

Abstract

:
In this paper, we consider one of the key problems in modular arithmetic. It is known that scaling in the residue number system (RNS) is a rather complicated non-modular procedure, which requires expensive and complex operations at each iteration. Hence, it is time consuming and needs too much hardware for implementation. We propose a novel approach to power-of-two scaling based on the Chinese Remainder Theorem (CRT) and rank form of the number representation in RNS. By using minimal redundancy of residue code, we optimize and speed up the rank calculation and parity determination of divisible integers in each iteration. The proposed enhancements make the power-of-two scaling simpler and faster than the currently known methods. After calculating the rank of the initial number, each iteration of modular scaling by two is performed in one modular clock cycle. The computational complexity of the proposed method of scaling by a constant S l = 2 l associated with both required modular addition operations and lookup tables is estimeted as k and 2 k + 1 , respectively, where k equals the number of primary non-redundant RNS moduli. The time complexity is log 2 k + l modular clock cycles.

1. Introduction

Nowadays, high-performance computing is progressing extremely rapidly. This makes qualitatively new demands to designed number-theoretic methods and computational algorithms. That is why creating fundamentally new and efficient computing tools for fast and reliable parallel data processing is especially important. Modular computational structures occupy a special place among them. Modular arithmetic, i.e., the arithmetic of RNS, creates their mathematical basis.
The inherent parallelism and carry-free properties of RNS provide a high potential for accelerating arithmetic operation compared with conventional weighted number systems (WNS). The main advantage of RNS consists of its unique ability to decompose large integer numbers into a set of small residues and to process them in parallel in independent modular channels.
A steadily growing interest in RNS arithmetic as a unique means of carrying out high-speed calculations stimulates developments focused on providing a fundamentally new performance level when carrying out large volumes of time-consuming calculations. The modular arithmetic has attracted the considerable attention of researchers and developers in number-theoretic methods [1,2,3], computer technology [4,5], digital signal and image processing [3,5,6,7,8], cryptography [5,8,9,10], computer networks and communication systems [3,5], and other areas [9].
In this regard, one of the most promising ways in the specified area is the development of high-speed parallel modular computational structures as well as the enhancement of their functionality and optimization. In this case, the main optimization criteria are the minimum redundancy of data coding, the execution time minimization of implemented computational procedures, and the throughput maximization of the corresponding computational structures.
As is known, compared with WNS, the residue code of a number does not explicitly contain information about its integer value. Therefore, in RNS arithmetic, the implementation of operations, which require the estimating of the integer value of a number by its residue code, i.e., the evaluation of number position in the operating range, encounters specific difficulties. Such operations, in contrast to modular ones, are called non-modular.
The positional characteristics of the residue code such as core function, the rank of a number, interval index, and others, and the associated forms of number representation are of great importance for designing algorithms of non-modular operations [1,5,7,8]. The computational complexity of calculating the used positional characteristics ultimately determines the efficiency of the corresponding configuration of RNS arithmetic.
Division is one of the most complex arithmetic operations. Even in computers operating in a positional system, this operation stands apart, and its execution requires much more time than most elementary operations. In RNS arithmetic, the hardships with division operations are related to the non-modular character of this operation. This means that the residue of the quotient concerning primary RNS modulus is determined not only by the dividend and divisor residues, and it is necessary to get the additional information, in one form or another, about the integer values of the dividend and divisor [1,7]. It is no coincidence that many publications are devoted to the problem of modular division, for example [11,12,13,14,15,16,17,18,19].
Along with the general division, modular scaling, i.e., the division of the RNS number by a constant, is a commonly used operation [3,5,8,10]. This operation plays a fundamental role in constructing residue arithmetic algorithms and is of great practical importance. The need for scaling is due to several tasks, for example, to round the floating point numbers with the residue representation of the mantissa and to reduce the dynamic range in digital signal processing and long-word-length cryptography. In addition, scaling by a power of two is often one of the integral steps of more complex non-modular operations, for example, in the method of general modular division [19,20].
Thus, developing novel approaches and methods for fast scaling is highly important in high-performance computing based on parallel algorithmic structures of RNS, especially for high-speed implementing digital signal processing applications and public-key cryptosystems. That should make it possible to widely use modular arithmetic in various priority areas of science and technology.
In the paper, we present a new approach to the power-of-two scaling based on using minimal redundancy of residue code, the rank form of a number, and fast calculation of the rank characteristic at each iteration of the scaling procedure. Compared with the conventional non-redundant RNS, the proposed method makes it possible to optimize and speed up the non-modular scaling operation and concurrently reduces its computational complexity to a large extent.
The paper is structured as follows. Section 2 discusses the basic theoretical concepts of the research. Section 3 presents the known approaches to rank calculation. Section 4 and Section 5 describe the RNS scaling algorithms, and the mathematical basis of the rank calculation in the bisection scaling method. Section 6 presents a novel power-of-two scaling algorithm and a numerical example. Section 7 provides discussion and Section 8 concludes the paper.

2. The Basic Concepts of RNS Arithmetic

Abstract algebra and number theory [21,22] constitute the theoretical foundation of RNS arithmetic.Traditionally, the apparatus of congruences is used for the mathematical formalization of an RNS with integer ranges. At the same time, Euclid’s Division Lemma plays a fundamental role in building an RNS of the concerned type. For the ring Z of integers, it is formulated as follows.
Lemma 1 (Euclid’s Division Lemma).
For any X Z and a positive integer m, there exists a unique pair of integers Q , R such that
X = Q m + R ,
where R Z m = 0 , 1 , , m 1 .
On the set Z of integers, a non-redundant RNS is defined using pairwise prime moduli m 1 , m 2 , , m k k > 1 by the mapping Z Z m 1 × Z m 2 × × Z m k , which assigns to each X Z the k-tuple ( χ 1 , χ 2 , , χ k ) of least nonnegative residues χ i = X m i of dividing X by m i i = 1 , 2 , , k . At the same time, the notation X = χ 1 , χ 2 , , χ k is used.
The residue code χ 1 , χ 2 , , χ k corresponds to the set of all integers X satisfying the system of simultaneous linear congruences
X χ 1 ( mod m 1 ) , X χ 2 ( mod m 2 ) , X χ k ( mod m k )
The following statement is true [9,23,24].
Theorem 1 (Chinese Remainder Theorem).
Let the moduli m 1 , m 2 , , m k be pairwise prime, and let M k = i = 1 k m i , M i , k = M k / m i , μ i , k = M i , k 1 m i i = 1 , 2 , , k . Then the system of congruences (2) has a unique solution, the class of residues modulo M k , defined by the congruence
X i = 1 k M i , k μ i , k χ i ( mod M k ) .
The practical application of the RNS assumes that each residue code χ 1 , χ 2 , , χ k must correspond only to one integer number. Therefore, certain sets of representatives of residue classes are used as the number range to ensure required single-valued correspondence. Since in the given RNS it is possible to represent M k integers, the set Z M k = 0 , 1 , , M k 1 is usually used in computer applications as an RNS operating range.
Because of the above, we define modular coding as a mapping Φ R N S : Z M k Z 1 × Z 2 × × Z m k , which assigns a residue code ( χ 1 , χ 2 , , χ k ) to each X Z M k .
The decoding mapping Φ R N S 1 : Z 1 × Z 2 × × Z m k Z M k based on the CRT (3) executes according to the rule
X = i = 1 k M i , k μ i , k χ i M k .
Applying Euclid’s Division Lemma (1), we can write
μ i , k χ i = μ i , k χ i m i + μ i , k χ i m i m i = χ i , k + μ i , k χ i m i m i ,
where χ i , k is a normalized residue modulo m i :
χ i , k = μ i , k χ i m i i = 1 , 2 , , k ,
x denotes the largest integer less than or equal to x.
Substituting (5) into (4), and taking into consideration (6), we have
X = i = 1 k M i , k χ i , k + M k i = 1 k μ i , k χ i m i M k
that is equivalent to
X = i = 1 k M i , k χ i , k M k .
Since the summands in (7) have narrower change bounds, the use of (4), which is a normalized analog of (1), is preferable for constructing RNS arithmetic.
Equation (7) is called the CRT-form of representing the integer X = χ 1 , χ 2 , , χ k from the RNS number range Z M k .
The mapping Φ R N S is an isomorphism concerning the basic arithmetic operations. The operation + , , × on arbitrary elements A and B given by their residue codes A = α 1 , α 2 , , α k and B = β 1 , β 2 , , β k is carried out by the rule
A B = α 1 , α 2 , , α k β 1 , β 2 , , β k =
= α 1 β 1 m 1 , α 2 β 2 m 2 , , α k β k m k ,
where α i = A m i , β i = B m i , i = 1 , 2 , , k .
In the RNS, according to (8), the modular addition, subtraction, and multiplication are performed independently for each modulus m i i = 1 , 2 , , k . It must be noted that (8) is correct only if the result A B of the arithmetic operation does not go beyond the RNS number range, i.e., if A B Z M k .
The RNS inherent code parallelism illustrated by (8), which consists of the decomposition of arithmetic operations on integers A and B into independent small word length operations on the like digits α i and β i of residue code, is the main advantage of modular arithmetic compared with the arithmetic of weighted number systems (WNS). Realizing this advantage to the fullest extent is a key strategic goal of all computer applications in the RNS.
As is known, in contrast to the positional code, the residue code ( χ 1 , χ 2 , , χ k ) of the number X does not explicitly contain information about its value. Therefore, the implementation in the RNS arithmetic operations that require calculating the so-called positional characteristics which give information about the numbers location in the RNS range Z M k encounters specific difficulties. Such procedures, in contrast to modular ones, are called non-modular.
The efficiency factor of RNS arithmetic, to a decisive extent, is determined by the optimality of the applied non-modular procedures. At the same time, the main factor that has the most impact on the quality indicators of algorithms for non-modular operations is the computational complexity of calculating the positional characteristics of the residue code and related integer representation forms.
As for Equation (7), its direct application as the general form of integers for building non-modular procedures is practically impossible due to the complexity of straightforward implementation, especially in the case of large M k . At the same time, the use of the specific positional characteristics enables us to obtain from (7) the relevant forms of integer representation, which have good implementation properties and make it possible to overcome the problem of time-consuming addition operations modulo M k .
As follows from (7), the difference
i = 1 k M i , k χ i , k X
is a multiple of M k . Hence, the following equality holds
X = i = 1 k M i , k χ i , k ρ k X M k .
The positional characteristic ρ k X is called a rank of the number X. In essence, the rank ρ k X is a CRT reconstruction coefficient that indicates how many times the upper bound M k of the number range is exceeded when the integer value of the number X is calculated by its residue code χ 1 , χ 2 , , χ k .
Equation (9) is called a rank form of the integer X.
From (9), it also follows that the rank ρ k X is a quotient of the integer division of X k by M k .
Hence, we obtain
ρ k X = 1 M k i = 1 k M i , k χ i , k = i = 1 k χ i , k m i .
Therefore, since χ i , k Z m i i = 1 , 2 , , k , the inequality 0 ρ k X k 1 holds.
Compared with (7), Equation (9) does not contain time-consuming reduction modulo M k . Therefore, designing non-modular procedures in RNS arithmetic on the basis of the rank form has a substantial lead over the canonical CRT implementation.

3. The Approaches Currently Used to Calculate the Rank of a Number

First, the rank of a number as a main RNS integral characteristic has been studied in [1], and later in [2]. The rank evaluation algorithm consists of a slow k-step iterative procedure of sequential additions large modulo of specific constants defined by the chosen RNS moduli-set m 1 , m 2 , , m k . Moreover, the upper bound of the rank r ( X ) depends on the values of the weights μ 1 , k , μ 2 , k , , μ k , k (see (4)), and can be sufficiently large for most moduli-sets suitable for practical use. If we assume that the processing of such long L-bit word-length numbers L = log 2 M k is comparable in time with k operations on the small residues, then the complexity of this method is equal to O k 2 . Because of that, the given approach to the rank calculation is time-consuming and practically unacceptable for high-performance computing due to its computational complexity, especially when using huge M k .
The so-called “extra modulus method” for rank calculation has been proposed in [25]. It rearranges the canonical CRT implementation to an exact integer equation, i.e., the same form as (9). To be able to retrieve the value of the CRT reconstruction coefficient, i.e., the rank of a number, the extra-modulus m e must satisfy the following conditions: m e > k , and m e is any integer prime to M k . In this way, the slow and challenging addition modulo M k in the straightforward CRT implementation is replaced by subtraction and multiplication modulo m e . Thus, we have an extra modular channel for rank calculation. This method works well and correctly when it assumes that proper redundant residue X m e is available. Hence, the “extra modulus method” is suitable for the base extension operation. At the same time, when the number under consideration results from the modular addition or subtraction operation [26], it cannot be used owing to eventual overflow or underflow, respectively. Thus, in such a case, the exact value of X m e is not available. Therefore, this method is not applicable for sign determination and magnitude comparison of two numbers in RNS.
A different approach to evaluating the CRT reconstruction coefficient is proposed in [27,28,29]. The main idea of the so-called ”fractional domain method” consists in the representation of the reconstruction coefficient r as an integer part of a sum of at most k proper fractions (see (10)). The value r is recursively estimated by approximating terms of a fraction χ i , k / m i . To avoid division by the modulus m i in the fraction, the denominator m i is replaced by 2 n m i < 2 n , while the numerator χ i , k is approximated by its most significant υ bits υ < n i = 1 , 2 , , k . Since the division by powers of 2 is equivalent to simple shifts, then the calculation of r can be implemented by addition only.
The main drawbacks of this method consist of the following. First of all, full-precision fractional computations are required. In any case, such calculations are slower than operating on smaller word-length, and the full-precision fractional bits require substantial storage. On the other hand, the number of iterations required is of the order of the bit-length needed for the approximation. For example, the method employing a fractional interpretation of the CRT [27] needs a very high precision of log 2 k M k bits. The method proposed in [28,29] uses a sequential bit-by-bit manner for evaluating reconstruction coefficient r. The iterative structure of this method makes it very slow in the case of large word-length numbers.
There are also approaches to reconstruct the integer value of RNS number based on the CRT by using special moduli-sets with a limited number of moduli such as m = 2 n + d d 1 , 0 , 1 [5,8]. The main drawback of these methods consists of a small number of the selected moduli, typically from three to five. Such moduli sets are suitable for the efficient implementations of digital signal processing algorithms but completely not applicable for the processing of large numbers which are widely used in cryptography.
In recent decades, the CRT algorithm, corresponding forms of number representation, and the methods of integer reconstruction by residue code have been intensively studied, especially concerning their application in high-performance computing. The major efforts are aimed at reducing the computational complexity of calculating the main integral characteristics of residue code.
There are some new approaches for calculating an approximate value of the rank of a number which allow us to reduce the computational complexity of complicated non-modular operations in RNS arithmetic [30,31,32]. The method proposed in [30] is based on the so-called interval floating-point characteristic which provides information about the range of changes in the relative value of RNS representation. Generally, it enables us to perform effectively such operations as magnitude comparison, sign determination, and overflow detection. The concept of an approximate value of the rank of a number is introduced in [31]. This approach allows us to reduce the computational complexity of the decoding from residue code to binary representation and decrease the size of the required coefficients. Based on the properties of the approximate value and arithmetic properties of RNS, a new method for error detection, correction, and controlling computational results has been proposed. In [32], a new original general-purpose technique for CRT basis extension and scaling in RNS using floating-point arithmetic for the rank estimation is proposed for a homomorphic encryption scheme. The main algorithmic improvements focus on optimizing decryption and homomorphic multiplication in the RNS using the CRT to represent and manipulate the large coefficients in the ciphertext polynomials.
The rank positional characteristic has been thoroughly investigated in [33,34]. As shown, the rank ρ k X has a simple structure, high modularity of calculation, and a small range of changes. At the same time, the rank ρ k X is a sum of two small numbers, namely, the inexact rank ρ ^ k X < k and two-valued rank correction Δ k X 0 , 1 :
ρ k X = ρ ^ k X + Δ k X ,
where
ρ ^ k X = 1 m k i = 1 k R i , k ( χ i )
and
R i , k ( χ i ) = m k χ i , k m i = m k μ i , k χ i m i m i i = 1 , 2 , . . . , k 1 ,
R k , k ( χ k ) = χ k , k = μ k , k χ k m k .
In conventional non-redundant RNS, as it follows from (11)–(14), the calculation of the inexact rank ρ ^ k X is reduced to a summation of k small residues R 1 , k ( χ 1 ) , R 2 , k ( χ 2 ) , …, R k , k ( χ k ) modulo m k taking into account the number of the overflows occurring during the modular addition operations. At the same time, as demonstrated in [34], the main computational cost is associated with the estimation of the rank correction Δ k X . Its evaluation requires concurred modular addition operations in all independent modular channels corresponding to primary RNS moduli m 1 , m 2 , , m k . These computations can be implemented easily by the pre-computation and lookup table techniques. As a result, the total number of required modular addition operations and lookup tables for rank ρ k X calculation are k 2 + 5 k 10 / 2 and k 2 + k 2 / 2 , respectively.
As shown in [34], the minimum redundancy residue code enables optimization of the rank calculation. It assumes the extension of non-redundant residue code χ 1 , χ 2 , , χ k of the number X by the redundant residue χ 0 = X m 0 concerning extra modulus m 0 = 2 , i.e., by adding the parity of the number X to its residue representation. Therefore, in minimally redundant RNS, the number X Z M k is represented by its minimally redundant residue code χ 0 , χ 1 , , χ k . So, the total residue code length increases by only one bit.
The main advantage of minimally redundant RNS compared with non-redundant analogs consists of a significant simplification of calculating the rank correction Δ k X and, accordingly, the rank ρ k X .
The use of minimum redundancy residue code makes it possible to replace in (11) the rank correction Δ k X , which evaluation is time-consuming and requires performing addition operations in all modular channels, with a trivially calculated binary attribute δ k X 0 , 1 . At the same time,
ρ k X = ρ ^ k X + δ k X
and
δ k X = χ 0 + i = 1 k ψ i + ρ ^ 0 2 ,
where χ 0 = X 2 , ψ i = χ i , k 2 = μ i , k χ i m i 2 , and ρ ^ 0 = ρ ^ k X 2 .
Compared with non-redundant analogs, the use of minimally redundant RNS enables us to reduce significantly the complexity of the rank ρ k X calculation both in terms of required modular addition operations and lookup tables. At the same time, the corresponding computational cost is k modular addition operations and k one-input lookup tables. The time complexity depends only on the number of primary RNS moduli and equals T r a n k = log 2 k modular clock cycles.
As shown in [34], the transition to the minimum redundant residue code enables a decrease in the computational complexity of the rank calculation from the order O k 2 / 2 to O k concerning required modular addition operations and lookup tables. Thus, the computational complexity reduction factor increases with the number k of non-redundant moduli m 1 , m 2 , , m k and asymptotically approaches the threshold k / 2 .
The use of minimally redundant RNS ensures significant optimization of calculating the rank ρ k X of the number X. Moreover, this is also applied to the implementation of the CRT algorithm and, correspondingly, to the execution of various non-modular operations based on it. First of all, that is caused owing to the extreme simplicity evaluation of two-valued characteristic δ k ( X ) { 0 , 1 } as well as the modular structure of the main calculation equation for inexact rank ρ ^ k X (see (12)). This circumstance enables radical simplifying the calculation of the rank ρ k X in minimally redundant RNS in comparison with conventional non-redundant RNS and, consequently, makes it possible to construct faster and optimal with respect to computational complexity variants of RNS arithmetic.
Therefore, the application of minimally redundant residue representation takes priority over conventional non-redundant RNS arithmetic to implement the scaling procedures based on the rank form of a number.

4. The Main Types of Scaling Algorithms in RNS Arithmetic

In the conventional WNS, the power-of-two scaling is performed simply by right shifting. In the RNS, compared with WNS, this procedure has substantial difficulty because it is not easily implementable due to its non-positional nature.
The classical power-of-two scaling method consists of the residue code conversion to binary representation, scaling in the conventional WNS, and converting the result back to the RNS.
Unlike the WNS, the residue code does not contain explicit information about the integer value of the represented number. Therefore, in addition to its usual purpose, which consists of limiting the undesirable growth of calculation results, the scaling in RNS is also used to detect the position of integers in a particular range (i.e., to evaluate their values), rounding, and solving other similar tasks. This operation is often used in more complex non-modular procedures such as general modular division. Many different scaling algorithms, which do not require RNS-to-binary conversion, have been presented in the literature. A detailed review of the known modular scaling methods is presented in [8].
The essence of the modular scaling operation is to obtain some integer approximation X ^ = χ ^ 1 , χ ^ 2 , , χ ^ k i = 1 , 2 , , k to the fraction X / S , where X = χ 1 , χ 2 , , χ k is an arbitrary element of the RNS number range Z M k , and S is a constant factor (scale). The fraction X / S is usually approximated by the integers X / S and X / S ( x and x are the floor and ceiling function of x, respectively).
The most important aspect of the scaling problem in RNS is to ensure the high flexibility of the created algorithmic tools. That implies adoption of the set S = S 0 , S 1 , , S Λ 1 of scales S l > 1 l = 0 , 1 , , Λ 1 which is usually chosen based on the criterion for the minimum calculating error under a given constraint on the number of scaling factors.
All known scaling techniques can be classified into four main categories:
1.
scaling by the product of some RNS moduli [32,35,36,37,38],
2.
scaling by an integer from the RNS number range [39,40],
3.
scaling by a common fraction [41],
4.
scaling by a power of two [42,43,44].
In the first group, many scaling methods take the scaling factor S as a product of l moduli, i.e., of the form S = M l l = 1 , 2 , , k 1 [35,36,37,38]. That makes it easier to obtain the residues χ ^ l + 1 , χ ^ l + 2 , , χ ^ k of the approximation X ^ to the fraction X / S . The remaining residues χ ^ 1 , χ ^ 2 , , χ ^ l can be calculated sufficiently lightly within the framework of procedures based on one of the base extension algorithms [2,35,45]. Due to the small word length of residues, the pre-computation and lookup table techniques are suitable for modular scaling.
In [35], the base extension algorithm uses the reverse conversion of residue code to mixed-radix representation. The method proposed in [36] requires a redundant modulus to evaluate the CRT reconstruction coefficient, i.e., the rank of a number, to complete the base extension procedure. In [38], the suggested approach is entirely based on a lookup tables technique, while all the required tables have two inputs. At the same time, the memory costs are too high when the number of chosen moduli is sufficiently large. The method proposed in [37] enables one to carry out base extension and exact scaling without some system redundancy only by using additional lookup tables.
The CRT-base technique for modular scaling by an integer has been suggested in [39]. Here, the main idea is to approximate the CRT calculating relation for reconstructing the integer value of RNS numbers. This enables the substitution of large modulo M k addition in the canonic CRT-decoding scheme by smaller word-length modular addition operations. In [40], the proposed method uses minimum redundancy for modular scaling by arbitrary positive scales. The distinctive feature of the algorithm consists of using the interval index as a positional characteristic of residue code. At the same time, the interval index can be calculated fast and lightly by modular addition of small residues in the kth modular channel corresponding to the modulus m k from the RNS moduli-set { m 1 , m 2 , , m k } .
In the case of arbitrary rational scale S, an efficient basis for modular scaling is the approach presented in [41]. The main feature is that for the scales of the form S = p / q , the numbers p and q can take any integer values for which the fraction q X / p does not exceed the upper bound of the RNS number range. In addition, both the number q X and the results of intermediate calculations may not satisfy the specified requirement.
The scaling methods in the fourth group implement division by constants of the form S = 2 l   l = 1 , 2 , , Λ , Λ log 2 M k [7,42,43]. General approaches to solving this task are based mainly on the bisection method. It consists of calculating the recurrence relation X j + 1 = X j / 2 for j = 0 , 1 , , l 1 . In this case, X 0 = X , and X l = X / 2 l . The residue χ ^ i j + 1   i = 1 , 2 , , k of approximation X j + 1 is determined as
χ i j + 1 = 1 2 χ i j m i     if X j is even , 1 2 χ i j 1 m i     if X j is odd ,
while all the primary moduli m 1 , m 2 , , m k are coprime odd numbers. The last condition ensures that 2 and m i ( i = 1 , 2 , , k ) are relatively prime numbers, and, correspondingly, the existence of a modular multiplicative inverse of 2, i.e., the number 2 1 M k = 2 1 m 1 , 2 1 m 2 , , 2 1 m k . As followed from (17), the scaling by 2 requires the parity detection of the number X j , j = 0 , 1 , , l 1 . So, there is a need for a base extension operation to extra modulus equal 2.
An iterative algorithm for scaling by the factor S = 2 l proposed in [42] is implemented in l steps. At the same time, the parity of the intermediate results is checked at each iteration using the base extension operation suggested in [25]. In [43], the power-of-two scaling technique is applied to realize a digital filter in quadratic RNS. The scaling algorithm presented in [44] focuses on arbitrary moduli sets with large dynamic ranges and requires only machine-precision integer and floating-point operations. At the same time, it is used for software implementation of rounding and exponent alignment procedures in a multiple-precision RNS-based arithmetic library for parallel CPU-GPU systems.
Many modular scaling algorithms use special moduli sets with a limited number of moduli. A detailed review of some of these methods is given in [8]. The most commonly used moduli sets for efficient RNS scalers are { 2 2 n + 1 + 1 , 2 2 n + 1 , 2 2 n + 1 1 } , { 2 n 1 , 2 n + p , 2 n + 1 } , { 2 n + 1 1 , 2 n , 2 n 1 } , { 2 n + p , 2 n 1 , 2 n 1 1 } among others [46,47,48,49,50,51]. The main drawback of such approaches is imposing very restrictive constraints on the moduli sets. They are certainly suitable for implementing scaling tasks in digital signal processing but, at the same time, they do not fit for scaling and other non-modular operations on numbers belonging to large dynamic ranges which are widely used in long-word-length cryptography.

5. A Novel Approach for Calculating the Rank of a Number Resulting from Scaling by 2

In RNS, the rank ρ k X Z k = 0 , 1 , , k 1 is a principal positional characteristic since all the non-modular operations, such as magnitude comparison, sign determination, overflow detection, general division, scaling, residue-to-binary conversion, and others, can be implemented on its basis. Because the rank ρ k X enables estimation of the integer value of the RNS-number X, then the development of efficient methods and algorithms for its calculating is of primary importance in building efficient variants of RNS arithmetic and, accordingly, high-performance modular computational structures.
Let us show that the rank form (9) of the number representation in residue arithmetic creates a basis for constructing relatively fast and sufficiently simple iterative algorithms for the implementation of division by constant S l = 2 l l = 1 , 2 , , Λ , Λ log 2 M k . In this case, the following theorem is fundamental for solving the problem of modular scaling by powers of 2.
Theorem 2.
Let in RNS with pairwise prime odd moduli m 1 , m 2 , , m k the arbitrary number X = χ 1 , χ 2 , , χ k from the range Z M k having rank ρ k X be given. Then the rank of the integer X ^ = X / 2 satisfies the equation
ρ k X ^ = 1 2 ρ k X + i = 1 k ψ i     if X is even , 1 2 ρ k X ρ k 1 + i = 1 k ω i + i = 1 k φ i     if X is odd ,
where
ψ i = χ i , k 2 = μ i , k χ i m i 2 ,
ω i = 0     if     χ i , k μ i , k , 1     if     χ i , k < μ i , k ,
φ i = ψ i + ω i 2     if   μ i , k 2 = 0 , ψ i + ω i 2 ¯     if   μ i , k 2 = 1 ,
ρ k 1 is the rank of the number 1, and x ¯ denotes the negation of the Boolean value x.
Proof. 
As follows from the rank form (9), the number 1 in a given RNS has the following form
1 = i = 1 k M i , k μ i , k ρ k 1 M k .
Therefore, we can write
X ^ = X / 2 = 1 2 X X 2 =
= 1 2 i = 1 k M i , k χ i , k ρ k X M k X 2 i = 1 k M i , k μ i , k ρ k 1 M k =
= 1 2 i = 1 k M i , k χ i , k X 2 μ i , k M k ρ k X X 2 ρ k 1 .
Then, in accordance with Euclid’s Division Lemma (1), from (22) we have
X ^ = 1 2 i = 1 k M i , k χ i , k X 2 μ i , k m i + χ i , k X 2 μ i , k / m i m i M k ρ k X X 2 ρ k 1 .
Thus,
X ^ = i = 1 k 1 2 M i , k χ i , k X 2 μ i , k m i 1 2 M k ρ k X X 2 ρ k 1 i = 1 k χ i , k X 2 μ i , k / m i .
Since for each least nonnegative residue χ Z m modulo an arbitrary odd modulus m, there is a unique formal quotient χ / 2 m , and
χ / 2 m = χ + m χ 2 / 2
(see, for example, [1]), then
χ ^ i , k = 1 2 χ i , k X 2 μ i , k m i m i = 1 2 χ i , k X 2 μ i , k m i + m i χ i , k X 2 μ i , k m i 2 .
Therefore,
1 2 χ i , k X 2 μ i , k m i = χ ^ i , k 1 2 m i χ i , k X 2 μ i , k m i 2 .
Taking this into account, from (23) we get
X ^ = i = 1 k M i , k χ ^ i , k 1 2 ρ k X X 2 ρ k 1 i = 1 k χ i , k X 2 μ i , k / m i + + i = 1 k χ i , k X 2 μ i , k m i 2 .
Hence, according to the rank form of number representation (9), we conclude that the following equation for the rank ρ k X ^ of the number X ^ is valid:
ρ k X ^ = 1 2 ρ k X X 2 ρ k 1 i = 1 k χ i , k X 2 μ i , k / m i + + i = 1 k χ i , k X 2 μ i , k m i 2 .
If the number X is even, then X 2 = 0 , so that
χ i , k X 2 μ i , k / m i = χ i , k / m i = 0
and
χ i , k X 2 μ i , k m i 2 = χ i , k 2 = ψ i
i = 1 , 2 , , k . Therefore, in this case, Equation (24) takes the form
ρ k X ^ = 1 2 ρ k X + i = 1 k ψ i
which corresponds to (18).
If the number X is odd, then X 2 = 1 , and it is easy to check that
χ i , k X 2 μ i , k / m i = χ i , k μ i , k / m i = ω i ,
while
χ i , k X 2 μ i , k m i 2 = χ i , k μ i , k m i 2 = φ i
i = 1 , 2 , , k , where ω i and φ i are two-valued quantities determined by (19) and (20), respectively. In this case, Equation (24) takes the form
ρ k X ^ = 1 2 ρ k X ρ k 1 + i = 1 k ω i + i = 1 k φ i
which also corresponds to (18).
The theorem is proved.    □
As it follows from Theorem 2, the rank ρ k X ^ of the number X ^ = X / 2 can be calculated rapidly and easily only taking into account the known value of the rank ρ k X of the initial number X. This circumstance makes it possible to optimize and significantly speed up the execution of the power-of-two scaling operation. In this case, it is not necessary at each iteration to calculate the rank of the number, which is the intermediate result of scaling, by its residue code. At the same time, the complete operation of rank calculation is necessary only for the initial number X at the preliminary stage of the scaling procedure.

6. A Novel Power-of-Two Modular Scaling Based on the Rank Positional Characteristic in Minimally Redundant RNS

Theorem 2 implies the following step algorithm for power-of-two scaling in minimally redundant RNS with primary pairwise prime odd modules m 1 , m 2 , , m k , extra modulus m 0 = 2 , and scales of the form S l = 2 l l = 1 , 2 , , Λ , Λ = log 2 M k .
S.1. Based on the minimum redundant residue code χ 0 , χ 1 , , χ k of the original number X, the rank ρ k X is calculated following to (12)–(16). In addition, it is assumed that X 0 = X , χ i 0 = χ i i = 0 , 1 , , k , χ i , k 0 = χ i , k = μ i , k χ i m i i = 1 , 2 , , k , and j = 0 .
S.2. For the residue number X j = χ 0 j , χ 1 j , , χ k j , the integer
Δ j = i = 1 k ψ i j     if     χ 0 j = 0 , i = 1 k ω i j + i = 1 k φ i j ρ k 1     if     χ 0 j = 1
is calculated, where
ψ i j = χ i , k j 2 ,
ω i j and φ i j are obtained by formulas similar to (19) and (20), namely:
ω i j = 0     if     χ i , k j μ i , k , 1     if     χ i , k j < μ i , k ,
φ i j = ψ i j + ω i j 2     if     μ i , k 2 = 0 , ψ i j + ω i j 2 ¯     if     μ i , k 2 = 1 ,
i = 1 , 2 , , k .
S.3. The digits χ 1 j + 1 , χ 2 j + 1 , ⋯, χ k j + 1 of the minimally redundant residue code and the rank ρ k X j + 1 of the number X j + 1 = X j / 2 are determined, respectively, according to the rules
χ i j + 1 = 1 2 χ i j χ 0 j m i   i = 1 , 2 , , k ,
ρ k X j + 1 = 1 2 ρ k X j + Δ j .
S.4. The redundant residue χ 0 j + 1 = ( X j + 1 2 is calculated according to equation following from the rank form (9)
χ 0 j + 1 = i = 1 k ψ i j + 1 + ρ 0 j + 1 2 ,
where ψ i j + 1 = χ i , k j + 1 2 = μ i , k χ i j + 1 m i 2 and ρ 0 j + 1 = ρ k X j + 1 2 . In essence, it determines the parity of the number X j + 1 .
If j = l 1 , then the number X j + 1 = X l = X / 2 l is the required number, and the scaling process ends. Otherwise, the variable j is incremented by one ( j = j + 1 ) , and the jump to step S.2 is carried out.
For its hardware implementation, the most important feature of the above recursive scaling algorithm is that the specified operations on steps S.2, S.3, and S.4 can be combined in time and carried out within one modular clock cycle. Due to this circumstance, after obtaining the rank ρ k X , each iteration of RNS number scaling by 2, i.e., each shift of its integer value by one bit to the right, is performed in one modular clock cycle.
Since the calculating process of the rank ρ k X has a pipeline structure, with the appropriate organization of computations the described scaling procedure at low hardware costs provides a reasonably high speed.
It follows from the above that all the necessary calculations within the scaling algorithm can be implemented using tabular computational structures.
For example, the calculation of the inexact rank ρ ^ k X of the initial number X is reduced to a summation of the sets of small residues R 1 , k ( χ 1 ) , R 2 , k ( χ 2 ) , , R k , K ( χ k ) modulo m k . Simultaneously, we take into account the number of occurred overflows when performing these modular addition operations (see (12)–(14)). Therefore, we need k one-input lookup tables to store the given set, while the bit length of recorded residues is log 2 m k l = 1 , 2 , , k . At the same time, the estimation of two-valued rank correction δ k X (see (16)) requires the set ψ 1 , ψ 2 , , ψ k of least significant bits of normalized residues χ i , k i = 1 , 2 , , k of the number X (see (6)).
Similarly, the sets of binary flags ψ 1 j , ψ 2 j , , ψ k j , ω 1 j , ω 2 j , , ω k j and also φ 1 j , φ 2 j , , φ k j (see (26)–(28)) enable us to obtain the integer Δ j required for rank calculating in the corresponding iterations of scaling procedure j = 0 , 1 , , l 1 . All these binary sets can also be recorded in the appropriate lookup tables.
Thus, the content of the ith lookup table corresponding to the input residue χ i j has the form R i , k χ i j , ψ i j , ω i j , φ i j i = 1 , 2 , , k , j = 0 , 1 , , l 1 .
Below we present the proposed scaling method in the form of a pseudo-code algorithm.
Let us evaluate the computational complexity of the proposed iterative power-of-two scaling method. As follows from the above, Algorithm 1 requires total T s c a l = T r a n k + T i t e r × l modular clock cycles. According to [33,34], in minimally redundant RNS, the time complexity of calculating the rank ρ k X of the initial number X depends only on the number k of primary RNS moduli and can be evaluated as T r a n k = log 2 k . At the same time, all calculations within each iteration, consisting in obtaining both the minimally redundant residue code χ 0 j + 1 , χ 1 J + 1 , , χ k j + 1 and the rank ρ k X j + 1 of the number X j + 1 = X j / 2 j = 0 , 1 , , l 1 , can be performed in one modular clock cycle by using lookup table technique. Therefore, T i t e r = 1 . Hence, the algorithm time complexity T s c a l = log 2 k + l modular clock cycles.   
Algorithm 1: Power-of-two scaling in minimally redundant RNS
Entropy 24 01824 i001
To illustrate the power-of-two scaling of the number X = χ 0 , χ 1 , , χ k based on the rank form (9) in the proposed minimally redundant RNS, we present below a numerical example.
Let us consider the RNS with the primary moduli m 1 = 5 , m 2 = 7 , m 3 = 9 , and m 4 = 11 , taking into account the excess modulus m 0 = 2 .
Example 1.
Suppose we wish to scale the number X = 1731 having the minimally redundant residue code χ 0 , χ 1 , χ 2 , χ 3 χ 4 = 1 , 1 , 2 , 3 , 4 by the constant S 3 = 2 3 = 8 .
Therefore, the number of required iterations is l = 3 .
Before describing the proposed scaling algorithm, we give below the required primitive constants used in the RNS under consideration. So, we have
M 4 = 3465 ,
M 1 , 4 = 693 , M 2 , 4 = 495 , M 3 , 4 = 385 , M 4 , 4 = 315 ,
μ 1 , 4 = 2 , μ 2 , 4 = 3 , μ 3 , 4 = 4 , μ 4 , 4 = 8 ,
ρ 4 1 = 2 .
S.1. The rank calculation of the initial number.
First, having the non-redundant residue code 1 , 2 , 3 , 4 of the number X, by using lookup tables, we obtain the following sets of residues and least-significant bits, respectively,
R 1 , 4 χ 1 , R 2 , 4 χ 2 , R 3 , 4 χ 3 , R 4 , 4 χ 4 = 4 , 9 , 3 , 10 ,
ψ 1 , ψ 2 , ψ 3 , ψ 4 = 0 , 0 , 1 , 0 .
Let us show in more detail how these values were obtained, according to (6), (19), and (13), (14), respectively, before storing in the lookup tables:
χ 1 , 4 = 2 · 1 5 = 2 , ψ 1 = 2 2 = 0 ,
χ 2 , 4 = 3 · 2 7 = 6 , ψ 2 = 6 2 = 0 ,
χ 3 , 4 = 4 · 3 9 = 3 , ψ 3 = 3 2 = 1 ,
χ 4 , 4 = 8 · 4 11 = 10 , ψ 4 = 10 2 = 0 ,
R 1 , 4 χ 1 = 11 · 2 / 5 = 4 ,
R 2 , 4 χ 2 = 11 · 6 / 7 = 9 ,
R 3 , 4 χ 3 = 11 · 3 / 9 = 3 ,
R 4 , 4 χ 4 = 8 · 4 11 = 10 .
Further, using the set of residues 4 , 9 , 3 , 10 , according to (12), we calculate the inexact rank
ρ ^ 4 X = 4 + 9 + 3 + 10 / 11 = 26 / 11 = 2 ,
and also take its parity bit
ρ ^ 0 = ρ ^ 4 X 2 = 0 .
Then, taking into account that χ 0 = 1 , using the set ψ 1 , ψ 2 , ψ 3 , ψ 4 = 0 , 0 , 1 , 0 and ρ ^ 0 , according to (16), we find two-valued correction
δ 4 X = 1 + 0 + 0 + 1 + 0 + 0 2 = 2 2 = 0 .
As a result, according to (16), we get the exact rank of the initial number X
ρ 4 X = ρ ^ 4 X + δ 4 X = 2 + 0 = 2 .
To verify the obtained result, using the rank form (9), we find
X = i = 1 4 M i , 4 χ i , 4 ρ 4 X M 4 = 693 · 2 + 495 · 6 + 385 · 3 + 315 · 10 2 · 3465 = 1731 .
In addition, it is assumed that j = 0 , X 0 = X , χ i 0 = χ i i = 0 , 4 ¯ , χ i , 4 0 = χ i , 4 , ψ i 0 = ψ i i = 1 , 4 ¯ .
Iteration 1.
S.2.1.Since χ 0 0 = 1 , using the sets of binary flags (see (27) and (28))
ω 1 0 , ω 2 0 , ω 3 0 , ω 4 0 = 0 , 0 , 1 , 0 ,
φ 1 0 , φ 2 0 , φ 3 0 , φ 4 0 = 0 , 1 , 0 , 0 ,
according to (25), we calculate the quantity
Δ 0 = i = 1 4 ω i 0 + i = 1 4 φ i 0 ρ k 1 = 1 + 1 2 = 0 .
S.3.1.We calculate the non-redundant residue code and the rank of the number X 1 = X 0 / 2 , according to (29) and (30), respectively:
χ 1 1 , χ 2 1 , χ 3 1 , χ 4 1 = 0 , 4 , 1 , 7 ,
ρ 4 X 1 = 1 2 ρ 4 X 0 + Δ 0 = 1 2 2 + 0 = 1 ,
ρ 0 1 = ρ 4 X 1 2 = 1 .
S.4.1.Using the set ψ 1 1 , ψ 2 1 , ψ 3 1 , ψ 4 1 = 0 , 1 , 0 , 1 corresponding to the non-redundant residue code 0 , 4 , 1 , 7 and taking into account that ρ 0 1 = 1 , according to (31), we find
χ 0 1 = 0 + 1 + 0 + 1 + 1 2 = 1 .
Hence, as a result of Iteration 1, we have the minimally redundant residue code 1 , 0 , 4 , 1 , 7 of the number X 1 = X 0 / 2 = 1731 / 2 = 864 .
Iteration 2.
S.2.2. Since χ 0 1 = 1 , using the following sets of binary flags
ω 1 1 , ω 2 1 , ω 3 1 , ω 4 1 = 1 , 0 , 0 , 1 ,
φ 1 1 , φ 2 1 , φ 3 1 , φ 4 1 = 1 , 0 , 0 , 0 ,
we have
Δ 1 = i = 1 4 ω i 1 + i = 1 4 φ i 1 ρ k 1 = 2 + 1 2 = 1 .
S.3.2. We calculate the non-redundant residue code and the rank of the number X 2 = X 1 / 2 :
χ 1 2 , χ 2 2 , χ 3 2 , χ 4 2 = 2 , 5 , 0 , 3 ,
ρ 4 X 2 = 1 2 ρ 4 X 1 + Δ 1 = 1 2 1 + 1 = 1 ,
ρ 0 2 = ρ 4 X 2 2 = 1 .
S.4.2. Using the set ψ 1 2 , ψ 2 2 , ψ 3 2 , ψ 4 2 = 0 , 1 , 0 , 0 corresponding to the non-redundant residue code 2 , 5 , 0 , 3 and taking into account that ρ 0 2 = 1 , we obtain
χ 0 2 = 0 + 1 + 0 + 0 + 1 2 = 0 .
Hence, as a result of Iteration 2, we have the minimally redundant residue code 0 , 2 , 5 , 0 , 3 of the number X 2 = X 1 / 2 = X 0 / 4 = 1731 / 4 = 432 .
Iteration 3.
S.2.3. Since χ 0 2 = 0 , using the set
ψ 1 2 , ψ 2 2 , ψ 3 2 , ψ 4 2 = 0 , 1 , 0 , 0 ,
according to (25), we find
Δ 2 = i = 1 4 ψ i 2 = 1 .
S.3.3.We calculate the non-redundant residue code and the rank of the number X 3 = X 2 / 2 :
χ 1 3 , χ 2 3 , χ 3 3 , χ 4 3 = 1 , 6 , 0 , 7 ,
ρ 4 X 3 = 1 2 ρ 4 X 2 + Δ 2 = 1 2 1 + 1 = 1 ,
ρ 0 3 = ρ 4 X 3 2 = 1 .
S.4.3.Using the set ψ 1 3 , ψ 2 3 , ψ 3 3 , ψ 4 3 = 0 , 0 , 0 , 1 corresponding to the non-redundant residue code 1 , 6 , 0 , 7 and taking into account that ρ 0 3 = 1 , we get
χ 0 3 = 0 + 0 + 0 + 1 + 1 2 = 0 .
Hence, as a result of Iteration 3, we have the minimally redundant residue code 0 , 1 , 6 , 0 , 7 of the number X 3 = X 2 / 2 = X 0 / 8 = 1731 / 8 = 216 .
As far as j = l 1 = 2 , the scaling procedure ends, and the number X 3 is the desired solution.
To verify the obtained result, according to the rank form (9), we find
X 3 = i = 1 4 M i , 4 χ i , 4 3 ρ 4 X 3 M 4 = 693 · 1 + 495 · 4 + 385 · 0 + 315 · 1 1 · 3465 = = 3681 3645 = 216 .
The result is correct.
The above example shows that the use of minimally redundant RNS enables us to optimize and speed up the power-of-two scaling procedure compared with the conventional non-redundant RNS to a large extent. First of all, that is caused by the extreme simplicity of calculating the inexact rank ρ ^ k X and estimating two-valued characteristic δ k X of the initial number X as well as by the trivial operations for obtaining the rank ρ k X j j = 0 , 1 , , l 1 at each iteration of the scaling procedure (see Theorem 2).
Therefore, the proposed minimally redundant residue representation takes priority over non-redundant analogs in optimization and speed-up of the scaling and other non-modular procedures based on the CRT implementation using a rank characteristic.

7. Discussion

Let us now discuss the theoretical and practical aspects of the approach proposed in this paper.
As followed from (17), the power-of-two scaling algorithm based on the bisection method requires the parity detection of the number X j j = 0 , 1 , , l 1 at each iteration. Therefore, fast calculating the residue concerning extra modulus m 0 = 2 is a significantly important task.
In conventional non-redundant RNS, the parity detection of the number X j = χ 1 j , χ 2 j , , χ k j is usually based on estimating the integer value of X j by the use of specific positional characteristics. The generally accepted ones are the digits of mixed-radix representation, core function, the rank of a number, and interval index [1,2,3,5,8].
In RNS arithmetic, the parity check of a number refers to complicated non-modular operations requiring high computational costs. The computational complexity of this operation is comparable to the computational complexity of the reverse conversion from the residue code into the mixed-radix representation or to the calculation of the rank of a number.
Generally, in non-redundant RNS, the implementation of parallel parity check algorithm requires O ( k 2 ) modular addition operations [33,34]. So it can become computationally expensive for large values of k. Thus, for efficient implementation of the power-of-two scaling algorithm based on the bisection method, one needs to speed up and optimize the RNS parity detection technique.
In this article, the proposed approach to power-of-two scaling is based on using the rank of a number as the main RNS positional characteristic. Therefore, in our case, obtaining residue modulo m 0 = 2 is reduced to the calculation of the rank ρ k X j with the following use of the rank form (9).
Hence,
χ 0 j = X j 2 = i = 1 k M i , k χ i , k j 2 + ρ k X j M k 2 2 = i = 1 k ψ i j + ρ 0 j 2 .
Thus, determining the parity of a number has a computational complexity identical to the complexity of rank calculating concerning the numbers of required modular addition operations R M O and lookup tables R L U T . At the same time, obtaining the residue code χ 1 j + 1 , χ 2 j + 1 , , χ k j + 1 of the number X j + 1 = X j / 2 needs k additional lookup tables (see (17)).
Therefore, the computational cost of the iterative procedure of scaling by S l = 2 l consists of S M O = R M O × l modular addition operations and S L U T = R L U T + k lookup tables, whereas the time complexity is T s c a l = T i t e r × l modular clock cycles, where T i t e r is a performance time of one iteration based on the bisection method.
Thus, in conventional non-redundant RNS, the computational cost of the canonical power-of-two scaling procedure based on the bisection method (17) and the rank calculation method described in [34] is estimated as
S M O = R M O × l = 1 2 k 2 + 5 k 10 × l .
S L U T = R L U T + k = 1 2 k 2 + 3 k 2 .
The main advantage of the proposed approach to power-of-two scaling over the existing ones consists in the use of minimally redundant RNS and the novel method for calculating the rank of a number resulting from division by two (see Theorem 2) in each iteration of the scaling algorithm. This circumstance enables a significant reduction of the computational complexity of the scaling algorithm.
As follows from [34], the corresponding computational cost of calculating the rank ρ k X of the initial number X is R M O = k and R L U T = k in terms of required modular addition operations and lookup tables, respectively. Furthermore, the performance time of the rank calculation is T r a n k = log 2 k modular clock cycles (see Section 3).
It is important to note that all calculations at each iteration are implemented using the lookup tables technique and the simplest combinational logic circuits.
As shown above, the minimally redundant residue code of the number X j + 1 = X j / 2 j = 0 , 1 , , l 1 is yielded in only one modular clock cycle and needs the use of k + 1 additional lookup tables. At the same time, the first k of these lookup tables are used for obtaining the residue code χ 1 j + 1 , χ 2 j + 1 , , χ k j + 1 , while the last lookup table gives us the rank ρ k X j + 1 of the number X j + 1 (see (29) and (30)). So, at each iteration, there are no additional modular operations.
The total numbers of required modular addition operations and lookup tables are estimated, respectively, as
S M O = k
and
S L U T = 2 k + 1 ,
The time complexity of the novel power-of-two scaling algorithm is T s c a l = T r a n k + l = log 2 k + l modular clock cycles.
Thus, the use of minimally redundant RNS and novel approach to rank calculation at each iteration of power-of-two scaling (see Theorem 2) enables significant decrease of the computational complexity. The corresponding reduction factors of the computational complexity, in terms of the required modular addition operations (see (32) and (34)) and lookup tables (see (33) and (35)), are
C M O k , l = S M O S M O = k 2 + 5 k 10 2 k × l ,
C L U T k = S L U T S L U T = k 2 + 3 k 2 4 k + 2 .
Below, Table 1 and Table 2 present these reduction factors.
It should be noted that the use of the novel method for calculating the rank ρ k X j + 1 of the number X j + 1 = X j / 2 j = 0 , 1 , , l 1 at each iteration of the scaling procedure (see Theorem 2) in non-redundant RNS, gives us the following computational cost
S M O = R M O = 1 2 k 2 + 5 k 10 ,
S L U T = R L U T + ( k + 1 ) = 1 2 k 2 + 3 k .
Simultaneously, the time complexity is T s c a l = log 2 k + l + 1 modular clock cycles.
As can be seen, the reduction factors of the computational complexity of power-of-two scaling based on Theorem 2 in minimally redundant RNS compared with conventional non-redundant RNS are represented by the following fractions
C M O k = S M O S M O = k 2 + 5 k 10 2 k ,
C L U T k = S L U T S L U T = k 2 + 3 k 4 k + 2 .
In this case, as follows from (40), the reduction factor C M O k = C M O k , 1 does not depend on the value S l = 2 l   l = 0 , 1 , , Λ 1 . At the same time, C L U T k C L U T k .
The dependence of the reduction factors C M O k and C L U T k on the number of primary RNS moduli k is presented in Table 3.
Thus, the use of minimally redundant RNS and novel approach to calculating the rank of a number at each iterations of bisection method enables radically simplifying the carrying out of power-of-two scaling compared with conventional non-redundant RNS. This circumstance enables us to construct faster and optimal in computational cost RNS-oriented complicated computing procedures which widely use scaling algorithms.

8. Conclusions

As shown in this paper, the use of minimum-redundancy residue code enables the construction of efficient scaling procedures based on the CRT due to optimizing the calculation of the rank of a number, a principal positional characteristic in RNS arithmetic.
At the beginning stage of the power-of-two scaling procedure, to calculate the rank of the initial number, we apply the approach for the rank calculation proposed by one of the authors in [33,34]. It is reduced to the summation of the small word-length residues R 1 , k χ 1 , R 2 , k χ 2 , …, R k , k χ k , taking into account the number of occurred overflows during the modular addition operations modulo m k , and fast calculation of two-valued rank correction δ k X 0 , 1 (see (12) and (16)).
We propose a novel approach to power-of-two scaling based on Theorem 2. Using minimal residue code redundancy, we have optimized and sped up the rank calculation and parity determination of the numbers that result from division by two at each iteration of the bisection method. Each iteration of modular scaling by two is performed in only one modular clock cycle. Thus, owing to the proposed improvements, the power-of-two scaling procedure becomes simplest and faster than the currently known methods.
The computational complexity of the proposed scaling method by constant S l = 2 l concerning required both modular addition operations and lookup tables is estimated as k and 2 k + 1 , respectively, where k equals the number of primary non-redundant RNS moduli. The time complexity is log 2 k + l modular clock cycles.
The use of minimally redundant RNS and a novel approach to calculating the rank of a number at each iteration of the bisection method enables a significant decrease in the power-of-two scaling computational complexity. Corresponding reduction factors concerning the required modular addition operations and lookup tables are given in Table 1, Table 2 and Table 3.
The proposed approach to power-of-two scaling coincides with the development vector of modern high-performance computing using RNS arithmetic. It enables the implementation of an extensive class of tasks in various areas of science and technology, first of all in cryptography and digital signal processing.

Author Contributions

Conceptualization, M.S.; investigation, Y.P.; methodology, M.S.; writing—original draft preparation, M.S.; writing—review and editing, Y.P. All authors have read and improved the final version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Akushskii, I.Y.; Juditskii, D.I. Machine Arithmetic in Residue Classes; Soviet Radio: Moscow, Russia, 1968. (In Russian) [Google Scholar]
  2. Amerbayev, V.M. Theoretical Foundations of Machine Arithmetic; Nauka: Alma-Ata, Kazakhstan, 1976. (In Russian) [Google Scholar]
  3. Omondi, A.R.; Premkumar, B. Residue Number Systems: Theory and Implementation; Imperial College Press: London, UK, 2007. [Google Scholar]
  4. Szabo, N.S.; Tanaka, R.I. Residue Arithmetic and Its Application to Computer Technology; McGraw-Hill: New York, NY, USA, 1967. [Google Scholar]
  5. Molahosseini, A.S.; de Sousa, L.S.; Chang, C.H. (Eds.) Embedded Systems Design with Special Arithmetic and Number Systems; Springer: Cham, Switzerland, 2017. [Google Scholar]
  6. Soderstrand, M.A.; Jenkins, W.K.; Jullien, G.A.; Taylor, F.J. (Eds.) Residue Number System Arithmetic: Modern Applications in Digital Signal Processing; IEEE Press: New York, NY, USA, 1986. [Google Scholar]
  7. Chernyavsky, A.F.; Danilevich, V.V.; Kolyada, A.A.; Selyaninov, M.Y. High-Speed Methods, and Systems of Digital Information Processing; Belarusian State University: Minsk, Belarus, 1996. (In Russian) [Google Scholar]
  8. Ananda Mohan, P.V. Residue Number Systems. Theory and Applications; Springer: Cham, Switzerland, 2016. [Google Scholar]
  9. Ding, C.; Pei, D.; Salomaa, A. Chinese Remainder Theorem: Applications in Computing, Coding, Cryptography; World Scientific: Singapore, 1996. [Google Scholar]
  10. Omondi, A.R. Cryptography Arithmetic: Algorithms and Hardware Architectures; Springer: Cham, Switzerland, 2020. [Google Scholar]
  11. Chren, W.A., Jr. A new residue number division algorithm. Comput. Math. Appl. 1990, 19, 13–29. [Google Scholar] [CrossRef] [Green Version]
  12. Chiang, J.-S.; Lu, M. A general division algorithm for the Residue Number System. In Proceedings of the 10th IEEE Symposium on Computer Arithmetic, Grenoble, France, 26–28 June 1991; IEEE Computer Society Press: Washington, DC, USA, 1991; pp. 76–83. [Google Scholar]
  13. Lu, M.; Chiang, J.-S. A novel division algorithm for Residue Number Systems. IEEE Trans. Comput. 1992, 41, 1026–1032. [Google Scholar] [CrossRef]
  14. Hitz, M.A.; Kaltofen, E. Integer division in residue number systems. IEEE Trans. Comput. 1995, 44, 983–989. [Google Scholar] [CrossRef] [Green Version]
  15. Hiasat, A.A.; Abdel-Aty-Zohdy, H.S. Design and implementation of an RNS division algorithm. In Proceedings of the 13th IEEE Symposium on Computer Arithmetic, Asilomar, CA, USA, 6–9 July 1997; IEEE Computer Society Press: Washington, DC, USA, 1997; pp. 240–249. [Google Scholar]
  16. Bajard, J.-C.; Didier, L.-S.; Muller, J.-M. A new Euclidean division algorithm for residue number systems. J. VLSI Signal Process. Syst. Signal Image Video Technol. 1998, 19, 167–178. [Google Scholar] [CrossRef]
  17. Yang, Y.H.; Chang, C.C.; Chen, C.Y. A high-speed division algorithm in residue number system using parity-checking technique. Int. J. Comput. Math. 2004, 81, 775–780. [Google Scholar] [CrossRef]
  18. Chang, C.-C.; Lai, Y.-P. A division algorithm for residue numbers. Appl. Math. Comput. 2006, 172, 368–378. [Google Scholar] [CrossRef]
  19. Chang, C.-C.; Yang, J.-H. A division algorithm using bisection method in residue number system. Int. J.Comput. Consum. Control (IJ3C) 2013, 2, 59–66. [Google Scholar]
  20. Hung, C.Y.; Parhami, B. An approximate sign detection method for residue numbers and its application to RNS division. Comput. Math. Appl. 1994, 27, 23–35. [Google Scholar] [CrossRef] [Green Version]
  21. Burton, D.M. Elementary Number Theory, 7th ed.; McGraw-Hill: New York, NY, USA, 2011. [Google Scholar]
  22. Hardy, G.H.; Wright, E.M. An Introduction to the Theory of Numbers, 6th ed.; Oxford University Press: London, UK, 2008. [Google Scholar]
  23. Knuth, D.E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd ed.; Addison-Wesley: Boston, MA, USA, 1998. [Google Scholar]
  24. Shoup, V. A Computational Introduction to Number Theory and Algebra, 2nd ed.; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
  25. Shenoy, A.P.; Kumaresan, R. Fast base extension using a redundant modulus in RNS. IEEE Trans. Comput. 1989, 38, 292–297. [Google Scholar] [CrossRef]
  26. Phatak, D.S.; Houston, S.D. New distributed algorithms for fast sign detection in residue number systems (RNS). J. Parallel Distrib. Comput. 2016, 97, 78–95. [Google Scholar] [CrossRef]
  27. Vu, T.V. Efficient implementations of the Chinese Remainder Theorem for sign detection and residue decoding. IEEE Trans. Comput. 1985, 34, 646–651. [Google Scholar]
  28. Kawamura, S.; Koike, M.; Sano, F.; Shimbo, A. Cox-rower architecture for fast parallel Montgomery multiplication. In Proceedings of the EUROCRYPT’00: 19th International Conference on Theory and Application of Cryptographic Techniques, Bruges, Belgium, 14–18 May 2000; Springer: Berlin, Germany, 2000; pp. 523–538. [Google Scholar]
  29. Nozaki, H.; Motoyama, M.; Shimbo, A.; Kawamura, S. Implementation of RSA algorithm based on RNS Montgomery multiplication. In Proceedings of the CHES 2001: Cryptographic Hardware and Embedded Systems, Third International Workshop, Paris, France, 6–14 May 2001; Springer: Berlin, Germany, 2001; pp. 364–376. [Google Scholar]
  30. Isupov, K.; Knyazkov, V. Interval estimation of relative values in Residue Number System. J. Circuits, Syst. Comput. 2018, 27, 1850004. [Google Scholar] [CrossRef]
  31. Chervyakov, N.; Babenko, M.; Tchernykh, A.; Kucherov, N.; Miranda-López, V.; Cortés-Mendoza, J.M. AR-RRNS: Configurable reliable distributed data storage systems for Internet of Things to ensure security. Future Gener. Comput. Syst. 2019, 92, 1080–1092. [Google Scholar] [CrossRef]
  32. Halevi, S.; Polyakov, Y.; Shoup, V. An improved RNS variant of the BFV homomorphic encryption scheme. In Topics in Cryptology–CT-RSA 2019, Proceedings of the Cryptographers’ Track at the RSA Conference, San Francisco, CA, USA, 4–8 March 2019; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11405, pp. 83–105. [Google Scholar]
  33. Selianinau, M. An efficient implementation of the Chinese Remainder Theorem in minimally redundant Residue Number System. Comput. Sci. 2020, 21, 237–252. [Google Scholar] [CrossRef]
  34. Selianinau, M. Computationally efficient approach to implementation of the Chinese Remainder Theorem algorithm in minimally redundant Residue Number System. Theory Comput. Syst. 2021, 65, 1117–1140. [Google Scholar] [CrossRef]
  35. Jullien, G. Residue number scaling and other operations using ROM arrays. IEEE Trans. Comput. 1978, 27, 325–336. [Google Scholar] [CrossRef]
  36. Shenoy, M.A.P.; Kumaresan, R. A fast and accurate RNS scaling technique for high speed signal processing. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 929–937. [Google Scholar] [CrossRef]
  37. Barsi, F.; Pinotti, M. Fast base extension and precise scaling in RNS for look-up table implementation. IEEE Trans. Signal Process. 1995, 43, 2427–2430. [Google Scholar] [CrossRef]
  38. Garsia, A.; Lloris, A. A look up scheme for scaling in the RNS. IEEE Trans. Comput. 1999, 48, 748–751. [Google Scholar] [CrossRef]
  39. Griffin, M.; Sousa, M.; Taylor, F. Efficient scaling in the residue number system. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, UK, 23–26 May 1989; IEEE Computer Society Press: Washington, DC, USA, 1989; pp. 1075–1078. [Google Scholar]
  40. Vasilevich, L.N.; Kolyada, A.A. Scaling in residue number systems. Cybern. Syst. Anal. 1989, 25, 610–615. [Google Scholar] [CrossRef]
  41. Chernyavsky, A.F.; Kolyada, A.A.; Revinsky, V.V.; Selyaninov, M.Y.; Shabinskaja, E.V. Scaling methods in minimally redundant modular arithmetic. Proc. Natl. Acad. Sci. Belarus Phys. Math. Ser. 1998, 4, 132–137. [Google Scholar]
  42. Meyer-Base, U.; Stouraitis, T. New power-of-2 RNS scaling scheme for cell-based IC design. IEEE Trans. VLSI Syst. 2003, 11, 280–283. [Google Scholar] [CrossRef]
  43. Cardarilli, G.C.; Del Re, A.; Nannarelli, A.; Re, M. Programmable power-of-two RNS scaler and its application to a QRNS polyphase filter. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, 23–26 May 2005; IEEE Computer Society Press: Washington, DC, USA, 2005; pp. 1002–1005. [Google Scholar]
  44. Isupov, K.; Knyazkov, V.; Kuvaev, A. Fast power-of-two RNS scaling algorithm for large dynamic ranges. In Proceedings of the 2017 IVth International Conference on Engineering and Telecommunication (EnT), Moscow, Russia, 29–30 November 2017; IEEE Computer Society Press: Washington, DC, USA, 2017; pp. 135–139. [Google Scholar]
  45. Clemens, K.J. A modified definition of symmetric RNS improving scaling and overflow detection. IEEE Trans. Circuits Syst. 1985, 32, 412–413. [Google Scholar] [CrossRef]
  46. Sousa, L. 2n RNS scalers for extended 4-moduli sets. IEEE Trans. Comput. 2015, 64, 3322–3334. [Google Scholar] [CrossRef]
  47. Mustapha, K.S.; Bankas, E.K. RNS scaling algorithm for a new moduli set {22n+1 + 1,22n+1,22n+1 − 1}. Int. J. Comput. Appl. 2017, 165, 21–28. [Google Scholar]
  48. Hiasat, A. Efficient RNS scalers for the extended three-moduli set {2n − 1,2n+p,2n + 1}. IEEE Trans Comput. 2017, 66, 1253–1260. [Google Scholar] [CrossRef]
  49. Hiasat, A. New residue number system scaler for the three-moduli set {2n+1 − 1,2n,2n − 1}. Computers 2018, 3, 46. [Google Scholar] [CrossRef] [Green Version]
  50. Hiasat, A. A scaler design for the RNS three-moduli set {2n+1 − 1,2n,2n − 1} based on mixed-radix conversion. J. Circuits. Syst. Comput. 2020, 29, 2050041. [Google Scholar] [CrossRef]
  51. Taheri, M.R.; Navi, K.; Molahosseini, A.S. Efficient programmable power-of-two scaler for the three-moduli set {2n+p,2n − 1,2n+1 − 1}. ETRI J. 2020, 42, 596–607. [Google Scholar] [CrossRef]
Table 1. Dependence of the reduction factor C L U T k on the moduli number.
Table 1. Dependence of the reduction factor C L U T k on the moduli number.
Reduction FactorModuli Number
k = 5 k = 10 k = 15 k = 20 k = 25 k = 30
C L U T 1.733.054.325.596.848.10
Table 2. Dependence of the reduction factor C M O k , l on the moduli number k and scaling factor S l = 2 l .
Table 2. Dependence of the reduction factor C M O k , l on the moduli number k and scaling factor S l = 2 l .
Scaling FactorModuli Number
S l l k = 5 k = 10 k = 15 k = 20 k = 25 k = 30
8312.0021,0029.0036.7544.4052.00
16416.0028.0038.6749.0059.2069.33
32520.0035.0048.3361.2574.0086.67
64624.0042.0058.0073.5088.80104.0
128728.0049.0067.6785.75103.60121.33
256832.0056.0077.3398.00118.40138.67
512936.0063.0087.00110.25133.20156.00
10241040.0070.0096.67122.5148.00173.33
Table 3. Dependence of reduction factors C M O and C L U T on the moduli number k.
Table 3. Dependence of reduction factors C M O and C L U T on the moduli number k.
Reduction FactorModuli Number
k = 5 k = 10 k = 15 k = 20 k = 25 k = 30
C M O 4.007.009.6712.2514.8017.33
C L U T 1.813.104.845.616.868.11
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Selianinau, M.; Povstenko, Y. An Efficient CRT-Base Power-of-Two Scaling in Minimally Redundant Residue Number System. Entropy 2022, 24, 1824. https://doi.org/10.3390/e24121824

AMA Style

Selianinau M, Povstenko Y. An Efficient CRT-Base Power-of-Two Scaling in Minimally Redundant Residue Number System. Entropy. 2022; 24(12):1824. https://doi.org/10.3390/e24121824

Chicago/Turabian Style

Selianinau, Mikhail, and Yuriy Povstenko. 2022. "An Efficient CRT-Base Power-of-Two Scaling in Minimally Redundant Residue Number System" Entropy 24, no. 12: 1824. https://doi.org/10.3390/e24121824

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop