An Efficient CRT-Base Power-of-Two Scaling in Minimally Redundant Residue Number System

Selianinau, Mikhail; Povstenko, Yuriy

doi:10.3390/e24121824

by Mikhail Selianinau and Yuriy Povstenko *

Department of Mathematics and Computer Sciences, Faculty of Science and Technology, Jan Dlugosz University in Czestochowa, al. Armii Krajowej 13/15, 42-200 Czestochowa, Poland

* Author to whom correspondence should be addressed.

Entropy 2022, 24(12), 1824; https://doi.org/10.3390/e24121824

Submission received: 23 November 2022 / Revised: 9 December 2022 / Accepted: 10 December 2022 / Published: 14 December 2022

(This article belongs to the Special Issue Advances in Information and Coding Theory)


Abstract

In this paper, we consider one of the key problems in modular arithmetic. It is known that scaling in the residue number system (RNS) is a rather complicated non-modular procedure, which requires expensive and complex operations at each iteration. Hence, it is time-consuming and needs too much hardware for implementation. We propose a novel approach to power-of-two scaling based on the Chinese Remainder Theorem (CRT) and the rank form of the number representation in RNS. By using minimal redundancy of residue code, we optimize and speed up the rank calculation and parity determination of divisible integers in each iteration. The proposed enhancements make the power-of-two scaling simpler and faster than the currently known methods. After calculating the rank of the initial number, each iteration of modular scaling by two is performed in one modular clock cycle. The computational complexity of the proposed method of scaling by a constant $S_{l} = 2^{l}$, measured in required modular addition operations and lookup tables, is estimated as k and $2 k + 1$, respectively, where k equals the number of primary non-redundant RNS moduli. The time complexity is $⌈\log_{2} k⌉ + l$ modular clock cycles.

Keywords:

residue number system; Chinese Remainder Theorem; modular arithmetic; rank of a number; power-of-two scaling

1. Introduction

High-performance computing is progressing extremely rapidly, which places qualitatively new demands on number-theoretic methods and computational algorithms. That is why creating fundamentally new and efficient computing tools for fast and reliable parallel data processing is especially important. Modular computational structures occupy a special place among them, and modular arithmetic, i.e., the arithmetic of RNS, forms their mathematical basis.

The inherent parallelism and carry-free properties of RNS provide a high potential for accelerating arithmetic operation compared with conventional weighted number systems (WNS). The main advantage of RNS consists of its unique ability to decompose large integer numbers into a set of small residues and to process them in parallel in independent modular channels.

A steadily growing interest in RNS arithmetic as a unique means of carrying out high-speed calculations stimulates developments focused on providing a fundamentally new performance level when carrying out large volumes of time-consuming calculations. The modular arithmetic has attracted the considerable attention of researchers and developers in number-theoretic methods [1,2,3], computer technology [4,5], digital signal and image processing [3,5,6,7,8], cryptography [5,8,9,10], computer networks and communication systems [3,5], and other areas [9].

In this regard, one of the most promising ways in the specified area is the development of high-speed parallel modular computational structures as well as the enhancement of their functionality and optimization. In this case, the main optimization criteria are the minimum redundancy of data coding, the execution time minimization of implemented computational procedures, and the throughput maximization of the corresponding computational structures.

As is known, compared with WNS, the residue code of a number does not explicitly contain information about its integer value. Therefore, in RNS arithmetic, the implementation of operations, which require the estimating of the integer value of a number by its residue code, i.e., the evaluation of number position in the operating range, encounters specific difficulties. Such operations, in contrast to modular ones, are called non-modular.

The positional characteristics of the residue code such as core function, the rank of a number, interval index, and others, and the associated forms of number representation are of great importance for designing algorithms of non-modular operations [1,5,7,8]. The computational complexity of calculating the used positional characteristics ultimately determines the efficiency of the corresponding configuration of RNS arithmetic.

Division is one of the most complex arithmetic operations. Even in computers operating in a positional system, this operation stands apart, and its execution requires much more time than most elementary operations. In RNS arithmetic, the difficulties with division stem from its non-modular character: the residue of the quotient with respect to a primary RNS modulus is not determined by the dividend and divisor residues alone, so additional information about the integer values of the dividend and divisor is needed in one form or another [1,7]. It is no coincidence that many publications are devoted to the problem of modular division, for example [11,12,13,14,15,16,17,18,19].

Along with the general division, modular scaling, i.e., the division of the RNS number by a constant, is a commonly used operation [3,5,8,10]. This operation plays a fundamental role in constructing residue arithmetic algorithms and is of great practical importance. The need for scaling is due to several tasks, for example, to round the floating point numbers with the residue representation of the mantissa and to reduce the dynamic range in digital signal processing and long-word-length cryptography. In addition, scaling by a power of two is often one of the integral steps of more complex non-modular operations, for example, in the method of general modular division [19,20].

Thus, developing novel approaches and methods for fast scaling is highly important in high-performance computing based on parallel algorithmic structures of RNS, especially for the high-speed implementation of digital signal processing applications and public-key cryptosystems. That should make it possible to use modular arithmetic widely in various priority areas of science and technology.

In the paper, we present a new approach to the power-of-two scaling based on using minimal redundancy of residue code, the rank form of a number, and fast calculation of the rank characteristic at each iteration of the scaling procedure. Compared with the conventional non-redundant RNS, the proposed method makes it possible to optimize and speed up the non-modular scaling operation and concurrently reduces its computational complexity to a large extent.

The paper is structured as follows. Section 2 discusses the basic theoretical concepts of the research. Section 3 presents the known approaches to rank calculation. Section 4 and Section 5 describe the RNS scaling algorithms, and the mathematical basis of the rank calculation in the bisection scaling method. Section 6 presents a novel power-of-two scaling algorithm and a numerical example. Section 7 provides discussion and Section 8 concludes the paper.

2. The Basic Concepts of RNS Arithmetic

Abstract algebra and number theory [21,22] constitute the theoretical foundation of RNS arithmetic. Traditionally, the apparatus of congruences is used for the mathematical formalization of an RNS with integer ranges. Euclid’s Division Lemma plays a fundamental role in building an RNS of this type. For the ring Z of integers, it is formulated as follows.

Lemma 1 (Euclid’s Division Lemma).

For any

X \in Z

and a positive integer m, there exists a unique pair of integers

Q, R

such that

X = Q m + R,

(1)

where

R \in Z_{m} = \{0, 1, \dots, m - 1\}

.

On the set Z of integers, a non-redundant RNS is defined using pairwise coprime moduli

m_{1}, m_{2}, \dots, m_{k}

(k > 1)

by the mapping

Z \to Z_{m_{1}} \times Z_{m_{2}} \times \dots \times Z_{m_{k}}

, which assigns to each

X \in Z

the k-tuple

(χ_{1}, χ_{2}, \dots, χ_{k})

of least nonnegative residues

χ_{i} = {|X|}_{m_{i}}

of dividing X by

m_{i}

(i = 1, 2, \dots, k)

. At the same time, the notation

X = (χ_{1}, χ_{2}, \dots, χ_{k})

is used.

The residue code

(χ_{1}, χ_{2}, \dots, χ_{k})

corresponds to the set of all integers X satisfying the system of simultaneous linear congruences

\begin{cases} X \equiv χ_{1} \pmod{m_{1}}, \\ X \equiv χ_{2} \pmod{m_{2}}, \\ \dots \\ X \equiv χ_{k} \pmod{m_{k}}. \end{cases}

(2)

The following statement is true [9,23,24].

Theorem 1 (Chinese Remainder Theorem).

Let the moduli

m_{1}, m_{2}, \dots, m_{k}

be pairwise coprime, and let

M_{k} = \prod_{i = 1}^{k} m_{i}

,

M_{i, k} = M_{k} / m_{i}

,

μ_{i, k} = {|M_{i, k}^{- 1}|}_{m_{i}}

(i = 1, 2, \dots, k)

. Then the system of congruences (2) has a unique solution, the class of residues modulo

M_{k}

, defined by the congruence

X \equiv \sum_{i = 1}^{k} M_{i, k} μ_{i, k} χ_{i} (mod M_{k}) .

(3)

The practical application of the RNS assumes that each residue code

(χ_{1}, χ_{2}, \dots, χ_{k})

must correspond only to one integer number. Therefore, certain sets of representatives of residue classes are used as the number range to ensure required single-valued correspondence. Since in the given RNS it is possible to represent

M_{k}

integers, the set

Z_{M_{k}} = \{0, 1, \dots, M_{k} - 1\}

is usually used in computer applications as an RNS operating range.

Because of the above, we define modular coding as a mapping

Φ_{R N S} : Z_{M_{k}} \to Z_{m_{1}} \times Z_{m_{2}} \times \dots \times Z_{m_{k}}

, which assigns a residue code

(χ_{1}, χ_{2}, \dots, χ_{k})

to each

X \in Z_{M_{k}}

.

The decoding mapping

Φ_{R N S}^{- 1} : Z_{m_{1}} \times Z_{m_{2}} \times \dots \times Z_{m_{k}} \to Z_{M_{k}}

based on the CRT (3) executes according to the rule

X = {|\sum_{i = 1}^{k} M_{i, k} μ_{i, k} χ_{i}|}_{M_{k}} .

(4)

Applying Euclid’s Division Lemma (1), we can write

μ_{i, k} χ_{i} = {|μ_{i, k} χ_{i}|}_{m_{i}} + ⌊\frac{μ_{i, k} χ_{i}}{m_{i}}⌋ m_{i} = χ_{i, k} + ⌊\frac{μ_{i, k} χ_{i}}{m_{i}}⌋ m_{i},

(5)

where

χ_{i, k}

is a normalized residue modulo

m_{i}

:

χ_{i, k} = {|μ_{i, k} χ_{i}|}_{m_{i}} (i = 1, 2, \dots, k),

(6)

⌊x⌋

denotes the largest integer less than or equal to x.

Substituting (5) into (4), and taking into consideration (6), we have

X = {|\sum_{i = 1}^{k} M_{i, k} χ_{i, k} + M_{k} \sum_{i = 1}^{k} ⌊\frac{μ_{i, k} χ_{i}}{m_{i}}⌋|}_{M_{k}}

that is equivalent to

X = {|\sum_{i = 1}^{k} M_{i, k} χ_{i, k}|}_{M_{k}} .

(7)

Since the summands in (7) have narrower change bounds, the use of (7), which is a normalized analog of (4), is preferable for constructing RNS arithmetic.

Equation (7) is called the CRT-form of representing the integer

X = (χ_{1}, χ_{2}, \dots, χ_{k})

from the RNS number range

Z_{M_{k}}

.
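The encoding map and the two decoding rules (4) and (7) can be illustrated with a minimal Python sketch. The moduli-set (5, 7, 11, 13) and all function names are assumptions chosen for illustration; they do not come from the paper.

```python
from math import prod

MODULI = (5, 7, 11, 13)  # illustrative pairwise coprime moduli (assumption)

def encode(x, moduli=MODULI):
    """Residue code (chi_1, ..., chi_k) of x."""
    return tuple(x % m for m in moduli)

def crt_constants(moduli=MODULI):
    """Return M_k, the list of M_{i,k} = M_k / m_i, and mu_{i,k} = |M_{i,k}^{-1}|_{m_i}."""
    M = prod(moduli)
    Mi = [M // m for m in moduli]
    mu = [pow(Mi_i, -1, m) for Mi_i, m in zip(Mi, moduli)]
    return M, Mi, mu

def decode_crt(chi, moduli=MODULI):
    """Decoding by Equation (4): sum of M_{i,k} * mu_{i,k} * chi_i modulo M_k."""
    M, Mi, mu = crt_constants(moduli)
    return sum(a * u * c for a, u, c in zip(Mi, mu, chi)) % M

def decode_normalized(chi, moduli=MODULI):
    """Decoding by Equation (7), using the normalized residues (6)."""
    M, Mi, mu = crt_constants(moduli)
    chi_norm = [(u * c) % m for u, c, m in zip(mu, chi, moduli)]
    return sum(a * c for a, c in zip(Mi, chi_norm)) % M

code = encode(1234)
print(code, decode_crt(code), decode_normalized(code))
```

Both decoders return the same integer; the normalized variant only keeps each summand below M_k before the final reduction.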

The mapping

Φ_{R N S}

is an isomorphism with respect to the basic arithmetic operations. The operation

\circ \in \{+, -, \times\}

on arbitrary elements A and B given by their residue codes

A = (α_{1}, α_{2}, \dots, α_{k})

and

B = (β_{1}, β_{2}, \dots, β_{k})

is carried out by the rule

A \circ B = (α_{1}, α_{2}, \dots, α_{k}) \circ (β_{1}, β_{2}, \dots, β_{k}) =

= ({|α_{1} \circ β_{1}|}_{m_{1}}, {|α_{2} \circ β_{2}|}_{m_{2}}, \dots, {|α_{k} \circ β_{k}|}_{m_{k}}),

(8)

where

α_{i} = {|A|}_{m_{i}}, β_{i} = {|B|}_{m_{i}}, i = 1, 2, \dots, k

.

In the RNS, according to (8), the modular addition, subtraction, and multiplication are performed independently for each modulus

m_{i}

(i = 1, 2, \dots, k)

. It must be noted that (8) is correct only if the result

A \circ B

of the arithmetic operation does not go beyond the RNS number range, i.e., if

A \circ B \in Z_{M_{k}}

.
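Rule (8) can be sketched in a few lines of Python. The moduli-set and the helper names are illustrative assumptions; the last lines show what happens when the true result leaves the range.

```python
import operator

MODULI = (5, 7, 11, 13)  # illustrative pairwise coprime moduli, M_k = 5005 (assumption)

def encode(x, moduli=MODULI):
    """Residue code of x."""
    return tuple(x % m for m in moduli)

def rns_op(op, a, b, moduli=MODULI):
    """Apply op digit-wise, each channel reduced modulo its own m_i, as in (8)."""
    return tuple(op(ai, bi) % m for ai, bi, m in zip(a, b, moduli))

A, B = 123, 40
print(rns_op(operator.add, encode(A), encode(B)) == encode(A + B))
print(rns_op(operator.sub, encode(A), encode(B)) == encode(A - B))
print(rns_op(operator.mul, encode(A), encode(B)) == encode(A * B))

# 100 * 100 = 10000 exceeds M_k - 1 = 5004, so the code obtained is that of
# 10000 mod 5005 = 4995, not of the true product.
overflow = rns_op(operator.mul, encode(100), encode(100))
print(overflow == encode(4995))
```

No channel ever exchanges carries with another, which is the parallelism the text refers to.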

The RNS inherent code parallelism illustrated by (8), which consists of the decomposition of arithmetic operations on integers A and B into independent small word length operations on the like digits

α_{i}

and

β_{i}

of residue code, is the main advantage of modular arithmetic compared with the arithmetic of weighted number systems (WNS). Realizing this advantage to the fullest extent is a key strategic goal of all computer applications in the RNS.

As is known, in contrast to the positional code, the residue code

(χ_{1}, χ_{2}, \dots, χ_{k})

of the number X does not explicitly contain information about its value. Therefore, implementing RNS arithmetic operations that require calculating the so-called positional characteristics, which give information about the location of a number in the RNS range

Z_{M_{k}}

encounters specific difficulties. Such procedures, in contrast to modular ones, are called non-modular.

The efficiency factor of RNS arithmetic, to a decisive extent, is determined by the optimality of the applied non-modular procedures. At the same time, the main factor that has the most impact on the quality indicators of algorithms for non-modular operations is the computational complexity of calculating the positional characteristics of the residue code and related integer representation forms.

As for Equation (7), its direct application as the general form of integers for building non-modular procedures is practically impossible due to the complexity of straightforward implementation, especially in the case of large

M_{k}

. At the same time, the use of the specific positional characteristics enables us to obtain from (7) the relevant forms of integer representation, which have good implementation properties and make it possible to overcome the problem of time-consuming addition operations modulo

M_{k}

.

As follows from (7), the difference

\sum_{i = 1}^{k} M_{i, k} χ_{i, k} - X

is a multiple of

M_{k}

. Hence, the following equality holds

X = \sum_{i = 1}^{k} M_{i, k} χ_{i, k} - ρ_{k} (X) M_{k} .

(9)

The positional characteristic

ρ_{k} (X)

is called a rank of the number X. In essence, the rank

ρ_{k} (X)

is a CRT reconstruction coefficient that indicates how many times the upper bound

M_{k}

of the number range is exceeded when the integer value of the number X is calculated by its residue code

(χ_{1}, χ_{2}, \dots, χ_{k})

.

Equation (9) is called a rank form of the integer X.

From (9), it also follows that the rank

ρ_{k} (X)

is the quotient of the integer division of

\sum_{i = 1}^{k} M_{i, k} χ_{i, k}

by

M_{k}

.

Hence, we obtain

ρ_{k} (X) = ⌊\frac{1}{M_{k}} \sum_{i = 1}^{k} M_{i, k} χ_{i, k}⌋ = ⌊\sum_{i = 1}^{k} \frac{χ_{i, k}}{m_{i}}⌋ .

(10)

Therefore, since

χ_{i, k} \in Z_{m_{i}}

(i = 1, 2, \dots, k)

, the inequality

0 \leq ρ_{k} (X) \leq k - 1

holds.

Compared with (7), Equation (9) does not contain time-consuming reduction modulo

M_{k}

. Therefore, designing non-modular procedures in RNS arithmetic on the basis of the rank form has a substantial lead over the canonical CRT implementation.
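As a small check of the rank form (9) and of formula (10), the following Python sketch computes the rank with exact rational arithmetic and verifies the bound on it. The moduli-set and function names are assumptions for illustration, and exact fractions are used only for clarity, not as a proposed implementation technique.

```python
from fractions import Fraction
from math import floor, prod

MODULI = (5, 7, 11, 13)  # illustrative pairwise coprime moduli (assumption)

def normalized_residues(x, moduli=MODULI):
    """chi_{i,k} = |mu_{i,k} * chi_i|_{m_i}, Equation (6)."""
    M = prod(moduli)
    return [(pow(M // m, -1, m) * (x % m)) % m for m in moduli]

def rank(x, moduli=MODULI):
    """rho_k(X) = floor(sum chi_{i,k} / m_i), Equation (10)."""
    chi_norm = normalized_residues(x, moduli)
    return floor(sum(Fraction(c, m) for c, m in zip(chi_norm, moduli)))

def rank_form(x, moduli=MODULI):
    """Right-hand side of Equation (9); it should reproduce x without reduction modulo M_k."""
    M = prod(moduli)
    chi_norm = normalized_residues(x, moduli)
    return sum((M // m) * c for c, m in zip(chi_norm, moduli)) - rank(x, moduli) * M

print(rank(1234), rank_form(1234))
```

For X = 1234 the normalized residues are (4, 2, 6, 8), the unreduced sum is 11244, and subtracting 2 · 5005 returns 1234, so the rank is 2.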

3. The Approaches Currently Used to Calculate the Rank of a Number

The rank of a number as a main RNS integral characteristic was first studied in [1], and later in [2]. The rank evaluation algorithm consists of a slow k-step iterative procedure of sequential additions, modulo a large number, of specific constants defined by the chosen RNS moduli-set

\{m_{1}, m_{2}, \dots, m_{k}\}

. Moreover, the upper bound of the rank

r (X)

depends on the values of the weights

μ_{1, k}, μ_{2, k}, \dots, μ_{k, k}

(see (4)), and can be sufficiently large for most moduli-sets suitable for practical use. If we assume that the processing of such long L-bit word-length numbers

(L = ⌈\log_{2} M_{k}⌉)

is comparable in time with k operations on the small residues, then the complexity of this method is equal to

O (k^{2})

. Because of that, the given approach to the rank calculation is time-consuming and practically unacceptable for high-performance computing due to its computational complexity, especially when using huge

M_{k}

.

The so-called “extra modulus method” for rank calculation has been proposed in [25]. It rearranges the canonical CRT implementation to an exact integer equation, i.e., the same form as (9). To be able to retrieve the value of the CRT reconstruction coefficient, i.e., the rank of a number, the extra-modulus

m_{e}

must satisfy the following conditions:

m_{e} > k

, and

m_{e}

is any integer prime to

M_{k}

. In this way, the slow and challenging addition modulo

M_{k}

in the straightforward CRT implementation is replaced by subtraction and multiplication modulo

m_{e}

. Thus, we have an extra modular channel for rank calculation. This method works correctly provided that the proper redundant residue

{|X|}_{m_{e}}

is available. Hence, the “extra modulus method” is suitable for the base extension operation. At the same time, when the number under consideration results from the modular addition or subtraction operation [26], it cannot be used owing to eventual overflow or underflow, respectively. Thus, in such a case, the exact value of

{|X|}_{m_{e}}

is not available. Therefore, this method is not applicable for sign determination and magnitude comparison of two numbers in RNS.
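The idea behind the extra modulus method can be sketched by reducing the rank form (9) modulo m_e and solving for the rank. This is only one reading of the description above, not necessarily the exact formulation of [25]; the moduli-set, the choice m_e = 17, and the function names are assumptions.

```python
from math import prod

MODULI = (5, 7, 11, 13)  # illustrative primary moduli (assumption)
M_E = 17                 # illustrative extra modulus: M_E > k and coprime to M_k

def rank_extra_modulus(chi, chi_e, moduli=MODULI, m_e=M_E):
    """Recover rho from rho * M_k = sum(M_{i,k} * chi_{i,k}) - X, taken modulo m_e.

    chi   : residues of X for the primary moduli
    chi_e : the redundant residue |X|_{m_e}, assumed to be available
    """
    M = prod(moduli)
    total = 0
    for c, m in zip(chi, moduli):
        Mi = M // m
        c_norm = (pow(Mi, -1, m) * c) % m          # normalized residue chi_{i,k}
        total = (total + (Mi % m_e) * c_norm) % m_e
    # Since 0 <= rho <= k - 1 < m_e, the residue modulo m_e equals rho itself.
    return ((total - chi_e) * pow(M, -1, m_e)) % m_e

x = 1234
print(rank_extra_modulus([x % m for m in MODULI], x % M_E))
```

All arithmetic takes place in the single channel modulo m_e, so no addition modulo M_k is needed, provided the residue modulo m_e is correct.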

A different approach to evaluating the CRT reconstruction coefficient is proposed in [27,28,29]. The main idea of the so-called “fractional domain method” consists in the representation of the reconstruction coefficient r as an integer part of a sum of at most k proper fractions (see (10)). The value r is recursively estimated by approximating terms of a fraction

χ_{i, k} / m_{i}

. To avoid division by the modulus

m_{i}

in the fraction, the denominator

m_{i}

is replaced by

2^{n}

(m_{i} < 2^{n})

, while the numerator

χ_{i, k}

is approximated by its most significant

υ

bits

(υ < n)

(i = 1, 2, \dots, k)

. Since division by a power of 2 is equivalent to a simple shift, the calculation of r can be implemented by addition only.

The main drawbacks of this method consist of the following. First of all, full-precision fractional computations are required. In any case, such calculations are slower than operating on smaller word-length, and the full-precision fractional bits require substantial storage. On the other hand, the number of iterations required is of the order of the bit-length needed for the approximation. For example, the method employing a fractional interpretation of the CRT [27] needs a very high precision of

⌈\log_{2} (k M_{k})⌉

bits. The method proposed in [28,29] uses a sequential bit-by-bit manner for evaluating reconstruction coefficient r. The iterative structure of this method makes it very slow in the case of large word-length numbers.

There are also approaches to reconstruct the integer value of RNS number based on the CRT by using special moduli-sets with a limited number of moduli such as

m = 2^{n} + d

(d \in \{- 1, 0, 1\})

[5,8]. The main drawback of these methods is the small number of selected moduli, typically from three to five. Such moduli sets are suitable for efficient implementations of digital signal processing algorithms but are not at all applicable to the processing of large numbers, which are widely used in cryptography.

In recent decades, the CRT algorithm, corresponding forms of number representation, and the methods of integer reconstruction by residue code have been intensively studied, especially concerning their application in high-performance computing. The major efforts are aimed at reducing the computational complexity of calculating the main integral characteristics of residue code.

There are some new approaches for calculating an approximate value of the rank of a number which allow us to reduce the computational complexity of complicated non-modular operations in RNS arithmetic [30,31,32]. The method proposed in [30] is based on the so-called interval floating-point characteristic which provides information about the range of changes in the relative value of RNS representation. Generally, it enables us to perform effectively such operations as magnitude comparison, sign determination, and overflow detection. The concept of an approximate value of the rank of a number is introduced in [31]. This approach allows us to reduce the computational complexity of the decoding from residue code to binary representation and decrease the size of the required coefficients. Based on the properties of the approximate value and arithmetic properties of RNS, a new method for error detection, correction, and controlling computational results has been proposed. In [32], a new original general-purpose technique for CRT basis extension and scaling in RNS using floating-point arithmetic for the rank estimation is proposed for a homomorphic encryption scheme. The main algorithmic improvements focus on optimizing decryption and homomorphic multiplication in the RNS using the CRT to represent and manipulate the large coefficients in the ciphertext polynomials.

The rank positional characteristic has been thoroughly investigated in [33,34]. As shown, the rank

ρ_{k} (X)

has a simple structure, high modularity of calculation, and a small range of changes. At the same time, the rank

ρ_{k} (X)

is a sum of two small numbers, namely, the inexact rank

{\hat{ρ}}_{k} (X) < k

and two-valued rank correction

Δ_{k} (X) \in \{0, 1\}

:

ρ_{k} (X) = {\hat{ρ}}_{k} (X) + Δ_{k} (X),

(11)

where

{\hat{ρ}}_{k} (X) = ⌊\frac{1}{m_{k}} \sum_{i = 1}^{k} R_{i, k} (χ_{i})⌋

(12)

and

R_{i, k} (χ_{i}) = ⌊\frac{m_{k} χ_{i, k}}{m_{i}}⌋ = ⌊\frac{m_{k} {|μ_{i, k} χ_{i}|}_{m_{i}}}{m_{i}}⌋ (i = 1, 2, \dots, k - 1),

(13)

R_{k, k} (χ_{k}) = χ_{k, k} = {|μ_{k, k} χ_{k}|}_{m_{k}} .

(14)

In conventional non-redundant RNS, as follows from (11)–(14), the calculation of the inexact rank

{\hat{ρ}}_{k} (X)

is reduced to a summation of k small residues

R_{1, k} (χ_{1})

,

R_{2, k} (χ_{2})

, …,

R_{k, k} (χ_{k})

modulo

m_{k}

taking into account the number of the overflows occurring during the modular addition operations. At the same time, as demonstrated in [34], the main computational cost is associated with the estimation of the rank correction

Δ_{k} (X)

. Its evaluation requires concurrent modular addition operations in all independent modular channels corresponding to the primary RNS moduli

m_{1}, m_{2}, \dots, m_{k}

. These computations can be implemented easily by the pre-computation and lookup table techniques. As a result, the total number of required modular addition operations and lookup tables for rank

ρ_{k} (X)

calculation are

(k^{2} + 5 k - 10) / 2

and

(k^{2} + k - 2) / 2

, respectively.
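Equations (11)–(14) can be tried out with a short Python sketch. It computes the inexact rank from the small values R_{i,k} and obtains the correction as the difference from the exact rank. The moduli-set, the function names, and the condition m_k ≥ k − 1 (under which the difference stays in {0, 1} in this sketch) are assumptions, and the exact rank is computed with rational arithmetic purely as a reference.

```python
from fractions import Fraction
from math import floor, prod

MODULI = (5, 7, 11, 13)  # illustrative moduli with m_k = 13 >= k - 1 (assumption)

def normalized_residues(x, moduli=MODULI):
    """chi_{i,k} = |mu_{i,k} * chi_i|_{m_i}."""
    M = prod(moduli)
    return [(pow(M // m, -1, m) * (x % m)) % m for m in moduli]

def inexact_rank(x, moduli=MODULI):
    """Equations (12)-(14): floor of the sum of the R_{i,k} divided by m_k."""
    chi_norm = normalized_residues(x, moduli)
    m_k = moduli[-1]
    R = [(m_k * c) // m for c, m in zip(chi_norm[:-1], moduli[:-1])]  # (13)
    R.append(chi_norm[-1])                                            # (14)
    return sum(R) // m_k

def exact_rank(x, moduli=MODULI):
    """Reference value from Equation (10)."""
    chi_norm = normalized_residues(x, moduli)
    return floor(sum(Fraction(c, m) for c, m in zip(chi_norm, moduli)))

corrections = {exact_rank(x) - inexact_rank(x) for x in range(prod(MODULI))}
print(corrections)  # the set of rank corrections observed over the whole range
```

In a hardware setting the values R_{i,k} would typically come from lookup tables indexed by the residues; here they are recomputed on the fly.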

As shown in [34], the minimum redundancy residue code enables optimization of the rank calculation. It assumes the extension of non-redundant residue code

(χ_{1}, χ_{2}, \dots, χ_{k})

of the number X by the redundant residue

χ_{0} = {|X|}_{m_{0}}

with respect to the extra modulus

m_{0} = 2

, i.e., by adding the parity of the number X to its residue representation. Therefore, in minimally redundant RNS, the number

X \in Z_{M_{k}}

is represented by its minimally redundant residue code

(χ_{0}, χ_{1}, \dots, χ_{k})

. So, the total residue code length increases by only one bit.

The main advantage of minimally redundant RNS compared with non-redundant analogs consists of a significant simplification of calculating the rank correction

Δ_{k} (X)

and, accordingly, the rank

ρ_{k} (X)

.

The use of minimum redundancy residue code makes it possible to replace in (11) the rank correction

Δ_{k} (X)

, whose evaluation is time-consuming and requires performing addition operations in all modular channels, with a trivially calculated binary attribute

δ_{k} (X) \in \{0, 1\}

. At the same time,

ρ_{k} (X) = {\hat{ρ}}_{k} (X) + δ_{k} (X)

(15)

and

δ_{k} (X) = {|χ_{0} + \sum_{i = 1}^{k} ψ_{i} + {\hat{ρ}}_{0}|}_{2},

(16)

where

χ_{0} = {|X|}_{2}

,

ψ_{i} = {|χ_{i, k}|}_{2} = {|{|μ_{i, k} χ_{i}|}_{m_{i}}|}_{2}

, and

{\hat{ρ}}_{0} = {|{\hat{ρ}}_{k} (X)|}_{2}

.

Compared with non-redundant analogs, the use of minimally redundant RNS enables us to reduce significantly the complexity of the rank

ρ_{k} (X)

calculation both in terms of required modular addition operations and lookup tables. At the same time, the corresponding computational cost is k modular addition operations and k one-input lookup tables. The time complexity depends only on the number of primary RNS moduli and equals

T_{r a n k} = ⌈\log_{2} k⌉

modular clock cycles.
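Formulas (15) and (16) translate into a compact Python sketch. It assumes odd, pairwise coprime primary moduli with m_k ≥ k − 1; the concrete moduli-set and all names are illustrative, and the exact-rank function is included only to check the result.

```python
from fractions import Fraction
from math import floor, prod

MODULI = (5, 7, 11, 13)  # illustrative odd, pairwise coprime moduli (assumption)

def normalized(chi, moduli=MODULI):
    """chi_{i,k} = |mu_{i,k} * chi_i|_{m_i}."""
    M = prod(moduli)
    return [(pow(M // m, -1, m) * c) % m for c, m in zip(chi, moduli)]

def rank_min_redundant(chi0, chi, moduli=MODULI):
    """Rank from the minimally redundant code (chi0, chi_1, ..., chi_k)."""
    chi_norm = normalized(chi, moduli)
    m_k = moduli[-1]
    R = [(m_k * c) // m for c, m in zip(chi_norm[:-1], moduli[:-1])] + [chi_norm[-1]]
    rho_hat = sum(R) // m_k                        # inexact rank, Equation (12)
    psi = sum(c % 2 for c in chi_norm)             # parities of normalized residues
    delta = (chi0 + psi + rho_hat % 2) % 2         # Equation (16)
    return rho_hat + delta                         # Equation (15)

def exact_rank(x, moduli=MODULI):
    """Reference value from Equation (10), for checking only."""
    chi_norm = normalized([x % m for m in moduli], moduli)
    return floor(sum(Fraction(c, m) for c, m in zip(chi_norm, moduli)))

x = 4321
print(rank_min_redundant(x % 2, [x % m for m in MODULI]), exact_rank(x))
```

The only extra input compared with the non-redundant case is the single parity bit chi0.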

As shown in [34], the transition to the minimum redundant residue code enables a decrease in the computational complexity of the rank calculation from the order

O (k^{2} / 2)

to

O (k)

in terms of required modular addition operations and lookup tables. Thus, the computational complexity reduction factor increases with the number k of non-redundant moduli

m_{1}, m_{2}, \dots, m_{k}

and asymptotically approaches the threshold

k / 2

.
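The cost figures quoted above can be tabulated directly. The following Python sketch only evaluates the stated expressions for a few values of k; the function names are illustrative assumptions.

```python
def nonredundant_cost(k):
    """(modular additions, lookup tables) for rank calculation in non-redundant RNS."""
    return (k * k + 5 * k - 10) // 2, (k * k + k - 2) // 2

def min_redundant_cost(k):
    """(modular additions, lookup tables) in minimally redundant RNS."""
    return k, k

for k in (4, 8, 16, 32, 64):
    adds, luts = nonredundant_cost(k)
    r_adds, r_luts = min_redundant_cost(k)
    # Reduction factors; their ratio to k / 2 tends to 1 as k grows.
    print(k, adds, luts, adds / r_adds, luts / r_luts, k / 2)
```

Dividing the first expression by k gives k/2 + 5/2 − 5/k, and the second gives k/2 + 1/2 − 1/k, which is why both reduction factors grow like k/2.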

The use of minimally redundant RNS ensures significant optimization of calculating the rank

ρ_{k} (X)

of the number X. Moreover, this also applies to the implementation of the CRT algorithm and, correspondingly, to the execution of various non-modular operations based on it. First of all, this is due to the extremely simple evaluation of the two-valued characteristic

δ_{k} (X) \in \{0, 1\}

as well as the modular structure of the main calculation equation for the inexact rank

{\hat{ρ}}_{k} (X)

(see (12)). This circumstance radically simplifies the calculation of the rank

ρ_{k} (X)

in minimally redundant RNS in comparison with conventional non-redundant RNS and, consequently, makes it possible to construct variants of RNS arithmetic that are faster and more economical in computational complexity.

Therefore, the application of minimally redundant residue representation takes priority over conventional non-redundant RNS arithmetic to implement the scaling procedures based on the rank form of a number.

4. The Main Types of Scaling Algorithms in RNS Arithmetic

In the conventional WNS, the power-of-two scaling is performed simply by right shifting. In the RNS, compared with WNS, this procedure has substantial difficulty because it is not easily implementable due to its non-positional nature.

The classical power-of-two scaling method consists of the residue code conversion to binary representation, scaling in the conventional WNS, and converting the result back to the RNS.
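The classical route described above can be sketched as follows. The moduli-set and function names are illustrative assumptions; the point of the sketch is that the full-width integer must be reconstructed before it can be shifted.

```python
from math import prod

MODULI = (5, 7, 11, 13)  # illustrative pairwise coprime moduli (assumption)

def encode(x, moduli=MODULI):
    """Binary-to-RNS conversion."""
    return tuple(x % m for m in moduli)

def decode(chi, moduli=MODULI):
    """RNS-to-binary conversion by the CRT, Equation (4)."""
    M = prod(moduli)
    return sum((M // m) * pow(M // m, -1, m) * c for c, m in zip(chi, moduli)) % M

def scale_pow2_classical(chi, l, moduli=MODULI):
    """Convert to binary, shift right by l bits, convert back."""
    return encode(decode(chi, moduli) >> l, moduli)

print(scale_pow2_classical(encode(1234), 3) == encode(1234 // 8))
```

The shift itself is trivial; the cost lies in the two conversions, which involve arithmetic on numbers as wide as M_k.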

Unlike the WNS, the residue code does not contain explicit information about the integer value of the represented number. Therefore, in addition to its usual purpose, which consists of limiting the undesirable growth of calculation results, the scaling in RNS is also used to detect the position of integers in a particular range (i.e., to evaluate their values), rounding, and solving other similar tasks. This operation is often used in more complex non-modular procedures such as general modular division. Many different scaling algorithms, which do not require RNS-to-binary conversion, have been presented in the literature. A detailed review of the known modular scaling methods is presented in [8].

The essence of the modular scaling operation is to obtain some integer approximation

\hat{X} = ({\hat{χ}}_{1}, {\hat{χ}}_{2}, \dots, {\hat{χ}}_{k})

(i = 1, 2, \dots, k)

to the fraction

X / S

, where

X = (χ_{1}, χ_{2}, \dots, χ_{k})

is an arbitrary element of the RNS number range

Z_{M_{k}}

, and S is a constant factor (scale). The fraction

X / S

is usually approximated by the integers

⌊ X / S ⌋

and

⌈ X / S ⌉

(

⌊ x ⌋

and

⌈ x ⌉

are the floor and ceiling function of x, respectively).

The most important aspect of the scaling problem in RNS is to ensure the high flexibility of the created algorithmic tools. That implies adoption of the set

S = \{S_{0}, S_{1}, \dots, S_{Λ - 1}\}

of scales

S_{l} > 1

(l = 0, 1, \dots, Λ - 1)

which is usually chosen based on the criterion for the minimum calculating error under a given constraint on the number of scaling factors.

All known scaling techniques can be classified into four main categories:

1. scaling by the product of some RNS moduli [32,35,36,37,38],
2. scaling by an integer from the RNS number range [39,40],
3. scaling by a common fraction [41],
4. scaling by a power of two [42,43,44].

In the first group, many scaling methods take the scaling factor S as a product of l moduli, i.e., of the form

S = M_{l}

(l = 1, 2, \dots, k - 1)

[35,36,37,38]. That makes it easier to obtain the residues

{\hat{χ}}_{l + 1}, {\hat{χ}}_{l + 2}, \dots, {\hat{χ}}_{k}

of the approximation

\hat{X}

to the fraction

X / S

. The remaining residues

{\hat{χ}}_{1}, {\hat{χ}}_{2}, \dots, {\hat{χ}}_{l}

can be calculated fairly easily within the framework of procedures based on one of the base extension algorithms [2,35,45]. Due to the small word length of residues, the pre-computation and lookup table techniques are suitable for modular scaling.
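The two stages of scaling by a product of moduli can be sketched in Python. In this sketch a direct CRT reconstruction on a sub-base stands in for the base extension algorithms cited above, which is a simplification and not the method of any of those references; the moduli-set and names are assumptions.

```python
from math import prod

MODULI = (5, 7, 11, 13)  # illustrative pairwise coprime moduli (assumption)

def crt(residues, moduli):
    """Integer in [0, prod(moduli)) with the given residues."""
    M = prod(moduli)
    return sum((M // m) * pow(M // m, -1, m) * r for r, m in zip(residues, moduli)) % M

def scale_by_moduli_product(chi, l, moduli=MODULI):
    """Residue code of floor(X / S) for S = m_1 * ... * m_l, with 1 <= l < k."""
    head_m, tail_m = moduli[:l], moduli[l:]
    S = prod(head_m)
    r = crt(chi[:l], head_m)  # |X|_S, known from the first l residues
    # X - r is divisible by S, and S is invertible modulo m_{l+1}, ..., m_k.
    tail = [((c - r) * pow(S, -1, m)) % m for c, m in zip(chi[l:], tail_m)]
    # The quotient is below M_k / S, so the tail residues determine it uniquely;
    # extend it back to the first l moduli.
    q = crt(tail, tail_m)
    head = [q % m for m in head_m]
    return tuple(head + tail)

x = 4000
print(scale_by_moduli_product(tuple(x % m for m in MODULI), 2))
```

The residues for the last k − l moduli need only one subtraction and one multiplication per channel, while the first l residues are the ones that require a base extension.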

In [35], the base extension algorithm uses the reverse conversion of residue code to mixed-radix representation. The method proposed in [36] requires a redundant modulus to evaluate the CRT reconstruction coefficient, i.e., the rank of a number, to complete the base extension procedure. In [38], the suggested approach is entirely based on a lookup tables technique, while all the required tables have two inputs. At the same time, the memory costs are too high when the number of chosen moduli is sufficiently large. The method proposed in [37] enables one to carry out base extension and exact scaling without some system redundancy only by using additional lookup tables.

The CRT-base technique for modular scaling by an integer has been suggested in [39]. Here, the main idea is to approximate the CRT calculating relation for reconstructing the integer value of RNS numbers. This enables the substitution of the addition modulo

M_{k}

in the canonical CRT-decoding scheme by smaller word-length modular addition operations. In [40], the proposed method uses minimum redundancy for modular scaling by arbitrary positive scales. The distinctive feature of the algorithm consists of using the interval index as a positional characteristic of residue code. The interval index can be calculated quickly and easily by modular addition of small residues in the kth modular channel corresponding to the modulus

m_{k}

from the RNS moduli-set

\{m_{1}, m_{2}, \dots, m_{k}\}

.

In the case of arbitrary rational scale S, an efficient basis for modular scaling is the approach presented in [41]. The main feature is that for the scales of the form

S = p / q

, the numbers p and q can take any integer values for which the fraction

q X / p

does not exceed the upper bound of the RNS number range. In addition, both the number

q X

and the results of intermediate calculations may not satisfy the specified requirement.

The scaling methods in the fourth group implement division by constants of the form

S = 2^{l}

(l = 1, 2, \dots, Λ)

,

Λ \leq ⌊ \log_{2} M_{k} ⌋

[7,42,43]. General approaches to solving this task are based mainly on the bisection method. It consists of calculating the recurrence relation

X^{(j + 1)} = ⌊ X^{(j)} / 2 ⌋

for

j = 0, 1, \dots, l - 1

. In this case,

X^{(0)} = X

, and

X^{(l)} = ⌊ X / 2^{l} ⌋

. The residue

χ_{i}^{(j + 1)}

(i = 1, 2, \dots, k)

of approximation

X^{(j + 1)}

is determined as

χ_{i}^{(j + 1)} = \begin{cases} {|\frac{1}{2} χ_{i}^{(j)}|}_{m_{i}} & \text{if } X^{(j)} \text{ is even}, \\ {|\frac{1}{2} (χ_{i}^{(j)} - 1)|}_{m_{i}} & \text{if } X^{(j)} \text{ is odd}, \end{cases}

(17)

while all the primary moduli

m_{1}, m_{2}, \dots, m_{k}

are coprime odd numbers. The last condition ensures that 2 and

m_{i}

(i = 1, 2, \dots, k)

are relatively prime numbers, and, correspondingly, the existence of a modular multiplicative inverse of 2, i.e., the number

{|2^{- 1}|}_{M_{k}} = ({|2^{- 1}|}_{m_{1}}, {|2^{- 1}|}_{m_{2}}, \dots, {|2^{- 1}|}_{m_{k}})

. As follows from (17), scaling by 2 requires detecting the parity of the number

X^{(j)}

,

j = 0, 1, \dots, l - 1

. Hence, a base extension operation to an extra modulus equal to 2 is needed.
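The recurrence (17) can be illustrated with a minimal Python sketch. The function name `halve_residues` and the choice of the moduli 5, 7, 9, 11 are assumptions made for this illustration only. The parity of the current number is passed in from outside, which is precisely the information that (17) cannot extract from the residues in a non-redundant RNS.

```python
def halve_residues(residues, moduli, is_odd):
    """One step of (17): residues of floor(X / 2) from the residues of X.

    The parity flag `is_odd` must be supplied by the caller.
    """
    result = []
    for chi, m in zip(residues, moduli):
        inv2 = (m + 1) // 2  # inverse of 2 modulo an odd m, since 2 * inv2 = m + 1
        result.append(((chi - int(is_odd)) * inv2) % m)
    return result


moduli = [5, 7, 9, 11]          # illustrative pairwise coprime odd moduli
x = 1731
residues = [x % m for m in moduli]
for _ in range(3):              # three halvings, i.e., scaling by 2**3
    # In this sketch the parity is read from the integer itself.
    residues = halve_residues(residues, moduli, x % 2 == 1)
    x //= 2
print(x, residues)              # 216 [1, 6, 0, 7]
```

In a real RNS the integer value is not available, so the parity has to come from a base extension or, as in the approach developed below, from a redundant residue modulo 2.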

An iterative algorithm for scaling by the factor

S = 2^{l}

proposed in [42] is implemented in l steps. The parity of the intermediate results is checked at each iteration using the base extension operation suggested in [25]. In [43], the power-of-two scaling technique is applied to realize a digital filter in quadratic RNS. The scaling algorithm presented in [44] focuses on arbitrary moduli sets with large dynamic ranges and requires only machine-precision integer and floating-point operations. It is used for the software implementation of rounding and exponent alignment procedures in a multiple-precision RNS-based arithmetic library for parallel CPU-GPU systems.

Many modular scaling algorithms use special moduli sets with a limited number of moduli. A detailed review of some of these methods is given in [8]. The most commonly used moduli sets for efficient RNS scalers are

{2^{2 n + 1} + 1, 2^{2 n + 1}, 2^{2 n + 1} - 1}

,

{2^{n} - 1, 2^{n + p}, 2^{n} + 1}

,

{2^{n + 1} - 1, 2^{n}, 2^{n} - 1}

,

{2^{n + p}, 2^{n} - 1, 2^{n - 1} - 1}

among others [46,47,48,49,50,51]. The main drawback of such approaches is that they impose very restrictive constraints on the moduli sets. They are certainly suitable for scaling tasks in digital signal processing, but they are not suited to scaling and other non-modular operations on numbers from the large dynamic ranges that are widely used in long-word-length cryptography.

5. A Novel Approach for Calculating the Rank of a Number Resulting from Scaling by 2

In RNS, the rank

ρ_{k} (X) \in Z_{k} = \{0, 1, \dots, k - 1\}

is a principal positional characteristic since all the non-modular operations, such as magnitude comparison, sign determination, overflow detection, general division, scaling, residue-to-binary conversion, and others, can be implemented on its basis. Because the rank

ρ_{k} (X)

enables estimation of the integer value of the RNS number X, the development of efficient methods and algorithms for its calculation is of primary importance in building efficient variants of RNS arithmetic and, accordingly, high-performance modular computational structures.

Let us show that the rank form (9) of the number representation in residue arithmetic creates a basis for constructing relatively fast and sufficiently simple iterative algorithms for the implementation of division by constant

S_{l} = 2^{l}

(l = 1, 2, \dots, Λ, Λ \leq ⌊ \log_{2} M_{k} ⌋)

. In this case, the following theorem is fundamental for solving the problem of modular scaling by powers of 2.

Theorem 2.

In an RNS with pairwise coprime odd moduli

m_{1}, m_{2}, \dots, m_{k}

let an arbitrary number

X = (χ_{1}, χ_{2}, \dots, χ_{k})

from the range

Z_{M_{k}}

having rank

ρ_{k} (X)

be given. Then the rank of the integer

\hat{X} = ⌊ X / 2 ⌋

satisfies the equation

ρ_{k} (\hat{X}) = \{\begin{matrix} \frac{1}{2} (ρ_{k} (X) + \sum_{i = 1}^{k} ψ_{i}) if X is even, \\ \frac{1}{2} (ρ_{k} (X) - ρ_{k} (1) + \sum_{i = 1}^{k} ω_{i} + \sum_{i = 1}^{k} φ_{i}) if X is odd, \end{matrix}

(18)

where

ψ_{i} = {|χ_{i, k}|}_{2} = {|{|μ_{i, k} χ_{i}|}_{m_{i}}|}_{2},

(19)

ω_{i} = \{\begin{matrix} 0 if χ_{i, k} \geq μ_{i, k}, \\ 1 if χ_{i, k} < μ_{i, k}, \end{matrix}

(20)

φ_{i} = \{\begin{matrix} {|ψ_{i} + ω_{i}|}_{2} if {|μ_{i, k}|}_{2} = 0, \\ \bar{{|ψ_{i} + ω_{i}|}_{2}} if {|μ_{i, k}|}_{2} = 1, \end{matrix}

(21)

ρ_{k} (1)

is the rank of the number 1, and

\bar{x}

denotes the negation of the Boolean value x.

Proof.

As follows from the rank form (9), the number 1 in a given RNS has the following form

1 = \sum_{i = 1}^{k} M_{i, k} μ_{i, k} - ρ_{k} (1) M_{k} .

Therefore, we can write

\hat{X} = ⌊ X / 2 ⌋ = \frac{1}{2} (X - {|X|}_{2}) =

= \frac{1}{2} (\sum_{i = 1}^{k} M_{i, k} χ_{i, k} - ρ_{k} (X) M_{k} - {|X|}_{2} (\sum_{i = 1}^{k} M_{i, k} μ_{i, k} - ρ_{k} (1) M_{k})) =

= \frac{1}{2} (\sum_{i = 1}^{k} M_{i, k} (χ_{i, k} - {|X|}_{2} μ_{i, k}) - M_{k} (ρ_{k} (X) - {|X|}_{2} ρ_{k} (1))) .

(22)

Then, in accordance with Euclid’s Division Lemma (1), from (22) we have

\begin{matrix} \hat{X} = \frac{1}{2} (\sum_{i = 1}^{k} M_{i, k} ({|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}} + ⌊(χ_{i, k} - {|X|}_{2} μ_{i, k}) / m_{i}⌋ m_{i}) - \\ - M_{k} (ρ_{k} (X) - {|X|}_{2} ρ_{k} (1))) . \end{matrix}

Thus,

\begin{matrix} \hat{X} = \sum_{i = 1}^{k} \frac{1}{2} M_{i, k} {|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}} - \frac{1}{2} M_{k} (ρ_{k} (X) - {|X|}_{2} ρ_{k} (1) - \\ - \sum_{i = 1}^{k} ⌊(χ_{i, k} - {|X|}_{2} μ_{i, k}) / m_{i}⌋) . \end{matrix}

(23)

Since for each least nonnegative residue

χ \in Z_{m}

modulo an arbitrary odd modulus m, there is a unique formal quotient

{|χ / 2|}_{m}

, and

{|χ / 2|}_{m} = (χ + m {|χ|}_{2}) / 2

(see, for example, [1]), then

{\hat{χ}}_{i, k} = {|\frac{1}{2} {|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}}|}_{m_{i}} = \frac{1}{2} ({|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}} + m_{i} {|{|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}}|}_{2}) .

Therefore,

\frac{1}{2} {|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}} = {\hat{χ}}_{i, k} - \frac{1}{2} m_{i} {|{|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}}|}_{2} .

Taking this into account, from (23) we get

\begin{matrix} \hat{X} = \sum_{i = 1}^{k} M_{i, k} {\hat{χ}}_{i, k} - \frac{1}{2} M_{k} (ρ_{k} (X) - {|X|}_{2} ρ_{k} (1) - \sum_{i = 1}^{k} ⌊(χ_{i, k} - {|X|}_{2} μ_{i, k}) / m_{i}⌋ + \\ + \sum_{i = 1}^{k} {|{|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}}|}_{2}) . \end{matrix}

Hence, according to the rank form of number representation (9), we conclude that the following equation for the rank

ρ_{k} (\hat{X})

of the number

\hat{X}

is valid:

\begin{matrix} ρ_{k} (\hat{X}) = \frac{1}{2} (ρ_{k} (X) - {|X|}_{2} ρ_{k} (1) - \sum_{i = 1}^{k} ⌊(χ_{i, k} - {|X|}_{2} μ_{i, k}) / m_{i}⌋ + \\ + \sum_{i = 1}^{k} {|{|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}}|}_{2}) . \end{matrix}

(24)

If the number X is even, then

{|X|}_{2} = 0

, so that

⌊(χ_{i, k} - {|X|}_{2} μ_{i, k}) / m_{i}⌋ = ⌊χ_{i, k} / m_{i}⌋ = 0

and

{|{|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}}|}_{2} = {|χ_{i, k}|}_{2} = ψ_{i}

(i = 1, 2, \dots, k)

. Therefore, in this case, Equation (24) takes the form

ρ_{k} (\hat{X}) = \frac{1}{2} (ρ_{k} (X) + \sum_{i = 1}^{k} ψ_{i})

which corresponds to (18).

If the number X is odd, then

{|X|}_{2} = 1

, and it is easy to check that

⌊(χ_{i, k} - {|X|}_{2} μ_{i, k}) / m_{i}⌋ = ⌊(χ_{i, k} - μ_{i, k}) / m_{i}⌋ = - ω_{i},

while

{|{|χ_{i, k} - {|X|}_{2} μ_{i, k}|}_{m_{i}}|}_{2} = {|{|χ_{i, k} - μ_{i, k}|}_{m_{i}}|}_{2} = φ_{i}

(i = 1, 2, \dots, k)

, where

ω_{i}

and

φ_{i}

are two-valued quantities determined by (20) and (21), respectively. In this case, Equation (24) takes the form

ρ_{k} (\hat{X}) = \frac{1}{2} (ρ_{k} (X) - ρ_{k} (1) + \sum_{i = 1}^{k} ω_{i} + \sum_{i = 1}^{k} φ_{i})

which also corresponds to (18).

The theorem is proved. □

As follows from Theorem 2, the rank

ρ_{k} (\hat{X})

of the number

\hat{X} = ⌊ X / 2 ⌋

can be calculated quickly and simply from the known value of the rank

ρ_{k} (X)

of the initial number X. This circumstance makes it possible to optimize and significantly speed up the execution of the power-of-two scaling operation. In this case, it is not necessary at each iteration to calculate the rank of the number, which is the intermediate result of scaling, by its residue code. At the same time, the complete operation of rank calculation is necessary only for the initial number X at the preliminary stage of the scaling procedure.
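As an illustration, Theorem 2 can be checked exhaustively on a small RNS. The sketch below is not part of the original method: the moduli 5, 7, 9, 11 and all function names are assumptions chosen for the check, and the rank is computed directly from the rank form (9) with ordinary integer arithmetic rather than with a fast RNS procedure.

```python
from math import prod

MODULI = [5, 7, 9, 11]                       # illustrative pairwise coprime odd moduli
M = prod(MODULI)
M_I = [M // m for m in MODULI]               # M_{i,k} = M_k / m_i
MU = [pow(Mi % m, -1, m) for Mi, m in zip(M_I, MODULI)]  # mu_{i,k}


def normalized(x):
    """Normalized residues chi_{i,k} = |mu_{i,k} * chi_i| mod m_i."""
    return [(mu * (x % m)) % m for mu, m in zip(MU, MODULI)]


def rank(x):
    """Rank from the rank form (9): sum(M_{i,k} * chi_{i,k}) = X + rho * M_k."""
    return (sum(Mi * c for Mi, c in zip(M_I, normalized(x))) - x) // M


def rank_of_half(x):
    """Rank of floor(x / 2) as predicted by (18)-(21)."""
    norm = normalized(x)
    psi = [c % 2 for c in norm]                                   # (19)
    if x % 2 == 0:
        numerator = rank(x) + sum(psi)
    else:
        omega = [int(c < mu) for c, mu in zip(norm, MU)]          # (20)
        # (21): the parity of psi + omega, negated when mu is odd
        phi = [(p + w + mu) % 2 for p, w, mu in zip(psi, omega, MU)]
        numerator = rank(x) - rank(1) + sum(omega) + sum(phi)
    assert numerator % 2 == 0    # (18) must yield an integer
    return numerator // 2


mismatches = [x for x in range(M) if rank_of_half(x) != rank(x // 2)]
print(len(mismatches))           # 0
```

The negation in (21) is written here as the addition of the parity of mu modulo 2, which is equivalent for two-valued quantities.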

6. A Novel Power-of-Two Modular Scaling Based on the Rank Positional Characteristic in Minimally Redundant RNS

Theorem 2 implies the following step-by-step algorithm for power-of-two scaling in minimally redundant RNS with primary pairwise coprime odd moduli

m_{1}, m_{2}, \dots, m_{k}

, extra modulus

m_{0} = 2

, and scales of the form

S_{l} = 2^{l}

(l = 1, 2, \dots, Λ, Λ = ⌊ \log_{2} M_{k} ⌋)

.

S.1. Based on the minimally redundant residue code

(χ_{0}, χ_{1}, \dots, χ_{k})

of the original number X, the rank

ρ_{k} (X)

is calculated according to (12)–(16). In addition, it is assumed that

X^{(0)} = X

,

χ_{i}^{(0)} = χ_{i}

(i = 0, 1, \dots, k)

,

χ_{i, k}^{(0)} = χ_{i, k} = {|μ_{i, k} χ_{i}|}_{m_{i}}

(i = 1, 2, \dots, k)

, and

j = 0

.

S.2. For the residue number

X^{(j)} = (χ_{0}^{(j)}, χ_{1}^{(j)}, \dots, χ_{k}^{(j)})

, the integer

Δ^{(j)} = \{\begin{matrix} \sum_{i = 1}^{k} ψ_{i}^{(j)} if χ_{0}^{(j)} = 0, \\ \sum_{i = 1}^{k} ω_{i}^{(j)} + \sum_{i = 1}^{k} φ_{i}^{(j)} - ρ_{k} (1) if χ_{0}^{(j)} = 1 \end{matrix}

(25)

is calculated, where

ψ_{i}^{(j)} = {|χ_{i, k}^{(j)}|}_{2},

(26)

ω_{i}^{(j)}

and

φ_{i}^{(j)}

are obtained by formulas similar to (20) and (21), namely:

ω_{i}^{(j)} = \{\begin{matrix} 0 if χ_{i, k}^{(j)} \geq μ_{i, k}, \\ 1 if χ_{i, k}^{(j)} < μ_{i, k}, \end{matrix}

(27)

φ_{i}^{(j)} = \{\begin{matrix} {|ψ_{i}^{(j)} + ω_{i}^{(j)}|}_{2} if {|μ_{i, k}|}_{2} = 0, \\ \bar{{|ψ_{i}^{(j)} + ω_{i}^{(j)}|}_{2}} if {|μ_{i, k}|}_{2} = 1, \end{matrix}

(28)

i = 1, 2, \dots, k

.

S.3. The digits

χ_{1}^{(j + 1)}

,

χ_{2}^{(j + 1)}

, ⋯,

χ_{k}^{(j + 1)}

of the minimally redundant residue code and the rank

ρ_{k} (X^{(j + 1)})

of the number

X^{(j + 1)} = ⌊ X^{(j)} / 2 ⌋

are determined, respectively, according to the rules

χ_{i}^{(j + 1)} = {|\frac{1}{2} (χ_{i}^{(j)} - χ_{0}^{(j)})|}_{m_{i}} (i = 1, 2, \dots, k),

(29)

ρ_{k} (X^{(j + 1)}) = \frac{1}{2} (ρ_{k} (X^{(j)}) + Δ^{(j)}) .

(30)

S.4. The redundant residue

χ_{0}^{(j + 1)} = {|X^{(j + 1)}|}_{2}

is calculated according to the equation following from the rank form (9)

χ_{0}^{(j + 1)} = {|\sum_{i = 1}^{k} ψ_{i}^{(j + 1)} + ρ_{0}^{(j + 1)}|}_{2},

(31)

where

ψ_{i}^{(j + 1)} = {|χ_{i, k}^{(j + 1)}|}_{2} = {|{|μ_{i, k} χ_{i}^{(j + 1)}|}_{m_{i}}|}_{2}

and

ρ_{0}^{(j + 1)} = {|ρ_{k} (X^{(j + 1)})|}_{2}

. In essence, it determines the parity of the number

X^{(j + 1)}

.

If

j = l - 1

, then the number

X^{(j + 1)} = X^{(l)} = ⌊ X / 2^{l} ⌋

is the required number, and the scaling process ends. Otherwise, the variable j is incremented by one

(j = j + 1)

, and the jump to step S.2 is carried out.
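Steps S.1 to S.4 can be sketched in Python as follows. This is an illustrative software model under stated assumptions, not a hardware description: the function name `scale_pow2` is hypothetical, and step S.1 is replaced by a direct evaluation of the rank form (9) with big-integer arithmetic instead of the fast procedure (12)–(16).

```python
from math import prod


def scale_pow2(chi0, chi, moduli, l):
    """Return (chi0, residues, rank) of floor(X / 2**l) for odd coprime moduli."""
    M = prod(moduli)
    M_i = [M // m for m in moduli]
    mu = [pow(Mi % m, -1, m) for Mi, m in zip(M_i, moduli)]
    inv2 = [(m + 1) // 2 for m in moduli]                 # inverse of 2 mod m_i
    rho_one = (sum(a * b for a, b in zip(M_i, mu)) - 1) // M   # rank of 1

    # S.1 (stand-in): exact rank of X from the rank form (9).
    norm = [(u * c) % m for u, c, m in zip(mu, chi, moduli)]
    rho = sum(a * c for a, c in zip(M_i, norm)) // M

    for _ in range(l):
        # S.2: the integer Delta from the binary flags, see (25)-(28).
        psi = [c % 2 for c in norm]
        if chi0 == 0:
            delta = sum(psi)
        else:
            omega = [int(c < u) for c, u in zip(norm, mu)]
            phi = [(p + w + u) % 2 for p, w, u in zip(psi, omega, mu)]
            delta = sum(omega) + sum(phi) - rho_one
        # S.3: new residues (29) and new rank (30).
        chi = [((c - chi0) * h) % m for c, h, m in zip(chi, inv2, moduli)]
        rho = (rho + delta) // 2
        # S.4: new redundant residue (31), i.e., the parity of the result.
        norm = [(u * c) % m for u, c, m in zip(mu, chi, moduli)]
        chi0 = (sum(c % 2 for c in norm) + rho) % 2
    return chi0, chi, rho


print(scale_pow2(1, [1, 2, 3, 4], [5, 7, 9, 11], 3))   # (0, [1, 6, 0, 7], 1)
```

The call above uses the residue code of 1731 in the RNS with moduli 5, 7, 9, 11 and returns the code of 216. After the initial rank is known, the loop body touches only small residues and single-bit flags.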

For hardware implementation, the most important feature of the above recursive scaling algorithm is that the operations in steps S.2, S.3, and S.4 can be overlapped in time and carried out within one modular clock cycle. Consequently, after obtaining the rank

ρ_{k} (X)

, each iteration of RNS number scaling by 2, i.e., each shift of its integer value by one bit to the right, is performed in one modular clock cycle.

Since the calculation of the rank

ρ_{k} (X)

has a pipeline structure, the described scaling procedure, with an appropriate organization of computations, provides reasonably high speed at low hardware cost.

It follows from the above that all the necessary calculations within the scaling algorithm can be implemented using tabular computational structures.

For example, the calculation of the inexact rank

{\hat{ρ}}_{k} (X)

of the initial number X is reduced to a summation of the sets of small residues

〈 R_{1, k} (χ_{1}), R_{2, k} (χ_{2}), \dots, R_{k, k} (χ_{k}) 〉

modulo

m_{k}

. At the same time, we take into account the number of overflows that occur when performing these modular addition operations (see (12)–(14)). Therefore, we need k one-input lookup tables to store the given set, while the bit length of the recorded residues is

⌈\log_{2} m_{k}⌉ (i = 1, 2, \dots, k)

. At the same time, the estimation of two-valued rank correction

δ_{k} (X)

(see (16)) requires the set

〈 ψ_{1}, ψ_{2}, \dots, ψ_{k} 〉

of least significant bits of normalized residues

χ_{i, k}

(i = 1, 2, \dots, k)

of the number X (see (6)).

Similarly, the sets of binary flags

〈 ψ_{1}^{(j)}, ψ_{2}^{(j)}, \dots, ψ_{k}^{(j)} 〉

,

〈 ω_{1}^{(j)}, ω_{2}^{(j)}, \dots, ω_{k}^{(j)} 〉

and also

〈 φ_{1}^{(j)}, φ_{2}^{(j)}, \dots, φ_{k}^{(j)} 〉

(see (26)–(28)) enable us to obtain the integer

Δ^{(j)}

required for calculating the rank in the corresponding iterations of the scaling procedure

(j = 0, 1, \dots, l - 1)

. All these binary sets can also be recorded in the appropriate lookup tables.

Thus, the content of the ith lookup table corresponding to the input residue

χ_{i}^{(j)}

has the form

〈 R_{i, k} (χ_{i}^{(j)}), ψ_{i}^{(j)}, ω_{i}^{(j)}, φ_{i}^{(j)} 〉

(i = 1, 2, \dots, k), (j = 0, 1, \dots, l - 1)

.
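The content of such lookup tables can be sketched in Python. The helper name `build_tables` is hypothetical, and the formula used for the small residues, ⌊m_k χ_{i,k} / m_i⌋, is taken from the worked example in this section because Equations (13) and (14) are not reproduced here; it should be read as an assumption of the sketch.

```python
from math import prod


def build_tables(moduli):
    """For each modulus m_i, list the tuples (R, psi, omega, phi) indexed by chi_i."""
    M = prod(moduli)
    m_k = moduli[-1]
    tables = []
    for m in moduli:
        mu = pow((M // m) % m, -1, m)        # mu_{i,k}
        rows = []
        for chi in range(m):
            norm = (mu * chi) % m            # normalized residue chi_{i,k}
            small = (m_k * norm) // m        # equals chi_{k,k} itself when m = m_k
            psi = norm % 2                   # least significant bit, see (26)
            omega = int(norm < mu)           # borrow flag, see (27)
            phi = (psi + omega + mu) % 2     # see (28); negation folded into "+ mu"
            rows.append((small, psi, omega, phi))
        tables.append(rows)
    return tables


tables = build_tables([5, 7, 9, 11])
code = [1, 2, 3, 4]                          # residues of 1731
print([tables[i][c] for i, c in enumerate(code)])
# [(4, 0, 0, 0), (9, 0, 0, 1), (3, 1, 1, 0), (10, 0, 0, 0)]
```

Each table has m_i entries, so one lookup per modular channel yields the small residue and all three flags at once.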

Below we present the proposed scaling method in the form of a pseudo-code algorithm.

Let us evaluate the computational complexity of the proposed iterative power-of-two scaling method. As follows from the above, Algorithm 1 requires a total of

T_{s c a l} = T_{r a n k} + T_{i t e r} \times l

modular clock cycles. According to [33,34], in minimally redundant RNS, the time complexity of calculating the rank

ρ_{k} (X)

of the initial number X depends only on the number k of primary RNS moduli and can be evaluated as

T_{r a n k} = ⌈\log_{2} k⌉

. At the same time, all calculations within each iteration, consisting in obtaining both the minimally redundant residue code

(χ_{0}^{(j + 1)}, χ_{1}^{(j + 1)}, \dots, χ_{k}^{(j + 1)})

and the rank

ρ_{k} (X^{(j + 1)})

of the number

X^{(j + 1)} = ⌊ X^{(j)} / 2 ⌋

(j = 0, 1, \dots, l - 1)

, can be performed in one modular clock cycle by using the lookup table technique. Therefore,

T_{i t e r} = 1

. Hence, the time complexity of the algorithm is

T_{s c a l} = ⌈\log_{2} k⌉ + l

modular clock cycles.

Algorithm 1: Power-of-two scaling in minimally redundant RNS

To illustrate the power-of-two scaling of the number

X = (χ_{0}, χ_{1}, \dots, χ_{k})

based on the rank form (9) in the proposed minimally redundant RNS, we present below a numerical example.

Let us consider the RNS with the primary moduli

m_{1} = 5

,

m_{2} = 7

,

m_{3} = 9

, and

m_{4} = 11

, taking into account the excess modulus

m_{0} = 2 .

Example 1.

Suppose we wish to scale the number

X = 1731

having the minimally redundant residue code

(χ_{0}, χ_{1}, χ_{2}, χ_{3}, χ_{4}) = (1, 1, 2, 3, 4)

by the constant

S_{3} = 2^{3} = 8

.

Therefore, the number of required iterations is

l = 3

.

Before applying the proposed scaling algorithm, we list the primitive constants required in the RNS under consideration. So, we have

M_{4} = 3465

,

M_{1, 4} = 693, M_{2, 4} = 495, M_{3, 4} = 385, M_{4, 4} = 315

,

μ_{1, 4} = 2

,

μ_{2, 4} = 3

,

μ_{3, 4} = 4

,

μ_{4, 4} = 8

,

ρ_{4} (1) = 2

.
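These constants can be reproduced with a few lines of Python. The variable names are chosen for this illustration; the modular inverse is obtained with the built-in three-argument `pow`, which accepts a negative exponent from Python 3.8 onward.

```python
from math import prod

moduli = [5, 7, 9, 11]                                   # m_1, ..., m_4
M = prod(moduli)                                         # M_4
M_i = [M // m for m in moduli]                           # M_{i,4} = M_4 / m_i
mu = [pow(Mi % m, -1, m) for Mi, m in zip(M_i, moduli)]  # mu_{i,4} = |M_{i,4}^{-1}| mod m_i
# Rank of the number 1: sum(M_{i,4} * mu_{i,4}) = 1 + rho_4(1) * M_4.
rho_one = (sum(Mi * u for Mi, u in zip(M_i, mu)) - 1) // M

print(M, M_i, mu, rho_one)   # 3465 [693, 495, 385, 315] [2, 3, 4, 8] 2
```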

S.1. The rank calculation of the initial number.

First, having the non-redundant residue code

(1, 2, 3, 4)

of the number X, by using lookup tables, we obtain the following sets of residues and least-significant bits, respectively,

〈 R_{1, 4} (χ_{1}), R_{2, 4} (χ_{2}), R_{3, 4} (χ_{3}), R_{4, 4} (χ_{4}) 〉 = 〈 4, 9, 3, 10 〉

,

〈 ψ_{1}, ψ_{2}, ψ_{3}, ψ_{4} 〉 = 〈 0, 0, 1, 0 〉

.

Let us show in more detail how these values were obtained, according to (6), (19), and (13), (14), respectively, before storing in the lookup tables:

χ_{1, 4} = {|2 \cdot 1|}_{5} = 2

,

ψ_{1} = {|2|}_{2} = 0

,

χ_{2, 4} = {|3 \cdot 2|}_{7} = 6

,

ψ_{2} = {|6|}_{2} = 0

,

χ_{3, 4} = {|4 \cdot 3|}_{9} = 3

,

ψ_{3} = {|3|}_{2} = 1

,

χ_{4, 4} = {|8 \cdot 4|}_{11} = 10

,

ψ_{4} = {|10|}_{2} = 0

,

R_{1, 4} (χ_{1}) = ⌊(11 \cdot 2) / 5⌋ = 4

,

R_{2, 4} (χ_{2}) = ⌊(11 \cdot 6) / 7⌋ = 9

,

R_{3, 4} (χ_{3}) = ⌊(11 \cdot 3) / 9⌋ = 3

,

R_{4, 4} (χ_{4}) = {|8 \cdot 4|}_{11} = 10

.

Further, using the set of residues

〈 4, 9, 3, 10 〉

, according to (12), we calculate the inexact rank

{\hat{ρ}}_{4} (X) = ⌊(4 + 9 + 3 + 10) / 11⌋ = ⌊26 / 11⌋ = 2,

and also take its parity bit

{\hat{ρ}}_{0} = {|{\hat{ρ}}_{4} (X)|}_{2} = 0 .

Then, taking into account that

χ_{0} = 1

, using the set

〈 ψ_{1}, ψ_{2}, ψ_{3}, ψ_{4} 〉 = 〈 0, 0, 1, 0 〉

and

{\hat{ρ}}_{0}

, according to (16), we find two-valued correction

δ_{4} (X) = {|1 + (0 + 0 + 1 + 0) + 0|}_{2} = {|2|}_{2} = 0 .

As a result, according to (16), we get the exact rank of the initial number X

ρ_{4} (X) = {\hat{ρ}}_{4} (X) + δ_{4} (X) = 2 + 0 = 2 .

To verify the obtained result, using the rank form (9), we find

X = \sum_{i = 1}^{4} M_{i, 4} χ_{i, 4} - ρ_{4} (X) M_{4} = 693 \cdot 2 + 495 \cdot 6 + 385 \cdot 3 + 315 \cdot 10 - 2 \cdot 3465 = 1731 .
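The rank calculation of step S.1 can be mirrored in a short Python sketch. The formulas for the small residues, the inexact rank, and the two-valued correction are taken from the numbers shown in this example, since (12)–(16) are not reproduced in this section; the function name is hypothetical.

```python
from math import prod


def rank_from_small_residues(chi0, chi, moduli):
    """Return (small residues, inexact rank, correction, exact rank)."""
    M = prod(moduli)
    m_k = moduli[-1]
    mu = [pow((M // m) % m, -1, m) for m in moduli]
    norm = [(u * c) % m for u, c, m in zip(mu, chi, moduli)]
    small = [(m_k * c) // m for c, m in zip(norm, moduli)]   # R_{i,k}(chi_i)
    inexact = sum(small) // m_k          # number of overflows modulo m_k
    # Parity argument: chi0 = (sum of psi_i + rank) mod 2 because all M_{i,k}
    # and M_k are odd, so one extra bit settles the two-valued correction.
    delta = (chi0 + sum(c % 2 for c in norm) + inexact) % 2
    return small, inexact, delta, inexact + delta


print(rank_from_small_residues(1, [1, 2, 3, 4], [5, 7, 9, 11]))
# ([4, 9, 3, 10], 2, 0, 2)
```

Only additions of numbers smaller than m_k and one parity bit are involved, which is the source of the low cost of the initial rank calculation in the minimally redundant RNS.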

In addition, it is assumed that

j = 0

,

X^{(0)} = X

,

χ_{i}^{(0)} = χ_{i}

(i = \bar{0, 4})

,

χ_{i, 4}^{(0)} = χ_{i, 4}

,

ψ_{i}^{(0)} = ψ_{i}

(i = \bar{1, 4})

.

Iteration 1.

S.2.1. Since

χ_{0}^{(0)} = 1

, using the sets of binary flags (see (27) and (28))

〈 ω_{1}^{(0)}, ω_{2}^{(0)}, ω_{3}^{(0)}, ω_{4}^{(0)} 〉 = 〈 0, 0, 1, 0 〉

,

〈 φ_{1}^{(0)}, φ_{2}^{(0)}, φ_{3}^{(0)}, φ_{4}^{(0)} 〉 = 〈 0, 1, 0, 0 〉

,

according to (25), we calculate the quantity

Δ^{(0)} = \sum_{i = 1}^{4} ω_{i}^{(0)} + \sum_{i = 1}^{4} φ_{i}^{(0)} - ρ_{4} (1) = 1 + 1 - 2 = 0 .

S.3.1. We calculate the non-redundant residue code and the rank of the number

X^{(1)} = ⌊ X^{(0)} / 2 ⌋

, according to (29) and (30), respectively:

(χ_{1}^{(1)}, χ_{2}^{(1)}, χ_{3}^{(1)}, χ_{4}^{(1)}) = (0, 4, 1, 7)

,

ρ_{4} (X^{(1)}) = \frac{1}{2} (ρ_{4} (X^{(0)}) + Δ^{(0)}) = \frac{1}{2} (2 + 0) = 1

,

ρ_{0}^{(1)} = {|ρ_{4} (X^{(1)})|}_{2} = 1

.

S.4.1. Using the set

〈 ψ_{1}^{(1)}, ψ_{2}^{(1)}, ψ_{3}^{(1)}, ψ_{4}^{(1)} 〉 = 〈 0, 1, 0, 1 〉

corresponding to the non-redundant residue code

(0, 4, 1, 7)

and taking into account that

ρ_{0}^{(1)} = 1

, according to (31), we find

χ_{0}^{(1)} = {|(0 + 1 + 0 + 1) + 1|}_{2} = 1

.

Hence, as a result of Iteration 1, we have the minimally redundant residue code

(1, 0, 4, 1, 7)

of the number

X^{(1)} = ⌊ X^{(0)} / 2 ⌋ = ⌊ 1731 / 2 ⌋ = 865

.

Iteration 2.

S.2.2. Since

χ_{0}^{(1)} = 1

, using the following sets of binary flags

〈 ω_{1}^{(1)}, ω_{2}^{(1)}, ω_{3}^{(1)}, ω_{4}^{(1)} 〉 = 〈 1, 0, 0, 1 〉

,

〈 φ_{1}^{(1)}, φ_{2}^{(1)}, φ_{3}^{(1)}, φ_{4}^{(1)} 〉 = 〈 1, 0, 0, 0 〉

,

we have

Δ^{(1)} = \sum_{i = 1}^{4} ω_{i}^{(1)} + \sum_{i = 1}^{4} φ_{i}^{(1)} - ρ_{4} (1) = 2 + 1 - 2 = 1 .

S.3.2. We calculate the non-redundant residue code and the rank of the number

X^{(2)} = ⌊ X^{(1)} / 2 ⌋

:

(χ_{1}^{(2)}, χ_{2}^{(2)}, χ_{3}^{(2)}, χ_{4}^{(2)}) = (2, 5, 0, 3)

,

ρ_{4} (X^{(2)}) = \frac{1}{2} (ρ_{4} (X^{(1)}) + Δ^{(1)}) = \frac{1}{2} (1 + 1) = 1

,

ρ_{0}^{(2)} = {|ρ_{4} (X^{(2)})|}_{2} = 1

.

S.4.2. Using the set

〈 ψ_{1}^{(2)}, ψ_{2}^{(2)}, ψ_{3}^{(2)}, ψ_{4}^{(2)} 〉 = 〈 0, 1, 0, 0 〉

corresponding to the non-redundant residue code

(2, 5, 0, 3)

and taking into account that

ρ_{0}^{(2)} = 1

, we obtain

χ_{0}^{(2)} = {|(0 + 1 + 0 + 0) + 1|}_{2} = 0

.

Hence, as a result of Iteration 2, we have the minimally redundant residue code

(0, 2, 5, 0, 3)

of the number

X^{(2)} = ⌊ X^{(1)} / 2 ⌋ = ⌊ X^{(0)} / 4 ⌋ = ⌊ 1731 / 4 ⌋ = 432

.

Iteration 3.

S.2.3. Since

χ_{0}^{(2)} = 0

, using the set

〈 ψ_{1}^{(2)}, ψ_{2}^{(2)}, ψ_{3}^{(2)}, ψ_{4}^{(2)} 〉 = 〈 0, 1, 0, 0 〉

,

according to (25), we find

Δ^{(2)} = \sum_{i = 1}^{4} ψ_{i}^{(2)} = 1 .

S.3.3. We calculate the non-redundant residue code and the rank of the number

X^{(3)} = ⌊ X^{(2)} / 2 ⌋

:

(χ_{1}^{(3)}, χ_{2}^{(3)}, χ_{3}^{(3)}, χ_{4}^{(3)}) = (1, 6, 0, 7)

,

ρ_{4} (X^{(3)}) = \frac{1}{2} (ρ_{4} (X^{(2)}) + Δ^{(2)}) = \frac{1}{2} (1 + 1) = 1

,

ρ_{0}^{(3)} = {|ρ_{4} (X^{(3)})|}_{2} = 1

.

S.4.3. Using the set

〈 ψ_{1}^{(3)}, ψ_{2}^{(3)}, ψ_{3}^{(3)}, ψ_{4}^{(3)} 〉 = 〈 0, 0, 0, 1 〉

corresponding to the non-redundant residue code

(1, 6, 0, 7)

and taking into account that

ρ_{0}^{(3)} = 1

, we get

χ_{0}^{(3)} = {|(0 + 0 + 0 + 1) + 1|}_{2} = 0

.

Hence, as a result of Iteration 3, we have the minimally redundant residue code

(0, 1, 6, 0, 7)

of the number

X^{(3)} = ⌊ X^{(2)} / 2 ⌋ = ⌊ X^{(0)} / 8 ⌋ = ⌊ 1731 / 8 ⌋ = 216

.

Since

j = l - 1 = 2

, the scaling procedure ends, and the number

X^{(3)}

is the desired solution.

To verify the obtained result, according to the rank form (9), we find

\begin{matrix} X^{(3)} = & \sum_{i = 1}^{4} M_{i, 4} χ_{i, 4}^{(3)} - ρ_{4} (X^{(3)}) M_{4} = 693 \cdot 2 + 495 \cdot 4 + 385 \cdot 0 + 315 \cdot 1 - 1 \cdot 3465 = \\ = & 3681 - 3465 = 216 . \end{matrix}

The result is correct.
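The intermediate results of the example can be cross-checked by reconstructing each number from its residue code and rank through the rank form (9). The sketch below is only a verification aid; the function name `reconstruct` is an assumption of this illustration.

```python
from math import prod


def reconstruct(chi, rho, moduli):
    """Integer value from residues and rank: sum(M_{i,k} * chi_{i,k}) - rho * M_k."""
    M = prod(moduli)
    total = 0
    for c, m in zip(chi, moduli):
        M_i = M // m
        mu = pow(M_i % m, -1, m)
        total += M_i * ((mu * c) % m)    # M_{i,k} times the normalized residue
    return total - rho * M


moduli = [5, 7, 9, 11]
# (residue code, rank) of X^(0), X^(1), X^(2), X^(3) as obtained in the example
steps = [([1, 2, 3, 4], 2), ([0, 4, 1, 7], 1), ([2, 5, 0, 3], 1), ([1, 6, 0, 7], 1)]
print([reconstruct(c, r, moduli) for c, r in steps])   # [1731, 865, 432, 216]
```

Each value is the floor of half the previous one, as required by the bisection recurrence.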

The above example shows that the use of minimally redundant RNS makes it possible to optimize and considerably speed up the power-of-two scaling procedure compared with conventional non-redundant RNS. First of all, this is due to the extreme simplicity of calculating the inexact rank

{\hat{ρ}}_{k} (X)

and estimating two-valued characteristic

δ_{k} (X)

of the initial number X as well as by the trivial operations for obtaining the rank

ρ_{k} (X^{(j)})

(j = 0, 1, \dots, l - 1)

at each iteration of the scaling procedure (see Theorem 2).

Therefore, the proposed minimally redundant residue representation has an advantage over non-redundant analogs in optimizing and speeding up scaling and other non-modular procedures based on the CRT implementation using a rank characteristic.

7. Discussion

Let us now discuss the theoretical and practical aspects of the approach proposed in this paper.

As follows from (17), the power-of-two scaling algorithm based on the bisection method requires the parity detection of the number

X^{(j)}

(j = 0, 1, \dots, l - 1)

at each iteration. Therefore, fast calculation of the residue with respect to the extra modulus

m_{0} = 2

is an important task.

In conventional non-redundant RNS, the parity detection of the number

X^{(j)} = (χ_{1}^{(j)}, χ_{2}^{(j)}, \dots, χ_{k}^{(j)})

is usually based on estimating the integer value of

X^{(j)}

by the use of specific positional characteristics. The generally accepted ones are the digits of mixed-radix representation, core function, the rank of a number, and interval index [1,2,3,5,8].

In RNS arithmetic, the parity check of a number is a complicated non-modular operation with a high computational cost. Its computational complexity is comparable to that of the reverse conversion from the residue code into the mixed-radix representation or of the calculation of the rank of a number.

Generally, in non-redundant RNS, the implementation of a parallel parity check algorithm requires

O (k^{2})

modular addition operations [33,34]. So it can become computationally expensive for large values of k. Thus, for efficient implementation of the power-of-two scaling algorithm based on the bisection method, one needs to speed up and optimize the RNS parity detection technique.

In this article, the proposed approach to power-of-two scaling is based on using the rank of a number as the main RNS positional characteristic. Therefore, in our case, obtaining the residue modulo

m_{0} = 2

is reduced to the calculation of the rank

ρ_{k} (X^{(j)})

followed by the use of the rank form (9).

Hence,

χ_{0}^{(j)} = {|X^{(j)}|}_{2} = {|\sum_{i = 1}^{k} {|M_{i, k} χ_{i, k}^{(j)}|}_{2} + {|ρ_{k} (X^{(j)}) M_{k}|}_{2}|}_{2} = {|\sum_{i = 1}^{k} ψ_{i}^{(j)} + ρ_{0}^{(j)}|}_{2} .

Thus, determining the parity of a number has the same computational complexity as calculating the rank with respect to the numbers of required modular addition operations

R_{M O}

and lookup tables

R_{L U T}

. At the same time, obtaining the residue code

(χ_{1}^{(j + 1)}, χ_{2}^{(j + 1)}, \dots, χ_{k}^{(j + 1)})

of the number

X^{(j + 1)} = ⌊ X^{(j)} / 2 ⌋

needs k additional lookup tables (see (17)).

Therefore, the computational cost of the iterative procedure of scaling by

S_{l} = 2^{l}

consists of

S_{M O} = R_{M O} \times l

modular addition operations and

S_{L U T} = R_{L U T} + k

lookup tables, whereas the time complexity is

T_{s c a l} = T_{i t e r} \times l

modular clock cycles, where

T_{i t e r}

is the execution time of one iteration based on the bisection method.

Thus, in conventional non-redundant RNS, the computational cost of the canonical power-of-two scaling procedure based on the bisection method (17) and the rank calculation method described in [34] is estimated as

S_{M O} = R_{M O} \times l = \frac{1}{2} (k^{2} + 5 k - 10) \times l .

(32)

S_{L U T} = R_{L U T} + k = \frac{1}{2} (k^{2} + 3 k - 2) .

(33)

The main advantage of the proposed approach to power-of-two scaling over the existing ones consists in the use of minimally redundant RNS and the novel method for calculating the rank of a number resulting from division by two (see Theorem 2) in each iteration of the scaling algorithm. This circumstance enables a significant reduction of the computational complexity of the scaling algorithm.

As follows from [34], the corresponding computational cost of calculating the rank

ρ_{k} (X)

of the initial number X is

R_{M O}^{★} = k

and

R_{L U T}^{★} = k

in terms of required modular addition operations and lookup tables, respectively. Furthermore, the execution time of the rank calculation is

T_{r a n k}^{★} = ⌈\log_{2} k⌉

modular clock cycles (see Section 3).

It is important to note that all calculations at each iteration are implemented using the lookup tables technique and the simplest combinational logic circuits.

As shown above, the minimally redundant residue code of the number

X^{(j + 1)} = ⌊ X^{(j)} / 2 ⌋

(j = 0, 1, \dots, l - 1)

is obtained in only one modular clock cycle and requires

k + 1

additional lookup tables. At the same time, the first k of these lookup tables are used for obtaining the residue code

(χ_{1}^{(j + 1)}, χ_{2}^{(j + 1)}, \dots, χ_{k}^{(j + 1)})

, while the last lookup table gives us the rank

ρ_{k} (X^{(j + 1)})

of the number

X^{(j + 1)}

(see (29) and (30)). So, at each iteration, there are no additional modular operations.

The total numbers of required modular addition operations and lookup tables are estimated, respectively, as

S_{M O}^{★} = k

(34)

and

S_{L U T}^{★} = 2 k + 1,

(35)

The time complexity of the novel power-of-two scaling algorithm is

T_{s c a l}^{★} = T_{r a n k}^{★} + l = ⌈\log_{2} k⌉ + l

modular clock cycles.

Thus, the use of minimally redundant RNS and the novel approach to rank calculation at each iteration of power-of-two scaling (see Theorem 2) enables a significant decrease in computational complexity. The corresponding reduction factors, in terms of the required modular addition operations (see (32) and (34)) and lookup tables (see (33) and (35)), are

C_{M O} (k, l) = \frac{S_{M O}}{S_{M O}^{★}} = \frac{(k^{2} + 5 k - 10)}{2 k} \times l,

(36)

C_{L U T} (k) = \frac{S_{L U T}}{S_{L U T}^{★}} = \frac{k^{2} + 3 k - 2}{4 k + 2} .

(37)

Below, Table 1 and Table 2 present these reduction factors.
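The reduction factors (36) and (37), together with the time complexity of the proposed algorithm, are simple closed-form expressions and can be evaluated directly. The Python sketch below only evaluates these formulas for one illustrative pair k = 4, l = 3; it does not reproduce the tables of the paper, and the function names are assumptions of this illustration.

```python
from fractions import Fraction


def c_mo(k, l):
    """Reduction factor (36) for modular additions."""
    return Fraction((k * k + 5 * k - 10) * l, 2 * k)


def c_lut(k):
    """Reduction factor (37) for lookup tables."""
    return Fraction(k * k + 3 * k - 2, 4 * k + 2)


def t_scal(k, l):
    """Time complexity ceil(log2 k) + l in modular clock cycles."""
    return (k - 1).bit_length() + l      # (k - 1).bit_length() == ceil(log2 k) for k >= 1


k, l = 4, 3
print(c_mo(k, l), c_lut(k), t_scal(k, l))   # 39/4 13/9 5
```

Exact fractions are used so that no rounding is introduced; converting with `float(...)` gives decimal values if needed.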

It should be noted that the use of the novel method for calculating the rank

ρ_{k} (X^{(j + 1)})

of the number

X^{(j + 1)} = ⌊ X^{(j)} / 2 ⌋

(j = 0, 1, \dots, l - 1)

at each iteration of the scaling procedure (see Theorem 2) in non-redundant RNS, gives us the following computational cost

S_{M O}^{'} = R_{M O} = \frac{1}{2} (k^{2} + 5 k - 10),

(38)

S_{L U T}^{'} = R_{L U T} + (k + 1) = \frac{1}{2} (k^{2} + 3 k) .

(39)

Simultaneously, the time complexity is

T_{s c a l}^{'} = ⌈\log_{2} k⌉ + l + 1

modular clock cycles.

As can be seen, the reduction factors of the computational complexity of power-of-two scaling based on Theorem 2 in minimally redundant RNS compared with conventional non-redundant RNS are represented by the following fractions

C_{M O}^{'} (k) = \frac{S_{M O}^{'}}{S_{M O}^{★}} = \frac{k^{2} + 5 k - 10}{2 k},

(40)

C_{L U T}^{'} (k) = \frac{S_{L U T}^{'}}{S_{L U T}^{★}} = \frac{k^{2} + 3 k}{4 k + 2} .

(41)

In this case, as follows from (40), the reduction factor

C_{M O}^{'} (k) = C_{M O} (k, 1)

does not depend on the value

S_{l} = 2^{l}

(l = 1, 2, \dots, Λ)

. At the same time,

C_{L U T}^{'} (k) \approx C_{L U T} (k)

.

The dependence of the reduction factors

C_{M O}^{'} (k)

and

C_{L U T}^{'} (k)

on the number of primary RNS moduli k is presented in Table 3.

Thus, the use of minimally redundant RNS and the novel approach to calculating the rank of a number at each iteration of the bisection method radically simplifies power-of-two scaling compared with conventional non-redundant RNS. This makes it possible to construct faster and computationally cheaper RNS-oriented computing procedures that make wide use of scaling algorithms.

8. Conclusions

As shown in this paper, the use of minimum-redundancy residue code enables the construction of efficient scaling procedures based on the CRT due to optimizing the calculation of the rank of a number, a principal positional characteristic in RNS arithmetic.

At the initial stage of the power-of-two scaling procedure, to calculate the rank of the initial number, we apply the approach to rank calculation proposed by one of the authors in [33,34]. It is reduced to the summation of the small word-length residues

R_{1, k} (χ_{1})

,

R_{2, k} (χ_{2})

, …,

R_{k, k} (χ_{k})

, taking into account the number of overflows that occur during the modular addition operations modulo

m_{k}

, and a fast calculation of the two-valued rank correction

δ_{k} (X) \in \{0, 1\}

(see (12) and (16)).

We propose a novel approach to power-of-two scaling based on Theorem 2. Using minimal residue code redundancy, we have optimized and sped up the rank calculation and parity determination of the numbers that result from division by two at each iteration of the bisection method. Each iteration of modular scaling by two is performed in only one modular clock cycle. Thus, owing to the proposed improvements, the power-of-two scaling procedure becomes simpler and faster than the currently known methods.

The computational complexity of the proposed method of scaling by the constant

S_{l} = 2^{l}

in terms of the required modular addition operations and lookup tables is estimated as k and

2 k + 1

, respectively, where k equals the number of primary non-redundant RNS moduli. The time complexity is

⌈\log_{2} k⌉ + l

modular clock cycles.

The use of minimally redundant RNS and a novel approach to calculating the rank of a number at each iteration of the bisection method enables a significant decrease in the power-of-two scaling computational complexity. Corresponding reduction factors concerning the required modular addition operations and lookup tables are given in Table 1, Table 2 and Table 3.

The proposed approach to power-of-two scaling is in line with the development trends of modern high-performance computing using RNS arithmetic. It enables the implementation of an extensive class of tasks in various areas of science and technology, first of all in cryptography and digital signal processing.

Author Contributions

Conceptualization, M.S.; investigation, Y.P.; methodology, M.S.; writing—original draft preparation, M.S.; writing—review and editing, Y.P. All authors have read and improved the final version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Akushskii, I.Y.; Juditskii, D.I. Machine Arithmetic in Residue Classes; Soviet Radio: Moscow, Russia, 1968. (In Russian)
Amerbayev, V.M. Theoretical Foundations of Machine Arithmetic; Nauka: Alma-Ata, Kazakhstan, 1976. (In Russian)
Omondi, A.R.; Premkumar, B. Residue Number Systems: Theory and Implementation; Imperial College Press: London, UK, 2007.
Szabo, N.S.; Tanaka, R.I. Residue Arithmetic and Its Application to Computer Technology; McGraw-Hill: New York, NY, USA, 1967.
Molahosseini, A.S.; de Sousa, L.S.; Chang, C.H. (Eds.) Embedded Systems Design with Special Arithmetic and Number Systems; Springer: Cham, Switzerland, 2017.
Soderstrand, M.A.; Jenkins, W.K.; Jullien, G.A.; Taylor, F.J. (Eds.) Residue Number System Arithmetic: Modern Applications in Digital Signal Processing; IEEE Press: New York, NY, USA, 1986.
Chernyavsky, A.F.; Danilevich, V.V.; Kolyada, A.A.; Selyaninov, M.Y. High-Speed Methods, and Systems of Digital Information Processing; Belarusian State University: Minsk, Belarus, 1996. (In Russian)
Ananda Mohan, P.V. Residue Number Systems. Theory and Applications; Springer: Cham, Switzerland, 2016.
Ding, C.; Pei, D.; Salomaa, A. Chinese Remainder Theorem: Applications in Computing, Coding, Cryptography; World Scientific: Singapore, 1996.
Omondi, A.R. Cryptography Arithmetic: Algorithms and Hardware Architectures; Springer: Cham, Switzerland, 2020.
Chren, W.A., Jr. A new residue number division algorithm. Comput. Math. Appl. 1990, 19, 13–29.
Chiang, J.-S.; Lu, M. A general division algorithm for the Residue Number System. In Proceedings of the 10th IEEE Symposium on Computer Arithmetic, Grenoble, France, 26–28 June 1991; IEEE Computer Society Press: Washington, DC, USA, 1991; pp. 76–83.
Lu, M.; Chiang, J.-S. A novel division algorithm for Residue Number Systems. IEEE Trans. Comput. 1992, 41, 1026–1032.
Hitz, M.A.; Kaltofen, E. Integer division in residue number systems. IEEE Trans. Comput. 1995, 44, 983–989.
Hiasat, A.A.; Abdel-Aty-Zohdy, H.S. Design and implementation of an RNS division algorithm. In Proceedings of the 13th IEEE Symposium on Computer Arithmetic, Asilomar, CA, USA, 6–9 July 1997; IEEE Computer Society Press: Washington, DC, USA, 1997; pp. 240–249.
Bajard, J.-C.; Didier, L.-S.; Muller, J.-M. A new Euclidean division algorithm for residue number systems. J. VLSI Signal Process. Syst. Signal Image Video Technol. 1998, 19, 167–178.
Yang, Y.H.; Chang, C.C.; Chen, C.Y. A high-speed division algorithm in residue number system using parity-checking technique. Int. J. Comput. Math. 2004, 81, 775–780.
Chang, C.-C.; Lai, Y.-P. A division algorithm for residue numbers. Appl. Math. Comput. 2006, 172, 368–378.
Chang, C.-C.; Yang, J.-H. A division algorithm using bisection method in residue number system. Int. J. Comput. Consum. Control (IJ3C) 2013, 2, 59–66.
Hung, C.Y.; Parhami, B. An approximate sign detection method for residue numbers and its application to RNS division. Comput. Math. Appl. 1994, 27, 23–35.
Burton, D.M. Elementary Number Theory, 7th ed.; McGraw-Hill: New York, NY, USA, 2011.
Hardy, G.H.; Wright, E.M. An Introduction to the Theory of Numbers, 6th ed.; Oxford University Press: London, UK, 2008.
Knuth, D.E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd ed.; Addison-Wesley: Boston, MA, USA, 1998.
Shoup, V. A Computational Introduction to Number Theory and Algebra, 2nd ed.; Cambridge University Press: Cambridge, UK, 2005.
Shenoy, A.P.; Kumaresan, R. Fast base extension using a redundant modulus in RNS. IEEE Trans. Comput. 1989, 38, 292–297.
Phatak, D.S.; Houston, S.D. New distributed algorithms for fast sign detection in residue number systems (RNS). J. Parallel Distrib. Comput. 2016, 97, 78–95.
Vu, T.V. Efficient implementations of the Chinese Remainder Theorem for sign detection and residue decoding. IEEE Trans. Comput. 1985, 34, 646–651.
Kawamura, S.; Koike, M.; Sano, F.; Shimbo, A. Cox-rower architecture for fast parallel Montgomery multiplication. In Proceedings of the EUROCRYPT’00: 19th International Conference on Theory and Application of Cryptographic Techniques, Bruges, Belgium, 14–18 May 2000; Springer: Berlin, Germany, 2000; pp. 523–538.
Nozaki, H.; Motoyama, M.; Shimbo, A.; Kawamura, S. Implementation of RSA algorithm based on RNS Montgomery multiplication. In Proceedings of the CHES 2001: Cryptographic Hardware and Embedded Systems, Third International Workshop, Paris, France, 6–14 May 2001; Springer: Berlin, Germany, 2001; pp. 364–376.
Isupov, K.; Knyazkov, V. Interval estimation of relative values in Residue Number System. J. Circuits Syst. Comput. 2018, 27, 1850004.
Chervyakov, N.; Babenko, M.; Tchernykh, A.; Kucherov, N.; Miranda-López, V.; Cortés-Mendoza, J.M. AR-RRNS: Configurable reliable distributed data storage systems for Internet of Things to ensure security. Future Gener. Comput. Syst. 2019, 92, 1080–1092.
Halevi, S.; Polyakov, Y.; Shoup, V. An improved RNS variant of the BFV homomorphic encryption scheme. In Topics in Cryptology–CT-RSA 2019, Proceedings of the Cryptographers’ Track at the RSA Conference, San Francisco, CA, USA, 4–8 March 2019; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11405, pp. 83–105.
Selianinau, M. An efficient implementation of the Chinese Remainder Theorem in minimally redundant Residue Number System. Comput. Sci. 2020, 21, 237–252.
Selianinau, M. Computationally efficient approach to implementation of the Chinese Remainder Theorem algorithm in minimally redundant Residue Number System. Theory Comput. Syst. 2021, 65, 1117–1140.
Jullien, G. Residue number scaling and other operations using ROM arrays. IEEE Trans. Comput. 1978, 27, 325–336.
Shenoy, M.A.P.; Kumaresan, R. A fast and accurate RNS scaling technique for high speed signal processing. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 929–937.
Barsi, F.; Pinotti, M. Fast base extension and precise scaling in RNS for look-up table implementation. IEEE Trans. Signal Process. 1995, 43, 2427–2430.
Garsia, A.; Lloris, A. A look up scheme for scaling in the RNS. IEEE Trans. Comput. 1999, 48, 748–751.
Griffin, M.; Sousa, M.; Taylor, F. Efficient scaling in the residue number system. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, UK, 23–26 May 1989; IEEE Computer Society Press: Washington, DC, USA, 1989; pp. 1075–1078.
Vasilevich, L.N.; Kolyada, A.A. Scaling in residue number systems. Cybern. Syst. Anal. 1989, 25, 610–615.
Chernyavsky, A.F.; Kolyada, A.A.; Revinsky, V.V.; Selyaninov, M.Y.; Shabinskaja, E.V. Scaling methods in minimally redundant modular arithmetic. Proc. Natl. Acad. Sci. Belarus Phys. Math. Ser. 1998, 4, 132–137.
Meyer-Base, U.; Stouraitis, T. New power-of-2 RNS scaling scheme for cell-based IC design. IEEE Trans. VLSI Syst. 2003, 11, 280–283.
Cardarilli, G.C.; Del Re, A.; Nannarelli, A.; Re, M. Programmable power-of-two RNS scaler and its application to a QRNS polyphase filter. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, 23–26 May 2005; IEEE Computer Society Press: Washington, DC, USA, 2005; pp. 1002–1005.
Isupov, K.; Knyazkov, V.; Kuvaev, A. Fast power-of-two RNS scaling algorithm for large dynamic ranges. In Proceedings of the 2017 IVth International Conference on Engineering and Telecommunication (EnT), Moscow, Russia, 29–30 November 2017; IEEE Computer Society Press: Washington, DC, USA, 2017; pp. 135–139.
Clemens, K.J. A modified definition of symmetric RNS improving scaling and overflow detection. IEEE Trans. Circuits Syst. 1985, 32, 412–413.
Sousa, L. 2ⁿ RNS scalers for extended 4-moduli sets. IEEE Trans. Comput. 2015, 64, 3322–3334.
Mustapha, K.S.; Bankas, E.K. RNS scaling algorithm for a new moduli set {2²ⁿ⁺¹ + 1, 2²ⁿ⁺¹, 2²ⁿ⁺¹ − 1}. Int. J. Comput. Appl. 2017, 165, 21–28.
Hiasat, A. Efficient RNS scalers for the extended three-moduli set {2ⁿ − 1, 2ⁿ⁺ᵖ, 2ⁿ + 1}. IEEE Trans. Comput. 2017, 66, 1253–1260.
Hiasat, A. New residue number system scaler for the three-moduli set {2ⁿ⁺¹ − 1, 2ⁿ, 2ⁿ − 1}. Computers 2018, 3, 46.
Hiasat, A. A scaler design for the RNS three-moduli set {2ⁿ⁺¹ − 1, 2ⁿ, 2ⁿ − 1} based on mixed-radix conversion. J. Circuits Syst. Comput. 2020, 29, 2050041.
Taheri, M.R.; Navi, K.; Molahosseini, A.S. Efficient programmable power-of-two scaler for the three-moduli set {2ⁿ⁺ᵖ, 2ⁿ − 1, 2ⁿ⁺¹ − 1}. ETRI J. 2020, 42, 596–607.

Table 1. Dependence of the reduction factor $C_{LUT}(k)$ on the moduli number $k$.

Reduction factor	$k = 5$	$k = 10$	$k = 15$	$k = 20$	$k = 25$	$k = 30$
$C_{LUT}$	1.73	3.05	4.32	5.59	6.84	8.10

Table 2. Dependence of the reduction factor $C_{MO}(k, l)$ on the moduli number $k$ and scaling factor $S_{l} = 2^{l}$.

$S_{l}$	$l$	$k = 5$	$k = 10$	$k = 15$	$k = 20$	$k = 25$	$k = 30$
8	3	12.00	21.00	29.00	36.75	44.40	52.00
16	4	16.00	28.00	38.67	49.00	59.20	69.33
32	5	20.00	35.00	48.33	61.25	74.00	86.67
64	6	24.00	42.00	58.00	73.50	88.80	104.00
128	7	28.00	49.00	67.67	85.75	103.60	121.33
256	8	32.00	56.00	77.33	98.00	118.40	138.67
512	9	36.00	63.00	87.00	110.25	133.20	156.00
1024	10	40.00	70.00	96.67	122.50	148.00	173.33

Table 3. Dependence of the reduction factors $C'_{MO}$ and $C'_{LUT}$ on the moduli number $k$.

Reduction factor	$k = 5$	$k = 10$	$k = 15$	$k = 20$	$k = 25$	$k = 30$
$C'_{MO}$	4.00	7.00	9.67	12.25	14.80	17.33
$C'_{LUT}$	1.81	3.10	4.84	5.61	6.86	8.11

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Selianinau, M.; Povstenko, Y. An Efficient CRT-Base Power-of-Two Scaling in Minimally Redundant Residue Number System. Entropy 2022, 24, 1824. https://doi.org/10.3390/e24121824

