1. Introduction
The results of studies on improving the performance of computing systems show that, within the limits of positional number systems, a significant improvement cannot be expected without a considerable increase in the operating frequencies of the elements and in the hardware complexity of digital computing structures [1]. An important issue is the choice of an effective method for encoding numerical information, i.e., selecting a number representation that allows fast processing. The residue number system (RNS) is used to improve the efficiency of data encryption algorithms [2,3,4], cloud computing [5,6,7], digital signal processing [8,9], wireless networks [10], matrix computing [11,12], and artificial neural networks [13,14]. One of the computationally complex operations in the RNS is Euclidean division, i.e., division with remainder. Reducing the computational complexity of the division algorithm will expand the range of RNS applicability, for example, in the implementation of numerical methods.
Positional number systems, in which information is represented and processed in modern computing devices, have drawbacks. The main one limiting speed is the presence of inter-digit carries, which restrict the ways in which arithmetic operations can be implemented. Therefore, it seems natural to construct an arithmetic system with no inter-digit carries, i.e., a non-weighted number system. One such system is the RNS, where numbers are represented by the remainders of division by the selected bases of the system, and operations can be performed in parallel on each digit independently.
The development of computing devices based on the RNS began in the 1950s and 1960s. They were implemented in the form of modular coprocessors [15].
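If a series of positive integers $p_1, p_2, \ldots, p_n$, called moduli or bases of the system, is given, then the RNS is a system in which a positive integer $X$ is represented as a set of remainders obtained by division by the chosen bases, $X = (x_1, x_2, \ldots, x_n)$, where $x_i = X \bmod p_i$ for $i = 1, 2, \ldots, n$ [1].
It is known from number theory that if the moduli $p_i$ are pairwise coprime, then the representation of the number $X$ is unique, and $X$ satisfies the condition $0 \le X < P$, where $P = p_1 p_2 \cdots p_n$ is the dynamic range of the number representation.
For numbers $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$, the following expression holds:
$$X \circ Y = \left( |x_1 \circ y_1|_{p_1}, |x_2 \circ y_2|_{p_2}, \ldots, |x_n \circ y_n|_{p_n} \right),$$
where $\circ \in \{+, -, \times\}$.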
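The digit-wise nature of these operations can be illustrated with a minimal Python sketch. The moduli 11, 13, 17, 19 and the helper names `to_rns` and `rns_op` are assumptions made for illustration only; any set of pairwise coprime moduli would serve.

```python
import operator
from math import prod

# Illustrative pairwise coprime moduli (an assumption, not fixed by the text).
MODULI = (11, 13, 17, 19)
P = prod(MODULI)  # dynamic range of this illustrative system

def to_rns(x, moduli=MODULI):
    """Represent a non-negative integer by its remainders modulo each base."""
    return tuple(x % p for p in moduli)

def rns_op(a, b, op, moduli=MODULI):
    """Apply +, - or * digit by digit; no digit depends on any other digit."""
    return tuple(op(ai, bi) % p for ai, bi, p in zip(a, b, moduli))

x, y = to_rns(23), to_rns(128)
product = rns_op(x, y, operator.mul)      # residues of 23 * 128
difference = rns_op(x, y, operator.sub)   # residues of (23 - 128) mod P
```

Each digit is computed without reference to its neighbours, which is what permits parallel hardware; the result is correct modulo `P`, so a difference that is negative as an integer wraps around the range.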
However, despite several advantages, the RNS has the following disadvantages: the system is restricted to positive integers, it is difficult to compare numbers in magnitude and to determine whether the result of an operation is out of range, etc.
Operations in the RNS can be divided into two groups: modular, where calculations are performed for each digit independently, and non-modular, which require, to one degree or another, knowledge of the positional characteristics of the number.
In Section 2, we consider the features of division in the RNS, an approximate method based on the Chinese remainder theorem, and the Akushsky core function; a block diagram of the division algorithm is also presented. Section 3 considers examples of the division operation implementation in the RNS in the form of a computing system. Section 4 presents the main results of the work and directions for further research.
  2. Features of Division in the Residue Number System
Division is one of the primary arithmetic operations. However, in the RNS, the implementation of modular division is computationally complex. There are methods for performing division with numbers of a specific type, for example, division with zero remainder, scaling, etc. [16].
The well-known division algorithms in the RNS [17,18,19] are based on comparing and subtracting numbers.
Let a dividend $a$, a divisor $b$, a quotient $q$, and a remainder $r$ be given. Then $a = bq + r$, while $0 \le r < b$. Consider the division algorithm based on the sequential approximation of the quotient $q$ by powers of the base of the number system, i.e., for a binary system, the process consists in finding $q_i \in \{0, 1\}$ such that the equality holds
$$q = \sum_{i=0}^{k} q_i 2^i. \qquad (1)$$
Substituting (1) into the division formula, we obtain
$$a = b \sum_{i=0}^{k} q_i 2^i + r. \qquad (2)$$
Thus, the division process can be reduced to a sequence of subtractions. Let 
 enter into the representation of the quotient 
, that is, 
, then we denote 
, and 
. Substitute 
 in (2).
      
Let us continue this process. We denote . Since  is the sum of the remainder of the division and remaining members of the sequence of degrees of the number system are multiplied by the divisor, then  is always satisfied.
If  is not included in the representation of the quotient , i.e., , then . It is necessary to check the occurrence of  for which  is calculated.
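The reduction of division to a sequence of subtractions can be sketched on ordinary integers as follows. The function name and the variable names (`a` for the dividend, `b` for the divisor, `rest` for the running difference) are assumptions chosen for this illustration; the sketch covers non-negative dividends and positive divisors only.

```python
def divide_by_powers_of_two(a, b):
    """Return (quotient, remainder) by testing the powers 2**k, ..., 2**0."""
    if a < 0 or b <= 0:
        raise ValueError("expects a >= 0 and b > 0")
    # Find the highest power k with b * 2**k <= a (k = 0 if none exists).
    k = 0
    while (b << (k + 1)) <= a:
        k += 1
    quotient, rest = 0, a
    for i in range(k, -1, -1):
        delta = rest - (b << i)      # tentative subtraction of b * 2**i
        if delta >= 0:               # 2**i enters the quotient
            quotient |= 1 << i
            rest = delta
        # otherwise 2**i is skipped and rest is left unchanged
    return quotient, rest
```

For example, `divide_by_powers_of_two(2930, 23)` tests the powers from $2^6$ down to $2^0$. The only non-trivial decisions in the loop are the comparison and the sign test of `delta`, which is why these two operations dominate the cost of division in the RNS.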
In the RNS, any number $X$ is unambiguously represented by a set of residues $(x_1, x_2, \ldots, x_n)$ of dividing the number $X$ by the relatively prime moduli of the RNS $p_1, p_2, \ldots, p_n$, where $x_i = X \bmod p_i$, $P = \prod_{i=1}^{n} p_i$ is the working range of the RNS, and $0 \le X < P$. The recovery of the number $X$ from the RNS to the positional number system can be done, as in the prototype, using the approximate Chinese remainder theorem
$$\frac{X}{P} = \left| \sum_{i=1}^{n} k_i x_i \right|_1,$$
where $P_i = P / p_i$, $k_i = |P_i^{-1}|_{p_i} / p_i$, and $|P_i^{-1}|_{p_i}$ is the multiplicative inverse of $P_i$ modulo $p_i$. The application of the approximate method based on the Chinese remainder theorem is considered, in particular, in the patent [20]. However, the coefficient $k_i$ rarely turns out to be a finite fraction, and its rounding leads to accumulated errors.
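The effect of rounding the coefficients can be seen in a small Python sketch. It computes the relative position $X/P$ once with exact rational coefficients and once with floating-point coefficients. The function names and the moduli 11, 13, 17, 19 are assumptions for illustration; a hardware implementation would use fixed-point fractions of a chosen width rather than Python floats.

```python
from fractions import Fraction
from math import prod

def crt_fractions(moduli):
    """Coefficients k_i = |P_i^{-1}|_{p_i} / p_i, kept as exact fractions."""
    P = prod(moduli)
    return [Fraction(pow(P // p, -1, p), p) for p in moduli]

def relative_position(residues, moduli):
    """Exact value of X / P: the fractional part of sum(k_i * x_i)."""
    total = sum(k * x for k, x in zip(crt_fractions(moduli), residues))
    return total % 1

def relative_position_float(residues, moduli):
    """Same quantity with rounded (floating-point) coefficients."""
    ks = [float(k) for k in crt_fractions(moduli)]
    return sum(k * x for k, x in zip(ks, residues)) % 1.0

moduli = (11, 13, 17, 19)
residues = (1, 10, 6, 4)                     # residues of the number 23
exact = relative_position(residues, moduli)  # a Fraction equal to 23 / P
approx = relative_position_float(residues, moduli)
```

With exact fractions, multiplying `exact` by $P$ returns the integer itself. The floating-point variant is only close to that value, and the gap grows with the dynamic range, which is the accumulation of rounding errors mentioned above.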
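The sign in the RNS is most often introduced by dividing the range into two parts. Then, taking into account the dynamic range $P$, the RNS can represent the numbers
$-\frac{P-1}{2} \le X \le \frac{P-1}{2}$, if $P$ is odd,
$-\frac{P}{2} \le X \le \frac{P}{2} - 1$, if $P$ is even.
Then,
$X$ is positive if $0 \le X < \frac{P}{2}$ when $P$ is even, or $0 \le X \le \frac{P-1}{2}$ when $P$ is odd,
$X$ is negative if $\frac{P}{2} \le X < P$ when $P$ is even, or $\frac{P-1}{2} < X < P$ when $P$ is odd.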
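The split of the range into a positive and a negative half can be written down as a short Python sketch. It works on the positional value of the code word, so it only illustrates the convention; the function names are assumptions, and the boundaries follow the common convention in which the upper half of $[0, P)$ encodes negative numbers.

```python
def encode_signed(value, P):
    """Map a signed integer to its code word in [0, P)."""
    return value % P

def is_negative(code, P):
    """True if the code word lies in the upper (negative) half of the range."""
    half = P // 2                      # P/2 if P is even, (P-1)/2 if P is odd
    return code >= half if P % 2 == 0 else code > half

def decode_signed(code, P):
    """Recover the signed integer from its code word."""
    return code - P if is_negative(code, P) else code
```

In the RNS itself the positional value is not available, so the test `is_negative` has to be replaced by a comparison with the middle of the range carried out through a positional characteristic.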
To perform division according to formula (2), it is necessary to compare RNS numbers and determine their signs.
Since the RNS is a non-weighted number system, comparing numbers and determining the sign, i.e., finding the position of a number on the number line, requires calculating a positional characteristic. An example of a positional characteristic is the Chinese remainder theorem with fractions used in the prototype. Another positional characteristic is the core function introduced by I. Ya. Akushsky [21,22]:
$$C(X) = \sum_{i=1}^{n} w_i \left\lfloor \frac{X}{p_i} \right\rfloor.$$
The numbers $w_i$, called weights, can be arbitrary. They define each specific core function and can vary depending on the task. The essential property of the core function is that its maximum range can vary and can be significantly less than the number $P$, depending on the choice of weights. For example, one can choose $C(P)$ to be an arbitrary value that has the properties necessary for solving a specific problem. The values of the core function $C(X)$, specified by the weights $w_i$, under the condition $0 \le X < P$, $0 \le C(X) < C(P)$, can be calculated using the formula
$$C(X) = \left| \sum_{i=1}^{n} x_i C(B_i) \right|_{C(P)}, \qquad (4)$$
where $B_i$ are the orthogonal bases of the RNS. However, in general, the core function does not have the monotonicity required for comparing numbers.
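As a minimal sketch, the core function can be evaluated in two ways: directly from its definition as a weighted sum of integer quotients, and from the residues alone through the core values of the orthogonal bases. The moduli (3, 5, 7), the weights (0, 1, 1), and all function names below are hypothetical choices for illustration; they are not the parameters produced by the construction in this paper.

```python
from math import prod

def core_direct(x, moduli, weights):
    """Core function from its definition: sum of w_i * floor(x / p_i)."""
    return sum(w * (x // p) for w, p in zip(weights, moduli))

def core_tables(moduli, weights):
    """Return C(P) and the core values C(B_i) of the orthogonal bases."""
    P = prod(moduli)
    c_p = sum(w * (P // p) for w, p in zip(weights, moduli))
    # Orthogonal basis B_i = P_i * |P_i^{-1}|_{p_i}, with P_i = P / p_i.
    bases = [(P // p) * pow(P // p, -1, p) for p in moduli]
    c_b = [core_direct(b, moduli, weights) % c_p for b in bases]
    return c_p, c_b

def core_from_residues(residues, c_p, c_b):
    """Core function computed from residues only, reduced modulo C(P)."""
    return sum(x * c for x, c in zip(residues, c_b)) % c_p

moduli, weights = (3, 5, 7), (0, 1, 1)   # hypothetical parameters
c_p, c_b = core_tables(moduli, weights)
```

With non-negative weights that are not all zero, the direct value always lies in $[0, C(P))$, so both evaluations agree over the whole range. With weights of mixed sign this need not hold, which is where critical cores appear.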
To construct a core function with specified properties, we use the following algorithm (Algorithm 1).
      
| Algorithm 1: Selection of parameters for the core function of a special type for a given set of moduli. |
| Input: Set of RNS moduli . It is required to construct a core function with a modulus of a special type with  and non-negative coefficients.
Output: Coefficients  of the constructed core function.
1. Let .
2. For the given value  calculate , , and , where .
3. Calculate Q by the formula .
4. If , then  and go to step 2. Otherwise, go to step 5.
5. Choose such a  that . Calculate  for .
6. Check the conditions for the absence of critical cores from below:  and the absence of critical cores from above: , for all . If it does not hold,  and go to step 2.
end. |
The core function with the given properties is given by expression (4), where
      
To compare numbers, let us use the following Algorithm 2.
      
| Algorithm 2: Comparison of numbers represented in the RNS using a core function with non-negative coefficients. | 
| Input: and Output: ,  or end.
 | 
In this case, non-negative coefficients  are taken, and  is the first non-zero coefficient.
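A comparison of this kind can be sketched in Python as follows. The sketch assumes a core function with non-negative weights: the core values are compared first, and if they coincide, the residues for the modulus of the first non-zero weight decide. The moduli (3, 5, 7), the weights (0, 1, 1), and the names `make_core` and `compare` are hypothetical; this is an illustration of the idea rather than a transcription of Algorithm 2.

```python
from math import prod

def make_core(moduli, weights):
    """Build a residue-only core evaluator and the index used for ties."""
    P = prod(moduli)
    c_p = sum(w * (P // p) for w, p in zip(weights, moduli))
    c_b = []
    for p in moduli:
        basis = (P // p) * pow(P // p, -1, p)          # orthogonal basis B_i
        c_b.append(sum(w * (basis // q) for w, q in zip(weights, moduli)) % c_p)
    k = next(i for i, w in enumerate(weights) if w > 0)  # first non-zero weight

    def core(residues):
        return sum(x * c for x, c in zip(residues, c_b)) % c_p

    return core, k

def compare(x, y, core, k):
    """Return -1, 0 or 1 for X < Y, X == Y, X > Y (X, Y given by residues)."""
    cx, cy = core(x), core(y)
    if cx != cy:
        return -1 if cx < cy else 1
    # Equal cores: both numbers share floor(X / p_k), so residue k decides.
    return (x[k] > y[k]) - (x[k] < y[k])

moduli, weights = (3, 5, 7), (0, 1, 1)   # hypothetical parameters
core, k = make_core(moduli, weights)
```

The tie-break is sound because, with non-negative weights, equal core values force equal integer quotients for every modulus whose weight is non-zero.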
To determine the sign of a number, it is necessary to construct a core function such that  for positive numbers and  for negative numbers, where  if  is even ( if  is odd).  is the middle of the RNS range. Therefore, use Algorithm 2 for  and .
Let us consider an example of division in RNS based on function (4) and Algorithm 2.
We take  as RNS. Then , , , , ,  , , , , , , . Using Algorithm 1, we obtain , .
	  
Then, the auxiliary values  are equal to .
The core function takes the form
      
The middle of the RNS range is , for which .
We find the quotient of dividing  by  Let us check the signs of the dividend and the divisor, for which we calculate  and :
, the number is negative,
, the number is positive.
Since the dividend and the divisor have different signs, the result will be negative. To perform the division on absolute values, we replace the dividend with its opposite value. In the RNS, this is done by subtracting each remainder from the corresponding modulus.
      
We get 
. Representations of powers “2” in RNS can be calculated in advance, depending on the range of RNS (the highest occurrence power of 
 is 
) and stored in memory:
      
        
      
      
      
      
     The highest possible degree of quotient when performing division is equal to the dimension of the dividend. It is necessary to multiply the divisor sequentially by 
 to find the number for which the values of the core function satisfy the expression 
.
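A table of powers of 2 in residue form, and the multiplication of a divisor by its entries, can be sketched as follows. The moduli 11, 13, 17, 19 are an assumption; they are chosen because they reproduce the residue tuples (2,2,2,2), (9,12,13,7), and (7,11,9,14) quoted for powers of 2 in the Discussion, and the divisor (1,10,6,4) used there. The function names are hypothetical.

```python
from math import prod

MODULI = (11, 13, 17, 19)   # assumed moduli, see the note above
P = prod(MODULI)

def powers_of_two_table(moduli=MODULI):
    """Residues of 2**0, 2**1, ... for every power smaller than the range P."""
    table = [tuple(1 % p for p in moduli)]
    for _ in range(prod(moduli).bit_length() - 1):
        # Doubling is done digit by digit, entirely inside the RNS.
        table.append(tuple((2 * r) % p for r, p in zip(table[-1], moduli)))
    return table

def rns_mul(a, b, moduli=MODULI):
    """Digit-wise modular multiplication."""
    return tuple((x * y) % p for x, y, p in zip(a, b, moduli))

POWERS = powers_of_two_table()
divisor = (1, 10, 6, 4)                     # residues of 23
scaled = rns_mul(divisor, POWERS[7])        # residues of 23 * 2**7
```

Storing the table in memory replaces repeated doubling with a lookup; only the entries whose value stays below the range are meaningful for the search of the highest power.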
      
Using the formula (2), we calculate 
:
It means that 
 is not included in the representation of the quotient 
. Let us check 
 by calculating 
:
It means that 
 is included in the representation of the quotient 
. Let us check 
, by calculating 
:
It means that 
 is included in the representation of the quotient 
. Let us check 
, by calculating 
:
It means that 
 is included in the representation of the quotient 
. Let us check 
 by calculating 
:
It means that 
 is included in the representation of the quotient 
. Let us check 
 by calculating 
:
It means that 
 is included in the representation of the quotient 
. Let us check 
 by calculating 
:
It means that 
 is included in the representation of the quotient 
. Let us check 
 by calculating 
:
It means that  is included in the representation of the quotient .
Since the result must be negative, then 
. Let us check:
  3. Implementation of the RNS Division
Let us consider the block diagram of the division of numbers represented in the RNS (Figure 1) [23].
Figure 1 shows the general block diagram of the division calculation, which contains the input of the dividend, the input of the divisor, a block for calculating positional characteristics, a block for refining the approximation series, a quotient output block, and the output of the quotient. The inputs of the dividend and the divisor are connected to the first and second inputs of the block for calculating positional characteristics.
 From the first output of the block for calculating positional characteristics, the sign value of the result is fed to the first input of the quotient output block.
From the second output of the block for calculating positional characteristics, the signal “dividend < divisor” (in absolute value) is sent to the second input of the quotient output block, i.e., the result of the division is 0.
From the third output of the block for calculating positional characteristics, the signal “dividend = divisor” (in absolute value) is sent to the third input of the quotient output block, i.e., the division result is $\pm 1$, depending on the signs of the input numbers.
From the fourth and sixth outputs of the block for calculating positional characteristics, the absolute values of the dividend  and divisor  are sent.
From the fifth output of the block for calculating positional characteristics, the value of the core function of the absolute value of the dividend  is sent to the second input of the block for refining the approximation series.
The signal of the end of enumeration of powers “2” included in the representation of the quotient  is received from the first output of the block for refining the approximation series to the fourth input of the quotient output unit.
The first output of the quotient output block is the quotient output.
Figure 2 shows a block diagram for calculating positional characteristics, which contains inverters of the dividend and the divisor, blocks for multiplying by constants and for addition, blocks for determining the sign, dividend and divisor multiplexers, an XOR element, and a comparison block.
 At the inputs of the dividend and the divisor, the values of the dividend  and the divisor  are sent, represented in RNS as  and .
In inverters of the dividend and divisor, the opposite values  and  are calculated, correspondingly.
In the RNS, to obtain the opposite value , it is necessary to subtract the corresponding remainder from the modulus . Then, the numbers  and ,  and  in the blocks of multiplication by constants are multiplied by the constants of the values of the core function of the RNS orthogonal bases , i.e., in each block, there is a parallel multiplication of the residues by  according to the formula (4).
The values of the products from the multiplication blocks by the constants are fed to the inputs of the addition blocks, the output of which is the lowest  bits of the sum, which corresponds to finding the remainder modulo  in formula (4).  is determined in advance by Algorithm 1 while constructing the core function.
The values of the dividend and the divisor from the inputs of the dividend and the divisor, respectively, are sent to the first inputs of the sign determination blocks. The values of the core function  and  from the outputs of the addition blocks are sent to the second inputs of the sign determination blocks. In the blocks for determining the sign, the values of the core function and the remainders are compared on one of the bases with the middle of the RNS range  according to Algorithm 2.
The signs of  and  from the outputs of the blocks for determining the sign are sent to the inputs of the XOR element, as well as to the control inputs of the corresponding multiplexers. The XOR element output is the first output of the positional characteristic calculation block.
The first and second inputs of the dividend multiplexer receive the core function value of the dividend from the output of the addition block and the value of the dividend from the input of the dividend.
The third and fourth information inputs of the multiplexer receive the core function value of the opposite number from the output of the addition block and the opposite value of the dividend from the output of the inverter.
The first output of the multiplexer is connected to the second input of the comparison block and to the fifth output of the block for calculating positional characteristics; it transfers the core function value of the absolute value of the dividend. The second output of the multiplexer is connected to the first input of the comparison block and is the fourth output of the block for calculating positional characteristics. It transmits the absolute value of the dividend.
The first and second inputs of the divisor multiplexer receive the core function value of the divisor from the output of the addition block and the value of the divisor from the input of the divisor. The third and fourth information inputs of the multiplexer receive the core function value of the opposite number from the output of the addition block and the opposite value of the divisor from the output of the inverter. The first output of the multiplexer is connected to the third input of the comparison block. It transfers the core function value of the absolute value of the divisor. The second output of the multiplexer is connected to the fourth input of the comparison block, is the sixth output of the block for calculating positional characteristics, and transmits the absolute value of the divisor.
The comparison block is based on Algorithm 2. It compares the absolute values of the dividend  and divisor  with  and  values, respectively. It sends to the first output of the comparison unit a signal in the case of “”, which is fed to the second output of the positional characteristics calculating unit. It sends a signal to the second output of the comparison unit in the case of “”, which is fed to the third output of the block for calculating positional characteristics.
Figure 3 shows the block for refining the approximation series. It contains a minuend storage register, a divisor storage register, a modulo multiplication unit, a storage register of powers of 2, a power counting unit, a demultiplexer, a multiplexer, a modulo subtraction block, blocks for multiplication by constants, addition blocks, a block for determining the sign, and an AND logic element.
The absolute value of the dividend from the first input of the approximation series refinement block is received at the first input of the minuend storage register. The input of the divisor storage register is the third input of the approximation series refinement block. The register transmits the absolute value of the divisor to the first input of the modulo multiplication unit, where the divisor is multiplied by a power of 2 represented in the RNS, which comes from the first output of the storage register of powers of 2.
Additionally, the powers of 2 from the first output of the storage register are fed to the second input of the demultiplexer. The first input of the demultiplexer receives the product from the output of the multiplication unit.
The degree counting unit determines the degree “2”, for which the product  and then it counts down the degrees to check the occurrence of the power “2” in the representation of the quotient . To determine the maximum degree “2”, the power count unit supplies the first output connected to the first input of the power storage register “2” with the address values starting from 1, while the maximum degree is equal to , the storage register of degrees “2” supplies the first output of degrees “2” presented in the RNS.
The second output, connected to the control input of the demultiplexer, is supplied with the value of the operating mode: direct or countdown degrees.
In the direct counting case, the value of the product from the output of the modulo multiplication unit is fed to the third output of the demultiplexer. It is connected to the input of the multiplication unit by a constant where the values of the core function from the orthogonal bases of the RNS  are multiplied and then added in the addition block.
The least significant  bits of the result are fed to the second input of the power counting block, where it is compared with the value of the dividend core function, which is fed to the first input of the power counting block from the second input of the approximation series refinement block.
In the countdown case, the value of the product from the output of the modulo multiplying unit is fed to the first output of the demultiplexer, to the second output of which the value of the power “2” is supplied. At the end of the countdown, a signal about the end of counting is sent to the third output of the power counting unit.
In the subtraction unit, subtraction is performed according to formula (2). Its first input is fed from the output of the minuend storage register, and the second one is fed from the first output of the demultiplexer.
The result of the subtraction is fed to the input of the unit for multiplication by constants, from where it is fed through the addition unit to the first input of the sign determination unit. The second input of the sign determination unit is connected to the output of the modulo subtraction unit, which is also connected to the second information input of the multiplexer.
The output of the sign determination unit is connected through the inverter to the first input of the AND logic element, the second input of which receives the powers of 2 from the second output of the demultiplexer. It is also connected to the control input of the multiplexer, whose first input is connected to the output of the minuend storage register and whose output is connected to the second input of the minuend storage register. The output of the AND gate is the second output of the approximation series refinement block.
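The data flow of the refinement block can be modelled in software. The following Python sketch performs the search for the highest power of 2, the modular subtraction, and the sign test using only digit-wise operations and a core function. It is a simplified model under stated assumptions, not the circuit itself: the moduli 11, 13, 17, 19 are assumed, the weights (1, 0, 0, 1) are hypothetical non-negative weights rather than the special-type core function of this paper, all names are invented for the example, and both operands are assumed to be absolute values not exceeding the middle of the range.

```python
from math import prod

MODULI = (11, 13, 17, 19)   # assumed moduli
WEIGHTS = (1, 0, 0, 1)      # hypothetical non-negative core weights
P = prod(MODULI)
C_P = sum(w * (P // p) for w, p in zip(WEIGHTS, MODULI))   # C(P)

def _core_of_int(x):
    return sum(w * (x // p) for w, p in zip(WEIGHTS, MODULI))

# Core values of the orthogonal bases B_i = P_i * |P_i^{-1}|_{p_i}.
C_B = [_core_of_int((P // p) * pow(P // p, -1, p)) % C_P for p in MODULI]
K = next(i for i, w in enumerate(WEIGHTS) if w > 0)  # first non-zero weight

def to_rns(x):
    return tuple(x % p for p in MODULI)

def add(a, b):
    return tuple((x + y) % p for x, y, p in zip(a, b, MODULI))

def sub(a, b):
    return tuple((x - y) % p for x, y, p in zip(a, b, MODULI))

def mul(a, b):
    return tuple((x * y) % p for x, y, p in zip(a, b, MODULI))

def core(r):
    """Multiplication by constants followed by addition modulo C(P)."""
    return sum(x * c for x, c in zip(r, C_B)) % C_P

def compare(x, y):
    cx, cy = core(x), core(y)
    if cx != cy:
        return -1 if cx < cy else 1
    return (x[K] > y[K]) - (x[K] < y[K])

HALF = to_rns((P - 1) // 2)  # middle of the range (P is odd here)

def is_negative(x):
    return compare(x, HALF) > 0

def rns_divide(a, b):
    """Quotient of |a| / |b| in RNS form; expects 0 <= a, 1 <= b <= (P-1)/2."""
    zero, two = to_rns(0), to_rns(2)
    if compare(a, b) < 0:                  # dividend < divisor: quotient is 0
        return zero
    powers = [to_rns(1)]
    # Direct counting: extend while b * 2**(i+1) does not exceed the dividend.
    while compare(mul(b, mul(powers[-1], two)), a) <= 0:
        powers.append(mul(powers[-1], two))
    quotient, rest = zero, a
    for power in reversed(powers):         # countdown over the powers
        delta = sub(rest, mul(b, power))
        if not is_negative(delta):         # power enters the quotient
            quotient = add(quotient, power)
            rest = delta
    return quotient
```

For instance, `rns_divide(to_rns(2930), to_rns(23))` returns the residues of 127. The sketch starts the countdown at the highest power whose product does not exceed the dividend, so it never tests a power that is known to be excluded.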
Figure 4 shows the quotient output block, which contains a modulo addition block, a demultiplexer, an inverter, a storage register of “1” in the RNS, a storage register of “−1” in the RNS, a quotient multiplexer, a unit multiplexer, a quotient selection multiplexer, and an AND logic gate.
The powers of 2 from the fifth input of the quotient output block are fed to the first input of the modulo addition block, whose output is connected to the input of the demultiplexer.
Depending on the end-of-count signal, which arrives at the demultiplexer control input from the fourth input of the quotient output block, the demultiplexer feeds the result either back to the second input of the modulo addition block or to the second input of the quotient multiplexer and, through the inverter, to its first input.
The sign signal of the result from the first input of the quotient output block is fed to the control inputs of the quotient multiplexer and the unit multiplexer. The first and second information inputs of the unit multiplexer receive signals from the outputs of the storage register of “1” in the RNS and the storage register of “−1” in the RNS, respectively.
The outputs of the quotient multiplexer and the unit multiplexer are connected to the first and second inputs of the quotient selection multiplexer. Its control input receives the signal “dividend = divisor” from the third input of the quotient output block. The output of the quotient selection multiplexer is connected to the first input of the AND gate. The signal “dividend < divisor” is supplied through the inverter from the second input of the quotient output block to the second input of the AND gate. The output of the AND gate is the output of the quotient.
  4. Discussion
This section presents a description of the example for verification of obtained results, their interpretation, as well as the conclusions that can be drawn.
Let us consider an example for the RNS 
. According to Algorithm 1, the internal parameters 
, 
, 
, 
, and 
 are calculated. The constant multiplication blocks multiply each of the four remainders of the number by 
, 
, 
, and 
, respectively. Addition blocks add the obtained products and output the least significant 
 bits of the number. Thus, pairs of blocks of multiplication by constants with addition blocks implement the formula
      
The inverters find the opposite value for the number represented in the RNS by subtracting the corresponding remainder from the modulus.
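The inverter operation can be sketched in a few lines of Python. The function name and the moduli are assumptions for illustration. One detail is worth making explicit: a zero remainder must stay zero, so the difference is reduced modulo the base again.

```python
def rns_negate(residues, moduli):
    """Opposite value in the RNS: subtract each remainder from its modulus."""
    # The final reduction keeps a zero remainder at 0 instead of p.
    return tuple((p - x) % p for x, p in zip(residues, moduli))

moduli = (11, 13, 17, 19)             # assumed moduli
negated = rns_negate((1, 10, 6, 4), moduli)
```

Applying `rns_negate` twice returns the original residues, and adding a number to its opposite digit by digit gives the all-zero tuple.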
 is fed to the input of the dividend, which, after calculating the core function by blocks of multiplication by constants and addition, feeds the value  to the second input of the sign determination block, to the first input of which  is sent from the input of the dividend.
In the block for determining the sign, a comparison is made with  and  according to Algorithm 2. The value of the sign of  is 1 (negative) and it is fed to the control input of the dividend multiplexer, and to the first input of the XOR element. Additionally, the value  is fed to the inverter, the result of which is .
After calculating the core function by the constant multiplication and addition blocks,  is obtained.  is fed to the first information input of the multiplexer.  is fed to the second information input.  is fed to the third information input.  is fed to the fourth information input. Since the dividend is negative,  is fed to the first output of the multiplexer of the dividend and  is fed to the second output.
At the same time,  is fed to the input of the divider, which, after calculating the core function by the blocks of multiplication by constants and addition, feeds the value  to the second input of the sign determining block, the first input of which is  from the input of the divider. In the block for determining the sign, a comparison is made with  and  according to Algorithm 2.
The value of the sign of  is 0 (positive), and it is fed to the control input of the divider multiplexer and to the second input of the XOR element. In addition, the value  is fed to the inverter. The result is . After calculating the core function by the constant multiplication and addition blocks,  is obtained.  is fed to the first information input of the multiplexer,  is fed to the second information input,  is fed to the third information input, and  is fed to the fourth information input. Since the divisor is positive,  is fed to the first output of the multiplexer, and  is sent to the second output.
Since the signs of the dividend  and the divisor  are different, signal 1 is sent to the output of the XOR element, i.e., the result is negative. The comparison block is based on Algorithm 2.  and  are compared with  and  coming from the dividend and divisor multiplexers. Since , then , “”, and “”, 0 is sent.
Thus, the sign of result 1 is fed to the first output of the block for calculating positional characteristics. The zeros are fed to the second and third outputs, which means that the conditions “” and “” are not met. At the fourth, fifth, and sixth outputs, respectively, there are the values ,  and .
 is fed to the input of the divider storage register. The degree counting unit delivers to the first output the address of the power “” represented in the RNS, which is stored in the degree storage register “2”.
The value (2,2,2,2) is fed to the second input of the modulo multiplication block, the first input of which is (1,10,6,4). The result (2,7,12,8), under the action of the signal at the control input of the demultiplexer, is fed to the multiplication and addition blocks. The value of the core function  is calculated. In the degree counting unit, this value is compared with , which is fed to the first input. Since , the degree counting continues. This countdown continues until , for which . After that, the degree counting unit goes into the countdown state and sends a corresponding signal to the control input of the demultiplexer.
The degree counting unit delivers to the first output the address of the power “” represented in the RNS, which is stored in the degree storage register “2”. The value (7,11,9,14) is fed to the second input of the modulo block, the first input of which is (1,10,6,4).
The result (7,6,3,18), under the action of a signal at the control input of the demultiplexer, is fed to the second input of the subtractor modulo, to the first input of which  is fed.
The result  is fed to the blocks of multiplication by a constant and addition, in which the value of the core function  is calculated. It is fed to the first input of the block for determining the sign, to the second input of which the value  is sent, since the number , then  and  is not included in the representation of the quotient .
“1” is sent to the output of the block for determining the sign, which arrives to the multiplexer control input, overwriting the value  in the decrement storage register. Additionally, 1 from the output of the sign determination unit is fed to the inverted first input of the AND gate, zeroing the value of the power “” supplied from the second output of the demultiplexer.
Further, the degree counting unit feeds to the first output the address of the degree “”, represented in the RNS, which is stored in the degree storage register “2”. The value (9,12,13,7) is fed to the second input of the modulo block, the first input of which is (1,10,6,4). The result (9,3,10,9), under the action of a signal at the control input of the demultiplexer, is fed to the second input of the subtractor modulo, to the first input of which  is fed. The result  is fed to the blocks of multiplication by a constant and addition, in which the value of the core function  is calculated, which is fed to the first input of the block for determining the sign, to the second input of which comes the value of , since the number , then  and  is included in the representation of the quotient .
At the output of the block for determining the sign, 0 is sent, which is fed to the control input of the multiplexer, recording the value of  in the storage register to be reduced. In addition, 0 from the output of the sign determining block is fed to the inverted first input of the AND gate, passing the value of the power “” supplied from the second output of the demultiplexer to the second output of the approximation series refinement block.
The rest of the degrees are checked in the same way. Finally, the degree counting unit sends to the first output the address of the “” represented in the RNS, which is stored in the degree storage register “2”. The value (1,1,1,1) is fed to the second input of the modulo block, the first input of which is (1,10,6,4). The result (1,10,6,4) under the action of the signal at the control input of the demultiplexer is fed to the second input of the subtractor modulo. The first input is sent with  from the output storage register decreasing.
The result  is fed to the blocks of multiplication by a constant and addition, where the value of the core function  is calculated, which is fed to the first input of the block for determining the sign. The second input receives the value , since the number . Then  and  are included in the representation of the quotient .
“0” is sent to the output of the block for determining the sign, which is fed to control input of the multiplexer, writing the value of  in the storage register to be reduced. In addition, “0” from the output of the sign determining block is fed to the inverted first input of the AND element, passing the value of the power “” supplied from the second output of the demultiplexer to the second output of the approximation series refinement block. The end of the counting signal is sent to the third output of the power counting block.
The final quotient is formed in the block of the output of the quotient. The first input of the modulus addition block sequentially receives the degrees “2” included in the representation of the quotient . The second input of the modulus addition block receives the sum of the previously obtained degrees.
The number “” after logical multiplication with zero is equal to (0,0,0,0), entering the first input, is added with (0,0,0,0). Then
 is added with .
 is added with ,
 is added with ,
 is added with ,
 is added with ,
 is added with ,
 is added with 
The result  under the effect of the signal of the end of the counting of the fourth input of the quotient output block to the control input of the demultiplexer is fed to the input of the inverter, where the opposite value  is located, arriving at the first input of the multiplexer quotient, the second input of which receives  from the second output of the demultiplexer.
Since the control inputs of the private multiplexer and the unit multiplexer receive a signal that the result is negative, the value  from the inverter and  from the storage register “” in the RNS. Under the action of the signal “” (in this case 0) from the third input of the quotient output unit to the control input of the quotient selection multiplexer, the value  is sent to the output from the inverter and since the signal “” (in this case, 0) from the second input of the quotient output block is inverted by the second input of the AND gate, then the quotient output is supplied with the value , which corresponds to the value .
  5. Comparative Analysis
The proposed implementation of the Euclidean division algorithm reduces the size of the operands by half compared to the algorithm from [17] by using the Akushsky core function instead of the approximate method. In addition, using the Akushsky core function without critical cores reduces the depth compared to the algorithms from [18,24,25,26] (see Table 1). By depth, we mean the number of RNS processor elements, i.e., circuits for arithmetic or Boolean operations such as addition, multiplication, modulo reduction, etc.
The Akushsky core function is a generalization of the Pirlo and Impedovo function. Both the Akushsky core function and the approximate method offer similar arithmetic capabilities and are used in similar application areas, for instance, Euclidean division algorithms. However, the Akushsky core function avoids the computational errors that arise from rounding operations.
  6. Conclusions
We propose an enhanced modular division implementation. It improves accuracy and performance with minimal hardware requirements. The proposed method reduces the size of the operands by half in comparison with the RNS division algorithm based on the approximate method. A field-programmable gate array (FPGA) implementation can be used both as a separate device and as a coprocessor to perform non-modular operations.
We use the Akushsky core function with no critical cores because of its monotonicity, which allows numbers to be compared accurately and their signs to be determined.
Our method improves the accuracy of calculating the division of numbers and determining the sign of the RNS numbers by avoiding rounding errors arising when the approximate method based on the Chinese remainder theorem is used.