T-Count Optimized Quantum Circuit Designs for Single-Precision Floating-Point Division

Abstract: The implementation of quantum computing processors for scientific applications requires quantum floating-point circuits for arithmetic operations. This work adapts the standard restoring, non-restoring, and Goldschmidt division algorithms to single-precision floating-point inputs. The designs are built from the quantum Clifford+T gate set, and resource estimates in terms of the number of qubits, T-count, and T-depth are provided for the proposed circuits. By improving the structure of the leading zero detector (LZD) unit, the proposed division circuits show a significant reduction in T-count when compared with existing works on floating-point division.


Introduction
Quantum integer circuits first attracted researchers as a means of implementing Shor's factorization algorithm, which addresses fundamental arithmetic problems on a quantum computer and solves discrete logarithm problems in polynomial time [1]. Quantum computing has since found wide application in quantum chemistry [2] and in solving linear systems of equations [3,4]. Many of these problems need more qubits than current quantum computers provide, so it is essential to construct optimized quantum circuits that assist in the efficient hardware design of future quantum computers [5]. In contrast to classical circuits, quantum circuits have a one-to-one mapping between input and output vectors. As a result of this unique mapping, additional inputs and outputs, called ancilla inputs and garbage outputs, are generated [6]. The garbage must be cleared by copying the necessary results to extra ancillary qubits and then running the circuit backward, so that the circuit can be used to implement a complete quantum algorithm. The hardware for physical quantum computing is highly susceptible to noise errors [7]. Circuits can be designed with the fault-tolerant Clifford+T gate set to overcome the noise-intolerant behavior of physical quantum computing systems. The implementation of fault-tolerant quantum circuits comes with an additional overhead in T-gate count. The cost of implementing T gates makes T-count and T-depth essential parameters for quantum circuit implementations that permit reliable and scalable computing [8]. Existing quantum computers support only a small number of qubits, which makes it necessary to build any quantum circuit with as few qubits as possible.
Quantum circuits available in the literature concentrate on minimizing the qubits and T-count for fundamental integer arithmetic, and show a substantial decrease in T-count [9–16]. The Quantum Fourier Transform (QFT) is among the most powerful operations in quantum computing. It helps to extract the periodicity of the quantum states' amplitudes in Shor's factorization algorithm [17,18]. The QFT is important in many applications involving cryptosystems, such as the Rivest-Shamir-Adleman (RSA) algorithm and elliptic curve cryptography [19]. The single-qubit rotations performed in the QFT with R gates are challenging to realize in a fault-tolerant manner [18]. QFT circuits that use fault-tolerant T gates to build fault-tolerant phase rotations incur an additional overhead in T-count or ancillary qubits [18,20]. The QFT has also been used to produce a variety of integer arithmetic circuits, but these are not fault-tolerant and are thus highly susceptible to noise errors [21,22]. Floating-point numbers allow accurate representation of magnitudes, which would benefit linear-system algorithms and quantum machine learning algorithms [4,23,24]. While floating-point arithmetic plays a crucial role in modern, large-scale digital signal processing systems, research on reversible and quantum floating-point arithmetic circuits has received less attention than integer arithmetic circuitry [5,25–28]. Floating-point division is an essential operation in many scientific, multimedia, and adaptive computing applications [29]. Restoring, non-restoring, and SRT division are slow division algorithms, whereas Goldschmidt and Newton-Raphson are fast division algorithms. The slow division algorithms are iterative: they generate a single quotient bit per iteration, and the number of iterations equals the number of input qubits [30].
The fast division algorithms use $\log_2 N$ iterations to compute the quotient for an N-bit input. They are functional algorithms that typically compute the reciprocal of the divisor and then perform subtractions and multiplications to obtain an accurate quotient. In this paper, quantum division circuits for single-precision floating-point (SPFP) numbers are proposed using the restoring, non-restoring, and Goldschmidt algorithms. The critical goal in developing these circuits is to minimize their T-count and T-depth. Section 2 provides the preliminaries on floating-point numbers, the floating-point division algorithm, the quantum Clifford+T gates, the structure of Gidney's adder, the quantum adder-subtractor using Gidney's adder [10], and the controlled-adder block [15], which are used to design the proposed division circuits. Section 3 presents the design of the proposed quantum leading zero detector, Section 4 presents the quantum circuits for the proposed single-precision restoring and non-restoring division, and Section 5 presents the quantum circuit for the proposed single-precision Goldschmidt division algorithm. Section 6 discusses the resource utilization of the proposed floating-point division circuits, and Section 7 concludes the work.

Preliminaries
In floating-point representation, every number is expressed in three segments: a sign bit [$S_x$], exponent bits [$E_x$], and a mantissa [$M_x$], which is the fractional part. The IEEE 754 standard floating-point formats differ in the lengths of the exponent and mantissa for the different precisions [5,26]. For example, a single-precision floating-point (SPFP) number can be represented as shown in Equation (1). Algorithm 1 presents the steps for dividing two floating-point numbers. The resulting sign bit is first computed with an XOR operation on the input sign bits. The divisor exponent is then subtracted from the dividend exponent, and the division takes place on the mantissa qubits. The intermediate exponent and mantissa are normalized to obtain the resultant quotient qubits. Figure 1 shows a high-level description of the generic quantum floating-point division circuit, demonstrating the steps outlined in Algorithm 1. The proposed quantum floating-point division circuit is realized using the Clifford+T gate set listed in Table 1. The performance metrics of a fault-tolerant quantum circuit implementation are defined as follows:
1. T-count: The number of T gates employed in the quantum circuit.
2. T-depth: The number of T-gate layers in the quantum circuit, where the gates within a layer can be processed in parallel.
3. Qubits: The total number of qubits required to implement the quantum circuit.
4. Circuit size (KQ): T-depth × number of qubits.
The proposed quantum floating-point division circuits use the CNOT gate, the Pauli-X gate, and the CCNOT (Toffoli) gate to implement the complete circuit. Several researchers have worked on decomposing the CCNOT gate into the fault-tolerant Clifford+T gate set and on estimating its T-gate cost [10,31–33]. Among the available CCNOT decompositions, the Toffoli gate implementation by Amy et al. [31] and the implementation of the Toffoli gate as a temporary logical AND by Gidney [10] show better resource utilization. The CCNOT decomposition by Amy et al., shown in Figure 2, is widely used for logical operations other than the logical AND; it has a T-count of 7 and a T-depth of 3, with no extra ancilla qubits [31]. The temporary logical AND decomposition of the CCNOT gate has a T-count of 4 and a T-depth of 2 in the computation section and a T-count of 0 in the uncomputation section [10].
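The division steps summarized for Algorithm 1 (sign XOR, exponent subtraction, mantissa division, normalization) can be illustrated with a minimal classical sketch. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 stored fraction bits), handles only normal, nonzero operands whose quotient stays in the normal range, and truncates instead of rounding. The function names `unpack` and `fp_divide` are hypothetical and are not taken from the paper.

```python
import struct

BIAS = 127        # IEEE 754 single-precision exponent bias
MANT_BITS = 23    # stored fraction bits


def unpack(x: float):
    """Split a float32 value into (sign, biased exponent, 24-bit significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> MANT_BITS) & 0xFF
    frac = bits & ((1 << MANT_BITS) - 1)
    return sign, exp, frac | (1 << MANT_BITS)   # restore the hidden leading 1


def fp_divide(x: float, y: float) -> float:
    """Divide two normal, nonzero float32 values; the result is truncated.

    Assumption: no overflow, underflow, NaN, or infinity handling.
    """
    sx, ex, mx = unpack(x)
    sy, ey, my = unpack(y)
    sign = sx ^ sy                              # sign: XOR of the input signs
    exp = ex - ey + BIAS                        # exponent: dividend minus divisor
    quot = (mx << MANT_BITS) // my              # mantissa: integer division
    if quot < (1 << MANT_BITS):                 # normalize: one left shift at most
        quot = (mx << (MANT_BITS + 1)) // my
        exp -= 1
    frac = quot & ((1 << MANT_BITS) - 1)        # drop the hidden bit again
    bits = (sign << 31) | (exp << MANT_BITS) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For example, `fp_divide(6.0, 3.0)` needs no normalization shift, whereas `fp_divide(1.0, 3.0)` takes the normalization branch because the mantissa quotient falls below 1.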

Elementary Quantum Circuits for Single-Precision Operands Used in Designing the Proposed Quantum Floating-Point Divider
Craig Gidney's adder has a T-count of 4n and a T-depth of 2n, which is the lowest T-count among adders [10]; it is employed to design the subtractor and adder/subtractor components of the proposed quantum floating-point division circuit. A quantum control-add block is required to realize the slow division algorithms on floating-point operands. The control-add unit, shown in Figure 5, is adapted from the design proposed in 2020 by Haner et al. [15]; it uses two CCNOT gates, four CNOT gates, a temporary logical AND gate, and an uncomputation gate, and is estimated to have a T-count of 18n and a T-depth of 8n with n ancillary qubits. The quantum adder/subtractor circuit, depicted in Figure 6, is constructed from the ripple-carry adder [10]. This construction uses no additional T gates, so its T-cost remains the same as that of the quantum n-qubit adder with carry input. The quantum ripple-borrow subtractor used to subtract operands in the proposed floating-point division circuits, shown in Figure 7, has a T-count of 4n and a T-depth of 2n with n ancillary qubits, the same as Gidney's adder [10].
Figure 7. Quantum subtractor for n qubits adopted from [10] with T-count = 4n, T-depth = 2n and ancillae = n qubits.
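The per-component costs quoted above can be tabulated with a small helper. This is only a bookkeeping sketch: the `Cost` class and both function names are hypothetical, and the operand width n = 24 used in the usage note is an illustrative assumption, not a figure reported in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    t_count: int
    t_depth: int
    ancilla: int


def adder_cost(n: int) -> Cost:
    """Cost quoted for the n-qubit adder and ripple-borrow subtractor."""
    return Cost(t_count=4 * n, t_depth=2 * n, ancilla=n)


def controlled_adder_cost(n: int) -> Cost:
    """Cost quoted for the n-qubit control-add block."""
    return Cost(t_count=18 * n, t_depth=8 * n, ancilla=n)
```

With n = 24, `adder_cost(24)` gives a T-count of 96 and a T-depth of 48, and `controlled_adder_cost(24)` gives a T-count of 432 and a T-depth of 192, each with 24 ancillary qubits.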

Proposed Quantum Leading Zero Detector
A quantum leading zero detector circuit and a quantum normalization unit are required to normalize and pack the result of the floating-point divider. Here, the Clifford+T gate set is used to construct a quantum leading zero detector unit that counts the number of zeros in the most significant bit positions before the first one appears. A four-qubit quantum leading zero detector is proposed and then scaled to the size of the mantissa. Equations (2)-(4) give the output expressions of the proposed four-qubit quantum leading zero detector, obtained by adapting the classical Boolean expressions. The output variables $Q_0$ and $Q_1$ encode the leading zero count, and the output variable $V$ goes high when all of the input qubits are zero. Figure 8 shows the quantum circuit for the proposed four-qubit quantum LZD unit. It uses nine CCNOT gates, three of which are uncomputed to restore the auxiliary qubits, giving a T-count of 24, a T-depth of 12, and an ancillary qubit count of 4. A 32-qubit leading zero detector is necessary for normalizing and rounding a floating-point number, as the leading zero detector unit generates a $\log_2 N$-bit output for an N-bit input. The detector therefore takes an input array of 32 qubits, although the mantissa is only 23 qubits long. The first 23 qubits can be taken as the actual input, and the remaining nine qubits treated as ancillary inputs.
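The behaviour of a four-bit leading zero detector can be checked with a short classical sketch. The Boolean expressions below are one standard formulation and are an assumption; they are not necessarily identical to Equations (2)-(4). The names `lzd4`, `q_hi`, `q_lo`, and `count_leading_zeros` are hypothetical.

```python
def lzd4(x3: int, x2: int, x1: int, x0: int):
    """Return (q_hi, q_lo, v) for a 4-bit input with x3 as the MSB."""
    n3, n2, n1, n0 = 1 - x3, 1 - x2, 1 - x1, 1 - x0
    v = n3 & n2 & n1 & n0      # high only when every input bit is zero
    q_hi = n3 & n2             # count >= 2: the top two bits are both zero
    q_lo = n3 & (x2 | n1)      # count is odd (1 or 3)
    return q_hi, q_lo, v


def count_leading_zeros(bits):
    """Reference model: zeros seen before the first one, scanning from the MSB."""
    count = 0
    for b in bits:
        if b:
            break
        count += 1
    return count
```

For every nonzero input, `2 * q_hi + q_lo` equals the reference count; for the all-zero input, `v` is 1 and the count bits saturate at 3. The T-count of 24 and T-depth of 12 stated above are consistent with charging each of the six computing CCNOT gates the temporary logical AND cost of 4 T gates in 2 T-layers, and charging nothing for the three uncomputation gates.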
The proposed LZD is based on the delay-efficient design presented in [34], adapted for SPFP input. To evaluate the number of leading zeros, the mantissa is first divided into nibbles: the nibble containing the most significant bit (MSB) is nibble 0, and the nibble containing the least significant bit (LSB) is nibble 7. Each nibble produces two output variables, $V_i$ and $Q_i$, where $V_i$ denotes the complement of the logical sum of the nibble and $Q_i$ represents the number of leading zeros in the nibble. In the 8-bit LZD, the output $Q_i$ is generated as a function of the previous-stage outputs $Q_i$ and $V_i$. The output $V$ is the logical sum of $V_i$; the output $Q_i = 0$, $Q_0 Q_1$ is produced when $V_1 = 1$, and the output $Q_i = 1$, $Q_2 Q_3$ is produced when $V_2 = 1$. The proposed 8-qubit quantum leading zero detector, shown in Figure 9, uses two 4-bit leading zero detectors together with control logic. It uses 19 CCNOT gates and 11 ancillary qubits to generate the output $VQ_i$. Table 2 shows the resource utilization of the proposed 8-qubit quantum leading zero detector. The proposed 8-qubit LZD is optimized in T-count, T-depth, and ancilla count, with savings of around 32.14%, 29.62%, and 71.05%, respectively, over the existing work in [26]. It is also compared with the reversible LZD proposed in [35], modified to adapt it to the quantum environment. This comparison shows savings in T-count, T-depth, and ancilla count of around 45.23%, 65.78%, and 47.61%, respectively, over the existing LZD [35].
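The way two 4-bit detectors combine into an 8-bit detector can be sketched classically as follows. The sketch assumes the convention that a flag is 1 when its nibble is all zero, so the combined flag is the AND of the two nibble flags; the select logic is a generic hierarchical construction and is not claimed to match the control logic of Figure 9. The function names are hypothetical.

```python
def lzd4(nibble):
    """4-bit LZD on a list of 4 bits, MSB first: returns (count, all_zero)."""
    for i, b in enumerate(nibble):
        if b:
            return i, 0
    return 0, 1                      # count is a don't-care for an all-zero nibble


def lzd8(byte):
    """8-bit LZD built from two 4-bit LZDs plus select logic, MSB first."""
    q_hi, v_hi = lzd4(byte[:4])      # nibble 0 holds the MSB
    q_lo, v_lo = lzd4(byte[4:])
    v = v_hi & v_lo                  # all eight bits are zero
    if v_hi == 0:
        count = q_hi                 # the first one lies in the upper nibble
    else:
        count = 4 + q_lo             # upper nibble is empty: add 4 to the lower count
    return count, v
```

For the input 0000 0101 the upper nibble raises its all-zero flag, so the result is 4 plus the lower-nibble count of 1, giving 5 leading zeros.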

Quantum Restoring Circuit for Mantissa Division
Algorithm 2 presents the restoring division algorithm for mantissa division. Figure 10 presents the quantum circuit that calculates the quotient and remainder of the mantissa part of the floating-point number using the restoring division algorithm for a 4-qubit input. This quantum circuit performs n iterations for n-qubit inputs. Each iteration uses an n-qubit quantum subtractor and an n-qubit quantum controlled adder. The subtractor computes $Y = Y - M_y$, and the CNOT gate rewrites register location $R_{n-i}$ as $R_{n-i} \oplus Y_{n-1}$. The quantum controlled adder computes $Y = Y + M_y$ if the condition $Y < 0$ is true. The quantum Pauli-X gate writes the inverse of $Q_{n-1}$ and produces the output bit $R_0$ of the division circuit. The subtractor in the final iteration computes $Q = Q - M_y$, and a CNOT gate is applied to $R_0$ and $Q_{n-1}$ such that $Q_{n-1}$ is transformed to $Q_{n-1} \oplus R_0$. The controlled adder then computes $Q = Q + M_y$ if the condition $Y < 0$ is true. After these steps, the quotient is stored in register Q and register R holds the remainder.
Figure 11 shows the quantum circuit built for mantissa division using the non-restoring algorithm. In the initial step, the quantum subtractor calculates $Q = Q - M_y$, and then the iterations start. The quotient bit $Q_{n-i}$ is inverted in each iteration, and the quotient bits are concatenated into a temporary variable Y. The quotient is computed by performing controlled addition and subtraction with the quantum controlled adder-subtractor circuit. The adder-subtractor performs an addition and generates $Y = Y + M_y$ when $Q_{n-i} = 0$. Similarly, for iterations $1 \le i \le n - 1$, the circuit performs a subtraction and produces $Y = Y - M_y$ when $Q_{n-i} = 1$.
Figure 11. Quantum non-restoring division circuit for four qubit input.
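The two slow division schemes can be compared with a classical integer sketch. This is the textbook formulation on unsigned n-bit integers and is an assumption: it does not reproduce the register layout (Q, R, Y) or the gate sequence of Figures 10 and 11, and the function names are hypothetical.

```python
def restoring_divide(dividend: int, divisor: int, n: int):
    """n-bit unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    rem, quot = 0, 0
    for i in range(n - 1, -1, -1):
        rem = 2 * rem + ((dividend >> i) & 1)   # bring down the next dividend bit
        rem -= divisor                          # trial subtraction
        if rem < 0:
            rem += divisor                      # restore after a negative result
            quot = quot << 1                    # quotient bit 0
        else:
            quot = (quot << 1) | 1              # quotient bit 1
    return quot, rem


def non_restoring_divide(dividend: int, divisor: int, n: int):
    """n-bit unsigned non-restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    rem, quot = 0, 0
    for i in range(n - 1, -1, -1):
        bit = (dividend >> i) & 1
        if rem >= 0:
            rem = 2 * rem + bit - divisor       # subtract after a non-negative step
        else:
            rem = 2 * rem + bit + divisor       # add after a negative step
        quot = (quot << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += divisor                          # single correction at the end
    return quot, rem
```

Both functions perform n iterations and produce one quotient bit per iteration. The restoring variant may need a subtraction and an addition in the same iteration, whereas the non-restoring variant performs exactly one addition or subtraction per iteration plus one final correction.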

Quantum Division Circuit Design Proposed for SPFP Number Using Goldschmidt Division Algorithm
The Goldschmidt division algorithm requires a pair of multiplications and a subtraction per iteration. Although it involves two multiplications per iteration, its key feature is that the two multiplications are independent, which reduces the circuit depth [36]. Algorithm 4 gives the procedure for calculating the quotient of two numbers X and Y using the Goldschmidt algorithm.

Algorithm 4: Goldschmidt Division Algorithm for floating-point number with single-precision.
Input: X and Y
Initialize
Figure 12 shows a high-level description of the SPFP quantum division circuit using Goldschmidt division. The reciprocal of the denominator is approximated using a look-up table (LUT) that returns the closest reciprocal approximation of the denominator. The LUT is an array of floating-point numbers that stores pre-calculated reciprocal values over the entire input range. Once the reciprocal estimate is found, the numerator and denominator inputs of the division circuit are each multiplied by it to produce $N_i$ and $D_i$. The new factor $F_i$ is computed as $2 - D_i$, and both $N_i$ and $D_i$ are multiplied by this factor in each iteration. The number of iterations needed to compute the quotient of two floating-point inputs is $\log_2 N$ for an N-bit input.
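The iteration described above can be sketched in ordinary floating-point arithmetic. The sketch assumes a divisor already normalized to [1, 2), as a mantissa with its hidden bit would be. The LUT is modelled by a hypothetical 16-entry table indexed by the top four fraction bits, and the default of five iterations is an illustrative choice; neither is taken from the paper.

```python
def goldschmidt_divide(x: float, y: float, iterations: int = 5, lut_bits: int = 4) -> float:
    """Approximate x / y for a divisor y normalized to [1, 2)."""
    if not 1.0 <= y < 2.0:
        raise ValueError("expects a normalized divisor in [1, 2)")
    # Hypothetical LUT: reciprocal of the midpoint of the interval containing y.
    step = 2.0 ** -lut_bits
    index = int((y - 1.0) / step)
    seed = 1.0 / (1.0 + (index + 0.5) * step)
    n, d = x * seed, y * seed        # N_0 and D_0 after the initial scaling
    for _ in range(iterations):
        f = 2.0 - d                  # F_i = 2 - D_i
        n, d = n * f, d * f          # two independent multiplications
    return n                         # D_i tends to 1, so N_i tends to x / y
```

Because the deviation of $D_i$ from 1 is squared in every iteration, the number of correct bits roughly doubles each time, which is why the iteration count grows only logarithmically with the operand width.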

Quantum Subtractor Circuit Design for SPFP Input
The addition and subtraction of two floating-point numbers is a prerequisite for implementing both the floating-point arithmetic logic unit (ALU) of signal processors and a quantum floating-point divider. Algorithm 5 gives the procedure for adding and subtracting floating-point numbers. Figure 13 depicts a high-level view of the quantum circuit for the addition and subtraction of a pair of floating-point numbers. A quantum floating-point subtractor requires a quantum magnitude comparator circuit whose size matches the number of exponent qubits. Several works have proposed quantum comparators based on serial and tree-based architectures [37–39]. An efficient quantum multi-qubit magnitude comparator was proposed to compute a < b and a ≥ b for image binarization in quantum image processing [40]. All of these designs focus on reducing the qubit count by reducing the ancillary qubits used in the circuit. Here, a quantum comparator designed with Gidney's adder [10] is used; it computes a < b and a ≥ b from the borrow output of the subtractor. Figure 14 shows the flowchart for designing a multi-qubit magnitude comparator from a quantum subtractor with an additional ancilla for two n-qubit inputs $A_n$ and $B_n$. The final borrow output is copied onto the extra ancilla using a CNOT gate. When the final borrow output is high, $A_n < B_n$; when it is low, $A_n \ge B_n$. The resulting multi-qubit quantum magnitude comparator, shown in Figure 15, has a reduced T-count of 4n and uses n + 1 ancillary qubits for n-qubit inputs.
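The borrow-based comparison can be illustrated with a classical bit-level sketch of a ripple-borrow subtractor. It models only the arithmetic, not the quantum circuit of Figure 15, and the function names are hypothetical.

```python
def ripple_borrow_subtract(a: int, b: int, n: int):
    """Compute a - b bit by bit on n-bit inputs; returns (difference, final_borrow)."""
    diff, borrow = 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        diff |= (ai ^ bi ^ borrow) << i
        # A borrow is generated when ai < bi, and propagated when ai == bi.
        borrow = ((1 - ai) & bi) | ((1 - (ai ^ bi)) & borrow)
    return diff, borrow


def less_than(a: int, b: int, n: int) -> bool:
    """a < b exactly when the subtraction a - b leaves a final borrow."""
    _, borrow = ripple_borrow_subtract(a, b, n)
    return borrow == 1
```

The final borrow plays the role of the bit that the CNOT gate copies onto the extra ancilla: it is 1 for a < b and 0 for a ≥ b.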

Quantum Circuit Design of a Multiplier for Floating-Point Number with Single-Precision
The quantum Goldschmidt division circuit for SPFP numbers requires two multiplication operations per iteration. Algorithm 6 presents the procedure for floating-point multiplication.
The mantissa multiplier unit is required to perform quick calculations, as the multiplications in the Goldschmidt division algorithm are independent operations. Vedic multipliers have the merit of low delay, with delay and area growing slowly as the bit width increases [41,42]; hence, the Vedic multiplier is used to realize the floating-point multiplication algorithm. Typically, a 4 × 4 Vedic multiplier requires 16 CCNOT gates to generate the partial products, seven full adder circuits, eight half adder circuits, and XOR gates, with a total of 31 ancilla qubits. The 4 × 4 quantum Vedic multiplier can be uniformly scaled following the design shown in Figure 16. Figure 17 shows the high-level description of a quantum floating-point multiplier circuit. This structure can be applied uniformly to the various floating-point precisions. Figure 18 shows the line diagram for designing a 4 × 4 multiplier using the Vedic multiplication algorithm.

N × N Vedic Multiplier
The multiplier and multiplicand inputs are each divided into equal lower-order and higher-order halves. The N × N multiplier requires four N/2 × N/2 multipliers and three N-bit ripple adders to produce the final product. Figure 16 shows the proposed general Vedic multiplier structure.
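The recursive decomposition into four half-size multiplications and three additions can be sketched on classical integers. This is a generic divide-and-conquer formulation under stated assumptions: the base case is a 1 × 1 AND, odd widths are split unevenly, and adder widths are not modelled, so it is not a gate-level description of Figure 16. The function name is hypothetical.

```python
def vedic_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit unsigned integers by recursive halving."""
    if n == 1:
        return a & b                          # a 1 x 1 product is a single AND
    low = n // 2                              # width of the lower-order part
    high = n - low                            # width of the higher-order part
    mask = (1 << low) - 1
    a_lo, a_hi = a & mask, a >> low
    b_lo, b_hi = b & mask, b >> low
    # Four half-size multiplications, independent of one another.
    p_ll = vedic_multiply(a_lo, b_lo, high)
    p_lh = vedic_multiply(a_lo, b_hi, high)
    p_hl = vedic_multiply(a_hi, b_lo, high)
    p_hh = vedic_multiply(a_hi, b_hi, high)
    # Three additions combine the shifted partial results.
    middle = p_lh + p_hl
    upper = (p_hh << low) + middle
    return (upper << low) + p_ll
```

Calling `vedic_multiply(a, b, 24)` first splits the operands into 12-bit halves, which mirrors the construction of a 24 × 24 multiplier from four 12 × 12 multipliers.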

Resource Utilization of the Proposed Quantum Divider Circuits for SPFP Number
The T-count and qubit requirements of the proposed floating-point division circuits are calculated by evaluating each stage of the division algorithms. The total T-count and ancilla count are obtained by summing the contributions of each stage. The proposed 8-qubit leading zero detector is uniformly scaled using the architecture proposed in [43] to detect the number of leading zeros in the mantissa and align the mantissas to normalize the floating-point number.

Resource Analysis of the Proposed Quantum SPFP Divider Circuit Using Restoring and Non-Restoring Algorithms
The resource usage of the proposed quantum restoring and non-restoring division circuits for SPFP division is calculated for each step.
For the non-restoring division algorithm, the analysis differs only in the mantissa division stage. That circuit uses a quantum subtractor and a quantum adder-subtractor sized to the length of the mantissa; hence, its resource utilization differs from that of the restoring division circuit only in the mantissa division, as illustrated in Table 3.
For the normalization of the floating-point number, a quantum barrel shifter circuit is used that can operate unidirectionally or bidirectionally. The proposed quantum floating-point division circuits employ the quantum bidirectional shift register proposed in [44], which uses the Fredkin (CSWAP) and Feynman (CNOT) gates. Table 4 shows the resource count of floating-point subtraction and of floating-point multiplication using the Vedic multiplier, excluding the leading zero detector and normalization unit. To perform mantissa multiplication with the Vedic multiplication algorithm, a 24 × 24 multiplier is constructed from four 12 × 12 multipliers, three 24-bit ripple-carry adders, and a partial product generator that generates 576 partial products, as shown in Figure 16. Only a few designs implementing slow and fast floating-point division algorithms have been proposed in the literature. To adapt those designs [27,45] to the quantum circuit environment, the reversible division circuits are modified by analyzing the quantum decomposition of the reversible gates they use. Tables 5 and 6 compare the resource utilization of the proposed quantum SPFP division circuits with the existing slow and fast division designs, respectively. The proposed quantum circuit for the restoring division algorithm shows an improvement of 8.74% in ancilla and 50.11% in T-count over the existing design. Similarly, the proposed non-restoring division quantum circuit shows an improvement of 7.29% in ancilla and 77.06% in T-count over the existing design [45]. The proposed quantum SPFP division circuit using the Goldschmidt division algorithm is compared with the existing modified reversible floating-point divider [27]. It achieves a T-count savings of 48.58% and a T-depth savings of 70.20% over the existing design [27].

Conclusions
This paper discussed the implementation of SPFP division circuits based on the restoring, non-restoring, and Goldschmidt algorithms using the Clifford+T gate set. Fundamental arithmetic components, such as adders, subtractors, multipliers, a leading zero detector, and a normalization unit, are combined and implemented as efficiently as possible to create the floating-point division circuits. The proposed floating-point division circuits produce garbage outputs. To eliminate the garbage, the quotient is copied onto temporary ancillae and the entire circuit is run backward. The uncomputation phase of the proposed division circuits does not contribute to the T-count or T-depth of the resultant circuit. The proposed quantum division circuits using the restoring and non-restoring division algorithms show T-count reductions of 50.11% and 77.06%, respectively, when compared with the existing work. The proposed quantum restoring and non-restoring division circuits also provide savings in ancillary qubits of around 7.29% and 10%, respectively, over the existing design. The proposed Goldschmidt division quantum circuit shows savings of 48.58% in T-count and 70.20% in T-depth over the existing work. The proposed floating-point division circuits can be used to design complex circuits where T-gate cost is of paramount concern.