Article

Qrisp-Based Implementation and Experimental Evaluation of a T-Count-Optimized Non-Restoring Quantum Square-Root Circuit

by Heorhi Kupryianau¹ and Marcin Niemiec¹,²,*
¹ AGH University of Krakow, Faculty of Computer Science, Electronics, and Telecommunications, Mickiewicza 30, 30-059 Krakow, Poland
² Klaipeda University, H. Manto 84, 92294 Klaipeda, Lithuania
* Author to whom correspondence should be addressed.
Electronics 2026, 15(11), 2334; https://doi.org/10.3390/electronics15112334
Submission received: 25 April 2026 / Revised: 20 May 2026 / Accepted: 25 May 2026 / Published: 28 May 2026
(This article belongs to the Special Issue Recent Advances in Quantum Information)

Abstract

Efficient quantum arithmetic is a prerequisite for the practical realization of large-scale quantum algorithms, yet many resource-optimized designs remain at the theoretical level. In this work, we present a complete implementation, in the Qrisp quantum programming framework, of the T-count-optimized non-restoring quantum square-root circuit proposed by Muñoz-Coreas and Thapliyal. The implemented design follows the garbageless square-root construction based on reversible arithmetic and is built from modular sub-circuits, including reversible adders, subtractors, controlled add/subtract blocks, and controlled adders. We show that the high-level abstractions provided by Qrisp enable a direct and reusable realization of the algorithm while preserving the theoretical resource advantages of the original circuit. To assess practical feasibility, the circuits were exported to Qiskit and executed on IBM's ibm_marrakesh superconducting quantum processor. The experimental results show that the algorithm can run on contemporary NISQ hardware for small input sizes, although compilation overhead, two-qubit gate errors, readout errors, and relaxation effects significantly reduce success rates as the circuit size increases. Among the tested runtime techniques, dynamical decoupling provided only limited improvement. These results show that a resource-efficient quantum square-root circuit can be realized as a modular, executable implementation, and they highlight the practical limitations that compilation overhead and hardware noise impose on arithmetic-heavy quantum algorithms on present-day hardware.

1. Introduction

Quantum computing has emerged as a paradigm capable of solving certain problems much faster than classical computers. Quantum algorithms use superposition and entanglement to achieve these speedups. For example, Shor's algorithm [1] for integer factorization runs in polynomial time and can break widely used cryptographic schemes such as RSA. Grover's search, on the other hand, can find a target item in an unsorted database in $O(\sqrt{N})$ steps instead of $O(N)$, which can speed up brute-force attacks [2]. More broadly, quantum algorithms have been developed for applications ranging from cryptography to linear algebra and the simulation of physical systems [3,4,5,6]. These advances highlight the potential of quantum computing across diverse fields of science and engineering.
Among the computational tasks that quantum computers will tackle, fundamental arithmetic operations are crucial building blocks for larger algorithms. One such operation is the square root, which is common in scientific and engineering computations. As discussed in [7], an optimized square-root circuit can reduce the resources required to compute the natural logarithm [8]. The square-root circuit can also be used in algorithms that compute roots of polynomials [9] or evaluate quadratic congruences [10]. Wang et al. [11] also use a square-root circuit in their quantum Fast Poisson Solver; there, the square root is computed with $m$-qubit precision, where $m = 2n + 2 + f$ and $f$ depends on the required eigenvalue accuracy $\varepsilon_1$. Accordingly, efficient quantum circuits for functions such as the square root are needed to integrate quantum computing into these applications.
Several quantum circuit designs for the square-root operation have been proposed in the literature, each with different trade-offs between gate count, qubit usage, and garbage output [8,12,13]. One particularly efficient approach is based on the non-restoring square-root algorithm [7]. This algorithm has been shown to yield a quantum circuit with lower T-count and qubit requirements than other square-root methods, such as those based on Newton iteration [8]. The design minimizes ancilla usage and avoids garbage output by construction; its resource efficiency has been analyzed in detail in terms of both gate complexity and fault-tolerance considerations.
In this work, we implement the non-restoring square-root algorithm proposed in [7] using the Qrisp quantum programming framework [14]. Qrisp enables the high-level construction of reversible circuits and offers built-in support for common quantum operations used in arithmetic components. Its modularity and flexibility allow a direct mapping of the algorithm's structure into a circuit composed of reusable subcomponents, including reversible addition, subtraction, and conditional logic blocks. We chose Qrisp because its high-level abstractions, uncomputation support, and backend-independent compilation simplify both the design of arithmetic circuits and the development and analysis of quantum algorithms.
The implementation demonstrates that the algorithm from [7] can be realized in a modern quantum software framework and taken beyond the level of theoretical design. We constructed the circuit in Qrisp, validated its correctness, and evaluated its behavior on actual IBM Quantum hardware under realistic noise conditions. This work shows that resource-optimized arithmetic designs can be translated into executable quantum circuits and experimentally studied on noisy intermediate-scale quantum (NISQ) devices, and it highlights the practical limitations imposed by hardware noise, limited connectivity, and compilation overhead.
The remainder of this paper is organized as follows. Section 2 introduces the quantum gates and theoretical concepts required for the construction of the algorithm. Section 3 describes the non-restoring quantum square-root algorithm. Section 4 presents its implementation in the Qrisp framework, including the design of the main arithmetic components. Section 5 provides experimental validation on real quantum hardware and analyzes the obtained results. Finally, Section 6 summarizes the main contributions of this work and outlines directions for future research.

2. Basics

This section briefly introduces selected types of quantum gates and circuits that are needed to describe and build a quantum square-root algorithm.

2.1. T-Depth and T-Count

Let a quantum circuit be expressed on the Clifford + T gate set. Two common cost metrics are the following:
  • T-Count: Total number of T gates in the circuit.
  • T-Depth: Minimum number of sequential layers of T gates, where gates in the same layer act on disjoint qubits and may be executed in parallel.
These metrics are used because T gates are expensive to implement in fault-tolerant quantum computation. Optimizing for low T-count reduces the total overhead, whereas optimizing T-depth minimizes the circuit runtime. Together, they help assess the practicality of quantum algorithms on real hardware.
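Both metrics can be computed mechanically from a gate list. The sketch below is a minimal, framework-independent illustration; the (name, qubits) tuple format and the function names are hypothetical. The T-depth is computed greedily for a fixed gate order: each qubit records how many T layers precede it, multi-qubit gates synchronize their qubits, and each T or T† gate opens a new layer on its qubit. Reordering commuting gates can lower the result, so this gives the T-depth of the circuit as written rather than a global minimum.

```python
from typing import List, Sequence, Tuple

Gate = Tuple[str, Sequence[int]]  # (gate name, qubit indices it acts on)
T_GATES = {"t", "tdg"}            # T and T-dagger both count toward the metrics


def t_count(circuit: List[Gate]) -> int:
    """Total number of T and T-dagger gates."""
    return sum(1 for name, _ in circuit if name in T_GATES)


def t_depth(circuit: List[Gate]) -> int:
    """Greedy (as-soon-as-possible) T-depth for the given gate order."""
    layer = {}  # qubit -> number of T layers already behind it
    for name, qubits in circuit:
        level = max(layer.get(q, 0) for q in qubits)
        if name in T_GATES:
            level += 1          # a T gate opens a new layer on its qubit
        for q in qubits:
            layer[q] = level    # multi-qubit gates synchronize their qubits
    return max(layer.values(), default=0)


# Two parallel T gates, a CNOT that links the qubits, then one more T gate.
example = [("t", [0]), ("t", [1]), ("cx", [0, 1]), ("t", [1])]
print(t_count(example), t_depth(example))  # 3 2
```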

2.2. The NOT Gate

The NOT gate is a single-qubit gate that exchanges $|0\rangle$ and $|1\rangle$; its circuit symbol is shown in Figure 1a. Since it contains no T gates, its T-count and T-depth are 0.

2.3. The CNOT Gate

The Controlled-NOT (CNOT) gate is a 2-qubit reversible gate that maps the input qubits $a$ and $b$ to the output qubits $a$ and $a \oplus b$, respectively. The quantum representation of the CNOT gate is shown in Figure 1b. The T-count and T-depth of the CNOT gate are 0.

2.4. The SWAP Gate

The SWAP gate exchanges the states of two qubits. Its representation in a quantum circuit is shown in Figure 2a. It can be decomposed into three CNOT gates, as shown in Figure 2b; consequently, the T-count and T-depth of the SWAP gate are 0.
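Since CNOT and SWAP only permute computational basis states without adding phases, the three-CNOT decomposition can be verified by tracking classical bits. The sketch below uses one standard orientation, in which control and target alternate between the two qubits; the helper names are illustrative.

```python
def cnot(bits, control, target):
    """Classical action of CNOT: flip bits[target] when bits[control] is 1."""
    bits = list(bits)
    bits[target] ^= bits[control]
    return bits


def swap_via_cnots(bits, q0, q1):
    """SWAP as three CNOTs whose control and target alternate."""
    bits = cnot(bits, q0, q1)
    bits = cnot(bits, q1, q0)
    bits = cnot(bits, q0, q1)
    return bits


for a in (0, 1):
    for b in (0, 1):
        assert swap_via_cnots([a, b], 0, 1) == [b, a]
```

Checking basis states is sufficient here because both sides are permutation matrices.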

2.5. The Hadamard Gate

The Hadamard gate is a single-qubit gate that maps the state $|0\rangle$ to the superposition state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and the state $|1\rangle$ to the superposition state $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. A quantum representation of the Hadamard gate is shown in Figure 3. The Hadamard gate is a Clifford gate; therefore, the T-count and T-depth of the Hadamard gate are 0.
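The mapping above can be checked numerically in plain Python, with the matrix written as nested lists (helper names are illustrative). Applying $H$ twice returns the input, since $H$ is its own inverse.

```python
import math

S2 = 1 / math.sqrt(2)
H = [[S2, S2],
     [S2, -S2]]


def apply(gate, state):
    """Multiply a 2x2 gate matrix by a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))


ket0, ket1 = [1, 0], [0, 1]
plus = apply(H, ket0)    # (|0> + |1>)/sqrt(2)
minus = apply(H, ket1)   # (|0> - |1>)/sqrt(2)
assert close(plus, [S2, S2]) and close(minus, [S2, -S2])
assert close(apply(H, plus), ket0)   # H is self-inverse: H(H|0>) = |0>
```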

2.6. The T and T† Gates

The T and T† gates are single-qubit phase gates. The T gate leaves $|0\rangle$ unchanged and multiplies $|1\rangle$ by $e^{i\pi/4}$, i.e., $T = \mathrm{diag}(1, e^{i\pi/4})$; its adjoint $T^\dagger = \mathrm{diag}(1, e^{-i\pi/4})$ applies the opposite phase $e^{-i\pi/4}$ and therefore undoes $T$.
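The phase relations can be checked with the diagonal matrices $T = \mathrm{diag}(1, e^{i\pi/4})$ and $T^\dagger = \mathrm{diag}(1, e^{-i\pi/4})$. The plain-Python sketch below also confirms that two T gates compose to the Clifford phase gate $S = \mathrm{diag}(1, i)$ and that eight T gates give the identity; helper names are illustrative.

```python
import cmath
import math


def phase_gate(theta):
    """diag(1, e^{i*theta}): leaves |0> alone, multiplies |1> by e^{i*theta}."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


I2 = [[1, 0], [0, 1]]
T = phase_gate(math.pi / 4)       # T gate
T_DG = phase_gate(-math.pi / 4)   # T-dagger gate
S = phase_gate(math.pi / 2)       # Clifford phase gate S = diag(1, i)

assert close(matmul(T, T_DG), I2)     # T-dagger undoes T
assert close(matmul(T, T), S)         # T^2 = S

power = I2
for _ in range(8):
    power = matmul(power, T)
assert close(power, I2)               # T^8 = I
```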

2.7. The Toffoli Gate

The Toffoli gate is a 3-qubit reversible gate that maps the three input qubits $(a, b, c)$ to the three output qubits $(a, b, ab \oplus c)$, as shown in Figure 4a.
One realization of the Toffoli gate was presented in [15] and is shown in Figure 4b. The decomposition consists of two Hadamard gates, six CNOT gates, four T gates, and three T† gates. Therefore, the T-count of the Toffoli gate is 7 and the T-depth is 3.
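The gate counts can be checked numerically. The sketch below simulates, in plain Python, the textbook Clifford+T Toffoli circuit (the form given by Nielsen and Chuang) on all eight basis states and counts its gates. We assume it corresponds to the realization from [15] in Figure 4b up to gate ordering; the gate list and helper names are illustrative. Only the T-count is checked here, since the T-depth depends on how the T layers are scheduled.

```python
import cmath
import math

S2 = 1 / math.sqrt(2)
W = cmath.exp(1j * math.pi / 4)  # phase applied to |1> by the T gate
ONE_QUBIT = {
    "h": ((S2, S2), (S2, -S2)),
    "t": ((1, 0), (0, W)),
    "tdg": ((1, 0), (0, W.conjugate())),
}

# Textbook Clifford+T Toffoli: controls on qubits 0 and 1, target on qubit 2.
TOFFOLI_SEQ = [
    ("h", 2), ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 2),
    ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 1), ("t", 2),
    ("h", 2), ("cx", 0, 1), ("t", 0), ("tdg", 1), ("cx", 0, 1),
]


def run(seq, state):
    """Apply a gate list to a 3-qubit state vector (bit q of the index is qubit q)."""
    for name, *qubits in seq:
        new = list(state)
        if name == "cx":
            control, target = qubits
            for idx in range(8):
                if (idx >> control) & 1:
                    new[idx ^ (1 << target)] = state[idx]
        else:
            (q,) = qubits
            m = ONE_QUBIT[name]
            for idx in range(8):
                if not (idx >> q) & 1:
                    j = idx | (1 << q)
                    new[idx] = m[0][0] * state[idx] + m[0][1] * state[j]
                    new[j] = m[1][0] * state[idx] + m[1][1] * state[j]
        state = new
    return state


def matches_toffoli(seq):
    """True if seq maps every basis state exactly as a Toffoli gate (no extra phase)."""
    for k in range(8):
        basis = [0j] * 8
        basis[k] = 1
        out = run(seq, basis)
        expected = k ^ 0b100 if (k & 0b001) and (k & 0b010) else k
        if any(abs(out[i] - (1 if i == expected else 0)) > 1e-9 for i in range(8)):
            return False
    return True


t_gates = sum(1 for gate in TOFFOLI_SEQ if gate[0] in ("t", "tdg"))
cnots = sum(1 for gate in TOFFOLI_SEQ if gate[0] == "cx")
print(matches_toffoli(TOFFOLI_SEQ), t_gates, cnots)  # True 7 6
```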

2.8. The Peres Gate

The Peres gate is a 3-qubit reversible gate that maps the three input qubits $(a, b, c)$ to the output qubits $(a, a \oplus b, ab \oplus c)$, as shown in Figure 5a. The gate can be constructed from a Toffoli gate followed by a CNOT gate, as shown in Figure 5b. The Peres gate inherits the T-count and T-depth of the Toffoli gate, since the CNOT gate has a T-count and T-depth of 0.
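Because the Peres gate only permutes basis states, its mapping can be confirmed from the truth table of the Toffoli-then-CNOT construction. A plain-Python sketch with illustrative helper names:

```python
from itertools import product


def toffoli(a, b, c):
    return a, b, (a & b) ^ c


def cnot(control, target):
    return control, control ^ target


def peres(a, b, c):
    """Peres gate as a Toffoli followed by a CNOT from a onto b."""
    a, b, c = toffoli(a, b, c)
    a, b = cnot(a, b)
    return a, b, c


table = {bits: peres(*bits) for bits in product((0, 1), repeat=3)}
for (a, b, c), out in table.items():
    assert out == (a, a ^ b, (a & b) ^ c)
assert len(set(table.values())) == 8   # reversible: a permutation of the 8 inputs
```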

2.9. Addition and Subtraction Circuits

The quantum square-root algorithm described in the next section also requires T-count-efficient addition and subtraction circuits. The addition circuit used in the current implementation follows the idea from [16]. Unlike the original work, this version of the adder omits the overflow qubit. An example of the 4-bit adder is shown in Figure 6.
The implemented reversible ripple-carry adder requires no ancilla qubits, produces no garbage, and places the result of the calculation in the first register. The algorithm consists of the six steps below, for two $n$-qubit numbers $a$ and $b$.
  • For $i = 1$ to $n-1$:
    Apply the CNOT gate to the qubits $b_i$ and $a_i$, where $a_i$ is the target qubit.
  • For $i = n-1$ down to $2$:
    Apply the CNOT gate to the qubits $b_{i-1}$ and $b_i$, where $b_i$ is the target qubit.
  • For $i = 0$ to $n-2$:
    Apply the Toffoli gate to the qubits $a_i$, $b_i$, and $b_{i+1}$, where $b_{i+1}$ is the target qubit.
  • For $i = n-1$ down to $0$:
    If $i = n-1$, apply the CNOT gate to the qubits $b_i$ and $a_i$, where $a_i$ is the target qubit (the original algorithm applies a Peres gate here, but since the overflow qubit is omitted, a CNOT gate is used instead). Otherwise, apply the Peres gate to the qubits $b_i$, $a_i$, and $b_{i+1}$, such that $b_i$, $a_i$, and $b_{i+1}$ are passed to the inputs $a$, $b$, and $c$ of the Peres gate, respectively.
  • For $i = 1$ to $n-2$:
    Apply the CNOT gate to the qubits $b_i$ and $b_{i+1}$, where $b_{i+1}$ is the target qubit.
  • For $i = 1$ to $n-1$:
    Apply the CNOT gate to the qubits $b_i$ and $a_i$, where $a_i$ is the target qubit.
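Because every gate in the adder permutes computational basis states, the six steps can be checked by tracking classical bits. The plain-Python model below is a sketch with illustrative helper names, not the Qrisp listing; element $i$ of each list stands for qubit $a_i$ or $b_i$. In this model, step 2 runs from $i = n-1$ down to $2$ and step 6 targets $a_i$, which are the choices for which the exhaustive check passes. It confirms for $n \le 5$ that the first register ends as $(a + b) \bmod 2^n$, that the second register is restored, and that the T-count is $14n - 14$.

```python
def to_bits(value, n):
    """Little-endian bit list: element i stands for qubit a_i or b_i."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def simulate_adder(x, y, n):
    """Apply the six adder steps to basis states |x>|y>; return (a, b, T-count)."""
    a, b = to_bits(x, n), to_bits(y, n)
    toffolis = 0
    for i in range(1, n):                 # Step 1: a_i ^= b_i
        a[i] ^= b[i]
    for i in range(n - 1, 1, -1):         # Step 2: b_i ^= b_{i-1}, i = n-1 .. 2
        b[i] ^= b[i - 1]
    for i in range(n - 1):                # Step 3: Toffoli(a_i, b_i -> b_{i+1})
        b[i + 1] ^= a[i] & b[i]
        toffolis += 1
    for i in range(n - 1, -1, -1):        # Step 4
        if i == n - 1:
            a[i] ^= b[i]                  # plain CNOT (no overflow qubit)
        else:                             # Peres(b_i, a_i, b_{i+1})
            b[i + 1] ^= b[i] & a[i]       #   Toffoli part
            a[i] ^= b[i]                  #   CNOT part
            toffolis += 1
    for i in range(1, n - 1):             # Step 5: b_{i+1} ^= b_i
        b[i + 1] ^= b[i]
    for i in range(1, n):                 # Step 6: a_i ^= b_i
        a[i] ^= b[i]
    return from_bits(a), from_bits(b), 7 * toffolis


for n in range(1, 6):
    for x in range(2 ** n):
        for y in range(2 ** n):
            assert simulate_adder(x, y, n) == ((x + y) % 2 ** n, y, 14 * n - 14)
```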
The subtraction circuit uses the identity $a - b = \overline{\overline{a} + b}$, where the bar denotes the bitwise complement. Using this identity, a subtractor can be built by inverting the first register, applying the adder, and then inverting the first register again [17]. An example of such a circuit for 4-bit registers is shown in Figure 7.
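The identity can be checked exhaustively for small register sizes by modeling each register as an $n$-bit integer, with the bitwise complement playing the role of a layer of NOT gates. This is a classical model of the construction with illustrative helper names, not the Qrisp code.

```python
def complement(value, n):
    """Bitwise NOT on an n-bit register (one NOT gate per qubit)."""
    return ~value & ((1 << n) - 1)


def subtract_via_adder(a, b, n):
    """a - b mod 2^n computed as NOT(NOT(a) + b): invert, add, invert."""
    return complement((complement(a, n) + b) % (1 << n), n)


for n in range(1, 7):
    for a in range(1 << n):
        for b in range(1 << n):
            assert subtract_via_adder(a, b, n) == (a - b) % (1 << n)
```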
T gates appear only in the third and fourth steps of the algorithm: the third step applies $n-1$ Toffoli gates, and the fourth step applies $n-1$ Peres gates, each containing one Toffoli gate. The total T-count is therefore $7(n-1) + 7(n-1) = 14n - 14$. Since the subtractor adds only NOT gates, which contain no T gates, its T-count is also $14n - 14$.

2.10. Controlled Addition Circuit

Another arithmetic operation required by the algorithm is T-count-efficient controlled addition. The implemented controlled addition circuit is described theoretically in [18]. Unlike the original work, this version of the controlled adder omits the overflow qubits. An example of the 4-bit controlled adder is shown in Figure 8.
The implemented reversible controlled adder requires no ancilla qubits, produces no garbage, and places the result of the calculation in the first register. The algorithm consists of the seven steps below, for two $n$-qubit numbers $a$ and $b$ and a control qubit $z$.
  • For $i = 1$ to $n-1$:
    Apply the CNOT gate to the qubits $b_i$ and $a_i$, where $a_i$ is the target qubit.
  • For $i = n-2$ down to $1$:
    Apply the CNOT gate to the qubits $b_i$ and $b_{i+1}$, where $b_{i+1}$ is the target qubit.
  • For $i = 0$ to $n-2$:
    Apply the Toffoli gate to the qubits $a_i$, $b_i$, and $b_{i+1}$, where $b_{i+1}$ is the target qubit.
  • Apply the Toffoli gate to the qubits $z$, $b_{n-1}$, and $a_{n-1}$, where $a_{n-1}$ is the target qubit.
  • For $i = n-2$ down to $0$:
    First, apply the Toffoli gate to the qubits $a_i$, $b_i$, and $b_{i+1}$, where $b_{i+1}$ is the target qubit.
    Then, apply the Toffoli gate to the qubits $z$, $b_i$, and $a_i$, where $a_i$ is the target qubit.
  • For $i = 1$ to $n-2$:
    Apply the CNOT gate to the qubits $b_i$ and $b_{i+1}$, where $b_{i+1}$ is the target qubit.
  • For $i = 1$ to $n-1$:
    Apply the CNOT gate to the qubits $b_i$ and $a_i$, where $a_i$ is the target qubit.
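The same bit-level check applies to the controlled adder. The plain-Python model below (helper names illustrative, not the Qrisp listing) follows the seven steps above and verifies exhaustively for small $n$ that $a$ becomes $(a + z\,b) \bmod 2^n$, that $b$ is restored, and that $3n - 2$ Toffoli gates, i.e., a T-count of $21n - 14$, are used.

```python
def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def simulate_ctrl_adder(x, y, z, n):
    """Apply the seven controlled-adder steps to |x>|y>|z>; return (a, b, T-count)."""
    a, b = to_bits(x, n), to_bits(y, n)
    toffolis = 0
    for i in range(1, n):                 # Step 1: a_i ^= b_i
        a[i] ^= b[i]
    for i in range(n - 2, 0, -1):         # Step 2: b_{i+1} ^= b_i
        b[i + 1] ^= b[i]
    for i in range(n - 1):                # Step 3: compute carries into b
        b[i + 1] ^= a[i] & b[i]
        toffolis += 1
    a[n - 1] ^= z & b[n - 1]              # Step 4: controlled top sum bit
    toffolis += 1
    for i in range(n - 2, -1, -1):        # Step 5: uncompute carry, then
        b[i + 1] ^= a[i] & b[i]           #   controlled sum bit
        a[i] ^= z & b[i]
        toffolis += 2
    for i in range(1, n - 1):             # Step 6: restore b
        b[i + 1] ^= b[i]
    for i in range(1, n):                 # Step 7: a_i ^= b_i
        a[i] ^= b[i]
    return from_bits(a), from_bits(b), 7 * toffolis


for n in range(1, 6):
    for x in range(2 ** n):
        for y in range(2 ** n):
            for z in (0, 1):
                assert simulate_ctrl_adder(x, y, z, n) == (
                    (x + z * y) % 2 ** n, y, 21 * n - 14)
```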
T gates are used in the third, fourth, and fifth steps of the algorithm, inside the Toffoli gates. The third step uses $n-1$ Toffoli gates, the fourth step uses one, and the fifth step uses $2(n-1)$. The total number of Toffoli gates is therefore $3n-2$, and since the T-count of the Toffoli gate is 7, the T-count of the circuit is $7(3n-2) = 21n - 14$.

3. Quantum Square-Root Algorithm

The quantum circuit presented in [7] calculates the integer square root of a number, as well as the remainder, using the classical non-restoring square-root algorithm [19]. The proposed circuit is garbageless; it also requires fewer qubits and has a lower T-count than existing designs. Consider a positive binary value $a$ with an even bit length $n$. Before the computation, three registers are initialized: an $n$-qubit register $|R\rangle$ that contains $a$, an $n$-qubit register $|F\rangle$ set to 1, and an ancilla qubit $|z\rangle$ initialized to 0. After the computation, register $|R\rangle$ holds the remainder, and register $|F\rangle$ holds the integer square root of $a$ in the locations $|F_{n/2+1}\rangle$ through $|F_2\rangle$.
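For reference, the classical non-restoring integer square root that the circuit mirrors can be sketched in a few lines of Python. This is a common textbook formulation: it processes two bits of $a$ per iteration, replaces restoration by an addition whenever the partial remainder is negative, and corrects a negative final remainder. It is an illustrative model rather than a transcription of [19] or of the quantum circuit, and the function name is hypothetical. The exhaustive check compares it with Python's math.isqrt.

```python
from math import isqrt


def nonrestoring_isqrt(a, n):
    """Classical non-restoring square root of an n-bit value a (n even).

    Returns (root, remainder) with root**2 + remainder == a.
    """
    assert n % 2 == 0 and 0 <= a < 2 ** n
    root, rem = 0, 0
    for i in range(n // 2 - 1, -1, -1):
        pair = (a >> (2 * i)) & 0b11                # next two bits, MSB first
        if rem >= 0:
            rem = 4 * rem + pair - (4 * root + 1)   # trial subtraction
        else:
            rem = 4 * rem + pair + (4 * root + 3)   # addition instead of restoring
        root = 2 * root + (1 if rem >= 0 else 0)    # new root bit = NOT(sign)
    if rem < 0:
        rem += 2 * root + 1                         # final remainder restoration
    return root, rem


for n in (2, 4, 6, 8):
    for a in range(2 ** n):
        q, r = nonrestoring_isqrt(a, n)
        assert q == isqrt(a) and r == a - q * q

print(nonrestoring_isqrt(35, 6))  # (5, 10)
```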
The quantum algorithm is divided into three parts: (1) initial subtraction, (2) conditional addition/subtraction, and (3) remainder restoration.

3.1. Part 1: Initial Subtraction

This part occurs once and contains six steps.
  • Apply the NOT gate on the qubit $R_{n-2}$.
  • Apply the CNOT gate on the qubits $R_{n-2}$ and $R_{n-1}$ such that $R_{n-1}$ is the target qubit.
  • Apply the CNOT gate on the qubits $R_{n-1}$ and $F_1$ such that $F_1$ is the target qubit.
  • Apply the inverted CNOT gate on the qubit $R_{n-1}$ and the ancilla qubit $z$ such that $z$ is the target qubit.
  • Apply the inverted CNOT gate on the qubits $R_{n-1}$ and $F_2$ such that $F_2$ is the target qubit.
  • Apply the controlled ADD/SUB such that qubits $R_{n-4}$ to $R_{n-1}$ form the first argument of the ADD/SUB circuit and qubits $F_0$ to $F_3$ form the second argument, while the ancilla qubit $z$ controls the operation performed.

3.2. Part 2: Conditional Addition or Subtraction

This part occurs $n/2 - 2$ times, for $i$ from $2$ to $n/2 - 1$, and consists of seven steps.
  • Apply the inverted CNOT gate on the ancilla qubit $z$ and the qubit $F_1$ such that $F_1$ is the target qubit.
  • Apply the CNOT gate on the qubits $F_2$ and $z$ such that $z$ is the target qubit.
  • Apply the CNOT gate on the qubits $R_{n-1}$ and $F_1$ such that $F_1$ is the target qubit.
  • Apply the inverted CNOT gate on the qubit $R_{n-1}$ and the ancilla qubit $z$ such that $z$ is the target qubit.
  • Apply the inverted CNOT gate on the qubits $R_{n-1}$ and $F_{i+1}$ such that $F_{i+1}$ is the target qubit.
  • For $j = i+1$ down to $3$:
    Apply the SWAP gate on the qubits $F_j$ and $F_{j-1}$.
  • Apply the controlled ADD/SUB such that qubits $R_{n-1}$ to $R_{n-2i-2}$ form the first argument of the ADD/SUB circuit and qubits $F_{2i+1}$ to $F_0$ form the second argument, while the ancilla qubit $z$ controls the operation performed.

3.3. Part 3: Remainder Restoration

The last part occurs only once and contains nine steps.
  • Apply the inverted CNOT gate on the qubits $z$ and $F_1$ such that $F_1$ is the target qubit.
  • Apply the CNOT gate on the qubits $F_2$ and $z$ such that $z$ is the target qubit.
  • Apply the inverted CNOT gate on the qubits $R_{n-1}$ and $z$ such that $z$ is the target qubit.
  • Apply the inverted CNOT gate on the qubits $R_{n-1}$ and $F_{n/2+1}$ such that $F_{n/2+1}$ is the target qubit.
  • Apply the NOT gate on the qubit $z$.
  • Apply the controlled addition on the registers $R$, $F$, and $z$: if the ancilla qubit $z$ is 1, the $R$ register will hold the value $R + F$ and $F$ will be unchanged; if $z$ is 0, both registers remain unchanged.
  • Apply the NOT gate on the qubit $z$.
  • For $j = n/2+1$ down to $3$:
    Apply the SWAP gate on the qubits $F_j$ and $F_{j-1}$.
  • Apply the CNOT gate on the qubits $F_2$ and $z$ such that $z$ is the target qubit.
After the last step, the qubits $|F_{n/2+1}\rangle$ through $|F_2\rangle$ contain the integer square root of $a$, and the register $|R\rangle$ holds the remainder.

4. Implementation in Qrisp

The non-restoring integer square-root algorithm is implemented as a reversible quantum circuit using the Qrisp framework [14]. Each stage of the classical algorithm corresponds to a specific quantum sub-circuit, composed and executed on quantum registers.
To create a reusable quantum sub-circuit, we first call QuantumCircuit(qubit_amount), which creates a circuit with the given number of qubits. Operations are then added with the append(gate, qubits) method, whose parameters are the selected gate and the indices of the qubits it acts on. Standard gates such as X (NOT gate), CX (CNOT gate), CCX (Toffoli gate), and SWAP can be applied by calling the corresponding methods of QuantumCircuit; for example, qc.x(0) applies the NOT gate to the first qubit of the circuit. Finally, the to_gate(name) method converts the circuit into a gate that can be appended to another circuit.

4.1. Peres Gate

To facilitate arithmetic operations, a three-qubit Peres gate is defined (the gate introduced in Section 2.8, which combines a Toffoli and a CNOT gate). The peres_gate function constructs this gate in Qrisp by applying a CCX followed by a CX and converts the circuit into a reusable gate, as shown in Listing 1.
Listing 1. Implementation of the Peres gate.
The Peres gate uses one Toffoli gate and therefore has a T-count of 7.

4.2. Reversible Addition Circuit

Using the Peres gate, an $n$-bit ripple-carry adder circuit is implemented in the function add_circuit(n), which returns a reversible gate “ADD” acting on $2n$ qubits. The gate adds two $n$-qubit numbers, leaving the second register unchanged and storing the sum $A + B$ in the first register.
The code in Listing 2 follows the six-step procedure. The registers A and B are defined as lists of qubit positions, and the comments “Step 1” to “Step 6” correspond to the steps of the addition algorithm described in Section 2.9.
Listing 2. Implementation of the reversible n-bit ripple-carry adder.
The third step involves $n-1$ Toffoli gates, whereas step 4 uses $n-1$ Peres gates, resulting in a total T-count of $14n - 14$.
In the listings, the comment # <... Preparing the circuit ...> replaces boilerplate code that creates the circuit and defines the registers.

4.3. Controlled Addition/Subtraction

Some steps in the non-restoring square-root algorithm require either an addition or a subtraction, selected by a condition (the sign of the current remainder). These steps are implemented in the ctrl_add_sub_circuit(n) function, which constructs a reversible $(2n+1)$-qubit gate, as shown in Listing 3.
Listing 3. Implementation of the controlled addition/subtraction circuit.
The ADD circuit is always applied; the condition only changes the sign handling of the first argument A, which requires no T gates. Therefore, the T-count of the ctrl_add_sub_circuit circuit remains $14n - 14$.
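Under one natural reading of this description, which we assume here rather than take from Listing 3, the controlled block conditionally complements A with CNOTs from the control qubit, applies ADD, and undoes the complement, reusing the identity from Section 2.9. The plain-Python model below checks that this wiring selects between $A + B$ and $A - B$ while adding only T-free CNOT gates. The helper name is hypothetical, and the polarity (control 1 selects subtraction) is also an assumption; the circuit in [7] may use the opposite convention.

```python
def ctrl_add_sub(a, b, ctrl, n):
    """Classical model of a controlled ADD/SUB on n-bit registers.

    Assumption (hypothetical wiring): the control is copied with CNOTs onto
    every qubit of A before and after an unconditional ADD, so
    ctrl = 1 gives NOT(NOT(A) + B) = A - B and ctrl = 0 gives A + B.
    """
    mask = (1 << n) - 1
    flip = mask if ctrl else 0      # CNOT(ctrl -> a_i) for every qubit of A
    a ^= flip
    a = (a + b) & mask              # the ADD circuit, applied unconditionally
    return a ^ flip                 # second CNOT layer undoes the inversion


for n in range(1, 6):
    for a in range(1 << n):
        for b in range(1 << n):
            assert ctrl_add_sub(a, b, 0, n) == (a + b) % (1 << n)
            assert ctrl_add_sub(a, b, 1, n) == (a - b) % (1 << n)
```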

4.4. Controlled Addition

The last step of the original algorithm uses the controlled addition circuit to make final adjustments based on the sign of the remainder. The circuit is implemented in the ctrl_add_circuit function. If the control qubit is 1, the function calculates the sum of the two numbers and places the result in the first register A (the second register remains unchanged); otherwise, both registers stay unchanged. The implementation in Listing 4 follows the seven steps of the controlled adder described in Section 2.10.
Listing 4. Implementation of the controlled addition circuit.
The third step uses $n-1$ Toffoli gates, step 4 uses one Toffoli gate, and the fifth step uses $2(n-1) = 2n-2$ Toffoli gates. In total, the controlled addition circuit contains $3n-2$ Toffoli gates, which results in a T-count of $21n - 14$.

4.5. Initial Subtraction Stage

The algorithm begins with an initial subtraction on the most significant bits to establish the first partial remainder. The function part1_circuit(n) returns a $(2n+1)$-qubit operation “PART 1” that prepares the remainder register R and the result register F for the iterative process. The code in Listing 5 shows this initialization step.
Listing 5. Implementation of the initial subtraction stage (PART 1).
The first three steps are implemented using the Qrisp functions x(qubit) and cx(ctrl, target), which implement the NOT and CNOT gates, respectively. The fourth and fifth steps require an inverted CNOT operation (zero-controlled CNOT gate), which is implemented by the zcx() function. The function is a wrapper for the XGate().control(ctrl_state=0) gate, which flips the target qubit if the control qubit is in state $|0\rangle$. The last step uses the ctrl_add_sub_circuit(4) function, which implements the controlled addition/subtraction on the qubits specified in the argument (qubits $z$, $R[n-4]$ to $R[n-1]$, and $F[0]$ to $F[3]$).
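An open-controlled (zero-controlled) CNOT is equivalent to an ordinary CNOT placed between two X gates on the control, so it adds no T gates. The plain-Python truth-table check below illustrates this identity; it is a generic equivalence, not a description of how zcx() or XGate().control(ctrl_state=0) is compiled internally, and the helper names are illustrative.

```python
from itertools import product


def cx(ctrl, tgt):
    return ctrl, tgt ^ ctrl


def zero_controlled_cx(ctrl, tgt):
    """Flip the target only when the control is 0 (the 'inverted CNOT')."""
    return ctrl, tgt ^ (1 - ctrl)


def zcx_from_x_cx_x(ctrl, tgt):
    ctrl ^= 1                  # X on the control
    ctrl, tgt = cx(ctrl, tgt)  # ordinary CNOT
    ctrl ^= 1                  # second X restores the control
    return ctrl, tgt


for c, t in product((0, 1), repeat=2):
    assert zero_controlled_cx(c, t) == zcx_from_x_cx_x(c, t)
```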
Only the sixth step involves T gates, through the controlled addition/subtraction circuit on 4-qubit registers. Therefore, the T-count of the first part is $14 \cdot 4 - 14 = 42$.

4.6. Conditional Addition/Subtraction Stage

After initialization, the algorithm processes the remaining bits of the input in pairs. The function part2_circuit(n) generates the looped circuit that handles each subsequent pair of bits, using the control logic to decide between addition and subtraction at each step. The structure is a loop that, for each iteration $i$, prepares the control signals and then applies a controlled add/subtract on an expanding portion of the registers. The implementation is shown in Listing 6.
Listing 6. Implementation of the conditional addition/subtraction stage (PART 2).
The first five steps are implemented in the same manner as in the previous part. The sixth step uses the swap(q1, q2) function to swap the argument qubits. In the last step, the ctrl_add_sub_circuit() gate is appended to the circuit with the control qubit $z$.
Each iteration $i$ performs controlled addition/subtraction on $(2i+2)$-qubit registers; consequently, its T-count is $14(2i+2) - 14 = 28i + 14$. The total T-count of this part is $\sum_{i=2}^{n/2-1} (28i + 14) = \frac{7}{2}n^2 - 56$.

4.7. Remainder Restoration Stage

After processing all pairs of bits, the algorithm may end in a state where the last operation was a subtraction, potentially leaving a negative remainder. The final step is to restore a correct non-negative remainder. The function part3_circuit(n) produces a sub-circuit “PART 3” that conditionally adds back the last subtracted value. Its implementation is given in Listing 7.
Listing 7. Implementation of the remainder restoration stage (PART 3).
The operations used in the last part of the algorithm are implemented in the same way as in the previous parts. The sixth step performs controlled addition on $n$-qubit registers, which gives this part a total T-count of $21n - 14$.

4.8. Assembling the Square-Root Circuit

The top-level function square_root_circuit(n) composes the full quantum circuit for the integer square root by concatenating the three stages described above. As shown in Listing 8, it simply appends the gates for initial subtraction, the iterative conditional add/subtract, and remainder restoration in sequence on a common set of registers R, F, and z.
Listing 8. Assembly of the full integer square-root circuit (ISQRT).
This assembled gate (named “ISQRT”) acts on $2n + 1$ qubits, where $n$ is chosen based on the input size. In our implementation, $n$ is determined as the smallest even number of qubits sufficient to represent the input number $a$ in binary (with two extra bits if needed to accommodate the algorithm's grouping of bits). The ISQRT circuit can then be applied to quantum registers representing the input and will produce the integer square root in the F register and the remainder in the R register upon measurement.
The total T-count of the implementation can be calculated as the sum of the T-counts of the individual parts of the algorithm. The first part has a T-count of 42, the second part uses $\frac{7}{2}n^2 - 56$ T gates, and the final part has a T-count of $21n - 14$. Thus, the total T-count of the implemented circuit is
$$42 + \left(\frac{7}{2}n^2 - 56\right) + (21n - 14),$$
which simplifies to
$$\frac{7}{2}n^2 + 21n - 28.$$
This is equal to the theoretical T-count of the original algorithm reported in [7].
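The bookkeeping above can be reproduced mechanically. The short script below sums the per-stage T-counts stated in Sections 4.5 to 4.7 and compares them with the closed forms for even $n \ge 4$; exact rational arithmetic (fractions.Fraction) is used so that the $\frac{7}{2}n^2$ term is compared without rounding. The function names are illustrative.

```python
from fractions import Fraction


def t_part1():
    return 14 * 4 - 14                                   # one ctrl_add_sub_circuit(4)


def t_part2(n):
    # iteration i acts on (2i+2)-qubit registers: 14(2i+2) - 14 = 28i + 14
    return sum(28 * i + 14 for i in range(2, n // 2))    # i = 2 .. n/2 - 1


def t_part3(n):
    return 21 * n - 14                                   # controlled addition, n qubits


def t_total(n):
    return t_part1() + t_part2(n) + t_part3(n)


for n in range(4, 41, 2):
    assert t_part2(n) == Fraction(7, 2) * n ** 2 - 56
    assert t_total(n) == Fraction(7, 2) * n ** 2 + 21 * n - 28

print(t_total(8))  # 364
```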

4.9. Executing the Circuit in Qrisp

Finally, to use this circuit within Qrisp, we allocate quantum registers and run a quantum session. Qrisp provides the class QuantumFloat(bit_length, exponent) to represent an n-qubit quantum number and QuantumSession() to simulate circuit execution. The isqrt function takes as input a two's-complement quantum number R with an even number of qubits; it returns the square root of the input and transforms the R register into the remainder.
We prepare two quantum registers: F (result) and z (control flag). Initially, F is set to 1 (as required by the algorithm's initial conditions), and z to 0. We then append the ISQRT gate to a session and execute it. After execution, the square root is stored in qubits $F_{n/2+1}$ to $F_2$, so F must also be shifted by 2 to return the square root correctly. The wrapper is given in Listing 9.
Listing 9. Qrisp session wrapper for the integer square-root circuit.
As a result of this function, we obtain a quantum number whose value is the square root. Meanwhile, the input parameter R, which originally held the square, now carries the remainder of the computation. Since the input quantum number R can represent a superposition of multiple values, the resulting output registers (F and R) will also be in a corresponding superposition of square roots and remainders. This property reflects the inherently parallel nature of quantum computation.
In summary, the described implementation in Qrisp provides a complete quantum circuit for the integer square root using the non-restoring method. Each logical step of the classical algorithm is mirrored by a reversible quantum operation, and the Qrisp framework allows us to combine these into a single coherent quantum circuit (ISQRT) that can be applied and tested on arbitrary input values. The implementation illustrates how high-level quantum programming constructs (such as controlled operations and modular circuit composition) can be used to realize complex arithmetic algorithms on quantum hardware.

5. Experimental Validation and Analysis

To evaluate the practical feasibility of the proposed square-root circuit, we performed experimental validation on real NISQ quantum hardware provided by the IBM Quantum Platform. The objective was to verify whether the implemented non-restoring square-root algorithm can be executed on contemporary superconducting quantum processors and to analyze how hardware noise influences the probability of obtaining the correct result.
The experiments were conducted using IBM Quantum Runtime and the Sampler primitive, which returns measurement distributions for executed circuits. Each instance of the algorithm was compiled into a hardware-compatible circuit, transpiled for the selected quantum processor, and executed multiple times to obtain statistically meaningful output distributions.

5.1. Experimental Setup

Currently, the IBM Quantum Platform offers three QPUs (quantum processing units), Marrakesh, Fez, and Kingston, which are available free of charge subject to execution-time limits. These QPUs belong to the Heron r2 generation of superconducting quantum processors and each provide 156 physical qubits [20]. The devices are based on superconducting transmon qubits arranged in IBM's heavy-hex lattice architecture, which limits qubit connectivity in order to reduce crosstalk and improve the reliability of two-qubit gate operations. As a consequence, logical circuits must be mapped to the hardware connectivity graph using hardware-aware transpilation, which may introduce additional routing operations and increase the circuit depth. The processors operate in the noisy intermediate-scale quantum (NISQ) regime, where gate errors, decoherence, and readout imperfections influence the probability of obtaining correct computational results.
The experiments were executed on the IBM Quantum Marrakesh backend. Although Marrakesh does not exhibit the lowest error rates among the publicly accessible IBM Quantum processors, the observed differences in calibration metrics between available backends remain relatively moderate within the context of the conducted experiments. The backend was therefore selected primarily due to its high availability and suitability for repeated execution within the IBM Quantum infrastructure. The calibration data for 12 March 2026 are shown in Table 1.
The implemented square-root algorithm operates on $2n + 1$ qubits, where $n$ denotes the number of bits used to represent the input integer. The input number is a signed integer with an even number of bits. Because current quantum processors operate in the NISQ regime and are limited by gate fidelity, coherence times, and connectivity constraints, the experiments were conducted for relatively small input sizes that remain feasible after hardware-aware compilation. Table 2 lists the 12 tested input values. For each $n$, one perfect square, one single-bit number, one Mersenne number, and one randomly chosen number were evaluated, providing a representative sample of execution scenarios. The "Expected Output" column gives the expected content of the output register: the first bit is the ancilla value, which is always 0; the next $n$ bits, where $n$ is the input size, represent the expected square root; and the remaining bits hold the remainder.
The algorithm was implemented using the Qrisp framework and exported to a Qiskit-compatible quantum circuit for execution on IBM hardware by calling the to_qiskit() method on the Qrisp quantum circuit object. Before execution, each circuit was transpiled for the target backend using a hardware-aware compilation pipeline; an example of this transpilation step is shown in Listing 10.
Listing 10. Hardware-aware transpilation with a preset pass manager.
This process included qubit mapping, routing operations required to satisfy hardware connectivity constraints, and circuit optimization aimed at reducing circuit depth and the number of two-qubit gates.
All circuits were executed in the IBM Quantum Runtime environment using the Sampler primitive, which returns the measurement distribution of each executed circuit. For each tested input value a, the compiled circuit was executed with 10,000 measurement shots, producing a probability distribution over all measured bitstrings, which encode the values stored in the ancilla, root, and remainder registers.
To study the influence of hardware noise and error suppression techniques, the circuits were executed under multiple runtime configurations. In addition to baseline execution, the experiments were repeated with runtime error suppression mechanisms enabled, including dynamical decoupling and Pauli twirling. These techniques were selected because the implemented square-root circuits become deep after hardware-aware transpilation, which increases their exposure to decoherence, idle-time errors, and accumulated two-qubit gate imperfections. Dynamical decoupling can improve performance by inserting pulse sequences into idle intervals, thereby reducing the accumulation of errors associated with relaxation and dephasing while qubits are waiting for subsequent operations. Pauli twirling can improve performance by randomizing coherent and systematic gate errors, converting them into a more stochastic Pauli-like noise channel that is less likely to accumulate constructively over many circuit layers. In this way, dynamical decoupling mainly targets idle-time decoherence, whereas Pauli twirling targets coherent gate-error accumulation. Comparing these configurations makes it possible to assess whether these complementary error suppression mechanisms increase the probability of measuring the correct root and remainder on NISQ hardware. An example configuration is shown in Listing 11.
Listing 11. Sampler configuration with dynamical decoupling and Pauli twirling.
To perform the noise simulation, the high-level Qrisp circuit was first transpiled to the native gate set of the target backend. After transpilation, a Qiskit noise model was constructed from the IBM Marrakesh backend instance. Because its gate-error, readout-error, and thermal-relaxation contributions can be enabled independently, this noise model makes it possible both to isolate individual error sources and to perform a full-noise simulation of the target backend. The Qiskit Aer simulator (AerSimulator) was then used to execute the noise simulations, as shown in Listing 12.
Listing 12. Noise-model construction and Aer Simulator execution.

5.2. Evaluated Metrics

To assess the performance of the implemented quantum algorithm on real hardware, several complementary metrics were evaluated. These metrics capture both the structural properties of the compiled circuits and the quality of the obtained measurement results under realistic noise conditions.
The primary structural metrics include circuit depth, total gate count, and the number of two-qubit gates. The comparison between logical and compiled circuits provides a direct measure of the overhead introduced by hardware constraints. In particular, circuit depth and total gate count quantify the temporal and operational complexity of the computation, whereas the number of two-qubit gates is especially important due to their significantly higher error rates compared to single-qubit operations.
Additionally, detailed gate counts were reported for the Qrisp logical circuit, its decomposition into the Clifford+T gate set, and the circuit transpiled to the native gate set of IBM Marrakesh. These counts make it possible to verify the theoretical T-count and provide a clearer overview of how the logical circuit is mapped onto the native superconducting gate set.
The main performance metric is the success rate, defined as the probability of obtaining the correct full-register output state: $P_{\text{success}} = N_{\text{correct}} / N_{\text{shots}}$, where $N_{\text{correct}}$ denotes the number of measurements corresponding to the expected output and $N_{\text{shots}}$ is the total number of circuit executions. This metric directly reflects the practical usability of the algorithm on NISQ hardware and captures the cumulative impact of all noise sources.
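The success rate and the 95% Wilson binomial interval used for the error bars in Figure 9 can be computed from a counts dictionary as sketched below. The function names and the counts are hypothetical (illustrative values, not measured data), and the dictionary is assumed to map full-register bitstrings, in the same layout as the “Expected Output” column of Table 2, to shot counts.

```python
import math

def success_rate(counts: dict[str, int], expected: str) -> float:
    """P_success = N_correct / N_shots for a {bitstring: count} dictionary."""
    n_shots = sum(counts.values())
    return counts.get(expected, 0) / n_shots

def wilson_interval(k: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for k successes in n shots (z = 1.96 gives 95%)."""
    p_hat = k / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half

# Hypothetical counts for a = 3 (expected output 000010010), 10,000 shots in total.
counts = {"000010010": 2200, "000010011": 1800, "100010010": 1500,
          "000110010": 1500, "000010110": 1500, "000011010": 1500}
p = success_rate(counts, "000010010")
low, high = wilson_interval(counts["000010010"], sum(counts.values()))
print(f"P_success = {p:.3f}, 95% Wilson CI = [{low:.3f}, {high:.3f}]")
```

The Wilson interval is preferred over the simple normal approximation because it stays within [0, 1] and remains well behaved when the success rate approaches 0, as it does for the largest inputs.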
To evaluate the impact of qubit relaxation, the ratio $T_{\text{circ}}/T_1$ between the total circuit execution time $T_{\text{circ}}$ and the relaxation time $T_1$ is considered. This dimensionless quantity characterizes the exposure of qubits to decoherence during computation. Higher values indicate an increased probability of energy relaxation events occurring before the circuit finishes, particularly affecting idle qubits that remain unused for extended periods.
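A minimal sketch of this ratio, under the assumption of pure amplitude damping: for a qubit prepared in |1⟩, the probability of not having relaxed after a time t is exp(−t/T_1), so exp(−T_circ/T_1) gives a rough sense of what a given ratio means for a single idle qubit. The helper name and the circuit durations are hypothetical; only the median T_1 is taken from Table 1.

```python
import math

T1_MEDIAN_US = 179.02   # median T_1 from Table 1, in microseconds

def relaxation_exposure(t_circ_us: float, t1_us: float = T1_MEDIAN_US) -> tuple[float, float]:
    """Return (T_circ / T_1, exp(-T_circ / T_1)).

    Under pure amplitude damping, exp(-t / T_1) is the probability that a
    qubit prepared in |1> has not relaxed after time t.
    """
    ratio = t_circ_us / t1_us
    return ratio, math.exp(-ratio)

for t_circ in (34.0, 118.0):   # hypothetical circuit durations in microseconds
    ratio, survival = relaxation_exposure(t_circ)
    print(f"T_circ = {t_circ:5.1f} us: ratio = {ratio:.2f}, exp(-ratio) = {survival:.2f}")
```

For a ratio of about 0.66, an excited qubit left idle for the whole circuit would avoid relaxation with a probability of only about 0.5 under this simple model.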
In addition to absolute success probability, the structure of the output distribution is analyzed using the dominance ratio $R_{\text{dom}} = P_{\text{correct}} / P_{\text{next}}$, where $P_{\text{correct}}$ is the probability of the expected output state and $P_{\text{next}}$ corresponds to the second-most-probable measurement outcome. This metric captures how clearly the correct result stands out from competing erroneous states. Values significantly greater than 1 indicate a well-defined peak in the output distribution, while values close to or smaller than 1 suggest a noise-dominated regime with nearly uniform outcome probabilities.
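The dominance ratio can be computed from the same kind of counts dictionary. In the hypothetical helper below, P_next is interpreted as the largest probability among all outcomes other than the expected one; this coincides with the second-most-probable outcome whenever the correct state is the most probable one, and it yields values below 1 when an erroneous state dominates. The counts are illustrative, not measured data.

```python
import math

def dominance_ratio(counts: dict[str, int], expected: str) -> float:
    """R_dom = P_correct / P_next for a {bitstring: count} dictionary.

    P_next is taken as the highest probability among all outcomes other
    than the expected one.
    """
    n_shots = sum(counts.values())
    p_correct = counts.get(expected, 0) / n_shots
    competitor_counts = [c for bits, c in counts.items() if bits != expected]
    p_next = max(competitor_counts, default=0) / n_shots
    if p_next == 0:
        return math.inf   # no erroneous outcome was observed
    return p_correct / p_next

# Hypothetical baseline counts for a = 3 (expected output 000010010).
counts = {"000010010": 2200, "000010011": 400, "100010010": 300}
print(dominance_ratio(counts, "000010010"))
```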
The effectiveness of each noise mitigation technique relative to the baseline is evaluated with a paired improvement metric. For each input value a and calibration window w, the paired improvement was computed as
$$\Delta p_j = p_{\text{mitigation},w} - p_{\text{baseline},w},$$
where $p_{\text{mitigation},w}$ denotes the success rate obtained using a given mitigation technique and $p_{\text{baseline},w}$ denotes the corresponding baseline success rate measured for the same input instance and calibration window. Positive values of $\Delta p_j$ indicate an improvement over the baseline execution, while negative values indicate degraded performance.
To quantify run-to-run variability and assess the statistical significance of the observed improvements, the mean paired improvement across calibration windows was reported together with a 95% Student’s t confidence interval.
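The paired-improvement statistic can be sketched as follows. The helper name, the example success rates, and the hard-coded critical values are illustrative assumptions: the quantiles are the standard two-sided 95% values t_{0.975,df}, with df equal to the number of calibration windows minus one (df = 4 for five windows). For more windows, a statistics library such as SciPy would normally supply the quantile instead.

```python
import math
import statistics

# Two-sided 95% Student's t critical values t_{0.975, df} for small df.
T_CRIT_95 = {1: 12.7062, 2: 4.3027, 3: 3.1824, 4: 2.7764, 5: 2.5706,
             6: 2.4469, 7: 2.3646, 8: 2.3060, 9: 2.2622}

def paired_improvement(mitigated: list[float], baseline: list[float]):
    """Mean paired difference (mitigated - baseline) and its 95% t interval.

    Each list holds the success rate of one input value, one entry per
    calibration window, in the same window order.
    """
    if len(mitigated) != len(baseline) or len(mitigated) < 2:
        raise ValueError("need at least two paired calibration windows")
    diffs = [m - b for m, b in zip(mitigated, baseline)]
    mean = statistics.fmean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))   # standard error
    half_width = T_CRIT_95[len(diffs) - 1] * sem   # table covers up to 10 windows
    return mean, (mean - half_width, mean + half_width)

# Hypothetical success rates for one input over five calibration windows.
baseline = [0.20, 0.22, 0.18, 0.25, 0.21]
dd = [0.27, 0.25, 0.30, 0.24, 0.29]
mean, (low, high) = paired_improvement(dd, baseline)
print(f"mean = {mean:+.3f}, 95% CI = [{low:+.3f}, {high:+.3f}]")
```

With these hypothetical numbers the mean improvement is positive (about 0.058), yet the interval spans zero, which is the kind of outcome discussed in Section 5.3 for dynamical decoupling on the smallest inputs.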
Multiple noise simulations were performed using gate-only, readout-only, relaxation-only, and full-noise models. The simulations were conducted for selected 4-bit and 6-bit test values across multiple random seeds. This allowed the impact of individual error sources to be analyzed separately and compared with theoretical estimates. Due to the substantial depth and gate count of the transpiled circuit, simulations for 8-bit input values were computationally infeasible within the available resources.
Together, these metrics provide a comprehensive evaluation framework, capturing both the resource overhead introduced by hardware constraints and the resulting impact on computational reliability in the presence of realistic quantum noise.

5.3. Experimental Results

Before executing the circuit under noisy conditions and on real quantum hardware, the correctness of the implementation was first verified using the noiseless simulator provided by the Qrisp quantum session. The circuit was tested for all integer input values a in the range from 0 to $2^{10}$, which corresponds to input bit widths n of 4, 6, 8, and 10. The minimum input bit width required by the circuit design is 4; therefore, the values 0, 1, and 2 were represented using 4-bit registers by padding them with two leading zeros. For each input value, a separate circuit instance was constructed and executed, after which the output registers F and R and the ancilla register were measured. The measurement result is a probability distribution over the possible values of the root, remainder, and ancilla registers. The value measured in F, representing the integer square root, and the value measured in R, representing the remainder, were compared against the expected classical values $\lfloor \sqrt{a} \rfloor$ and $a - \lfloor \sqrt{a} \rfloor^2$, respectively. For all tested input values, the noiseless simulation produced the expected root, remainder, and ancilla value with probability 100%, with the ancilla register always measured as 0. This confirms the functional correctness of the implemented circuit before hardware-level noise effects were considered.
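The classical reference values can also be produced by the non-restoring recurrence itself, which makes this kind of exhaustive check easy to reproduce outside Qrisp. The sketch below is a textbook classical non-restoring integer square root, given as an illustration rather than a transcription of the circuit in [7]: the root is built one bit per step, and a negative partial remainder is not restored but corrected in the next step by adding instead of subtracting. This add-or-subtract choice is, in spirit, the decision that the controlled add/subtract blocks make reversibly; register widths and sign handling in the quantum design differ.

```python
import math

def nonrestoring_isqrt(a: int, n: int) -> tuple[int, int]:
    """Classical non-restoring integer square root of an n-bit value a.

    Processes a two bits at a time from the most significant pair; the
    partial remainder r may become negative and is corrected in the next
    step instead of being restored immediately.
    """
    if n % 2 or not 0 <= a < 2 ** n:
        raise ValueError("n must be even and a must fit in n bits")
    q, r = 0, 0
    for i in range(n // 2 - 1, -1, -1):
        pair = (a >> (2 * i)) & 0b11          # next two input bits
        if r >= 0:
            r = 4 * r + pair - (4 * q + 1)    # subtract q followed by bits 01
        else:
            r = 4 * r + pair + (4 * q + 3)    # add q followed by bits 11
        q = 2 * q + (1 if r >= 0 else 0)      # append the next root bit
    if r < 0:                                 # final remainder correction
        r += 2 * q + 1
    return q, r

# Exhaustive check against math.isqrt for every 10-bit input value.
for a in range(2 ** 10):
    root, rem = nonrestoring_isqrt(a, 10)
    assert root == math.isqrt(a) and rem == a - root * root
```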
Table 3 presents the characteristics of the compiled quantum circuits after hardware-aware transpilation. The reported metrics are divided into two categories: logical (denoted as “L.”) and physical (compiled). Logical metrics correspond to the original circuit generated at the algorithmic level, prior to any hardware constraints, whereas physical metrics describe the circuit after mapping onto a specific quantum device, including routing and optimization overhead.
The difference between these two representations is substantial. In particular, the circuit depth increases by approximately 3.8× to 4.5× after compilation, depending on the input size. Similarly, the total gate count grows by more than an order of magnitude (from roughly 13× to 17×). This overhead is caused primarily by limited qubit connectivity, which requires the insertion of SWAP operations, and by the decomposition of high-level gates into the native gate set.
Overall, the comparison between logical and physical metrics highlights the gap between algorithmic designs and their realization on current quantum hardware. Whereas the logical circuit exhibits relatively moderate resource scaling, the compiled circuit incurs substantial overhead, which grows with the number of qubits and ultimately limits practical execution on NISQ devices.
Table 4 presents detailed gate counts for logical Qrisp gates, as well as for their decomposition into Clifford + T gates and transpilation to the native gate set of ibm_marrakesh for different input sizes.
The theoretical T-count shown in Table 4b is calculated as $\frac{7}{2}n^2 + 21n - 28$ [7] and matches the total number of T and T† gates (the “t” and “tdg” columns, respectively). This confirms that the implemented circuit preserves the T-count of the original algorithm.
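The agreement can be checked directly against the counts in Table 4. The sketch below evaluates the formula and compares it with the t and tdg columns of Table 4b. As a further consistency check, it assumes that every 2CX (Toffoli) gate in Table 4a is expanded into the standard Clifford+T Toffoli circuit with seven T/T† gates, so the T-count should also equal seven times the 2CX count.

```python
def t_count(n: int) -> int:
    """Theoretical T-count (7/2) n^2 + 21 n - 28 from [7]; n is even."""
    return 7 * n * n // 2 + 21 * n - 28

# Gate counts transcribed from Table 4.
toffoli_2cx = {4: 16, 6: 32, 8: 52}                       # Table 4a, "2CX"
t_and_tdg = {4: (64, 48), 6: (128, 96), 8: (208, 156)}    # Table 4b, "t", "tdg"

for n in (4, 6, 8):
    t, tdg = t_and_tdg[n]
    # Formula, measured T + T-dagger count, and 7 T gates per Toffoli.
    print(n, t_count(n), t + tdg, 7 * toffoli_2cx[n])
```

For all three input sizes the three numbers coincide (112, 224, and 364).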
In superconducting quantum hardware, $R_z$ rotations are typically implemented virtually, by adjusting the reference frame of the qubit rather than applying a physical gate. As a result, $R_z$ gates do not contribute significantly to execution time or error accumulation, despite appearing in large numbers in the compiled circuit.
Figure 9 and Table 5 show the full-register success rates for the tested input values under the different noise mitigation techniques within one calibration window. The success rate drops significantly for larger inputs and approaches 0 for n = 8. This behavior is expected when deep quantum circuits are executed on NISQ hardware, where the accumulated noise grows with the number of operations and qubits involved. For 4-bit values, dynamical decoupling tends to give a higher success rate, of about 0.28, compared with 0.22, 0.15, and 0.14 for the baseline, Pauli twirling, and combined dynamical decoupling + Pauli twirling executions, respectively. However, this effect is not conclusive because of the high variability of the results between input values.
The large run-to-run variability is primarily caused by calibration-dependent hardware noise rather than finite-shot uncertainty. The implemented circuit becomes substantially deeper after transpilation and contains hundreds to thousands of two-qubit gates, making the full-register success probability highly sensitive to small changes in qubit mapping, two-qubit gate errors, readout errors, relaxation times, and idle-time structure. Since success requires the exact complete output bitstring, even a small variation in any of these noise sources can produce a noticeable change in the measured success rate.
Figure 10 and Table 6 illustrate the mean paired improvement in success rate relative to baseline execution for the tested error suppression techniques across five different calibration windows. The results indicate that dynamical decoupling provides the most consistent positive trend among the evaluated techniques. For the smallest input values, the mean improvement obtained with dynamical decoupling is positive, reaching approximately 0.05 to 0.09 in absolute success probability. However, the confidence intervals for these inputs are relatively large and often cross zero, which means that the observed improvement cannot be regarded as statistically significant at the 95% confidence level. This suggests that dynamical decoupling may be beneficial, but its effect is strongly affected by run-to-run variability.
Pauli twirling alone does not show a consistent improvement over the baseline. For most small input values, the mean paired difference is negative or close to zero, indicating that Pauli twirling either has negligible effect or slightly reduces the probability of obtaining the correct output. The combined use of dynamical decoupling and Pauli twirling also does not consistently outperform dynamical decoupling alone. In most cases, its mean improvement is close to zero or negative, suggesting that the addition of Pauli twirling does not provide a clear advantage for this circuit.
For larger input values, all paired differences converge toward zero. This behavior indicates that, as the circuit size and depth increase, the accumulated effects of gate errors, routing overhead, readout errors, and relaxation dominate the execution. In this regime, the evaluated error suppression techniques are unable to produce a measurable improvement in the success probability. Overall, the paired-difference analysis shows that dynamical decoupling exhibits the most favorable trend, but the large confidence intervals and the near-zero improvements for larger circuits indicate that the mitigation benefit is limited on the tested NISQ hardware.
The dominant sources of errors are gate errors, measurement errors, and qubit relaxation during circuit execution. Figure 11 and Table 7 present isolated noise-model simulations for the implemented circuit. The success probability is evaluated as the probability of obtaining the correct full-register output. The four considered cases are the full-backend-noise model, gate-error-only model, relaxation-noise-only model, and readout-error-only model.
In the readout-error-only model, the success probability remains the highest and does not drop significantly for larger input sizes. This is consistent with the fact that readout errors occur only during the final measurement and do not accumulate throughout the circuit. If $e_{\text{readout}}$ is the average readout error and $N_m$ is the number of measured qubits, the expected success probability can be approximated as
$$P_{\text{readout}} \approx (1 - e_{\text{readout}})^{N_m}.$$
Given a median readout error of 0.013 in the calibration window, the readout-limited success probability can be estimated as $(1 - 0.013)^{9} \approx 0.89$ for 4-bit input values and $(1 - 0.013)^{13} \approx 0.84$ for 6-bit input values, which is reasonably close to the simulated results.
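The estimate above can be evaluated directly. The helper name is hypothetical, and the model assumes independent readout errors of equal magnitude on all N_m = 2n + 1 measured qubits, with no other noise.

```python
def readout_success(e_readout: float, n_measured: int) -> float:
    """P_readout ~ (1 - e_readout) ** N_m for independent readout errors."""
    return (1.0 - e_readout) ** n_measured

E_READOUT = 0.013   # median readout error quoted in the text
for n_bits in (4, 6):
    n_measured = 2 * n_bits + 1   # ancilla + root + remainder registers
    print(n_bits, n_measured, round(readout_success(E_READOUT, n_measured), 2))
```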
In the relaxation-noise-only model, the success probability is about 0.75 for the 4-bit inputs and decreases to approximately 0.40 for the 6-bit inputs. This indicates that relaxation becomes more significant as circuit duration increases, since qubits remain exposed to $T_1$-related decay for a longer time. However, relaxation noise alone is not sufficient to account for the strongest performance degradation. This may also explain why the improvement obtained from dynamical decoupling is relatively limited: if relaxation is not the dominant error source, suppressing idle-time decoherence would only partially improve the overall success probability, while other errors would continue to contribute substantially to the observed degradation.
In the gate-error-only model, the success probability is approximately 0.42 for the 4-bit inputs, but drops below 0.10 for the 6-bit inputs. This sharp decline is caused by the significant increase in the transpiled gate count and circuit depth for n = 6 . Since gate errors accumulate multiplicatively over the sequence of operations, the larger number of gates, especially two-qubit gates, produces a much higher effective failure probability.
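The multiplicative accumulation can be illustrated with a crude independent-error estimate, P ≈ (1 − e_2q)^N_2q · (1 − e_1q)^N_1q, using the median error rates from Table 1 and the native gate counts from Table 4c. This is a sketch under strong assumptions: every gate error is treated as independent and fatal to the result, error propagation, crosstalk, and measurement are ignored, RZ is treated as error-free because it is implemented virtually, and a single transpiled instance per input size is used.

```python
# Median error rates from Table 1.
E_2Q = 2.37e-3   # CZ gate error
E_1Q = 3.00e-4   # SX / X gate error

# Native gate counts from Table 4c; RZ is omitted because it is virtual.
NATIVE_COUNTS = {4: {"cz": 317, "sx": 573, "x": 7},
                 6: {"cz": 763, "sx": 1385, "x": 17},
                 8: {"cz": 1341, "sx": 2419, "x": 28}}

def gate_only_success(counts: dict[str, int]) -> float:
    """Crude estimate: every gate fails independently and any failure
    corrupts the output, so the per-gate success probabilities multiply."""
    n_1q = counts["sx"] + counts["x"]
    return (1 - E_2Q) ** counts["cz"] * (1 - E_1Q) ** n_1q

for n, counts in NATIVE_COUNTS.items():
    print(n, round(gate_only_success(counts), 3))
```

This rough estimate gives about 0.40 for n = 4 and about 0.11 for n = 6, the same order of magnitude as the gate-error-only simulations, and shows how strongly the transpiled CZ count drives the decline.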
In the full-backend-noise model, the success probability is the lowest overall: around 0.35 for 4-bit inputs and below 0.10 for 6-bit inputs. Its close agreement with the gate-error-only curve indicates that gate errors are the dominant source of degradation, while relaxation and readout errors provide additional but smaller contributions.
Overall, the simulations show that the circuit is mainly limited by accumulated gate noise after hardware-aware transpilation. Readout errors have a comparatively small effect; relaxation noise becomes more visible for deeper circuits.
Figure 12 shows the ratio between the total circuit execution time $T_{\text{circ}}$ and the qubit relaxation time $T_1$. The ratio starts at approximately 0.19 for the smallest input values and reaches about 0.66 for the largest ones. The $T_{\text{circ}}/T_1$ ratio increases with input size because the depth of the transpiled circuit grows after hardware-aware compilation, extending the circuit execution time and therefore the duration over which qubits are exposed to $T_1$-related relaxation. The increase in $T_{\text{circ}}/T_1$ has important implications for the susceptibility of qubits to decoherence during circuit execution. As the circuit duration approaches the characteristic relaxation time $T_1$, the probability that a qubit undergoes energy relaxation before the computation finishes increases significantly. This effect is particularly relevant for qubits that remain idle for substantial portions of the circuit, such as the ancilla qubit, which is acted upon only in specific stages of the algorithm. Although such qubits may participate in relatively few gate operations, they remain exposed to environmental noise throughout the full execution time of the circuit. This highlights an important challenge in NISQ devices: qubits that are logically inactive still evolve physically and may decohere, which can ultimately affect the reliability of the final measurement outcomes.
Figure 13 shows the dominance ratio $P_{\text{correct}}/P_{\text{next}}$, where $P_{\text{correct}}$ denotes the probability of measuring the expected output state and $P_{\text{next}}$ corresponds to the probability of the second-most-probable measured outcome for the baseline execution. The exact numerical values are reported in Table 8. This metric captures how strongly the output distribution is biased toward the correct solution. For small input values, the ratio is significantly greater than 1, in some cases exceeding 5, indicating that the correct result is clearly distinguishable from competing erroneous states and remains the dominant outcome. As the input size increases, the ratio decreases, reflecting the accumulation of gate errors, readout errors, and relaxation effects in deeper circuits. For intermediate values, the ratio often remains above 1, suggesting that the correct output can still be identified as the most likely measurement outcome despite a reduced absolute success probability. However, for the largest tested inputs, the ratio approaches 0, and the output distribution becomes effectively noise-dominated. In this regime, the probabilities of different outcomes are nearly uniform, indicating that the measured results are close to random and no longer exhibit a meaningful preference for the correct state.
The experimental evaluation demonstrates that the implemented non-restoring quantum square-root circuit is practically executable on current NISQ hardware for small input sizes. The results confirm that the logical design preserves its theoretical efficiency after implementation in the Qrisp framework, while also highlighting a substantial gap between logical and physical circuit representations due to hardware-aware compilation overhead. As observed, the circuit depth and gate count increase significantly after transpilation, leading to a rapid degradation of the success probability with increasing input size. The main error sources are two-qubit gate errors, readout inaccuracies, and qubit relaxation, which collectively limit scalability, with accumulated gate errors making the largest contribution according to the noise simulations. Among the evaluated error mitigation techniques, dynamical decoupling provides only limited improvement, indicating that decoherence during idle periods contributes to the performance loss but is not its dominant cause. Overall, the findings validate the feasibility of executing quantum arithmetic circuits on existing devices, while emphasizing the need for improved hardware reliability and compilation strategies to enable larger-scale computations.

6. Conclusions

This work translated a previously proposed resource-efficient theoretical design for quantum square-root computation into an executable implementation in the Qrisp framework and examined its behavior under realistic hardware conditions. The study showed that the non-restoring approach can be expressed in a modular way through reversible arithmetic building blocks, allowing the complete circuit to be assembled from reusable components such as adders, subtractors, controlled add/subtract modules, and controlled addition circuits. This confirms that high-level quantum programming environments can be used not only for algorithm prototyping, but also for preserving the structural properties of optimized arithmetic constructions.
From the algorithmic perspective, the implemented circuit retains the main advantages of the design proposed in prior work, namely, the absence of garbage outputs and a low T-count relative to alternative square-root constructions. In particular, the Thapliyal-based non-restoring square-root circuit has a T-count of $\frac{7}{2}n^2 + 21n - 28$, while the comparable designs considered in the prior evaluation required higher T-counts: $7n^2 + 14n$ for the design of Sultana et al.; $420n^2 + 168n - 364$ for the design of Bhaskar et al.; $\frac{21}{4}n^2 + \frac{105}{2}n - 42$ for the first design of AnanthaLakshmi et al.; and $\frac{21}{4}n^2 + \frac{7}{2}n - 14$ for the second design of AnanthaLakshmi et al. [8,12,13].
The hardware experiments additionally revealed the current execution limits of such arithmetic-heavy circuits on NISQ devices. Although correct results were obtained for small instances, the compiled circuits experienced a substantial increase in depth and gate count after hardware-aware transpilation. This overhead, together with two-qubit gate imperfections, readout errors, and decoherence, rapidly reduced the probability of observing the correct output as the problem size increased. In particular, the results indicate that physical execution costs remain the main obstacle to scaling even theoretically efficient arithmetic circuits on present-day superconducting processors.
Overall, the presented results show that implementing optimized square-root circuits in contemporary quantum software stacks is feasible and that such implementations can be validated on real hardware. The empirical findings are most directly supported for IBM Marrakesh, since all real-hardware executions were performed on this backend. More cautiously, they may be regarded as indicative for closely related IBM Heron r2 devices, such as Fez and Kingston, because these systems share the same processor generation and architectural characteristics. At the same time, quantitative performance should not be assumed to transfer unchanged across devices, since calibration metrics, routing decisions, and backend-specific noise profiles may affect the observed success rates. These results emphasize that progress in compilation methods, qubit connectivity handling, gate fidelity, and error suppression will be essential before larger instances of quantum arithmetic can be executed reliably. The developed implementation can therefore serve as a reference realization of the previously proposed non-restoring square-root circuit, supporting modular circuit construction, Qiskit export, and further experimental studies of arithmetic-heavy quantum algorithms on NISQ hardware.

Author Contributions

Conceptualization, H.K. and M.N.; methodology, H.K. and M.N.; software, H.K.; validation, H.K.; formal analysis, H.K.; investigation, H.K. and M.N.; resources, H.K.; data curation, H.K.; writing—original draft preparation, H.K. and M.N.; writing—review and editing, H.K. and M.N.; visualization, H.K.; supervision, M.N.; project administration, M.N.; funding acquisition, M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the EU Horizon Europe Framework Program under Grant Agreement no. 101119547 (PQ-REACT) and no. 101225759 (PQ-NEXT). The research was also supported by the Research Council of Lithuania (LMTLT), agreement no. S-ITP-25-7.

Data Availability Statement

The source code, raw results, and metadata are openly available in URL: https://doi.org/10.5281/zenodo.20209072 (accessed on 24 May 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shor, P.W. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM J. Comput. 1997, 26, 1484–1509.
  2. Grover, L.K. Quantum Mechanics Helps in Searching for a Needle in a Haystack. Phys. Rev. Lett. 1997, 79, 325–328.
  3. Montanaro, A. Quantum algorithms: An overview. npj Quantum Inf. 2015, 2, 15023.
  4. Harrow, A.W.; Hassidim, A.; Lloyd, S. Quantum Algorithm for Linear Systems of Equations. Phys. Rev. Lett. 2009, 103, 150502.
  5. Pawlitko, P.; Moćko, N.; Niemiec, M.; Chołda, P. Implementation and Analysis of Regev’s Quantum Factorization Algorithm. arXiv 2025, arXiv:2502.09772.
  6. Krzyszkowski, J.; Niemiec, M. Analysis of Surface Code Algorithms on Quantum Hardware Using the Qrisp Framework. Electronics 2025, 14, 4707.
  7. Muñoz-Coreas, E.; Thapliyal, H. T-count and Qubit Optimized Quantum Circuit Design of the Non-Restoring Square Root Algorithm. J. Emerg. Technol. Comput. Syst. 2018, 14, 36.
  8. Bhaskar, M.K.; Hadfield, S.; Papageorgiou, A.; Petras, I. Quantum algorithms and circuits for scientific computing. Quantum Info. Comput. 2016, 16, 197–236.
  9. Sun, G.; Su, S.; Xu, M. Quantum Algorithm for Polynomial Root Finding Problem. In Proceedings of the 2014 Tenth International Conference on Computational Intelligence and Security, Kunming, China, 15–16 November 2014; pp. 469–473.
  10. van Dam, W.; Hallgren, S. Efficient Quantum Algorithms for Shifted Quadratic Character Problems. arXiv 2000, arXiv:quant-ph/0011067.
  11. Wang, S.; Wang, Z.; Li, W.; Fan, L.; Wei, Z.; Gu, Y. Quantum fast Poisson solver: The algorithm and complete and modular circuit design. Quantum Inf. Process. 2020, 19, 170.
  12. AnanthaLakshmi, A.; Sudha, G.F. A novel power efficient 0.64-GFlops fused 32-bit reversible floating point arithmetic unit architecture for digital signal processing applications. Microprocess. Microsyst. 2017, 51, 366–385.
  13. Sultana, S.; Radecka, K. Reversible implementation of square-root circuit. In Proceedings of the 2011 18th IEEE International Conference on Electronics, Circuits, and Systems, Beirut, Lebanon, 11–14 December 2011; pp. 141–144.
  14. Seidel, R.; Bock, S.; Zander, R.; Petrič, M.; Steinmann, N.; Tcholtchev, N.; Hauswirth, M. Qrisp: A Framework for Compilable High-Level Programming of Gate-Based Quantum Computers. arXiv 2024, arXiv:2406.14792.
  15. Amy, M.; Maslov, D.; Mosca, M.; Roetteler, M. A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2013, 32, 818–830.
  16. Thapliyal, H.; Ranganathan, N. Design of efficient reversible logic-based binary and BCD adder circuits. ACM J. Emerg. Technol. Comput. Syst. 2013, 9, 17.
  17. Thapliyal, H. Mapping of Subtractor and Adder-Subtractor Circuits on Reversible Quantum Gates; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9570, pp. 10–34.
  18. Muñoz-Coreas, E.; Thapliyal, H. Quantum Circuit Design of a T-count Optimized Integer Multiplier. IEEE Trans. Comput. 2019, 68, 729–739.
  19. Samavi, S.; Sadrabadi, A.; Fanian, A. Modular array structure for non-restoring square root circuit. J. Syst. Archit. 2008, 54, 957–966.
  20. IBM. Processor Types. 2026. Available online: https://quantum.cloud.ibm.com/docs/en/guides/processor-types (accessed on 28 March 2026).
Figure 1. NOT (a) and CNOT (b) gates.
Figure 2. The SWAP gate (a) and its decomposition (b).
Figure 3. The Hadamard gate.
Figure 4. The Toffoli gate (a) and its decomposition (b).
Figure 5. The Peres gate (a) and its decomposition (b).
Figure 6. Example 4-bit addition circuit.
Figure 7. Example of 4-bit subtraction circuit.
Figure 8. Example 4-bit controlled addition circuit.
Figure 9. Comparison of success rates for different runtime error suppression techniques across one calibration window. Error bars show 95% Wilson binomial confidence intervals computed from 10,000 measurement shots.
Figure 10. Paired improvement in success rate relative to baseline execution for different runtime error suppression configurations. Error bars show 95% Student’s t confidence intervals.
Figure 11. Comparison of the different noise model simulations. Error bars show 95% Student’s t confidence intervals.
Figure 12. Ratio of total execution time $T_{\text{circ}}$ to $T_1$.
Figure 13. Dominance ratio between the probability of the correct full-register output state and the probability of the second-most-probable measured state.
Table 1. Calibration data for IBM Marrakesh.
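| Metric | Median |
|---|---|
| $T_1$ | 179.02 μs |
| $T_2$ | 86.58 μs |
| Readout assignment error | $1.07 \times 10^{-2}$ |
| $P(\text{meas } 0 \mid \text{prep } 1)$ | $1.61 \times 10^{-2}$ |
| $P(\text{meas } 1 \mid \text{prep } 0)$ | $3.42 \times 10^{-3}$ |
| Identity gate error | $3.00 \times 10^{-4}$ |
| RX gate error | $3.00 \times 10^{-4}$ |
| SX gate error | $3.00 \times 10^{-4}$ |
| X gate error | $3.00 \times 10^{-4}$ |
| Measurement error | $1.07 \times 10^{-2}$ |
| Readout length | 2584 ns |
| Single-qubit gate length | 36 ns |
| CZ gate error (2-qubit couplings) | $2.37 \times 10^{-3}$ |
| RZZ gate error (2-qubit couplings) | $4.72 \times 10^{-3}$ |
| 2-qubit gate length | 68 ns |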
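Figure 12 normalizes the circuit execution time by $T_1$. To give a sense of scale for the durations in Table 1, the sketch below evaluates $1 - e^{-t/T}$ for the median gate and readout lengths. The simple exponential-decay model and the helper name `decay_probability` are assumptions made for illustration; they are not necessarily how relaxation was modeled in this work.

```python
import math

# Median calibration values from Table 1, in seconds.
T1 = 179.02e-6
T2 = 86.58e-6
DURATIONS = {
    "single-qubit gate": 36e-9,
    "two-qubit gate": 68e-9,
    "readout": 2584e-9,
}


def decay_probability(duration: float, time_constant: float) -> float:
    """Return 1 - exp(-t / T): population (T1) or coherence (T2) lost after
    time t under an assumed simple exponential-decay model."""
    return 1.0 - math.exp(-duration / time_constant)


for name, t in DURATIONS.items():
    print(
        f"{name:18s} t/T1={t / T1:.2e}  "
        f"1-exp(-t/T1)={decay_probability(t, T1):.2e}  "
        f"1-exp(-t/T2)={decay_probability(t, T2):.2e}"
    )
```

Under this model a single 68 ns two-qubit gate loses only about $3.8 \times 10^{-4}$ of excited-state population to $T_1$, several times less than the median CZ error, while one readout (2584 ns) already costs about 1.4%. For a whole circuit, the ratio $T_{\mathrm{circ}}/T_1$ in Figure 12 plays the role of $t/T$.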
Table 2. Input values used in the evaluation.
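| Input Value $a$ | Bitstring | Root | Remainder | Expected Output |
|---|---|---|---|---|
| 3 | 0011 | 1 | 2 | 000010010 |
| 4 | 0100 | 2 | 0 | 000100000 |
| 5 | 0101 | 2 | 1 | 000100001 |
| 7 | 0111 | 2 | 3 | 000100011 |
| 9 | 001001 | 3 | 0 | 0000011000000 |
| 16 | 010000 | 4 | 0 | 0000100000000 |
| 23 | 010111 | 4 | 7 | 0000100000111 |
| 31 | 011111 | 5 | 6 | 0000101000110 |
| 32 | 00100000 | 5 | 7 | 00000010100000111 |
| 49 | 00110001 | 7 | 0 | 00000011100000000 |
| 71 | 01000111 | 8 | 7 | 00000100000000111 |
| 127 | 01111111 | 11 | 6 | 00000101100000110 |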
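The Root and Remainder columns of Table 2 can be cross-checked against a textbook classical non-restoring square-root recurrence, the family of algorithms the circuit is named after. The sketch below is a plain-Python reference model, not the Qrisp circuit. The bit layout in `expected_output` (root zero-padded to $n+1$ bits, followed by the remainder zero-padded to $n$ bits, $2n+1$ bits in total) is inferred from the rows of Table 2 and agrees with the 9, 13, and 17 qubits listed in Table 3; both function names are hypothetical.

```python
from __future__ import annotations


def nonrestoring_isqrt(a: int, n: int) -> tuple[int, int]:
    """Classical non-restoring square root of an n-bit integer a (n even).

    Returns (root, remainder) with root = floor(sqrt(a)) and
    remainder = a - root**2. Reference model only, not the quantum circuit.
    """
    if n % 2 or not 0 <= a < 2**n:
        raise ValueError("n must be even and a must fit in n bits")
    q, r = 0, 0  # partial root and signed partial remainder
    for i in range(n // 2 - 1, -1, -1):
        pair = (a >> (2 * i)) & 0b11  # next two bits of a, MSB first
        if r >= 0:
            r = 4 * r + pair - (4 * q + 1)  # subtract Q followed by 01
        else:
            r = 4 * r + pair + (4 * q + 3)  # non-restoring step: add Q followed by 11
        q = 2 * q + (1 if r >= 0 else 0)  # append the next root bit
    if r < 0:
        r += 2 * q + 1  # final correction of a negative remainder
    return q, r


def expected_output(a: int, n: int) -> str:
    """Hypothetical layout inferred from Table 2: root padded to n + 1 bits,
    then remainder padded to n bits (2n + 1 bits in total)."""
    root, rem = nonrestoring_isqrt(a, n)
    return format(root, f"0{n + 1}b") + format(rem, f"0{n}b")


for a, n in [(3, 4), (9, 6), (32, 8), (127, 8)]:
    root, rem = nonrestoring_isqrt(a, n)
    print(a, format(a, f"0{n}b"), root, rem, expected_output(a, n))
```

Because the recurrence never restores a negative partial remainder, it only needs one add-or-subtract per root bit, which is what makes it a natural fit for controlled add/subtract blocks.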
Table 3. Compiled circuit characteristics.
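| $a$ | Qubits | Depth | Logical Depth | 2q Gates | Gates | Logical Gates |
|---|---|---|---|---|---|---|
| 3 | 9 | 624 | 164 | 274 | 1162 | 64 |
| 4 | 9 | 630 | 164 | 274 | 1159 | 64 |
| 5 | 9 | 645 | 164 | 283 | 1236 | 64 |
| 7 | 9 | 634 | 164 | 277 | 1175 | 64 |
| 9 | 13 | 1314 | 323 | 674 | 2747 | 132 |
| 16 | 13 | 1349 | 323 | 642 | 2670 | 132 |
| 23 | 13 | 1307 | 323 | 654 | 2702 | 132 |
| 31 | 13 | 1347 | 323 | 669 | 2760 | 132 |
| 32 | 17 | 2296 | 519 | 1232 | 4842 | 219 |
| 49 | 17 | 2302 | 519 | 1209 | 4820 | 219 |
| 71 | 17 | 2346 | 519 | 1231 | 4861 | 219 |
| 127 | 17 | 2198 | 519 | 1166 | 4649 | 219 |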
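A crude way to relate the two-qubit gate counts in Table 3 to the median CZ error in Table 1 is the independent-error estimate $(1-\varepsilon)^{N_{2q}}$. The sketch below is only a back-of-the-envelope model assumed for illustration: it ignores single-qubit gates, readout, relaxation, and crosstalk, and treats every two-qubit gate as having the median CZ error. The helper name is hypothetical.

```python
CZ_ERROR = 2.37e-3  # median CZ gate error from Table 1

# Two-qubit gate counts for one input per register width, from Table 3.
TWO_QUBIT_GATES = {3: 274, 9: 674, 32: 1232}


def no_error_probability(gate_count: int, error_rate: float) -> float:
    """Crude estimate (1 - eps)^N that none of N gates fails, assuming
    independent errors with one uniform rate eps (illustrative model only)."""
    return (1.0 - error_rate) ** gate_count


for a, n_2q in TWO_QUBIT_GATES.items():
    estimate = no_error_probability(n_2q, CZ_ERROR)
    print(f"a={a:3d}  N_2q={n_2q:5d}  (1-eps)^N ~ {estimate:.3f}")
```

This gives roughly 0.52, 0.20, and 0.05 for the 4-, 6-, and 8-bit circuits, which illustrates how fast an error budget shrinks with two-qubit gate count; the noise-model simulations in Table 7 include more error sources and are the more faithful comparison.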
Table 4. Gate-count comparison of the implemented Qrisp circuit across logical, Clifford + T, and IBM Marrakesh native representations.
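(a) Logical gate counts of the Qrisp circuit

| $n$ | CCX | CX | SWAP | X |
|---|---|---|---|---|
| 4 | 16 | 41 | 3 | 4 |
| 6 | 32 | 90 | 6 | 4 |
| 8 | 52 | 153 | 10 | 4 |

(b) T-count and Clifford + T decomposition gate counts

| $n$ | T-Count | CX | h | t | tdg | X |
|---|---|---|---|---|---|---|
| 4 | 112 | 146 | 32 | 64 | 48 | 18 |
| 6 | 224 | 300 | 64 | 128 | 96 | 26 |
| 8 | 364 | 495 | 104 | 208 | 156 | 34 |

(c) IBM Marrakesh native gate counts after transpilation

| $n$ | CZ | RZ | SX | X |
|---|---|---|---|---|
| 4 | 317 | 289 | 573 | 7 |
| 6 | 763 | 588 | 1385 | 17 |
| 8 | 1341 | 959 | 2419 | 28 |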
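The Clifford + T counts in Table 4(b) are consistent with expanding each CCX from Table 4(a) into the standard textbook Toffoli decomposition (6 CX, 2 H, 4 T, 3 T†, hence 7 T-type gates) and each SWAP into 3 CX. The sketch below checks this bookkeeping under that assumption; it does not try to explain the X column, which this simple model does not account for, and the function name is hypothetical.

```python
# Logical counts from Table 4(a): n -> (CCX, CX, SWAP)
LOGICAL = {4: (16, 41, 3), 6: (32, 90, 6), 8: (52, 153, 10)}


def clifford_t_counts(ccx: int, cx: int, swap: int) -> dict[str, int]:
    """Expand logical counts, assuming each CCX uses the standard
    decomposition (6 CX, 2 H, 4 T, 3 Tdg) and each SWAP uses 3 CX."""
    return {
        "T-count": 7 * ccx,
        "CX": cx + 6 * ccx + 3 * swap,
        "h": 2 * ccx,
        "t": 4 * ccx,
        "tdg": 3 * ccx,
    }


for n, counts in LOGICAL.items():
    print(n, clifford_t_counts(*counts))
```

Seen this way, the T-count is exactly seven times the Toffoli count, so reducing CCX gates in the logical circuit is what drives the T-count optimization.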
Table 5. Success rates for different runtime error suppression techniques. Uncertainties denote 95% Wilson binomial confidence intervals computed from 10,000 measurement shots.
| Input $a$ | Baseline | Dynamical Decoupling | Pauli Twirling | DD + PT |
|---|---|---|---|---|
| 3 | 0.2457 (-0.0083/+0.0085) | 0.3240 (-0.0091/+0.0092) | 0.1259 (-0.0064/+0.0066) | 0.1459 (-0.0068/+0.0071) |
| 4 | 0.1677 (-0.0072/+0.0075) | 0.3047 (-0.0089/+0.0091) | 0.1050 (-0.0059/+0.0062) | 0.1137 (-0.0061/+0.0064) |
| 5 | 0.2541 (-0.0084/+0.0086) | 0.2470 (-0.0084/+0.0085) | 0.1363 (-0.0066/+0.0069) | 0.1266 (-0.0064/+0.0067) |
| 7 | 0.2217 (-0.0080/+0.0082) | 0.2380 (-0.0082/+0.0084) | 0.2451 (-0.0083/+0.0085) | 0.1600 (-0.0071/+0.0073) |
| 9 | 0.0062 (-0.0014/+0.0017) | 0.0196 (-0.0025/+0.0029) | 0.0078 (-0.0015/+0.0019) | 0.0071 (-0.0015/+0.0018) |
| 16 | 0.0069 (-0.0014/+0.0018) | 0.0105 (-0.0018/+0.0022) | 0.0024 (-0.0008/+0.0012) | 0.0044 (-0.0011/+0.0015) |
| 23 | 0.0150 (-0.0022/+0.0026) | 0.0110 (-0.0019/+0.0022) | 0.0053 (-0.0012/+0.0016) | 0.0046 (-0.0011/+0.0015) |
| 31 | 0.0029 (-0.0009/+0.0013) | 0.0067 (-0.0014/+0.0018) | 0.0021 (-0.0007/+0.0011) | 0.0028 (-0.0009/+0.0012) |
| 32 | 0.0000 (-0.0000/+0.0004) | 0.0000 (-0.0000/+0.0004) | 0.0000 (-0.0000/+0.0004) | 0.0000 (-0.0000/+0.0004) |
| 49 | 0.0001 (-0.0001/+0.0005) | 0.0001 (-0.0001/+0.0005) | 0.0000 (-0.0000/+0.0004) | 0.0000 (-0.0000/+0.0004) |
| 71 | 0.0001 (-0.0001/+0.0005) | 0.0000 (-0.0000/+0.0004) | 0.0000 (-0.0000/+0.0004) | 0.0000 (-0.0000/+0.0004) |
| 127 | 0.0000 (-0.0000/+0.0004) | 0.0002 (-0.0001/+0.0005) | 0.0000 (-0.0000/+0.0004) | 0.0000 (-0.0000/+0.0004) |
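The asymmetric uncertainties in Table 5 are 95% Wilson score intervals for a binomial proportion estimated from 10,000 shots. The sketch below computes that interval with the standard Wilson formula using only the Python standard library; the function name `wilson_interval` is illustrative, and clipping the bounds to [0, 1] is a defensive choice.

```python
from __future__ import annotations

from statistics import NormalDist


def wilson_interval(successes: int, shots: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval (lower, upper) for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / shots
    denom = 1 + z**2 / shots
    center = (p + z**2 / (2 * shots)) / denom
    half = (z / denom) * (p * (1 - p) / shots + z**2 / (4 * shots**2)) ** 0.5
    return max(0.0, center - half), min(1.0, center + half)


shots = 10_000
p_hat = 2457 / shots
lo, hi = wilson_interval(2457, shots)
print(f"{p_hat:.4f} (-{p_hat - lo:.4f}/+{hi - p_hat:.4f})")  # prints 0.2457 (-0.0083/+0.0085)
```

Unlike a normal-approximation interval, the Wilson interval stays inside [0, 1] and is asymmetric around $\hat p$, which is why the zero-success rows still carry a nonzero upper bound of about 0.0004.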
Table 6. Paired improvement in success rate relative to baseline execution for different runtime error suppression configurations. Values are reported as mean ± 95% Student-t confidence interval and rounded to three decimal places.
| Input $a$ | Dynamical Decoupling | DD + PT | Pauli Twirling |
|---|---|---|---|
| 3 | 0.054 ± 0.087 | 0.050 ± 0.070 | 0.061 ± 0.059 |
| 4 | 0.092 ± 0.096 | 0.003 ± 0.081 | 0.009 ± 0.055 |
| 5 | 0.054 ± 0.094 | 0.045 ± 0.109 | 0.052 ± 0.070 |
| 7 | 0.095 ± 0.107 | 0.006 ± 0.084 | 0.003 ± 0.029 |
| 9 | 0.003 ± 0.007 | 0.000 ± 0.001 | 0.000 ± 0.001 |
| 16 | 0.003 ± 0.004 | 0.001 ± 0.003 | 0.001 ± 0.002 |
| 23 | 0.001 ± 0.006 | 0.003 ± 0.006 | 0.003 ± 0.005 |
| 31 | 0.002 ± 0.003 | 0.001 ± 0.002 | 0.001 ± 0.002 |
| 32 | 0.000 ± 0.000 | 0.000 ± 0.000 | 0.000 ± 0.000 |
| 49 | 0.000 ± 0.000 | 0.000 ± 0.000 | 0.000 ± 0.000 |
| 71 | 0.000 ± 0.000 | 0.000 ± 0.000 | 0.000 ± 0.000 |
| 127 | 0.000 ± 0.000 | 0.000 ± 0.000 | 0.000 ± 0.000 |
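Tables 6 and 7 report means with 95% Student-t confidence intervals. For paired data the interval is built from the per-pair differences, $\bar d \pm t_{0.975,\,m-1}\, s_d/\sqrt{m}$. The sketch below assumes each pair is one baseline run and one run with the suppression technique under the same conditions (for example, the same calibration window); the pairing unit, the helper name, and the sample numbers are hypothetical. To stay within the standard library it uses a small table of standard two-sided 95% critical values instead of a statistics package.

```python
from __future__ import annotations

from statistics import mean, stdev

# Standard two-sided 95% Student-t critical values t_{0.975, df}.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_improvement(baseline: list[float], technique: list[float]) -> tuple[float, float]:
    """Return (mean difference, 95% CI half-width) for paired samples,
    where each difference is technique - baseline."""
    if len(baseline) != len(technique):
        raise ValueError("samples must be paired")
    diffs = [t - b for b, t in zip(baseline, technique)]
    m = len(diffs)
    if (m - 1) not in T_CRIT_95:
        raise ValueError("this sketch only tabulates 2 to 11 pairs")
    half_width = T_CRIT_95[m - 1] * stdev(diffs) / m ** 0.5
    return mean(diffs), half_width


# Hypothetical success rates from three paired runs.
d_mean, d_half = paired_improvement([0.20, 0.22, 0.25], [0.30, 0.28, 0.27])
print(f"{d_mean:.3f} ± {d_half:.3f}")  # prints 0.060 ± 0.099
```

Pairing removes run-to-run drift that affects both configurations equally, but with only a few pairs the t critical value is large, so the resulting intervals can easily be wider than the mean difference itself.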
Table 7. Comparison of the different noise model simulations. Values are reported as mean ± 95% Student-t confidence interval and rounded to three decimal places.
| Input $a$ | Full Noise Model | Gate-Only | Relaxation-Only | Readout-Only |
|---|---|---|---|---|
| 3 | 0.356 ± 0.043 | 0.419 ± 0.052 | 0.764 ± 0.027 | 0.819 ± 0.029 |
| 4 | 0.348 ± 0.040 | 0.426 ± 0.047 | 0.751 ± 0.028 | 0.808 ± 0.027 |
| 5 | 0.353 ± 0.044 | 0.420 ± 0.051 | 0.759 ± 0.029 | 0.815 ± 0.029 |
| 7 | 0.348 ± 0.043 | 0.416 ± 0.049 | 0.757 ± 0.027 | 0.815 ± 0.031 |
| 9 | 0.067 ± 0.039 | 0.093 ± 0.049 | 0.392 ± 0.129 | 0.733 ± 0.053 |
| 16 | 0.064 ± 0.033 | 0.093 ± 0.049 | 0.379 ± 0.124 | 0.736 ± 0.050 |
| 23 | 0.071 ± 0.039 | 0.098 ± 0.050 | 0.401 ± 0.129 | 0.757 ± 0.035 |
| 31 | 0.077 ± 0.036 | 0.103 ± 0.049 | 0.440 ± 0.102 | 0.752 ± 0.033 |
Table 8. Dominance ratio between the probability of the correct full-register output state and the probability of the second-most-probable measured state. Values are rounded to three decimal places.
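| Input Number $a$ | Dominance Ratio |
|---|---|
| 3 | 3.597 |
| 4 | 5.051 |
| 5 | 7.677 |
| 7 | 3.739 |
| 9 | 1.476 |
| 16 | 2.464 |
| 23 | 3.061 |
| 31 | 2.071 |
| 32 | 0.000 |
| 49 | 0.250 |
| 71 | 0.250 |
| 127 | 0.000 |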
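The dominance ratio in Table 8 can be computed directly from a dictionary of measurement counts. The sketch below assumes the denominator is the most frequent measured bitstring other than the correct one, which coincides with the second-most-probable state whenever the correct output is the most probable; the function name and the sample counts are hypothetical. Since both probabilities share the same shot total, the ratio of counts equals the ratio of probabilities.

```python
from __future__ import annotations


def dominance_ratio(counts: dict[str, int], correct: str) -> float:
    """P(correct) / P(most frequent other bitstring), computed from counts.

    Assumed reading of 'second-most-probable measured state': the largest
    count among all bitstrings that differ from the correct output.
    """
    correct_count = counts.get(correct, 0)
    others = [c for state, c in counts.items() if state != correct]
    if not others or max(others) == 0:
        return float("inf")  # no competing outcome was observed
    return correct_count / max(others)


# Hypothetical counts for a 9-qubit output register.
counts = {"000010010": 600, "000000010": 200, "000010000": 100}
print(dominance_ratio(counts, "000010010"))  # prints 3.0
```

A ratio above 1 means the correct full-register output is still the single most likely outcome, while a ratio of 0 means the correct bitstring was never observed.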
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kupryianau, H.; Niemiec, M. Qrisp-Based Implementation and Experimental Evaluation of a T-Count-Optimized Non-Restoring Quantum Square-Root Circuit. Electronics 2026, 15, 2334. https://doi.org/10.3390/electronics15112334

AMA Style

Kupryianau H, Niemiec M. Qrisp-Based Implementation and Experimental Evaluation of a T-Count-Optimized Non-Restoring Quantum Square-Root Circuit. Electronics. 2026; 15(11):2334. https://doi.org/10.3390/electronics15112334

Chicago/Turabian Style

Kupryianau, Heorhi, and Marcin Niemiec. 2026. "Qrisp-Based Implementation and Experimental Evaluation of a T-Count-Optimized Non-Restoring Quantum Square-Root Circuit" Electronics 15, no. 11: 2334. https://doi.org/10.3390/electronics15112334

APA Style

Kupryianau, H., & Niemiec, M. (2026). Qrisp-Based Implementation and Experimental Evaluation of a T-Count-Optimized Non-Restoring Quantum Square-Root Circuit. Electronics, 15(11), 2334. https://doi.org/10.3390/electronics15112334
