To evaluate the practical feasibility of the proposed square-root circuit, we performed experimental validation on real NISQ quantum hardware provided by the IBM Quantum Platform. The objective was to verify whether the implemented non-restoring square-root algorithm can be executed on contemporary superconducting quantum processors and to analyze how hardware noise influences the probability of obtaining the correct result.
The experiments were conducted using IBM Quantum Runtime and the Sampler primitive, which returns measurement distributions for executed circuits. Each instance of the algorithm was compiled into a hardware-compatible circuit, transpiled for the selected quantum processor, and executed multiple times to obtain statistically meaningful output distributions.
5.1. Experimental Setup
Currently, the IBM Quantum Platform offers three QPUs (quantum processing units), Marrakesh, Fez, and Kingston, which are available free of charge subject to execution-time limits. These QPUs belong to the Heron r2 generation of superconducting quantum processors and provide 156 physical qubits [20]. The devices are based on superconducting transmon qubits arranged in IBM’s heavy-hex lattice architecture, which limits qubit connectivity in order to reduce crosstalk and improve the reliability of two-qubit gate operations. As a consequence, logical circuits must be mapped to the hardware connectivity graph using hardware-aware transpilation, which may introduce additional routing operations and increase the circuit depth. The processors operate in the noisy intermediate-scale quantum (NISQ) regime, where gate errors, decoherence, and readout imperfections influence the probability of obtaining correct computational results.
The experiments were executed on the IBM Quantum Marrakesh backend. Although Marrakesh does not exhibit the lowest error rates among the publicly accessible IBM Quantum processors, the calibration metrics of the available backends differ only moderately for the purposes of these experiments. The backend was therefore selected primarily for its high availability and its suitability for repeated execution within the IBM Quantum infrastructure. The calibration data for 12 March 2026 are shown in Table 1.
The implemented square-root algorithm operates on a number of qubits that depends on n, where n denotes the number of bits used to represent the input integer. The input number is a signed integer with an even number of bits. Because current quantum processors operate in the NISQ regime and are limited by gate fidelity, coherence times, and connectivity constraints, the experiments were conducted for relatively small input sizes that remain feasible after hardware-aware compilation.
Table 2 lists the 12 tested input values. For each n, one perfect square, one single-bit number, one Mersenne number, and one randomly chosen number were evaluated, with the aim of providing a representative sample of execution scenarios. The column "Expected Output" gives the expected output register: the first bit is the ancilla value, which is always 0; the next n bits represent the expected square root, where n is the input size; and the remaining bits are the remainder.
The algorithm was implemented using the Qrisp framework and exported to a Qiskit-compatible quantum circuit for execution on IBM hardware by calling the to_qiskit() method on the Qrisp quantum circuit object. Before execution, each circuit was transpiled for the target backend using a hardware-aware compilation pipeline; an example of such a transpilation is shown in Listing 10.
Listing 10. Hardware-aware transpilation with a preset pass manager.
This process included qubit mapping, routing operations required to satisfy hardware connectivity constraints, and circuit optimization aimed at reducing circuit depth and the number of two-qubit gates.
All circuits were executed using the IBM Quantum Runtime environment with the Sampler primitive, which returns measurement distributions for executed circuits. For each tested input value a, the compiled circuit was executed with 10,000 measurement shots, producing a probability distribution over all measured bitstrings corresponding to the values stored in the root and remainder registers.
To study the influence of hardware noise and error suppression techniques, the circuits were executed under multiple runtime configurations. In addition to baseline execution, the experiments were repeated with runtime error suppression mechanisms enabled, namely dynamical decoupling and Pauli twirling. These techniques were selected because the implemented square-root circuits become deep after hardware-aware transpilation, which increases their exposure to decoherence, idle-time errors, and accumulated two-qubit gate imperfections. Dynamical decoupling can improve performance by inserting pulse sequences into idle intervals, thereby reducing the accumulation of errors associated with relaxation and dephasing while qubits wait for subsequent operations. Pauli twirling can improve performance by randomizing coherent and systematic gate errors, converting them into a more stochastic Pauli-like noise channel that is less likely to accumulate constructively over many circuit layers. In this way, dynamical decoupling mainly targets idle-time decoherence, whereas Pauli twirling targets coherent gate-error accumulation. Comparing these configurations makes it possible to assess whether these complementary error suppression mechanisms increase the probability of measuring the correct root and remainder on NISQ hardware. An example configuration is shown in Listing 11.
Listing 11. Sampler configuration with dynamical decoupling and Pauli twirling.
To perform the noise simulation, the high-level Qrisp circuit was first transpiled to the native gate set of the target backend. After transpilation, a Qiskit noise model was constructed from the IBM Marrakesh backend instance. This noise model makes it possible to isolate gate errors, readout errors, and thermal relaxation effects, as well as to perform a full-noise simulation of the target backend. The Aer Simulator was then used to execute the noise simulations, as shown in Listing 12.
Listing 12. Noise-model construction and Aer Simulator execution.
5.2. Evaluated Metrics
To assess the performance of the implemented quantum algorithm on real hardware, several complementary metrics were evaluated. These metrics capture both the structural properties of the compiled circuits and the quality of the obtained measurement results under realistic noise conditions.
The primary structural metrics include circuit depth, total gate count, and the number of two-qubit gates. The comparison between logical and compiled circuits provides a direct measure of the overhead introduced by hardware constraints. In particular, circuit depth and total gate count quantify the temporal and operational complexity of the computation, whereas the number of two-qubit gates is especially important due to their significantly higher error rates compared to single-qubit operations.
Additionally, detailed gate counts were reported for the Qrisp logical circuit, its decomposition into the Clifford+T gate set, and the circuit transpiled to the native gate set of IBM Marrakesh. These counts make it possible to verify the theoretical T-count and provide a clearer overview of how the logical circuit is mapped onto the native superconducting gate set.
The main performance metric is the success rate, defined as the probability of obtaining the correct full-register output state:

$$P_{\mathrm{succ}} = \frac{N_{\mathrm{exp}}}{N_{\mathrm{shots}}},$$

where $N_{\mathrm{exp}}$ denotes the number of measurements corresponding to the expected output and $N_{\mathrm{shots}}$ is the total number of circuit executions. This metric directly reflects the practical usability of the algorithm on NISQ hardware and captures the cumulative impact of all noise sources.
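The success rate defined above can be computed directly from a measurement histogram. The following is a minimal sketch, assuming the sampler output has already been converted into a dictionary that maps bitstrings to shot counts; the function name and the example counts are hypothetical and are not measured data.

```python
def success_rate(counts, expected):
    """Fraction of shots whose full-register bitstring equals `expected`."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one shot")
    # Bitstrings that were never observed contribute zero counts.
    return counts.get(expected, 0) / total


# Hypothetical histogram for a 10,000-shot run (illustrative only).
counts = {"0100000": 6200, "0100001": 1500, "0110000": 1300, "1100000": 1000}
print(success_rate(counts, "0100000"))
```

Because the metric requires an exact match of the complete bitstring, a single flipped bit anywhere in the register counts as a failure.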
To evaluate the impact of qubit relaxation, the ratio $t_{\mathrm{circ}}/T_1$ between the total circuit execution time $t_{\mathrm{circ}}$ and the relaxation time $T_1$ is considered. This dimensionless quantity characterizes the exposure of qubits to decoherence during computation. Higher values indicate an increased probability of energy relaxation events occurring before the circuit finishes, particularly affecting idle qubits that remain unused for extended periods.
In addition to the absolute success probability, the structure of the output distribution is analyzed using the dominance ratio

$$D = \frac{p_{\mathrm{exp}}}{p_{2}},$$

where $p_{\mathrm{exp}}$ is the probability of the expected output state and $p_{2}$ corresponds to the second-most-probable measurement outcome. This metric captures how clearly the correct result stands out from competing erroneous states. Values significantly greater than 1 indicate a well-defined peak in the output distribution, while values close to or smaller than 1 suggest a noise-dominated regime with nearly uniform outcome probabilities.
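The dominance ratio can be sketched in the same way. The version below rests on one assumption about the definition: the denominator is taken as the most probable outcome other than the expected one, which coincides with the second-most-probable outcome whenever the expected state is the mode of the distribution. The function name and the counts are hypothetical.

```python
import math


def dominance_ratio(counts, expected):
    """Ratio of the expected-state probability to its strongest competitor."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one shot")
    p_expected = counts.get(expected, 0) / total
    # Assumption: the competitor is the most frequent outcome != expected.
    competitors = [c for state, c in counts.items() if state != expected]
    if not competitors or max(competitors) == 0:
        return math.inf  # no competing outcome was observed
    return p_expected / (max(competitors) / total)


# Hypothetical histogram (illustrative only).
counts = {"0100000": 6200, "0100001": 1500, "0110000": 1300, "1100000": 1000}
print(dominance_ratio(counts, "0100000"))
```

With this reading, a value below 1 means that at least one erroneous bitstring was observed more often than the correct one.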
The effectiveness of noise mitigation techniques relative to the baseline is evaluated using a paired improvement metric. For each input value a and calibration window w, the paired improvement was computed as

$$\Delta_{a,w} = P^{\mathrm{mit}}_{a,w} - P^{\mathrm{base}}_{a,w},$$

where $P^{\mathrm{mit}}_{a,w}$ denotes the success rate obtained using a given mitigation technique and $P^{\mathrm{base}}_{a,w}$ denotes the corresponding baseline success rate measured for the same input instance and calibration window. Positive values of $\Delta_{a,w}$ indicate an improvement over the baseline execution, while negative values indicate degraded performance.
To quantify run-to-run variability and assess the statistical significance of the observed improvements, the mean paired improvement across calibration windows was reported together with a 95% Student's t confidence interval.
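The paired analysis can be illustrated with a short sketch: compute one difference per calibration window, then form the mean and a two-sided 95% Student's t interval with k − 1 degrees of freedom for k windows. The function name, the small lookup table of critical values, and the success rates below are illustrative assumptions rather than the paper's data or code.

```python
import math
import statistics

# Two-sided 95% Student's t critical values, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def paired_improvement_ci(mitigated, baseline):
    """Mean paired improvement and its 95% t confidence interval."""
    if len(mitigated) != len(baseline):
        raise ValueError("both series must cover the same calibration windows")
    diffs = [m - b for m, b in zip(mitigated, baseline)]
    k = len(diffs)
    if k - 1 not in T_CRIT_95:
        raise ValueError("this sketch supports 2 to 10 calibration windows")
    mean = statistics.mean(diffs)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(diffs) / math.sqrt(k)
    half_width = T_CRIT_95[k - 1] * sem
    return mean, (mean - half_width, mean + half_width)


# Hypothetical success rates for one input over five calibration windows.
baseline = [0.30, 0.25, 0.35, 0.28, 0.32]
with_dd = [0.34, 0.24, 0.40, 0.27, 0.36]
mean, (low, high) = paired_improvement_ci(with_dd, baseline)
print(f"mean = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

In this made-up example the mean improvement is positive but the interval contains zero, so the improvement would not be considered significant at the 95% level.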
Multiple noise simulations were performed using gate-only, readout-only, relaxation-only, and full-noise models. The simulations were conducted for selected 4-bit and 6-bit test values across multiple random seeds. This allowed the impact of individual error sources to be analyzed separately and compared with theoretical estimates. Due to the substantial depth and gate count of the transpiled circuit, simulations for 8-bit input values were computationally infeasible within the available resources.
Together, these metrics provide a comprehensive evaluation framework, capturing both the resource overhead introduced by hardware constraints and the resulting impact on computational reliability in the presence of realistic quantum noise.
5.3. Experimental Results
Before executing the circuit under noisy conditions and on real quantum hardware, the correctness of the implementation was first verified using the noiseless simulator provided by the Qrisp quantum session. The circuit was tested for all non-negative integer input values a representable with input bit widths n of 4, 6, 8, and 10. The minimum input bit width required by the circuit design is 4; therefore, the values 0, 1, and 2 were represented using 4-bit registers by padding them with two leading zeros. For each input value, a separate circuit instance was constructed and executed, after which the output registers F, R, and the ancilla register were measured. The measurement result represents a probability distribution over possible values of the root, remainder, and ancilla registers. The value measured in F, representing the integer square root, and the value measured in R, representing the remainder, were compared against the expected classical values, computed as $\lfloor\sqrt{a}\rfloor$ and $a - \lfloor\sqrt{a}\rfloor^{2}$, respectively. For all tested input values, the noiseless simulation produced the expected root, remainder, and ancilla value with probability 1, where the ancilla register was always measured as 0. This confirms the functional correctness of the implemented circuit before hardware-level noise effects were considered.
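The classical reference values used in such a comparison are simply the integer square root and the corresponding remainder. A minimal sketch follows; the function name and the loop range are hypothetical and chosen only for illustration.

```python
import math


def expected_root_and_remainder(a):
    """Classical reference: floor(sqrt(a)) and a - floor(sqrt(a))**2."""
    if a < 0:
        raise ValueError("input must be non-negative")
    root = math.isqrt(a)
    return root, a - root * root


# Illustrative range only; it is not the range used in the paper.
for a in range(256):
    root, remainder = expected_root_and_remainder(a)
    # The remainder of an integer square root always lies in [0, 2 * root].
    assert root * root + remainder == a
    assert 0 <= remainder <= 2 * root

print(expected_root_and_remainder(10))  # (3, 1)
```

The bound on the remainder follows from a < (root + 1)², which gives a quick sanity check on measured root and remainder pairs.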
Table 3 presents the characteristics of the compiled quantum circuits after hardware-aware transpilation. The reported metrics are divided into two categories: logical (denoted as “L.”) and physical (compiled). Logical metrics correspond to the original circuit generated at the algorithmic level, prior to any hardware constraints, whereas physical metrics describe the circuit after mapping onto a specific quantum device, including routing and optimization overhead.
The difference between these two representations is substantial. In particular, the circuit depth increases markedly after compilation, by a factor that depends on the input size. Similarly, the total gate count grows by more than an order of magnitude (Table 3). This overhead is primarily caused by limited qubit connectivity, which requires the insertion of SWAP operations, as well as by the additional decomposition of high-level gates into the native gate set.
Overall, the comparison between logical and physical metrics highlights the gap between algorithmic designs and their realization on current quantum hardware. Whereas the logical circuit exhibits relatively moderate resource scaling, the compiled circuit incurs substantial overhead, which grows with the number of qubits and ultimately limits practical execution on NISQ devices.
Table 4 presents detailed gate counts for logical Qrisp gates, as well as for their decomposition into Clifford+T gates and transpilation to the native gate set of ibm_marrakesh for different input sizes.
The theoretical T-count shown in Table 4b is calculated following [7] and matches the total number of T and T† gates (the “t” and “tdg” columns, respectively). This confirms that the implemented circuit preserves the T-count of the original algorithm.
In superconducting quantum hardware, $R_z$ rotations are typically implemented virtually, by adjusting the reference frame of the qubit rather than applying a physical gate. As a result, $R_z$ gates do not contribute significantly to execution time or error accumulation, despite appearing in large numbers in the compiled circuit.
Figure 9 and Table 5 show the success rates of the full-register match for the tested input values with different noise mitigation techniques within one calibration window. The success rate drops significantly for larger inputs and approaches 0 for the largest tested inputs. This behavior is expected when deep quantum circuits are executed on NISQ hardware, where the accumulated noise grows with the number of operations and qubits involved. For 4-bit values, dynamical decoupling tends to give a higher success rate than the baseline, Pauli twirling, and combined dynamical decoupling + Pauli twirling executions (see Table 5 for the individual values). However, the effect is not conclusive due to the high variability in results between input values.
The large run-to-run variability is primarily caused by calibration-dependent hardware noise rather than finite-shot uncertainty. The implemented circuit becomes substantially deeper after transpilation and contains hundreds to thousands of two-qubit gates, making the full-register success probability highly sensitive to small changes in qubit mapping, two-qubit gate errors, readout errors, relaxation times, and idle-time structure. Since success requires the exact complete output bitstring, even a small variation in any of these noise sources can produce a noticeable change in the measured success rate.
Figure 10 and Table 6 illustrate the mean paired improvement in success rate relative to baseline execution for the tested error suppression techniques across five different calibration windows. The results indicate that dynamical decoupling provides the most consistent positive trend among the evaluated techniques. For the smallest input values, the mean improvement in absolute success probability obtained with dynamical decoupling is positive (Table 6). However, the confidence intervals for these inputs are relatively large and often cross zero, which means that the observed improvement cannot be regarded as statistically significant at the 95% confidence level. This suggests that dynamical decoupling may be beneficial, but its effect is strongly affected by run-to-run variability.
Pauli twirling alone does not show a consistent improvement over the baseline. For most small input values, the mean paired difference is negative or close to zero, indicating that Pauli twirling either has negligible effect or slightly reduces the probability of obtaining the correct output. The combined use of dynamical decoupling and Pauli twirling also does not consistently outperform dynamical decoupling alone. In most cases, its mean improvement is close to zero or negative, suggesting that the addition of Pauli twirling does not provide a clear advantage for this circuit.
For larger input values, all paired differences converge toward zero. This behavior indicates that, as the circuit size and depth increase, the accumulated effects of gate errors, routing overhead, readout errors, and relaxation dominate the execution. In this regime, the evaluated error suppression techniques are unable to produce a measurable improvement in the success probability. Overall, the paired-difference analysis shows that dynamical decoupling exhibits the most favorable trend, but the large confidence intervals and the near-zero improvements for larger circuits indicate that the mitigation benefit is limited on the tested NISQ hardware.
The dominant sources of errors are gate errors, measurement errors, and qubit relaxation during circuit execution.
Figure 11 and Table 7 present isolated noise-model simulations for the implemented circuit. The success probability is evaluated as the probability of obtaining the correct full-register output. The four considered cases are the full-backend-noise model, the gate-error-only model, the relaxation-noise-only model, and the readout-error-only model.
In the readout-error-only model, the success probability remains the highest and does not drop significantly for larger input sizes. This is consistent with the fact that readout errors occur only during the final measurement and do not accumulate throughout the circuit. If $\epsilon_r$ is the average readout error and $m$ is the number of measured qubits, the expected success probability can be approximated as

$$P_{\mathrm{readout}} \approx (1 - \epsilon_r)^{m}.$$

Evaluating this expression with the median readout error of the calibration window (Table 1) gives estimates for the 4-bit and 6-bit input values that are relatively close to the simulated results.
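The readout-only estimate described above treats each measured qubit as an independent trial that succeeds with probability one minus the readout error. A minimal sketch, assuming independent and identical readout errors; the error rate and register sizes are placeholders and not the calibration values of the backend.

```python
def readout_only_success(readout_error, measured_qubits):
    """Probability that every measured qubit is read out correctly."""
    if not 0.0 <= readout_error <= 1.0:
        raise ValueError("readout_error must be a probability")
    return (1.0 - readout_error) ** measured_qubits


# Placeholder values (assumptions): 1% readout error, two register sizes.
for m in (7, 10):
    print(m, round(readout_only_success(0.01, m), 4))
```

Since the exponent is the number of measured qubits rather than the circuit depth, this estimate decreases only slowly with input size, which matches the qualitative behavior described for the readout-error-only model.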
In the relaxation-noise-only model, the success probability decreases from the 4-bit inputs to the 6-bit inputs (Table 7). This indicates that relaxation becomes more significant as circuit duration increases, since qubits remain exposed to $T_1$-related decay for a longer time. However, relaxation noise alone is not sufficient to account for the strongest performance degradation. This may also explain why the improvement obtained from dynamical decoupling is relatively limited: if relaxation is not the dominant error source, suppressing idle-time decoherence would only partially improve the overall success probability, while other errors would continue to contribute substantially to the observed degradation.
In the gate-error-only model, the success probability drops sharply from the 4-bit inputs to the 6-bit inputs (Table 7). This decline is caused by the significant increase in the transpiled gate count and circuit depth for $n = 6$. Since gate errors accumulate multiplicatively over the sequence of operations, the larger number of gates, especially two-qubit gates, produces a much higher effective failure probability.
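The multiplicative accumulation of gate errors can be illustrated with a simple independent-error estimate, in which every gate succeeds with probability one minus its error rate. This is a rough sketch rather than the noise model used in the simulations; the gate names, counts, and error rates are placeholders, and virtual rotations are left out on the assumption that they contribute negligible error.

```python
def gate_only_success(gate_counts, gate_errors):
    """Product of per-gate success probabilities over all physical gates."""
    probability = 1.0
    for gate, count in gate_counts.items():
        probability *= (1.0 - gate_errors[gate]) ** count
    return probability


# Placeholder error rates and gate counts (assumptions, not Table 4 data).
errors = {"sx": 2e-4, "cz": 3e-3}
small_circuit = {"sx": 400, "cz": 150}
large_circuit = {"sx": 3000, "cz": 1200}

print(gate_only_success(small_circuit, errors))
print(gate_only_success(large_circuit, errors))
```

Because the two-qubit error rate is an order of magnitude larger than the single-qubit rate in this example, the two-qubit gate count dominates the estimate.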
In the full-backend-noise model, the success probability is the lowest overall for both the 4-bit and the 6-bit inputs (Table 7). Its close agreement with the gate-error-only curve indicates that gate errors are the dominant source of degradation, while relaxation and readout errors provide additional but smaller contributions.
Overall, the simulations show that the circuit is mainly limited by accumulated gate noise after hardware-aware transpilation. Readout errors have a comparatively small effect; relaxation noise becomes more visible for deeper circuits.
Figure 12 shows the ratio $t_{\mathrm{circ}}/T_1$ between the total circuit execution time $t_{\mathrm{circ}}$ and the qubit relaxation time $T_1$. The ratio is smallest for the smallest input values and largest for the largest ones. The $t_{\mathrm{circ}}/T_1$ ratio increases with input size because the total depth of the transpiled circuit grows after hardware-aware compilation, extending the circuit execution time and therefore increasing the duration over which qubits are exposed to $T_1$-related relaxation. The increase in the ratio $t_{\mathrm{circ}}/T_1$ has important implications for the susceptibility of qubits to decoherence during circuit execution. As the duration of the circuit approaches the characteristic relaxation time $T_1$, the probability that a qubit undergoes energy relaxation before the computation finishes increases significantly. This effect is particularly relevant for qubits that remain idle for substantial portions of the circuit, such as the ancillary qubit, which is changed only in specific stages of the algorithm. Although such qubits may participate in relatively few gate operations, they still remain exposed to environmental noise throughout the full execution time of the circuit. This phenomenon highlights an important challenge in NISQ devices: qubits that are logically inactive are still physically evolving and may decohere, which can ultimately affect the reliability of the final measurement outcomes.
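As a rough illustration of why this ratio matters, consider the standard exponential decay model, in which a qubit prepared in the excited state and left idle for a time t has relaxed with probability 1 − exp(−t/T1). The sketch below assumes this single-qubit model, which is a simplification and not the noise model used in the simulations; the ratios are placeholder values.

```python
import math


def relaxation_probability(time_over_t1):
    """Probability that an excited idle qubit has decayed after t = ratio * T1."""
    if time_over_t1 < 0:
        raise ValueError("the ratio must be non-negative")
    return 1.0 - math.exp(-time_over_t1)


# Placeholder ratios (assumptions), from short to long circuits.
for ratio in (0.01, 0.1, 0.5, 1.0):
    print(ratio, round(relaxation_probability(ratio), 3))
```

For small ratios the decay probability is approximately equal to the ratio itself, so it initially grows roughly linearly with circuit duration.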
Figure 13 shows the dominance ratio $p_{\mathrm{exp}}/p_{2}$, where $p_{\mathrm{exp}}$ denotes the probability of measuring the expected output state and $p_{2}$ corresponds to the probability of the second-most-probable measured outcome for the baseline execution. The exact numerical values are reported in Table 8. This metric captures how strongly the output distribution is biased toward the correct solution. For small input values, the ratio is significantly greater than 1, in some cases exceeding 5, indicating that the correct result is clearly distinguishable from competing erroneous states and remains the dominant outcome. As the input size increases, the ratio decreases, reflecting the accumulation of gate errors, readout errors, and relaxation effects in deeper circuits. For intermediate values, the ratio often remains above 1, suggesting that the correct output can still be identified as the most likely measurement outcome despite a reduced absolute success probability. However, for the largest tested inputs, the ratio approaches 0, and the output distribution becomes effectively noise-dominated. In this regime, the probabilities of different outcomes are nearly uniform, indicating that the measured results are close to random and no longer exhibit a meaningful preference for the correct state.
The experimental evaluation demonstrates that the implemented non-restoring quantum square-root circuit is practically executable on current NISQ hardware for small input sizes. The results confirm that the logical design preserves its theoretical efficiency after implementation in the Qrisp framework, while also highlighting a substantial gap between logical and physical circuit representations due to hardware-aware compilation overhead. As observed, the depth of the circuit and the gate count increase significantly after transpilation, leading to a rapid degradation of the success probability with increasing input size. The dominant error sources include two-qubit gate errors, readout inaccuracies, and qubit relaxation effects, which collectively limit the scalability. Among the evaluated error mitigation techniques, dynamical decoupling provides limited improvement, indicating that decoherence during idle periods can be an important factor affecting performance. Overall, the findings validate the feasibility of executing quantum arithmetic circuits on existing devices, while simultaneously emphasizing the need for improved hardware reliability and compilation strategies to enable larger-scale computations.