Characterizing the Reproducibility of Noisy Quantum Circuits

The ability of a quantum computer to reproduce or replicate the results of a quantum circuit is a key concern for verifying and validating applications of quantum computing. Statistical variations in circuit outcomes that arise from ill-characterized fluctuations in device noise may lead to computational errors and irreproducible results. While device characterization offers a direct assessment of noise, an outstanding concern is how such metrics bound the reproducibility of a given quantum circuit. Here, we first directly assess the reproducibility of a noisy quantum circuit, in terms of the Hellinger distance between the computational results, and then we show that device characterization offers an analytic bound on the observed variability. We validate the method using an ensemble of single qubit test circuits, executed on a superconducting transmon processor with well-characterized readout and gate error rates. The resulting description for circuit reproducibility, in terms of a composite device parameter, is confirmed to define an upper bound on the observed Hellinger distance, across the variable test circuits. This predictive correlation between circuit outcomes and device characterization offers an efficient method for assessing the reproducibility of noisy quantum circuits.


Introduction
Quantum computing leverages phenomena such as superposition and entanglement to offer novel capabilities relative to conventional computing, such as super-polynomial reductions, in time to solution and energy consumption, as well as memory storage [1]. Quantum advantages may be found for solving a variety of problems, such as simulating quantum many-body systems [2], solving unstructured optimization problems [3], achieving complex linear algebra computations [4], efficiently sampling high-dimensional probability distributions [5], and enhancing the security of communication networks [6]. These assessments are based on an idealized quantum computing model [7][8][9], comprising of an n qubit register that encodes a 2 n -dimensional Hilbert space C 2⊗n . After initializing the register to the state |ψ = |0 ⊗n , a sequence of logical unitary transformations U , known as quantum gates, transform the state as |ψ → U k · · · U 1 U 0 |ψ . Ultimately, a measurement operation reads out an n-bit string s, where s ∈ {0, 1} ⊗n with Pr(s) = | s|ψ | 2 has the probability to observe the outcome.
In practice, on-going efforts to realize a quantum computer introduces many additional physical processes that complicate the operational description above. For example, a crude approximation to an actual device can be conceptualized as a stack of interacting layers [10][11][12] that includes physical qubits, a physical control layer, a hardware-aware compiler, a logical control layer (including fault tolerant quantum error correction protocols), and a logical compiler and circuit optimizer, as well as the quantum algorithms and applications. Each of these intermediate processes introduce the possibility for noise and errors that make modeling the system more complicated. Subsequent certification that an actual physical system is performing as expected represents a leading concern for validating experimental results [13]. This is made more difficult by the inherent randomness that often manifests in the computed results, inability to pin-point exactly where in a circuit an error has occurred, curse of dimensionality, and the inability to step through program execution [14][15][16].
While quantum computers using fault-tolerant operations may eventually alleviate some of these concerns [17], current noisy intermediate-scale quantum (NISQ [18]) computers are influenced heavily by noise and errors in physical operations [19]. Computational errors arise due to noise in the register, imperfect gate implementations, and faulty thermodynamic control systems [20][21][22][23][24]. For example, noise in the register may arise from spontaneous decay, decoherence, coupling to the environment, cross-talk, and leakage. Errors in gate operations also arise when one of the control operations goes awry, due to lack of precision in the applied pulses, e.g., pulse attenuation and distortion, pulse jitter, and frequency drift. Thermodynamic control systems that maintain the stability of the operating environment may also be a source of noise, due to limits on the cooling power of the dilution refrigerators, contaminants in the vacuum chamber, or fluctuations in the electromagnetic shielding and vibration suppression systems.
The varying environment and fluctuating controls present in current NISQ computing platforms lead to transient sources of noise that impact the ability to reproduce the results of a quantum circuit [25,26]. While errors arising from fixed sources of device noise can frequently be mitigated using tailored methods [27][28][29][30][31], ill-characterized and transient noise impedes the reproduction of NISQ circuit results and prevents the replication of quantum computed solutions. It is, therefore, important to assess the conditions under which a noisy quantum circuit may be expected to be reproducible, as well as the correlation with the corresponding noise in the quantum device. Bounding statistical variations, expected from a noisy quantum circuit, in terms of device noise itself, is potentially an efficient method for assessing reproducibility on NISQ platforms.
Here, we show how device characterization may be used to bound the reproducibility of a noisy quantum circuit. First, we formalize the problem of quantifying the reproducibility of results from a quantum circuit (C), executed on a noisy quantum device (D). We express this statistical variation, in terms of the Hellinger distance between the observed noisy output and expected ideal distribution, and we examine the conditions under which this distance deviates. Then, we derive an upper bound on the Hellinger distance by setting a threshold on these deviations that may be expressed in terms of the device noise itself. This yields a test as to whether the quantum circuit (C) is reproducible when executed on the given device (D). We show that this test is efficient when cast in terms of a single composite parameters computed by estimating the device noise. We then validate this method using an ensemble of single qubit test circuits executed on a well-characterized, but noisy, superconducting transmon processor.
The remainder of the presentation is as follows: in Section 2, we show how the composite parameter (γ D (C)), representing a quantum circuit (C), executed on a device (D), is composed from available device characterization, and that, when γ D (C) stays below the threshold γ max (C), then the circuit is reproducible. In Section 3, we present results from applying these methods to single qubit circuits, executed on a superconducting transmon processor, to validate agreement between the theory and experiment. Finally, in Section 4, we discuss the success of this approach, as well as its potential limitations.
This work is a significant extension of an earlier publication [32], in which the methods and results have been revised and refined to validate experimentally-observed bounds.

Method
Consider ρ 0 to denote the initial density matrix representing the state of the n qubit quantum register. In the ideal setting, the quantum state of the register evolves under a unitary transformation U that describes the circuit C as: Let {|i } N−1 i=0 represent the orthonormal computational basis, with N = 2 n , and let {Π i = |i i|} N−1 i=0 be the corresponding set of orthonormal projectors. The corresponding probability distribution for the results generated by an ideal quantum device (D ideal ) is then given as: is the probability to observe the i-th outcome. In general, it is not efficient to construct the set P ideal using conventional methods, i.e., classical computing, and the resources for such calculations scale exponentially in the size of n. However, such demanding calculations are feasible for values of n < 50, while calculations of highly structured circuits, such as the quantum Fourier transform, quantum search, or arithmetic circuits, may be simulated efficiently. We limit our current approach to the consideration of C for such instances.
In the presence of noise, the evolution of the quantum register is no longer modeled by a unitary operator, and the final state is generally a mixed-state. Let E be the super-operator representing a noisy quantum channel, in terms of Kraus operators (M k ) that represent the execution of the circuit C. The corresponding state of the register is then: where the Kraus operators are a function of the device reliability parameters. Thus, the corresponding probability distribution from executing C on the noisy quantum device D noisy is given by: where p noisy i = Tr[M † i M i ρ ], and M i is the measurement operator for a noisy readout channel [33].
We quantify the variation between the ideal and actual distributions for the results from executing the circuit C using the Hellinger distance. We denote the Hellinger distance between P ideal and P noisy as: with the Bhattacharyya coefficient BC(P ideal , P noisy ) ∈ [0, 1] defined as: Note that the Hellinger distance vanishes for identical distributions and approaches unity for those distributions with no overlap. Now, let δ ∈ [0, 1] be a parameter that defines a threshold for the Hellinger distance, i.e., when: then, the results from a quantum circuit (C) on a noisy, quantum device (D) are reproducible within a distance (δ) of the idealized outcomes. The equation for characterizing the reproducibility of a noisy quantum circuit introduces a test for deciding if the observed value of the Hellinger distance lies below the elected threshold.
In general, the above test for reproducibility requires estimating the noisy probability distribution (P noisy ) by repeatedly sampling the circuit outcomes. The number of samples required to estimate an arbitrary distribution over N = 2 n outcomes with precision ( ) scales exponentially with the Hilbert space dimension (n) and as an inverse square of the required precision ( ) (see Appendix A). Such a direct method for estimating the Hellinger distance is inefficient and requires repetition across devices. We next consider how to correlate the bound placed on the Hellinger distance in Equation (7), with the noise assumptions for the given device. We show, by example, that a composite parameter (γ D (C)), characterizing the noisy quantum circuit, can be similarly bound from above as:

Example
Consider an n qubit state prepared as a uniform superposition across the 2 n computational basis states as: The corresponding quantum circuit shown in Figure 1 corresponds to: for which the ideal distribution is: We next assume that the quantum register is reliably initialized as |0 ⊗n and that interqubit cross-talk is negligible during noisy circuit execution. Rather, gate error and readout fidelity capture the principal sources of noise in this circuit, each of which is modeled as a noisy process. Let I, X, Y, and Z denote the 2 × 2 identity matrix, Pauli-X matrix, Pauli-Y matrix, and the Pauli-Z matrix, respectively, in the computational basis as: Additionally, let R Y (α) denote the rotation by an angle (α) about the Y-axis on the Bloch sphere: An ideal Hadamard gate is then given by: We model a noisy Hadamard gate (H) by the unitary: where θ is the implementation error that is assumed to be small, i.e., θ 1. The operator representation for a unitary control error has only one term which can be seen as follows. Write a noisy unitaryŨ as where U is the ideal unitary. Thus, where E is the operator representing the unitary control error that arises due to imperfections in the control system. For our example, which is the 2D rotation matrix (no error will correspond to an identity channel with E = I). Thus,H In the absence of readout noise, when we initialize a qubit in the ground state, subject it to a noisy Hadamard gate, and measure in the Z-basis, we get the probabilities for observing the output as: We next consider what happens when the Hadamard gate is followed by a noisy measurement. The readout channel is characterized as a quantum channel [29,34,35] as shown in Figure 2, using two parameters for each qubit. The first parameter ( f 0 ) defines the probability of observing 0 post readout when the channel input state is |0 , and the second ( f 1 ) defines the probability of observing 1 post readout when the channel input state is |1 .
A classical representation for the single qubit readout channel is: where Λ is the readout error matrix with elements Λ ij = probability of observing |i when the input to channel is |j (i, j ∈ {0, 1}): Equivalently, the quantum channel representation for a single qubit noisy measurement has two terms, i.e., M 0 and M 1 , and is specified by a super-operator (E ), whose action on the quantum state is as follows [33]: which is equivalent to Equation (24), when you consider the action of the measurement operators (M i ), given by Pr(i) obs = Tr{M † i M i ρ}. Thus, the representation equivalently shows how a noisy measurement of a quantum state ρ yields a particular readout i.
Let Pr(0) be the probability of observing 0 when we prepare a qubit in the ground state, subject it to a Hadamard gate, and then measure it. Let Pr(1) be the corresponding probability of observing 1 for the same experiment (i.e., we prepare a qubit in the ground state, subject it to a Hadamard gate, and then measure it). Additionally, let f be the average readout fidelity and be the readout fidelity asymmetry: Thus, in the presence of readout noise, the probability of observing 0 and 1 for each qubit in Figure 1 is given by [33]: where Let (s n−1 s n−2 · · · s 0 ) represent the n-bit string with s i ∈ {0, 1}, and let s = and γ i refers to the i-th register element. Thus, and the sum is over all bit stringss = (s n−1 s n−2 · · · s 0 ) with decimal integer equivalent If δ is the user-defined threshold on the acceptable Hellinger distance, then it follows that: Without loss of generality, we consider that all register elements have identical readout and gate errors, which allows us to replace γ i with γ for each of the n register elements. Thence: As shown in Appendix B, when assuming that δ is small, this yields: with and Here, γ D (C) is a composite parameter, estimated using characterization of the individual gate operations, e.g., during periodic device calibration. For our specific circuit and device noise model, the parameter is a function of the gate angle error and measurement fidelity and serves as a combined measure of the hardware noise and output variation; the exact relationship is a function of the noisy circuit model used. Thus, we have shown that if γ D (C) exceeds the bound γ max (C), then the threshold on the Hellinger distance is also exceeded. Notably, the former statement does not require experimental execution of the circuit itself or even estimation of the Hellinger distance, but rather depends solely on our model for the noisy device and accuracy of the parameter estimation.
As we have assumed, δ is small, and Equation (35) may be reduced as: which tells us that the tolerance specification for an experimentally-observed output distribution when using a noisy device (D) that is lower bounded. Attempts to reproduce the experiment must use an error bar larger than this minimum, otherwise it will likely fail the accuracy test, and the ability of the device to reproduce the output distribution will be questioned.

Results
We validated our method for testing the reproducibility of the previously described test circuits on the superconducting transmon hardware, called ibmq_toronto , from IBM, whose schematic is shown in Figure A1. The test circuit shown in Figure 1 was programmed using the IBM qiskit toolkit [36] and compiled and executed remotely on the ibmq_toronto device on 8 April 2021.
To estimate the device parameters, we repeated our experiments to estimate the readout fidelity and gate angle error L times (each). Let l denote the index of the l-th experiment. For any instance of circuit execution, the ibmq_toronto device prepared an ensemble of S identical circuits, where S denotes the number of shots and s denotes the s-th shot in a particular experiment. In the tests reported below, L = 203 was the number of repetitions successfully executed during a 30-min reservation-window on ibmq_toronto , and S = 8, 192 was the number of shots, the maximum allowed by the device. We separately analyzed the results for the case n = 1 in Equation (36) using each of the 27 register elements available in ibmq_toronto .
Next, we characterized the device parameters required to estimate γ D (C) and γ max . Here, we introduce the convention that a random variable is denoted by bold font, and a caret sign denotes a particular realization of that random variable. We first characterized readout fidelity, in which SPAM(0) denotes an experiment with a register element, prepared as |0 and measured. Similarly, SPAM(1) denotes an experiment in with a register element, prepared as |1 measured. Then Let q l denote the realized fidelity asymmetry of the q-th register element in the l-th experiment. Thus: Let¯ q denote the mean of the fidelity asymmetry for the q-th register element over the L experiments, and letˆ q be the corresponding observed value. Thus: To quantify the error on these measurements, we define σ(¯ q ) as the standard deviation of population mean¯ q , such that: (47) The average readout fidelity f q for each qubit q is then calculated using Equation (26). Figure 3 shows the experimental results for the fidelity distributions of f 0 and f 1 for the first nine register elements of ibmq_toronto , collected on 8 April 2021, between 8:00-10:00 p.m. (UTC-05:00). The initialization fidelities of the computational states are not the same and, for some register elements, they are very distinct. The asymmetric nature of the single qubit noise channel is brought out starkly by the negligible overlap between the distributions of f 0 and f 1 for qubit 5. Additionally, observe the significant spread in values (indicating channel instability) in qubit 3, relative to the others. These results show that the naive approach of assuming a single value for readout error for a qubit is dangerous. Not only do we have to characterize f 0 and f 1 separately, our work must also take into account the significant dispersion around the mean. The remaining register elements are shown in the Appendix B in Figures A2 and A3. The register variation of the readout asymmetry is shown in Figure A5. The probability (Pr q (0)) for each qubit, as defined by Equation (28), was estimated by executing the circuit C in Figure 1 and counting the faction of zeros in the 8192-bit long binary string, returned by the remote server. Let b C l,s,q denote the random measurement outcome (a classical bit) when we conduct a C experiment and measure the q-th register element in the l-th experiment's s-th shot (measurement done in the computational Z-basis). Letb C l,s,q denote the corresponding observed value. Similarly, let Pr Let d q l denote the Hellinger distance between the noisy and ideal outcomes in the l-th experiment for the q-th register element. Letd q l be the corresponding observed value (or realization). Letd q denote the mean (a random variable) of the Hellinger distance for the q-th register element over L experiments. Letd q be the corresponding observed value (or realization). Thus: To quantify the error on these measurements, define σ(d q ) as the standard deviation of population meand q . Thus: The register variation of the experimentally-obtained Hellinger distance is shown in Figure A4. The Hadamard gate angle error was subsequently estimated using Equation (30). The register variation of the the Hadamard gate angle error (in degrees) is shown in Figure 4. The dotted red line denotes the register mean for the gate error, averaged over all qubits. The gate implementation on qubit 21 is the closest to ideal, while qubit 24 is the farthest.  Figure 5 shows the values for γ max and γ D for ibmq_toronto on 8 April 2021, when δ is set to be the observed Hellinger distance. The blue dots are experimentally-observed data for each register element (see Table A1 for the full list), using the characterization data versus the actual observed Hellinger distance. It validates our noise model as Equation (8) holds. The dashed line in Figure 5 provides the decision boundary to test circuit reproducibility, using characterization data as a proxy. Given a user-defined threshold on the statistical distance between the observed distribution and reference to be reproduced, the plot provides an upper bound for the device parameter γ D (the register variation of γ D is shown in Figure A6). The latter must lie below this boundary for reproducibility by the device. We conjecture that the magnitude of |γ max − γ D | serves as a reliability indicator, i.e., higher value provides greater cushion against temporal fluctuations (see [25]). Thus, Table A1 can serve as a basis for register selection tailored to a specific unitary when channel characterization data is available. Figure 5. Characterizing circuit reproducibility on ibmq_toronto . Plot of γ max (dashed line) and γ D (blue dots) for ibmq_toronto on 8 April 2021, when δ is set to be the observed Hellinger distance (d). Equation (8) should hold for all δ and hence it must hold for the corner case when δ is set to be the experimentally observed Hellinger distance. The blue dots are experimentally-observed data plotted using the characterization data versus the actual observed Hellinger distance (d) for each register element. Only a subset of qubits are shown. See Table A1 for all 27 qubits.

Conclusions
Reproducibility is important for validating the performance of applications in quantum computing and as a measure of consistency in computation. Current NISQ devices are strongly affected by intrinsic noise that lead to a variety of computational error mechanisms. Here, we have characterized the ability to reproduce the output generated by noisy quantum circuit, using single qubit rotation gates and asymmetric noisy readout. We showed how to derive a composite parameter that can be easily computed from the available device characterization data (e.g., readout fidelity and gate error), such that, if the derived parameter stays below a defined threshold, then the circuit is assured of reproducible output. Our model led to an analytic bound, in terms of the Hellinger distance between computational outputs, and we validated the resulting test using experiments on a publicly available transmon processor.
Despite the successful validation, we note that this approach has its limitations. A preliminary version of these results used only readout asymmetry and ignored the Hadamard gate error for describing circuit noise [32]. For example, on 21 August 2021, the typical mean error rates were (7.38 ± 5.90) × 10 −4 for single-qubit Pauli-X error and (1.15 ± 1.08) × 10 −1 for readout assignment error on the 27-qubit IBM quantum device ibmq_toronto . See Figure A7 for the results from that first-order model that failed the test of Lemma A1 for 5 of the 27 qubits, due to incorrect assumptions about the device model. An analytically-derived composite parameter, using a wrong noise model, cannot serve as a suitable proxy for circuit reproducibility. Similarly, if the device parameters themselves cannot be estimated from the available device characterization data, then the scheme is impractical.
While our method is generalizable in principle, a closed form expression for γ D (C) may not be obtainable for complex circuits. As the number of parameters needed to characterize the channel increases with circuit complexity (i.e., number of qubits and circuit depth); one will need to either utilize numerical methods to deduce γ D (C) or make simplifying assumptions. A circuit-specific modular or layered characterization for the device may reduce the parameter estimation overhead, but this remains to be investigated. There is still an open question regarding how to systematically extend this method to any model and what conditions must the model meet.
As quantum computations enter the realm of advantage over classical, we expect that the task of practically characterizing circuit reproducibility will become an important area. To our knowledge, this is a first attempt to frame the problem of linking circuit reproducibility to device reliability. Such reproducibility characterization serves four purposes. Firstly, it shows how to make use of the device characterization data in a targeted way for inferring circuit reproducibility. Second, it provides a basis for selecting a device for a circuit prior to execution. Third, it provides insight on how the quality of final digital output is related to the characteristics of the intermediate quantum channels and, hence, understand the device improvement pathways for a specific circuit. Lastly, it helps to estimate a lower bound for the error that can be expected from the circuit execution.  Acknowledgments: This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-279access-plan, accessed on 29 December 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The , the RHS is positive and we can square both sides to get: (ii) Now suppose:      . Register variation of the derived parameter γ D (C). A high γ D (C) adversely impacts the computational reproducibility. However, low γ D (C) does not necessarily mean that the device is close to perfect because the terms in Equation (30) can cancel each other out and lead to an improvement in the output distance. For an unstable device, γ D (C) will vary with time and must be re-estimated. The dotted red line denotes the register mean averaged over all qubits. Qubit 16 has the best performance for this composite parameter, while qubit 24 has the worst. A consistent register color scheme has been used for all the figures.