Article

Distributing Quantum Computations, Shot-Wise

by Giuseppe Bisicchia 1,2, Giuseppe Clemente 3, Jose Garcia-Alonso 2, Juan Manuel Murillo 2, Massimo D’Elia 3 and Antonio Brogi 1,*

1 Department of Computer Science, University of Pisa, 56127 Pisa, Italy
2 Quercus Software Engineering Group, University of Extremadura, 10003 Cáceres, Spain
3 Dipartimento di Fisica, Istituto Nazionale di Fisica Nucleare (INFN)—Sezione di Pisa, Università di Pisa, 56127 Pisa, Italy
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(11), 507; https://doi.org/10.3390/fi17110507
Submission received: 9 September 2025 / Revised: 6 October 2025 / Accepted: 27 October 2025 / Published: 4 November 2025

Abstract

The constraints of the NISQ (Noisy Intermediate-Scale Quantum) era, namely high sensitivity to noise and limited qubit counts, impose significant barriers on the usability of QPU (Quantum Processing Unit) capabilities. To overcome these challenges, researchers are exploring methods to maximize the utility of existing QPUs despite their limitations. Building on the idea that the execution of a quantum circuit’s shots need not be treated as a single monolithic unit, we propose a methodological framework, termed shot-wise, which enables the distribution of the shots of a single circuit across multiple QPUs. Our framework features customizable policies to adapt to various scenarios. Additionally, it introduces a calibration method to pre-evaluate the accuracy and reliability of each QPU’s output before the actual distribution process, and an incremental execution mechanism for dynamically managing shot allocation and policy updates. Such an approach enables flexible and fine-grained management of the distribution process, taking into account various user-defined constraints and (contrasting) objectives. Demonstration results show that shot-wise distribution consistently and significantly improves execution performance, with no significant drawbacks and additional qualitative advantages. Overall, the shot-wise methodology improves result stability and often outperforms single-QPU runs, offering a robust and flexible approach to managing variability in quantum computing.

1. Introduction

Advancements in the design and development of Quantum Computers are rapidly accelerating, setting unprecedented milestones and records [1,2,3]. This fast progress has brought a plethora of qubit implementations and QPU (Quantum Processing Unit) architectures. Currently, no single technology reigns supreme, fostering substantial diversity and innovation in quantum computing approaches [4,5,6].
However, despite this remarkable technological progress, the capabilities of Quantum Computers remain highly constrained. John Preskill introduced the term NISQ (Noisy Intermediate-Scale Quantum) devices [7] to highlight their vulnerability to external noise [8] and their limited qubit count, typically ranging from a few dozen to a few hundred [9]. These constraints severely limit the range of computations feasible within present Quantum Computers, predominantly confining them to tasks requiring only a handful of qubits and a limited number of consecutive operations to mitigate the deleterious effects of noise accumulation [10].
In response, computer scientists and quantum software engineers are actively engaged in addressing the challenge of maximizing the utility of available quantum devices despite their severe limitations [7,11,12,13]. Their efforts are dedicated to devising strategies that optimize quantum computations within current technological boundaries. To this end, researchers are designing increasingly sophisticated techniques for quantum error detection [14], mitigation [15], and correction [16]. Moreover, strategies for “cutting” quantum circuits that exceed the capacity of NISQ devices are currently under active investigation [17,18]. Other approaches involve the careful selection of the most appropriate quantum computer for each computation, taking into account performance metrics and limitations of the available options as well as the characteristics of each specific quantum circuit [19,20,21].
In this paper, we introduce an innovative approach to face the constraints of current NISQ devices and to improve the effectiveness of quantum computations. Our methodology diverges from conventional strategies by proposing a shift in perspective regarding the execution of quantum tasks. Traditionally, executing a quantum circuit consists of running it iteratively, numerous independent times, referred to as “shots”, due to the inherently probabilistic nature of quantum mechanics and qubits. Consequently, the output of a quantum computation usually does not consist of a single measured state from a single run. Instead, it is a distribution reflecting the frequency of the output states obtained after running the quantum circuit for multiple (usually thousands of) shots (i.e., iterations).
In our approach, we propose a departure from viewing the execution of the shots of a quantum circuit as a single, indivisible unit that must be completed in a solitary run. Instead, we advocate for a more flexible and fine-grained perspective, offering various degrees of freedom in the process. Specifically, we suggest that even for a single circuit, its shots can be distributed, or “split”, across multiple heterogeneous Quantum Computers according to specific custom policies. Note that our proposed approach differs from shot-reduction optimization strategies such as the one in [22]: our focus is on distributing a particular pre-determined number of shots across multiple, independent Quantum Computers. Still, as we will discuss in Section 3, our framework is also capable of reducing the total number of shots performed. Subsequently, the results obtained from each Quantum Computer, representing the output distributions of the circuit execution on that particular QPU, are merged into a unified output distribution. In the remainder of this paper, we both discuss the general methodological framework and propose several strategies to split and merge the shots of a quantum circuit (e.g., by distributing the shots equally to each Quantum Computer, at random, or proportionally to the estimated “reliability” of each QPU).
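As a minimal illustration of the split-and-merge idea (the helper names below are hypothetical, not the framework's actual API), the following sketch allocates a shot budget across QPUs proportionally to arbitrary weights and merges the resulting per-QPU count dictionaries into a single output distribution:

```python
from collections import Counter

def split_shots(total_shots, weights):
    """Allocate a total shot budget across QPUs proportionally to weights."""
    norm = sum(weights.values())
    alloc = {qpu: int(total_shots * w / norm) for qpu, w in weights.items()}
    # Assign any shots lost to integer rounding to the highest-weight QPU.
    best = max(weights, key=weights.get)
    alloc[best] += total_shots - sum(alloc.values())
    return alloc

def merge_counts(per_qpu_counts):
    """Merge per-QPU bitstring counts into one relative-frequency distribution."""
    merged = Counter()
    for counts in per_qpu_counts:
        merged.update(counts)
    n = sum(merged.values())
    return {bitstring: c / n for bitstring, c in merged.items()}

# Example: 1000 shots split equally across three QPUs.
alloc = split_shots(1000, {"qpu_a": 1, "qpu_b": 1, "qpu_c": 1})
# Merging two (made-up) partial results from a Bell-like circuit.
merged = merge_counts([{"00": 10, "11": 90}, {"00": 20, "11": 80}])
```

A uniform-weight split is only the simplest choice; the weights can encode any of the policies discussed later (random, reliability-proportional, etc.).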
Through this approach, we aim to leverage the limitations of the NISQ era and turn them into advantages. In the demonstration section of this paper, we present findings on the execution of quantum circuits using multiple QPUs, focusing on the impact of split and merge strategies. The demonstrations encompass various circuit types and assess performance against established baselines obtained from single QPU runs. Key findings indicate that while split-merged strategies do not consistently exceed the best baseline performance, they maintain robustness and are generally aligned with average baseline outcomes. Notably, performance disparities arise across different circuits, suggesting that circuit-specific characteristics influence results. These insights lay the groundwork for exploring advanced strategies in quantum computing.
Furthermore, as discussed in our previous work [23] and further elaborated in [21], adopting a shot-wise management approach to quantum computation offers various qualitative advantages. These encompass enhanced resilience to QPU failures, finer-grained management, greater customizability to user requirements, and reduced waiting times. With respect to our preliminary work [21,23], in this article we define, formalize, and improve the shot-wise distribution methodology with a more general and holistic approach to the problem, and we perform numerical demonstrations to assess the methodology, distributing the shots across up to seven QPUs from two different manufacturers and two different qubit implementations. Moreover, we discuss and develop four distribution policies, two of them informed by the expected noise of each QPU.
Demonstration results show that by enabling the distribution of shots across multiple NISQ devices, our proposal makes it possible to reconcile multiple conflicting objectives (such as waiting time, price, and reliability of a quantum computation [21,23,24]). Moreover, distributing and merging the shots of a particular quantum circuit on various noisy QPUs produces final output distributions more reliable than performing the whole computation on a single QPU (as discussed in the demonstration section of this paper, Section 4). As a final advantage, the framework offers high flexibility in the policies and strategies to assess the capabilities of the available QPUs, how to distribute and merge the shots, and how to dynamically optimize such procedures and reduce the number of shots. To the best of our knowledge, ours is the first work proposing a framework that comprehensively enables all the above features and capabilities.
Summarizing, the main contributions of this paper are as follows:
(a)
We introduce a general and innovative framework for shot-wise distribution, enabling the execution of a single quantum circuit across multiple heterogeneous NISQ devices. The framework incorporates a reliability estimation mechanism and supports adaptive shot allocation, making it robust to device variability and noise.
(b)
We define and formalize a set of flexible split and merge policies. These policies determine how shots are distributed among QPUs and how partial results are combined, with strategies that account for QPU-specific reliability and statistical trade-offs between bias and variance.
(c)
We provide a comprehensive empirical evaluation of the framework on multiple circuits, quantum providers, and hardware technologies. Our results demonstrate that shot-wise distribution consistently improves worst-case stability, reduces variability, and delivers performance competitive with or superior to single-QPU execution.
This paper addresses the motivation and discussion surrounding the shot-wise distribution of quantum computations across heterogeneous Quantum Computers. We present a comprehensive methodology framework to implement such a distribution strategy, parameterized by a set of customizable policies. We provide an overview of related work (Section 2), and illustrate the various degrees of freedom and the underlying criteria guiding each possible decision (Section 3.2). Subsequently, we discuss various potential split and merge policies (Section 3.3). We then detail the demonstration protocol devised to evaluate and validate the shot-wise approach, and we analyze the results obtained from these demonstrations (Section 4). Finally, we draw some conclusions, also mentioning potential directions for future research (Section 5).

2. Related Work

Ever since Preskill highlighted the challenges of contemporary Quantum Computers [7], researchers have designed and developed strategies to tackle or mitigate these constraints [25]. Within the scope of this paper—generalizable as “approaches to perform quantum computations on NISQ devices”—we identify, to the best of our knowledge, three primary categories:
  • Fighting the Noise: this category encompasses methodologies and techniques aimed at mitigating, or ideally eliminating, the noise inherent in quantum computations. Within this realm, we discern two principal strategies:
    Quantum Strategies: these approaches aim to combat noise during quantum computations by operating on the structure of the circuit to be executed.
    Classical Strategies: these methods target noise before and/or after executing a quantum algorithm through pre- and post-classical processing of quantum algorithms and output distributions.
  • Going beyond the Intermediate scale: here, we find methodologies seeking to execute quantum circuits larger than those achievable with NISQ devices.
  • Distributing Quantum Computations: this category encompasses methodologies that address the limitations of quantum computations as a whole while considering the presence of multiple heterogeneous QPUs. The objective is to optimize the execution of quantum circuits across a distributed computing environment.
While our work predominantly aligns with the last group, it introduces, to the best of our knowledge, a novel idea: distributing even the same quantum circuit among multiple QPUs, exploiting the necessity to run multiple shots. The subsequent sections illustrate each of these categories in depth, presenting key related works associated with each domain.

2.1. Fighting the Noise

The history of error correction techniques and, more broadly, strategies for combating the noise inherent in Quantum Computing is nearly as old as Quantum Computing itself.
In the early days of Quantum Computing, the excitement surrounding the field was tempered by the challenge of qubit noise. Shor, renowned for demonstrating the potential of Quantum Computing [26,27], injected new vitality into the field with his proposal of an initial error correction technique. This technique suggested that if the noise remained below a certain threshold, it would be possible to apply the solution and execute quantum computation as if it were devoid of noise [28].
As Quantum Computing progressed, error correction techniques evolved to become more sophisticated [16,29,30]. However, these techniques typically demand a significantly higher number of qubits than are currently available to be effective. Consequently, while researchers continue to refine error correction techniques to be more resource-efficient, efforts have also emerged to mitigate noise while awaiting full error correction [15,31,32,33,34,35].
Another compelling research direction involves the development of noise-aware compilers. These compilers are designed to compile and optimize circuits for a given QPU while considering, among other factors, its topology, performance, and noise characteristics. They determine crucial factors such as the initial mapping of virtual qubits onto physical qubits and the optimal set of swap operations, particularly in non-all-to-all topologies. Such classical techniques have the potential to mitigate noise and enhance the performance of quantum circuits [36,37,38,39,40].
Error correction and error mitigation techniques, as well as compilation and optimization processes, all operate before and/or after the actual circuit execution. For that reason, they are completely transparent to, and entirely compatible with, our shot-wise methodology and can be applied in conjunction with it. Furthermore, our demonstrations provide initial evidence that distributing circuit shots among multiple heterogeneous QPUs can reduce noise compared to executing all shots on a single QPU. Thus, we intend to explore the possibility of employing shot-wise methodologies as error mitigation techniques and to compare them with state-of-the-art approaches, potentially integrating them with existing ones to further enhance noise reduction strategies.

2.2. Going Beyond the Intermediate Scale

Achieving the capability to execute quantum circuits beyond the current hardware constraints, typically limited to a few hundred qubits at most, is essential for unlocking the full potential of quantum computing. While much attention is directed towards scaling the size of current quantum hardware, an alternative approach to circumventing this limitation involves breaking down larger circuits into smaller pieces. These fragments are then executed independently on NISQ devices, with the resulting computations merged to reconstruct the final output.
These techniques are recognized under various names, including circuit cutting, circuit knitting, and Quantum divide and conquer, or Quantum divide and compute, among others. This strategy offers a promising avenue for harnessing the computational power of existing smaller-scale quantum hardware to tackle larger quantum algorithms.
A seminal contribution is presented in [17], where the authors discuss the theoretical foundations and conduct demonstrations on circuit cutting through tensor-network techniques. This work is further expanded upon by [41,42], where the authors test the approach in the presence of noise and observe that recombining noisy fragments can outperform results without fragmentation. They also investigate the impact of different noise sources on the success of the cutting process.
Another seminal work in this field is illustrated in [18], where the authors introduce CutQC, a scalable circuit cutting approach. They propose a method to execute quantum circuits more than twice the size of available quantum computer backends. Moreover, their approach demonstrates significant improvements in fidelity compared to direct executions on large Quantum Computers, along with a significant speedup over classical simulations. They utilize a mixed-integer programming approach to automate the identification of cuts requiring minimal classical postprocessing. Additionally, they discuss two types of postprocessing: full-definition (FD) query and dynamic-definition (DD) query, differing in whether the entire $2^n$ full-state probability output of the uncut circuit is reconstructed.
In a different approach, Ref. [43] introduces maximum-likelihood fragment tomography to find the most likely probability distribution for the output of a quantum circuit based on measurement data obtained from circuit fragments. The core of their idea is to execute the circuit fragments by providing a variety of quantum inputs to each fragment and measuring its quantum outputs in a variety of bases. Supported by both theoretical and demonstration findings, they advocate for the use of circuit cutting as a standard tool for running clustered circuits on quantum hardware. Indeed, they found that circuit cutting can estimate the output of a clustered circuit with higher fidelity than full circuit execution.
A recent contribution is presented in [44], where the approach is based on randomized measurements, by randomly inserting measure-and-prepare channels, to express the output state of a large circuit as a separable state across distinct devices. With this approach, they apply circuit cutting to large-scale QAOA problems on clustered graphs, i.e., up to 129-qubit problems, demonstrating the potential of circuit cutting procedures in practical applications.
In contrast to the aforementioned methodologies, our shot-wise approach presents a significantly different perspective and can be viewed as orthogonal. While circuit cutting focuses on the circuit dimension, our focus lies in the shot dimension. Our strategy is agnostic to whether the shots to be distributed originate from a whole circuit or fragments. Moreover, considering that executing smaller fragments may yield better results than performing the whole circuit, we believe that combining the power of fragment cutting with the idea of executing shots from each fragment on multiple heterogeneous QPUs can further enhance performance. We intend to explore this combination in future work. An initial step in this direction is presented in [45], where the authors combine circuit cutting and parallel scheduling algorithms for quantum multicomputing. However, such work still treats shots as a single monolithic entity, unlike our approach.

2.3. Distributing Quantum Computations

Numerous research efforts are dedicated to identifying the most suitable Quantum Processing Unit (QPU) for executing a particular quantum circuit. These approaches typically define various metrics encompassing not only the inherent characteristics of a specific Quantum Computer (e.g., qubit count, coupling map, noise model) and the quantum circuit itself (e.g., gate count, width, depth) but also environmental factors such as queue waiting time, pricing plans, and availability periods.
One notable approach following this principle is the Quantum API Gateway [19]. In this work, the authors devised a service that automatically selects the optimal QPU among available options for each submitted quantum circuit. The selection process takes into account factors such as QPU architecture (gate-based or annealing) and circuit width. Users have the flexibility to customize the selection criteria, specifying preferences for speed or cost efficiency.
Similarly, the NISQ Analyzer [20] employs a comparable workflow to determine the best QPU for a given quantum circuit. However, unlike the Quantum API Gateway, this approach acknowledges that multiple circuit implementations may exist for a single quantum program. The NISQ Analyzer automatically selects the best implementation from a repository of quantum programs and associated circuit implementations through a set of selection rules associated with each circuit implementation, depending on the input data. The best QPU-circuit pair is then determined based on criteria such as circuit width, depth, and the choice of Software Development Kit (SDK).
The NISQ Analyzer authors have also developed several extensions, including tools for comparing compiler outputs [46], optimizing compilation processes using Machine Learning (ML) models to discard potential compilers and Quantum Computers before compilation [47], ranking compiled circuits for various QPUs using Multi-Criteria Decision Analysis methods [48], and optimizing these processes through ML techniques [49].
In contrast to these approaches, Ref. [50] introduces a quantum job scheduler that optimizes QPU selection by balancing estimated fidelity and expected waiting time. Conversely, Ref. [51] addresses the integration of Quantum Computing into classical enterprise cloud systems, selecting the most suitable single QPU based on factors like qubit count and queue length. Additionally, Ref. [52] presents a framework for automatically predicting the optimal combination of Quantum Computers, compilers, and compiler options for a given circuit, with a focus on maximizing fidelity in gate and measurement operations.
While existing approaches aim to identify the best QPU for a given quantum circuit, our proposal diverges from this paradigm by leveraging multiple heterogeneous Quantum Computers simultaneously. We evenly distribute the shots of a single quantum circuit among multiple heterogeneous QPUs. To the best of our knowledge, ours is the first proposal employing a shot-wise distribution approach. Building upon the principles of shot-wise distribution, we have developed a prototype Quantum Service called the Quantum Broker [23], which distributes quantum computations shot-by-shot while considering user runtime requirements submitted through a Domain-Specific Language (DSL). This paper advances and formalizes the concepts introduced in the Quantum Broker prototype into a unified conceptual and parametric framework. Furthermore, demonstration evaluation validates the efficacy of the proposed ideas.

3. General Framework

In this section, we introduce some definitions and fix the notation used in the following discussions. An overview of the main concepts is provided in standard book references such as [53].

3.1. Preliminaries

Let us consider a circuit $U \in SU(2^q)$ (for which we extensively use the equivalence between ideal circuits and unitary operators) acting on $q$ qubits initialized as $|0\rangle$. In the ideal noiseless case, the output of a single execution is a pure state $|\psi\rangle = U|0\rangle$. A measurement in the computational basis $\{|x\rangle\}_{x \in \mathbb{Z}_{2^q}}$ will yield a specific bitstring $x$ with probability $p_x^{(\mathrm{ideal})} = |\langle x|U|0\rangle|^2$. Any other initialization and change of basis for the measurement can be incorporated into the circuit without loss of generality. Moreover, in the presence of quantum noise, the final state can be represented as a density matrix $\rho$, obtained from the application of a noisy quantum channel $\mathcal{E}$ to the initial standard pure state $\rho_0 = |0\rangle\langle 0|$ as $\rho = \mathcal{E}(\rho_0)$. Therefore, the probability of measuring the bitstring $x$ in the computational basis from a single execution is distributed according to a discrete probability $p_x = \mathrm{Tr}[\,|x\rangle\langle x|\,\rho\,]$, where $\rho = \mathcal{E}_U(|0\rangle\langle 0|)$ depends on the quantum channel $\mathcal{E}_U$, which only approximates the effect of the ideally unitary circuit that implements $U$. In the ideal case of a unitary channel, one would have $\mathcal{E}_U(\rho_0) = U \rho_0 U^\dagger$, while in general one has $\mathcal{E}_U(\rho_0) = \sum_\alpha K_\alpha \rho_0 K_\alpha^\dagger$ in the Kraus representation [53]. Therefore, for any fixed circuit $U$ and QPU $m$, we expect the biases $p_x - p_x^{(\mathrm{ideal})}$ to be in general non-vanishing, signaling a discrepancy in the results even in the limit of unlimited resources, unless an error correction or mitigation scheme is applied. Repeating the execution a number $n$ of times, commonly known as shots, the number of times a specific state labeled $x$ is measured is called its counts, denoted by $\hat{c}_x$, and it follows a multinomial distribution based on $p_x$, i.e.,
$$P\left(\hat{c}_x = c_x,\ x = 0, \ldots, 2^q - 1;\ \sum_x c_x = n\right) = n! \prod_{x \in \mathbb{Z}_{2^q}} \frac{(p_x)^{c_x}}{c_x!}. \qquad (1)$$
From the relative counts, it is possible to estimate the underlying probability distribution as $\hat{p}_x \equiv \hat{c}_x / n$, which is unbiased ($\mathbb{E}[\hat{p}_x] = p_x$) and fluctuates with a statistical error, estimated as $\Delta p_x = \sqrt{\tfrac{1}{n-1}\,\hat{p}_x(1 - \hat{p}_x)}$.
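In code, the estimator and its statistical error can be computed directly from the measured counts (a plain-Python sketch of the formulas above; the function name is ours):

```python
from math import sqrt

def estimate_distribution(counts):
    """From shot counts {bitstring: c_x}, estimate p_x = c_x / n and the
    statistical error Δp_x = sqrt(p_x (1 - p_x) / (n - 1))."""
    n = sum(counts.values())
    estimates = {}
    for x, c in counts.items():
        p = c / n
        err = sqrt(p * (1 - p) / (n - 1))
        estimates[x] = (p, err)
    return estimates

# Example: 1000 shots of a (hypothetical) noisy two-qubit circuit.
est = estimate_distribution({"00": 480, "11": 500, "01": 20})
p00, err00 = est["00"]  # p00 == 0.48, err00 ≈ 0.0158
```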
We define a q-Quantum Processing Unit (q-QPU) as hardware capable of running a generic quantum circuit with $q$ qubits. Notice that, with this definition, different connected subtopologies of $q < K$ qubits on the same K-QPU are treated as different q-QPUs. This allows us to include the possibility of considering disconnected subtopologies of the same QPU (which might make sense only if the crosstalk between the subtopologies involved is negligible).
We define a policy as a set of criteria determining the specific decisions taken at different steps of a heterogeneous quantum computation. This can involve, for example, limitations and prior knowledge about a QPU, different kinds of budget constraints, time constraints, optimization methods, and any other arbitrary choices of the user. In the following discussions, we denote any policy by the symbol $\mathcal{P}$. In the next section, we propose a framework for heterogeneous quantum computation where the set of policies acts as a controller. Different possible policy choices are discussed thoroughly in Section 3.3.

3.2. General Strategy for Heterogeneous Quantum Computation

In this work, the general aim of an optimal heterogeneous computation involves allocating computational resources (split), gathering and merging results (merge), and updating the information on performance with optimality criteria, according to the policies chosen by the user. Note that the choice of the split strategy and the merge policy are completely independent. However, for a starting set of QPUs, prior information on the performance and accuracy of each QPU is not always available or comparable. This lack of information can make the merge inaccurate, for example, when the results of the majority of QPUs cluster around some point in probability space while the most accurate QPUs lie far from the majority and would be treated as outliers.
This situation can be overcome by a Calibration and Ranking stage, which then guides further updates during the Production (i.e., execution) stages. Whenever available and comparable, one can also rely directly on external calibration data from quantum providers for specific quantum devices. Otherwise, we propose a calibration scheme tailored to the specific device subtopologies and the class of circuits one aims to execute in production, provided it is possible to compute ideal probability distributions to be used as benchmarks against which noisy results from real QPUs can be compared. Of course, this computation is possible on classical computers, through emulation, only for a limited number of qubits. Although effective for the scope of this study, this procedure is therefore not directly scalable to circuits with larger qubit counts, since classical computation of the ideal distributions quickly becomes intractable. While this ensures accuracy in the NISQ setting, we plan in future work to design strategies to scale beyond this limitation, for instance by using approximate benchmarks, surrogate models, or adaptive calibration techniques. Note that the proposed calibration procedure shares similarities with error mitigation techniques, as it aims to reduce noise-induced biases. However, unlike error mitigation, our method does not attempt to correct noise directly, but rather to inform distribution decisions. The calibration stage is instrumental for ranking the QPUs considered, since it allows one to associate an unreliability index with each QPU. After these stages, which can be performed with a fraction of the budget expected for the full run, one can proceed with the production stages, which involve an iterative scheduling of split-merge-update steps, where the basic execution task involves measurements on some target circuits but without any ideal benchmark.
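One simple way to realize such a calibration in code is sketched below, under the assumption that the total variation distance between each QPU's measured distribution and the ideal benchmark serves as the unreliability index (the framework leaves the actual metric to the chosen policy; the helper names are ours):

```python
def total_variation(p, q):
    """Total variation distance between two distributions over bitstrings."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def unreliability_indices(ideal, measured_per_qpu):
    """Average distance of each QPU's output to the ideal benchmark
    distributions over the calibration circuit set."""
    indices = {}
    for qpu, per_circuit in measured_per_qpu.items():
        dists = [total_variation(ideal[c], dist) for c, dist in per_circuit.items()]
        indices[qpu] = sum(dists) / len(dists)
    return indices

# Example: one calibration circuit ("bell") and two QPUs with made-up results.
ideal = {"bell": {"00": 0.5, "11": 0.5}}
measured = {
    "qpu_a": {"bell": {"00": 0.49, "11": 0.48, "01": 0.03}},
    "qpu_b": {"bell": {"00": 0.40, "11": 0.40, "01": 0.20}},
}
u = unreliability_indices(ideal, measured)  # qpu_a is ranked as more reliable
```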
The split part of the very first iteration is determined by the calibration and ranking stages, if available, while the merge part depends on all the previous stages, as well as on the results of the last execution. Finally, the unreliability information can be updated at the end of each production stage based on the data accumulated at each iteration. One might also consider embedding the production stage into a pipeline where QPU executions happen asynchronously. In this case, one can continuously update the information, merging data according to the available partial information and proceeding with the next steps. Furthermore, a stopping policy can be considered, which allows an earlier termination of the run in case a stopping criterion is reached.
We remark that in our approach, shot-wise distribution does not involve physically splitting or copying quantum states. Instead, the same quantum circuit is instantiated independently on each QPU, all starting from the standard initial state |0⟩ (or parametrized variants). Therefore, the framework operates at the level of execution replication rather than state duplication, fully consistent with the no-cloning theorem. This also clarifies that the method is not designed for scenarios where the input is the live quantum output of another circuit without classical re-description.
Having outlined the general strategy of heterogeneous computation we propose, we now present a scheme for a given set $\mathcal{M}$ of $M \equiv |\mathcal{M}|$ q-QPUs, while in Figure 1 we display a diagrammatic depiction of the processes involved.
Calibration and Ranking stages:
(C.1)
$\mathcal{P}_{\text{prior-split}}^{(\text{calib})}$—initial split: choose an initial shot allocation $n^{(0)} \equiv (n_m^{(0)})_{m \in \mathcal{M}}$ (for example, fixing the total number of shots $n_{\text{tot}}^{(0)} = \sum_m n_m^{(0)}$ and using a uniform allocation $n_m^{(0)} = n_{\text{tot}}^{(0)} / M$);
(C.2)
$\mathcal{P}_{\text{bench}}^{(\text{calib})}$—benchmark: choose a “training set” of circuits $\mathcal{C}$, represented ideally by a class of unitary operators $\{U_c\}_{c \in \mathcal{C}}$ acting on $q$ qubits, and compute the ideal probability distribution of measurements in the computational basis, denoted as $p_x^{(\text{ideal},c)} \equiv |\langle x|U_c|0\rangle|^2$. These will be used as benchmark distributions;
(C.3)
Calibration executions: execute each circuit $U_c$ on each QPU $m \in \mathcal{M}$ (starting conventionally from the pure state $|0\rangle$) with the selected number of shots $n_m^{(0)}$, obtaining the counts $\hat{c}_{x,c}^{(m)}$, distributed according to Equation (1) with a generating probability given by the specific noisy (and unknown) realization $p_x^{(m;c)}$;
(R)
$\mathcal{P}_{\text{rel}}^{(\text{calib})}$—unreliability: using the results from the previous step and comparing them with the benchmark distributions, assign an unreliability coefficient $u_m$ to each QPU $m$ and compute the optimal split policy for the next stage in the form of a proposal for the shot split weights $w_m^{(1)}$;
Production stage (starts with i = 1 , target circuit U):
(P.0)
P_init^(prod)—calibrated prior split weights: using the results from the calibration step (C.3) and comparing them with the benchmark distributions, compute the optimal split policy for the first production iteration in the form of a proposal for the shot split weights w_m^(1);
(P.1-i)
P_split^(prod)—production split: implements the i-th iteration of the production schedule, computing the number of shots to run in this iteration (for example, simply dividing the total number of shots by the maximum number of iterations) and an optimal split according to prior information;
(P.2-i)
Production executions: execute the circuit U on each QPU m, using the number of shots of the current iteration and the split policy, allocating shots according to the split weights w_m^(i) computed in the previous iteration of the production stage (P.4-(i−1)), or in the first step of the production stage (P.0) if this is the first iteration (i.e., i = 1);
(P.3-i)
P_merge^(prod)—merge results: apply the merge policy to the partial distributions estimated by the executions after the split. This might also include some error-mitigation strategy per QPU and per circuit;
(P.4-i)
P_update^(prod)—update optimal split: improve the prior split weights for the next step by analyzing the relative performance of the different QPUs, accumulating statistics from all previous runs of U, and proposing a new split weight w_m^(i+1) together with the number of shots to perform in the next iteration;
(P.5-i)
P_stop^(prod)—check stopping criterion: if the conditions of the chosen stopping policy are not met, proceed with point (P.2-(i+1)); otherwise, terminate.
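As a concrete (toy) illustration, the staged protocol above can be condensed into a short Python sketch. Everything here is hypothetical scaffolding rather than the paper's implementation: each QPU is modeled as a classical multinomial sampler, the unreliability is a squared Hellinger distance against a single benchmark distribution, and the split and merge policies are fixed to simple inverse-unreliability weighting and count summation.

```python
import random

def hellinger_sq(p, q):
    """Squared Hellinger distance between two outcome->probability dicts."""
    keys = set(p) | set(q)
    return 1.0 - sum((p.get(k, 0.0) * q.get(k, 0.0)) ** 0.5 for k in keys)

def sample_counts(dist, shots, rng):
    """Toy stand-in for a QPU execution: multinomial sampling from `dist`."""
    outcomes, probs = zip(*sorted(dist.items()))
    draws = rng.choices(outcomes, weights=probs, k=shots)
    return {o: draws.count(o) for o in outcomes}

def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def shot_wise(qpus, ideal, n_calib, n_prod, iters, rng):
    """One pass of the protocol: (C.1)-(C.3), (R), (P.0), then the production loop."""
    # Calibration + ranking: unreliability u_m from the benchmark distance.
    u = {m: hellinger_sq(normalize(sample_counts(q, n_calib, rng)), ideal)
         for m, q in qpus.items()}
    # (P.0): prior split weights, here simply proportional to 1/u_m.
    w = {m: 1.0 / max(u[m], 1e-9) for m in qpus}
    z = sum(w.values())
    w = {m: v / z for m, v in w.items()}
    merged = {}
    for _ in range(iters):                        # production iterations (P.1)-(P.5)
        shots = n_prod // iters
        for m, q in qpus.items():                 # (P.2): split execution
            counts = sample_counts(q, max(1, round(w[m] * shots)), rng)
            for k, c in counts.items():           # (P.3): uniform merge of counts
                merged[k] = merged.get(k, 0) + c
    return normalize(merged), u
```

The stopping policy (P.5) is reduced here to a fixed number of iterations; real policies would also track budgets and accuracy targets.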
In the next section, we discuss some of the possible choices for the policies to be used in the calibration and production stages.

3.3. Policies

3.3.1. Calibration—Initial Split

Deviations from a uniform initial split at calibration can be motivated by various factors: the economic cost per shot for each QPU, different queue and execution times, and some information about the accuracy of each QPU for the task in question. In general, this policy should reflect all the preferences of the user regarding all or some of these aspects (and possibly others), so that the actual split used can be optimized by considering both information about the QPUs and the specific user needs.
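One possible (hypothetical) way to encode such preferences is a multiplicative score per QPU, with exponents acting as user-tunable knobs; the field names `cost`, `queue`, and `accuracy` are illustrative assumptions, not part of any provider API.

```python
def prior_split_weights(qpus, alpha=1.0, beta=1.0, gamma=1.0):
    """Combine per-QPU factors into normalized initial split weights.

    `qpus` maps a QPU name to a dict with hypothetical fields: 'cost'
    (price per shot), 'queue' (expected queue time) and 'accuracy'
    (a score in (0, 1]); a higher weight means more shots allocated.
    """
    score = {m: f["accuracy"] ** alpha / (f["cost"] ** beta * f["queue"] ** gamma)
             for m, f in qpus.items()}
    z = sum(score.values())
    return {m: s / z for m, s in score.items()}
```

Setting an exponent to zero removes that factor from the user's preference profile.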

3.3.2. Calibration—Benchmarks

Depending on the task considered in production, it might be possible to identify a class of circuits that can be used as a “training” set for the calibration stage, so that one can provide more tailored information for the production stage. Even if the tasks considered are generic, it could be useful, for example, to test the QPUs on a set of random circuits. This gives just a crude estimate of the general performance of each QPU and, indeed, as we show in Section 4, the performance at calibration on a fixed set of random circuits does not necessarily reflect the performance observed on specific tasks. Notice also that the results of the executions for each of these benchmarking circuits should be compared with exact results, which means that this step is accessible only for circuits involving a relatively small number of qubits, so that an exact emulator can be used as a benchmark, or at least for circuits whose output distribution is simple enough to be easily computable.

3.3.3. Calibration—Executions

The benchmark circuits are executed on each QPU with the selected number of shots. Some preprocessing and postprocessing might be involved at this step. For example, the circuit might be decomposed into different primitive gates for each QPU, or one could apply different mitigation strategies (provided this enters the shot budget or other resource criteria). Since the calibration stage is preparatory to the actual production stage, the same techniques are expected to be applied also later during the production stage executions.

3.3.4. Ranking—Unreliability

As a result of the calibration, we want to assign a “goodness” value to each QPU considered, so that the production stage can be guided by it. The specific metric can also depend on different factors, but it should reflect the discrepancy between the results of the QPUs on the set of benchmarking circuits and the exact output distributions. A distance between probability distributions (or density matrices) is, therefore, an essential component of this policy. For example, in Section 4.1, we adopt as the “unreliability” factor the Hellinger distance between the results of QPU executions and the exact one, averaged among all the benchmarking circuits.
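The unreliability coefficient used in Section 4.1 — the Hellinger distance to the exact distribution, squared and averaged over the benchmark circuits — can be sketched as follows (outcome distributions are represented as plain dicts):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    bc = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys)
    return math.sqrt(max(0.0, 1.0 - bc))

def unreliability(results, ideals):
    """Mean squared Hellinger distance of one QPU over the benchmark set.

    `results[c]` and `ideals[c]` are outcome->probability dicts for circuit c.
    """
    return sum(hellinger(results[c], ideals[c]) ** 2 for c in ideals) / len(ideals)
```

Ranking the QPUs then amounts to sorting them by this coefficient in ascending order.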

3.3.5. Production—Prior Split Weights

For this policy, the same aspects mentioned in Section 3.3.1 can be considered. Furthermore, the results of the calibration stage, if available, can be integrated into the analysis as expected prior accuracy provided by each QPU. We reason here in terms of prior split “weights” because the specific shot allocation might change depending on more information from the production schedule.

3.3.6. Production—Split Strategies

In the single iteration version of the production stage, this step is a trivial application of the prior split weights step mentioned above, applied to the total number of shots. In the case where more iterations of the production stage loop are needed, different shot allocations might be involved based on an update of the prior collective information, depending on former results, and a different number of shots can be considered at each iteration. Apart from schedule-dependent choices, one can test different policies that use the prior information in different ways. In this work, we investigate three specific cases of splitting strategy:
  • uniform: the total number of shots for that iteration are allocated uniformly among all the QPUs;
  • Hellinger: the shots are allocated depending on the split weights computed using the Hellinger distance (discussed in Section 3.4) between the count distributions and either the exact one, if provided at calibration, or the best expected one estimated from previous observations during the production stage;
  • MISE: in this approach, named after the Mean Integrated Square Error (see Section 3.4.2), the shots are allocated so as to minimize not only the discrepancy from the exact distribution at calibration (or the best expected one from previous iterations of the production stage) but also the expected statistical error coming from a finite number of shots per QPU.
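The first two splitting strategies above can be sketched as follows. The inverse-unreliability weighting used for the "Hellinger" case is one plausible choice, not necessarily the exact rule derived in the appendices, and the MISE variant is omitted here (it requires the optimization of Section 3.4.2):

```python
def split_shots(n_tot, unreliability, policy="uniform"):
    """Allocate `n_tot` shots among QPUs given unreliability coefficients.

    'uniform' ignores unreliability; 'hellinger' weights each QPU inversely
    to its (squared-distance-based) unreliability coefficient.
    """
    qpus = sorted(unreliability)
    if policy == "uniform":
        w = {m: 1.0 / len(qpus) for m in qpus}
    elif policy == "hellinger":
        inv = {m: 1.0 / max(unreliability[m], 1e-12) for m in qpus}
        z = sum(inv.values())
        w = {m: v / z for m, v in inv.items()}
    else:
        raise ValueError(policy)
    alloc = {m: int(round(w[m] * n_tot)) for m in qpus}
    alloc[qpus[0]] += n_tot - sum(alloc.values())   # absorb rounding drift
    return alloc
```

The final adjustment guarantees that the allocation always sums exactly to the shot budget of the iteration.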

3.3.7. Production—Executions

The same considerations conducted during calibration executions apply here.

3.3.8. Production—Merge Strategies

In this step, after gathering all counts obtained from the executions on each QPU, one has to merge the results. As for the split strategies discussed above, one can take into consideration different factors involved, but many, such as shot cost or queue time, should not play a role, since the data is already assumed to be fully available after the executions (or, in the case of a pipelined production stage, of the partial information available). The main goal of this step is therefore to maximize the accuracy of the final result with the given information on the executions. In analogy with the split strategies, in this work, we investigate three specific cases:
  • uniform: the distributions estimated from each QPU are simply merged (summing all counts for each outcome);
  • Hellinger: the target distribution is chosen as the one that minimizes the total squared Hellinger distance (discussed in Section 3.4.1) weighted according to unreliabilities and other supplementary information;
  • MISE: in this case, the target distribution is chosen as the optimal convex sum of all the QPU distributions, where optimality is determined as the minimum of the mean integrated square error (MISE), which takes into account both bias and variance of the data, as described in Section 3.4.2.
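The uniform merge and the generic convex merge underlying the Hellinger and MISE variants can be sketched as follows (the optimal weights themselves come from the minimizations of Section 3.4; here they are taken as given):

```python
def merge_uniform(counts_by_qpu):
    """Uniform merge: simply sum the counts from every QPU."""
    merged = {}
    for counts in counts_by_qpu.values():
        for outcome, c in counts.items():
            merged[outcome] = merged.get(outcome, 0) + c
    total = sum(merged.values())
    return {k: v / total for k, v in merged.items()}

def merge_convex(dists_by_qpu, weights):
    """Convex merge: weighted sum of the per-QPU estimated distributions."""
    merged = {}
    for m, dist in dists_by_qpu.items():
        for outcome, p in dist.items():
            merged[outcome] = merged.get(outcome, 0.0) + weights[m] * p
    return merged
```

Note that the uniform merge of raw counts implicitly weights each QPU by its share of the total shots, whereas the convex merge operates on normalized distributions.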

3.3.9. Production—Update Split

This step is required in the case of a schedule with more than a single iteration since it involves the updating of both the number of shots to split and the prior split weights, which would be used as improved collective information at the beginning of the next iteration of the production loop.

3.3.10. Production—Stopping Criterion

In the cases when one decides to perform the steps of the production stage more than once, different choices of the stopping criterion might be preferred. For example, a straightforward stopping policy might just be the depletion of the total shot budget (or any other kind of budget cap), or reaching some accuracy and/or precision requirement on the final result (which might result in a lower total budget expense and faster global execution).

3.4. Distance Between Discrete Probability Distributions

For the following discussions, it is useful to introduce a metric of comparison in the form of a distance between probability distributions, which are considered here as the main output of the execution of a quantum circuit, as measurements in the computational basis. In a general quantum setting, one is interested in the distance between density matrices. Nevertheless, for a wide class of quantum algorithms (e.g., Grover searches, some measurements in real-time evolution), the output of a quantum circuit is a collection of measurements on a fixed basis, while a change in basis can usually be incorporated into the circuit, after which measurements in the computational basis follow. Even if multiple changes of basis are needed (for example, for a typical VQE algorithm), one can consider each version as a distinct circuit, to which the whole analysis can be independently applied. Therefore, it is possible to define as a single task a collection of measurements on a fixed circuit in the computational basis, so that a probability distribution of outcomes can be inferred from the relative counts, while more complex algorithms involve more than one of these simple tasks in general.
We consider the Hellinger distance [54] between two discrete distributions p x , q x , defined as
d_H(p, q) ≡ √[ (1/2) Σ_x ( √p_x − √q_x )² ] = √[ 1 − Σ_x √(p_x q_x) ] = √[ 1 − cos Δ(p, q) ],
where in the rightmost term we introduced the so-called Bhattacharyya angle [55], defined as the angle Δ(p, q) = arccos Σ_x √(p_x q_x) between the vectors (√p_x) and (√q_x), both with Euclidean (l₂) norm 1 and with positive components. In our case, we do not have direct access to the probability distributions p_x^(m) for each QPU m; only the information about the counts ĉ_x^(m), obtained using a finite number of shots n^(m), is available. Due to the non-linearity of the definition of the Hellinger distance in Equation (2), replacing the best estimate p̂_x^(m) = ĉ_x^(m)/n^(m) typically yields a biased estimate, namely E[d_H²(p̂, q̂)] ≥ E[d_H²(p̂, q)] ≥ d_H²(p, q). (By the Jensen inequality, E[f(X̂)] ≥ f(E[X̂]) for a convex function f of a random variable X̂; the opposite inequality holds for a concave function such as x ↦ √x. Therefore, E[d_H(p̂, q)²] = 1 − Σ_x E[√p̂_x] √q_x ≥ 1 − Σ_x √(p_x q_x) = d_H(p, q)².) Due to this bias, to properly estimate these distances, one has to apply a statistical technique of bias removal, as outlined in Appendix B.
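The upward bias of the plug-in estimator can be verified numerically with a small Monte Carlo experiment; the two toy distributions below are chosen arbitrarily for illustration:

```python
import math, random

def sq_hellinger(p, q):
    """Squared Hellinger distance between two probability vectors."""
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

rng = random.Random(1)
p = [0.5, 0.3, 0.2]          # true (unknown) QPU distribution
q = [0.4, 0.4, 0.2]          # reference distribution
n, trials = 100, 2000        # shots per experiment, Monte Carlo repetitions
acc = 0.0
for _ in range(trials):
    draws = rng.choices(range(3), weights=p, k=n)
    p_hat = [draws.count(i) / n for i in range(3)]   # plug-in estimate of p
    acc += sq_hellinger(p_hat, q)
mean_plugin = acc / trials   # estimated E[d_H^2(p_hat, q)]
true_value = sq_hellinger(p, q)
# Jensen's inequality predicts mean_plugin > true_value (positive bias),
# which motivates the bias-removal technique of Appendix B.
```

Increasing the number of shots n shrinks the bias, consistent with the fact that the plug-in estimator is only asymptotically unbiased.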

3.4.1. Weighted Average Square Hellinger Distance

Using the distance metric discussed in the previous section, we can estimate the difference between the relative counts for the dataset D^(m) and the ideal target distribution p_x^(ideal) known at the calibration stage, where the bias and associated errors are estimated with the techniques discussed in Appendix B. These distances can then be used as the unreliability parameter to be associated with each QPU. A pre-ranking of the QPUs can be determined by reordering them from the lowest unreliability to the highest. We consider an optimal probability distribution p̄_x as the one that minimizes the weighted average square distance with respect to the distributions estimated from the QPU results, namely
D²(p̄; p^(m), w^(m)) = Σ_{m=0}^{M−1} w^(m) d²(p̄, p^(m)),
where w^(m) are the reliability weights associated with each QPU. In the cases where d corresponds to the Hellinger distance, the probability distribution p̄* which minimizes D² is described in Appendix A.1.

3.4.2. Mean Integrated Square Error

Let us consider a single-circuit benchmark with ideal distribution p_x^(ideal) and a collection of M QPUs with distributions p_x^(m), each sampled with a certain number of shots n_m, depending on the split policy and compactly denoted by the “split-shot” vector n = (n_m). Any convex merge policy is defined by a weight vector w = (w_m)_{m=0}^{M−1} and a weighted distribution estimator as follows:
p̂_x(w; n) ≡ Σ_{m=0}^{M−1} w_m p̂_x(m; n_m), where p̂_x(m; n_m) ≡ (1/n_m) Σ_{y∈D^(m)} δ_{x,y}.
The variables in Equation (4) are unbiased estimators of p_x(w) ≡ Σ_{m=0}^{M−1} w_m p_x^(m), which we want to make as close as possible to p_x^(ideal) by optimizing the weight parameters w. Since each QPU contributes in general with a different number of shots, a dataset realization D can be formally decomposed into independent sub-datasets D = ⋃_{m=0}^{M−1} D^(m), such that |D^(m)| = n_m, with D^(m) sampled according to a multinomial distribution with p_x^(m) as the probability for each extraction x. In the following discussion, we consider the Mean Integrated Square Error, defined as
MISE(w; n) ≡ E_{D = ⋃_m D^(m)} [ Σ_x ( p̂_x(w; n)[D] − p_x^(ideal) )² ],
where the expectation value involves all possible realizations of the full dataset D with fixed split-shots n and according to their probabilities. More details on the MISE definition and optimization are reported in Appendix A.2.
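Since the sub-datasets are independent, the MISE decomposes into a squared-bias term plus a variance term with the standard multinomial per-outcome variance p(1−p)/n. The sketch below evaluates this decomposition and, for the two-QPU case, replaces the closed-form optimization of Appendix A.2 with a crude grid search over the convex weight:

```python
def mise(w, dists, shots, ideal):
    """MISE of the convex merge = squared bias + multinomial variance."""
    outcomes = set(ideal).union(*dists)
    total = 0.0
    for x in outcomes:
        mean = sum(wm * d.get(x, 0.0) for wm, d in zip(w, dists))
        var = sum(wm ** 2 * d.get(x, 0.0) * (1 - d.get(x, 0.0)) / n
                  for wm, d, n in zip(w, dists, shots))
        total += (mean - ideal.get(x, 0.0)) ** 2 + var
    return total

def best_two_qpu_weight(dists, shots, ideal, steps=1000):
    """Grid search for the optimal convex weight w = (t, 1-t) with two QPUs."""
    best = min(range(steps + 1),
               key=lambda i: mise((i / steps, 1 - i / steps), dists, shots, ideal))
    return best / steps
```

With abundant shots the variance term vanishes and the optimum concentrates the weight on the least biased QPU; with few shots the variance term pulls the optimum toward a more even mixture.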

4. Demonstration Results

In this section, we present a numerical evaluation of a specific instantiation of the general framework. We begin by examining the calibration stage in Section 4.1, and then analyze the split and merge strategies in Section 4.2.
In both cases, compared to the general protocol shown in Figure 1, this initial numerical study focuses on a specialized protocol where most policies are fixed to simple rules, as illustrated in Figure 2. The codebase used for the demonstration evaluation, along with the corresponding demonstration data, is openly available at https://github.com/GBisi/shot-wise, accessed on 26 October 2025. This codebase can be used directly for shot-wise experiments or as a template for further development.
In particular, whereas the general framework includes a production stage with a scheduling policy allowing multiple incremental executions and optional early stopping, here we restrict the schedule to a simpler policy: the total budget of shots across all considered QPUs is consumed in a single iteration. A comprehensive study of scheduled incremental executions—requiring the exploration of various temporal distribution strategies—is left for future work. In this study, our primary focus is on the optimal distribution of computation across different QPUs.
Moreover, we chose not to employ any error mitigation techniques, so as to keep the data in its most unaltered and authentic form. This decision aligns with the primary goal of this work, which is not to propose new mitigation methods but rather to provide a clear and unbiased observation. Indeed, while error mitigation might be more effective on one specific QPU than on another, we assume that this would not change the relative unreliabilities in a relevant way.

4.1. Calibration and Ranking

For the calibration stage, according to the diagram of Figure 2, we first proceed with step (C.1) and select a uniform shot allocation for each QPU m in the set of QPUs considered, reported in Table 1; then, for step (C.2), we select a set of 10 random circuits {U_c} acting on five qubits, which we use as benchmark circuits [56]. The OpenQASM 2 representations of the circuits are available at https://zenodo.org/records/14056270, accessed on 26 October 2025. The random circuits are sampled from the unitary Haar measure. After execution, we assign as the unreliability coefficient the Mean Squared Hellinger distance of the results of each QPU (see Section 3.4.1), averaged over the set of circuits considered. Estimates of the unreliability, defined as the Mean Square Hellinger distance for each QPU, are shown in Figure 3 (and summarized in Table 1), where measurements span a time window of about a month and a half.
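The ideal benchmark distributions p_x^(ideal,c) of Haar-random circuits can be generated classically. A minimal pure-Python sketch (Gram-Schmidt orthonormalization of a complex Gaussian matrix, which is known to sample the Haar measure and is adequate for the small dimensions used at calibration):

```python
import math, random

def haar_unitary(dim, rng):
    """Haar-random unitary via Gram-Schmidt on a complex Ginibre matrix.

    Orthonormalizing i.i.d. complex Gaussian columns is equivalent to a QR
    decomposition in which R has a positive real diagonal, which yields the
    Haar measure on the unitary group.
    """
    cols = []
    for _ in range(dim):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
        for u in cols:                  # subtract projections on previous columns
            proj = sum(ui.conjugate() * vi for ui, vi in zip(u, v))
            v = [vi - proj * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(dim)] for i in range(dim)]  # row-major U

def ideal_distribution(U):
    """p_x = |<x|U|0>|^2: squared moduli of the first column of U."""
    return [abs(row[0]) ** 2 for row in U]
```

For the five-qubit circuits used here, dim = 2⁵ = 32, which remains trivially cheap to emulate exactly.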
Interestingly, even if there are sizeable fluctuations in time, the best performance among the QPUs considered is consistently achieved by ‘ibm_sherbrooke’, followed by ‘ibm_kyoto’, while the relative ranking of ‘ibm_brisbane’ and ‘ibm_osaka’ is not always clear. Furthermore, according to our observations, the performance of the IONQ QPUs appears to be one order of magnitude worse. Due to the variability in QPU performance, we stress the importance of fairly frequent calibration and re-assessment of the unreliabilities, at least on the timescale of the stability regions (i.e., a few hours).

4.2. Split-Merge Strategies

Here, we follow the right side of Figure 2, where the production stage is simplified to a single iteration and the only variable elements are the split and merge strategies, which we test in three variants for both steps: uniform (all shots are allocated/merged uniformly across the QPUs considered), Hellinger (Section 3.4.1), and MISE (Section 3.4.2). For simplicity, in the split, we do not include other factors that might reweight the resources allocated, such as different costs per shot on different QPUs or differences in queue and execution times (see Section 3.3 for a more in-depth summary of different situations, or [21]).
The main demonstration results of this section are shown in Figure 4. Each panel represents the executions on a different circuit type, among the six different cases considered. On the left part of each panel, the ‘baselines’ are shown for each QPU, obtained by running all the shots on single QPUs (the leftmost violins refer to the overall distribution of these results among the QPUs). The specific benchmark circuits have been generated as QASM code for five and eight qubits using the MQT Bench library [56]. On the right part of the panels, data are shown for the different split and merge strategies applied to the set of all QPUs (ibm+ionq for the n_q = 5 data and only IBM for the n_q = 8 data). We notice that, while the split-merged results never improve on the best baseline for each circuit, they nevertheless appear quite robust and compatible with the average of the different baselines. Furthermore, the performance of each QPU depends heavily on the circuit considered, to the extent that the performance trends for some circuits are not always consistent. For example, for the GHZ circuit, results involving all seven QPUs considered show the opposite of the expected trend, i.e., of the improvement from the uniform strategy to the MISE one observed in the other cases. This might be due to a high variance between the baselines in this case, which is not well reflected by the calibration data, trained instead on random circuits. In general, with some exceptions such as the one mentioned above, we observe that splitting or merging using the Hellinger or MISE strategy improves on uniformly splitting and naively merging alone. A complete account of all the different combinations of split and merge policies for each of the circuits considered is available at https://zenodo.org/records/14056270, accessed on 26 October 2025.
Figure 5 illustrates the behavior of different split and merge policies as the number of QPUs increases, showing the performance of all policy combinations under varying QPU counts. A key trend is that the maximum error decreases consistently with more QPUs, while the minimum error increases slightly.
This trend reflects the core principle of the shot-wise methodology: it is generally difficult for a quantum programmer to predict which QPU will perform best for a given circuit at a given time. Distributing shots across multiple QPUs may introduce a small error increase compared to using the single best QPU, but since identifying that best QPU in real time is impractical, the distribution provides a safeguard that improves worst-case performance.
Indeed, shot-wise distribution consistently improves the worst-case outcome, while in the average case, the error either decreases or remains comparable to single-QPU execution. Thus, the approach offers a clear advantage: better robustness in the worst case, potential improvements in the average case, and no meaningful degradation overall.
Moreover, we observe that the standard deviation of the results decreases as the number of QPUs grows. This indicates that shot-wise distribution yields more stable outcomes, with reduced variability and a lower upper error bound, albeit at the slight expense of best-case accuracy.
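The stabilizing effect described above can be illustrated with a deliberately simplified model: an infinite-shot limit, a single biased outcome probability per QPU, and biases drawn uniformly at random are all assumptions of this sketch, not measurements from our experiments.

```python
import random

rng = random.Random(3)
trials, n_qpus = 200, 5
worst_single, merged_err = [], []
for _ in range(trials):
    # Hypothetical per-QPU bias on P("0") relative to the ideal value 0.5.
    biases = [rng.uniform(-0.15, 0.15) for _ in range(n_qpus)]
    worst_single.append(max(abs(b) for b in biases))   # worst single-QPU error
    merged_err.append(abs(sum(biases) / n_qpus))       # uniform shot-wise merge
```

Since the absolute value of the mean bias never exceeds the largest absolute bias, the merged error in this model can never be worse than the worst single-device error, and on average it is far smaller, mirroring the worst-case improvement observed in Figure 5.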
This behavior is clearly illustrated, for instance, in Figure 6 and Figure 7, in which for each circuit we test two different sizes (five and eight qubits) and all split and merge policy combinations. The results of the shot-wise distribution approach are obtained by distributing the shots across two to seven QPUs, and are compared with single-QPU and overall (all baselines) executions.

4.3. Discussion

One key advantage of shot-wise distribution in the field of Quantum Software Engineering (QSE) is its robustness to quantum noise. By distributing shots across QPUs with varying noise profiles, the impact of errors is reduced, resulting in more stable outcomes. This is crucial for QSE, as it ensures that quantum applications perform reliably across different hardware platforms. Additionally, shot-wise distribution enhances worst-case performance by mitigating the risk of relying on a single, underperforming QPU, making it useful when the best-performing QPU is uncertain. The method also improves error resilience, as distributing shots across multiple QPUs helps balance the computational load and increases fault tolerance. If one QPU fails, others can continue the computation, ensuring continuity, which is important for long-running algorithms. Furthermore, shot-wise distribution is hardware-agnostic, allowing flexibility across different quantum architectures and making it easier to scale across a variety of platforms. Together with these advantages, shot-wise distribution introduces some challenges. Managing multiple QPUs adds overhead in terms of coordination, scheduling, and result aggregation. This complexity can slow down execution and requires more sophisticated resource management. Additionally, it may dilute best-case performance, as spreading shots can result in lower accuracy than concentrating them on the highest-performing QPU.
We acknowledge that, at a high level, distributing computations across heterogeneous QPUs can be seen as a statistical averaging mechanism. What our study contributes is a systematic framework that operationalizes this intuition into concrete split and merge policies, quantifies their impact, and demonstrates empirically that shot-wise distribution consistently improves worst-case performance without degradation. This goes beyond the intuitive observation by providing a structured, generalizable methodology. Another limitation is the dependency on reliable QPU calibration. For optimal results, accurate and up-to-date information about each QPU’s performance is essential. Without it, the distribution may be inefficient. In [21,23], we present a general multi-service architecture enabling shot-wise distribution and presenting an approach to retrieve up-to-date and reliable QPU calibration data. Interestingly, we observed that even when calibration data were collected at previous time points, the framework delivered consistent results. This robustness can be attributed to two factors: (i) QPU unreliability typically drifts slowly over periods of hours, producing quasi-stable regions (as shown in Figure 3); and (ii) distributing shots across multiple QPUs reduces sensitivity to transient fluctuations of any single device. Thus, although our calibration data were not real-time, the aggregated outcomes remain representative and effective. Future work will address dynamic, real-time calibration to better track QPU performance drift. In conclusion, the shot-wise methodology consistently performs at least as well as the average outcome of a single Quantum Processing Unit (QPU) while often surpassing many individual QPUs in various scenarios. This approach not only reduces variability, leading to more stable and reliable results, but also enhances overall performance when compared to executing all shots on a single QPU. 
Although it is not yet a complete solution for noise mitigation—a direction we intend to explore further—on average, the shot-wise method improves results and mitigates output variation. Additionally, it offers significant qualitative advantages, such as increased flexibility and customizability tailored to specific requirements. Overall, the shot-wise approach provides a stable and adaptable technique for executing shots across multiple heterogeneous Quantum Computers, combining both qualitative and quantitative benefits.
Beyond these empirical observations, our results invite a more careful theoretical interpretation. Shot-wise distribution can be understood as a statistical variance-reduction mechanism: each QPU samples from a noisy approximation of the ideal output distribution, and merging across QPUs effectively ensembles these estimators. This ensemble averaging suppresses device-specific fluctuations, explaining why our experiments (Figure 4, Figure 5, Figure 6 and Figure 7) show a consistent reduction in error spread and improved worst-case stability.
Moreover, the split and merge policies reveal how error propagation is shaped by allocation choices. For instance, policies weighted by Hellinger distance privilege devices closer to the ideal distribution, thereby reducing bias, while MISE-based policies explicitly balance bias and variance. These observations highlight that shot-wise distribution is not merely a heuristic for spreading work, but a principled optimization problem over probability distributions, where policy design governs the robustness–accuracy trade-off.
Another insight is that distributing shots across heterogeneous QPUs mitigates correlated errors: while a single device may exhibit transient or systematic drifts, their impact is diluted in the merged outcome. This explains the observed narrowing of performance variability as the number of QPUs increases (Figure 5). In practice, the method transforms hardware heterogeneity from a limitation into a resource, converting device variability into statistical robustness.
Taken together, these interpretations clarify why the shot-wise methodology delivers stable performance even without real-time calibration: slow device drifts are absorbed by averaging, while transient anomalies are suppressed. Future research will extend this theoretical foundation by exploring formal error bounds and connecting shot-wise strategies to ensemble learning theory in classical machine learning, where robustness is likewise achieved by aggregating imperfect predictors.

5. Conclusions and Future Work

In summary, our study introduces a novel framework for quantum computation that addresses the limitations of heterogeneous, noisy Quantum Processing Units (QPUs). Unlike the traditional monolithic execution of circuits, our shot-wise approach leverages the probabilistic nature of quantum mechanics to distribute circuit shots across multiple QPUs. This strategy transforms hardware variability from a challenge into an opportunity, improving robustness while maintaining broad applicability.
With that aim, we propose a methodology that advocates for a departure from the traditional monolithic execution of quantum circuits. By capitalizing on the inherent probabilistic nature of quantum mechanics, our shot-wise approach enables distributed execution of quantum tasks across multiple noisy Quantum Computers. We propose a general methodological framework, parameterized by a set of customizable policies, which allows for fine-grained management and distribution of shots across multiple heterogeneous noisy Quantum Computers, even for a single quantum circuit.
At the methodological level, the framework integrates three core elements: (i) shot distribution through customizable split and merge policies that balance accuracy and stability; (ii) calibration to pre-evaluate QPU reliability and guide allocation; and (iii) incremental execution, where feedback from earlier iterations dynamically refines allocation decisions. Together, these mechanisms enable fine-grained, adaptive management of resources without requiring prior knowledge of circuit structure or noise models, ensuring the framework remains hardware-agnostic and operationally realistic.
The term incremental execution refers to the iterative scheduling of circuit runs in multiple stages, where information from earlier executions is used to update shot allocation and merging strategies. It does not necessarily imply that the quality of results improves incrementally at each step, though this may occur in practice depending on noise and policy choices. Rather, the objective of incremental execution is to progressively refine the decision-making process itself: by feeding back empirical evidence from intermediate runs, the framework can adapt policies to the specific characteristics of the circuit, the QPUs involved, and the observed noise conditions. This adaptive loop allows the execution strategy to become increasingly tailored to the scenario at hand, potentially leading to more robust outcomes than a static, one-shot allocation.
Importantly, the proposed framework is entirely black-box with respect to both the structure of the quantum circuits and the underlying noise models of the QPUs. All decisions and computations are grounded solely in empirically obtained data through calibration (and previous iterations), without relying on prior knowledge of circuit structure or internal hardware models. This ensures wide applicability across diverse quantum hardware platforms and circuit classes, and emphasizes operational realism by strictly adhering to observable behavior rather than assumed theoretical characteristics.
Our demonstration evaluation across two major providers and distinct qubit technologies confirms these advantages. Empirical results show that shot-wise distribution consistently improves worst-case stability, narrows variability in results, and produces outcomes that are competitive with or superior to many single-QPU executions. Importantly, this robustness holds even when calibration data are slightly outdated, thanks to the slow drift of QPU reliability and the averaging effect of multi-device execution. The main trade-off is a slight dilution of best-case performance, as distributing shots can limit the benefits of exceptionally well-calibrated devices.
We further note that the shot-wise approach has already been successfully combined with circuit cutting techniques in a separate study [57]. In that work, shot-wise execution was evaluated in scenarios involving up to eight quantum backends and circuits as large as 14 qubits. The results obtained reinforce the conclusions of the present study, confirming the benefits of distributed execution even under realistic and challenging conditions. Notably, the shot-wise strategy demonstrated minimal overhead—only a few milliseconds—even in configurations with eight QPUs and large circuits, with execution latency growing at most linearly with the number of QPUs. That work also provides a deeper analysis of the runtime performance and behavior of shot-wise execution in an application-driven context. Since the focus of this article is on methodological foundations and general-purpose execution strategies, we refer to [57] for a more comprehensive exploration of such application-specific aspects. Currently, our research team is also working on combining shot-wise distribution with state-of-the-art methodologies in which quantum circuits are combined, through multi-programming [58,59], inside the same QPU. Also in this case, combining the two techniques seems to improve the overall performance in terms of noise resistance. These combinations further highlight the benefits, possibility, and potential of adding shot-wise distribution as a transparent step in the quantum execution pipeline, combining multiple QSE solutions (e.g., circuit cutting, multi-programming, noise mitigation).
In conclusion, the shot-wise methodology emerges as a promising approach, improving worst-case performance and often outperforming individual QPUs in diverse scenarios. It not only reduces variability, enhancing result stability, but also improves overall performance compared to single-QPU execution. While it does not fully address noise mitigation, an area for future research, the shot-wise strategy also offers substantial qualitative benefits, including increased flexibility and adaptability to specific needs.
With this work, we aim to contribute to the development of more efficient and reliable quantum computing systems, overcoming current limitations and inspiring further research and innovation in the field.
Several interesting future research directions may emerge from this study:
Shot-wise Error Mitigation: In this paper, our demonstration assessment provides initial evidence that the shot-wise methodology is a promising approach for managing quantum computations across diverse, heterogeneous QPUs. Additionally, our findings suggest that this strategy may facilitate error cancellation across different quantum computers, resulting in a merged final distribution that is more reliable than the average partial distribution obtained from a single QPU. However, further studies and an in-depth comparison with state-of-the-art noise mitigation methodologies are warranted to validate these preliminary findings conclusively. If confirmed, future research could focus on designing and developing robust shot-wise error mitigation techniques to enhance the effectiveness and reliability of quantum computation methodologies.
Design and Development of Additional Split and Merge Policies: This paper introduced and examined four split and merge policies, with and without calibration data. Future research could expand upon these policies, conducting comparative analyses to discern their efficacy. Moreover, exploring scenarios where specific combinations of policies outperform others could offer valuable insights. Currently, our approach assumes that the input circuit can be freshly instantiated on each QPU from a classical description. In scenarios where the circuit output of one QPU directly feeds another (without measurement), our methodology would not apply, since shot replication cannot be achieved by quantum state copying. Future research may investigate hybrid models in which distributed circuit fragments (via circuit cutting or teleportation-based schemes) are combined with shot-wise execution.
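To make the split and merge mechanics concrete, the following is a minimal sketch of two split policies (uniform and weight-proportional) and a counts-summing merge policy. All function names and the dict-of-counts representation are illustrative assumptions, not the framework's actual API.

```python
from collections import Counter

def split_uniform(total_shots, n_qpus):
    """Uniform split policy: distribute shots as evenly as possible."""
    base, rem = divmod(total_shots, n_qpus)
    return [base + (1 if i < rem else 0) for i in range(n_qpus)]

def split_weighted(total_shots, weights):
    """Weighted split policy: shots proportional to per-QPU weights."""
    total_w = sum(weights)
    shots = [int(total_shots * w / total_w) for w in weights]
    shots[0] += total_shots - sum(shots)  # assign the rounding remainder
    return shots

def merge_counts(partial_counts):
    """Merge policy: sum the per-QPU count histograms into one."""
    merged = Counter()
    for counts in partial_counts:
        merged.update(counts)
    return dict(merged)

shots = split_weighted(1024, [0.5, 0.3, 0.2])  # e.g. calibration-derived weights
results = [{"00": s // 2, "11": s - s // 2} for s in shots]  # stand-in QPU runs
merged = merge_counts(results)
```

Other policies (e.g. count-weighted merges) would only change how `merge_counts` combines the partial histograms, leaving the split step untouched.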
Tailored Calibrations: In this work, we presented a general, customizable framework to perform shot-wise distribution of quantum computations. We then tested the framework in a general setting in which the calibration phase is executed on random circuits, so that the resulting calibration is suitable in numerous scenarios. However, the shot-wise methodology could also be applied to a specific scenario (e.g., with Variational Quantum Algorithms (VQAs) [60]), optimizing the calibration phase for that particular setting. For instance, when working with VQAs, the calibration circuits could all share the same parametric quantum circuit with random parameters. Finally, a key open challenge is the scalability of our calibration approach, since benchmarking against ideal distributions becomes impractical as system sizes grow. Future research will investigate approximate, task-specific calibration methods and hybrid error mitigation strategies that can extend shot-wise distribution beyond the small- to medium-scale circuits tested here.
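A calibration step of this kind can be sketched as follows: score each QPU by the squared Hellinger distance between its measured distribution on calibration circuits and the ideal one, then convert the scores into weights. The inverse-unreliability weighting and the dict-based distributions are illustrative assumptions, not the paper's exact scheme.

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance between dict-based distributions, in [0, 1]."""
    keys = set(p) | set(q)
    bc = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys)
    return 1.0 - bc

def calibration_weights(measured, ideal):
    """Lower unreliability (distance from ideal) -> higher weight; sums to 1."""
    unreliability = [hellinger_sq(m, ideal) for m in measured]
    inv = [1.0 / (u + 1e-9) for u in unreliability]  # guard against exact zero
    s = sum(inv)
    return [v / s for v in inv]

ideal = {"00": 0.5, "11": 0.5}
qpu_a = {"00": 0.49, "11": 0.49, "01": 0.02}  # close to ideal
qpu_b = {"00": 0.30, "11": 0.30, "01": 0.40}  # noisier
w = calibration_weights([qpu_a, qpu_b], ideal)
```

For a VQA-tailored calibration, `ideal` would come from noiseless simulation of the same parametric circuit with random parameters.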
Incremental Execution, Scheduling Policies, and Stop Conditions: An intriguing direction for future research involves a more comprehensive exploration of incremental execution and its impact on quantum computation performance. Accompanying this investigation, the study and development of diverse scheduling policies and stop conditions could further optimize the execution process.
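The incremental execution idea can be sketched as a simple round-based loop: dispatch a batch of shots, re-merge, and stop once the merged distribution stabilises. The sampled toy distribution, the stabilisation threshold, and all helper names are illustrative assumptions rather than the framework's implementation.

```python
import math
import random

def hellinger(p, q):
    """Hellinger distance between two dict-based distributions."""
    keys = set(p) | set(q)
    bc = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys)
    return math.sqrt(max(0.0, 1.0 - bc))

def run_round(shots, rng):
    """Stand-in for one QPU round: sample a noisy Bell-like distribution."""
    counts = {"00": 0, "11": 0, "01": 0}
    for _ in range(shots):
        counts[rng.choices(["00", "11", "01"], weights=[48, 48, 4])[0]] += 1
    return counts

def incremental_execution(shots_per_round=256, max_rounds=20, eps=0.02):
    """Dispatch shots in rounds; stop once the merged distribution stabilises."""
    rng = random.Random(0)
    totals, prev, dist = {}, None, {}
    for rounds in range(1, max_rounds + 1):
        for k, v in run_round(shots_per_round, rng).items():
            totals[k] = totals.get(k, 0) + v
        n = sum(totals.values())
        dist = {k: v / n for k, v in totals.items()}
        if prev is not None and hellinger(dist, prev) < eps:
            break  # stop condition met: successive merges barely differ
        prev = dist
    return dist, rounds

dist, rounds = incremental_execution()
```

A scheduling policy would additionally decide, at each round, how the next batch of shots is split among the available QPUs.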
Additional Demonstrations on More Quantum Providers and Real Quantum Hardware: To further validate and strengthen the findings of this study, conducting additional demonstrations involving multiple quantum providers and real quantum hardware is essential. By exploring various environments and defining practical benchmark use cases, researchers may gain deeper insights into the robustness and applicability of the proposed strategies.

Author Contributions

Conceptualization, G.B. and G.C.; methodology, G.B. and G.C.; software, G.B. and G.C.; validation, G.B. and G.C.; investigation, G.B. and G.C.; data curation, G.B. and G.C.; writing—original draft preparation, G.B. and G.C.; writing—review and editing, G.B., G.C., J.G.-A., J.M.M., M.D. and A.B.; visualization, G.B. and G.C.; supervision, J.G.-A., J.M.M., M.D. and A.B.; funding acquisition, M.D. and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

GC and MD acknowledge support from Fondazione ICSC-National Centre on HPC, Big Data and Quantum Computing-SPOKE 10 (Quantum Computing) and received funding from the European Union Next-GenerationEU-National Recovery and Resilience Plan (NRRP)-MISSION 4 COMPONENT 2, INVESTMENT N. 1.4-CUP N. I53C22000690001. JGA and JMM were partially funded by the European Union "Next GenerationEU/PRTR" and by the Ministry of Science, Innovation and Universities (TED2021-130913B-I00, RED2022-134148-T, and PDC2022-133465-I00). They were also supported by the QSERV project, funded by the Spanish Ministry of Science and Innovation and ERDF; by the Regional Ministry of Economy, Science and Digital Agenda of the Regional Government of Extremadura (GR21133); and by the European Union under Agreement 101083667 of the project "TECH4E-Tech4effiency EDIH", call DIGITAL-2021-EDIH-01, supported by the European Commission through the Digital Europe Programme. AB and GB were partially funded by the University of Pisa under the project hOlistic Sustainable Management of distributed softWARE systems (OSMWARE), under Grant UNIPI PRA_2022_64.

Data Availability Statement

The codebase employed for the demonstration evaluation and the demonstration data are freely available at https://github.com/GBisi/shot-wise, accessed on 26 October 2025. The OpenQASM 2 representations of the circuits and a complete account of the results for all combinations of split and merge policies for each circuit considered are available at https://zenodo.org/records/14056270, accessed on 26 October 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Optimal Distributions and Weights

In this section, we show some details about the optimality criteria and the merged probability distributions for the Weighted Square Hellinger Distance (Appendix A.1) and the Mean Integrated Square Error (Appendix A.2), as well as the optimal weights for the split.

Appendix A.1. Optimal Weighted Square Hellinger Distance

In the case of $d_H$ being the Hellinger distance, and denoting by the superscript $(m)$ the quantities associated with the $m$-th QPU in the set of QPUs considered, the expression in Equation (3) becomes
$$D^2_{\mathrm{Hell}}\big(\bar{p};\, p^{(m)}, w^{(m)}\big) = 1 - \sum_{m=0}^{M-1} w^{(m)} \sum_x \sqrt{\bar{p}_x\, p_x^{(m)}}. \qquad (A1)$$
For fixed weights $w^{(m)}$, it is straightforward to show that the optimal solution, which minimizes $D^2_{\mathrm{Hell}}$ under the constraints $0 \le \bar{p}_x \le 1 \;\forall x$, is
$$\bar{p}_x^{*\,(\mathrm{Hell})} = \left( \sum_{m=0}^{M-1} w^{(m)} \sqrt{p_x^{(m)}} \right)^{\!2}. \qquad (A2)$$
As before, since we do not have direct access to the actual QPU distributions $p^{(m)}$, but only to their relative counts $\hat{c}^{(m)}/n^{(m)}$, we must take into account the bias due to the non-linearity of the square root, as discussed in Appendix B.
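The optimal merged distribution derived above can be sketched numerically: the merge is the squared weighted average of the square roots of the per-QPU distributions. This is an illustrative numpy sketch assuming all distributions share a common outcome ordering.

```python
import numpy as np

def merge_hellinger(dists, weights):
    """Hellinger-optimal merge: squared weighted average of square roots."""
    dists = np.asarray(dists, dtype=float)      # shape (M, n_outcomes)
    weights = np.asarray(weights, dtype=float)  # shape (M,), summing to 1
    return (weights[:, None] * np.sqrt(dists)).sum(axis=0) ** 2

p0 = np.array([0.5, 0.5, 0.0, 0.0])
p1 = np.array([0.4, 0.4, 0.1, 0.1])
p_bar = merge_hellinger([p0, p1], [0.7, 0.3])
```

When the per-QPU distributions coincide, the merge returns them unchanged; in general the total mass is at most one, reflecting that only the constraint $0 \le \bar{p}_x \le 1$ is imposed, not normalisation.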

Appendix A.2. Optimal Mean Integrated Square Error

It is useful to formally decompose the MISE in Equation (5) as a sum of two contributions,
$$\mathrm{MISE}(w; n) = \mathrm{VAR}(w; n) + \mathrm{BIAS}^2(w), \qquad (A3)$$
defined as
$$\mathrm{VAR}(w; n) \equiv \sum_x \mathbb{E}_{D = \cup_m D^{(m)}}\!\left[ \big( \hat{p}_x(w; n)[D] - p_x(w) \big)^2 \right], \qquad (A4)$$
$$\mathrm{BIAS}^2(w) \equiv \sum_x \big( p_x(w) - p_x^{(\mathrm{ideal})} \big)^2, \qquad (A5)$$
where the first encodes the fluctuations of $\hat{p}_x(w; n)[D]$ around $p_x(w)$ for different realizations of the dataset $D$, while the second, independent of the dataset, quantifies the discrepancy between the target distribution $p_x^{(\mathrm{ideal})}$ and the one obtained from a convex weighted average.
While one cannot compute $\mathrm{VAR}$ and $\mathrm{BIAS}$ exactly, we can nevertheless estimate them from resamples of a single realization of a dataset. For example, even without knowing the exact weighted distribution $p_x(w)$, we can estimate $\mathrm{VAR}$ from two datasets $D$ and $D'$, sampled independently as multinomials with probabilities $p_x(w)$, as
$$\mathrm{VAR}(w; n) \simeq \frac{1}{2} \sum_x \big( \hat{p}_x(w; n)[D] - \hat{p}_x(w; n)[D'] \big)^2, \qquad (A6)$$
or, more practically, as an average over many bootstrap resamples of a single dataset.
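The bootstrap variant of this estimator can be sketched as follows: resample the observed counts many times and average the two-dataset estimator over resample pairs. The toy dataset and the number of resamples are arbitrary choices for illustration.

```python
import numpy as np

def bootstrap_var(counts, n_pairs=200, seed=0):
    """Average the two-dataset VAR estimator over bootstrap resample pairs."""
    rng = np.random.default_rng(seed)
    n = counts.sum()
    p_hat = counts / n
    # draw 2 * n_pairs multinomial resamples of the observed frequencies
    res = rng.multinomial(n, p_hat, size=2 * n_pairs) / n
    diffs = res[:n_pairs] - res[n_pairs:]
    return 0.5 * np.mean(np.sum(diffs**2, axis=1))

counts = np.array([500, 300, 200])  # toy observed counts, n = 1000
v = bootstrap_var(counts)
# multinomial reference value: sum_x p_x (1 - p_x) / n
p = counts / counts.sum()
v_theory = np.sum(p * (1 - p)) / counts.sum()
```

For a multinomial sample the bootstrap estimate should land close to the analytic reference value, which provides a quick sanity check.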
Assuming we can always find a convex solution in the bulk of the simplex of weight parameters, $w_m \in [0, 1]$ with $\sum_{m=0}^{M-1} w_m = 1$, we can minimize the MISE, including the normalization constraint on the weights, by adding a Lagrange multiplier $\mu$ as follows:
$$\Lambda(w, \mu; n) \equiv \mathrm{MISE}(w; n) + 2\mu \left( \sum_m w_m - 1 \right). \qquad (A7)$$
The function $\Lambda$ can then be minimized as customary by computing the derivatives with respect to $w$ and $\mu$, which results in a linear system $C w = f - \mu \mathbf{1}$, where
$$C_{m,m'} \equiv \sum_x \mathbb{E}_D\!\left[ \hat{p}_x(m; n_m)[D]\; \hat{p}_x(m'; n_{m'})[D] \right], \qquad (A8)$$
$$f_m \equiv \sum_x p_x^{(m)}\, p_x^{(\mathrm{ideal})}. \qquad (A9)$$
Therefore, tuning $\mu$ in such a way as to make the weights $w_m$ properly normalized (enforcing $\partial_\mu \Lambda = 0$), we have
$$\bar{w}_m = \sum_{m'} (C^{-1})_{m,m'} \big( f_{m'} - \bar{\mu} \big), \qquad (A10)$$
$$\bar{\mu} \equiv \frac{ \sum_{m,m'} (C^{-1})_{m,m'}\, f_{m'} - 1 }{ \sum_{\tilde{m},\tilde{m}'} (C^{-1})_{\tilde{m},\tilde{m}'} }. \qquad (A11)$$
It might happen that some values of $\bar{w}_m$ are negative, so they cannot be interpreted as weights of a convex average in the bulk of the weight simplex. In that case, one can exclude the QPUs $m$ with $\bar{w}_m < 0$ and repeat the analysis on the remaining subset of QPUs. In general, the exact QPU distributions $p_x^{(m)}$ are not available, so the matrix $C$ and the vector $f$ have to be estimated from the respective counted distributions, as do $\bar{w}_m$ and $\bar{\mu}$, whose bias should possibly be removed via resampling techniques.
For the calibration stage, we consider a number $N_c > 1$ of circuits, associated with ideal distributions $p_x^{(\mathrm{ideal}, c)}$. The optimal weights can still be estimated from Equation (A10), but with the matrix $C$ and the vector $f$ obtained from a minimization of the average $\mathrm{MISE}^{(c)}$ over circuits $c$, weighted according to the relative number of shots per circuit $n^{(c)}/n_{\mathrm{tot}}$, which results in
$$C_{m,m'} \equiv \sum_{c=0}^{N_c - 1} \frac{n^{(c)}}{n_{\mathrm{tot}}}\, \mathbb{E}_{D^{(c)}}\!\left[ \sum_x \hat{p}_x(m, c; n_m)[D^{(c)}]\; \hat{p}_x(m', c; n_{m'})[D^{(c)}] \right], \qquad (A12)$$
$$f_m \equiv \sum_{c=0}^{N_c - 1} \frac{n^{(c)}}{n_{\mathrm{tot}}} \sum_x p_x^{(m, c)}\, p_x^{(\mathrm{ideal}, c)}, \qquad (A13)$$
where $D^{(c)} \equiv \cup_m D^{(m, c)}$ denotes the union of the datasets from each QPU for circuit $c$, while the number of shots per circuit is $n^{(c)} = \sum_m n^{(m, c)}$.
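The constrained linear system derived above can be solved directly: invert $C$, fix $\mu$ so the weights sum to one, and back-substitute. The following is an illustrative sketch with toy values of $C$ and $f$ (in practice both would be estimated from counts, with resampling to control bias).

```python
import numpy as np

def mise_optimal_weights(C, f):
    """Solve C w = f - mu * 1 with mu fixed so that the weights sum to one."""
    Cinv = np.linalg.inv(C)
    ones = np.ones(len(f))
    mu = (ones @ Cinv @ f - 1.0) / (ones @ Cinv @ ones)
    return Cinv @ (f - mu * ones)

C = np.array([[0.50, 0.45],
              [0.45, 0.52]])  # toy second-moment matrix
f = np.array([0.49, 0.47])    # toy overlaps with the ideal distribution
w = mise_optimal_weights(C, f)
```

If any component of `w` comes out negative, the corresponding QPU would be dropped and the system re-solved on the remaining subset, as described above.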

Appendix B. Removing the Bias in Distance Estimates

Let us consider the estimation of the Hellinger distance between a distribution $p$, sampled with $n$ shots, and a known distribution $q$. The dataset of the sampled distribution can be written as $\{y_i\}_{i=0}^{n-1}$, and the counts read $\hat{c}_x = \sum_{i=0}^{n-1} \delta_{x, y_i}$. A naive estimate of the Hellinger distance between $p$ (unknown) and $q$ (known) reads
$$\hat{d}_H \equiv d_H(\hat{p}, q) = \sqrt{ 1 - \sum_x \sqrt{ \frac{\hat{c}_x}{n}\, q_x } }. \qquad (A14)$$
However, as mentioned in Section 3, this typically overestimates the true distance, since $\mathbb{E}[d_H^2(\hat{p}, q)] \ge d_H^2(p, q)$ by Jensen's inequality, with equality holding only in boundary cases. The jackknife estimate of the Hellinger distance is defined as
$$\hat{d}_H^{\mathrm{jack}} \equiv \frac{1}{n} \sum_{i=0}^{n-1} \hat{d}_H^{(i)}, \qquad (A15)$$
where $\hat{d}_H^{(i)}$ is the naive estimate of $d_H$ computed by removing the $i$-th count from the dataset. It is straightforward to prove that, in the case of a multinomial distribution, the jackknife estimate of the Hellinger distance and its uncertainty read
$$\hat{d}_H^{\mathrm{jack}} = \frac{1}{n} \sum_y \hat{c}_y\, \hat{d}_H^{(y)}, \qquad (A16)$$
$$\Delta d_H^{\mathrm{jack}} = \sqrt{ \frac{n-1}{n} \sum_y \hat{c}_y \left( \hat{d}_H^{(y)} - \hat{d}_H^{\mathrm{jack}} \right)^{\!2} }, \qquad (A17)$$
where
$$\hat{d}_H^{(y)} \equiv \sqrt{ 1 - \sqrt{\frac{n}{n-1}} \left[ \big( 1 - \hat{d}_H^2 \big) - \left( \sqrt{\hat{c}_y} - \sqrt{\hat{c}_y - 1} \right) \sqrt{\frac{q_y}{n}} \right] }. \qquad (A18)$$
The estimate in Equation (A16) is still affected by a bias of $O(n^{-1})$, which can nevertheless be easily corrected with the following prescription [61]:
$$\hat{d}_H^{\,\mathrm{jack}\prime} = \hat{d}_H^{\mathrm{jack}} + n \big( \hat{d}_H - \hat{d}_H^{\mathrm{jack}} \big). \qquad (A19)$$
Finally, in the case of a Hellinger distance where both probability distributions are estimated through two separate datasets, the analysis proceeds as before, taking into account the independence of the extractions for the two datasets involved.
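The jackknife construction above can be sketched in a few lines, using the closed form over distinct outcomes $y$ rather than looping over all $n$ shots. The toy dataset is illustrative; comments describe the corresponding steps of the derivation.

```python
import numpy as np

def hellinger_naive(counts, q):
    """Naive plug-in estimate of the Hellinger distance."""
    n = counts.sum()
    return np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(counts / n * q))))

def hellinger_jackknife(counts, q):
    """Jackknife estimate of d_H with O(1/n) bias correction."""
    n = counts.sum()
    d = hellinger_naive(counts, q)
    mask = counts > 0                      # only observed outcomes contribute
    c, qy = counts[mask].astype(float), q[mask]
    # leave-one-out estimate for removing a single shot with outcome y
    inner = (1.0 - d**2) - (np.sqrt(c) - np.sqrt(c - 1.0)) * np.sqrt(qy / n)
    d_loo = np.sqrt(np.clip(1.0 - np.sqrt(n / (n - 1.0)) * inner, 0.0, 1.0))
    d_jack = np.sum(c * d_loo) / n         # count-weighted jackknife average
    return d_jack + n * (d - d_jack)       # bias-corrected estimate

q = np.array([0.5, 0.5, 0.0, 0.0])
counts = np.array([520, 480, 0, 0])  # toy dataset of n = 1000 shots
d_corr = hellinger_jackknife(counts, q)
```

Since the correction amplifies the gap between the naive and leave-one-out averages by a factor of $n$, in practice the resampling should be paired with an uncertainty estimate as in the text.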

References

  1. Arute, F.; Arya, K.; Babbush, R.; Bacon, D.; Bardin, J.C.; Barends, R.; Biswas, R.; Boixo, S.; Brandao, F.G.; Buell, D.A.; et al. Quantum supremacy using a programmable superconducting processor. Nature 2019, 574, 505–510. [Google Scholar] [CrossRef] [PubMed]
  2. Kim, Y.; Eddins, A.; Anand, S.; Wei, K.X.; van den Berg, E.; Rosenblatt, S.; Nayfeh, H.; Wu, Y.; Zaletel, M.; Temme, K.; et al. Evidence for the utility of quantum computing before fault tolerance. Nature 2023, 618, 500–505. [Google Scholar] [CrossRef]
  3. Zhou, X.; Li, X.; Chen, Q.; Koolstra, G.; Yang, G.; Dizdar, B.; Huang, Y.; Wang, C.S.; Han, X.; Zhang, X.; et al. Electron charge qubit with 0.1 millisecond coherence time. Nat. Phys. 2024, 20, 116–122. [Google Scholar] [CrossRef]
  4. Ladd, T.D.; Jelezko, F.; Laflamme, R.; Nakamura, Y.; Monroe, C.; O’Brien, J.L. Quantum computers. Nature 2010, 464, 45–53. [Google Scholar] [CrossRef] [PubMed]
  5. Berezutskii, A.; Liu, M.; Acharya, A.; Ellerbrock, R.; Gray, J.; Haghshenas, R.; He, Z.; Khan, A.; Kuzmin, V.; Lyakh, D.; et al. Tensor networks for quantum computing. Nat. Rev. Phys. 2025, 7, 581–593. [Google Scholar] [CrossRef]
  6. Bisicchia, G.; García-Alonso, J.; Murillo, J.M.; Brogi, A. From Quantum Software Handcrafting to Quantum Software Engineering. In Proceedings of the 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering—Companion (SANER-C), Rovaniemi, Finland, 12 March 2024; pp. 149–150. [Google Scholar] [CrossRef]
  7. Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2018, 2, 79. [Google Scholar] [CrossRef]
  8. Proctor, T.; Rudinger, K.; Young, K.; Nielsen, E.; Blume-Kohout, R. Measuring the capabilities of quantum computers. Nat. Phys. 2022, 18, 75–79. [Google Scholar] [CrossRef]
  9. Knill, E. Quantum computing with realistically noisy devices. Nature 2005, 434, 39–44. [Google Scholar] [CrossRef]
  10. Bharti, K.; Cervera-Lierta, A.; Kyaw, T.H.; Haug, T.; Alperin-Lea, S.; Anand, A.; Degroote, M.; Heimonen, H.; Kottmann, J.S.; Menke, T.; et al. Noisy intermediate-scale quantum algorithms. Rev. Mod. Phys. 2022, 94, 015004. [Google Scholar] [CrossRef]
  11. Gong, L.H.; Chen, Y.Q.; Zhou, S.; Zeng, Q.W. Dual Discriminators Quantum Generation Adversarial Network Based on Quantum Convolutional Neural Network. Adv. Quantum Technol. 2025, 8, e2500224. [Google Scholar] [CrossRef]
  12. Zhou, N.R.; Chen, Z.Y.; Liu, Y.Y.; Gong, L.H. Multi-party semi-quantum private comparison protocol of size relation with d-level GHZ states. Adv. Quantum Technol. 2025, 8, 2400530. [Google Scholar] [CrossRef]
  13. Pei, J.J.; Gong, L.H.; Qin, L.G.; Zhou, N.R. One-to-many image generation model based on parameterized quantum circuits. Digit. Signal Process. 2025, 165, 105340. [Google Scholar] [CrossRef]
  14. Andersen, C.K.; Remm, A.; Lazar, S.; Krinner, S.; Lacroix, N.; Norris, G.J.; Gabureac, M.; Eichler, C.; Wallraff, A. Repeated quantum error detection in a surface code. Nat. Phys. 2020, 16, 875–880. [Google Scholar] [CrossRef]
  15. Endo, S.; Benjamin, S.C.; Li, Y. Practical quantum error mitigation for near-future applications. Phys. Rev. X 2018, 8, 031027. [Google Scholar] [CrossRef]
  16. Lidar, D.A.; Brun, T.A. Quantum Error Correction; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
  17. Peng, T.; Harrow, A.W.; Ozols, M.; Wu, X. Simulating large quantum circuits on a small quantum computer. Phys. Rev. Lett. 2020, 125, 150504. [Google Scholar] [CrossRef]
  18. Tang, W.; Tomesh, T.; Suchara, M.; Larson, J.; Martonosi, M. Cutqc: Using small quantum computers for large quantum circuit evaluations. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual, 19–23 April 2021; pp. 473–486. [Google Scholar]
  19. Garcia-Alonso, J.; Rojo, J.; Valencia, D.; Moguel, E.; Berrocal, J.; Murillo, J.M. Quantum software as a service through a quantum API gateway. IEEE Internet Comput. 2021, 26, 34–41. [Google Scholar] [CrossRef]
  20. Salm, M.; Barzen, J.; Breitenbücher, U.; Leymann, F.; Weder, B.; Wild, K. The NISQ analyzer: Automating the selection of quantum computers for quantum algorithms. In Proceedings of the Symposium and Summer School on Service-Oriented Computing, Crete, Greece, 13–19 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 66–85. [Google Scholar]
  21. Bisicchia, G.; García-Alonso, J.; Murillo, J.M.; Brogi, A. Distributing quantum computations, by shots. In Proceedings of the International Conference on Service-Oriented Computing, Rome, Italy, 28 November–1 December 2023; pp. 363–377. [Google Scholar]
  22. Zhu, L.; Liang, S.; Yang, C.; Li, X. Optimizing shot assignment in variational quantum eigensolver measurement. J. Chem. Theory Comput. 2024, 20, 2390–2403. [Google Scholar] [CrossRef]
  23. Bisicchia, G.; García-Alonso, J.; Murillo, J.M.; Brogi, A. Dispatching shots among multiple quantum computers: An architectural proposal. In Proceedings of the 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Bellevue, WA, USA, 17–22 September 2023; Volume 2, pp. 195–198. [Google Scholar]
  24. Bisicchia, G.; Clemente, G.; Garcia-Alonso, J.; Rodríguez, J.M.M.; D’Elia, M.; Brogi, A. Distributing Quantum Computations, Shot-wise. arXiv 2024, arXiv:2411.16530. [Google Scholar] [CrossRef]
  25. Lau, J.W.Z.; Lim, K.H.; Shrotriya, H.; Kwek, L.C. NISQ computing: Where are we and where do we go? AAPPS Bull. 2022, 32, 27. [Google Scholar] [CrossRef]
  26. Shor, P.W. Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, NM, USA, 20–22 November 1994; pp. 124–134. [Google Scholar]
  27. Shor, P.W. Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A 1995, 52, R2493. [Google Scholar] [CrossRef] [PubMed]
  28. Shor, P.W. Fault-tolerant quantum computation. In Proceedings of the 37th Conference on Foundations of Computer Science, Burlington, VT, USA, 14–16 October 1996; pp. 56–65. [Google Scholar]
  29. Knill, E.; Laflamme, R. Theory of quantum error-correcting codes. Phys. Rev. A 1997, 55, 900. [Google Scholar] [CrossRef]
  30. Terhal, B.M. Quantum error correction for quantum memories. Rev. Mod. Phys. 2015, 87, 307. [Google Scholar] [CrossRef]
  31. Temme, K.; Bravyi, S.; Gambetta, J.M. Error mitigation for short-depth quantum circuits. Phys. Rev. Lett. 2017, 119, 180509. [Google Scholar] [CrossRef]
  32. Kandala, A.; Temme, K.; Córcoles, A.D.; Mezzacapo, A.; Chow, J.M.; Gambetta, J.M. Error mitigation extends the computational reach of a noisy quantum processor. Nature 2019, 567, 491–495. [Google Scholar] [CrossRef]
  33. Giurgica-Tiron, T.; Hindy, Y.; LaRose, R.; Mari, A.; Zeng, W.J. Digital zero noise extrapolation for quantum error mitigation. In Proceedings of the 2020 IEEE International Conference on Quantum Computing and Engineering (QCE), Denver, CO, USA, 12–16 October 2020; pp. 306–316. [Google Scholar]
  34. Endo, S.; Cai, Z.; Benjamin, S.C.; Yuan, X. Hybrid quantum-classical algorithms and quantum error mitigation. J. Phys. Soc. Jpn. 2021, 90, 032001. [Google Scholar] [CrossRef]
  35. Cai, Z.; Babbush, R.; Benjamin, S.C.; Endo, S.; Huggins, W.J.; Li, Y.; McClean, J.R.; O’Brien, T.E. Quantum error mitigation. Rev. Mod. Phys. 2023, 95, 045005. [Google Scholar] [CrossRef]
  36. Heckey, J.; Patil, S.; JavadiAbhari, A.; Holmes, A.; Kudrow, D.; Brown, K.R.; Franklin, D.; Chong, F.T.; Martonosi, M. Compiler management of communication and parallelism for quantum computation. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, Istanbul, Turkey, 14–18 March 2015; pp. 445–456. [Google Scholar]
  37. Tannu, S.S.; Qureshi, M. Ensemble of diverse mappings: Improving reliability of quantum computers by orchestrating dissimilar mistakes. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Columbus, OH, USA, 12–16 October 2019; pp. 253–265. [Google Scholar]
  38. Tannu, S.S.; Qureshi, M.K. Not all qubits are created equal: A case for variability-aware policies for NISQ-era quantum computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, RI, USA, 13–17 April 2019; pp. 987–999. [Google Scholar]
  39. Murali, P.; Baker, J.M.; Javadi-Abhari, A.; Chong, F.T.; Martonosi, M. Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, RI, USA, 13–17 April 2019; pp. 1015–1029. [Google Scholar]
  40. Li, G.; Ding, Y.; Xie, Y. Tackling the qubit mapping problem for NISQ-era quantum devices. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, RI, USA, 13–17 April 2019; pp. 1001–1014. [Google Scholar]
  41. Ayral, T.; Le Régent, F.M.; Saleem, Z.; Alexeev, Y.; Suchara, M. Quantum divide and compute: Hardware demonstrations and noisy simulations. In Proceedings of the 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Limassol, Cyprus, 6–8 July 2020; pp. 138–140. [Google Scholar]
  42. Ayral, T.; Régent, F.M.L.; Saleem, Z.; Alexeev, Y.; Suchara, M. Quantum divide and compute: Exploring the effect of different noise sources. SN Comput. Sci. 2021, 2, 132. [Google Scholar] [CrossRef]
  43. Perlin, M.A.; Saleem, Z.H.; Suchara, M.; Osborn, J.C. Quantum circuit cutting with maximum-likelihood tomography. npj Quantum Inf. 2021, 7, 64. [Google Scholar] [CrossRef]
  44. Lowe, A.; Medvidović, M.; Hayes, A.; O’Riordan, L.J.; Bromley, T.R.; Arrazola, J.M.; Killoran, N. Fast quantum circuit cutting with randomized measurements. Quantum 2023, 7, 934. [Google Scholar] [CrossRef]
  45. Chatterjee, T.; Das, A.; Mohtashim, S.I.; Saha, A.; Chakrabarti, A. Qurzon: A prototype for a divide and conquer-based quantum compiler for distributed quantum systems. SN Comput. Sci. 2022, 3, 323. [Google Scholar] [CrossRef]
  46. Salm, M.; Barzen, J.; Leymann, F.; Weder, B.; Wild, K. Automating the comparison of quantum compilers for quantum circuits. In Proceedings of the Symposium and Summer School on Service-Oriented Computing, Online, 13–17 September 2021; pp. 64–80. [Google Scholar]
  47. Salm, M.; Barzen, J.; Leymann, F.; Wundrack, P. How to Select Quantum Compilers and Quantum Computers Before Compilation. In Proceedings of the CLOSER, Prague, Czech Republic, 26–28 April 2023; pp. 172–183. [Google Scholar]
  48. Salm, M.; Barzen, J.; Leymann, F.; Weder, B. Prioritization of Compiled Quantum Circuits for Different Quantum Computers. In Proceedings of the IEEE SANER, Honolulu, HI, USA, 15–18 March 2022; pp. 1258–1265. [Google Scholar]
  49. Salm, M.; Barzen, J.; Leymann, F.; Wundrack, P. Optimizing the Prioritization of Compiled Quantum Circuits by Machine Learning Approaches. In Proceedings of the CCIS, Dalian, China, 17–19 October 2022; Volume 1603, pp. 161–181. [Google Scholar]
  50. Ravi, G.S.; Smith, K.N.; Murali, P.; Chong, F.T. Adaptive job and resource management for the growing quantum cloud. In Proceedings of the IEEE QCE, Broomfield, CO, USA, 17–22 October 2021; pp. 301–312. [Google Scholar]
  51. Grossi, M.; Crippa, L.; Aita, A.; Bartoli, G.; Sammarco, V.; Picca, E.; Said, N.; Tramonto, F.; Mattei, F. A serverless cloud integration for quantum computing. arXiv 2021, arXiv:2107.02007. [Google Scholar] [CrossRef]
  52. Quetschlich, N.; Burgholzer, L.; Wille, R. Predicting Good Quantum Circuit Compilation Options. arXiv 2022, arXiv:2210.08027. [Google Scholar] [CrossRef]
  53. Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
  54. Hellinger, E. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. J. Reine Angew. Math. 1909, 1909, 210–271. [Google Scholar] [CrossRef]
  55. Bhattacharyya, A. On a measure of divergence between two statistical populations defined by their probability distribution. Bull. Calcutta Math. Soc. 1943, 35, 99–110. [Google Scholar]
  56. Quetschlich, N.; Burgholzer, L.; Wille, R. MQT Bench: Benchmarking Software and Design Automation Tools for Quantum Computing. Quantum 2023, 7, 1062. [Google Scholar] [CrossRef]
  57. Bisicchia, G.; Bocci, A.; García-Alonso, J.; Murillo, J.M.; Brogi, A. Cut&Shoot: Cutting & Distributing Quantum Circuits Across Multiple NISQ Computers. In Proceedings of the 2024 IEEE International Conference on Quantum Computing and Engineering (QCE), Montreal, QC, Canada, 15–20 September 2024; Volume 2, pp. 187–192. [Google Scholar]
  58. Ohkura, Y.; Satoh, T.; Van Meter, R. Simultaneous execution of quantum circuits on current and near-future NISQ systems. IEEE Trans. Quantum Eng. 2022, 3, 2500210. [Google Scholar] [CrossRef]
  59. Das, P.; Tannu, S.S.; Nair, P.J.; Qureshi, M. A case for multi-programming quantum computers. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Columbus, OH, USA, 12–16 October 2019; pp. 291–303. [Google Scholar]
  60. Cerezo, M.; Arrasmith, A.; Babbush, R.; Benjamin, S.C.; Endo, S.; Fujii, K.; McClean, J.R.; Mitarai, K.; Yuan, X.; Cincio, L.; et al. Variational quantum algorithms. Nat. Rev. Phys. 2021, 3, 625–644. [Google Scholar] [CrossRef]
  61. Cameron, A.; Trivedi, P. Microeconometrics: Methods and Applications; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
Figure 1. Diagram of the general strategy of heterogeneous quantum computation discussed in the text. Thin arrows connect steps in sequence, while thick arrows describe dependencies (with dashed line describing optional dependency).
Figure 2. Diagram of the strategies considered in the numerical investigation of Section 4, as a specific instantiation of the general strategy depicted in Figure 1.
Figure 3. Behavior of the Mean Square Hellinger distance on a fixed set of 10 random circuits $\{U_c\}$ with $q = 5$ qubits, as a measure of unreliability for each QPU considered in this work, reported in Table 1.
Figure 4. Results of single QPU executions (left part of the panels) and of the split and merged results from the full set of available QPUs (see text for details) in terms of the Hellinger distance ($d_H$, see Equation (2)) from the ideal case. Points are slightly shifted on the horizontal axis for better readability. Solid red triangles indicate experiments with 5 qubits, and solid blue inverted triangles indicate experiments with 8 qubits.
Figure 5. Measured error for all circuits, divided by split and merge policy combinations while increasing the number of available QPUs. Each bar represents a specific combination of split and merge policies. The red bar corresponds to executing all the shots on a random QPU.
Figure 6. Measured Hellinger distance ($d_H$) from ideal, for a Grover algorithm task using different split and merge policies, and for different groups of up to 7 QPUs. The left side of each panel is the same and corresponds to the performance of each considered QPU as taken individually. Solid red triangles indicate experiments with 5 qubits, and solid blue inverted triangles indicate experiments with 8 qubits.
Figure 7. Measured Hellinger distance ($d_H$, see Equation (2)) from ideal, for a VQE task using different split and merge policies and for different groups of up to 7 QPUs. The left side of each panel is the same and corresponds to the performance of each considered QPU as taken individually. Solid red triangles indicate experiments with 5 qubits, and solid blue inverted triangles indicate experiments with 8 qubits.
Table 1. QPU emulators considered in this work, with statistical information on their unreliability: the minimum and the 25%, 50% (median), and 75% percentiles.

| QPU Name          | Min     | 25% qt  | Median (50% qt) | 75% qt |
|-------------------|---------|---------|-----------------|--------|
| ibm_kyoto         | 0.00071 | 0.0018  | 0.0029          | 0.0035 |
| ibm_brisbane      | 0.0014  | 0.0022  | 0.011           | 0.016  |
| ibm_osaka         | 0.0033  | 0.0062  | 0.0079          | 0.071  |
| ibm_sherbrooke    | 0.00044 | 0.00084 | 0.0013          | 0.0055 |
| simulator_harmony | 0.108   | 0.113   | 0.114           | 0.116  |
| simulator_aria-1  | 0.107   | 0.112   | 0.113           | 0.114  |
| simulator_forte-1 | 0.093   | 0.098   | 0.099           | 0.100  |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bisicchia, G.; Clemente, G.; Garcia-Alonso, J.; Murillo, J.M.; D’Elia, M.; Brogi, A. Distributing Quantum Computations, Shot-Wise. Future Internet 2025, 17, 507. https://doi.org/10.3390/fi17110507

