Hybrid Quantum Technologies for Quantum Support Vector Machines

Orazi, Filippo; Gasperini, Simone; Lodi, Stefano; Sartori, Claudio

doi:10.3390/info15020072

Open AccessArticle

Hybrid Quantum Technologies for Quantum Support Vector Machines

Dipartimento di Informatica, Alma Mater Studiorum—University of Bologna, 40126 Bologna, Italy

^*

Author to whom correspondence should be addressed.

Information 2024, 15(2), 72; https://doi.org/10.3390/info15020072

Submission received: 30 November 2023 / Revised: 19 January 2024 / Accepted: 21 January 2024 / Published: 25 January 2024

(This article belongs to the Special Issue Quantum Information Processing and Machine Learning)

Download

Browse Figures

Review Reports Versions Notes

Abstract

Quantum computing has rapidly gained prominence for its unprecedented computational efficiency in solving specific problems when compared to classical computing counterparts. This surge in attention is particularly pronounced in the realm of quantum machine learning (QML) following a classical trend. Here we start with a comprehensive overview of the current state-of-the-art in Quantum Support Vector Machines (QSVMs). Subsequently, we analyze the limitations inherent in both annealing and gate-based techniques. To address these identified weaknesses, we propose a novel hybrid methodology that integrates aspects of both techniques, thereby mitigating several individual drawbacks while keeping the advantages. We provide a detailed presentation of the two components of our hybrid models, accompanied by the presentation of experimental results that corroborate the efficacy of the proposed architecture. These results pave the way for a more integrated paradigm in quantum machine learning and quantum computing at large, transcending traditional compartmentalization.

Keywords:

quantum computing; quantum machine learning; quantum support vector machine; quantum annealling; gate-based quantum computation

1. Introduction

Quantum computing, as an evolving sub-discipline within computer science, amalgamates various research fields, including physics, mathematics, and computer science, among many others. The course of technological advancement has divided the field into two distinct approaches: gate-based quantum computation and adiabatic quantum computation. Although these approaches are theoretically equivalent, their practical utilization diverges significantly. Gate-based quantum computation is studied for a more general-purpose application, whereas adiabatic computation is primarily employed for quantum annealing, an optimization process aimed at determining the minimum of a cost function.

Quantum machine learning (QML) stands as a heavily studied sub-discipline within the framework of quantum computation, it being popular within both the gate-based and adiabatic paradigms. This article focuses on the Support Vector Machine (SVM) model and describes how these two divergent quantum computing approaches seek to implement it. We propose a novel approach to model this problem, combining both technologies.

Our model is based on the observation that the two quantum approaches for the Quantum Support Vector Machine (QSVM) focus on two different parts of the classical approach. Gate-based computation aims to leverage quantum properties to discover a useful kernel for the high-dimensional Hilbert space. On the other hand, the annealing model focuses on optimization that comes after computing the kernel matrix. These are two separate but complementary components of the Support Vector Machine model. The combination of these approaches, if correctly applied, provides advantages over the classical method, as well as each approach on its own.

The following sections perform an in-depth analysis of the current advancements in Quantum Support Vector Machines. It begins with an exposition on classical SVM, then delves into an investigation of the distinct approaches employed in computing QSVM.

1.1. Classical Support Vector Machines

An SVM is a supervised machine learning algorithm designed for both classification and regression tasks. It operates on a dataset

D {(x_{n}, t_{n}) : n = 0, \dots, N - 1}

, where

x_{n} \in R^{d}

represents a point in d-dimensional space, serving as a feature vector, and

t_{n}

denotes the target label assigned to

x_{n}

. We will focus on the classification task and learning a binary classifier that assigns a class label

{\hat{t}}_{n} = \pm 1

to a given data point

x_{n}

. For clarity, we designate the class

t_{n} = 1

as ’positive’ and the class

t_{n} = - 1

as ’negative’.

The training of an SVM entails solving the quadratic programming (QP) problem:

m i n i m i z e E = \frac{1}{2} \sum_{n m} α_{n} α_{m} t_{n} t_{m} k (x_{n}, x_{m}) - \sum_{n} α_{n}

(1)

with

\begin{matrix} 0 \leq α_{n} \leq C a n d \sum_{n} α_{n} t_{n} = 0 \end{matrix}

(2)

For a set of N coefficients

α_{n} \in R

, where C denotes a regularization parameter and

k (\cdot, \cdot)

represents the kernel function of the Support Vector Machine [1,2], the resulting coefficients

α_{n}

establish a

(d - 1)

-dimensional decision boundary that partitions

R^{d}

into two regions corresponding to the predicted class label. The decision boundary is defined by points associated with

α_{n} \neq 0

, commonly referred to as the support vectors of the SVM. Prediction for an arbitrary point

x \in R^{d}

can be accomplished by

f (x) = \sum_{n} α_{n} t_{n} k (x_{n}, x) + b

(3)

where b can be estimated by formula [2]:

b = \frac{\sum_{n} α_{n} (C - α_{n}) [t_{n} - \sum_{m} α_{m} t_{m} k (x_{m}, x_{n})]}{\sum_{n} α_{n} (C - α_{n})}

(4)

Geometrically, the decision function

f (x)

corresponds to the signed distance between the point x and the decision boundary. Consequently, the predicted class label

\hat{t}

for x as determined by the trained Support Vector Machine is given by

\hat{t} = sign (f (x))

.

The problem formulation can be equivalently expressed as a convex quadratic optimization problem [3], indicating its classification among the rare minimization problems in machine learning that possess a globally optimal solution. It is crucial to note that while the optimal solution exists, it is dataset-specific and may not necessarily generalize optimally across the entire data distribution.

Kernel-based SVMs exhibit exceptional versatility as they can obtain nonlinear decision boundaries denoted by

f (x) = 0

. This is achieved through the implicit mapping of feature vectors into higher-dimensional spaces [4]. Importantly, the computational complexity does not escalate with this higher dimensionality, as only the values of the kernel function

k (x_{n}, x_{m})

are involved in the problem specification. This widely recognized technique is commonly referred to as the “kernel trick” and has been extensively explored in the literature [1,2].

The selection of the kernel function significantly influences the outcomes, with radial basis function kernels (RBF) [3] generally serving as the search starting point for the right kernel of a Support Vector Machine problem. An RBF kernel is distinguished by the property that

k (x_{n}, x_{m})

can be expressed as a function of the distance

∥ x_{n} - x_{m} ∥

[1]. The Gaussian kernel, often referred as “The RBF kernel”, is the most prevalent RBF kernel and is represented as

r b f (x_{n}, x_{m}) = e^{- η | | x_{n} - x_{m} {| |}^{2}}

(5)

Here, the value of the hyperparameter

η > 0

is typically determined through a calibration procedure before the training phase [5].

The SVM are very susceptible to the choice of the hyper-parameters, like

η

or C, and different assignments can radically change the result of the optimization.

1.2. Annealer-Based Quantum Support Vector Machines

Quantum computers leverage diverse hardware approaches and technologies, with quantum annealing being one notable paradigm.

Quantum annealing commences by establishing a quantum-mechanical superposition of all possible states, followed by the system’s evolution governed by the time-dependent Schrödinger equation. The amplitudes of all states undergo continuous changes and, if the rate of change is sufficiently slow, the system remains in the ground state of the instantaneous Hamiltonian, characterizing adiabatic quantum computation [6]. In the case of insufficiently slow changes, the system may leave the ground state temporarily, while at the same time producing a higher likelihood of concluding in the ground state of the final problem Hamiltonian in a diabatic quantum computation [7,8]. Quantum annealers efficently solve problems formulated in either Quadratic Unconstrained Binary Optimization (QUBO) or Ising formulations.

Since we know how to formulate Support Vector Machines as convex quadratic optimization problems, a simple transformation is enough to achieve a QUBO formulation suitable for quantum annealing.

In a study by Willsch et al. [9], the authors introduced and explored the application of kernel-based Support Vector Machines on a DW2000Q quantum annealer [10]. From here on out, we refer to this methodology as Quantum Annealer Support Vector Machines (QaSVM). This approach offers distinct mathematical advantages, generating a spectrum of different classifiers with each iteration of the training process.

The study’s findings demonstrate that the ensemble of classifiers produced by the quantum annealer can surpass the single classifier derived from classical SVMs when addressing the same computational problem. Performance is assessed through metrics such as Area Under the Receiver Operating Characteristic curve (AUROC), Area Under the Precision–Recall curve (AUPRC) [11,12], and accuracy. This advantage is attributed to the quantum annealer’s ability to yield not only the global optimum for the training dataset but also a distribution of solutions close to optimality for the given optimization problem. The potential to combine these solutions enhances generalization to the test dataset.

Additional studies [13,14] corroborate the efficacy of QaSVMs and extend their promise to diverse problems, including multiclass classification.

1.3. Gate-Based Quantum Support Vector Machines

Gate-based quantum computers operate within the quantum circuit model, where quantum computation involves initialization of qubits to known values, the application of a sequence of quantum gates, and measurements. While this approach is broader in scope than the one outlined in Section 1.2, it presents formidable challenges in both hardware and algorithm development.

Quantum machine learning is predominantly studied on gate-based quantum computers, meaning there is more research on the QSVM model. THe quantum machine learning (QML) models exhibit diverse architectures and ansatz configurations for processing data. Even so, most models share a common initial step wherein classical data undergoes encoding in the Hilbert space, known as feature mapping. Schuld and Killoran [15] showed the equivalence of this phase to a quantum kernel and proposed one of the initial gate-based Support Vector Machine models (QgSVM). Their work outlines two approaches: firstly, they introduce the quantum kernel as a means to process classical data, transforming it into new data with different dimensionality analogous to a classical kernel, which is then utilized in classical algorithms such as SVM. Secondly, they propose a parametric circuit for classifying input in a quantum environment. While this feature map–ansatz approach has become standard for Quantum Neural Networks (QNNs), a detailed discussion exceeds the scope of this document.

Research by Maria Schuld [16] aggregates results from various sources and concludes that all supervised quantum machine learning models (excluding generative ones) operate as kernel methods.

Another noteworthy contribution to the state of the art in gate-based QSVMs is the work by Havlicek et al. [17]. They introduce two SVM classifiers optimizing classically over data obtained using a quantum kernel trick to achieve quantum advantage. Similar to previous approaches, this involves non-linearly mapping data to a quantum state:

ϕ : \tilde{x} \in Ω \overset{}{\to} | ϕ (\tilde{x}) 〉 〈 ϕ (\tilde{x}) |

(6)

The authors also highlight a crucial requirement for applying QgSVM: if the feature vector kernel

K (\tilde{x}, \tilde{z}) = | 〈 Φ (\vec{x}) | Φ (\vec{z}) 〉 |^{2}

is overly simplistic, such as generating only product states, it can be easily computed classically and loses its benefit. The advantage lies in leveraging the high dimensionality of the Hilbert space, necessitating a kernel that is impossible to simulate to surpass classical approaches.

2. Materials and Methods

In the outlined quantum computing landscape, we introduced two distinct architectures for employing Support Vector Machines in classification tasks. Both models, still in their early stages, warrant further thorough investigation and testing.

Quantum Annealer Support Vector Machines (QaSVM) demonstrate partial efficacy in addressing the problem, yet encounter challenges in precisely determining the optimal hyperplane. Additionally, QaSVM relies on a classical kernel trick to compute the

k (x_{n}, x_{m})

component of the formulation.

Conversely, Quantum gate-based Support Vector Machines (QgSVM) exhibit impressive data manipulation capabilities. However, they currently lack the readiness to handle the optimization aspect of the SVM algorithm. Present-day quantum technology does not possess the computational power necessary for intricate optimization problems. Consequently, the optimization step necessitates an approximate approach, achieved either through an ansatz (transitioning from QgSVM to a Quantum Neural Network, QNN model) or by resorting to classical optimization methods.

Our proposed approach involves a fusion of the quantum annealing and gate-based models, establishing a connection through a classical channel. The core idea is to capitalize on the strengths of each approach: annealing excels in optimization but lacks a dedicated kernel method, while the gate-based model performs well with the kernel trick but faces challenges in optimization.

Building upon our previous discussions, we recognize the ability to derive the Kernel Matrix from a quantum feature map within the gate-based architecture. Once the Kernel matrix is defined, we employ a standard procedure to reformulate the problem into a Quadratic Unconstrained Binary Optimization (QUBO) format, achieved by discretizing the continuous variables of a quantum computing problem. Subsequently, the QUBO formulation is encoded into the annealer, enabling the extraction of minima, as previously described. The illustrated process, as depicted in Figure 1, allows the attainment of results for even challenging problems, leveraging the combined advantages of both quantum technologies.

2.1. From Classical SVM to QaSVM

We proceed to illustrate the translation of the Support Vector Machine into a form suitable for solving with Quantum Annealers. The complete formulation is detailed in the work by Willsch et al. [9].

The initial challenge in this transformation arises from the fact that, by definition,

α_{n} \in R

while quantum annealers are only capable of producing discrete binary values. A straightforward resolution to this challenge involves binarizing the values of

α

as follows:

α = \sum_{k}^{K - 1} B^{k} a_{K n + k}

(7)

where

a_{K n + k} \in {0, 1}

, B represents the chosen basis and K denotes the number of variables

a_{k}

used for representing

α

. We proceed by substituting

α

with a and utilizing Equation (7) as a constraint, multiplied by a factor

ξ

:

\begin{matrix} E = & \frac{1}{2} \sum_{n m k j} a_{K n + k} a_{K m + j} B^{k + j} t_{n} t_{m} k (x_{n}, x_{m}) \end{matrix}

\begin{matrix} - \sum_{n k} B^{k} a_{K n + k} + ξ {(\sum_{n k} B^{k} a_{K n + k} t_{n})}^{2} \end{matrix}

(8)

\begin{matrix} = & \sum_{n, m = 0}^{N - 1} \sum_{k, j = 0}^{K - 1} a_{K n + k} {\tilde{Q}}_{K n + k, K m + j} a_{K m + j} \end{matrix}

(9)

where

\tilde{Q}

is a matrix of

K N \times K N

given by

{\tilde{Q}}_{K n + k, K m + j} = \frac{1}{2} B^{k + j} t_{n} t_{m} (k (x_{n}, x_{m}) + ξ) - δ_{n m} δ_{k j} B^{k}

(10)

Naturally, given that

\tilde{Q}

is symmetric, the upper triangular Quadratic Unconstrained Binary Optimization (QUBO) matrix Q we seek is defined as

Q_{i j} = {\tilde{Q}}_{i j} + {\tilde{Q}}_{j i}

for

i < j

, and

Q_{i i} = {\tilde{Q}}_{i i}

.

The last operation concludes the problem formulation so that it becomes suitable for a quantum annealer.

We do not delve into the embedding of the problem into the current available quantum annealing system for two primary reasons: Firstly, embedding is inherently tied to the architecture and could evolve with technological advancements, while the QUBO formulation remains constant. Secondly, in our experiments, we entrusted the embedding to the library’s methods since they are optimized for this operation [18].

A pivotal aspect in formulating our new model involves recognizing that in Equation (10), all components are either hyperparameters or labels, with the exception of the function

k (\cdot, \cdot)

, traditionally a classical step. In the literature, various classical functions have been chosen for k, but our proposal is to use a quantum kernel from gate-based quantum computation.

Throughout our experiments, we utilized the Advance 4.1 machine from D-Wave [19] for all annealing experiments (more detail on the architecture can be found in Appendix B).

2.2. From Classical to Quantum Kernel

As previously discussed, the Gaussian kernel is the most used radial basis function (RBF) kernel. It facilitates the computation of the similarity between each pair of points in a higher-dimensional space while circumventing explicit calculations. Once an RBF kernel is selected, the similarities between each pair of points are computed and stored in a kernel matrix. Due to the symmetry of the distance function, it naturally follows that both the kernel matrix and, subsequently,

\tilde{Q}

are symmetric as well.

In our proposal, we assume that the kernel is a Quantum Kernel. This approach allows us to leverage the exponentially higher dimension of the Hilbert space to separate the data effectively.

For the sake of simplicity, in the experiment we propose as a proof of concept, we employ the simple algorithm of the quantum inner product to compute the similarity between two vectors in the Hilbert space (Figure 2). Consider two points

x_{i}

and

x_{j}

along with their respective feature maps A and B, such that

| x_{i} 〉 = A | 0 〉

and

| x_{j} 〉 = B | 0 〉

on a M qubit register. Our objective is to derive the similarity s between the two through the quantum inner product:

\begin{matrix} B^{†} A | 0 〉 & = a_{0} | 0 〉 + \sum_{k = 1}^{2^{M} - 1} a_{k} | k 〉 \end{matrix}

(11)

\begin{matrix} a_{0} = 〈 0 | & B^{†} A | 0 〉 = 〈 x_{j} | x_{i} 〉 = s \end{matrix}

(12)

where |k〉 and

a_{k}

are, respectively, the kth standard basis vector and its coefficient. It is evident that if

x_{i} = x_{j}

, then

a_{0}

is equal to 1, while if they are orthogonal to each other, the result will be 0. This behavior describes a similarity metric in a high-dimensional space, precisely what we sought to integrate into our Quantum Support Vector Machine.

The final step to complete the QgSVM involves determining two feature maps, A and B, capable of encoding the data in the Hilbert space. To obtain the quantum inner product between vectors, we employ the same feature map technique for both A and B. Additionally, given that each vector

x_{i}

for all

i \in {0, \dots, N - 1}

possesses the same number of features

γ

, it follows that the number of qubits in every circuit used for gate-based computation depends solely on

γ

and the chosen feature map technique.

In our study, we opt for one of the widely employed feature map techniques known as the Pauli expansion circuit [17]. The Pauli expansion circuit, denoted as U, serves as a data encoding circuit that transforms input data

\vec{x} \in R^{N}

, where N represents the feature dimension, as

U_{Φ (\vec{x})} = exp (i \sum_{S \in I} ϕ_{S} (\vec{x}) \prod_{i \in S} P_{i}) .

(13)

Here, S represents a set of qubit indices describing the connections in the feature map,

I

is a set containing all these index sets, and

P_{i} \in {I, X, Y, Z}

. The default data mapping is

ϕ_{S} (\vec{x}) = {\begin{matrix} x_{i} if S = {i} \\ \prod_{j \in S} (π - x_{j}) if | S | > 1 \end{matrix} .

(14)

This technique offers various degrees of freedom, including the choice of the Pauli gate, the number of circuit repetitions, and the type of entanglement. In our implementation, we select a second-order Pauli-Z evolution circuit, a well-known instance of the Pauli expansion circuit readily available in libraries such as Qiskit [20]. A more detailed formulation will be provided in the next subsection.

Our decision to use this specific feature map aligns with prior research by Havlicek et al. [17], where they leverage a Pauli Z expansion circuit to train a Quantum gate-based Support Vector Machine (QgSVM). It is essential to note that this choice is arbitrary and the feature map can be substituted with a variety of circuits. One important constraint applied to the feature map circuits is that, in a real-world implementation, they must encode data in a manner not easily simulated by classical computers. This requirement is necessary, though not sufficient, for achieving a quantum advantage.

Given that the focus of this paper is on proposing a new methodology rather than experimentally proving quantum advantage, we opted to simulate the quantum gate-based environment on classical hardware during the experimental phase.

2.3. Feature Map

As previously discussed, the feature map we chose for our experiments is a variation of the Pauli expansion gate known as the ZZ feature map. We employ a linearly entangled, single repetition, N-qubit ZZ feature map, illustrated in Figure 3. The rotation gates in this context involve rotations around the Z-axis:

R_{Z} (θ) = (\begin{matrix} e^{- i θ} & 0 \\ 0 & e^{i θ} \end{matrix})

(15)

The function

ϕ (\cdot)

is the one described in Equation (14) that assumes the default forms depending on how many parameters are passed:

\begin{matrix} ϕ (x) = x \\ ϕ (x_{0}, x_{1}) = (π - x_{0}) (π - x_{1}) \end{matrix}

(16)

Once the kernel is constructed, it can generates the similarity score between any pair of data points. We utilize this kernel to compile a kernel matrix K, where

K_{n, m} = k (x_{n}, x_{m})

, similar to the classical SVM approach. As mentioned earlier, the matrix is symmetric over the main diagonal since

k (x_{n}, x_{m}) = k (x_{m}, x_{n})

.

It is crucial to highlight the computational bottleneck in this methodology: obtaining the similarity score involves repeated executions of the circuit to derive the probability. Once we have the probability

p_{0}

of the state

{| 0 〉}^{\otimes n}

, we can approximate s to a value

\hat{s}

, which, for our purposes, serves as a valid proxy for s. Our proposal addresses this problem by generating an ensemble of models that compute smaller K. This approach is a standard practice in classical SVM, as documented in [21]. It has been demonstrated to enhance generalization and decrease the computational cost in the kernel part of the algorithm.

2.4. Experiments

Our experiments serve as a demonstration of the viability of the proposed methodology. All experiments involve binary classification problems conducted on pairs of classes from the MNIST dataset [22], following preprocessing. Due to the limited capabilities of current quantum computers and simulations, and to maintain simplicity for demonstration purposes, we applied Principal Component Analysis (PCA) [23] to each image, considering only the first two principal components as input to the model (Table 1 reports the explained variance). The resulting values are then mapped into the range

x_{i} \in [0, π / 2]

to fully exploit the properties of the Quantum gate-based Support Vector Machine (QgSVM). Simultaneously, the labels are adapted to the Quantum Annealer-based Support Vector Machine (QaSVM) standard, where

y_{i} \in {- 1, 1}

.

We present four different experiments utilizing class pairs 0–9, 3–8, 4–7 and 5–6; referred to as MN09, MN38, MN47, and MN56, respectively. In each experiment, we selected 500 data points with balanced classes and divided them into a training set (300 data points) and a test set (200 data points), each with two features (referred to as

γ

).

The experiments are divided into three phases: hyperparameter tuning, training, and testing.

Hyperparameter Tuning: Initially, a 4-fold Monte Carlo cross-validation is performed on the training set for hyperparameter tuning. The optimized hyperparameters include K, B, and

ξ

from Equation (8). While an exhaustive search on the hyperparameter space is beyond the scope of this work, additional information and results on the effect of hyperparameter tuning can be found in [9]. To assess the performance of each model, we compute the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision–Recall curve (AUPCR), as shown in Figure 4.

Training: Once the optimal hyperparameters are determined, we instantiate the best model for each dataset and train it on the entire training set. Following the original proposal by Willsch et al. [9], and to address the challenge of limited connectivity in the annealing hardware, the entire training set is divided into six small, disjoint subsets (referred to as folds), each containing approximately 50 samples. The approach involves constructing an ensemble of quantum weak Support Vector Machines, with each classifier trained on one of the subsets.

This process unfolds in two steps. First, for each fold, the top twenty solutions (qSVM

(B, K, ξ) # i

where

i \in {1, \dots, 20}

is the index of the best solutions) obtained from the annealer are combined by averaging their respective decision functions. Since the decision function is linear in the coefficients, and the bias for each fold l, denoted as

b^{(l, i)}

, is computed from

α_{n}^{(l, i)}

using Equation (4), this step effectively produces a single classifier with an effective set of coefficients given by

α_{n}^{(l)} = \sum_{i} α_{n}^{(l, i)} / 20

and bias given by

b^{l} = \sum_{i} b^{(l, i)} / 20

Second, an average is taken over the six subsets. It is important to note that the data points

{(x_{n}, y_{n})}^{l}

are unique for each l. The complete decision function is expressed as

F (x) = \frac{1}{L} \sum_{n l} α_{n}^{(l)} y_{n}^{(l)} k (x_{n}^{(l)}, x) + b

(17)

where L is the number of folds and

b = \sum_{l} b^{(l)} / L

. Similar to the formulation showed before, the decision for the class label of a point

x

is determined by

\tilde{t} = sign (F (x))

.

Testing: After the training concludes, we test the models on the 200 data points of the test set. Similar to the training phase, a total of 6 weak SVMs, each resulting from the combination of the 20 annealer’s solutions, are utilized to determine the class of the data point.

A visual representation of the process can be found in Figure 5, while in Appendix C a toy example with the visualization of the kernel and QUBO matrix can be found.

3. Results

In this section, we present the results obtained from the new methodology. As discussed earlier, these experiments should be viewed as a proof of concept and as evidence of the correctness of the methodology, rather than an exhaustive search for the best possible model. We searched over a narrow hyperparameter space to obtain a model slightly superior to the most naive approach. For each dataset, we performed hyperparameters tuning in the space

B = {2, 10}

,

K = {2, 3}

, and

ξ = {1, 2}

, where B is the basis used to represent

α

(Equation (7)), K is the number of

a_{k}

used to represent

α

(Equation (7)), and

ξ

is the coefficient used when adding the constraint in Equation (8). The results of the tuning phase are reported in Appendix A, while the best hyperparameter for each model is shown in Table 2.

It is noteworthy to emphasize that the choice of hyperparameters can significantly impact the results, even with a relatively shallow search as the one we performed. The effect is particularly evident for the dataset MN38 in Table A2, where the standard deviation between the data of every validation column is close to 0.1, while the difference between the maximum and minimum scores is more than 0.22 points in each validation column.

Once we have selected the best hyperparameters for each model, we proceed training six weak classifiers, each composed of 20 solutions returned by the quantum annealer. The combination of these six can be regarded as one model trained on the entire dataset. Finally, the model is tested on the test set. The results of each model are reported in Table 3, while Figure 6 illustrates the predictions on the test set for each dataset, highlighting the errors.

The results are consistently positive, showcasing the models’ high accuracy in predicting the classes of the datasets. Some errors can be attributed to the limited information contained in only two features, even though these features are the most expressive in Principal Component Analysis.

As hypothesized during formulation, the experiments provide evidence of the effectiveness of this approach and highlight the synergistic power of the two quantum technologies when employed together.

4. Discussion

The results obtained from our Hybrid Quantum Support Vector Machine (Hybrid QSVM) showcase great promise across various aspects. Remarkably, we achieved favorable outcomes using only two features derived through Principal Component Analysis (PCA) while harnessing both quantum gate-based technology (simulated) and quantum annealing (real hardware).

As previously mentioned, our experimental focus was not centered on seeking the optimal model or comparing its performance against classical approaches. Instead, our primary goal was to experimentally validate the viability of our proposed approach. We deliberately invested minimal effort in performance optimization, aside from a modest hyperparameter tuning step that surpassed the naive approach. Consequently, these results serve as a baseline for potential future enhancements to the model.

During the tuning phase, we concentrated solely on hyperparameters related to the annealing component of the algorithm (specifically, B, K, and

ξ

), without delving into the gate-based part. This decision is motivated by three main factors.

Firstly, the existing literature on Quantum gate-based Support Vector Machines (QgSVM) is more developed compared to Quantum annealing Support Vector Machines (QaSVM), making exploration of the latter more pertinent from a research standpoint.

Secondly, we had access to real annealing hardware through the D-Wave Leap program [24]. Although access to real gate-based hardware is possible through the IBM Quantum Initiative, leveraging both technologies concurrently was impractical due to extended wait times in the hardware queues. Moreover, given our constraint of limiting each data point to two features, a two-qubit gate-based quantum hardware was sufficient for simulating the quantum kernel, further simplifying the process.

Third, and directly linked to the point above, we considered that an easy-to-simulate quantum kernel cannot surpass classical performance. This means that one of the reasons why the QgSVM in our experiment performed sub-optimally is because we could not access hard-to-simulate quantum hardware. For this reason, optimizing its hyperparameters was deemed futile. The same applies to the circuit design, we decided to keep it fixed for the whole experimental phase, since any 2-qubit circuit (with depth within reason) can be simulated.

It is essential to clarify that the three reasons outlined above pertain to the experimental setup and not the specific model’s performance on the datasets. While tuning different hyperparameters or exploring more values of the already optimized ones might enhance performance on specific datasets, it deviates from the primary objective of this paper.

When talking about the goal of this proposal and the advantages it brings, we need to highlight that the quantum annealer is on the verge of addressing industrial optimization challenges [24]. Simultaneously, the inherent variational nature of the QgSVM positions it favorably for the Noisy Intermediate-scale Quantum (NISQ) era. The convergence of these two approaches is nearing completion, offering several potential benefits. QgSVM can leverage the exponentially large Hilbert space

H

to more effectively capture similarities and distances between points than classical kernel tricks. This allows it to distinguish between points that are traditionally considered challenging. Moreover, the quantum annealer’s speed surpasses classical optimization methods, providing a solution quicker.

Recent studies on Quantum gate-based Support Vector Machines (QgSVM) have highlighted certain theoretical limitations under specific assumptions. In [25], the authors connect the exponential dimension of the feature space to a limitation in the model’s ability to generalize. One proposed solution, discussed both theoretically [26] and empirically [27], involves introducing a new hyperparameter to regulate the bandwidth.

We argue that our proposal can further enhance the generalization ability of QgSVM. Building on the findings from the original QaSVM study [9], we know that Quantum annealing Support Vector Machines can outperform the single classifier obtained by classical SVM optimization in the same computational problem, and this can be directly translated to an improvement on the QgSVM since its optimization is done classically. This improvement stems from the ensembling technique used and the intrinsic ability of the annealer to generate a distribution of solutions close to optimal, thereby enhancing generalization (similar results can be found in [28,29] for different annealing machine learning models). Moreover, since our proposal is not tied to a specific implementation, it can employ various strategies to mitigate the limitations of QgSVM like the one discussed above.

Future work in this direction could involve a theoretical analysis of these features.

5. Conclusions

In the realm of quantum computing, the conventional approach involves selecting one technology and working within its framework, often neglecting the potential synergies that could arise from mixing different quantum technologies to harness their respective strengths. In this paper, we propose a novel version of Quantum Support Vector Machines (QSVM) that capitalizes on the advantages of both gate-based quantum computation and quantum annealing.

Our proposal positions itself within the context of Quantum Support Vector Machines (QSVM) from both technological points of view improving on both techniques. We enhance the generalization ability of standard QgSVM through the ensemble technique and leverage the intrinsic capability of Quantum annealing Support Vector Machines (QaSVM) to generate a distribution of suboptimal solutions during optimization within constant annealing time.

The ensemble method, similarly to its classical counterpart, not only enhances generalization but also reduces the computation time of the kernel matrix and the accesses to the quantum computer. For a dataset X with n samples and

γ

features, the full kernel matrix is composed of

n^{2} / 2

elements, implying

O (s n^{2})

calls to the (gate-based) quantum computer where s is the number of shots needed to accuratly compute the similarity score. Among other important factors [30], s is strongly linked to the dimensionality of the Hilbert space [26] that in turn depends on the kernel implementation and

γ

. Our ensemble of m Kernels quadratically reduces the needed number of accesses to

O (s {(n / m)}^{2})

while only needing

O (m)

accesses to the quantum annealer. In this context, m becomes a new hyperparameter and its value is highly dependant on the specific problem and the implementation in question.

To conclude, we can summarize our contribution in three points.

First, we introduce a method that combines quantum technologies, leveraging the complementarity between gate-based quantum computation and quantum annealing in the context of Support Vector Machines. While previous research has touched on the integration of these approaches for solving large-size Ising problems [31], to the best of our knowledge, our work is the first to explore this fusion in the context of classification problems.

Second, we provide experimental validation of our approach. Notably, the annealing component of our experiments is conducted on real quantum hardware, demonstrating the feasibility of our methodology in the early era of Noisy Intermediate-Scale Quantum (NISQ) computation.

Third, we establish a baseline result for future hybrid technology approaches, particularly on one of the most widely used datasets in classical machine learning.

Future work in this area is straightforward. As quantum technologies advance, our approach should undergo testing on quantum hardware to validate its capabilities. Additionally, there is a need for more extensive hyperparameter tuning to achieve optimal results. We recommend exploring and proposing different quantum kernel methods based on the characteristics of the data to further enhance performance. Simultaneously, a rigorous proof of the generalization power of our model is essential to prove the positive interaction of the technologies in a more rigorous manner.

It is crucial to note that the implementation of our methodology is not rigidly tied to the specific formulations described here. Rather, it adapts to the state of the art for each technology. The core of our proposal lies in the fusion of approaches, allowing for flexibility as advancements occur in individual technologies.

Quantum machine learning (QML) currently stands as a focal point of research, offering the potential to revolutionize various applications through the utilization of quantum computational power and innovative algorithmic models like Variational Algorithms. Similar to any scientific field, as it expands, new limitations come to light. However, simultaneously, researchers develop techniques to overcome and mitigate these restrictions. Despite the increasing attention, the field is still in its early stages, necessitating further exploration to unveil its practical benefits.

Author Contributions

Conceptualization, F.O.; Formal analysis, F.O. and S.G.; Investigation, F.O. and S.G.; Methodology, F.O.; Software, F.O. and S.G.; Supervision, S.L. and C.S.; Validation, F.O.; Writing—original draft, F.O.; Writing—review and editing, F.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data are publicly available. The dataset used for this study is the well-known MNIST. The code needed to reproduce the experiment is contained in the pubblic repository at https://github.com/filorazi/Hybrid_Quantum_Technologies_For_Quantum_Support_Vector_Machines/, (accessed on 30 November 2023).

Acknowledgments

We want to acknowledge the great help received by the people at the Julich research center. In particular thanks to Dennis Willsch and Gabriele Cavallaro that allowed the use of their code for our experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

QUBO	Quadratic Uncontraint Binary Optimization
SVM	Support Vector Machine
NISQ	Noisy Intermediate-Scale Quantum
QaSVM	Quantum annealing Support Vector Machine
QgSVM	Quatum gate-based Support Vector Machine

Appendix A. Hyperparameter Tuning

This appendix contains the hyperparameter tuning tables for all datasets Table A1, Table A2, Table A3 and Table A4. Each table contains all combination of the three hyperparameters and for each combination presents the results on six different metrics: Accuracy on training set, Area under ROC on training set, Area under PCR on training set, Accuracy on validation set, Area under ROC on validation set, and Area under PCR on validation set. When choosing the best hyperparameters, we consider the AUPCR metric on the validation set.

Table A1. Results of hyperparameter tuning for the MN09 dataset. The columns B, K, and

ξ