Support Vector Machines with Quantum State Discrimination

We analyze possible connections between quantum-inspired classifications and support vector machines. Quantum state discrimination and optimal quantum measurement are useful tools for classification problems. In order to use these tools, feature vectors have to be encoded in quantum states represented by density operators. Classification algorithms inspired by quantum state discrimination and implemented on classical computers have recently been proposed. We focus on the implementation of a known quantum-inspired classifier based on Helstrom state discrimination, showing its connection with support vector machines and how to make the classification more efficient in terms of space and time by acting on the quantum encoding. In some cases, traditional methods provide better results. Moreover, we discuss quantum-inspired nearest mean classification.


Introduction
Support vector machines are becoming popular in a wide variety of applications [1]. They are supervised learning models with associated algorithms (such as sub-gradient descent and coordinate descent) that analyze data for classification [2]. A support vector machine (SVM, for short) learns from examples to assign labels to feature vectors. Through a kernel function, an object described by a feature vector is treated as a point in a larger space. The goal is to find the maximum-margin separating hyperplane that partitions this space and divides the points into the classes to which labels are assigned. The logic behind the kernel function of an SVM, and of kernel methods in general, turns out to be rather similar to what happens in quantum computing when classical data are encoded into quantum states. In fact, quantum computing provides implicit computations in high-dimensional Hilbert spaces by means of the physical manipulation of quantum systems. In the same way, kernel methods provide implicit computations in a higher-dimensional feature space by means of an efficient representation of the inputs. The interpretation of quantum encodings as feature maps relevant to quantum machine learning is well established [3,4], and it is one of the crucial points of this work. In addition to this general connection between kernel methods and quantum computing, there are explicitly quantum approaches to SVM, namely implementations of this model on quantum computers. For instance, in [5], the authors propose a discretized SVM whose training is performed by applying the Grover algorithm. The celebrated quantum SVM proposed by Rebentrost, Mohseni and Lloyd [6] is based on data retrieval from a quantum random access memory, the quantum phase estimation algorithm and the SWAP test. The resulting quantum algorithm allows a direct implementation of polynomial kernels in terms of tensor products of the quantum states encoding the training vectors.
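The following Python sketch illustrates the last point. It uses only the standard library, and all names are illustrative. The sketch compares an explicit tensor-product feature map $x \mapsto x^{\otimes k}$ with the homogeneous polynomial kernel $(x \cdot y)^k$. The two inner products coincide. This identity is what lets a kernel method, or a tensor product of encoded quantum states, work in a feature space of dimension $n^k$ without building that space explicitly.

```python
from itertools import product


def tensor(u, v):
    """Flattened tensor (Kronecker) product of two real vectors."""
    return [a * b for a, b in product(u, v)]


def dot(u, v):
    """Standard Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def explicit_feature(x, k):
    """Explicit feature map x -> x ⊗ x ⊗ ... ⊗ x with k factors (dimension len(x)**k)."""
    phi = [1.0]
    for _ in range(k):
        phi = tensor(phi, x)
    return phi


def polynomial_kernel(x, y, k):
    """Homogeneous polynomial kernel, evaluated implicitly in the input space."""
    return dot(x, y) ** k


x = [0.5, -1.0, 2.0]
y = [1.5, 0.25, -0.5]
for k in (1, 2, 3):
    explicit = dot(explicit_feature(x, k), explicit_feature(y, k))
    print(k, len(explicit_feature(x, k)), explicit, polynomial_kernel(x, y, k))
```

The explicit feature vector has `len(x)**k` entries, so its cost grows exponentially with k, whereas the kernel evaluation stays linear in `len(x)`.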
Furthermore, a quantum implementation of SVM on a quantum annealer has been recently proposed [7].
Quantum structures can be used to devise novel machine learning algorithms that do not require quantum hardware, in the sense that the mathematical formalism of quantum mechanics is applied to data that are managed by classical computers. So-called quantum-inspired machine learning is based on particular kinds of information storage and processing, defined by means of objects from the quantum formalism that do not necessarily represent physical quantum systems. In the context of quantum-inspired machine learning, SVMs have been studied in [8], and the present work focuses on a quantum-inspired classification algorithm that turns out to be similar to an SVM.
The general idea of classification algorithms based on the discrimination of quantum states is also supported by recent experimental works on quantum state classification based on classical machine learning methods, such as the proposals in [9–13]. In [14], the authors demonstrate a machine learning approach to constructing a classifier of quantum states by training a neural network. In [15], convolutional neural networks and principal component analysis are applied to classify polarization patterns in quantum optics.
An interesting quantum-inspired binary classification algorithm has been introduced in [16] in terms of a nearest mean classifier based on the trace distance between density operators encoding feature vectors. Several techniques are available to handle multi-class problems with this binary classifier:

- one-against-one, which constructs a classifier for each pair of classes;
- one-against-all, which builds one classifier per class;
- hierarchical classification, which creates a tree whose leaves correspond to the classes.

Another quantum-inspired supervised machine learning algorithm for multi-class classification has been proposed in [17]. It is based on the so-called pretty good measurement and generalizes the Helstrom quantum state discrimination [18], which can be used for binary classification. The classification accuracy of this quantum-inspired multi-class classifier can be improved by increasing the number of copies of the quantum state that encodes the feature vector, at the cost of increasing the computational space and time.
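To make the one-against-one strategy concrete, here is a minimal Python sketch that turns any binary classifier into a multi-class one by pairwise majority voting. The binary classifier used here, `nearest_mean_binary`, is a hypothetical classical stand-in. In principle, a quantum-inspired binary classifier such as the one of [16] could be plugged in through the same `make_binary` parameter. That interface is an assumption of this sketch, not one taken from the cited works.

```python
from collections import Counter
from itertools import combinations


def mean(points):
    """Componentwise mean of a list of equal-length vectors."""
    return [sum(c) / len(points) for c in zip(*points)]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_mean_binary(class_a, class_b):
    """Hypothetical stand-in binary classifier: returns 0 for class_a, 1 for class_b."""
    ma, mb = mean(class_a), mean(class_b)

    def classify(x):
        return 0 if sq_dist(x, ma) <= sq_dist(x, mb) else 1

    return classify


def one_against_one(data_by_label, make_binary):
    """Train one binary classifier per pair of labels; predict by majority vote."""
    voters = [
        (la, lb, make_binary(data_by_label[la], data_by_label[lb]))
        for la, lb in combinations(sorted(data_by_label), 2)
    ]

    def predict(x):
        votes = Counter((la, lb)[clf(x)] for la, lb, clf in voters)
        # Ties are broken in favour of the smallest label.
        return max(sorted(votes), key=votes.get)

    return predict


data = {
    "a": [[0.0, 0.0], [0.0, 1.0]],
    "b": [[5.0, 5.0], [5.0, 6.0]],
    "c": [[0.0, 10.0], [1.0, 10.0]],
}
predict = one_against_one(data, nearest_mean_binary)
print([predict(p) for p in ([0.2, 0.3], [5.0, 5.2], [0.5, 9.0])])
```

One-against-all would instead train one classifier per label against the union of all other classes. That changes the training sets, but the wrapper structure stays the same.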
In this paper, we analyze possible connections between support vector machines and quantum-inspired classifications using a geometric approach. In particular, considering a quantum encoding of classical data in terms of Bloch vectors of density operators, we observe that the execution of the Helstrom classifier is analogous to an SVM with a linear kernel. In Section 2, we give a short introduction to some quantum fundamentals that are relevant in the present work, such as the Bloch representation of quantum states, and we review the application of Helstrom state discrimination to binary classification. In Section 3, we analyze quantum state discrimination for binary classification with data encoded into Bloch vectors. Some empirical results in this regard and the Mathematica code are presented in Appendix A. In particular, we highlight the SVM-like behavior of the Helstrom classification for a two-feature dataset, considering both the encoding in a two-dimensional Hilbert space and an encoding into a space of enlarged dimension. In Section 4, we discuss the general strategy of implementing a nearest mean classifier based on an operator distance between quantum states, such as the trace distance and the Bures distance. In Section 5, we present some numerical results obtained by the implementation of the Helstrom classifier. In Section 6, we draw some final comments.

Basics
The set of density matrices on the (finite-dimensional) Hilbert space H is given by $S(H) = \{\rho \in B^+(H) : \mathrm{tr}\,\rho = 1\}$, where $B^+(H)$ is the set of positive semidefinite operators on H. The set S(H) is convex and its extreme elements, the pure states, are rank-1 orthogonal projectors. A pure state has the general form $\rho = |\psi\rangle\langle\psi|$, and it can then be directly identified with the unit vector $|\psi\rangle \in H$ up to a phase factor.
The bases of the real space of Hermitian matrices on $\mathbb{C}^d$ can be used to decompose density matrices associated with states of a quantum system described in a d-dimensional Hilbert space. A fundamental basis for qubits (dim H = 2) is formed by the three Pauli matrices and the 2 × 2 identity matrix. In this case, any density matrix can be represented by a three-dimensional vector, the Bloch vector, which lies within the unit ball in $\mathbb{R}^3$ whose boundary is the Bloch sphere. The points on the spherical surface are in bijective correspondence with the pure states. In higher dimensions, the set of quantum states is a convex body with a much more complicated geometry, and it is no longer simply represented by a unit ball. In general, for any j, k, l such that $1 \le j \le d^2 - 1$ and $0 \le k < l \le d - 1$, the generalized Pauli matrices $\sigma_j$ on $\mathbb{C}^d$ can be defined as follows:

$$\sigma_j \in \left\{ |k\rangle\langle l| + |l\rangle\langle k|,\;\; -i\big(|k\rangle\langle l| - |l\rangle\langle k|\big),\;\; \sqrt{\tfrac{2}{l(l+1)}}\Big(\sum_{m=0}^{l-1} |m\rangle\langle m| - l\,|l\rangle\langle l|\Big) \right\}, \qquad \rho = \frac{1}{d}\, I_d + \sqrt{\frac{d-1}{2d}} \sum_{j=1}^{d^2-1} b_j\, \sigma_j, \tag{1}$$

where the real coefficients $b = (b_1, \dots, b_{d^2-1})$ represent the Bloch vector associated with ρ with respect to the basis $\{I_d, \sigma_j : 1 \le j \le d^2 - 1\}$, which lies within the hypersphere of radius 1. For d > 2, unlike the case of a single qubit, the points contained in the unit hypersphere in $\mathbb{R}^{d^2-1}$ are not in bijective correspondence with quantum states on $\mathbb{C}^d$. However, any vector within the closed ball of radius $1/(d-1)$ gives rise to a density operator. From the physical viewpoint, the components of the Bloch vector are real numbers that can be expressed as expectation values of measurable quantities. For d = 3, the generalized Pauli matrices are the Gell-Mann matrices, and the Bloch vector can be expressed in terms of expectation values of spin-1 operators.
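As a concrete illustration, the following Python sketch builds the generalized Pauli matrices and extracts Bloch vectors. It assumes the normalization $\rho = I_d/d + \sqrt{(d-1)/(2d)}\,\sum_j b_j \sigma_j$, under which pure states lie on the unit hypersphere; other normalizations only rescale b. The function names are illustrative and are not those of the Mathematica code in the appendices.

```python
import math
from itertools import combinations


def gell_mann(d):
    """The d*d - 1 generalized Pauli (Gell-Mann) matrices on C^d, as nested lists.

    Ordering: for each pair k < l the symmetric then the antisymmetric matrix,
    followed by the d - 1 diagonal matrices.
    """
    def zero():
        return [[0j] * d for _ in range(d)]

    mats = []
    for k, l in combinations(range(d), 2):
        sym, asym = zero(), zero()
        sym[k][l] = sym[l][k] = 1 + 0j          # |k><l| + |l><k|
        asym[k][l], asym[l][k] = -1j, 1j        # -i(|k><l| - |l><k|)
        mats += [sym, asym]
    for l in range(1, d):
        diag = zero()
        c = math.sqrt(2 / (l * (l + 1)))
        for m in range(l):
            diag[m][m] = c + 0j
        diag[l][l] = -l * c + 0j
        mats.append(diag)
    return mats


def trace_of_product(a, b):
    """tr(AB) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def bloch_vector(rho):
    """Bloch vector, assuming rho = I/d + sqrt((d-1)/(2d)) * sum_j b_j sigma_j."""
    d = len(rho)
    scale = math.sqrt(d / (2 * (d - 1)))
    return [scale * trace_of_product(rho, s).real for s in gell_mann(d)]


def projector(psi):
    """|psi><psi| for a normalized vector psi."""
    return [[complex(a) * complex(b).conjugate() for b in psi] for a in psi]


psi = [1 / 3, 2j / 3, -2 / 3]                    # a unit vector of C^3
b = bloch_vector(projector(psi))
print(len(b), math.sqrt(sum(c * c for c in b)))  # 8 components; norm 1 up to rounding
```

Under this normalization, $\mathrm{tr}(\rho^2) = 1/d + \frac{d-1}{d}\,|b|^2$. Hence $|b| = 1$ exactly for pure states, and $b = 0$ for the maximally mixed state $I_d/d$.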
A complex vector can be encoded into a density matrix. For instance, a quantum encoding (the amplitude encoding) is given by the map (2). This map sends a vector $x \in \mathbb{C}^n$ to a unit vector $|x\rangle \in H$, where $\{|i\rangle\}_{i=0,\dots,n}$ is the computational basis of the (n + 1)-dimensional Hilbert space H, identified with the standard basis of $\mathbb{C}^{n+1}$. The map defined in (2) encodes x into the pure state $\rho_x = |x\rangle\langle x|$; the additional component of $|x\rangle$ stores the norm of x. Generally speaking, a quantum encoding is any procedure that encodes classical information (e.g., a list of symbols) into quantum states. In this paper, we consider encodings of vectors in $\mathbb{C}^n$ and $\mathbb{R}^n$ into density matrices on a Hilbert space H whose dimension depends on n.
In [17], a quantum-inspired classification algorithm is proposed based on a generalization of the Helstrom measurement for quantum state discrimination, the so-called pretty good measurement. Let us focus on the case of binary classification of n-dimensional complex feature vectors. The algorithm is based on the following three ingredients: (1) a quantum encoding of the feature vectors $\mathbb{C}^n \ni x \mapsto \rho_x \in S(H)$; (2) the construction of the quantum centroids of the two classes $C_1$ and $C_2$ of training points:

$$\rho_i = \frac{1}{|C_i|} \sum_{x \in C_i} \rho_x, \qquad i = 1, 2; \tag{3}$$

(3) the application of Helstrom discrimination to the two quantum centroids in order to assign a label to a new data instance. Let us briefly introduce the notion of quantum state discrimination, which is central to the present work. Consider a set of arbitrary quantum states with respective a priori probabilities $R = \{(\rho_1, p_1), \dots, (\rho_N, p_N)\}$. In general, there is no measurement process that discriminates these states without errors. More formally, there does not exist a collection of effects $E = \{E_i\}_{i=1,\dots,N} \subset B^+(H)$ with $\sum_{i=1}^N E_i = I$ satisfying the following property: $\mathrm{tr}(E_i \rho_j) = 0$ whenever $i \neq j$, for all $i, j = 1, \dots, N$. In some particular cases, the states can be exactly discriminated. For example, a set of orthogonal pure states $\{|\psi_1\rangle, \dots, |\psi_N\rangle\}$ can be discriminated without errors by means of the corresponding von Neumann measurement $\{|\psi_i\rangle\langle\psi_i|\}_{i=1,\dots,N}$. Returning to the general set R, the probability of a successful state discrimination performing the measurement E is:

$$P_E = \sum_{i=1}^{N} p_i\, \mathrm{tr}(E_i \rho_i). \tag{4}$$

An interesting and useful task is finding the optimal measurement that maximizes the probability (4). In [18], the author presents a complete characterization of the optimal measurement $E_{\mathrm{opt}}$ for $R = \{(\rho_1, p_1), (\rho_2, p_2)\}$. $E_{\mathrm{opt}}$ can be constructed as follows. Let $\Lambda := p_1\rho_1 - p_2\rho_2$ be the Helstrom observable, whose positive and negative eigenvalues are collected in the sets $D_+$ and $D_-$, respectively.
Consider the two orthogonal projectors:

$$P_\pm := \sum_{\lambda \in D_\pm} P_\lambda, \tag{5}$$

where $P_\lambda$ projects onto the eigenspace of λ. The measurement $E_{\mathrm{opt}} := \{P_+, P_-\}$ maximizes the probability (4), which attains the Helstrom bound $h_b(\rho_1, \rho_2) = p_1\, \mathrm{tr}(P_+\rho_1) + p_2\, \mathrm{tr}(P_-\rho_2)$. Helstrom quantum state discrimination can be used to implement a binary classifier [17]. Let $\{(x_1, y_1), \dots, (x_M, y_M)\}$ be a training set with $y_i \in \{1, 2\}$ for all $i = 1, \dots, M$. Once a quantum encoding $\mathbb{C}^n \ni x \mapsto \rho_x \in S(H)$ has been selected, one can construct, as in (3), the quantum centroids $\rho_1$ and $\rho_2$ of the two classes $C_{1,2} = \{x_i : y_i = 1, 2\}$. Let $\{P_+, P_-\}$ be the Helstrom measurement defined by the set $R = \{(\rho_1, p_1), (\rho_2, p_2)\}$, where the probabilities attached to the centroids are $p_{1,2} = |C_{1,2}|/(|C_1| + |C_2|)$. The Helstrom classifier applies the optimal measurement for the discrimination of the two quantum centroids to assign the label y to a new data instance x, encoded into the state $\rho_x$, as follows:

$$y = \begin{cases} 1 & \text{if } \mathrm{tr}(P_+\rho_x) \ge \mathrm{tr}(P_-\rho_x), \\ 2 & \text{otherwise.} \end{cases} \tag{6}$$

One strategy to increase the classification accuracy is to take the tensor product of k copies of the quantum centroids, $\rho_{1,2}^{\otimes k}$, which enlarges the Hilbert space where data are encoded; the test point is then encoded as $\rho_x^{\otimes k}$. The corresponding Helstrom measurement is obtained as above from the spectral decomposition of $p_1\rho_1^{\otimes k} - p_2\rho_2^{\otimes k}$, and the Helstrom bound satisfies [17]:

$$h_b\big(\rho_1^{\otimes k}, \rho_2^{\otimes k}\big) \ge h_b\big(\rho_1^{\otimes (k-1)}, \rho_2^{\otimes (k-1)}\big). \tag{7}$$

Enlarging the Hilbert space of the quantum encoding increases the Helstrom bound and yields a more accurate classifier. The computational cost is evident. However, in the next section we observe that, in the case of real input vectors, the space can be enlarged while saving time and space by means of the encoding into Bloch vectors.
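For qubit encodings, the Helstrom construction can be carried out entirely on Bloch vectors, because $\Lambda = p_1\rho_1 - p_2\rho_2$ has a closed-form spectrum. The Python sketch below is a minimal illustration under that assumption: a two-dimensional Hilbert space, with data already encoded as Bloch vectors (for example via a qubit encoding such as (11) in the next section). The function names are hypothetical, and this is not the Mathematica implementation referred to in the appendices.

```python
import math


def helstrom_qubit(b1, b2, p1, p2):
    """Helstrom measurement {P+, P-} for qubit states with Bloch vectors b1, b2.

    Lambda = p1*rho1 - p2*rho2 = (p1 - p2)/2 * I + (v . sigma)/2 with v = p1*b1 - p2*b2,
    so its eigenvalues are ((p1 - p2) +/- |v|)/2 with eigenprojectors (I +/- u . sigma)/2,
    u = v/|v|. Returns functions b -> tr(P+ rho_b) and b -> tr(P- rho_b).
    """
    v = [p1 * x - p2 * y for x, y in zip(b1, b2)]
    r = math.sqrt(sum(c * c for c in v))
    u = [c / r for c in v] if r > 0 else [0.0, 0.0, 0.0]
    spectrum = (((p1 - p2) + r) / 2, ((p1 - p2) - r) / 2)   # eigenvalues for sign +1, -1

    def overlap(sign, b):  # tr[(I + sign * u.sigma)/2 * rho_b] = (1 + sign * u.b)/2
        return (1 + sign * sum(x * y for x, y in zip(u, b))) / 2

    def tr_plus(b):
        return sum(overlap(s, b) for lam, s in zip(spectrum, (1, -1)) if lam > 0)

    def tr_minus(b):
        return sum(overlap(s, b) for lam, s in zip(spectrum, (1, -1)) if lam < 0)

    return tr_plus, tr_minus


def centroid(vectors):
    return [sum(c) / len(vectors) for c in zip(*vectors)]


def helstrom_classifier(class1, class2):
    """Train on Bloch vectors of encoded points; return (classify, Helstrom bound)."""
    n1, n2 = len(class1), len(class2)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)
    c1, c2 = centroid(class1), centroid(class2)   # Bloch vectors of the quantum centroids
    tr_plus, tr_minus = helstrom_qubit(c1, c2, p1, p2)
    bound = p1 * tr_plus(c1) + p2 * tr_minus(c2)

    def classify(b):                               # label 1 iff tr(P+ rho) >= tr(P- rho)
        return 1 if tr_plus(b) >= tr_minus(b) else 2

    return classify, bound


class1 = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8]]
class2 = [[0.0, 0.0, -1.0], [0.0, 0.6, -0.8]]
classify, bound = helstrom_classifier(class1, class2)
print(classify([0.0, 0.0, 1.0]), classify([0.0, 0.0, -1.0]), bound)
```

Since $\rho_b$ is affine in its Bloch vector b, the decision function $\mathrm{tr}(P_+\rho_b) - \mathrm{tr}(P_-\rho_b)$ is an affine function of b. The decision boundary is therefore a plane in Bloch space, which is one way to see the analogy with a linear SVM.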

Geometric Approach to Quantum-Inspired Classifications
In [17], a real vector $x \in \mathbb{R}^{d-1}$ is encoded, as shown above, into a projection operator $\rho_x$ on the Hilbert space $\mathbb{C}^d$, represented by a d × d real symmetric matrix, where d ≥ 2. For simplicity, we consider an input vector $[x_1, x_2] \in \mathbb{R}^2$ and the corresponding projection operator $\rho_{[x_1,x_2]}$ on $\mathbb{C}^3$. By easy computations, one can see that the Bloch vector of $\rho_{[x_1,x_2]}$ has null components along the three antisymmetric generalized Pauli matrices, because $\rho_{[x_1,x_2]}$ is real and symmetric:

$$b_j = 0 \quad \text{whenever} \quad \sigma_j = -i\big(|k\rangle\langle l| - |l\rangle\langle k|\big), \qquad 0 \le k < l \le 2. \tag{8}$$

Instead of using a matrix with nine real elements, memory occupation can be improved by considering only the non-zero components of the Bloch vector. In general, removing the components that are zero or repeated reduces both space and computation time, keeping only the significant values needed to carry out the classification. Quantum-inspired classifications are similar to support vector machines, which use kernel functions to map the input space implicitly into a high-dimensional feature space where the maximal separating margins are constructed. In this case, the nonlinear explicit injective function $\phi : \mathbb{R}^2 \to \mathbb{R}^5$ can be defined by collecting the five non-null Bloch components:

$$\phi([x_1, x_2]) = \big(b_j(\rho_{[x_1,x_2]})\big)_{j \in J}, \qquad J = \{j : \sigma_j \text{ is symmetric or diagonal}\}, \quad |J| = 5. \tag{9}$$

From a geometric point of view, the feature vectors are indeed points on the surface of a hyper-hemisphere. The elements corresponding to the quantum centroids $\rho_1, \rho_2$ are the centroids of the feature vectors:

$$c_i = \frac{1}{|C_i|}\sum_{x \in C_i} \phi(x), \qquad i = 1, 2. \tag{10}$$

In general, such centroids are points inside the hypersphere, and therefore they do not have an inverse image. The Helstrom classifier can also be applied in a smaller space using the following encoder from $\mathbb{R}^2$ to density operators on $\mathbb{C}^2$:

$$\rho_{[x_1,x_2]} = \frac{1}{2}\left(I + \frac{x_1\sigma_1 + x_2\sigma_2 + \sigma_3}{\sqrt{x_1^2 + x_2^2 + 1}}\right), \tag{11}$$

whose Bloch vector is the normalized vector $[x_1, x_2, 1]/\sqrt{x_1^2 + x_2^2 + 1}$. As discussed in Section 5, the Helstrom classifier gives less accurate results on the training set in this smaller space, as expected, because the feature space is smaller than the previous one. In this particular case, the quantum centroids are points inside the Bloch ball of a qubit, and they correspond to density operators.
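The reduction from nine matrix entries to five Bloch components can be illustrated with a short Python sketch. Encoding (2) is not restated here, so the sketch uses a hypothetical real encoding $x \mapsto (x_1, x_2, 1)/\sqrt{x_1^2 + x_2^2 + 1}$. This encoding is chosen only because it produces real unit vectors of $\mathbb{R}^3 \subset \mathbb{C}^3$; any real amplitude encoding gives the same pattern of null components. The sketch also assumes the Bloch normalization $\rho = I_d/d + \sqrt{(d-1)/(2d)}\,\sum_j b_j\sigma_j$, under which pure states have unit Bloch vectors.

```python
import math
from itertools import combinations


def gell_mann(d):
    """Generalized Pauli matrices on C^d: (symmetric, antisymmetric) per pair k < l, then diagonal."""
    def zero():
        return [[0j] * d for _ in range(d)]

    mats = []
    for k, l in combinations(range(d), 2):
        sym, asym = zero(), zero()
        sym[k][l] = sym[l][k] = 1 + 0j
        asym[k][l], asym[l][k] = -1j, 1j
        mats += [sym, asym]
    for l in range(1, d):
        diag = zero()
        c = math.sqrt(2 / (l * (l + 1)))
        for m in range(l):
            diag[m][m] = c + 0j
        diag[l][l] = -l * c + 0j
        mats.append(diag)
    return mats


def bloch_vector(rho):
    """Bloch vector, assuming rho = I/d + sqrt((d-1)/(2d)) * sum_j b_j sigma_j."""
    d = len(rho)
    scale = math.sqrt(d / (2 * (d - 1)))
    return [scale * sum(rho[i][j] * s[j][i] for i in range(d) for j in range(d)).real
            for s in gell_mann(d)]


ANTISYMMETRIC = (1, 3, 5)      # positions of -i(|k><l| - |l><k|) in gell_mann(3)
NON_NULL = (0, 2, 4, 6, 7)     # symmetric and diagonal positions


def encode_c3(x):
    """Hypothetical real amplitude encoding of x in R^2 as a unit vector of R^3 (an assumption)."""
    norm = math.sqrt(x[0] ** 2 + x[1] ** 2 + 1)
    return [x[0] / norm, x[1] / norm, 1 / norm]


def projector(v):
    """Real symmetric projector |v><v| (v real and normalized)."""
    return [[a * b for b in v] for a in v]


def phi(x):
    """Explicit map R^2 -> R^5: the non-null Bloch components of the encoded projector."""
    b = bloch_vector(projector(encode_c3(x)))
    return [b[i] for i in NON_NULL]


def centroid(vectors):
    return [sum(c) / len(vectors) for c in zip(*vectors)]


full = bloch_vector(projector(encode_c3([0.3, -1.2])))
print([full[i] for i in ANTISYMMETRIC])                       # null components
features = [phi(p) for p in ([1.0, 0.0], [-1.0, 0.0])]
print([math.sqrt(sum(c * c for c in f)) for f in features])   # unit norm: on the hypersphere
print(math.sqrt(sum(c * c for c in centroid(features))))     # centroid lies strictly inside
```

Only 5 of the 9 real matrix entries carry information here. The centroid of two distinct feature vectors lies strictly inside the unit hypersphere, in line with the remark that centroids generally have no inverse image.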
An interesting question raised in [17] is whether classification accuracy can be improved by increasing the dimension of the state space of the density matrices that represent input vectors. Providing n copies of the centroids in quantum-inspired classifications improves accuracy, but it has a strong impact on computational space (from dimension d − 1 to $d^{2n}$) and time. Following the geometric approach, and keeping only the significant values needed to carry out the classification, the explicit function $\phi : \mathbb{R}^2 \to \mathbb{R}^{20}$ for two copies can be defined analogously (12), by collecting the non-null, non-repeated Bloch components of $\rho_{[x_1,x_2]}^{\otimes 2}$. In particular, removing null and repeated entries, we consider only 20 values instead of 81 for two copies, 51 values instead of 729 for three copies, and so on. However, one must also take into account high-precision numbers and track the propagation of numerical errors. From three copies on, the gain in accuracy already seems marginal. In Section 5, we show some numerical results obtained by the implementation of the Helstrom classifier and, in particular, the close analogy between this quantum-inspired method and the SVM with a linear kernel. A natural limitation of Helstrom classification arises for a training set in which the centroids of the two classes coincide. In that case, the Helstrom classifier is clearly useless, because it cannot perform the corresponding state discrimination. In the same situation, there are effective classical classification methods such as Random Forest, the Naive Bayes classifier and Nearest Neighbors.
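The failure mode described above can be reproduced with a minimal Python sketch on Bloch vectors of a qubit encoding. The qubit setting is an illustrative assumption; the same happens in any dimension. When the two quantum centroids coincide, $\Lambda = p_1\rho_1 - p_2\rho_2$ is a multiple of the identity, so the Helstrom measurement carries no information about the test state. With equal priors, both $P_+$ and $P_-$ vanish; with unequal priors, every point receives the majority label. The function names and the tie-breaking rule (label 1 when $\mathrm{tr}(P_+\rho) = \mathrm{tr}(P_-\rho)$) are assumptions of this sketch.

```python
import math


def centroid(vectors):
    return [sum(c) / len(vectors) for c in zip(*vectors)]


def helstrom_qubit(b1, b2, p1, p2):
    """Return b -> tr(P+ rho_b) and b -> tr(P- rho_b) for the Helstrom measurement.

    Lambda = p1*rho1 - p2*rho2 has eigenvalues ((p1 - p2) +/- |v|)/2, v = p1*b1 - p2*b2,
    with eigenprojectors (I +/- u.sigma)/2, u = v/|v|; P+ (P-) collects strictly
    positive (negative) eigenvalues, so both vanish when Lambda = 0.
    """
    v = [p1 * x - p2 * y for x, y in zip(b1, b2)]
    r = math.sqrt(sum(c * c for c in v))
    u = [c / r for c in v] if r > 0 else [0.0, 0.0, 0.0]
    spectrum = (((p1 - p2) + r) / 2, ((p1 - p2) - r) / 2)

    def overlap(sign, b):
        return (1 + sign * sum(x * y for x, y in zip(u, b))) / 2

    def tr_plus(b):
        return sum(overlap(s, b) for lam, s in zip(spectrum, (1, -1)) if lam > 0)

    def tr_minus(b):
        return sum(overlap(s, b) for lam, s in zip(spectrum, (1, -1)) if lam < 0)

    return tr_plus, tr_minus


def helstrom_labels(class1, class2, test_points):
    """Decision rule: label 1 if tr(P+ rho) >= tr(P- rho), else label 2."""
    n1, n2 = len(class1), len(class2)
    tr_plus, tr_minus = helstrom_qubit(centroid(class1), centroid(class2),
                                       n1 / (n1 + n2), n2 / (n1 + n2))
    return [1 if tr_plus(b) >= tr_minus(b) else 2 for b in test_points]


# Two classes of Bloch vectors whose centroids both coincide with the origin.
balanced1 = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
balanced2 = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
tests = balanced1 + balanced2
print(helstrom_labels(balanced1, balanced2, tests))   # every point gets the same label

# Unequal class sizes, same (coinciding) centroids: the majority label wins everywhere.
majority = balanced1 + balanced2
minority = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
print(helstrom_labels(majority, minority, tests + minority))
```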

Quantum-Inspired Nearest Mean Classifications
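In [16], a quantum version of the nearest mean classifier was proposed, making use of the inverse of the stereographic projection as an encoder:

$$\pi^{-1}(x) = \left(\frac{2x_1}{\|x\|^2+1}, \dots, \frac{2x_n}{\|x\|^2+1}, \frac{\|x\|^2-1}{\|x\|^2+1}\right), \qquad x \in \mathbb{R}^n. \tag{13}$$

In the case $x \in \mathbb{R}^2$, the lowest-dimensional quantum encoding is obviously in $\mathbb{C}^2$. In particular, $\rho_x$ is the density matrix identified by the Bloch vector $\pi^{-1}(x)$, that is:

$$\rho_x = \frac{1}{2}\Big(I + \pi^{-1}(x)\cdot\vec{\sigma}\Big) = \frac{1}{\|x\|^2+1}\begin{pmatrix} \|x\|^2 & x_1 - i x_2 \\ x_1 + i x_2 & 1 \end{pmatrix}. \tag{14}$$

The state $\rho_x$ is pure, i.e., a projector, because $\pi^{-1}(x)$ lies on the surface of the Bloch sphere.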
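A minimal Python sketch of this qubit encoding follows. It uses only the standard library, and the function names are illustrative. It maps $x \in \mathbb{R}^2$ to the Bloch vector $\pi^{-1}(x)$, builds $\rho_x = (I + \pi^{-1}(x)\cdot\vec{\sigma})/2$, and checks the result against the closed-form matrix with entries $\|x\|^2$, $x_1 \mp i x_2$ and $1$, all divided by $\|x\|^2 + 1$.

```python
def inverse_stereographic(x):
    """Inverse stereographic projection R^n -> unit sphere of R^(n+1)."""
    s = sum(c * c for c in x)
    return [2 * c / (s + 1) for c in x] + [(s - 1) / (s + 1)]


def density_from_bloch(b):
    """Qubit density matrix (I + b1*sigma1 + b2*sigma2 + b3*sigma3)/2."""
    b1, b2, b3 = b
    return [[(1 + b3) / 2, complex(b1, -b2) / 2],
            [complex(b1, b2) / 2, (1 - b3) / 2]]


def encode(x):
    """Qubit encoding of x in R^2 through the Bloch vector pi^{-1}(x)."""
    return density_from_bloch(inverse_stereographic(x))


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


x = [0.5, -2.0]
rho = encode(x)
s = x[0] ** 2 + x[1] ** 2
closed_form = [[s / (s + 1), complex(x[0], -x[1]) / (s + 1)],
               [complex(x[0], x[1]) / (s + 1), 1 / (s + 1)]]
print(rho)
print(closed_form)
```

Squaring the matrix and comparing with the original (`matmul(rho, rho)` against `rho`) confirms that the encoded state is a projector.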
In Appendix A, the encoding $x \mapsto \rho_x$ defined by (14) is realized, in an arbitrary dimension, by the function SVMEncoder[x, type] with type = 2. For binary classification, the centroids of the two classes in [16] are calculated in the feature space and then encoded into density matrices according to (14). Given a test point encoded into a density matrix, the classifier assigns it to the nearest centroid with respect to the normalized trace distance:

$$\bar{d}_{tr}(\rho_x, \rho_y) = \frac{2\, d_{tr}(\rho_x, \rho_y)}{\sqrt{(1 - b^x_3)(1 - b^y_3)}}, \tag{15}$$

where $b^x_3$ and $b^y_3$ are the Bloch coefficients of $\rho_x$ and $\rho_y$ with respect to the Pauli matrix $\sigma_3$, and $d_{tr}(\rho_x, \rho_y) = \frac{1}{2}\mathrm{tr}|\rho_x - \rho_y|$. One can easily verify that:

$$\bar{d}_{tr}(\rho_x, \rho_y) = d_E(x, y), \tag{16}$$

where $d_E$ is the standard Euclidean distance. Therefore, a quantum-inspired nearest mean classifier can be defined by the encoding given in (14) and by the evaluation of the trace distance between density operators. In [16], experimental results on the performance of such a quantum-inspired classifier are presented and compared with the classical nearest mean classifier, with impressive results in terms of accuracy. As illustrated in the previous section, the input space can be mapped into a higher-dimensional feature space by means of a kernel trick in order to improve data separation. Here too, it is possible to reduce the computational space from $\mathbb{R}^8$ to $\mathbb{R}^5$. This is done by the explicit function $\phi : \mathbb{R}^2 \to \mathbb{R}^5$ that maps x to the five non-null components of the Bloch vector of the projector onto the unit vector $\pi^{-1}(x) \in \mathbb{R}^3 \subset \mathbb{C}^3$. In Appendix A, we implement the higher-dimensional encoding induced by ϕ by calling the function BlochVector[SVMEncoder[x,2]] and removing the null components.
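The identity between the normalized trace distance and the Euclidean distance can be checked numerically with the Python sketch below. The sketch computes the trace distance from the eigenvalues of $\rho_x - \rho_y$ rather than from the Bloch vectors directly. It then divides by $\sqrt{(1 - b^x_3)(1 - b^y_3)}$ and compares the result with the Euclidean distance between the original points. The encoding is the qubit encoding through $\pi^{-1}$, and the function names are illustrative.

```python
import math


def inverse_stereographic(x):
    s = sum(c * c for c in x)
    return [2 * c / (s + 1) for c in x] + [(s - 1) / (s + 1)]


def density_from_bloch(b):
    return [[(1 + b[2]) / 2, complex(b[0], -b[1]) / 2],
            [complex(b[0], b[1]) / 2, (1 - b[2]) / 2]]


def trace_distance(rho, sigma):
    """(1/2) tr|rho - sigma| for 2x2 density matrices, from the eigenvalues of rho - sigma."""
    m00 = (rho[0][0] - sigma[0][0]).real
    m11 = (rho[1][1] - sigma[1][1]).real
    m01 = rho[0][1] - sigma[0][1]
    centre = (m00 + m11) / 2
    half_gap = math.sqrt(((m00 - m11) / 2) ** 2 + abs(m01) ** 2)
    return 0.5 * (abs(centre + half_gap) + abs(centre - half_gap))


def normalized_trace_distance(x, y):
    """2 d_tr(rho_x, rho_y) / sqrt((1 - b3^x)(1 - b3^y)) for the stereographic qubit encoding."""
    bx, by = inverse_stereographic(x), inverse_stereographic(y)
    d_tr = trace_distance(density_from_bloch(bx), density_from_bloch(by))
    return 2 * d_tr / math.sqrt((1 - bx[2]) * (1 - by[2]))


pairs = [([0.5, -2.0], [1.0, 1.0]), ([0.0, 0.0], [3.0, -4.0]), ([-1.5, 0.2], [-1.4, 0.25])]
for x, y in pairs:
    print(normalized_trace_distance(x, y), math.dist(x, y))
```

The identity follows from two facts: $d_{tr} = |b^x - b^y|/2$ for qubits, and the chordal distance $|\pi^{-1}(x) - \pi^{-1}(y)| = 2|x - y|/\sqrt{(1 + \|x\|^2)(1 + \|y\|^2)}$.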
The following distances are often considered and can be used for nearest mean classification. They are, respectively, the Hilbert–Schmidt distance, the trace distance, the Bures distance and the Hellinger distance:

$$d_{HS}(\rho,\sigma) = \sqrt{\mathrm{tr}\big[(\rho-\sigma)^2\big]}, \quad d_{tr}(\rho,\sigma) = \tfrac{1}{2}\mathrm{tr}|\rho-\sigma|, \quad d_B(\rho,\sigma) = \sqrt{2\big(1-\sqrt{F(\rho,\sigma)}\big)}, \quad d_H(\rho,\sigma) = \sqrt{\mathrm{tr}\big[(\sqrt{\rho}-\sqrt{\sigma})^2\big]}. \tag{17}$$

These measures induce different geometries. The set of states of a qubit is equivalent to the Bloch ball for the Hilbert–Schmidt distance $d_{HS}$ and for the trace distance $d_{tr}$, and to the Uhlmann hemisphere for the Bures distance $d_B$. In higher dimensions, the geometries induced by the Hilbert–Schmidt distance and by the trace distance also differ. The Bures distance and the trace distance are useful measures for quantifying the distinguishability of states. The Bures distance is the classical Hellinger (Bhattacharyya) distance between output statistics, maximized over all quantum measurements. The trace distance is a function of the probability of successfully discriminating two states in a single measurement, optimized over all quantum measurements. As mentioned above, in the two-dimensional case the normalized trace distance is equivalent to the Euclidean distance (16). Moreover, one can see that, for qubits, the trace distance between pure states is equal to half of the Euclidean distance between the respective Bloch vectors. Therefore, in the Mathematica code of Appendix B, the function CentroidClassify for nearest mean classification based on the trace distance is equivalently defined by means of the standard norm.
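The equivalence exploited by CentroidClassify can be illustrated with a minimal Python sketch of a qubit nearest mean classifier implemented in two ways. The first uses the trace distance between 2 × 2 density matrices; the second uses the Euclidean distance between Bloch vectors. Following the description of [16] given above, class centroids are computed in the input space and then encoded via the inverse stereographic projection. This is an illustrative re-implementation under these assumptions, not the Mathematica code of Appendix B.

```python
import math


def inverse_stereographic(x):
    s = sum(c * c for c in x)
    return [2 * c / (s + 1) for c in x] + [(s - 1) / (s + 1)]


def density_from_bloch(b):
    return [[(1 + b[2]) / 2, complex(b[0], -b[1]) / 2],
            [complex(b[0], b[1]) / 2, (1 - b[2]) / 2]]


def trace_distance(rho, sigma):
    """(1/2) tr|rho - sigma| for 2x2 density matrices (Hermitian eigenvalue formula)."""
    m00, m11 = (rho[0][0] - sigma[0][0]).real, (rho[1][1] - sigma[1][1]).real
    m01 = rho[0][1] - sigma[0][1]
    centre = (m00 + m11) / 2
    half_gap = math.sqrt(((m00 - m11) / 2) ** 2 + abs(m01) ** 2)
    return 0.5 * (abs(centre + half_gap) + abs(centre - half_gap))


def mean(points):
    return [sum(c) / len(points) for c in zip(*points)]


def nmc_trace(train, x):
    """Nearest encoded class mean w.r.t. the trace distance between density matrices."""
    rho_x = density_from_bloch(inverse_stereographic(x))
    dist = {label: trace_distance(rho_x, density_from_bloch(inverse_stereographic(mean(pts))))
            for label, pts in train.items()}
    return min(sorted(dist), key=dist.get)


def nmc_bloch(train, x):
    """Same rule with the Euclidean distance between Bloch vectors (no matrices needed)."""
    b_x = inverse_stereographic(x)
    dist = {label: math.dist(b_x, inverse_stereographic(mean(pts)))
            for label, pts in train.items()}
    return min(sorted(dist), key=dist.get)


train = {1: [[0.0, 0.0], [0.5, 0.5], [-0.5, 0.2]],
         2: [[3.0, 3.0], [2.5, 3.5], [3.5, 2.0]]}
tests = [[0.1, -0.3], [2.8, 2.9], [1.0, 1.2], [4.0, 4.0], [-2.0, -1.0]]
print([nmc_trace(train, t) for t in tests])
print([nmc_bloch(train, t) for t in tests])
```

Because the two distances differ only by the constant factor 1/2, both versions produce the same labels. The Bloch version avoids building and diagonalizing any matrix.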
Within the paradigm of quantum-inspired classification, the Bures distance and the Hellinger distance can be used to define other nearest mean classifiers. Let us consider a binary classification problem and the quantum centroids (3) of the two classes. We can define a classification algorithm that evaluates the Bures distance between the pure quantum state encoding a test point and the quantum centroids, which in general are not pure. The fidelity between density operators is defined as $F(\rho_1, \rho_2) = \big(\mathrm{tr}\sqrt{\sqrt{\rho_1}\,\rho_2\sqrt{\rho_1}}\big)^2$. When $\rho_1 = |\psi_1\rangle\langle\psi_1|$, it reduces to $F(\rho_1, \rho_2) = \langle\psi_1|\rho_2|\psi_1\rangle = \mathrm{tr}(\rho_1\rho_2)$. Therefore, the Bures distance between the pure state $\rho_{\tilde{x}}$ encoding the test point $\tilde{x}$ and the quantum centroid $\rho_i$ is:

$$d_B(\rho_{\tilde{x}}, \rho_i) = \sqrt{2\left(1 - \sqrt{\frac{1}{d} + \frac{d-1}{d}\; b^{(\tilde{x})} \cdot b^{(i)}}\right)}, \tag{18}$$

where $b^{(\tilde{x})}$ and $b^{(i)}$ are the Bloch vectors of $\rho_{\tilde{x}}$ and $\rho_i$, respectively, and d is the dimension of the Hilbert space of the quantum encoding. Formula (18) can be directly derived from $\mathrm{tr}(\rho_1\rho_2) = \frac{1}{d} + \frac{d-1}{d}\, b^{(1)} \cdot b^{(2)}$. This is an immediate consequence of (1) and of the fact that the generalized Pauli matrices are traceless and satisfy $\mathrm{tr}(\sigma_i\sigma_j) = 2\delta_{ij}$. An example of a nearest mean classifier based on the Bures distance is given by Algorithm 1.
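Formula (18) can be sanity-checked numerically. For d = 3, the Python sketch below compares two computations of the Bures distance between a pure state and a mixed "centroid": one from Bloch vectors, and one directly from the fidelity $F = \langle\psi|\rho|\psi\rangle$. It assumes the Bloch normalization $\rho = I_d/d + \sqrt{(d-1)/(2d)}\,\sum_j b_j\sigma_j$ (pure states on the unit hypersphere) and the convention $d_B = \sqrt{2(1 - \sqrt{F})}$. The helper names are illustrative.

```python
import math
from itertools import combinations


def gell_mann(d):
    """Generalized Pauli matrices on C^d: (symmetric, antisymmetric) per pair k < l, then diagonal."""
    def zero():
        return [[0j] * d for _ in range(d)]

    mats = []
    for k, l in combinations(range(d), 2):
        sym, asym = zero(), zero()
        sym[k][l] = sym[l][k] = 1 + 0j
        asym[k][l], asym[l][k] = -1j, 1j
        mats += [sym, asym]
    for l in range(1, d):
        diag = zero()
        c = math.sqrt(2 / (l * (l + 1)))
        for m in range(l):
            diag[m][m] = c + 0j
        diag[l][l] = -l * c + 0j
        mats.append(diag)
    return mats


def bloch_vector(rho):
    """Bloch vector, assuming rho = I/d + sqrt((d-1)/(2d)) * sum_j b_j sigma_j."""
    d = len(rho)
    scale = math.sqrt(d / (2 * (d - 1)))
    return [scale * sum(rho[i][j] * s[j][i] for i in range(d) for j in range(d)).real
            for s in gell_mann(d)]


def projector(psi):
    return [[complex(a) * complex(b).conjugate() for b in psi] for a in psi]


def bures_from_bloch(b_pure, b_state, d):
    """Bures distance between a pure state and any state, from their Bloch vectors."""
    overlap = 1 / d + (d - 1) / d * sum(u * v for u, v in zip(b_pure, b_state))
    return math.sqrt(2 * (1 - math.sqrt(overlap)))


def bures_direct(psi, rho):
    """Bures distance from the fidelity F = <psi|rho|psi> (the first state is pure)."""
    n = len(psi)
    fidelity = sum(complex(psi[i]).conjugate() * rho[i][j] * psi[j]
                   for i in range(n) for j in range(n)).real
    return math.sqrt(2 * (1 - math.sqrt(fidelity)))


d = 3
psi = [1 / math.sqrt(3), 1j / math.sqrt(3), 1 / math.sqrt(3)]      # test state (pure)
e0 = projector([1, 0, 0])
e12 = projector([0, 1 / math.sqrt(2), 1 / math.sqrt(2)])
centroid = [[(e0[i][j] + e12[i][j]) / 2 for j in range(d)] for i in range(d)]  # mixed state
print(bures_from_bloch(bloch_vector(projector(psi)), bloch_vector(centroid), d))
print(bures_direct(psi, centroid))
```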
The quantum encodings at line 1 and line 3 of Algorithm 1 can be realized, for instance, by the function SVMEncoder[x, 1] defined in Appendix A. At line 5, the quantum centroids are constructed according to (3). Alternatively, the classifier can calculate the centroids in the feature space from the Bloch vectors of the quantum states encoding the training points, as in (10); in general, however, the resulting centroid vectors are not Bloch vectors of density operators.

Algorithm 1: Quantum-inspired nearest mean classifier based on Bures distance.
Input: two classes $C_1$ and $C_2$ of training points, an unlabelled point $\tilde{x}$
Result: label y of $\tilde{x}$
1 encode $\tilde{x}$ into a pure state $\rho_{\tilde{x}}$; (2)
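The steps of Algorithm 1 described above can be sketched in Python as follows. This is a minimal illustration under explicit assumptions. A qubit encoding through the Bloch vector $\pi^{-1}(x)$ (the inverse stereographic projection) stands in for the encoding used in Algorithm 1. The quantum centroids are the averages of the encoded states, or equivalently of their Bloch vectors. The Bures distance is evaluated from Bloch vectors with d = 2. The function names are hypothetical.

```python
import math


def inverse_stereographic(x):
    """Inverse stereographic projection R^n -> unit sphere of R^(n+1)."""
    s = sum(c * c for c in x)
    return [2 * c / (s + 1) for c in x] + [(s - 1) / (s + 1)]


def bures_to_centroid(b_pure, b_centroid, d=2):
    """Bures distance between a pure state and a centroid, computed from Bloch vectors."""
    overlap = 1 / d + (d - 1) / d * sum(u * v for u, v in zip(b_pure, b_centroid))
    return math.sqrt(2 * (1 - math.sqrt(max(overlap, 0.0))))   # guard against rounding


def bures_nmc(class1, class2, x_new):
    """Nearest mean classification with the Bures distance (qubit encoding assumed)."""
    b_new = inverse_stereographic(x_new)                       # encode the unlabelled point
    centroids = []
    for points in (class1, class2):
        blochs = [inverse_stereographic(p) for p in points]          # encode training points
        centroids.append([sum(c) / len(blochs) for c in zip(*blochs)])  # quantum centroid
    d1, d2 = (bures_to_centroid(b_new, c) for c in centroids)
    return 1 if d1 <= d2 else 2


class1 = [[0.0, 0.0], [0.3, -0.2], [-0.2, 0.1]]
class2 = [[3.0, 3.0], [4.0, 2.5], [2.5, 4.0]]
print(bures_nmc(class1, class2, [0.1, 0.0]), bures_nmc(class1, class2, [3.2, 3.1]))
```

For a fixed d, the Bures distance in (18) decreases monotonically with $b^{(\tilde{x})} \cdot b^{(i)}$. The decision therefore amounts to choosing the centroid whose Bloch vector has the largest inner product with that of the test state.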

Numerical Results and Discussion
We focus on some numerical results obtained by running the considered quantum-inspired classifiers on some datasets. Appendix A contains the Mathematica code of the tests on the Helstrom classifier, and Appendix B contains the Mathematica code implementing the quantum-inspired NMC (the code is also available at the following repository: github.com/leporini/classification). As a benchmark for testing the Helstrom classifier, we applied the DBSCAN clustering algorithm [19] to a moons dataset, obtaining the classification of Figure 1.
The first test provides the quantum encoding of the two-dimensional input vectors into a five-dimensional space according to (9) and (10). The same test on the performance of the Helstrom classifier over the moons dataset can be repeated in a smaller space, encoding the data points into density operators of a qubit according to (11). Since we are then considering a feature map $\phi : \mathbb{R}^2 \to \mathbb{R}^3$, which represents data in a lower-dimensional space, the classifier is less accurate, as expected. The accuracy of the Helstrom classifier over the training set in the two cases is: Accuracy is sensitive to the dimension, as shown in Figure 4. Figure 5 shows the decision boundary found by the Helstrom classifier, together with the misclassified points, in the lowest-dimensional case. Figure 6 shows the output of an SVM with a linear kernel, whose behavior is confirmed to be similar to that of the Helstrom classifier.
In the case of a training set where the centroids of the two classes coincide, the Helstrom classifier does not work, because it cannot perform state discrimination. Let us consider the dataset represented in Figure 7, which has coinciding centroids. On the one hand, the Helstrom classifier is useless; on the other hand, Random Forest, the Naive Bayes classifier and Nearest Neighbors achieve the following high accuracies on the training set: For a dataset with distinguishable but close centroids, the performance of the Helstrom classifier is poor compared with existing classical methods. For example, let us consider the training set represented in Figure 8. Comparing the accuracies of the Helstrom classifier, Random Forest and Nearest Neighbors, the classical algorithms perform definitely better in terms of correct classification: $\mathrm{Acc}_{\mathrm{Hel}} = 0.73$, $\mathrm{Acc}_{\mathrm{RF}} = 0.98$, $\mathrm{Acc}_{\mathrm{NN}} = 0.85$.
Figure 9 shows the points misclassified by the Helstrom classifier. The numerical results presented in this section reveal the connection between the Helstrom classifier and the SVM with a linear kernel. In particular, one can observe that different strategies of quantum encoding by means of Bloch vectors of density operators correspond to different kernel tricks.
In Section 4, we showed that, from a geometric viewpoint, a quantum-inspired nearest mean classifier based on the trace distance between density operators can be equivalently implemented using the Euclidean distance between Bloch vectors. We considered both implementations of this quantum-inspired NMC and found exactly the same accuracy on the moons dataset, $\mathrm{Acc}_{\mathrm{Density}} = \mathrm{Acc}_{\mathrm{Bloch}}$. Here, $\mathrm{Acc}_{\mathrm{Density}}$ is the accuracy of the NMC based on the quantum encoding of data points into density operators according to (14) and on the calculation of trace distances within this representation. $\mathrm{Acc}_{\mathrm{Bloch}}$ is the accuracy of the NMC that encodes the data points into Bloch vectors by means of the inverse of the stereographic projection and evaluates the Euclidean distance in the Bloch feature space.

Conclusions
In this paper, we analyzed some methods of quantum-inspired classification, highlighting a connection with support vector machines. After an introduction to the Bloch representation of quantum states in an arbitrary dimension, we considered the Helstrom quantum state discrimination applied to binary classification (the Helstrom classifier), observing that its execution is similar to that of an SVM with a linear kernel. In particular, adopting a geometric viewpoint, we described how quantum encodings of feature vectors can be used to implement a kernel trick, improving the quality of classification. Multiple copies of the encoding quantum states can be used to map real feature vectors into a higher-dimensional space, as done in [6] to obtain a polynomial kernel for a quantum SVM at an exponential cost in space resources. We showed that, in this case, the computational cost can be reduced by deleting the redundancies in the resulting Bloch vectors. In this way, one can define in quantum-inspired classification a nonlinear injective function that performs a kernel trick while saving space and time.
We presented some experimental results on the moons dataset that exhibit the behavior of the Helstrom classifier as an SVM with a linear kernel. With this dataset, we enlarged the dimension of the Hilbert space by changing the quantum encoding within the Bloch representation of the density operators. On the other hand, we gave a couple of examples where the Helstrom classifier does not work well, owing to the difficult discrimination of the quantum states representing the centroids of the classes.
We also focused on quantum-inspired nearest mean classification, which is based on the computation of operator distances between density matrices encoding feature vectors. For instance, the classifier can evaluate the trace distance or the Bures distance between encoding quantum states. We considered the classifier with the trace distance, showing that it is equivalent (in terms of classification accuracy) to the "geometric" classifier, which evaluates the Euclidean distance between Bloch vectors. Then, we proposed an algorithm based on the Bures distance, which can be evaluated directly in terms of Bloch vectors. An empirical study of this quantum-inspired classifier is left for future work.
As a general consideration emerging from the present work, we point out that the geometric approach based on the Bloch representation of density matrices is suitable for describing quantum-inspired classification. This approach reveals a connection between the Helstrom classifier, which is based on quantum state discrimination, and SVM. The geometric viewpoint also seems fruitful for defining quantum-inspired nearest mean classifiers.
The present work opens possible directions of investigation. One is the full characterization of the kernel of the Helstrom classifier, which would complete the description of quantum state discrimination as the execution of a support vector machine. More generally, an interesting topic could be a satisfying geometric analysis of quantum-inspired machine learning algorithms beyond classifiers, since the present paper suggests that the geometry of quantum states offers novel machinery to deal with data.

Data Availability Statement:
The code is also available at the following repository: github.com/leporini/classification (accessed on 28 August 2021).

Conflicts of Interest:
The authors declare no conflict of interest.
Needs["QDENSITY`Qdensity`", "C:\\Qdensity.m"];
(* Not every vector b of the unit hypersphere gives rise to a density operator, since the output may fail to be positive semidefinite (i.e., it may have a negative eigenvalue); however, every vector of length at most 1/(d-1) gives rise to a density operator. *)
(* When type = 2, it uses the inverse of the stereographic projection. *)
(* Returns the density operator of C^d, which is a projection operator (projecting onto the closed subspace determined by the normalized vector) if type is 1 or 2. *)
(* where DClass1 and DClass2 are the density operators of class 1 and class 2, respectively. *)