Quantum-Inspired Applications for Classification Problems

In the context of quantum-inspired machine learning, quantum state discrimination is a useful tool for classification problems. We implement a local approach that combines the k-nearest neighbors algorithm with quantum-inspired classifiers, and we compare its performance with that of well-known classifiers on benchmark datasets.


Introduction
Quantum-inspired machine learning is a new branch of machine learning that applies the mathematical formalism of quantum mechanics to devise novel algorithms. It has shown that such algorithms can provide benefits even without the computational power of quantum computers with many qubits. Some of these binary classifiers have been analyzed from a geometric perspective [1]. In this work, we implement algorithms based on quantum state discrimination within a local approach in the feature space, taking into account the elements close to the element to be classified. In particular, we perform multi-class classification directly (without resorting to binary classifiers) based on Helstrom discrimination, following an approach suggested by Blanzieri and Melgani [2], in which an unlabeled data instance is classified by finding its k nearest training elements and then running a support vector machine (SVM) over those k elements. This local approach improves the classification accuracy and motivates the integration with the quantum-inspired Helstrom classifier, since the latter can be interpreted as an SVM with linear kernel [3]. It has the potential to offer comparable performance at lower complexity, because only a few training points are used per test point.
The quantum-inspired classifiers require the encoding of the feature vectors into density operators and methods for estimating the distinguishability of quantum states, such as Helstrom state discrimination and the pretty-good measurement (PGM). Quantum-inspired machine learning has revealed how relevant benefits for machine learning problems can be obtained using quantum information theory even without employing quantum computers [4]. Moreover, as we show below, the PGM within our algorithms is more efficient than the one proposed in [4] in the case of multiple preparations of the same state, because it removes duplicates and null values in the encoding. Quantum-inspired methods are also used in applications that solve industry-relevant problems in finance, optimization and chemistry [5][6][7][8][9].
In the experimental part, we compare the performance of the local quantum-inspired classifiers against well-known classical algorithms, showing that the local approach can be a valuable tool for improving this kind of classifier.
In Section 2, we review the notion of quantum encoding of data vectors into density operators and quantum-inspired classification based on quantum state discrimination [10][11][12][13].
In Section 3, we use the k-nearest neighbors algorithm (kNN) as a procedure to restrict the training set to the nearest elements around each test element, enabling the local execution of the quantum-inspired classifiers. In Section 4, we present and discuss some empirical results evaluating the impact of locality in quantum-inspired classification, comparing the performances of the proposed algorithms with classical methods over benchmark datasets. Furthermore, we compare quantum-inspired classifiers with SVMs within the local approach. Section 5 contains the concluding remarks about the efficiency of local quantum-inspired classifiers.

Quantum-Inspired Classification
The first step of quantum-inspired classification is the quantum encoding, that is, any procedure to encode classical information into quantum states. In particular, we consider encodings of data vectors into density matrices on a Hilbert space H whose dimension depends on the dimension of the input space. Density matrices are positive semidefinite operators ρ such that tr ρ = 1, and are the mathematical objects used to describe the physical states of quantum systems. Pure states are the density matrices of the form ρ = |ψ⟩⟨ψ|, with ‖ψ‖ = 1, i.e., the rank-1 projectors, which can be directly identified with unit vectors up to a phase factor. Let ρ be a density operator on a d-dimensional Hilbert space C^d; it can be written in the form

ρ = (1/d) ( I + √(d(d−1)/2) ∑_{j=1}^{d²−1} b_j σ_j ),    (1)

where {σ_j}_{j=1,…,d²−1} are the standard generators of the special unitary group SU(d), also called generalized Pauli matrices, and b = (b_1, …, b_{d²−1}) is the Bloch vector associated with ρ, which lies within the hypersphere of radius 1 in R^{d²−1}. For d = 2, the qubit case, the density matrices are in bijective correspondence with the points of the Bloch sphere in R³, where the pure states are in one-to-one correspondence with the points of the spherical surface. For d > 2, the points contained in the unit hypersphere of R^{d²−1} are not in bijective correspondence with density matrices on C^d, so the Bloch vectors do not form a ball but a complicated convex body. However, any vector within the sphere of radius 2/d gives rise to a density operator [14].
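As a minimal illustration of the Bloch representation, the following sketch handles the qubit case (d = 2), where the map between density matrices and Bloch vectors is bijective; the function names are ours, not the paper's.

```python
import numpy as np

# Pauli matrices: the SU(2) generators entering the Bloch representation.
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]

def bloch_vector(rho):
    """Bloch vector b_j = tr(rho sigma_j) of a qubit density matrix."""
    return np.real(np.array([np.trace(rho @ s) for s in PAULI]))

def density_from_bloch(b):
    """Inverse map for d = 2: rho = (I + b . sigma) / 2."""
    rho = np.eye(2, dtype=complex)
    for bj, s in zip(b, PAULI):
        rho = rho + bj * s
    return rho / 2

rho0 = np.array([[1, 0], [0, 0]], dtype=complex)  # the pure state |0><0|
b = bloch_vector(rho0)  # a unit vector: pure states sit on the sphere's surface
```

For d > 2 the same recipe applies with the generalized Gell-Mann matrices, but, as noted above, not every vector in the unit ball then corresponds to a density matrix.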
Complex vectors of dimension n can be encoded into density matrices of an (n + 1)-dimensional Hilbert space H in the following way:

x = (x_1, …, x_n) ↦ |x⟩ = (1/√(‖x‖² + 1)) ( ∑_{α=1}^{n} x_α |α⟩ + |0⟩ ),    (2)

where {|α⟩}_{α=0,…,n} is the computational basis of H, identified with the standard basis of C^{n+1}. The map defined in (2), called amplitude encoding, encodes x into the pure state ρ_x = |x⟩⟨x|, where the additional component of |x⟩ stores the norm of x. Nevertheless, the quantum encoding x ↦ ρ_x can be realized in terms of the Bloch vectors, x ↦ b(ρ_x), saving space resources. The improvement in memory occupation within the Bloch representation is evident when we take multiple tensor products ρ ⊗ ⋯ ⊗ ρ of a density matrix ρ, constructing a feature map to enlarge the dimension of the representation space [1]. Quantum-inspired classifiers are based on the quantum encoding of data vectors into density matrices, the calculation of centroids and various criteria of quantum state distinguishability, such as Helstrom state discrimination, the pretty-good measurement [4,11] and the geometric construction of a minimum-error measurement [12]. Let us briefly recall the notion of quantum state discrimination. Given a set of arbitrary quantum states with respective a priori probabilities, R = {(ρ_1, p_1), …, (ρ_N, p_N)}, in general there is no measurement process that discriminates the states without errors, i.e., no collection E = {E_i}_{i=1,…,N} of positive semidefinite operators with ∑_{i=1}^N E_i = I satisfying the property tr(E_i ρ_j) = 0 whenever i ≠ j. The probability of a successful discrimination of the states in R performing the measurement E is:

P(E) = ∑_{i=1}^{N} p_i tr(E_i ρ_i).    (3)

A complete characterization of the optimal measurement E_opt that maximizes the probability (3) for R = {(ρ_1, p_1), (ρ_2, p_2)} is due to Helstrom [10]. Let Λ := p_1 ρ_1 − p_2 ρ_2 be the Helstrom observable, whose positive and negative eigenvalues are collected in the sets D_+ and D_−, respectively.
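A short sketch of the amplitude encoding. The exact convention (appending a constant component and normalizing, so that the extra amplitude keeps track of the norm of x) is our assumption for illustration; real data can be used directly, complex data would only require conjugation in the outer product.

```python
import numpy as np

def amplitude_encode(x):
    """Map x in R^n to a unit vector |x> in R^{n+1}.
    Assumed convention: append a constant component 1, then normalize,
    so the last amplitude retains the information on the norm of x."""
    v = np.append(np.asarray(x, dtype=float), 1.0)
    return v / np.linalg.norm(v)

def pure_state(x):
    """Density matrix rho_x = |x><x| of the encoded vector."""
    v = amplitude_encode(x)
    return np.outer(v, v)

rho = pure_state([3.0, 4.0])
# rho is a valid pure state: tr(rho) = 1 and tr(rho^2) = 1
```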
Consider the two orthogonal projectors

P_+ := ∑_{λ∈D_+} P_λ,    P_− := ∑_{λ∈D_−} P_λ,

where P_λ projects onto the eigenspace of λ. The measurement E_opt := {P_+, P_−} maximizes the probability (3) and attains the Helstrom bound:

H_b := 1/2 (1 + tr|Λ|).

Helstrom quantum state discrimination can be used to implement a quantum-inspired binary classifier with promising performances. Let {(x_1, y_1), …, (x_M, y_M)} be a training set with x_i ∈ C^n and y_i ∈ {1, 2} for all i = 1, …, M. Encoding the data points into quantum states by means of a map C^n ∋ x ↦ ρ_x ∈ S(H), one can construct the quantum centroids ρ_1 and ρ_2 of the two classes C_{1,2} = {x_i : y_i = 1, 2}:

ρ_{1,2} = (1/|C_{1,2}|) ∑_{x∈C_{1,2}} ρ_x.

Let {P_+, P_−} be the Helstrom measurement defined by the set R = {(ρ_1, p_1), (ρ_2, p_2)}, where the probabilities attached to the centroids are p_{1,2} = |C_{1,2}| / (|C_1| + |C_2|). The Helstrom classifier applies the optimal measurement for the discrimination of the two quantum centroids to assign the label y to a new data instance x, encoded into the state ρ_x, as follows:

y = 1 if tr(P_+ ρ_x) ≥ tr(P_− ρ_x), and y = 2 otherwise.

A strategy to increase the accuracy in classification is the construction of the tensor product of q copies of the quantum centroids, ρ_{1,2}^{⊗q}, enlarging the Hilbert space where data are encoded. The corresponding Helstrom measurement is {P_+^{(q)}, P_−^{(q)}} and the Helstrom bound is non-decreasing in the number of copies:

H_b(ρ_1^{⊗q}, ρ_2^{⊗q}) ≤ H_b(ρ_1^{⊗(q+1)}, ρ_2^{⊗(q+1)}).

Increasing the dimension of the Hilbert space of the quantum encoding, one increases the Helstrom bound, obtaining a more accurate classifier. The corresponding computational cost is evident; however, in the case of real input vectors, the space can be enlarged saving time and memory by means of the encoding into Bloch vectors. Clearly, defining a quantum encoding is equivalent to selecting a feature map to represent feature vectors in a space of higher dimension.
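The Helstrom binary classifier described above can be sketched as follows; we assume a simple pure-state encoding (append 1, normalize, take the projector), and the helper names are ours.

```python
import numpy as np

def encode(x):
    # assumed pure-state encoding: append 1, normalize, take the projector
    v = np.append(np.asarray(x, dtype=float), 1.0)
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

def helstrom_classifier(X1, X2):
    """Build the Helstrom measurement {P+, P-} for the quantum centroids
    of the two classes and return the induced decision rule."""
    rho1 = np.mean([encode(x) for x in X1], axis=0)  # quantum centroid 1
    rho2 = np.mean([encode(x) for x in X2], axis=0)  # quantum centroid 2
    p1 = len(X1) / (len(X1) + len(X2))               # a priori probability
    lam, U = np.linalg.eigh(p1 * rho1 - (1 - p1) * rho2)  # Helstrom observable
    P_plus = U[:, lam > 0] @ U[:, lam > 0].T    # projector, positive eigenspace
    P_minus = U[:, lam <= 0] @ U[:, lam <= 0].T # projector, complementary eigenspace

    def classify(x):
        rho_x = encode(x)
        # label 1 iff tr(P+ rho_x) >= tr(P- rho_x)
        return 1 if np.trace(P_plus @ rho_x) >= np.trace(P_minus @ rho_x) else 2

    return classify

classify = helstrom_classifier([[1.0, 0.0], [1.2, 0.1]],
                               [[-1.0, 0.0], [-1.2, -0.1]])
```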
In the case of the considered quantum amplitude encoding of R², a nonlinear explicit injective function ϕ : R² → R⁵ to encode data into Bloch vectors can be defined by collecting the five independent real parameters of the Bloch vector of the pure state ρ_{(x_1,x_2)}. The mapped feature vectors are points on the surface of a hyper-hemisphere; the centroids of the classes, calculated as the means of these feature vectors, lie inside the hypersphere and can be rescaled to a Bloch vector as shown below. In order to make the classification more accurate, one can increase the dimension of the representation space by taking q copies of the quantum states, in terms of a tensor product, encoding data instances and centroids into density matrices ρ^{⊗q}. The Bloch encoding allows an efficient implementation of feature maps: removing null and repeated entries from the Bloch vector yields an injective function for data encoding with far fewer components. Therefore, the Bloch representation allows an efficient storage of the redundant elements of the density matrices ρ^{⊗q}. Let us consider a training set divided into the classes C_1, …, C_M, and assume any training point x is encoded into the Bloch vector b(x) of a pure state on C^d. The calculation of the centroid of the class C_i within this quantum encoding must take into account that the mean of the Bloch vectors,

b^(i) = (1/|C_i|) ∑_{x∈C_i} b(x),

does not represent a density operator in general. In fact, for d > 2 the points contained in the unit hypersphere of R^{d²−1} are not in bijective correspondence with density matrices on C^d. However, since any vector within the closed ball of radius 2/d gives rise to a density operator, a centroid can be defined in terms of a meaningful Bloch vector by rescaling b^(i) into that ball. A method of quantum state discrimination for distinguishing more than two states {(ρ_1, p_1), …, (ρ_N, p_N)} is the square-root measurement, also known as the pretty-good measurement (PGM), defined by:

E_i = p_i ρ^{−1/2} ρ_i ρ^{−1/2},

where ρ = ∑_i p_i ρ_i; the PGM is optimal for minimum-error discrimination when the states satisfy certain symmetry properties [11].
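A minimal sketch of the PGM as defined above, with the inverse square root taken on the support of ρ (a pseudo-inverse) so that the measurement operators resolve the identity there:

```python
import numpy as np

def pgm(states, probs):
    """Pretty-good (square-root) measurement E_i = p_i r^{-1/2} rho_i r^{-1/2},
    with r = sum_i p_i rho_i; the inverse square root is computed on the
    support of r, so that sum_i E_i equals the identity there."""
    r = sum(p * s for p, s in zip(probs, states))
    w, V = np.linalg.eigh(r)
    d = np.array([1.0 / np.sqrt(x) if x > 1e-12 else 0.0 for x in w])
    inv_sqrt = V @ np.diag(d) @ V.conj().T
    return [p * inv_sqrt @ s @ inv_sqrt for p, s in zip(probs, states)]

# two orthogonal pure states are discriminated perfectly by the PGM
rho_a = np.diag([1.0, 0.0])
rho_b = np.diag([0.0, 1.0])
E = pgm([rho_a, rho_b], [0.5, 0.5])
```

In a classifier, the states would be the quantum centroids and an instance ρ_x would receive the label i maximizing tr(E_i ρ_x).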
Clearly, to distinguish between n centroids we need a measurement with at most n outcomes. It is sometimes optimal to avoid measurement altogether and simply guess the a priori most likely state. The optimal POVM {E_i}_i for minimum-error state discrimination over {(ρ_1, p_1), …, (ρ_N, p_N)} satisfies the following necessary and sufficient Helstrom conditions [12]:

E_i (p_i ρ_i − p_j ρ_j) E_j = 0    ∀ i, j,
Γ − p_i ρ_i ≥ 0    ∀ i,

where the Hermitian operator Γ, also known as the Lagrange operator, is defined by Γ := ∑_i p_i ρ_i E_i. It is also useful to consider the following properties, which can be obtained from the above conditions: for each i, the operator Γ − p_i ρ_i can have two, one or no zero eigenvalues, corresponding to the zero operator, a rank-one operator and a positive-definite operator, respectively. In the first case, we use the measurement {E_i = I, E_j = 0 for j ≠ i} for some i such that p_i ≥ p_j ∀j, i.e., the state is assigned to the a priori most likely class. In the second case, if E_i ≠ 0, it is a weighted projector onto the corresponding eigenstate. In the last case, it follows that E_i = 0 for every optimal measurement. Writing the operators p_i ρ_i and Γ in the Bloch representation, in order to determine the Lagrange operator in C^d we need d² independent linear constraints. A measurement with more than d² outcomes can always be decomposed as a probabilistic mixture of measurements with at most d² outcomes. Therefore, if the number of classes is greater than or equal to d² and we obtain d² linearly independent equations, we can construct the Lagrange operator and derive the optimal measurements. From the geometric point of view, we obtain the unit vectors corresponding to the rank-1 projectors giving the POVM of the measurement. It is also possible to further partition the classes in order to increase the number of centroids and of the corresponding equations. The classification is carried out as follows: an unlabeled point x, encoded into ρ_x, is associated with the first label y such that tr(E_y ρ_x) ≥ tr(E_j ρ_x) for all j.

Local Quantum-Inspired Classifiers
In the implementation, we consider the execution of the classifiers described above after selecting the k training elements closest to the unclassified instance under consideration.
The k-nearest neighbors algorithm (kNN) is a simple classification algorithm consisting of the following steps:
1. Compute the chosen distance metric between the test element and the training elements;
2. Extract the k elements closest to the test instance;
3. Assign the class label through a majority vote over the labels of the k nearest neighbors.
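The steps above can be sketched as follows (in the local classifiers below, the majority vote of step 3 is replaced by a quantum-inspired classifier):

```python
import numpy as np
from collections import Counter

def knn_label(X, y, x_test, k, metric=None):
    """Plain kNN following the three steps above: compute distances,
    extract the k closest elements, majority-vote on their labels."""
    if metric is None:
        metric = lambda a, b: np.linalg.norm(a - b)  # default: Euclidean
    distances = [metric(x, x_test) for x in X]
    nearest = np.argsort(distances)[:k]
    votes = Counter(y[i] for i in nearest)
    return votes.most_common(1)[0][0]

X = [np.array([0.0]), np.array([0.1]), np.array([1.0]), np.array([1.1])]
y = [0, 0, 1, 1]
```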
In the following, we apply the kNN for the extraction of the elements closest to the test element; the classification is then performed by a quantum-inspired algorithm instead of majority voting. On the one hand, given a test element, the kNN can be executed over the data vectors in the input space, e.g., considering the Euclidean distance; then the k neighbors can be encoded into density matrices and used for a quantum-inspired classification. On the other hand, the entire dataset can be encoded into density matrices and the kNN selects the k neighbors evaluating an operator distance among quantum states. In the latter case, we consider the Bures distance, which is a quantum generalization of the Fisher information metric, and a distance derived from the super-fidelity. The Bures distance is defined by:

d_B(ρ_1, ρ_2) = √( 2 (1 − √(F(ρ_1, ρ_2))) ),

where the fidelity between density operators is given by F(ρ_1, ρ_2) = [tr √( √ρ_1 ρ_2 √ρ_1 )]². Let us note that the fidelity reduces to F(ρ_1, ρ_2) = ⟨ψ_1|ρ_2|ψ_1⟩ when ρ_1 = |ψ_1⟩⟨ψ_1|. Therefore the Bures distance between the pure state ρ_1 and an arbitrary state ρ_2 can be expressed in terms of the Bloch representation through the relation tr(ρ_1 ρ_2) = (1 + (d − 1) ⟨b^(1), b^(2)⟩)/d, where b^(1) and b^(2) are the Bloch vectors of ρ_1 and ρ_2, respectively, and d is the dimension of the Hilbert space of the quantum encoding. This special form (17) of the Bures distance, expressed in terms of Bloch vectors, is relevant for our purpose because data vectors are encoded into pure states while, in general, quantum centroids are mixed states. An alternative distance can be defined via the super-fidelity [15]:

d_G(ρ_1, ρ_2) = √( 2 (1 − √(G(ρ_1, ρ_2))) ),

where the super-fidelity between density operators is given by

G(ρ_1, ρ_2) = tr(ρ_1 ρ_2) + √(1 − tr ρ_1²) √(1 − tr ρ_2²).

Notice that the super-fidelity reduces to G(ρ_1, ρ_2) = ⟨ψ_1|ρ_2|ψ_1⟩ when ρ_1 = |ψ_1⟩⟨ψ_1|. This distance can also be expressed in terms of the Bloch representation through the same relations for tr(ρ_1 ρ_2) and tr ρ².
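A sketch of these two distances. For the Bures distance we use the reduction of the fidelity valid when the first state is pure (the case relevant here, since data are encoded into pure states), which avoids matrix square roots; the Bures-like form of the super-fidelity distance is an assumption made for illustration.

```python
import numpy as np

def bures_pure(psi, rho):
    """Bures distance d_B between the pure state |psi><psi| and rho,
    using F = <psi|rho|psi>, valid for a pure first argument."""
    F = np.real(psi.conj() @ rho @ psi)
    return np.sqrt(2.0 * (1.0 - np.sqrt(F)))

def superfidelity_distance(rho1, rho2):
    """Distance built from the super-fidelity
    G = tr(rho1 rho2) + sqrt(1 - tr rho1^2) sqrt(1 - tr rho2^2);
    the Bures-like form sqrt(2 (1 - sqrt(G))) is assumed here."""
    G = np.real(np.trace(rho1 @ rho2))
    G += np.sqrt(max(0.0, 1.0 - np.real(np.trace(rho1 @ rho1)))) * \
         np.sqrt(max(0.0, 1.0 - np.real(np.trace(rho2 @ rho2))))
    return np.sqrt(2.0 * (1.0 - np.sqrt(min(G, 1.0))))

psi0 = np.array([1.0, 0.0], dtype=complex)       # pure state |0>
rho_mixed = np.diag([0.5, 0.5]).astype(complex)  # maximally mixed qubit
```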
The inner distance between the corresponding Bloch vectors is defined through the angle θ between the unit vectors (b^(1), √(1 − |b^(1)|²)) and (b^(2), √(1 − |b^(2)|²)), normalized so that its maximal value is 1. For pure states, the inner distance corresponds to the Fubini-Study distance.
In Algorithm 1, the locality is imposed by running the kNN in the input space to find the training vectors closest to the test element; the selected vectors are then encoded into pure states and a quantum-inspired classifier (Helstrom, PGM or geometric Helstrom) is executed locally over the restricted training set. In Algorithm 2, the test element and all the training elements are first encoded into Bloch vectors of pure states; then a kNN is run w.r.t. the Bures distance to find the nearest neighbors in the space of the quantum representation, and a quantum-inspired classifier is executed with the training instances corresponding to the closest quantum states.

Algorithm 1 Local quantum-inspired classification based on kNN in the input space before the quantum encoding. The distance can be: Euclidean, Manhattan, Chessboard, Canberra or Bray-Curtis.

Require: Dataset X of labeled instances, unlabeled point x̃
Ensure: Label of x̃
  find the k nearest neighbors x_1, …, x_k to x̃ in X w.r.t. the chosen distance
  encode x̃ into a pure state ρ_x̃
  for j = 1, …, k do
    encode x_j into a pure state ρ_{x_j}
  end for
  run the quantum-inspired classifier with training points encoded into {ρ_{x_j}}_{j=1,…,k}

Algorithm 2 Local quantum-inspired classification based on kNN in the Bloch representation after the quantum encoding. The distance can be: Bures, Super-Fidelity or Inner.
Require: Dataset X of labeled instances, unlabeled point x̃
Ensure: Label of x̃
  encode x̃ into a Bloch vector b(x̃) of a pure state
  for x ∈ X do
    encode x into a Bloch vector b(x) of a pure state
  end for
  find the k nearest neighbors to b(x̃) in {b(x)}_{x∈X} w.r.t. the chosen distance D_B
  run the quantum-inspired classifier over the k nearest neighbors
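The skeleton shared by these algorithms (restrict the training set, then delegate) can be sketched as follows; the pluggable `classifier` argument stands for any of the quantum-inspired classifiers above, and the nearest-class-mean rule used here is only a toy stand-in for testing the skeleton.

```python
import numpy as np

def local_classify(X, y, x_test, k, classifier):
    """Skeleton of Algorithm 1: restrict the training set to the k nearest
    neighbours of x_test in the input space (Euclidean distance here), then
    delegate the decision to any quantum-inspired classifier."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X - np.asarray(x_test, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    return classifier([X[i] for i in nearest], [y[i] for i in nearest], x_test)

def nearest_class_mean(X_loc, y_loc, x_test):
    # toy stand-in for the quantum-inspired step: nearest class mean
    labels = sorted(set(y_loc))
    means = {c: np.mean([x for x, c2 in zip(X_loc, y_loc) if c2 == c], axis=0)
             for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(means[c] - x_test))

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
y = [1, 1, 2, 2]
```

Algorithm 2 differs only in that the encoding happens first and the neighbor search runs in the Bloch-vector space with one of the quantum distances.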
A local quantum-inspired classifier can also be defined without quantum state discrimination, using a nearest mean classification instead: after the quantum encoding, we perform a kNN selection and calculate the centroid of each class considering only the nearest neighbors of the test element; finally, we assign the label according to the nearest centroid, as schematized in Algorithm 3.

Algorithm 3 Local quantum-inspired nearest mean classifier.
Require: Training set X divided into n classes C_i, unlabeled point x̃
Ensure: Label of x̃
  encode x̃ into a Bloch vector b(x̃) of a pure state
  for x ∈ X do
    encode x into a Bloch vector b(x) of a pure state
  end for
  find the k nearest neighbors to b(x̃)
  for each class C_i represented among the k nearest neighbors, calculate its centroid from those neighbors only
  assign to x̃ the label of the nearest centroid
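A compact sketch of Algorithm 3, under assumptions made for illustration: the pure-state encoding appends a constant component and normalizes, and the plain Euclidean distance between encoded vectors stands in for the quantum distances discussed above.

```python
import numpy as np

def encode(x):
    # assumed pure-state encoding: append 1 and normalize
    v = np.append(np.asarray(x, dtype=float), 1.0)
    return v / np.linalg.norm(v)

def local_nearest_mean(X, y, x_test, k):
    """Sketch of Algorithm 3: encode everything, keep the k nearest
    neighbours of the encoded test point, build one centroid per class
    from those neighbours only, and return the nearest centroid's label
    (no quantum state discrimination involved)."""
    E = [encode(x) for x in X]
    e = encode(x_test)
    nearest = np.argsort([np.linalg.norm(v - e) for v in E])[:k]
    labels = sorted({y[i] for i in nearest})
    centroids = {c: np.mean([E[i] for i in nearest if y[i] == c], axis=0)
                 for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(centroids[c] - e))

X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [0, 0, 1, 1]
```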

Results and Discussion
In this section, we present some numerical results obtained by the implementation of the local quantum-inspired classifiers with several distances, compared to well-known classical algorithms. In particular, we consider the SVM with different kernels: linear, radial basis function and sigmoid. Then, we run a random forest, a naive Bayes classifier and logistic regression. In order to compare the results with previous papers, we take into account the following benchmark datasets from the PMLB public repository [16]: analcatdata_aids, analcatdata_asbestos, analcatdata_bankruptcy, analcatdata_boxing1, analcatdata_cyyoung9302, analcatdata_dmft, analcatdata_happiness, analcatdata_japansolvent, analcatdata_lawsuit, appendicitis, biomed, breast_cancer, iris, labor, new_thyroid, phoneme, prnn_fglass, prnn_synth, tae and wine_recognition. For each dataset we randomly select 80% of the data to create a training set and use the remaining 20% for the evaluation. We repeated the same procedure 10 times and report the average accuracy in Table 1. It is of course possible to compare the performances using different statistical indices, including the Matthews correlation coefficient, the F-measure and Cohen's kappa.
We observe that the local quantum-inspired classifiers turn out to be distinctly more accurate when the hyperparameter k is set equal to the number of classes in the dataset; this value is a reasonable choice for constructing the centroids of the classes. In particular, Algorithm 1 with the Euclidean distance is the most accurate classifier for the datasets analcatdata_boxing1, analcatdata_happiness, biomed, prnn_fglass and wine_recognition, while the Manhattan distance is best for analcatdata_aids, analcatdata_japansolvent, breast_cancer, iris and tae, the Chessboard distance is best for analcatdata_cyyoung9302 and analcatdata_lawsuit, and the Bray-Curtis distance is best for analcatdata_bankruptcy and appendicitis. Algorithm 2 with the Bures distance outperforms Algorithms 1 and 3 for analcatdata_dmft and produces the same accuracy for labor. Algorithm 3 with the Bures distance is the most accurate classifier for analcatdata_asbestos, new_thyroid, phoneme and prnn_synth. Algorithm 1 uses a k-d tree in the training set, while the other two use a k-d tree in the corresponding Bloch vector space. The time complexity to construct the k-d tree is usually O(dn log n), where n is the cardinality of the training set and d the length of each vector, while the space complexity is O(dn). The query to find the k nearest neighbors takes O(k log n). The time complexity of the PGM is O(cd³), and it is O(dm) for the classification of the m elements of the test set into c classes. Our algorithm is more efficient than the one presented in [4] in the presence of multiple copies because it removes nulls and duplicates. In particular, we consider only 20 values instead of the 81 matrix elements of ρ_{(x_1,x_2)} ⊗ ρ_{(x_1,x_2)}, and 51 values instead of 729 for ρ_{(x_1,x_2)} ⊗ ρ_{(x_1,x_2)} ⊗ ρ_{(x_1,x_2)}, and so on. In a future paper, we will analyze in detail the complexity of such algorithms in the average case and in the worst case.
For instance, one can construct the ball tree for clustered data instead of the k-d tree and consider different search techniques.
In Table 2, we show the methods that provided the best accuracy, with the respective execution times, compared with the classical method. These experimental results are promising and show that the methods are efficient when run on classical computers. Algorithm 3 with the Bures distance is not efficient for phoneme, but Algorithm 1 with the Euclidean distance is: it takes 1.951 s with an average accuracy of 0.897. In a future work, we will study how to also apply the local methods in implementations on quantum computers.
Let us focus on multi-class datasets for the comparison with the kNNSVM method proposed by Blanzieri and Melgani [2]. This method requires the choice of the hyperparameter k, and, as is well known from the standard kNN algorithm, there is no general strategy to choose k a priori. In Table 3, the results obtained for some k values of the kNNSVM are shown. For analcatdata_dmft, kNNSVM presents an average accuracy that is only 2% lower than Algorithm 2 but requires 17 elements per test element instead of 6. For analcatdata_happiness, kNNSVM yields an average accuracy that is 10% lower than Algorithm 1 and requires 14 elements per test element instead of 3. However, kNNSVM outperforms the local quantum-inspired classifiers for iris and tae, although only for the latter does it require fewer elements, while for wine_recognition the two are comparable. For new_thyroid and prnn_fglass, the best results are obtained with the nearest neighbor method, but with lower accuracy than Algorithms 1 and 3, respectively.

Conclusions
The present paper focuses on the implementation of classification algorithms based on quantum state discrimination. A novel contribution is the local approach adopted to execute the classifier not over the entire training set but in a neighborhood of the test element: given a test element, its k nearest training elements are encoded into Bloch vectors and used to define the quantum centroid of each class.
The local quantum-inspired classifiers considered, for reasonable values of the hyperparameters, were found to be competitive with classical algorithms for multi-class classification. In our experiments on benchmark datasets, the local quantum-inspired classifiers were even more accurate than SVMs with different kernels, a random forest, a naive Bayes classifier and logistic regression.
The present proposal offers a family of classifiers. In fact, several strategies to impose a notion of locality over a training set, and several procedures of quantum state discrimination, can be applied. Both the local approach to classification and the quantum-inspired data encoding/processing deserve further investigation to clarify the impact of these ideas on machine learning, but the results achieved clearly indicate that both approaches to machine learning are promising.