Article

Quantum-Inspired Applications for Classification Problems

by Cesarino Bertini 1,† and Roberto Leporini 2,*,†
1 Department of Management, University of Bergamo, via dei Caniana 2, I-24127 Bergamo, Italy
2 Department of Economics, University of Bergamo, via dei Caniana 2, I-24127 Bergamo, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2023, 25(3), 404; https://doi.org/10.3390/e25030404
Submission received: 16 January 2023 / Revised: 19 February 2023 / Accepted: 22 February 2023 / Published: 23 February 2023
(This article belongs to the Special Issue Quantum Machine Learning 2022)

Abstract

In the context of quantum-inspired machine learning, quantum state discrimination is a useful tool for classification problems. We implement a local approach combining the k-nearest neighbors algorithm with some quantum-inspired classifiers, and we compare their performance with that of well-known classifiers on benchmark datasets.

1. Introduction

Quantum-inspired machine learning is a new branch of machine learning based on the application of the mathematical formalism of quantum mechanics to devise novel algorithms. It has shown that such algorithms have the potential to provide benefits even without the computational power of quantum computers with many qubits. Some of these binary classifiers have been analyzed from a geometric perspective [1]. In this work, we implement some algorithms, based on quantum state discrimination, within a local approach in the feature space, by taking into account only the elements close to the element to be classified. In particular, we perform multi-class classification directly (without using binary classifiers) based on Helstrom discrimination, following an approach suggested by Blanzieri and Melgani [2], where an unlabeled data instance is classified by finding its k nearest training elements and then running a support vector machine (SVM) over those k elements. This local approach improves the classification accuracy and motivates the integration with the quantum-inspired Helstrom classifier, since the latter can be interpreted as an SVM with a linear kernel [3]. It has the potential to offer comparable performance at lower complexity because it uses only a few training points per test point.
The quantum-inspired classifiers require the encoding of the feature vectors into density operators and methods for estimating the distinguishability of quantum states, such as Helstrom state discrimination and the pretty-good measurement (PGM). Quantum-inspired machine learning has revealed that relevant benefits for machine learning problems can be obtained using quantum information theory even without employing quantum computers [4]. Moreover, as we will show below, the PGM within our algorithms is more efficient than the one proposed in [4] in the case of multiple preparations of the same state, because it removes duplicates and null values in the encoding. Quantum-inspired methods are used in applications that solve industry-relevant problems related to finance, optimization and chemistry [5,6,7,8,9].
In the experimental part, we present a comparison of the performances of the local quantum-inspired classifiers against well-known classical algorithms in order to show that the local approach can be a valuable tool for increasing the performances of this kind of classifier.
In Section 2, we review the notion of quantum encoding of data vectors into density operators and quantum-inspired classification based on quantum state discrimination [10,11,12,13]. In Section 3, we use the k-nearest neighbors algorithm (kNN) as a procedure to restrict the training set to the nearest elements around the test element, enabling the local execution of the quantum-inspired classifiers. In Section 4, we present and discuss some empirical results for evaluating the impact of locality in quantum-inspired classification, comparing the performances of the proposed algorithms to classical methods over benchmark datasets. Furthermore, we compare quantum-inspired classifiers with SVMs within the local approach. Section 5 contains the concluding remarks about the efficiency of local quantum-inspired classifiers.

2. Quantum-Inspired Classification

The first step of quantum-inspired classification is quantum encoding, that is, any procedure to encode classical information into quantum states. In particular, we consider the encoding of data vectors into density matrices on a Hilbert space $\mathcal H$ whose dimension depends on the dimension of the input space. Density matrices are positive semidefinite operators $\rho$ such that $\mathrm{tr}\,\rho = 1$, and they are the mathematical objects used to describe the physical states of quantum systems. Pure states are the density matrices of the form $\rho = |\psi\rangle\langle\psi|$, with $\langle\psi|\psi\rangle = 1$, i.e., the rank-1 projectors, which can be directly identified with unit vectors up to a phase factor. Let $\rho$ be a density operator on a d-dimensional Hilbert space $\mathbb C^d$; it can be written in the following form:
$$\rho = \frac{1}{d}\left(I_d + \sqrt{\tfrac{d(d-1)}{2}}\,\sum_{j=1}^{d^2-1} b_j(\rho)\,\sigma_j\right), \qquad (1)$$
where $\{\sigma_j\}_{j=1,\dots,d^2-1}$ are the standard generators of the special unitary group $SU(d)$, also called generalized Pauli matrices, and $I_d$ is the $d\times d$ identity matrix. The vector $\mathbf b(\rho) = (b_1(\rho),\dots,b_{d^2-1}(\rho))$, with $b_j(\rho) = \sqrt{\tfrac{d}{2(d-1)}}\,\mathrm{tr}(\rho\sigma_j) \in \mathbb R$, is the Bloch vector associated with $\rho$, which lies within the hypersphere of radius 1 in $\mathbb R^{d^2-1}$. For d = 2, the qubit case, the density matrices are in bijective correspondence with the points of the Bloch sphere in $\mathbb R^3$, where the pure states are in one-to-one correspondence with the points of the spherical surface. For d > 2, the points contained in the unit hypersphere of $\mathbb R^{d^2-1}$ are not in bijective correspondence with density matrices on $\mathbb C^d$, so the Bloch vectors do not form a ball but a complicated convex body. However, any vector within the sphere of radius $2/d$ gives rise to a density operator [14].
Complex vectors of dimension n can be encoded into density matrices of an (n+1)-dimensional Hilbert space $\mathcal H$ in the following way:
$$\mathbb C^n \ni \mathbf x \mapsto |\mathbf x'\rangle = \frac{1}{\sqrt{\|\mathbf x\|^2+1}}\left(\sum_{\alpha=0}^{n-1} x_\alpha\,|\alpha\rangle + |n\rangle\right) \in \mathcal H, \qquad (2)$$
where $\{|\alpha\rangle\}_{\alpha=0,\dots,n}$ is the computational basis of $\mathcal H$, identified with the standard basis of $\mathbb C^{n+1}$. The map defined in (2), called amplitude encoding, encodes $\mathbf x$ into the pure state $\rho_{\mathbf x} = |\mathbf x'\rangle\langle\mathbf x'|$, where the additional component of $|\mathbf x'\rangle$ stores the norm of $\mathbf x$. Nevertheless, the quantum encoding $\mathbf x \mapsto \rho_{\mathbf x}$ can be realized in terms of the Bloch vectors $\mathbf x \mapsto \mathbf b(\rho_{\mathbf x})$, saving space resources. The improvement in memory occupation within the Bloch representation is evident when we take multiple tensor products $\rho\otimes\dots\otimes\rho$ of a density matrix $\rho$, constructing a feature map to enlarge the dimension of the representation space [1].
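For concreteness, the following NumPy sketch implements the amplitude encoding (2) and the corresponding rank-1 density matrix; the helper names (amplitude_encode, density_matrix) are ours and do not come from the paper or its repository.

```python
import numpy as np

def amplitude_encode(x):
    """Encode a length-n vector x into the pure state |x'> of Eq. (2):
    an auxiliary component is appended and the result is normalized,
    so the extra entry keeps track of the norm of x."""
    v = np.append(np.asarray(x, dtype=complex), 1.0)
    return v / np.linalg.norm(v)          # factor 1/sqrt(||x||^2 + 1)

def density_matrix(psi):
    """Rank-1 projector rho = |psi><psi| of a normalized state vector."""
    return np.outer(psi, psi.conj())

# Example: a 2-dimensional data point becomes a density matrix on C^3.
psi = amplitude_encode([0.7, -1.2])
rho = density_matrix(psi)
assert np.isclose(np.trace(rho).real, 1.0)
```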
Quantum-inspired classifiers are based on the quantum encoding of data vectors into density matrices, the calculation of centroids and various criteria of quantum state distinguishability, such as Helstrom state discrimination, the pretty-good measurement [4,11] and the geometric construction of a minimum-error measurement [12]. Let us briefly recall the notion of quantum state discrimination. Given a set of arbitrary quantum states with respective a priori probabilities $R = \{(\rho_1,p_1),\dots,(\rho_N,p_N)\}$, in general there is no measurement process that discriminates the states without errors, i.e., a collection $E = \{E_i\}_{i=1,\dots,N}$ of positive semidefinite operators such that $\sum_{i=1}^{N} E_i = I$, satisfying the property $\mathrm{tr}(E_i\rho_j) = 0$ when $i\neq j$ for all $i,j = 1,\dots,N$. The probability of a successful discrimination of the states in R performing the measurement E is:
$$P_E(R) = \sum_{i=1}^{N} p_i\,\mathrm{tr}(E_i\rho_i). \qquad (3)$$
A complete characterization of the optimal measurement $E_{\mathrm{opt}}$ that maximizes the probability (3) for $R = \{(\rho_1,p_1),(\rho_2,p_2)\}$ is due to Helstrom [10]. Let $\Lambda := p_1\rho_1 - p_2\rho_2$ be the Helstrom observable, whose positive and negative eigenvalues are collected in the sets $D_+$ and $D_-$, respectively. Consider the two orthogonal projectors:
$$P_\pm := \sum_{\lambda\in D_\pm} P_\lambda, \qquad (4)$$
where $P_\lambda$ projects onto the eigenspace of $\lambda$. The measurement $E_{\mathrm{opt}} := \{P_+, P_-\}$ maximizes the probability (3), attaining the Helstrom bound:
$$h_b(\rho_1,\rho_2) = p_1\,\mathrm{tr}(P_+\rho_1) + p_2\,\mathrm{tr}(P_-\rho_2). \qquad (5)$$
Helstrom quantum state discrimination can be used to implement a quantum-inspired binary classifier with promising performances. Let $\{(\mathbf x_1,y_1),\dots,(\mathbf x_M,y_M)\}$ be a training set with $\mathbf x_i \in \mathbb C^n$ and $y_i \in \{1,2\}$ for $i = 1,\dots,M$. Encoding the data points into quantum states by means of $\mathbb C^n \ni \mathbf x \mapsto \rho_{\mathbf x} \in \mathcal S(\mathcal H)$, one can construct the quantum centroids $\rho_1$ and $\rho_2$ of the two classes $C_{1,2} = \{\mathbf x_i : y_i = 1,2\}$:
$$\rho_{1,2} = \frac{1}{|C_{1,2}|}\sum_{\mathbf x\in C_{1,2}} \rho_{\mathbf x}. \qquad (6)$$
Let $\{P_+, P_-\}$ be the Helstrom measurement defined by the set $R = \{(\rho_1,p_1),(\rho_2,p_2)\}$, where the probabilities attached to the centroids are $p_{1,2} = \frac{|C_{1,2}|}{|C_1|+|C_2|}$. The Helstrom classifier applies the optimal measurement for the discrimination of the two quantum centroids to assign the label y to a new data instance $\mathbf x$, encoded into the state $\rho_{\mathbf x}$, as follows:
$$y(\mathbf x) = \begin{cases} 1 & \text{if } \mathrm{tr}(P_+\rho_{\mathbf x}) \geq \mathrm{tr}(P_-\rho_{\mathbf x}) \\ 2 & \text{otherwise.} \end{cases} \qquad (7)$$
A strategy to increase the accuracy in classification is the construction of the tensor product of q copies of the quantum centroids, $\rho_{1,2}^{\otimes q}$, enlarging the Hilbert space where data are encoded. The corresponding Helstrom measurement is $\{P_+^{(q)}, P_-^{(q)}\}$ and the Helstrom bound satisfies:
$$h_b\!\left(\rho_1^{\otimes q},\rho_2^{\otimes q}\right) \leq h_b\!\left(\rho_1^{\otimes (q+1)},\rho_2^{\otimes (q+1)}\right) \qquad \forall\, q\in\mathbb N. \qquad (8)$$
Increasing the dimension of the Hilbert space of the quantum encoding, one increases the Helstrom bound, obtaining a more accurate classifier. The corresponding computational cost is evident; however, in the case of real input vectors, the space can be enlarged saving time and space by means of the encoding into Bloch vectors.
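A minimal sketch of the Helstrom classifier built on this construction, assuming the encoding helpers above; the function names are illustrative and not the authors' implementation.

```python
import numpy as np

def helstrom_projectors(rho1, rho2, p1, p2):
    """Projectors P+ and P- onto the positive and negative eigenspaces of the
    Helstrom observable Lambda = p1*rho1 - p2*rho2, as in Eq. (4)."""
    lam, vecs = np.linalg.eigh(p1 * rho1 - p2 * rho2)
    d = rho1.shape[0]
    P_plus = np.zeros((d, d), dtype=complex)
    P_minus = np.zeros((d, d), dtype=complex)
    for k, eigval in enumerate(lam):
        proj = np.outer(vecs[:, k], vecs[:, k].conj())
        if eigval > 0:
            P_plus += proj
        elif eigval < 0:
            P_minus += proj
    return P_plus, P_minus

def helstrom_classify(rho_x, P_plus, P_minus):
    """Decision rule (7): label 1 if tr(P+ rho_x) >= tr(P- rho_x), else label 2."""
    return 1 if np.trace(P_plus @ rho_x).real >= np.trace(P_minus @ rho_x).real else 2
```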
Clearly, defining a quantum encoding is equivalent to selecting a feature map to represent feature vectors in a space of higher dimension. In the case of the considered quantum amplitude encoding $\mathbb R^2 \ni (x_1,x_2) \mapsto \rho_{(x_1,x_2)} \in \mathcal S(\mathbb C^3)$, the nonlinear explicit injective function $\varphi: \mathbb R^2 \to \mathbb R^5$ to encode data into Bloch vectors can be defined as follows:
$$\varphi(x_1,x_2) := \frac{1}{x_1^2+x_2^2+1}\left(2x_1x_2,\; 2x_1,\; 2x_2,\; x_1^2-x_2^2,\; \frac{x_1^2+x_2^2-2}{\sqrt 3}\right). \qquad (9)$$
The mapped feature vectors are points on the surface of a hyper-hemisphere; the centroids of the classes, calculated as the means of these feature vectors, lie inside the hypersphere and can be rescaled to valid Bloch vectors as shown below.
In order to make the classification more accurate, one can increase the dimension of the representation space by providing q copies of the quantum states, in terms of a tensor product, encoding data instances and centroids into density matrices $\rho^{\otimes q}$. The Bloch encoding allows an efficient implementation of such feature maps: by removing null and repeated entries from the Bloch vector, one obtains a compact injective function for data encoding. Therefore, the Bloch representation allows an efficient storage of the redundant elements of the density matrices $\rho^{\otimes q}$.
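The feature map (9) is immediate to implement; the sketch below (with the hypothetical name phi) returns the five non-null, non-repeated Bloch components, all sharing a common normalization so that the mapped points lie on a hypersphere.

```python
import numpy as np

def phi(x1, x2):
    """Explicit feature map (9) for a 2-dimensional real data point."""
    s = x1**2 + x2**2 + 1.0
    return np.array([2*x1*x2, 2*x1, 2*x2,
                     x1**2 - x2**2,
                     (x1**2 + x2**2 - 2.0) / np.sqrt(3.0)]) / s
```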
Let us consider a training set divided into the classes $C_1,\dots,C_M$ and assume that every training point $\mathbf x$ is encoded into the Bloch vector $\mathbf b(\mathbf x)$ of a pure state on $\mathbb C^d$. The calculation of the centroid of the class $C_i$ within this quantum encoding must take into account that the mean of the Bloch vectors, $\mathbf b^{(i)} := \frac{1}{|C_i|}\sum_{\mathbf x\in C_i}\mathbf b(\mathbf x)$, does not represent a density operator in general. In fact, for d > 2 the points contained in the unit hypersphere of $\mathbb R^{d^2-1}$ are not in bijective correspondence with density matrices on $\mathbb C^d$. However, since any vector within the closed ball of radius $2/d$ gives rise to a density operator, a centroid can be defined in terms of a meaningful Bloch vector by a rescaling:
$$\hat{\mathbf b}^{(i)} := \frac{2}{d\,|C_i|}\sum_{\mathbf x\in C_i}\mathbf b(\mathbf x). \qquad (10)$$
A method of quantum state discrimination for distinguishing more than two states $\{(\rho_1,p_1),\dots,(\rho_N,p_N)\}$ is the square-root measurement, also known as the pretty-good measurement, defined by:
$$E_i = p_i\,\rho^{-\frac12}\rho_i\,\rho^{-\frac12}, \qquad (11)$$
where $\rho = \sum_i p_i\rho_i$; the PGM is the optimal minimum-error measurement when the states satisfy certain symmetry properties [11]. Clearly, to distinguish between n centroids we need a measurement with at most n outcomes. It is sometimes optimal to avoid the measurement and simply guess that the state is the a priori most likely one.
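A sketch of the pretty-good measurement (11) and of the associated classification rule; the pseudo-inverse square root is a guard we add for a possibly singular average state, and the function names are ours.

```python
import numpy as np

def pretty_good_measurement(rhos, probs):
    """POVM elements E_i = p_i * rho^(-1/2) rho_i rho^(-1/2) of Eq. (11),
    with rho = sum_i p_i rho_i (pseudo-inverse square root if rho is singular)."""
    rho = sum(p * r for p, r in zip(probs, rhos))
    w, v = np.linalg.eigh(rho)
    d = rho.shape[0]
    inv_sqrt = np.zeros((d, d), dtype=complex)
    for k, eigval in enumerate(w):
        if eigval > 1e-12:
            inv_sqrt += np.outer(v[:, k], v[:, k].conj()) / np.sqrt(eigval)
    return [p * inv_sqrt @ r @ inv_sqrt for p, r in zip(probs, rhos)]

def pgm_classify(rho_x, povm):
    """Assign the class whose POVM element gives the largest probability tr(E_i rho_x)."""
    return int(np.argmax([np.trace(E @ rho_x).real for E in povm])) + 1
```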
The optimal POVM $\{E_i\}_i$ for minimum-error state discrimination over $R = \{(\rho_1,p_1),\dots,(\rho_N,p_N)\}$ satisfies the following necessary and sufficient Helstrom conditions [12]:
$$\Gamma - p_i\rho_i \geq 0, \qquad i = 1,\dots,N, \qquad (12)$$
where the Hermitian operator $\Gamma$, also known as the Lagrange operator, is defined by $\Gamma := \sum_i p_i\rho_i E_i$. It is also useful to consider the following properties, which can be obtained from the above conditions:
$$E_j\,(p_j\rho_j - p_i\rho_i)\,E_i = 0 \qquad \forall\, i,j. \qquad (13)$$
For each i, the operator $\Gamma - p_i\rho_i$ can have two, one or no zero eigenvalues, corresponding to the zero operator, a rank-one operator and a positive-definite operator, respectively. In the first case, we use the measurement $\{E_i = I,\ E_j = 0 \text{ for } j\neq i\}$ for some i such that $p_i \geq p_j$ for all j, i.e., the state belongs to the a priori most likely class. In the second case, if $E_i \neq 0$, it is a weighted projector onto the corresponding eigenstate. In the latter case, it follows that $E_i = 0$ for every optimal measurement.
Given the following Bloch representations:
$$\Gamma = \frac{1}{d}\left(a\,I_d + \sqrt{\tfrac{d(d-1)}{2}}\sum_{j=1}^{d^2-1} b_j\,\sigma_j\right), \qquad \rho_i = \frac{1}{d}\left(I_d + \sqrt{\tfrac{d(d-1)}{2}}\sum_{j=1}^{d^2-1} b_j^{(i)}\,\sigma_j\right), \qquad (14)$$
in order to determine the Lagrange operator in $\mathbb C^d$ we need $d^2$ independent linear constraints:
$$2 p_i\left(a - \hat{\mathbf b}^{(i)}\cdot\mathbf b\right) - p_i^2\left(1 - |\hat{\mathbf b}^{(i)}|^2\right) = a^2 - |\mathbf b|^2. \qquad (15)$$
A measurement with more than $d^2$ outcomes can always be decomposed as a probabilistic mixture of measurements with at most $d^2$ outcomes. Therefore, if the number of classes is greater than or equal to $d^2$ and we obtain $d^2$ linearly independent equations, we can construct the Lagrange operator and derive the optimal measurements. From the geometric point of view, we obtain the unit vectors corresponding to the rank-1 projectors $E_i = \frac{1}{d}\left(I_d + \sqrt{\tfrac{d(d-1)}{2}}\sum_{j=1}^{d^2-1} n_j^{(i)}\sigma_j\right)$, where $\mathbf n^{(i)} = \frac{\hat{\mathbf b}^{(i)} - a\,\mathbf b}{|\hat{\mathbf b}^{(i)} - a\,\mathbf b|} \in \mathbb R^{d^2-1}$, giving the POVM of the measurement. It is also possible to further partition the classes in order to increase the number of centroids and of the corresponding equations. The classification is carried out as follows: an unlabeled point $\hat{\mathbf x}$ is associated with the first label y such that $\mathbf b(\hat{\mathbf x})\cdot\mathbf n^{(y)} = \max_i\,\mathbf b(\hat{\mathbf x})\cdot\mathbf n^{(i)}$, where $d = \mathrm{length}(\mathbf x)+2$.
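The resulting decision rule is just a maximization of inner products; a minimal sketch, assuming the unit vectors n^(i) of the rank-1 POVM elements have already been obtained from the Lagrange operator as described above:

```python
import numpy as np

def geometric_classify(b_test, unit_vectors):
    """Assign the first label y maximizing b(x_hat) . n^(y), given the Bloch vector
    of the test point and the list of unit vectors n^(i), one per class."""
    scores = [float(np.dot(b_test, n)) for n in unit_vectors]
    return int(np.argmax(scores)) + 1  # np.argmax returns the first maximizer
```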

3. Local Quantum-Inspired Classifiers

In the implementation, we consider the execution of the classifiers described above after a selection of the k training elements that are closest to a considered unclassified instance.
The k-nearest neighbors algorithm (kNN) is a simple classification algorithm which consists of the following steps:
  • The computation of the chosen distance metric between the test element and the training elements;
  • The extraction of the k elements closest to the test instance;
  • The assignment of the class label through a majority voting based on the labels of the k nearest neighbors.
In the following, we apply the kNN for the extraction of the closest elements to the test element; the classification is then performed by a quantum-inspired algorithm instead of majority voting. On the one hand, given a test element, the kNN can be executed over the data vectors in the input space, e.g., considering the Euclidean distance; then the k neighbors can be encoded into density matrices and used for a quantum-inspired classification. On the other hand, the entire dataset can be encoded into density matrices and the kNN selects the k neighbors by evaluating an operator distance among quantum states. In the latter case, we consider the Bures distance, which is a quantum generalization of the Fisher information metric, and a distance derived from the super-fidelity. The Bures distance is defined by:
$$d_B(\rho_1,\rho_2) = \sqrt{2\left(1-\sqrt{F(\rho_1,\rho_2)}\right)}, \qquad (16)$$
where the fidelity between density operators is given by $F(\rho_1,\rho_2) = \left(\mathrm{tr}\sqrt{\sqrt{\rho_1}\,\rho_2\,\sqrt{\rho_1}}\right)^2$. Let us note that the fidelity reduces to $F(\rho_1,\rho_2) = \langle\psi_1|\rho_2|\psi_1\rangle$ when $\rho_1 = |\psi_1\rangle\langle\psi_1|$. Therefore, the Bures distance between the pure state $\rho_1$ and an arbitrary state $\rho_2$ can be expressed in terms of the Bloch representation as follows:
$$d_B(\rho_1,\rho_2) = \sqrt{2\left(1-\sqrt{\tfrac{1}{d}\left(1+(d-1)\,\mathbf b^{(1)}\cdot\mathbf b^{(2)}\right)}\right)} =: D_B\!\left(\mathbf b^{(1)},\mathbf b^{(2)}\right), \qquad (17)$$
where $\mathbf b^{(1)}$ and $\mathbf b^{(2)}$ are the Bloch vectors of $\rho_1$ and $\rho_2$, respectively, and d is the dimension of the Hilbert space of the quantum encoding. The special form (17) of the Bures distance, expressed in terms of Bloch vectors, is relevant for our purpose because data vectors are encoded into pure states while, in general, the quantum centroids are mixed states.
An alternative distance can be defined via the super-fidelity [15]:
$$d_G(\rho_1,\rho_2) = \sqrt{1-G(\rho_1,\rho_2)}, \qquad (18)$$
where the super-fidelity between density operators is given by
$$G(\rho_1,\rho_2) = \mathrm{tr}(\rho_1\rho_2) + \sqrt{\left(1-\mathrm{tr}\,\rho_1^2\right)\left(1-\mathrm{tr}\,\rho_2^2\right)}. \qquad (19)$$
Notice that the super-fidelity also reduces to $G(\rho_1,\rho_2) = \langle\psi_1|\rho_2|\psi_1\rangle$ when $\rho_1 = |\psi_1\rangle\langle\psi_1|$. This distance can be expressed in terms of the Bloch representation as follows:
$$D_G\!\left(\mathbf b^{(1)},\mathbf b^{(2)}\right) = \sqrt{1-\tfrac{1}{d}\left(1+(d-1)\left(\mathbf b^{(1)}\cdot\mathbf b^{(2)} + \sqrt{\left(1-|\mathbf b^{(1)}|^2\right)\left(1-|\mathbf b^{(2)}|^2\right)}\right)\right)}, \qquad (20)$$
where $\mathbf b^{(1)}$ and $\mathbf b^{(2)}$ are the Bloch vectors of $\rho_1$ and $\rho_2$, respectively, and d is the dimension of the Hilbert space of the quantum encoding. The inner distance between the corresponding Bloch vectors is the angle $\theta$ between the unit vectors $\left(\mathbf b^{(1)}, \sqrt{1-|\mathbf b^{(1)}|^2}\right)$ and $\left(\mathbf b^{(2)}, \sqrt{1-|\mathbf b^{(2)}|^2}\right)$, normalized so that its maximum value is 1:
$$\hat D_G\!\left(\mathbf b^{(1)},\mathbf b^{(2)}\right) = \frac{1}{\pi}\arccos\!\left(\mathbf b^{(1)}\cdot\mathbf b^{(2)} + \sqrt{\left(1-|\mathbf b^{(1)}|^2\right)\left(1-|\mathbf b^{(2)}|^2\right)}\right). \qquad (21)$$
For pure states the inner distance corresponds to the Fubini-Study distance.
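A sketch of the three Bloch-vector distances (17), (20) and (21); the first argument of the Bures form is assumed to correspond to a pure state, and the clipping only protects against round-off.

```python
import numpy as np

def bures_distance(b1, b2, d):
    """Bures distance (17) between a pure state with Bloch vector b1 and an
    arbitrary state with Bloch vector b2 on a d-dimensional Hilbert space."""
    fid = (1.0 + (d - 1.0) * np.dot(b1, b2)) / d
    return np.sqrt(2.0 * (1.0 - np.sqrt(max(fid, 0.0))))

def superfidelity_distance(b1, b2, d):
    """Distance (20) derived from the super-fidelity, in the Bloch representation."""
    g = np.dot(b1, b2) + np.sqrt(max(1.0 - np.dot(b1, b1), 0.0) *
                                 max(1.0 - np.dot(b2, b2), 0.0))
    return np.sqrt(max(1.0 - (1.0 + (d - 1.0) * g) / d, 0.0))

def inner_distance(b1, b2):
    """Normalized angular distance (21) between the lifted unit vectors."""
    g = np.dot(b1, b2) + np.sqrt(max(1.0 - np.dot(b1, b1), 0.0) *
                                 max(1.0 - np.dot(b2, b2), 0.0))
    return np.arccos(np.clip(g, -1.0, 1.0)) / np.pi
```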
In Algorithm 1, the locality is imposed by running the kNN on the input space to find the training vectors that are closest to the test element; then the data are quantum-encoded into pure states and a quantum-inspired classifier (Helstrom, PGM or geometric Helstrom) is locally executed over the restricted training set. In Algorithm 2, the test element and all the training elements are encoded into Bloch vectors of pure states; then a kNN is run w.r.t. the Bures distance to find the nearest neighbors in the space of the quantum representation, and a quantum-inspired classifier is executed with the training instances corresponding to the closest quantum states.
Algorithm 1 Local quantum-inspired classification based on kNN in the input space before the quantum encoding. The distance can be: Euclidean, Manhattan, Chessboard, Canberra or Bray–Curtis.
Require: Dataset X of labeled instances, unlabeled point x ^
Ensure: Label of x ^
     find the k nearest neighbors x_1, ..., x_k to x̂ in X w.r.t. the chosen distance (e.g., Euclidean)
     encode x̂ into a pure state ρ_x̂
     for j = 1, ..., k do
         encode x_j into a pure state ρ_{x_j}
     end for
     run the quantum-inspired classifier with the training points encoded into {ρ_{x_j}}_{j=1,...,k}.
Algorithm 2 Local quantum-inspired classification based on kNN in the Bloch representation after quantum encoding. The distance can be: Bures, Super-Fidelity or Inner.
Require: Dataset X of labeled instances, unlabeled point x ^
Ensure: Label of x ^
     encode x̂ into the Bloch vector b(x̂) of a pure state
     for x ∈ X do
           encode x into the Bloch vector b(x) of a pure state
     end for
     find the k nearest neighbors to b(x̂) in {b(x)}_{x∈X} w.r.t. the chosen distance (e.g., D_B)
     run the quantum-inspired classifier over the k nearest neighbors.
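The following is a schematic implementation of Algorithms 1 and 2 (our own sketch, not the authors' code); encode, classify and distance are placeholders for the quantum encoding, the quantum-inspired classifier and the operator distance, respectively.

```python
import numpy as np

def local_quantum_classify(X_train, y_train, x_test, k, encode, classify, distance=None):
    """Sketch of Algorithms 1 and 2. `encode` maps a data vector to its quantum
    representation (pure state or Bloch vector); `classify(enc_test, enc_neigh, labels)`
    is any quantum-inspired classifier built on the k selected neighbors.
    With distance=None the neighbors are found in the input space with the Euclidean
    distance (Algorithm 1); otherwise `distance` is evaluated on the encodings
    (Algorithm 2, e.g. the Bures distance D_B)."""
    if distance is None:
        dists = [np.linalg.norm(np.asarray(x) - np.asarray(x_test)) for x in X_train]
    else:
        enc_all = [encode(x) for x in X_train]
        enc_t = encode(x_test)
        dists = [distance(enc_t, e) for e in enc_all]
    idx = np.argsort(dists)[:k]                     # indices of the k nearest elements
    enc_neigh = [encode(X_train[i]) for i in idx]   # local quantum-encoded training set
    labels = [y_train[i] for i in idx]
    return classify(encode(x_test), enc_neigh, labels)
```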
A local quantum-inspired classifier can also be defined without quantum state discrimination, by considering a nearest-mean classification: after the quantum encoding, we perform a kNN selection and calculate the centroid of each class considering only the nearest neighbors of the test element; finally, we assign the label according to the nearest centroid, as schematized in Algorithm 3.
Algorithm 3 Local quantum-inspired nearest mean classifier.
Require: Training set X divided into n classes C i , unlabeled point x ^
Ensure: Label of x ^
    encode x̂ into the Bloch vector b(x̂) of a pure state
    for x ∈ X do
            encode x into the Bloch vector b(x) of a pure state
    end for
    find the neighborhood K = {b(x_1), ..., b(x_k)} of b(x̂) w.r.t. the distance D_B
    for i = 1, ..., n do
            construct the centroid b̂^(i) = (2 / (d |C_i^k|)) Σ_{x ∈ C_i^k} b(x), where C_i^k := {x ∈ C_i : b(x) ∈ K}
    end for
    find the closest centroid b̂^(l) to (2/d) b(x̂) w.r.t. the distance D_B
    return the label of the class C_l
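A sketch of Algorithm 3 in NumPy (again our own illustration): dist is any two-argument Bloch-vector distance, for instance a wrapper around the Bures form (17), and classes with no representative among the k neighbors are simply skipped.

```python
import numpy as np

def local_nearest_mean(B_train, y_train, b_test, k, d, dist):
    """Sketch of Algorithm 3: B_train are the Bloch vectors of the training set,
    b_test the Bloch vector of the unlabeled point, d the Hilbert-space dimension,
    dist a Bloch-vector distance. The factor 2/d rescales the local centroids
    towards the set of valid Bloch vectors."""
    idx = np.argsort([dist(b_test, b) for b in B_train])[:k]
    labels = sorted({y_train[i] for i in idx})
    centroids = {}
    for lab in labels:
        members = [B_train[i] for i in idx if y_train[i] == lab]
        centroids[lab] = (2.0 / d) * np.mean(members, axis=0)
    target = (2.0 / d) * np.asarray(b_test)
    return min(labels, key=lambda lab: dist(target, centroids[lab]))
```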

4. Results and Discussion

In this section, we present some numerical results obtained by the implementation of the local quantum-inspired classifiers with several distances, compared to well-known classical algorithms. In particular, we consider the SVM with different kernels: linear, radial basis function, polynomial and sigmoid. Then, we run a random forest, a naive Bayes classifier and logistic regression. In order to compare the results with previous papers, we take into account the following benchmark datasets from the PMLB public repository [16]: analcatdata_aids, analcatdata_asbestos, analcatdata_bankruptcy, analcatdata_boxing1, analcatdata_cyyoung9302, analcatdata_dmft, analcatdata_happiness, analcatdata_japansolvent, analcatdata_lawsuit, appendicitis, biomed, breast_cancer, iris, labor, new_thyroid, phoneme, prnn_fglass, prnn_synth, tae and wine_recognition. For each dataset, we randomly select 80% of the data to create a training set and use the remaining 20% for the evaluation. We repeat the same procedure 10 times and report the average accuracy in Table 1. Of course, the performances could also be compared using different statistical indices, including the Matthews correlation coefficient, the F-measure and Cohen's kappa.
We observe that the local quantum-inspired classifiers, with the hyperparameter k set equal to the number of classes in the dataset, turn out to be clearly more accurate; this value of k is a reasonable choice for constructing the centroids of the classes. In particular, Algorithm 1 with the Euclidean distance is the most accurate classifier for the datasets analcatdata_boxing1, analcatdata_happiness, biomed, prnn_fglass and wine_recognition, while the Manhattan distance is best for analcatdata_aids, analcatdata_japansolvent, breast_cancer, iris and tae, the Chessboard distance is best for analcatdata_cyyoung9302 and analcatdata_lawsuit, and the Bray–Curtis distance is best for analcatdata_bankruptcy and appendicitis. Algorithm 2 with the Bures distance outperforms Algorithms 1 and 3 for analcatdata_dmft and produces the same accuracy for labor. Algorithm 3 with the Bures distance is the most accurate classifier for analcatdata_asbestos, new_thyroid, phoneme and prnn_synth. Algorithm 1 uses a k-d tree built over the training set in the input space, while the other two use a k-d tree in the corresponding Bloch vector space. The time complexity to construct the k-d tree is usually O(d n log n), where n is the cardinality of the training set and d the length of each vector, while the space complexity is O(d n). A query to find the k nearest neighbors takes O(k log n). The time complexity of the PGM is O(c d^3), and the classification of the m elements of the test set into c classes takes O(d m). Our algorithm is more efficient than the one presented in [4] in the presence of multiple copies because it removes nulls and duplicates: in particular, we consider only 20 values instead of the 81 matrix elements of ρ_{(x_1,x_2)} ⊗ ρ_{(x_1,x_2)}, 51 values instead of 729 for ρ_{(x_1,x_2)} ⊗ ρ_{(x_1,x_2)} ⊗ ρ_{(x_1,x_2)}, and so on. In a future paper, we will analyze in detail the complexity of such algorithms in the average case and in the worst case; for instance, one can construct a ball tree for clustered data instead of the k-d tree and consider different search techniques.
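For illustration, the neighbor search can be realized with an off-the-shelf k-d tree such as the one provided by scikit-learn; this is an assumed setup shown for exposition, not necessarily the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Build the tree once over the training vectors (or over their Bloch encodings),
# then query each test element for its k nearest neighbors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))        # e.g. Bloch-encoded training vectors
tree = KDTree(X_train)                     # construction: O(d n log n) time, O(d n) space
dist, ind = tree.query(rng.normal(size=(1, 5)), k=3)   # query: O(k log n) per element
```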
In Table 2, we show the methods that provided the best accuracy, with the respective execution times, compared with the best classical method. These experimental results are promising and show that the methods are efficient when run on classical computers. Algorithm 3 with the Bures distance is not efficient for phoneme, but Algorithm 1 with the Euclidean distance is: 1.951 s with an average accuracy of 0.897. In future work, we will study how to also apply the local methods in implementations on quantum computers.
Let us focus on the multi-class datasets for the comparison with the kNNSVM method proposed by Blanzieri and Melgani [2]. This method requires the choice of the hyperparameter k and, as is well known from the standard kNN algorithm, there is no general strategy to choose k a priori. In Table 3, the results obtained for several values of k are shown. For analcatdata_dmft, kNNSVM achieves an average accuracy that is only 2% lower than that of Algorithm 2 but requires 17 elements per test element instead of 6. For analcatdata_happiness, kNNSVM yields an average accuracy that is 10% lower than that of Algorithm 1 and requires 14 elements per test element instead of 3. However, kNNSVM outperforms the local quantum-inspired classifiers for iris and tae, although only for the latter does it require fewer elements, while for wine_recognition the methods are comparable. For new_thyroid and prnn_fglass, the best kNNSVM results are obtained with the nearest neighbor method (1NNSVM), but with lower accuracy than Algorithms 3 and 1, respectively.

5. Conclusions

The present paper focuses on the implementation of classification algorithms based on quantum state discrimination. A novel contribution is the local approach adopted to execute the classifier, not over the entire training set, but in a neighborhood of the test element. Once the training set is partitioned into classes, the k nearest data elements are encoded into Bloch vectors and used to define the quantum centroid of each class.
The local quantum-inspired classifiers considered, for reasonable values of the hyperparameters, were found to offer performance comparable to classical algorithms for multi-class classification. We performed some experiments using benchmark datasets and found that the local quantum-inspired classifiers were even more accurate than SVMs with different kernels, a random forest, a naive Bayes classifier and logistic regression.
The present proposal offers a family of classifiers. In fact, several strategies to impose a notion of locality over a training set, and several procedures of quantum state discrimination, can be applied. Both the local approach to classification and the quantum-inspired data encoding/processing deserve further investigation to clarify the impact of these ideas on machine learning, but the results achieved clearly indicate that both approaches to machine learning are promising.

Author Contributions

Conceptualization, R.L.; methodology, C.B.; software, R.L.; validation, C.B.; formal analysis, R.L.; writing—original draft preparation, R.L.; writing—review and editing, R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code is also available at the following repository: https://github.com/leporini/classification (accessed on 21 February 2023). Penn Machine Learning Benchmarks https://github.com/EpistasisLab/pmlb (accessed on 21 February 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SVM    Support vector machine
PGM    Pretty-good measurement
NN     Nearest neighbors

References

  1. Leporini, R.; Pastorello, D. An efficient geometric approach to quantum-inspired classifications. Sci. Rep. 2022, 12, 8781.
  2. Blanzieri, E.; Melgani, F. An Adaptive SVM Nearest Neighbor Classifier for Remotely Sensed Imagery. In Proceedings of the IEEE International Conference on Geoscience and Remote Sensing Symposium, Denver, CO, USA, 31 July–4 August 2006; pp. 3931–3934.
  3. Leporini, R.; Pastorello, D. Support Vector Machines with Quantum State Discrimination. Quantum Rep. 2021, 3, 482–499.
  4. Giuntini, R.; Holik, F.; Park, D.K.; Freytes, H.; Blank, C.; Sergioli, G. Quantum-inspired algorithm for direct multi-class classification. Appl. Soft Comput. 2023, 134, 109956.
  5. Montiel, O.; Rubio, Y.; Olvera, C.; Rivera, A. Quantum inspired acromyrmex evolutionary algorithm. Sci. Rep. 2019, 9, 12181.
  6. Mugel, S.; Kuchkovsky, C.; Sánchez, E.; Fernández-Lorenzo, S.; Luis-Hita, J.; Lizaso, E.; Orús, R. Dynamic portfolio optimization with real datasets using quantum processors and quantum-inspired tensor networks. Phys. Rev. Res. 2022, 4, 013006.
  7. Mugel, S.; Lizaso, E.; Orús, R. Use cases of quantum optimization for finance. In Credible Asset Allocation, Optimal Transport Methods, and Related Topics; Sriboonchitta, S., Kreinovich, V., Yamaka, W., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 211–220.
  8. Chen, J.; Cheng, H.; Freericks, J.K. Quantum-inspired algorithm for the factorized form of unitary coupled cluster theory. J. Chem. Theory Comput. 2021, 17, 841–847.
  9. Alvarez-Alvarado, M.S.; Alban-Chacón, F.E.; Lamilla-Rubio, E.A.; Rodríguez-Gallegos, C.D.; Velásquez, W. Three novel quantum-inspired swarm optimization algorithms using different bounded potential fields. Sci. Rep. 2021, 11, 11655.
  10. Helstrom, C.W. Quantum detection and estimation theory. J. Stat. Phys. 1969, 1, 231–252.
  11. Mochon, C. Family of generalized pretty good measurements and the minimal-error pure-state discrimination problems for which they are optimal. Phys. Rev. A 2006, 73, 032328.
  12. Bae, J. Structure of minimum-error quantum state discrimination. New J. Phys. 2013, 15, 073037.
  13. Bae, J.; Kwek, L.C. Quantum state discrimination and its applications. J. Phys. A 2015, 48, 083001.
  14. Kimura, G.; Kossakowski, A. The Bloch-Vector Space for N-Level Systems: The Spherical-Coordinate Point of View. Open Syst. Inf. Dyn. 2005, 12, 207–229.
  15. Miszczak, J.A.; Puchała, Z.; Horodecki, P.; Uhlmann, A.; Życzkowski, K. Sub- and super-fidelity as bounds for quantum fidelity. Quantum Inf. Comput. 2009, 9, 103–130.
  16. Romano, J.D.; Le, T.T.; La Cava, W.; Gregg, J.T.; Goldberg, D.J.; Chakraborty, P.; Ray, N.L.; Himmelstein, D.; Fu, W.; Moore, J.H. PMLB v1.0: An open source dataset collection for benchmarking machine learning methods. arXiv 2020, arXiv:2012.00058.
Table 1. Classification comparison, in terms of test average accuracy for 10 runs, for benchmark datasets.
Dataset/Method | Euclidean | Manhattan | Chessboard | Canberra | Bray–Curtis | Bures | Nearest Mean
analcatdata_aids 0.6 0.62 0.53 0.55 0.51 0.46 0.46
analcatdata_asbestos 0.794 0.794 0.794 0.782 0.794 0.794 0.8
analcatdata_bankruptcy 0.87 0.87 0.83 0.85 0.88 0.87 0.87
analcatdata_boxing1 0.725 0.721 0.696 0.688 0.721 0.688 0.646
analcatdata_cyyoung9302 0.811 0.794 0.844 0.789 0.811 0.817 0.833
analcatdata_dmft 0.201 0.199 0.198 0.191 0.202 0.209 0.18
analcatdata_happiness 0.475 0.458 0.383 0.433 0.475 0.45 0.325
analcatdata_japansolvent 0.79 0.82 0.8 0.74 0.81 0.77 0.74
analcatdata_lawsuit 0.979 0.974 0.981 0.968 0.974 0.972 0.976
appendicitis 0.866 0.811 0.867 0.876 0.886 0.838 0.857
biomed 0.917 0.91 0.9 0.898 0.91 0.91 0.91
breast_cancer 0.698 0.707 0.696 0.674 0.695 0.691 0.704
iris 0.943 0.957 0.95 0.947 0.947 0.943 0.937
labor 0.927 0.927 0.836 0.909 0.936 0.936 0.936
new_thyroid 0.974 0.981 0.96 0.956 0.977 0.97 0.984
phoneme 0.897 0.897 0.895 0.893 0.897 0.898 0.903
prnn_fglass 0.746 0.693 0.7 0.685 0.671 0.72 0.724
prnn_synth 0.874 0.864 0.896 0.856 0.878 0.894 0.9
tae 0.58 0.583 0.573 0.543 0.577 0.58 0.56
wine_recognition 0.989 0.983 0.975 0.967 0.972 0.981 0.981
Dataset/Method | Linear | Radial Basis Function | Polynomial | Sigmoid | Random Forest | Naive Bayes | Logistic Regression
analcatdata_aids 0.45 0.45 0.49 0.45 0.42 0.49 0.56
analcatdata_asbestos 0.7 0.724 0.694 0.735 0.694 0.735 0.724
analcatdata_bankruptcy 0.79 0.79 0.81 0.7 0.78 0.77 0.81
analcatdata_boxing1 0.621 0.654 0.65 0.629 0.713 0.658 0.654
analcatdata_cyyoung9302 0.778 0.756 0.767 0.7 0.778 0.75 0.817
analcatdata_dmft 0.175 0.175 0.17 0.168 0.18 0.182 0.191
analcatdata_happiness 0.35 0.3 0.325 0.283 0.358 0.35 0.442
analcatdata_japansolvent 0.77 0.77 0.76 0.72 0.75 0.8 0.75
analcatdata_lawsuit 0.955 0.953 0.955 0.904 0.955 0.945 0.964
appendicitis 0.852 0.857 0.842 0.743 0.867 0.848 0.867
biomed 0.857 0.857 0.864 0.838 0.857 0.881 0.857
breast_cancer 0.656 0.656 0.668 0.651 0.675 0.672 0.66
iris 0.94 0.927 0.93 0.87 0.933 0.923 0.937
labor 0.782 0.854 0.827 0.636 0.827 0.855 0.845
new_thyroid 0.963 0.958 0.947 0.958 0.949 0.947 0.96
phoneme 0.75 0.821 0.829 0.746 0.883 0.766 0.745
prnn_fglass 0.541 0.51 0.559 0.483 0.666 0.629 0.568
prnn_synth 0.85 0.85 0.844 0.848 0.846 0.844 0.86
tae 0.446 0.403 0.43 0.273 0.45 0.477 0.457
wine_recognition 0.978 0.978 0.978 0.947 0.975 0.975 0.983
Table 2. Classification comparison, in terms of test average accuracy for 10 runs, for benchmark datasets. For quantum-inspired methods we show the average execution time in seconds on an Intel i7-9750H CPU @ 2.60GHz 6-core with 32 GB RAM, and for classical methods the training time and evaluation time per test set element.
Dataset | Method | Accuracy | Execution Time | Method | Accuracy | Training Time | Evaluation Time
analcatdata_aids | Manhattan | 0.62 | 0.013 | LogisticRegression | 0.56 | 0.942 | 0.0018
analcatdata_asbestos | NearestMean | 0.8 | 0.018 | Sigmoid | 0.735 | 1.244 | 0.0015
analcatdata_bankruptcy | BrayCurtis | 0.88 | 0.019 | Polynomial | 0.81 | 0.893 | 0.0015
analcatdata_boxing1 | Euclidean | 0.725 | 0.021 | RandomForest | 0.713 | 0.351 | 0.0054
analcatdata_cyyoung9302 | Chessboard | 0.844 | 0.074 | LogisticRegression | 0.817 | 1.201 | 0.0014
analcatdata_dmft | Bures | 0.209 | 0.013 | LogisticRegression | 0.191 | 3.103 | 0.0018
analcatdata_happiness | Euclidean | 0.475 | 0.021 | LogisticRegression | 0.442 | 1.416 | 0.0024
analcatdata_japansolvent | Manhattan | 0.82 | 0.06 | NaiveBayes | 0.8 | 0.636 | 0.0045
analcatdata_lawsuit | Chessboard | 0.981 | 0.074 | LogisticRegression | 0.964 | 2.243 | 0.0022
appendicitis | BrayCurtis | 0.886 | 0.094 | RandomForest | 0.867 | 0.657 | 0.0087
biomed | Euclidean | 0.917 | 0.194 | NaiveBayes | 0.881 | 0.682 | 0.0045
breast_cancer | Manhattan | 0.707 | 0.339 | RandomForest | 0.675 | 0.832 | 0.009
iris | Manhattan | 0.957 | 0.029 | Linear | 0.94 | 0.904 | 0.0015
labor | BrayCurtis | 0.936 | 0.013 | NaiveBayes | 0.855 | 0.318 | 0.0038
new_thyroid | NearestMean | 0.984 | 0.099 | Linear | 0.963 | 1.452 | 0.0015
phoneme | NearestMean | 0.903 | 39.204 | RandomForest | 0.883 | 2.157 | 0.0093
prnn_fglass | Euclidean | 0.746 | 0.378 | RandomForest | 0.666 | 1.646 | 0.01
prnn_synth | NearestMean | 0.9 | 0.12 | LogisticRegression | 0.86 | 2.323 | 0.0024
tae | Manhattan | 0.583 | 0.148 | NaiveBayes | 0.477 | 3.801 | 0.0065
wine_recognition | Euclidean | 0.989 | 0.394 | LogisticRegression | 0.983 | 2.426 | 0.003
Table 3. kNNSVM comparison, in terms of test average accuracy for 10 runs, for benchmark multi-class datasets.
Dataset/Method | 1NNSVM | 2NNSVM | 3NNSVM | 4NNSVM | 5NNSVM | 6NNSVM | 7NNSVM | 8NNSVM | 9NNSVM
analcatdata_dmft 0.186 0.198 0.198 0.15 0.177 0.187 0.187 0.187 0.196
analcatdata_happiness 0.242 0.317 0.325 0.375 0.383 0.383 0.375 0.375 0.35
iris 0.937 0.943 0.943 0.95 0.943 0.957 0.937 0.95 0.96
new_thyroid 0.974 0.965 0.958 0.96 0.958 0.965 0.949 0.951 0.96
prnn_fglass 0.72 0.71 0.658 0.678 0.683 0.678 0.666 0.673 0.676
tae 0.57 0.593 0.557 0.55 0.553 0.543 0.53 0.53 0.493
wine_recognition 0.986 0.981 0.986 0.989 0.989 0.989 0.983 0.983 0.986
Dataset/Method | 10NNSVM | 11NNSVM | 12NNSVM | 13NNSVM | 14NNSVM | 15NNSVM | 16NNSVM | 17NNSVM | 18NNSVM
analcatdata_dmft 0.197 0.196 0.2 0.202 0.193 0.185 0.197 0.204 0.204
analcatdata_happiness 0.358 0.4 0.317 0.408 0.433 0.425 0.35 0.375 0.4
iris 0.95 0.95 0.95 0.957 0.95 0.953 0.943 0.937 0.943
new_thyroid 0.951 0.937 0.963 0.944 0.944 0.949 0.96 0.953 0.955
prnn_fglass 0.676 0.641 0.685 0.659 0.663 0.654 0.693 0.678 0.685
tae 0.49 0.433 0.52 0.48 0.43 0.427 0.48 0.467 0.487
wine_recognition 0.983 0.972 0.983 0.972 0.964 0.967 0.989 0.969 0.975
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
