1. Introduction
Quantum-inspired algorithms exploit some of the advantages of quantum computing on classical hardware, providing new models of information storage and processing. In this respect, the quantum formalism, based on linear operators on Hilbert spaces, is used as a rich mathematical machinery without the need for an underlying quantum system that physically realizes the computations. In particular, quantum-inspired machine learning is the field where such algorithms are developed to accomplish machine learning tasks.
In the context of supervised learning, a general classification problem is defined as the assignment of labels to new data instances, given a training set of already labeled data. For example, a label can be assigned to a new data instance on the basis of its distance to the training data within a metric space where the data are represented. In many relevant cases, data instances are represented in a real vector space, called the feature space, equipped with the Euclidean distance. Classification with quantum computers is a widely investigated topic (e.g., [1,2,3]), but the quantum-inspired paradigm can also be applied. Some quantum-inspired classification algorithms based on a geometric approach have recently been presented in [4] and compared with well-known classical methods.
This paper is devoted to the investigation of some quantum-inspired classification algorithms, based on the notion of pretty-good measurement, within a local approach. More precisely, we consider a Voronoi-type tessellation of the dataset, proposed in [5], to classify an unlabeled instance without considering the entire dataset but, instead, only a neighborhood of the test point. Therefore, the present proposal goes beyond the quantum-inspired classifiers studied in [4]. Here, we integrate the geometric approach to classification based on quantum discrimination with a local strategy that has been successfully applied in [5], but which has also been suggested as a promising path to improve classification in less recent proposals, such as [6]. However, the notion of locality that we address is uniquely related to the selection of a neighborhood of the test point in the feature space and has nothing to do with the notion of quantum non-locality, violation of Bell inequalities, or local realism via hidden-variable formulations of quantum mechanics.
An ensemble of classifiers is a set of classifiers whose combined decisions improve on the performance of any individual member. These classifiers are trained on subsets of the original training set and integrated to achieve more accurate classification. Methods for constructing ensemble learning algorithms include boosting and bagging. The former learns sequentially: in each iteration, it assigns a higher weight to the observations misclassified by its predecessor. In the latter, different sample subsets are randomly drawn from the training dataset, and each subset is used to train a basic learning model in parallel. The global decision is obtained by voting. Ensemble learning has been successfully used in diverse applications, such as text classification, speech recognition, sentiment analysis, protein-folding recognition, and streamflow forecasting [7].
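As a minimal illustration of the bagging scheme just described, the following sketch (all function names are ours, not from the cited literature) trains nearest-centroid base learners on bootstrap samples and combines their decisions by majority voting:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_nearest_centroid(X, y):
    """Base learner: store the mean of each class."""
    labels = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict_nearest_centroid(model, X):
    labels, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]

def bagging_predict(X_train, y_train, X_test, n_models=25):
    """Bagging: bootstrap resampling plus majority vote over base learners."""
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap sample
        model = fit_nearest_centroid(X_train[idx], y_train[idx])
        votes.append(predict_nearest_centroid(model, X_test))
    votes = np.array(votes)  # shape (n_models, n_test)
    # majority vote per test point
    return np.array([np.bincount(col).argmax() for col in votes.T])

# two well-separated 2D blobs as toy data
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
pred = bagging_predict(X, y, np.array([[0.1, 0.0], [2.9, 3.1]]))
```

The base learner is deliberately weak; bagging trades the variance of the individual learners for the stability of the vote.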
In our approach, the ensemble is built using a clustering method that generates a Voronoi diagram, which splits the space into regions, each defined by its representative point. The corresponding quantum-inspired classifiers are based on the encoding of the feature vectors into density operators and on techniques for estimating the distinguishability of quantum states, such as the pretty-good measurement. The classification accuracy of these quantum-inspired classifiers can be improved by increasing, in terms of tensor products, the number of preparations of the quantum states that encode the feature vectors, at the cost of increased computational space and time.
In Section 2, we recall the quantum encoding of data vectors into density operators and the quantum-inspired classification based on the well-known quantum state discrimination developed by Helstrom [8] and others [9]. Section 3 focuses on an encoding of feature vectors into Bloch vectors that scales efficiently as the dimension of the feature space increases. In Section 4, we introduce the local algorithm as a procedure to restrict the training set to the nearest points around each representative point of the tessellation, enabling the local and parallel execution of the quantum-inspired classifiers. We present an application to the Iris dataset for evaluating the impact of locality in quantum-inspired classification, comparing the performance of the proposed algorithm with classical methods. In Section 5, concluding remarks are presented, and future developments towards innovative techniques in machine learning are discussed.
2. Quantum-Inspired Classification
The first step of quantum-inspired classification is quantum encoding, that is, any procedure to encode classical information into quantum states. In this paper, we consider encodings of data vectors into density matrices on a Hilbert space \(\mathcal H\) whose dimension depends on the dimension of the input space. Density matrices are the mathematical objects used to describe the physical states of quantum systems. A density matrix on \(\mathcal H\) is a positive semidefinite operator \(\rho\) such that \(\mathrm{Tr}(\rho)=1\). Pure states are all the density matrices of the form \(\rho=|\psi\rangle\langle\psi|\), with \(\|\,|\psi\rangle\,\|=1\), which are the rank-1 projectors that can be directly identified with unit vectors up to a phase factor. Let \(\rho\) be a density operator on a d-dimensional Hilbert space \(\mathcal H\).
\(\rho\) can be written in the following form:
\[
\rho=\frac{1}{d}\left(I_d+\sqrt{\tfrac{d(d-1)}{2}}\,\sum_{j=1}^{d^2-1}b_j\sigma_j\right),\tag{1}
\]
where \(\sigma_1,\dots,\sigma_{d^2-1}\) are the standard generators of the special unitary group \(SU(d)\), also called generalized Pauli matrices, and \(I_d\) is the identity matrix of size d. The vector \(\mathbf b=(b_1,\dots,b_{d^2-1})\), with \(b_j=\sqrt{\tfrac{d}{2(d-1)}}\,\mathrm{Tr}(\rho\sigma_j)\), is the Bloch vector associated to \(\rho\), which lies within the hypersphere of radius 1 in \(\mathbb R^{d^2-1}\). For \(d=2\), the qubit case, the density matrices are in bijective correspondence with the points of the unit ball in \(\mathbb R^3\), where the pure states are in one-to-one correspondence with the points of the surface of the Bloch sphere. For \(d>2\), the points contained in the unit hypersphere of \(\mathbb R^{d^2-1}\) are not in bijective correspondence with density matrices on \(\mathcal H\), as in the case of a single qubit, so the Bloch vectors do not form a ball but a complicated convex body. However, any vector within the closed ball of radius \(\frac{1}{d-1}\) gives rise to a density operator [10]. One can apply the Bloch representation of density matrices as an efficient quantum encoding, as discussed in Section 3.
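As a concrete companion to the definitions above, the following sketch (helper names are ours) constructs the generalized Pauli matrices for arbitrary d and extracts the Bloch vector in the normalization assumed here, in which pure states lie on the unit hypersphere:

```python
import numpy as np

def generalized_pauli(d):
    """Standard generators of SU(d): symmetric, antisymmetric, and diagonal
    traceless Hermitian matrices with Tr(sigma_i sigma_j) = 2 delta_ij
    (the Gell-Mann construction)."""
    mats = []
    for j in range(d):
        for k in range(j + 1, d):
            s = np.zeros((d, d), dtype=complex)
            s[j, k] = s[k, j] = 1.0
            a = np.zeros((d, d), dtype=complex)
            a[j, k] = -1j
            a[k, j] = 1j
            mats.extend([s, a])
    for l in range(1, d):
        diag = np.zeros(d)
        diag[:l] = 1.0
        diag[l] = -float(l)
        mats.append(np.sqrt(2.0 / (l * (l + 1))) * np.diag(diag).astype(complex))
    return mats  # d^2 - 1 matrices

def bloch_vector(rho):
    """Bloch vector of rho, normalized so that pure states have norm 1."""
    d = rho.shape[0]
    c = np.sqrt(d / (2.0 * (d - 1)))
    return np.array([c * np.trace(rho @ s).real for s in generalized_pauli(d)])

# a pure state on C^3 lies on the unit hypersphere in R^8
psi = np.array([1.0, 2.0, -1.0], dtype=complex)
psi /= np.linalg.norm(psi)
b_pure = bloch_vector(np.outer(psi, psi.conj()))

# the maximally mixed state sits at the origin
b_mixed = bloch_vector(np.eye(3) / 3.0)
```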
Complex vectors of dimension n can be encoded into density matrices of a \((n+1)\)-dimensional Hilbert space \(\mathcal H\) in the following way:
\[
x\mapsto\rho_x=|x\rangle\langle x|,\qquad |x\rangle=\frac{1}{\sqrt{\|x\|^2+1}}\left(\sum_{i=1}^{n}x_i|e_i\rangle+|e_{n+1}\rangle\right),\tag{2}
\]
where \(\{|e_i\rangle\}_{i=1,\dots,n+1}\) is the computational basis of \(\mathcal H\), identified with the standard basis of \(\mathbb C^{n+1}\). The map defined in (2), called amplitude encoding, encodes \(x\) into the pure state \(\rho_x\), where the additional component of \(|x\rangle\) stores the norm of \(x\). Nevertheless, the quantum encoding \(x\mapsto\rho_x\) can be realized in terms of the Bloch vectors \(\mathbf b_{\rho_x}\), saving space resources. The improvement in memory occupation within the Bloch representation is evident when we take multiple tensor products \(\rho_x^{\otimes k}\) of a density matrix \(\rho_x\), constructing a feature map to enlarge the dimension of the representation space [4].
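The amplitude encoding just described can be sketched in a few lines; the function name is ours:

```python
import numpy as np

def amplitude_encode(x):
    """Amplitude encoding of x in R^n into a pure state on C^(n+1):
    the extra amplitude stores the norm of x, which makes the map injective."""
    x = np.asarray(x, dtype=float)
    psi = np.append(x, 1.0) / np.sqrt(x @ x + 1.0)
    return np.outer(psi, psi)  # rho_x = |x><x|

rho = amplitude_encode([3.0, 4.0])
# the original vector can be recovered from the last column of rho_x
```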
Quantum-inspired classifiers are based on quantum encoding of data vectors into density matrices, calculations of centroids, and various criteria of quantum state distinguishability, such as the Helstrom state discrimination [11], the pretty-good measurement [12], and the geometric construction of a minimum-error measurement [13].
Let us briefly recall the notion of quantum state discrimination. Given a set \(R=\{(\rho_1,p_1),\dots,(\rho_n,p_n)\}\) of arbitrary quantum states with respective a priori probabilities, in general, there is no measurement process that discriminates the states without errors. More formally, there does not exist a POVM, i.e., a collection \(E=\{E_1,\dots,E_n\}\) of positive semidefinite operators such that \(\sum_{i=1}^{n}E_i=I\), satisfying the following property:
\[
\mathrm{Tr}(E_i\rho_j)=0\quad\text{when }i\neq j,
\]
for all \(i,j=1,\dots,n\). The probability of a successful state discrimination of the states in R performing the measurement E is:
\[
P_s(E)=\sum_{i=1}^{n}p_i\,\mathrm{Tr}(E_i\rho_i).\tag{3}
\]
An interesting and useful task is finding the optimal measurement that maximizes the probability (3). Helstrom provided a complete characterization of the optimal measurement for \(n=2\) [8].
The optimal measurement can be constructed as follows: Let \(\Lambda=p_1\rho_1-p_2\rho_2\) be the Helstrom observable, whose positive and negative eigenvalues are, respectively, collected in the sets \(D_+\) and \(D_-\). Consider the two orthogonal projectors:
\[
P_+=\sum_{\lambda\in D_+}P_\lambda,\qquad P_-=\sum_{\lambda\in D_-}P_\lambda,\tag{4}
\]
where \(P_\lambda\) projects onto the eigenspace of \(\lambda\). The measurement \(\{P_+,P_-\}\) maximizes the probability (3), which attains the Helstrom bound \(h_b=p_1\,\mathrm{Tr}(P_+\rho_1)+p_2\,\mathrm{Tr}(P_-\rho_2)=\frac{1}{2}\left(1+\|p_1\rho_1-p_2\rho_2\|_1\right)\), where \(\|\cdot\|_1\) denotes the trace norm.
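A minimal numerical sketch of the Helstrom measurement (names are ours) diagonalizes the Helstrom observable and evaluates the bound; the kernel of the observable is assigned to the second outcome, which does not affect the bound:

```python
import numpy as np

def helstrom_measurement(rho1, rho2, p1, p2):
    """Projectors onto the positive and non-positive eigenspaces of the
    Helstrom observable Lambda = p1*rho1 - p2*rho2, and the Helstrom bound."""
    lam = p1 * rho1 - p2 * rho2
    w, v = np.linalg.eigh(lam)
    pos = v[:, w > 0]
    P_plus = pos @ pos.conj().T
    P_minus = np.eye(lam.shape[0]) - P_plus  # kernel assigned to the '-' outcome
    hb = (p1 * np.trace(P_plus @ rho1) + p2 * np.trace(P_minus @ rho2)).real
    return P_plus, P_minus, hb

# two orthogonal pure states can be discriminated perfectly (bound = 1)
rho1 = np.diag([1.0, 0.0]).astype(complex)
rho2 = np.diag([0.0, 1.0]).astype(complex)
_, _, hb = helstrom_measurement(rho1, rho2, 0.5, 0.5)
```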
Helstrom quantum state discrimination can be used to implement a quantum-inspired binary classifier with promising performances [11]. Let \(\{(x_j,y_j)\}_{j=1,\dots,M}\) be a training set with \(y_j\in\{1,2\}\). Once a quantum encoding \(x\mapsto\rho_x\) has been selected, one can construct the quantum centroids
\[
\rho_1=\frac{1}{M_1}\sum_{j:\,y_j=1}\rho_{x_j}\qquad\text{and}\qquad\rho_2=\frac{1}{M_2}\sum_{j:\,y_j=2}\rho_{x_j}
\]
of the two classes, where \(M_1\) and \(M_2\) are the class cardinalities. Let \(\{P_+,P_-\}\) be the Helstrom measurement defined by the set \(\{(\rho_1,p_1),(\rho_2,p_2)\}\), where the probabilities attached to the centroids are \(p_i=M_i/M\). The Helstrom classifier applies the optimal measurement for the discrimination of the two quantum centroids to assign the label y to a new data instance \(x\), encoded into the state \(\rho_x\), as follows:
\[
y=\begin{cases}1 & \text{if }\mathrm{Tr}(P_+\rho_x)\geq\mathrm{Tr}(P_-\rho_x),\\[2pt] 2 & \text{otherwise.}\end{cases}\tag{5}
\]
A strategy to increase the accuracy in classification is given by the construction of the tensor product of k copies of the quantum centroids, \(\rho_1^{\otimes k}\) and \(\rho_2^{\otimes k}\), enlarging the Hilbert space where data are encoded. The corresponding Helstrom measurement is the one defined by the observable \(\Lambda_k=p_1\rho_1^{\otimes k}-p_2\rho_2^{\otimes k}\), and the Helstrom bound satisfies [11]:
\[
h_b\big(p_1\rho_1^{\otimes k},\,p_2\rho_2^{\otimes k}\big)\;\leq\;h_b\big(p_1\rho_1^{\otimes k+1},\,p_2\rho_2^{\otimes k+1}\big).\tag{6}
\]
By enlarging the Hilbert space of the quantum encoding, one increases the Helstrom bound, obtaining a more accurate classifier. Since the Helstrom classifier is similar to a support vector machine (SVM) with linear kernel [14], considering many copies of the encoding quantum states gives rise to a kernel trick. The corresponding computational cost is evident, but the space can be enlarged while saving time and memory by means of the encoding into Bloch vectors, because fewer significant features are used.
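The monotonicity of the Helstrom bound under tensor copies can be checked numerically; the sketch below (names are ours) evaluates the bound for k = 1, 2, 3 copies of two non-orthogonal pure states using the trace-norm form of the bound:

```python
import numpy as np

def helstrom_bound(rho1, rho2, p1, p2):
    """h_b = (1 + ||p1*rho1 - p2*rho2||_1) / 2 via the eigenvalues of Lambda."""
    w = np.linalg.eigvalsh(p1 * rho1 - p2 * rho2)
    return 0.5 * (1.0 + np.abs(w).sum())

def copies(rho, k):
    """k-fold tensor power of rho."""
    out = np.array([[1.0 + 0j]])
    for _ in range(k):
        out = np.kron(out, rho)
    return out

# two non-orthogonal pure states on C^2
v1 = np.array([1.0, 0.0], dtype=complex)
v2 = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)
r1, r2 = np.outer(v1, v1.conj()), np.outer(v2, v2.conj())

# the bound grows strictly with the number of copies (non-zero overlap keeps it below 1)
bounds = [helstrom_bound(copies(r1, k), copies(r2, k), 0.5, 0.5) for k in (1, 2, 3)]
```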
A popular method of quantum state discrimination for distinguishing more than two states is the square-root measurement, also known as pretty-good measurement, defined by:
\[
E_i=\rho^{-1/2}\,p_i\rho_i\,\rho^{-1/2},\qquad i=1,\dots,n,\tag{7}
\]
where \(\rho=\sum_{j=1}^{n}p_j\rho_j\) and the inverse is taken on the support of \(\rho\). This method gives the optimal minimum-error measurement when the states satisfy certain symmetry properties [12]. Clearly, to distinguish between n centroids, we need a measurement with at most n outcomes. It is sometimes optimal to avoid measurement and simply guess that the state is the a priori most likely state.
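A minimal sketch of the pretty-good measurement (names are ours), using a pseudo-inverse square root on the support of the mixture; for the symmetric "trine" qubit ensemble used in the example, the pretty-good measurement is known to be the optimal minimum-error measurement:

```python
import numpy as np

def pretty_good_measurement(states, probs):
    """Square-root measurement E_i = rho^{-1/2} p_i rho_i rho^{-1/2},
    with the inverse square root taken on the support of rho."""
    rho = sum(p * r for p, r in zip(probs, states))
    w, v = np.linalg.eigh(rho)
    inv_sqrt_w = np.array([1.0 / np.sqrt(x) if x > 1e-12 else 0.0 for x in w])
    s = v @ np.diag(inv_sqrt_w) @ v.conj().T
    return [s @ (p * r) @ s for p, r in zip(probs, states)]

def success_probability(states, probs, povm):
    return sum(p * np.trace(E @ r).real for p, r, E in zip(probs, states, povm))

# three symmetric qubit states (the "trine" ensemble) with equal priors
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
trine = [np.outer(v, v.conj()) for v in
         (np.array([np.cos(a / 2), np.sin(a / 2)], dtype=complex) for a in angles)]
probs = [1 / 3] * 3
povm = pretty_good_measurement(trine, probs)
p_succ = success_probability(trine, probs, povm)  # known optimum: 2/3
```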
3. Bloch Representation
In quantum-inspired machine learning, the encoding of data instances into Bloch vectors of density operators turns out to be a useful geometric tool to reduce memory consumption in defining feature maps into higher dimensional spaces. Within the quantum encoding (2), a real vector \(x\in\mathbb R^n\) is encoded in a projector operator \(\rho_x\) on a d-dimensional Hilbert space, where \(d=n+1\). For simplicity, we consider an input vector \(x=(x_1,x_2)\in\mathbb R^2\) and the corresponding projector operator \(\rho_x\) on \(\mathbb C^3\). By easy computations, one can see that the Bloch vector of \(\rho_x\) has null components:
\[
\mathbf b_{\rho_x}=(b_1,0,b_3,b_4,0,b_6,0,b_8),\tag{8}
\]
that is, the components relative to the antisymmetric Gell-Mann matrices \(\sigma_2,\sigma_5,\sigma_7\) vanish, since \(\rho_x\) is a real symmetric matrix. Instead of using a matrix with nine real elements to represent \(\rho_x\), memory occupation can be improved by considering only the non-zero components of the Bloch vector. In general, the technique of removing the components that are zero, or that are repeated several times, reduces the space and the computation time, retaining only the significant values needed to carry out the classification. More precisely, the encoding of a real data vector \(x\) into the amplitudes of a pure state \(\rho_x\) enables an encoding of data into the corresponding Bloch vector \(\mathbf b_{\rho_x}\). As observed in the example given by (8), the Bloch vector can be reduced to a vector of lower dimension by removing the zero components. In higher dimensions, a reduced Bloch representation can be obtained by also removing the redundant components, as discussed below. In this sense, representing real vectors as Bloch vectors, by means of the amplitude encoding, provides an efficient quantum encoding.
In general, the definition of a quantum encoding is equivalent to the selection of a feature map for representing data vectors into a space of higher dimension. In this sense, data representation into quantum states can be considered a way to perform kernel tricks. In the case of the considered quantum encoding \(x\mapsto\rho_x\), in view of (8), the non-linear explicit injective function \(\varphi:\mathbb R^2\to\mathbb R^5\) to encode data into reduced Bloch vectors can be defined as follows:
\[
\varphi(x)=\frac{1}{2(\|x\|^2+1)}\Big(2\sqrt3\,x_1x_2,\;\sqrt3\,(x_1^2-x_2^2),\;2\sqrt3\,x_1,\;2\sqrt3\,x_2,\;\|x\|^2-2\Big).\tag{9}
\]
From a geometric point of view, the mapped feature vectors are points on the surface of a hyper-hemisphere, since \(\|\varphi(x)\|=1\) for every \(x\in\mathbb R^2\). Within this representation, the centroid of a class \(C\) of training points can be calculated as the mean of the mapped vectors:
\[
\mathbf b_C=\frac{1}{|C|}\sum_{x\in C}\varphi(x).\tag{10}
\]
In general, such centroids are points inside the hypersphere that do not have an inverse image in terms of a density operator; however, they can be rescaled to Bloch vectors within the closed ball of radius \(\frac{1}{d-1}\).
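The reduced encoding of planar vectors and the centroid contraction can be sketched as follows; the explicit rescaling rule for the contracted centroid is our assumption, one simple choice among the possible contractions:

```python
import numpy as np

def phi(x):
    """Reduced Bloch encoding of x in R^2: the five non-zero Bloch
    components of the pure state rho_x; ||phi(x)|| = 1 for every x."""
    x1, x2 = x
    s = x1 * x1 + x2 * x2
    r3 = np.sqrt(3.0)
    return np.array([2 * r3 * x1 * x2,
                     r3 * (x1 * x1 - x2 * x2),
                     2 * r3 * x1,
                     2 * r3 * x2,
                     s - 2.0]) / (2.0 * (s + 1.0))

def contracted_centroid(points, d=3):
    """Mean of the reduced Bloch vectors, rescaled (when necessary) into the
    closed ball of radius 1/(d-1); the rescaling rule is our assumption."""
    b = np.mean([phi(p) for p in points], axis=0)
    r = 1.0 / (d - 1)
    n = np.linalg.norm(b)
    return b if n <= r else (r / n) * b

b = contracted_centroid([(0.2, 1.1), (-0.4, 0.9), (0.1, 1.3)])
```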
To improve the accuracy of the classification, one can increase the dimension of the representation space by providing k copies of the quantum states, in terms of a tensor product, encoding data instances and centroids into “redundant” density matrices \(\rho_x^{\otimes k}\). According to the quantum formalism, multiple copies of the states are described in a tensor product Hilbert space, whose dimension scales exponentially, with a strong impact in terms of computational space (from dimension \(d\) to \(d^k\)) and time. However, following the geometric approach, we can take advantage of the Bloch representation of real data to deal with tensor products in an efficient way.
Let us consider the quantum encoding \(x\mapsto\rho_x\) on \(\mathbb C^3\) introduced above. Assuming that we need to represent the data vectors in a higher dimensional space, we define the new encoding \(x\mapsto\rho_x\otimes\rho_x\), taking two copies of the density matrix \(\rho_x\). Following the same argument applied to reduce the Bloch vector (8), we can remove the zero entries and the repeated entries from the Bloch vector of \(\rho_x\otimes\rho_x\) to store and process the density matrix as a real vector of dimension 20 instead of a matrix of 81 elements (because there are 36 zero components and 24 duplicate values). Thus, an explicit function \(\varphi_2:\mathbb R^2\to\mathbb R^{20}\) for two copies of the density operators on \(\mathbb C^3\) can be defined by listing the 20 independent Bloch components. Similarly, we can store only 51 values instead of 729 for three copies (because there are 351 zero components and 326 duplicate values), and so on. To clarify the application of the efficient Bloch representation in the general case, let us stress that the positions of the entries that can be removed are known a priori. However, one must also take into account high-precision numbers and track the propagation of the numerical error.
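The counts above can be verified numerically by encoding a generic real vector, taking two copies, and inspecting the Bloch vector of the product state (helper names are ours):

```python
import numpy as np

def generalized_pauli(d):
    """Standard generators of SU(d) (Gell-Mann construction)."""
    mats = []
    for j in range(d):
        for k in range(j + 1, d):
            s = np.zeros((d, d), dtype=complex); s[j, k] = s[k, j] = 1.0
            a = np.zeros((d, d), dtype=complex); a[j, k] = -1j; a[k, j] = 1j
            mats.extend([s, a])
    for l in range(1, d):
        diag = np.zeros(d); diag[:l] = 1.0; diag[l] = -float(l)
        mats.append(np.sqrt(2.0 / (l * (l + 1))) * np.diag(diag).astype(complex))
    return mats

def bloch_vector(rho):
    d = rho.shape[0]
    c = np.sqrt(d / (2.0 * (d - 1)))
    return np.array([c * np.trace(rho @ s).real for s in generalized_pauli(d)])

# encode a generic real 2D vector and take two copies
x = np.array([0.7, -1.3])
psi = np.append(x, 1.0) / np.sqrt(x @ x + 1.0)
rho = np.outer(psi, psi)
b = bloch_vector(np.kron(rho, rho))  # Bloch vector in R^80

zeros = int(np.sum(np.abs(b) < 1e-12))                       # expected: 36
unique = len(np.unique(np.round(b[np.abs(b) >= 1e-12], 8)))  # expected: 20
```

The 36 zeros correspond to the antisymmetric generators (the product state is a real symmetric matrix), and the duplicates among the remaining 44 components leave 20 independent values, matching the counts stated in the text.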
The calculation of the centroids for the classification can be related to the quantum encoding in different ways. Considering the amplitude encoding (2) of a d-dimensional real feature vector \(x\) into the pure state \(\rho_x\), the centroids of a class \(C\) of training points can be defined in the following alternative terms:
The quantum centroid \(\rho_C=\frac{1}{|C|}\sum_{x\in C}\rho_x\);
The quantum encoding \(\rho_{\bar x}\) of the classical centroid \(\bar x=\frac{1}{|C|}\sum_{x\in C}x\);
The mean of the Bloch vectors \(\mathbf b_{\rho_x}\), which is not a Bloch vector in general;
The contracted centroid, obtained by rescaling the mean of the Bloch vectors into the closed ball described in Section 2, which is a Bloch vector itself.
In general, we have that \(\rho_C\neq\rho_{\bar x}\), and the mean of the Bloch vectors is not the Bloch vector of \(\rho_C\) or \(\rho_{\bar x}\). In the following, we choose the contracted centroid as the definition of centroid, in order to select the encoding that is less memory consuming and also to represent the centroids as density matrices, so as to perform a meaningful quantum state discrimination.
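A small numerical example (names are ours) illustrates that the quantum centroid and the quantum encoding of the classical centroid differ in general: the former is a mixed state, while the latter is pure:

```python
import numpy as np

def amplitude_encode(x):
    x = np.asarray(x, dtype=float)
    psi = np.append(x, 1.0) / np.sqrt(x @ x + 1.0)
    return np.outer(psi, psi)

C = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]  # a toy class of two points

rho_C = sum(amplitude_encode(x) for x in C) / len(C)  # quantum centroid (mixed)
rho_mean = amplitude_encode(sum(C) / len(C))          # encoding of the classical centroid (pure)

purity_C = np.trace(rho_C @ rho_C).real
purity_mean = np.trace(rho_mean @ rho_mean).real
```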
4. Local Pretty-Good Classifiers
In this section, we introduce the local approach to quantum-inspired classification. More precisely, we consider the execution of the classifiers based on quantum state discrimination described in Section 2 after a selection of the training points. In particular, the local process in the feature space is based on the notion of Voronoi tessellation, which we sketch in the following way: Let \((X,d)\) be a metric space and \(\{G_i\}_{i\in I}\) be a collection of non-empty subsets of X, called generators. The Voronoi cell of the generator \(G_i\) is the set of points in X defined by:
\[
V_i=\{x\in X : d(x,G_i)\leq d(x,G_j)\ \ \forall j\neq i\},\tag{11}
\]
where \(d(x,G)=\inf\{d(x,g):g\in G\}\), and the collection \(\{V_i\}_{i\in I}\) is called a Voronoi tessellation. The Voronoi vertices are the points that are equidistant from three or more generators. In our case, X is a real vector space, d is the Euclidean distance, and the generators are single points.
Following the approach proposed in [5], the training data are partitioned into voxels for every class. For any non-empty voxel of a class, the mean of the contained data points is calculated. The voxel means are the initial tessellation representatives of the classes and are passed through an expectation-maximization process to evenly distribute the generator points over the class instances. Thus, possible degenerate vertices are removed, and the generator points move closer to the centroids of the Voronoi cells. The obtained tessellations are merged into a single tessellation by combining the generator points of every class. In Algorithm 1, line 1 refers to this construction of the Voronoi tessellation over the training set. For any generator point, the k nearest neighbors (kNNs) w.r.t. the Euclidean distance are selected (line 3) and encoded into Bloch vectors using the representation described in Section 3 (line 4). Then the contracted centroid of each class is constructed out of the considered k Bloch vectors (line 5). The corresponding n density operators (one per class) can be used to define the pretty-good measurement, according to (7), obtaining a quantum state discrimination procedure for any cluster of the tessellation (line 6). Given an unlabeled point \(x\), the h nearest generator points in the input space are calculated (line 8), which correspond to as many pretty-good measurements. The unlabeled point is encoded into a pure state \(\rho_x\), according to the amplitude encoding (2). Then, each of the h pretty-good measurements is run to attach \(x\) to the most likely class (line 11). Finally, the algorithm returns the label given by majority voting.
Algorithm 1: Local pretty-good classifier hNNPGM.
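Since the algorithm box itself is not reproduced here, the overall flow can be sketched as follows; all names are ours and, to keep the sketch short, the class centroids are built as quantum centroids of amplitude-encoded states rather than through the reduced Bloch representation and the contraction described above:

```python
import numpy as np
from collections import Counter

def amplitude_encode(x):
    x = np.asarray(x, dtype=float)
    psi = np.append(x, 1.0) / np.sqrt(x @ x + 1.0)
    return np.outer(psi, psi)

def pgm(states, probs):
    """Pretty-good measurement with pseudo-inverse square root."""
    rho = sum(p * r for p, r in zip(probs, states))
    w, v = np.linalg.eigh(rho)
    s = v @ np.diag([1 / np.sqrt(t) if t > 1e-12 else 0.0 for t in w]) @ v.conj().T
    return [s @ (p * r) @ s for p, r in zip(probs, states)]

def hnn_pgm_predict(x, X, y, generators, k=8, h=2):
    """Sketch of the local classifier: for each of the h generators nearest
    to x, build class centroids from the k training points nearest to the
    generator, run the pretty-good measurement, and combine the h votes."""
    labels = np.unique(y)
    g_order = np.argsort(np.linalg.norm(generators - x, axis=1))[:h]
    votes = []
    for g in g_order:
        near = np.argsort(np.linalg.norm(X - generators[g], axis=1))[:k]
        Xl, yl = X[near], y[near]
        present = [c for c in labels if np.any(yl == c)]
        states = [np.mean([amplitude_encode(p) for p in Xl[yl == c]], axis=0)
                  for c in present]
        probs = [np.mean(yl == c) for c in present]
        povm = pgm(states, probs)
        rho_x = amplitude_encode(x)
        votes.append(present[int(np.argmax([np.trace(E @ rho_x).real for E in povm]))])
    return Counter(votes).most_common(1)[0][0]  # majority vote

# toy data: two 2D classes and one generator per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.4, (20, 2)), rng.normal(4, 0.4, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
gens = np.array([[0.0, 0.0], [4.0, 4.0]])
# with only two generators, h=1 avoids trivial vote ties in this toy setup
pred = hnn_pgm_predict(np.array([3.8, 4.2]), X, y, gens, k=40, h=1)
```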
The locality is imposed in a two-fold way: by running the k-nearest neighbors algorithm on the input space, finding the training vectors closest to any generator point of the tessellation, and by selecting the h generator points nearest to the unlabeled point \(x\). Then, the quantum encoding into pure states is performed, and h quantum-inspired classifiers are executed in parallel over the restricted training sets. The considered method is thus a combination of a kNN (and an hNN) algorithm with the quantum-inspired classifier based on the pretty-good measurement. Since the classification relies on the parallel execution of the h pretty-good measurements selected on the basis of the generator points closest to the test instance, we denote this local quantum-inspired classifier by hNNPGM.
The Iris dataset is a well-known multi-class classification dataset that we considered for the experimental implementation of the hNNPGM classifier. The dataset is characterized by four features and three classes, where one class is linearly separable from the other two (for this reason, the SVMs with non-linear kernels do not perform better than the SVM with linear kernel, as reported in Table 1). The dataset was randomly divided into a training set and a test set. The average accuracy is a useful index for comparing our proposal with some popular classical methods. We considered the SVM with different kernels: linear, radial basis function, polynomial, and sigmoid. Then, we ran a random forest with 10 trees and maximum depth 5, a naive Bayes classifier, and the nearest neighbor algorithm. hNNPGM requires the choice of the hyperparameters k and h; as is known from the standard kNN algorithm, there is no general strategy to choose k a priori, and hyperparameter tuning must be performed. In Table 1, the average accuracies of the tested classifiers are shown for the selected values of k and h. The value of h is kept low to limit the number of pretty-good classifiers executed in parallel, and the value of k must be large enough to construct the centroids of the three classes of the Iris dataset for any generator point of the tessellation. From the experiments, we observed that the proposed local quantum-inspired classifier performed well against the classical competitors, attaining the best average accuracy.
5. Conclusions
The present paper focuses on some methods of quantum-inspired machine learning, in particular, classification algorithms based on quantum state discrimination. We adopted a geometric formulation in defining quantum encodings of classical data in terms of Bloch vectors of density operators, as in previous work [4], which is a crucial procedure to save computational resources when defining feature maps into high-dimensional spaces by considering multiple copies of the encoding quantum states. A novel contribution of the present paper is the local approach adopted to execute the classifier, not over the entire training set, but in a neighborhood of the test point, after a separate voxelization of the data classes and the subsequent construction of a Voronoi tessellation. Once the training set has been partitioned, for any generator point of the tessellation, the k nearest data points are encoded into Bloch vectors and used to define the quantum centroid of each class as a contracted Bloch vector. Then, a pretty-good measurement is constructed from the obtained centroids for any generator point. Finally, the h generator points nearest to the test data instance are selected, and the instance is classified by h pretty-good classifiers in parallel. The final label is chosen by majority voting. The considered local quantum-inspired classifier, hNNPGM, was implemented for reasonable values of the hyperparameters and found to be a method with performance comparable to well-known classical algorithms for multiclass classification. We performed some experiments using the Iris dataset and found that hNNPGM was even more accurate than SVMs with different kernels, a random forest, a naive Bayes classifier, and the NN classification algorithm. We do not consider the obtained results to be disruptive but, more cautiously, as evidence that local quantum-inspired classifiers are an interesting kind of classification algorithm that can be investigated from the point of view of both foundations and applications.
The present proposal, based on a local approach to quantum-inspired classification, offers a family of classifiers, rather than a single classification algorithm. In fact, several strategies to impose a notion of locality over a training set, and several procedures of quantum state discrimination, can be applied. Both the local approach to classification and the quantum-inspired data encoding/processing deserve careful scientific investigation to clarify the impact of these ideas on machine learning. In a future paper, we will present some numerical results obtained from the implementation of these local quantum-inspired classifiers.