
Quantum Minimum Distance Classifier

Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari 09123, Italy
Entropy 2017, 19(12), 659;
Submission received: 15 October 2017 / Revised: 21 November 2017 / Accepted: 29 November 2017 / Published: 1 December 2017
(This article belongs to the Special Issue Quantum Mechanics: From Foundations to Information Technologies)


We propose a quantum version of the well-known minimum distance classification model called the Nearest Mean Classifier (NMC). We presented our first results in this direction in two previous works. First, a quantum counterpart of the NMC for two-dimensional problems was introduced, named the Quantum Nearest Mean Classifier (QNMC), together with a possible generalization to any number of dimensions. Second, we studied the n-dimensional problem in detail and showed a new encoding of arbitrary n-feature vectors into density operators. In the present paper, we consider another promising encoding, suggested by recent debates on quantum machine learning. Further, we observe a significant property of our quantum classifier: it is not invariant under feature rescaling. This fact, which represents a meaningful difference between the NMC and its quantum version, allows us to introduce a free parameter whose variation provides, in some cases, better classification results for the QNMC. The experimental section is devoted to (i) comparing the NMC and QNMC performance on different datasets and (ii) studying the effects of the non-invariance under uniform rescaling on the QNMC.

1. Introduction

In recent years, we have observed an increasing interest in the use of quantum formalism in non-microscopic domains [1,2,3,4]. The idea is that the powerful predictive properties of quantum mechanics, used for describing the behavior of microscopic phenomena, turn out to be particularly beneficial in non-microscopic domains as well. Indeed, the real power of quantum computing consists in exploiting particular quantum properties in order to implement algorithms that are much more efficient than their classical counterparts. For this purpose, several non-standard applications of the quantum mechanical formalism have been proposed in research fields such as game theory [5], economics [6], cognitive sciences [7] and signal processing [8]. Applications of particular interest for the present paper concern the areas of machine learning and pattern recognition.
Quantum machine learning aims to exploit the advantages of quantum computation to find new solutions to pattern recognition and image understanding problems. Several efforts that exploit quantum information properties to solve pattern recognition problems can be found in [9], while a detailed overview of the application of quantum computing techniques to machine learning is presented in [10].
In this context, there exist different approaches to the use of quantum formalism in pattern recognition and machine learning. We can find, for instance, procedures that exploit quantum properties to gain advantages on a classical computer [11,12,13], and techniques that assume the existence of a quantum computer able to perform all the required operations in an inherently parallel way, taking advantage of quantum mechanical effects to achieve high computational efficiency [14,15,16].
One of the main research directions in this area is the application of quantum information processing methods [17] to classification and clustering problems [18,19].
The use of quantum states for representing patterns has a twofold motivation. First, as already discussed, it permits the exploitation of quantum algorithms to enhance the computational efficiency of the classification procedure. Second, quantum-inspired models can bring benefits to classical problems. Regarding the first motivation, it was proved in [15,16] that the distance between d-dimensional real vectors can be computed in time $O(\log d)$ on a quantum computer, while the same operation on a classical computer is computationally much harder (it scales as $O(d)$). Therefore, a quantum algorithm that classifies patterns based on our encoding could potentially speed up the whole procedure.
Even if techniques offering some kind of computational benefit can be found in the literature [20], the problem of finding a convenient encoding from classical to quantum objects is still an open and interesting matter of debate [9,10]. Here, our contribution consists of constructing a quantum version of a minimum distance classifier that achieves an advantage, in terms of the classification error, with respect to the corresponding classical model. We already proposed this kind of approach in two previous works [21,22], where a “quantum counterpart” of the well-known Nearest Mean Classifier (NMC) was presented.
In both cases, the model is based on two main ingredients: first, an appropriate encoding of arbitrary patterns into density operators; second, a distance between density matrices, representing the quantum counterpart of the Euclidean metric in the “classical” NMC. The two previous works differ as follows: (i) in the first one [21], we tested our quantum classifier on two-dimensional datasets and proposed a purely theoretical generalization to an arbitrary dimension; (ii) in the second one [22], a new encoding of arbitrary n-dimensional patterns into quantum states was proposed and tested on different real-world and artificial two-class datasets. In both cases, we observed a significant improvement of the classification accuracy. In addition, we found that, using the encoding proposed in [22] and for two-dimensional problems only, the classification accuracy of our quantum classifier can be further improved by performing a uniform rescaling of the original dataset.
In this work, we propose a new encoding of arbitrary n-dimensional patterns into quantum objects that preserves information about the norm of the original pattern, and we extend both the theoretical model and the experimental results to multi-class problems. This idea was inspired by recent debates on quantum machine learning [9], according to which it is crucial to avoid loss of information when real vectors are encoded into quantum states. This approach turns out to be very promising in terms of classification performance compared to the NMC. Further, unlike the NMC, our quantum classifier is not invariant under uniform rescaling: the classification error provided by the QNMC changes under feature rescaling. As a consequence, we observe that, for several datasets, the new encoding offers a further advantage that can be gained by exploiting this non-invariance, and, unlike in the previous works, this also holds for n-dimensional problems. Experimental results supporting these observations are presented.
The paper is organized as follows. In Section 2, we describe the classification process and the formal structure of the NMC for multi-class problems. Section 3 is devoted to the definition of a new encoding of real patterns into quantum states. In Section 4, we introduce the quantum version of the NMC, called the Quantum Nearest Mean Classifier (QNMC), based on the new encoding. In Section 5, we show experimental results comparing the NMC and the QNMC, which generally show better performance of our quantum classifier (in terms of error and other meaningful classification parameters). Further, since the QNMC is not invariant under uniform coordinate rescaling (contrary to its classical version), we also show that for some datasets it is possible to benefit from this non-invariance property. Finally, the last section presents conclusions and possible future developments.
The present work is an extended version of the paper presented at the conference Quantum and Beyond 2016, Vaxjo, 13–16 June 2016 [23], significantly enlarged in theoretical discussion, experimental section and bibliography.

2. Minimum Distance Classification

Pattern recognition [24,25] is the machine learning branch whose purpose is to design algorithms able to automatically recognize “objects”.
Here, we deal with supervised learning, whose goal is to infer a map from labeled training objects. Pattern classification, one of the main tasks in this context, consists of assigning input data to different classes.
Each object is uniquely identified by a set of features; in other words, we represent a d-feature object as a d-dimensional vector $x = [x(1), \dots, x(d)] \in X$, where $X \subseteq \mathbb{R}^d$ is a subset of the d-dimensional real space representing the feature space. Consequently, any arbitrary object is represented by a vector $x$ associated with a given class of objects (but, in principle, we do not know which one). Let $Y = \{1, \dots, L\}$ be the class label set. A pattern is represented by a pair $(x, y)$, where $x$ is the feature vector representing an object and $y \in Y$ is the label of the class with which $x$ is associated. A classification procedure aims at assigning (with high accuracy) to any unlabeled object the corresponding label (i.e., the label of the class to which the object belongs) by learning from the set of objects whose class is known. The training set is given by $S_{tr} = \{(x_n, y_n)\}_{n=1}^{N'}$, where $x_n \in X$, $y_n \in Y$ (for $n = 1, \dots, N'$) and $N'$ is the number of patterns belonging to $S_{tr}$. Finally, let $N_l$ be the cardinality of the training set associated with the $l$-th class (for $l = 1, 2, \dots, L$), such that $\sum_{l=1}^{L} N_l = N'$.
We now introduce the well-known Nearest Mean Classifier (NMC) [24], a particular kind of minimum distance classifier widely used in pattern recognition. The strategy consists in computing the distances between an object $x$ (to be classified) and other objects chosen as prototypes of each class (called centroids). The classifier then assigns to $x$ the label of the closest centroid. The NMC algorithm can be summarized as follows:
  • Computation of the centroid (i.e., the sample mean [26]) associated with each class, whose feature vector is given by:
    $$\mu_l = \frac{1}{N_l} \sum_{n=1}^{N_l} x_n, \quad l = 1, 2, \dots, L, \tag{1}$$
    where $l$ is the label of the class;
  • Classification of the object $x$, provided by:
    $$\arg\min_{l = 1, \dots, L} d_E(x, \mu_l), \quad \text{with} \quad d_E(x, \mu_l) = \|x - \mu_l\|_2, \tag{2}$$
    where $d_E$ is the standard Euclidean distance. In this framework, argmin plays the role of the classifier, i.e., a function that assigns to any unlabeled object the corresponding label.
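The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not the implementation used for the experiments: patterns are taken to be `(features, label)` pairs, and the function names and toy data are hypothetical.

```python
import math


def centroids(train):
    """Equation (1): per-class sample mean of the training feature vectors."""
    sums, counts = {}, {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nmc_predict(x, mu):
    """Return the label of the centroid closest to x in Euclidean distance."""
    return min(mu, key=lambda label: euclidean(x, mu[label]))


train = [([0.0, 0.0], 1), ([0.0, 2.0], 1), ([4.0, 4.0], 2), ([6.0, 4.0], 2)]
mu = centroids(train)
print(mu)                           # {1: [0.0, 1.0], 2: [5.0, 4.0]}
print(nmc_predict([1.0, 1.0], mu))  # 1
print(nmc_predict([4.0, 5.0], mu))  # 2
```

Ties between equidistant centroids are resolved by `min` in favor of the first label encountered; the text does not specify a tie-breaking rule.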
In general, a pattern of a given class may be closer to the centroid of another class, depending, for instance, on the specific data distribution; in that case, the algorithm misclassifies it. Hence, for an arbitrary object $x$ belonging to a class that is not known a priori, the output of the classification method with respect to a given class $l$ has the following four possibilities [27]: (i) True Positive (TP): pattern belonging to the $l$-th class and correctly classified as $l$; (ii) True Negative (TN): pattern belonging to a class different from $l$ and correctly classified as not $l$; (iii) False Positive (FP): pattern belonging to a class different from $l$ and incorrectly classified as $l$; (iv) False Negative (FN): pattern belonging to the $l$-th class and incorrectly classified as not $l$.
A classification method is generally evaluated via a standard procedure that consists of dividing the original labeled dataset $S$ of size $N$ into a set $S_{tr}$ of $N'$ training patterns and a set $S_{ts}$ of $(N - N')$ test patterns, i.e., $S = S_{tr} \cup S_{ts}$, where $S_{ts}$ is the test set [24], defined as $S_{ts} = \{(x_n, y_n)\}_{n=N'+1}^{N}$.
As a consequence, we can examine the performance of the classification algorithm by considering the following statistical measures associated with each class $l$, which depend on the quantities listed above:
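  • True Positive Rate (TPR): $\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$;
  • True Negative Rate (TNR): $\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$;
  • False Positive Rate (FPR): $\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = 1 - \mathrm{TNR}$;
  • False Negative Rate (FNR): $\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}} = 1 - \mathrm{TPR}$.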
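For a multi-class problem, the four outcomes are counted one class at a time, treating class $l$ as positive and every other class as negative. A minimal sketch of this one-vs-rest counting and of the rates above, assuming true and predicted labels are given as two equal-length lists (the helper names and data are hypothetical):

```python
def one_vs_rest_counts(y_true, y_pred, label):
    """TP, TN, FP, FN for class `label`, treating every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, tn, fp, fn


def rates(tp, tn, fp, fn):
    """TPR and TNR from the counts; FPR and FNR follow as complements."""
    tpr = tp / (tp + fn)  # fails if class l has no test patterns
    tnr = tn / (tn + fp)
    return {"TPR": tpr, "TNR": tnr, "FPR": 1 - tnr, "FNR": 1 - tpr}


y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 3, 3]
counts = one_vs_rest_counts(y_true, y_pred, label=1)
print(counts)  # (2, 3, 0, 1)
print(rates(*counts))
```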
Further, other standard statistical coefficients [27] used to establish the reliability of a classification algorithm are:
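  • Classification error (E): $E = 1 - \frac{\mathrm{TP}}{N - N'}$;
  • Precision (P): $P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$;
  • Cohen’s Kappa (K): $K = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}$, where
    $$\Pr(a) = \frac{\mathrm{TP} + \mathrm{TN}}{N - N'}, \qquad \Pr(e) = \frac{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN}) + (\mathrm{FP} + \mathrm{TN})(\mathrm{TN} + \mathrm{FN})}{(N - N')^2}.$$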
The classification error represents the percentage of misclassified patterns, the precision is the fraction of patterns assigned to a class that actually belong to it, and Cohen’s Kappa measures the degree of reliability and accuracy of a statistical classification; it can assume values ranging from $-1$ to $+1$. In particular, if $K = +1$ ($K = -1$), we correctly (incorrectly) classify all the test set patterns. Let us note that these statistical coefficients have to be computed for each class. The final value of each statistical coefficient for the classification algorithm is then the weighted sum of the coefficients of each class.
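The remaining coefficients and the final aggregation can be sketched in the same way. The text does not specify the weights of the weighted sum; this sketch assumes that each class is weighted by its share of the test patterns, and it computes E directly as the fraction of misclassified test patterns. All names and data are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, label):
    """TP, TN, FP, FN for class `label` (all other classes count as negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, tn, fp, fn


def precision_and_kappa(tp, tn, fp, fn):
    n = tp + tn + fp + fn  # number of test patterns, N - N'
    pr_a = (tp + tn) / n
    pr_e = ((tp + fp) * (tp + fn) + (fp + tn) * (tn + fn)) / n ** 2
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumption: 0 if nothing is predicted as l
    kappa = (pr_a - pr_e) / (1 - pr_e)  # assumes the test set contains at least two classes
    return precision, kappa


def summary(y_true, y_pred):
    """Weighted P and K (weights = class shares in the test set) plus the error E."""
    total = len(y_true)
    p_avg = k_avg = 0.0
    for label in sorted(set(y_true)):
        prec, kap = precision_and_kappa(*one_vs_rest_counts(y_true, y_pred, label))
        weight = y_true.count(label) / total
        p_avg += weight * prec
        k_avg += weight * kap
    error = sum(t != p for t, p in zip(y_true, y_pred)) / total
    return {"E": error, "P": p_avg, "K": k_avg}


y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 3, 3]
print(summary(y_true, y_pred))
print(summary(y_true, y_true))  # perfect classification: E = 0, P and K equal 1 up to rounding
```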

3. Mapping Real Patterns into Quantum States

As already discussed, the quantum mechanical formalism seems promising in non-standard scenarios, in our case for solving pattern classification tasks. To this end, the first ingredient of our quantum classification model is an appropriate encoding of real patterns into quantum states. Quoting Schuld et al. [9], “in order to use the strengths of quantum mechanics without being confined by classical ideas of data encoding, finding ‘genuinely quantum’ ways of representing and extracting information could become vital for the future of quantum machine learning.”
In general, a d-dimensional feature vector can be encoded into a density operator in different ways [9]. As already mentioned, finding the “best” encoding of real vectors into quantum states (i.e., one outperforming all possible encodings on any dataset) is still an open and intricate problem. This is not surprising: in pattern recognition, too, it is not possible to establish the absolute superiority of a given classification method over the others, because each dataset has unique and specific characteristics (this point is discussed further in the experimental section).
In [21], the proposed encoding was based on the stereographic projection [28]. In particular, it uniquely maps a point $r = (r_1, r_2, r_3)$ on the surface of the unit sphere $S^2$ (except for the north pole) into a point $x = [x(1), x(2)]$ in $\mathbb{R}^2$, i.e.,
$$SP: (r_1, r_2, r_3) \mapsto \left[\frac{r_1}{1 - r_3}, \frac{r_2}{1 - r_3}\right] = [x(1), x(2)], \tag{3}$$
whose image plane passes through the center of the sphere. The inverse of the stereographic projection is:
$$SP^{-1}: [x(1), x(2)] \mapsto \left(\frac{2x(1)}{\|x\|^2 + 1}, \frac{2x(2)}{\|x\|^2 + 1}, \frac{\|x\|^2 - 1}{\|x\|^2 + 1}\right) = (r_1, r_2, r_3), \tag{4}$$
where $\|x\|^2 = [x(1)]^2 + [x(2)]^2$. By imposing $r_1 = \frac{2x(1)}{\|x\|^2 + 1}$, $r_2 = \frac{2x(2)}{\|x\|^2 + 1}$, $r_3 = \frac{\|x\|^2 - 1}{\|x\|^2 + 1}$, we consider $r_1, r_2, r_3$ as the Pauli components of the density operator $\rho_x \in \Omega_2$ (where the space $\Omega_d$ of density operators for d-dimensional systems consists of positive semidefinite matrices with unit trace) associated with the pattern $x = [x(1), x(2)]$, defined as:
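$$\rho_x = \frac{1}{2}\begin{pmatrix} 1 + r_3 & r_1 - i r_2 \\ r_1 + i r_2 & 1 - r_3 \end{pmatrix} = \frac{1}{\|x\|^2 + 1}\begin{pmatrix} \|x\|^2 & x(1) - i x(2) \\ x(1) + i x(2) & 1 \end{pmatrix}. \tag{5}$$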
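As a quick numerical check of Equations (4) and (5), the sketch below maps a two-feature pattern onto the unit sphere, builds $\rho_x$ from its Pauli components, and verifies unit trace and zero determinant (i.e., a pure state), plus the round trip through the stereographic projection. Plain Python with built-in complex numbers is our own choice; the function names and the test point are hypothetical.

```python
def inverse_sp(x1, x2):
    """Inverse stereographic projection, Equation (4): [x(1), x(2)] -> point of S^2."""
    n2 = x1 * x1 + x2 * x2
    return 2 * x1 / (n2 + 1), 2 * x2 / (n2 + 1), (n2 - 1) / (n2 + 1)


def sp(r1, r2, r3):
    """Stereographic projection SP (undefined at the north pole r3 = 1)."""
    return r1 / (1 - r3), r2 / (1 - r3)


def density_2d(x1, x2):
    """Equation (5): rho_x = (1/2) [[1 + r3, r1 - i r2], [r1 + i r2, 1 - r3]]."""
    r1, r2, r3 = inverse_sp(x1, x2)
    return [[(1 + r3) / 2, complex(r1, -r2) / 2],
            [complex(r1, r2) / 2, (1 - r3) / 2]]


rho = density_2d(0.5, -1.5)
trace = rho[0][0] + rho[1][1]                        # should be 1
det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]  # should be 0 (pure state)
print(round(trace, 12), round(abs(det), 12))         # 1.0 0.0
print(sp(*inverse_sp(0.5, -1.5)))                    # recovers (0.5, -1.5) up to rounding
```

Because $r$ lies on the unit sphere, $\det \rho_x = (1 - r_3^2 - r_1^2 - r_2^2)/4 = 0$, which is why every encoded pattern is a pure state.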
The proposed encoding offers the advantage of visualizing a two-dimensional vector on the Bloch sphere [21]. In the same work, we also introduced a generalization of our encoding to the d-dimensional case, which allows d-dimensional vectors to be represented as points on the hypersphere $S^d$ by writing a density operator $\rho$ as a linear combination of the d-dimensional identity and $d^2 - 1$ $(d \times d)$-matrices $\{\sigma_i\}$ (i.e., generalized Pauli matrices [29,30]).
To this end, we introduced the generalized stereographic projection [31], which maps any point $r = (r_1, \dots, r_{d+1}) \in S^d$ into a point $x = [x(1), \dots, x(d)] \in \mathbb{R}^d$, i.e.,
$$SP: (r_1, \dots, r_{d+1}) \mapsto \left[\frac{r_1}{1 - r_{d+1}}, \frac{r_2}{1 - r_{d+1}}, \dots, \frac{r_d}{1 - r_{d+1}}\right] = [x(1), \dots, x(d)]. \tag{6}$$
However, even though points on the d-hypersphere can be mapped into d-feature patterns, these points do not in general correspond to density operators, and the one-to-one correspondence between them and density matrices is guaranteed only in particular regions [29,32,33].
An alternative encoding of a d-feature vector $x$ into a density operator was proposed in [22]. It is obtained (i) by mapping $x \in \mathbb{R}^d$ into a $(d+1)$-dimensional vector $x' \in \mathbb{R}^{d+1}$ according to the generalized version of Equation (4), i.e.,
$$SP^{-1}: [x(1), \dots, x(d)] \mapsto \frac{1}{\|x\|^2 + 1}\left(2x(1), \dots, 2x(d), \|x\|^2 - 1\right) = (r_1, \dots, r_{d+1}), \tag{7}$$
where $\|x\|^2 = \sum_{i=1}^{d} [x(i)]^2$; and (ii) by considering the projector $\rho_x = x' \cdot (x')^T$.
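The $d$-dimensional maps and the projector of [22] can be checked in the same way. A minimal sketch, assuming real feature lists: it verifies that $SP^{-1}(x)$ lies on the unit hypersphere, that $SP$ inverts it, and that $x' \cdot (x')^T$ is idempotent, as a rank-one projector must be. The names and the sample vector are hypothetical.

```python
def inverse_sp(x):
    """Generalized inverse stereographic projection: x in R^d -> r on S^d in R^(d+1)."""
    n2 = sum(v * v for v in x)
    return [2 * v / (n2 + 1) for v in x] + [(n2 - 1) / (n2 + 1)]


def sp(r):
    """Generalized stereographic projection (undefined where r[-1] == 1)."""
    return [v / (1 - r[-1]) for v in r[:-1]]


def projector(v):
    """rho_x = x' (x')^T for a real unit vector x' (outer product)."""
    return [[a * b for b in v] for a in v]


x = [1.0, -2.0, 0.5]
r = inverse_sp(x)  # [0.32, -0.64, 0.16, 0.68] up to rounding
rho = projector(r)
print(sum(v * v for v in r))  # 1.0 up to rounding: r lies on S^3
print(sp(r))                  # recovers x up to rounding
```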
Here, we consider a different quantum minimum distance classifier, again based on a new encoding, and we show that it exhibits interesting improvements, also by exploiting the non-invariance under feature rescaling. In accordance with [9,15], when a real vector is encoded into a quantum state, it is important that the quantum state retains information about the norm of the original real vector, in order to avoid a loss of information. In light of this fact, we introduce the following alternative encoding.
Let $x = [x(1), \dots, x(d)] \in \mathbb{R}^d$ be a d-dimensional vector.
  • We map the vector $x \in \mathbb{R}^d$ into a vector $x' \in \mathbb{R}^{d+1}$, whose first $d$ features are the components of $x$ and whose $(d+1)$-th feature is the norm of $x$. Formally:
    $$x = [x(1), \dots, x(d)] \mapsto x' = [x(1), \dots, x(d), \|x\|]. \tag{8}$$
  • We obtain the vector $x''$ by dividing the first $d$ components of $x'$ by $\|x\|$:
    $$x' \mapsto x'' = \left[\frac{x(1)}{\|x\|}, \dots, \frac{x(d)}{\|x\|}, \|x\|\right]. \tag{9}$$
  • We compute the norm of $x''$, i.e., $\|x''\| = \sqrt{\|x\|^2 + 1}$, and we map $x''$ into the normalized vector $x'''$ as follows:
    $$x'' \mapsto x''' = \frac{x''}{\|x''\|} = \left[\frac{x(1)}{\|x\|\sqrt{\|x\|^2 + 1}}, \dots, \frac{x(d)}{\|x\|\sqrt{\|x\|^2 + 1}}, \frac{\|x\|}{\sqrt{\|x\|^2 + 1}}\right]. \tag{10}$$
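The three steps above translate directly into code. This sketch (plain Python, hypothetical function name) follows them one by one rather than using the closed-form expression for $x'''$; like the construction itself, it assumes $x \neq 0$, since the second step divides by $\|x\|$.

```python
import math


def density_vector(x):
    """Build x''' from a nonzero x by following the three mapping steps literally."""
    norm = math.sqrt(sum(v * v for v in x))
    x1 = list(x) + [norm]                        # step 1: x' = [x(1), ..., x(d), ||x||]
    x2 = [v / norm for v in x1[:-1]] + [x1[-1]]  # step 2: divide the first d entries by ||x||
    norm2 = math.sqrt(sum(v * v for v in x2))    # ||x''|| = sqrt(||x||^2 + 1)
    return [v / norm2 for v in x2]               # step 3: x''' = x'' / ||x''||


v = density_vector([3.0, 4.0])  # ||x|| = 5, so the last entry is 5 / sqrt(26)
print(v)
```

The result agrees with the closed form: the first $d$ entries equal $x(i)/(\|x\|\sqrt{\|x\|^2+1})$ and the last one equals $\|x\|/\sqrt{\|x\|^2+1}$.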
Now, we provide the following definition.
Definition 1 (Density Pattern).
Let $x = [x(1), \dots, x(d)]$ be a d-dimensional vector and $(x, y)$ the corresponding pattern. Then, the density pattern associated with $(x, y)$ is the pair $(\rho_x, y)$, where the matrix $\rho_x$, corresponding to the feature vector $x$, has the following form:
$$\rho_x \doteq x''' \cdot (x''')^\dagger, \tag{11}$$
where the vector $x'''$ is given by Equation (10) and $y$ is the label of the original pattern.
Hence, this encoding maps real d-dimensional vectors $x$ into $(d+1)$-dimensional pure states $\rho_x$. In this way, we obtain an encoding that takes into account the information about the norm of the initial real vector and, at the same time, allows arbitrary real d-dimensional vectors to be encoded easily.
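Two properties stated above can be checked numerically: $\rho_x$ is a pure state ($\mathrm{Tr}\,\rho_x^2 = 1$), and the norm of the original vector survives the encoding, since the last diagonal entry of $\rho_x$ equals $\|x\|^2/(\|x\|^2 + 1)$. The sketch below uses the closed form of $x'''$; the helper names and the sample vector are hypothetical, and $x \neq 0$ is assumed.

```python
import math


def density_pattern(x):
    """Definition 1 (x != 0): rho_x = x''' (x''')^T, with x''' in closed form."""
    norm = math.sqrt(sum(v * v for v in x))
    s = math.sqrt(norm * norm + 1)
    w = [c / (norm * s) for c in x] + [norm / s]
    return [[a * b for b in w] for a in w]


def recovered_norm(rho):
    """Invert rho[d][d] = ||x||^2 / (||x||^2 + 1) to read back ||x||."""
    t = rho[-1][-1]
    return math.sqrt(t / (1 - t))


x = [1.0, 2.0, 2.0]  # ||x|| = 3
rho = density_pattern(x)
n = len(rho)  # d + 1 = 4
trace = sum(rho[i][i] for i in range(n))
purity = sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))  # Tr(rho^2)
print(round(trace, 12), round(purity, 12), round(recovered_norm(rho), 12))  # 1.0 1.0 3.0
```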
Clearly, there exist different ways to encode patterns into quantum states while retaining some information about the vector norm. The one we adopt was inspired by simple considerations concerning the two-dimensional encoding on the Bloch sphere, naturally extended to the d-dimensional case. Indeed, in [21] it was analytically proved that the encoding of $x = [x(1), x(2)]$ into the density operator $\rho_x$ given by Equation (5) can be exactly recovered by taking the vector $[x(1) + i x(2), \|x\|]$ as the starting point and applying the transformations given by Equations (9)–(11).

4. Density Pattern Classification

In this section, we provide a quantum counterpart of the NMC, named the Quantum Nearest Mean Classifier (QNMC). It can be seen as a particular kind of minimum distance classifier between quantum objects (i.e., density patterns). First, this quantum formalism could potentially reduce the computational complexity of the problem if our framework were implemented on a quantum computer (as already explained in the Introduction). Second, it allows the NMC and QNMC performance to be fully compared using only a classical computer. Regarding the second point, we reiterate that our aim is not to assert that the QNMC outperforms all other supervised classical procedures, but to show (by numerical simulations) that it performs better than its “natural” classical counterpart, i.e., the NMC.
In order to provide a quantum counterpart of the NMC, we need: (i) an encoding of real patterns into quantum objects (defined above); (ii) a quantum version of the classical centroid (i.e., a sort of quantum class prototype), which will be named the quantum centroid; and (iii) an appropriate quantum distance between density patterns, playing the role of the Euclidean metric in the NMC. In this quantum framework, the quantum version $S^q$ of the dataset $S$ is given by:
$$S^q = S^q_{tr} \cup S^q_{ts}, \qquad S^q_{tr} = \{(\rho_{x_n}, y_n)\}_{n=1}^{N'}, \qquad S^q_{ts} = \{(\rho_{x_n}, y_n)\}_{n=N'+1}^{N}, \tag{12}$$
where $(\rho_{x_n}, y_n)$ is the density pattern associated with the pattern $(x_n, y_n)$. Consequently, $S^q_{tr}$ and $S^q_{ts}$ represent the quantum versions of the training and test sets, respectively, i.e., the sets of all the density patterns corresponding to the patterns in $S_{tr}$ and $S_{ts}$. We can now naturally define the quantum version of the classical centroid $\mu_l$ given in Equation (1).
Definition 2 (Quantum Centroid).
Let $S^q$ be a labeled dataset of $N$ density patterns such that $S^q_{tr} \subset S^q$ is a training set composed of $N'$ density patterns. Further, let $Y = \{1, 2, \dots, L\}$ be the class label set. The quantum centroid of the $l$-th class is given by:
$$\rho_l = \frac{1}{N_l} \sum_{n=1}^{N_l} \rho_{x_n}, \quad l = 1, \dots, L, \tag{13}$$
where $N_l$ is the number of density patterns of the $l$-th class in $S^q_{tr}$, such that $\sum_{l=1}^{L} N_l = N'$.
Let us stress that the quantum centroids are generally mixed states and cannot be obtained by mapping the classical centroids $\mu_l$, i.e.,
$$\rho_l \neq \rho_{\mu_l}, \quad l \in \{1, \dots, L\}. \tag{14}$$
Therefore, the quantum centroid has a completely new meaning: it is no longer a pure state and has no classical counterpart. This is the main reason for the deep difference between the two classifiers. In this respect, it is easy to verify [21] that, unlike the classical case, the quantum centroid is sensitive to the dispersion of the dataset.
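The following toy example (our own, with hypothetical names and data) illustrates these remarks. Two classes share the same classical centroid $\mu = [1, 1]$ but differ in spread; their quantum centroids are mixed states ($\mathrm{Tr}\,\rho_l^2 < 1$), differ from $\rho_{\mu}$, and differ from each other, so the quantum centroid reflects the dispersion that the classical centroid ignores.

```python
import math


def density_pattern(x):
    """Definition 1 (x != 0): rho_x = x''' (x''')^T, with x''' in closed form."""
    norm = math.sqrt(sum(v * v for v in x))
    s = math.sqrt(norm * norm + 1)
    w = [c / (norm * s) for c in x] + [norm / s]
    return [[a * b for b in w] for a in w]


def quantum_centroid(patterns):
    """Definition 2: arithmetic mean of the density patterns of one class."""
    mats = [density_pattern(x) for x in patterns]
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


def purity(rho):
    """Tr(rho^2): equal to 1 for pure states, smaller than 1 for mixed states."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))


def max_abs_diff(a, b):
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


tight = [[0.9, 1.1], [1.1, 0.9]]  # classical centroid [1.0, 1.0], small spread
wide = [[0.0, 2.0], [2.0, 0.0]]   # classical centroid [1.0, 1.0], large spread
rho_tight = quantum_centroid(tight)
rho_wide = quantum_centroid(wide)
rho_mu = density_pattern([1.0, 1.0])  # encoding of the shared classical centroid

print(purity(rho_tight), purity(rho_wide))  # both below 1: mixed states
print(max_abs_diff(rho_tight, rho_mu))      # nonzero: rho_l differs from rho_{mu_l}
print(max_abs_diff(rho_tight, rho_wide))    # nonzero: the centroid reflects the spread
```

The wider class yields the more mixed centroid (lower purity), which is one concrete way to read the sensitivity to dispersion mentioned above.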
Now, we recall the definition of the trace distance between quantum states (see, e.g., [34]), which can be considered a suitable metric between density patterns.
Definition 3 (Trace Distance).
Let $\rho_1$ and $\rho_2$ be two arbitrary density operators acting on the same Hilbert space. The trace distance between $\rho_1$ and $\rho_2$ is:
$$d_T(\rho_1, \rho_2) = \frac{1}{2}\mathrm{Tr}\,|\rho_1 - \rho_2|, \tag{15}$$
where $|A| = \sqrt{A^\dagger A}$.
Being a true metric on density operators, $d_T$ satisfies the standard properties of positivity, symmetry and the triangle inequality. The use of the trace distance in our quantum framework is naturally motivated by the fact that it is the simplest choice among the possible metrics in the space of density matrices [35]. Consequently, it can be seen as the “authentic” quantum counterpart of the Euclidean distance, which is the simplest choice in the original space. However, the trace distance has some limitations and downsides (in particular, it is monotone but not Riemannian [36]). Similarly, in some pattern classification problems the Euclidean distance is not sufficient to capture, for instance, the distribution of the dataset; for this reason, other kinds of metrics are adopted in the classical space to avoid this limitation [24]. As a future development of the present work, it could be interesting to compare different distances, able to treat more complex situations, in both the quantum and the classical framework (we will return to this point in the conclusions).
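Because the density patterns of Definition 1 are real, $\rho_1 - \rho_2$ is a real symmetric matrix, and its trace norm is the sum of the absolute values of its eigenvalues. The sketch below computes them with cyclic Jacobi rotations in plain Python to stay dependency-free; this is our own choice (a linear-algebra library would normally be used), it only handles real symmetric input, and the names are hypothetical. For two pure states, the result can be compared with the closed form $\sqrt{1 - |\langle a|b\rangle|^2}$.

```python
import math


def eigenvalues(a, sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a, n = [row[:] for row in a], len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation that annihilates a[p][q] (smaller root t for numerical stability).
                th = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, th) / (abs(th) + math.sqrt(th * th + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)]


def trace_distance(rho1, rho2):
    """d_T = (1/2) Tr|rho1 - rho2| for real symmetric density matrices."""
    diff = [[u - v for u, v in zip(r1, r2)] for r1, r2 in zip(rho1, rho2)]
    return 0.5 * sum(abs(lam) for lam in eigenvalues(diff))


def projector(v):
    return [[x * y for y in v] for x in v]


a = [1.0, 0.0, 0.0]
b = [0.6, 0.8, 0.0]
print(trace_distance(projector(a), projector(b)))  # approximately 0.8 = sqrt(1 - 0.6**2)
```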
We are ready to introduce the QNMC procedure consisting, as the classical one, of the following steps:
  • Constructing the sets $S^q_{tr}$, $S^q_{ts}$ by mapping each pattern of the sets $S_{tr}$, $S_{ts}$ via the encoding introduced in Definition 1;
  • Calculating the quantum centroids $\rho_l$ ($l \in \{1, \dots, L\}$) from the quantum training set $S^q_{tr}$, in accordance with Definition 2;
  • Classifying a density pattern $\rho_x \in S^q_{ts}$ by means of the optimization problem:
    $$\arg\min_{l = 1, \dots, L} d_T(\rho_x, \rho_l), \tag{16}$$
    where d T is the trace distance introduced in Definition 3.
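Putting the three ingredients together gives the whole procedure. The sketch below is a self-contained, dependency-free illustration under our own assumptions (real feature lists, nonzero vectors, labels of any hashable type); it computes the trace distance with a Jacobi eigenvalue routine, and all names and the four training patterns are hypothetical.

```python
import math


def density_pattern(x):
    """Definition 1 (x != 0): rho_x = x''' (x''')^T, with x''' in closed form."""
    norm = math.sqrt(sum(v * v for v in x))
    s = math.sqrt(norm * norm + 1)
    w = [c / (norm * s) for c in x] + [norm / s]
    return [[a * b for b in w] for a in w]


def eigenvalues(a, sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a, n = [row[:] for row in a], len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                th = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, th) / (abs(th) + math.sqrt(th * th + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)]


def trace_distance(rho1, rho2):
    """Definition 3 for real symmetric matrices: half the sum of |eigenvalues|."""
    diff = [[u - v for u, v in zip(r1, r2)] for r1, r2 in zip(rho1, rho2)]
    return 0.5 * sum(abs(lam) for lam in eigenvalues(diff))


def qnmc_fit(train):
    """Definition 2: one quantum centroid (mean density matrix) per class label."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(density_pattern(x))
    centroids = {}
    for y, mats in groups.items():
        n = len(mats[0])
        centroids[y] = [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
                        for i in range(n)]
    return centroids


def qnmc_predict(x, centroids):
    """Assign the label of the quantum centroid closest to rho_x in trace distance."""
    rho = density_pattern(x)
    return min(centroids, key=lambda y: trace_distance(rho, centroids[y]))


train = [([1.0, 0.1], "a"), ([1.2, -0.1], "a"), ([0.1, 1.0], "b"), ([-0.1, 1.3], "b")]
centroids = qnmc_fit(train)
print(qnmc_predict([1.1, 0.0], centroids))  # a
print(qnmc_predict([0.0, 1.1], centroids))  # b
```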

5. Experimental Results

This section compares the performance of the NMC and the QNMC in terms of the statistical coefficients introduced in Section 2. We use both classifiers to analyze twenty-seven datasets, divided into two categories: artificial datasets (Gaussian (I), Gaussian (II), Gaussian (III), Moon, Banana) and real-world datasets, extracted from the UCI (UC Irvine Machine Learning Repository) [37] and KEEL (Knowledge Extraction based on Evolutionary Learning) [38] repositories. Among them there are also imbalanced datasets, whose main characteristic is that the number of patterns in a given class is significantly lower than the number of patterns belonging to the other classes. Let us note that, in real situations, we usually deal with data whose distribution is unknown, so the most interesting case is the one involving real-world datasets. However, artificial datasets following known distributions, in particular Gaussian distributions with specific parameters, can provide valuable insight.

5.1. Comparison between QNMC and NMC

Table 1 summarizes the characteristics of the datasets involved in our experiments: for each dataset, we list the total number of patterns, the number of patterns in each class and the number of features. Although we mostly confine our investigation to two-class datasets, our model can be easily extended to multi-class problems (as we show for the three-class datasets Balance, Gaussian (III), Hayes-Roth and Iris).
To make our results statistically significant, we apply the standard procedure of randomly splitting each dataset into two parts: the training set (80% of the original dataset) and the test set (the remaining 20%). We perform 10 runs for each dataset, with a new random partition at each run. Let us stress that the results appear robust with respect to different partitions of the original dataset. We consider only 10 runs because, for a greater number of runs, the standard deviation of the mean classification error remains substantially the same.
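The evaluation protocol can be sketched as follows: a random 80%/20% split repeated over 10 runs, reporting the mean and standard deviation of the classification error. To keep the example short, the NMC stands in as the classifier (any fit/predict pair, such as a QNMC, could be plugged in), and the two Gaussian blobs are synthetic toy data, not one of the datasets of Table 1. All names are hypothetical.

```python
import math
import random
import statistics


def nmc_fit(train):
    """Classical centroids: per-class mean feature vectors."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def nmc_predict(x, centroids):
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def evaluate(data, fit, predict, runs=10, train_fraction=0.8, seed=0):
    """Mean and standard deviation of the test error over random train/test splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)  # a new random partition at each run
        cut = round(train_fraction * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        model = fit(train)
        wrong = sum(predict(x, model) != y for x, y in test)
        errors.append(wrong / len(test))
    return statistics.mean(errors), statistics.stdev(errors)


gen = random.Random(42)
data = [([gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)], 0) for _ in range(50)]
data += [([gen.gauss(2.5, 1.0), gen.gauss(2.5, 1.0)], 1) for _ in range(50)]
mean_err, std_err = evaluate(data, nmc_fit, nmc_predict)
print(f"NMC error: {mean_err:.3f} +/- {std_err:.3f}")
```

Fixing the seed makes the ten partitions reproducible, so two classifiers can be compared on exactly the same splits.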
In Table 2, we report the QNMC and NMC performance for each dataset, evaluated in terms of the mean value and standard deviation (computed over ten runs) of the statistical coefficients discussed in the previous section. For the sake of simplicity, we omit the values of FPR and FNR because they can be easily obtained from the TPR and TNR values (i.e., FPR = 1 − TNR, FNR = 1 − TPR).
By comparing QNMC and NMC performance (see Table 2), we observe that the former provides a significant improvement over the standard NMC in terms of all the statistical parameters we have considered. In several cases, the difference between the classification errors of the two classifiers is very large, up to 22% (see Mutagenesis-Bond). Further, for two-feature datasets, the new encoding provides better performance than the one considered in [21] (where the QNMC error with related standard deviation was 0.174 ± 0.047 for Moon and 0.419 ± 0.015 for Banana), while for multi-dimensional datasets it generally performs similarly to the encoding in [22] or improves the classification by about 5%.
The artificial Gaussian datasets may deserve a brief comment. Let us discuss the way in which the three Gaussian datasets have been created. Gaussian (I) [39] is a perfectly balanced dataset (i.e., both classes have the same number of patterns), patterns have the same dispersion in both classes, and only some features are correlated [40]. Gaussian (II) is an unbalanced dataset (i.e., classes have a very different number of patterns), patterns do not exhibit the same dispersion in both classes and features are not correlated. Gaussian (III) is composed of three classes and it is an unbalanced dataset with different pattern dispersion in all the classes, where all the features are correlated.
For this kind of Gaussian data, we remark that the NMC does not offer the best classification performance [24] because of the particular characteristics of the class distributions: indeed, the NMC does not take pattern dispersion into account. Conversely, the improvements of the QNMC shown in Table 2 suggest that the quantum classifier is, to some extent, sensitive to the data dispersion. A detailed analysis of this problem will be addressed in future work.
Further, the QNMC also performs better on imbalanced datasets (the most significant cases being Balance, Ilpd, Segment, Page and Gaussian (III)), which are usually difficult to handle with standard classification models. On these datasets, the QNMC exhibits a much lower classification error than the NMC, with differences of up to about 12%. Another interesting and surprising result concerns the Iris0 dataset, the imbalanced version of the Iris dataset: as Table 2 shows, our quantum classifier, unlike the NMC, perfectly classifies all the test set patterns.
We remark that, even if it is possible to establish whether a classifier is “good” or “bad” for a given dataset by evaluating some a priori data characteristics, in general it is not possible to establish the absolute superiority of a given classifier on every dataset, as stated by the No Free Lunch Theorem [24]. In any case, the QNMC seems to be particularly convenient when the data distribution is difficult to treat with the standard NMC.

5.2. Non-Invariance Under Rescaling

The final experimental results that we present in this paper concern a significant difference between the NMC and the QNMC. Let us suppose that all the components of the feature vectors $x_n$ ($n = 1, \dots, N$) belonging to the original dataset $S$ are multiplied by the same parameter $\gamma \in \mathbb{R}$, $\gamma \neq 0$, i.e., $x_n \mapsto \gamma x_n$. Then, the whole dataset undergoes an increasing dispersion (for $|\gamma| > 1$) or a decreasing dispersion (for $|\gamma| < 1$), and the classical centroids change according to $\mu_l \mapsto \gamma \mu_l$ ($l = 1, \dots, L$). Therefore, pattern classification for the rescaled problem consists of solving:
$$\arg\min_{l = 1, \dots, L} d_E(\gamma x_n, \gamma \mu_l) = \arg\min_{l = 1, \dots, L} |\gamma|\, d_E(x_n, \mu_l) = \arg\min_{l = 1, \dots, L} d_E(x_n, \mu_l), \quad n = N' + 1, \dots, N. \tag{17}$$
For any value of the parameter $\gamma$, it can be proved [22] that, while the NMC is invariant under rescaling, the QNMC is not. Interestingly enough, this failure of invariance can be regarded as a resource for the classification problem: through a suitable choice of the rescaling factor, it is possible, in principle, to decrease the classification error. To this end, we have studied the variation of the QNMC performance (in particular of the classification error) as a function of the free parameter $\gamma$; the results for the datasets Appendicitis, Monk and Moon are shown in Figure 1. In the figure, each point represents the mean value (with the corresponding standard deviation represented by the vertical bar) over ten runs of the experiments. We have considered, as an example, three different ranges of the rescaling parameter $\gamma$ for each dataset. We can observe that the resulting classification performance strongly depends on the range of $\gamma$: in all three cases, different choices of the $\gamma$ values lead to completely different classification results. In some situations, we observe an improvement of the QNMC performance with respect to the unrescaled problem (subfigures (b), (c), (e), (h)); in other cases, we get worse classification results (subfigures (a), (d), (g), (i)); and sometimes the rescaling parameter does not produce any variation of the classification error (subfigure (f)).
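The contrast between the two classifiers can be reproduced on a toy problem: rescale every feature vector by $\gamma$ and recompute the error of both classifiers. In the sketch below (plain Python, hypothetical names, handcrafted data not taken from the paper), the NMC error is the same for every $\gamma \neq 0$, since $d_E(\gamma x, \gamma \mu_l) = |\gamma|\, d_E(x, \mu_l)$, whereas the QNMC encoding itself changes with $\gamma$: the last diagonal entry of $\rho_{\gamma x}$ is $\gamma^2\|x\|^2/(\gamma^2\|x\|^2 + 1)$. Whether this helps or hurts the QNMC depends on the data, as discussed below.

```python
import math


def density_pattern(x):
    """Definition 1 (x != 0): rho_x = x''' (x''')^T, with x''' in closed form."""
    norm = math.sqrt(sum(v * v for v in x))
    s = math.sqrt(norm * norm + 1)
    w = [c / (norm * s) for c in x] + [norm / s]
    return [[a * b for b in w] for a in w]


def eigenvalues(a, sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a, n = [row[:] for row in a], len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                th = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, th) / (abs(th) + math.sqrt(th * th + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)]


def trace_distance(r1, r2):
    diff = [[u - v for u, v in zip(p, q)] for p, q in zip(r1, r2)]
    return 0.5 * sum(abs(lam) for lam in eigenvalues(diff))


def nmc_fit(train):
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def nmc_predict(x, mu):
    return min(mu, key=lambda y: math.dist(x, mu[y]))


def qnmc_fit(train):
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(density_pattern(x))
    # Entry-wise mean of the density matrices of each class.
    return {y: [[sum(col) / len(ms) for col in zip(*rows)] for rows in zip(*ms)]
            for y, ms in groups.items()}


def qnmc_predict(x, rho_l):
    rho = density_pattern(x)
    return min(rho_l, key=lambda y: trace_distance(rho, rho_l[y]))


def error(train, test, fit, predict, gamma):
    """Test error after rescaling every feature vector by gamma (gamma != 0)."""
    def scale(data):
        return [([gamma * v for v in x], y) for x, y in data]
    model = fit(scale(train))
    return sum(predict(x, model) != y for x, y in scale(test)) / len(test)


train = [([1.0, 0.2], 0), ([0.8, 0.4], 0), ([1.2, 0.1], 0),
         ([0.2, 1.0], 1), ([0.3, 1.4], 1), ([0.1, 0.9], 1)]
test = [([0.9, 0.3], 0), ([0.25, 1.1], 1), ([1.5, 0.5], 0), ([0.4, 1.6], 1)]
for gamma in (0.1, 0.5, 1.0, 2.0, 10.0):
    e_nmc = error(train, test, nmc_fit, nmc_predict, gamma)
    e_qnmc = error(train, test, qnmc_fit, qnmc_predict, gamma)
    print(f"gamma={gamma:5}: NMC {e_nmc:.2f}  QNMC {e_qnmc:.2f}")
```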
In conclusion, the range of the parameter $\gamma$ for which the QNMC performance improves is generally not unique and strongly depends on the considered dataset. As a consequence, rescaling does not improve the classification for every range of $\gamma$; on the contrary, there exist intervals of $\gamma$ where the QNMC performs worse than without rescaling. Each dataset has its own specific characteristics (in full accordance with the No Free Lunch Theorem), and the extent to which the non-invariance under rescaling reduces the error must, in general, be determined empirically.
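The empirical study described above can be organized as a sweep over $\gamma$. The sketch below is a hypothetical harness, not the experimental code used for Figure 1: for each $\gamma$ it rescales the dataset, draws ten random stratified 80/20 train/test splits (the split protocol is an assumption), and reports the mean and sample standard deviation of the classification error. The same seeds are reused for every $\gamma$, so only the rescaling changes. The NMC is plugged in because it is short; any classifier with the same `predict(train, labels, x)` signature, such as a QNMC implementation, could be passed instead.

```python
import math
import random
import statistics


def nmc_predict(train, labels, x):
    """Nearest Mean Classifier: closest classical centroid in Euclidean distance."""
    groups = {}
    for p, l in zip(train, labels):
        groups.setdefault(l, []).append(p)
    mus = {l: [sum(col) / len(ps) for col in zip(*ps)] for l, ps in groups.items()}
    return min(mus, key=lambda l: math.dist(x, mus[l]))


def stratified_split(points, labels, rng, train_frac=0.8):
    """ASSUMED protocol: random split keeping train_frac of each class for training."""
    by_class = {}
    for p, l in zip(points, labels):
        by_class.setdefault(l, []).append(p)
    tr, tr_lab, ts, ts_lab = [], [], [], []
    for l, ps in by_class.items():
        ps = ps[:]
        rng.shuffle(ps)
        k = max(1, int(train_frac * len(ps)))
        tr.extend(ps[:k])
        tr_lab.extend([l] * k)
        ts.extend(ps[k:])
        ts_lab.extend([l] * (len(ps) - k))
    return tr, tr_lab, ts, ts_lab


def gamma_sweep(predict, points, labels, gammas, runs=10, seed=0):
    """Mean and sample standard deviation of the test error over `runs` splits, per gamma."""
    results = {}
    for g in gammas:
        scaled = [[g * v for v in p] for p in points]
        errors = []
        for r in range(runs):
            tr, tr_lab, ts, ts_lab = stratified_split(scaled, labels, random.Random(seed + r))
            wrong = sum(predict(tr, tr_lab, x) != y for x, y in zip(ts, ts_lab))
            errors.append(wrong / len(ts))
        results[g] = (statistics.mean(errors), statistics.stdev(errors))
    return results


# Hypothetical toy data: two overlapping Gaussian clouds in two dimensions.
rng = random.Random(42)
points = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
points += [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(40)]
labels = ["A"] * 40 + ["B"] * 40

sweep = gamma_sweep(nmc_predict, points, labels, gammas=[0.25, 0.5, 1.0, 2.0, 4.0])
for g, (mean_err, std_err) in sweep.items():
    print(f"gamma = {g:>4}: error = {mean_err:.3f} ± {std_err:.3f}")
```

With the NMC every $\gamma$ gives the same mean error, consistent with the NMC line in Figure 1 not depending on the rescaling parameter (powers of two are used so that the rescaling is exact in floating point); a non-invariant classifier such as the QNMC would instead produce a $\gamma$-dependent curve.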

6. Conclusions and Future Developments

In this work we have introduced a quantum minimum distance classifier, named Quantum Nearest Mean Classifier, which can be seen as a quantum version of the well-known Nearest Mean Classifier. It is obtained by defining a suitable encoding of real patterns into density operators (density patterns) and by using the trace distance between density operators.
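For reference, the trace distance and the resulting minimum distance rule can be summarized as follows, using the standard form of the trace distance (see, e.g., [34]); here $\rho_x$ denotes the density pattern of a test pattern $x$ and $\rho_l$ the quantum centroid of class $l$, a notation introduced only for this summary.

```latex
\begin{aligned}
d_{tr}(\rho,\sigma) &= \tfrac{1}{2}\operatorname{Tr}|\rho-\sigma| = \tfrac{1}{2}\sum_i |\lambda_i|, && \lambda_i \text{ eigenvalues of } \rho-\sigma,\\
d_{tr}\big(|\psi\rangle\langle\psi|,\,|\phi\rangle\langle\phi|\big) &= \sqrt{1-|\langle\psi|\phi\rangle|^2}, && \text{for pure states},\\
\hat{l}(x) &= \arg\min_{l=1,\dots,L} d_{tr}(\rho_x,\rho_l), && \text{QNMC decision rule}.
\end{aligned}
```

The pure-state identity gives a quick sanity check: two pure states with overlap $1/2$ are at trace distance $\sqrt{3}/2 \approx 0.866$.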
First, a new encoding of real patterns into quantum objects has been proposed, suggested by recent debates on quantum machine learning according to which, to avoid the loss of information caused by encoding a real vector into a quantum state, one should consider the normalized vector while simultaneously keeping some information about its norm. Secondly, we have defined the quantum centroid, i.e., the pattern chosen as the prototype of each class, which (unlike the classical centroid of the NMC) is not invariant under uniform rescaling of the original dataset and seems to exhibit a kind of sensitivity to the data dispersion.
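The information-loss argument can be made concrete with a short Python sketch. Plain normalization $x \mapsto x/\|x\|$ sends $x$ and $2x$ to the same unit vector, so the norm is lost. The norm-aware map used below, $\psi(x) = (x_1, \dots, x_d, 1)/\sqrt{\|x\|^2 + 1}$, is a hypothetical example of an encoding that keeps the direction of $x$ while retaining its norm in an extra component; it is not claimed to be the exact encoding proposed in the paper, and the function names are likewise illustrative.

```python
import math


def naive_encode(x):
    """Plain normalization x -> x / ||x|| (undefined for x = 0): the norm of x is discarded."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]


def norm_aware_encode(x):
    """HYPOTHETICAL norm-aware encoding (x_1, ..., x_d, 1) / sqrt(||x||^2 + 1).
    The first d components are proportional to x; the last one depends only on ||x||."""
    v = list(x) + [1.0]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def recover_norm(psi):
    """Invert the last component 1 / sqrt(||x||^2 + 1) to get ||x|| back."""
    last = psi[-1]
    return math.sqrt(1.0 / (last * last) - 1.0)


x, x_doubled = [3.0, 4.0], [6.0, 8.0]
print(naive_encode(x) == naive_encode(x_doubled))            # True: the norm is lost
print(norm_aware_encode(x) == norm_aware_encode(x_doubled))  # False: the states differ
print(recover_norm(norm_aware_encode(x)))                    # close to ||x|| = 5
```

Since the last component equals $1/\sqrt{\|x\|^2+1}$, the norm can be recovered (up to rounding), which is precisely the information discarded by plain normalization.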
In the experiments, the two classifiers have been compared in terms of relevant statistical coefficients on 27 datasets of different nature (real-world and artificial). Further, the non-invariance under rescaling of the QNMC has suggested studying the classification error as a function of a free parameter $\gamma$, whose variation modifies the data dispersion and, consequently, the classifier performance. In particular, we have shown that, in most cases, the QNMC exhibits a significant improvement with respect to the NMC in terms of the classification error (and of the other statistical coefficients) and that, in some cases, the non-invariance under rescaling can have a positive impact on the classification process.
Let us remark that, even though the QNMC is not absolutely superior to the NMC, the proposed technique leads to relevant improvements in pattern classification when a priori knowledge of the data distribution is available.
In light of such considerations, further developments of the present work will involve the study of: (i) the optimal encoding (i.e., the mapping of patterns to quantum states) that ensures better classification accuracy (at least for a finite set of data); (ii) a general method for finding a suitable range of the rescaling parameter for a given dataset, in order to further optimize the classification process; and (iii) the data distributions for which our quantum classifier outperforms the NMC. Further, as discussed in Section 4, in some situations the standard NMC is of limited use as a classification model, especially when the data distribution is complex. In pattern recognition, such problems are addressed with other classification techniques instead of the NMC, for instance the well-known Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) classifiers, which use different distances between patterns that take the data distribution into account more precisely [24]. An interesting development of the present work could therefore be a comparison between the LDA or QDA models and a QNMC based on more suitable distances between density patterns [35].

Supplementary Materials

The following are available online at

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Aerts, D.; Sozzo, S.; Veloz, T. Quantum structure of negation and conjunction in human thought. Front. Psychol. 2015, 6, 1447.
  2. Ohya, M.; Volovich, I. Mathematical Foundations of Quantum Information and Computation and Its Applications to Nano- and Bio-Systems; Springer: Dordrecht, The Netherlands, 2011; ISBN 978-94-007-0170-0.
  3. Stapp, H.P. Mind, Matter, and Quantum Mechanics, 3rd ed.; Springer-Verlag: Berlin, Germany, 1993.
  4. Wang, B.; Zhang, P.; Li, J.; Song, D.; Hou, Y.; Shang, Z. Exploration of quantum interference in document relevance judgement discrepancy. Entropy 2016, 18, 144.
  5. Eisert, J.; Wilkens, M.; Lewenstein, M. Quantum games and quantum strategies. Phys. Rev. Lett. 1999, 83, 3077.
  6. Haven, E.; Khrennikov, A. Quantum Social Science; Cambridge University Press: Cambridge, UK, 2013; ISBN 978-1-107-01282-0.
  7. Veloz, T.; Desjardins, S. Unitary Transformations in the Quantum Model for Conceptual Conjunctions and Its Application to Data Representation. Front. Psychol. 2015, 6, 1734.
  8. Eldar, Y.C.; Oppenheim, A.V. Quantum signal processing. IEEE Signal Process. Mag. 2002, 19, 12–32.
  9. Schuld, M.; Sinayskiy, I.; Petruccione, F. An introduction to quantum machine learning. Contemp. Phys. 2015, 56, 172–185.
  10. Manju, A.; Nigam, M.J. Applications of quantum inspired computational intelligence: A survey. Artif. Intell. Rev. 2014, 42, 79–156.
  11. Horn, D.; Gottlieb, A. Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Phys. Rev. Lett. 2001, 88, 018702.
  12. Liu, D.; Yang, X.; Jiang, M. A Novel Text Classifier Based on Quantum Computation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; pp. 484–488.
  13. Tanaka, K.; Tsuda, K. A quantum-statistical-mechanical extension of Gaussian mixture model. J. Phys. Conf. Ser. 2008, 95, 012023.
  14. Caraiman, S.; Manta, V. Image processing using quantum computing. In Proceedings of the 16th International Conference on System Theory, Control and Computing (ICSTCC), Sinaia, Romania, 12–14 October 2012.
  15. Rebentrost, P.; Mohseni, M.; Lloyd, S. Quantum support vector machine for big feature and big data classification. Phys. Rev. Lett. 2014, 113.
  16. Wiebe, N.; Kapoor, A.; Svore, K.M. Quantum nearest-neighbor algorithms for machine learning. Quantum Inf. Comput. 2015, 15, 0318–0358.
  17. Miszczak, J.A. High-level Structures for Quantum Computing. In Synthesis Lectures on Quantum Computing; Morgan & Claypool Publishers: Williston, FL, USA, 2012.
  18. Holik, F.; Sergioli, G.; Freytes, H.; Plastino, A. Pattern Recognition in Non-Kolmogorovian Structures. Found. Sci. 2017, 1–14.
  19. Trugenberger, C.A. Quantum pattern recognition. Quantum Inf. Process. 2002, 1, 471–493.
  20. Lloyd, S.; Mohseni, M.; Rebentrost, P. Quantum principal component analysis. Nat. Phys. 2014, 10, 631–633.
  21. Sergioli, G.; Santucci, E.; Didaci, L.; Miszczak, J.A.; Giuntini, R. A quantum-inspired version of the Nearest Mean Classifier. Soft Comput. 2017, 1–15.
  22. Sergioli, G.; Bosyk, G.M.; Santucci, E.; Giuntini, R. A quantum-inspired version of the classification problem. Int. J. Theor. Phys. 2017, 56, 3880–3888.
  23. Santucci, E.; Sergioli, G. Classification problem in a quantum framework. In Quantum Foundations, Probability and Information, Proceedings of the Quantum and Beyond Conference, Vaxjo, Sweden, 13–16 June 2016; Khrennikov, A., Bourama, T., Eds.; Springer: Berlin, Germany, 2017; in press.
  24. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: Hoboken, NJ, USA, 2000; ISBN 978-0-471-05669-0.
  25. Webb, A.R.; Copsey, K.D. Statistical Pattern Recognition, 3rd ed.; Wiley: Hoboken, NJ, USA, 2011; ISBN 978-0-470-68227-2.
  26. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson: London, UK, 2007.
  27. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
  28. Coxeter, H.S.M. Introduction to Geometry, 2nd ed.; Wiley: Hoboken, NJ, USA, 1989.
  29. Kimura, G. The Bloch vector for N-level systems. Phys. Lett. A 2003, 314, 339–349.
  30. Bertlmann, R.A.; Krammer, P. Bloch vectors for qudits. J. Phys. A Math. Theor. 2008, 41, 235303.
  31. Karlıǧa, B. On the generalized stereographic projection. Beitr. Algebra Geom. 1996, 37, 329–336.
  32. Kimura, G.; Kossakowski, A. The Bloch-vector space for N-level systems: the spherical-coordinate point of view. Open Syst. Inf. Dyn. 2005, 12, 207–229.
  33. Jakóbczyk, L.; Siennicki, M. Geometry of Bloch vectors in two-qubit system. Phys. Lett. A 2001, 286, 383–390.
  34. Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information, 10th Anniversary ed.; Cambridge University Press: Cambridge, UK, 2010.
  35. Sommers, H.J.; Zyczkowski, K. Bures volume of the set of mixed quantum states. J. Phys. A Math. Gen. 2003, 36, 10083.
  36. Ruskai, M.B. Beyond strong subadditivity? Improved bounds on the contraction of generalized relative entropy. Rev. Math. Phys. 1994, 6, 1147–1161.
  37. UCI Machine Learning Repository (Center for Machine Learning and Intelligent Systems). Available online: (accessed on 30 November 2017).
  38. Knowledge Extraction based on Evolutionary Learning. Available online: (accessed on 30 November 2017).
  39. Skurichina, M.; Duin, R.P.W. Bagging, Boosting and the Random Subspace Method for Linear Classifiers. Pattern Anal. Appl. 2002, 5, 121–135.
  40. Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer: Berlin, Germany, 2004.
Figure 1. Comparison between the NMC (Nearest Mean Classifier) and QNMC (Quantum Nearest Mean Classifier) performance in terms of the classification error for the datasets (a–c) Appendicitis, (d–f) Monk and (g–i) Moon. In all subfigures, the plain dashed line represents the QNMC classification error without rescaling, the dashed line with points represents the NMC classification error (which does not depend on the rescaling parameter), and the points with error bars (red for Appendicitis, blue for Monk and green for Moon) represent the QNMC classification error for increasing values of the parameter $\gamma$.
Table 1. Characteristics of the datasets used in our experiments. The number of patterns in each class is shown in parentheses.
| Data Set | Size (class sizes) | Features (d) |
| --- | --- | --- |
| Appendicitis | 106 (85 + 21) | 7 |
| Balance | 625 (49 + 288 + 288) | 4 |
| Banana | 5300 (2376 + 2924) | 2 |
| Bands | 365 (135 + 230) | 19 |
| Breast Cancer (I) | 683 (444 + 239) | 10 |
| Breast Cancer (II) | 699 (458 + 241) | 9 |
| Bupa | 345 (145 + 200) | 6 |
| Chess | 3196 (1669 + 1527) | 36 |
| Gaussian (I) | 400 (200 + 200) | 30 |
| Gaussian (II) | 1000 (100 + 900) | 8 |
| Gaussian (III) | 2050 (50 + 500 + 1500) | 8 |
| Hayes-Roth | 132 (51 + 51 + 30) | 5 |
| Ilpd | 583 (416 + 167) | 9 |
| Ionosphere | 351 (225 + 126) | 34 |
| Iris | 150 (50 + 50 + 50) | 4 |
| Iris0 | 150 (100 + 50) | 4 |
| Liver | 578 (413 + 165) | 10 |
| Monk | 432 (204 + 228) | 6 |
| Moon | 200 (100 + 100) | 2 |
| Mutagenesis-Bond | 3995 (1040 + 2955) | 17 |
| Page | 5472 (4913 + 559) | 10 |
| Pima | 768 (500 + 268) | 8 |
| Ring | 7400 (3664 + 3736) | 20 |
| Segment | 2308 (1979 + 329) | 19 |
| Thyroid (I) | 215 (180 + 35) | 5 |
| Thyroid (II) | 215 (35 + 180) | 5 |
| TicTac | 958 (626 + 332) | 9 |
Table 2. Comparison between QNMC and NMC performances.
| Appendicitis | 0.124 ± 0.058 | 0.876 ± 0.058 | 0.708 ± 0.219 | 0.886 ± 0.068 | 0.553 ± 0.223 |
| Balance | 0.148 ± 0.018 | 0.852 ± 0.018 | 0.915 ± 0.014 | 0.862 ± 0.022 | 0.767 ± 0.029 |
| Banana | 0.316 ± 0.017 | 0.684 ± 0.017 | 0.660 ± 0.017 | 0.684 ± 0.018 | 0.350 ± 0.034 |
| Bands | 0.394 ± 0.053 | 0.606 ± 0.053 | 0.528 ± 0.071 | 0.606 ± 0.058 | 0.133 ± 0.112 |
| Breast Cancer (I) | 0.386 ± 0.038 | 0.614 ± 0.038 | 0.444 ± 0.045 | 0.583 ± 0.044 | 0.062 ± 0.069 |
| Breast Cancer (II) | 0.040 ± 0.015 | 0.946 ± 0.023 | 0.986 ± 0.016 | 0.993 ± 0.009 | 0.912 ± 0.033 |
| Bupa | 0.389 ± 0.044 | 0.610 ± 0.044 | 0.641 ± 0.052 | 0.359 ± 0.052 | 0.066 ± 0.044 |
| Chess | 0.256 ± 0.017 | 0.744 ± 0.017 | 0.747 ± 0.016 | 0.748 ± 0.016 | 0.488 ± 0.033 |
| Gaussian (I) | 0.274 ± 0.051 | 0.726 ± 0.051 | 0.728 ± 0.049 | 0.745 ± 0.048 | 0.452 ± 0.099 |
| Gaussian (II) | 0.210 ± 0.025 | 0.790 ± 0.025 | 0.744 ± 0.061 | 0.900 ± 0.019 | 0.308 ± 0.058 |
| Gaussian (III) | 0.401 ± 0.036 | 0.599 ± 0.036 | 0.558 ± 0.026 | 0.654 ± 0.041 | 0.152 ± 0.043 |
| Hayes-Roth | 0.413 ± 0.039 | 0.588 ± 0.039 | 0.780 ± 0.025 | 0.602 ± 0.063 | 0.339 ± 0.060 |
| Ilpd | 0.351 ± 0.037 | 0.649 ± 0.037 | 0.705 ± 0.056 | 0.734 ± 0.041 | 0.292 ± 0.073 |
| Ionosphere | 0.165 ± 0.049 | 0.835 ± 0.049 | 0.764 ± 0.059 | 0.842 ± 0.051 | 0.624 ± 0.105 |
| Iris | 0.047 ± 0.031 | 0.953 ± 0.031 | 0.977 ± 0.014 | 0.957 ± 0.028 | 0.929 ± 0.045 |
| Iris0 | 0 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0 |
| Liver | 0.342 ± 0.037 | 0.607 ± 0.057 | 0.783 ± 0.059 | 0.870 ± 0.039 | 0.318 ± 0.061 |
| Monk | 0.132 ± 0.034 | 0.869 ± 0.034 | 0.885 ± 0.030 | 0.891 ± 0.025 | 0.738 ± 0.065 |
| Moon | 0.156 ± 0.042 | 0.857 ± 0.063 | 0.831 ± 0.066 | 0.841 ± 0.066 | 0.683 ± 0.085 |
| Mutagenesis-Bond | 0.266 ± 0.021 | 0.734 ± 0.021 | 0.281 ± 0.017 | 0.662 ± 0.040 | 0.023 ± 0.021 |
| Page | 0.154 ± 0.009 | 0.846 ± 0.009 | 0.471 ± 0.039 | 0.869 ± 0.010 | 0.274 ± 0.035 |
| Pima | 0.304 ± 0.030 | 0.696 ± 0.030 | 0.690 ± 0.044 | 0.720 ± 0.030 | 0.365 ± 0.066 |
| Ring | 0.098 ± 0.006 | 0.902 ± 0.006 | 0.903 ± 0.006 | 0.905 ± 0.006 | 0.805 ± 0.012 |
| Segment | 0.194 ± 0.017 | 0.807 ± 0.017 | 0.718 ± 0.045 | 0.864 ± 0.015 | 0.401 ± 0.041 |
| Thyroid (I) | 0.078 ± 0.040 | 0.922 ± 0.040 | 0.747 ± 0.148 | 0.923 ± 0.043 | 0.695 ± 0.153 |
| Thyroid (II) | 0.081 ± 0.034 | 0.919 ± 0.034 | 0.754 ± 0.122 | 0.923 ± 0.035 | 0.684 ± 0.121 |
| Tic Tac | 0.410 ± 0.032 | 0.590 ± 0.032 | 0.597 ± 0.039 | 0.629 ± 0.036 | 0.172 ± 0.061 |

| Appendicitis | 0.218 ± 0.086 | 0.782 ± 0.086 | 0.724 ± 0.167 | 0.835 ± 0.070 | 0.423 ± 0.201 |
| Balance | 0.267 ± 0.038 | 0.733 ± 0.038 | 0.969 ± 0.014 | 0.925 ± 0.025 | 0.686 ± 0.034 |
| Banana | 0.453 ± 0.019 | 0.548 ± 0.019 | 0.552 ± 0.020 | 0.556 ± 0.020 | 0.098 ± 0.038 |
| Bands | 0.435 ± 0.048 | 0.565 ± 0.048 | 0.582 ± 0.055 | 0.605 ± 0.054 | 0.135 ± 0.092 |
| Breast Cancer (I) | 0.442 ± 0.037 | 0.558 ± 0.037 | 0.464 ± 0.046 | 0.551 ± 0.039 | 0.022 ± 0.076 |
| Breast Cancer (II) | 0.042 ± 0.015 | 0.973 ± 0.015 | 0.931 ± 0.032 | 0.963 ± 0.017 | 0.908 ± 0.033 |
| Bupa | 0.530 ± 0.029 | 0.470 ± 0.029 | 0.625 ± 0.030 | 0.620 ± 0.036 | 0.066 ± 0.044 |
| Chess | 0.307 ± 0.018 | 0.693 ± 0.018 | 0.707 ± 0.016 | 0.714 ± 0.016 | 0.393 ± 0.033 |
| Gaussian (I) | 0.322 ± 0.042 | 0.679 ± 0.042 | 0.680 ± 0.043 | 0.685 ± 0.042 | 0.355 ± 0.085 |
| Gaussian (II) | 0.320 ± 0.032 | 0.680 ± 0.032 | 0.588 ± 0.102 | 0.860 ± 0.032 | 0.129 ± 0.055 |
| Gaussian (III) | 0.530 ± 0.029 | 0.470 ± 0.029 | 0.625 ± 0.030 | 0.620 ± 0.036 | 0.066 ± 0.044 |
| Hayes-Roth | 0.503 ± 0.066 | 0.497 ± 0.066 | 0.689 ± 0.063 | 0.514 ± 0.075 | 0.180 ± 0.121 |
| Ilpd | 0.470 ± 0.037 | 0.530 ± 0.037 | 0.757 ± 0.041 | 0.761 ± 0.037 | 0.193 ± 0.051 |
| Ionosphere | 0.323 ± 0.051 | 0.677 ± 0.051 | 0.676 ± 0.051 | 0.680 ± 0.051 | 0.351 ± 0.102 |
| Iris | 0.110 ± 0.052 | 0.890 ± 0.052 | 0.946 ± 0.033 | 0.904 ± 0.041 | 0.831 ± 0.087 |
| Iris0 | 0.023 ± 0.021 | 0.977 ± 0.021 | 0.990 ± 0.009 | 0.980 ± 0.018 | 0.946 ± 0.050 |
| Liver | 0.472 ± 0.048 | 0.388 ± 0.057 | 0.891 ± 0.055 | 0.905 ± 0.045 | 0.193 ± 0.060 |
| Monk | 0.224 ± 0.022 | 0.776 ± 0.022 | 0.775 ± 0.022 | 0.779 ± 0.022 | 0.550 ± 0.043 |
| Moon | 0.234 ± 0.065 | 0.772 ± 0.089 | 0.762 ± 0.085 | 0.771 ± 0.091 | 0.528 ± 0.130 |
| Mutagenesis-Bond | 0.481 ± 0.013 | 0.519 ± 0.013 | 0.525 ± 0.029 | 0.630 ± 0.020 | 0.034 ± 0.029 |
| Page | 0.215 ± 0.013 | 0.785 ± 0.013 | 0.205 ± 0.028 | 0.809 ± 0.014 | -0.010 ± 0.024 |
| Pima | 0.375 ± 0.033 | 0.625 ± 0.033 | 0.546 ± 0.045 | 0.622 ± 0.037 | 0.173 ± 0.075 |
| Ring | 0.238 ± 0.011 | 0.763 ± 0.011 | 0.761 ± 0.011 | 0.768 ± 0.011 | 0.524 ± 0.022 |
| Segment | 0.311 ± 0.022 | 0.689 ± 0.022 | 0.824 ± 0.041 | 0.870 ± 0.014 | 0.286 ± 0.038 |
| Thyroid (I) | 0.134 ± 0.042 | 0.867 ± 0.042 | 0.739 ± 0.150 | 0.887 ± 0.040 | 0.545 ± 0.139 |
| Thyroid (II) | 0.134 ± 0.048 | 0.866 ± 0.048 | 0.777 ± 0.159 | 0.897 ± 0.046 | 0.542 ± 0.157 |
| Tic Tac | 0.439 ± 0.031 | 0.561 ± 0.031 | 0.571 ± 0.042 | 0.606 ± 0.036 | 0.119 ± 0.063 |

Share and Cite

MDPI and ACS Style

Santucci, E. Quantum Minimum Distance Classifier. Entropy 2017, 19, 659.

AMA Style

Santucci E. Quantum Minimum Distance Classifier. Entropy. 2017; 19(12):659.

Chicago/Turabian Style

Santucci, Enrica. 2017. "Quantum Minimum Distance Classifier" Entropy 19, no. 12: 659.

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.