Support Vector Machines with Quantum State Discrimination

Leporini, Roberto; Pastorello, Davide

doi:10.3390/quantum3030032

Open AccessArticle

Support Vector Machines with Quantum State Discrimination

by

Roberto Leporini

^1,*

and

Davide Pastorello

²

¹

Department of Economics, University of Bergamo, Via dei Caniana 2, I-24127 Bergamo, Italy

²

Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 9, I-38123 Povo, Italy

^*

Author to whom correspondence should be addressed.

Quantum Rep. 2021, 3(3), 482-499; https://doi.org/10.3390/quantum3030032

Submission received: 24 July 2021 / Revised: 22 August 2021 / Accepted: 26 August 2021 / Published: 28 August 2021

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

We analyze possible connections between quantum-inspired classifications and support vector machines. Quantum state discrimination and optimal quantum measurement are useful tools for classification problems. In order to use these tools, feature vectors have to be encoded in quantum states represented by density operators. Classification algorithms inspired by quantum state discrimination and implemented on classic computers have been recently proposed. We focus on the implementation of a known quantum-inspired classifier based on Helstrom state discrimination showing its connection with support vector machines and how to make the classification more efficient in terms of space and time acting on quantum encoding. In some cases, traditional methods provide better results. Moreover, we discuss the quantum-inspired nearest mean classification.

Keywords:

quantum state discrimination; optimal quantum measurement; support vector machine; machine learning

1. Introduction

Support vector machines are becoming popular in a wide variety of applications [1]. They are supervised learning models with associated algorithms (such as sub-gradient descent and coordinate descent) that analyze data for classification [2]. A support vector machine (SVM, for short) learns by examples to assign labels to feature vectors. An object with a feature vector is treated through a kernel function as a point in a larger space and the goal is to find the maximum-margin separating hyperplanes that allow one to partition the space and divide the points into classes to which labels are assigned. The logic behind the kernel function of an SVM and the kernel methods in general turns out to be rather similar to what is seen in quantum computing when one performs an encoding of classical data into quantum states. In fact, quantum computing provides implicit computations in high-dimensional Hilbert spaces by means of the physical manipulation of quantum systems, as well as kernel methods provide implicit computation in a higher dimensional feature space by means of the efficient representation of inputs. The interpretation of quantum encodings as feature maps with relevance in quantum machine learning is well-established [3,4], and it is one of the crucial points of this work. In addition to the general connection between kernel methods and quantum computing, there are the expressive quantum approaches to SVM in terms of implementations of this model on quantum computers. For instance, in [5], the authors propose a discretized SVM whose training is performed applying the Grover algorithm. The celebrated proposal of quantum SVM by Rebentrost, Mohseni and Lloyd [6] is based on data retrieval from a quantum random access memory, the quantum phase estimation algorithm and the SWAP test. The resulting quantum algorithm allows a direct implementation of polynomial kernels in terms of tensor products of the quantum states encoding the training vectors. Furthermore, a quantum implementation of SVM on a quantum annealer has been recently proposed [7].

Quantum structures can be used to devise novel machine learning algorithms that do not require quantum hardware in the sense that the mathematical formalism of quantum mechanics is applied to deal with data that are managed by classical computers. The so-called quantum-inspired machine learning is based on particular kinds of information storing and processing defined by means of objects from the quantum formalism that do not necessarily represent physical quantum systems. In the context of quantum-inspired machine learning, SVM has been studied [8], and the present work focuses on a quantum-inspired classification algorithm that turns out to be similar to an SVM.

The general idea of classification algorithms based on discrimination of quantum states is also supported by recent experimental works on quantum state classification based on classical machine learning methods, such as the proposals in [9,10,11,12,13], for instance. In [14], the authors demonstrate a machine learning approach to construct a classifier of quantum states training a neural network. In [15], convolutional neural networks and principal component analysis are applied to classify polarization patterns in quantum optics.

An interesting quantum-inspired binary classification algorithm has been introduced in [16] in terms of a nearest mean classifier based on trace distance between density operators encoding feature vectors. To handle multi-class with this binary classifier, there are different techniques: one against one, which constructs a classifier for each pair of classes, one against all, which builds one per class, hierarchical classification, which creates a tree, where the leaves correspond with the classes. Another quantum-inspired supervised machine learning algorithm for multi-class classification based on so-called pretty good measurement has been proposed in [17], generalizing the Helstrom quantum state discrimination [18] that can be used for binary classification. Classification accuracy of this quantum-inspired multi-class classifier can be improved by increasing the number of copies of the quantum state that encodes the feature vector, at the cost of increasing the computational space and time.

In this paper, we analyze possible connections between support vector machines and quantum-inspired classifications using a geometric approach. In particular, considering a quantum encoding of classical data in terms of Bloch vectors of density operators, we observe that the execution of the Helstrom classifier is analogous to an SVM with a linear kernel. In Section 2, we give a short introduction to some quantum fundamentals that are relevant in the present work, such as the Bloch representation for quantum states. Moreover, we review the application of Helstrom state discrimination for binary classification. In Section 3, we analyze quantum state discrimination for binary classification encoding data into Bloch vectors, some empirical results in this regard and the code in Mathematica are presented in Appendix A. In particular, we highlight the SVM-like behavior of the Helstrom classification for a two-feature dataset considering the encoding in a bi-dimensional Hilbert space and an encoding into a space of enlarged dimension. In Section 4, we discuss the general strategy of implementing a nearest mean classifier based on an operator distance between quantum states such as trace distance and Bures distance. In Section 5, we present some numerical results obtained by the implementation of the Helstrom classifier. In Section 6, we draw some final comments.

2. Basics

The set of density matrices on the (finite-dimensional) Hilbert space

H

is given by

S (H) = {ρ \in B^{+} (H) : tr ρ = 1}

, where

B^{+} (H)

is the set of positive semidefinite operators on

H

. The set

S (H)

is convex and its extreme elements, the pure states, are rank-1 orthogonal projectors. A pure state has general form

ρ = |ψ〉 〈ψ|

, and it can then be directly identified with the unit vector

|ψ〉 \in H

up to a phase factor.

The bases of the real space of Hermitian matrices on

C^{d}

can be used to decompose density matrices associated with states of a quantum system described in a d-dimensional Hilbert space. A fundamental basis for qubits (

dim H = 2

) is formed by the three Pauli matrices and the

2 \times 2

identity matrix. In this case, any density matrix can be represented by a three-dimensional vector, the Bloch vector, that lies within the unit ball in

R^{3}

whose boundary is the Bloch sphere. The points on the spherical surface are in bijective correspondence with the pure states. In higher dimensions, the set of quantum states is a convex body with a much more complicated geometry and it is no longer simply represented as a unit ball. In general, for any

j, k, l

such that

1 \leq j \leq d^{2} - 1

,

0 \leq k < l \leq d - 1

, the generalized Pauli matrices

σ_{j}

on

C^{d}

can be defined as follows:

σ_{j} = : \{\begin{matrix} |\frac{k}{d - 1}〉 〈\frac{l}{d - 1}| + |\frac{l}{d - 1}〉 〈\frac{k}{d - 1}| & if j \leq \frac{d (d - 1)}{2} and j = \frac{k (1 - k)}{2} + (d - 2) k + l; \\ - i |\frac{k}{d - 1}〉 〈\frac{l}{d - 1}| + i |\frac{l}{d - 1}〉 〈\frac{k}{d - 1}| & if \frac{d (d - 1)}{2} < j \leq d (d - 1) and j = \frac{d (d - 1) + k (1 - k)}{2} + (d - 2) k + l; \\ \sqrt{\frac{2}{l (l + 1)}} (\sum_{k = 0}^{l - 1} |\frac{k}{d - 1}〉 〈\frac{k}{d - 1}| - l |\frac{l}{d - 1}〉 〈\frac{l}{d - 1}|) & if j > d (d - 1) and j = d (d - 1) + l; \end{matrix}

(1)

where

{\{|\frac{k}{d \dots 1}〉\}}_{k = 0, \dots, d - 1}

denotes the canonical basis of

C^{d}

. The generalized Pauli matrices

{σ_{j}}_{j = 1, \dots, d^{2} - 1}

are the standard generators of the special unitary group

S U (d)

. In particular,

\frac{d (d - 1)}{2}

matrices are symmetric,

\frac{d (d - 1)}{2}

matrices are antisymmetric,

d - 1

matrices are diagonal. Together with the

d \times d

identity matrix

I_{d}

, the generalized Pauli matrices form an orthogonal (the orthogonality is with respect to the Hilbert–Schmidt product

{(A, B)}_{H S} = tr (A^{†} B)

) basis of the real space of

d \times d

Hermitian matrices.

Let

ρ

be a density operator on

C^{d}

. The expansion of

ρ

with respect to the orthogonal basis

{I_{d}, σ_{j} : 1 \leq j \leq d^{2} - 1}

is:

ρ = \frac{1}{d} (I_{d} + \sqrt{\frac{d (d - 1)}{2}} \sum_{j = 1}^{d^{2} - 1} b_{j}^{(ρ)} σ_{j}),

where

b_{j}^{(ρ)} = \sqrt{\frac{d}{2 (d - 1)}} tr (ρ σ_{j}) \in R

. The coordinates

b^{(ρ)} = (b_{1}^{(ρ)}, \dots, b_{d^{2} - 1}^{(ρ)})

represent the Bloch vector associated to

ρ

with respect to the basis

{I_{d}, σ_{j} : 1 \leq j \leq d^{2} - 1}

, which lies within the hypersphere of radius 1. For

d > 2

, the points contained in the unit hypersphere in

R^{d^{2} - 1}

are not in bijective correspondence with quantum states on

C^{d}

such as in the case of a single qubit. However, any vector within the closed ball of radius

\frac{2}{d}

gives rise to a density operator. From the physical viewpoint, the Bloch vector has real components that can be expressed as expectation values of measurable quantities. For

d = 3

, the generalized Pauli matrices are the Gell–Mann matrices and the Bloch vector can be expressed as expectation values of spin 1 operators.

A complex vector can be encoded into a density matrix. For instance, a quantum encoding (the amplitude encoding) is given by:

C^{n} ∋ x \mapsto |x〉 = \frac{1}{\sqrt{{‖ x ‖}^{2} + 1}} (\sum_{i = 0}^{n - 1} x_{i} |i〉 + |n〉) \in H,

(2)

where

{|i〉}_{i = 0, \dots, n}

is the computational basis of the

(n + 1)

-dimensional Hilbert space

H

, identified as the standard basis of

C^{n + 1}

. The map defined in (2) encodes

x

into the pure state

ρ_{x} = |x〉 〈x|

, the additional component of

|x〉

stores the norm of

x

. Generally speaking, a quantum encoding is any procedure to encode classical information (e.g., a list of symbols) into quantum states. In this paper, we consider encodings of vectors in

C^{n}

and

R^{n}

into density matrices on a Hilbert space

H

whose dimension depends on n.

In [17], there is the proposal of a quantum-inspired classification algorithm based on a generalization of the Helstrom measurement, the so-called Pretty Good measurement, for quantum state discrimination. Let us focus on the case of binary classification of n-dimensional complex feature vectors; the algorithm is based on the following three ingredients: (1) a quantum encoding of the feature vectors

C^{n} ∋ x \mapsto ρ_{x} \in S (H)

; (2) the construction of the quantum centroids of the two classes

C_{1}

and

C_{2}

of training points:

ρ_{i} : = \frac{1}{| C_{i} |} \sum_{x \in C_{i}} ρ_{x} i = 1, 2;

(3)

(3) application of the Helstrom discrimination on the two quantum centroids in order to assign a label to a new data instance.

Let us briefly introduce the notion of quantum state discrimination that is central in the present work. Given a set of arbitrary quantum states with respective a priori probabilities

R = {(ρ_{1}, p_{1}), \dots, (ρ_{N}, p_{N})}

, in general there is no measurement process that discriminates the states without errors. More formally, there does not exist a collection of effects

E = {E_{i}}_{i = 1, \dots, N} \subset B^{+} (H)

such that

\sum_{i = 1}^{N} E_{i} = I

satisfying the following property:

tr (E_{i} ρ_{j}) = 0

when

i \neq j

for all

i, j = 1, \dots, N

. In some particular cases, the states can be exactly discriminated, for example if we have a set of orthogonal pure states

{|ψ_{1}〉, \dots, |ψ_{N}〉}

, we can discriminate them without errors by means of the corresponding von Neumann measurement

{|ψ_{i}〉 〈ψ_{i}|}_{i = 1, \dots, N}

. Returning to the general set R, the probability of a successful state discrimination performing the measurement E is:

P_{E} (R) = \sum_{i = 1}^{N} p_{i} tr (E_{i} ρ_{i}) .

(4)

An interesting and useful task is finding the optimal measurement that maximizes the probability (4). In [18], the author presents a complete characterization of the optimal measurement

E_{o p t}

for

R = {(ρ_{1}, p_{1}), (ρ_{2}, p_{2})}

.

E_{o p t}

can be constructed as follows: Let

Λ :

=

p_{1} ρ_{1} - p_{2} ρ_{2}

be the Helstrom observable whose positive and negative eigenvalues are, respectively, collected in the sets

D_{+}

and

D_{-}

. Consider the two orthogonal projectors:

P_{\pm} : = \sum_{λ \in D_{\pm}} P_{λ},

(5)

where

P_{λ}

projects onto the eigenspace of

λ

. The measurement

E_{o p t} :

= {P_{+}, P_{-}}

maximizes the probability (4) that attains the Helstrom bound

h_{b} (ρ_{1}, ρ_{2}) = p_{1} tr (P_{+} ρ_{1}) + p_{2} tr (P_{-} ρ_{2})

.

Helstrom quantum state discrimination can be used to implement a binary classifier [17]. Let

{(x_{1}, y_{1}), \dots, (x_{M}, y_{M})}

be a training set with

y_{i} \in {1, 2}

\forall i = 1, \dots, M

. Once a quantum encoding

C^{n} ∋ x \mapsto ρ_{x} \in S (H)

has been selected, one can construct the quantum centroids

ρ_{1}

and

ρ_{2}

as in (3) of the two classes

C_{1, 2} = {x_{i} : y_{i} = 1, 2}

. Let

{P_{+}, P_{-}}

be the Helstrom measurement defined by the set

R = {(ρ_{1}, p_{1}), (ρ_{2}, p_{2})}

, where the probabilities attached to the centroids are

p_{1, 2} = \frac{| C_{1, 2} |}{| C_{1} | + | C_{2} |}

. The Helstrom classifier applies the optimal measurement for the discrimination of the two quantum centroids to assign the label y to a new data instance

x

, encoded into the state

ρ_{x}

, as follows:

y (x) = \{\begin{matrix} 1 & if tr (P_{+} ρ_{x}) \geq tr (P_{-} ρ_{x}) \\ 2 & otherwise \end{matrix}

(6)

A strategy to increase the accuracy in classification is given by the construction of the tensor product of k copies of the quantum centroids

ρ_{1, 2}^{\otimes k}

enlarging the Hilbert space where data are encoded. The corresponding Helstrom measurement is

{P_{+}^{\otimes k}, P_{-}^{\otimes k}}

, and the Helstrom bound satisfies [17]:

h_{b} (ρ_{1}^{\otimes k}, ρ_{2}^{\otimes k}) \leq h_{b} (ρ_{1}^{\otimes (k + 1)}, ρ_{2}^{\otimes (k + 1)}) \forall k \in N .

(7)

Enlarging the Hilbert space of the quantum encoding, one increases the Helstrom bound obtaining a more accurate classifier. The computational cost is evident; however, in the next section, we observe that in the case of real input vectors, the space can be enlarged, saving time and space by means of the encoding into Bloch vectors.

3. Geometric Approach to Quantum-Inspired Classifications

In [17], a real vector

x \in R^{d - 1}

is encoded, as shown above, in a projector operator

ρ_{x}

of a Hilbert space

C^{d}

represented by an

d \times d

real symmetric matrix, where

d \geq 2

. For simplicity, we consider an input vector

[x_{1}, x_{2}] \in R^{2}

and the corresponding projector operator

ρ_{[x_{1}, x_{2}]}

on

C^{3}

. By easy computations, one can see that the Bloch vector of

ρ_{[x_{1}, x_{2}]}

has null components:

b^{(x_{1}, x_{2})} = \frac{1}{1 + x_{1}^{2} + x_{2}^{2}} [2 x_{1} x_{2}, 2 x_{1}, 2 x_{2}, 0, 0, 0, x_{1}^{2} - x_{2}^{2}, \frac{x_{1}^{2} + x_{2}^{2} - 2}{\sqrt{3}}] .

(8)

Instead of using a matrix with nine real elements, memory occupation can be improved by considering only the non-zero components of the Bloch vector. In general, the technique of removing the components that are zero or repeated several times allows reducing the space and the calculation time considering only the significant values that allow to carry out the classification.

Quantum-inspired classifications are similar to support vector machines that implicitly map the input space into high-dimensional feature space using kernel functions, where the maximal separating margins are constructed. In this case, the nonlinear explicit injective function

φ : R^{2} \to R^{5}

can be defined as follows:

φ ([x_{1}, x_{2}]) : = \frac{1}{x_{1}^{2} + x_{2}^{2} + 1} [2 x_{1} x_{2}, 2 x_{1}, 2 x_{2}, x_{1}^{2} - x_{2}^{2}, \frac{x_{1}^{2} + x_{2}^{2} - 2}{\sqrt{3}}] .

(9)

From a geometric point of view, feature vectors are indeed points on the surface of a hyper-hemisphere. The corresponding elements to the quantum centroids

ρ_{1}, ρ_{2}

are the centroids of the feature vectors:

{\bar{x}}_{i} : = \frac{1}{| C_{i} |} \sum_{x \in C_{i}} φ (x) i = 1, 2

(10)

In general, such centroids are points inside the hypersphere and therefore they do not have an inverse image.

The Helstrom classifier can also be applied in a smaller space using the following encoder from

R^{2}

to density operators of

C^{2}

:

ρ_{[x_{1}, x_{2}]} = \frac{1}{2} (I_{2} + \sum_{j = 1}^{3} b_{j} σ_{j}),

(11)

where the Bloch vector

b = φ ([x_{1}, x_{2}]) \in R^{3}

and

φ ([x_{1}, x_{2}]) : = \frac{1}{\sqrt{x_{1}^{2} + x_{2}^{2} + 1}} [x_{1}, x_{2}, 1]

. As discussed in Section 5, the Helstrom classifier gives even less accurate results on the training set as expected because the feature space is smaller than the previous one. In this particular case, quantum centroids are points inside the Bloch sphere of a qubit that correspond to density operators.

An interesting question suggested in [17] is whether classification accuracy can be improved by increasing the dimension of the state space of density matrices that represent input vectors. Improving accuracy providing n copies of centroids in quantum-inspired classifications has a strong impact in terms of computational space (from dimension

d - 1

to

d^{2 n}

) and time. Following the geometric approach, considering the significant values that allow to carry out the classification, the explicit function

φ : R^{2} \to R^{20}

for two copies can be defined as follows:

φ ([x_{1}, x_{2}]) : = \frac{1}{{(x_{1}^{2} + x_{2}^{2} + 1)}^{2}} [2 x_{1}^{3} x_{2}, 2 x_{1}^{3}, 2 x_{1}^{2} x_{2}^{2}, 2 x_{1}^{2} x_{2}, 2 x_{1}^{2}, 2 x_{1} x_{2}^{3}, 2 x_{1} x_{2}^{2}, 2 x_{1} x_{2}, 2 x_{1}, 2 x_{2}^{3},

2 x_{2}^{2}, 2 x_{2}, x_{1}^{2} (x_{1} - x_{2}) (x_{1} + x_{2}), \frac{x_{1}^{2} (x_{1}^{2} + x_{2}^{2} - 2)}{\sqrt{3}}, \frac{x_{1}^{2} (x_{1}^{2} - 2 x_{2}^{2} + 1)}{\sqrt{6}},

\frac{x_{1}^{4} - 4 x_{2}^{4} + x_{1}^{2} (2 x_{2}^{2} + 1)}{\sqrt{10}}, \frac{x_{1}^{2} + x_{1}^{4} - 5 x_{2}^{2} + 2 x_{1}^{2} x_{2}^{2} + x_{2}^{4}}{\sqrt{15}}, \frac{x_{1}^{4} + x_{2}^{2} + x_{2}^{4} + x_{1}^{2} (2 x_{2}^{2} - 5)}{\sqrt{21}},

\frac{x_{1}^{4} - 6 x_{2}^{2} + x_{2}^{4} + 2 x_{1}^{2} (x_{2}^{2} + 1)}{2 \sqrt{7}}, \frac{1}{6} (x_{1}^{2} + x_{2}^{2} - 2) (x_{1}^{2} + x_{2}^{2} + 4)] .

(12)

In particular, removing null and multiple entries, we consider only 20 values instead of 81 for two copies, 51 values instead of 729 for three copies and so on. However, one must also take into account high-precision numbers and track the propagation of the numerical error. The gain in accuracy seems marginal already from three copies.

In Section 5, we will show some numerical results obtained by the implementation of the Helstrom classifier and, in particular, the close analogy between this quantum-inspired method and the SVM with linear kernel. A natural limitation of Helstrom classification arises in the case of a training set where the centroids of the two classes coincide, the Helstrom classifier is clearly useless because it is not able to perform a corresponding state discrimination. In the same situation, there are effective classical classification methods such as Random Forest, Naive Bayes classifier and Nearest Neighbor.

4. Quantum-Inspired Nearest Mean Classifications

In [16], a quantum version of the nearest mean classifier was shown making use of the inverse of the stereographic projection as encoder:

π^{- 1} : R^{d - 1} ∋ x \mapsto \frac{2}{\sum_{i = 1}^{d - 1} x_{i}^{2} + 1} [x_{1}, \dots, x_{d - 1}, \frac{\sum_{i = 1}^{d - 1} x_{i}^{2} - 1}{2}] \in S^{d - 1} .

(13)

In the case

x \in R^{2}

, the lowest dimensional quantum encoding is obviously in

C^{2}

; in particular,

ρ_{x}

is given by the density matrix identified by the Bloch vector

π^{- 1} (x)

, that is:

ρ_{x} = \frac{1}{x_{1}^{2} + x_{2}^{2} + 1} (\begin{matrix} x_{1}^{2} + x_{2}^{2} & x_{1} - i x_{2} \\ x_{1} + i x_{2} & 1 \end{matrix}) .

(14)

The state

ρ_{x}

is pure, i.e., a projector, as

π^{- 1} (x)

lies on the surface of the Bloch sphere. In Appendix A, the encoding

x \mapsto ρ_{x}

defined by (14), in an arbitrary dimension, is realized by the function

SVMEncoder [x, type]

with

type = 2

. For binary classification, in [16], the centroids of the two classes are calculated in the feature space and then encoded into density matrices according to (14). Given a test point encoded into a density matrix, the classifier appends it to the nearest centroid with respect to the normalized trace distance:

{\bar{d}}_{tr} (ρ_{x}, ρ_{y}) = \frac{2}{\sqrt{(1 - b_{x_{3}}) (1 - b_{y_{3}})}} d_{tr} (ρ_{x}, ρ_{y}),

(15)

where

b_{x_{3}}

and

b_{y_{3}}

are the Bloch coefficients with respect to the Pauli matrix

σ_{3}

of

ρ_{x}

and

ρ_{y}

and

d_{tr} (ρ_{x}, ρ_{y}) = \frac{1}{2} tr (| ρ_{x} - ρ_{y} |)

. One can easily verify that:

{\bar{d}}_{tr} (ρ_{x}, ρ_{y}) = d_{E} (x, y),

(16)

where

d_{E}

is the standard Euclidean distance. Therefore a quantum-inspired nearest mean classifier can be defined by the encoding given in (14) and by the evaluation of the trace distance between density operators. In [16], experimental results on the performances of such a quantum-inspired classifier are presented and compared to the classical nearest mean classifier with impressive results in terms of accuracy.

As illustrated in the previous section, in order to improve data separation, the input space can be mapped into a higher dimensional feature space by means of a kernel trick. It is also possible to reduce the computational space from

R^{8}

to

R^{5}

in this case with the following explicit function

φ : R^{2} \to R^{5}

:

φ ([x_{1}, x_{2}]) : = \frac{2}{{(x_{1}^{2} + x_{2}^{2} + 1)}^{2}} [4 x_{1} x_{2}, 2 x_{1} (x_{1}^{2} + x_{2}^{2} - 1), 2 x_{2} (x_{1}^{2} + x_{2}^{2} - 1),

2 x_{1}^{2} - 2 x_{2}^{2}, \frac{4 x_{1}^{2} + 4 x_{2}^{2} - x_{1}^{4} - x_{2}^{4} - 2 x_{1}^{2} x_{2}^{2} - 1}{\sqrt{3}}] .

(17)

In Appendix A, we implement the higher dimensional encoding induced by

φ

calling the function

BlochVector [SVMEncoder [x, 2]]

and removing the null components.

The following distances, respectively, the Hilbert–Schimidt distance, trace distance, Bures distance and Hellinger distance, are often considered and can be used for nearest mean classification:

d_{HS} (ρ_{1}, ρ_{2}) = \sqrt{tr | ρ_{1} - ρ_{2} |^{2}},

d_{tr} (ρ_{1}, ρ_{2}) = \frac{1}{2} tr | ρ_{1} - ρ_{2} |,

d_{B} (ρ_{1}, ρ_{2}) = \sqrt{2 - 2 tr \sqrt{\sqrt{ρ_{1}} ρ_{2} \sqrt{ρ_{1}}}},

d_{He} (ρ_{1}, ρ_{2}) = \sqrt{2 - 2 tr \sqrt{ρ_{1}} \sqrt{ρ_{2}}},

where

| A | = \sqrt{A^{†} A}

is the modulus of the operator A. The measures induce different geometries. The set of states of a qubit is equivalent to the Bloch sphere for the Hilbert–Schmidt distance

d_{HS}

and for the trace distance

d_{tr}

, and to the Uhlmann hemisphere for the Bures distance

d_{B}

. For higher dimensions, the geometries induced by the Hilbert–Schmidt distance and the trace distance also differ. The Bures distance and trace distance are useful measures for quantifying the states distinguishability. The Bures distance is an optimized Kullback–Leibler distance between output statistics over all quantum measurements. The trace distance is a function of the probability to successfully discriminate two states in a single measurement optimized over all quantum measurements. As mentioned above, in the bi-dimensional case there is the equivalence of the normalized trace distance and the Euclidean distance (16). Moreover, one can see that the trace distance between pure states is equal to half of the Euclidean distance between the respective Bloch vectors. Therefore, in the Mathematica code of Appendix B, the function

CentroidClassify

for nearest mean classification based on trace distance is equivalently defined by means of the standard norm.

Within the paradigm of quantum-inspired classification, the Bures distance and the Hellinger distance can be used to define other classifiers that evaluate these distances for the nearest mean classification. Let us consider a binary classification problem and the quantum centroids (3) of the two classes. We can define a classification algorithm that evaluates the Bures distance between the pure quantum state encoding a test point and the quantum centroids that are not pure in general. The fidelity between density operators, defined as

F (ρ_{1}, ρ_{2}) = {(tr \sqrt{\sqrt{ρ_{1}} ρ_{2} \sqrt{ρ_{1}}})}^{2}

, reduces to

F (ρ_{1}, ρ_{2}) = 〈 ψ_{1} | ρ_{2} | ψ_{1} 〉

when

ρ_{1} = |ψ_{1}〉 〈ψ_{1}|

. Therefore, the Bures distance between the pure state

ρ_{\hat{x}}

encoding the test point

\hat{x}

and the quantum centroid

ρ_{i}

is:

d_{B} (ρ_{x}, ρ_{i}) = \sqrt{2 - 2 \sqrt{\frac{1}{d} (1 + (d - 1) b^{(x)} \cdot b^{(i)})}} \equiv D_{B} (b^{(x)}, b^{(i)}),

(18)

where

b^{(x)}

and

b^{(i)}

are the Bloch vectors of

ρ_{x}

and

ρ_{i}

, respectively, and d is the dimension of the Hilbert space of the quantum encoding. The formula (18) can be directly derived from

tr (ρ_{1} ρ_{2}) = \frac{1}{d} (1 + (d - 1) b^{(1)} \cdot b^{(2)})

, which is an immediate consequence of the fact that the generalized Pauli matrices are traceless and satisfy

tr (σ_{i} σ_{j}) = 2 δ_{i j}

. An example of the nearest mean classifier based on the Bures distance can be defined by Algorithm 1.

The quantum encodings at line 1 and line 3 of Algorithm 1 can be realized by the function

SVMEncoder [x, 1]

, defined in Appendix A, for instance. At line 5, the quantum centroids are constructed according to (3); alternatively, the classifier can calculate the centroids in the feature space from the Bloch vectors of the quantum states encoding the training points like in (10), where in general, the resulting centroid vectors

{\bar{x}}_{i}

are not the Bloch vectors of density operators.

Algorithm 1:Quantum-inspired nearest mean classifier based on Bures distance.

5. Numerical Results and Discussion

We focus on some numerical results obtained running the considered quantum-inspired classifiers on some datasets. In Appendix A, there is the Mathematica code of the tests on the Helstrom classifier, and in Appendix B, there is the Mathematica code of the implementation of the quantum-inspired NMC (the code is also available at the following repository: github.com/leporini/classification). As a benchmark for testing the Helstrom classifier, we applied the DB-SCAN clustering algorithm [19] to a moons dataset, obtaining the classification of Figure 1.

The first test provides the quantum encoding of the bi-dimensional input vectors into a five-dimensional space according to (9) and (10) and the execution of the Helstrom classifier on the moons dataset. The obtained decision boundary and the misclassified points are shown in Figure 2. The same classification task has been tackled by a classical SVM with a linear kernel, which returns the decision line and the support vectors depicted in Figure 3. The comparison of the outputs reveals that the execution of the Helstrom classifier returns the decision boundary of an SVM with the linear kernel.

The considered test on the performance of the Helstrom classifier over the moons dataset can be repeated considering a smaller space encoding the data points into density operators of a qubit according to (11). Since we are considering a feature map

φ : R^{2} \to R^{3}

, which represents data in a lower dimensional space, the classifier is less accurate as expected. The accuracy of the Helstrom classifier, over the training set, in the two cases is:

A c c_{R^{5}}^{Hel} = 0.8455, A c c_{R^{3}}^{Hel} = 0.503 .

(19)

Accuracy is sensitive to the dimension, as shown in Figure 4.

In Figure 5, there is the decision boundary found by the Helstrom classifier, with misclassified points, in the lowest dimensional case. Figure 6 shows the output of an SVM with a linear kernel, and its behavior is confirmed as similar to the Helstrom classifier.

In the case of a training set where the centroids of the two classes coincide, the Helstrom classifier does not work because it is not able to perform a state discrimination. Let us consider the dataset represented in Figure 7 with coinciding centroids; on the one hand, the Hesltrom classifier is useless, and on the other hand, RandomForest, Naive Bayes classifier, and Nearest Neighbors present the following high accuracies on the training set:

A c c^{RF} = 0.975, A c c^{NB} = 0.955, A c c^{NN} = 0.985 .

(20)

Considering a dataset with distinguishable but close centroids, the performance of the Helstrom classifier is poor with respect to existing classical methods. For example, let us consider the training set represented in Figure 8. The accuracies of the Helstrom classifier, the Random Forest and the Nearest Neighbors can be compared, observing that the performances of the classical algorithms are definitely better in terms of the correct classification:

A c c^{Hel} = 0.73, A c c^{RF} = 0.98, A c c^{NN} = 0.85 .

(21)

In Figure 9, there are the misclassified points by the Helstrom classifier.

The numerical results presented in this section reveal the connection between the Helstrom classifier and SVM with linear kernel; in particular, one can observe that different strategies of quantum encoding by means of Bloch vectors of density operators correspond to different kernel tricks.

In Section 4, we showed that from a geometric viewpoint, a quantum-inspired nearest mean classifier based on the trace distance between density operators can be equivalently implemented considering the Euclidean distance between Bloch vectors. We considered both the implementations of this quantum-inspired NMC, finding the exact equivalence in terms of accuracy on the moons dataset:

A c c^{Density} = A c c^{Bloch} = 0.865 .

(22)

A c c^{Density}

is the accuracy of the NMC based on the quantum encoding of data points into density operators according to (14) and the calculation of trace distances within this representation.

A c c^{Bloch}

is the accuracy of the NMC, which encodes the data points into Bloch vectors by means of the inverse of the stereographic projection and evaluates the Euclidean distance in the Bloch feature space.

6. Conclusions

In this paper, we analyzed some methods of quantum-inspired classification, highlighting a connection with support vector machines. After an introduction on the Bloch representation of quantum states in an arbitrary dimension, we considered the Helstrom quantum state discrimination applied to binary classification (the Helstrom classifier), observing that its execution is similar to an SVM with linear kernel. In particular, adopting a geometric viewpoint, we described how quantum encodings of feature vectors can be used to implement a kernel trick, improving the quality of classification. Moreover, if one considers multiple copies of the encoding quantum states to map real feature vectors into a space with higher dimensions (as performed in [6] to obtain a polynomial kernel for a quantum SVM with an exponential cost of space resources), we showed that the computational cost can be calmed down, deleting the redundancies in the resulting Bloch vectors. In this way, in quantum-inspired classification, one can define a nonlinear injective function to perform a kernel trick saving space and time.

We presented some experimental results on the moons dataset to exhibit the behavior of the Helstrom classifier as an SVM with a linear kernel. With this dataset, we have enlarged the dimension of the Hilbert space, changing the quantum encoding within the Bloch representation of the density operators. On the other hand, we gave a couple of examples where the Helstrom classifier does not work due to the difficult discrimination of the quantum states representing the centroids of the classes.

We also focused on quantum-inspired nearest mean classification that is based on the computation of operator distances between density matrices encodings feature vectors. For instance, the classifier can evaluate the trace distance or the Bures distance among encoding quantum states. We considered the classifier with trace distance showing that it is equivalent (in terms of classification accuracy) to the “geometric” classifier, which evaluates the Euclidean distance between Bloch vectors. Then, we proposed an algorithm based on Bures distance, which can be evaluated directly in terms of Bloch vectors. An empirical study on this quantum-inspired classifier is a matter for future works.

As a general consideration emerging from the present work, we point out that the geometric approach considering the Bloch representation of density matrices is suitable to describe quantum-inspired classification. This approach reveals a connection between the Helstrom classifier based on quantum state discrimination and SVM. The geometric viewpoint seems to be fruitful also to define quantum-inspired nearest mean classifiers.

The present work opens possible directions of investigation such as the full characterization of the kernel of the Helstrom classifier in order to complete the description of quantum state discrimination as the execution of a support vector machine. More in general, an interesting topic could be a satisfying geometric analysis of quantum-inspired machine learning algorithms beyond classifiers since the present paper suggests that the geometry of quantum states offers a novel machinery to deal with data.

Author Contributions

Conceptualization, R.L. and D.P.; software, R.L.; validation, D.P.; formal analysis, D.P.; writing—original draft preparation, R.L. and D.P.; writing—review and editing, D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Q@TN consortium.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code is also available at the following repository: github.com/leporini/classification (accessed on 28 August 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Mathematica Notebook for Quantum-Inspired Classifications

https://sites.pitt.edu/~tabakin/QDENSITY (accessed on 28 August 2021).

Needs [“ QDENSITYQdensity ” “ C : \ \ Qdensity . m ”]; KetBra : : usage = ″ | \frac{x}{d - 1} 〉 〈 \frac{y}{d - 1} | of C^{d}, where x, y \in {0, \dots, d - 1} and | 0 〉, | \frac{1}{d - 1} 〉, \dots, | 1 〉 are the elements of the canonical basis of C^{d} ″; KetBra [d_, x_, y_] : = SparseArray [{{x + 1, y + 1} \to 1}, {d, d}]; BlochVector : : usage = ″ BlochVector [ρ], where ρ is a density operator of C^{d} and d \geq 2 . \ n \ t Returns the corresponding Bloch vector b \in R^{d^{2} - 1} . ″; BlochVector [ρ_] : = Block [{d, sigma}, d = Dimensions [ρ] [[2]]; sigma = {}; Do [AppendTo [sigma, KetBra [d, l, k] + KetBra [d, k, l]], {l, 0, d - 2}, {k, l + 1, d - 1}]; Do [AppendTo [sigma, - i KetBra [d, k, l] + i KetBra [d, l, k]], {l, 0, d - 2}, {k, l + 1, d - 1}]; Do [AppendTo [sigma, \sqrt{\frac{2}{l (l + 1)}} (Sum [KetBra [d, k, k], {k, 0, l - 1}] - l KetBra [d, l, l]), {l, 1, d - 1}]; Tr [ρ . #] & / @ sigma]; BlochVectorInverse : : usage = ″ BlochVectorInverse [b] \ n \ t Returns the density operator of C^{d} of the Bloch vector b \in R^{d^{2} - 1} . \ n \ t Not any vector b of the unit hypershpere gives rise a density operator, since the output is not a semi - definite positive operator (i . e . there exists a negative eigenvalue), \ n \ t but all vectors of length ⩽ \frac{2}{d} give rise a density operator . "; BlochVectorInverse [b_] : = Block [{d, sigma, ρ}, d = Ceiling [\sqrt{Length [b] + 1}]; sigma = {}; Do [AppendTo [sigma, KetBra [d, l, k] + KetBra [d, k, l]], {l, 0, d - 2}, {k, l + 1, d - 1}]; Do [AppendTo [sigma, - i KetBra [d, k, l] + i KetBra [d, l, k]], {l, 0, d - 2}, {k, l + 1, d - 1}]; Do [AppendTo [sigma, \sqrt{\frac{2}{l (l + 1)}} (Sum [KetBra [d, k, k], {k, 0, l - 1}] - l KetBra [d, l, l])], {l, 1, d - 1}]; b = \frac{2}{d} PadRight [b, d]; ρ = \frac{1}{d} IdentityMatrix [d] + \sqrt{\frac{d - 1}{2 d}} Sum [b [[j]] sigma [[j]], {j, 1, d^{\land} 2 - 1}]; Return [ρ]]; SVMEncoder : : usage = ″ SVMEncoder [x, type], where x \in R^{d - 1} (keeps the value | x | and normalizes the new vector if type = 1) . \ n \ t When type = 2, it uses the inverse of the stereographic projection . \ n \ t Returns the density operator of C^{d}, which is projection - operator (that projects over the closed subspace determined by the normalized vector) if type is 1 or 2 . \ n \ t It is a mixed state with the corresponding Bloch vector of length \frac{2}{d} if type = 3 . "; SVMEncoder [x_, type_] : = Block [{u, ρ}, u = Switch [type, 1, Normalize [Append [x, 1]], 2, \frac{2}{Total [x^{\land} 2] + 1} Append [x, \frac{1}{2} (Total [x^{\land} 2] - 1)], 3, Normalize [Append [x, 1]]]; If [type = = 3, ρ = BlochVectorInverse [u], ρ = Outer [Times, u, u]]; Return [ρ]]; BinaryClassifier : : usage = " BinaryClassifier [DClass 1, DClass 2], where DClass 1 and DClass 2 are density operators of the class 1 and 2, respectively . \ n \ t Returns the projection - operators P + and P - . "; BinaryClassifier [DClass 1_, DClass 2_] : = Block [{p, c, vals, vecs, prjs, prj 1, prj 2}, p = \frac{Length [DClass 1]}{Length [DClass 1] + Length [DClass 2]}; c = p Total [DClass 1] - (1 - p) Total [DClass 2]; {vals, vecs} = Eigensystem [c]; vecs = Normalize / @ vecs; prjs = Outer [Times, Conjugate [#], #] & / @ vecs; prj 1 = Total [Pick [prjs, NonNegative [vals]]]; prj 2 = Total [Pick [prjs, Negative [vals]]]; Return [{prj 1, prj 2}]]; HelstromClassify : : usage = " HelstromClassify [{P +, P -}, ρ] . \ n \ t Returns the class 1 or 2 . "; HelstromClassify [{prj 1_, prj 2_}, ρ_] : = If [Tr [prj 1 . #] \geq Tr [prj 2 . #], 1, 2] & / @ ρ; ToDensity [B_] : = Block [{sigma, ρ}, sigma = {KetBra [2, 0, 1] + KetBra [2, 1, 0], - i KetBra [2, 1, 0] + i KetBra [2, 0, 1], KetBra [2, 0, 0] - KetBra [2, 1, 1]}; ρ = {}; Do [AppendTo [ρ, \frac{1}{2} (IdentityMatrix [2] + Sum [B [[i, j]] sigma [[j]], {j, 3}])], {i, Length [B]}]; Return [ρ]]; unstandardize [point_, sd_, mu_] : = point * sd + mu; getvectors [cf_ClassifierFunction, X_] : = With [{p = X}, With [{sd = StandardDeviation [p], mu = Mean [p]}, unstandardize [#, sd, mu] & / @ cf [[1]] [“ Model ”] [“ TrainedModel ”] [[1]] [“ supportVectors ”]]] getplane [svmcf_ClassifierFunction, X_] : = With [{tm = svmcf [[1]] [“ Model ”] [“ TrainedModel ”] [[1]], points = X}, Module [{sv = tm [“ supportVectors ”], svc = tm [“ supportVectorCoeffifificients ”], sd = StandardDeviation [points], mu = Mean [points], dim = Length [points [[1]]], vecs, offset}, vecs = Rest [RotationMatrix [{svc . sv, PadRight [{1}, dim]}] . IdentityMatrix [dim]]; offset = With [{vars = Array [x, dim]}, Values @ First @ FindInstance [vars . (svc . sv) = = tm [“ rho ”], vars]]; vecs = sd * # & / @ vecs; offset = unstandardize [offset, sd, mu]; If [dim > 2, InfinitePlane [offset, vecs], InfiniteLine [offset, First [vecs]]]]]

DBSCAN with a toy dataset

circle [r_, theta_] : = {r Sin [theta], r Cos [theta]}; {train, test} = With [{rot = RotationTransform [π, {0, 0}], tra = TranslationTransform [{1, 2.5}], pts = circle @ @ @ RandomVariate [UniformDistribution [{{2, 3}, {0, Pi}}], 2000]}, TakeDrop [RandomSample @ Join [tra [rot [pts]], pts], 2000]]; cl = ClusterClassify [train, Method \to “ DBSCAN ”] ListPlot [Pick [test, cl [test], #] & / @ Range [2], PlotStyle \to Directive [PointSize [0.013], Opacity [0.7]], AspectRatio \to 1, Frame \to True, Axes \to False] {X 1, X 2} = Pick [train, cl [train], #] & / @ Range [2]; {X 1 Test, X 2 Test} = Pick [test, cl [test], #] & / @ Range [2];

D 1 = SVMEncoder [#, 1] & / @ X 1; D 2 = SVMEncoder [#, 1] & / @ X 2; {p 1, p 2} = BinaryClassifier [D 1, D 2]; y 1 = HelstromClassify [{p 1, p 2}, D 1]; y 2 = HelstromClassify [{p 1, p 2}, D 2]; accuracy [y 1_, y 2_] : = N [\frac{Count [y 1, 1] + Count [y 2, 2]}{Length [y 1] + Length [y 2]}]; accuracy [y 1, y 2] Show [Plot 3 D [{Tr [p 1 . SVMEncoder [{x 1, x 2}, 1]], Tr [p 2 . SVMEncoder [{x 1, x 2}, 1]]}, {x 1, - 10, 10}, {x 2, - 10, 10}, PlotStyle \to {Green, Red}, ViewPoint \to {0, 0, \infty}, Lighting \to {{“ Ambient ”, White}}, Mesh \to False], Graphics 3 D [{RGBColor [0, 1, 0, 0.5], PointSize [0.013], Point [Append [#, 1] & / @ X 1]}], Graphics 3 D [{RGBColor [1, 0, 0, 0.5], PointSize [0.013], Point [Append [#, 1] & / @ X 2]}]] y = Join [y 1, y 2]; X = Join [X 1, X 2]; HelstromX 1 = Pick [X, y, 1]; HelstromX 2 = Pick [X, y, 2]; svm = Classify [X \to y, Method \to {“ SupportVectorMachine ”, “ KernelType ” \to “ Linear ”}]; Graphics [{Point [X], Blue, PointSize [Large], Point [getvectors [svm, X]], Opacity [0.5], Gray, getplane [svm, X]}] AssociationMap [ClassifierMeasurements [Classify [X \to y, Method \to {“ SupportVectorMachine ”, “ KernelType ” \to #}], 1 \to HelstromX 1, 2 \to HelstromX 2, “ Accuracy ”] &, {“ Linear ”, “ RadialBasisFunction ”, “ Polynomial ”, “ Sigmoid ”}] D 1 Test = SVMEncoder [#, 1] & / @ X 1 Test; D 2 Test = SVMEncoder [#, 1] & / @ X 2 Test; y 1 = HelstromClassify [{p 1, p 2}, D 1 Test]; y 2 = HelstromClassify [{p 1, p 2}, D 2 Test]; accuracy [y 1, y 2]

0.8455 is the Helstrom accuracy on the training set

Linear \to 0.9475, RadialBasisFunction \to 0.9915, Polynomial \to 0.996, Sigmoid \to 0.889

0.8365 is the Helstrom accuracy on the test set

D 1 C = ToDensity [Normalize [Append [#, 1]] & / @ X 1]; D 2 C = ToDensity [Normalize [Append [#, 1]] & / @ X 2]; {p 1 C, p 2 C} = BinaryClassifier [D 1 C, D 2 C]; y 1 = HelstromClassify [{p 1 C, p 2 C}, D 1 C]; y 2 = HelstromClassify [{p 1 C, p 2 C}, D 2 C]; accuracy [y 1, y 2] Show [Plot 3 D [{Tr [p 1 C . ToDensity [{Normalize [{x 1, x 2, 1}]}] [[1]]], Tr [p 2 C . ToDensity [{Normalize [{x 1, x 2, 1}]}] [[1]]]}, {x 1, - 10, 10}, {x 2, - 10, 10}, PlotStyle \to {Green, Red}, ViewPoint \to {0, 0, \infty}, Lighting \to {{“ Ambient ”, White}}, Mesh \to False], Graphics 3 D [{RGBColor [0, 1, 0, 0.5], PointSize [0.013], Point [Append [#, 1] & / @ X 1]}], Graphics 3 D [{RGBColor [1, 0, 0, 0.5], PointSize [0.013], Point [Append [#, 1] & / @ X 2]}]] y = Join [y 1, y 2]; HelstromX 1 = Pick [X, y, 1]; HelstromX 2 = Pick [X, y, 2]; svm = Classify [X \to y, Method \to {“ SupportVectorMachine ”, “ KernelType ” \to “ Linear ”}]; Graphics [{Point [X], Blue, PointSize [Large], Point [getvectors [svm, X]], Opacity [0.5], Gray, getplane [svm, X]}] AssociationMap [ClassifierMeasurements [Classify [X \to y, Method \to {“ SupportVectorMachine ”, “ KernelType ” \to #}], 1 \to HelstromX 1, 2 \to HelstromX 2, “ Accuracy ”] &, {“ Linear ”, “ RadialBasisFunction ”, “ Polynomial ”, “ Sigmoid ”}]

0.503 is the Helstrom accuracy on the training set

Linear \to 0.9925, RadialBasisFunction \to 0.9905, Polynomial \to 0.9955, Sigmoid \to 0.8655

Heltrom classifier gives the best average success probability

(1 / 2 + 1 / 2 (1 / 2 | ρ_{1} - ρ_{2} |_{1}),

but it does not work with the same centroid (such as 1/2 I).

X 1 = Table [{Tan [θ], 0}, \{θ, 0, 2 π, \frac{2 π}{99}\}]; X 2 = - RotateLeft [#, 1] & / @ X 1; ListPlot [{X 1, X 2}, PlotStyle \to Directive [PointSize [0.013], Opacity [0.7]], AspectRatio \to 1, Frame \to True, Axes \to False] D 1 C = ToDensity [Normalize [Append [#, 1]] & / @ X 1]; D 2 C = ToDensity [Normalize [Append [#, 1]] & / @ X 2]; MatrixForm [#] & / @ N [\{\frac{Length [D 1 C]}{Length [D 1 C] + Length [D 2 C]} Total [D 1 C] - \frac{Length [D 2 C]}{Length [D 1 C] + Length [D 2 C]} Total [D 2 C], \frac{Total [D 1 C]}{Length [D 1 C]}, \frac{Total [D 2 C]}{Length [D 2 C]}\}] AssociationMap [ClassifierMeasurements [Classify [1 \to X 1, 2 \to X 2, Method \to #], 1 \to X 1, 2 \to X 2, “ Accuracy ”] &, {“ RandomForest ”, “ NaiveBayes ”, “ SupportVectorMachine ”, “ NearestNeighbors ”}]

(\begin{matrix} (\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix}) \\ (\begin{matrix} 0.82014 & 0 \\ 0 & 0.17986 \end{matrix}) \\ (\begin{matrix} 0.82014 & 0 \\ 0 & 0.17986 \end{matrix}) \end{matrix})

RandomForest \to 0.975, NaiveBayes \to 0.955, SupportVectorMachine \to 0.83, NearestNeighbors \to 0.985

X 1 = N [Flatten [Table [\{\{\frac{Cot [θ] + Csc [θ]}{\sqrt{2}}, \frac{Tan [\frac{θ}{2}]}{\sqrt{2}}\}, \{- \frac{Cot [\frac{θ}{2}]}{\sqrt{2}}, \frac{(- 1 + Cos [θ]) Csc [θ]}{\sqrt{2}}\}\}, \{θ, \frac{π}{100}, π - \frac{π}{100}, \frac{2 π}{100}\}], 1]]; X 2 = N [Flatten [Table [\{\{\frac{Tan [\frac{θ}{2}]}{\sqrt{2}}, - \frac{Cot [\frac{θ}{2}]}{\sqrt{2}}\}, \{\frac{(- 1 + Cos [θ]) Csc [θ]}{\sqrt{2}}, \frac{Cot [θ] + Csc [θ]}{\sqrt{2}}\}\}, \{θ, \frac{π}{200}, π - \frac{π}{200}, \frac{2 π}{100}\}], 1]]; ListPlot [{X 1, X 2}, PlotStyle \to Directive [PointSize [0.013], Opacity [0.7]], AspectRatio \to 1, Frame \to True, Axes \to False] D 1 = SVMEncoder [#, 1] & / @ X 1; D 2 = SVMEncoder [#, 1] & / @ X 2; {p 1, p 2} = BinaryClassifier [D 1, D 2]; y 1 = HelstromClassify [{p 1, p 2}, D 1]; y 2 = HelstromClassify [{p 1, p 2}, D 2]; accuracy [y 1, y 2] Show [Plot 3 D [{Tr [p 1 . SVMEncoder [{x 1, x 2}, 1]], Tr [p 2 . SVMEncoder [{x 1, x 2}, 1]]}, {x 1, - 10, 10}, {x 2, - 10, 10}, PlotStyle \to {Green, Red}, ViewPoint \to {0, 0, \infty}, Lighting \to {{“ Ambient ”, White}}, Mesh \to False], Graphics 3 D [{RGBColor [0, 1, 0, 0.5], PointSize [0.013], Point [Append [#, 1] & / @ X 1]}], Graphics 3 D [{RGBColor [1, 0, 0, 0.5], PointSize [0.013], Point [Append [#, 1] & / @ X 2]}]] MatrixForm [#] & / @ Chop [N [\{\frac{Length [D 1]}{Length [D 1] + Length [D 2]} Total [D 1] - \frac{Length [D 2]}{Length [D 1] + Length [D 2]} Total [D 2], \frac{Total [D 1]}{Length [D 1]}, \frac{Total [D 2]}{Length [D 2]}\}]] AssociationMap [ClassifierMeasurements [Classify [1 \to X 1, 2 \to X 2, Method \to #], 1 \to X 1, 2 \to X 2, “ Accuracy ”] &, {“ RandomForest ”, “ NaiveBayes ”, “ SupportVectorMachine ”, “ NearestNeighbors ”}]

0.73 is the Helstrom accuracy on the training set

(\begin{matrix} (\begin{matrix} 0.250031 & 12.5 & 0 \\ 12.5 & - 0.250031 & 0 \\ 0 & 0 & 0 \end{matrix}) \\ (\begin{matrix} 0.375 & 0.125 & 0 \\ 0.125 & 0.375 & 0 \\ 0 & 0 & 0.25 \end{matrix}) \\ (\begin{matrix} 0.369999 & - 0.125 & 0 \\ - 0.125 & 0.380001 & 0 \\ 0 & 0 & 0.25 \end{matrix}) \end{matrix})

RandomForest \to 0.98, NaiveBayes \to 0.545, SupportVectorMachine \to 0.45, NearestNeighbors \to 0.85

FeatureVector : : usage = " FeatureVector [m, n, type], where m is the dimension of the input vector, n is the number of copies of the density operator, type of the SVMEncoder \ n \ t Returns the feature vector . "; FeatureVector [m_, n_, type_] : = Block [{ρ, state, x}, ρ = SVMEncoder [Array [x, m], type]; state = Nest [TensorProductQD [ρ, #] &, ρ, n - 1]; DeleteDuplicates [DeleteCases [BlochVector [state], 0]]]; FeatureVector [2, 1, 2]

(\begin{matrix} \frac{8 x [1] x [2]}{{(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} \\ \frac{4 x [1] (- 1 + x {[1]}^{2} + x {[2]}^{2})}{{(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} \\ \frac{4 x [2] (- 1 + x {[1]}^{2} + x {[2]}^{2})}{{(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} \\ \frac{4 x {[1]}^{2}}{{(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} - \frac{4 x {[2]}^{2}}{{(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} \\ \frac{4 x {[1]}^{2}}{\sqrt{3} {(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} + \frac{4 x {[2]}^{2}}{\sqrt{3} {(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} - \frac{2 {(- 1 + x {[1]}^{2} + x {[2]}^{2})}^{2}}{\sqrt{3} {(1 + x {[1]}^{2} + x {[2]}^{2})}^{2}} \end{matrix})

State [n_, x_] : = Block [{ρ, state}, ρ = SVMEncoder [x, 1]; state = Nest [TensorProductQD [ρ, #] &, ρ, n - 1]]; HelstromAccuracy [n_, X 1_, X 2_] : = Block [{D 1, D 2, p 1, p 2, y 1, y 2, acc}, acc = {}; Do [D 1 = State [i, #] & / @ X 1; D 2 = State [i, #] & / @ X 2; {p 1, p 2} = BinaryClassifier [D 1, D 2]; y 1 = HelstromClassify [{p 1, p 2}, D 1]; y 2 = HelstromClassify [{p 1, p 2}, D 2]; AppendTo [acc, \{Length [FeatureVector [2, i, 1]], N [\frac{Count [y 1, 1] + Count [y 2, 2]}{Length [y 1] + Length [y 2]}]\}], {i, n}]; Return [acc]]; acc = HelstromAccuracy [6, X 1, X 2]; ListLinePlot [acc, AxesLabel \to {“ dimension ”, “ accuracy ”}]

Appendix B. Mathematica Notebook for the Quantum-Inspired NMC

Quantum-inspired NMC with the moons dataset defined in Appendix A

CentroidClassify [{mean 1_, mean 2_}, D_] : = If [Tr [ConjugateTranspose [mean 1 - #] . (mean 1 - #)] \leq Tr [ConjugateTranspose [mean 2 - #] . (mean 2 - #)], 1, 2] & / @ D; mean 1 = Mean [D 1]; mean 2 = Mean [D 2]; y 1 = CentroidClassify [{mean 1, mean 2}, D 1]; y 2 = CentroidClassify [{mean 1, mean 2}, D 2]; accuracy [y 1, y 2] B 1 = BlochVector [#] & / @ D 1; B 2 = BlochVector [#] & / @ D 2; mean 1 = Mean [B 1]; mean 2 = Mean [B 2]; CentroidClassify [{mean 1_, mean 2_}, B_, p_] : = If [Norm [mean 1 - #, p] \leq Norm [mean 2 - #, p], 1, 2] & / @ B; y 1 = CentroidClassify [{mean 1, mean 2}, B 1, 2]; y 2 = CentroidClassify [{mean 1, mean 2}, B 2, 2]; accuracy [y 1, y 2]

0.865 is the accuracy of the NMC with trace distance between density operators

0.865 is the accuracy of the NMC with Euclidean distance between Bloch vectors

References

Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Schuld, M.; Killoran, N. Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 2019, 122, 040504. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Schuld, M. Supervised quantum machine learning models are kernel methods. arXiv 2021, arXiv:2101.11020. [Google Scholar]
Anguita, D.; Ridella, S.; Rivieccio, F.; Zunino, R. Quantum optimization for training support vector machines. Neural Netw. 2003, 16, 763–770. [Google Scholar] [CrossRef]
Rebentrost, P.; Mohseni, M.; Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 2014, 113, 130503. [Google Scholar] [CrossRef] [PubMed]
Willisch, D. Support vector machine on the D-Wave quantum annealer. Comput. Phys. Commun. 2020, 248, 107006. [Google Scholar] [CrossRef]
Ding, C.; Bao, T.-Y.; Huang, H.-L. Quantum-inspired support vector machine. arXiv 2021, arXiv:1906.08902. [Google Scholar]
Harney, C.; Pirandola, S.; Ferraro, A.; Paternostro, M. Entanglement classification via neural network quantum states. New J. Phys. 2020, 22, 045001. [Google Scholar] [CrossRef]
You, C.; Quiroz-Juárez, M.A.; Lambert, A.; Bhusal, N.; Dong, C.; Perez-Leija, A.; Javaid, A.; León-Montiel, R.D.J.; Magaña-Loaiza, O.S. Identification of light sources using machine learning. Appl. Phys. Rev. 2020, 7, 021404. [Google Scholar] [CrossRef]
Kudyshev, Z.A.; Bogdanov, S.I.; Isacsson, T.; Kildishev, A.V.; Boltasseva, A.; Shalaev, V.M. Rapid classification of quantum sources enabled by machine learning. Adv. Quantum Technol. 2020, 3, 2000067. [Google Scholar] [CrossRef]
Bae, J.; Kwek, L.C. Quantum state discrimination and its applications. J. Phys. A 2015, 48, 083001. [Google Scholar] [CrossRef] [Green Version]
Park, D.; Blank, C.; Petruccione, F. The theory of the quantum kernel-based binary classifier. Phys. Lett. A 2020, 384, 126422. [Google Scholar] [CrossRef] [Green Version]
Gao, J.; Qiao, L.F.; Jiao, Z.Q.; Ma, Y.C.; Hu, C.Q.; Ren, R.J.; Yang, A.L.; Tang, H.; Yung, M.H.; Jin, X.M. Experimental Machine Learning of Quantum States. Phys. Rev. Lett. 2018, 120, 240501. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Giordani, T.; Suprano, A.; Polino, E.; Acanfora, F.; Innocenti, L.; Ferraro, A.; Paternostro, M.; Spagnolo, N.; Sciarrino, F. Machine Learning-Based Classification of Vector Vortex Beams. Phys. Rev. Lett. 2020, 124, 160401. [Google Scholar] [CrossRef] [PubMed]
Sergioli, G.; Bosyk, G.M.; Santucci, E.; Giuntini, R. A quantum-inspired version of the classification problem. Int. J. Theor. Phys. 2017, 56, 3880–3888. [Google Scholar] [CrossRef]
Giuntini, R.; Freytes, H.; Park, D.K.; Blank, C.; Holik, F.; Chow, K.L.; Sergioli, G. Quantum state discrimination for supervised classification. arXiv 2021, arXiv:2104.00971v1. [Google Scholar]
Helstrom, C.W. Quantum detection and estimation theory. J. Stat. Phys. 1969, 1, 231–252. [Google Scholar] [CrossRef] [Green Version]
Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA, 2–4 August 1996; Simoudis, E., Han, J., Fayyad, U.M., Eds.; AAAI Press: Palo Alto, CA, USA, 1996; pp. 226–231, ISBN 1-57735-004-9. [Google Scholar]

Figure 1. DBSCAN classification.

Figure 2. Helstrom classification with highlighted points classified incorrectly.

Figure 3. SVM with linear kernel behaves similarly to the Helstrom method.

Figure 4. Accuracy of the Helstrom classifier as a function of the dimension of the space.

Figure 5. Helstrom method applied to the smallest space with highlighted points classified incorrectly.

Figure 6. SVM with linear kernel behaves similarly to the Helstrom method in the smallest space.

Figure 7. Helstrom classification is useless with the same centroids, while some classical methods work.

Figure 8. Helstrom classification is useless with centroids close to each other, while some classical methods work.

Figure 9. Helstrom method with highlighted points classified incorrectly.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Leporini, R.; Pastorello, D. Support Vector Machines with Quantum State Discrimination. Quantum Rep. 2021, 3, 482-499. https://doi.org/10.3390/quantum3030032

AMA Style

Leporini R, Pastorello D. Support Vector Machines with Quantum State Discrimination. Quantum Reports. 2021; 3(3):482-499. https://doi.org/10.3390/quantum3030032

Chicago/Turabian Style

Leporini, Roberto, and Davide Pastorello. 2021. "Support Vector Machines with Quantum State Discrimination" Quantum Reports 3, no. 3: 482-499. https://doi.org/10.3390/quantum3030032

APA Style

Leporini, R., & Pastorello, D. (2021). Support Vector Machines with Quantum State Discrimination. Quantum Reports, 3(3), 482-499. https://doi.org/10.3390/quantum3030032

Article Menu

Support Vector Machines with Quantum State Discrimination

Abstract

1. Introduction

2. Basics

3. Geometric Approach to Quantum-Inspired Classifications

4. Quantum-Inspired Nearest Mean Classifications

5. Numerical Results and Discussion

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Mathematica Notebook for Quantum-Inspired Classifications

Appendix B. Mathematica Notebook for the Quantum-Inspired NMC

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI