Supervised Quantum State Discrimination

Combining machine learning and quantum information. [...]


Introduction
Combining ideas from machine learning and quantum information has been a fruitful line of research in the last few years [1][2][3]. These proposals can be divided into three main categories: using machine learning to enhance numerical methods and experimental techniques, devising quantum algorithms to attack classical machine learning problems, and studying quantum generalisations of learning tasks. In this contribution we address the last category with an example of supervised learning with quantum data. For the sake of our discussion we use a very broad characterisation of quantum learning tasks. First, we consider a genuine quantum information theory setting, where an agent receives a quantum source (of quantum or classical information) and can operate on it with any processing device allowed by the rules of quantum mechanics. Second, we refer to tasks in which a machine should be trained to perform a certain quantum operation, and this training can be done through quantum processing, that is, with quantum training data and quantum operations.
In a classical supervised learning classification problem a machine is given a set of labelled data, and uses this information to produce a classifier which can be used to assign a label to new unlabelled data. In a probabilistic setting the data x ∈ X and the labels y ∈ Y are distributed according to a probability distribution P : X × Y → [0, 1], and a classifier is a conditional probability distribution C(Y|X). For each P there exists an optimal classifier which minimises the probability of misclassification, and a good learning algorithm is expected to give a good approximation of the optimal classifier, at least when the training dataset becomes large. Even better, if the agent has some prior information on the possible distributions, one can define the optimal training and test algorithm as the one with the lowest probability of misclassification on average, where the average is taken over all possible distributions according to the prior.
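As a minimal illustration of the optimal classifier for a known P, the sketch below picks, for each x, the label with the largest joint probability and evaluates the resulting probability of misclassification. The toy distribution and the function names are assumptions made for this example only.

```python
# Toy joint distribution P(x, y) over X = {0, 1, 2} and Y = {"a", "b"}.
# The numbers are illustrative assumptions, not data from the text.
P = {
    (0, "a"): 0.30, (0, "b"): 0.05,
    (1, "a"): 0.10, (1, "b"): 0.15,
    (2, "a"): 0.05, (2, "b"): 0.35,
}


def optimal_classifier(joint):
    """Return the deterministic classifier x -> argmax_y P(x, y)."""
    best = {}
    for (x, y), p in joint.items():
        if x not in best or p > joint[(x, best[x])]:
            best[x] = y
    return best


def error_probability(joint, classifier):
    """Probability that classifier[x] differs from the true label y."""
    return sum(p for (x, y), p in joint.items() if classifier[x] != y)


bayes = optimal_classifier(P)
p_err = error_probability(P, bayes)
print(bayes, p_err)
```

For this toy distribution the classifier assigns "a" to x = 0 and "b" to x = 1 and x = 2, and the error is the total weight of the discarded entries, 0.05 + 0.10 + 0.05 = 0.20. A learning algorithm has to approximate this rule from samples, without access to P itself.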
Since classical learning tasks can be studied in a probabilistic formulation, a straightforward generalisation of a learning task can be obtained by reinterpreting the task on quantum states rather than probability distributions.
The analogy is clear and fundamental in quantum information theory, as most of the information-theoretic tasks that can be defined using probability distributions, like compression and communication, can be generalised to quantum states.

The Model
On the basis of these observations, we consider the following problem: an agent is asked to correctly guess the state of a qubit system X initialised with equal probability in the state $\rho_1$ or $\rho_2$, but $\rho_1$ and $\rho_2$ are unknown to the agent. Instead, he receives as a training set for the task a system A made of n qubits known to be in the state $\rho_1$ and a system B of n other qubits known to be in the state $\rho_2$. The agent may have some kind of prior information on $\rho_1$ and $\rho_2$, like their purity or their overlap. The question we ask is to find the two-outcome measurement $M \equiv \{\Pi_1, \Pi_2\}$ on the joint input made of the training and test data which minimises the probability of misclassification, calculated by averaging over all the possible pairs $\rho_1$, $\rho_2$ according to the prior information. Since the input state of the machine can be one of two alternatives, $\tau_1 = \rho_1^{\otimes n} \otimes \rho_1 \otimes \rho_2^{\otimes n}$ (if X is in $\rho_1$) and $\tau_2 = \rho_1^{\otimes n} \otimes \rho_2 \otimes \rho_2^{\otimes n}$ (if X is in $\rho_2$), the average probability of error reads
$$P^{(n)}_{\mathrm{err}} = \frac{1}{2} \int d\mu(\rho_1, \rho_2) \left( \mathrm{Tr}[\Pi_2 \tau_1] + \mathrm{Tr}[\Pi_1 \tau_2] \right), \tag{1}$$
where $d\mu(\rho_1, \rho_2)$ is a classical probability distribution on the states $\rho_1$, $\rho_2$, and encodes the prior information of the agent. This problem can be translated into a binary state discrimination task for two known effective states
$$\alpha^{(n)} = \int d\mu(\rho_1, \rho_2)\, \tau_1, \qquad \beta^{(n)} = \int d\mu(\rho_1, \rho_2)\, \tau_2. \tag{2}$$
Therefore, defining $\Theta \equiv \alpha^{(n)} - \beta^{(n)}$, our figure of merit can be written as
$$P^{(n)}_{\mathrm{err,min}} = \frac{1}{2} \left( 1 - \frac{1}{2} \|\Theta\|_1 \right), \tag{3}$$
with the symbol $\|\cdot\|_1$ indicating the trace norm.
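The reduction to a trace norm is the standard Helstrom argument. The sketch below assumes only that $\Pi_1 + \Pi_2 = \mathbb{1}$, that outcome $i$ means "X is in $\rho_i$", and that $\alpha^{(n)}$ and $\beta^{(n)}$ denote the prior averages of $\tau_1$ and $\tau_2$:

```latex
\begin{align*}
P^{(n)}_{\mathrm{err}}
 &= \tfrac{1}{2} \int d\mu(\rho_1,\rho_2)\,
    \bigl( \mathrm{Tr}[\Pi_2 \tau_1] + \mathrm{Tr}[\Pi_1 \tau_2] \bigr) \\
 &= \tfrac{1}{2} \bigl( \mathrm{Tr}[\Pi_2\, \alpha^{(n)}] + \mathrm{Tr}[\Pi_1\, \beta^{(n)}] \bigr)
    && \text{(linearity of the trace)} \\
 &= \tfrac{1}{2} \bigl( 1 - \mathrm{Tr}[\Pi_1 \Theta] \bigr)
    && (\Pi_2 = \mathbb{1} - \Pi_1,\ \mathrm{Tr}\,\alpha^{(n)} = 1,\ \Theta = \alpha^{(n)} - \beta^{(n)}) \\
 &\ge \tfrac{1}{2} \bigl( 1 - \tfrac{1}{2} \|\Theta\|_1 \bigr)
    && (\mathrm{Tr}\,\Theta = 0 \ \Rightarrow\ \max_{0 \le \Pi_1 \le \mathbb{1}} \mathrm{Tr}[\Pi_1 \Theta] = \tfrac{1}{2} \|\Theta\|_1)
\end{align*}
```

The bound is attained when $\Pi_1$ is the projector onto the positive eigenspace of $\Theta$, so the optimal measurement depends on the prior only through $\Theta$.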
In the limit $n \to \infty$ one expects this quantity to converge to the average, over the prior, of the probability of error of the Helstrom measurement for known $\rho_1$ and $\rho_2$, since the classical description of the template states can then be recovered exactly, for example using tomography. In fact this limit is always a lower bound to the probability of error at finite n. We are interested in the finite-size corrections to this value.
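The asymptotic value can be estimated numerically once a prior is fixed. The sketch below is an illustration, not the method of the paper: it assumes that the two Bloch vectors are drawn independently with constant density inside the unit ball, and it uses the fact that for qubits the trace norm of the difference of two states equals the Euclidean distance between their Bloch vectors. The function names, sample size and seed are arbitrary choices.

```python
import math
import random


def random_bloch_vector(rng):
    """Sample a point with constant density inside the unit ball (rejection)."""
    while True:
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        if v[0] ** 2 + v[1] ** 2 + v[2] ** 2 <= 1.0:
            return v


def helstrom_error(r1, r2):
    """Minimum error for two known, equiprobable qubit states.

    For qubits ||rho_1 - rho_2||_1 is the distance between Bloch vectors.
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    return 0.5 * (1.0 - 0.5 * distance)


def average_helstrom_error(samples=100_000, seed=0):
    """Monte Carlo average of the Helstrom error over the assumed prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += helstrom_error(random_bloch_vector(rng), random_bloch_vector(rng))
    return total / samples


estimate = average_helstrom_error()
print(estimate)
```

Under this assumed prior the mean distance between the two Bloch vectors is 36/35, so the estimate should fluctuate around 1/2 (1 - 18/35) = 17/70 ≈ 0.243, the value quoted in the Results for generically mixed states.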

Results
Our contribution consists in calculating $P^{(n)}_{\mathrm{err,min}}$ for a number of priors, generalising previous results [4,5]. As detailed in the preprint [6], we extensively use the symmetry and covariance properties of $\Theta$ in order to reduce the problem to a simple analytic diagonalisation of 2 × 2 matrices. Moreover, we perform asymptotic expansions of the sum of eigenvalues arising from Equation (3) in order to obtain finite-size corrections to the asymptotic limit. In particular we consider these cases:
(i) $\rho_1$, $\rho_2$ have assigned purities, the moduli of their Bloch vectors being respectively $r_1$ and $r_2$; uniform prior on the Bloch vectors' directions, [...] where the notation $o(1/n)$ indicates terms that go to zero faster than $1/n$.
(ii) $\rho_1$ and $\rho_2$ are generically mixed qubit states, with a constant density Bloch sphere prior, $P^{(n \gg 1)}_{\mathrm{err,min}} = \frac{17}{70}$
(iii) $\rho_1$ and $\rho_2$ are pure and they have a fixed overlap $\mathrm{Tr}[\rho_1 \rho_2] = \sin^2 \frac{\theta}{2}$; uniform prior on the global orientation, $P^{(n \gg 1)}_{\mathrm{err,min}} =$ [...]
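For very small n the figure of merit can also be checked by brute force. The sketch below is an illustration under stated assumptions, not the block-diagonalisation used in the preprint: it takes n = 1, pure states ($r_1 = r_2 = 1$), and independent uniform priors on the two directions. It relies on the standard fact that the Haar average of two copies of a pure qubit state is the projector onto the symmetric subspace divided by 3. NumPy and all variable names are choices made for this example.

```python
import numpy as np

I2 = np.eye(2)
# SWAP on two qubits in the basis |00>, |01>, |10>, |11>.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

# Haar average of |psi><psi| (x) |psi><psi| for a qubit:
# symmetric projector (I + SWAP) / 2 divided by its rank, 3.
sym_avg = (np.eye(4) + SWAP) / 2 / 3
# Haar average of a single copy: the maximally mixed state.
mixed = I2 / 2

# Systems ordered as A (training copy of rho_1), X (test), B (training copy of rho_2).
alpha = np.kron(sym_avg, mixed)  # X prepared in the same state as A
beta = np.kron(mixed, sym_avg)   # X prepared in the same state as B

theta = alpha - beta
# theta is Hermitian, so its trace norm is the sum of |eigenvalues|.
trace_norm = np.abs(np.linalg.eigvalsh(theta)).sum()
p_err_min = 0.5 * (1.0 - 0.5 * trace_norm)
print(p_err_min)
```

Here $\Theta$ is proportional to the difference of the two swap operators, its trace norm is $1/\sqrt{3}$, and the minimum error evaluates to $1/2 - 1/(4\sqrt{3}) \approx 0.356$. The full 8 × 8 diagonalisation grows exponentially with n, which is why exploiting the symmetry of $\Theta$ matters for larger training sets.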