Minimax Estimation of Quantum States Based on the Latent Information Priors

We develop priors for Bayes estimation of quantum states that provide minimax state estimation. The relative entropy from the true density operator to a predictive density operator is adopted as a loss function. The proposed prior maximizes the conditional Holevo mutual information, and it is a quantum version of the latent information prior in classical statistics. For the one-qubit system, we provide a class of measurements that is optimal from the viewpoint of minimax state estimation.


Introduction
In quantum mechanics, the outcome of a measurement is subject to a probability distribution determined by the quantum state of the measured system and the measurement performed. The task of estimating the quantum state from the measurement outcome is called quantum estimation, and it is a fundamental problem in quantum statistics [1][2][3]. Tanaka and Komaki [4] and Tanaka [5] discussed quantum estimation in the framework of statistical decision theory and showed that Bayesian methods provide better estimation than the maximum likelihood method. In Bayesian methods, we need to specify a prior distribution on the unknown parameters of the quantum states. However, the problem of prior selection has not been fully discussed for quantum estimation [6].
The quantum state estimation problem is related to the predictive density estimation problem in classical statistics [7]. This is the problem of predicting the distribution of an unobserved variable y based on an observed variable x. Suppose (x, y) ∼ p(x, y | θ), where θ denotes an unknown parameter. Based on the observed x, we predict the distribution p(y | x, θ) of y using a predictive density p(y | x). The plug-in predictive density is defined as p_plug-in(y | x) = p(y | x, θ̂(x)), where θ̂(x) is some estimate of θ from x. The Bayesian predictive density with respect to a prior distribution dπ(θ) is defined as

p_π(y | x) = ∫ p(y | x, θ) dπ(θ | x), (1)

where dπ(θ | x) is the posterior distribution. We compare predictive densities using the framework of statistical decision theory. Specifically, a loss function L(q, p) is introduced that evaluates the difference between the true density q and the predictive density p. Then, the risk function R(θ, p) is defined as the average loss when the true value of the parameter is θ:

R(θ, p) = ∫ L(p(y | x, θ), p(y | x)) p(x | θ) dx.
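To make the definitions above concrete, here is a minimal numerical sketch of the Bayesian predictive density p_π(y | x) for a discrete parameter. The two-point model and all probability values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical toy model: theta, x, y all binary, with x and y
# conditionally independent given theta: p(x, y | theta) = p(x | theta) p(y | theta).
p_x = np.array([[0.8, 0.2],   # p(x | theta = 0)
                [0.3, 0.7]])  # p(x | theta = 1)
p_y = np.array([[0.9, 0.1],   # p(y | theta = 0)
                [0.4, 0.6]])  # p(y | theta = 1)
prior = np.array([0.5, 0.5])  # pi(theta)

def posterior(x):
    """pi(theta | x) by Bayes' rule."""
    w = prior * p_x[:, x]
    return w / w.sum()

def bayes_predictive(x):
    """p_pi(y | x) = sum_theta p(y | x, theta) pi(theta | x)."""
    return posterior(x) @ p_y
```

The plug-in density p(y | x, θ̂(x)) would substitute a point estimate for θ; the Bayesian predictive density instead averages p(y | x, θ) over the posterior.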
A predictive density p* is called minimax if it minimizes the maximum risk among all predictive densities:

max_θ R(θ, p*) = min_p max_θ R(θ, p). (2)

We adopt the Kullback-Leibler divergence

L(q, p) = ∫ q(x) log [q(x) / p(x)] dx (3)

as a loss function, since it satisfies many desirable properties compared with other loss functions such as the Hellinger distance and the total variation distance [8]. Under this setting, Aitchison [9] proved

min_p R(π, p) = R(π, p_π), (4)

where R(π, p) = ∫ R(θ, p) dπ(θ) is called the Bayes risk. Namely, the Bayesian predictive density p_π(y | x) minimizes the Bayes risk.
We provide the proof of Equation (4) in Appendix A. Therefore, it is sufficient to consider only Bayesian predictive densities from the viewpoint of Kullback-Leibler risk, and the selection of the prior π becomes important.
For the predictive density estimation problem above, Komaki [10] developed a class of priors called the latent information priors. The latent information prior π_LIP is defined as a prior that maximizes the conditional mutual information I_{θ,y|x}(π) between the parameter θ and the unobserved variable y given the observed variable x. Namely,

π_LIP = argmax_π I_{θ,y|x}(π),

where

I_{θ,y|x}(π) = ∫ Σ_x p(x | θ) Σ_y p(y | x, θ) log [p(y | x, θ) / p_π(y | x)] dπ(θ) (5)

is the conditional mutual information between y and θ given x. Here, p_π(x) = ∫ p(x | θ) dπ(θ) and p_π(y | x) = ∫ p(y | x, θ) dπ(θ | x) are marginal densities. The Bayesian predictive densities based on the latent information priors are minimax under the Kullback-Leibler risk:

max_θ R(θ, p_{π_LIP}) = min_p max_θ R(θ, p). (6)

The latent information prior is a generalization of the reference prior [11], which is a prior maximizing the unconditional mutual information I_{θ,y}(π) between θ and y. Now, we consider the problem of estimating the quantum state of a system Y based on the outcome of a measurement on a system X. Suppose that the quantum state of the composed system (X, Y) is σ^XY_θ, where θ denotes an unknown parameter. We perform a measurement on the system X and obtain the outcome x. Based on the measurement outcome x, we estimate the state of the system Y by a predictive density operator ρ(x). Similarly to the Bayesian predictive density (1), the Bayesian predictive density operator with respect to the prior dπ(θ) is defined as

σ^Y_π(x) = ∫ σ^Y_{θ,x} dπ(θ | x), (7)

where dπ(θ | x) is the posterior distribution and σ^Y_{θ,x} is the state of Y conditional on the outcome x. Like the predictive density estimation problem discussed above, we compare predictive density operators using the framework of statistical decision theory. There are several possibilities for the loss function L(σ, ρ) in quantum estimation, such as the fidelity and the trace norm [12]. In this paper, we adopt the quantum relative entropy

L(σ, ρ) = Tr σ (log σ − log ρ) (8)

as a loss function, since it is a quantum analogue of the Kullback-Leibler divergence (3). Note that the fidelity and the trace norm correspond to the Hellinger distance and the total variation distance in classical statistics, respectively. Under this setting, Tanaka and Komaki [4] proved that the Bayesian predictive density operators minimize the
Bayes risk:

min_ρ R(π, ρ) = R(π, σ_π).

This is a quantum version of Equation (4). From Tanaka and Komaki [4], the selection of the prior becomes important also in quantum estimation. However, this problem has not been fully discussed [6]. In this paper, we provide a quantum version of the latent information priors and prove that they provide minimax predictive density operators. Whereas the latent information prior in the classical case maximizes the conditional Shannon mutual information, the proposed prior maximizes the conditional Holevo mutual information. The Holevo mutual information, which is a quantum version of the Shannon mutual information, is a fundamental quantity in classical-quantum communication [13]. Our result shows that the conditional Holevo mutual information also has a natural meaning in terms of quantum estimation.
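Before moving to the quantum setting, the classical conditional mutual information I_{θ,y|x}(π) that the latent information prior maximizes can be evaluated directly in a small discrete model. The model below is hypothetical (a two-point parameter with arbitrary probability tables), and the grid search is only a crude illustration of maximizing over π:

```python
import numpy as np

p_x = np.array([[0.8, 0.2], [0.3, 0.7]])  # p(x | theta), hypothetical values
p_y = np.array([[0.9, 0.1], [0.4, 0.6]])  # p(y | theta); here p(y | x, theta) = p(y | theta)

def cond_mutual_info(pi):
    """I_{theta,y|x}(pi) = sum_x p_pi(x) sum_theta pi(theta | x)
                           KL(p(y | x, theta) || p_pi(y | x))."""
    total = 0.0
    for x in range(p_x.shape[1]):
        w = pi * p_x[:, x]
        px = w.sum()                 # p_pi(x)
        post = w / px                # pi(theta | x)
        pred = post @ p_y            # Bayesian predictive density p_pi(y | x)
        for t in range(len(pi)):
            total += px * post[t] * np.sum(p_y[t] * np.log(p_y[t] / pred))
    return total

# Crude grid search for the latent information prior over pi = (a, 1 - a).
grid = np.linspace(0.01, 0.99, 99)
a_best = max(grid, key=lambda a: cond_mutual_info(np.array([a, 1 - a])))
```

A degenerate prior concentrated on one θ carries no information about θ, so its conditional mutual information is zero; the maximizer lies strictly inside the simplex.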
Unlike classical statistics, the measurement is not unique in quantum statistics. Therefore, the selection of the measurement also becomes important. From the viewpoint of minimax state estimation, measurements that minimize the minimax risk are considered to be optimal. We provide a class of optimal measurements for the one-qubit system. This class includes the symmetric informationally complete measurement [14,15]. These measurements and the latent information priors provide robust quantum estimation.

Quantum States and Measurements
We briefly summarize notations for quantum states and measurements. Let H be a separable Hilbert space of a quantum system. A Hermitian operator ρ on H is called a density operator if it satisfies ρ ≥ 0 and Tr ρ = 1. The state of a quantum system is described by a density operator. We denote the set of all density operators on H by S(H).
Denote the set of all linear operators on the Hilbert space H by L(H) and the set of all positive linear operators by L_+(H) ⊂ L(H). Let Ω be a measurable space of all possible outcomes of a measurement and B(Ω) be a σ-algebra of Ω. A map E : B(Ω) → L_+(H) is called a positive operator-valued measure (POVM) if it satisfies E(Ω) = I and E(∪_i B_i) = Σ_i E(B_i) for every countable family of disjoint sets B_i ∈ B(Ω). Any quantum measurement is represented by a POVM on Ω. In this paper, we mainly assume that Ω is finite. In such a case, we denote Ω = X = {1, . . ., N}, and any POVM is represented by a set of positive Hermitian operators E = {E_x | x ∈ X} with Σ_x E_x = I. The outcome of a measurement E on a quantum system with state ρ ∈ S(H) is distributed with the probability measure

P(B) = Tr E(B) ρ, B ∈ B(Ω).

Let X, Y be quantum systems with Hilbert spaces H_X and H_Y. The Hilbert space of the composed system (X, Y) is given by the tensor product H_X ⊗ H_Y. Suppose that the state of this composed system is σ^XY. Then, the states of the two subsystems are given by the partial traces:

σ^X = Tr_Y σ^XY, σ^Y = Tr_X σ^XY.

If a measurement E = {E_x | x ∈ X} is performed on the system X and the measurement outcome is x, then the state of the system Y becomes

σ^Y_x = Tr_X[(E_x ⊗ I_Y) σ^XY] / p(x),

where the normalization constant p(x) = Tr[(E_x ⊗ I_Y) σ^XY] is the probability of the outcome x. Here, I_Y is the identity operator on the space H_Y. We call the operator σ^Y_x the conditional density operator.
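As a concrete sketch of the partial trace and the conditional density operator σ^Y_x, the following code computes both numerically. The specific two-qubit state and the computational-basis measurement are our own illustrative choices:

```python
import numpy as np

def partial_trace_X(op, dX, dY):
    """Tr_X of an operator on H_X (tensor) H_Y, via index contraction."""
    return np.einsum('ijik->jk', op.reshape(dX, dY, dX, dY))

def dm(v):
    """Density matrix |v><v| of a normalized state vector v."""
    return np.outer(v, np.conj(v))

ket0 = np.array([1.0, 0.0]); ket1 = np.array([0.0, 1.0])

# A classically correlated two-qubit state (hypothetical example).
sigma_XY = 0.5 * (np.kron(dm(ket0), dm(ket0)) + np.kron(dm(ket1), dm(ket1)))

# Computational-basis POVM on X.
E = [dm(ket0), dm(ket1)]

def conditional_state(sigma, Ex, dX, dY):
    """sigma^Y_x = Tr_X[(E_x tensor I_Y) sigma] / p(x), p(x) the outcome probability."""
    unnorm = partial_trace_X(np.kron(Ex, np.eye(dY)) @ sigma, dX, dY)
    p = np.trace(unnorm).real
    return unnorm / p, p
```

For this state, each outcome has probability 1/2, and outcome x = 0 collapses Y to |0⟩⟨0|.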

Quantum State Estimation
We formulate the quantum state estimation problem using the framework of statistical decision theory. Let X and Y be quantum systems with finite-dimensional Hilbert spaces H_X and H_Y, where dim H_X = d_X and dim H_Y = d_Y.
Suppose that the state of the composed system (X, Y) is σ^XY_θ, where θ ∈ Θ denotes unknown parameters. We perform a measurement E = {E_x | x ∈ X} on X, observe the outcome x ∈ X, and estimate the conditional density operator σ^Y_{θ,x} of Y by a predictive density operator ρ(x). As discussed in the introduction (1) and (7), the Bayesian predictive density operator based on a prior π(θ) is defined by

σ^Y_π(x) = ∫ σ^Y_{θ,x} dπ(θ | x),

where dπ(θ | x) is the posterior distribution.
To evaluate predictive density operators, we introduce a loss function L(σ, ρ) that evaluates the difference between the true conditional density operator σ and the predictive density operator ρ. In this paper, we adopt the quantum relative entropy (8), since it is a quantum analogue of the Kullback-Leibler divergence (3). Then, the risk function R(θ, ρ) of a predictive density operator ρ is defined by

R(θ, ρ) = Σ_x p(x | θ) L(σ^Y_{θ,x}, ρ(x)),

where p(x | θ) = Tr[(E_x ⊗ I_Y) σ^XY_θ] is the probability of the outcome x. Similarly to the classical case (2), a predictive density operator ρ* is called minimax if it minimizes the maximum risk among all predictive density operators [16,17]:

max_θ R(θ, ρ*) = min_ρ max_θ R(θ, ρ).

Tanaka and Komaki [4] showed

min_ρ R(π, ρ) = R(π, σ_π), (9)

where R(π, ρ) = ∫ R(θ, ρ) dπ(θ) is called the Bayes risk. Namely, the Bayesian predictive density operator minimizes the Bayes risk. This result is a quantum version of Equation (4). Although Tanaka and Komaki [4] considered separable models, the relation (9) holds also for non-separable models, as shown in Appendix A. Therefore, it is sufficient to consider only Bayesian predictive density operators, and the problem of prior selection becomes crucial.
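The quantum relative entropy loss can be evaluated by eigendecomposition. The sketch below (the pair of states in the example is an arbitrary choice of ours) uses the convention 0 log 0 = 0 and assumes the support of σ is contained in that of ρ:

```python
import numpy as np

def logm_psd(A, eps=1e-12):
    """Matrix logarithm of a positive semidefinite matrix. Zero eigenvalues
    are clipped to eps, which is harmless when the result is traced against
    a state supported inside the support of A (0 log 0 := 0)."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(np.maximum(w, eps))) @ V.conj().T

def qre(sigma, rho):
    """Quantum relative entropy L(sigma, rho) = Tr sigma (log sigma - log rho)."""
    return np.trace(sigma @ (logm_psd(sigma) - logm_psd(rho))).real

# Example: L(diag(3/4, 1/4), I/2) = (3/4) log(3/2) + (1/4) log(1/2)
sigma = np.diag([0.75, 0.25]).astype(complex)
rho = np.eye(2, dtype=complex) / 2
```

Like its classical counterpart, the quantity is nonnegative and vanishes exactly when the two states coincide.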

Notations
For a quantum state family {σ^XY_θ | θ ∈ Θ}, we define another quantum state family M = {S_θ | θ ∈ Θ}, where

S_θ(x) = Tr_X[(E_x ⊗ I_Y) σ^XY_θ] = p(x | θ) σ^Y_{θ,x},

which is the unnormalized state of Y conditional on the measurement outcome x. By identifying Θ with M, the parameter space Θ is endowed with the induced topology as a subset of R^{N d_Y² − 1}. Any measurement on the system X can be represented by a projective measurement on an extended Hilbert space (Naimark extension).

Minimax Estimation of Quantum States
In this section, we develop the latent information prior for quantum state estimation and show that this prior provides a minimax predictive density operator.
In the following, we assume the following conditions: (1) the parameter space Θ is compact; (2) the map θ ↦ σ^XY_θ is continuous; (3) for every x ∈ X, there exists θ ∈ Θ such that p(x | θ) > 0. The third assumption is achieved by adopting a sufficiently small Hilbert space. Namely, if there exists x ∈ X such that p(x | θ) = Tr E_x σ^X_θ = 0 for every θ ∈ Θ, then we redefine the state space H as the orthogonal complement of Ker E_x.
Let P be the set of all probability measures on Θ endowed with the weak convergence topology and the corresponding Borel algebra.By the Prohorov theorem [18] and the first assumption, P is compact.
When x is fixed, the function θ ∈ Θ ↦ S_θ(x) is bounded and continuous. Thus, for every fixed x ∈ X, the function π ↦ ∫ S_θ(x) dπ(θ) is continuous, because P is endowed with the weak convergence topology and dim H_Y < ∞. Let {λ_{x,i}}_i and {|φ_{x,i}⟩}_i be the eigenvalues and the normalized eigenvectors of the predictive density operator ρ(x). For every predictive density operator ρ, consider the function D_ρ from P to [0, ∞] defined by

D_ρ(π) = ∫ Σ_x Tr[S_θ(x) (log σ^Y_{θ,x} − log ρ(x))] dπ(θ) = ∫ Σ_x Tr[S_θ(x) log σ^Y_{θ,x}] dπ(θ) − Σ_x Σ_i ∫ ⟨φ_{x,i}| S_θ(x) |φ_{x,i}⟩ dπ(θ) log λ_{x,i}.

The last term is lower semicontinuous under the definition 0 log 0 = 0 [10]: each summand with λ_{x,i} = 0 takes either the value zero or infinity, so the set of π ∈ P on which it is zero is closed. The other terms are continuous, since the von Neumann entropy is continuous [12]. Therefore, the function D_ρ(π) is lower semicontinuous.

Now, we prove that the class of predictive density operators that are limits of Bayesian predictive density operators is an essentially complete class. We prepare three lemmas. Lemma 1 is useful for differentiation of the quantum relative entropy (see Hiai and Petz [19]). Lemmas 2 and 3 are from Komaki [10].

Lemma 1. Let A, B be n-dimensional self-adjoint matrices and t_0 be a real number. Assume that f : (α, β) → R is a continuously differentiable function defined on an interval, and assume that the eigenvalues of A + tB are in (α, β) if t is sufficiently close to t_0. Then,

(d/dt) Tr f(A + tB) |_{t = t_0} = Tr[f′(A + t_0 B) B].

Lemma 3 ([10]). Let f : P → [0, ∞] be continuous, and let µ be a probability measure on Θ such that p_µ(x) := ∫ p(x | θ) dµ(θ) > 0 for every x ∈ X. Then, there is a probability measure π_n in P that attains the maximum of f.

By using these results, we obtain the following theorem, which is a quantum version of Theorem 1 of Komaki [10].
(2) For every predictive density operator ρ, there exists a convergent prior sequence {π_n} such that the risk of the limit of the corresponding Bayesian predictive density operators σ_{π_n} is not greater than that of ρ for every θ ∈ Θ.

Next, we develop priors that provide minimax predictive density operators. Let x be a random variable that represents the outcome of the measurement, i.e., x ∼ p(· | θ). Then, as a quantum analogue of the conditional mutual information (5), we define the conditional Holevo mutual information [13] between the quantum state σ^Y_x of Y and the parameter θ given the measurement outcome x as

I^H_{θ,Y|x}(π) = Σ_x p_π(x) ∫ L(σ^Y_{θ,x}, σ^Y_{π,x}) dπ(θ | x),

which is a function of π ∈ P. Here, we used

σ^Y_{π,x} = ∫ σ^Y_{θ,x} dπ(θ | x), p_π(x) = ∫ p(x | θ) dπ(θ).

The conditional Holevo mutual information provides an upper bound on the conditional mutual information, as follows.
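For a finite parameter set, the conditional Holevo mutual information is an average Holevo quantity of the posterior ensembles. The sketch below (with a deliberately trivial one-outcome measurement and a hypothetical two-state ensemble as the example) computes it directly from the definition:

```python
import numpy as np

def logm_psd(A, eps=1e-12):
    w, V = np.linalg.eigh(A)
    return (V * np.log(np.maximum(w, eps))) @ V.conj().T

def qre(sigma, rho):
    """Quantum relative entropy Tr sigma (log sigma - log rho)."""
    return np.trace(sigma @ (logm_psd(sigma) - logm_psd(rho))).real

def conditional_holevo(pi, p_x, states):
    """I^H_{theta,Y|x}(pi) = sum_x p_pi(x) sum_theta pi(theta | x)
                             L(sigma^Y_{theta,x}, sigma^Y_{pi,x}),
    where sigma^Y_{pi,x} = sum_theta pi(theta | x) sigma^Y_{theta,x}.
    p_x[t, x] = p(x | theta_t); states[t][x] = sigma^Y_{theta_t, x}."""
    total = 0.0
    for x in range(p_x.shape[1]):
        w = pi * p_x[:, x]
        px = w.sum()
        if px == 0.0:
            continue
        post = w / px
        avg = sum(post[t] * states[t][x] for t in range(len(pi)))
        total += px * sum(post[t] * qre(states[t][x], avg) for t in range(len(pi)))
    return total

# Check: with one outcome, the ensemble {1/2: |0><0|, 1/2: |1><1|}
# has Holevo information S(I/2) - 0 = log 2.
k0 = np.outer([1, 0], [1, 0]).astype(complex)
k1 = np.outer([0, 1], [0, 1]).astype(complex)
val = conditional_holevo(np.array([0.5, 0.5]), np.ones((2, 1)), [[k0], [k1]])
```
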
Proposition 1. Let σ^XY_θ be the state of the composed system (X, Y). Suppose that a measurement is performed on X with measurement outcome x and then another measurement is performed on Y with measurement outcome y. Then,

I_{θ,y|x}(π) ≤ I^H_{θ,Y|x}(π). (12)

Proof. Since any measurement is a trace-preserving completely positive map, inequality (12) follows from the monotonicity of the quantum relative entropy [13].
Analogously to the latent information priors [10] in classical statistics, we define latent information priors as priors that maximize the conditional Holevo mutual information. It is expected that the Bayesian predictive density operator σ^Y_π(x) based on a latent information prior is a minimax predictive density operator. This is indeed true by the following theorem, which is a quantum version of Theorem 2 of Komaki [10].
The proofs of Theorems 1 and 2 are deferred to Appendix A. We note that the minimax risk inf_ρ sup_θ R_E(θ, ρ) depends on the measurement E on X. Therefore, the measurement E with the minimum minimax risk is desirable from the viewpoint of minimaxity. We define a POVM E* to be a minimax POVM if it satisfies

inf_ρ sup_θ R_{E*}(θ, ρ) = inf_E inf_ρ sup_θ R_E(θ, ρ). (13)

In the next section, we provide a class of minimax POVMs for the one-qubit system.

One Qubit System
In this section, we consider the one-qubit system and derive a class of minimax POVMs satisfying (13).
A qubit is a quantum system with a two-dimensional Hilbert space, and it is the fundamental system in quantum information theory. A general state of the one-qubit system is described by a density matrix

σ_θ = (1/2)(I + θ_x σ_x + θ_y σ_y + θ_z σ_z), θ = (θ_x, θ_y, θ_z) ∈ R³, |θ| ≤ 1,

where σ_x, σ_y, σ_z are the Pauli matrices. The boundary {θ ∈ R³ | |θ| = 1}, corresponding to the pure states, is called the Bloch sphere. Let σ^XY_θ = σ_θ ⊗ σ_θ be a separable state. We consider the estimation of σ^Y_θ = σ_θ from the outcome of a measurement on σ^X_θ = σ_θ. Here, we assume that the state σ^XY_θ is separable: the state of Y changes according to the outcome of the measurement on X, so the estimation problem is not well-defined if the state σ^XY_θ is not separable.
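The Bloch parametrization can be sketched in a few lines; the helper names below are our own:

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)      # Pauli matrices
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def bloch_to_dm(theta):
    """sigma_theta = (I + theta_x SX + theta_y SY + theta_z SZ) / 2 for |theta| <= 1."""
    return 0.5 * (I2 + theta[0] * SX + theta[1] * SY + theta[2] * SZ)

def dm_to_bloch(rho):
    """Recover theta_i = Tr(rho sigma_i)."""
    return np.array([np.trace(rho @ S).real for S in (SX, SY, SZ)])
```

The eigenvalues of σ_θ are (1 ± |θ|)/2, so the density matrix is a pure state exactly when |θ| = 1, i.e., on the Bloch sphere.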
Let Ω := {(x, y, z) ∈ R³ | x² + y² + z² = 1} and B = B(Ω) be its Borel sets. From Haapasalo et al. [20], it is sufficient to consider POVMs on Ω. Every probability measure µ on (Ω, B) that satisfies E_µ[ω] = 0 defines a POVM E on Ω by

E(B) = ∫_B (I + ω_x σ_x + ω_y σ_y + ω_z σ_z) dµ(ω), B ∈ B.

In the following, we identify E with µ. Let E*_{1-qubit} be the class of POVMs on Ω represented by measures µ that satisfy the conditions

E_µ[ω] = 0, E_µ[ω ω^⊤] = (1/3) I₃,

where E_µ is the expectation with respect to the measure µ. We provide two examples of POVMs in E*_{1-qubit}.
Proposition 2. The POVM corresponding to the uniform measure

µ(dω) = dω / (4π),

where dω is the surface element of Ω, is in E*_{1-qubit}.
Proof. From the symmetry of µ, we have E_µ[ω] = 0 and E_µ[ω ω^⊤] = (1/3) I₃.

Proposition 3. Let µ be a four-point discrete measure on Ω defined by

µ({ω_k}) = 1/4 (k = 1, 2, 3, 4), (15)

where ω_1, ω_2, ω_3, ω_4 ∈ Ω are the vertices of a regular tetrahedron inscribed in Ω. Then, the POVM corresponding to µ belongs to E*_{1-qubit}.
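The four-point POVM can be verified numerically. One concrete choice of tetrahedron vertices (any rotation of it works; this particular set is our assumption) gives four rank-one elements that sum to the identity and satisfy the moment conditions defining E*_{1-qubit}:

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Vertices of a regular tetrahedron inscribed in the unit sphere
# (an illustrative choice; any rotated copy gives the same structure).
V = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

# Four-point measure mu({v_k}) = 1/4 gives the POVM elements
# E_k = (1/4)(I + v_k . sigma), each proportional to a pure-state projector.
E = [(I2 + v[0] * SX + v[1] * SY + v[2] * SZ) / 4 for v in V]
```

Each element has eigenvalues {0, 1/2}, i.e., it is a subnormalized pure state, which is the structure of the qubit SIC-POVM.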
Let P*_{1-qubit} be the class of priors π on Θ that satisfy the conditions

π({θ | |θ| = 1}) = 1, E_π[θ] = 0, E_π[θ θ^⊤] = (1/3) I₃,

where E_π is the expectation with respect to the prior π.
Proposition 4. The uniform prior

π(dθ) = dθ / (4π),

where dθ is the surface element on the Bloch sphere, belongs to P*_{1-qubit}.
Proposition 5. Suppose that θ_1, θ_2, θ_3, θ_4 are unit vectors satisfying Σ_k θ_k = 0 and Σ_k θ_k θ_k^⊤ = (4/3) I₃. Then, the four-point discrete prior

π = (1/4)(δ_{θ_1} + δ_{θ_2} + δ_{θ_3} + δ_{θ_4})

belongs to P*_{1-qubit}.
We obtain the following result.
Lemma 4. Suppose π* ∈ P*_{1-qubit}. Then, for a general measurement E, the risk function of the Bayesian predictive density operator σ_π* is

R_E(θ, σ_π*) = log 3 − h((1 + r)/2) − (log 2 / 2)(1 + θ^⊤ E_µ[ω ω^⊤] θ),

where r = |θ| and h(p) = −p log p − (1 − p) log(1 − p) is the binary entropy function.

Proof. The distribution of the measurement outcome ω = (x, y, z) is

p(ω | θ) = Tr[(I + ω_x σ_x + ω_y σ_y + ω_z σ_z) σ_θ] = 1 + ω · θ

(the density with respect to µ). Then, since π* ∈ P*_{1-qubit}, the marginal distribution of the measurement outcome is

p_π*(ω) = ∫ (1 + ω · θ) dπ*(θ) = 1.

Therefore, the posterior distribution of θ is

dπ*(θ | ω) = (1 + ω · θ) dπ*(θ).

The posterior means of θ_x, θ_y and θ_z are x/3, y/3 and z/3, respectively. Thus, the Bayesian predictive density operator based on the prior π* is

σ_π*(ω) = (1/2)(I + (1/3)(x σ_x + y σ_y + z σ_z)),

and we have

log σ_π*(ω) = −(log 3) I + (log 2)(1/2)(I + ω_x σ_x + ω_y σ_y + ω_z σ_z),

since σ_π*(ω) has the eigenvalues 2/3 and 1/3 with eigenvectors along ω. Therefore, the quantum relative entropy loss is

L(σ_θ, σ_π*(ω)) = −h((1 + r)/2) + log 3 − (log 2 / 2)(1 + θ · ω).

Hence, the risk function is

R_E(θ, σ_π*) = ∫ (1 + ω · θ) L(σ_θ, σ_π*(ω)) dµ(ω) = log 3 − h((1 + r)/2) − (log 2 / 2)(1 + θ^⊤ E_µ[ω ω^⊤] θ).

Theorem 3. For a measurement E* ∈ E*_{1-qubit}, every π* ∈ P*_{1-qubit} is a latent information prior:

I^H_{θ,Y|x}(π*) = max_{π ∈ P} I^H_{θ,Y|x}(π).

In addition, the risk of the Bayesian predictive density operator based on π* is

R_{E*}(θ, σ_π*) = g(r) := log 3 − h((1 + r)/2) − (log 2 / 2)(1 + r²/3), (17)

where h is the binary entropy function h(p) = −p log p − (1 − p) log(1 − p).

Proof. From Lemma 4 and E_µ[ω ω^⊤] = (1/3) I₃, we obtain (17). Therefore, the risk depends only on r = |θ|, and we have g″(r) = 1/(1 − r²) − (log 2)/3 > 0 for 0 ≤ r < 1. Since the function g(r) is convex, it takes its maximum at an endpoint of [0, 1]. In addition, we have g(1) = log 3 − (2/3) log 2 > g(0) = log 3 − (3/2) log 2. Therefore, g(r) takes the maximum at r = 1.
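The risk of the Bayesian predictive density operator under the four-point (tetrahedral) POVM can be checked numerically against the closed form; the vertex set below is an illustrative choice, and g(r) is our transcription of the risk formula, consistent with the stated values g(1) = log 3 − (2/3) log 2 and g(0) = log 3 − (3/2) log 2:

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def dm(t):
    """Bloch vector t -> density matrix."""
    return 0.5 * (I2 + t[0] * SX + t[1] * SY + t[2] * SZ)

def logm_psd(A, eps=1e-12):
    w, Vec = np.linalg.eigh(A)
    return (Vec * np.log(np.maximum(w, eps))) @ Vec.conj().T

def qre(s, r):
    return np.trace(s @ (logm_psd(s) - logm_psd(r))).real

# Tetrahedron vertices (assumed concrete realization of the four-point measure).
V = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

def risk(theta):
    """Direct risk of the Bayesian predictive operator, which has Bloch vector
    omega/3 for outcome omega, under the POVM E_k = (1/4)(I + v_k . sigma)."""
    s = dm(theta)
    total = 0.0
    for v in V:
        p = (1 + v @ theta) / 4          # p(omega_k | theta) = Tr E_k sigma_theta
        total += p * qre(s, dm(v / 3))   # loss against the predictive state
    return total

def h(p):
    """Binary entropy (natural log), with h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log(p) - (1 - p) * np.log(1 - p)

def g(r):
    """Closed-form risk (our transcription)."""
    return np.log(3) - h((1 + r) / 2) - (np.log(2) / 2) * (1 + r**2 / 3)
```

At r = 1 the entropy term vanishes, giving the maximum g(1) = log 3 − (2/3) log 2 on the Bloch sphere.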
In other words, R_{E*}(θ, σ_π*) takes its maximum on the Bloch sphere. We note that the Bayesian predictive density operator is identical for every π* ∈ P*_{1-qubit}. In fact, every π* ∈ P*_{1-qubit} also provides the minimax estimation of the density operator σ^Y_θ when there is no observation system X. Figure 1 shows the risk function g(r) in (17) and also the minimax risk function g_0(r) when there is no observation:

g_0(r) = log 2 − h((1 + r)/2).

Whereas g(r) < g_0(r) around r = 1, we can see that g(r) > g_0(r) around r = 0. Both risk functions take the maximum at r = 1, and

g(1) = log 3 − (2/3) log 2 < g_0(1) = log 2.

The decrease g_0(1) − g(1) > 0 in the maximum risk corresponds to the gain from the observation of X. Now, we consider the selection of the measurement E. As discussed in the previous section, we define a POVM E* to be a minimax POVM if it satisfies (13). We provide a sufficient condition for a POVM to be minimax. Let ρ_E be a minimax predictive density operator for the measurement E.

Lemma 5. Suppose that π* is a latent information prior for the measurement E*. If

inf_ρ R_E(π*, ρ) ≥ sup_θ R_{E*}(θ, σ_π*) for every measurement E,

then E* is a minimax POVM.

Proof. For every (E, ρ), we have

sup_θ R_E(θ, ρ) ≥ R_E(π*, ρ) ≥ inf_ρ R_E(π*, ρ) ≥ sup_θ R_{E*}(θ, σ_π*) = inf_ρ sup_θ R_{E*}(θ, ρ).

The last equality is from the minimaxity of σ_π*. Therefore, E* is a minimax POVM.
Theorem 4. Every measurement E* ∈ E*_{1-qubit} is a minimax POVM.

Proof. For a general measurement E, from Lemma 4, the risk function of the Bayesian predictive density operator σ_π* is

R_E(θ, σ_π*) = log 3 − h((1 + r)/2) − (log 2 / 2)(1 + θ^⊤ E_µ[ω ω^⊤] θ).

Hence, the Bayes risk of σ_π* with respect to π* is

R_E(π*, σ_π*) = log 3 − (log 2 / 2)(1 + (1/3) Tr E_µ[ω ω^⊤]) = log 3 − (2/3) log 2,

where we used E_π*[h((1 + r)/2)] = h(1) = 0 and Tr E_µ[ω ω^⊤] = E_µ[|ω|²] = 1. Now, since the Bayesian predictive density operator σ_π* minimizes the Bayes risk with respect to π* among all predictive density operators [4],

inf_ρ R_E(π*, ρ) = log 3 − (2/3) log 2

for every E. On the other hand, from Theorem 3,

sup_θ R_{E*}(θ, σ_π*) = g(1) = log 3 − (2/3) log 2.

Therefore, inf_ρ R_E(π*, ρ) ≥ sup_θ R_{E*}(θ, σ_π*) for every E. From Lemma 5, E* is minimax.
Whereas Theorems 1 and 2 are valid even when σ^XY_θ is not separable, Theorems 3 and 4 assume the separability σ^XY_θ = σ_θ ⊗ σ_θ. From Theorem 4, the POVM (15) is a minimax POVM. Since this POVM is identical to the SIC-POVM [14,15], it is an interesting problem whether the SIC-POVM is a minimax POVM also in higher dimensions. This is a future work.

Appendix A

Proof of (4). From the definition of p_π(y | x) in (1),

∫ p(x | θ) p(y | x, θ) dπ(θ) = p_π(x) p_π(y | x),

where p_π(x) = ∫ p(x | θ) dπ(θ). Therefore, for arbitrary p,

R(π, p) − R(π, p_π) = ∫ p_π(x) L(p_π(· | x), p(· | x)) dx,

which is nonnegative since the Kullback-Leibler divergence L(q, p) in (3) is always nonnegative.
Proof of (9). From the definition of σ^Y_π(x) in (7),

∫ p(x | θ) σ^Y_{θ,x} dπ(θ) = p_π(x) σ^Y_π(x),

where p_π(x) = ∫ p(x | θ) dπ(θ). Therefore, for arbitrary ρ,

R(π, ρ) − R(π, σ_π) = Σ_x p_π(x) L(σ^Y_π(x), ρ(x)),

which is nonnegative since the quantum relative entropy L(σ, ρ) in (8) is always nonnegative.
Because D_ρ(π) is continuous as a function of π ∈ P_ρ, there exists π_n ∈ P, where lim π_m ⇒ π_∞. Let n_m be the integer satisfying π_m = π_{n_m}. We can make the subsequence {π_m}_{m=1}^∞ satisfy 0 < n_m/(n_{m+1} − n_m) < c for some positive constant c, since this holds for every θ ∈ Θ_ρ and 0 ≤ u ≤ 1. Hence, there is the orthogonal projection matrix onto the eigenspace of Σ_θ π_∞(θ) p(x | θ) σ_{θ,x} corresponding to the eigenvalue 0. By taking an appropriate subsequence {π_k} of {π_m}, we can make the subsequence of density operators {σ_{π_k}(x)} converge for every x. Hence, the risk of the predictive density operator defined by mixing with an arbitrary predictive density operator τ_x is not greater than that of ρ(x) for every θ ∈ Θ. Therefore, by taking a sequence {ε_n ∈ (0, 1)}_{n=1}^∞ that converges rapidly enough to 0, we can construct a predictive density operator for x ∈ X_ρ, (A5), as a limit of Bayesian predictive density operators based on the priors {ε_k μ̄ + (1 − ε_k) π_k}, where μ̄ is a measure on Θ such that p_μ̄(x) > 0 for every x ∈ X. Hence, the risk of the predictive density operator (A5) is not greater than that of ρ(x) for every θ ∈ Θ. Therefore, the predictive density operator σ_π(x) is minimax.