An Information Quantity in Pure State Models

When we consider an error model in a quantum computing system, we assume a parametric model where a prepared qubit belongs. Keeping this in mind, we focus on the evaluation of the amount of information we obtain when we know the system belongs to the model within the parameter range. Excluding classical fluctuations, uncertainty still remains in the system. We propose an information quantity called purely quantum information to evaluate this and give it an operational meaning. For the qubit case, it is relevant to the facility location problem on the unit sphere, which is well known in operations research. For general cases, we extend this to the facility location problem in complex projective spaces. Purely quantum information reflects the uncertainty of a quantum system and is related to the minimum entropy rather than the von Neumann entropy.


Introduction
Building a large-scale quantum computer remains challenging, and there are many problems to be solved. For example, the performance of error correction depends very strongly on how coherent the noise process is [1], and experimenters need to improve the quantum computing system through analysis of the physical noise [2]. When we prepare an imperfect quantum computing system, it is important to specify the noise based on a suitable error model. The process of understanding the physical error model for a prepared system corresponds to that of obtaining a certain amount of information on the system. In the present paper, keeping this in mind, we consider how to evaluate such information without any entropic concept.
An oversimplified example is given by a pure rotation error model [2] with a parameter range. Let the ideal qubit state be |0⟩ and the error model be e^{−iθY}|0⟩, where Y = −i(|0⟩⟨1| − |1⟩⟨0|) and θ denotes the parameter to be specified. Then the parameter range reflects the information we have on the system. We have more information when we know −ε ≤ θ ≤ ε than when we know −2ε ≤ θ ≤ 2ε.
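As a minimal numerical sketch (assuming nothing beyond the definitions above, with NumPy as an illustrative tool), the rotated states and their overlap with the ideal state |0⟩ can be computed directly:

```python
import numpy as np

# Y = -i(|0><1| - |1><0|), i.e., the Pauli Y matrix
Y = np.array([[0, -1j], [1j, 0]])

def rotated_state(theta):
    """Apply the rotation error e^{-i theta Y} to the ideal state |0> = (1, 0)^T."""
    # Since Y^2 = I, the matrix exponential reduces to
    # e^{-i theta Y} = cos(theta) I - i sin(theta) Y.
    U = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * Y
    return U @ np.array([1.0, 0.0])

# A narrower parameter range keeps the candidate states closer to |0>:
for theta in (0.05, 0.4):
    overlap = abs(np.vdot(np.array([1.0, 0.0]), rotated_state(theta)))
    print(f"theta = {theta}: |<0|psi>| = {overlap:.4f}")  # equals cos(theta)
```

The overlap decays as cos θ, so a wider admissible range for θ leaves more residual uncertainty about the prepared state.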
Our problem is closely related to so-called quantum estimation, but the above type of problem has not yet been investigated. In quantum state estimation [4][5][6][7][8][9] or quantum state discrimination [6,[10][11][12][13], for a given model, we find an optimal quantum measurement to extract information on the quantum state and choose the true state in the model from observation. This has been a typical problem and has been investigated by many authors. In quantum information experiments, quantum tomography has also been discussed [14][15][16][17]. As far as the author knows, these studies do not refer to the comparison of several models in terms of an information quantity.
In our setting, we focus on the information that we obtain before preparing a measurement. As we see later, we clearly obtain a certain amount of information beyond the dimension of the Hilbert space.
We consider only pure state models so that we can neglect classical fluctuation. As we shall see later, there is no classical counterpart for such information. In other words, we calibrate so that such classical information becomes zero. If a positive amount of information remains under the calibration, then we expect that it reflects the truly quantum information. We do not have a proper name for this information, and we call it model information or information of the model tentatively. It would become an alternative to the usual entropy.
In the next section, we provide a rough idea of how to define model information and present pure state models as examples. Then, we will formulate a pure state model and define the representative quantum state for it in a rigorous manner. In Section 4, we describe the equivalence between the problem of finding the representative quantum state and finding the minimax facility location on the sphere in operations research. In Section 5, we introduce the purely quantum information of the model and calculate it in several examples. We also describe the relationship of entropic concepts to our result and extension to infinite-dimensional Hilbert space in Section 7. Finally, concluding remarks are given in Section 8.

Preliminary Considerations
In this section, we describe a rough idea of how we evaluate model information. First, we recall classical information theory. Suppose that Alice picks a three-letter word w_1 w_2 w_3, where w_1, w_2, w_3 ∈ {a, b, c, . . . , z}, and we set M = {a, b, c, . . . , y}. If Bob knows w_1 ∈ M, Bob does not feel that he obtains much information. However, if w_1 ∈ M′ = {a, e, i, o, u}, then Bob feels that he obtains more information on the word Alice picked.
The above situation corresponds to a commutative case in quantum theory. Keeping this in mind, let us consider the model information in a quantum system. We assume that Bob already knows that the quantum system is described in a d-dimensional Hilbert space. Since information quantity is a relative concept, let us compare two models. Let the first model consist of a d-dimensional orthonormal basis, i.e., M = {e_1, . . . , e_d}, and the second model consist of {e_1, . . . , e_k} (k ≪ d). At least we can say that the second model gives Bob more significant information than the first, because the quantum state is then known to lie in a proper subspace. Now we tackle the case where some quantum states are nonorthogonal. For simplicity, we set d = 2 and consider the following models: In explicit calculations, we set |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T. Suppose that we know that the quantum state ϕ is one of the candidate states in M_2 (hereinafter, we write ϕ ∈ M_2 for simplicity). Perhaps we agree that this information is more than ϕ ∈ M_3 and ϕ ∈ M_4. Then, which is more informative, ϕ ∈ M_3 or ϕ ∈ M_4? Both models consist of three nonorthogonal state vectors. Likewise, which is more informative, ϕ ∈ M_1 or ϕ ∈ M_2? In the present article, we consider how to quantitatively evaluate the information obtained when Bob knows that the quantum state ϕ belongs to a model M.

Full Rank Condition
In order to avoid technical difficulties, we give one important assumption here. Let us define the rank of a model as rank M = dim span_C{ψ_θ : θ ∈ Θ}. We assume that the rank of a model is equal to the dimension of the Hilbert space, i.e., d = rank M. We call this the full rank condition. The full rank condition implies that there exists no complementary subspace that is orthogonal to every state vector in the model M.

Rough Idea on Defining Model Information
Under the full rank condition, we consider the case where a model has considerable information on the quantum system. Suppose that we are given the following model: While this satisfies the full rank condition, clearly all candidate quantum states point approximately in the same direction as |e_1⟩. Then, the quantum system ϕ is approximately described by a representative state vector |e_1⟩. When ε → 0 (but ε ≠ 0), the model information is expected to increase.
From the above discussion, we find that the information quantity associated with ϕ ∈ M is completely different from the number of elements, |M|. Rather, a certain scale or a size of the model M should be included in the definition of the model information.
Along the lines of the above rough idea, we discuss in the next section: (a) How to determine a representative state vector for a given model M; (b) One definition of the model information; (c) The relationship with the concept of entropy.
We emphasize that all of these have no classical counterpart and thus it might be difficult to understand them. Before going into detail, we shall give an overview of each item here.
For (a), we consider the maximin overlap between quantum states and define the representative quantum state of a model. Mathematically speaking, this is regarded as a variant of the facility location problem on the unit sphere [18,19], which appears in operations research. In operations research, many authors have developed algorithms for the facility location problem; in particular, finding the minimax solution is our concern. For a finite model (|M| < ∞), we present a naive algorithm to find the representative quantum state of a model using this consideration.
In order to consider item (b), we introduce an imaginary two-person game called the quantum detection game. Bob benefits from the information of a given model to obtain a higher score than Alice. The value of the game, which is determined by a least favorable prior [20] in this game, defines one information quantity related to the model M.
In (c), we compare our method with the formal analogue based on the von Neumann entropy. Later we will see that the newly proposed information quantity is related to the minimum entropy [21,22] rather than the von Neumann entropy.

Definition of Pure State Models and Assumptions
In the present paper, let H be a d-dimensional Hilbert space (d may be ∞). We call a finite-dimensional parametric family of quantum pure states M = {ψ_θ : θ ∈ Θ}, ‖ψ_θ‖ = 1, a quantum statistical model of pure states, or briefly a (pure state) model. Note that ‖f‖² = ⟨f|f⟩. Basically, the parameter set Θ is a compact subset of a finite-dimensional Euclidean space.

Preliminary Results
In the present paper, we introduce the information of a model M. Although the formal definition is given in Section 5, we need several concepts in order to understand it analytically and geometrically.
In this section, we introduce the most fundamental concept, the representative quantum state of a model. We shall give a rough idea for the case where dim H = 2 and |M| = 2. Specifically, we set M = {ψ_1, ψ_2} with ‖ψ_1‖ = ‖ψ_2‖ = 1. When the two quantum states are close to each other, ψ_1 ≈ ψ_2, it is natural to consider that a representative quantum state of the model M should be a "midpoint between the two quantum states". We often identify the state vector with the point it specifies in the set of all pure states.
Mathematically, we may try to define the point as a point ϕ equidistant from ψ_1 and ψ_2, i.e., such that

|⟨ϕ|ψ_1⟩| = |⟨ϕ|ψ_2⟩|    (1)

holds. However, the above equidistance condition alone does not determine the point ϕ. Thus, we maximize the above "overlap" under the condition (1). Then we obtain an explicit formula for the representative point of a model, where δ satisfies γ = |γ|e^{iδ} with γ = ⟨ψ_1|ψ_2⟩, and the maximum overlap is given by max_ϕ |⟨ϕ|ψ_1⟩| = √((1 + |γ|)/2). Next, we consider the case where |M| = 3. Let us take M_3 and M_4 introduced in the previous section. In M_4, the above idea applies, i.e., we find the quantum state maximizing the overlap. Up to the global phase, we set ϕ = cos α|0⟩ + e^{iδ} sin α|1⟩. Then, we obtain an explicit solution satisfying (3), cos α = (3 + √3)/6, δ = π/4, after lengthy but straightforward algebra (see also Section 4.1.3).
However, we find no solution satisfying Equation (3) in M_3. We need a more careful treatment. First, we fix an arbitrary quantum state ϕ and consider the set of numbers r satisfying |⟨ϕ|ψ_j⟩| ≥ r, j = 1, 2, 3. The condition assures that the overlap between ϕ and an arbitrary quantum state in M_3 is not less than r. For each ϕ, the maximum of such r is equal to min{|⟨ϕ|ψ_j⟩| : j = 1, 2, 3}.
We consider that the larger the overlap gets, the more suitable ϕ becomes as a representative quantum state of the model M. Thus, we maximize r as a function of ϕ.
It is convenient for explicit calculation to use the squared overlap (i.e., the fidelity), max_ϕ min_{ψ∈M_3} |⟨ϕ|ψ⟩|², and we regard ϕ_rep = arg max_ϕ r(ϕ; M_3) as a representative quantum state of the model M_3. Based on the above idea, we give a more formal definition of the representative quantum state in the next subsection.
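Since the explicit definitions of M_1–M_4 are not reproduced above, the max-min criterion can be illustrated on an assumed "trine" model (three symmetric qubit states on one great circle of the Bloch sphere, a hypothetical stand-in): even a crude grid search over pure states locates the maximizer.

```python
import numpy as np

def ket(alpha, delta):
    # generic pure qubit state: cos(alpha)|0> + e^{i delta} sin(alpha)|1>
    return np.array([np.cos(alpha), np.exp(1j * delta) * np.sin(alpha)])

# hypothetical "trine" model: three symmetric states in one great circle
trine = [ket(0.0, 0.0), ket(np.pi / 3, 0.0), ket(2 * np.pi / 3, 0.0)]

def worst_overlap_sq(phi, model):
    """min_psi |<phi|psi>|^2 -- the quantity to be maximized over phi."""
    return min(abs(np.vdot(phi, psi)) ** 2 for psi in model)

# crude grid search for arg max_phi min_psi |<phi|psi>|^2
best, best_phi = -1.0, None
for a in np.linspace(0.0, np.pi, 181):
    for d in np.linspace(0.0, 2 * np.pi, 181):
        r2 = worst_overlap_sq(ket(a, d), trine)
        if r2 > best:
            best, best_phi = r2, ket(a, d)
print(f"max-min squared overlap ~ {best:.3f}")
```

For this symmetric ensemble, the representative state sits on the axis perpendicular to the plane of the three Bloch vectors, giving a max-min squared overlap of 1/2.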

Representative Quantum State
Now we are ready to define the representative quantum state of a given model M formally. We adopt the distance d_F(ϕ, ϕ′) = 1 − F(ϕ, ϕ′) = 1 − |⟨ϕ|ϕ′⟩|² rather than the overlap.

Definition 1. Let a model M = {ψ_θ : θ ∈ Θ} be given. When a quantum state ϕ_rep satisfies

sup_θ d_F(ϕ_rep, ψ_θ) = inf_ϕ sup_θ d_F(ϕ, ψ_θ),    (5)

ϕ_rep is called a representative quantum state of the model M with respect to the distance d_F.
When we emphasize the model M, we write ϕ_rep(M). While max and min suffice for discrete models, the use of sup and inf is generally inevitable (see Section 7). We also use a condition equivalent to (5): sup_θ d_F(ϕ_rep, ψ_θ) ≤ sup_θ d_F(ϕ, ψ_θ) for every ϕ.
In the above definition, ϕ_rep is also interpreted as the minimax estimate in quantum estimation with no observation. Suppose that a parametric family of pure states or a countable set of pure states is given. Then we give an estimate, say ϕ, of the true quantum state without any observation. The error is evaluated by the fidelity-based quantity d_F(ϕ, ψ_true) = 1 − |⟨ϕ|ψ_true⟩|². The above representative quantum state is a minimax estimate in this setting.
In the context of quantum estimation, this may seem quite strange because we do not perform any measurement. However, it is not unnatural to consider estimation with no observation. For example, in classical information theory, we infer the outcome of an information source with no observation. For a given parametric model of source distributions {p_θ(x) : θ ∈ Θ}, this kind of estimation corresponds to constructing a minimax code [23].
Apart from actual applications, quantum estimation with no observation also makes sense theoretically. In a quantum computer, a quantum bit is processed by a certain quantum gate with an unknown parameter, say θ, during the computing process. When θ is uncontrollable within a range [−2ε, ε], it might be necessary to estimate the quantum bit. Since there is no reason to estimate θ as −0.5ε, we need a certain formulation to estimate the quantum bit.
We should also mention why we adopt d_F as the closeness measure in our definition among several candidate distances. There are two reasons. One is the operational meaning given by the quantum detection game, which is explained in Section 5. The other is due to the following property:

Lemma 1. Let f be a continuous, strictly increasing function. Then a representative quantum state with respect to the distance d_F is also a representative quantum state with respect to the distance f ∘ d_F.

Proof. It is enough to show that for every ϕ,

sup_θ f(d_F(ϕ, ψ_θ)) = f(sup_θ d_F(ϕ, ψ_θ))    (6)

holds. If Equation (6) holds true, then we show the statement in the following way. For every ϕ, from the definition of ϕ_rep, sup_θ d_F(ϕ, ψ_θ) ≥ sup_θ d_F(ϕ_rep, ψ_θ) holds. Since f is an increasing function, applying f to both sides and using Equation (6) yields sup_θ f(d_F(ϕ, ψ_θ)) ≥ sup_θ f(d_F(ϕ_rep, ψ_θ)), which implies that ϕ_rep is a representative quantum state with respect to the distance f ∘ d_F. Now let us show Equation (6). Since f is increasing, f(sup_θ d_F(ϕ, ψ_θ)) ≥ sup_θ f(d_F(ϕ, ψ_θ)) is immediate. For the converse inequality, let ϕ be fixed and set α = sup_θ d_F(ϕ, ψ_θ). For every ε > 0, due to the continuity of f, there exists δ > 0 such that |α − x| ≤ δ ⇒ f(α) ≤ f(x) + ε. We take θ* such that α ≤ d_F(ϕ, ψ_θ*) + δ. Then f(α) ≤ f(d_F(ϕ, ψ_θ*)) + ε ≤ sup_θ f(d_F(ϕ, ψ_θ)) + ε. Since ε > 0 is arbitrary, we obtain the converse inequality. Thus, Equation (6) is shown, and the proof is complete.
In Section 5, we shall define the information quantity obtained when we know ϕ ∈ M, which is denoted by J(M). Once ϕ_rep is found, J(M) is shown to be easy to calculate. Now let us consider the representative quantum state of a two-state model geometrically. Recall that each pure state in a two-dimensional Hilbert space is written in the form (4). If we switch to the Bloch representation, we obtain a one-to-one correspondence (α, δ) ↔ (x, y, z) = (sin 2α cos δ, sin 2α sin δ, cos 2α) on the unit sphere (the Bloch sphere). When one pure state is set to |0⟩ (P), the distance between this pure state and another pure state specified by (α, δ) (Q) is 2α along the shortest path on the Bloch sphere. The shortest path connecting two points P and Q on the Bloch sphere is the arc along the great circle through them. The arc is called a geodesic connecting P and Q, and the point M on the geodesic equidistant from both points is called the geodesic midpoint between P and Q. The representative quantum state corresponds to the geodesic midpoint. The concept of geodesics on the Bloch sphere is often useful and has been investigated in several works [24][25][26][27].
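The correspondence (α, δ) ↔ (x, y, z) and the geodesic-midpoint property can be checked numerically; the following sketch assumes only the parametrization stated above (a pure state of the form cos α|0⟩ + e^{iδ} sin α|1⟩).

```python
import numpy as np

def ket(alpha, delta):
    # pure state of the form (4): cos(alpha)|0> + e^{i delta} sin(alpha)|1>
    return np.array([np.cos(alpha), np.exp(1j * delta) * np.sin(alpha)])

def bloch(alpha, delta):
    # (x, y, z) = (sin 2a cos d, sin 2a sin d, cos 2a)
    return np.array([np.sin(2 * alpha) * np.cos(delta),
                     np.sin(2 * alpha) * np.sin(delta),
                     np.cos(2 * alpha)])

a, d = 0.7, 1.1
phi = ket(a, d)
z = bloch(a, d)[2]
# squared overlap with |0> equals (1 + z)/2 = cos^2(alpha),
# so the arc distance from the north pole |0> is 2*alpha
print(abs(phi[0]) ** 2, (1 + z) / 2)

# geodesic midpoint between |0> and ket(a, d): halve the polar angle
mid = ket(a / 2, d)
print(abs(np.vdot(mid, ket(0.0, 0.0))), abs(np.vdot(mid, phi)))  # equal overlaps
```

Halving the polar angle halves the arc on the Bloch sphere, so the midpoint has equal overlap (here cos(a/2)) with both endpoints.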
For every pair of linearly independent quantum states ψ_1 and ψ_2, let us consider the two-dimensional subspace span_C{ψ_1, ψ_2}. Then each state in the subspace is regarded as a point on the Bloch sphere. By using Formula (2), we summarize the above statements.
Understanding the geometry of the unit sphere is very helpful to find the representative quantum state, which is discussed in Section 4.

Example of a Representative Quantum State
Using Formula (7), a candidate representative quantum state is obtained. Next, we use the following lemma.
Lemma 2. Let N ⊂ M be a submodel. If the representative quantum state ϕ_rep of N satisfies sup_{ψ∈M} d_F(ϕ_rep, ψ) = sup_{ψ∈N} d_F(ϕ_rep, ψ), then ϕ_rep is also the representative quantum state of M.
Proof. Let ϕ be an arbitrary quantum state. Since N ⊂ M, sup_{ψ∈M} d_F(ϕ, ψ) ≥ sup_{ψ∈N} d_F(ϕ, ψ) ≥ sup_{ψ∈N} d_F(ϕ_rep, ψ) = sup_{ψ∈M} d_F(ϕ_rep, ψ), which implies that ϕ_rep is also the representative quantum state of M.
It is easily seen that this state is also the representative quantum state of M_3.

Facility Location Problem
Mathematically, finding the representative quantum state of a given model is equivalent to finding the minimax facility location for a given set of demand points in operations research [18].

Facility Location Problem on the Sphere
Decades ago, Drezner and Wesolowsky [18] considered the facility location problem on the sphere. We briefly summarize their formulation. Suppose that there are m demand points with (positive) weights on the unit sphere, and our objective is to locate a single facility on the same sphere so as to minimize the weighted sum of distances from the facility to the demand points. Let (ξ_i, φ_i) and (ξ, φ) denote the locations of the i-th demand point and the facility, respectively, in spherical coordinates (0 ≤ ξ ≤ π, 0 ≤ φ < 2π). The weights of the demand points are denoted by π_i. Without loss of generality, we may take ∑_i π_i = 1. We obtain the following minimization problem.
Drezner and Wesolowsky measured distances both through the sphere (the squared Euclidean distance) and along the surface (the shortest arc length). Let α denote the shortest arc length between two points on a sphere of unit radius. Then the squared Euclidean distance is given by 4 sin²(α/2). Both distances are computed from the equation cos α = cos ξ cos ξ_i + sin ξ sin ξ_i cos(φ − φ_i). The interpretation of the problem is as follows. The distance d_i(ξ, φ) is the transportation cost from a facility at (ξ, φ) to the i-th demand point (ξ_i, φ_i). We regard the relative frequency of each transport as the weight. When we already know the relative frequency, we minimize the objective function ∑_{i=1}^m π_i d_i(ξ, φ) with respect to (ξ, φ). Focusing on the correspondence between a point (ξ, φ) on the unit sphere and a complex unit vector |ψ(ξ, φ)⟩ = (cos(ξ/2), e^{iφ} sin(ξ/2))^T, (9) the problem is completely solved when we adopt the squared Euclidean distance. Let us write |ϕ⟩ = |ψ(ξ, φ)⟩ for the location of the facility instead of (ξ, φ). Straightforward calculation yields d_i(ξ, φ) = 4(1 − |⟨ϕ|ψ(ξ_i, φ_i)⟩|²). Thus, the objective function to be minimized is written as ∑_i π_i d_i(ξ, φ) = 4(1 − ⟨ϕ|ρ_π|ϕ⟩), where ρ_π = ∑_i π_i |ψ(ξ_i, φ_i)⟩⟨ψ(ξ_i, φ_i)|. Note that ρ_π is positive semidefinite and of trace one, and it is regarded as the Bayes mixture (for the definition, see Section 5.1). Then, the minimization problem min_ϕ ∑_{i=1}^m π_i d_i(ξ, φ) reduces to finding the maximum of ⟨ϕ|ρ_π|ϕ⟩. This is achieved by the first eigenvector ϕ of ρ_π. This result [28] agrees with that derived by Drezner and Wesolowsky, who obtained the same result by differentiation with respect to the variables (ξ, φ) in the context of operations research.
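The eigenvector characterization can be verified with a small numerical sketch (the three demand points and weights below are arbitrary illustrations, and the cost is measured by the squared Euclidean distance as in the text):

```python
import numpy as np

def ket(xi, phi):
    # Bloch correspondence: point (xi, phi) -> pure state (cos(xi/2), e^{i phi} sin(xi/2))
    return np.array([np.cos(xi / 2), np.exp(1j * phi) * np.sin(xi / 2)])

# three demand points with weights (relative transport frequencies)
states = [ket(0.3, 0.0), ket(1.2, 2.0), ket(2.5, 4.0)]
weights = [0.5, 0.3, 0.2]

# Bayes mixture rho_pi = sum_i pi_i |psi_i><psi_i|
rho = sum(w * np.outer(s, s.conj()) for w, s in zip(weights, states))

# weighted-sum-optimal facility = eigenvector of the largest eigenvalue
evals, evecs = np.linalg.eigh(rho)
facility = evecs[:, -1]

# objective: sum_i pi_i d_i = 4 * (1 - <phi|rho|phi>), minimized at the top eigenvector
cost = 4 * (1 - evals[-1])
print(f"minimum weighted cost = {cost:.4f}")
```

Any other pure state gives a weighted cost at least as large, since ⟨ϕ|ρ_π|ϕ⟩ is maximized by the first eigenvector.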
However, what if we have no information on the relative frequency {π_i} of each demand point? One idea is to take the minimax point. Through the minimax theorem [20], we obtain the associated weight π* = {π*_i} such that min_{(ξ,φ)} max_i d_i(ξ, φ) = min_{(ξ,φ)} ∑_i π*_i d_i(ξ, φ) holds, where d_i(ξ, φ) is the squared Euclidean distance. (Strictly speaking, the above holds under the condition that the convex hull of all demand points does not include the origin.) This π* is called the least favorable weight. Now we go back to our problem. Let the Hilbert space be two-dimensional, where each quantum state is specified by (ξ, φ) in (9). Let a discrete model M = {|ψ(ξ_1, φ_1)⟩, . . . , |ψ(ξ_m, φ_m)⟩} be given. Each state corresponds to a demand point on the Bloch sphere. The distance d_F corresponds to the transportation cost measured by a constant multiple of the squared Euclidean distance. Then, finding the representative quantum state ϕ_rep of the model M is equivalent to finding the minimax facility location, specified by ϕ, on the Bloch sphere with the distance d_F.
According to this correspondence, we also know the following fact: when we know the least favorable weight π* in the facility location problem, we obtain the representative quantum state as the first eigenvector of the Bayes mixture ρ_{π*} [20].
Following the interpretation of the facility location problem on the sphere, we find the representative quantum state of each model in Section 2.1. Since sin 2 (α/2) is a strictly increasing function of α, both minimax points for the two distance measures agree due to Lemma 1.
It may be thought that using the squared Euclidean distance rather than the arc length is unnatural. However, as shown in Lemma 1, the minimax location obtained under the squared Euclidean distance agrees with that obtained under the arc length, due to the strict monotonicity of α ↦ 4 sin²(α/2) (see the beginning of Section 4.1 for the squared Euclidean distance). In this sense, the representative quantum state of a model is invariant. On the contrary, the least favorable weight for the facility location problem depends on the distance measure on the sphere and thus is not invariant.

Algorithm to Find Nonrandomized Minimax Location
In operations research, there are several studies on the facility location problem on the unit sphere, where some algorithms to find the minimax facility location are also proposed. Inspired by these studies, we propose a naïve algorithm to find the nonrandomized minimax facility location. Specifically, we consider the facility location problem on a three-dimensional hypersphere in a four-dimensional real Euclidean space. This is easily generalized to arbitrary dimensions. Basically, a pure state model in a d-dimensional Hilbert space is regarded as a subset of the complex projective space CP^{d−1}. A complex projective space CP^{d−1} is a typical example of a complex manifold but is actually a (2d − 2)-dimensional real manifold. This fact is sufficient to understand the following argument. (For complex projective spaces, see, e.g., Section 4 in Bengtsson and Życzkowski [29].)
We exclude the possibility of a randomized strategy, although it is sometimes better than any nonrandomized strategy, at least theoretically (for example, see Section 1.5 in Ferguson [30]). For example, let us consider six demand points, (ξ, φ) = (π/2 ± ε, 0), (π/2 ± ε, 2π/3), (π/2 ± ε, 4π/3), on the unit sphere, where ε is a small positive constant. Then, a randomized facility location strategy (the north pole with probability 1/2 and the south pole with probability 1/2) yields the average transportation cost, measured by the arc length, (1/2)(π/2 + ε) + (1/2)(π/2 − ε) = π/2 for each demand point. Thus, it achieves the minimax location. On the other hand, any nonrandomized strategy yields a higher transportation cost (> π/2) in the worst case. The algorithm presented below fails in this example. When dim H = 2, the demand points are not contained in a hemisphere if and only if there exists a randomized strategy that is better than any nonrandomized strategy. No simple mathematical condition assures that a nonrandomized minimax strategy is not worse than any randomized strategy when dim H > 2. Thus, we implicitly assume this, together with the existence of a nonrandomized minimax strategy, in Algorithm 1.

Algorithm 1: Find Minimax Facility Location
(1) Find the most distant pair: Let two distinct demand points A and B be arbitrary. From Formula (7), find the geodesic midpoint P of A and B, and calculate the arc length AP (= BP). Over all pairs, find the maximum arc length R_2 and its center P_{2,*}. If every arc length between P_{2,*} and a demand point is not more than R_2, then P_{2,*} is the minimax location; STOP. Otherwise, go to Step (2).
(2) Find the most distant triplet: Let A, B, and C be three arbitrary demand points. Find the center P of the circumscribed circle of the triangle ABC on the hypersphere, and calculate the arc length AP (= BP = CP). Over all triplets, find the maximum arc length R_3 (> R_2) and its center P_{3,*}. If every arc length between P_{3,*} and a demand point is not more than R_3, then P_{3,*} is the minimax location; STOP. Otherwise, go to Step (3).
(3) Find the most distant quadruplet: Let A, B, C, and D be four arbitrary demand points. Find the center P of the circumscribed sphere of the tetrahedron ABCD on the hypersphere, and calculate the arc length AP (= BP = CP = DP). Over all quadruplets, find the maximum arc length R_4 (> R_3) and its center P_{4,*}. P_{4,*} is the minimax location; STOP.
Due to monotonicity, we may evaluate the squared Euclidean distance or inner product instead of the arc length between two demand points.
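A minimal sketch of the enumeration, specialized to an ordinary sphere S² (rather than the hypersphere of the text) and assuming a nonrandomized minimax location exists, might look as follows. The candidate set consists of pair midpoints and triple circumcenters, which is where the minimax center must lie for points contained in an open hemisphere.

```python
import numpy as np
from itertools import combinations

def arc(p, q):
    """Shortest arc length between two points on the unit sphere."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def minimax_location(points):
    """Naive minimax facility location on S^2: scan pair midpoints and
    triple circumcenters, return the candidate with the smallest covering
    radius (assumes the optimum is nonrandomized)."""
    cands = []
    # pair candidates: geodesic midpoints, direction p + q
    for p, q in combinations(points, 2):
        cands.append(p + q)
    # triple candidates: points equidistant from a, b, c lie on the line
    # through the origin along (a - b) x (b - c); try both intersections
    for a, b, c in combinations(points, 3):
        n = np.cross(a - b, b - c)
        cands += [n, -n]
    best_r, best_c = None, None
    for v in cands:
        nv = np.linalg.norm(v)
        if nv < 1e-12:        # degenerate candidate (e.g., antipodal pair)
            continue
        center = v / nv
        r = max(arc(center, p) for p in points)
        if best_r is None or r < best_r:
            best_r, best_c = r, center
    return best_r, best_c

pts = [np.array(p, dtype=float) for p in
       [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]]
radius, loc = minimax_location(pts)
print(f"covering radius = {radius:.4f}, location = {loc}")
```

Instead of the sequential stopping rule of Algorithm 1, this sketch simply scans all candidates and returns the one with the smallest covering radius; the early-exit checks of Steps (1)-(3) are an optimization over this brute-force scan.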
When we generalize the algorithm suitably to CP^{d−1} (as a (2d − 2)-dimensional real manifold), we obtain an algorithm to find the representative quantum state of a model in a d-dimensional Hilbert space. In the last subsection, we used the proposed algorithm to find the representative quantum state in some examples. In Section 5, we also demonstrate how to find the representative quantum state following the above algorithm in a specific case.
In the qubit system, the above argument also applies to mixed states because the Bloch ball is regarded as the hemisphere S^3_+ of a 3-sphere in real four-dimensional Euclidean space (e.g., see Section 9.5 in Bengtsson and Życzkowski [29]).
Though the algorithm itself is not our main concern, we briefly mention the efficiency of the algorithm. The computational complexity of each Step (1), (2), and (3) is, respectively, O(m 2 ), O(m 3 ), and O(m 4 ), and clearly it is not efficient. The above problem is reduced to finding the covering sphere for all demand points with the minimum radius (cf. Shiode [19]). Based on this idea, it could be possible to obtain more efficient algorithms even for a continuous model.
The facility location problem on the sphere and the problem of finding the representative quantum state of a model (in a two-dimensional Hilbert space) have completely different origins. It is a bit surprising that the former problem, which comes from operations research, is helpful for understanding the result in the latter, which comes from quantum physics. What a top manager in a global business really cares about might be essentially the same as a fundamental problem in quantum physics. How does this wonderful connection arise? A mathematician might point out the underlying two-to-one correspondence between SU(2) and SO(3) [29]. However, this connection arises mainly from a game-theoretic approach. In other words, the unexpected tie implies the universality and effectiveness of game-theoretic concepts, which are different from information-theoretic concepts such as entropy. This is again emphasized when we introduce the definition of model information in the next section.

Quantum Detection Game and Model Information
We have explained how to determine a representative quantum state of a given model. Based on the state, in the present section, we define a new information quantity, model information. Geometrically, this is the maximum radius from the representative quantum state as the center.
The basic strategy to define an information quantity is to introduce a certain imaginary two-person game where one player obtains points according to the information available.
For example, in classical information theory, we consider assigning the ideal code length −log p(x) to each symbol x when we know the source distribution p(x). Bob's score is determined by his guessed distribution q(x): he incurs the excess code length {−log q(x)} − {−log p(x)} for each symbol x. Taking the average with respect to p(x), we obtain the Kullback-Leibler divergence [23], which is a very fundamental quantity in information theory.
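In code, the average excess code length and the Kullback-Leibler divergence coincide term by term (the distributions below are arbitrary illustrations):

```python
import numpy as np

# true source distribution p and Bob's guess q over a three-symbol alphabet
p = np.array([0.5, 0.25, 0.25])
q = np.array([1 / 3, 1 / 3, 1 / 3])

# average excess code length: E_p[(-log q(x)) - (-log p(x))]
redundancy = np.sum(p * ((-np.log2(q)) - (-np.log2(p))))
# Kullback-Leibler divergence D(p || q)
kl = np.sum(p * np.log2(p / q))
print(redundancy, kl)  # the two expressions coincide
```

The redundancy is zero exactly when Bob guesses q = p, mirroring how a player scores only from correct information about the source.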
According to Tanaka [20], we consider a quantum detection game as an imaginary two-person game.

Quantum Detection Game and Definition of Purely Quantum Information
As an example, we introduce a four-dimensional pure state model M_FP with a parameter ε, 1/4 < ε < 1. It consists of the following four-dimensional vectors: While dim H = 4, it is enough to consider each vector in a real four-dimensional vector space. Let us explain the quantum detection game under the model M_FP. First, Alice picks one pure state from the model (i.e., ϕ_1, . . . , ϕ_4) and sends it to Bob. Bob knows only the model, i.e., the set of candidate pure states, and prepares a two-outcome measurement of the form {|ϕ⟩⟨ϕ|, I − |ϕ⟩⟨ϕ|}, where I is the identity operator and ϕ is a unit vector. We call ϕ a detector or a detector state. Bob's purpose is not to guess the number that Alice has chosen but to obtain a "detection" with as high a probability as possible.
The detection rate for the chosen state |ϕ_j⟩⟨ϕ_j| is given by Tr{|ϕ_j⟩⟨ϕ_j| |ϕ⟩⟨ϕ|} = |⟨ϕ|ϕ_j⟩|² when Alice sends ϕ_j to Bob and Bob prepares ϕ as a detector. (Tr denotes the matrix trace, and |a⟩⟨b| is regarded as a matrix.) As a game, Alice aims at making Bob's detection rate smaller by choosing ϕ_1, . . . , ϕ_4 with a certain probability. In contrast, Bob aims at making the detection rate larger by preparing his detector ϕ based on his knowledge of the model. Later, we will evaluate the information of the model M_FP. Now we go back to the general situation and explain the details. First, we seek the minimum detection rate for Bob over all possible models. Suppose that Alice picks from the whole set of pure states in a completely random way (i.e., with respect to the Haar measure). This is the worst case for Bob. When Bob is allowed to adopt a randomized strategy, the detection rate is 1/d (d is the dimension of the Hilbert space). If the model consists of an orthonormal basis, then again the detection rate is 1/d. This is the minimum detection rate.
Next, suppose that Alice has a certain tendency in choosing the pure state, which is described by the model M, and that Bob knows this for some reason. Although we do not care about the origin of such models, there are various situations where they apply in quantum science and technology. For example, in the bipartite system C² ⊗ C² without interaction, a pure state arises as a product state like |ϕ⟩ ⊗ |ϕ′⟩. Then an entangled state such as α|00⟩ + β|11⟩ is not expected. In quantum computation, the output qubit state under a unitary gate with some rotation error would be e^{i(ε+π/2)Y}|ϕ⟩. Then, Bob could achieve a detection rate larger than 1/d based on the information of the model. Following this idea, we propose one information quantity for a model below.
A detailed explanation of the quantum detection game and useful results are described in the author's previous work [20]. Below, we only present some of the results in a formal way, which is necessary to define the information quantity. Those definitions hold in an infinite-dimensional Hilbert space.
First, we define the Bayes mixture in a slightly formal way. Let M = {ψ_θ : θ ∈ Θ} be a model (see Section 3), and let π be a probability distribution on the parameter space Θ. Then, the Bayes mixture ρ_π is defined as ρ_π = ∫_Θ |ψ_θ⟩⟨ψ_θ| π(dθ). In the context of Bayesian statistics [31][32][33], we call π a prior distribution or briefly a prior. For a discrete model, the above integral is replaced with a finite sum, ρ_π = ∑_j π_j |ψ_j⟩⟨ψ_j|. Then, when Alice sends |ψ_j⟩ to Bob with probability π_j, it is equivalent to sending ρ_π to Bob in the quantum detection game.
Finally, we have come to our main theme: to define the information quantity of a model M. We define J(M) = inf_π ‖ρ_π‖_∞ − 1/d. For calibration, we subtract the lower bound 1/d, and thus J(M) ≥ 0. When Bob knows that the quantum state Alice prepares is in M, we interpret this as Bob obtaining the information J(M). As shown in Section 5.3, the above infimum is related to the value of the quantum detection game (the best attainable score) through the minimax theorem [20].
Let us rewrite J(M) in a slightly simpler form. For a discrete model, there exists a prior distribution that achieves the infimum of ‖ρ_π‖_∞. We call such a prior a least favorable prior (LFP); the LFP is a technical term in statistical decision theory and game theory (see, e.g., Section 1.7, p. 39 in Ferguson [30]). Using a least favorable prior π_LF, the PQI is defined by J(M) = ‖ρ_{π_LF}‖_∞ − 1/d. Even if the LFP is not uniquely determined, ‖ρ_{π_LF}‖_∞ remains the same [20]. As some readers may recognize, the LFP completely agrees with the least favorable weight in Section 4. When J(M) = 0, Alice can effectively send the completely mixed state, and then Bob obtains no information from the model M to achieve a detection rate higher than 1/d. Geometrically speaking, such a model spreads fully, with no specific direction.
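For a small discrete model, the LFP and the PQI can be found by brute force over priors, using the calibration J(M) = inf_π ‖ρ_π‖_∞ − 1/d described above. The two-state qubit model below is an illustrative assumption, not an example from the text:

```python
import numpy as np

# hypothetical two-state qubit model M = {|0>, |+>}
psi0 = np.array([1.0, 0.0])
psi1 = np.array([1.0, 1.0]) / np.sqrt(2)
P0, P1 = np.outer(psi0, psi0), np.outer(psi1, psi1)

def opnorm(t):
    """Operator norm (largest eigenvalue) of the Bayes mixture for prior (t, 1-t)."""
    return np.linalg.eigvalsh(t * P0 + (1 - t) * P1)[-1]

grid = np.linspace(0.0, 1.0, 10001)
norms = [opnorm(t) for t in grid]
i = int(np.argmin(norms))
J = norms[i] - 1 / 2          # subtract the calibration term 1/d, with d = 2
print(f"LFP ~ ({grid[i]:.3f}, {1 - grid[i]:.3f}), J(M) ~ {J:.4f}")
```

By symmetry the LFP here is the uniform prior, and the scan returns J(M) ≈ √2/4 ≈ 0.354; a model with a larger overlap between its states is more "biased" and yields a larger PQI.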

Basic Properties of PQI
In contrast, when J(M) > 0, a certain bias exists, and it prevents Alice from effectively preparing the completely mixed state. Thus, Bob benefits from knowing the model. If M satisfies the full-rank condition, then there exists a prior π such that the Bayes mixture satisfies ρ_π > 0 (A > 0 denotes the positive definiteness of a Hermitian matrix A). If M does not satisfy the full-rank condition, there is a d′-dimensional subspace (d′ < d) on which M, restricted to the subspace, satisfies the full-rank condition. Since inf_π ∥ρ_π∥_∞ ≥ 1/d′ > 1/d, we have the lower bound of PQI, J(M) ≥ 1/d′ − 1/d > 0.
We now mention the relation between PQI and the von Neumann entropy S(ρ). (Recall that the von Neumann entropy is defined by S(ρ) = −Tr ρ log ρ.) It is easily shown that J(M) = 0 if and only if S(ρ_{π_LF}) = log d; the worst case for Bob corresponds to the maximum entropy state. As we shall see in Section 6, our formulation is instead related to the minimum entropy.
Next, we consider how to calculate the PQI of a given model. If the model has a certain symmetry, then we obtain the LFP analytically and calculate ∥ρ_{π_LF}∥_∞ directly. Otherwise, due to the minimax theorem in the author's previous work [20], the infimum of the operator norm of the Bayes mixture, inf_π ∥ρ_π∥_∞, is easily calculated by finding the representative quantum state of the model, which is defined in Section 3. Thus, we can calculate the PQI of a given model by finding the minimax point (the representative quantum state of the model); to do so, we utilize the algorithm in Section 4 for finding the minimax point of the facility location problem on the unit sphere. We present this procedure in detail in Section 5.4.
The mathematical structure is quite similar to the calculation of channel capacity in classical information theory [34,35]. We emphasize, however, that despite the formal analogy, we introduce no entropic quantity and no concept from information theory in defining PQI. All we have used is an imaginary two-person game, the quantum detection game, and some basic rules of quantum physics. In view of the many works in quantum information theory [36], it is somewhat surprising that purely quantum information can be developed without reference to classical concepts in information theory [37–40].
We also note that PQI is completely different from other kinds of information quantities such as the Fisher information [7,8]. For a parametric model of quantum states ρ(θ), differentiable with respect to the parameter θ, the Fisher information evaluates the change of quantum states, ρ(θ + ∆θ) − ρ(θ). It is related to the distinguishability of two close quantum states ρ(θ) and ρ(θ + ∆θ) from the outcomes of some measurement. Let us take a specific example to see the difference. Suppose that we have a continuous one-parameter model M_rot = {|ϕ(s)⟩ = (cos s)|0⟩ + (sin s)|1⟩ : 0 ≤ s ≤ π/4}. Although quantum Fisher information has been defined in various ways as an extension of classical Fisher information, it is not defined for a discrete model such as M₂ = {|0⟩, |+⟩}. Indeed, for M₂, we only consider distinguishing between two possible states (quantum state discrimination), while for M_rot, we have to consider parameter estimation (quantum state estimation), whose error is bounded by the SLD Fisher information [4–6]. However, PQI yields the same value for both M₂ and M_rot.
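The claim that M₂ and M_rot carry the same PQI can be checked numerically. Assuming, as the Section 4 analysis suggests, that the representative quantum state of both models is the geodesic midpoint ϕ(π/8), the worst-case overlap over each model coincides:

```python
import numpy as np

phi = lambda s: np.array([np.cos(s), np.sin(s)])   # arc states in M_rot
phi_rep = phi(np.pi / 8)                           # candidate representative state

# Worst-case overlap of phi_rep over the full arc M_rot ...
ss = np.linspace(0.0, np.pi / 4, 1001)
worst_rot = min(float(phi_rep @ phi(s)) ** 2 for s in ss)

# ... and over the two-point model M2 = {|0>, |+>}.
worst_m2 = min(float(phi_rep @ phi(s)) ** 2 for s in (0.0, np.pi / 4))
print(worst_rot, worst_m2)   # both cos^2(pi/8) ~ 0.8536
```

The worst case over the arc is attained at the endpoints, so the two models yield the same value.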

Basic Formula for PQI Calculation
We provide several examples below to show how the PQI of a model is calculated. The following formula connects the representative quantum state and PQI; it is obtained from the minimax theorem (10):

inf_π ∥ρ_π∥_∞ = sup_ϕ inf_θ |⟨ϕ|ψ_θ⟩|² = inf_θ |⟨ϕ_rep|ψ_θ⟩|². (16)

Equivalently, we have the formula J(M) = inf_θ |⟨ϕ_rep|ψ_θ⟩|² − 1/d. Using the above formula and the results in Section 4, we obtain the PQI of M₂, M₃, and M₄ (for the definitions, see Section 2.1).
First, we consider the PQI of M₂. We already know the representative quantum state of M₂ from Section 4: the geodesic midpoint of |0⟩ and |+⟩. Thus, using Formula (16), we obtain ∥ρ_{π_LF}∥_∞ = |⟨ϕ_rep|0⟩|² = cos²(π/8) = (2 + √2)/4, and hence J(M₂) = (2 + √2)/4 − 1/2 = √2/4 ≈ 0.354.
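This computation can be verified numerically. Assuming, as the symmetry of M₂ suggests, that the uniform prior is least favorable, the leading eigenvector of the Bayes mixture is the geodesic midpoint, and its overlap with |0⟩ reproduces the operator norm:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket_plus, ket_plus)

lam, vec = np.linalg.eigh(rho)
top_val, top_vec = lam[-1], vec[:, -1]             # leading eigenpair

# The leading eigenvector is the midpoint state (cos pi/8, sin pi/8),
# and ||rho||_inf = |<phi_rep|0>|^2, as Formula (16) predicts.
phi_rep = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])
print(top_val, float(phi_rep @ ket0) ** 2)   # both (2 + sqrt(2))/4 ~ 0.854
```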

Example of PQI Calculation: M FP
Next, as a more nontrivial example, we calculate the PQI of the model M_FP introduced in Section 5.1. First, following the algorithm in Section 4, let us find the minimax point (the representative quantum state of the model M_FP). In Step (1), we find the most distant pair. We mainly work with the inner product between two vectors instead of the geodesic distance between them; the most distant pairs are then those whose inner product is closest to zero. Since |⟨ϕ₁|ϕ_j⟩|² = ε and |⟨ϕ_j|ϕ_k⟩|² = ε² (j, k = 2, 3, 4), with 0 < ε² < ε < 1, the most distant pairs are {ϕ₂, ϕ₃}, {ϕ₂, ϕ₄}, and {ϕ₃, ϕ₄}.
Using Formula (7) in Lemma 2, we obtain the geodesic midpoint of ϕ₂ and ϕ₃, which we denote by ϕ_{M₁}. Comparing the inner products, it is easily seen that ϕ₁ lies farther from ϕ_{M₁} than ϕ₂ and ϕ₃ do. Thus, we go to Step (2) of the algorithm. Note that in the model M_FP, all inner products are real and positive. In Step (2), we find the most distant triplet. In our model, it is enough to consider the circumscribed hypercircle in a real four-dimensional Euclidean space. Due to the symmetry, we only check two triangles, ∆₂₃₄ and ∆₁₂₃, whose vertices are {ϕ₂, ϕ₃, ϕ₄} and {ϕ₁, ϕ₂, ϕ₃}, respectively.
First, let Q be the center of the circumscribed hypercircle of the triangle ∆₂₃₄ (each edge is a geodesic on the sphere). Generally, the point Q is not uniquely determined. However, by imposing the condition that Q lies in the three-dimensional real subspace L₂₃₄ = span_R{ϕ₂, ϕ₃, ϕ₄}, the point Q is uniquely determined as the point achieving the minimum distance from each vertex (the radius of the circumscribed hypercircle). The condition is equivalent to an orthogonality condition, i.e., ψ ∈ L₂₃₄ ⟺ ⟨ψ|ϕ_L⟩ = 0, where |ϕ_L⟩ denotes a normal vector of L₂₃₄. Now let ϕ_Q = (x, y, z, w)ᵀ, x, y, z, w > 0, be a vector corresponding to Q. Then it satisfies ⟨ϕ_Q|ϕ₂⟩ = ⟨ϕ_Q|ϕ₃⟩ = ⟨ϕ_Q|ϕ₄⟩, ⟨ϕ_Q|ϕ_L⟩ = 0, and ∥ϕ_Q∥₂² = 1, where ∥·∥₂ denotes the Euclidean norm, and we obtain the solution ϕ_Q. Next, we investigate the other circumscribed hypercircle, that of the triangle ∆₁₂₃. In a similar way, we define the point R for ∆₁₂₃ and the corresponding state vector ϕ_R. Let the radii of the two circumscribed hypercircles be r₁₂₃ and r₂₃₄, respectively. Finally, we check whether the circumscribed hypercircle with center ϕ_Q and radius r₂₃₄ includes the remaining point ϕ₁ (if not, we go to Step (3) in the algorithm). Under our assumption on ε, it does, which implies that ϕ₁ is closer to the point ϕ_Q than the other three points. Thus, the algorithm stops and ϕ_Q is the minimax location.
Using ϕ_Q, we obtain V_L = (1 + 2ε)/3. Due to Equation (16), V_L agrees with the infimum of the detection rate, ∥ρ_{π_LF}∥_∞, and we obtain the PQI J(M_FP) = (1 + 2ε)/3 − 1/4 = (1 + 8ε)/12.

PQI Calculation from LFP
Now let us find the LFP in this model. Since the model M FP has a certain symmetry, we obtain it directly.
Even when we do not find the representative quantum state directly, we can construct it from the LFP in the following way. Since the Bayes mixture with respect to the LFP is given by Equation (17), we find the first eigenvector ψ_{π*} of ρ_{π*} with the maximum eigenvalue (no degeneracy). This eigenvector agrees with the representative quantum state ϕ_Q in Section 5.4. Through the minimax theorem [20], we can also show directly that the norm (18) achieves the minimum. Since V_U ≥ V_L (for details, see the author's previous work [20]), we obtain ∥ρ_{π*}∥_∞ = (1 + 2ε)/3 = inf_π ∥ρ_π∥_∞. Thus, one LFP is given by (19). The above argument does not exclude the possibility of another LFP with π₁ > 0.

Difference from Maximization of von Neumann Entropy
In our formulation, PQI has no direct relation to any entropic concept. Since some readers may expect a certain relationship, let us see what happens if we formally adopt the von Neumann entropy to obtain the LFP in the last example. We consider the maximization of S(ρ_π) over the prior π. The concavity of S(ρ) yields S(λ|ϕ₁⟩⟨ϕ₁| + (1 − λ)ρ_{π*}) ≥ (1 − λ)S(ρ_{π*}), λ ∈ [0, 1], where π* denotes the LFP (19). For some ε, we numerically find a positive λ* achieving the maximum of S(λ|ϕ₁⟩⟨ϕ₁| + (1 − λ)ρ_{π*}). Thus, a positive weight on |ϕ₁⟩⟨ϕ₁| can appear under the maximization, which is clearly different from our result.
While the LFP π*, obtained by minimization of ∥ρ_π∥_∞, yields the minimax solution of the quantum detection game, the entropy maximizer, say π_ent, is meaningless, at least in this example. Indeed, ∥ρ_{π_ent}∥_∞ > ∥ρ_{π*}∥_∞, which implies that ρ_{π_ent} is more informative to Bob than ρ_{π*}; the prior π_ent is no longer least favorable to Bob.
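The divergence of the two criteria is easy to reproduce numerically. The following sketch uses a hypothetical qutrit model, not the paper's M_FP: a close pair (ϕ₂, ϕ₃) plus a state ϕ₁ orthogonal to both, with the prior restricted to a symmetric one-parameter family for simplicity:

```python
import numpy as np

# Hypothetical qutrit model: a close pair (phi2, phi3) and an orthogonal phi1.
a = np.pi / 3
phi1 = np.array([0.0, 0.0, 1.0])
phi2 = np.array([1.0, 0.0, 0.0])
phi3 = np.array([np.cos(a), np.sin(a), 0.0])
proj = lambda v: np.outer(v, v)

def mixture(p1):
    # weight p1 on phi1, symmetric split of the rest over the close pair
    return p1 * proj(phi1) + (1 - p1) / 2 * (proj(phi2) + proj(phi3))

def vn_entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

grid = np.linspace(0.0, 1.0, 10001)
p_lf = min(grid, key=lambda p: np.linalg.eigvalsh(mixture(p)).max())
p_ent = max(grid, key=lambda p: vn_entropy(mixture(p)))
print(p_lf, p_ent)   # ~0.4286 (= 3/7) vs ~0.363: different priors
```

The entropy maximizer places a different weight on ϕ₁ than the norm minimizer, and its Bayes mixture indeed has a larger operator norm.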
We have carefully treated the information, or uncertainty, of a nonorthogonal pure state model, excluding classical fluctuations. As a consequence, the remaining uncertainty is no longer evaluated by the usual entropy. For a nonorthogonal pure state model, the von Neumann entropy as a measure of information lacks theoretical justification; at least in the quantum detection game, the method based on the von Neumann entropy is a merely formal extension. It makes sense only for a model that consists of orthogonal pure states (see Section 2).
However, there are many variants of entropy [37][38][39][40][41] both in classical and quantum information theory. We discuss a certain relationship between our information quantity and the minimum entropy in the next section.

Discussion: Relation to Entropy
In the previous section, we introduced an information quantity for a pure state model called PQI. Under the full-rank condition, any classical model consists of an orthonormal basis, and then the PQI of the model necessarily vanishes.
We emphasize that PQI is literally purely quantum: it is not a formal extension of anything in classical information theory. It is an information quantity completely independent of the concept of entropy, which does make sense in classical information theory. Thus, a natural question arises: what kind of relationship do entropy and PQI have? In fact, PQI is related to the minimum entropy rather than the von Neumann entropy, as we discuss below.

Jaynes Principle and Distinguishability
First, we briefly review the concept of entropy and the Jaynes principle [42,43]. Suppose that we are given a set of alphabet symbols. Then our lack of knowledge about the set is evaluated by the Shannon entropy through a probability distribution {p_i} satisfying p₁ + ⋯ + p_d = 1, p_i ≥ 0, i = 1, …, d. (Recall that the classical Shannon entropy is defined as S_cl(p) = −∑_i p_i log p_i.) The larger the entropy, the larger our uncertainty.
Minimum information corresponds to the maximum entropy state, that is, p_i = 1/d, for which max_p S_cl(p) = log d holds. This central idea also provides the theoretical foundation for maximum entropy methods in data processing [44,45].
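A quick numerical illustration (the dimension d = 4 and the random sampling are illustrative choices):

```python
import numpy as np

def shannon(p):
    """Classical Shannon entropy S_cl(p) = -sum_i p_i log p_i (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Random distributions on d = 4 symbols never exceed the uniform one.
rng = np.random.default_rng(0)
d = 4
best_random = max(shannon(rng.dirichlet(np.ones(d))) for _ in range(1000))
print(best_random, shannon(np.full(d, 1 / d)))   # second value is log 4
```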
The underlying concept is distinguishability. In classical information theory, distinguishability holds trivially. In quantum theory, it is represented by the orthogonality of two quantum states. When pure states corresponding to alphabets, say, {1, 2, . . . , d}, are orthogonal to each other, every result in classical information theory is extended in a straightforward manner.
In statistical physics, a physical state of an ensemble is estimated through entropy maximization when we have no knowledge of the system. This way of thinking is called the Jaynes principle [42,43], and it is fundamental to statistical physics. For example, for a given set of eigenstates of a Hamiltonian, say, |E 0 , |E 1 , . . . , with some conditions, we obtain a canonical ensemble by using the principle.
In quantum physics, we may consider the maximization of the von Neumann entropy of the density matrix ρ_π = ∑_j π_j |ψ_j⟩⟨ψ_j| (the Bayes mixture) for orthogonal vectors {ψ₁, …, ψ_d}. Since S(ρ_π) = S_cl(π), this maximization reduces completely to the classical case, and the maximizer is the completely mixed state, (1/d)I, which corresponds to the uniform distribution. Formally, additional constraints also yield a quantum exponential family [46], the quantum analogue of the classical exponential family [47,48]. However, we have no solid criterion such as the Jaynes principle for a nonorthogonal pure state model. For example, consider a qubit |0⟩ processed by one unitary operation, which is assumed to be among U₁, U₂, U₃ (|ψ_j⟩ = U_j|0⟩, j = 1, 2, 3). In a sense, this is a simplified rotation error model (e.g., Kueng et al. [2]). In our formulation, a model M_U = {ψ₁, ψ₂, ψ₃} is given. Suppose that we have no information on which unitary gate processed the qubit. Then, how do we describe the qubit?
Mathematically, it is possible to extend the maximum entropy criterion to the noncommutative case. Then we consider the maximization, sup π S(ρ π ) (20) over the prior π. Is this kind of formal extension enough in quantum information theory?
There are many quantities, such as Rényi entropy [37–41], in both classical and quantum information theory. Is there another possibility among such quantities? The quantum states in the model M_U are not mutually orthogonal; thus, they are not distinguishable, which is completely different from a set of alphabet symbols. In spite of this, should we seek a justification for entropy maximization from classical information theory?
In our formulation, the above formal argument breaks down. First, for the model M_U, we describe the system by the representative quantum state ϕ_rep(M_U), which is completely independent of the von Neumann entropy. Second, in the quantum detection game between Alice and Bob, the von Neumann entropy proves useless in a specific example (Section 5.6). Rather, we consider the least favorable case, i.e., the minimization of the detection rate, inf_π ∥ρ_π∥_∞, which contrasts with the maximization of entropy (20).
If we seek a purely quantum counterpart of the Jaynes principle, then the minimization of ∥ρ_π∥_∞ would be promising. Conveniently, due to monotonicity, this minimization is equivalent to the maximization of −log∥ρ_π∥_∞, where the function −log∥ρ∥_∞ is known as the minimum entropy of ρ. Some of its properties are similar to those of the von Neumann entropy and others are not. In the next subsection, we review basic properties of the minimum entropy.

Properties of the Minimum Entropy
In the present subsection, we briefly review basic properties of the minimum entropy and then give another definition of purely quantum information. The minimum entropy of a density matrix ρ is defined by T(ρ) := −log∥ρ∥_∞, which is a special case of the quantum Rényi entropy.
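A minimal sketch of the definition, with an arbitrary diagonal density matrix as an illustrative choice; the general inequality T(ρ) ≤ S(ρ) is visible in the output:

```python
import numpy as np

def min_entropy(rho):
    """T(rho) = -log ||rho||_inf: the alpha -> infinity Renyi entropy."""
    return float(-np.log(np.linalg.eigvalsh(rho).max()))

def von_neumann(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

rho = np.diag([0.5, 0.3, 0.2])
print(min_entropy(rho), von_neumann(rho))   # 0.693... <= 1.029...
```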
However, concavity does not necessarily hold. Concavity of an entropy means that a probability mixture of quantum states increases the uncertainty of the whole system. This negative property is not due to noncommutativity. To see this, take two commuting density matrices, for example, ρ₁ = diag(1/2, 1/2) and ρ₂ = diag(1, 0). Then T(ρ₁) = log 2 and T(ρ₂) = 0, while T((ρ₁ + ρ₂)/2) = log(4/3) < (1/2) log 2. Thus, convexity rather than concavity holds in this example.
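A numerical sketch of this failure of concavity; the specific commuting pair below is an illustrative choice, not necessarily the one used in the text:

```python
import numpy as np

T = lambda rho: float(-np.log(np.linalg.eigvalsh(rho).max()))

# A commuting (diagonal) pair for which concavity of T fails.
rho1 = np.diag([0.5, 0.5])            # T(rho1) = log 2
rho2 = np.diag([1.0, 0.0])            # T(rho2) = 0
mix = 0.5 * rho1 + 0.5 * rho2         # diag(3/4, 1/4)
print(T(mix), 0.5 * T(rho1) + 0.5 * T(rho2))
# T(mix) = log(4/3) ~ 0.288 lies below the chord value ~ 0.347
```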
Since the minimum entropy is based on the operator norm, a sufficient condition for convexity is easily obtained.
In view of the above lemma, if we were to introduce T(ρ) as a variant of entropy over the whole set of density operators, its theoretical significance would seem very weak.
However, when we consider the PQI of a pure state model, the situation changes drastically: for a pure state family, concavity of the minimum entropy necessarily holds in the following sense. Lemma 6. Let a set of pure states be given, say M. Then concavity of T holds restricted to the model.

Proof.
Choose a finite set of pure states from M, say ρ₁, …, ρ_k, and a prior (π₁, …, π_k). Since T(ρ_j) = 0 for every pure state ρ_j and T(ρ) ≥ 0 for every density matrix ρ, we have T(∑_j π_j ρ_j) ≥ 0 = ∑_j π_j T(ρ_j). Other properties of the minimum entropy are usually shown in the context of quantum Rényi entropy (see, e.g., Hu and Ye [22] (Section III, p. 4) and Dam and Hayden [21]).
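The proof can be illustrated numerically: for pure states, the chord ∑_j π_j T(ρ_j) vanishes, so concavity reduces to the nonnegativity of T. The random states and weights below are illustrative choices:

```python
import numpy as np

T = lambda rho: float(-np.log(np.linalg.eigvalsh(rho).max()))

# Pure states have T(rho_j) = 0, so the chord sum_j pi_j T(rho_j) is zero,
# while T(rho_pi) >= 0 always: concavity restricted to pure states.
rng = np.random.default_rng(1)
kets = [v / np.linalg.norm(v) for v in rng.normal(size=(3, 4))]
weights = rng.dirichlet(np.ones(3))
rho_pi = sum(w * np.outer(k, k) for w, k in zip(weights, kets))
print([T(np.outer(k, k)) for k in kets], T(rho_pi))
```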
Observing the above, we provide another possible definition of purely quantum information through T(ρ). In the quantum detection game, finding the LFP is equivalent to maximizing the minimum entropy T(ρ_π) rather than the von Neumann entropy S(ρ_π). Taking the logarithm of the detection rate, we obtain another definition of purely quantum information, J′(M) := log d − sup_π T(ρ_π) = log d + log inf_π ∥ρ_π∥_∞. By definition, J′(M) vanishes if the model consists of orthogonal pure states with the full-rank condition, which includes the classical case.
From Lemmas 5 and 6, we must treat the minimum entropy T(ρ) carefully in the definition of J′(M); at the least, it should not be considered over the set of all density matrices. As a consequence, a comparison of the two definitions J(M) and J′(M) should also be performed carefully and would require a deeper understanding of the model information, which will be a topic of future research.
Finally, we make two comments. First, our definition of the model information gives one operational meaning to the minimum entropy. It stands apart from the usual extension of entropic concepts from classical information theory; rather, it comes from an imaginary design of a quantum detector and the facility location problem on the unit sphere in complex projective space. Second, we expect that a purely quantum version of the Jaynes principle can be established based on the minimum entropy (for related work on maximum entropy methods, see reference [49]). It might then be possible to develop data processing methods and some dynamics based on the new principle.

Infinite-Dimensional Hilbert Space
Thus far, we have considered the PQI of a model only in a finite-dimensional Hilbert space. While our definition of PQI applies to an infinite-dimensional Hilbert space, technical difficulties seem to arise for a parametric family of functions. In this section, we only sketch them through a specific example.
Let L²(R) denote the set of square-integrable complex functions over R and let g be a known continuous function in L²(R) satisfying ∥g∥₂² = ∫|g(x)|² dx = 1. Let us consider the quantum statistical model M_∞ = {g(x − θ) : θ ∈ R}, describing a wavefunction with a single shift parameter.
Parameter estimation for the shift parameter θ has been investigated theoretically [3]. If we replace the wavefunction g with a probability density function such as the Gaussian density, the estimation problem for the shift parameter θ becomes that of a location parameter, which is very common in classical statistics [50].
Before evaluating the PQI of the model M_∞, let us first formally consider quantum state estimation with no observation. It is seen that the worst-case error equals one for every estimate (Lemma 7).
For proof, see the Appendix A.1. The above lemma says that every quantum state in L 2 (R) would be a minimax location on "CP ∞ ".
Since the parameter space Θ = R is noncompact, the minimax theorem [20] does not hold in general. However, we can directly show that Formula (16) holds in this specific example, that is, inf_π ∥ρ_π∥_∞ = 0 = sup_ψ inf_θ |⟨ψ|g_θ⟩|². The first equality holds due to the following lemma; because of technical difficulties, we give the proof in Appendix A.2. Lemma 8. Let g_θ(x) = g(x − θ) with ∥g_θ∥₂ = 1, and let ε be an arbitrary positive constant. Then there exist a finite set {θ₁, …, θ_n} and the uniform prior π_n over the set, ρ_{π_n} = (1/n) ∑_{j=1}^n |g_{θ_j}⟩⟨g_{θ_j}|, such that ∥ρ_{π_n}∥_∞ ≤ 1/n + ε.
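Lemma 8 can be illustrated on a discretized line. Assuming a Gaussian wavefunction (an illustrative choice) and shifts separated far beyond its width, the nonzero spectrum of ρ_{π_n} equals that of G/n for the Gram matrix G, and the operator norm approaches 1/n:

```python
import numpy as np

# Discretised sketch of Lemma 8: n widely separated shifts of a Gaussian
# wavefunction g give a Bayes mixture whose operator norm is close to 1/n.
x = np.linspace(-100.0, 100.0, 10001)
dx = x[1] - x[0]
g = lambda th: (2 * np.pi) ** (-0.25) * np.exp(-((x - th) ** 2) / 4.0)

norms = []
for n in (2, 4, 8):
    thetas = (np.arange(n) - (n - 1) / 2) * 10.0    # separation 10 >> width
    V = np.stack([g(t) for t in thetas])            # row j samples g_{theta_j}
    G = (V @ V.T) * dx                              # Gram matrix <g_i|g_j>
    norms.append((n, float(np.linalg.eigvalsh(G / n).max())))
print(norms)   # operator norm ~ 1/n for each n
```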
Thus, the formal definition of PQI gives J(M_∞) = inf_π ∥ρ_π∥_∞ = 0. We can interpret the result as follows: even if Bob knows that the quantum state is in the model M_∞, i.e., that the quantum system is described by a wavefunction g(x − θ), he obtains no information that gives him an advantage over Alice in the quantum detection game.
We have not obtained conditions under which the PQI is positive, together with explicit examples. Even if Formula (16) holds under some conditions, the calculation of the PQI would become drastically different. A detailed investigation is left for future study.

Concluding Remarks
We have defined an information quantity called purely quantum information (PQI), not for a pure state itself, but for a parametric model of pure states. While PQI evaluates the size of a pure state model, it necessarily vanishes in classical cases by definition. We call the center of the model the representative quantum state, and the PQI is determined by the maximum distance from the center to the quantum states in the model.
Finally, we give the answer to the problem presented at the beginning of the article. Let ψ_θ = e^{−iθ₂Z} e^{−iθ₁Y}|0⟩ and calculate the PQI for the two models M₅ and M₆. Model M₅ has the same PQI as M₄, J(M₅) = √3/6 ≈ 0.289. The PQI of the model M₆ is J(M₆) = (1/2)cos(2π/5) ≈ 0.154, which is smaller than J(M₅). This implies that M₆ spreads more than M₅, and we see that the PQI is independent of the dimension of the parameter space.

Conflicts of Interest:
The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Proofs
Appendix A.1. Proof of Lemma 7. For ε > 0, we show that there exists M > 0 such that |θ| ≥ M ⇒ |⟨g_θ|f⟩| ≤ 2√ε. (A1) First, we take compact sets K and L satisfying ∫_K |f|² dx > 1 − ε and ∫_L |g|² dx > 1 − ε, respectively. Second, there exists a positive constant M such that |θ| ≥ M ⇒ L + θ = {x + θ : x ∈ L} and K are disjoint. Note that ∫_{L+θ} |g_θ|² dx > 1 − ε for every θ. Then, we bound the absolute value of the inner product by two terms: |⟨g_θ|f⟩| ≤ ∫_K |g_θ f| dx + ∫_{∼K} |g_θ f| dx, where ∼K denotes the complement of K. Due to the Cauchy–Schwarz inequality, the second term is bounded by ∥g_θ∥₂ (∫_{∼K} |f|² dx)^{1/2} ≤ √ε. The first term requires more steps. Since K ⊂ ∼(L + θ), ∫_K |g_θ f| dx ≤ (∫_{∼(L+θ)} |g_θ|² dx)^{1/2} ∥f∥₂ ≤ √ε. Putting them together, we obtain (A1). Therefore |⟨g_θ|f⟩| → 0 as |θ| → ∞, and we obtain (21) and (22).
Using the above, we construct a sequence of prior distributions with finite support (i.e., discrete probabilities) and the corresponding Bayes mixtures. First, for fixed n, we take a parameter set {θ₁, …, θ_n} satisfying |θ_i − θ_j| ≥ M for all i ≠ j.
Without loss of generality, we assume that g_{θ₁}, …, g_{θ_n} are linearly independent. Set G_ij = ⟨g_{θ_i}|g_{θ_j}⟩ (so G_ii = 1). It is easily shown that the Gram matrix G = [G_ij] is positive definite. We then decompose G as G = I_n + A, where A = G − I_n has zero diagonal.

Now we show that

∥ρ_{π_n}∥_∞ ≤ (1/n)(1 + ∥A∥_∞), (A2)
∥A∥_∞ ≤ (n − 1)ε. (A3)

First, we expand an arbitrary normalized vector ψ as ψ = ∑_j c_j |g_{θ_j}⟩. Then 1 = ⟨ψ|ψ⟩ = c†Gc, where c = (c₁, …, c_n)ᵀ is a column vector and c† denotes its conjugate transpose. Since G is positive definite, we take another parameter vector d as d = √G c; note that ∥d∥₂² = d†d = c†Gc = 1 and that there is a one-to-one correspondence between ψ and d. Thus, ⟨ψ|ρ_{π_n}|ψ⟩ = (1/n) ∑_i |⟨g_{θ_i}|ψ⟩|² = (1/n) c†G²c = (1/n) d†Gd = (1/n)(1 + d†Ad). This implies that ⟨ψ|ρ_{π_n}|ψ⟩ ≤ (1/n)(1 + ∥A∥_∞), which shows Equation (A2). Next, we show the inequality (A3). Due to Geršgorin's theorem (see, e.g., Section 6.1 in Horn and Johnson [51]), all eigenvalues of A are located in the union of n discs centered at the diagonal entries A_ii = 0 with radii ∑_{j≠i} |A_ij|. Since |A_ij| ≤ ε for i ≠ j, the absolute value of each eigenvalue is bounded by (n − 1)ε.
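The Geršgorin bound used for (A3) can be checked numerically; the random Hermitian matrix below, with zero diagonal and off-diagonal entries bounded by ε, is an illustrative stand-in for A = G − I_n:

```python
import numpy as np

# Gersgorin check: if A_ii = 0 and |A_ij| <= eps off the diagonal, then
# every eigenvalue of A has absolute value at most (n - 1) * eps.
rng = np.random.default_rng(2)
n, eps = 5, 0.1
A = rng.uniform(-eps, eps, size=(n, n))
A = (A + A.T) / 2                      # Hermitian, like G - I_n
np.fill_diagonal(A, 0.0)
radius = np.abs(A).sum(axis=1).max()   # largest Gersgorin radius
max_abs_eig = np.abs(np.linalg.eigvalsh(A)).max()
print(max_abs_eig, radius, (n - 1) * eps)
```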