Information Geometry of Randomized Quantum State Tomography

Suppose that a $d$-dimensional Hilbert space $H \simeq \mathbb{C}^d$ admits a full set of mutually unbiased bases $\{|1^{(a)}\rangle, \dots, |d^{(a)}\rangle\}$, where $a = 1, \dots, d+1$. A randomized quantum state tomography is a scheme for estimating an unknown quantum state on $H$ through iterative applications of the measurements $M^{(a)} = \{|1^{(a)}\rangle\langle 1^{(a)}|, \dots, |d^{(a)}\rangle\langle d^{(a)}|\}$ for $a = 1, \dots, d+1$, where the numbers of applications of these measurements are random variables. We show that the space of the resulting probability distributions enjoys a mutually orthogonal dualistic foliation structure, which provides us with a simple geometrical insight into the maximum likelihood method for the quantum state tomography.


Introduction
Quantum state tomography is a method of estimating an unknown quantum state represented on some Hilbert space H, consisting of a fixed set of measurements that provides sufficient information about the unknown quantum state, as well as a data processing that maps each measurement outcome into the quantum state space S(H) on H [1]. A set of measurements that fulfils this requirement is sometimes called a measurement basis. For mathematical simplicity, we restrict ourselves to Hilbert spaces of finite dimensions.
To elucidate our motivation, let us treat the simplest case when $H \simeq \mathbb{C}^2$. It is well known that there is a one-to-one affine correspondence between the qubit state space $S(\mathbb{C}^2) := \{\rho \in \mathbb{C}^{2\times 2} \mid \rho \ge 0,\ \mathrm{Tr}\,\rho = 1\}$ and the unit ball (called the Bloch ball)
$$B := \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 \mid x_1^2 + x_2^2 + x_3^2 \le 1\}.$$
In fact, the correspondence is explicitly given by the Stokes parametrization
$$\rho_x = \frac{1}{2}\left(I + x_1\sigma_1 + x_2\sigma_2 + x_3\sigma_3\right),$$
where $\sigma_1$, $\sigma_2$, and $\sigma_3$ are the standard Pauli matrices. Since $E_{\rho_x}[\sigma_i] := \mathrm{Tr}\,\rho_x\sigma_i = x_i$ for $i \in \{1, 2, 3\}$, the set $\sigma = (\sigma_1, \sigma_2, \sigma_3)$ of observables is regarded as an unbiased estimator [2-4] for the Stokes parameter $x = (x_1, x_2, x_3)$. This is the basic idea behind the standard qubit state tomography, which runs as follows: suppose that, among $N$ independent experiments, the $i$th Pauli matrix $\sigma_i$ was measured $N/3$ times, and outcomes $+1$ (spin-up) and $-1$ (spin-down) were obtained $n_i^+$ and $n_i^-$ times, respectively. Then a naive estimate for the true value of the parameter $x = (x_1, x_2, x_3)$ is
$$\hat{x} = (\hat{x}_1, \hat{x}_2, \hat{x}_3), \qquad \hat{x}_i := \frac{n_i^+ - n_i^-}{N/3}.$$
When the estimate $\hat{x} \in [-1, 1]^3$ falls outside the Bloch ball $B$, it needs to be corrected so that the new estimate lies in $B$. The maximum likelihood method is a canonical way to obtain a corrected estimate [2,5-10]. From the point of view of information geometry [11-13], the maximum likelihood estimate (MLE) is the orthogonal projection from the temporary estimate $\hat{x}$ onto the Bloch ball $B$ with respect to the standard Fisher metric along the $\nabla^{(m)}$-geodesic [14] (cf. Appendix A).
Now let us deal with a slightly generalized situation: suppose that the $i$th Pauli matrix $\sigma_i$ was measured $N_i$ times and outcomes $+1$ and $-1$ were obtained $n_i^+$ and $n_i^-$ times, respectively, where $\{N_i\}_{i=1,2,3}$ were random variables. Such a situation arises in an actual experiment due to unexpected particle loss [15]. We shall call such a generalized estimation scheme a randomized state tomography.
A naive estimate in this case is the following:
$$\hat{x} = (\hat{x}_1, \hat{x}_2, \hat{x}_3), \qquad \hat{x}_i := \frac{n_i^+ - n_i^-}{N_i}.$$
One may invoke the maximum likelihood method when $\hat{x}$ falls outside the Bloch ball. It is then interesting to ask whether there is also a useful geometrical picture for the MLE even when the numbers $N_i$ of measurements are random variables.
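The naive estimate for random per-axis counts can be sketched in a few lines. This is only an illustration under the assumption that the estimate is the difference of the up and down counts divided by the number of times that axis was actually measured; the function names and the sample counts are hypothetical and not taken from the text.

```python
from math import sqrt


def naive_stokes_estimate(counts):
    """Naive estimate of the Stokes parameter from per-axis counts.

    counts: three (n_plus, n_minus) pairs, one per Pauli axis.
    The number of shots N_i = n_plus + n_minus may differ between axes.
    """
    estimate = []
    for n_plus, n_minus in counts:
        n_axis = n_plus + n_minus
        if n_axis == 0:
            raise ValueError("each axis needs at least one recorded outcome")
        estimate.append((n_plus - n_minus) / n_axis)
    return tuple(estimate)


def in_bloch_ball(x):
    """True when the point x lies in the closed unit ball of R^3."""
    return sqrt(sum(c * c for c in x)) <= 1.0


# Hypothetical data: unequal numbers of shots per axis.
inside = naive_stokes_estimate([(8, 2), (5, 5), (3, 1)])
outside = naive_stokes_estimate([(10, 0), (9, 1), (4, 0)])
print(inside, in_bloch_ball(inside))
print(outside, in_bloch_ball(outside))
```

The second data set yields an estimate outside the Bloch ball, which is the situation where a correction such as the maximum likelihood method is needed.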
The above-mentioned problem is naturally extended to quantum state tomography on an arbitrary Hilbert space that admits a full set of mutually unbiased bases [16,17]. A family of orthonormal bases $\{|\alpha^{(a)}\rangle\}_{\alpha=1}^{d}$, $a \in \{1, \dots, k\}$, of a $d$-dimensional Hilbert space $H$ is called mutually unbiased if
$$\left|\langle \alpha^{(a)} | \beta^{(b)} \rangle\right|^2 = \frac{1}{d}$$
for all $a, b \in \{1, \dots, k\}$ with $a \ne b$, and $\alpha, \beta \in \{1, \dots, d\}$. It is known that the number $k$ of mutually unbiased bases (MUBs) is at most $d + 1$ [18]. If there are $d + 1$ MUBs, the Hilbert space $H$ is said to admit a full set of MUBs. For example, when the dimension $d$ of $H$ is a power of a prime, $H$ admits a full set of MUBs [19]. Whether or not every Hilbert space admits a full set of MUBs is an open question [16].
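The defining condition of mutual unbiasedness is easy to test numerically. The sketch below assumes a well-known quadratic-phase construction of $d + 1$ bases for an odd prime $d$; this construction, and all function names, are illustrative assumptions and are not taken from the text.

```python
import cmath
from math import isclose, sqrt


def quadratic_phase_mubs(d):
    """Candidate full set of MUBs on C^d (assumed valid for odd prime d).

    Returns d + 1 bases: the standard basis, plus d bases whose vectors
    have components omega**(j*m + k*m*m) / sqrt(d).
    """
    omega = cmath.exp(2j * cmath.pi / d)
    standard = [[1 + 0j if m == j else 0j for m in range(d)] for j in range(d)]
    bases = [standard]
    for k in range(d):
        bases.append([[omega ** ((j * m + k * m * m) % d) / sqrt(d)
                       for m in range(d)] for j in range(d)])
    return bases


def overlap_sq(u, v):
    """Squared modulus of the inner product <u|v>."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v))) ** 2


def is_full_mub_set(bases, tol=1e-9):
    """Check orthonormality within bases and overlap 1/d across bases."""
    d = len(bases[0])
    for a, basis_a in enumerate(bases):
        for b, basis_b in enumerate(bases):
            for i, u in enumerate(basis_a):
                for j, v in enumerate(basis_b):
                    if a == b:
                        expected = 1.0 if i == j else 0.0
                    else:
                        expected = 1.0 / d
                    if not isclose(overlap_sq(u, v), expected, abs_tol=tol):
                        return False
    return len(bases) == d + 1


print(is_full_mub_set(quadratic_phase_mubs(3)))
print(is_full_mub_set(quadratic_phase_mubs(4)))
```

The same construction applied to $d = 4$ fails the check, which is consistent with the fact that prime-power dimensions that are not prime require a different construction.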
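In what follows, unless otherwise stated, we assume that the Hilbert space $H \simeq \mathbb{C}^d$ under consideration admits a full set of MUBs. As demonstrated in Appendix B (cf. [17,20]), each density operator $\rho \in S(H)$ can be uniquely represented as
$$\rho(\xi) = \frac{1}{d} I + \sum_{a=1}^{d+1} \sum_{\alpha=1}^{d-1} \xi^{(a)}_{\alpha} \left( |\alpha^{(a)}\rangle\langle\alpha^{(a)}| - |d^{(a)}\rangle\langle d^{(a)}| \right), \tag{1}$$
where $\xi = (\xi^{(a)}_{\alpha})_{1 \le a \le d+1,\ 1 \le \alpha \le d-1}$ is a $(d^2 - 1)$-dimensional real parameter that is chosen so that $\rho(\xi) \ge 0$. A simple calculation shows that, if the $a$th measurement $M^{(a)}$ is applied to the state $\rho(\xi)$, one obtains each outcome $\alpha \in \{1, \dots, d\}$ with probability
$$p^{(a)}_{\alpha}(\xi) := \mathrm{Tr}\,\rho(\xi)\,|\alpha^{(a)}\rangle\langle\alpha^{(a)}| = \begin{cases} \dfrac{1}{d} + \xi^{(a)}_{\alpha}, & \alpha \in \{1, \dots, d-1\}, \\[2mm] \dfrac{1}{d} - \displaystyle\sum_{\beta=1}^{d-1} \xi^{(a)}_{\beta}, & \alpha = d. \end{cases} \tag{2}$$
This implies that the parametrization $\xi \mapsto \rho(\xi)$ establishes an affine isomorphism between the quantum state space $S(H)$ and the convex set $B := \{\xi \in \mathbb{R}^{d^2-1} \mid \rho(\xi) \ge 0\}$, which we call the physical domain.
Incidentally, the Stokes parametrization $x \mapsto \rho_x$ for the qubit state space $S(\mathbb{C}^2)$ is regarded as a special case of the above parametrization $\xi \mapsto \rho(\xi)$ for $S(\mathbb{C}^d)$. In fact, the eigenvectors of the Pauli matrices $\sigma_1, \sigma_2, \sigma_3$ form a full set of MUBs on $\mathbb{C}^2$, and the Stokes parameter is related to $\xi$ by $x_a = 2\xi^{(a)}_1$.
Now that a standard affine parametrization $\xi \mapsto \rho(\xi)$ has been established on an arbitrary Hilbert space $H \simeq \mathbb{C}^d$ that admits a full set of MUBs, the scheme of randomized state tomography is naturally extended to $H$ as follows. Suppose that the $a$th measurement $M^{(a)}$ was applied $N^{(a)}$ times and the outcome $\alpha \in \{1, \dots, d\}$ was obtained $n^{(a)}_{\alpha}$ times, where $\{N^{(a)}\}_{a=1,\dots,d+1}$ were random variables.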
Then, due to (2), a naive estimate for the parameter $\xi$ is
$$\hat{\xi}^{(a)}_{\alpha} := \frac{n^{(a)}_{\alpha}}{N^{(a)}} - \frac{1}{d}.$$
The objective of the present paper is to clarify that the $\nabla^{(m)}$-projection interpretation for the MLE is still valid for the randomized state tomography, provided that the standard Fisher metric is replaced by a deformed one depending on the realization of the random variables $N^{(a)}$, which might as well be called a randomized Fisher metric. Such a novel geometrical picture will provide important insights into quantum metrology.
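The link between the parameter $\xi$ and the outcome probabilities can be checked numerically for a qubit. The sketch below assumes that $\rho(\xi)$ is the maximally mixed state plus $\xi^{(a)}_\alpha$ times the difference between the $\alpha$th and the last projector of the $a$th basis, so that each outcome probability is $1/d + \xi^{(a)}_\alpha$; the helper names and the numerical values are hypothetical.

```python
from math import isclose, sqrt

R = 1 / sqrt(2)
# Eigenbases of the Pauli matrices, each ordered as (eigenvalue +1, eigenvalue -1).
PAULI_MUBS = [
    [[R, R], [R, -R]],            # sigma_1
    [[R, 1j * R], [R, -1j * R]],  # sigma_2
    [[1, 0], [0, 1]],             # sigma_3
]


def projector(v):
    """Matrix of the rank-one projector |v><v|."""
    n = len(v)
    return [[v[i] * complex(v[j]).conjugate() for j in range(n)] for i in range(n)]


def rho_from_xi(bases, xi):
    """Density matrix built from xi under the assumed parametrization."""
    d = len(bases[0])
    rho = [[(1 / d if i == j else 0) + 0j for j in range(d)] for i in range(d)]
    for basis, xi_a in zip(bases, xi):
        last = projector(basis[d - 1])
        for alpha, coeff in enumerate(xi_a):
            proj = projector(basis[alpha])
            for i in range(d):
                for j in range(d):
                    rho[i][j] += coeff * (proj[i][j] - last[i][j])
    return rho


def born_probability(rho, v):
    """Probability <v|rho|v> of the outcome associated with the vector v."""
    n = len(v)
    return sum(complex(v[i]).conjugate() * rho[i][j] * v[j]
               for i in range(n) for j in range(n)).real


def naive_xi(counts):
    """Naive estimate n/N - 1/d; counts[a] lists the d outcome counts of basis a."""
    d = len(counts[0])
    return [[n / sum(row) - 1 / d for n in row[:-1]] for row in counts]


xi = [[0.10], [-0.20], [0.25]]
rho = rho_from_xi(PAULI_MUBS, xi)
probs = [[born_probability(rho, v) for v in basis] for basis in PAULI_MUBS]
print(probs)
print(naive_xi([[6, 4], [3, 7], [9, 3]]))
```

The printed probabilities are $1/2 + \xi^{(a)}_1$ and $1/2 - \xi^{(a)}_1$ for each basis, and the hypothetical counts reproduce the same $\xi$ as a naive estimate even though each basis was measured a different number of times.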
The paper is organized as follows. In Section 2, we first introduce a statistical model on an extended sample space $\Omega$ that represents the randomized state tomography. We then clarify that the probability simplex $P(\Omega)$ is decomposed into a mutually orthogonal dualistic foliation by means of certain $\nabla^{(m)}$- and $\nabla^{(e)}$-autoparallel submanifolds. In Section 3, we give a statistical interpretation of the above-mentioned dualistic foliation structure. In particular, we point out that the MLE is the $\nabla^{(m)}$-projection with respect to a deformed Fisher metric that depends on the realization of the random variables $N^{(a)}$. These results are demonstrated by several illustrative examples in Section 4. Finally, some concluding remarks are presented in Section 5. For the reader's convenience, some background information is provided in Appendices A and B, including information geometry of the MLE and affine parametrization of a quantum state space $S(H)$.

Geometry of Randomized State Tomography
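We identify the randomized state tomography on $H \simeq \mathbb{C}^d$ with the following scheme [21]: at each step of the measurement, one chooses a PVM $M^{(a)}$ at random with probability $s^{(a)}$ $(a = 1, \dots, d + 1)$, and applies the chosen PVM to yield an outcome $\alpha \in \{1, \dots, d\}$. The sample space $\Omega$ for this statistical picture is
$$\Omega := \{(a, \alpha) \mid a \in \{1, \dots, d+1\},\ \alpha \in \{1, \dots, d\}\}.$$
Suppose that the unknown state $\rho$ is specified by the coordinate $\xi \in B$ as (1). Then the corresponding probability distribution on $\Omega$ is represented by the $d(d + 1)$-dimensional probability vector
$$p_{(s,\xi)} := \left( s^{(a)}\, p^{(a)}_{\alpha}(\xi) \right)_{(a,\alpha) \in \Omega},$$
where $p^{(a)}_{\alpha}(\xi)$ denotes the probability (2) of obtaining the outcome $\alpha$ in the $a$th measurement, $s^{(d+1)} := 1 - \sum_{a=1}^{d} s^{(a)}$, and the parameter $s := (s^{(1)}, \dots, s^{(d)})$ belongs to the domain
$$D := \left\{ s \in \mathbb{R}^d \;\middle|\; s^{(a)} > 0 \ (a = 1, \dots, d),\ \sum_{a=1}^{d} s^{(a)} < 1 \right\}.$$
Note that the family $\{p_{(s,\xi)} \mid s \in D,\ \xi \in \Xi\}$, where $\Xi$ denotes the set of $\xi \in \mathbb{R}^{d^2-1}$ for which all the probabilities (2) are positive, forms a $(d^2 + d - 1)$-dimensional open probability simplex $P(\Omega)$, and the parameters $(s, \xi)$ form a coordinate system of $P(\Omega)$. Since we are only interested in estimating the parameter $\xi \in \Xi$, the remaining parameter $s \in D$ is understood as a set of nuisance parameters [2,12]. In what follows, we regard $P(\Omega)$ as a statistical manifold endowed with the standard dualistic structure $(g, \nabla^{(e)}, \nabla^{(m)})$, where $g$ is the Fisher metric, and $\nabla^{(e)}$ and $\nabla^{(m)}$ are the exponential and mixture connections [12]. Let us consider the following submanifolds of $P(\Omega)$:
$$M(s) := \{p_{(s,\xi)} \mid \xi \in \Xi\} \quad (s \in D), \qquad E(\xi) := \{p_{(s,\xi)} \mid s \in D\} \quad (\xi \in \Xi).$$
Since $M(s)$ and $E(\xi)$ are convex subsets of $P(\Omega)$, they are both $\nabla^{(m)}$-autoparallel.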
In addition, we have the following.

Proposition 1.
For each $\xi \in \Xi$, the submanifold $E(\xi)$ is $\nabla^{(e)}$-autoparallel. Furthermore, for each $s \in D$ and $\xi \in \Xi$, the submanifolds $M(s)$ and $E(\xi)$ are mutually orthogonal with respect to the Fisher metric $g$.

Proof.
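Let us change the coordinate system $(s, \xi)$ into $\eta := (\eta_a, \eta_{b,\alpha})$, where
$$\eta_a := s^{(a)}$$
for $a \in \{1, \dots, d\}$, and
$$\eta_{b,\alpha} := s^{(b)} \left( \frac{1}{d} + \xi^{(b)}_{\alpha} \right)$$
for $b \in \{1, \dots, d+1\}$ and $\alpha \in \{1, \dots, d-1\}$. With this coordinate transformation, the probability vector $p_{(s,\xi)}$ is rewritten as
$$p_{(s,\xi)}(b, \alpha) = \begin{cases} \eta_{b,\alpha}, & \alpha \in \{1, \dots, d-1\}, \\[1mm] \eta_b - \displaystyle\sum_{\beta=1}^{d-1} \eta_{b,\beta}, & \alpha = d, \end{cases} \qquad b \in \{1, \dots, d+1\}. \tag{3}$$
Here, $\eta_{d+1}$ is a function of $\{\eta_a\}_{a \in \{1,\dots,d\}}$ defined by
$$\eta_{d+1} := 1 - \sum_{a=1}^{d} \eta_a,$$
and is not a component of the coordinate system $\eta := (\eta_a, \eta_{b,\alpha})$. We see from the representation (3) that the coordinate system $\eta$ is $\nabla^{(m)}$-affine. The potential function for $\eta$ is given by the negative entropy
$$\varphi(\eta) := \sum_{(b,\alpha) \in \Omega} p_{(s,\xi)}(b, \alpha) \log p_{(s,\xi)}(b, \alpha),$$
and the dual $\nabla^{(e)}$-affine coordinate system $\theta = (\theta^a, \theta^{b,\alpha})$ is given by
$$\theta^a := \frac{\partial \varphi}{\partial \eta_a} = \log \frac{\eta_a - \sum_{\beta=1}^{d-1} \eta_{a,\beta}}{\eta_{d+1} - \sum_{\beta=1}^{d-1} \eta_{d+1,\beta}}$$
for $a \in \{1, \dots, d\}$, and
$$\theta^{b,\alpha} := \frac{\partial \varphi}{\partial \eta_{b,\alpha}} = \log \frac{\eta_{b,\alpha}}{\eta_b - \sum_{\beta=1}^{d-1} \eta_{b,\beta}} = \log \frac{\frac{1}{d} + \xi^{(b)}_{\alpha}}{\frac{1}{d} - \sum_{\beta=1}^{d-1} \xi^{(b)}_{\beta}}$$
for $b \in \{1, \dots, d+1\}$ and $\alpha \in \{1, \dots, d-1\}$. Thus, fixing $\xi$ is equivalent to fixing the coordinates $(\theta^{b,\alpha})$, and $E(\xi)$ is therefore $\nabla^{(e)}$-autoparallel. Moreover, $M(s)$ is the set of points for which $(\eta_a)_{a \in \{1,\dots,d\}}$ are fixed and $(\theta^{b,\alpha})$ are arbitrary. On the other hand, the submanifold $E(\xi)$ is rewritten as the set of points for which $(\theta^{b,\alpha})$ are fixed and $(\eta_a)_{a \in \{1,\dots,d\}}$ are arbitrary.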
Thus, the orthogonality of M(s) and E(ξ) is an immediate consequence of the orthogonality of the dual affine coordinate systems θ and η with respect to the Fisher metric g.
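The orthogonality stated in Proposition 1 can also be checked numerically. The sketch below assumes the joint distribution assigns probability $s^{(a)}(1/d + \xi^{(a)}_\alpha)$ to the pair $(a, \alpha)$, differentiates it by hand with respect to $s$ and $\xi$, and evaluates the Fisher inner product $\sum_\omega u(\omega) v(\omega) / p(\omega)$ of the resulting tangent vectors. Indices are zero-based, and all names and numerical values are hypothetical.

```python
def outcome_probs(xi_a, d):
    """Outcome probabilities (1/d + xi_alpha, ..., remainder) for one basis."""
    head = [1 / d + x for x in xi_a]
    return head + [1 - sum(head)]


def joint_distribution(s, xi):
    """Assumed joint distribution p(a, alpha) = s^(a) * p^(a)_alpha(xi)."""
    d = len(s)
    weights = list(s) + [1 - sum(s)]
    return {(a, al): weights[a] * p
            for a in range(d + 1)
            for al, p in enumerate(outcome_probs(xi[a], d))}


def tangent_s(s, xi, a):
    """Derivative of the joint distribution with respect to s^(a), a < d."""
    d = len(s)
    vec = {}
    for c in range(d + 1):
        # s^(d+1) = 1 - sum(s), so the last basis picks up a minus sign.
        sign = (1 if c == a else 0) - (1 if c == d else 0)
        for al, p in enumerate(outcome_probs(xi[c], d)):
            vec[(c, al)] = sign * p
    return vec


def tangent_xi(s, xi, b, beta):
    """Derivative of the joint distribution with respect to xi^(b)_beta."""
    d = len(s)
    weights = list(s) + [1 - sum(s)]
    vec = {key: 0.0 for key in joint_distribution(s, xi)}
    vec[(b, beta)] = weights[b]
    vec[(b, d - 1)] = -weights[b]
    return vec


def fisher_inner(p, u, v):
    """Fisher inner product of two tangent vectors at the point p."""
    return sum(u[k] * v[k] / p[k] for k in p)


# Hypothetical point of the simplex for d = 3.
s = [0.1, 0.2, 0.3]
xi = [[0.10, -0.05], [0.00, 0.20], [-0.10, 0.10], [0.05, 0.05]]
p = joint_distribution(s, xi)
max_abs = max(abs(fisher_inner(p, tangent_s(s, xi, a), tangent_xi(s, xi, b, beta)))
              for a in range(3) for b in range(4) for beta in range(2))
print(max_abs)
```

The printed value vanishes up to rounding error, in agreement with the claim that directions along $E(\xi)$ and along $M(s)$ are Fisher-orthogonal at every point.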
Proposition 1 implies that the manifold $P(\Omega)$ is decomposed into a mutually orthogonal dualistic foliation based on the submanifolds $M(s)$ and $E(\xi)$, as illustrated in Figure 1. We shall exploit this geometrical structure in the next section.

Figure 1. For each $s \in D$, the section $M(s)$ is affinely isomorphic to the parameter space $\Xi$. The greyish cylindrical area indicates the subset $B = \{p_{(s,\xi)} \mid s \in D,\ \xi \in B\}$ of $P(\Omega)$. In particular, for each $s \in D$, the intersection $M(s) \cap B$ is affinely isomorphic to the physical domain $B$ that corresponds to the state space $S(H)$.

Estimation of the Parameter ξ
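Let us proceed to the problem of estimating the unknown parameter $\xi$ using the randomized tomography. Suppose that, among $N$ independent repetitions of experiments, the $a$th measurement $M^{(a)}$ was applied $N^{(a)}$ times and each outcome $\alpha \in \{1, \dots, d\}$ was obtained $n^{(a)}_{\alpha}$ times. Then temporary estimates $(\hat{s}, \hat{\xi})$ for the parameters $(s, \xi)$ are given by
$$\hat{s}^{(a)} := \frac{N^{(a)}}{N}, \qquad \hat{\xi}^{(a)}_{\alpha} := \frac{n^{(a)}_{\alpha}}{N^{(a)}} - \frac{1}{d}.$$
If $\hat{\xi}$ has fallen outside the physical domain $B$, one may seek a corrected estimate by the maximum likelihood method. Observe that, due to (2), the empirical distribution $\hat{q}_N \in P(\Omega)$ is represented as
$$\hat{q}_N = p_{(\hat{s},\hat{\xi})}. \tag{4}$$
On the other hand, the physical domain $B$ in the parameter space $\Xi$ corresponds to the subset $\{p_{(s,\xi)} \mid s \in D,\ \xi \in B\}$ of $P(\Omega)$, which is also denoted by $B$ (see Figure 1). The MLE $p^*$ in $P(\Omega)$ is then given by
$$p^* = \mathop{\arg\min}_{p \in B} D(\hat{q}_N \,\|\, p), \tag{5}$$
where $D(\cdot \,\|\, \cdot)$ is the Kullback-Leibler divergence (cf. Appendix A). A crucial observation is the following.

Proposition 2.
The minimum in (5) is achieved on $M(\hat{s}) \cap B$.
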
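Proof. Let us take a point $p_{(s,\xi)} \in B$ arbitrarily. It then follows from the mutually orthogonal dualistic foliation of $P(\Omega)$ established in Proposition 1 that
$$D(\hat{q}_N \,\|\, p_{(s,\xi)}) = D(p_{(\hat{s},\hat{\xi})} \,\|\, p_{(s,\xi)}) = D(p_{(\hat{s},\hat{\xi})} \,\|\, p_{(\hat{s},\xi)}) + D(p_{(\hat{s},\xi)} \,\|\, p_{(s,\xi)}).$$
In the second equality, the generalized Pythagorean theorem was used. Consequently,
$$D(\hat{q}_N \,\|\, p_{(s,\xi)}) \ge D(p_{(\hat{s},\hat{\xi})} \,\|\, p_{(\hat{s},\xi)})$$
for all $s \in D$, and the lower bound is achieved if and only if $s = \hat{s}$.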
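The Pythagorean decomposition used in the proof can be verified on concrete numbers. The sketch below assumes the joint distribution assigns probability $s^{(a)}(1/d + \xi^{(a)}_\alpha)$ to the pair $(a, \alpha)$ and uses hypothetical qubit values; it compares the divergence from the empirical point to an arbitrary model point with the sum of the divergence within the section $M(\hat{s})$ and the divergence along the leaf $E(\xi)$.

```python
from math import isclose, log


def joint(s, xi):
    """Assumed joint distribution p(a, alpha) = s^(a) * (1/d + xi^(a)_alpha)."""
    d = len(s)
    weights = list(s) + [1.0 - sum(s)]
    dist = {}
    for a, (w, xi_a) in enumerate(zip(weights, xi)):
        head = [1.0 / d + x for x in xi_a]
        for alpha, p in enumerate(head + [1.0 - sum(head)]):
            dist[(a, alpha)] = w * p
    return dist


def kl(q, p):
    """Kullback-Leibler divergence D(q || p) for strictly positive q and p."""
    return sum(q[k] * log(q[k] / p[k]) for k in q)


# Hypothetical qubit data: the temporary estimate lies outside the Bloch ball.
s_hat, xi_hat = [0.5, 0.2], [[0.45], [-0.30], [0.40]]
# An arbitrary physical candidate with a different measurement ratio.
s, xi = [0.3, 0.3], [[0.30], [-0.20], [0.25]]

lhs = kl(joint(s_hat, xi_hat), joint(s, xi))
within_section = kl(joint(s_hat, xi_hat), joint(s_hat, xi))
along_leaf = kl(joint(s_hat, xi), joint(s, xi))
print(lhs, within_section + along_leaf)
```

Since the second term is nonnegative and vanishes only when the two measurement ratios coincide, replacing $s$ by $\hat{s}$ never increases the divergence, which is the content of Proposition 2.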
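The geometrical implication of Proposition 2 is illustrated in Figure 2. The MLE $p^* = p_{(\hat{s},\xi^*)}$ is the $\nabla^{(m)}$-projection from the empirical distribution $p_{(\hat{s},\hat{\xi})}$ to $B$, and lies on the section $M(\hat{s})$ specified by the temporary estimate $\hat{s}$.
Now we arrive at a geometrical picture behind the parameter estimation based on randomized state tomography. Suppose we are given a temporary estimate $(\hat{s}, \hat{\xi})$ with $\hat{\xi} \notin B$. Due to Proposition 2, we can restrict ourselves to the section $M(\hat{s})$ as the search space for the MLE $p^*$. Since each section $M(\hat{s})$ is affinely isomorphic to the parameter space $\Xi$, we can introduce a dualistic structure $(\tilde{g}, \tilde{\nabla}^{(e)}, \tilde{\nabla}^{(m)})$ on $\Xi$ in the following way. Firstly, we identify the metric $\tilde{g}$ with the Fisher metric $g$ restricted to $M(\hat{s})$, that is,
$$\tilde{g}_{\xi}(X, Y) := g_{p_{(\hat{s},\xi)}}(X, Y),$$
where tangent vectors $X, Y$ of $\Xi$ at $\xi$ are identified with tangent vectors of $M(\hat{s})$ at $p_{(\hat{s},\xi)}$. Secondly, the mixture connection $\tilde{\nabla}^{(m)}$ on $\Xi$ is defined through the natural affine isomorphism between $M(\hat{s})$ and $\Xi$. Finally, the dual connection $\tilde{\nabla}^{(e)}$ is defined by the duality
$$X \tilde{g}(Y, Z) = \tilde{g}(\tilde{\nabla}^{(e)}_X Y, Z) + \tilde{g}(Y, \tilde{\nabla}^{(m)}_X Z).$$
Thus, the MLE $\xi^*$ in the parameter space $\Xi$ is interpreted as the $\tilde{\nabla}^{(m)}$-projection from $\hat{\xi}$ to the physical domain $B$ with respect to the metric $\tilde{g}$.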
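For a qubit, the search on the section $M(\hat{s})$ amounts to maximizing the log-likelihood $\frac{1}{N}\sum_a [\,n_a^+ \log(1 + x_a) + n_a^- \log(1 - x_a)\,]$ over the Bloch ball, where the counts automatically weight each axis by how often it was measured. The following is a crude sketch using projected gradient ascent; this numerical method, the step size, and the function name are our own assumptions for illustration and are not the procedure of the text. It may behave poorly when the maximizer sits very close to a pole of the ball.

```python
from math import sqrt


def qubit_mle(counts, steps=5000, lr=0.05, shrink=1 - 1e-9):
    """Approximate MLE of the Stokes parameter by projected gradient ascent.

    counts: three (n_plus, n_minus) pairs; the per-axis totals may differ.
    shrink keeps iterates strictly inside the ball to avoid division by zero.
    """
    total = sum(n_plus + n_minus for n_plus, n_minus in counts)
    x = [0.0, 0.0, 0.0]
    for _ in range(steps):
        # Gradient of the normalized log-likelihood with respect to x_a.
        grad = [(n_plus / (1 + xa) - n_minus / (1 - xa)) / total
                for (n_plus, n_minus), xa in zip(counts, x)]
        x = [xa + lr * g for xa, g in zip(x, grad)]
        norm = sqrt(sum(c * c for c in x))
        if norm > shrink:
            # Euclidean projection back onto the (slightly shrunk) ball.
            x = [c * shrink / norm for c in x]
    return x


# Naive estimate (0.8, 0.8, 0.8) lies outside the ball: the MLE is on the boundary.
boundary = qubit_mle([(9, 1), (9, 1), (9, 1)])
# Naive estimate (0.2, 0.0, 0.4) lies inside the ball: the MLE coincides with it.
interior = qubit_mle([(6, 4), (5, 5), (7, 3)])
print(boundary)
print(interior)
```

In the symmetric example the maximizer is forced by symmetry to be proportional to $(1, 1, 1)$, so the result lands on the sphere at $(1, 1, 1)/\sqrt{3}$. Changing the per-axis totals changes the weights in the objective, which is how the deformed metric enters the projection.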

Examples
In this section, we present some examples that demonstrate the implication of Proposition 2 as well as the general diagram given in Figure 2.

When dim H = 2
Let us first study the simplest case when $H = \mathbb{C}^2$. A full set of MUBs is given by the eigenvectors of the Pauli matrices $\sigma_1$, $\sigma_2$, and $\sigma_3$. With these bases, the parameter representation (1) becomes
$$\rho(\xi) = \frac{1}{2}\left(I + x_1\sigma_1 + x_2\sigma_2 + x_3\sigma_3\right),$$
where $x = (x_1, x_2, x_3)$ is the standard Stokes parameter, which is related to $\xi = (\xi^{(1)}_1, \xi^{(2)}_1, \xi^{(3)}_1)$ as $x_a = 2\xi^{(a)}_1$.
Figure 3 demonstrates how the $\tilde{\nabla}^{(m)}$-projection is realized. Here, the trajectories of the $\tilde{\nabla}^{(m)}$-projections that give the MLE $p^*$ are plotted only on the $x_1 x_2$-plane. The left and right panels correspond to the cases when $N^{(1)} : N^{(2)} = 1 : 1$ and $N^{(1)} : N^{(2)} = 5 : 1$, respectively. The change of the $x_1$-coordinate relative to the change of the $x_2$-coordinate along each trajectory is less noticeable in the right panel than in the left panel. This is because a tomography with $N^{(1)}/N^{(2)} = 5$ provides us with more information about the $x_1$-coordinate, relative to the $x_2$-coordinate, as compared with the case when $N^{(1)}/N^{(2)} = 1$.

When dim H = 3
The space $H = \mathbb{C}^3$ admits a full set of MUBs, which can be written in terms of $\omega = (-1 + i\sqrt{3})/2$, a primitive third root of unity. With these bases, the parameter representation (1) is specified by the eight real parameters $\xi = (\xi^{(1)}_1, \xi^{(1)}_2, \dots, \xi^{(4)}_1, \xi^{(4)}_2)$. The physical domain $B$ that corresponds to the state space $S(\mathbb{C}^3)$ is a compact convex subset of the parameter space $\Xi$ $(\subset \mathbb{R}^8)$, and the extreme points of $B$ form an algebraic variety with respect to these parameters.
A numerical example of a $\tilde{\nabla}^{(m)}$-projection that gives the MLE is illustrated in Figure 4, where no probe particle is lost, that is, when $\hat{s}$ assigns equal weight to all four measurements. In Figure 4, the dot lying outside the greyish region indicates the empirical distribution, i.e., the temporary estimate $\hat{\xi}$. Furthermore, the greyish region represents the physical domain $B$ cut by the two-dimensional affine subspace of $\Xi$ that passes through $\xi^*$ and is spanned by $\hat{\xi} - \xi^*$ and a vector $v$. The vector $v$ was chosen randomly under the condition that $v \perp \hat{\xi} - \xi^*$ and $\|v\| = \|\hat{\xi} - \xi^*\|$, where the orthogonality $\perp$ and the norm $\|\cdot\|$ are understood relative to the standard Euclidean structure of $\mathbb{R}^8$.
Figure 4 also demonstrates that the sections of the physical domain $B$ show a variety of shapes. Unfortunately, due to this asymmetry of $B$, we were unable to find a (nontrivial) two-dimensional affine subspace on which every $\tilde{\nabla}^{(m)}$-projection runs. Such a difficulty is in marked contrast to the simplest case $H \simeq \mathbb{C}^2$, where the set $B$ is rotationally symmetric and the $\tilde{\nabla}^{(m)}$-projections can be displayed on any two-dimensional section of $B$ that passes through the origin of $B$, as in Figure 3.

When dim H ≥ 4
The space $H = \mathbb{C}^4$ is also known to admit a full set of MUBs since $\dim H = 4$ is the second power of the prime number 2; an explicit example is given in [22]. It is straightforward to calculate the parameter representation (1) of a state $\rho \in S(\mathbb{C}^4)$; however, the corresponding density matrix is rather complicated, and we do not display it here.
When $H = \mathbb{C}^6$, or more generally, when $\dim H$ is not a power of a prime, we do not know whether $H$ admits a full set of MUBs. Let us touch upon the situation in which a Hilbert space $H$ does not admit a full set of MUBs, should such a space exist. In this case, there is no measurement basis $M^{(a)}$ that allows a parametrization $\xi$ of the state space $S(H)$ having a direct connection to the probability distribution of the outcomes as (2). Such a situation could be comparable to the case when the Gell-Mann matrices [23] are used as the measurement basis for estimating an unknown state on $H = \mathbb{C}^3$. A state $\rho \in S(\mathbb{C}^3)$ is represented as
$$\rho_x = \frac{1}{3}\left(I + \sqrt{3} \sum_{i=1}^{8} x_i \lambda_i\right),$$
where $\lambda_1, \dots, \lambda_8$ are the Gell-Mann matrices, and $x = (x_1, \dots, x_8)$ is a set of real parameters. The physical domain forms a compact convex subset of the unit ball in $\mathbb{R}^8$. With the state $\rho_x$, the probability distribution of obtaining the eigenvalues $(-1, 0, 1)$ of the observable $\lambda_1$ is
$$\left( \frac{1 + x_8}{3} - \frac{x_1}{\sqrt{3}},\ \frac{1 - 2x_8}{3},\ \frac{1 + x_8}{3} + \frac{x_1}{\sqrt{3}} \right),$$
while the probability distribution of obtaining the eigenvalues $(-1, 0, 1)$ of the observable $\lambda_2$ is
$$\left( \frac{1 + x_8}{3} - \frac{x_2}{\sqrt{3}},\ \frac{1 - 2x_8}{3},\ \frac{1 + x_8}{3} + \frac{x_2}{\sqrt{3}} \right).$$
Note that the probability of obtaining the eigenvalue 0 of $\lambda_1$ is identical to that of $\lambda_2$. However, in a randomized estimation scheme in which $\lambda_i$ is measured $N_i$ times, the frequency of obtaining the eigenvalue 0 of $\lambda_1$ would in general be different from that of $\lambda_2$. Consequently, one cannot assign a consistent temporary estimate $\hat{x}_8$ for the parameter $x_8$ in that case. Put differently, the empirical distribution $\hat{q}_N$ on the extended outcome space $\Omega$ does not in general have a coordinate representation (4). Thus, the existence of a full set of MUBs is crucial in our analysis.
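The coincidence of the two zero-outcome probabilities, and the resulting ambiguity for $x_8$, can be illustrated numerically. The sketch below assumes the normalization $\rho_x = \frac{1}{3}(I + \sqrt{3}\sum_i x_i\lambda_i)$ with the standard Gell-Mann matrices, for which the zero-outcome probability of both $\lambda_1$ and $\lambda_2$ is $(1 - 2x_8)/3$; the function names, the chosen state, and the sample frequencies are hypothetical.

```python
from math import isclose, sqrt

S3 = 1 / sqrt(3)
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[S3, 0, 0], [0, S3, 0], [0, 0, -2 * S3]],
]

R = 1 / sqrt(2)
# Eigenvectors ordered by eigenvalue (-1, 0, +1).
EIGVECS = {
    "lambda_1": [[R, -R, 0], [0, 0, 1], [R, R, 0]],
    "lambda_2": [[R, -1j * R, 0], [0, 0, 1], [R, 1j * R, 0]],
}


def rho_from_x(x):
    """Density matrix for the assumed normalization of the parameter x."""
    return [[((1 if i == j else 0)
              + sqrt(3) * sum(xk * lam[i][j] for xk, lam in zip(x, GELL_MANN))) / 3
             for j in range(3)] for i in range(3)]


def born(rho, v):
    """Probability <v|rho|v> of the outcome associated with the vector v."""
    return sum(complex(v[i]).conjugate() * rho[i][j] * v[j]
               for i in range(3) for j in range(3)).real


def x8_from_zero_frequency(f0):
    """Invert P(0) = (1 - 2 x_8) / 3 for an observed zero-outcome frequency."""
    return (1 - 3 * f0) / 2


x = [0.2, -0.1, 0.05, 0.0, 0.0, 0.0, 0.0, 0.3]
rho = rho_from_x(x)
dist_1 = [born(rho, v) for v in EIGVECS["lambda_1"]]
dist_2 = [born(rho, v) for v in EIGVECS["lambda_2"]]
print(dist_1)
print(dist_2)
# Two hypothetical observed frequencies of outcome 0 give conflicting estimates.
print(x8_from_zero_frequency(0.10), x8_from_zero_frequency(0.16))
```

With finite data the two zero-outcome frequencies will typically differ, so the two inversions disagree and no single temporary estimate of $x_8$ is singled out.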

Concluding Remarks
In the present paper, we explored an information geometrical structure of the randomized quantum state tomography, assuming that the Hilbert space under consideration admits a full set of MUBs. We first introduced a classical statistical model {p (s,ξ) } s,ξ on an extended sample space Ω, and found that the probability simplex P (Ω) was decomposed into mutually orthogonal dualistic foliation (Proposition 1). We then clarified that this geometrical structure had a statistical importance in estimating the coordinate ξ of an unknown quantum state ρ(ξ) under the existence of the nuisance parameter s (Proposition 2). This result gave a generalized insight into the ∇ (m) -projection interpretation for the MLE in that a similar interpretation was still valid for the randomized quantum state tomography by changing the standard Fisher metric into a deformed one. It also provided us with a new, convenient way of data processing in the actual quantum state tomography that may involve unexpected probe particle loss.
It should be noted that the existence of a full set of MUBs ensures the parametrization (1) of the quantum state space $S(H)$. Such a parametrization is distinctive in that it enables a direct correspondence between the parameter space and the probability simplex, realizing the coordinate representation (4) of the empirical distribution $\hat{q}_N$. Thus, the use of a full set of MUBs is crucial in our analysis. Nevertheless, it is often the case that the Hilbert space under consideration takes the form $H \simeq (\mathbb{C}^p)^{\otimes n}$ for $p = 2$ or $3$ because qubits or qutrits are often regarded as building blocks of various quantum protocols. Therefore, the existence of a full set of MUBs would not be too strong a requirement in applications.

Author Contributions: The authors contributed equally to this work.

Funding:
The present study was supported by JSPS KAKENHI Grant Numbers JP22340019 and JP17H02861.

Acknowledgments:
The authors are grateful to Ryo Okamoto and Shigeki Takeuchi for helpful discussions.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
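The following abbreviations are used in this manuscript:
MLE    maximum likelihood estimate
MUB    mutually unbiased bases
PVM    projection-valued measure

Appendix A. Information Geometry of the MLE
Let $\Omega$ be a finite sample space, and let $P(\Omega)$ denote the set of all strictly positive probability distributions on $\Omega$. This set may be identified with the $(|\Omega| - 1)$-dimensional (open) simplex, where $|\Omega|$ denotes the number of elements in $\Omega$, and thus it is sometimes referred to as the probability simplex on $\Omega$. The set $P(\Omega)$ is also regarded as a statistical manifold endowed with the dualistic structure $(g, \nabla^{(e)}, \nabla^{(m)})$, where $g$ is the Fisher metric, and $\nabla^{(e)}$ and $\nabla^{(m)}$ are the exponential and mixture connections [11-13].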
Suppose that the state of the physical system at hand belongs to a (closed) subset M of P (Ω), but we do not know which is the true state. We further assume that the probability distributions of M are faithfully parametrized by a finite dimensional parameter θ as M = {p θ (ω) | θ ∈ Θ}.
In this case, $M$ is called a parametric model, and our task is to estimate the true value of the parameter $\theta$ that specifies the true state. Suppose that, by $n$ independent experiments, we have obtained data $(x_1, x_2, \dots, x_n) \in \Omega^n$. This information is compressed into the empirical distribution, an element of $P(\Omega)$ defined by
$$\hat{q}_n(\omega) := \frac{\text{number of occurrences of } \omega \text{ in the data } (x_1, x_2, \dots, x_n)}{n} = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(\omega)$$
for each $\omega \in \Omega$, where $\delta_{x_i}(\omega)$ is the Kronecker delta. If $\hat{q}_n$ belongs to the model $M$, then we have an estimate $\hat{\theta}_n$ that satisfies $p_{\hat{\theta}_n} = \hat{q}_n$. However, the empirical distribution $\hat{q}_n$ does not always belong to the model $M$. When $\hat{q}_n \notin M$, we need to find an alternative estimate from the data. A canonical method of finding an alternative estimate $p_{\hat{\theta}_n} \in M$ is the maximum likelihood method, in which one seeks the maximizer of the likelihood function $\theta \mapsto p_\theta(x_1)\, p_\theta(x_2) \cdots p_\theta(x_n)$ within the domain $\Theta$ of the parameter $\theta$, so that
$$\hat{\theta}_n := \mathop{\arg\max}_{\theta \in \Theta} \left\{ p_\theta(x_1)\, p_\theta(x_2) \cdots p_\theta(x_n) \right\}.$$
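We can rewrite this relation as follows:
$$\hat{\theta}_n = \mathop{\arg\max}_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \log p_\theta(x_i) = \mathop{\arg\max}_{\theta \in \Theta} \sum_{\omega \in \Omega} \hat{q}_n(\omega) \log p_\theta(\omega) = \mathop{\arg\min}_{\theta \in \Theta} D(\hat{q}_n \,\|\, p_\theta),$$
where
$$D(q \,\|\, p) := \sum_{\omega \in \Omega} q(\omega) \log \frac{q(\omega)}{p(\omega)}$$
is the Kullback-Leibler divergence from $q$ to $p$. In other words, the maximum likelihood estimate (MLE) $p_{\hat{\theta}_n}$ is the point on $M$ that is "closest" to the empirical distribution $\hat{q}_n$ as measured by the Kullback-Leibler divergence:
$$p_{\hat{\theta}_n} = \mathop{\arg\min}_{p \in M} D(\hat{q}_n \,\|\, p).$$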
Due to the generalized Pythagorean theorem, the MLE is geometrically understood as the ∇ (m) -projection fromq n to M or its boundary, as illustrated in Figure A1.
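The equivalence between maximizing the likelihood and minimizing the Kullback-Leibler divergence from the empirical distribution can be checked on a toy model. The sketch below uses a hypothetical one-parameter family $p_\theta = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$ on $\Omega = \{0, 1, 2\}$ and a grid search; the model, the data, and the function names are illustrative assumptions only.

```python
from collections import Counter
from math import isclose, log


def empirical_distribution(data, sample_space):
    """Relative frequency of each element of the sample space in the data."""
    tally = Counter(data)
    return {w: tally[w] / len(data) for w in sample_space}


def kl_divergence(q, p):
    """D(q || p), with the convention 0 * log 0 = 0."""
    return sum(q[w] * log(q[w] / p[w]) for w in q if q[w] > 0)


def mean_log_likelihood(data, p):
    """Log-likelihood of the data under p, divided by the sample size."""
    return sum(log(p[x]) for x in data) / len(data)


def model(theta):
    """Hypothetical one-parameter model on the sample space {0, 1, 2}."""
    return {0: theta ** 2, 1: 2 * theta * (1 - theta), 2: (1 - theta) ** 2}


data = [0, 0, 1, 2, 1, 0, 1, 1]
q_hat = empirical_distribution(data, [0, 1, 2])
entropy = -sum(q * log(q) for q in q_hat.values() if q > 0)

grid = [i / 1000 for i in range(1, 1000)]
best_by_likelihood = max(grid, key=lambda t: mean_log_likelihood(data, model(t)))
best_by_divergence = min(grid, key=lambda t: kl_divergence(q_hat, model(t)))
print(best_by_likelihood, best_by_divergence)
```

Both searches return the same parameter because the mean log-likelihood equals minus the divergence minus the entropy of the empirical distribution, and the entropy does not depend on $\theta$.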