Canonical Divergence for Measuring Classical and Quantum Complexity

A new canonical divergence is put forward for generalizing an information-geometric measure of complexity for both classical and quantum systems. On the simplex of probability measures, it is proved that the new divergence coincides with the Kullback–Leibler divergence, which is used to quantify how much a probability measure deviates from the non-interacting states that are modeled by exponential families of probabilities. On the space of positive density operators, we prove that the same divergence reduces to the quantum relative entropy, which quantifies many-party correlations of a quantum state from a Gibbs family.


Introduction
The methods of information geometry find application in the science of complexity across many fields, encompassing both classical and quantum systems [1]. Among them, an information-geometric approach to complexity, understood as the extent to which an object as a whole is more than its parts, was established in [2] and then developed to relate various known measures of complexity to a general class of information-geometric complexity measures (see [3] for a comprehensive overview of this topic). The general idea for quantifying the extent to which a system is more than the sum of its parts is the following. Let 𝒮 be a set of systems; to any system S ∈ 𝒮, we assign the collection of its parts, which may be an element of a set 𝒮′ that formally differs from 𝒮. The corresponding assignment Π : 𝒮 → 𝒮′ can be interpreted as a reduced description of the system S in terms of its parts. Having the parts Π(S), we have to reconstruct S by taking the sum of the parts in order to obtain a system that can be compared with the original one. The corresponding construction map is denoted by Σ : 𝒮′ → 𝒮. The composition P(S) := (Σ ∘ Π)(S) then corresponds to the sum of the parts of the system S, and we can compare S with P(S). It turns out that P, under natural conditions, is the projection P : 𝒮 → N onto the set of non-complex systems N := {S ∈ 𝒮 | P(S) = S} [4]. Therefore, the quantification of how much the system S differs from P(S) is established by a divergence function D : 𝒮 × 𝒮 → ℝ such that

$$ D(S, S') \ge 0, \qquad D(S, S') = 0 \ \text{ iff } \ S = S'. \tag{1} $$
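The complexity of S is then measured by its divergence from the sum of its parts,

$$ C(S) := D(S, P(S)), \tag{2} $$

where D has to be compatible with P in the sense that

$$ D(S, P(S)) = \inf_{S' \in N} D(S, S'). \tag{3} $$

This is where a canonical divergence comes into play: it provides an information-geometric measure of complexity that can be interpreted as unique.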
In the framework of information geometry, a dual structure (g, ∇, ∇*) on a smooth manifold M is given in terms of a metric tensor g and two affine connections ∇ and ∇*, which are dual in the following sense [5]:

$$ X\, g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z), \qquad \forall\, X, Y, Z \in \mathcal{T}(M), $$

where T(M) denotes the space of sections of the tangent bundle of M. Eguchi named a function D : M × M → ℝ satisfying the property in Equation (1) a contrast (or divergence) function whenever D allows recovering the dual structure (g, ∇, ∇*) on M in the following way [6]:

$$ g_{ij}(p) = -\left. \partial_i \partial'_j D(\xi_p, \xi_q) \right|_{p=q}, \tag{4} $$

$$ \Gamma_{ijk}(p) = -\left. \partial_i \partial_j \partial'_k D(\xi_p, \xi_q) \right|_{p=q}, \qquad \Gamma^*_{ijk}(p) = -\left. \partial'_i \partial'_j \partial_k D(\xi_p, \xi_q) \right|_{p=q}, \tag{5} $$

where ∂_i = ∂/∂ξ_p^i and ∂′_i = ∂/∂ξ_q^i, and {ξ_p := (ξ_p^1, …, ξ_p^n)} and {ξ_q := (ξ_q^1, …, ξ_q^n)} are local coordinate systems of p and q, respectively. Here, Γ_ijk = g(∇_{∂_i} ∂_j, ∂_k) and Γ*_ijk = g(∇*_{∂_i} ∂_j, ∂_k) are the connection symbols of ∇ and ∇*, respectively. The search for a divergence function that allows recovering the dualistic structure on a smooth manifold is usually referred to as the inverse problem in information geometry. Matumoto [7] showed that such a divergence exists for any statistical manifold. However, it is not unique: infinitely many divergences give the same dual structure. Hence, the search for a divergence that can be considered the most natural one is of utmost importance. When a manifold is dually flat, Amari and Nagaoka [5] introduced a Bregman-type divergence to this end, with relevant properties concerning the generalized Pythagorean theorem and the geodesic projection theorem. This is referred to as the canonical divergence and it is commonly regarded as the natural solution of the inverse problem in information geometry for dually flat manifolds. However, the need for a general canonical divergence, which applies to any dualistic structure, is a crucial issue, as pointed out in [8]. In any case, such a divergence should recover the canonical divergence of Bregman type when applied to a dually flat structure.
In addition, in the self-dual case where ∇ = ∇* coincides with the Levi-Civita connection of g, the divergence D should be one half of the squared Riemannian distance: D(p, q) = ½ d(p, q)² [3].
In the context of the information-geometric approach to complexity, a further requirement is needed to ensure the compatibility in Equation (3). This is the geodesic projection property, which, in the present context, states that every minimizer P(S) of D is achieved by the geodesic projection of S onto the set of non-complex systems. In [9], Ay and Amari recently introduced a canonical divergence that satisfies all these requirements. Such a divergence is defined in terms of geodesic integration of the inverse exponential map. More precisely, given p, q ∈ M and the ∇-geodesic σ(t) (0 ≤ t ≤ 1) connecting p with q, the canonical divergence introduced in [9] is given by

$$ D(p, q) := \int_0^1 \langle X_t(p), \dot{\sigma}(t) \rangle_{\sigma(t)} \, dt. \tag{6} $$

Here, exp : TM → M denotes the exponential map of ∇, which is defined by exp(X) = σ_X(1) whenever the ∇-geodesic σ_X(t) satisfying σ̇_X(0) = X exists on an interval of t containing [0, 1]. Therefore, if σ(t) (0 ≤ t ≤ 1) is the ∇-geodesic such that σ(0) = p and σ(1) = q, then exp_p^{-1}(q) := σ̇(0). According to this definition, we have that X_t(p) = P_{σ(t)} X_p(σ(t)) = t σ̇(t), where X_p(σ(t)) := exp_p^{-1}(σ(t)) = t σ̇(0) and P_{σ(t)} is the ∇-parallel transport from p to σ(t); the last equality holds because the velocity of a geodesic is parallel along the geodesic itself. This implies that the divergence D(p, q) assumes the following useful expression:

$$ D(p, q) = \int_0^1 t \, \| \dot{\sigma}(t) \|^2_{\sigma(t)} \, dt. \tag{7} $$

Analogously, the dual function of D(p, q) is defined as the ∇*-geodesic integration of the inverse of the ∇*-exponential map [9]. Therefore, the dual divergence D* has an expression similar to Equation (7) for the canonical divergence D:

$$ D^*(p, q) = \int_0^1 t \, \| \dot{\sigma}^*(t) \|^2_{\sigma^*(t)} \, dt, \tag{8} $$

where σ*(t) (0 ≤ t ≤ 1) is the ∇*-geodesic connecting p with q. Therefore, the compatibility in Equation (3) of D with P suggests that the projection P(S) of a system S onto the space of non-complex systems can be achieved along the geodesic connecting S with P(S). Actually, it has recently been proved that the ∇-geodesic minimizes the action integral of a suitably chosen kinetic energy [10].
An analogous result holds for the ∇*-geodesic. In this way, both divergences, D(p, q) and D*(p, q), turn out to solve the Hamilton-Jacobi problem in information geometry, as put forward in [11]. The search for a general canonical divergence is still an open problem and is of utmost importance in the context of the information-geometric approach to complexity (see the progress along this avenue put forward in [9,12]).
In this article, we aim to propose the canonical divergence in Equation (7) as an efficient tool for providing a unified definition of complexity measures. For this reason, we firstly consider D on the simplex of probability distributions where a measure of complexity as one instance of Equation (2) is supplied in terms of the Kullback-Leibler (KL)-divergence [4].
The general method described for defining the complexity measure in Equation (2) can be specialized to systems consisting of a finite node set V, where each node v ∈ V can be in finitely many states I_v. Then, we model the whole system as a probability measure p on the corresponding product configuration set I_V = ∏_{v∈V} I_v. The parts are given by the marginals p_A, where A is taken from a set S of subsets of V. Therefore, the decomposition map Π reads in this case as Π(p) = (p_A)_{A∈S}, whereas the reconstruction map Σ is defined by the maximum entropy estimate p̂ of p, leading to the projection π_S : p ↦ p̂. The image of π_S turns out to be the closure of an exponential family E_S, which plays the role of the set N of non-complex systems. A deviation measure that is compatible with the maximum entropy projection π_S is then the (KL)-divergence, which is defined by

$$ KL(p, q) := \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} \tag{9} $$

on the simplex P_n = {p = (p_1, …, p_n) | p_i > 0, ∑_i p_i = 1} [6]. Finally, the measure of complexity as one instance of Equation (2) is obtained by

$$ KL(p, E_S) := \inf_{q \in E_S} KL(p, q) = KL(p, \pi_S(p)). \tag{10} $$

We may notice that, if S consists of all subsets of V of cardinality 1, the elements of the set E_S of non-complex systems are totally uncorrelated in the sense that q ∈ E_S has the product form q = q_1 ⊗ … ⊗ q_n [2]. Consider random variables X_1, …, X_n with joint probability distribution p and marginal probability distributions p_1, …, p_n. Then, we have

$$ \inf_{q \in E_S} KL(p, q) = KL(p, p_1 \otimes \cdots \otimes p_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n), $$

where H is the Shannon entropy. This quantity is referred to as the multi-information and denoted by I(X_1, …, X_n). In particular, when n = 2, this is nothing but the mutual information. Very remarkably, the minimizer p̂ of the (KL)-divergence in the closure of E_S, namely

$$ KL(p, \hat{p}) = \inf_{q \in E_S} KL(p, q), $$

is obtained by projecting p onto the closure of E_S along a mixture (m)-geodesic [13]. This is usually referred to as the geodesic projection property of the (KL)-divergence.
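As a small numerical illustration of the multi-information identity, the following sketch compares the (KL)-divergence of a joint distribution from the product of its marginals with the entropy expression. The joint distribution, the helper names `kl` and `entropy`, and the choice of two binary variables are illustrative assumptions, not taken from the references.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy -sum_i p_i log p_i (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical joint distribution of two binary variables, indexed by (x1, x2).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
states = sorted(joint)

# One-node marginals p_1 and p_2.
p1 = [sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)]
p2 = [sum(v for (_, b), v in joint.items() if b == x) for x in (0, 1)]

# Joint distribution and product of marginals, listed in the same state order.
p = [joint[s] for s in states]
product_of_marginals = [p1[a] * p2[b] for (a, b) in states]

# Divergence from the product distribution versus the entropy expression.
multi_info = kl(p, product_of_marginals)
entropy_form = entropy(p1) + entropy(p2) - entropy(p)

print(multi_info, entropy_form)
```

For n = 2 both numbers are the mutual information of the two variables, so they should agree up to floating-point error.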
The geometric structure given by the Fisher metric, the mixture (m) and exponential (e) affine connections was introduced by Amari and Nagaoka on the space of probability densities for studying statistical estimation problems [5].
In this article, we then consider both divergences, D and D * , on P n with the endowed dualistic structure given by the classic Fisher metric and the mixture (m) and the exponential (e) connections. Here, we show that D(q, p) = KL(q, p) = D * (p, q). Actually, this result has already been shown in [9]. However, we prove it differently by relying on the nice representations of D and D * given by Equations (7) and (8), respectively. This proves that D can be interpreted as a generalization of the (KL)-divergence.
A further step for proving the effectiveness of D is to consider it (and its dual function) on the manifold of quantum states, where the general idea for defining a complexity measure of a classical system expressed by Equation (2) has been extended to the quantum setting in terms of the quantum relative entropy [14]. More precisely, by considering a composite set of n ∈ ℕ units (or parties, or particles), [n] := {1, …, n}, the composite system is described by the product algebra A_[n] := A_1 ⊗ … ⊗ A_n. Here, A_i ⊂ M_{n_i} is a C*-subalgebra of the complex n_i × n_i matrices such that the identity I_{n_i} ∈ A_i. Many-party correlations are those correlations in the state of a composite quantum system that cannot be observed in subsystems composed of fewer than a given number of parties. In this context, the exponential families, which amount to the non-complex systems in the classical case, are replaced by states that are fully described by their restriction to selected subsystems. These correspond to the family of Gibbs states E_k := {e^{H_k} / Tr e^{H_k}} of the k-local Hamiltonians H_k. Here, a k-local Hamiltonian is defined as a sum of product terms a_1 ⊗ … ⊗ a_n with at most k non-scalar factors a_i, where a_i denotes a real self-adjoint operator. Therefore, the many-party correlation of a composite quantum state ρ ∈ A_[n], which captures all correlations in ρ that cannot be observed in any k-party subsystem, is the divergence from the Gibbs family E_k [14]:

$$ Q(\rho, E_k) := \inf_{\sigma \in E_k} Q(\rho, \sigma). \tag{11} $$

Here, the divergence Q(ρ, σ) is the quantum relative entropy defined by

$$ Q(\rho, \sigma) := \mathrm{Tr}\, \rho \, (\log \rho - \log \sigma), \tag{12} $$

where Tr denotes the trace of operators on the underlying finite-dimensional Hilbert space. Similar to the classical case, we can consider the family E_1 of Gibbs states, whose closure corresponds to the set of product states σ_1 ⊗ … ⊗ σ_n. Consider then a composite quantum state ρ ∈ A_[n] and its one-party marginals ρ_i ∈ A_i such that

$$ \mathrm{Tr}(\rho_i \, a) = \mathrm{Tr}\big( \rho \, (a \otimes I_{[n] \setminus \{i\}}) \big), $$

where a ∈ A_{i} = A_i and I_{[n]\{i}} is the identity operator on the product A_1 ⊗ … Â_i … ⊗ A_n, in which the factor A_i is omitted.
In this case, the many-party correlation of ρ is the quantum multi-information:

$$ Q(\rho, E_1) = Q(\rho, \rho_1 \otimes \cdots \otimes \rho_n) = \sum_{i=1}^{n} H(\rho_i) - H(\rho), $$

where H(ρ) = −Tr(ρ log ρ) is the von Neumann entropy of ρ. In particular, when n = 2, this corresponds to the quantum mutual information. Algorithms for the evaluation of Q(ρ, E_k) as a complexity measure for quantum states are studied in [15]. In that context, the many-party correlation is related to the entanglement of quantum systems as defined in [16]. The scope of the present article is mainly to present the canonical divergence D defined in Equation (7) as an important tool for generalizing the concept of complexity measure given by Equation (10) for classical systems as well as the concept of many-party correlation given by Equation (11) for quantum systems. To this end, we consider the space of density matrices endowed with the quantum analog of the Fisher metric and the mixture (m) and exponential (e) affine connections. This structure turns out to be induced on the manifold of positive density operators by the Bogoliubov inner product [17]. In this setting, we prove that the divergence introduced in [9] reduces to the quantum relative entropy. In addition, we also show that D(σ, ρ) = Q(σ, ρ) = D*(ρ, σ).
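The quantum multi-information for n = 2 can be checked numerically on a small example. The sketch below assumes NumPy is available and uses a hypothetical full-rank two-qubit state (a Bell state mixed with the maximally mixed state); the helper names `logm_h`, `rel_entropy` and `vn_entropy` are illustrative and not part of any cited algorithm.

```python
import numpy as np

def logm_h(rho):
    """Matrix logarithm of a positive definite Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Quantum relative entropy Tr rho (log rho - log sigma)."""
    return float(np.trace(rho @ (logm_h(rho) - logm_h(sigma))).real)

def vn_entropy(rho):
    """von Neumann entropy -Tr(rho log rho) of a full-rank state."""
    w = np.linalg.eigvalsh(rho)
    return float(-np.sum(w * np.log(w)))

# Hypothetical two-qubit state: Bell state mixed with the maximally mixed state.
bell = np.zeros((4, 4))
bell[np.ix_([0, 3], [0, 3])] = 0.5
rho = 0.7 * bell + 0.3 * np.eye(4) / 4

# One-party marginals by partial trace; tensor indices are (i1, i2, j1, j2).
r = rho.reshape(2, 2, 2, 2)
rho1 = np.einsum('ikjk->ij', r)   # trace out the second party
rho2 = np.einsum('kikj->ij', r)   # trace out the first party

# Divergence from the product of marginals versus the entropy expression.
multi_info = rel_entropy(rho, np.kron(rho1, rho2))
entropy_form = vn_entropy(rho1) + vn_entropy(rho2) - vn_entropy(rho)

print(multi_info, entropy_form)
```

Both numbers are the quantum mutual information of the state and should agree up to floating-point error. The state is chosen full rank so that all logarithms are well defined.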
The layout of the paper is as follows. Section 2 is devoted to the calculation of the canonical divergence and its dual function on the simplex of probability distributions. In Section 3, we describe the differential geometrical framework for finite quantum systems induced by the Bogoliubov inner product. In this particular framework, we then prove that the divergence given by Equation (7) reduces to the quantum relative entropy. Finally, we draw some conclusions in Section 4 by outlining the results obtained in this work and discussing possible extensions.

Canonical Divergence on the Simplex of Probability Measures
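A dualistic structure on the simplex of probability measures was introduced by Amari in terms of the Fisher metric and the mixture (m) and exponential (e) connections [18]. Given a finite set I = {1, …, n}, we can represent probability measures on I as elements of ℝ^n. In this representation, the Dirac measures δ_i, i = 1, …, n, form the canonical basis of ℝ^n. Then, the (n − 1)-dimensional simplex of probability measures is given by

$$ S_n := \Big\{ p = \sum_{i=1}^{n} p_i \, \delta_i \ \Big|\ p_i > 0, \ \sum_{i=1}^{n} p_i = 1 \Big\}. \tag{13} $$

In this section, we show that the canonical divergence D(p, q) coincides with the Kullback-Leibler divergence whenever p, q ∈ S_n. In addition, we prove that, for the dual canonical divergence, the relation D*(p, q) = KL(q, p) holds true. According to Equations (7) and (8), we need the Fisher metric defined on the tangent bundle TS_n, and the mixture (m)-geodesic and the exponential (e)-geodesic, both connecting p with q. On the tangent space T_p S_n, the Fisher metric reads

$$ \langle X, Y \rangle_p = \sum_{i=1}^{n} \frac{X_i Y_i}{p_i}, \qquad X, Y \in T_p S_n. \tag{14} $$

The dualistic structure (g, ∇, ∇*) on S_n, given by the Fisher metric, the (m)-connection ∇ and the (e)-connection ∇*, is dually flat, and the (m)- and (e)-geodesics connecting p with q are [3]:

$$ \gamma_m(t) = p + t \, (q - p), \tag{15} $$

$$ \gamma_e^i(t) = \frac{p_i^{1-t} q_i^{t}}{\sum_j p_j^{1-t} q_j^{t}}, \qquad i = 1, \ldots, n. \tag{16} $$

We are now ready to compute the canonical divergence D(p, q) for arbitrary p, q ∈ S_n. From Equations (7), (14) and (15), we have that

$$ D(p, q) = \int_0^1 t \sum_i \frac{(q_i - p_i)^2}{p_i + t (q_i - p_i)} \, dt = \int_0^1 \sum_i (q_i - p_i) \, dt - \sum_i p_i \int_0^1 \frac{q_i - p_i}{p_i + t (q_i - p_i)} \, dt = \sum_i p_i \log \frac{p_i}{q_i} = KL(p, q), \tag{17} $$

where we use ∑_i (q_i − p_i) = 0 because p, q ∈ S_n. Analogously, we can compute the dual canonical divergence D*(p, q) by means of Equation (8). Therefore, by using Equations (14) and (16), we obtain that

$$ D^*(p, q) = \int_0^1 t \sum_i \frac{\big( \dot{\gamma}_e^i(t) \big)^2}{\gamma_e^i(t)} \, dt. \tag{18} $$

To develop the calculation further, let us analyze the derivative γ̇_e^i(t). Recall that

$$ \log \gamma_e^i(t) = (1 - t) \log p_i + t \log q_i - \log Z(t), \qquad Z(t) := \sum_j p_j^{1-t} q_j^{t}. $$

Therefore, by taking the derivative of γ_e^i(t) with respect to t, we obtain

$$ \dot{\gamma}_e^i(t) = \gamma_e^i(t) \left( \log \frac{q_i}{p_i} - \frac{d}{dt} \log Z(t) \right). $$

By stepping back to Equation (18), we use ∑_i γ̇_e^i(t) = 0 and then perform an integration by parts:

$$ D^*(p, q) = \int_0^1 t \sum_i \dot{\gamma}_e^i(t) \log \frac{q_i}{p_i} \, dt = \Big[ t \sum_i \gamma_e^i(t) \log \frac{q_i}{p_i} \Big]_0^1 - \int_0^1 \sum_i \gamma_e^i(t) \log \frac{q_i}{p_i} \, dt = \sum_i q_i \log \frac{q_i}{p_i} - \int_0^1 \frac{d}{dt} \log Z(t) \, dt = \sum_i q_i \log \frac{q_i}{p_i}, $$

where the last term is obtained by noticing that ∑_i γ_e^i(t) log(q_i/p_i) = (d/dt) log Z(t) and that Z(0) = Z(1) = 1 because p, q ∈ S_n. This proves that D*(p, q) = KL(q, p) = D(q, p).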
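The two identities of this section can also be checked numerically. The following sketch integrates t‖γ̇(t)‖² along the (m)- and (e)-geodesics with a composite Simpson rule and compares the results with the (KL)-divergence. The example points p and q, the quadrature rule and all function names are illustrative assumptions.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def simpson(f, n=2000):
    """Composite Simpson rule on [0, 1] with n (even) subintervals."""
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3

def m_integrand(p, q):
    """t * Fisher norm squared of the velocity of the (m)-geodesic p + t (q - p)."""
    def f(t):
        return t * sum((qi - pi) ** 2 / (pi + t * (qi - pi)) for pi, qi in zip(p, q))
    return f

def e_integrand(p, q):
    """t * Fisher norm squared of the velocity of the (e)-geodesic from p to q."""
    log_ratio = [math.log(qi / pi) for pi, qi in zip(p, q)]
    def f(t):
        w = [pi ** (1 - t) * qi ** t for pi, qi in zip(p, q)]
        z = sum(w)
        g = [wi / z for wi in w]                            # point on the (e)-geodesic
        mean = sum(gi * li for gi, li in zip(g, log_ratio))  # equals d/dt log Z(t)
        # Fisher norm squared: sum_i g_i (d/dt log g_i)^2
        return t * sum(gi * (li - mean) ** 2 for gi, li in zip(g, log_ratio))
    return f

# Hypothetical points of the simplex S_3.
p = [0.2, 0.3, 0.5]
q = [0.6, 0.1, 0.3]

D = simpson(m_integrand(p, q))        # canonical divergence D(p, q)
D_star = simpson(e_integrand(p, q))   # dual canonical divergence D*(p, q)

print(D, kl(p, q))
print(D_star, kl(q, p))
```

Up to quadrature error, the first pair of numbers should agree with each other, and so should the second, in line with D(p, q) = KL(p, q) and D*(p, q) = KL(q, p).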

Geometric Structure of a Manifold of Quantum States
We start this section by showing that natural analogs of the Fisher metric and the exponential and mixture connections are defined on a manifold of quantum states [17]. To this end, we need to specify an inner product on the space of density operators. Since the divergence D of Equation (7) is defined on a statistical manifold (M, g, ∇, ∇ * ) with symmetric connections, we choose the Bogoliubov inner product. This is because of a well-known result that claims the (e)-connection induced by a generalized covariance is symmetric if and only if such a covariance is the Bogoliubov inner product [5]. At the end of this section, we motivate this choice in more detail.
Let H be a finite-dimensional Hilbert space, 𝒜 = {A | A = A*} the space of all Hermitian operators on H, and S = {ρ | ρ = ρ* > 0, Tr ρ = 1} the space of positive density operators on H. Since S is an open subset of 𝒜_1 := {A | A = A*, Tr A = 1}, it can be naturally seen as a smooth manifold of dimension n = (dim H)² − 1 [17]. Let D ∈ T_ρ S be a tangent vector at ρ to S; we call D^(m) ∈ 𝒜_0 := {A | A ∈ 𝒜, Tr A = 0} its (m)-representation and symbolically write

$$ D^{(m)} = D\rho. \tag{20} $$

It is worth noticing that, as an element of the tangent space, D can be naturally interpreted as a derivative. As an example, when a coordinate system {θ^i} is given on S so that each state is parameterized as ρ ≡ ρ_θ, the (m)-representation of the natural basis vector is written as (∂_i)^(m) = ∂_i ρ_θ, where D = ∂_i = ∂/∂θ^i. This allows us to introduce the (m)-connection on the manifold S of quantum states in terms of the covariant derivative ∇^(m) : T(S) × T(S) → T(S), which is defined by the following relation:

$$ \big( \nabla^{(m)}_X Y \big)^{(m)} = X \, Y^{(m)}, \tag{21} $$

where the right-hand side means the derivative by X of Y^(m) : S → 𝒜_0 and T(S) denotes the space of sections on S. To introduce the (e)-connection on S, we need to specify a family {⟨·, ·⟩_ρ | ρ ∈ S} of inner products on 𝒜, usually called a generalized covariance. For the reason mentioned above, we consider the Bogoliubov inner product, which is given by

$$ \langle A, B \rangle_\rho := \int_0^1 \mathrm{Tr}\big( \rho^{\lambda} A \, \rho^{1-\lambda} B \big) \, d\lambda. $$

Given D ∈ T_ρ S, we then define the (e)-representation of D as the Hermitian operator D^(e) ∈ 𝒜 satisfying the following relation:

$$ \langle D^{(e)}, A \rangle_\rho = \mathrm{Tr}\big( D^{(m)} A \big), \qquad \forall\, A \in \mathcal{A}. $$

For all A ∈ 𝒜, we assume ⟨A, I⟩_ρ = ⟨A⟩_ρ = Tr(ρA) (I denotes the identity operator). Thus, we can see that the derivative of the function ⟨A⟩ : ρ ↦ ⟨A⟩_ρ by D is written as

$$ D \langle A \rangle_\rho = \mathrm{Tr}\big( D\rho \, A \big) = \langle D^{(e)}, A \rangle_\rho. $$

This implies that we can characterize the (e)-representation D^(e) ∈ 𝒜 of a given D ∈ T_ρ S by

$$ D\rho = \int_0^1 \rho^{\lambda} \, D^{(e)} \rho^{1-\lambda} \, d\lambda. $$

Therefore, it turns out that D^(e) is the derivative of the map ρ ↦ log ρ from S to 𝒜, which may be written as follows:

$$ D^{(e)} = D \log \rho. \tag{25} $$
The (e)-connection ∇^(e) on S is then introduced by Equation (27), where the right-hand side means the derivative by X_ρ of [Y]_ρ : S → T_ρ S. Finally, we define the inner product g_ρ on T_ρ S by

$$ g_\rho(X, Y) := \langle X^{(e)}, Y^{(e)} \rangle_\rho = \mathrm{Tr}\big( X^{(m)} Y^{(e)} \big), \tag{28} $$

which is usually called the quantum Fisher metric. The procedure described thus far endows the manifold S of quantum states with a geometric structure (g, ∇^(e), ∇^(m)) given by the quantum Fisher metric and two torsion-free connections, namely the (e)-connection ∇^(e) and the (m)-connection ∇^(m), which are dual with respect to g in the following sense:

$$ X \, g(Y, Z) = g\big( \nabla^{(e)}_X Y, Z \big) + g\big( Y, \nabla^{(m)}_X Z \big). $$

In addition, the dual structure (g, ∇^(m), ∇^(e)) is dually flat, meaning that the curvature tensors of ∇^(e) and ∇^(m) both vanish.
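Suppose that a coordinate system {ξ^i} is given and that each element ρ ∈ S is specified by the coordinate ξ ∈ ℝ^n as ρ ≡ ρ_ξ. According to Equations (20) and (25), the mixture and exponential representations of the natural basis vectors are (∂_i)^(m) = ∂_i ρ_ξ and (∂_i)^(e) = ∂_i log ρ_ξ. Therefore, the dual structure (g, ∇^(e), ∇^(m)) with respect to an arbitrary coordinate system {ξ^i} reads as follows:

$$ g_{ij} = \mathrm{Tr}\big( \partial_i \rho_\xi \, \partial_j \log \rho_\xi \big), \qquad \Gamma^{(e)}_{ij,k} = \mathrm{Tr}\big( \partial_i \partial_j \log \rho_\xi \, \partial_k \rho_\xi \big), \qquad \Gamma^{(m)}_{ij,k} = \mathrm{Tr}\big( \partial_i \partial_j \rho_\xi \, \partial_k \log \rho_\xi \big). $$

A generalized covariance is a family {⟨·, ·⟩_ρ | ρ ∈ S} of inner products on the space 𝒜 of Hermitian operators on the Hilbert space H, where ⟨A, B⟩_ρ depends smoothly on ρ for all A, B ∈ 𝒜, that satisfies the following properties:
• For every unitary matrix U on the Hilbert space H, ⟨UAU*, UBU*⟩_{UρU*} = ⟨A, B⟩_ρ.
• If the Lie bracket [ρ, A] = 0, then ⟨A, B⟩_ρ = Tr(ρAB).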
This can be viewed as a quantum version of the L²-product of random variables A and B with respect to a probability measure p. Since E_p[AB] is the covariance of A and B when their expectations vanish, we can call a family {⟨·, ·⟩_ρ | ρ ∈ S} satisfying the above conditions a generalized covariance. According to the theory by Eguchi, a divergence function D : M × M → ℝ induces a dual structure (g, ∇, ∇*) on M in the way expressed by Equations (4) and (5). It turns out that the connections ∇ and ∇* obtained in this way are torsion-free (or symmetric) [13]. To use the canonical divergence in Equation (7) in the quantum setting, we are then forced to select the Bogoliubov inner product for providing the quantum analog of the Fisher metric, the (m)-connection and the (e)-connection on the manifold of positive density operators. Indeed, while the (m)-connection is always torsion-free, the (e)-connection induced on S by a generalized covariance is symmetric if and only if this covariance is the Bogoliubov inner product.
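The two defining properties of a generalized covariance can be verified numerically for the Bogoliubov inner product. The sketch below assumes NumPy and evaluates ⟨A, B⟩_ρ = ∫₀¹ Tr(ρ^λ A ρ^{1−λ} B) dλ in the eigenbasis of ρ, where the λ-integral of each eigenvalue pair becomes a logarithmic mean. The random test matrices and all function names are illustrative assumptions.

```python
import numpy as np

def bogoliubov(rho, a, b):
    """<a, b>_rho = int_0^1 Tr(rho^s a rho^(1-s) b) ds, computed in the eigenbasis of rho."""
    w, v = np.linalg.eigh(rho)
    lw = np.log(w)
    dw = w[:, None] - w[None, :]
    dl = lw[:, None] - lw[None, :]
    close = np.abs(dw) < 1e-8
    # Logarithmic mean of eigenvalue pairs: int_0^1 w_i^s w_j^(1-s) ds.
    logmean = np.where(close, w[:, None], dw / np.where(close, 1.0, dl))
    at = v.conj().T @ a @ v
    bt = v.conj().T @ b @ v
    return float(np.sum(logmean * at * bt.T).real)

rng = np.random.default_rng(1)

def random_hermitian(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

d = 3
h = random_hermitian(d)
m = h @ h + 0.5 * np.eye(d)            # positive definite Hermitian matrix
rho = m / np.trace(m).real             # hypothetical full-rank density matrix
a, b = random_hermitian(d), random_hermitian(d)

# Commuting case: a_c is a function of rho, so [rho, a_c] = 0.
a_c = rho @ rho
lhs_commuting = bogoliubov(rho, a_c, b)
rhs_commuting = float(np.trace(rho @ a_c @ b).real)

# Unitary invariance, with a unitary taken from a QR decomposition.
u, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
ud = u.conj().T
lhs_unitary = bogoliubov(u @ rho @ ud, u @ a @ ud, u @ b @ ud)
rhs_unitary = bogoliubov(rho, a, b)

print(lhs_commuting, rhs_commuting)
print(lhs_unitary, rhs_unitary)
```

Each printed pair should agree up to floating-point error. The eigenbasis formula avoids a numerical quadrature over λ, a design choice of this sketch rather than something prescribed by the text.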

Canonical Divergence on the Manifold of Quantum States
In this section we show that the divergence function of Equation (7) reduces to the quantum relative entropy whenever the dual structure (g, ∇ (m) , ∇ (e) ) on S is given by the Fisher metric (Equation (28)), the mixture connection (Equation (21)) and the exponential connection (Equation (27)).
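Let ρ_1, ρ_2 ∈ S be two density matrices. To compute the divergence D(ρ_1, ρ_2) for quantum states, we consider the (m)-geodesic γ_m(t) = (1 − t) ρ_1 + t ρ_2 [19]. Then, the (m) and (e) representations of the tangent vector γ̇_m(t) are easily computed by means of Equations (20) and (25), respectively:

$$ \big( \dot{\gamma}_m(t) \big)^{(m)} = \rho_2 - \rho_1, \qquad \big( \dot{\gamma}_m(t) \big)^{(e)} = \frac{d}{dt} \log \gamma_m(t). $$

From Equations (7) and (28), we then have

$$ D(\rho_1, \rho_2) = \int_0^1 t \, \mathrm{Tr}\Big[ (\rho_2 - \rho_1) \, \frac{d}{dt} \log \gamma_m(t) \Big] \, dt. \tag{33} $$

Let us recall that γ_m(t) is a curve in the space of density matrices and that the logarithm of a positive matrix is a well-defined matrix. Therefore, the derivative with respect to t of log γ_m(t) is viewed as the matrix of the derivatives of the entries of log γ_m(t) with respect to t. The same holds for the integration of a matrix: this is the matrix of the integrals of the entries. Finally, since the trace is a linear operator, it commutes with the integration. Hence, with the abuse of notation where we keep γ_m instead of the entry (γ_m)_ij, the computation in Equation (33) is performed as follows by integration by parts:

$$ D(\rho_1, \rho_2) = \mathrm{Tr}\Big[ (\rho_2 - \rho_1) \Big( \big[ t \log \gamma_m(t) \big]_0^1 - \int_0^1 \log \gamma_m(t) \, dt \Big) \Big] = \mathrm{Tr}\big[ (\rho_2 - \rho_1) \log \rho_2 \big] - \int_0^1 \mathrm{Tr}\big[ \dot{\gamma}_m(t) \log \gamma_m(t) \big] \, dt = \mathrm{Tr}\big[ (\rho_2 - \rho_1) \log \rho_2 \big] - \Big[ \mathrm{Tr}\big( \gamma_m(t) \log \gamma_m(t) \big) \Big]_0^1 = \mathrm{Tr}(\rho_1 \log \rho_1) - \mathrm{Tr}(\rho_1 \log \rho_2), $$

where we use (d/dt) Tr(γ_m log γ_m) = Tr(γ̇_m log γ_m), which holds because Tr(γ_m (d/dt) log γ_m) = Tr γ̇_m = 0. This proves that D(ρ_1, ρ_2) = Tr(ρ_1 (log ρ_1 − log ρ_2)), which is the quantum relative entropy given by Equation (12).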
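The dual divergence of D(ρ_1, ρ_2) is computed by considering the (e)-geodesic connecting ρ_1 and ρ_2. Let ρ_1 = e^H, where H is a self-adjoint Hamiltonian. Then, the (e)-geodesic from ρ_1 to ρ_2 is given by

$$ \gamma_e(t) = \frac{e^{H + tA}}{\mathrm{Tr}\, e^{H + tA}}, $$

where A = log ρ_2 − log ρ_1 and e^{H+tA} denotes the matrix exponential [19]. Since the trace operator is linear in its argument, it commutes with the derivative operator. Therefore, according to Equations (20) and (25), we obtain that the (m) and (e) representations of γ̇_e(t) are given by

$$ \big( \dot{\gamma}_e(t) \big)^{(m)} = \frac{d}{dt} \gamma_e(t), \qquad \big( \dot{\gamma}_e(t) \big)^{(e)} = \frac{d}{dt} \log \gamma_e(t) = A - \mathrm{Tr}\big( \gamma_e(t) A \big) \, I. $$

The dual divergence of D(ρ_1, ρ_2) is written as follows:

$$ D^*(\rho_1, \rho_2) = \int_0^1 t \, \mathrm{Tr}\Big[ \dot{\gamma}_e(t) \Big( A - \mathrm{Tr}\big( \gamma_e(t) A \big) \, I \Big) \Big] \, dt. \tag{37} $$

To perform the computation in Equation (37), we observe that Tr γ̇_e(t) = 0, since Tr γ_e(t) = 1 for all t. At this point, we can use the linearity of the trace operator, and the latter expression reduces to:

$$ D^*(\rho_1, \rho_2) = \int_0^1 t \, \mathrm{Tr}\big[ \dot{\gamma}_e(t) \, A \big] \, dt. $$

Carrying out the integration by parts, we obtain

$$ D^*(\rho_1, \rho_2) = \Big[ t \, \mathrm{Tr}\big( \gamma_e(t) A \big) \Big]_0^1 - \int_0^1 \mathrm{Tr}\big( \gamma_e(t) A \big) \, dt = \mathrm{Tr}(\rho_2 A) - \Big[ \log \mathrm{Tr}\, e^{H + tA} \Big]_0^1 = \mathrm{Tr}\, \rho_2 \, (\log \rho_2 - \log \rho_1), $$

where we use Tr(γ_e(t) A) = (d/dt) log Tr e^{H+tA} and Tr ρ_1 = Tr ρ_2 = 1. This proves that D*(ρ_1, ρ_2) = Tr ρ_2 (log ρ_2 − log ρ_1) = D(ρ_2, ρ_1).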
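The results of this section can be checked numerically on random full-rank density matrices. The sketch below assumes NumPy and makes three illustrative choices not found in the text: Gauss-Legendre quadrature on [0, 1], the divided-difference formula for the directional derivative of the matrix logarithm, and a central finite difference for the velocity of the (e)-geodesic. All function names are hypothetical.

```python
import numpy as np

def funm_h(a, f):
    """Apply a scalar function f to a Hermitian matrix a through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Quantum relative entropy Tr rho (log rho - log sigma)."""
    return float(np.trace(rho @ (funm_h(rho, np.log) - funm_h(sigma, np.log))).real)

def dlog(gamma, x):
    """Directional derivative of the matrix logarithm at gamma in the direction x."""
    w, v = np.linalg.eigh(gamma)
    lw = np.log(w)
    dw = w[:, None] - w[None, :]
    dl = lw[:, None] - lw[None, :]
    close = np.abs(dw) < 1e-8
    # Divided differences of log on the eigenvalues (1 / w_i on the diagonal).
    kernel = np.where(close, 1.0 / w[:, None], dl / np.where(close, 1.0, dw))
    return v @ (kernel * (v.conj().T @ x @ v)) @ v.conj().T

def random_state(rng, d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = a @ a.conj().T + 0.5 * np.eye(d)   # keep the state away from the boundary
    return m / np.trace(m).real

rng = np.random.default_rng(0)
rho1, rho2 = random_state(rng, 3), random_state(rng, 3)

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(60)
ts, ws = 0.5 * (nodes + 1.0), 0.5 * weights

# D(rho1, rho2): integrate t * Tr[(rho2 - rho1) d/dt log gamma_m(t)].
x = rho2 - rho1
D = sum(w * t * np.trace(x @ dlog((1 - t) * rho1 + t * rho2, x)).real
        for t, w in zip(ts, ws))

# D*(rho1, rho2): integrate t * Tr[gamma_e'(t) (A - Tr(gamma_e(t) A) I)].
H = funm_h(rho1, np.log)
A = funm_h(rho2, np.log) - H

def gamma_e(t):
    g = funm_h(H + t * A, np.exp)
    return g / np.trace(g).real

def e_speed(t, h=1e-5):
    g = gamma_e(t)
    g_dot = (gamma_e(t + h) - gamma_e(t - h)) / (2 * h)   # central finite difference
    e_rep = A - np.trace(g @ A).real * np.eye(len(A))
    return np.trace(g_dot @ e_rep).real

D_star = sum(w * t * e_speed(t) for t, w in zip(ts, ws))

print(D, rel_entropy(rho1, rho2))
print(D_star, rel_entropy(rho2, rho1))
```

The first pair of numbers should agree closely, and the second pair up to the finite-difference error, in line with D(ρ_1, ρ_2) = Q(ρ_1, ρ_2) and D*(ρ_1, ρ_2) = Q(ρ_2, ρ_1).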

Conclusions
As we have demonstrated, a canonical divergence is important for a geometric definition of a general complexity measure. This paper is based on recent progress in defining a general canonical divergence within information geometry [9,12]. This divergence is defined in terms of geodesic integration of the inverse exponential map and has the geodesic projection property when the structure (g, ∇, ∇*) is dually flat [3]. Let p ∈ M and let M′ ⊂ M be a submanifold of M; the search for p̂ ∈ M′ that minimizes the divergence D(p, q), q ∈ M′, supplies the solution for defining an information-geometric complexity measure as in Equation (2). When every minimizer p̂ of the divergence D is given by the geodesic projection of p onto M′, we say that D has the geodesic projection property. In this regard, the canonical divergence in Equation (7) would provide a measure of complexity as in Equation (2) for a quite wide range of systems. A further step for defining Equation (2) for general systems has been put forward in [12], where a new divergence is introduced that turns out to be a generalization of the canonical divergence in Equation (7). As an example of Equation (2), we have considered the measure of complexity given by Equation (10), which quantifies how much a probability measure on the product configuration set of the finitely many states on a discrete set {1, …, n} deviates from a family of exponential probabilities that amounts to the non-complex set of system states, as it is given by non-interacting states [2]. In this case, the Kullback-Leibler divergence turns out to be suitable for providing the measure of complexity in Equation (2) for classical states on discrete sets [4].
To put the theory of Ay [2] in perspective and propose the canonical divergence in Equation (7) as suitable for supplying the complexity in Equation (2) on general systems, we have then proved that D coincides with the (KL)-divergence on the simplex of probability measures endowed with the dual structure given by the Fisher metric and the mixture and exponential connections.
The quantum counterpart of the general theory yielding the measure of complexity in Equation (2) does not yet exist. However, a quantum analog of Equation (10) has been established on the manifold of positive density operators [14]. Here, the family of non-interacting states is replaced by states that are fully described by their restriction to selected subsystems, which turn out to form a family of Gibbs states. Many-party correlations are then those correlations in the state of a composite quantum system that cannot be observed in subsystems composed of fewer than a given number of parties. The suitable tool for providing such a quantification is the quantum relative entropy. This is because the maximum-entropy principle solves the inverse problem of reconstructing a global state from subsystem states and also gives a natural scale of many-party correlation in terms of the gap to the maximal entropy value. Hence, the many-party correlation of a quantum state is quantified by the divergence from a family of Gibbs states. The many-party correlation in Equation (11) has been implemented in algorithms [15] and has been shown to be related to the entanglement of quantum systems as defined in [16]. To consider the canonical divergence in Equation (7) as an efficient tool for extending the general theory leading to Equation (2), we have considered D on the manifold of positive density operators with the quantum analog of the Fisher metric and the (m) and (e) connections induced by the Bogoliubov inner product. We have finally proved that the canonical divergence coincides with the quantum relative entropy.