Canonical Divergence for Measuring Classical and Quantum Complexity

Domenico Felice; Stefano Mancini; Nihat Ay

doi:10.3390/e21040435

¹ Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany

² School of Science and Technology, University of Camerino, I-62032 Camerino, Italy

³ INFN-Sezione di Perugia, Via A. Pascoli, I-06123 Perugia, Italy

⁴ Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA

Entropy 2019, 21(4), 435; https://doi.org/10.3390/e21040435

This article belongs to the Special Issue Quantum Entropies and Complexity

Abstract

A new canonical divergence is put forward for generalizing an information-geometric measure of complexity for both classical and quantum systems. On the simplex of probability measures, it is proved that the new divergence coincides with the Kullback–Leibler divergence, which is used to quantify how much a probability measure deviates from the non-interacting states that are modeled by exponential families of probabilities. On the space of positive density operators, we prove that the same divergence reduces to the quantum relative entropy, which quantifies many-party correlations of a quantum state from a Gibbs family.

Keywords:

riemannian geometries; differential geometry; quantum information

1. Introduction

Methods of information geometry apply to the science of complexity for both classical and quantum systems [1]. Among these applications, an information-geometric approach to complexity, understood as the extent to which an object, as a whole, is more than the sum of its parts, was established in [2] and then developed to relate various known measures of complexity to a general class of information-geometric complexity measures (see [3] for a comprehensive overview of this topic). The general idea for quantifying the extent to which a system is more than the sum of its parts is the following. Let

S

be a set of systems; for any system

S \in S

, we consider the collection of its parts, which may be an element of a set

S_{0}

that formally differs from

S

. The corresponding assignment

Π : S \to S_{0}

can be interpreted as a reduced description of the system

S

in terms of its parts. Having the parts

Π (S)

, we have to reconstruct

S

by taking the sum of the parts in order to obtain a system that can be compared with the original system. The corresponding construction map is denoted by

Π : S_{0} \to S

. The composition

P (S) : = (Π \circ Π) (S)

then corresponds to the sum of parts of the system

S

, and we can compare

S

with

P (S)

. It turns out that

P

, under natural conditions, is the projection

P : S \to N

to the set of non-complex systems

N : = {S \in S | P (S) = S}

[4]. Therefore, the quantification of how much the system

S

differs from

P (S)

is established by a divergence function

D : S \times S \to R

such that

D (S, S^{'}) \geq 0, D (S, S^{'}) = 0 iff S = S^{'} .

(1)

Finally, the complexity of a system

S

is defined by

C (S) : = D (S, P (S)) .

(2)

Clearly, there are many choices for the divergence

D

, thus such a complexity measure is far from being unique. However, to ensure compatibility with

P

, one has to further assume that

D

satisfies

C (S) = D (S, P (S)) = inf_{S^{'} \in N} D (S, S^{'}) .

(3)

This is where a canonical divergence comes in: it provides an information-geometric measure of complexity that can be regarded as unique.

In the framework of information geometry, a dual structure

(g, \nabla, \nabla^{*})

on a smooth manifold

M

is given in terms of a metric tensor and two affine connections, which are dual in the following sense [5]:

X g (Y, Z) = g (\nabla_{X} Y, Z) + g (Y, \nabla_{X}^{*} Z), \forall X, Y, Z \in T (M),

where

T (M)

denotes the space of sections of the tangent bundle (vector fields) on

M

. Eguchi named a function

D : M \times M \to R

satisfying the property in Equation (1) as a contrast (or divergence) function whenever

D

allows recovering the dual structure

(g, \nabla, \nabla^{*})

on

M

in the following way [6]:

\begin{matrix} g_{i j} (p) = - {\partial_{i} \partial_{j}^{'} D (ξ_{p}, ξ_{q})|}_{p = q} = {\partial_{i}^{'} \partial_{j}^{'} D (ξ_{p}, ξ_{q})|}_{p = q} \end{matrix}

(4)

\begin{matrix} Γ_{i j k} (p) = - {\partial_{i} \partial_{j} \partial_{k}^{'} D (ξ_{p}, ξ_{q})|}_{p = q}, Γ_{i j k}^{*} (p) = - {\partial_{i}^{'} \partial_{j}^{'} \partial_{k} D (ξ_{p}, ξ_{q})|}_{p = q}, \end{matrix}

(5)

where

\partial_{i} = \frac{\partial}{\partial ξ_{p}^{i}} and \partial_{i}^{'} = \frac{\partial}{\partial ξ_{q}^{i}}

and

{ξ_{p} : = (ξ_{p}^{1}, \dots, ξ_{p}^{n})}

and

{ξ_{q} : = (ξ_{q}^{1}, \dots, ξ_{q}^{n})}

are local coordinate systems of

p

and

q

, respectively. Here,

Γ_{i j k} = g (\nabla_{\partial_{i}} \partial_{j}, \partial_{k})

and

Γ_{i j k}^{*} = g (\nabla_{\partial_{i}}^{*} \partial_{j}, \partial_{k})

are the connection symbols of

\nabla

and

\nabla^{*}

, respectively. The search for a divergence function that recovers a given dualistic structure on a smooth manifold is usually referred to as the inverse problem in information geometry. Matumoto [7] showed that such a divergence exists for any statistical manifold. However, it is not unique: infinitely many divergences give the same dual structure. Hence, the search for a divergence that can be considered the most natural one is of utmost importance. For dually flat manifolds, Amari and Nagaoka [5] introduced a Bregman-type divergence to this end, with relevant properties concerning the generalized Pythagorean theorem and the geodesic projection theorem. This is referred to as the canonical divergence, and it is commonly regarded as the natural solution of the inverse problem in information geometry for dually flat manifolds. However, a general canonical divergence, which applies to any dualistic structure, is still needed, as pointed out in [8]. In any case, such a divergence should recover the canonical divergence of Bregman type when applied to a dually flat structure. In addition, in the self-dual case where

\nabla = \nabla^{*}

coincides with the Levi–Civita connection of

g

, the divergence

D

should be one half of the squared Riemannian distance:

D (p, q) = \frac{1}{2} d {(p, q)}^{2}

[3]. In the context of the information-geometric approach to complexity, a further requirement is needed to ensure the compatibility in Equation (3). This is the geodesic projection property, which, in the present context, states that every minimizer

P (S)

of

D

is achieved by the geodesic projection of

S

onto the set of non-complex systems. In [9], Ay and Amari recently introduced a canonical divergence that satisfies all these requirements. Such a divergence is defined in terms of geodesic integration of the inverse exponential map. More precisely, given

p, q \in M

and the

\nabla

-geodesic

\tilde{σ} (t) (0 \leq t \leq 1)

connecting

q

with

p

, the canonical divergence introduced in [9] is given by

D (p, q) : = \int_{0}^{1} {⟨X_{t} (p), \dot{\tilde{σ}} (t)⟩}_{\tilde{σ} (t)} d t, X_{t} (p) : = \exp_{\tilde{σ} (t)}^{- 1} (p) .

(6)

Here,

\exp : T M \to M

denotes the exponential map of

\nabla

, which is defined by

\exp (X) = σ_{X} (1)

whenever the

\nabla

-geodesic

σ_{X} (t)

, satisfying

{\dot{σ}}_{X} (0) = X

, exists on an interval of

t

containing

[0, 1]

. Therefore, if

σ (t) (0 \leq t \leq 1)

is the

\nabla

-geodesic such that

σ (0) = p

and

σ (1) = q

, then

\exp_{p}^{- 1} (q) : = \dot{σ} (0)

. According to this definition, we have that

X_{t} (p) = P_{σ (t)} X_{p} (σ (t)) = t \dot{σ} (t)

, where

P

is the

\nabla

-parallel transport from

p

to

σ (t)

. This implies that the divergence

D (p, q)

assumes the following useful expression:

D (p, q) = \int_{0}^{1} t {∥ \dot{σ} (t) ∥}^{2} d t .

(7)

Analogously, the dual function of

D (p, q)

is defined as the

\nabla^{*}

-geodesic integration of the inverse of the

\nabla^{*}

-exponential map [9]. Therefore, we have for the dual divergence

D^{*}

a similar expression as Equation (7) for the canonical divergence

D

:

D^{*} (p, q) = \int_{0}^{1} t {∥ {\dot{σ}}^{*} (t) ∥}^{2} d t,

(8)

where

σ^{*} (t) (0 \leq t \leq 1)

is the

\nabla^{*}

-geodesic connecting

p

with

q

. Therefore, the compatibility in Equation (3) of

D

with

P

suggests that the projection

P (S)

of a system

S

onto the space of non-complex systems can be achieved along the geodesic connecting

S

with

P (S)

. Actually, it has recently been proved that the

\nabla

-geodesic minimizes the action integral of a suitably chosen kinetic energy [10]. An analogous result holds about the

\nabla^{*}

-geodesic. In this way, both divergences,

D (p, q)

and

D^{*} (p, q)

, turn out to solve the Hamilton–Jacobi problem in information geometry, as put forward in [11].
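As a quick consistency check of Equation (7), here is a short worked computation; it is a sketch added for illustration and assumes that the geodesic is length-minimizing. In the self-dual case, the connection is the Levi–Civita connection of the metric, so a geodesic from p to q is traversed at constant speed, and under the stated assumption this speed equals d (p, q). Equation (7) then gives

D (p, q) = \int_{0}^{1} t {∥ \dot{σ} (t) ∥}^{2} d t = {d (p, q)}^{2} \int_{0}^{1} t d t = \frac{1}{2} {d (p, q)}^{2},

which agrees with the requirement stated above for the self-dual case.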

The search for a general canonical divergence is still an open problem, and it is of utmost importance in the context of the information-geometric approach to complexity (see the progress along this avenue reported in [9,12]).

In this article, we propose the canonical divergence in Equation (7) as an effective tool for providing a unified definition of complexity measures. To this end, we first consider

D

on the simplex of probability distributions where a measure of complexity as one instance of Equation (2) is supplied in terms of the Kullback–Leibler (KL)-divergence [4].

The general method described for defining the complexity measure in Equation (2) can be specialized to systems consisting of a finite node set

V

in which each node

v \in V

can be in finitely many states

I_{v}

. Then, we model the whole system as a probability measure

p

on the corresponding product configuration set

I_{V} = \prod_{v \in V} I_{v}

. The parts are given by marginals

p_{A}

where

A

is taken from a set of subsets of

V

, denoted by

S

. Therefore, the decomposition map

Π

reads in this case as

Π (p) = {(p_{A})}_{A \in S}

, whereas the reconstruction map

Π

is defined by the maximum entropy estimate

\hat{p}

of

p

, leading to the projection

π_{S} : p \mapsto \hat{p}

. The image of

π_{S}

turns out to be the closure of an exponential family

E_{S}

, which plays the role of the set

N

of non-complex systems. A deviation measure, which is compatible with the maximum entropy projection

π_{S}

is then the (KL)-divergence, which is defined by

KL (p, q) : = \sum_{i = 1}^{n + 1} p_{i} \log (\frac{p_{i}}{q_{i}})

(9)

on the

n

-simplex

P_{n} = {p = (p_{1}, \dots, p_{n + 1}) | p_{i} > 0, \sum_{i} p_{i} = 1}

[6]. Finally, the measure of complexity as one instance of Equation (2) is obtained by

KL (p, E_{S}) : = inf_{q \in E_{S}} KL (p, q) = KL (p, \hat{p}) .

(10)

We may notice that, if

S

consists of all subsets of

V

of cardinality

1

, elements of the set

E_{S}

of non-complex systems are totally uncorrelated in the sense that

q \in E_{S}

has the product form

q = q_{1} \otimes \dots \otimes q_{n}

[2]. Consider random variables

X_{1}, \dots, X_{n}

with joint probability distribution

p

and marginal probability distributions

p_{1}, \dots, p_{n}

. Then, we have

KL (p, E_{S}) = KL (p, p_{1} \otimes \dots \otimes p_{n}) = \sum_{i} H (X_{i}) - H (X_{1}, \dots, X_{n}),

where

H

is the Shannon entropy. This quantity is referred to as the multi-information and denoted by

I (X_{1}, \dots, X_{n})

. In particular, when

n = 2

, this is nothing but the mutual information. Remarkably, the minimizer

\hat{p}

in the closure of

E_{S}

of the (KL)-divergence, namely

KL (p, \hat{p}) = inf_{q \in E_{S}} KL (p, q)

, is obtained by projecting

p

onto the closure of

E_{S}

along a mixture

(m)

-geodesic [13]. This is usually referred to as the geodesic projection property of the (KL)-divergence. The geometric structure given by the Fisher metric, the mixture

(m)

and exponential

(e)

affine connections was introduced by Amari and Nagaoka on the space of probability densities for studying statistical estimation problems [5].
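The identity between the KL-divergence from the product of marginals and the difference of entropies can be checked numerically. The following is a minimal sketch in plain Python; the joint distribution and all function names are hypothetical and chosen only for illustration.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis`."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out

def multi_information(joint):
    """Sum of the marginal entropies minus the joint entropy."""
    n = len(next(iter(joint)))
    return sum(entropy(marginal(joint, i)) for i in range(n)) - entropy(joint)

def kl_to_product(joint):
    """KL-divergence of Equation (9) from the joint to the product of its marginals."""
    n = len(next(iter(joint)))
    margs = [marginal(joint, i) for i in range(n)]
    total = 0.0
    for outcome, p in joint.items():
        if p > 0:
            q = math.prod(margs[i][outcome[i]] for i in range(n))
            total += p * math.log(p / q)
    return total

# Hypothetical joint distribution of two correlated bits (illustration only).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(multi_information(joint), kl_to_product(joint))
```

For two variables, both numbers are the mutual information; for a product distribution, both vanish up to rounding.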

In this article, we then consider both divergences,

D

and

D^{*}

, on

P_{n}

endowed with the dualistic structure given by the classical Fisher metric and the mixture

(m)

and the exponential

(e)

connections. Here, we show that

D (q, p) = KL (q, p) = D^{*} (p, q)

. Actually, this result has already been shown in [9]. However, we prove it differently by relying on the nice representations of

D

and

D^{*}

given by Equations (7) and (8), respectively. This proves that

D

can be interpreted as a generalization of the (KL)-divergence.

A further step for proving the effectiveness of

D

is to consider it (and its dual function) on the manifold of quantum states, where the general idea for defining a complexity measure of a classical system, expressed by Equation (2), has been extended to the quantum setting in terms of the quantum relative entropy [14]. More precisely, by considering a composite set of

n \in N

units (or parties, or particles),

[n] : = {1, \dots, n}

, the composite system is described by the product algebra

A_{[n]} : = A_{1} \otimes \dots \otimes A_{n}

. Here,

A_{i} \subset M_{n_{i}}

is the

C^{*}

-subalgebra of complex

n_{i} \times n_{i}

matrices such that the identity

I_{n_{i}} \in A_{i}

. Many-party correlations are those correlations in the state of a composite quantum system that cannot be observed in subsystems composed of fewer than a given number of parties. In this context, the exponential families, which constitute the non-complex systems in the classical case, are replaced by states that are fully described by their restriction to selected subsystems. These correspond to the family of Gibbs states

E_{k} : = {e^{H_{k}} / Tr e^{H_{k}}}

of the

k

-local Hamiltonians

H_{k}

. Here, a

k

-local Hamiltonian is defined as a sum of product terms

a_{1} \otimes \dots \otimes a_{n}

with at most

k

non-scalar factors

a_{i}

, where

a_{i}

denotes a real self-adjoint operator. Therefore, the measure of the many-party correlations of a composite quantum state

ρ \in A_{[n]}

that captures all correlations in

ρ

that cannot be observed in any

k

-party subsystem is the divergence

Q (ρ, E_{k}) : = inf_{σ \in E_{k}} Q (ρ, σ)

(11)

from the Gibbs family

E_{k}

[14]. Here, the divergence

Q (ρ, σ)

is the quantum relative entropy defined by

Q (ρ, σ) = Tr ρ (\log ρ - \log σ),

(12)

where

Tr

denotes the trace operator on the finite-dimensional Hilbert space of density matrices. Similar to the classical case, we can consider the family

E_{1}

of Gibbs states whose closure corresponds to the set of product states

σ_{1} \otimes \dots \otimes σ_{n}

. Consider then a composite quantum state

ρ \in A_{[n]}

such that

Tr (σ_{i} a) = Tr (ρ (a \otimes I_{[n] \ {i}})),

where

a \in A_{{i}} = A_{i}

and

I_{[n] \ {i}}

is the identity operator on the product

A_{1} \otimes \dots {\hat{A}}_{i} \dots \otimes A_{n}

where

A_{i}

is missing. In this case, the many-party correlations of

ρ

are given by the quantum multi-information:

Q (ρ, E_{1}) = \sum_{i} \tilde{H} (σ_{i}) - \tilde{H} (ρ),

where

\tilde{H} (ρ) = - Tr (ρ \log ρ)

is the von Neumann entropy of

ρ

. In particular, when

n = 2

, this corresponds to the quantum mutual information. Algorithms for the evaluation of

Q (ρ, E_{k})

as a complexity measure for quantum states are studied in [15]. In that context, the many-party correlations are related to the entanglement of quantum systems as defined in [16].
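For two parties, the quantum multi-information above can be compared numerically with the quantum relative entropy of Equation (12) from the product of the reduced states. The following is a minimal NumPy sketch; the two-qubit state and all function names are hypothetical, and the state is chosen full rank so that every matrix logarithm is defined.

```python
import numpy as np

def von_neumann_entropy(rho):
    """-Tr(rho log rho), computed from the eigenvalues of a Hermitian matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-15]
    return float(-np.sum(evals * np.log(evals)))

def herm_log(rho):
    """Matrix logarithm of a positive definite Hermitian matrix via its eigenbasis."""
    evals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log(evals)) @ vecs.conj().T

def partial_traces(rho, d1, d2):
    """Reduced states of a bipartite density matrix on C^d1 (x) C^d2."""
    r = rho.reshape(d1, d2, d1, d2)
    rho_1 = np.einsum("ijkj->ik", r)  # trace out the second factor
    rho_2 = np.einsum("ijil->jl", r)  # trace out the first factor
    return rho_1, rho_2

def relative_entropy(rho, sigma):
    """Quantum relative entropy Tr rho (log rho - log sigma), Equation (12)."""
    return float(np.real(np.trace(rho @ (herm_log(rho) - herm_log(sigma)))))

# Hypothetical full-rank two-qubit state: a Bell state mixed with white noise.
bell = np.zeros((4, 4))
bell[np.ix_([0, 3], [0, 3])] = 0.5
rho = 0.7 * bell + 0.3 * np.eye(4) / 4

rho_1, rho_2 = partial_traces(rho, 2, 2)
mutual_info = (von_neumann_entropy(rho_1) + von_neumann_entropy(rho_2)
               - von_neumann_entropy(rho))
print(mutual_info, relative_entropy(rho, np.kron(rho_1, rho_2)))
```

The sketch only evaluates the relative entropy at the product of the marginals; it does not perform the minimization over the Gibbs family itself.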

The scope of the present article is mainly to present the canonical divergence

D

defined in Equation (7) as an important tool for generalizing the complexity measure given by Equation (10) for classical systems as well as the many-party correlations given by Equation (11) for quantum systems. To this end, we consider the space of density matrices endowed with the quantum analog of the Fisher metric and the mixture

(m)

and exponential

(e)

affine connections. This structure turns out to be induced on the manifold of positive density operators by the Bogoliubov inner product [17]. In this setting, we prove that the divergence introduced in [9] reduces to the quantum relative entropy. In addition, we also show that

D (σ, ρ) = Q (σ, ρ) = D^{*} (ρ, σ)

.

The layout of the paper is as follows. Section 2 is devoted to the calculation of the canonical divergence and its dual function on the simplex of probability distributions. In Section 3, we describe the differential geometrical framework for finite quantum systems induced by the Bogoliubov inner product. In this particular framework, we then prove that the divergence given by Equation (7) reduces to the quantum relative entropy. Finally, we draw some conclusions in Section 4 by outlining the results obtained in this work and discussing possible extensions.

2. Canonical Divergence on the Simplex of Probability Measures

A dualistic structure on the simplex of probability measures was introduced by Amari in terms of the Fisher metric, the mixture

(m)

and exponential

(e)

connections [18]. Given a finite set

I = {1, \dots, n}

, we can represent probability measures on the set

I

as elements of

R^{n}

. In this representation, the Dirac measures

δ^{i}, i = 1, \dots, n

form the canonical basis of

R^{n}

. Then, the

(n - 1)

-dimensional simplex of probability measures is given by

S_{n} : = \{p = \sum_{i} p_{i} δ^{i} \in R^{n} | p_{i} > 0 for all i, and \sum_{i} p_{i} = 1\} .

(13)

In this section, we show that the canonical divergence

D (p, q)

coincides with the Kullback–Leibler divergence whenever

p, q \in S_{n}

. In addition, we prove that, for the dual canonical divergence, the following relation

D^{*} (p, q) = KL (q, p)

holds true. According to Equations (7) and (8), we need the Fisher metric defined on the tangent bundle

T S_{n}

, the mixture

(m)

-geodesic and the exponential

(e)

-geodesic both connecting

p

with

q

. On the tangent space

T_{p} S_{n}

, the Fisher metric results in

g_{p} (X, Y) : = \sum_{i} \frac{1}{p_{i}} X^{i} Y^{i}, X, Y \in T_{p} S_{n} .

(14)

The dualistic structure

(g, \nabla, \nabla^{*})

on

S_{n}

, given by the Fisher metric, the

(m)

-connection

\nabla

and the

(e)

-connection

\nabla^{*}

, is dually flat, and the

(m)

- and

(e)

-geodesics connecting

p

with

q

are [3]:

\begin{matrix} γ_{m} (t) = p + t (q - p), t \in [0, 1] \end{matrix}

(15)

\begin{matrix} γ_{e} (t) = \sum_{i} \frac{p_{i} {(\frac{q_{i}}{p_{i}})}^{t}}{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}} δ^{i}, t \in [0, 1] . \end{matrix}

(16)
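Equations (15) and (16) translate directly into code. The following plain-Python sketch evaluates both geodesics for hypothetical endpoints p and q in the interior of the simplex; the function names are illustrative only.

```python
def m_geodesic(p, q, t):
    """Mixture geodesic of Equation (15): a straight line inside the simplex."""
    return [pi + t * (qi - pi) for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """Exponential geodesic of Equation (16): a normalized geometric interpolation."""
    weights = [pi * (qi / pi) ** t for pi, qi in zip(p, q)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]

# Hypothetical endpoints with strictly positive entries (illustration only).
p = [0.2, 0.3, 0.5]
q = [0.5, 0.25, 0.25]

for t in (0.0, 0.5, 1.0):
    print(t, m_geodesic(p, q, t), e_geodesic(p, q, t))
```

Both curves start at p, end at q and stay normalized; at t = 1/2 the exponential geodesic is proportional to the componentwise geometric mean of p and q.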

We are now ready to compute the canonical divergence

D (p, q)

for arbitrary

p, q \in S_{n}

. From Equations (7), (14) and (15), we have that

\begin{matrix} D (p, q) & = & \int_{0}^{1} t {∥ {\dot{γ}}_{m} (t) ∥}_{γ_{m} (t)}^{2} d t \\ & = & \sum_{i} \int_{0}^{1} t \frac{{(q_{i} - p_{i})}^{2}}{p_{i} + t (q_{i} - p_{i})} d t \\ & = & \sum_{i} (q_{i} - p_{i} + p_{i} \log \frac{p_{i}}{q_{i}}) \\ & = & \sum_{i} p_{i} \log \frac{p_{i}}{q_{i}} = KL (p, q), \end{matrix}

(17)

where we use

\sum_{i} (q_{i} - p_{i}) = 0

because

p, q \in S_{n}

. Analogously, we can compute the dual canonical divergence

D^{*} (p, q)

by means of Equation (8). Therefore, by using Equations (14) and (16), we obtain that

\begin{matrix} D^{*} (p, q) & = & \int_{0}^{1} t {∥ {\dot{γ}}_{e} (t) ∥}_{γ_{e} (t)}^{2} d t \\ & = & \sum_{i} \int_{0}^{1} t {\dot{γ}}_{e}^{i} (t) \frac{{\dot{γ}}_{e}^{i} (t)}{γ_{e}^{i} (t)} d t . \end{matrix}

(18)

To develop further the calculation, let us analyze the derivative

{\dot{γ}}_{e}^{i} (t)

. Recall that

γ_{e}^{i} (t) = \frac{p_{i} {(\frac{q_{i}}{p_{i}})}^{t}}{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}} .

Therefore, by taking the derivative of

γ_{e}^{i} (t)

with respect to

t

, we obtain

\begin{matrix} {\dot{γ}}_{e}^{i} (t) & = & \frac{p_{i} {(\frac{q_{i}}{p_{i}})}^{t} \log \frac{q_{i}}{p_{i}}}{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}} - p_{i} {(\frac{q_{i}}{p_{i}})}^{t} \frac{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t} \log \frac{q_{j}}{p_{j}}}{{(\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t})}^{2}} \\ & = & γ_{e}^{i} (t) (\log \frac{q_{i}}{p_{i}} - \frac{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t} \log \frac{q_{j}}{p_{j}}}{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}}) \\ & = & γ_{e}^{i} (t) (\log \frac{q_{i}}{p_{i}} - \frac{d}{d t} \log \sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}) . \end{matrix}

By stepping back to Equation (18), we start by performing an integration by parts:

\begin{matrix} D^{*} (p, q) & = & \sum_{i} ({[γ_{e}^{i} (t) (t \frac{{\dot{γ}}_{e}^{i} (t)}{γ_{e}^{i} (t)})]}_{0}^{1} - \int_{0}^{1} γ_{e}^{i} (t) \frac{{\dot{γ}}_{e}^{i} (t)}{γ_{e}^{i} (t)} d t) + \int_{0}^{1} t \frac{d^{2}}{d t^{2}} \log \sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t} d t, \end{matrix}

(19)

where the last term, which does not depend on the index i because the components of the geodesic sum to one, is obtained by noticing that

\frac{{\dot{γ}}_{e}^{i} (t)}{γ_{e}^{i} (t)} = (\log \frac{q_{i}}{p_{i}} - \frac{d}{d t} \log \sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}) .

Since we know that

\frac{d}{d t} \log \sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t} = \frac{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t} \log \frac{q_{j}}{p_{j}}}{\sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}},

we can observe that

\frac{{\dot{γ}}_{e}^{i} (1)}{γ_{e}^{i} (1)} = (\log \frac{q_{i}}{p_{i}} - \sum_{j} q_{j} \log \frac{q_{j}}{p_{j}})

. Hence, we obtain from Equation (19)

\begin{matrix} D^{*} (p, q) & = & \sum_{i} (q_{i} (\log \frac{q_{i}}{p_{i}} - \sum_{j} q_{j} \log \frac{q_{j}}{p_{j}}) - {[γ_{e}^{i} (t)]}_{0}^{1}) + {[t \frac{d}{d t} \log \sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}]}_{0}^{1} - \int_{0}^{1} \frac{d}{d t} \log \sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t} d t \\ & = & \sum_{i} (q_{i} \log \frac{q_{i}}{p_{i}} - q_{i} \sum_{j} q_{j} \log \frac{q_{j}}{p_{j}} - q_{i} + p_{i}) + \sum_{j} q_{j} \log \frac{q_{j}}{p_{j}} - {[\log \sum_{j} p_{j} {(\frac{q_{j}}{p_{j}})}^{t}]}_{0}^{1} \\ & = & \sum_{i} q_{i} \log \frac{q_{i}}{p_{i}} - \sum_{i} q_{i} \sum_{j} q_{j} \log \frac{q_{j}}{p_{j}} - \sum_{i} q_{i} + \sum_{i} p_{i} + \sum_{j} q_{j} \log \frac{q_{j}}{p_{j}} - \log \sum_{j} q_{j} + \log \sum_{j} p_{j} \\ & = & \sum_{i} q_{i} \log \frac{q_{i}}{p_{i}}, \end{matrix}

because

p, q \in S_{n}

. This proves that

D^{*} (p, q) = KL (q, p) = D (q, p) .
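The two identities of this section can be checked by integrating Equations (7) and (8) numerically along the geodesics of Equations (15) and (16). The following is a minimal plain-Python sketch; the composite Simpson rule, the endpoints and the function names are choices made here for illustration and are not part of the original argument.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence of Equation (9)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def simpson(f, n=2000):
    """Composite Simpson rule on [0, 1] with an even number n of subintervals."""
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3

def m_speed_sq(p, q, t):
    """Fisher squared norm (Equation (14)) of the velocity of the m-geodesic."""
    return sum((qi - pi) ** 2 / (pi + t * (qi - pi)) for pi, qi in zip(p, q))

def e_speed_sq(p, q, t):
    """Fisher squared norm of the velocity of the e-geodesic."""
    weights = [pi * (qi / pi) ** t for pi, qi in zip(p, q)]
    z = sum(weights)
    gamma = [w / z for w in weights]
    logs = [math.log(qi / pi) for pi, qi in zip(p, q)]
    mean = sum(g * l for g, l in zip(gamma, logs))
    # The velocity is gamma^i * (log(q_i / p_i) - mean), so the squared
    # Fisher norm is the variance of log(q / p) under gamma.
    return sum(g * (l - mean) ** 2 for g, l in zip(gamma, logs))

def canonical_divergence(p, q):
    """Equation (7) along the m-geodesic from p to q."""
    return simpson(lambda t: t * m_speed_sq(p, q, t))

def dual_canonical_divergence(p, q):
    """Equation (8) along the e-geodesic from p to q."""
    return simpson(lambda t: t * e_speed_sq(p, q, t))

# Hypothetical points in the interior of the simplex (illustration only).
p = [0.2, 0.3, 0.5]
q = [0.5, 0.25, 0.25]
print(canonical_divergence(p, q), kl(p, q))
print(dual_canonical_divergence(p, q), kl(q, p))
```

Up to quadrature error, the first pair of numbers agrees with Equation (17) and the second pair with the relation for the dual divergence.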

3. Geometric Structure of a Manifold of Quantum States

We start this section by showing that natural analogs of the Fisher metric and the exponential and mixture connections are defined on a manifold of quantum states [17]. To this end, we need to specify an inner product on the space of density operators. Since the divergence

D

of Equation (7) is defined on a statistical manifold

(M, g, \nabla, \nabla^{*})

with symmetric connections, we choose the Bogoliubov inner product. This is because of a well-known result that claims the

(e)

-connection induced by a generalized covariance is symmetric if and only if such a covariance is the Bogoliubov inner product [5]. At the end of this section, we motivate this choice in more detail.

Let

H

be a finite-dimensional Hilbert space,

A = {A | A = A^{*}}

be the space of all the Hermitian operators on

H

and

S = {ρ | ρ = ρ^{*} > 0, Tr ρ = 1}

be the space of positive density operators on

H

. Since

S

is an open subset of

A_{1} : = {A | A = A^{*}, Tr A = 1}

, it can naturally be regarded as a smooth manifold of dimension

n = {(\dim H)}^{2} - 1

[17]. Let

D \in T_{ρ} S

be a tangent vector at

ρ

to

S

; we call

D^{(m)} \in A_{0} : = {A | A \in A, Tr A = 0}

its

(m)

-representation and symbolically write

D^{(m)} = D ρ .

(20)

It is worth noticing that, as an element of the tangent space,

D

can be naturally interpreted as a derivative. As an example, when a coordinate system

{θ^{i}}

is given on

S

so that each state is parameterized as

ρ \equiv ρ_{θ}

, the

(m)

-representation of the natural basis vector is written as

{(\partial_{i})}^{(m)} = \partial_{i} ρ_{θ}

, where

D = \partial_{i} = \partial / \partial θ^{i}

. This allows us to introduce the

(m)

-connection on the manifold

S

of the quantum states in terms of the covariant derivative

\nabla^{(m)} : T (S) \times T (S) \to T (S)

, which is defined by the following relation:

{(\nabla_{X}^{(m)} Y)}^{(m)} = X (Y^{(m)}), \forall X, Y \in T (S),

(21)

where the right-hand side denotes the derivative with respect to

X

of

Y^{(m)} : S \to A_{0}

and

T (S)

denotes the space of sections of the tangent bundle (vector fields) on

S

.

To introduce the

(e)

-connection on

S

, we need to specify a family

{{⟨ \cdot, \cdot ⟩}_{ρ} | ρ \in S}

of inner products on

A

usually called a generalized covariance. For the reason mentioned above, we consider the Bogoliubov inner product, which is given by

{⟨A, B⟩}_{ρ} : = \int_{0}^{1} Tr (ρ^{λ} A ρ^{1 - λ} B) d λ, \forall A, B \in A .

(22)
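Equation (22) can be evaluated in closed form in the eigenbasis of the state: integrating x^λ y^{1 - λ} over λ in [0, 1] gives the logarithmic mean (x - y) / (\log x - \log y), which equals x when x = y. The following NumPy sketch uses this observation and cross-checks it against a direct Gauss–Legendre quadrature of Equation (22); the matrices and function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def bogoliubov_inner(rho, a, b):
    """Bogoliubov inner product of Equation (22), evaluated in the eigenbasis of rho."""
    evals, vecs = np.linalg.eigh(rho)
    x, y = np.meshgrid(evals, evals, indexing="ij")
    diff = np.log(x) - np.log(y)
    close = np.abs(diff) < 1e-8
    # Logarithmic mean of each pair of eigenvalues, with the limit x on the diagonal.
    kernel = np.where(close, x, (x - y) / np.where(close, 1.0, diff))
    a_eig = vecs.conj().T @ a @ vecs
    b_eig = vecs.conj().T @ b @ vecs
    return float(np.real(np.sum(kernel * a_eig * b_eig.T)))

def bogoliubov_inner_quadrature(rho, a, b, nodes=40):
    """Direct Gauss-Legendre quadrature of the integral in Equation (22)."""
    evals, vecs = np.linalg.eigh(rho)
    x, w = np.polynomial.legendre.leggauss(nodes)
    lam, w = 0.5 * (x + 1.0), 0.5 * w  # map nodes and weights from [-1, 1] to [0, 1]
    total = 0.0
    for l, wl in zip(lam, w):
        rho_l = (vecs * evals ** l) @ vecs.conj().T
        rho_1l = (vecs * evals ** (1.0 - l)) @ vecs.conj().T
        total += wl * np.real(np.trace(rho_l @ a @ rho_1l @ b))
    return float(total)

# Hypothetical positive definite state and Hermitian observables (illustration only).
rho = np.array([[0.5, 0.1, 0.0], [0.1, 0.3, 0.05j], [0.0, -0.05j, 0.2]])
a = np.array([[1.0, 2.0, 0.0], [2.0, 0.0, 1j], [0.0, -1j, -1.0]])
b = np.array([[0.0, 1j, 1.0], [-1j, 2.0, 0.0], [1.0, 0.0, 1.0]])
print(bogoliubov_inner(rho, a, b), bogoliubov_inner_quadrature(rho, a, b))
```

When one argument commutes with the state, only the diagonal of the kernel contributes and the inner product reduces to Tr (ρ A B).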

Given

D \in T_{ρ} S

, we then define the

(e)

-representation of

D

as the Hermitian operator

D^{(e)} \in A

satisfying the following relation:

Tr (D^{(m)} A) = : {⟨D^{(e)}, A⟩}_{ρ} = \int_{0}^{1} Tr (ρ^{λ} D^{(e)} ρ^{1 - λ} A) d λ, \forall A \in A .

(23)

For all

A \in A

, we assume

{⟨ A, I ⟩}_{ρ} = {⟨ A ⟩}_{ρ} = Tr (ρ A)

(

I

denotes the identity operator). Thus, we can see that the derivative of the function

⟨ A ⟩ : ρ \to {⟨ A ⟩}_{ρ}

by

D

is written as

D ⟨ A ⟩ = Tr (D^{(m)} A) = {⟨ D^{(e)}, A ⟩}_{ρ} .

This implies that we can consider the

(e)

-representation

D^{(e)} \in A

of a given

D \in T_{ρ} S

as

D ρ = \int_{0}^{1} ρ^{λ} D^{(e)} ρ^{1 - λ} d λ .

(24)

Therefore, it turns out that

D^{(e)}

is the derivative of the map

ρ \mapsto \log ρ

from

S

to

A

, which may be written as follows:

D^{(e)} = D \log ρ .

(25)

By considering

{⟨D^{(e)}, I⟩}_{ρ} = {⟨D^{(e)}⟩}_{ρ} = Tr (ρ D^{(e)})

, we can immediately observe that

{⟨D^{(e)}⟩}_{ρ} = D {⟨I⟩}_{ρ} = 0 .

This proves that, although the

(e)

-representation depends on the choice of the generalized covariance, the space

T_{ρ}^{(e)} S : = {D^{(e)} | D \in T_{ρ} S}

can be simply written as follows

T_{ρ}^{(e)} S = {A | A \in A, {⟨A⟩}_{ρ} = Tr (ρ A) = 0} .

(26)

This fact supplies the manifold

S

of quantum states with the

(e)

-connection. To see this, let us consider the linear isomorphism

D \mapsto D^{'}

from

T_{ρ} S

to

T_{ρ^{'}} S

defined by

D^{' (e)} = D^{(e)} - {⟨D^{(e)}⟩}_{ρ^{'}}

. By writing this correspondence as

D^{'} = {[D]}_{ρ^{'}}

,

D = {[D^{'}]}_{ρ}

, the

(e)

-connection

\nabla^{(e)}

is then defined by

{(\nabla_{X}^{(e)} Y)}_{ρ} = X_{ρ} {[Y]}_{ρ}, \forall ρ \in S, \forall X, Y \in T (S),

(27)

where the right-hand side denotes the derivative with respect to

X_{ρ}

of

{[Y]}_{ρ} : S \to T_{ρ} S

.

Finally, we define the inner product

g_{ρ}

on

T_{ρ} S

by

g_{ρ} (X, Y) : = {⟨X^{(e)}, Y^{(e)}⟩}_{ρ} = Tr (X^{(m)} Y^{(e)}),

(28)

which is usually called the quantum Fisher metric. The procedure thus far described endows the manifold

S

of quantum states with a geometric structure

(g, \nabla^{(e)}, \nabla^{(m)})

given by the quantum Fisher metric, and two torsion-free connections, namely the

(e)

-connection

\nabla^{(e)}

and the

(m)

-connection

\nabla^{(m)}

, which are dual with respect to

g

in the following sense:

X g (Y, Z) = g (\nabla_{X}^{(m)} Y, Z) + g (Y, \nabla_{X}^{(e)} Z), \forall X, Y, Z \in T (S) .

(29)

In addition, the dual structure

(g, \nabla^{(m)}, \nabla^{(e)})

is dually flat, meaning that the curvature tensors of

\nabla^{(e)}

and

\nabla^{(m)}

are both null.

Suppose that a coordinate system

{ξ^{i}}

is given and that each element

ρ \in S

is specified by the coordinate

ξ \in R^{n}

as

ρ \equiv ρ_{ξ}

. According to Equation (20), we have that the mixture representation

\partial_{i}^{(m)}

of

\partial_{i} = \partial / \partial ξ^{i}

is given by

\partial_{i}^{(m)} ρ = \partial_{i} ρ_{ξ}

, whereas, by Equation (25), the exponential representation

\partial_{i}^{(e)}

of

\partial_{i}

is written as

\partial_{i}^{(e)} ρ = \partial_{i} \log ρ_{ξ}

. Therefore, the dual structure

(g, \nabla^{(e)}, \nabla^{(m)})

with respect to an arbitrary coordinate system

{ξ^{i}}

reads as follows

\begin{matrix} g_{i j} = Tr (\partial_{i} ρ_{ξ} \partial_{j} \log ρ_{ξ}) \end{matrix}

(30)

\begin{matrix} Γ_{i j k}^{(e)} = Tr (\partial_{i} \partial_{j} \log ρ_{ξ} \partial_{k} ρ_{ξ}), Γ_{i j k}^{(m)} = Tr (\partial_{i} \partial_{j} ρ_{ξ} \partial_{k} \log ρ_{ξ}) . \end{matrix}

(31)

A generalized covariance is a family

{{⟨ \cdot, \cdot ⟩}_{ρ} | ρ \in S}

of inner products on the space of Hermitian operators

A

on the Hilbert space

H

, where

{⟨ A, B ⟩}_{ρ}

depends smoothly on

ρ

for all

A, B \in A

and that satisfies the following properties:

For every unitary matrix $U$ on the Hilbert space $H$, we have

${⟨ U A U^{*}, U B U^{*} ⟩}_{U ρ U^{*}} = {⟨ A, B ⟩}_{ρ}, \forall A, B \in A, ρ \in S .$

If the Lie bracket $[ρ, A] = 0$, then

${⟨ A, B ⟩}_{ρ} = Tr (ρ A B) .$

This can be viewed as a quantum version of the

L^{2}

-product

{⟨ A, B ⟩}_{p} = E_{p} [A B]

of random variables

A

and

B

with respect to a probability measure

p

. Since

E_{p} [A B]

is the covariance of

A

and

B

when their expectations vanish, we can call the family

{{⟨ \cdot, \cdot ⟩}_{ρ} | ρ \in S}

satisfying the above conditions a generalized covariance.

According to the theory by Eguchi, a divergence function

D : M \times M \to R^{*}

induces a dual structure

(g, \nabla, \nabla^{*})

on

M

in the way expressed by Equations (4) and (5). It turns out that the connections

\nabla

and

\nabla^{*}

obtained in such a way are torsion-free (or symmetric) [13]. To use the canonical divergence in Equation (7) in the quantum setting, we are then forced to select the Bogoliubov inner product for providing the quantum analog of the Fisher metric, the

(m)

-connection and

(e)

-connection on the manifold of positive density operators. Indeed, while the

(m)

-connection is always torsion-free, it turns out that the

(e)

-connection induced on

S

from a generalized covariance is symmetric if and only if such a covariance is the Bogoliubov inner product.

Canonical Divergence on the Manifold of Quantum States

In this section we show that the divergence function of Equation (7) reduces to the quantum relative entropy whenever the dual structure

(g, \nabla^{(m)}, \nabla^{(e)})

on

S

is given by the Fisher metric (Equation (28)), the mixture connection (Equation (21)) and the exponential connection (Equation (27)).

Let

ρ_{1}, ρ_{2} \in S

be two density matrices. To implement the computation of the divergence

D (ρ_{1}, ρ_{2})

for quantum states, we consider the

(m)

-geodesic

γ_{m} (t) = (1 - t) ρ_{1} + t ρ_{2}

[19]. Then, the

(m)

and

(e)

representations of the tangent vector

{\dot{γ}}_{m} (t)

are easily computed by means of Equations (20) and (25), respectively:

{\dot{γ}}_{m}^{(m)} (t) = {\dot{γ}}_{m} (t) = ρ_{2} - ρ_{1}, {\dot{γ}}_{m}^{(e)} (t) = \frac{d}{d t} \log γ_{m} (t) .

(32)

From Equations (7) and (28), we have then

D (ρ_{1}, ρ_{2}) = \int_{0}^{1} t Tr ({\dot{γ}}_{m} (t) \frac{d}{d t} \log γ_{m} (t)) d t .

(33)

Let us recall that

γ_{m} (t)

is a curve in the space of density matrices and the logarithm of a positive matrix is a well-defined matrix. Therefore, the derivative with respect to

t

of

\log γ_{m} (t)

is viewed as the matrix of the derivatives of the entries of

\log γ_{m} (t)

with respect to

t

. Likewise, the integral of a matrix-valued function is the matrix of the integrals of its entries. Finally, since the trace is linear, it commutes with integration. Hence, with the abuse of notation where we keep

γ_{m}

instead of the entry

{(γ_{m})}_{i j}

, the computation in Equation (33) is performed as follows by integration by parts:

\begin{matrix} \int_{0}^{1} t {\dot{γ}}_{m} (t) \frac{d}{d t} \log γ_{m} (t) & = & {[t {\dot{γ}}_{m} (t) \log γ_{m} (t)]}_{0}^{1} - \int_{0}^{1} {\dot{γ}}_{m} (t) \log γ_{m} (t) d t \\ = & (ρ_{2} - ρ_{1}) \log ρ_{2} - \int_{ρ_{1}}^{ρ_{2}} \log γ_{m} (t) d γ_{m} (t) \\ = & (ρ_{2} - ρ_{1}) \log ρ_{2} - {[γ_{m} \log γ_{m}]}_{ρ_{1}}^{ρ_{2}} \\ = & ρ_{1} (\log ρ_{1} - \log ρ_{2}) . \end{matrix}

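Taking the trace, this proves that $D (ρ_{1}, ρ_{2}) = Tr (ρ_{1} (\log ρ_{1} - \log ρ_{2}))$, which is the quantum relative entropy given by Equation (12). Note that the antiderivative of $\log x$ is $x \log x - x$; the term $- γ_{m}$ is omitted in the computation above because it contributes $Tr (ρ_{2} - ρ_{1}) = 0$ once the trace is taken.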
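The equality between the integral in Equation (33) and the quantum relative entropy can also be checked numerically. The following is a minimal sketch, assuming NumPy and SciPy are available. The helper names (`random_full_rank_state`, `divergence_m_geodesic`), the Gauss-Legendre quadrature and the finite-difference step `h` are illustrative choices and are not taken from the paper.

```python
import numpy as np
from scipy.linalg import logm


def random_full_rank_state(dim, rng):
    """Hypothetical helper: a well-conditioned density matrix (positive definite, unit trace)."""
    x = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = x @ x.conj().T
    rho = rho / np.trace(rho).real
    # Mix with the maximally mixed state so that all eigenvalues stay away from zero.
    return 0.5 * rho + 0.5 * np.eye(dim) / dim


def relative_entropy(rho1, rho2):
    """Closed form Tr(rho1 (log rho1 - log rho2))."""
    return np.trace(rho1 @ (logm(rho1) - logm(rho2))).real


def divergence_m_geodesic(rho1, rho2, nodes=40, h=1e-5):
    """Approximate int_0^1 t Tr(gamma_dot d/dt log gamma) dt along the (m)-geodesic."""
    x, w = np.polynomial.legendre.leggauss(nodes)  # nodes and weights on [-1, 1]
    ts, ws = 0.5 * (x + 1.0), 0.5 * w              # rescaled to [0, 1]
    velocity = rho2 - rho1                         # (m)-representation, constant in t

    def log_gamma(t):
        return logm((1.0 - t) * rho1 + t * rho2)

    total = 0.0
    for t, weight in zip(ts, ws):
        # (e)-representation d/dt log gamma(t), by a central finite difference
        dlog = (log_gamma(t + h) - log_gamma(t - h)) / (2.0 * h)
        total += weight * t * np.trace(velocity @ dlog).real
    return total


rng = np.random.default_rng(0)
rho1 = random_full_rank_state(3, rng)
rho2 = random_full_rank_state(3, rng)
d_integral = divergence_m_geodesic(rho1, rho2)
d_closed = relative_entropy(rho1, rho2)
print(d_integral, d_closed)
```

For full-rank states the two printed numbers should agree up to quadrature and finite-difference error. The states are deliberately kept away from the boundary of the manifold, where the logarithm becomes singular and this simple scheme would lose accuracy.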

The dual divergence of $D (ρ_{1}, ρ_{2})$ is computed by considering the $(e)$-geodesic connecting $ρ_{1}$ and $ρ_{2}$. Let $ρ_{1} = e^{H}$, where $H$ is a self-adjoint Hamiltonian. Then, the $(e)$-geodesic from $ρ_{1}$ to $ρ_{2}$ is given by

γ_{e} (t) = \frac{e^{H + t A}}{Tr e^{H + t A}}, (t \in [0, 1]),

(34)

where $A = \log ρ_{2} - \log ρ_{1}$ and $e^{H + t A}$ denotes the matrix exponential [19]. Since the trace is linear in its argument, it commutes with differentiation with respect to $t$. Therefore, according to Equations (20) and (25), the $(m)$- and $(e)$-representations of ${\dot{γ}}_{e} (t)$ are given by

\begin{matrix} {\dot{γ}}_{e}^{(m)} = {\dot{γ}}_{e} (t) = \frac{A e^{H + t A}}{Tr e^{H + t A}} - \frac{e^{H + t A} Tr A e^{H + t A}}{{(Tr e^{H + t A})}^{2}} \end{matrix}

(35)

\begin{matrix} {\dot{γ}}_{e}^{(e)} = \frac{d}{d t} \log γ_{e} (t) = A - \frac{Tr A e^{H + t A}}{Tr e^{H + t A}} . \end{matrix}

(36)
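Equation (36) follows in one step from the form of the geodesic in Equation (34). A short worked version is sketched below; the identity operator $I$ is written out explicitly here, which is a notational assumption not used in the surrounding text, and the second line uses $\frac{d}{d t} Tr e^{H + t A} = Tr A e^{H + t A}$.

```latex
\begin{aligned}
\log \gamma_{e}(t) &= H + t A - \log\!\left(\mathrm{Tr}\, e^{H + t A}\right) I, \\
\frac{d}{d t} \log \gamma_{e}(t) &= A - \frac{\mathrm{Tr}\, A e^{H + t A}}{\mathrm{Tr}\, e^{H + t A}}\, I .
\end{aligned}
```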

The dual divergence of $D (ρ_{1}, ρ_{2})$ is written as follows:

D^{*} (ρ_{1}, ρ_{2}) = \int_{0}^{1} t Tr ({\dot{γ}}_{e}^{(m)} {\dot{γ}}_{e}^{(e)}) d t .

(37)

To perform the computation in Equation (37), we use the expressions of ${\dot{γ}}_{e}^{(m)}$ and ${\dot{γ}}_{e}^{(e)}$ given by Equations (35) and (36):

D^{*} (ρ_{1}, ρ_{2}) = \int_{0}^{1} t Tr (\frac{A^{2} e^{H + t A}}{Tr e^{H + t A}} - 2 \frac{A e^{H + t A} Tr A e^{H + t A}}{{(Tr e^{H + t A})}^{2}} + \frac{e^{H + t A} {(Tr A e^{H + t A})}^{2}}{{(Tr e^{H + t A})}^{3}}) d t .

By the linearity of the trace operator, this expression reduces to:

D^{*} (ρ_{1}, ρ_{2}) = \int_{0}^{1} t (\frac{Tr A^{2} e^{H + t A}}{Tr e^{H + t A}} - \frac{{(Tr A e^{H + t A})}^{2}}{{(Tr e^{H + t A})}^{2}}) d t = \int_{0}^{1} t \frac{d}{d t} (\frac{Tr A e^{H + t A}}{Tr e^{H + t A}}) d t .

Carrying out the integration by parts, we obtain

\begin{matrix} D^{*} (ρ_{1}, ρ_{2}) & = & {[t \frac{Tr A e^{H + t A}}{Tr e^{H + t A}}]}_{0}^{1} - \int_{0}^{1} \frac{Tr A e^{H + t A}}{Tr e^{H + t A}} d t \\ & = & \frac{Tr ρ_{2} (\log ρ_{2} - \log ρ_{1})}{Tr ρ_{2}} - {[\log Tr e^{H + t A}]}_{0}^{1} \\ & = & Tr ρ_{2} (\log ρ_{2} - \log ρ_{1}) - \log Tr ρ_{2} + \log Tr ρ_{1} \\ & = & Tr ρ_{2} (\log ρ_{2} - \log ρ_{1}), \end{matrix}

where we use $e^{H + A} = ρ_{2}$ and $Tr ρ_{1} = Tr ρ_{2} = 1$. This proves that

D^{*} (ρ_{1}, ρ_{2}) = Tr ρ_{2} (\log ρ_{2} - \log ρ_{1}) = D (ρ_{2}, ρ_{1}) .
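The result for the dual divergence can be checked in the same spirit. The sketch below evaluates the integral in Equation (37) along the $(e)$-geodesic of Equation (34) and compares it with $Tr ρ_{2} (\log ρ_{2} - \log ρ_{1})$. It assumes NumPy and SciPy; the function names, the quadrature rule and the step `h` are illustrative. Both representations of the tangent vector are obtained by finite differences of the curve itself, so the closed forms (35) and (36) are not used.

```python
import numpy as np
from scipy.linalg import expm, logm


def random_full_rank_state(dim, rng):
    """Hypothetical helper: a well-conditioned density matrix (positive definite, unit trace)."""
    x = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = x @ x.conj().T
    rho = rho / np.trace(rho).real
    return 0.5 * rho + 0.5 * np.eye(dim) / dim


def dual_divergence_e_geodesic(rho1, rho2, nodes=40, h=1e-5):
    """Approximate int_0^1 t Tr(gamma_dot^(m) gamma_dot^(e)) dt along the (e)-geodesic."""
    H = logm(rho1)
    A = logm(rho2) - H

    def gamma(t):
        g = expm(H + t * A)
        return g / np.trace(g).real  # normalized as in Equation (34)

    x, w = np.polynomial.legendre.leggauss(nodes)  # nodes and weights on [-1, 1]
    ts, ws = 0.5 * (x + 1.0), 0.5 * w              # rescaled to [0, 1]

    total = 0.0
    for t, weight in zip(ts, ws):
        g_plus, g_minus = gamma(t + h), gamma(t - h)
        m_rep = (g_plus - g_minus) / (2.0 * h)              # d/dt gamma_e(t)
        e_rep = (logm(g_plus) - logm(g_minus)) / (2.0 * h)  # d/dt log gamma_e(t)
        total += weight * t * np.trace(m_rep @ e_rep).real
    return total


rng = np.random.default_rng(1)
rho1 = random_full_rank_state(3, rng)
rho2 = random_full_rank_state(3, rng)
d_star = dual_divergence_e_geodesic(rho1, rho2)
d_swapped = np.trace(rho2 @ (logm(rho2) - logm(rho1))).real  # D(rho2, rho1)
print(d_star, d_swapped)
```

Agreement of the two printed values, up to numerical error, illustrates the relation $D^{*} (ρ_{1}, ρ_{2}) = D (ρ_{2}, ρ_{1})$ for one randomly drawn pair of states; it is a sanity check, not a proof.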

4. Conclusions

As we have demonstrated, for a geometric definition of a general complexity measure, it is important to have a canonical divergence. This paper is based on recent progress in defining a general canonical divergence within Information Geometry [9,12]. This divergence is defined in terms of geodesic integration of the inverse exponential map and has the geodesic projection property when the structure $(g, \nabla, \nabla^{*})$ is dually flat [3]. Let $p \in M$ and let $\tilde{M} \subset M$ be a submanifold of $M$. The search for the point $\hat{p} \in \tilde{M}$ that minimizes the divergence $D (p, q)$, $q \in \tilde{M}$, supplies the solution for defining an information-geometric complexity measure as in Equation (2). When every minimizer $\hat{p}$ of the divergence $D$ is given by the geodesic projection of $p$ onto $\tilde{M}$, we say that $D$ has the geodesic projection property. In this regard, the canonical divergence in Equation (7) would provide a measure of complexity as in Equation (2) for a quite wide range of systems. A further step towards defining Equation (2) for general systems has been put forward in [12], where a new divergence is introduced that turns out to be a generalization of the canonical divergence in Equation (7). As an example of Equation (2), we have considered the measure of complexity given by Equation (10), which quantifies how much a probability measure on the product configuration set of the finitely many states on a discrete set $\{1, \dots, n\}$ deviates from a family of exponential probabilities that amounts to the non-complex set of system states, as it is given by non-interacting states [2]. In this case, the Kullback–Leibler divergence turns out to be suitable for providing the measure of complexity in Equation (2) for classical states on discrete sets [4]. To put the theory of Ay [2] in perspective and propose the canonical divergence in Equation (7) as suitable for supplying the complexity in Equation (2) on general systems, we have then proved that $D$ coincides with the Kullback–Leibler (KL) divergence on the simplex of probability measures endowed with the dual structure given by the Fisher metric and the mixture and exponential connections.
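As a concrete illustration of the classical case, the sketch below computes the KL divergence of a joint distribution from the family of fully factorized distributions. It assumes that the non-interacting family is the set of product distributions, for which the minimizer is known to be the product of the marginals; this is the simplest instance of the construction and not necessarily the exact family used in Equation (10). All function names are illustrative.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_x p(x) log(p(x) / q(x)), for strictly positive q."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    support = p > 0
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def product_of_marginals(p):
    """Projection of a joint distribution (one array axis per unit) onto product distributions."""
    p = np.asarray(p, dtype=float)
    q = np.ones_like(p)
    for axis in range(p.ndim):
        other_axes = tuple(a for a in range(p.ndim) if a != axis)
        marginal = p.sum(axis=other_axes)
        shape = [1] * p.ndim
        shape[axis] = p.shape[axis]
        q = q * marginal.reshape(shape)  # broadcast the marginal along its own axis
    return q


def complexity(p):
    """Divergence of p from the product family (assumed non-interacting family)."""
    return kl_divergence(p, product_of_marginals(p))


p_correlated = np.array([[0.4, 0.1], [0.1, 0.4]])    # two correlated binary units
p_independent = np.outer([0.3, 0.7], [0.6, 0.4])     # two independent binary units
print(complexity(p_correlated), complexity(p_independent))
```

The independent distribution has vanishing complexity, while the correlated one does not. Replacing the product of marginals by any other product distribution can only increase the divergence, which is the minimization expressed by Equation (2) in this special case.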

The quantum counterpart of the general theory yielding the measure of complexity in Equation (2) does not yet exist. However, a quantum analog of Equation (10) has been established on the manifold of positive density operators [14]. Here, the family of non-interacting states is replaced by the states that are fully described by their restriction to selected subsystems; these states turn out to form a family of Gibbs states. What is quantified, therefore, are the many-party correlations in the state of a composite quantum system that cannot be observed in subsystems composed of fewer than a given number of parties. The suitable tool for providing such a quantification is the quantum relative entropy. This is because the maximum-entropy principle solves the inverse problem of reconstructing a global state from subsystem states, and it also gives a natural scale of many-party correlation in terms of the gap to the maximal entropy value. Hence, the many-party correlation of a quantum state is quantified by the divergence from a family of Gibbs states. The many-party correlation in Equation (11) has been implemented in algorithms [15] and proves to be related to the entanglement of quantum systems as defined in [16]. To consider the canonical divergence in Equation (7) as an efficient tool for extending the general theory leading to Equation (2), we have considered $D$ on the manifold of positive density operators with the quantum analog of the Fisher metric and the $(m)$- and $(e)$-connections induced by the Bogoliubov inner product. We have finally proved that the canonical divergence coincides with the quantum relative entropy.
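The simplest quantum instance of this idea can be sketched for two qubits. The example below assumes that the reference family consists of product states, in which case the minimal relative entropy is attained at the tensor product of the reduced states and equals the quantum mutual information. This is only the lowest level of the hierarchy of Gibbs families discussed in [14], and the function names are illustrative.

```python
import numpy as np
from scipy.linalg import logm


def relative_entropy(rho, sigma):
    """Quantum relative entropy Tr(rho (log rho - log sigma)) for full-rank states."""
    return np.trace(rho @ (logm(rho) - logm(sigma))).real


def von_neumann_entropy(rho):
    """Entropy -Tr(rho log rho), computed from the eigenvalues of a full-rank state."""
    eigenvalues = np.linalg.eigvalsh(rho)
    return float(-np.sum(eigenvalues * np.log(eigenvalues)))


def reduced_states(rho):
    """Partial traces of a two-qubit state given as a 4x4 matrix."""
    r = rho.reshape(2, 2, 2, 2)       # indices (a, b, a', b')
    rho_a = np.einsum("abcb->ac", r)  # trace out the second qubit
    rho_b = np.einsum("abad->bd", r)  # trace out the first qubit
    return rho_a, rho_b


def correlation(rho):
    """Relative entropy of rho from the product of its reduced states."""
    rho_a, rho_b = reduced_states(rho)
    return relative_entropy(rho, np.kron(rho_a, rho_b))


def mutual_information(rho):
    """S(A) + S(B) - S(AB), used as an independent cross-check."""
    rho_a, rho_b = reduced_states(rho)
    return von_neumann_entropy(rho_a) + von_neumann_entropy(rho_b) - von_neumann_entropy(rho)


# A Bell state mixed with the maximally mixed state, so that the state stays full rank.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_entangled = 0.7 * np.outer(bell, bell) + 0.3 * np.eye(4) / 4.0
rho_product = np.kron(np.diag([0.8, 0.2]), np.diag([0.6, 0.4]))

print(correlation(rho_entangled), mutual_information(rho_entangled))
print(correlation(rho_product))
```

The product state has vanishing correlation, while the mixed Bell state does not. Families that also contain two-body or higher interaction terms require an iterative projection, as in the algorithms of [15], and are beyond this sketch.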

Author Contributions

The authors contributed equally to the manuscript. They have all read and approved its final version.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Felice, D.; Cafaro, C.; Mancini, S. Information geometric methods for complexity. Chaos 2018, 28, 032101.
2. Ay, N. Information geometry on complexity and stochastic interaction. Entropy 2015, 17, 2432–2458.
3. Ay, N.; Jost, J.; Van Le, H.; Schwachhöfer, L. Information Geometry, 1st ed.; Springer International Publishing: Cham, Switzerland, 2017.
4. Ay, N.; Olbrich, E.; Bertschinger, N.; Jost, J. A geometric approach to complexity. Chaos 2011, 21, 037103.
5. Amari, S.-I.; Nagaoka, H. Methods of Information Geometry; Oxford University Press: Oxford, UK, 2000.
6. Eguchi, S. A differential geometric approach to statistical inference on the basis of contrast functions. Hiroshima Math. J. 1985, 15, 341–391.
7. Matumoto, T. Any statistical manifold has a contrast function: On the C³-functions taking the minimum at the diagonal of the product manifold. Hiroshima Math. J. 1993, 23, 327–337.
8. Ay, N.; Tuschmann, W. Duality versus dual flatness in quantum information geometry. J. Math. Phys. 2003, 44, 1512–1518.
9. Ay, N.; Amari, S.-I. A Novel Approach to Canonical Divergences within Information Geometry. Entropy 2015, 17, 8111–8129.
10. Felice, D.; Ay, N. Dynamical Systems induced by Canonical Divergence in dually flat manifolds. arXiv 2018, arXiv:1812.04461.
11. Ciaglia, F.; Di Cosmo, F.; Felice, D.; Mancini, S.; Marmo, G.; Pérez-Pardo, J.M. Hamilton-Jacobi approach to potential functions in information geometry. J. Math. Phys. 2017, 58, 063506.
12. Felice, D.; Ay, N. Towards a canonical divergence within Information Geometry. arXiv 2018, arXiv:1806.11363.
13. Eguchi, S. Geometry of minimum contrast. Hiroshima Math. J. 1992, 22, 631–647.
14. Weis, S.; Knauf, A.; Ay, N.; Zhao, M.J. Maximizing the divergence from a hierarchical model of quantum states. Open Syst. Inf. Dyn. 2015, 22, 1550006.
15. Niekamp, S.; Galla, T.; Kleinmann, M.; Gühne, O. Computing complexity measures for quantum states based on exponential families. J. Phys. A Math. Theor. 2013, 46, 125301.
16. Vedral, V.; Plenio, M.B.; Rippin, M.A.; Knight, P.L. Quantifying entanglement. Phys. Rev. Lett. 1997, 78, 2275–2279.
17. Nagaoka, H. Differential Geometrical Aspects of Quantum State Estimation and Relative Entropy. In Quantum Communications and Measurement; Belavkin, V.P., Hirota, O., Hudson, R.L., Eds.; Springer: Boston, MA, USA, 1995.
18. Amari, S. Differential geometry of curved exponential families-curvatures and information loss. Ann. Stat. 1985, 10, 357–387.
19. Petz, D. Quantum Information Theory and Quantum Statistics; Springer: Berlin/Heidelberg, Germany, 2008.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
