
# From ƒ-Divergence to Quantum Quasi-Entropies and Their Use

by
Dénes Petz
Alfréd Rényi Institute of Mathematics, H-1364 Budapest, POB 127, Hungary
Entropy 2010, 12(3), 304-325; https://doi.org/10.3390/e12030304
Submission received: 26 July 2009 / Revised: 20 February 2010 / Accepted: 25 February 2010 / Published: 1 March 2010

## Abstract

Csiszár's f-divergence of two probability distributions was extended to the quantum case by the author in 1985. In the quantum setting, positive semidefinite matrices take the place of probability distributions, and the quantum generalization is called quasi-entropy. It is related to several other important concepts, such as covariance, quadratic cost, Fisher information, the Cramér-Rao inequality and the uncertainty relation. It is remarkable that in the quantum case there are several Fisher informations and variances; a Fisher information is obtained as the Hessian of a quasi-entropy. A conjecture about the scalar curvature of a Fisher information geometry is explained. The described subjects are surveyed in detail in the matrix setting. The von Neumann algebra approach is also discussed for the uncertainty relation.

## 1. Introduction

Let $X$ be a finite space with probability measures p and q. Their relative entropy or divergence
$D(p\|q) = \sum_{x \in X} p(x)\,\log\frac{p(x)}{q(x)}$
was introduced by Kullback and Leibler in 1951 [1]. More precisely, if $p(x) = q(x) = 0$, then the corresponding term is taken to be 0, and if $p(x) \neq 0$ but $q(x) = 0$ for some $x \in X$, then $D(p\|q) = +\infty$.
A possible generalization of the relative entropy is the f-divergence introduced by Csiszár:
$D_f(p\|q) = \sum_{x \in X} q(x)\, f\!\left(\frac{p(x)}{q(x)}\right)$
with a real function $f ( x )$ defined for $x > 0$ [2,3]. For the convex function $f ( x ) = x log x$ the relative entropy is obtained.
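The two definitions above can be checked against each other numerically. A minimal sketch (function names are mine, not from the paper), implementing the stated conventions and verifying that $f(x)=x\log x$ reproduces the relative entropy:

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p||q) with the conventions stated above."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((p > 0) & (q == 0)):
        return np.inf                      # p(x) != 0 but q(x) = 0
    m = p > 0                              # 0 * log(0/q(x)) = 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def f_divergence(f, p, q):
    """Csiszar f-divergence D_f(p||q) = sum_x q(x) f(p(x)/q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))     # assumes q > 0 everywhere

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

d_f  = f_divergence(lambda x: x * np.log(x), p, q)   # f(x) = x log x
d_kl = kl_divergence(p, q)                           # agrees with d_f
```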
This paper first gives a short survey of the f-divergence and then turns to its non-commutative (algebraic, or quantum) generalization. Roughly speaking, the positive n-tuples p and q are replaced by positive semidefinite $n \times n$ matrices, and the main questions of the study remain rather similar to the probabilistic case. The quantum generalization was originally called quasi-entropy, although quantum f-divergence might be a better terminology. This notion is related to several other important concepts, such as covariance, quadratic cost, Fisher information, the Cramér-Rao inequality and the uncertainty relation. These subjects are surveyed in detail in the matrix setting; at the very end, the von Neumann algebra approach is sketched briefly. Where details are not presented in the present paper, precise references are given.

## 2. f-Divergence and Its Use

Let $F$ be the set of continuous convex functions $R + → R$. The following result explains the importance of convexity.
Let $A$ be a partition of $X$. If p is a probability distribution on $X$, then $p_A(A) := \sum_{x \in A} p(x)$ becomes a probability distribution on $A$.
Theorem 1 Let $A$ be a partition of $X$ and $p , q$ be probability distributions on $X$. If $f ∈ F$, then
$D f ( p A | | q A ) ≤ D f ( p | | q )$
The inequality in the theorem is the monotonicity of the f-divergence. A particular case is
$f ( 1 ) ≤ D f ( p | | q )$
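Both the monotonicity of Theorem 1 and the particular case $f(1) \le D_f(p\|q)$ can be tested on random distributions. A numerical sketch (helper names are mine), using $f(x) = x\log x$ and a fixed partition:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_divergence(f, p, q):
    return float(np.sum(q * f(p / q)))

def coarse_grain(p, partition):
    # p_A(A) = sum_{x in A} p(x) for each block A of the partition
    return np.array([p[list(block)].sum() for block in partition])

f = lambda x: x * np.log(x)               # convex, with f(1) = 0
partition = [(0, 1), (2, 3, 4)]

monotone = True
for _ in range(200):
    p = rng.dirichlet(np.ones(5))         # strictly positive a.s.
    q = rng.dirichlet(np.ones(5))
    coarse = f_divergence(f, coarse_grain(p, partition),
                          coarse_grain(q, partition))
    fine = f_divergence(f, p, q)
    monotone &= (coarse <= fine + 1e-10) and (f(1.0) <= fine + 1e-10)
```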
Theorem 2 Let $f , g ∈ F$ and assume that
$D f ( p | | q ) = D g ( p | | q )$
for every distribution p and q. Then there exists a constant $c ∈ R$ such that $f ( x ) - g ( x ) = c ( x - 1 )$.
Since the divergence is a kind of informational distance, we want $D f ( p | | p ) = 0$ and require $f ( 1 ) = 0$. This is nothing else but a normalization,
$D f + c ( p | | q ) = D f ( p | | q ) + c$
A bit more generally, we can say that if $f ( x ) - g ( x )$ is a linear function, then $D f$ and $D g$ are essentially the same quantities.
It is interesting to remark that $q f ( p / q )$ can be considered also as a mean of p and q. In that case the mean of p and p should be p, so in the theory of means $f ( 1 ) = 1$ is a different natural requirement.
Set $f * ( x ) = x f ( x - 1 )$. Then $D f ( p | | q ) = D f * ( q | | p )$. The equality $f * = f$ is the symmetry condition.
Example 1 Let $f ( x ) = | x - 1 |$. Then
$D f ( p , q ) = ∑ x | p ( x ) - q ( x ) | = : V ( p , q )$
is the variational distance of p and q.
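A quick numerical confirmation of Example 1 (variable names are mine): with $f(x) = |x-1|$, the f-divergence collapses to the variational distance.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(6))
q = rng.dirichlet(np.ones(6))

# D_f with f(x) = |x - 1| ...
d_f = float(np.sum(q * np.abs(p / q - 1.0)))
# ... agrees with the variational distance V(p, q)
v = float(np.sum(np.abs(p - q)))
```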
Example 2 Let $f(x) = (1 - \sqrt{x})^2$. Then
$D_f(p, q) = \sum_x \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 =: H^2(p, q)$
is the squared Hellinger distance of p and q.
Example 3 The function
$f_\alpha(x) = \frac{1 - x^\alpha}{\alpha(1-\alpha)}$
gives the relative α-entropy
$S_\alpha(p\|q) = \frac{1}{\alpha(1-\alpha)} \sum_x \left(q(x) - p(x)^\alpha q(x)^{1-\alpha}\right)$
The limit $\alpha \to 0$ gives the relative entropy.
Several other functions appeared in the literature, e.g.,
$f^{(s)}(x) = \frac{1}{s(1-s)}\left(1 + x - x^s - x^{1-s}\right), \qquad 0 < s \neq 1$
(The references are [4,5].)
The following result of Csiszár is a characterization (or axiomatization) of the f-divergence [6].
Theorem 3 Assume that a number $C(p, q) \in \mathbb{R}$ is associated to every pair of probability distributions on the same set $X$, for all finite sets $X$. If
(a)
$C ( p , q )$ is invariant under the permutations of the basic set $X$.
(b)
if $A$ is a partition of $X$, then $C ( p A , q A ) ≤ C ( p , q )$ and the equality holds if and only if
$p A ( A ) q ( x ) = q A ( A ) p ( x )$
whenever $x ∈ A ∈ A$,
then there exists a convex function $f : R + → R$ which is continuous at 0 and $C ( p , q ) = D f ( p | | q )$ for every p and q.

## 3. Quantum Quasi-Entropy

In the mathematical formalism of quantum mechanics, instead of n-tuples of numbers one works with $n \times n$ complex matrices. They form an algebra, and this allows an algebraic approach. In this approach, a probability density is replaced by a positive semidefinite matrix of trace 1, which is called a density matrix [7]. The eigenvalues of a density matrix give a probability density. However, this is not the only probability density provided by a density matrix. If we rewrite the matrix in a certain orthonormal basis, then the diagonal elements $p_1, p_2, \dots, p_n$ form a probability density.
Let $M$ denote the algebra of $n × n$ matrices with complex entries. For positive definite matrices $ρ 1 , ρ 2 ∈ M$, for $A ∈ M$ and a function $f : R + → R$, the quasi-entropy is defined as
$S f A ( ρ 1 ∥ ρ 2 ) : = 〈 A ρ 2 1 / 2 , f ( Δ ( ρ 1 / ρ 2 ) ) ( A ρ 2 1 / 2 ) 〉$
$= Tr ρ 2 1 / 2 A * f ( Δ ( ρ 1 / ρ 2 ) ) ( A ρ 2 1 / 2 )$
where $〈 B , C 〉 : = Tr B * C$ is the so-called Hilbert-Schmidt inner product and $Δ ( ρ 1 / ρ 2 ) : M → M$ is a linear mapping acting on matrices:
$Δ ( ρ 1 / ρ 2 ) A = ρ 1 A ρ 2 - 1$
This concept was introduced in [8,9], see also Chapter 7 in [10] and it is the quantum generalization of the f-entropy of Csiszár used in classical information theory (and statistics) [11,12].
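Since $\Delta(\rho_1/\rho_2)$ has eigenvectors $|u_i\rangle\langle v_j|$ with eigenvalues $\mu_i/\nu_j$ (for eigendecompositions $\rho_1 = \sum_i \mu_i |u_i\rangle\langle u_i|$, $\rho_2 = \sum_j \nu_j |v_j\rangle\langle v_j|$), the quasi-entropy can be evaluated entrywise in the two eigenbases. A numerical sketch (function names are mine), checking that $f(x) = x\log x$ with $A = I$ recovers Umegaki's relative entropy:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_density(n):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def quasi_entropy(f, A, rho1, rho2):
    """S_f^A(rho1||rho2) = <A rho2^{1/2}, f(Delta(rho1/rho2))(A rho2^{1/2})>."""
    mu, U = np.linalg.eigh(rho1)
    nu, V = np.linalg.eigh(rho2)
    sqrt2 = V @ np.diag(np.sqrt(nu)) @ V.conj().T    # rho2^{1/2}
    X = U.conj().T @ (A @ sqrt2) @ V                 # A rho2^{1/2} in the eigenbases
    Y = f(np.outer(mu, 1.0 / nu)) * X                # Delta has eigenvalues mu_i/nu_j
    return np.trace(X.conj().T @ Y).real             # Hilbert-Schmidt inner product

n = 3
rho1, rho2 = random_density(n), random_density(n)

# f(x) = x log x and A = I should give Tr rho1 (log rho1 - log rho2)
S = quasi_entropy(lambda x: x * np.log(x), np.eye(n), rho1, rho2)

def logm_psd(r):
    w, W = np.linalg.eigh(r)
    return W @ np.diag(np.log(w)) @ W.conj().T

umegaki = np.trace(rho1 @ (logm_psd(rho1) - logm_psd(rho2))).real
```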
The monotonicity in Theorem 1 is the consequence of the Jensen inequality. A function $f : R + → R$ is called matrix concave (or operator concave) if one of the following two equivalent conditions holds:
$f ( λ A + ( 1 - λ ) B ) ≥ λ f ( A ) + ( 1 - λ ) f ( B )$
for every number $0 < λ < 1$ and for positive definite square matrices A and B (of the same size). In the other condition the number λ is (heuristically) replaced by a matrix:
$f ( C A C * + D B D * ) ≥ C f ( A ) C * + D f ( B ) D *$
if $C C * + D D * = I$.
A function $f : R + → R$ is called matrix monotone (or operator monotone) if for positive definite matrices $A ≤ B$ the inequality $f ( A ) ≤ f ( B )$ holds. It is interesting that a matrix monotone function is matrix concave and a matrix concave function is matrix monotone if it is bounded from below [13].
Let $α : M 0 → M$ be a mapping between two matrix algebras. The dual $α * : M → M 0$ with respect to the Hilbert-Schmidt inner product is positive if and only if α is positive. Moreover, α is unital if and only if $α *$ is trace preserving. $α : M 0 → M$ is called a Schwarz mapping if
$α ( B * B ) ≥ α ( B * ) α ( B )$
for every $B ∈ M 0$.
The quasi-entropies are monotone and jointly concave [9,10].
Theorem 4 Assume that $f : R + → R$ is an operator monotone function with $f ( 0 ) ≥ 0$ and $α : M 0 → M$ is a unital Schwarz mapping. Then
$S f A ( α * ( ρ 1 ) , α * ( ρ 2 ) ) ≥ S f α ( A ) ( ρ 1 , ρ 2 )$
holds for $A ∈ M 0$ and for invertible density matrices $ρ 1$ and $ρ 2$ from the matrix algebra $M$.
Proof: The proof is based on inequalities for operator monotone and operator concave functions. First note that
$S_{f+c}^A(\alpha^*(\rho_1), \alpha^*(\rho_2)) = S_f^A(\alpha^*(\rho_1), \alpha^*(\rho_2)) + c\,\operatorname{Tr}\rho_2\,\alpha(A^*A)$
and
$S_{f+c}^{\alpha(A)}(\rho_1, \rho_2) = S_f^{\alpha(A)}(\rho_1, \rho_2) + c\,\operatorname{Tr}\rho_2\,(\alpha(A)^*\alpha(A))$
for a positive constant c. Due to the Schwarz inequality Equation (8), we may assume that $f ( 0 ) = 0$.
Let $Δ : = Δ ( ρ 1 / ρ 2 )$ and $Δ 0 : = Δ ( α * ( ρ 1 ) / α * ( ρ 2 ) )$. The operator
$V X α * ( ρ 2 ) 1 / 2 = α ( X ) ρ 2 1 / 2 ( X ∈ M 0 )$
is a contraction:
$\|\alpha(X)\rho_2^{1/2}\|^2 = \operatorname{Tr}\rho_2(\alpha(X)^*\alpha(X)) \le \operatorname{Tr}\rho_2\,\alpha(X^*X) = \operatorname{Tr}\alpha^*(\rho_2)X^*X = \|X\alpha^*(\rho_2)^{1/2}\|^2$
since the Schwarz inequality is applicable to α. A similar simple computation gives that
$V * Δ V ≤ Δ 0 .$
Since f is operator monotone, we have $f ( Δ 0 ) ≥ f ( V * Δ V )$. Recall that f is operator concave, therefore $f ( V * Δ V ) ≥ V * f ( Δ ) V$ and we conclude
$f ( Δ 0 ) ≥ V * f ( Δ ) V$
Application to the vector $A α * ( ρ 2 ) 1 / 2$ gives the statement.
It is remarkable that for a multiplicative α we do not need the condition $f ( 0 ) ≥ 0$. Moreover, $V * Δ V = Δ 0$ and we do not need the matrix monotonicity of the function f. In this case the only condition is the matrix concavity, analogously to Theorem 1.
If we apply the monotonicity (9) to the embedding $α ( X ) = X ⊕ X$ of $M$ into $M ⊕ M$ and to the densities $ρ 1 = λ E 1 ⊕ ( 1 - λ ) F 1$, $ρ 2 = λ E 2 ⊕ ( 1 - λ ) F 2$, then we obtain the joint concavity of the quasi-entropy:
$\lambda S_f^A(E_1, E_2) + (1-\lambda)S_f^A(F_1, F_2) \le S_f^A\big(\lambda E_1 + (1-\lambda)F_1,\; \lambda E_2 + (1-\lambda)F_2\big)$
The case $f(t) = t^\alpha$ is Lieb's famous concavity theorem: $\operatorname{Tr} A\rho^\alpha A^*\rho^{1-\alpha}$ is concave in ρ [14].
The concept of quasi-entropy includes some important special cases. If $ρ 2$ and $ρ 1$ are different and $A = I$, then we have a kind of relative entropy. For $f ( x ) = x log x$ we have Umegaki’s relative entropy $S ( ρ 1 ∥ ρ 2 ) = Tr ρ 1 ( log ρ 1 - log ρ 2 )$. (If we want a matrix monotone function, then we can take $f ( x ) = log x$ and then we get $S ( ρ 2 ∥ ρ 1 )$.) Umegaki’s relative entropy is the most important example, therefore the function f will be chosen to be matrix convex. This makes the probabilistic and non-commutative situation compatible as one can see in the next argument.
Let $\rho_1$ and $\rho_2$ be density matrices in $M$. If in a certain basis they have diagonals $p = (p_1, p_2, \dots, p_n)$ and $q = (q_1, q_2, \dots, q_n)$, then the monotonicity theorem gives the inequality
$D f ( p ∥ q ) ≤ S f ( ρ 1 ∥ ρ 2 )$
for a matrix convex function f. If $\rho_1$ and $\rho_2$ commute, then we can take the common eigenbasis and equality appears in (13). It is not trivial that otherwise the inequality is strict.
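For $f(x) = x\log x$ the inequality can be tested numerically: the diagonals of two random densities taken in any fixed orthonormal basis arise by a coarse-graining (a pinching), so their classical divergence is bounded by Umegaki's relative entropy. A sketch with names of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_density(n):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def logm_psd(r):
    w, W = np.linalg.eigh(r)
    return W @ np.diag(np.log(w)) @ W.conj().T

n = 4
rho1, rho2 = random_density(n), random_density(n)
S = np.trace(rho1 @ (logm_psd(rho1) - logm_psd(rho2))).real  # Umegaki

# diagonals in a random orthonormal basis W
W, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
p = np.diag(W.conj().T @ rho1 @ W).real
q = np.diag(W.conj().T @ rho2 @ W).real
D = float(np.sum(p * np.log(p / q)))                         # classical D(p||q)
```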
If $ρ 1$ and $ρ 2$ are different, then there is a choice for p and q such that they are different as well. Then
$0 < D f ( p ∥ q ) ≤ S f ( ρ 1 ∥ ρ 2 )$
Conversely, if $S f ( ρ 1 ∥ ρ 2 ) = 0$, then $p = q$ for every basis and this implies $ρ 1 = ρ 2$. For the relative entropy, a deeper result is known. The Pinsker-Csiszár inequality says that
$( ∥ p - q ∥ 1 ) 2 ≤ 2 D ( p ∥ q )$
This extends to the quantum case as
$( ∥ ρ 1 - ρ 2 ∥ 1 ) 2 ≤ 2 S ( ρ 1 ∥ ρ 2 )$
see [15], or [7, Chap. 3].
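The quantum Pinsker-Csiszár inequality admits a direct numerical check; the trace norm of a Hermitian difference is the sum of the absolute values of its eigenvalues. A sketch (helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(4)

def random_density(n):
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def logm_psd(r):
    w, W = np.linalg.eigh(r)
    return W @ np.diag(np.log(w)) @ W.conj().T

ok = True
for _ in range(50):
    r1, r2 = random_density(3), random_density(3)
    S = np.trace(r1 @ (logm_psd(r1) - logm_psd(r2))).real
    tnorm = np.abs(np.linalg.eigvalsh(r1 - r2)).sum()   # ||r1 - r2||_1
    ok &= tnorm**2 <= 2 * S + 1e-10
```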
Problem 1 It would be interesting to extend Theorem 3 of Csiszár to the quantum case. If we require monotonicity and specify the condition for equality, then a function f is provided by Theorem 3, but for non-commuting densities the conclusion is not clear.
Example 4 The function
$f_\alpha(x) = \frac{1 - x^\alpha}{\alpha(1-\alpha)}$
is matrix monotone decreasing for $\alpha \in (-1, 1)$. (For $\alpha = 0$, the limit is taken and it is $-\log x$.) Then the relative entropies of degree α are produced:
$S α ( ρ 2 ∥ ρ 1 ) : = 1 α ( 1 - α ) Tr ( I - ρ 1 α ρ 2 - α ) ρ 2$
These quantities are essential in the quantum case.
If $\rho_2 = \rho_1 = \rho$ and $A, B \in M$ are arbitrary, then one arrives at the generalized covariance [16]:
$qCov ρ f ( A , B ) : = 〈 A ρ 1 / 2 , f ( Δ ( ρ / ρ ) ) ( B ρ 1 / 2 ) 〉 - ( Tr ρ A * ) ( Tr ρ B )$
is a generalized covariance. If $\rho$, A and B commute, then this becomes $f(1)\operatorname{Tr}\rho A^*B - (\operatorname{Tr}\rho A^*)(\operatorname{Tr}\rho B)$. This shows that the normalization $f(1) = 1$ is natural. The generalized covariance $\mathrm{qCov}_\rho^f(A, B)$ is a sesquilinear form and it is determined by $\mathrm{qCov}_\rho^f(A, A)$ on the subspace $\{A \in M : \operatorname{Tr}\rho A = 0\}$. Formally, this is a quasi-entropy and Theorem 4 applies if f is matrix monotone. If we require the symmetry condition $\mathrm{qCov}_\rho^f(A, A) = \mathrm{qCov}_\rho^f(A^*, A^*)$, then f should have the symmetry $xf(x^{-1}) = f(x)$.
Assume that $Tr ρ A = Tr ρ B = 0$ and $ρ = D i a g ( λ 1 , λ 2 , ⋯ , λ n )$. Then
$qCov ρ f ( A , B ) = ∑ i j λ i f ( λ j / λ i ) A i j * B i j$
A matrix monotone function $f : R + → R +$ will be called standard if $x f ( x - 1 ) = f ( x )$ and $f ( 1 ) = 1$. A standard function f admits a canonical representation
$f(t) = \frac{1+t}{2}\exp\left(\int_0^1 \frac{(\lambda^2 - 1)(1-t)^2}{(\lambda + t)(1 + \lambda t)(\lambda + 1)^2}\, h(\lambda)\, d\lambda\right)$
where $h : [ 0 , 1 ] → [ 0 , 1 ]$ is a measurable function [17].
The usual symmetrized covariance corresponds to the function $f ( t ) = ( t + 1 ) / 2$:
$Cov ρ ( A , B ) : = 1 2 Tr ( ρ ( A * B + B A * ) ) - ( Tr ρ A * ) ( Tr ρ B )$
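In an eigenbasis of ρ, the formula above can be compared with the eigenbasis expression for the generalized covariance: for $f(t) = (t+1)/2$ the two agree. A numerical sketch (helper names are mine), with A and B centered so that $\operatorname{Tr}\rho A = \operatorname{Tr}\rho B = 0$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
lam = rng.dirichlet(np.ones(n))           # eigenvalues of rho
rho = np.diag(lam).astype(complex)

def centered_hermitian():
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    A = (g + g.conj().T) / 2
    return A - np.trace(rho @ A).real * np.eye(n)   # enforce Tr(rho A) = 0

A, B = centered_hermitian(), centered_hermitian()

f = lambda t: (t + 1) / 2                 # symmetrized-covariance function

# eigenbasis formula: sum_ij lam_i f(lam_j/lam_i) conj(A_ij) B_ij
qcov = sum(lam[i] * f(lam[j] / lam[i]) * np.conj(A[i, j]) * B[i, j]
           for i in range(n) for j in range(n))

# usual symmetrized covariance (mean terms vanish by centering)
cov = 0.5 * np.trace(rho @ (A.conj().T @ B + B @ A.conj().T))
```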
The interpretation of the covariances is not at all clear. In the next section they will be called quadratic cost functions. It turns out that there is a one-to-one correspondence between quadratic cost functions and Fisher information.

## 4. Fisher Information

#### 4.1. The Cramér-Rao inequality

The Cramér-Rao inequality belongs to the basics of estimation theory in mathematical statistics. Its quantum analog was discovered immediately after the foundation of mathematical quantum estimation theory in the 1960s; see the book [18] of Helstrom, or the book [19] of Holevo for a rigorous summary of the subject. Although both the classical Cramér-Rao inequality and its quantum analog are as elementary as the Schwarz inequality, the subject attracts a great deal of attention because it is located on the highly exciting boundary of statistics, information and quantum theory.
As a starting point we give a very general form of the quantum Cramér-Rao inequality in the simple setting of finite dimensional quantum mechanics. For $θ ∈ ( - ε , ε ) ⊂ R$ a statistical operator $ρ ( θ )$ is given and the aim is to estimate the value of the parameter θ close to 0. Formally $ρ ( θ )$ is an $n × n$ positive semidefinite matrix of trace 1 which describes a mixed state of a quantum mechanical system and we assume that $ρ ( θ )$ is smooth (in θ). Assume that an estimation is performed by the measurement of a self-adjoint matrix A playing the role of an observable. A is called locally unbiased estimator if
$∂ ∂ θ Tr ρ ( θ ) A | θ = 0 = 1$
This condition holds if A is an unbiased estimator for θ, that is
$Tr ρ ( θ ) A = θ ( θ ∈ ( - ε , ε ) )$
To require this equality for all values of the parameter is a serious restriction on the observable A and we prefer to use the weaker condition (19).
Let $φ 0 [ K , L ]$ be an inner product (or quadratic cost function) on the linear space of self-adjoint matrices. When $ρ ( θ )$ is smooth in θ, as already was assumed above, then
$∂ ∂ θ Tr ρ ( θ ) B | θ = 0 = φ 0 [ B , L ]$
with some $L = L *$. From (19) and (21), we have $φ 0 [ A , L ] = 1$ and the Schwarz inequality yields
$φ 0 [ A , A ] ≥ 1 φ 0 [ L , L ]$
This is the celebrated inequality of Cramér-Rao type for the locally unbiased estimator.
The right-hand side of (22) is independent of the estimator and provides a lower bound for the quadratic cost. The denominator $\varphi_0[L, L]$ appears here in the role of Fisher information. We call it quantum Fisher information with respect to the cost function $\varphi_0[\,\cdot\,,\,\cdot\,]$. This quantity depends on the tangent of the curve $\rho(\theta)$. If the densities $\rho(\theta)$ and the estimator A commute, then they can be diagonalized simultaneously and (22) reduces to the classical Cramér-Rao inequality, in which the classical Fisher information
$F_0 = \sum_x \frac{\big(\partial_\theta\, p(x, \theta)|_{\theta=0}\big)^2}{p(x, 0)}$
of the family of eigenvalue distributions $p(\,\cdot\,, \theta)$ of $\rho(\theta)$ appears.
We want to conclude from the above argument that, whatever Fisher information and generalized variance may be in the quantum mechanical setting, they are very strongly related. In earlier work [20,21] we used a monotonicity condition to restrict the class of Riemannian metrics on the state space of a quantum system. The monotone metrics are called Fisher information quantities in this paper.
Since the sufficient and necessary condition for the equality in the Schwarz inequality is well-known, we are able to analyze the case of equality in (22). The condition for equality is
$A = λ L$
for some constant $\lambda \in \mathbb{R}$. Writing the cost function as $\varphi_0[A, B] = \operatorname{Tr} A\,\mathbb{J}_0(B)$ with a positive linear operator $\mathbb{J}_0$ acting on matrices, the necessary and sufficient condition for equality in (22) is
$ρ ˙ 0 : = ∂ ∂ θ ρ ( θ ) | θ = 0 = λ - 1 J 0 ( A )$
Therefore there exists a unique locally unbiased estimator $A = λ J 0 - 1 ( ρ ˙ 0 )$, where the number λ is chosen in such a way that the condition (19) should be satisfied.
Example 5 Let
$ρ ( θ ) : = ρ + θ B$
where ρ is a positive definite density and B is a self-adjoint traceless operator. A is locally unbiased when $Tr A B = 1$. In particular,
$A = \frac{B}{\operatorname{Tr} B^2}$
is a locally unbiased estimator and in the Cramér-Rao inequality (22) the equality holds when $φ 0 [ X , Y ] = Tr X Y$, that is, $J 0$ is the identity.
If $Tr ρ B = 0$ holds in addition, then the estimator is unbiased.

#### 4.2. Coarse-graining and monotonicity

In the simple setting in which the state is described by a density matrix, a coarse-graining is an affine mapping sending density matrices into density matrices. Such a mapping extends to all matrices and provides a positivity and trace preserving linear transformation. A common example of coarse-graining sends the density matrix $\rho_{12}$ of a composite system $1+2$ into the (reduced) density matrix $\rho_1$ of component 1. There are several reasons to assume complete positivity of a coarse-graining, and we do so.
Assume that $\rho(\theta)$ is a smooth curve of density matrices with tangent $A := \dot\rho$ at ρ. The quantum Fisher information $F_\rho(A)$ is an information quantity associated with the pair $(\rho, A)$; it appeared in the Cramér-Rao inequality above, where its classical counterpart bounds the variance of a locally unbiased estimator. Let now β be a coarse-graining. Then $\beta(\rho(\theta))$ is another curve in the state space. Due to the linearity of β, the tangent at $\beta(\rho_0)$ is $\beta(A)$. As is usual in statistics, information cannot be gained by coarse-graining; therefore we expect that the Fisher information at the density matrix $\rho_0$ in the direction A must be larger than the Fisher information at $\beta(\rho_0)$ in the direction $\beta(A)$. This is the monotonicity property of the Fisher information under coarse-graining:
$F ρ ( A ) ≥ F β ( ρ ) ( β ( A ) )$
Although we do not want to have a concrete formula for the quantum Fisher information, we require that this monotonicity condition must hold. Another requirement is that $F ρ ( A )$ should be quadratic in A, in other words there exists a non-degenerate real bilinear form $γ ρ ( A , B )$ on the self-adjoint matrices such that
$F ρ ( A ) = γ ρ ( A , A )$
The requirements (25) and (26) are strong enough to obtain a reasonable but still wide class of possible quantum Fisher information.
We may assume that
$γ ρ ( A , B ) = Tr A J ρ - 1 ( B * )$
for an operator $J_\rho$ acting on matrices. (This formula expresses the inner product $\gamma_\rho$ by means of the Hilbert-Schmidt inner product and the positive linear operator $J_\rho$.) In terms of the operator $J_\rho$ the monotonicity condition reads as
$β * J β ( ρ ) - 1 β ≤ J ρ - 1$
for every coarse-graining β. ($\beta^*$ stands for the adjoint of β with respect to the Hilbert-Schmidt product. Recall that β is completely positive and trace preserving if and only if $\beta^*$ is completely positive and unital.) On the other hand, the latter condition is equivalent to
$β J ρ β * ≤ J β ( ρ )$
We proved the following theorem in [20].
Theorem 5 If for every invertible density matrix $ρ ∈ M n ( C )$ a positive definite sesquilinear form $γ ρ : M n ( C ) × M n ( C ) → C$ is given such that
(1)
the monotonicity
$γ ρ ( A , A ) ≥ γ β ( ρ ) ( β ( A ) , β ( A ) )$
holds for all completely positive coarse grainings $β : M n ( C ) → M m ( C )$,
(2)
$γ ρ ( A , A )$ is continuous in ρ for every fixed A,
(3)
$γ ρ ( A , A ) = γ ρ ( A * , A * )$,
(4)
$γ ρ ( A , A ) = Tr ρ - 1 A 2$ if A is self-adjoint and $A ρ = ρ A$,
then there exists a unique standard operator monotone function $f : R + → R$ such that
$\gamma_\rho^f(A, A) = \operatorname{Tr} A\, J_\rho^{-1}(A) \quad\text{and}\quad J_\rho = R_\rho^{1/2}\, f(L_\rho R_\rho^{-1})\, R_\rho^{1/2}$
where the linear transformations $L ρ$ and $R ρ$ acting on matrices are the left and right multiplications, that is
$L_\rho(X) = \rho X \quad\text{and}\quad R_\rho(X) = X\rho$
The above $\gamma_\rho(A, A)$ is formally a quasi-entropy, $S_{1/f}^{A\rho^{-1}}(\rho, \rho)$; however, this form is not suitable for showing the monotonicity. Assume that $\rho = \mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$. Then
$γ ρ f ( A , A ) = ∑ i j 1 λ i f ( λ j / λ i ) | A i j | 2$
It is clear from this formula that the Fisher information is affine in the function $1 / f$. Therefore, Hansen’s canonical representation of the reciprocal of a standard operator monotone function can be used [22].
Theorem 6 If $f : \mathbb{R}^+ \to \mathbb{R}^+$ is a standard operator monotone function, then
$\frac{1}{f(t)} = \int_0^1 g_\lambda(t)\, d\mu(\lambda)$
where μ is a probability measure on $[0, 1]$.
The theorem implies that the set $\{1/f : f$ is standard operator monotone$\}$ is convex and gives the extremal points
$g_\lambda(t) = \frac{1+\lambda}{2}\left(\frac{1}{t+\lambda} + \frac{1}{1+t\lambda}\right)$
One can compute directly that
$∂ ∂ λ g λ ( x ) = - ( 1 - λ 2 ) ( x + 1 ) ( x - 1 ) 2 2 ( x + λ ) 2 ( 1 + x λ ) 2$
Hence $g λ$ is decreasing in the parameter λ. For $λ = 0$ we have the largest function $g 0 ( t ) = ( t + 1 ) / ( 2 t )$ and for $λ = 1$ the smallest is $g 1 ( t ) = 2 / ( t + 1 )$. (Note that this was also obtained in the setting of positive operator means [23], harmonic and arithmetic means.)
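The stated endpoint functions and the monotonicity in λ are easy to verify numerically, taking $g_\lambda(t) = \frac{1+\lambda}{2}\big(\frac{1}{t+\lambda} + \frac{1}{1+t\lambda}\big)$ (the form consistent with the derivative formula above):

```python
import numpy as np

def g(lam, x):
    # extremal functions in the canonical representation of 1/f
    return (1 + lam) / 2 * (1 / (x + lam) + 1 / (1 + x * lam))

xs = np.linspace(0.1, 10, 50)
lams = np.linspace(0, 1, 50)

# endpoints: g_0(t) = (t+1)/(2t) and g_1(t) = 2/(t+1)
e0 = np.max(np.abs(g(0.0, xs) - (xs + 1) / (2 * xs)))
e1 = np.max(np.abs(g(1.0, xs) - 2 / (xs + 1)))

# g_lambda is decreasing in lambda for every fixed x
vals = np.array([g(l, xs) for l in lams])     # shape (len(lams), len(xs))
dec = bool(np.all(np.diff(vals, axis=0) <= 1e-12))
```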
Via the operator $J ρ$, each monotone Fisher information determines a quantity
$φ ρ [ A , A ] : = Tr A J ρ ( A )$
which is a quadratic cost functional. According to (29) (or Theorem 4) this possesses the monotonicity property
$φ ρ [ β * ( A ) , β * ( A ) ] ≤ φ β ( ρ ) [ A , A ]$
Since (28) and (29) are equivalent we observe a one-to-one correspondence between monotone Fisher information and monotone quadratic cost functions.
Theorem 7 If for every invertible density matrix $ρ ∈ M n ( C )$ a positive definite sesquilinear form $φ ρ : M n ( C ) × M n ( C ) → C$ is given such that
(1)
the monotonicity (33) holds for all completely positive coarse grainings $β : M n ( C ) → M m ( C )$,
(2)
$φ ρ [ A , A ]$ is continuous in ρ for every fixed A,
(3)
$φ ρ [ A , A ] = φ ρ [ A * , A * ]$,
(4)
$φ ρ [ A , A ] = Tr ρ A 2$ if A is self-adjoint and $A ρ = ρ A$,
then there exists a unique standard operator monotone function $f : R + → R$ such that
$φ ρ f [ A , A ] = Tr A J ρ ( A )$
with the operator $J ρ$ defined in Theorem 5.
Any such cost function has the property $\varphi_\rho[A, B] = \operatorname{Tr}\rho A^*B$ when ρ commutes with A and B. The examples below show that this is not so in general.
Example 6 Among the standard operator monotone functions, $f a ( t ) = ( 1 + t ) / 2$ is maximal. This leads to the fact that among all monotone quantum Fisher information there is a smallest one which corresponds to the function $f a ( t )$. In this case
$F_\rho^{\min}(A) = \operatorname{Tr} A L = \operatorname{Tr}\rho L^2, \quad\text{where}\quad \rho L + L\rho = 2A$
For the purpose of a quantum Cramér-Rao inequality the minimal quantity seems to be the best, since the inverse gives the largest lower bound. In fact, the matrix L has been used for a long time under the name of symmetric logarithmic derivative, see [19] and [18]. In this example the quadratic cost function is
$φ ρ [ A , B ] = 1 2 Tr ρ ( A B + B A )$
and we have
$\mathbb{J}_\rho(B) = \frac{1}{2}(\rho B + B\rho) \quad\text{and}\quad \mathbb{J}_\rho^{-1}(A) = \int_0^\infty e^{-t\rho/2} A e^{-t\rho/2}\, dt$
for the operator $J$ of the previous section.
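In an eigenbasis of ρ the integral can be done in closed form, $\int_0^\infty e^{-t(\lambda_i+\lambda_j)/2}\,dt = 2/(\lambda_i+\lambda_j)$, which gives the symmetric logarithmic derivative directly. A numerical sketch (variable names are mine), checking $\rho L + L\rho = 2A$ and $\operatorname{Tr} AL = \operatorname{Tr}\rho L^2$:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
g_ = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = g_ @ g_.conj().T
rho = rho / np.trace(rho).real

h = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (h + h.conj().T) / 2                      # self-adjoint direction

lam, U = np.linalg.eigh(rho)
Au = U.conj().T @ A @ U
Lu = 2 * Au / (lam[:, None] + lam[None, :])   # closed form of the integral
L = U @ Lu @ U.conj().T                       # symmetric logarithmic derivative

resid = np.max(np.abs(rho @ L + L @ rho - 2 * A))
fisher_min = np.trace(rho @ L @ L).real       # minimal quantum Fisher information
```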
To see the second formula of (36), set $A(t) := e^{-t\rho/2} A e^{-t\rho/2}$. Then
$\frac{d}{dt}A(t) = -\frac{1}{2}\left(\rho A(t) + A(t)\rho\right) = -\mathbb{J}_\rho(A(t))$
and
$\int_0^\infty \frac{d}{dt}A(t)\, dt = \lim_{t \to \infty} A(t) - A(0) = -A$
Hence
$A = \mathbb{J}_\rho\left(\int_0^\infty e^{-t\rho/2} A e^{-t\rho/2}\, dt\right)$
which is the stated formula.
Let $T = T *$ and $ρ 0$ be a density matrix. Then $D ( θ ) : = exp ( θ T / 2 ) ρ 0 exp ( θ T / 2 )$ satisfies the differential equation
$∂ ∂ θ D ( θ ) = J D ( θ ) T$
and
$ρ ( θ ) = D ( θ ) Tr D ( θ )$
is a kind of exponential family.
If $Tr ρ 0 T = 0$ and $Tr ρ 0 T 2 = 1$, then
$∂ ∂ θ Tr ρ ( θ ) T | θ = 0 = 1$
and T is a locally unbiased estimator (of the parameter θ at $θ = 0$). Since
$∂ ∂ θ ρ ( θ ) | θ = 0 = J 0 ( T )$
we have equality in the Cramér-Rao inequality, see (24).
Example 7 The function
$f_\beta(x) = \beta(1-\beta)\,\frac{(x-1)^2}{(x^\beta - 1)(x^{1-\beta} - 1)}$
is operator monotone if $0 < | β | < 1$.
When $A = i[\rho, B]$ lies in the component of the tangent space orthogonal to the commutant of the foot-point ρ, we have
$\gamma_\rho^{f_\beta}(i[\rho, B], i[\rho, B]) = -\frac{1}{\beta(1-\beta)}\operatorname{Tr}\,[\rho^\beta, B]\,[\rho^{1-\beta}, B]$
Apart from a constant factor this expression is the skew information proposed by Wigner and Yanase some time ago ([24]). In the limiting cases $\beta \to 0$ or 1 we have
$f_0(x) = \frac{x - 1}{\log x}$
and the corresponding Fisher information
$γ ρ ( A , B ) : = ∫ 0 ∞ Tr A ( ρ + t ) - 1 B ( ρ + t ) - 1 d t$
is named after Kubo, Mori, Bogoliubov, etc. The Kubo-Mori inner product plays a role in quantum statistical mechanics (see [25], for example). In this case
$\mathbb{J}^{-1}(B) = \int_0^\infty (\rho + t)^{-1} B (\rho + t)^{-1}\, dt \quad\text{and}\quad \mathbb{J}(A) = \int_0^1 \rho^t A \rho^{1-t}\, dt$
Therefore the corresponding quadratic cost functional is
$φ ρ [ A , B ] = ∫ 0 1 Tr A ρ t B ρ 1 - t d t$
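In an eigenbasis of ρ, the Kubo-Mori $\mathbb{J}$ multiplies the $(i, j)$ entry of a matrix by the logarithmic mean $(\lambda_i - \lambda_j)/(\log\lambda_i - \log\lambda_j)$, and $\mathbb{J}^{-1}$ by its reciprocal, so the two integral formulas are indeed inverse to each other. A numerical sketch checking the $\mathbb{J}$ integral against quadrature (names are mine):

```python
import numpy as np

rng = np.random.default_rng(7)
lam = rng.dirichlet(np.ones(3))          # eigenvalues of rho

def logmean(a, b):
    return a if abs(a - b) < 1e-15 else (a - b) / (np.log(a) - np.log(b))

ts = np.linspace(0.0, 1.0, 20001)
ok = True
for a in lam:
    for b in lam:
        vals = a**ts * b**(1.0 - ts)     # integrand of int_0^1 a^t b^(1-t) dt
        j_quad = float(np.sum(vals[1:] + vals[:-1]) * (ts[1] - ts[0]) / 2)
        ok &= abs(j_quad - logmean(a, b)) < 1e-7
```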
Let
$ρ ( θ ) : = exp ( H + θ T ) Tr exp ( H + θ T )$
where $\rho = e^H$. Assume that $\operatorname{Tr} e^H T = 0$. The Fréchet derivative of $e^H$ in the direction T is $\int_0^1 e^{tH} T e^{(1-t)H}\, dt$. Hence A is locally unbiased if
$∫ 0 1 Tr ρ t T ρ 1 - t A d t = 1$
This holds if
$A = \frac{T}{\int_0^1 \operatorname{Tr}\rho^t T \rho^{1-t} T\, dt}$
In the Cramér-Rao inequality (22) the equality holds when $J 0 ( K ) = ∫ 0 1 D t K D 1 - t d t$.
Note that Equation (44) is again an exponential family, the differential equation for
$D ( θ ) = exp ( H + θ T )$
has the form Equation (37) with
$J D ( θ ) ( K ) = ∫ 0 1 D ( θ ) t K D ( θ ) 1 - t d t$
Problem 2 It would be interesting to find more exponential families. This means solution of the differential equation
$∂ ∂ θ D ( θ ) = J D ( θ ) T , D ( 0 ) = ρ 0$
If the self-adjoint T and the positive ρ commute, then the solution is $D ( θ ) = exp ( θ T ) ρ 0$. A concrete example is
$∂ ∂ θ D ( θ ) = D ( θ ) 1 / 2 T D ( θ ) 1 / 2$

#### 4.3. Manifolds of density matrices

Let $M : = { ρ ( θ ) : θ ∈ G }$ be a smooth m-dimensional manifold of invertible density matrices. When a quadratic cost function $φ 0$ is fixed, the corresponding Fisher information is a Riemannian metric on the manifold. This gives a possibility for geometric interpretation of statistical statements [26,27].
Fisher information appears not only as a Riemannian metric but as an information matrix as well. The quantum score operators (or logarithmic derivatives) $L_i(\theta)$ are defined as
$\frac{\partial}{\partial\theta_i}\rho(\theta) = \mathbb{J}_{\rho(\theta)}\big(L_i(\theta)\big) \qquad (1 \le i \le m)$
and
$I^Q(\theta)_{ij} := \operatorname{Tr} L_i(\theta)\, \mathbb{J}_{\rho(\theta)}\big(L_j(\theta)\big) \qquad (1 \le i, j \le m)$
is the quantum Fisher information matrix.
The next result is the monotonicity of Fisher information matrix.
Theorem 8 [16] Let β be a coarse-graining sending density matrices on the Hilbert space $H 1$ into those acting on the Hilbert space $H 2$ and let $M : = { ρ ( θ ) : θ ∈ G }$ be a smooth m-dimensional manifold of invertible density matrices on $H 1$. For the Fisher information matrix $I 1 Q ( θ )$ of $M$ and for Fisher information matrix $I 2 Q ( θ )$ of $β ( M ) : = { β ( ρ ( θ ) ) : θ ∈ G }$ we have the monotonicity relation
$I 2 Q ( θ ) ≤ I 1 Q ( θ )$
Assume that $F j$ are positive operators acting on a Hilbert space $H 1$ on which the family $M : = { ρ ( θ ) : θ ∈ G }$ is given. When $∑ j = 1 n F j = I$, these operators determine a measurement. For any $ρ ( θ )$ the formula
$β ( ρ ( θ ) ) : = D i a g ( Tr ρ ( θ ) F 1 , ⋯ , Tr ρ ( θ ) F n )$
gives a diagonal density matrix. Since this family is commutative, all quantum Fisher information coincide with the classical (23) and the classical Fisher information stand on the left-hand-side of (47). The right-hand-side can be arbitrary quantum quantity but it is minimal if based on the symmetric logarithmic derivative, see Example 6. This particular case of the Theorem is in the paper [28].
Assume that a manifold $\mathcal{M} := \{\rho(\theta) : \theta \in G\}$ of density matrices is given together with a statistically relevant Riemannian metric γ. Given two points on the manifold, their geodesic distance is interpreted as the statistical distinguishability of the two density matrices in some statistical procedure.
Let $ρ 0 ∈ M$ be a point on our statistical manifold. The geodesic ball
$B ε ( ρ 0 ) : = { ρ ∈ M : d ( ρ 0 , ρ ) < ε }$
contains all density matrices which can be distinguished by an effort smaller than ε from the fixed density $ρ 0$. The size of the inference region $B ε ( ρ 0 )$ measures the statistical uncertainty at the density $ρ 0$. Following Jeffrey’s rule the size is the volume measure determined by the statistical (or information) metric. More precisely, it is better to consider the asymptotics of the volume of $B ε ( ρ 0 )$ as $ε → 0$. It is known in differential geometry that
$\operatorname{Vol}\big(B_\varepsilon(\rho_0)\big) = C_m \varepsilon^m \left(1 - \frac{\mathrm{Scal}(\rho_0)}{6(m+2)}\,\varepsilon^2 + o(\varepsilon^2)\right)$
where m is the dimension of our manifold, $C_m$ is a constant (the volume of the unit ball in Euclidean m-space) and Scal means the scalar curvature, see [29, 3.98 Theorem]. In this way, the scalar curvature of a statistically relevant Riemannian metric might be interpreted as the average statistical uncertainty of the density matrix (in the given statistical manifold). This interpretation becomes particularly interesting for the full state space endowed with the Kubo-Mori inner product as a statistically relevant Riemannian metric.
The Kubo-Mori (or Bogoliubov) inner product is given by
$γ ρ ( A , B ) = Tr ( ∂ A ρ ) ( ∂ B log ρ )$
or (41) in the affine parametrization. On the basis of numerical evidence it was conjectured in [30] that the scalar curvature, which is a statistical uncertainty, is monotone in the following sense: for any coarse-graining α the scalar curvature at a density ρ is smaller than at $\alpha(\rho)$. The average statistical uncertainty is increasing under coarse-graining. Up to now this conjecture has not been proven mathematically. Another form of the conjecture is the statement that along a curve of Gibbs states
$e - β H Tr e - β H$
the scalar curvature changes monotonically with the inverse temperature $\beta \ge 0$; that is, the scalar curvature is a monotone decreasing function of β. (Some partial results are in [31].)
Let $M$ be the manifold of all invertible $n × n$ density matrices. If we use the affine parametrization, then the tangent space $T ρ$ consists of the traceless self-adjoint matrices and has an orthogonal decomposition
$T ρ = { i [ ρ , B ] : B ∈ M n s a } ⊕ { A = A * : Tr A = 0 , A ρ = ρ A }$
We denote the two subspaces by $T ρ q$ and $T ρ c$, respectively. If $A 2 ∈ T ρ c$, then
$f(\Delta(\rho/\rho))(A_2\rho^{\pm 1/2}) = A_2\rho^{\pm 1/2}$
implies
$qCov ρ f ( A 1 , A 2 ) = Tr ρ A 1 * A 2 - ( Tr ρ A 1 * ) ( Tr ρ A 2 ) , γ ρ f ( A 1 , A 2 ) = Tr ρ - 1 A 1 * A 2$
independently of the function f. Moreover, if $A 1 ∈ T ρ q$, then
$γ ρ f ( A 1 , A 2 ) = qCov ρ f ( A 1 , A 2 ) = 0$
Therefore, the decomposition (50) is orthogonal with respect to any Fisher information and any quadratic cost functional. Moreover, the effect of the function f and the genuinely quantum features are carried by the components from $T_\rho^q$.

#### 4.4. Skew information

Let f be a standard function and $X = X * ∈ M n$. The quantity
$I ρ f ( X ) : = f ( 0 ) 2 γ ρ f ( i [ ρ , X ] , i [ ρ , X ] )$
was called skew information in [22] in this general setting. The skew information is nothing else but the Fisher information restricted to $T ρ q$, but it is parametrized by the commutator.
If $ρ = D i a g ( λ 1 , ⋯ , λ n )$ is diagonal, then
$\gamma_\rho^f ( i[\rho, X], i[\rho, X] ) = \sum_{ij} \frac{(\lambda_i - \lambda_j)^2}{\lambda_j\, f(\lambda_i/\lambda_j)}\, |X_{ij}|^2$
This implies that the identity
$f ( 0 ) γ ρ f ( i [ ρ , X ] , i [ ρ , X ] ) = 2 Cov ρ ( X , X ) - 2 qCov ρ f ˜ ( X , X )$
holds if $\mathrm{Tr}\, \rho X = 0$ and
$\tilde f (x) = \frac{1}{2} \Big( (x+1) - (x-1)^2\, \frac{f(0)}{f(x)} \Big)$
The following result was obtained in [32].
Theorem 9 If $f : R + → R$ is a standard function, then $f ˜$ is standard as well.
The original proof is not easy; it even uses matrix convexity of functions of two variables. Here we sketch a rather elementary proof based on the fact that the map $1/f \mapsto \tilde f$ is linear and on the canonical decomposition in Theorem 6.
Lemma 1 Let $0 ≤ λ ≤ 1$ and $f_λ : \mathbb{R}^+ → \mathbb{R}$ be a function such that
Then the function $f ˜ : R + → R$ defined in (52) is an operator monotone standard function.
The proof of the lemma is elementary. From the lemma and Theorem 6, Theorem 9 follows straightforwardly [33].
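The identity above relating $f(0)\gamma_\rho^f$ to the covariances can be checked numerically. The sketch below is an illustration (not from the paper): it takes the symmetric function $f(x) = (1+x)/2$, for which the paired function works out to the harmonic-mean function $\tilde f(x) = 2x/(1+x)$, and compares the two sides in the eigenbasis of ρ using the standard spectral forms of Cov and qCov.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([0.4, 0.35, 0.25])        # spectrum of the density matrix rho
rho = np.diag(lam)

f = lambda x: (1 + x) / 2                # a standard function with f(0) = 1/2
ftilde = lambda x: 2 * x / (1 + x)       # its paired standard function (harmonic mean)

# self-adjoint X with Tr rho X = 0
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = (X + X.conj().T) / 2
X = X - np.trace(rho @ X).real * np.eye(3)

L, Lj = lam[:, None], lam[None, :]

# left-hand side: f(0) * gamma^f(i[rho,X], i[rho,X])
lhs = 0.5 * ((L - Lj) ** 2 / (Lj * f(L / Lj)) * np.abs(X) ** 2).sum()

# right-hand side: 2 Cov(X,X) - 2 qCov^{f~}(X,X), no centering terms since Tr rho X = 0
cov = ((L + Lj) / 2 * np.abs(X) ** 2).sum()
qcov = (Lj * ftilde(L / Lj) * np.abs(X) ** 2).sum()
rhs = 2 * cov - 2 * qcov

assert abs(lhs - rhs) < 1e-10
print("f(0) gamma = 2 Cov - 2 qCov^{f~} verified")
```

For this f both sides reduce to $\sum_{ij} (\lambda_i - \lambda_j)^2 |X_{ij}|^2 / (\lambda_i + \lambda_j)$, so the agreement is exact up to rounding.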
The skew information is the Hessian of a quasi-entropy:
Theorem 10 Assume that $X = X * ∈ M n$ and $Tr ρ X = 0$. If f is a standard function such that $f ( 0 ) ≠ 0$, then
$∂ 2 ∂ t ∂ s S F ( ρ + t i [ ρ , X ] , ρ + s i [ ρ , X ] ) | t = s = 0 = f ( 0 ) γ ρ f ( i [ ρ , X ] , i [ ρ , X ] )$
for the standard function $F = f ˜$.
The proof is based on the formula
$d d t h ( ρ + t i [ ρ , X ] ) | t = 0 = i [ h ( ρ ) , X ]$
see [33].
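The commutator derivative formula above can be checked by finite differences. The following sketch is an illustration only; it takes $h = \log$ and computes matrix functions of Hermitian matrices via the eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(2)

def herm_fun(M, h):
    # h applied to a Hermitian matrix via its spectral decomposition
    w, V = np.linalg.eigh(M)
    return (V * h(w)) @ V.conj().T

lam = np.array([0.5, 0.3, 0.2])
rho = np.diag(lam)

X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = (X + X.conj().T) / 2
C = 1j * (rho @ X - X @ rho)             # the tangent direction i[rho, X]

eps = 1e-6
# central difference of t -> h(rho + t i[rho,X]) at t = 0
num = (herm_fun(rho + eps * C, np.log) - herm_fun(rho - eps * C, np.log)) / (2 * eps)
exact = 1j * (herm_fun(rho, np.log) @ X - X @ herm_fun(rho, np.log))

assert np.max(np.abs(num - exact)) < 1e-6
print("d/dt h(rho + t i[rho,X]) at t=0 equals i[h(rho), X] for h = log")
```

The formula reflects that $\rho + t\,i[\rho, X]$ agrees to first order with the unitary orbit $e^{-itX} \rho\, e^{itX}$, along which $h$ simply commutes with the conjugation.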
The next example seems to be new; the author does not presently know a direct application.
Example 8 We compute the Hessian of the relative entropy of degree α in an exponential parametrization:
$∂ 2 ∂ t ∂ s S α ( e H + t A | | e H + s B ) | t = s = 0 = ∫ 0 1 Tr e ( 1 - u ) H B e u H A g α ( u ) d u ,$
where
$g_\alpha(u) = \frac{\min\{ u,\ \alpha,\ 1-\alpha,\ 1-u \}}{\alpha (1-\alpha)}, \qquad u \in [0, 1]$
Since
$∂ 2 ∂ t ∂ s S α ( e H + t A | | e H + s B ) = 1 α ( 1 - α ) ∂ 2 ∂ t ∂ s Tr exp α ( H + s B ) exp ( 1 - α ) ( H + t A )$
we calculate as follows. Differentiating each exponential factor by the Duhamel formula and taking the trace, the factor $\alpha(1-\alpha)$ cancels and the mixed derivative at $t = s = 0$ becomes the double integral
$\int_0^1 \!\! \int_0^1 F(-x\alpha + y - y\alpha + \alpha)\, dx\, dy$
for the functional
$F(t) = \mathrm{Tr}\, e^{(1-t)H} B\, e^{tH} A$
We continue
$∫ 0 1 ∫ 0 1 F ( - x α + y - y α + α ) d x d y = ∫ 0 1 ∫ 0 1 F ( - x α + y ( 1 - α ) + α ) d x d y = ∫ x = 0 1 1 1 - α ∫ z = 0 1 - α F ( - x α + z + α ) d z d x = 1 α ∫ w = - α 0 1 1 - α ∫ z = 0 1 - α F ( z - w ) d z d w = ∫ 0 1 F ( u ) g α ( u ) d u$
where $g α$ is as above.
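The change of variables identifies $g_\alpha$ with the density of the random variable $\alpha(1-x) + (1-\alpha)y$ for independent uniform x, y on [0,1], i.e., a tent function. The following sketch (my illustration; the closed form of $g_\alpha$ used here is an assumption reconstructed from the substitution above) compares the two sides of the last equality by quadrature for a scalar test functional.

```python
import numpy as np

alpha = 0.3
F = np.exp                                # a simple scalar test functional

# midpoint rule for the double integral over the unit square
N = 1000
t = (np.arange(N) + 0.5) / N
Xg, Yg = np.meshgrid(t, t, indexing="ij")
lhs = F(-Xg * alpha + Yg * (1 - alpha) + alpha).mean()

# assumed tent density g_alpha(u) = min(u, alpha, 1-alpha, 1-u) / (alpha (1-alpha))
M = 200_000
u = (np.arange(M) + 0.5) / M
g = np.minimum(np.minimum(u, 1 - u), min(alpha, 1 - alpha)) / (alpha * (1 - alpha))
rhs = (F(u) * g).mean()

assert abs(g.mean() - 1.0) < 1e-6         # g_alpha integrates to 1
assert abs(lhs - rhs) < 1e-4
print(f"double integral {lhs:.6f} vs g_alpha-weighted integral {rhs:.6f}")
```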
We know that
$\frac{\partial^2}{\partial t\, \partial s} \exp( H + tA + sB ) \Big|_{t=s=0} = \int_0^1 \!\! \int_0^s e^{(1-s)H} B\, e^{(s-u)H} A\, e^{uH}\, du\, ds$
therefore
$\frac{\partial^2}{\partial t\, \partial s} \exp\big( (1-\alpha)( H + tA + sB ) \big) \Big|_{t=s=0} = (1-\alpha)^2 \int_0^1 \!\! \int_0^s e^{(1-s)(1-\alpha)H} B\, e^{(s-u)(1-\alpha)H} A\, e^{u(1-\alpha)H}\, du\, ds$
and taking the trace and substituting $x = s - u$ we obtain
$∫ 0 1 ∫ 0 s Tr e [ 1 - ( s - u ) ] ( 1 - α ) H B e ( s - u ) ( 1 - α ) H A d u d s = ∫ 0 1 ( 1 - x ) Tr e [ 1 - x ] ( 1 - α ) H B e x ( 1 - α ) H A d x$
If $α = 0$, then we have the Kubo-Mori inner product.
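The Kubo-Mori pairing can be illustrated numerically: the Duhamel-type integral $\int_0^1 \mathrm{Tr}\, e^{(1-u)H} B\, e^{uH} A\, du$ coincides with the mixed derivative $\partial^2/\partial t\, \partial s\ \mathrm{Tr}\, e^{H+tA+sB}|_{t=s=0}$. The sketch below is an illustration only; it evaluates the integral in closed form via divided differences of exp in the eigenbasis of H and compares it with a finite difference.

```python
import numpy as np

rng = np.random.default_rng(3)

def rand_herm(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

def expm_h(M):
    # matrix exponential of a Hermitian matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.conj().T

n = 3
H, A, B = rand_herm(n) * 0.3, rand_herm(n), rand_herm(n)

# int_0^1 e^{(1-u)h_i} e^{u h_j} du = (e^{h_i} - e^{h_j}) / (h_i - h_j)
h, V = np.linalg.eigh(H)
At = V.conj().T @ A @ V
Bt = V.conj().T @ B @ V
hi, hj = h[:, None], h[None, :]
dd = np.where(np.abs(hi - hj) > 1e-12,
              (np.exp(hi) - np.exp(hj)) / np.where(hi != hj, hi - hj, 1.0),
              np.exp(hi))
km = (Bt * At.T * dd).sum()               # sum_ij B~_ij A~_ji dd_ij

# finite-difference mixed derivative of Tr exp(H + tA + sB)
eps = 1e-4
phi = lambda t, s: np.trace(expm_h(H + t * A + s * B)).real
fd = (phi(eps, eps) - phi(eps, -eps) - phi(-eps, eps) + phi(-eps, -eps)) / (4 * eps**2)

assert abs(km.real - fd) < 1e-5 * max(1.0, abs(fd))
print("Kubo-Mori pairing:", km.real)
```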

## 5. Von Neumann Algebras

Let $M$ be a von Neumann algebra. Assume that it is in standard form: it acts on a Hilbert space $H$, $P ⊂ H$ is the positive cone and $J : H → H$ is the modular conjugation. Let φ and ω be normal states with representing vectors Φ and Ω in the positive cone. For the sake of simplicity, assume that φ and ω are faithful; this means that Φ and Ω are cyclic and separating vectors. The closure of the unbounded operator $A Φ ↦ A^* Ω$ has a polar decomposition $J\, Δ(ω/φ)^{1/2}$, and $Δ(ω/φ)$ is called the relative modular operator. $A Φ$ is in the domain of $Δ(ω/φ)^{1/2}$ for every $A ∈ M$.
For $A ∈ M$ and $f : R + → R$, the quasi-entropy
$S f A ( ω ∥ φ ) : = 〈 A Φ , f ( Δ ( ω / φ ) ) A Φ 〉$
was introduced in [8], see also Chapter 7 in [10]. Of course, (5) is a particular case.
Theorem 11 Assume that $f : R + → R$ is an operator monotone function with $f ( 0 ) ≥ 0$ and $α : M 0 → M$ is a Schwarz mapping. Then
$S f A ( ω ∘ α ∥ φ ∘ α ) ≥ S f α ( A ) ( ω ∥ φ )$
holds for $A ∈ M 0$ and for normal states ω and φ of the von Neumann algebra $M$.
The relative entropies are jointly convex in this setting similarly to the finite dimensional case. Now we shall concentrate on the generalized variance.

#### 5.1. Generalized covariance

To deal with generalized covariance, we assume that $f : R + → R$ is a standard operator monotone (increasing) function. The natural extension of the covariance (from probability theory) is
$\mathrm{qCov}_\omega^f (A, B) = \langle f(\Delta(\omega/\omega))^{1/2} A\Omega,\ f(\Delta(\omega/\omega))^{1/2} B\Omega \rangle - \overline{\omega(A)}\, \omega(B)$
where $Δ(ω/ω)$ is actually the modular operator. Although $Δ(ω/ω)$ is unbounded, the definition is meaningful. For the function f, the inequality
$\frac{2x}{x+1} \le f(x) \le \frac{1+x}{2}$
holds. Therefore $A Ω$ is in the domain of $f(Δ(ω/ω))^{1/2}$.
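These bounds can be checked for familiar standard functions. The sketch below (an illustration, not from the paper) tests the logarithmic-mean function $(x-1)/\log x$, the Wigner-Yanase function $((1+\sqrt{x})/2)^2$, and the geometric mean $\sqrt{x}$ on a grid.

```python
import numpy as np

x = np.linspace(1e-4, 50.0, 100_000)

# some standard operator monotone functions, normalized so that f(1) = 1
lx = np.log(x)
log_mean = np.where(np.abs(lx) > 1e-12, (x - 1) / np.where(lx == 0, 1.0, lx), 1.0)
wy = ((1 + np.sqrt(x)) / 2) ** 2          # Wigner-Yanase
geo = np.sqrt(x)                           # geometric mean

lower = 2 * x / (x + 1)                    # smallest standard function
upper = (1 + x) / 2                        # largest standard function

for f in (log_mean, wy, geo):
    assert np.all(lower <= f + 1e-12) and np.all(f <= upper + 1e-12)
print("2x/(x+1) <= f(x) <= (1+x)/2 holds on the grid for all three examples")
```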
For a standard function $f : R + → R +$ and for a normal unital Schwarz mapping $β : N → M$ the inequality
$qCov ω f ( β ( X ) , β ( X ) ) ≤ qCov ω ∘ β f ( X , X ) ( X ∈ N )$
is a particular case of Theorem 11 and it is the monotonicity of the generalized covariance under coarse-graining. The common symmetrized covariance
$Cov ω ( A , B ) : = 1 2 ω ( A * B + B A * ) - ω ( A ) ¯ ω ( B )$
is recovered by the particular case $f ( t ) = ( 1 + t ) / 2$.
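In the finite-dimensional case this recovery can be verified directly: in the eigenbasis of ρ the modular operator acts as $\Delta X = \rho X \rho^{-1}$, so $(f(\Delta)X)_{ij} = f(\lambda_i/\lambda_j) X_{ij}$. The sketch below (matrix-case illustration, using this spectral action as its only assumption) checks that $f(t) = (1+t)/2$ reproduces the symmetrized covariance.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = np.array([0.5, 0.3, 0.2])
rho = np.diag(lam)

# arbitrary (not necessarily self-adjoint) observables
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

f = lambda t: (1 + t) / 2
omega = lambda X: np.trace(rho @ X)

# qCov via the spectral action of the modular operator Delta X = rho X rho^{-1}
L, Lj = lam[:, None], lam[None, :]
qcov = (Lj * f(L / Lj) * np.conj(A) * B).sum() - np.conj(omega(A)) * omega(B)

# symmetrized covariance (1/2) omega(A*B + B A*) - conj(omega(A)) omega(B)
Ad = A.conj().T
sym = 0.5 * (omega(Ad @ B) + omega(B @ Ad)) - np.conj(omega(A)) * omega(B)

assert abs(qcov - sym) < 1e-12
print("f(t) = (1+t)/2 recovers the symmetrized covariance")
```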
Since
$qCov ω f ( A , B ) = γ ω f ( A - ω ( A ) I , B - ω ( B ) I ) ,$
it is enough to consider these sesquilinear forms on the subspace $T ω : = { A ∈ M : ω ( A ) = 0 }$.

#### 5.2. The Cramér-Rao Inequality

Let ${ ω θ : θ ∈ G }$ be a smooth m-dimensional manifold in the set of normal states of the von Neumann algebra $M$ and assume that a collection $A = ( A 1 , ⋯ , A m )$ of self-adjoint operators is used to estimate the true value of θ. The subspace spanned by $A 1 , A 2 , ⋯ , A m$ is denoted by V.
Given a standard matrix monotone function f, we have the corresponding cost function
$\varphi_\theta [A, B] \equiv \mathrm{qCov}_{\omega_\theta}^f (A, B)$
for every θ and the cost matrix of the estimator A is a positive semidefinite matrix, defined by
$φ θ [ A ] i j = φ θ [ A i , A j ]$
The bias of the estimator is
$b(\theta) = \big( b_1(\theta), \dots, b_m(\theta) \big), \qquad b_i(\theta) := \omega_\theta(A_i) - \theta_i$
For an unbiased estimator we have $b(\theta) = 0$. From the bias vector we form a bias matrix
$B i j ( θ ) : = ∂ θ i b j ( θ )$
For a locally unbiased estimator at $θ 0$, we have $B ( θ 0 ) = 0$.
The relation
$∂ θ i ω θ ( H ) = φ θ [ L i ( θ ) , H ] ( H ∈ V )$
determines the logarithmic derivatives $L i ( θ )$. The Fisher information matrix is
$J i j ( θ ) : = φ θ [ L i ( θ ) , L j ( θ ) ]$
Theorem 12 Let $A = (A_1, \dots, A_m)$ be an estimator of θ. Then for the above defined quantities the inequality
$\varphi_\theta [A] \ \ge\ \big( I + B(\theta) \big)\, J(\theta)^{-1}\, \big( I + B(\theta) \big)^*$
holds in the sense of the order on positive semidefinite matrices.
Concerning the proof we refer to [16].
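A minimal commuting (classical) illustration of this machinery, which is my addition rather than an example from the paper: take the diagonal family $\omega_\theta$ given by $\mathrm{Diag}(\theta, 1-\theta)$, the cost function $f(t) = (1+t)/2$, and the unbiased estimator $A = \mathrm{Diag}(1, 0)$. Everything commutes, the logarithmic derivative is the classical score, and the bound is saturated.

```python
import numpy as np

theta = 0.3
p = np.array([theta, 1 - theta])          # diagonal density Diag(theta, 1-theta)

# unbiased estimator of theta: A = Diag(1, 0), since omega_theta(A) = theta
A = np.array([1.0, 0.0])

# with f(t) = (1+t)/2 and commuting operators, phi_theta[X, Y] reduces to the
# classical covariance sum_i p_i X_i Y_i - (sum_i p_i X_i)(sum_i p_i Y_i)
phi = lambda X, Y: (p * X * Y).sum() - (p * X).sum() * (p * Y).sum()

# logarithmic derivative: d/dtheta omega_theta(H) = phi[L, H] for all diagonal H;
# here L = Diag(1/theta, -1/(1-theta)), the classical score d log p / d theta
Lscore = np.array([1 / theta, -1 / (1 - theta)])
dp = np.array([1.0, -1.0])                # d/dtheta of (theta, 1-theta)
H = np.array([0.7, -0.2])                 # an arbitrary diagonal test observable
assert abs((dp * H).sum() - phi(Lscore, H)) < 1e-12

J = phi(Lscore, Lscore)                   # Fisher information, = 1/(theta(1-theta))
var = phi(A, A)                           # cost of the estimator, = theta(1-theta)

assert abs(J - 1 / (theta * (1 - theta))) < 1e-12
assert var >= 1 / J - 1e-12               # Cramer-Rao bound, here with equality
print("var =", var, " 1/J =", 1 / J)
```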

#### 5.3. Uncertainty relation

In the von Neumann algebra setting the skew information (as a sesquilinear form) can be defined as
$I ω f ( X , Y ) : = Cov ω ( X , Y ) - qCov ω f ˜ ( X , Y )$
if $ω ( X ) = ω ( Y ) = 0$. (Then $I ω f ( X ) = I ω f ( X , X )$.)
Lemma 2 Let $K$ be a Hilbert space with inner product $〈 〈 · , · 〉 〉$ and let $〈 · , · 〉$ be a sesquilinear form on $K$ such that
$0 ≤ 〈 f , f 〉 ≤ 〈 〈 f , f 〉 〉$
for every vector $f ∈ K$. Then
$[ 〈 f i , f j 〉 ] i , j = 1 m ≤ [ 〈 〈 f i , f j 〉 〉 ] i , j = 1 m$
holds for every $f 1 , f 2 , ⋯ , f m ∈ K$.
Proof: Consider the Gram matrices $G := [\langle\langle f_i, f_j \rangle\rangle]_{i,j=1}^m$ and $H := [\langle f_i, f_j \rangle]_{i,j=1}^m$, which are Hermitian and positive semidefinite. For every $a_1, \dots, a_m \in \mathbb{C}$ we get
$\sum_{i,j=1}^m \big( \langle\langle f_i, f_j \rangle\rangle - \langle f_i, f_j \rangle \big)\, \bar a_i a_j = \Big\langle\!\Big\langle \sum_{i=1}^m a_i f_i,\ \sum_{i=1}^m a_i f_i \Big\rangle\!\Big\rangle - \Big\langle \sum_{i=1}^m a_i f_i,\ \sum_{i=1}^m a_i f_i \Big\rangle \ge 0$
by assumption. This says that $G - H$ is positive semidefinite, that is, $G \ge H$.
Theorem 13 Assume that $f, g : \mathbb{R}^+ → \mathbb{R}$ are standard functions and ω is a faithful normal state on a von Neumann algebra $M$. Let $A_1, A_2, \dots, A_m \in M$ be self-adjoint operators such that $\omega(A_1) = \omega(A_2) = \dots = \omega(A_m) = 0$. Then the determinant inequality
$\det \big[ \mathrm{qCov}_\omega^g (A_i, A_j) \big]_{i,j=1}^m \ \ge\ \det \big[ 2 g(0)\, I_\omega^f (A_i, A_j) \big]_{i,j=1}^m$
holds.
Proof: Let $E(\cdot)$ be the spectral measure of $\Delta(\omega/\omega)$. Then for $m = 1$ the inequality is
$\int g(\lambda)\, d\mu(\lambda) \ \ge\ \int \frac{f(0)\, g(0)\, (\lambda - 1)^2}{f(\lambda)}\, d\mu(\lambda)$
where $d\mu(\lambda) = d\langle A\Omega, E(\lambda) A\Omega \rangle$. Since the inequality
$f ( x ) g ( x ) ≥ f ( 0 ) g ( 0 ) ( x - 1 ) 2$
holds for standard functions [34], we have
$g(\lambda) \ \ge\ \frac{f(0)\, g(0)\, (\lambda - 1)^2}{f(\lambda)} \qquad (\lambda > 0)$
and this implies the integral inequality.
Consider the finite dimensional subspace $N$ generated by the operators $A 1 , A 2 , ⋯ , A m$. On $N$ we have the inner products
$\langle\langle A, B \rangle\rangle := \mathrm{qCov}_\omega^g (A, B)$
and
$〈 A , B 〉 : = 2 g ( 0 ) I ω f ( A , B ) .$
Since $〈 A , A 〉 ≤ 〈 〈 A , A 〉 〉$, the determinant inequality holds, see Lemma 2.
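Theorem 13 can be checked numerically in the matrix case. The sketch below is an illustration only, with $f = g = (1+t)/2$, so that $2g(0) = 1$, the paired function is $\tilde f(x) = 2x/(1+x)$, and $I_\omega^f = \mathrm{Cov} - \mathrm{qCov}^{\tilde f}$ in the spectral form assumed here.

```python
import numpy as np

rng = np.random.default_rng(5)
lam = np.array([0.45, 0.35, 0.2])
rho = np.diag(lam)
L, Lj = lam[:, None], lam[None, :]

mean_arith = (L + Lj) / 2                 # kernel of Cov  (f = g = (1+t)/2)
mean_harm = 2 * L * Lj / (L + Lj)         # kernel of qCov^{f~}, f~(x) = 2x/(1+x)

def center(X):
    return X - np.trace(rho @ X).real * np.eye(3)

def pair(K, X, Y):
    return ((np.conj(X) * Y) * K).sum().real

# two centered self-adjoint observables
obs = []
for _ in range(2):
    X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    obs.append(center((X + X.conj().T) / 2))

G = np.array([[pair(mean_arith, X, Y) for Y in obs] for X in obs])     # covariances
H = G - np.array([[pair(mean_harm, X, Y) for Y in obs] for X in obs])  # 2g(0) I^f

# G - H = qCov^{f~} is positive semidefinite, hence det G >= det H
assert np.linalg.eigvalsh(G - H).min() > -1e-10
assert np.linalg.det(G) >= np.linalg.det(H) - 1e-12
print("det Cov =", np.linalg.det(G), ">= det(2g(0) I^f) =", np.linalg.det(H))
```

The determinant comparison follows from $0 \le H \le G$, exactly the structure exploited through Lemma 2 in the proof.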
This theorem is interpreted as a quantum uncertainty principle [32,35,36,37]. In the earlier works the function g on the left-hand side was $(x+1)/2$ and the proofs were more complicated. The general g appeared in [34].

## Acknowledgements

This paper is dedicated to Imre Csiszár, with whom the author has had many friendly discussions over the last twenty years. The research has been partially supported by the Hungarian Research Grant OTKA TS049835.

## References

1. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Statistics 1951, 22, 79–86. [Google Scholar] [CrossRef]
2. Csiszár, I. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar. Tud. Akad. Mat. Kutató Int. Közl. 1963, 8, 85–108. [Google Scholar]
3. Csiszár, I. A class of measures of informativity of observation channels. Per. Math. Hung. 1972, 2, 191–213. [Google Scholar] [CrossRef]
4. Csiszár, I.; Fischer, J. Informationsentfernungen im Raum der Wahrscheinlichkeitsverteilungen. Magyar Tud. Akad. Mat. Kutató Int. Közl. 1962, 7, 159–180. [Google Scholar]
5. Österreicher, F.; Vajda, I. A new class of metric divergences on probability spaces and its applicability in statistics. Ann. Inst. Statist. Math. 2003, 55, 639–653. [Google Scholar] [CrossRef]
6. Csiszár, I. Information measures: a critical survey. In Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions and the Eighth European Meeting of Statisticians, Prague, Czech Republic, 18 August to 23 August, 1974; Academia: Prague, Czech Republic, 1978; Volume B, pp. 73–86. [Google Scholar]
7. Petz, D. Quantum Information Theory and Quantum Statistics; Springer: Berlin, Germany, 2008. [Google Scholar]
8. Petz, D. Quasi-entropies for states of a von Neumann algebra. Publ. RIMS. Kyoto Univ. 1985, 21, 781–800. [Google Scholar] [CrossRef]
9. Petz, D. Quasi-entropies for finite quantum systems. Rep. Math. Phys. 1986, 23, 57–65. [Google Scholar] [CrossRef]
10. Ohya, M.; Petz, D. Quantum Entropy and Its Use, 2nd ed.; Springer-Verlag: Heidelberg, Germany, 1993. [Google Scholar]
11. Csiszár, I. Information type measure of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318. [Google Scholar]
12. Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inform. Theory 2006, 52, 4394–4412. [Google Scholar] [CrossRef]
13. Hansen, F.; Pedersen, G.K. Jensen’s inequality for operators and Löwner’s theorem. Math. Ann. 1982, 258, 229–241. [Google Scholar] [CrossRef]
14. Lieb, E.H. Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. Math. 1973, 11, 267–288. [Google Scholar] [CrossRef]
15. Hiai, F.; Ohya, M.; Tsukada, M. Sufficiency, KMS condition and relative entropy in von Neumann algebras. Pacific J. Math. 1981, 96, 99–109. [Google Scholar] [CrossRef]
16. Petz, D. Covariance and Fisher information in quantum mechanics. J. Phys. A: Math. Gen. 2003, 35, 79–91. [Google Scholar] [CrossRef]
17. Hansen, F. Characterizations of symmetric monotone metrics on the the state space of quantum systems. Quantum Inf. Comput. 2006, 6, 597–605. [Google Scholar]
18. Helstrom, C.W. Quantum Detection and Estimation Theory; Academic Press: New York, NY, USA, 1976. [Google Scholar]
19. Holevo, A.S. Probabilistic and Statistical Aspects of Quantum Theory; North-Holland: Amsterdam, Holland, 1982. [Google Scholar]
20. Petz, D. Monotone metrics on matrix spaces. Linear Algebra Appl. 1996, 244, 81–96. [Google Scholar] [CrossRef]
21. Petz, D.; Sudár, Cs. Geometries of quantum states. J. Math. Phys. 1996, 37, 2662–2673. [Google Scholar] [CrossRef]
22. Hansen, F. Metric adjusted skew information. Proc. Natl. Acad. Sci. USA 2008, 105, 9909–9916. [Google Scholar] [CrossRef] [PubMed]
23. Kubo, F.; Ando, T. Means of positive linear operators. Math. Ann. 1980, 246, 205–224. [Google Scholar] [CrossRef]
24. Wigner, E.P.; Yanase, M.M. Information content of distributions. Proc. Nat. Acad. Sci. USA 1963, 49, 910–918. [Google Scholar] [CrossRef] [PubMed]
25. Fick, E.; Sauermann, G. The Quantum Statistics of Dynamic Processes; Springer: Berlin, Germany, 1990. [Google Scholar]
26. Amari, S. Differential-Geometrical Methods in Statistics, Lecture Notes Stat. 28; Springer: Berlin, Germany, 1985. [Google Scholar]
27. Amari, S.; Nagaoka, H. Methods of information geometry. Transl. Math. Monographs 2000, 191. [Google Scholar]
28. Braunstein, S.L.; Caves, C.M. Statistical distance and the geometry of quantum states. Phys. Rev. Lett. 1994, 72, 3439–3443. [Google Scholar] [CrossRef] [PubMed]
29. Gallot, S.; Hulin, D.; Lafontaine, J. Riemannian Geometry; Springer: Berlin, Germany, 1993. [Google Scholar]
30. Petz, D. Geometry of canonical correlation on the state space of a quantum system. J. Math. Phys. 1994, 35, 780–795. [Google Scholar] [CrossRef]
31. Andai, A. Information Geometry in Quantum Mechanics. PhD dissertation, BUTE, Budapest, Hungary, 2004. [Google Scholar]
32. Gibilisco, P.; Imparato, D.; Isola, T. Uncertainty principle and quantum Fisher information II. J. Math. Phys. 2007, 48, 072109. [Google Scholar] [CrossRef] [Green Version]
33. Petz, D.; Szabó, V E.S. From quasi-entropy to skew information. Int. J. Math. 2009, 20, 1421–1430. [Google Scholar] [CrossRef]
34. Gibilisco, P.; Hiai, F.; Petz, D. Quantum covariance, quantum Fisher information and the uncertainty principle. IEEE Trans. Inform. Theory 2009, 55, 439–443. [Google Scholar] [CrossRef]
35. Andai, A. Uncertainty principle with quantum Fisher information. J. Math. Phys. 2008, 49, 012106. [Google Scholar] [CrossRef]
36. Gibilisco, P.; Imparato, D.; Isola, T. A volume inequality for quantum Fisher information and the uncertainty principle. J. Statist. 2007, 130, 545–559. [Google Scholar] [CrossRef] [Green Version]
37. Kosaki, H. Matrix trace inequality related to uncertainty principle. Internat. J. Math. 2005, 16, 629–645. [Google Scholar] [CrossRef]
38. Csiszár, I.; Körner, J. Information Theory. Coding Theorems for Discrete Memoryless Systems; Akadémiai Kiadó: Budapest, Hungary, 1981. [Google Scholar]
39. Feller, W. An introduction to Probability Theory and Its Applications, vol. II.; John Wiley & Sons: New York, NY, USA, 1966. [Google Scholar]
40. Hiai, F.; Petz, D. Riemannian geometry on positive definite matrices related to means. Lin. Alg. Appl. 2009, 430, 3105–3130. [Google Scholar] [CrossRef]
41. Kullback, S. Information Theory and Statistics; John Wiley and Sons: New York, NY, USA, 1959. [Google Scholar]

Petz, D. From ƒ-Divergence to Quantum Quasi-Entropies and Their Use. Entropy 2010, 12, 304-325. https://doi.org/10.3390/e12030304