Gentlemen: there’s lots of room left in $L^p$ spaces.
1. Introduction
Chentsov theorem is the fundamental theorem in Information Geometry. After Rao’s remark on the geometric nature of the Fisher Information (FI for short in what follows), it was Chentsov who showed that on the simplex of probability vectors FI is, up to scalars, the unique Riemannian geometry that “contracts under noise” (for recent developments on this topic see [1]). So FI appears as the “natural” Riemannian geometry over the manifold of density vectors, namely over
$$\mathcal{P}_n := \Big\{\rho \in \mathbb{R}^n : \sum_{i=1}^n \rho_i = 1,\ \rho_i > 0\Big\}.$$
Since FI is the pull-back of the Euclidean metric by the map
$$\rho \mapsto 2\sqrt{\rho},$$
it is natural to study the geometries induced on the simplex of probability vectors by the embeddings
$$\rho \mapsto p\,\rho^{1/p}, \qquad p \in [1,+\infty).$$
Setting
$$p = \frac{2}{1-\alpha}, \qquad \alpha \in [-1,1),$$
we call the corresponding geometries on the simplex of probability vectors α-geometries (first studied by Chentsov himself).
When building similar objects in infinite dimension or in the noncommutative case, several interesting questions arise, mostly involving $L^p$ spaces.
The purpose of the present paper is to highlight some of the open problems in this area. The epigraph before the Introduction is a half quote of a sentence by Saunders Mac Lane, which is at the beginning of Chapter II in [2]. It is somewhat surprising that Information Geometry suggests some intriguing questions about the geometry of $L^p$ spaces.
2. The α-Geometries in Finite Dimensions
First of all we may look at α-geometries differently, using divergences. A divergence on an n-dimensional manifold M is a smooth function $D: M \times M \to \mathbb{R}$ which separates points ($D(\rho,\sigma) = 0$ iff $\rho = \sigma$) and such that the matrix
$$g_{ij}(\rho) := -\frac{\partial^2 D(\rho,\sigma)}{\partial \rho_i\,\partial \sigma_j}\bigg|_{\sigma=\rho},$$
defined in a local chart, is strictly positive definite for all $\rho \in M$. So any divergence, by the above formula, has an associated Riemannian geometry. Let our manifold M be $\mathcal{P}_n$, the simplex of strictly positive probability vectors in $\mathbb{R}^n$ defined in Section 1.
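An example of divergence on $\mathcal{P}_n$ is the Kullback–Leibler relative entropy, defined as
$$K(\rho,\sigma) := \sum_{i=1}^n \rho_i \log\frac{\rho_i}{\sigma_i}.$$
The α-divergences are defined as:
$$D_\alpha(\rho,\sigma) := \frac{4}{1-\alpha^2}\Big(1 - \sum_{i=1}^n \rho_i^{\frac{1-\alpha}{2}}\,\sigma_i^{\frac{1+\alpha}{2}}\Big), \qquad \alpha \in (-1,1).$$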
The following result is well known.
Theorem 1. The geometries generated by the pull-back of the α-embeddings and the geometries generated by the α-divergences coincide.
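The passage from a divergence to a Riemannian metric can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes the α-divergence in the normalized form $D_\alpha(\rho,\sigma) = \frac{4}{1-\alpha^2}\big(1 - \sum_i \rho_i^{(1-\alpha)/2}\sigma_i^{(1+\alpha)/2}\big)$, works in the ambient coordinates of $\mathbb{R}^n_+$ rather than in a chart of $\mathcal{P}_n$, and approximates $-\partial^2 D/\partial\rho_i\partial\sigma_j$ at $\sigma = \rho$ by central finite differences. The helper names (`kl`, `alpha_div`, `induced_metric`) are hypothetical. For every α, and for the Kullback–Leibler entropy, it recovers $\mathrm{diag}(1/\rho_i)$, i.e. the Fisher Information.

```python
import math
from functools import partial


def kl(rho, sigma):
    """Kullback-Leibler relative entropy K(rho, sigma) = sum_i rho_i log(rho_i / sigma_i)."""
    return sum(r * math.log(r / s) for r, s in zip(rho, sigma))


def alpha_div(alpha, rho, sigma):
    """Assumed normalized alpha-divergence, alpha in (-1, 1)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha ** 2) * (1 - sum(r ** a * s ** b for r, s in zip(rho, sigma)))


def induced_metric(div, rho, h=1e-4):
    """g_ij(rho) = -d^2 D(rho, sigma) / d rho_i d sigma_j at sigma = rho,
    approximated by a central mixed difference in the ambient coordinates."""
    n = len(rho)

    def d_shifted(i, j, si, sj):
        r, s = list(rho), list(rho)
        r[i] += si * h  # move the first argument along e_i
        s[j] += sj * h  # move the second argument along e_j
        return div(r, s)

    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mixed = (d_shifted(i, j, 1, 1) - d_shifted(i, j, 1, -1)
                     - d_shifted(i, j, -1, 1) + d_shifted(i, j, -1, -1)) / (4 * h * h)
            g[i][j] = -mixed
    return g


if __name__ == "__main__":
    rho = [0.2, 0.3, 0.5]
    for alpha in (-0.5, 0.0, 0.5):
        g = induced_metric(partial(alpha_div, alpha), rho)
        print(alpha, [round(g[i][i], 6) for i in range(3)])  # approximately 1 / rho_i
    g = induced_metric(kl, rho)
    print("KL", [round(g[i][i], 6) for i in range(3)])
```

Restricting this matrix to vectors tangent to $\mathcal{P}_n$ (those with $\sum_i v_i = 0$) gives the Fisher metric $\sum_i v_i^2/\rho_i$.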
Two complete references for the classical contents of Sections 1 and 2 are [3,4], while [5] gives an overview of the new developments in Information Geometry.
3. The Unit Sphere of a (Doubly) Uniformly Convex Banach Space
How can this be transferred to infinite dimensions? Let us restrict to $\alpha \in (-1,1)$ and let $(\Omega,\mathcal{F},\mu)$ be any measure space. Let M be a set of strictly positive probability densities on $(\Omega,\mathcal{F},\mu)$, endowed with a (possibly infinite dimensional) manifold structure. (I remain purposely vague on this point because, in moving from finite to infinite dimension, a number of delicate analytical questions arise about the regularity of the maps involved in these constructions, and a comprehensive approach is certainly very much needed.)
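The function $\rho \mapsto p\,\rho^{1/p}$, $p = 2/(1-\alpha)$, can be seen as a (smooth) function from M to a sphere in the $L^p$ space associated to the above-mentioned measure space: indeed $\|p\,\rho^{1/p}\|_p = p\,\big(\int_\Omega \rho\,d\mu\big)^{1/p} = p$. So what we could pull back on M, say the α-geometry, would be exactly the geometry of the $L^p$ sphere.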
Following Section 2 in [6], let us show that the sphere of an $L^p$ space, which for $p \neq 2$ is not a Riemann–Hilbert manifold, has some “almost Riemannian” features.
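In what follows X is a Banach space and $X^*$ is its dual. We denote by S the unit sphere of X, and if $x \in X$ and $x^* \in X^*$ we write $\langle x^*, x\rangle := x^*(x)$. If $\|x + \lambda y\| \ge \|x\|$ for every $\lambda \in \mathbb{R}$, we write $x \perp y$ and say that x is orthogonal to y.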
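The duality mapping $J: X \to 2^{X^*}$ is defined by
$$J(x) := \{x^* \in X^* : \langle x^*, x\rangle = \|x\|^2 = \|x^*\|^2\}.$$
The space X has the duality map property if J is single valued; in such a case we set $x^* := J(x)$ (by the Hahn–Banach theorem $J(x) \neq \emptyset$ always).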
We also say that X has the projection property if for any closed convex set $C \subseteq X$ and any $x \in X$ there is a unique $c \in C$ such that
$$\|x - c\| = \min_{z \in C}\|x - z\|.$$
In such a case we set $P_C(x) := c$.
Definition 1. A Banach space X is doubly uniformly convex (DUC) if X and its dual are uniformly convex.
Typical examples of DUC spaces are the $L^p$ spaces with $1 < p < +\infty$. In general we have the following properties.
Proposition 1. Let X be a DUC Banach space.
- (i)
X has the projection property.
- (ii)
X has the duality map property.
- (iii)
.
- (iv)
If then
Proposition 2. Let X be a DUC Banach space.
- (i)
The unit sphere S is a Banach submanifold of X.
- (ii)
$T_xS$, the tangent space to S at x, can be identified with $\ker J(x)$.
- (iii)
The projection operator $P_x: X \to T_xS$ is given by $P_x(y) = y - \langle J(x), y\rangle\,x$, for $x \in S$.
Using this projection, the trivial connection on X induces a connection on the unit sphere S that we call the natural connection on S.
When the Banach space X is a Hilbert space, the above construction gives nothing other than the Levi–Civita connection on the unit sphere of X considered as a Riemann–Hilbert submanifold of X. In this sense the unit sphere of a DUC Banach space inherits a kind of manageable “Levi–Civita” connection from the trivial geometry of the ambient space.
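A finite-dimensional illustration may help. The space $\mathbb{R}^n$ with the norm $\|\cdot\|_p$, $1 < p < +\infty$, is DUC, and its duality map is the standard $J(x) = \|x\|_p^{2-p}\,\mathrm{sign}(x)\,|x|^{p-1}$ (componentwise). The Python sketch below, with hypothetical helper names, uses the projection $P_x(y) = y - \langle J(x), y\rangle\,x$ for a unit vector x (an assumption about the form of the projection, consistent with identifying $T_xS$ with $\ker J(x)$) and checks that it maps into $\ker J(x)$ and that no other point of $\ker J(x)$ is closer to y.

```python
import random


def norm_p(x, p):
    """The l^p norm of a finite vector."""
    return sum(abs(t) ** p for t in x) ** (1 / p)


def duality_map(x, p):
    """J(x) = ||x||_p^(2-p) sign(x) |x|^(p-1), an element of the dual l^q, 1/p + 1/q = 1.
    It satisfies <J(x), x> = ||x||_p^2 = ||J(x)||_q^2."""
    nx = norm_p(x, p)
    return [nx ** (2 - p) * abs(t) ** (p - 1) * (1 if t >= 0 else -1) for t in x]


def pairing(x_star, y):
    """<x*, y> = x*(y) for finite vectors."""
    return sum(a * b for a, b in zip(x_star, y))


def tangent_projection(x, y, p):
    """P_x(y) = y - <J(x), y> x for ||x||_p = 1; the result lies in ker J(x)."""
    c = pairing(duality_map(x, p), y)
    return [b - c * a for a, b in zip(x, y)]


if __name__ == "__main__":
    random.seed(0)
    p = 3.0
    x0 = [0.6, -0.5, 0.4]
    x = [t / norm_p(x0, p) for t in x0]      # a point of the unit sphere S
    y = [1.0, 2.0, -0.7]
    py = tangent_projection(x, y, p)
    print(pairing(duality_map(x, p), py))     # close to 0: P_x(y) is tangent at x
    dist = norm_p([a - b for a, b in zip(y, py)], p)
    # no other point of ker J(x) is closer to y than P_x(y)
    for _ in range(1000):
        z = tangent_projection(x, [random.uniform(-3, 3) for _ in y], p)
        assert norm_p([a - b for a, b in zip(y, z)], p) >= dist - 1e-12
```

For p = 2 the duality map is the identity and $P_x$ is the orthogonal projection onto the tangent space of the sphere, which is why the construction reproduces the Levi–Civita connection in the Hilbert case.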
The above results were proved in [6,7], where they were used to give the first rigorous treatment of α-geometries in infinite dimension. In particular, the classical basic formula relating the α-connections to the exponential and mixture connections has been proved:
$$\nabla^{(\alpha)} = \frac{1+\alpha}{2}\,\nabla^{(e)} + \frac{1-\alpha}{2}\,\nabla^{(m)}.$$
4. Embedding Densities in the Unit Sphere of an Orlicz Space
Beyond the results of Section 3 there is a very simple idea: if we consider a density ρ on an arbitrary measure space $(\Omega,\mathcal{F},\mu)$ and Φ is a function on the positive axis which admits an inverse function $\Phi^{-1}$, then
$$\int_\Omega \Phi\big(\Phi^{-1}(\rho)\big)\,d\mu = \int_\Omega \rho\,d\mu = 1,$$
so that the function $\rho \mapsto \Phi^{-1}(\rho)$ should embed the densities into the unit sphere of “something”. This very simple (and vague) idea can be made precise by the notion of Orlicz space, which we briefly recall (this was done in [7]).
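A Young function is a symmetric convex function $\Phi: \mathbb{R} \to [0,+\infty)$ such that $\Phi(0) = 0$ and $\lim_{x \to +\infty}\Phi(x) = +\infty$. Let $(\Omega,\mathcal{F},\mu)$ be a measure space and u a measurable function. The Luxemburg norm is given by
$$\|u\|_\Phi := \inf\Big\{k > 0 : \int_\Omega \Phi\Big(\frac{u}{k}\Big)\,d\mu \le 1\Big\}.$$
The Orlicz space generated by the Young function Φ is
$$L^\Phi(\mu) := \{u \text{ measurable} : \|u\|_\Phi < +\infty\}.$$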
If $\Phi(x) = |x|^p/p^p$, $p \ge 1$, we get the $L^p$ space endowed with the equivalent norm $\|u\|_\Phi = \|u\|_p/p$.
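Let us consider now the cases where the Young function Φ is invertible when restricted to the positive axis. If ρ is a density, we call
$$\rho \mapsto \Phi^{-1}(\rho)$$
the Φ-embedding. Trivially we have
$$\int_\Omega \Phi\big(\Phi^{-1}(\rho)\big)\,d\mu = \int_\Omega \rho\,d\mu = 1,$$
which implies that $\|\Phi^{-1}(\rho)\|_\Phi \le 1$. Indeed one can prove (see [7]) that
$$\|\Phi^{-1}(\rho)\|_\Phi = 1,$$
so we may embed any density into the unit sphere of any Orlicz space associated to an invertible Young function.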
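On a finite measure space (point masses $w_i$) the Luxemburg norm can be computed by bisection, because $k \mapsto \int_\Omega \Phi(u/k)\,d\mu$ is non-increasing. The Python sketch below (hypothetical names, not code from [7]) uses it to check $\|\Phi^{-1}(\rho)\|_\Phi = 1$ for two invertible Young functions chosen as examples: $\Phi(x) = |x|^p/p^p$, whose Φ-embedding is $\rho \mapsto p\,\rho^{1/p}$, and $\Phi(x) = \cosh x - 1$, whose inverse on the positive axis is $y \mapsto \operatorname{arccosh}(1+y)$.

```python
import math


def luxemburg_norm(u, weights, phi, tol=1e-12):
    """Luxemburg norm inf{k > 0 : sum_i phi(u_i / k) w_i <= 1} of a function u on a
    finite measure space with point masses w_i, computed by bisection.
    Assumes phi is a Young function and u is not identically zero."""

    def modular(k):
        total = 0.0
        for ui, wi in zip(u, weights):
            try:
                total += phi(ui / k) * wi
            except OverflowError:  # phi(u/k) too large to represent: clearly > 1
                return math.inf
        return total

    hi = 1.0
    while modular(hi) > 1:      # grow hi until it is admissible
        hi *= 2
    lo = hi / 2
    while modular(lo) <= 1:     # shrink lo until it is not admissible
        lo /= 2
    while hi - lo > tol * hi:   # bisection on the monotone modular
        mid = (lo + hi) / 2
        if modular(mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    w = [0.5, 1.0, 2.0, 0.25]          # point masses of the measure mu
    rho = [0.4, 0.3, 0.2, 0.4]         # a density: sum_i rho_i w_i = 1
    p = 3.0

    def phi_p(t):                      # Young function |x|^p / p^p
        return abs(t) ** p / p ** p

    def phi_cosh(t):                   # Young function cosh(x) - 1
        return math.cosh(t) - 1

    print(luxemburg_norm([p * r ** (1 / p) for r in rho], w, phi_p))       # about 1.0
    print(luxemburg_norm([math.acosh(1 + r) for r in rho], w, phi_cosh))   # about 1.0
```

For $\Phi(x) = |x|^p/p^p$ the same routine returns $\|u\|_p/p$, so in that case the unit sphere of $L^\Phi$ is the sphere of radius p of $L^p$ met in Section 3.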
5. Curvature and Scalar Curvature
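Let $I \subseteq \mathbb{R}$ be an interval and $\gamma = (x, y): I \to \mathbb{R}^2$ a sufficiently regular curve. The curvature at the point $\gamma(t)$ is defined as
$$\kappa(t) := \frac{|x'(t)\,y''(t) - x''(t)\,y'(t)|}{\big(x'(t)^2 + y'(t)^2\big)^{3/2}}.$$
Curvature coincides with $1/R$, where R is the radius of the osculating circle, namely the circle that gives the best approximation of the curve at a given point.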
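A direct numerical version of this definition may help fix ideas. In the Python sketch below (the function name is hypothetical) the derivatives are replaced by central finite differences with step h, an assumption that trades exactness for generality; on a circle of radius R it returns approximately $1/R$.

```python
import math


def curvature(gamma, t, h=1e-4):
    """Curvature |x'y'' - x''y'| / (x'^2 + y'^2)^(3/2) of a plane curve
    gamma(t) = (x(t), y(t)), with derivatives from central finite differences."""
    (x0, y0), (x1, y1), (x2, y2) = gamma(t - h), gamma(t), gamma(t + h)
    dx, dy = (x2 - x0) / (2 * h), (y2 - y0) / (2 * h)
    ddx, ddy = (x2 - 2 * x1 + x0) / h ** 2, (y2 - 2 * y1 + y0) / h ** 2
    return abs(dx * ddy - ddx * dy) / (dx * dx + dy * dy) ** 1.5


if __name__ == "__main__":
    R = 2.0

    def circle(t):
        return (R * math.cos(t), R * math.sin(t))

    print(curvature(circle, 0.7))   # approximately 1 / R = 0.5
```

The same routine applied to $t \mapsto (2\sqrt{t}, 2\sqrt{1-t})$, the image of $\mathcal{P}_2$ under $\rho \mapsto 2\sqrt{\rho}$, also returns about 0.5, since that image is an arc of the circle of radius 2.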
For a general Riemannian manifold one can introduce the notion of scalar curvature along the following lines. In general, if ∇ is an affine linear connection on a manifold M, the curvature is defined as (see p. 133 in [8])
$$R(X,Y)Z := \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z - \nabla_{[X,Y]}Z,$$
where X, Y, Z are vector fields. Now consider the case where (M, g) is a Riemannian manifold and ∇ the associated Levi–Civita connection. The Riemannian curvature tensor is defined as (p. 201 in [8])
$$R(X,Y,Z,W) := g\big(R(Z,W)Y, X\big).$$
Fix now a point $x \in M$ and let $\sigma \subseteq T_xM$ be a 2-dimensional subspace. Using the exponential map we may associate to σ a 2-dimensional embedded surface $N \subseteq M$ formed by the geodesic segments of length less than r which start at x tangentially to σ. Let $K(\sigma)$ denote the Gaussian curvature of N at x. At pages 99–100 in [9] we have the following result.
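Proposition 3. If $\{X, Y\}$ is a basis for the plane σ, then
$$K(\sigma) = \frac{R(X,Y,X,Y)}{g(X,X)\,g(Y,Y) - g(X,Y)^2}.$$
In particular, if $\{e_1,\dots,e_n\}$ is an orthonormal basis of $T_xM$, we have, for $\sigma_{ij} := \mathrm{span}\{e_i, e_j\}$ with $i \neq j$,
$$K(\sigma_{ij}) = R(e_i, e_j, e_i, e_j).$$
The scalar curvature at x is defined as
$$\mathrm{Scal}(x) := \sum_{i \neq j} K(\sigma_{ij}).$$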
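From the very definition it is straightforward to deduce that the scalar curvature of an n-dimensional sphere of radius R is constantly equal to
$$\mathrm{Scal} \equiv \frac{n(n-1)}{R^2},$$
since every sectional curvature of such a sphere equals $1/R^2$ and there are $n(n-1)$ ordered pairs $(i, j)$ with $i \neq j$.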
6. Problem 1. Does the Scalar Curvature Behave Like Entropy?
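We are ready to discuss the first problem of our list. Let us recall that the α-geometry on $\mathcal{P}_n$ is the pull-back geometry induced by the α-embeddings
$$\rho \mapsto p\,\rho^{1/p}, \qquad p = \frac{2}{1-\alpha}.$$
Let $\kappa_\alpha(\rho)$ be the curvature of the α-geometry (where $n = 2$) at the density vector ρ. One immediately realizes that:
if $\alpha = -1$ then $\kappa_\alpha \equiv 0$;
if $\alpha = 0$ then $\kappa_\alpha \equiv 1/2$.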
A straightforward calculation in [10] proves the following result.
Theorem 2. If $\alpha \in (-1, 0)$ then the curvature $\kappa_\alpha$ is a strictly Schur-convex function;
If $\alpha \in (0, 1)$ then the curvature $\kappa_\alpha$ is a strictly Schur-concave function.
This theorem is “visually” trivial: if you draw the unit spheres of $\mathbb{R}^2$ endowed with the norms $\|\cdot\|_p$, you will be convinced of the truth of the statement without any calculation.
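The Schur behaviour in Theorem 2 can also be checked numerically. For $\rho = (t, 1-t)$ the α-embedding traces the curve $t \mapsto (p\,t^{1/p}, p\,(1-t)^{1/p})$, $p = 2/(1-\alpha)$, and a short computation with the curvature formula of Section 5 gives, with $a = 1/p$,
$$\kappa_\alpha(t) = \frac{(1-a)\,t^{a-2}(1-t)^{a-2}}{\big(t^{2a-2} + (1-t)^{2a-2}\big)^{3/2}}.$$
Here the curvature is taken in the Euclidean plane containing the image curve (our reading of the picture mentioned above), and the function names below are hypothetical. Since $\kappa_\alpha$ is symmetric under $t \leftrightarrow 1-t$, strict Schur-convexity on $\mathcal{P}_2$ amounts to $\kappa_\alpha$ being strictly decreasing on $(0, 1/2]$, and strict Schur-concavity to its being strictly increasing there.

```python
def alpha_to_p(alpha):
    """p = 2 / (1 - alpha), the exponent of the alpha-embedding rho -> p rho^(1/p)."""
    return 2.0 / (1.0 - alpha)


def kappa(t, p):
    """Euclidean curvature at rho = (t, 1 - t) of t -> (p t^(1/p), p (1 - t)^(1/p))."""
    a = 1.0 / p
    s = 1.0 - t
    num = (1.0 - a) * t ** (a - 2) * s ** (a - 2)
    den = (t ** (2 * a - 2) + s ** (2 * a - 2)) ** 1.5
    return num / den


def is_strictly_monotone(values, increasing):
    """True if the sequence is strictly increasing (or strictly decreasing)."""
    pairs = list(zip(values, values[1:]))
    return all(u < v for u, v in pairs) if increasing else all(u > v for u, v in pairs)


if __name__ == "__main__":
    ts = [k / 20 for k in range(1, 11)]          # t = 0.05, 0.10, ..., 0.50
    for alpha in (-0.6, -0.3, 0.0, 0.3, 0.6):
        vals = [kappa(t, alpha_to_p(alpha)) for t in ts]
        # first value: near a vertex of the simplex; last value: at the uniform vector
        print(alpha, round(vals[0], 4), round(vals[-1], 4))
```

The check is only a sampling on a grid, not a proof; it is consistent with the picture of the $\|\cdot\|_p$ unit circles, which are sharper near the axes for $p < 2$ and near the diagonals for $p > 2$.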
It is natural to try to understand what happens in dimension n, namely, let us consider the α-embedding on $\mathcal{P}_n$ and let $\mathrm{Scal}_\alpha$ be the associated scalar curvature. Also in this case one has some trivial cases:
if $\alpha = -1$ then $\mathrm{Scal}_\alpha \equiv 0$;
if $\alpha = 0$ then $\mathrm{Scal}_\alpha \equiv \frac{(n-1)(n-2)}{4}$.
Indeed for $\alpha = -1$ we have a hyperplane and for $\alpha = 0$ we have the geometry of an $(n-1)$-dimensional sphere whose radius is 2. The following natural conjecture remains open.
Conjecture 1. If $\alpha \in (-1, 0)$ then the scalar curvature $\mathrm{Scal}_\alpha$ is a strictly Schur-convex function;
If $\alpha \in (0, 1)$ then the scalar curvature $\mathrm{Scal}_\alpha$ is a strictly Schur-concave function.
Some steps toward a proof can be found in [11].
7. Petz Theorem
The Chentsov theorem has a noncommutative, “quantum” counterpart, the Petz classification theorem [12,13]. In the quantum case the “noise” is represented by completely positive, trace preserving maps. Let $M_n$ be the space of complex $n \times n$ matrices, $M_{n,sa}$ the real subspace of Hermitian matrices and $\mathcal{D}_n$ the submanifold of (faithful) density matrices, namely
$$\mathcal{D}_n := \{\rho \in M_{n,sa} : \mathrm{Tr}\,\rho = 1,\ \rho > 0\}.$$
On the (real) manifold $\mathcal{D}_n$ we lose the uniqueness of the Chentsov theorem: indeed on $\mathcal{D}_n$ there are many Riemannian metrics “contracting under noise”. However, Petz was able to characterize all the metrics with this property; these metrics deserve to be called Quantum Fisher Information(s).
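Theorem 3. There exists a bijective correspondence between Quantum Fisher Information(s) and Kubo–Ando noncommutative means given by the formula
$$\langle A, B\rangle_{\rho,f} := \mathrm{Tr}\Big(A\,\big(R_\rho\, f(L_\rho R_\rho^{-1})\big)^{-1}(B)\Big),$$
where $L_\rho(A) := \rho A$, $R_\rho(A) := A\rho$ and f is the operator monotone function associated to the corresponding mean. Obviously $\big(R_\rho\, f(L_\rho R_\rho^{-1})\big)^{-1}$ is a kind of generalized “division by ρ”.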
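To see why this superoperator acts as a “division by ρ”, it helps to work in an eigenbasis of ρ. If $\rho = \mathrm{diag}(\lambda_1,\dots,\lambda_n)$, then $L_\rho R_\rho^{-1}$ multiplies the matrix unit $E_{ij}$ by $\lambda_i/\lambda_j$ and $R_\rho$ multiplies it by $\lambda_j$, so that $\langle A,B\rangle_{\rho,f} = \sum_{i,j} A_{ji}B_{ij}/\big(\lambda_j f(\lambda_i/\lambda_j)\big)$. The Python sketch below (hypothetical names; matrices as nested lists, already written in that eigenbasis) evaluates this expression. As an example it uses $f(x) = (1+x)/2$, the function of the arithmetic mean; for diagonal (commuting) tangent vectors and any f with $f(1) = 1$ it returns the classical Fisher information $\sum_i a_i b_i/\lambda_i$.

```python
def qfi(eigs, A, B, f):
    """<A, B>_{rho, f} = Tr(A (R_rho f(L_rho R_rho^{-1}))^{-1}(B)) for rho = diag(eigs).
    A, B are Hermitian matrices (nested lists) written in the eigenbasis of rho.
    The superoperator multiplies the matrix unit E_ij by eigs[j] * f(eigs[i] / eigs[j])."""
    n = len(eigs)
    total = 0j
    for i in range(n):
        for j in range(n):
            scale = eigs[j] * f(eigs[i] / eigs[j])   # "division by rho" on entry (i, j)
            total += A[j][i] * B[i][j] / scale        # Tr(A X) = sum_{i,j} A_ji X_ij
    return total.real


def f_arithmetic(x):
    """Operator monotone function of the arithmetic mean: f(x) = (1 + x) / 2."""
    return (1 + x) / 2


if __name__ == "__main__":
    eigs = [0.2, 0.3, 0.5]
    a = [0.1, -0.05, -0.05]                     # trace-zero diagonal tangent vectors
    b = [0.02, 0.03, -0.05]
    A = [[a[i] if i == j else 0 for j in range(3)] for i in range(3)]
    B = [[b[i] if i == j else 0 for j in range(3)] for i in range(3)]
    print(qfi(eigs, A, B, f_arithmetic))        # quantum formula
    print(sum(x * y / l for x, y, l in zip(a, b, eigs)))   # classical Fisher information
```

Different choices of f change only the weights on the off-diagonal entries, which is where the quantum Fisher informations differ from one another.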
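So we have a big family of Riemannian metrics on $\mathcal{D}_n$, which play the role of Fisher information in the quantum setting. Among them we are interested in those associated to the following operator monotone functions:
$$f_{\mathrm{WYD}(p)}(x) := \frac{p(1-p)(x-1)^2}{(x^p - 1)(x^{1-p} - 1)}, \quad p \in (0,1), \qquad f_{\mathrm{BKM}}(x) := \frac{x-1}{\log x}.$$
We have that
$$f_{\mathrm{WYD}(1/2)}(x) = \Big(\frac{1+\sqrt{x}}{2}\Big)^2$$
and
$$\lim_{p \to 0} f_{\mathrm{WYD}(p)}(x) = f_{\mathrm{BKM}}(x).$$
The quantum Fisher information associated to $f_{\mathrm{WYD}(p)}$ is the WYD(p)-metric, while the one associated to $f_{\mathrm{BKM}}$ is the BKM-metric.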
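The two limits above can be checked numerically. The Python sketch below (hypothetical names) assumes the WYD(p) function in the form $f_{\mathrm{WYD}(p)}(x) = p(1-p)(x-1)^2/\big((x^p-1)(x^{1-p}-1)\big)$; it uses `math.expm1` to compute $x^p - 1$ accurately for small p, sets the value at $x = 1$ to its limit 1, and compares it with the Wigner–Yanase function $((1+\sqrt{x})/2)^2$ at $p = 1/2$ and with the BKM function $(x-1)/\log x$ as $p \to 0$.

```python
import math


def f_wyd(p, x):
    """f_WYD(p)(x) = p(1-p)(x-1)^2 / ((x^p - 1)(x^(1-p) - 1)), 0 < p < 1, x > 0."""
    if x == 1.0:
        return 1.0                                   # limit value, f(1) = 1
    lx = math.log(x)
    # expm1(p * log x) = x^p - 1 without cancellation when p is small
    return p * (1 - p) * (x - 1) ** 2 / (math.expm1(p * lx) * math.expm1((1 - p) * lx))


def f_bkm(x):
    """f_BKM(x) = (x - 1) / log x, with f(1) = 1."""
    return 1.0 if x == 1.0 else (x - 1) / math.log(x)


def f_wy(x):
    """Wigner-Yanase function ((1 + sqrt x) / 2)^2."""
    return ((1 + math.sqrt(x)) / 2) ** 2


if __name__ == "__main__":
    for x in (0.25, 2.0, 9.0):
        print(x, f_wyd(0.5, x), f_wy(x), f_wyd(1e-8, x), f_bkm(x))
```

All three functions also satisfy the symmetry $f(x) = x f(1/x)$ and the normalization $f(1) = 1$.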
8. Problem 2. Geometrization of WYD-Information in Infinite Dimensions?
The WYD(p) metrics are rather special among the quantum Fisher information(s): as proved in [14], they are the only ones that come from the pull-back of a dualized pairing. As specified in [15], one can look at this procedure as if we had a quantum dynamics associated to a Schrödinger equation, embedded by means of two conjugate α-embeddings. The final result of this procedure is exactly the WYD(p) metric. In particular, for $p = 1/2$ one sees that the Wigner–Yanase information has a geometric origin: it arises from the pull-back of the map
$$\rho \mapsto 2\sqrt{\rho},$$
as does the classical Fisher information [16].
Since WYD information appears in infinite dimensions [17] (von Neumann algebra setting), it is natural to ask whether in that case, too, one can trace a geometric origin for that object. The ingredients of the previous approach are quantum dynamics, $L^p$ spaces and α-embeddings: all these objects make sense also in the von Neumann algebra setting; therefore, there is no clear obstacle in this direction.
9. Problem 3. Petz Conjecture for the BKM Scalar Curvature: A Solution by Geometry?
It has been suggested that the scalar curvature of Fisher Information could have a relevant physical meaning in statistical mechanics, being linked to the free energy. Perhaps stimulated by this, Petz began the study of the scalar curvature in the quantum setting, with special emphasis on the BKM metric. He formulated the following conjecture in [18].
Conjecture 2. The scalar curvature of the BKM metric is a Schur-concave function.
The truth of the Petz conjecture would be a consequence of the following conjecture.
Conjecture 3. There exists $p_0 > 0$ such that for $p \in (0, p_0)$ the scalar curvature of the WYD(p) metric is a Schur-concave function.
Indeed this second conjecture looks much easier to understand than the Petz conjecture. Consider the noncommutative α-geometry on $\mathcal{D}_n$, namely the pull-back geometry induced by the α-embeddings
$$\rho \mapsto p\,\rho^{1/p}, \qquad p = \frac{2}{1-\alpha},$$
exactly as in the commutative case.
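Let $\mathrm{Scal}_\alpha(\rho)$ be the scalar curvature of the α-geometry at the density matrix ρ. One immediately realizes that:
if $\alpha = -1$ then $\mathrm{Scal}_\alpha \equiv 0$;
if $\alpha = 0$ then $\mathrm{Scal}_\alpha \equiv \frac{(n^2-1)(n^2-2)}{4}$.
Indeed for $\alpha = -1$ we have a hyperplane and for $\alpha = 0$ we have the geometry of a real $(n^2-1)$-dimensional sphere whose radius is 2. Imitating the commutative case we formulate another conjecture.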
Conjecture 4. If $\alpha \in (-1, 0)$ then the scalar curvature $\mathrm{Scal}_\alpha$ is a strictly Schur-convex function;
If $\alpha \in (0, 1)$ then the scalar curvature $\mathrm{Scal}_\alpha$ is a strictly Schur-concave function.
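The two trivial cases listed before Conjecture 4 rest on a short computation, sketched below under the assumption (as in the commutative case) that the α-geometry is induced by the Hilbert–Schmidt inner product $\langle X, Y\rangle = \mathrm{Tr}(XY)$, which turns $M_{n,sa}$ into a Euclidean space $\mathbb{R}^{n^2}$:
$$\dim_{\mathbb{R}} M_{n,sa} = \underbrace{n}_{\text{real diagonal}} + \underbrace{2\cdot\tfrac{n(n-1)}{2}}_{\text{complex entries above the diagonal}} = n^2, \qquad \dim_{\mathbb{R}}\mathcal{D}_n = n^2 - 1,$$
$$\alpha = -1\ (p = 1):\quad \rho \mapsto \rho \ \text{ has image in the affine hyperplane } \{\mathrm{Tr}\,\rho = 1\}, \ \text{ so } \mathrm{Scal}_{-1} \equiv 0,$$
$$\alpha = 0\ (p = 2):\quad \|2\sqrt{\rho}\|_{HS}^2 = \mathrm{Tr}\big((2\sqrt{\rho})^2\big) = 4\,\mathrm{Tr}\,\rho = 4, \ \text{ so } \mathrm{Scal}_0 \equiv \frac{(n^2-1)(n^2-2)}{2^2}.$$
The last step is the sphere formula $n(n-1)/R^2$ of Section 5, applied with dimension $n^2 - 1$ and $R = 2$ to the open subset of the sphere covered by the image; replacing $n^2$ by n gives the commutative values $0$ and $(n-1)(n-2)/4$ of Section 6.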
Therefore Conjecture 3 appears rather reasonable: the WYD(p) metric comes from a pair of α-embeddings in duality, namely from the pair $(\alpha, -\alpha)$ where $\alpha = 1 - 2p$. On the other hand, the BKM metric appears in the limit $p \to 0$, that is, $\alpha \to 1$. For $-\alpha \to -1$ we have a flat geometry, whose scalar curvature is zero, and for $\alpha \to 1$ we see a Schur-concave scalar curvature whose contribution could imply that the BKM scalar curvature has a similar behavior, thereby proving the Petz conjecture.
10. Problem 4. The Exponential Manifold by Orlicz Embedding?
Using the Orlicz spaces (in particular the Zygmund ones), in [19] a Banach manifold structure, called the exponential statistical manifold, has been defined on the space of strictly positive density functions on an arbitrary measure space.
Because of the existence of the Φ-embeddings of Section 4, it is natural to ask: can the exponential statistical manifold structure be derived (like the α-geometries) from the pull-back of an Orlicz embedding?
11. The α-Proudman–Johnson Equations and the α-Connections: The Lenells–Misiolek Result. Problems 5, 6, 7
In Problem 1981–29 of Arnold’s Problems the author asks to find equations of mathematical physics that can be realized as geodesic flows on infinite-dimensional ellipsoids (see page 354 in [20]). This question is natural in the light of the geometric approach to hydrodynamics due to Arnold himself [21]. In recent years this point of view has led to many similar results; a good reference for this is the Introduction of [20]. Also in recent years there has been a lot of interest in the study of the α-Proudman–Johnson equations; see [22,23,24] for more details. A surprising link between α-geometries and the α-Proudman–Johnson equations has been found by Lenells and Misiolek in [25]. A very rough description is the following.
Let $S^1$ be the circle, $\mathrm{Diff}(S^1)$ the group of smooth diffeomorphisms of $S^1$ and Rot (isomorphic to $S^1$) the subgroup of rigid rotations. Using the proper analog of the α-divergences, the authors build the α-geometries and the associated α-connections $\nabla^{(\alpha)}$ on $\mathrm{Diff}(S^1)/\mathrm{Rot}$.
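Lenells and Misiolek prove in [25] the following result.
Theorem 4. The geodesic equation of $\nabla^{(\alpha)}$ on $\mathrm{Diff}(S^1)/\mathrm{Rot}$ is the α-Proudman–Johnson equation
$$u_{txx} + (2-\alpha)\,u_x u_{xx} + u\,u_{xxx} = 0.$$
In particular, $\alpha = 0$ yields the completely integrable Hunter–Saxton equation
$$u_{txx} + 2\,u_x u_{xx} + u\,u_{xxx} = 0,$$
and $\alpha = -1$ yields the completely integrable μ-Burgers equation
$$u_{txx} + 3\,u_x u_{xx} + u\,u_{xxx} = 0.$$
Problem 5. Can the Arnold problem be solved for the α-Proudman–Johnson equations?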
Lenells and Misiolek look at the α-connections through α-divergences. Suppose that also in the diffeomorphism group context the α-embeddings produce the same result as the α-divergences, as happens in the finite dimensional case (Theorem 1). In such a case the geodesic equation could be the one describing great circles on spheres of the $L^p$ space, thereby solving the Arnold problem for the α-Proudman–Johnson equations.
Problem 6. Complete integrability for the α-Proudman–Johnson equations?
If the answer to the previous question is positive, does this help in understanding when one has complete integrability for the α-Proudman–Johnson equations?
Problem 7. An Orlicz generalization of the α-Proudman–Johnson equations?
If the answer to Problem 5 is positive and we can look at the α-Proudman–Johnson equations as a by-product of embedding densities in the $L^p$ spheres, it is natural to ask whether, using the Orlicz embedding, we can get a family of differential equations for which the α-Proudman–Johnson equations are just the particular example associated to $L^p$ spaces.