Article

Lagrangian Submanifolds of Symplectic Structures Induced by Divergence Functions

Dipartimento di Matematica Tullio Levi-Civita, Università degli Studi di Padova, 35121 Padova, Italy
Entropy 2020, 22(9), 983; https://doi.org/10.3390/e22090983
Submission received: 29 July 2020 / Revised: 26 August 2020 / Accepted: 2 September 2020 / Published: 3 September 2020
(This article belongs to the Special Issue Information Geometry III)

Abstract

Divergence functions play a relevant role in Information Geometry, as they allow for the introduction of a Riemannian metric and a dual connection structure on a finite-dimensional manifold of probability distributions. They also make it possible to define, in a canonical way, a symplectic structure on the square of this manifold, a property that received little attention in the literature until recent contributions. In this paper, we hint at a possible application: we study Lagrangian submanifolds of this symplectic structure and show that they are useful for describing the manifold of solutions of the Maximum Entropy principle.

1. Introduction

Information Geometry [1,2] provides a sound and fruitful framework for interpreting statistics using notions of classical differential geometry [3]. A principal object in Information Geometry is the contrast or divergence function, which (informally speaking) measures the degree of separation between two probability distributions [4,5,6]. The main strength of divergence functions is that they make it possible to define a Riemannian structure on a finite-dimensional submanifold $M$ of probability distributions endowed with a dual coordinate system, with far-reaching implications. A less-studied spin-off of contrast functions is the possibility of introducing a symplectic structure on the square of $M$ by pulling back the canonical symplectic structure defined on the cotangent bundle $T^*M$. This procedure was introduced in 1995 in the pioneering paper [7], suggesting that symplectic geometry may have a natural role to play in statistics. Recently there has been renewed interest in possible applications of the symplectic structures introduced in [7], for example to studying the analogies with discrete Lagrangian mechanics (see [8]) or the relations with completely integrable systems of Hamiltonian mechanics (see [9,10]).
In this paper, we look at a possible role for Lagrangian submanifolds of the above symplectic structure on $M^2$ in the case where $M$ is an exponential family $M(h,k)$. Exponential families are prototypical examples of finite-dimensional manifolds admitting a dually flat canonical structure defined by the canonical divergence, and they play a relevant role in information geometry and statistics [1,2]. For our argument, their importance is due to the fact that they represent the manifold of solutions of the variational problem associated with the Maximum Entropy Principle (MEP) with linear constraints [11,12]. In some applications to statistical mechanics, e.g., in the description of phase transitions in Ising spin systems, MEP with nonlinear constraints is considered; see, e.g., [13,14,15]. In this case, the set of possible solutions has a richer structure, which is well captured by a Lagrangian submanifold of $T^*M(h,k)$. In this work, we are concerned with the Lagrangian submanifolds defined in the square of $M(h,k)$ via the canonical pull-back hinted at above.
The structure of the paper is as follows. In Section 2, we recall the needed tools of symplectic geometry, and in Section 2.1 we review the canonical pull-back construction via divergence functions presented in [7]. In Section 3, we consider the special case of exponential families associated with MEP with nonlinear constraints.

2. Synopsis of Symplectic Geometry

We briefly recall the basic facts of symplectic geometry that are necessary for introducing our argument, referring to classical textbooks for the proofs. A symplectic manifold $(M,\omega)$ is a smooth even-dimensional manifold $M$ equipped with a non-degenerate, closed two-form $\omega$ ($d\omega = 0$, where $d$ is the exterior derivative). A submanifold $L$ of $M$ is a Lagrangian submanifold if $2\dim L = \dim M$ and the two-form restricted to $L$ vanishes, $\omega|_L = 0$. A prototypical example of a symplectic manifold is the cotangent bundle $T^*S$ of a manifold $S$. If $x = (x^1,\dots,x^n)$ are local coordinates on $S$, and $(x,\lambda)$ are local coordinates on $T^*S$, then the Liouville one-form $\theta_c$ on $T^*S$ has the local expression $\theta_c = \lambda_i\,dx^i$ (summation over repeated indices is understood) and the symplectic two-form is
$$\omega = d\theta_c = d\lambda_i \wedge dx^i. \tag{1}$$
A classical theorem of Darboux says that every symplectic manifold $(M,\omega)$ admits an atlas of local coordinates $(x,\lambda)$ such that locally $\omega$ has the representation (1). A relevant example of a Lagrangian submanifold of $T^*S$ is the graph of the differential of a function $g : S \to \mathbb{R}$, that is,
$$L_g = \{ (x,\lambda(x)) \in T^*S : \lambda(x) = dg(x),\ x \in S \}.$$
Note that $L_g$ is an $n$-dimensional submanifold which is transversal to the fibers of the fibration $\pi : T^*S \to S$, that is, its tangent bundle $TL_g$ is transversal to the vertical bundle $\ker T\pi$.
According to a theorem of Maslov–Hörmander ([16,17]), a general (i.e., not necessarily transversal) Lagrangian submanifold of $T^*S$ can be locally described by means of a smooth function $G$ depending on extra parameters. Let us briefly sketch this construction along the lines of the works in [18,19].
Let $U$ be a $k$-dimensional manifold, called the supplementary manifold, and let $G : S \times U \to \mathbb{R}$ be a smooth function whose representation in a local chart is $G(x,u)$. We define the critical set of $G$ as (we use the notation $(G_x)_i = \partial G/\partial x^i$ and $(G_{xy})_{ij} = \partial^2 G/\partial x^i \partial y^j$ for partial derivatives)
$$E = \{ (x,u) : G_u(x,u) = 0 \}. \tag{2}$$
If $dG_u$ has maximal rank over $E$, that is,
$$\operatorname{rk}\, dG_u = \operatorname{rk} \begin{pmatrix} G_{xu} & G_{uu} \end{pmatrix} = k \quad \text{for all } (x,u) \in E, \tag{3}$$
then $G$ is called a Morse family and the following $\Lambda_G$ is a Lagrangian submanifold of $T^*S$,
$$\Lambda_G = \{ (x, G_x(x,u)) \in T^*S \ \text{where}\ (x,u) \in E \}. \tag{4}$$
If there are no extra parameters ($k = 0$), then $\Lambda_G$ is the graph of a differential and thus $\Lambda_G$ is a transversal submanifold. Note that the above rank condition (3) is satisfied if the square submatrix $G_{uu}$ has maximal rank, i.e., $\det G_{uu} \neq 0$ on $E$. In this case, by the implicit function theorem there exists a locally defined function $u = u(x)$ such that $E$ is the graph of $u$, and setting $\hat G(x) = G(x,u(x))$ we have that
$$\hat G_x(x) = G_x(x,u(x)) + G_u(x,u(x))\, u_x(x) = G_x(x,u(x)) \quad \text{for all } (x,u) \in E.$$
Therefore, where $\det G_{uu} \neq 0$ on $E$, all the parameters $u$ can be eliminated and $\Lambda_{\hat G}$ is locally transversal to the fibers. The set of points of $S$ where $\det G_{uu}(x,u) = 0$ for $(x,u) \in E$ is called the caustic of $\Lambda_G$. These are the points where the Lagrangian submanifold is tangent to the fibers of $\pi : T^*S \to S$ and transversality is lost.
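As a minimal sketch of the Morse-family construction, consider the hypothetical generating function $G(x,u) = u^3/3 - xu$ on $S = \mathbb{R}$ with one extra parameter $u$. This function is an illustrative assumption and does not appear in the works cited above. Its critical set is $x = u^2$, the submanifold $\Lambda_G$ is the parabola $x = \lambda^2$, and the caustic is the single point $x = 0$, where the parameter $u$ cannot be eliminated.

```python
# Hypothetical Morse family on S = R with one extra parameter u (k = 1):
#     G(x, u) = u**3 / 3 - x * u
# The partial derivatives below are written out by hand.

def G_x(x, u):
    return -u

def G_u(x, u):
    return u * u - x

def G_xu(x, u):
    return -1.0

def G_uu(x, u):
    return 2.0 * u

def critical_point(u):
    """Point (x, u) of the critical set E (G_u = 0), parametrised by u."""
    return (u * u, u)

def lagrangian_point(u):
    """Point (x, lambda) = (x, G_x(x, u)) of Lambda_G for (x, u) in E."""
    x, _ = critical_point(u)
    return (x, G_x(x, u))

def rank_condition_holds(u):
    """The 1 x 2 matrix (G_xu, G_uu) has rank k = 1 iff it is nonzero."""
    x, _ = critical_point(u)
    return G_xu(x, u) != 0.0 or G_uu(x, u) != 0.0

def on_caustic(u):
    """The extra parameter cannot be eliminated where det G_uu = 0."""
    x, _ = critical_point(u)
    return G_uu(x, u) == 0.0

us = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
points = [lagrangian_point(u) for u in us]
caustic_x = [critical_point(u)[0] for u in us if on_caustic(u)]
```

Away from $u = 0$ the two branches $\lambda = \mp\sqrt{x}$ are graphs of differentials, which is the local elimination of $u$ described above.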

2.1. Symplectic Structures Defined by Divergence Functions

Given a smooth $n$-dimensional manifold $M$, let us denote by $M^2 = M \times M$ the square of $M$ and by $\Delta_M \subset M^2$ the diagonal of $M^2$. We will use local coordinates $x = (x^1,\dots,x^n)$ on $M$ and $(x,y) = (x^1,\dots,x^n,y^1,\dots,y^n)$ on $M^2$.
Let $D : M^2 \to [0,+\infty)$ be a smooth non-negative function whose representation in a local chart is $D(x,y) \ge 0$. We use the notations
$$(D_x)_i = \frac{\partial D}{\partial x^i}, \qquad (D_y)_j = \frac{\partial D}{\partial y^j}, \qquad (D_{yx})_{ji} = \frac{\partial}{\partial y^j}\Big(\frac{\partial D}{\partial x^i}\Big) = \phi_{ji}$$
for first- and second-order derivatives of $D$. The function $D$ is a yoke (see [7]) if the following conditions hold, and $D$ is a divergence (see [8]) if (iii) below holds on the whole $M^2$.
(i) $D = 0$ only on $\Delta_M$;
(ii) $D_x = 0$ and $D_y = 0$ on $\Delta_M$;
(iii) $\phi = D_{xy}$ is positive definite on $\Delta_M$.
Thus, points of $\Delta_M$ are minima of $D$. A divergence function acts as a pseudo-distance, but it satisfies neither the symmetry nor the triangle inequality condition. In [7], the following fibered map $F_D : M^2 \to T^*M$ over $M$ is considered, whose representation in a local chart is
$$F_D(x,y) = (x, D_x(x,y)). \tag{5}$$
By condition (iii) above there exists a neighborhood $W$ of $\Delta_M$ where $F_D$ has a smooth inverse
$$F_D^{-1}(x,\lambda) = (x, y(x,\lambda)).$$
Using the local diffeomorphism $F_D$, a symplectic structure $(W,\omega_D)$ is defined in [7] via the pull-back $\omega_D = F_D^*\omega$ of the canonical two-form (1) on $T^*M$. The local form of $\omega_D$ can be computed as follows,
$$\omega_D = F_D^*\omega = F_D^*(d\theta_c) = d(F_D^*\theta_c) = d\big((D_x)_i\,dx^i\big)$$
thus (see Section 3.2 in [7])
$$\omega_D = \frac{\partial^2 D}{\partial x^j \partial x^i}\, dx^j \wedge dx^i + \frac{\partial^2 D}{\partial y^j \partial x^i}\, dy^j \wedge dx^i = \phi_{ji}\, dy^j \wedge dx^i$$
because the coefficient $\partial^2 D/\partial x^j \partial x^i$ of the first term is symmetric in the indices $i,j$, so that its contraction with the antisymmetric $dx^j \wedge dx^i$ vanishes. For the applications of the above theory that we have in mind, we will assume in (iii) above that $D_{yx}$ is positive definite on the whole $M^2$, so that $F_D$ is a global diffeomorphism.
Simple examples of Lagrangian submanifolds of $M^2$ with respect to $\omega_D$ are (with a little abuse of notation) the $n$-dimensional submanifolds $M_x = M \times \{y\} \cong M$, which are also transversal to the fibers of $\pi_1 : M^2 \to M$, $\pi_1(x,y) = x$. Moreover, as $\omega_D(u,u) = 0$, $\Delta_M$ is also a Lagrangian submanifold.
Note also that (6) implies that $F_D$ is a symplectomorphism, thus $L = F_D^{-1}(\Lambda)$ is a Lagrangian submanifold of $M^2$ whenever $\Lambda \subset T^*M$ is a Lagrangian submanifold. In this paper, we will be mainly concerned with the study of Lagrangian submanifolds of $M^2$ defined in this way.
In the following Section 2.2, we will compute the above introduced objects for the relevant case of exponential families of probability distributions and canonical divergence.
In [7], the Hamiltonian $H : T^*M \to [0,+\infty)$ associated with a divergence function is defined as $H = D \circ F_D^{-1}$; locally it has the form
$$H(x,\lambda) = D(x, y(x,\lambda)). \tag{7}$$
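To make the maps $F_D$, $F_D^{-1}$ and $H$ concrete, the following sketch uses the assumed one-dimensional example $D(x,y) = (x-y)^2/2$ on $M = \mathbb{R}$, which is chosen here for illustration and is not taken from [7]. Its mixed derivative $D_{yx} = -1$ is constant and nonzero, so $F_D$ can be inverted globally by hand, and the associated Hamiltonian turns out to be $H(x,\lambda) = \lambda^2/2$.

```python
# Assumed example: D(x, y) = (x - y)**2 / 2 on M = R.

def D(x, y):
    return 0.5 * (x - y) ** 2

def D_x(x, y):
    return x - y

def F_D(x, y):
    """Fibered map M^2 -> T*M, (x, y) -> (x, D_x(x, y))."""
    return (x, D_x(x, y))

def F_D_inv(x, lam):
    """Inverse map: solve lam = x - y for y."""
    return (x, x - lam)

def H(x, lam):
    """Hamiltonian H = D o F_D^{-1} on T*M."""
    return D(*F_D_inv(x, lam))
```

In this example the Hamiltonian does not depend on $x$, which reflects the translation invariance of the chosen $D$.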

2.2. Canonical Divergence and Exponential Families

In this section, we recall the basic definitions of exponential family and canonical divergence, as described, e.g., in [1,2]. Let $(X, B, dx)$ be a probability space, where $X$ may be a discrete set or $X = \mathbb{R}^k$. We stipulate that in the case of a discrete set the integrals over $X$ with respect to the measure $dx$ are replaced by summations. Let
$$P(X) = \Big\{ p : X \to [0,+\infty),\ p(x) \ge 0,\ \int_X p\,dx = 1 \Big\}$$
and suppose that $q \in P(X)$ for a suitable $k$, where $q(x) = e^{k(x)} > 0$. Consider $n$ independent observables
$$h : X \to \mathbb{R}^n, \qquad \operatorname{rk}\, dh(x) = n \quad \forall x \in X$$
and define the related free energy $\psi : \Theta \subset \mathbb{R}^n \to \mathbb{R}$ as (here $\theta \cdot h = \theta^i h_i$)
$$e^{\psi(\theta)} = \int_X e^{\theta \cdot h(x) + k(x)}\,dx. \tag{8}$$
The $n$ real numbers $\theta^i$ are called canonical parameters. They uniquely define a probability distribution $p(\cdot;\theta)$, which belongs to the exponential family defined by $h, k$,
$$M(h,k) = \{ p(x;\theta) = e^{\theta \cdot h(x) + k(x) - \psi(\theta)},\ \theta \in \Theta \} \subset P(X). \tag{9}$$
The relevant fact is that $M(h,k)$ is an $n$-dimensional submanifold of the infinite-dimensional set $P(X)$ and that the canonical parameters $\theta$ are local coordinates. Note that $q \in M(h,k)$, as $\psi(0) = 0$ and $q(x) = p(x;0)$. Another system of local coordinates is provided by the so-called expectation parameters defined by
$$\eta = \psi_\theta(\theta) = E_{p_\theta}[h] = \int_X h(x)\, p(x;\theta)\,dx.$$
As $\psi$ is a convex function, the gradient map $\psi_\theta(\theta) = \eta$ is globally invertible with inverse $\theta = \hat\theta(\eta)$, which is also a gradient map $\hat\theta(\eta) = \varphi_\eta(\eta)$, where
$$\varphi(\eta) = \hat\theta(\eta) \cdot \eta - \psi(\hat\theta(\eta)) \tag{10}$$
is the Legendre transform of $\psi$ (see, e.g., [1]). We will denote by $p(x;\eta)$ the point in $M(h,k)$ associated with $\eta$. The Kullback–Leibler divergence is defined for general $(p,\tilde p)$ in $P(X)^2$ as
$$D_{KL}(p,\tilde p) = \int_X p(x) \log \frac{p(x)}{\tilde p(x)}\,dx.$$
The restriction of $D_{KL}$ to $M(h,k)^2$, the square of $M(h,k)$, $D_{KL} : M(h,k)^2 \to [0,+\infty)$, is called the canonical divergence. It can be shown (see [1]) that when $M(h,k)^2$ is referred to the coordinates $(\eta,\theta)$, $D_{KL}$ has the local representation
$$D(\eta,\theta) = \varphi(\eta) + \psi(\theta) - \eta \cdot \theta. \tag{11}$$
Note that, as $p(\cdot;\theta) = q$ for $\theta = 0$,
$$D_{KL}(p,q) = D_{KL}(p(\cdot;\eta), p(\cdot;0)) = \varphi(\eta) + \psi(0) - \eta \cdot 0 = \varphi(\eta). \tag{12}$$
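The identities (10)–(12) can be checked numerically on a small discrete example. The sketch below assumes a four-point space $X$ with one observable; the values of `h` and of the prior `q` are arbitrary illustrative choices. It compares the Kullback–Leibler divergence computed from its definition with the representation $\varphi(\eta) + \psi(\theta) - \eta\theta$.

```python
import math

# Assumed toy data: one observable h on X = {1, 2, 3, 4} and a prior q_i = exp(k_i).
h = [0.0, 1.0, 2.0, 3.0]
k = [math.log(w) for w in (0.1, 0.2, 0.3, 0.4)]

def psi(theta):
    """Free energy: log of the sum of exp(theta * h_i + k_i)."""
    return math.log(sum(math.exp(theta * hi + ki) for hi, ki in zip(h, k)))

def p(theta):
    """Member of the exponential family M(h, k) with canonical parameter theta."""
    z = psi(theta)
    return [math.exp(theta * hi + ki - z) for hi, ki in zip(h, k)]

def eta_of(theta):
    """Expectation parameter eta = E_p[h] = psi'(theta)."""
    return sum(hi * pi for hi, pi in zip(h, p(theta)))

def phi_at(theta):
    """Legendre transform phi(eta), evaluated at eta = eta_of(theta)."""
    return theta * eta_of(theta) - psi(theta)

def kl(p1, p2):
    """Kullback-Leibler divergence of two strictly positive distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2))

def canonical_divergence(theta1, theta2):
    """D(eta, theta) with eta = eta_of(theta1) and theta = theta2."""
    return phi_at(theta1) + psi(theta2) - eta_of(theta1) * theta2
```

The first point is labelled by its canonical parameter `theta1` only for convenience; the divergence itself depends on it through $\eta$ alone.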
A key object is the map $F_D$ introduced in (5) associated with $M(h,k)$ and the canonical divergence (11). It has the local form in coordinates $(\eta,\theta)$, see (5) and (11),
$$F_D(\eta,\theta) = (\eta, D_\eta) = (\eta, \varphi_\eta(\eta) - \theta), \tag{13}$$
with the explicit inverse, using local coordinates $(\eta,\lambda)$ in $T^*M(h,k)$,
$$F_D^{-1}(\eta,\lambda) = (\eta, \theta(\eta,\lambda)) = (\eta, \varphi_\eta(\eta) - \lambda) = (\eta, \hat\theta(\eta) - \lambda). \tag{14}$$
A simple but elegant result of the above framework is the following.
Proposition 1.
Let $\Lambda_G$ be a Lagrangian submanifold of $T^*M(h,k)$ described by the Morse family $G(\eta,u)$ as in (4). Then, $L_S = F_D^{-1}(\Lambda_G)$ is a Lagrangian submanifold of $M(h,k)^2$ described by the Morse family $S(\eta,u) = \varphi(\eta) - G(\eta,u)$.
Proof. 
From (4) we have that $\lambda = G_\eta(\eta,u)$ on $\Lambda_G$ and from (14)
$$F_D^{-1}(\Lambda_G) = \{ (\eta,\theta) = (\eta, \varphi_\eta(\eta) - G_\eta(\eta,u)) = (\eta, S_\eta(\eta,u)),\ (\eta,u) \in E \}$$
where $S(\eta,u) = \varphi(\eta) - G(\eta,u)$. Moreover, as $S_u(\eta,u) = -G_u(\eta,u)$, the critical set $E$ in (2) is the same. □
As a consequence of the above proposition, if $\Lambda_G$ is transversal to the fibers of $T^*M(h,k)$ (no extra parameters $u$), then its image in $M(h,k)^2$ is transversal to the fibers of $\pi_1$.
Another interesting consequence is that the zero section of the cotangent bundle $T^*M(h,k)$, locally represented as $Z = \{ (\eta,0) : \eta \in E \}$, is mapped by $F_D^{-1}$ into
$$Z_0 = F_D^{-1}(Z) = \{ (\eta, \hat\theta(\eta)) : \eta \in E \}$$
which is contained in $D^{-1}(0)$, the zero-level set of the canonical divergence. Indeed, from (10) and (11) we have that
$$D(\eta,\hat\theta(\eta)) = \varphi(\eta) + \psi(\hat\theta(\eta)) - \eta \cdot \hat\theta(\eta) = \varphi(\eta) - \varphi(\eta) \equiv 0$$
thus $Z_0 \subseteq D^{-1}(0)$ in the general case and $Z_0 = D^{-1}(0)$ if $n = 1$. For later use, we compute from (7) the Hamiltonian associated with the canonical divergence
$$H(\eta,\lambda) = D \circ F_D^{-1}(\eta,\lambda) = \varphi(\eta) + \psi(\hat\theta(\eta) - \lambda) - \eta \cdot (\hat\theta(\eta) - \lambda). \tag{15}$$
We set for the sake of simplicity $\hat\theta(\eta) = \hat\theta$ and we compute from (8) the free energy $\psi(\hat\theta(\eta) - \lambda)$
$$e^{\psi(\hat\theta - \lambda)} = \int_X e^{(\hat\theta - \lambda) \cdot h + k}\,dx = \int_X e^{(\hat\theta - \lambda) \cdot h + k + \psi(\hat\theta) - \psi(\hat\theta)}\,dx = e^{\psi(\hat\theta)} \int_X e^{-\lambda \cdot h}\, e^{\hat\theta \cdot h + k - \psi(\hat\theta)}\,dx = e^{\psi(\hat\theta)}\, E_{p_{\hat\theta}}[e^{-\lambda \cdot h}]. \tag{16}$$
Using (15) and (16), the Hamiltonian can be written using relation (10) as
$$H(\eta,\lambda) = \varphi(\eta) + \psi(\hat\theta) + \ln E_{p_{\hat\theta}}[e^{-\lambda \cdot h}] - \eta \cdot \hat\theta + \eta \cdot \lambda = \ln E_{p_{\hat\theta}}[e^{-\lambda \cdot h}] + \eta \cdot \lambda.$$
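The two expressions for the Hamiltonian can be compared numerically. The sketch below reuses an assumed four-point toy family (the values of `h` and `k` are illustrative choices) and labels the base point by $\hat\theta$ instead of $\eta$, which avoids inverting the gradient map. It evaluates $H$ once as $D(\eta, \hat\theta - \lambda)$ and once as $\ln E_{p_{\hat\theta}}[e^{-\lambda h}] + \eta\lambda$.

```python
import math

# Assumed toy data: one observable h on a four-point space, prior q_i = exp(k_i).
h = [0.0, 1.0, 2.0, 3.0]
k = [math.log(w) for w in (0.1, 0.2, 0.3, 0.4)]

def psi(theta):
    return math.log(sum(math.exp(theta * hi + ki) for hi, ki in zip(h, k)))

def p(theta):
    z = psi(theta)
    return [math.exp(theta * hi + ki - z) for hi, ki in zip(h, k)]

def eta_of(theta):
    return sum(hi * pi for hi, pi in zip(h, p(theta)))

def hamiltonian_via_divergence(theta_hat, lam):
    """H = D(eta, theta_hat - lam) = phi(eta) + psi(theta) - eta * theta."""
    eta = eta_of(theta_hat)
    phi = theta_hat * eta - psi(theta_hat)
    theta = theta_hat - lam
    return phi + psi(theta) - eta * theta

def hamiltonian_via_expectation(theta_hat, lam):
    """H = ln E_{p_theta_hat}[exp(-lam * h)] + eta * lam."""
    eta = eta_of(theta_hat)
    mgf = sum(pi * math.exp(-lam * hi) for hi, pi in zip(h, p(theta_hat)))
    return math.log(mgf) + eta * lam
```

Both functions vanish at $\lambda = 0$, in agreement with the zero section being mapped into the zero-level set of the divergence.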
It is interesting to investigate in more detail the structure of the Lagrangian submanifold $L_S = F_D^{-1}(\Lambda_G) \subset M(h,k)^2$ by studying the form of the two probability distributions in $L_S$ associated, respectively, with the coordinates $\eta$ and $\hat\theta - \lambda$ of the point $F_D^{-1}(\eta,\lambda) = (\eta, \hat\theta - \lambda)$. We compute from (9)
$$p(x;\eta) = e^{\hat\theta \cdot h(x) + k(x) - \psi(\hat\theta)}$$
and using (17)
$$p(x;\hat\theta - \lambda) = e^{(\hat\theta - \lambda) \cdot h + k - \psi(\hat\theta - \lambda)} = e^{\hat\theta \cdot h - \lambda \cdot h + k - \psi(\hat\theta) - \ln E_{p_{\hat\theta}}[e^{-\lambda \cdot h}]} = p(x;\eta)\, \frac{e^{-\lambda \cdot h(x)}}{E_{p_{\hat\theta}}[e^{-\lambda \cdot h}]}. \tag{18}$$
Note that setting
$$p(x;\lambda) = \frac{e^{-\lambda \cdot h(x)}}{Z(\lambda)} = \frac{e^{-\lambda \cdot h(x)}}{\int_X e^{-\lambda \cdot h(x)}\,dx}$$
relation (18) can be given the form
$$p(x;\hat\theta - \lambda) = \frac{p(x;\eta)\, e^{-\lambda \cdot h(x)}}{\int_X p(x;\eta)\, e^{-\lambda \cdot h(x)}\,dx} = \frac{p(x;\eta)\, p(x;\lambda)}{\int_X p(x;\eta)\, p(x;\lambda)\,dx}. \tag{19}$$
We will give an interpretation of this relation in the case of discrete probability distributions in Section 3.2 below.
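As a quick numerical illustration of relations (18) and (19), the sketch below compares the distribution with canonical parameter $\hat\theta - \lambda$ against the normalized pointwise product of $p(\cdot;\eta)$ and $p(\cdot;\lambda)$. The four-point data `h`, `k` are assumed for illustration only.

```python
import math

# Assumed toy data: one observable h on a four-point space, prior q_i = exp(k_i).
h = [0.0, 1.0, 2.0, 3.0]
k = [math.log(w) for w in (0.1, 0.2, 0.3, 0.4)]

def psi(theta):
    return math.log(sum(math.exp(theta * hi + ki) for hi, ki in zip(h, k)))

def p(theta):
    z = psi(theta)
    return [math.exp(theta * hi + ki - z) for hi, ki in zip(h, k)]

def shifted_directly(theta_hat, lam):
    """Left-hand side of (18): the family member with parameter theta_hat - lam."""
    return p(theta_hat - lam)

def shifted_by_product(theta_hat, lam):
    """Right-hand side of (19): normalised product of p(.; eta) and p(.; lam)."""
    base = p(theta_hat)
    weights = [math.exp(-lam * hi) for hi in h]
    z = sum(weights)
    p_lam = [w / z for w in weights]
    prod = [a * b for a, b in zip(base, p_lam)]
    total = sum(prod)
    return [c / total for c in prod]
```

The normalizing constant $Z(\lambda)$ cancels in the ratio, so the prior enters only through $p(\cdot;\eta)$.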

3. Application to Maximum Entropy Principle with Nonlinear Constraints and Phase Transitions

A relevant application of the above framework concerns the use of the Maximum Entropy Principle with nonlinear constraints. Let us consider a physical system $X$ whose description is given in terms of a probability distribution $q \in P(X)$. The Maximum Entropy Principle (E.T. Jaynes, see [11,12]) is a general inference procedure that allows one to update an initial probability distribution $q$ on the basis of subsequent information on the system, represented by the average values $E_p[h]$ of some observables $h$ of interest for the system. The sought distribution $p$ is the one that minimizes the relative entropy $D_{KL}(p,q)$ on the set of distributions that satisfy the constraints on $E_p[h]$. From a mathematical point of view, we are faced with a constrained extremization problem, to be solved below using the Lagrange multipliers method.
We will see that the set of solutions for different values of the constraints defines a Lagrangian submanifold of the cotangent bundle of a manifold $M(h,k)$. We are interested in describing the corresponding Lagrangian submanifold in $M(h,k)^2$.
This section has a pedagogical character, so for the sake of simplicity we will avoid technicalities and assume that $X = \{1,\dots,n\}$ is a discrete space and that there is only one observable of interest, defined by assigning $h = (h_1,\dots,h_n)$. The case of $k$ observables can be dealt with along the same lines with no extra effort. The case of a continuous space $X \subset \mathbb{R}^n$ presents more technical difficulties and is considered in [20].
Let $q_i = e^{k_i} \in P(X)$ be the a priori distribution describing $X$. The Kullback–Leibler divergence is called relative entropy in this setting and has the form
$$D(p,q) = \sum_i p_i \ln \frac{p_i}{q_i}.$$
Let $f : \mathbb{R} \to \mathbb{R}$ be a smooth globally non-invertible function (think for example of a cubic $f(x) = x(x^2 - a^2)$ for $a \in \mathbb{R}$, see Figure 1). We look for the minima of $D$ on the set of $p \in P(X)$ that satisfy a nonlinear constraint of the form $g : \mathbb{R}^n_+ \to \mathbb{R}$, $g(p) = y$, that is,
$$g(p) = f(E_p[h]) = f\Big(\sum_{i=1}^n h_i p_i\Big) = y. \tag{20}$$
The choice of this type of constraint is motivated by classical applications in statistical physics. For example, in the Ising model in the Curie–Weiss (mean-field) approximation, the average energy of the spin lattice is a quadratic function of the average magnetization $E_p[s]$; see [14,15]. We have that
$$dg(p) = f'(E_p[h])\, h = \big( f'(E_p[h])\, h_1, \dots, f'(E_p[h])\, h_n \big). \tag{21}$$
Note that at this stage of the procedure we do not take into account the normalization constraint; we stipulate that it will be enforced by dividing any candidate extremum point $\hat p$ by $\sum_i \hat p_i$. After introducing the Lagrange function, where $\lambda$ is the Lagrange multiplier associated with the constraint (20),
$$G(y,p,\lambda) = D(p,q) - \lambda \big( f(E_p[h]) - y \big) \tag{22}$$
we see that the candidate extrema are the solutions $(p,\lambda)$, for given $y$, of (here $i = 1,\dots,n$)
$$(G_p)_i = \ln \frac{p_i}{q_i} + 1 - \lambda f'(E_p[h])\, h_i = 0, \qquad G_\lambda = -\big( f(E_p[h]) - y \big) = 0 \tag{23}$$
that is, setting $q_i = e^{k_i}$, we have to face a transcendental equation for the unnormalized probability
$$p_i = c\, e^{\lambda f'(E_p[h])\, h_i + k_i}, \qquad f(E_p[h]) = y. \tag{24}$$
After normalization, the first equation in (24) becomes
$$p_i = e^{\lambda f'(E_p[h])\, h_i + k_i - \psi(\lambda,p)}, \qquad e^{\psi(\lambda,p)} = \sum_i e^{\lambda f'(E_p[h])\, h_i + k_i}. \tag{25}$$
Let us denote by $f^{-1}(y) \subset \mathbb{R}$ the set of pre-images of $y$ under $f$ (see, e.g., Figure 1)
$$f^{-1}(y) = \{ \eta \in \mathbb{R} : f(\eta) = y \} = \{ \eta^1, \dots, \eta^\alpha, \dots, \eta^A \}, \qquad \eta^\alpha = \eta^\alpha(y)$$
where we have supposed that, for every $y$, $f^{-1}(y)$ is a finite set of cardinality $A(y) < +\infty$. The crux is that we can replace the constraint $f(E_p[h]) = y$, the second equation in (24), with the following equivalent one
$$f(E_p[h]) = y \iff E_p[h] \in f^{-1}(y)$$
therefore we can describe the (possibly non-unique) solution (25) of the extremum problem (23) as
$$p_i^\alpha = e^{\lambda f'(\eta^\alpha)\, h_i + k_i - \psi(\lambda,\alpha)}, \qquad e^{\psi(\lambda,\alpha)} = \sum_i e^{\lambda f'(\eta^\alpha)\, h_i + k_i} \tag{27}$$
where $\alpha = 1,\dots,A(y)$, showing that the candidate solution belongs to an exponential family $M(h,k)$. Note that in Information Geometry the critical points of the MEP extremum problem are computed as geodesic projections onto a submanifold which is an exponential family, and multiplicity of solutions is related to the non-uniqueness of the geodesic projection; see [1,15].
Note that where $f'(\eta^\alpha) \neq 0$, setting $\lambda f'(\eta^\alpha) = \theta^\alpha$, the solution (27) can be given the standard form (see [1,14]) of the MEP solution
$$\hat p_i = e^{\hat\theta h_i + k_i - \psi(\hat\theta)}, \qquad \hat\theta(\eta) = \varphi_\eta(\eta)$$
with linear constraint $E[h] = \eta^\alpha$, hence (25) becomes
$$p_i^\alpha = e^{\theta^\alpha h_i + k_i - \psi(\theta^\alpha)}, \qquad e^{\psi(\theta^\alpha)} = \sum_i e^{\theta^\alpha h_i + k_i}.$$
The multipliers $\theta^\alpha = \hat\theta(\eta^\alpha(y))$, $\alpha = 1,\dots,A(y)$, are uniquely determined (see (10)) by the equation
$$\psi_\theta(\theta) = \eta \quad \text{i.e.} \quad \hat\theta(\eta) = \varphi_\eta(\eta)$$
for $\eta = \eta^\alpha(y)$, and accordingly we can compute the multipliers $\lambda$ as
$$\lambda^\alpha(y) = \frac{\hat\theta(\eta^\alpha(y))}{f'(\eta^\alpha(y))}. \tag{30}$$
Note that the solution (28) to our constrained extremization problem has the form of a curved exponential family (see [1]) with respect to the discrete parameter $\alpha$. We will see in Section 3.1 that the framework of Lagrangian submanifolds is useful for describing the global picture when there are multiple solutions.
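The procedure above can be sketched numerically: find the pre-images $\eta^\alpha$ of $y$ under $f$, invert $\psi_\theta(\theta) = \eta^\alpha$ to get $\theta^\alpha$, and divide by $f'(\eta^\alpha)$ to get $\lambda^\alpha$. The three-point space, the uniform prior, the value $a = 0.8$, the target $y = 0.1$ and the grid-plus-bisection root finder are all assumptions made for this illustration, not part of the original treatment.

```python
import math

h = [-1.0, 0.0, 1.0]                 # assumed observable on X = {1, 2, 3}
k = [math.log(1.0 / 3.0)] * 3        # uniform prior, q_i = exp(k_i)
a = 0.8                              # assumed parameter of the cubic constraint

def f(eta):
    return eta * (eta * eta - a * a)

def f_prime(eta):
    return 3.0 * eta * eta - a * a

def psi(theta):
    return math.log(sum(math.exp(theta * hi + ki) for hi, ki in zip(h, k)))

def p(theta):
    z = psi(theta)
    return [math.exp(theta * hi + ki - z) for hi, ki in zip(h, k)]

def mean(theta):
    return sum(hi * pi for hi, pi in zip(h, p(theta)))

def bisect(fun, lo, hi, iters=200):
    """Root of fun on [lo, hi], assuming fun(lo) and fun(hi) have opposite signs."""
    flo = fun(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = fun(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def theta_hat(eta):
    """Invert the increasing map theta -> psi'(theta) = E[h]."""
    return bisect(lambda t: mean(t) - eta, -50.0, 50.0)

def preimages(y, lo=-0.999, hi=0.999, steps=4000):
    """All eta in (lo, hi) with f(eta) = y, located by sign changes on a grid."""
    roots = []
    width = (hi - lo) / steps
    for j in range(steps):
        x0, x1 = lo + j * width, lo + (j + 1) * width
        g0, g1 = f(x0) - y, f(x1) - y
        if g0 == 0.0:
            roots.append(x0)
        elif g0 * g1 < 0.0:
            roots.append(bisect(lambda x: f(x) - y, x0, x1))
    return roots

def mep_solutions(y):
    """Triples (eta^alpha, theta^alpha, lambda^alpha); requires f'(eta^alpha) != 0."""
    out = []
    for eta in preimages(y):
        th = theta_hat(eta)
        out.append((eta, th, th / f_prime(eta)))
    return out

solutions = mep_solutions(0.1)
```

The search interval for $\eta$ is restricted to the open range of $h$, since only those expectation values are attained by strictly positive distributions on this three-point space.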

3.1. The Global Picture via Lagrange Submanifold

If we set $(p,\lambda) = u$ in the Lagrange function (22), we see that for $G(y,u)$ the set of points $(y,u)$ satisfying the first-order necessary condition (23) for an unconstrained extremum is the critical set
$$E = \{ (y,u) : G_u(y,u) = 0 \}.$$
We can check whether the Lagrange function $G(y,u)$ defines a Morse family using the rank condition (3)
$$\operatorname{rk} \begin{pmatrix} G_{yu} & G_{uu} \end{pmatrix} = n + 1 \quad \text{for all } (y,u) \in E$$
where in this case
$$\begin{pmatrix} G_{yu} & G_{uu} \end{pmatrix} = \begin{pmatrix} 0 & G_{pp} & -dg^T \\ 1 & -dg & 0 \end{pmatrix} \tag{31}$$
and $G_{pp}$ is the $n$-dimensional Hessian matrix (here $\delta_{ij}$ is the Kronecker symbol)
$$(G_{pp})_{ij} = (D_{pp})_{ij} - \lambda f''(E_p[h])\, h_i h_j = \frac{\delta_{ij}}{p_i} - \lambda f''(E_p[h])\, h_i h_j. \tag{32}$$
If $G(y,u)$ is a Morse family, then by the Maslov–Hörmander theorem
$$\Lambda_G = \{ (y, G_y) \ \text{where}\ (y,u) \in E \} \tag{33}$$
is a Lagrangian submanifold of $T^*\mathbb{R}$. We claim that (33) provides a global description of the set of solutions (28). We have seen in Section 2 that a sufficient condition for the elimination of all extra parameters $u$ is that $G_{uu}$ has maximal rank for all $(y,u) \in E$. A criterion for this is given by the following classical result in constrained optimization theory, here adapted to our notation, which expresses the second-order sufficient condition for maxima or minima (see [14,21] for the proof).
Proposition 2.
If the symmetric matrix $G_{pp}$ in (32) is (positive or negative) definite on $\ker dg$ for $(y,u) \in E$, then the square matrix $G_{uu}$ in (31) has maximal rank.
From (21), we have that for $(y,u) \in E$
$$\ker dg(p) = \{ u \in \mathbb{R}^n : f'(\eta^\alpha)\, h \cdot u = 0 \}$$
and from (32), that
$$G_{pp}\, u \cdot u = \sum_i \frac{u_i^2}{p_i} - \lambda f''(\eta^\alpha)\, (h \cdot u)^2.$$
It is straightforward to derive from the above relations that the two cases below hold
$$f'(\eta^\alpha) \neq 0 \ \Rightarrow\ \ker dg(p) = \{ u : h \cdot u = 0 \} \ \Rightarrow\ G_{pp}\, u \cdot u > 0 \quad \forall u \neq 0, \qquad f'(\eta^\alpha) = 0 \ \Rightarrow\ \ker dg(p) = \mathbb{R}^n \ \Rightarrow\ G_{pp}\, u \cdot u \in \mathbb{R}.$$
Therefore, at points $(y,u) \in E$ where $f'(\eta^\alpha) \neq 0$ the Lagrangian submanifold $\Lambda_G$ in (33) is transversal. At points in $E$ where $f'(\eta^\alpha) = 0$, we have $dg = f'(\eta^\alpha)\, h = 0$, see (21), thus transversality is lost since, by the form of $G_{uu}$ in (31), at these points
$$\det G_{uu}(p,\lambda) = 0 \quad \text{and} \quad (y,u) \in E.$$
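The role of $dg$ in the bordered matrix $G_{uu}$ can be seen in a small computation. The sketch below assembles $G_{uu}$ from (31) and (32) for the cubic $f$ on an assumed three-point space and evaluates its determinant at two hypothetical points $(p,\lambda)$: one with $f'(E_p[h]) \neq 0$ and one with $E_p[h] = a/\sqrt{3}$, where $f'$ vanishes. The chosen $p$ and $\lambda$ are illustrative and are not required to lie on the critical set.

```python
import math

h = [-1.0, 0.0, 1.0]     # assumed observable on X = {1, 2, 3}
a = 0.8                  # assumed parameter of f(eta) = eta * (eta**2 - a**2)

def f_prime(eta):
    return 3.0 * eta * eta - a * a

def f_second(eta):
    return 6.0 * eta

def bordered_hessian(p, lam):
    """G_uu = [[G_pp, -dg^T], [-dg, 0]] for u = (p, lambda), following (31)-(32)."""
    n = len(p)
    eta = sum(hi * pi for hi, pi in zip(h, p))
    dg = [f_prime(eta) * hi for hi in h]
    rows = []
    for i in range(n):
        row = [(1.0 / p[i] if i == j else 0.0) - lam * f_second(eta) * h[i] * h[j]
               for j in range(n)]
        rows.append(row + [-dg[i]])
    rows.append([-d for d in dg] + [0.0])
    return rows

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n):
                m[r][j] -= factor * m[c][j]
    return result

eta_fold = a / math.sqrt(3.0)                         # f'(eta_fold) = 0
regular_p = [0.2, 0.3, 0.5]                           # E_p[h] = 0.3
fold_p = [0.1, 0.8 - eta_fold, 0.1 + eta_fold]        # E_p[h] = eta_fold

det_regular = det(bordered_hessian(regular_p, 0.5))
det_fold = det(bordered_hessian(fold_p, 0.5))
```

At the second point the last row and column of $G_{uu}$ vanish up to rounding, so the determinant is numerically zero regardless of $G_{pp}$.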
We remark that the above framework gives a global description of the set of solutions (28), (30) in terms of the Lagrangian submanifold locally described as
$$\Lambda_{f(y)} = \Big\{ (y, G_y) = \big(y, \lambda(\eta^\alpha(y))\big) = \Big(y, \frac{\hat\theta(\eta^\alpha(y))}{f'(\eta^\alpha(y))}\Big) \Big\} \subset T^*\mathbb{R}_y \tag{34}$$
where $\lambda(\eta^\alpha(y))$ is given by (30). If we consider $f : E \subset \mathbb{R}_\eta \to \mathbb{R}_y$, $y = f(\eta)$, as a local change of coordinates on $M(h,k)$ (since $f$ is locally invertible where $f'(\eta) \neq 0$), it is easy to prove the following.
Proposition 3.
The submanifold $\Lambda_{f(y)} \subset T^*\mathbb{R}_y$ in (34) is the image $\Lambda_{f(y)} = T^*f(\Lambda_f)$ of
$$\Lambda_f = \{ (\eta, \hat\theta^\alpha(\eta)) : \eta \in E \} \subset T^*M(h,k) \tag{35}$$
where $\hat\theta^\alpha(\eta)$ is the multiplier in (29) associated with the constraint $E_p[h] = \eta^\alpha$ and $\eta^\alpha \in I(\eta) = f^{-1}(f(\eta))$.
Proof. 
If $y = f(\eta)$ is the local change of coordinates in $M(h,k)$, then the tangent map $Tf : T\mathbb{R}_\eta \to T\mathbb{R}_y$ has the local form $(y,\dot y) = Tf(\eta,\dot\eta) = (f(\eta), f'(\eta)\dot\eta)$ and the cotangent map $T^*f : T^*\mathbb{R}_\eta \to T^*\mathbb{R}_y$ has the local form
$$(y,\lambda) = T^*f(\eta,\beta) = \Big( f(\eta), \frac{\beta}{f'(\eta)} \Big)$$
if we require that the Liouville one-form (see above (1)) has the same canonical form $\theta_c = \lambda\,dy = \beta\,d\eta$ in the two coordinate charts. See, e.g., [19] for a proof of this classical result of differential geometry. □
We want to study the Lagrangian submanifold $\Lambda_f$ defined in (35) and its image $L_f = F_D^{-1}(\Lambda_f) \subset M(h,k)^2$, where $F_D^{-1}$ is defined in (14), whose local expression is
$$L_f = \{ (\eta, \hat\theta(\eta) - \hat\theta(\eta^\alpha)) : \eta \in E \}. \tag{36}$$
First we consider the case where $f$ is a globally invertible function. In this case, $I(\eta) = f^{-1}(f(\eta)) = \{\eta\}$ and $\hat\theta(\eta) = \varphi_\eta(\eta)$. The Lagrangian submanifold $\Lambda_f$ in (35) is the graph of the differential $\varphi_\eta(\eta)$ and it is transversal, see Figure 2a. Moreover, see below (9), if $\eta = \eta_0 = E_q[h]$ then $\hat\theta(\eta_0) = 0$. As $\psi_\theta(\theta) = \eta$ is invertible with inverse $\theta = \hat\theta(\eta)$, we have
$$\frac{d\hat\theta}{d\eta}(\eta) = \Big( \frac{d^2\psi}{d\theta^2} \Big)^{-1} = \frac{1}{\operatorname{var}_{\hat p}(h)} = \frac{1}{E_{\hat p}[h^2] - \eta^2} > 0$$
and $\hat\theta(\eta)$ is a monotonically increasing function, see Figure 2a. Its image (36) is $L_f = M(h,k) \times \{0\}$, see Figure 2b.
If we consider a globally non-invertible function $f$ such as the one depicted in Figure 1, then $I(\eta)$ contains multiple points and $\Lambda_f$ is non-transversal at points where $f'(\eta) = 0$, see Figure 3a. The corresponding image $L_f$ has multiple branches and it is not a manifold at the points $(b,c)$ where transversality fails, see Figure 3b.
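The branch structure of $L_f$ in (36) can be explored with a short computation. The sketch below lists, for a given $\eta$, the second components $\hat\theta(\eta) - \hat\theta(\eta^\alpha)$ over all $\eta^\alpha \in I(\eta) = f^{-1}(f(\eta))$. The three-point space, the uniform prior, the value $a = 0.8$ and the numerical root finder are assumptions of this illustration.

```python
import math

h = [-1.0, 0.0, 1.0]                 # assumed observable on X = {1, 2, 3}
k = [math.log(1.0 / 3.0)] * 3        # uniform prior, q_i = exp(k_i)
a = 0.8                              # assumed parameter of the cubic f

def f(eta):
    return eta * (eta * eta - a * a)

def mean(theta):
    w = [math.exp(theta * hi + ki) for hi, ki in zip(h, k)]
    return sum(hi * wi for hi, wi in zip(h, w)) / sum(w)

def bisect(fun, lo, hi, iters=200):
    """Root of fun on [lo, hi], assuming fun(lo) and fun(hi) have opposite signs."""
    flo = fun(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = fun(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def theta_hat(eta):
    """Invert the increasing map theta -> E[h]."""
    return bisect(lambda t: mean(t) - eta, -50.0, 50.0)

def preimages(y, lo=-0.999, hi=0.999, steps=4000):
    """All eta in (lo, hi) with f(eta) = y, located by sign changes on a grid."""
    roots = []
    width = (hi - lo) / steps
    for j in range(steps):
        x0, x1 = lo + j * width, lo + (j + 1) * width
        g0, g1 = f(x0) - y, f(x1) - y
        if g0 == 0.0:
            roots.append(x0)
        elif g0 * g1 < 0.0:
            roots.append(bisect(lambda x: f(x) - y, x0, x1))
    return roots

def L_f_fibre(eta):
    """Values theta_hat(eta) - theta_hat(eta^alpha) for eta^alpha in I(eta)."""
    th = theta_hat(eta)
    return [th - theta_hat(e) for e in preimages(f(eta))]

three_branches = L_f_fibre(0.1)      # f(0.1) has three pre-images in (-1, 1)
one_branch = L_f_fibre(0.95)         # f(0.95) has a single pre-image
```

The branch with $\eta^\alpha = \eta$ always contributes the value zero, which is the only branch present when $f$ is invertible near $\eta$.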

3.2. Probability Distributions in L f

In this section, we study the structure of the probability distributions in $L_f$. In the local coordinate system $(\eta,\theta)$ of $M(h,k)^2$, $\eta$ and $\hat\theta(\eta)$ describe the same probability distribution, which we write for brevity as $p_i(\eta) = p_i(\hat\theta)$. Therefore, the probability distributions in $L_f$ in (36) associated with $\eta$ and $\hat\theta(\eta) - \hat\theta(\eta^\alpha)$ are, respectively,
$$p_i(\eta) = e^{\hat\theta h_i + k_i - \psi(\hat\theta)} \tag{37}$$
and, see (18),
$$p_i(\hat\theta - \hat\theta(\eta^\alpha)) = \frac{p_i(\eta)\, e^{-\hat\theta(\eta^\alpha) h_i}}{\sum_i p_i(\eta)\, e^{-\hat\theta(\eta^\alpha) h_i}}. \tag{38}$$
Setting
$$\tilde p_i(\eta^\alpha) = \frac{e^{-\hat\theta(\eta^\alpha) h_i}}{Z(\lambda)}, \qquad Z(\lambda) = \sum_i e^{-\hat\theta(\eta^\alpha) h_i},$$
the above (38) can be rewritten as the discrete version of (19), that is,
$$p_i(\hat\theta - \hat\theta(\eta^\alpha)) = \frac{p_i(\eta)\, \tilde p_i(\eta^\alpha)}{\sum_i p_i(\eta)\, \tilde p_i(\eta^\alpha)}. \tag{39}$$
This last formula can be interpreted as follows. Let $A$ and $B$ be two independent random variables $A, B : \Omega \to X$, where $X = \{1,\dots,n\}$ is the discrete state space, described by the probability distributions $p_i$ and $\tilde p_i$, respectively (for example, $A$ and $B$ describe two dice with $n$ faces). Then, $\sum_i p_i \tilde p_i$ is the probability that $A$ and $B$ are found in the same state and
$$\mathrm{Prob}(A = i, B = i \mid A = B) = \frac{p_i\, \tilde p_i}{\sum_i p_i\, \tilde p_i}$$
in (39) is the conditional probability that $A$ and $B$ are found in the state $i$ provided that they are found in the same state. Note that for $p_i(\eta)$ in (37) we have $e^{k_i} = q_i$, thus (37) can be rewritten as
$$p_i(\eta) = \frac{q_i\, e^{\hat\theta h_i}}{\sum_i q_i\, e^{\hat\theta h_i}} = \frac{q_i\, \tilde p_i(\hat\theta)}{\sum_i q_i\, \tilde p_i(\hat\theta)}$$
(here $\tilde p_i(\hat\theta) \propto e^{\hat\theta h_i}$) and (39) above is equal to
$$p_i(\hat\theta - \hat\theta(\eta^\alpha)) = \frac{q_i\, p_i\, \tilde p_i}{\sum_i q_i\, p_i\, \tilde p_i} = \mathrm{Prob}(A = i, B = i, C = i \mid A = B = C)$$
where $A, B, C$ are described by $q_i$, $p_i = \tilde p_i(\hat\theta(\eta))$, $\tilde p_i = \tilde p_i(\hat\theta(\eta^\alpha))$.
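The coincidence interpretation can be verified by brute-force enumeration. The sketch below builds the joint law of two independent variables $A$ and $B$ on a finite state space, conditions on the event $A = B$, and compares the result with the normalized product $p_i \tilde p_i / \sum_i p_i \tilde p_i$. The two example distributions used in the usage lines are arbitrary assumptions.

```python
from itertools import product

def coincidence_law(p, p_tilde):
    """Law of the common state of A and B given A = B, by enumerating the joint law."""
    n = len(p)
    joint = {(i, j): p[i] * p_tilde[j] for i, j in product(range(n), repeat=2)}
    same = sum(w for (i, j), w in joint.items() if i == j)
    return [joint[(i, i)] / same for i in range(n)]

def normalised_product(p, p_tilde):
    """Right-hand side of the conditional-probability formula."""
    prod = [a * b for a, b in zip(p, p_tilde)]
    total = sum(prod)
    return [c / total for c in prod]

# Two hypothetical four-sided dice.
die_a = [0.1, 0.2, 0.3, 0.4]
die_b = [0.4, 0.3, 0.2, 0.1]
conditional = coincidence_law(die_a, die_b)
```

For these two dice the products $p_i \tilde p_i$ are $0.04, 0.06, 0.06, 0.04$, so the conditional law is $(0.2, 0.3, 0.3, 0.2)$.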

4. Discussion

Canonical coordinates $\eta$ and $\theta$ associated with an exponential family $M(h,k)$ are dually flat coordinates with respect to the duality defined by the canonical divergence. With respect to these coordinates, a generalization of the Pythagorean theorem is proved in Information Geometry, which provides a generalized formulation of the Maximum Entropy Principle with linear constraints as a geodesic projection problem (see [2]). Multiplicity of the solutions $\hat\theta(\eta)$ of the Maximum Entropy problem is due to the non-uniqueness of the projection. In this paper, we have shown that the set of pairs $(\eta, \hat\theta(\eta))$ defines a transversal Lagrangian submanifold $\Lambda$ of $T^*M(h,k)$, and we have seen with an example that, if nonlinear constraints are considered, the set of possible multiple solutions to the Maximum Entropy problem is globally described by a folded (i.e., possibly non-transversal) Lagrangian submanifold $\Lambda_f$. We have computed their pull-back to the square manifold $M(h,k)^2$ via the map $F_D^{-1}$. We think that this framework offers a complementary view to the generalized Pythagorean theorem. We plan to address in a subsequent paper a generalization of the theory presented here to a more general form of nonlinear constraint.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Amari, S. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
  2. Amari, S.; Hiroshi, N. Methods of Information Geometry; American Mathematical Soc.: Providence, RI, USA, 2007; Volume 191.
  3. Murray, M.K.; Rice, J.W. Differential Geometry and Statistics; CRC Press: Boca Raton, FL, USA, 1993; Volume 48.
  4. Amari, S.; Cichocki, A. Information geometry of divergence functions. Bull. Pol. Acad. Sci. Tech. 2010, 58, 183–195.
  5. Eguchi, S. A differential geometric approach to statistical inference on the basis of contrast functionals. Hiroshima Math. J. 1985, 15, 341–391.
  6. Ay, N.; Amari, S. A novel approach to canonical divergences within information geometry. Entropy 2015, 17, 8111–8129.
  7. Barndorff-Nielsen, O.E.; Jupp, P.E. Statistics, yokes and symplectic geometry. Ann. Fac. Sci. Toulouse Math. 1997, 6, 389–427.
  8. Leok, M.; Zhang, J. Connecting information geometry and geometric mechanics. Entropy 2017, 19, 518.
  9. Noda, T. Symplectic structures on statistical manifolds. J. Aust. Math. Soc. 2011, 90, 371–384.
  10. Nakamura, Y. Completely integrable gradient systems on the manifolds of Gaussian and multinomial distributions. Jpn. J. Ind. Appl. Math. 1993, 10, 179.
  11. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620–630.
  12. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
  13. Brot, R. Phase Transitions. In Statistical Physics. Phase Transitions and Superfluidity; Brandeis University Summer Institute in Theoretical Physics, Gordon and Breach Science Publishers: London, UK, 1966; pp. 5–103.
  14. Favretti, M. Lagrangian submanifolds generated by the Maximum Entropy principle. Entropy 2005, 7, 1–14.
  15. Fujiwara, A.; Shigeru, S. Hereditary structure in Hamiltonians: Information geometry of Ising spin chains. Phys. Lett. A 2010, 374, 911–916.
  16. Maslov, V.P.; Bouslaev, V.C.; Arnol’d, V.I. Theorie des Perturbations et Methodes Asymptotiques; Dunod: Paris, France, 1972.
  17. Hormander, L. Fourier integral operators. I. Acta Math. 1971, 127, 79.
  18. Weinstein, A. Lectures on Symplectic Manifolds; No. 29.; American Mathematical Soc.: Providence, RI, USA, 1977.
  19. Cardin, F. Elementary Symplectic Topology and Mechanics; Springer: Berlin/Heidelberg, Germany, 2015.
  20. Favretti, M. Isotropic submanifolds generated by the Maximum Entropy Principle and Onsager reciprocity relations. J. Funct. Anal. 2005, 227, 227–243.
  21. Bertsekas, D.P. Constrained Optimization and Lagrange Multiplier Methods; Academic Press: Cambridge, MA, USA, 2014.
Figure 1. Plot of $y = f(\eta) = \eta(\eta^2 - a^2)$. Points $b, c$ correspond to points where $f'(\eta) = 0$.
Figure 2. The case of a transversal Lagrangian submanifold.
Figure 3. The case of a folded, i.e., non-transversal, Lagrangian submanifold.

Share and Cite

MDPI and ACS Style

Favretti, M. Lagrangian Submanifolds of Symplectic Structures Induced by Divergence Functions. Entropy 2020, 22, 983. https://doi.org/10.3390/e22090983

AMA Style

Favretti M. Lagrangian Submanifolds of Symplectic Structures Induced by Divergence Functions. Entropy. 2020; 22(9):983. https://doi.org/10.3390/e22090983

Chicago/Turabian Style

Favretti, Marco. 2020. "Lagrangian Submanifolds of Symplectic Structures Induced by Divergence Functions" Entropy 22, no. 9: 983. https://doi.org/10.3390/e22090983
