Abstract
The divergence function in information geometry and the discrete Lagrangian in discrete geometric mechanics each induce a differential geometric structure on the product manifold $Q \times Q$. We aim to investigate the relationship between these two objects, and the fundamental role that duality, in the form of Legendre transforms, plays in both fields. By establishing an analogy between these two approaches, we will show how a fruitful cross-fertilization of techniques may arise from switching between formulations based on the cotangent bundle $T^*Q$ (as in geometric mechanics) and the tangent bundle $TQ$ (as in information geometry). In particular, we establish, through variational error analysis, that the divergence function agrees with the exact discrete Lagrangian up to third order if and only if Q is a Hessian manifold.
1. Introduction
Information geometry and geometric mechanics each induce geometric structures on an arbitrary manifold Q, and we investigate the relationship between these two approaches. More specifically, we study the interaction of three objects: $TQ$, the tangent bundle on which a Lagrangian function is defined; $T^*Q$, the cotangent bundle on which a Hamiltonian function is defined; and $Q \times Q$, the product manifold on which the divergence function (from the information geometric perspective) or the Type I generating function (from the geometric mechanics perspective) is defined. In discrete mechanics, the $T^*Q \leftrightarrow T^*Q$ correspondence is given by a symplectomorphism, namely the time-h flow map associated with the Hamiltonian H, and the $TQ \leftrightarrow Q \times Q$ correspondence is given by the map relating the boundary-value and initial-value formulations of the Euler–Lagrange flow. However, it is the correspondence between $TQ$ and $T^*Q$ through the fiberwise Legendre map based on L or H that actually couples the Hamiltonian flow with the Lagrangian flow, leading to one and the same dynamics. We propose decoupling the Lagrangian and Hamiltonian dynamics through the use of a divergence function defined on the Pontryagin bundle $TQ \oplus T^*Q$ that measures the discrepancy (or duality gap) between $L(q, v)$ and $H(q, p)$. We also establish, through variational error analysis, that the divergence function agrees with the exact discrete Lagrangian up to third order if and only if Q is a Hessian manifold.
Geometric mechanics [1] investigates the equations of motion using the Lagrangian, Hamiltonian, and Hamilton–Jacobi formulations of classical Newtonian mechanics. Two apparently different principles are used in these formulations: the principle of conservation (of energy, momentum, etc.), leading to Hamiltonian dynamics, and the principle of variation (least action), leading to Lagrangian dynamics. The conservation properties of the Hamiltonian approach are expressed with respect to the underlying symplectic geometry on the cotangent bundle. In contrast, the variational principles that result in the Euler–Lagrange equations of motion and the Hamilton–Jacobi equations reflect the geometry of the semispray on the tangent bundle. Lagrangian and Hamiltonian mechanics are two sides of the same coin. That they describe identical dynamics on the configuration space (base manifold) is both remarkable and, by construction, to be expected, because the Lagrangian and Hamiltonian are dual to each other and related via the Legendre transform.
Information geometry [2,3], in the broadest sense of the term, provides a dualistic Riemannian geometric structure induced by a class of functions called divergence functions. These functions essentially provide a smooth measure of the directed distance between any two points on a manifold, where the manifold is the space of probability densities. Information geometry arises in various branches of information science, including statistical inference, machine learning, neural computation, information theory, and optimization and control. Various geometric structures can be induced from divergence functions, including metrics, affine connections, and symplectic structures, as reviewed in [4]. Convex duality and the Legendre transform play a key role both in constructing the divergence function and in characterizing the various dualities encoded by information geometry [5,6].
Given that geometric mechanics and information geometry both prescribe dualistic geometric structures on a manifold, it is interesting to explore the extent to which these two frameworks are related. In geometric mechanics, the Legendre transform links the Hamiltonian function, defined on the cotangent bundle $T^*Q$, with the Lagrangian function, defined on the tangent bundle $TQ$. In information geometry, it instead links the biorthogonal coordinates on the base manifold Q when Q is dually flat and carries a Hessian geometry. To understand their deeper relationship, we need to resort to the discrete formulation of geometric mechanics and investigate the product manifold $Q \times Q$. The basic tenet of discrete geometric mechanics is to preserve the fact that Hamiltonian flow is a symplectomorphism, and to construct discrete-time maps that are symplectic. This results in two ways of viewing discrete-time mechanics, either as maps on $Q \times Q$ or as maps on $T^*Q$, which are related via discrete Legendre transforms. The shift in focus from $TQ$ to $Q \times Q$ lends itself precisely to establishing a connection to information geometry. The divergence function is naturally defined on $Q \times Q$, and in both information geometry and discrete geometric mechanics, such a function induces a symplectic structure on $Q \times Q$. This is the basic observation that connects geometric mechanics and information geometry, and we explore its implications in this paper.
Our paper is organized as follows. Section 2 provides a contemporary viewpoint of geometric mechanics. Lagrangian and Hamiltonian systems are discussed in parallel, in terms of the geometry on $TQ$ and $T^*Q$, respectively, together with Dirac mechanics on the Pontryagin bundle $TQ \oplus T^*Q$, which provides a unified treatment of Lagrangian and Hamiltonian mechanics. Section 3 summarizes the discrete formulation of geometric mechanics, which is naturally defined on the product manifold $Q \times Q$. Section 4 reviews now-classical information geometry, including the Riemannian metric and affine connections on $TQ$, and the manner in which the divergence function naturally induces dualistic Riemannian structures. The special cases of Hessian geometry and biorthogonal coordinates are highlighted, showing how the Legendre transform is essential for characterizing dually flat spaces. Section 5 starts with a presentation of the symplectic structure on $Q \times Q$ induced by a divergence function, which is naturally identified with the Type I generating function on it. We follow up by investigating its transformation into a Type II generating function, which plays a key role in discrete Hamiltonian mechanics. We then propose to decouple the discrete Hamiltonian and Lagrangian dynamics by using the divergence function to measure their duality gap. Finally, we perform variational error analysis to show that on a dually flat Hessian manifold, the Bregman divergence is third-order accurate with respect to the exact discrete Lagrangian. Section 6 closes with a summary and discussion.
2. A Review of Geometric Mechanics
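Consider an n-dimensional configuration manifold Q, with local coordinates $(q^1, \dots, q^n)$. The Lagrangian formulation of mechanics is defined on the tangent bundle $TQ$, in terms of a Lagrangian $L: TQ \to \mathbb{R}$. From this, one can construct an action integral, which is a functional of the curve $q: [0, T] \to Q$, given by
$$\mathfrak{S}[q] = \int_0^T L(q(t), \dot q(t))\, dt.$$
Then, Hamilton’s variational principle states that
$$\delta \mathfrak{S} = 0, \tag{1}$$
where the variation $\delta \mathfrak{S}$ is induced by an infinitesimal variation $\delta q$ of the trajectory q, subject to the condition that the variations vanish at the endpoints, i.e., $\delta q(0) = \delta q(T) = 0$. Applying standard results from the calculus of variations, we obtain the following Euler–Lagrange equations of motion,
$$\frac{d}{dt}\frac{\partial L}{\partial v}(q, \dot q) - \frac{\partial L}{\partial q}(q, \dot q) = 0. \tag{2}$$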
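The Hamiltonian formulation of mechanics is defined on the cotangent bundle $T^*Q$, and the fiberwise Legendre transform, $\mathbb{F}L: TQ \to T^*Q$, relates the tangent bundle and the cotangent bundle as follows,
$$\mathbb{F}L: (q, v) \mapsto (q, p),$$
where p is the conjugate momentum to v:
$$p = \frac{\partial L}{\partial v}(q, v). \tag{3}$$
The term fiberwise emphasizes that $\mathbb{F}L$ establishes a pointwise correspondence between $T_qQ$ and $T_q^*Q$ for any point q on Q. The cotangent bundle forms the phase space, on which one can define a Hamiltonian $H: T^*Q \to \mathbb{R}$,
$$H(q, p) = \langle p, v\rangle - L(q, v),$$
where v is viewed as a function of $(q, p)$ by inverting the Legendre transform (3), and
$$\langle p, v\rangle = \sum_{i=1}^n p_i v^i$$
denotes the duality or natural pairing between a vector v and a covector p at the point $q \in Q$. A straightforward calculation shows that
$$\frac{\partial H}{\partial p} = v$$
and
$$\frac{\partial H}{\partial q} = -\frac{\partial L}{\partial q}.$$
From these, we transform the Euler–Lagrange equations into Hamilton’s equations,
$$\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}. \tag{5}$$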
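The two identities $\partial H/\partial p = v$ and $\partial H/\partial q = -\partial L/\partial q$ can be checked numerically. The following is a minimal sketch that assumes a hypothetical one-degree-of-freedom pendulum Lagrangian $L(q, v) = \frac{1}{2} m v^2 - m g_0 (1 - \cos q)$, with made-up values of $m$ and $g_0$ (named `MASS` and `G_ACC`; neither comes from the text). H is built from the Legendre transform, and all derivatives are central differences.

```python
import math

# Hypothetical parameters for the example Lagrangian (not from the text).
MASS, G_ACC = 2.0, 9.81


def central_diff(f, x, h=1e-5):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def lagrangian(q, v):
    """Pendulum-type Lagrangian L(q, v) = m v^2 / 2 - m g0 (1 - cos q)."""
    return 0.5 * MASS * v * v - MASS * G_ACC * (1.0 - math.cos(q))


def legendre(q, v):
    """Fiber derivative p = dL/dv, computed numerically."""
    return central_diff(lambda w: lagrangian(q, w), v)


def inverse_legendre(q, p):
    """For this Lagrangian dL/dv = m v, so the inverse map is v = p / m."""
    return p / MASS


def hamiltonian(q, p):
    """H(q, p) = <p, v> - L(q, v), with v from the inverse Legendre map."""
    v = inverse_legendre(q, p)
    return p * v - lagrangian(q, v)


q, v = 0.3, 1.7
p = legendre(q, v)
dH_dp = central_diff(lambda pp: hamiltonian(q, pp), p)
dH_dq = central_diff(lambda qq: hamiltonian(qq, p), q)
dL_dq = central_diff(lambda qq: lagrangian(qq, v), q)
print(abs(dH_dp - v), abs(dH_dq + dL_dq))  # both differences should be tiny
```

This L is quadratic in v, so the inverse Legendre map is explicit. For a general hyperregular L, one would instead solve $p = \partial L/\partial v$ for v numerically.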
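The canonical symplectic form on $T^*Q$ can be identified with the bilinear form induced by the skew-symmetric matrix $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$, i.e., $\omega(z_1, z_2) = z_1^{T} J z_2$. With that identification, Hamilton’s equations can be expressed as,
$$\dot z = J\, \nabla H(z), \qquad z = (q, p).$$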
Alternatively, Hamilton’s equations (5) can be derived using Hamilton’s phase space variational principle, which states that
$$\delta \int_0^T \left[\langle p, \dot q\rangle - H(q, p)\right] dt = 0,$$
for infinitesimal variations $\delta q$, $\delta p$ that vanish at the endpoints. The infinitesimal variation of the integral is computed by differentiating under the integral, integrating by parts, and using the fact that the infinitesimal variations vanish at the endpoints, which yields
$$\int_0^T \left[\left(\dot q - \frac{\partial H}{\partial p}\right)\delta p - \left(\dot p + \frac{\partial H}{\partial q}\right)\delta q\right] dt = 0.$$
The fundamental lemma of the calculus of variations states that this integral vanishes for all variations only when the terms in parentheses multiplying the independent variations $\delta p$ and $\delta q$ vanish. Applying it, we recover Hamilton’s equations (5).
Lagrangian and Hamiltonian mechanics are typically viewed as different representations of the same dynamical system, with the Legendre transform relating the two formulations. Here, the Legendre transform $\mathbb{F}L$ (with $(\mathbb{F}L)^{-1}$ as its inverse) refers both to the map relating two sets of variables, $(q, v) \mapsto (q, p)$ with $p = \partial L/\partial v$, and to the relationship between two functions, the Lagrangian $L(q, v)$ and the Hamiltonian $H(q, p)$. The Legendre transform links pairs of convex conjugate functions. In classical mechanics, the Lagrangian L and Hamiltonian H are always related in this sense of forming a convex pair. The requirement that $L(q, v)$ be strictly convex in the variable v is referred to as hyperregularity. When the Lagrangian is positively homogeneous of degree one (or singular), the Legendre transform yields a Hamiltonian function that is identically zero. In such cases the Hamiltonian analogue of the Lagrangian system does not exist, which is problematic in the context of analytical mechanics. To address such degeneracy, it is necessary to consider Dirac mechanics on Dirac manifolds, which provides a simultaneous generalization of Lagrangian and Hamiltonian mechanics.
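The role of convexity can be illustrated with the variational form of the Legendre transform, $H(p) = \sup_v [p v - L(v)]$, for a Lagrangian with no q-dependence. The sketch below is an illustration under stated assumptions and is not part of the original development. The helper `convex_conjugate` is hypothetical: it maximizes over a bounded interval by ternary search, which is valid only for convex L. It recovers the known conjugate of the strictly convex $L(v) = \cosh v$. It also shows that for the degree-one homogeneous $L(v) = |v|$ the transform vanishes: it equals zero for $|p| \le 1$, and for $|p| > 1$ the supremum is infinite.

```python
import math


def convex_conjugate(L, p, lo=-50.0, hi=50.0, iters=200):
    """Approximate H(p) = sup_v [p v - L(v)] over [lo, hi] by ternary search.

    Assumes L is convex, so that v -> p v - L(v) is concave on the interval.
    """
    f = lambda v: p * v - L(v)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1      # the maximizer lies in [m1, hi]
        else:
            hi = m2      # the maximizer lies in [lo, m2]
    return f(0.5 * (lo + hi))


# Strictly convex (hyperregular) example: L(v) = cosh v, whose conjugate is
# H(p) = p asinh(p) - sqrt(1 + p^2).
p = 1.3
H_numeric = convex_conjugate(math.cosh, p)
H_exact = p * math.asinh(p) - math.sqrt(1.0 + p * p)

# Positively homogeneous (degree one) example: L(v) = |v|.
# For |p| < 1 the supremum is attained at v = 0 and equals 0.
H_degenerate = convex_conjugate(abs, 0.4)
print(H_numeric, H_exact, H_degenerate)
```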
In geometric mechanics, including the contemporary Dirac formulation, the Lagrangian L and Hamiltonian H are always coupled via the fiberwise Legendre transform $\mathbb{F}L$. In information geometry, it is well known that one can construct a divergence function (to be defined later) that captures the departure from such perfect coupling. In other words, we can view the Lagrangian and Hamiltonian systems as two separate systems, each endowed with its own dynamics and in some sense dual to the other, and then use the divergence function to measure their duality gap. For this reason, we review the Lagrangian and Hamiltonian formulations of mechanics in terms of $TQ$ and $T^*Q$, respectively, without necessarily assuming that the Lagrangian and Hamiltonian are related by the Legendre transform.
2.1. Lagrangian Mechanics as an Extremization System on $TQ$
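As noted previously, the Euler–Lagrange equations (2) arise from the stationarity conditions that describe the extremal curves of the action integral, over the class of varied curves that fix the endpoints. Carrying out the differentiation in (2) explicitly (with $v = \dot q$ and summation over repeated indices) yields,
$$\frac{\partial^2 L}{\partial v^i \partial v^j}\, \ddot q^j + \frac{\partial^2 L}{\partial v^i \partial q^j}\, \dot q^j - \frac{\partial L}{\partial q^i} = 0. \tag{6}$$
The fundamental tensor associated with the Lagrangian is given by,
$$g_{ij}(q, v) = \frac{\partial^2 L}{\partial v^i \partial v^j},$$
which is assumed to be positive-definite, i.e., the Lagrangian L is hyperregular. Let $g^{ij}$ denote the matrix inverse of $g_{ij}$; then (6) can be written as
$$\ddot q^i + 2 G^i(q, \dot q) = 0, \tag{7}$$
where
$$G^i(q, v) = \frac{1}{2} g^{ik}\left(\frac{\partial^2 L}{\partial v^k \partial q^j}\, v^j - \frac{\partial L}{\partial q^k}\right).$$
So, Equation (7) with the above $G^i$ is the Euler–Lagrange system in disguise, and its solution is an extremal curve of the action integral.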
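To make the spray coefficients concrete, the sketch below evaluates $G^i = \frac{1}{2} g^{ik}\big(\frac{\partial^2 L}{\partial v^k \partial q^j} v^j - \frac{\partial L}{\partial q^k}\big)$, with $g_{ij} = \partial^2 L/\partial v^i \partial v^j$, by finite differences. It assumes a free particle in polar coordinates, $L = \frac{1}{2}(v_r^2 + r^2 v_\theta^2)$, which is our own example, and it is restricted to n = 2. For this L one expects $G^r = -\frac{1}{2} r v_\theta^2$ and $G^\theta = v_r v_\theta / r$. Equivalently, $G^i = \frac{1}{2}\Gamma^i_{jk} v^j v^k$ with the polar Christoffel symbols, so the spray is affine and in particular homogeneous of degree two in v.

```python
# Assumed example (not from the text): a free particle in polar coordinates,
# L(q, v) = (v_r^2 + r^2 v_th^2) / 2 with q = (r, th), v = (v_r, v_th).
def lagrangian(q, v):
    r = q[0]
    vr, vth = v
    return 0.5 * (vr * vr + r * r * vth * vth)


def partial(f, x, i, h=1e-4):
    """Central difference of f (a function of a list) in slot i at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2.0 * h)


def dL_dv(q, v, k):
    return partial(lambda w: lagrangian(q, w), v, k)


def inv2(a):
    """Inverse of a 2x2 matrix (this sketch is restricted to n = 2)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def spray_coefficients(q, v):
    """G^i = (1/2) g^{ik} (d^2L/dv^k dq^j v^j - dL/dq^k), g_ij = d^2L/dv^i dv^j."""
    n = 2
    g = [[partial(lambda w: dL_dv(q, w, i), v, j) for j in range(n)] for i in range(n)]
    g_inv = inv2(g)
    rhs = []
    for k in range(n):
        mixed = sum(partial(lambda x: dL_dv(x, v, k), q, j) * v[j] for j in range(n))
        rhs.append(mixed - partial(lambda x: lagrangian(x, v), q, k))
    return [0.5 * sum(g_inv[i][k] * rhs[k] for k in range(n)) for i in range(n)]


q, v = [1.5, 0.2], [0.4, -0.7]
G = spray_coefficients(q, v)
# Closed form for this Lagrangian: G^r = -r v_th^2 / 2, G^th = v_r v_th / r.
G_exact = [-0.5 * q[0] * v[1] ** 2, v[0] * v[1] / q[0]]
print(G, G_exact)
```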
Recall that a smooth curve on Q can be lifted to a curve on $TQ$ in a natural way: a curve $q(t)$ becomes $(q(t), \dot q(t))$. Given arbitrary functions $G^i(q, v)$, the system of equations (7) specifies a family of curves, called a semispray. As seen above, semisprays arise naturally in variational calculus as extremal curves of the action integral associated with a Lagrangian.
Semisprays can be more generally described by a vector field. Recall that a vector field on Q is a section of $TQ$. Now, consider a vector field on the tangent bundle $TQ$; it is a section of the double tangent bundle $TTQ$. The integral surfaces of the semispray induce a decomposition of the total space $TTQ$ into a horizontal subbundle and the vertical subbundle, $TTQ = HTQ \oplus VTQ$, which defines an Ehresmann connection. A vector on $TTQ$ encodes the second-order derivative of curves on Q, and a semispray defines the following vector field V on $TQ$:
$$V = v^i \frac{\partial}{\partial q^i} - 2 G^i(q, v) \frac{\partial}{\partial v^i},$$
where the factor $-2$ is there by convention. The integral curves of the semispray satisfy the second-order ordinary differential equation (7). In this sense, a semispray is a vector field on the tangent bundle that encodes a second-order system of differential equations on the base manifold Q.
A semispray is called a full spray if the spray coefficients satisfy
$$G^i(q, \lambda v) = \lambda^2 G^i(q, v)$$
for $\lambda > 0$. In this case, the integral curves remain invariant under reparameterization by a positive constant, i.e., they satisfy homogeneity. When the semispray becomes a (full) spray, Lagrange geometry becomes Finsler geometry, and the fundamental tensor becomes the Finsler–Riemann metric tensor (which includes the Riemannian metric as a special case).
As noted above, a semispray induces an Ehresmann connection on Q, and this connection is torsion-free and typically nonlinear. Conversely, given a torsion-free connection, one can construct a semispray. The connection is homogeneous if and only if the semispray is a full spray. Moreover, if the spray is affine, then the connection is affine as well. An affine spray takes the form
$$G^i(q, v) = \frac{1}{2} \Gamma^i_{jk}(q)\, v^j v^k,$$
where the coefficients $\Gamma^i_{jk}$ define the affine connection.
To summarize, Lagrangian dynamics is related to action extremization through the Euler operator, and it leads to a semispray on the configuration manifold Q. Under suitable conditions, the Lagrangian function defined on $TQ$ leads to a torsion-free but generally nonlinear connection, which is affine only for a very special form of Lagrangian.
2.2. Hamiltonian Mechanics as a Conservative System on $T^*Q$
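Given a Hamiltonian $H: T^*Q \to \mathbb{R}$, we consider the Hamiltonian vector field $X_H \in \mathfrak{X}(T^*Q)$, where $\mathfrak{X}(T^*Q)$ denotes the space of sections of $T(T^*Q)$. It is defined by
$$X_H = \frac{\partial H}{\partial p_i}\frac{\partial}{\partial q^i} - \frac{\partial H}{\partial q^i}\frac{\partial}{\partial p_i}. \tag{8}$$
It is straightforward to verify that along the dynamical flow of $X_H$:
$$\frac{dH}{dt} = \frac{\partial H}{\partial q^i}\dot q^i + \frac{\partial H}{\partial p_i}\dot p_i = \frac{\partial H}{\partial q^i}\frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i}\frac{\partial H}{\partial q^i} = 0.$$
So, a Hamiltonian vector field advects the Hamiltonian H along its flow, and H is constant along solution curves. This implies that the Lie derivative of H along $X_H$ vanishes,
$$\mathcal{L}_{X_H} H = 0.$$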
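Formally, starting from the tautological 1-form $\theta = p_i\, dq^i$ on $T^*Q$, one obtains a 2-form, called the Poincaré 2-form,
$$\omega = -d\theta = dq^i \wedge dp_i,$$
which is the canonical symplectic form on $T^*Q$:
$$\omega(X, Y) = dq^i(X)\, dp_i(Y) - dp_i(X)\, dq^i(Y),$$
where X, Y are vector fields on $T^*Q$.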
More generally, given a Hamiltonian H along with a symplectic form ω, which is, by definition, a closed, nondegenerate 2-form, one obtains the Hamiltonian vector field $X_H$ on $T^*Q$, defined in abstract notation by
$$i_{X_H}\omega = dH, \tag{9}$$
or equivalently, in more familiar notation,
$$\omega(X_H, Y) = dH(Y) \quad \text{for all vector fields } Y.$$
One can define the Poisson bracket of two functions F and G by using their respective Hamiltonian vector fields and the symplectic form,
$$\{F, G\} = \omega(X_F, X_G).$$
For the canonical symplectic form, it has the following coordinate expression,
$$\{F, G\} = \frac{\partial F}{\partial q^i}\frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i}\frac{\partial G}{\partial q^i}.$$
In this way, Hamilton’s equations can be expressed in terms of the Poisson bracket as follows,
$$\dot q^i = \{q^i, H\}, \qquad \dot p_i = \{p_i, H\}. \tag{10}$$
By Darboux’s theorem, it is always possible to choose local coordinates $(q, p)$ on $T^*Q$, referred to as canonical coordinates, such that the symplectic form has the expression $\omega = dq^i \wedge dp_i$. In these coordinates, Hamilton’s equations defined in terms of the symplectic structure (9) and Poisson structure (10) recover the canonical Hamiltonian vector field (8).
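The coordinate expression of the Poisson bracket, $\{F, G\} = F_q G_p - F_p G_q$, can be exercised numerically. This minimal sketch assumes a hypothetical one-degree-of-freedom Hamiltonian $H = p^2/2 + (1 - \cos q)$ and uses central differences. It checks that $\{q, H\} = \partial H/\partial p$ and $\{p, H\} = -\partial H/\partial q$. It also checks that $\{H, H\} = 0$, which expresses the conservation of H along its own flow.

```python
import math


# Hypothetical one-degree-of-freedom Hamiltonian (not from the text).
def H(q, p):
    return 0.5 * p * p + (1.0 - math.cos(q))


def d_dq(f, q, p, h=1e-5):
    return (f(q + h, p) - f(q - h, p)) / (2.0 * h)


def d_dp(f, q, p, h=1e-5):
    return (f(q, p + h) - f(q, p - h)) / (2.0 * h)


def poisson(F, G, q, p):
    """Canonical bracket {F, G} = F_q G_p - F_p G_q, by central differences."""
    return d_dq(F, q, p) * d_dp(G, q, p) - d_dp(F, q, p) * d_dq(G, q, p)


def coord_q(q, p):
    return q


def coord_p(q, p):
    return p


q0, p0 = 0.8, -0.5
q_dot = poisson(coord_q, H, q0, p0)   # should match dH/dp = p
p_dot = poisson(coord_p, H, q0, p0)   # should match -dH/dq = -sin q
energy_rate = poisson(H, H, q0, p0)   # {H, H} = 0: H is conserved
print(q_dot, p_dot, energy_rate)
```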
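Note that any smooth function H on $T^*Q$ induces a Hamiltonian vector field. An arbitrary vector field X on $T^*Q$ is locally Hamiltonian, i.e., locally induced by a smooth function H, if and only if $i_X\omega$ is closed, i.e., $d(i_X\omega) = 0$. In addition, a Hamiltonian vector field preserves the volume form $\omega^n$ (Liouville’s theorem), i.e.,
$$\mathcal{L}_{X_H}\, \omega^n = 0,$$
where $\omega^n = \omega \wedge \cdots \wedge \omega$ is the n-fold exterior product of ω.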
2.3. Symplectic Maps and Symplectic Flows
A symplectic map is a diffeomorphism of $T^*Q$ that preserves its symplectic structure ω. We first consider a one-parameter family of symplectic maps $\varphi_t$ generated by the flow of a vector field X. Since the entire family of symplectic maps leaves ω invariant, it follows that $\mathcal{L}_X\omega = 0$. Cartan’s magic formula, $\mathcal{L}_X\omega = d(i_X\omega) + i_X d\omega$, together with the fact that ω is closed, shows that a vector field X is symplectic if and only if $i_X\omega$ is closed, i.e., $d(i_X\omega) = 0$. By the Poincaré lemma, this implies that $i_X\omega$ is locally exact; that is, in a neighborhood of any point, there exists some function H such that $i_X\omega = dH$. So there always exists, locally, a Hamiltonian H that generates a vector field X whose flow is symplectic with respect to ω.
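More generally, a diffeomorphism $\varphi: M_1 \to M_2$ is a symplectic map from a symplectic space $(M_1, \omega_1)$ to another symplectic space $(M_2, \omega_2)$ if
$$\varphi^*\omega_2 = \omega_1, \tag{11}$$
where $\omega_1, \omega_2$ are the symplectic forms on $M_1, M_2$, respectively. The above condition (11) holds if and only if, for any functions f, g on $M_2$, the following equivalent conditions hold:
- (i) φ preserves Poisson brackets, $\{f \circ \varphi, g \circ \varphi\}_1 = \{f, g\}_2 \circ \varphi$;
- (ii) φ maps Hamiltonian vector fields to Hamiltonian vector fields, $T\varphi \circ X_{f \circ \varphi} = X_f \circ \varphi$.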
With respect to Darboux coordinates about a point z, the condition (11) that a map φ is symplectic can be expressed locally as $D\varphi(z)^{T} J\, D\varphi(z) = J$, where $D\varphi(z)$ denotes the Jacobian of φ at z.
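The local criterion $D\varphi(z)^T J\, D\varphi(z) = J$ is easy to test numerically. The sketch below assumes the unit harmonic oscillator $H = (q^2 + p^2)/2$ on $T^*\mathbb{R}$ (one degree of freedom). It compares two one-step maps of our choosing: symplectic Euler, which satisfies the criterion, and explicit Euler, which does not. The Jacobian is approximated by central differences.

```python
# Unit harmonic oscillator H = (q^2 + p^2) / 2 on T*R (an assumed test case).
J = [[0.0, 1.0], [-1.0, 0.0]]


def symplectic_euler(z, h):
    q, p = z
    p_new = p - h * q          # p_{k+1} = p_k - h dH/dq(q_k)
    q_new = q + h * p_new      # q_{k+1} = q_k + h dH/dp(p_{k+1})
    return (q_new, p_new)


def explicit_euler(z, h):
    q, p = z
    return (q + h * p, p - h * q)


def jacobian(phi, z, h, eps=1e-6):
    """D phi(z) by central differences; entry [r][c] = d phi_r / d z_c."""
    D = [[0.0, 0.0], [0.0, 0.0]]
    for c in range(2):
        zp, zm = list(z), list(z)
        zp[c] += eps
        zm[c] -= eps
        fp, fm = phi(tuple(zp), h), phi(tuple(zm), h)
        for r in range(2):
            D[r][c] = (fp[r] - fm[r]) / (2.0 * eps)
    return D


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def symplecticity_defect(phi, z, h):
    """max |(D phi^T J D phi - J)_ij|, which vanishes (up to rounding) for a symplectic map."""
    D = jacobian(phi, z, h)
    M = matmul(matmul(transpose(D), J), D)
    return max(abs(M[i][j] - J[i][j]) for i in range(2) for j in range(2))


z, h = (0.7, -0.2), 0.1
print(symplecticity_defect(symplectic_euler, z, h))  # rounding-level
print(symplecticity_defect(explicit_euler, z, h))    # approximately h**2
```

In one degree of freedom, $D\varphi^T J\, D\varphi = \det(D\varphi)\, J$, so the defect of explicit Euler equals $\det(D\varphi) - 1 = h^2$.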
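A canonical transformation of $T^*Q$ is an automorphism $\varphi: T^*Q \to T^*Q$, $(q, p) \mapsto (\tilde q, \tilde p)$,
such that
$$d\tilde q^i \wedge d\tilde p_i = dq^i \wedge dp_i.$$
The significance of canonical transformations is that they preserve the form of Hamilton’s equations. One can check that an automorphism is canonical by verifying that $D\varphi^{T} J\, D\varphi = J$ in a Darboux coordinate chart.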
2.4. Symplectic Structure on $TQ$ Pulled Back from $T^*Q$
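If we endow $T^*Q$ with the canonical symplectic form, we can construct a symplectic form on $TQ$ in such a way that these two spaces are symplectomorphic.
The mapping between $TQ$ and $T^*Q$ can be constructed in two different ways. Case I involves the Legendre transform:
$$\mathbb{F}L: TQ \to T^*Q, \quad (q, v) \mapsto \left(q, \frac{\partial L}{\partial v}(q, v)\right),$$
and Case II involves the Riemannian metric tensor g (on Q):
$$g^\flat: TQ \to T^*Q, \quad (q, v) \mapsto (q, p), \quad p_i = g_{ij}(q)\, v^j.$$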
Note that we say that g is a pseudo-Riemannian metric on Q when g acts on a pair of tangent vectors in the tangent space $T_qQ$ at a point q of Q. It can be viewed as a symmetric (0,2)-tensor that maps $T_qQ \times T_qQ \to \mathbb{R}$. On the other hand, the symplectic form ω is a skew-symmetric (0,2)-tensor that acts on a pair of tangent vectors to $T^*Q$, so it maps $T_z(T^*Q) \times T_z(T^*Q) \to \mathbb{R}$ for $z \in T^*Q$.
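Case I. The Lagrangian $L: TQ \to \mathbb{R}$ induces the fiberwise Legendre transform $\mathbb{F}L: TQ \to T^*Q$, which is given by $(q, v) \mapsto (q, \partial L/\partial v)$. If L is hyperregular, then this map is a diffeomorphism. If we endow $TQ$ with the pullback symplectic form $\omega_L = (\mathbb{F}L)^*\omega$, which is given by
$$\omega_L = \frac{\partial^2 L}{\partial v^i \partial q^j}\, dq^i \wedge dq^j + \frac{\partial^2 L}{\partial v^i \partial v^j}\, dq^i \wedge dv^j,$$
then the Legendre transform is a symplectomorphism (by construction).
Case II. The Riemannian metric g induces the musical isomorphisms $g^\flat: TQ \to T^*Q$ and $g^\sharp: T^*Q \to TQ$, which are the operations that lower and raise the index, respectively. If we endow $TQ$ with the pullback symplectic form $\omega_g = (g^\flat)^*\omega$, which is given by
$$\omega_g = \frac{\partial g_{ik}}{\partial q^j}\, v^k\, dq^i \wedge dq^j + g_{ij}\, dq^i \wedge dv^j,$$
then the musical isomorphism $g^\flat$ is a symplectomorphism (by construction).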
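Link between Case I and Case II. The two ways of identifying $TQ$ with $T^*Q$ may coincide. This happens when g on $TQ$ coincides with the second derivatives of L with respect to the v-variable:
$$g_{ij}(q) = \frac{\partial^2 L}{\partial v^i \partial v^j}(q, v),$$
assuming L is hyperregular. The inverse of g, with components $g^{ij}$, can be obtained from
$$g^{ij}(q) = \frac{\partial^2 H}{\partial p_i \partial p_j}(q, p),$$
using the Hamiltonian H defined on $T^*Q$. Note that when the Lagrangian has the form $L(q, v) = \frac{1}{2} g_{ij}(q)\, v^i v^j - V(q)$, this corresponds to the Riemannian metric g being given by the kinetic energy metric $g_{ij}$.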
2.5. Hamilton–Jacobi Theory and Dirichlet-to-Neumann Map
In classical mechanics, the Hamilton–Jacobi equation is first introduced as a partial differential equation that the action integral satisfies. Recall that the action integral S along the solution of the Euler–Lagrange equation (2) over the time interval $[0, t]$ is
$$S(q, t) = \int_0^t L(q(s), \dot q(s))\, ds. \tag{14}$$
This is referred to as Jacobi’s solution of the Hamilton–Jacobi equation. Here, we assume that the initial position $q(0) = q_0$ is fixed and that the final position $q = q(t)$ depends on the initial velocity $\dot q(0)$. By taking a variation of the endpoint $q(t)$, one obtains a partial differential equation satisfied by $S(q, t)$:
$$\frac{\partial S}{\partial t} + H\left(q, \frac{\partial S}{\partial q}\right) = 0. \tag{15}$$
This is the Hamilton–Jacobi equation, when H does not explicitly depend on t.
Conversely, it can be shown that if $S(q, t)$ is a solution of the Hamilton–Jacobi equation, then S is a generating function for the family of canonical transformations (or symplectic flows) that describe the dynamics defined by Hamilton’s equations. This result is the theoretical basis for the powerful technique of exact integration called separation of variables.
There are two uses of $S(q, t)$. First, it serves to characterize the Dirichlet-to-Neumann map, which refers to the correspondence between the boundary data and the initial data of a dynamical system. Second, it provides a foliation of the configuration space Q around the point $q_0$, parameterized by t, through the level sets of $S(\cdot, t)$.
In the rest of the paper, we will view S as a scalar-valued function of the endpoints $(q_0, q_1) \in Q \times Q$, which we refer to as the exact discrete Lagrangian $L_d^E$,
$$L_d^E(q_0, q_1; h) = \operatorname{ext}_{\substack{q \in C^2([0, h], Q) \\ q(0) = q_0,\ q(h) = q_1}} \int_0^h L(q(t), \dot q(t))\, dt; \tag{16}$$
this is equivalent to the expression for Jacobi’s solution, as the stationarity conditions of this variational characterization are simply the Euler–Lagrange equations. Furthermore, this characterization has the added benefit that it is well-defined even if the Lagrangian is degenerate. The exact discrete Lagrangian provides us with the time-h flow map for the Euler–Lagrange equation. Given a fixed initial point $q_0$, this defines a map that takes $q_1$ to an initial velocity $v_0$, such that the Euler–Lagrange trajectory with initial condition $(q_0, v_0)$ has boundary values $q(0) = q_0$ and $q(h) = q_1$. This is the Dirichlet-to-Neumann map, $Q \times Q \to TQ$, $(q_0, q_1) \mapsto (q_0, v_0)$.
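For the harmonic oscillator, the exact discrete Lagrangian (16) has a closed form, which makes the Dirichlet-to-Neumann map explicit. The sketch below assumes $L = \frac{1}{2} m v^2 - \frac{1}{2} m \omega^2 q^2$ with illustrative values of m and ω (not from the text), and a step h with $\sin(\omega h) \ne 0$. It compares the closed form $L_d^E(q_0, q_1; h) = \frac{m\omega}{2\sin\omega h}\big[(q_0^2 + q_1^2)\cos\omega h - 2 q_0 q_1\big]$ with a Simpson-rule evaluation of the action along the boundary-value solution. It then recovers the initial velocity $v_0$ from $m v_0 = p_0 = -D_1 L_d^E(q_0, q_1)$.

```python
import math

# Assumed example (not from the text): harmonic oscillator L = m v^2/2 - m w^2 q^2/2.
MASS, OMEGA = 1.0, 2.0


def exact_discrete_lagrangian(q0, q1, h):
    """Closed-form action of the Euler-Lagrange solution with q(0) = q0, q(h) = q1.

    Requires sin(OMEGA * h) != 0, i.e. h below the first conjugate time.
    """
    s, c = math.sin(OMEGA * h), math.cos(OMEGA * h)
    return MASS * OMEGA / (2.0 * s) * ((q0 * q0 + q1 * q1) * c - 2.0 * q0 * q1)


def boundary_value_solution(q0, q1, h):
    """q(t) = q0 cos(wt) + b sin(wt), with b fixed by q(h) = q1; returns (q, qdot)."""
    b = (q1 - q0 * math.cos(OMEGA * h)) / math.sin(OMEGA * h)
    q = lambda t: q0 * math.cos(OMEGA * t) + b * math.sin(OMEGA * t)
    qdot = lambda t: OMEGA * (-q0 * math.sin(OMEGA * t) + b * math.cos(OMEGA * t))
    return q, qdot


def action_by_quadrature(q0, q1, h, n=200):
    """Composite Simpson rule for int_0^h L(q, qdot) dt along the boundary-value solution."""
    q, qdot = boundary_value_solution(q0, q1, h)
    lag = lambda t: 0.5 * MASS * qdot(t) ** 2 - 0.5 * MASS * OMEGA ** 2 * q(t) ** 2
    dt = h / n
    total = lag(0.0) + lag(h)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * lag(k * dt)
    return total * dt / 3.0


def dirichlet_to_neumann(q0, q1, h, eps=1e-6):
    """(q0, q1) -> v0 via p0 = -D1 L_d^E(q0, q1) and p0 = m v0."""
    d1 = (exact_discrete_lagrangian(q0 + eps, q1, h)
          - exact_discrete_lagrangian(q0 - eps, q1, h)) / (2.0 * eps)
    return -d1 / MASS


q0, q1, h = 0.5, 0.2, 0.3
_, qdot = boundary_value_solution(q0, q1, h)
print(exact_discrete_lagrangian(q0, q1, h), action_by_quadrature(q0, q1, h))
print(dirichlet_to_neumann(q0, q1, h), qdot(0.0))
```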
To address the Dirichlet-to-Neumann map more generally, let us first recall the definition of a retraction:
Definition 1
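([7], Definition 4.1.1 on p. 55). A retraction on a manifold Q is a smooth mapping $R: TQ \to Q$ with the following properties. Let $R_q$ be the restriction of R to $T_qQ$ for an arbitrary $q \in Q$; then,
- (i) $R_q(0_q) = q$, where $0_q$ denotes the zero element of $T_qQ$;
- (ii) with the identification $T_{0_q} T_qQ \simeq T_qQ$, $R_q$ satisfies
$$T R_q(0_q) = \mathrm{id}_{T_qQ}, \tag{17}$$
where $T R_q(0_q)$ is the tangent map of $R_q$ at $0_q$.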
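Equation (17) implies that the map $R_q$ is invertible in some neighborhood of $0_q$ in $T_qQ$. Its inverse is conveniently denoted as $R_q^{-1}$, which is defined by
$$R_q\big(R_q^{-1}(q')\big) = q' \quad \text{for } q' \text{ near } q; \tag{18}$$
it is easy to see that $R_q^{-1}$ is also invertible in some neighborhood of q for any $q \in Q$.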
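As a concrete (assumed) example of Definition 1, consider the unit sphere $S^2 \subset \mathbb{R}^3$ with the projective retraction $R_q(v) = (q + v)/\lVert q + v \rVert$ for v tangent at q. Its inverse, $R_q^{-1}(q') = q'/\langle q, q'\rangle - q$, is defined on the open hemisphere centered at q. The sketch below checks property (i), checks property (ii) via a central difference, and checks the inverse. The example and the function names are ours, not from [7] or the text.

```python
import math

# Assumed example (not from the text): the unit sphere S^2 in R^3 with the
# projective retraction R_q(v) = (q + v) / |q + v|, for v tangent at q (<q, v> = 0).


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def retract(q, v):
    w = [a + b for a, b in zip(q, v)]
    n = math.sqrt(dot(w, w))
    return [t / n for t in w]


def inverse_retract(q, q_prime):
    """R_q^{-1}(q') = q' / <q, q'> - q, defined on the open hemisphere <q, q'> > 0."""
    c = dot(q, q_prime)
    if c <= 0.0:
        raise ValueError("q' must lie in the open hemisphere centred at q")
    return [b / c - a for a, b in zip(q, q_prime)]


def tangent_map_at_zero(q, v, eps=1e-6):
    """Central-difference approximation of T R_q(0_q) applied to v."""
    plus = retract(q, [eps * t for t in v])
    minus = retract(q, [-eps * t for t in v])
    return [(a - b) / (2.0 * eps) for a, b in zip(plus, minus)]


q = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]   # a point on S^2
v = [0.2, -0.1, 0.0]                    # tangent at q: <q, v> = 0
print(retract(q, [0.0, 0.0, 0.0]), tangent_map_at_zero(q, v), inverse_retract(q, retract(q, v)))
```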
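Let us introduce a special class of coordinate charts that are compatible with a given retraction map R. A coordinate chart $(U, \phi)$, with U an open subset of Q and $\phi: U \to \mathbb{R}^n$, is said to be retraction compatible at $q \in U$ if
- (i) φ is centered at q, i.e., $\phi(q) = 0$;
- (ii) the compatibility condition
$$\phi(R_q(v)) = T_q\phi(v) \quad \text{for } v \in T_qQ \tag{19}$$
holds.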
An atlas for the manifold Q is retraction compatible if it consists of retraction compatible coordinate charts.
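In Equation (19), we assumed that $R_q(v) \in U$, and so strictly speaking (19) is defined only for those $v \in T_qQ$ with $R_q(v) \in U$. However, it is always possible to define a coordinate chart such that $\phi(U) = \mathbb{R}^n$ by stretching out the open set $\phi(U)$ to $\mathbb{R}^n$, so that (19) is defined for any $v \in T_qQ$.
Retraction maps provide a general means to relate $TQ$ to $Q \times Q$. In essence, they provide a correspondence between $(q, v) \in TQ$ and $(q, R_q(v)) \in Q \times Q$ for all $q \in Q$ (we may take q to mean the projection of a point of $Q \times Q$ onto either the first or the second slot).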
2.6. Variational Mechanics and the Pontryagin Bundle
Lagrangian and Hamiltonian mechanics can be combined into Dirac mechanics [8,9], which is described on the Pontryagin bundle $TQ \oplus T^*Q$, with position, velocity, and momentum as local coordinates.
Just as the Euler–Lagrange equations of motion arise from Hamilton’s principle, Hamilton’s equations arise from Hamilton’s phase space principle:
$$\delta \int_0^T \left[\langle p, \dot q\rangle - H(q, p)\right] dt = 0.$$
On the Pontryagin bundle $TQ \oplus T^*Q$, which has local coordinates $(q, v, p)$, a relaxation of Hamilton’s principle (1) is the Hamilton–Pontryagin variational principle. It uses a Lagrange multiplier p to impose the second-order condition $v = \dot q$,
$$\delta \int_0^T \left[L(q, v) + \langle p, \dot q - v\rangle\right] dt = 0.$$
This encapsulates Hamilton’s and Hamilton’s phase space variational principles, as well as the Legendre transform, and gives the implicit Euler–Lagrange equations,
$$\dot q = v, \qquad \dot p = \frac{\partial L}{\partial q}(q, v), \qquad p = \frac{\partial L}{\partial v}(q, v).$$
The last equation explicitly imposes the primary constraint condition, which is important when describing degenerate Lagrangian systems, such as electrical circuits. Note that p is interpreted as a Lagrange multiplier [10], in addition to its usual interpretation as the conjugate momentum. The three equations can be combined by eliminating v and p to recover the Euler–Lagrange equations.
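An important application of Hamilton–Jacobi theory is in optimal control theory. Consider a typical optimal control problem,
$$\min_{u(\cdot)} \int_0^T C(q(t), u(t))\, dt,$$
subject to the constraints,
$$\dot q = f(q, u),$$
and the boundary conditions $q(0) = q_0$ and $q(T) = q_T$. We convert the constrained optimization problem into an unconstrained one by using Lagrange multipliers p (sometimes called the costate or auxiliary variables), and define the augmented cost functional:
$$\hat{\mathfrak{S}} = \int_0^T \left[C(q, u) + \langle p, \dot q - f(q, u)\rangle\right] dt = \int_0^T \left[\langle p, \dot q\rangle - \hat H(q, p, u)\right] dt,$$
where we introduced the costate variables p, and also defined the control Hamiltonian,
$$\hat H(q, p, u) = \langle p, f(q, u)\rangle - C(q, u).$$
The variables $(q, p)$ form a Hamiltonian system, so we impose the optimality condition,
$$\frac{\partial \hat H}{\partial u}(q, p, u) = 0,$$
to obtain the equation for the optimal control $u^* = u^*(q, p)$. This gives the Hamiltonian,
$$H(q, p) = \hat H(q, p, u^*(q, p)).$$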
We also define the optimal cost-to-go function,
where for is the solution of Hamilton’s equations with the above H such that ; and is the optimal cost
and the function is defined by
Since this definition coincides with (14), the function satisfies the Hamilton–Jacobi equation (15); this reduces to the Hamilton–Jacobi–Bellman (HJB) equation for the optimal cost-to-go function :
It can also be shown that the costate p of the optimal solution is related to the solution of the Hamilton–Jacobi–Bellman equation.
3. Discrete Formulation of Geometric Mechanics
In this section, we review various schemes for discretizing mechanics (see, e.g., [11]). Geometric mechanics focuses on the differential geometric structure of configuration manifolds, the associated symplectic and Poisson structures on phase space, and the conservation laws generated by symmetries. Geometric structure-preserving numerical integration aims to preserve as many of these geometric properties as possible under discretization. The main idea is to start from the canonical symplectic form ω on $T^*Q$, and look at the symplectomorphisms that preserve ω or its pullback via the Legendre transforms to $TQ$ or $Q \times Q$.
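3.1. Symplectomorphisms from $TQ$ to $T^*Q$ and from $Q \times Q$ to $T^*Q$
Given a cotangent bundle $T^*Q$ with a symplectic form ω, we wish to endow the bundles $TQ$ and $Q \times Q$ with a symplectic structure. Given a function $L: TQ \to \mathbb{R}$, the Legendre transform is viewed as the fiber derivative $\mathbb{F}L: TQ \to T^*Q$, $(q, v) \mapsto (q, \partial L/\partial v)$. The pullback of ω with respect to $\mathbb{F}L$ yields a symplectic structure on $TQ$.
Similarly, given a function $L_d: Q \times Q \to \mathbb{R}$, we define two discrete fiber derivatives, $\mathbb{F}^{\pm}L_d: Q \times Q \to T^*Q$, which serve as discrete Legendre transforms:
$$\mathbb{F}^{+}L_d: (q_0, q_1) \mapsto (q_1, p_1) = (q_1, D_2 L_d(q_0, q_1)), \tag{23}$$
$$\mathbb{F}^{-}L_d: (q_0, q_1) \mapsto (q_0, p_0) = (q_0, -D_1 L_d(q_0, q_1)). \tag{24}$$
Here $D_1$ and $D_2$ refer to taking a derivative with respect to the first or second slot, respectively:
$$D_1 L_d(q_0, q_1) = \frac{\partial L_d}{\partial q_0}(q_0, q_1), \qquad D_2 L_d(q_0, q_1) = \frac{\partial L_d}{\partial q_1}(q_0, q_1).$$
The two choices of discrete fiber derivatives correspond to whether one views $Q \times Q$ as a bundle over Q with respect to $q_0$ or $q_1$, i.e., projection onto the first or the second slot. These induce symplectic structures on $Q \times Q$ by pullback.
Let $\tilde F: T^*Q \to T^*Q$ be a symplectic map, and let the maps denoted by the dotted arrows in the figure above be defined by requiring that the diagram commutes. Then these maps are also symplectic maps: the fiber derivative $\mathbb{F}L$ is a symplectomorphism between $TQ$ and $T^*Q$, and the discrete fiber derivatives $\mathbb{F}^{\pm}L_d$ are symplectomorphisms between $Q \times Q$ and $T^*Q$.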
3.2. Discrete Lagrangian Mechanics
The aim of geometric structure-preserving numerical integration is to preserve as many geometric conservation laws as possible under discretization. Discrete variational mechanics [11] is based on the discrete Hamilton’s principle,
$$\delta \mathfrak{S}_d = \delta \sum_{k=0}^{N-1} L_d(q_k, q_{k+1}) = 0, \tag{25}$$
where the endpoints $q_0$ and $q_N$ are fixed, and the discrete Lagrangian, $L_d: Q \times Q \to \mathbb{R}$, is a Type I generating function of the symplectic map. Recall that there exists an exact discrete Lagrangian (16) that generates the exact time-h flow of a Lagrangian system, but it cannot be computed in general. One possible method of constructing computable discrete Lagrangians is the Galerkin approach. It replaces the infinite-dimensional function space and the integral in (16) with a finite-dimensional function space and a quadrature formula, respectively. Below are two examples of discrete Lagrangians:
- (i) Symplectic midpoint integrator:
$$L_d(q_0, q_1; h) = h\, L\left(\frac{q_0 + q_1}{2}, \frac{q_1 - q_0}{h}\right);$$
this can be obtained from the Galerkin approach by considering the family of linear polynomials as the finite-dimensional function space, and the midpoint rule as the quadrature formula.
- (ii) Störmer–Verlet integrator:
$$L_d(q_0, q_1; h) = \frac{h}{2} L\left(q_0, \frac{q_1 - q_0}{h}\right) + \frac{h}{2} L\left(q_1, \frac{q_1 - q_0}{h}\right);$$
this can be obtained from the Galerkin approach by considering the family of linear polynomials as the finite-dimensional function space, and the trapezoidal rule as the quadrature formula.
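Performing variational calculus on the discrete Hamilton’s principle (25) yields the discrete Euler–Lagrange (DEL) equations,
$$D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0.$$
The above equation implicitly defines the discrete Lagrangian map $F_{L_d}: (q_{k-1}, q_k) \mapsto (q_k, q_{k+1})$ at points sufficiently close to the diagonal of $Q \times Q$. This is equivalent to the implicit discrete Euler–Lagrange (IDEL) equations,
$$p_k = -D_1 L_d(q_k, q_{k+1}), \qquad p_{k+1} = D_2 L_d(q_k, q_{k+1}),$$
which is precisely the characterization of a symplectic map in terms of a Type I generating function. It implicitly defines the discrete Hamiltonian map $\tilde F_{L_d}: (q_k, p_k) \mapsto (q_{k+1}, p_{k+1})$, which is symplectic with respect to the canonical symplectic form on $T^*Q$, i.e., $\tilde F_{L_d}^*\omega = \omega$.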
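The implicit discrete Euler–Lagrange equations translate directly into an integrator. Given $(q_k, p_k)$, solve $p_k = -D_1 L_d(q_k, q_{k+1})$ for $q_{k+1}$, then set $p_{k+1} = D_2 L_d(q_k, q_{k+1})$. The sketch below does this with the two Galerkin discrete Lagrangians: the midpoint rule $L_d = h L\big(\frac{q_0 + q_1}{2}, \frac{q_1 - q_0}{h}\big)$ and the trapezoidal (Störmer–Verlet) rule $L_d = \frac{h}{2} L\big(q_0, \frac{q_1 - q_0}{h}\big) + \frac{h}{2} L\big(q_1, \frac{q_1 - q_0}{h}\big)$. It rests on several assumptions, all of our choosing: a unit harmonic oscillator as test Lagrangian, finite-difference slot derivatives, and a secant root-finder.

```python
# Hypothetical test Lagrangian (not from the text): unit harmonic oscillator.
def lagrangian(q, v):
    return 0.5 * v * v - 0.5 * q * q


def ld_midpoint(q0, q1, h):
    return h * lagrangian(0.5 * (q0 + q1), (q1 - q0) / h)


def ld_verlet(q0, q1, h):
    v = (q1 - q0) / h
    return 0.5 * h * lagrangian(q0, v) + 0.5 * h * lagrangian(q1, v)


def d1(ld, q0, q1, h, eps=1e-6):
    return (ld(q0 + eps, q1, h) - ld(q0 - eps, q1, h)) / (2.0 * eps)


def d2(ld, q0, q1, h, eps=1e-6):
    return (ld(q0, q1 + eps, h) - ld(q0, q1 - eps, h)) / (2.0 * eps)


def discrete_hamiltonian_map(ld, q0, p0, h, tol=1e-9, max_iter=50):
    """(q_k, p_k) -> (q_{k+1}, p_{k+1}) from the implicit discrete Euler-Lagrange equations.

    Solves p_k = -D1 Ld(q_k, q_{k+1}) for q_{k+1} with a secant iteration,
    then sets p_{k+1} = D2 Ld(q_k, q_{k+1}).
    """
    residual = lambda q1: p0 + d1(ld, q0, q1, h)
    a, b = q0, q0 + h * p0 + 1e-3          # two starting guesses
    ra, rb = residual(a), residual(b)
    for _ in range(max_iter):
        if abs(rb) < tol or rb == ra:
            break
        a, b, ra = b, b - rb * (b - a) / (rb - ra), rb
        rb = residual(b)
    return b, d2(ld, q0, b, h)


def energy_drift(ld, h=0.1, steps=1000):
    """Max |E_k - E_0| of E = (p^2 + q^2)/2 along the discrete trajectory from (1, 0)."""
    q, p = 1.0, 0.0
    e0, worst = 0.5, 0.0
    for _ in range(steps):
        q, p = discrete_hamiltonian_map(ld, q, p, h)
        worst = max(worst, abs(0.5 * (p * p + q * q) - e0))
    return worst


print(energy_drift(ld_midpoint), energy_drift(ld_verlet))
```

For this quadratic Lagrangian, the midpoint discrete Lagrangian reproduces the implicit midpoint rule, which conserves the quadratic energy exactly up to rounding. The Störmer–Verlet map instead shows a small, bounded energy oscillation of order $h^2$ rather than a drift, as expected of a symplectic method.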
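The two discrete fiber derivatives induce a single unique discrete symplectic form on $Q \times Q$,
$$\Omega_{L_d} = (\mathbb{F}^{+}L_d)^*\omega = (\mathbb{F}^{-}L_d)^*\omega = -\frac{\partial^2 L_d}{\partial q_0^i \partial q_1^j}\, dq_0^i \wedge dq_1^j \quad (\text{for } \omega = dq^i \wedge dp_i),$$
and the discrete Lagrangian map $F_{L_d}$ is symplectic with respect to $\Omega_{L_d}$ on $Q \times Q$, i.e., $F_{L_d}^*\Omega_{L_d} = \Omega_{L_d}$.
The discrete Lagrangian and Hamiltonian maps can be expressed in terms of the discrete fiber derivatives as $F_{L_d} = (\mathbb{F}^{-}L_d)^{-1} \circ \mathbb{F}^{+}L_d$ and $\tilde F_{L_d} = \mathbb{F}^{+}L_d \circ (\mathbb{F}^{-}L_d)^{-1}$, respectively. This characterization of the discrete flow maps underlies the proof of the variational error analysis theorem.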
These properties may be summarized in the following commutative diagram,
If the exact discrete Lagrangian is used, then the discrete Hamiltonian map is equal to the time-h flow map of Hamilton’s equations, and the dotted arrow is the time-h flow map of the Euler–Lagrange equations.
The variational integrator approach to constructing symplectic integrators simplifies the numerical analysis of these methods. In particular, the task of establishing the geometric conservation properties and order of accuracy of the discrete Lagrangian map reduces to the simpler task of verifying certain properties of the discrete Lagrangian instead.
3.3. Discrete Hamilton–Jacobi Formulation
In the context of discrete variational mechanics, discrete Hamilton–Jacobi theory can be viewed as a composition theorem which relates the composition of symplectic maps generated by a Type II generating function with a symplectic map generated by a Type I generating function . By convention, the first argument in the composition generating function is typically omitted, and we simply consider it to be a function of the final position .
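The right discrete Hamiltonian $H_d^+(q_k, p_{k+1})$ [12] is related to the discrete Lagrangian by the Legendre transform,
$$H_d^+(q_k, p_{k+1}) = \langle p_{k+1}, q_{k+1}\rangle - L_d(q_k, q_{k+1}),$$
where we impose the condition that $p_{k+1} = D_2 L_d(q_k, q_{k+1})$. Equivalently, this can be characterized variationally by $H_d^+(q_k, p_{k+1}) = \operatorname{ext}_{q_{k+1}} \left[\langle p_{k+1}, q_{k+1}\rangle - L_d(q_k, q_{k+1})\right]$. This leads to a discrete Hamilton’s principle in phase space,
$$\delta \left\{ \langle p_N, q_N\rangle - \sum_{k=0}^{N-1} \left[\langle p_{k+1}, q_{k+1}\rangle - H_d^+(q_k, p_{k+1})\right] \right\} = 0,$$
which yields the right discrete Hamilton’s equations,
$$q_{k+1} = D_2 H_d^+(q_k, p_{k+1}), \qquad p_k = D_1 H_d^+(q_k, p_{k+1}), \tag{29}$$
which is precisely the characterization of a symplectic map in terms of a Type II generating function.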
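The continuous Hamilton–Jacobi equation can be derived by considering the evolution properties of Jacobi’s solution, which is the action integral evaluated along the solution of the Euler–Lagrange equations. One can derive a discrete Hamilton–Jacobi theory by considering a discrete analogue of Jacobi’s solution, expressed in terms of the right discrete Hamiltonian,
$$S^k = \sum_{l=0}^{k-1} \left[\langle p_{l+1}, q_{l+1}\rangle - H_d^+(q_l, p_{l+1})\right],$$
which we evaluate along a solution of the right discrete Hamilton’s equations (29). From this, we have,
$$S^{k+1}(q_{k+1}) - S^k(q_k) = \langle p_{k+1}, q_{k+1}\rangle - H_d^+(q_k, p_{k+1}), \tag{30}$$
where $p_{k+1}$ is considered to be a function of $q_k$ and $q_{k+1}$. Taking derivatives with respect to $q_{k+1}$, we obtain,
$$D S^{k+1}(q_{k+1}) = p_{k+1} + \left[q_{k+1} - D_2 H_d^+(q_k, p_{k+1})\right] \frac{\partial p_{k+1}}{\partial q_{k+1}},$$
but the term inside the brackets vanishes, as we are restricting this to a solution of the right discrete Hamilton’s equations. Therefore, we have that
$$p_{k+1} = D S^{k+1}(q_{k+1}),$$
which, when substituted into (30), yields the discrete Hamilton–Jacobi equation,
$$S^{k+1}(q_{k+1}) - S^k(q_k) - \langle D S^{k+1}(q_{k+1}), q_{k+1}\rangle + H_d^+\big(q_k, D S^{k+1}(q_{k+1})\big) = 0.$$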
3.4. Discrete Hamilton–Pontryagin Principle
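Leok and Ohsawa [13] considered the discrete Hamilton’s principle and relaxed the discrete second-order condition,
$$q_k^1 = q_{k+1}^0,$$
and reimposed it using Lagrange multipliers $p_{k+1}$, in order to derive the discrete Hamilton–Pontryagin principle on $(Q \times Q) \oplus T^*Q$,
$$\delta \sum_{k=0}^{N-1} \left[L_d(q_k^0, q_k^1) + \langle p_{k+1}, q_{k+1}^0 - q_k^1\rangle\right] = 0.$$
Here, the superscript 0 or 1 on $q_k$ refers to the first or second slot, respectively, in $(q_k^0, q_k^1) \in Q \times Q$. This in turn yields the implicit discrete Euler–Lagrange equations,
$$q_{k+1}^0 = q_k^1, \qquad p_{k+1} = D_2 L_d(q_k^0, q_k^1), \qquad p_k = -D_1 L_d(q_k^0, q_k^1),$$
where $D_1, D_2$ denote, as before, the partial derivatives with respect to the first or second argument of $L_d$. Making the identification $q_k = q_k^0$ (so that $q_{k+1} = q_k^1$), the last two equations define the discrete fiber derivatives $\mathbb{F}^{\pm}L_d$, as given by (23) and (24). The discrete fiber derivatives induce a discrete symplectic form, $\Omega_{L_d} = (\mathbb{F}^{\pm}L_d)^*\omega$. The discrete Lagrangian map preserves $\Omega_{L_d}$, and the discrete Hamiltonian map preserves ω.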
4. Information Geometry
4.1. Statistical Structure on M
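On a differentiable manifold $M$ endowed with a metric $g$ and a torsion-free affine connection $\nabla$, the compatibility of $g$ and $\nabla$ is encoded by the cubic form, the 3-tensor field $C = \nabla g$, i.e., the covariant derivative of $g$. In a local coordinate system with basis $\{\partial_i \equiv \partial/\partial x^i\}$, the metric tensor $g$ is locally represented by
$$g_{ij}(x) = g(\partial_i, \partial_j), \tag{33}$$
and the components of $\nabla$ take the contravariant form $\Gamma^k_{ij}$, where (with summation over repeated indices)
$$\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k, \tag{34}$$
or the covariant form $\Gamma_{ij,k}$, where
$$\Gamma_{ij,k} = g(\nabla_{\partial_i}\partial_j, \partial_k) = g_{lk}\,\Gamma^l_{ij}. \tag{35}$$
Torsion-freeness of $\nabla$ implies the symmetry of its (first) two lower indices, i.e.,
$$\Gamma^k_{ij} = \Gamma^k_{ji}, \qquad \Gamma_{ij,k} = \Gamma_{ji,k}. \tag{36}$$
We can now compute the cubic form,
$$C(X, Y, Z) = (\nabla_Z g)(X, Y) = Z\big(g(X, Y)\big) - g(\nabla_Z X, Y) - g(X, \nabla_Z Y),$$
or in components,
$$C_{ijk} = \partial_k g_{ij} - \Gamma_{ki,j} - \Gamma_{kj,i}.$$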
When the cubic form is identically zero, $\nabla$ is said to be parallel with respect to $g$. A torsion-free connection parallel to $g$ is called the Levi-Civita connection $\hat\nabla$ associated to the given metric $g$:
$$\hat\nabla g = 0. \tag{37}$$
The fundamental theorem of Riemannian geometry establishes the existence and uniqueness of the Levi-Civita connection, which is the torsion-free solution of (37), and is given by
$$\hat\Gamma_{ij,k} = \tfrac{1}{2}\big(\partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij}\big).$$
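As a consistency check, one can substitute the Levi-Civita coefficients into the component form of the cubic form. The short computation below assumes the conventions $\Gamma_{ij,k} = g(\nabla_{\partial_i}\partial_j, \partial_k)$ and $C_{ijk} = \partial_k g_{ij} - \Gamma_{ki,j} - \Gamma_{kj,i}$, and uses only the symmetry $g_{ij} = g_{ji}$:

$$
\begin{aligned}
\hat\Gamma_{ki,j} &= \tfrac12\big(\partial_k g_{ij} + \partial_i g_{kj} - \partial_j g_{ki}\big),\\
\hat\Gamma_{kj,i} &= \tfrac12\big(\partial_k g_{ji} + \partial_j g_{ki} - \partial_i g_{kj}\big),\\
\hat\Gamma_{ki,j} + \hat\Gamma_{kj,i} &= \partial_k g_{ij}
\quad\Longrightarrow\quad
C_{ijk} = \partial_k g_{ij} - \hat\Gamma_{ki,j} - \hat\Gamma_{kj,i} = 0.
\end{aligned}
$$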
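Generalizing the notion of parallelism of a connection is the notion of conjugacy (denoted by $*$) between two connections. A connection $\nabla^*$ is said to be conjugate (or dual) to $\nabla$ with respect to $g$ if
$$Z\,g(X, Y) = g(\nabla_Z X, Y) + g(X, \nabla^*_Z Y). \tag{38}$$
Clearly, $(\nabla^*)^* = \nabla$. Moreover, $\hat\nabla$, which satisfies (37), is special in the sense that it is self-conjugate, $(\hat\nabla)^* = \hat\nabla$. Writing out (38) in components:
$$\partial_k g_{ij} = \Gamma_{ki,j} + \Gamma^*_{kj,i}, \tag{39}$$
where $\Gamma^*_{ij,k}$ is defined analogously to (34) and (35),
$$\Gamma^*_{ij,k} = g(\nabla^*_{\partial_i}\partial_j, \partial_k) = g_{lk}\,\Gamma^{*l}_{ij},$$
so that
$$C_{ijk} = \Gamma^*_{kj,i} - \Gamma_{kj,i}.$$
Clearly, $C_{ijk} = C_{jik}$, i.e., $C$ is symmetric with respect to its first two indices. When both $\nabla$ and $\nabla^*$ are torsion-free, this implies that
$$\Gamma^*_{kj,i} - \Gamma_{kj,i} = \Gamma^*_{jk,i} - \Gamma_{jk,i},$$
so that $C_{ijk} = C_{ikj}$, which leads to $C$ being totally symmetric in all the indices,
$$C_{ijk} = C_{jik} = C_{ikj} = C_{kji} = C_{jki} = C_{kij}.$$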
Requiring that $\nabla g$ is totally symmetric imposes a compatibility condition between $g$ and $\nabla$, so that they form a Codazzi pair (see [14]). This generalizes the Levi-Civita coupling, whose corresponding cubic form is $C = 0$. Lauritzen [15] defined a statistical manifold to be a manifold $M$ equipped with a metric $g$ and a connection $\nabla$ such that (i) $\nabla$ is torsion-free; (ii) $\nabla g$ is totally symmetric. Equivalently, a manifold has a statistical structure when the conjugate (with respect to $g$) of a torsion-free connection $\nabla$ is also torsion-free. In this case, $(M, g, \nabla^*)$ is also a statistical manifold, and the Levi-Civita connection is $\hat\nabla = \frac{1}{2}(\nabla + \nabla^*)$.
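On a statistical manifold, one can define a one-parameter family of affine connections $\nabla^{(\alpha)}$, called $\alpha$-connections ($\alpha \in \mathbb{R}$) [2]:
$$\nabla^{(\alpha)} = \frac{1+\alpha}{2}\,\nabla + \frac{1-\alpha}{2}\,\nabla^*.$$
Obviously, $\nabla^{(0)} = \hat\nabla$ is the Levi-Civita connection, and the cubic form is given by $\Gamma^{(-\alpha)}_{ij,k} - \Gamma^{(\alpha)}_{ij,k} = \alpha\, C_{ijk}$.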
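A short computation makes the last statement explicit. It assumes:
- the convention $\nabla^{(\alpha)} = \frac{1+\alpha}{2}\nabla + \frac{1-\alpha}{2}\nabla^*$, so that $\nabla^{(1)} = \nabla$ and $\nabla^{(-1)} = \nabla^*$;
- a torsion-free pair whose totally symmetric cubic form can be written $C_{ijk} = \Gamma^*_{ij,k} - \Gamma_{ij,k}$;
- $\hat\nabla = \frac12(\nabla + \nabla^*)$.

$$
\begin{aligned}
\Gamma_{ij,k} &= \hat\Gamma_{ij,k} - \tfrac12 C_{ijk}, \qquad \Gamma^*_{ij,k} = \hat\Gamma_{ij,k} + \tfrac12 C_{ijk},\\
\Gamma^{(\alpha)}_{ij,k} &= \tfrac{1+\alpha}{2}\Big(\hat\Gamma_{ij,k} - \tfrac12 C_{ijk}\Big) + \tfrac{1-\alpha}{2}\Big(\hat\Gamma_{ij,k} + \tfrac12 C_{ijk}\Big) = \hat\Gamma_{ij,k} - \tfrac{\alpha}{2}\,C_{ijk},\\
\Gamma^{(-\alpha)}_{ij,k} - \Gamma^{(\alpha)}_{ij,k} &= \alpha\,C_{ijk}, \qquad \Gamma^{(0)}_{ij,k} = \hat\Gamma_{ij,k}.
\end{aligned}
$$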
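The curvature/flatness of a connection $\nabla$ is described by the Riemann curvature tensor $R$, defined as
$$R(X, Y)Z = \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z - \nabla_{[X, Y]}Z.$$
Writing $R(\partial_i, \partial_j)\partial_k = R^l_{kij}\,\partial_l$ and substituting (34), the components of the Riemann curvature tensor are
$$R^l_{kij} = \partial_i\Gamma^l_{jk} - \partial_j\Gamma^l_{ik} + \Gamma^m_{jk}\Gamma^l_{im} - \Gamma^m_{ik}\Gamma^l_{jm}.$$
By definition, $R^l_{kij}$ is antisymmetric when $i \leftrightarrow j$. The covariant form of the Riemann curvature is
$$R_{lkij} = g_{lm}\,R^m_{kij}.$$
When the connection is the Levi-Civita connection $\hat\nabla$, $R_{lkij}$ is antisymmetric when $i \leftrightarrow j$ or when $k \leftrightarrow l$, and symmetric when $(ij) \leftrightarrow (kl)$. For a general torsion-free connection, only the antisymmetry in $i \leftrightarrow j$ is guaranteed. The curvature is related to the Ricci tensor $\mathrm{Ric}$ by $\mathrm{Ric}_{jk} = R^i_{kij}$.
In addition, it can be shown that the curvatures for the pair of conjugate connections satisfy
$$R_{lkij} = -R^*_{klij}.$$
A connection is said to be flat when $R^l_{kij} \equiv 0$. So, $\nabla$ is flat if and only if $\nabla^*$ is flat. In this case, the manifold is said to be dually-flat, and the metric $g$ takes on a particular form (to be discussed later).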
4.2. Divergence Function and Induced Geometry
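A divergence function $D: M \times M \to \mathbb{R}_{\geq 0}$ on a manifold $M$ with respect to a local chart $V \subseteq \mathbb{R}^n$ is a smooth function satisfying
- (i) $D(x, y) \geq 0$ for all $x, y \in V$, with equality holding if and only if $x = y$;
- (ii) $D_i(x, x) = D_{,j}(x, x) = 0$ for all $i, j \in \{1, \ldots, n\}$;
- (iii) $-D_{i,j}(x, x)$ is positive-definite.
Here $D_i(x, y) = \partial_{x^i} D(x, y)$ and $D_{,i}(x, y) = \partial_{y^i} D(x, y)$ denote partial derivatives with respect to the $i$-th component of point $x$ and of point $y$, respectively, $D_{i,j}(x, y) = \partial_{x^i}\partial_{y^j} D(x, y)$ denotes the second-order mixed derivative, and so on.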
On a manifold, divergence functions act as pseudo-distance functions that are nonnegative but need not be symmetric. Every divergence function induces a dualistic Riemannian structure, i.e., statistical structure, which was first demonstrated by Eguchi (see [16]).
Lemma 1.
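A divergence function $D$ induces a Riemannian metric $g$ and a pair of torsion-free conjugate connections $\nabla, \nabla^*$ given as
$$g_{ij}(x) = -D_{i,j}(x, y)\big|_{y=x}, \qquad \Gamma_{ij,k}(x) = -D_{ij,k}(x, y)\big|_{y=x}, \qquad \Gamma^*_{ij,k}(x) = -D_{k,ij}(x, y)\big|_{y=x}.$$
The connections $\nabla, \nabla^*$ are torsion-free and are conjugate with respect to the induced metric $g$. Hence, the divergence function induces $(M, g, \nabla, \nabla^*)$, which is a statistical manifold (Lauritzen [15]).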
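The conjugacy claim can be checked directly from Eguchi's formulas $g_{ij} = -D_{i,j}|_{y=x}$, $\Gamma_{ij,k} = -D_{ij,k}|_{y=x}$, $\Gamma^*_{ij,k} = -D_{k,ij}|_{y=x}$. The sketch below uses only condition (ii) and the equality of mixed partials. Differentiating along the diagonal $y = x$ moves both slots, so the chain rule produces one term per slot:

$$
\begin{aligned}
&D_i(x, x) = 0 \;\Longrightarrow\; D_{ij}(x, x) + D_{i,j}(x, x) = 0 \;\Longrightarrow\; g_{ij} = -D_{i,j}(x, x) = D_{ij}(x, x) = g_{ji},\\
&\partial_k g_{ij}(x) = -D_{ik,j}(x, x) - D_{i,jk}(x, x) = \Gamma_{ki,j}(x) + \Gamma^*_{kj,i}(x).
\end{aligned}
$$

The second line is the component form of the conjugacy condition. Torsion-freeness follows because $D_{ij,k}$ and $D_{k,ij}$ are symmetric in $i, j$.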
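A popular divergence function is the Bregman divergence [17], which is associated to a strictly convex function $\Phi: V \subseteq \mathbb{R}^n \to \mathbb{R}$:
$$B_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle x - y, d\Phi(y)\rangle_n, \tag{41}$$
where $d\Phi$ denotes the exterior derivative, and $\langle\cdot,\cdot\rangle_n$ denotes the canonical pairing of a vector $x = [x^1, \ldots, x^n]$ and a covector $u = [u_1, \ldots, u_n]$ (dual to $x$), i.e.,
$$\langle x, u\rangle_n = \sum_{i=1}^n x^i u_i.$$
Where there is no danger of confusion, the subscript $n$ in $\langle\cdot,\cdot\rangle_n$ is often omitted. A basic fact in convex analysis is that the necessary and sufficient condition for a smooth function $\Phi$ to be strictly convex is
$$\Phi(x) - \Phi(y) - \langle x - y, d\Phi(y)\rangle > 0 \tag{43}$$
for all $x \neq y$.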
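Recall that when $\Phi$ is convex, its convex conjugate, $\Phi^*: V^* \subseteq \mathbb{R}^n \to \mathbb{R}$, is defined through the Legendre transform. Writing $\partial\Phi: V \to V^*$ for the gradient map $x \mapsto d\Phi(x)$,
$$\Phi^*(u) = \big\langle (\partial\Phi)^{-1}(u), u\big\rangle - \Phi\big((\partial\Phi)^{-1}(u)\big),$$
with $(\Phi^*)^* = \Phi$ and $(\partial\Phi)^{-1} = \partial\Phi^*$. Applying (43) with $y = \partial\Phi^*(u)$, we obtain the Fenchel inequality,
$$\Phi(x) + \Phi^*(u) - \langle x, u\rangle \geq 0,$$
for any $x \in V$, $u \in V^*$, with equality holding if and only if
$$u = \partial\Phi(x) \iff x = \partial\Phi^*(u), \tag{45}$$
or, in component form,
$$u_i = \frac{\partial\Phi}{\partial x^i} \iff x^i = \frac{\partial\Phi^*}{\partial u_i}.$$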
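Using conjugate variables, we can introduce the canonical divergence $A_\Phi: V \times V^* \to \mathbb{R}_{\geq 0}$ (and $A_{\Phi^*}: V^* \times V \to \mathbb{R}_{\geq 0}$),
$$A_\Phi(x, u) = \Phi(x) + \Phi^*(u) - \langle x, u\rangle = A_{\Phi^*}(u, x).$$
They are related to the Bregman divergence (41) via the relation
$$B_\Phi(x, y) = A_\Phi\big(x, \partial\Phi(y)\big) = A_{\Phi^*}\big(\partial\Phi(y), x\big).$$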
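A small numerical sketch can confirm the Fenchel inequality and the relation between the Bregman and canonical divergences. It assumes $\Phi(x) = \sum_i e^{x_i}$ on $\mathbb{R}^n$. The conjugate is then $\Phi^*(u) = \sum_i (u_i \log u_i - u_i)$ on $u > 0$, with $\partial\Phi = \exp$ and $\partial\Phi^* = \log$. The helper names are hypothetical. Besides $B_\Phi(x, y) = A_\Phi(x, \partial\Phi(y))$, the sketch also checks the standard dual identity $B_\Phi(x, y) = B_{\Phi^*}(\partial\Phi(y), \partial\Phi(x))$.

```python
import math
import random

# Assumed strictly convex function on R^n: Phi(x) = sum_i exp(x_i).
# grad Phi = exp (componentwise); Phi*(u) = sum_i (u_i log u_i - u_i) on u > 0;
# grad Phi* = log, which inverts grad Phi as the Legendre transform requires.


def phi(x):
    return sum(math.exp(t) for t in x)


def grad_phi(x):
    return [math.exp(t) for t in x]


def phi_star(u):
    return sum(t * math.log(t) - t for t in u)


def grad_phi_star(u):
    return [math.log(t) for t in u]


def pairing(x, u):
    """Canonical pairing <x, u> = sum_i x^i u_i."""
    return sum(a * b for a, b in zip(x, u))


def bregman(f, grad_f, x, y):
    """B_f(x, y) = f(x) - f(y) - <x - y, grad f(y)>."""
    return f(x) - f(y) - pairing([a - b for a, b in zip(x, y)], grad_f(y))


def canonical_divergence(x, u):
    """A_Phi(x, u) = Phi(x) + Phi*(u) - <x, u>, the gap in Fenchel's inequality."""
    return phi(x) + phi_star(u) - pairing(x, u)


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(3)]
y = [random.uniform(-1.0, 1.0) for _ in range(3)]
u = [random.uniform(0.5, 2.0) for _ in range(3)]

gap = canonical_divergence(x, u)                        # Fenchel gap, >= 0
at_dual_point = canonical_divergence(x, grad_phi(x))    # vanishes when u = grad Phi(x)
b_xy = bregman(phi, grad_phi, x, y)
a_via_grad = canonical_divergence(x, grad_phi(y))       # A_Phi(x, grad Phi(y))
b_star = bregman(phi_star, grad_phi_star, grad_phi(y), grad_phi(x))
print(gap, at_dual_point, b_xy, a_via_grad, b_star)
```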
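Though the Bregman divergence is not a metric, it satisfies a quadrilateral relation [18]: for any four points $x, y, z, w \in V$,
$$B_\Phi(x, y) + B_\Phi(z, w) - B_\Phi(x, w) - B_\Phi(z, y) = \langle x - z, \partial\Phi(w) - \partial\Phi(y)\rangle.$$
As a special case, when $w = z$, the above equality reduces to the Pythagorean (generalized cosine) relation among three points $x, y, z$:
$$B_\Phi(x, y) = B_\Phi(x, z) + B_\Phi(z, y) + \langle x - z, \partial\Phi(z) - \partial\Phi(y)\rangle.$$
In particular, when the last term vanishes, $B_\Phi(x, y) = B_\Phi(x, z) + B_\Phi(z, y)$; this is the Pythagorean relation [3] for a dually-flat space. Using this relation, one can state minimization problems for divergence functions.
The quadrilateral relation can be expressed in terms of the canonical divergence as follows,
$$A_\Phi(x, u) + A_\Phi(z, v) - A_\Phi(x, v) - A_\Phi(z, u) = \langle x - z, v - u\rangle,$$
for any four points $x, z \in V$, $u, v \in V^*$.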
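The quadrilateral and three-point relations are exact algebraic identities, so they can be checked up to rounding error. The sketch below assumes the same illustrative $\Phi(x) = \sum_i e^{x_i}$, with $\Phi^*(u) = \sum_i (u_i \log u_i - u_i)$, and uses hypothetical helper names.

```python
import math
import random

# Assumed Phi(x) = sum_i exp(x_i), with conjugate Phi*(u) = sum_i (u_i log u_i - u_i), u > 0.


def phi(x):
    return sum(math.exp(t) for t in x)


def grad_phi(x):
    return [math.exp(t) for t in x]


def phi_star(u):
    return sum(t * math.log(t) - t for t in u)


def pairing(a, b):
    return sum(s * t for s, t in zip(a, b))


def sub(a, b):
    return [s - t for s, t in zip(a, b)]


def bregman(x, y):
    return phi(x) - phi(y) - pairing(sub(x, y), grad_phi(y))


def canonical(x, u):
    return phi(x) + phi_star(u) - pairing(x, u)


random.seed(1)
x, y, z, w = ([random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4))
u, v = ([random.uniform(0.5, 2.0) for _ in range(2)] for _ in range(2))

# Quadrilateral relation for the Bregman divergence.
quad_lhs = bregman(x, y) + bregman(z, w) - bregman(x, w) - bregman(z, y)
quad_rhs = pairing(sub(x, z), sub(grad_phi(w), grad_phi(y)))

# Three-point (generalized cosine) relation: the special case w = z.
cos_lhs = bregman(x, y)
cos_rhs = bregman(x, z) + bregman(z, y) + pairing(sub(x, z), sub(grad_phi(z), grad_phi(y)))

# Quadrilateral relation for the canonical divergence.
can_lhs = canonical(x, u) + canonical(z, v) - canonical(x, v) - canonical(z, u)
can_rhs = pairing(sub(x, z), sub(v, u))
print(quad_lhs - quad_rhs, cos_lhs - cos_rhs, can_lhs - can_rhs)
```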
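Zhang [5] introduced the $\alpha$-indexed family of $\alpha$-divergence functions $D^{(\alpha)}_\Phi$ on $V \times V$, for $\alpha \in (-1, 1)$,
$$D^{(\alpha)}_\Phi(x, y) = \frac{4}{1-\alpha^2}\left[\frac{1-\alpha}{2}\Phi(x) + \frac{1+\alpha}{2}\Phi(y) - \Phi\Big(\frac{1-\alpha}{2}x + \frac{1+\alpha}{2}y\Big)\right]. \tag{47}$$
Furthermore, $D^{(\pm 1)}_\Phi$ is defined by taking $\lim_{\alpha \to \pm 1}$:
$$D^{(1)}_\Phi(x, y) = B_\Phi(x, y), \qquad D^{(-1)}_\Phi(x, y) = B_\Phi(y, x).$$
Note that $D^{(\alpha)}_\Phi$ satisfies the relation (called referential duality in [5,19]),
$$D^{(\alpha)}_\Phi(x, y) = D^{(-\alpha)}_\Phi(y, x),$$
that is, exchanging the two points in the directed distance amounts to $\alpha \leftrightarrow -\alpha$.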
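The following sketch evaluates the $\alpha$-divergence for an assumed $\Phi(x) = \sum_i e^{x_i}$. The divergence is written out explicitly as $D^{(\alpha)}_\Phi(x, y) = \frac{4}{1-\alpha^2}\big[\frac{1-\alpha}{2}\Phi(x) + \frac{1+\alpha}{2}\Phi(y) - \Phi(\frac{1-\alpha}{2}x + \frac{1+\alpha}{2}y)\big]$. The code checks referential duality. It also checks, approximately, the limits $\alpha \to \pm 1$, which should recover $B_\Phi(x, y)$ and $B_\Phi(y, x)$, respectively.

```python
import math

# Assumed strictly convex function Phi(x) = sum_i exp(x_i).


def phi(x):
    return sum(math.exp(t) for t in x)


def grad_phi(x):
    return [math.exp(t) for t in x]


def bregman(x, y):
    """B_Phi(x, y) = Phi(x) - Phi(y) - <x - y, grad Phi(y)>."""
    return phi(x) - phi(y) - sum((a - b) * g for a, b, g in zip(x, y, grad_phi(y)))


def alpha_divergence(x, y, alpha):
    """4/(1 - alpha^2) * [ (1-alpha)/2 Phi(x) + (1+alpha)/2 Phi(y)
    - Phi((1-alpha)/2 x + (1+alpha)/2 y) ], valid for -1 < alpha < 1."""
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    mix = [a * s + b * t for s, t in zip(x, y)]
    return 4.0 / (1.0 - alpha * alpha) * (a * phi(x) + b * phi(y) - phi(mix))


x, y = [0.2, -0.5], [-0.3, 0.4]
alpha = 0.4

d_xy = alpha_divergence(x, y, alpha)
referential_gap = d_xy - alpha_divergence(y, x, -alpha)   # should vanish
near_plus_one = alpha_divergence(x, y, 1.0 - 1e-6)         # approaches B_Phi(x, y)
near_minus_one = alpha_divergence(x, y, -1.0 + 1e-6)       # approaches B_Phi(y, x)
print(d_xy, referential_gap, near_plus_one, bregman(x, y), near_minus_one, bregman(y, x))
```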
4.3. Hessian Manifolds and Biorthogonal Coordinates
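Applying Lemma 1 to the Bregman divergence $B_\Phi$ induces the following metric,
$$g_{ij}(x) = \Phi_{ij}(x) \equiv \frac{\partial^2\Phi(x)}{\partial x^i\,\partial x^j},$$
and the pair of torsion-free conjugate connections,
$$\Gamma_{ij,k}(x) = 0, \qquad \Gamma^*_{ij,k}(x) = \Phi_{ijk}(x) \equiv \frac{\partial^3\Phi(x)}{\partial x^i\,\partial x^j\,\partial x^k}.$$
In this case, $(M, g, \nabla, \nabla^*)$ is dually-flat. This yields a Hessian manifold, where $g$ takes the form of the Hessian of a strictly convex function $\Phi$. More generally, as shown in [5], the $\alpha$-divergence of (47), which degenerates to the Bregman divergence when $\alpha = \pm 1$, induces an $\alpha$-independent Hessian metric $g_{ij} = \Phi_{ij}$ along with the following $\alpha$-connections:
$$\Gamma^{(\alpha)}_{ij,k}(x) = \frac{1-\alpha}{2}\,\Phi_{ijk}(x), \qquad \Gamma^{*(\alpha)}_{ij,k}(x) = \frac{1+\alpha}{2}\,\Phi_{ijk}(x).$$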
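The $\alpha$-connection formulas can be spot-checked by applying Lemma 1 numerically. The idea is to compute $g_{ij} = -D_{i,j}$, $\Gamma_{ij,k} = -D_{ij,k}$ and $\Gamma^*_{ij,k} = -D_{k,ij}$ on the diagonal with nested central differences. This is a sketch under the following assumptions:
- $\Phi(x) = \log(1 + e^{x_0} + e^{x_1})$ on $\mathbb{R}^2$;
- $\alpha = 0.5$;
- the $\alpha$-divergence written as $\frac{4}{1-\alpha^2}[\frac{1-\alpha}{2}\Phi(x) + \frac{1+\alpha}{2}\Phi(y) - \Phi(\frac{1-\alpha}{2}x + \frac{1+\alpha}{2}y)]$;
- a step $h = 10^{-3}$.

The reported error is only as accurate as finite differences allow.

```python
import math

# Assumptions: Phi(x) = log(1 + exp(x_0) + exp(x_1)) (strictly convex on R^2), alpha = 0.5,
# and Zhang's alpha-divergence written out explicitly below.
ALPHA = 0.5
N = 2


def phi(x):
    return math.log(1.0 + sum(math.exp(t) for t in x))


def divergence(z):
    """D^(alpha)(x, y) with z = x + y (the two points concatenated)."""
    x, y = z[:N], z[N:]
    a, b = (1.0 - ALPHA) / 2.0, (1.0 + ALPHA) / 2.0
    mix = [a * s + b * t for s, t in zip(x, y)]
    return 4.0 / (1.0 - ALPHA ** 2) * (a * phi(x) + b * phi(y) - phi(mix))


def mixed_partial(f, z, idx, h=1e-3):
    """Nested central differences of f at z, one level per coordinate index in idx."""
    if not idx:
        return f(z)
    zp, zm = list(z), list(z)
    zp[idx[0]] += h
    zm[idx[0]] -= h
    return (mixed_partial(f, zp, idx[1:], h) - mixed_partial(f, zm, idx[1:], h)) / (2.0 * h)


x = [0.3, -0.2]
diag = x + x  # the point (x, x); indices N..2N-1 address the second slot

max_err = 0.0
for i in range(N):
    for j in range(N):
        g_ij = -mixed_partial(divergence, diag, [i, N + j])                  # -D_{i,j}(x, x)
        max_err = max(max_err, abs(g_ij - mixed_partial(phi, x, [i, j])))
        for k in range(N):
            phi_ijk = mixed_partial(phi, x, [i, j, k])
            gamma = -mixed_partial(divergence, diag, [i, j, N + k])          # -D_{ij,k}(x, x)
            gamma_star = -mixed_partial(divergence, diag, [k, N + i, N + j])  # -D_{k,ij}(x, x)
            max_err = max(max_err,
                          abs(gamma - (1.0 - ALPHA) / 2.0 * phi_ijk),
                          abs(gamma_star - (1.0 + ALPHA) / 2.0 * phi_ijk))

print(max_err)  # small: finite-difference error only
```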
Hessian manifolds enjoy a special status in information geometry, as they exhibit biorthogonal coordinates on $M$ that are globally affine coordinates despite the nontrivial Riemannian (Hessian) metric on $M$.
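Consider the coordinate transform $x \mapsto u$,
$$\partial^i \equiv \frac{\partial}{\partial u_i} = \sum_l \frac{\partial x^l}{\partial u_i}\,\frac{\partial}{\partial x^l} = \sum_l F^{li}\,\partial_l,$$
where the Jacobian matrix $F$ is given by
$$F_{ij}(x) = \frac{\partial u_i}{\partial x^j}, \qquad F^{ij}(u) = \frac{\partial x^i}{\partial u_j}, \qquad \sum_l F_{il}\,F^{lj} = \delta_i^j,$$
where $\delta_i^j$ is the Kronecker delta, which takes the value 1 when $i = j$ and 0 otherwise. If the new coordinate system $u = [u_1, \ldots, u_n]$ (with components denoted by subscripts) is such that
$$F_{ij}(x) = g_{ij}(x), \tag{48}$$
then the $x$-coordinate system and the $u$-coordinate system are said to be biorthogonal to each other, since, from the definition of the metric tensor (33),
$$g(\partial_i, \partial^j) = g\Big(\partial_i, \sum_l F^{lj}\,\partial_l\Big) = \sum_l F^{lj}\,g_{il} = \sum_l F_{il}\,F^{lj} = \delta_i^j. \tag{49}$$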
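In this case, we define
$$g^{ij}(u) = g(\partial^i, \partial^j),$$
which is equal to $F^{ij}$, the Jacobian of the inverse coordinate transform $u \mapsto x$. We also introduce the contravariant representation of the affine connection $\nabla$ with respect to the $u$-coordinate system, and denote it by an unconventional notation $\Gamma^{ij}_r$, which is defined by
$$\nabla_{\partial^i}\partial^j = \Gamma^{ij}_r\,\partial^r;$$
similarly, $\Gamma^{*ij}_r$ is defined by
$$\nabla^*_{\partial^i}\partial^j = \Gamma^{*ij}_r\,\partial^r.$$
The covariant representation of the affine connections will be denoted by superscripted $\Gamma^{ij,k}$ and $\Gamma^{*ij,k}$,
$$\Gamma^{ij,k}(u) = g(\nabla_{\partial^i}\partial^j, \partial^k), \qquad \Gamma^{*ij,k}(u) = g(\nabla^*_{\partial^i}\partial^j, \partial^k).$$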
The representation of the affine connections in the u-coordinate system (denoted by superscripts) and the x-coordinate system (denoted by subscripts) are related by
and
Similar relations hold for the conjugate connection $\nabla^*$, in both its contravariant and covariant representations.
Analogous to (39), we have the following identity,
Therefore, with respect to biorthogonal coordinates, a pair of conjugate connections satisfy,
and
We now investigate conditions for the existence of biorthogonal coordinates on a Riemannian manifold $(M, g)$. From its definition (49), it can easily be shown that
Proposition 1
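([20]). A Riemannian manifold $(M, g)$ with metric $g_{ij}$ admits biorthogonal coordinates if and only if $\partial_k g_{ij}$ is totally symmetric, i.e.,
$$\frac{\partial g_{ij}(x)}{\partial x^k} = \frac{\partial g_{ik}(x)}{\partial x^j}. \tag{56}$$
In this case, $(M, g)$ is Hessian.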
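That (56) is satisfied for biorthogonal coordinates is evident by virtue of (48) and (49). Conversely, given (56), there must be $n$ functions $u_i(x)$, $i = 1, \ldots, n$, such that
$$\frac{\partial u_i(x)}{\partial x^j} = g_{ij}(x) = g_{ji}(x) = \frac{\partial u_j(x)}{\partial x^i}.$$
The above identity implies that there exists a function $\Phi$ such that $u_i = \partial_i\Phi$ and, by positive definiteness of $g_{ij}$, $\Phi$ would have to be a strictly convex function! In this case, the $x$- and $u$-variables satisfy (45), and the pair of convex functions, $\Phi$ and its conjugate $\Phi^*$, are related to $g_{ij}$ and $g^{ij}$ by
$$g_{ij}(x) = \frac{\partial^2\Phi(x)}{\partial x^i\,\partial x^j}, \qquad g^{ij}(u) = \frac{\partial^2\Phi^*(u)}{\partial u_i\,\partial u_j}.$$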
It follows from Proposition 1 that a necessary and sufficient condition for a Riemannian manifold to admit biorthogonal coordinates is that its Levi-Civita connection is given by
$$\hat\Gamma_{ij,k}(x) = \frac{1}{2}\,\frac{\partial g_{ij}(x)}{\partial x^k}.$$
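The equivalence with total symmetry can be seen directly. Write $T_{ijk} := \partial_k g_{ij}$, which is symmetric in $i, j$ because $g$ is. The Levi-Civita formula $\hat\Gamma_{ij,k} = \frac12(\partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij})$ then gives $2\hat\Gamma_{ij,k} = T_{jki} + T_{kij} - T_{ijk}$. The condition above therefore reads $T_{jki} + T_{kij} = 2T_{ijk}$ for every index triple. Applying it to the three cyclic orderings of $(i, j, k)$ gives:

$$
\begin{aligned}
&(a, b, c) := (T_{ijk},\, T_{jki},\, T_{kij}): \qquad b + c = 2a, \qquad c + a = 2b, \qquad a + b = 2c,\\
&\Longrightarrow\; b - a = 2(a - b) \;\Longrightarrow\; a = b = c,
\end{aligned}
$$

so $T$ is totally symmetric, i.e., (56) holds. Conversely, total symmetry makes all three equations hold trivially.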
From this, the following can be shown:
Corollary 1.
A Riemannian manifold admits a pair of biorthogonal coordinates x and u if and only if there exists a pair of conjugate connections ∇ and such that .
In other words, biorthogonal coordinates are affine coordinates for the dually-flat pair of connections. In fact, we can now define a pair of torsion-free connections by
$$\Gamma_{ij,k}(x) = 0, \qquad \Gamma^*_{ij,k}(x) = \partial_k g_{ij}(x),$$
and show that they are conjugate with respect to $g$, that is, they satisfy (38): in components, $\partial_k g_{ij} = \Gamma_{ki,j} + \Gamma^*_{kj,i}$ reduces to $\partial_k g_{ij} = \partial_i g_{kj}$, which holds by (56). This means that we select an affine connection $\nabla$ such that $x$ is its affine coordinate system. From (53), when $\nabla^*$ is expressed in $u$-coordinates,
This implies that $u$ is an affine coordinate system with respect to $\nabla^*$. Furthermore,
where $\Phi^*$ is the convex conjugate of $\Phi$. Therefore, biorthogonal coordinates are affine coordinates for a pair of dually-flat connections. On the manifold of parameterized probability density functions, if the $x$-coordinates are the natural parameters, then the $u$-coordinates are the expectation parameters.
5. Linking Information Geometry with Geometric Mechanics
5.1. Symplectic Structure on Q × Q Induced from the Divergence Function
We will now establish the connection between information geometry and discrete geometric mechanics. The divergence function from information geometry can be viewed as a Type I generating function of a symplectic map; in particular, it can be viewed as a discrete Lagrangian in the sense of discrete Lagrangian mechanics. More specifically, let the configuration manifold $Q$ be the information manifold $M$, i.e., $Q = M$, and let the discrete Lagrangian be the divergence function, i.e., $L_d = D$. With this identification, we observe that the information geometric construction of the symplectic structure on $Q \times Q$ described below is nothing but the discrete symplectic structure on $Q \times Q$ given in (28), where the discrete Lagrangian $L_d$ is replaced with the divergence function $D$.
From information geometry, a divergence function $D$ is given as a scalar-valued binary function on $Q$ (of dimension $n$). We now view it as a unary function on $Q \times Q$ (of dimension $2n$) that vanishes along the diagonal $\Delta_Q \subset Q \times Q$. In this subsection, we investigate the conditions under which a divergence function can serve as a generating function of a symplectic structure on $Q \times Q$. A compatible metric on $Q \times Q$ will also be derived. When restricted to the diagonal submanifold $\Delta_Q$, the skew-symmetric symplectic form will vanish, so $\Delta_Q$, which carries a statistical structure, is actually a Lagrangian submanifold (see [21,22]).
First, we fix a point $x$ in the first slot and a point $y$ in the second slot of $D(x, y)$. This results in two $n$-dimensional submanifolds of $Q \times Q$ that will be denoted $Q_x \simeq Q$ (with the $y$ point fixed) and $Q_y \simeq Q$ (with the $x$ point fixed), respectively. The canonical symplectic form on the cotangent bundle $T^*Q_x$ is given by
$$\omega_x = dx^i \wedge d\xi_i.$$
Given $D$, we define a map $L_D$ from $Q \times Q$ to $T^*Q_x$, which is given by
$$L_D: (x, y) \mapsto (x, \xi) = \big(x, D_i(x, y)\,dx^i\big).$$
Recall that the comma in the subscript of a divergence function indicates whether it is being differentiated with respect to a variable in the first or second slot. It is easily checked that there exists a neighborhood of the diagonal $\Delta_Q$ such that the map $L_D$ is a diffeomorphism. In particular, the Jacobian of the map $L_D$ is given by
$$\begin{pmatrix} \delta_{ij} & 0 \\ D_{ij} & D_{i,j} \end{pmatrix},$$
which is nondegenerate in a neighborhood of the diagonal $\Delta_Q$, since $D_{i,j}(x, x) = -g_{ij}(x)$.
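We calculate the pullback by $L_D$ of the canonical symplectic form $\omega_x$ on $T^*Q_x$ to $Q \times Q$:
$$L_D^*\omega_x = dx^i \wedge d\big(D_i(x, y)\big) = D_{ij}\,dx^i \wedge dx^j + D_{i,j}\,dx^i \wedge dy^j = D_{i,j}\,dx^i \wedge dy^j.$$
Here, $D_{ij}\,dx^i \wedge dx^j = 0$, since by the equality of mixed partials, $D_{ij} = D_{ji}$ always holds.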
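Similarly, we consider the canonical symplectic form $\omega_y = dy^i \wedge d\eta_i$ on $T^*Q_y$ and define a map $R_D: Q \times Q \to T^*Q_y$, which is given by
$$R_D: (x, y) \mapsto (y, \eta) = \big(y, -D_{,i}(x, y)\,dy^i\big).$$
Using $R_D$ to pull back $\omega_y$ to $Q \times Q$ yields an analogous formula:
$$R_D^*\omega_y = -dy^i \wedge d\big(D_{,i}(x, y)\big) = -D_{j,i}\,dy^i \wedge dx^j - D_{,ij}\,dy^i \wedge dy^j = D_{j,i}\,dx^j \wedge dy^i.$$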
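Therefore, starting from the canonical symplectic forms on $T^*Q_x$ and $T^*Q_y$, we obtain the same symplectic form on $Q \times Q$,
$$\omega_D(x, y) = D_{i,j}(x, y)\,dx^i \wedge dy^j. \tag{57}$$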
Theorem 1
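([22]). A divergence function $D$ induces a symplectic form $\omega_D$ (57) on $Q \times Q$ which is the pullback of the canonical symplectic forms $\omega_x$ and $\omega_y$ by the maps $L_D$ and $R_D$,
$$\omega_D = L_D^*\,\omega_x = R_D^*\,\omega_y.$$
With the symplectic form given as above, it is easy to check that $\omega_D$ is closed:
$$d\omega_D = D_{ki,j}\,dx^k \wedge dx^i \wedge dy^j + D_{i,jk}\,dy^k \wedge dx^i \wedge dy^j = 0,$$
since $D_{ki,j}$ is symmetric in $(k, i)$ and $D_{i,jk}$ is symmetric in $(j, k)$.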
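As a numerical sanity check of these statements, the sketch below assembles the coefficients $D_{i,j}(x, y)$ by central differences and evaluates the two-form on tangent vectors. It assumes:
- $\Phi(x) = \log(1 + \sum_i e^{x_i})$ on $\mathbb{R}^2$;
- the Bregman divergence $B_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle x - y, \partial\Phi(y)\rangle$ as $D$;
- the form taken as $D_{i,j}\,dx^i \wedge dy^j$ (reversing the overall sign affects none of the checks).

It verifies three things:
- on the diagonal, the coefficient block equals $-g = -\mathrm{Hess}\,\Phi$;
- off the diagonal, the block is invertible (nondegeneracy);
- the form vanishes on pairs of vectors tangent to the diagonal, consistent with the diagonal being a Lagrangian submanifold.

```python
import math

# Assumptions: Phi(x) = log(1 + sum_i exp(x_i)) on R^2, D = Bregman divergence of Phi,
# and omega_D = D_{i,j}(x, y) dx^i ^ dy^j.


def phi(x):
    return math.log(1.0 + sum(math.exp(t) for t in x))


def grad_phi(x):
    s = 1.0 + sum(math.exp(t) for t in x)
    return [math.exp(t) / s for t in x]


def hessian_phi(x):
    p = grad_phi(x)
    n = len(x)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)] for i in range(n)]


def divergence(x, y):
    return phi(x) - phi(y) - sum((a - b) * g for a, b, g in zip(x, y, grad_phi(y)))


def d_mixed(x, y, i, j, h=1e-4):
    """D_{i,j}(x, y): central difference in x^i (first slot) and y^j (second slot)."""
    def shifted(sx, sy):
        xs, ys = list(x), list(y)
        xs[i] += sx
        ys[j] += sy
        return divergence(xs, ys)
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)


def omega(x, y, u, v):
    """omega_D at (x, y) on tangent vectors u = (u_x, u_y), v = (v_x, v_y)."""
    (ux, uy), (vx, vy) = u, v
    n = len(x)
    return sum(d_mixed(x, y, i, j) * (ux[i] * vy[j] - vx[i] * uy[j])
               for i in range(n) for j in range(n))


x, y = [0.3, -0.2], [0.1, 0.25]
n = len(x)

# On the diagonal the coefficient block should equal -g = -Hess Phi(x).
block = [[d_mixed(x, x, i, j) for j in range(n)] for i in range(n)]
hess = hessian_phi(x)
diag_err = max(abs(block[i][j] + hess[i][j]) for i in range(n) for j in range(n))

# Off the diagonal the 2x2 coefficient block should be invertible (nondegeneracy).
off = [[d_mixed(x, y, i, j) for j in range(n)] for i in range(n)]
off_det = off[0][0] * off[1][1] - off[0][1] * off[1][0]

# Vectors tangent to the diagonal have the form (a, a); omega_D should vanish on them.
a, b = [1.0, 2.0], [-0.5, 0.7]
on_diagonal = omega(x, x, (a, a), (b, b))
print(diag_err, off_det, on_diagonal)
```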
It was Barndorff-Nielsen and Jupp [21] who first proposed (57) as an induced symplectic form on $Q \times Q$, apart from a minus sign; they called the divergence function a yoke.
The fact that this symplectic structure coincides with the one introduced in discrete mechanics should come as no surprise. The $Q_x$ and $Q_y$ submanifolds are related to the two ways of viewing $Q \times Q$ as a bundle over $Q$, depending on whether one chooses $(x, y) \mapsto x$ or $(x, y) \mapsto y$ as the bundle projection. Then, the maps $L_D$ and $R_D$ are, up to a sign, simply the discrete fiber derivatives $\mathbb{F}^-L_d$ and $\mathbb{F}^+L_d$, where the discrete Lagrangian $L_d$ is replaced by the divergence function $D$.
5.2. Divergence as a Type I Generating Function
As we have seen previously, symplectic maps are a natural way of describing the flow of Hamiltonian mechanics on the cotangent bundle $T^*Q$. We will now consider the characterization of symplectic maps in terms of generating functions. In particular, we review three different parameterizations based on the classification given in Goldstein [23].
Lemma 2.
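Given $F: (q_0, p_0) \mapsto (q_1, p_1)$, the map $F$ on $T^*Q$ is symplectic if and only if there (locally) exists $S: Q \times Q \to \mathbb{R}$ such that
$$p_0 = -D_1 S(q_0, q_1), \qquad p_1 = D_2 S(q_0, q_1). \tag{59}$$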
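To prove this, observe that $F$ is symplectic if and only if $dq_1 \wedge dp_1 = dq_0 \wedge dp_0$, i.e., if and only if the one-form $p_1\,dq_1 - p_0\,dq_0$ is closed, and hence locally exact,
$$p_1\,dq_1 - p_0\,dq_0 = dS(q_0, q_1),$$
from which we immediately obtain
$$p_1\,dq_1 - p_0\,dq_0 = D_1 S(q_0, q_1)\,dq_0 + D_2 S(q_0, q_1)\,dq_1.$$
Identifying the corresponding terms yields (59).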
Type I generating functions are linked with other types of generating functions via partial Legendre transforms. Fixing the first or second variable slot leads to, respectively, Type II or III generating functions, denoted $S_2(q_0, p_1)$ and $S_3(p_0, q_1)$.
Let the graph of $F$, a submanifold of $T^*Q \times T^*Q$ with local coordinates $(q_0, p_0, q_1, p_1)$, be parametrized by $(q_0, p_1)$, so that $p_0$ and $q_1$ depend on $q_0$ and $p_1$. Then $F$ on $T^*Q$ is symplectic if and only if there exists $S_2(q_0, p_1)$ such that
$$p_0 = D_1 S_2(q_0, p_1), \qquad q_1 = D_2 S_2(q_0, p_1).$$
Likewise, let the graph of $F$ be parametrized by $(p_0, q_1)$, so that $q_0$ and $p_1$ depend on $p_0$ and $q_1$. Then $F$ on $T^*Q$ is symplectic if and only if there exists $S_3(p_0, q_1)$ such that
$$q_0 = -D_1 S_3(p_0, q_1), \qquad p_1 = -D_2 S_3(p_0, q_1).$$
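In the case of discrete mechanics, the Type II generating function is denoted by $H_d^+(q_0, p_1)$ and the Type III generating function is denoted by $H_d^-(p_0, q_1)$. Since $p_1\,dq_1 - p_0\,dq_0 = dL_d(q_0, q_1)$ by Lemma 2, the one-forms $p_0\,dq_0 + q_1\,dp_1 = d(p_1\cdot q_1 - L_d)$ and $-q_0\,dp_0 - p_1\,dq_1 = d(-p_0\cdot q_0 - L_d)$ are exact, and they are the exterior derivatives of the Type II and Type III generating functions:
$$dH_d^+ = p_0\,dq_0 + q_1\,dp_1, \qquad dH_d^- = -q_0\,dp_0 - p_1\,dq_1.$$
Therefore, symplectic maps can be defined implicitly in terms of a Type II generating function $H_d^+$,
$$p_0 = D_1 H_d^+(q_0, p_1), \qquad q_1 = D_2 H_d^+(q_0, p_1),$$
and a Type III generating function $H_d^-$,
$$q_0 = -D_1 H_d^-(p_0, q_1), \qquad p_1 = -D_2 H_d^-(p_0, q_1).$$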
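More explicitly, these are related to the discrete Lagrangian $L_d(q_0, q_1)$, which is a Type I generating function, by the following partial Legendre transforms:
$$H_d^+(q_0, p_1) = p_1\cdot q_1 - L_d(q_0, q_1), \qquad H_d^-(p_0, q_1) = -p_0\cdot q_0 - L_d(q_0, q_1),$$
or equivalently,
$$L_d(q_0, q_1) = p_1\cdot q_1 - H_d^+(q_0, p_1) = -p_0\cdot q_0 - H_d^-(p_0, q_1).$$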
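The Type III relations can be checked in the same way as the Type II ones. This sketch assumes the illustrative harmonic-oscillator discrete Lagrangian $L_d(q_0, q_1) = h\,[\tfrac12((q_1 - q_0)/h)^2 - \tfrac12 q_0^2]$ with $h = 0.1$. For this choice, $p_0 = -D_1 L_d$ is linear in $q_0$ and can be solved exactly. The code then verifies $q_0 = -D_1 H_d^-$ and $p_1 = -D_2 H_d^-$ by finite differences. Function names are hypothetical.

```python
H_STEP = 0.1  # assumed time step h


def discrete_lagrangian(q0, q1, h=H_STEP):
    """Hypothetical L_d(q0, q1) = h [ (1/2)((q1 - q0)/h)^2 - q0^2/2 ] (harmonic oscillator)."""
    v = (q1 - q0) / h
    return h * (0.5 * v * v - 0.5 * q0 * q0)


def q0_from(p0, q1, h=H_STEP):
    """Solve p0 = -D_1 L_d(q0, q1) = (q1 - q0)/h + h*q0 for q0 (linear for this L_d)."""
    return (h * p0 - q1) / (h * h - 1.0)


def left_discrete_hamiltonian(p0, q1, h=H_STEP):
    """Type III generating function H_d^-(p0, q1) = -p0*q0 - L_d(q0, q1)."""
    q0 = q0_from(p0, q1, h)
    return -p0 * q0 - discrete_lagrangian(q0, q1, h)


def partial(f, args, i, eps=1e-5):
    """Central finite-difference approximation of the i-th partial derivative of f."""
    plus, minus = list(args), list(args)
    plus[i] += eps
    minus[i] -= eps
    return (f(*plus) - f(*minus)) / (2.0 * eps)


p0, q1 = 0.3, 1.02
q0 = q0_from(p0, q1)
p1 = (q1 - q0) / H_STEP  # p1 = D_2 L_d(q0, q1)

# Type III relations: q0 = -D_1 H_d^-(p0, q1) and p1 = -D_2 H_d^-(p0, q1).
residual_q0 = partial(left_discrete_hamiltonian, (p0, q1), 0) + q0
residual_p1 = partial(left_discrete_hamiltonian, (p0, q1), 1) + p1
print(q0, p1, residual_q0, residual_p1)
```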
The upshot of the above discussion is that $q_1$ and $p_1$ are Legendre dual variables with respect to $L_d$ and $H_d^+$. In the fiberwise Legendre transform $\mathbb{F}L: TQ \to T^*Q$, by contrast, it is $v$ and $p$ that are dual with respect to $L$ and $H$; the dual correspondence is $q \leftrightarrow p$ instead of $v \leftrightarrow p$. As before, the two discrete Legendre dualities are due to the two ways of viewing $Q \times Q$ as a bundle over $Q$.
In the context of information geometry, is nothing but the partial Legendre transform of the divergence function with respect to the first or second argument. Consider the Bregman divergence ,
and view it as a discrete Lagrangian . Then, its partial Legendre transform with respect to , the Type II generating function , is
which evaluates to
where
is obtained by solving
By substitution, we obtain,
Note that in this case, the Legendre dual of is no longer as given by the fiberwise Legendre map, but is rather shifted by an amount . It is interesting that still takes the form of , as does . This is a special property of taking the Bregman divergence as the generating function.
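A one-dimensional sketch shows the shift concretely. The slot convention is an assumption here: we take $L_d(q_0, q_1) = B_\Phi(q_1, q_0) = \Phi(q_1) - \Phi(q_0) - (q_1 - q_0)\Phi'(q_0)$, for which $p_1 = D_2 L_d = \Phi'(q_1) - \Phi'(q_0)$. We also make the illustrative choice $\Phi = \exp$, so that $\Phi^*(u) = u\log u - u$. Under these assumptions the partial Legendre transform has the closed form $H_d^+(q_0, p_1) = \Phi^*(p_1 + \Phi'(q_0)) - \Phi^*(\Phi'(q_0))$. The code compares this closed form against the direct definition and against a brute-force maximization over $q_1$.

```python
import math

# Assumptions: Phi(q) = exp(q) on R (Phi' = exp, Phi*(u) = u log u - u for u > 0) and
# the convention L_d(q0, q1) = B_Phi(q1, q0), i.e., the Bregman divergence linearized at q0.


def phi(q):
    return math.exp(q)


def dphi(q):
    return math.exp(q)


def phi_star(u):
    return u * math.log(u) - u


def discrete_lagrangian(q0, q1):
    return phi(q1) - phi(q0) - (q1 - q0) * dphi(q0)


def right_discrete_hamiltonian(q0, p1):
    """H_d^+(q0, p1) = p1*q1 - L_d(q0, q1), where q1 solves p1 = Phi'(q1) - Phi'(q0)."""
    q1 = math.log(p1 + dphi(q0))  # requires p1 + Phi'(q0) > 0
    return p1 * q1 - discrete_lagrangian(q0, q1)


q0, p1 = 0.2, 0.5
h_plus = right_discrete_hamiltonian(q0, p1)
closed_form = phi_star(p1 + dphi(q0)) - phi_star(dphi(q0))

# Brute-force check of the extremization: q1 maximizes p1*q - L_d(q0, q), concave in q.
q1 = math.log(p1 + dphi(q0))
grid = [q1 + 0.001 * k for k in range(-1000, 1001)]
best = max(p1 * q - discrete_lagrangian(q0, q) for q in grid)
print(h_plus, closed_form, best)
```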
5.3. -Divergence for Decoupling L and H
In geometric mechanics, Hamiltonian and Lagrangian dynamics represent one and the same dynamics: they are coupled, because $L$ and $H$ are related by the fiberwise Legendre transform $\mathbb{F}L: TQ \to T^*Q$; in fact, they are a Legendre pair. The conservation properties of the Hamiltonian approach with respect to the underlying symplectic geometry and the variational principles that arise in the Lagrangian and Hamilton–Jacobi theories reflect two sides of the same coin.
To appreciate this, we look at the interaction of three manifolds, $TQ$, $T^*Q$ and $Q \times Q$. We take $(q_0, q_1) \in Q \times Q$ to be the configuration variable $q$ at successive time steps; it is the dynamical equation that governs the evolution from $q_0$ to $q_1$.

The Hamiltonian dynamics, which is encoded in the preservation of the symplectic form $\omega$ of $T^*Q$, governs the discrete Hamiltonian flow $(q_0, p_0) \mapsto (q_1, p_1)$ through a Type I generating function $L_d(q_0, q_1)$. On the other hand, the Lagrangian flow is governed by the retraction map relating $TQ$ and $Q \times Q$, such as the Dirichlet-to-Neumann map induced by Jacobi's solution to the Hamilton–Jacobi equation. Those two dynamic updates need not be identical.

In mechanics, the Hamiltonian energy conservation system and the Lagrangian extremization system lead to one and the same dynamics, precisely because $L$ and $H$ are linked through the fiberwise Legendre transform at each $q \in Q$:
$$L(q, v) + H(q, p) - \langle v, p\rangle = 0 \qquad \text{whenever } p = \mathbb{F}L(q, v).$$
In other words, $L$ and $H$ are perfectly coupled, with no duality gap.
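Information geometry, on the other hand, starts with a divergence (or contrast) function on the Pontryagin bundle $TQ \oplus T^*Q$, which measures the discrepancy between the two systems. Given $L$ on $TQ$ and $H$ on $T^*Q$, we write
$$\mathcal{A}_{L,H}(q, v, p) = L(q, v) + H(q, p) - \langle v, p\rangle, \qquad (v, p) \in T_qQ \oplus T_q^*Q.$$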
Theorem 2.
Let and be strictly convex functions, defined on and in terms of the variables and , respectively. Then, for the following statements, any two imply the rest:
- (i)
- ;
- (ii)
- and are (fiberwise) convex conjugate (Legendre dual) to each other;
- (iii)
- ;
- (iv)
- .
When , with , and
Then,
The Euler–Lagrange equations are equivalent to
Our insight here is that the duality gap does not have to vanish identically. The consequence is that we do not require the Lagrangian dynamics (extremization dynamics) and the Hamiltonian dynamics (conservation dynamics) to be coupled; they are allowed to evolve independently. The duality-gap function allows us to study fiberwise symplectomorphisms of Dirac manifolds.
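A minimal sketch of the duality gap at a single point $q$ rests on three assumptions. The Lagrangian is a kinetic energy $L(v) = \tfrac12 v^T G v$ with a constant, hypothetical positive-definite matrix $G$. Its Legendre dual is $H(p) = \tfrac12 p^T G^{-1} p$. The gap is $L(v) + H(p) - \langle v, p\rangle$.

With these choices, the gap vanishes exactly on the graph $p = Gv$ of the fiberwise Legendre map and equals $\tfrac12 (Gv - p)^T G^{-1} (Gv - p) \ge 0$ elsewhere. Replacing $H$ by a Hamiltonian that is not Legendre dual to $L$ (here $2H$, chosen only for illustration) leaves a nonzero gap even along $p = Gv$, which is the kind of decoupling described above.

```python
# Assumptions: fixed base point q, L(v) = (1/2) v^T G v with a hypothetical constant SPD matrix G,
# and duality gap A(v, p) = L(v) + H(p) - <v, p> on the fiber of the Pontryagin bundle.
G = [[2.0, 0.5], [0.5, 1.0]]
DET_G = G[0][0] * G[1][1] - G[0][1] * G[1][0]
G_INV = [[G[1][1] / DET_G, -G[0][1] / DET_G], [-G[1][0] / DET_G, G[0][0] / DET_G]]


def quad(M, a):
    """(1/2) a^T M a for 2x2 M."""
    return 0.5 * sum(a[i] * M[i][j] * a[j] for i in range(2) for j in range(2))


def mat_vec(M, a):
    return [M[0][0] * a[0] + M[0][1] * a[1], M[1][0] * a[0] + M[1][1] * a[1]]


def lagrangian(v):
    return quad(G, v)


def legendre_dual_hamiltonian(p):
    """H = L*, i.e., (1/2) p^T G^{-1} p."""
    return quad(G_INV, p)


def decoupled_hamiltonian(p):
    """A Hamiltonian that is NOT the Legendre dual of L (2 H, chosen for illustration)."""
    return 2.0 * quad(G_INV, p)


def duality_gap(v, p, hamiltonian):
    return lagrangian(v) + hamiltonian(p) - (v[0] * p[0] + v[1] * p[1])


v = [0.7, -1.3]
p_legendre = mat_vec(G, v)  # p = FL(v) = G v
p_other = [0.4, 0.9]

gap_on_graph = duality_gap(v, p_legendre, legendre_dual_hamiltonian)   # 0: coupled
gap_off_graph = duality_gap(v, p_other, legendre_dual_hamiltonian)     # > 0 (Fenchel)
r = [p_legendre[0] - p_other[0], p_legendre[1] - p_other[1]]
closed_form = quad(G_INV, r)  # (1/2) (Gv - p)^T G^{-1} (Gv - p)
gap_decoupled = duality_gap(v, p_legendre, decoupled_hamiltonian)     # equals L(v) > 0
print(gap_on_graph, gap_off_graph, closed_form, gap_decoupled, lagrangian(v))
```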
Let us consider the case that (ii) holds, i.e., and are Legendre duals to each other. Then, the canonical divergence can be written as the Bregman divergence and , after applying the fiberwise Legendre map or ,
This implies that,
and they satisfy,
This is the reference-representation biduality [18,19], which is satisfied whenever L and H are Legendre duals of each other.
5.4. Variational Error Analysis
Recall that we previously defined the exact discrete Lagrangian $L_d^E$ (16), which is related to Jacobi's solution of the Hamilton–Jacobi equation. The significance of the exact discrete Lagrangian is that it generates the exact discrete time flow of a Lagrangian system, but in general it cannot be computed explicitly. Instead, a computable discrete Lagrangian $L_d$ is used to construct a discretization of Lagrangian mechanics, and it induces the discrete Lagrangian map $F_{L_d}: Q \times Q \to Q \times Q$.
Since discrete variational mechanics is expressed in terms of discrete Lagrangians, and the exact discrete Lagrangian generates the exact flow map of a continuous Lagrangian system, it is natural to ask the following question. Can we characterize the order of accuracy of the discrete Lagrangian map $F_{L_d}$, as an approximation of the exact flow map, in terms of the extent to which $L_d$ approximates the exact discrete Lagrangian $L_d^E$? This is indeed possible, and is referred to as variational error analysis. Theorem 2.3.1 of [11] shows that if a discrete Lagrangian approximates the exact discrete Lagrangian to order $p$, i.e., $L_d(q_0, q_1; h) = L_d^E(q_0, q_1; h) + \mathcal{O}(h^{p+1})$, then the discrete Hamiltonian map, $\tilde F_{L_d}: T^*Q \to T^*Q$, viewed as a one-step method, is order $p$ accurate.
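As an illustration of variational error analysis (not of Theorem 3 itself), one can estimate the order to which a computable discrete Lagrangian matches the exact one. This sketch assumes the harmonic oscillator $L(q, \dot q) = \tfrac12\dot q^2 - \tfrac12 q^2$. Its exact discrete Lagrangian has the classical closed form $L_d^E(q_0, q_1; h) = [(q_0^2 + q_1^2)\cos h - 2q_0q_1]/(2\sin h)$, and the sketch compares it with a midpoint-rule $L_d$. If $L_d = L_d^E + \mathcal{O}(h^{p+1})$, halving $h$ should divide the local error by roughly $2^{p+1}$.

```python
import math

# Assumptions: harmonic oscillator L = qdot^2/2 - q^2/2 with its closed-form exact discrete
# Lagrangian, and a midpoint-rule discrete Lagrangian as the computable approximation.


def exact_discrete_lagrangian(q0, q1, h):
    return ((q0 * q0 + q1 * q1) * math.cos(h) - 2.0 * q0 * q1) / (2.0 * math.sin(h))


def midpoint_discrete_lagrangian(q0, q1, h):
    v = (q1 - q0) / h
    q_mid = 0.5 * (q0 + q1)
    return h * (0.5 * v * v - 0.5 * q_mid * q_mid)


def local_error(h, q0=1.0):
    q1 = q0 * math.cos(h)  # endpoint of the exact solution with q(0) = q0, qdot(0) = 0
    return abs(midpoint_discrete_lagrangian(q0, q1, h) - exact_discrete_lagrangian(q0, q1, h))


steps = [0.1, 0.05, 0.025]
errors = [local_error(h) for h in steps]
# If L_d = L_d^E + O(h^(p+1)), halving h divides the error by about 2^(p+1).
observed = [math.log2(errors[k] / errors[k + 1]) for k in range(len(steps) - 1)]
print(errors, observed)  # observed exponents should be close to 3
```

For this example the local error behaves like $h^3/24$, so the observed exponent approaches 3, i.e., $p = 2$. By Theorem 2.3.1 of [11], the associated discrete Hamiltonian map is then second-order accurate.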
As mentioned above, the divergence function from information geometry can serve as a Type I generating function of a symplectic map, and hence it can be viewed as a discrete Lagrangian in the sense of discrete Lagrangian mechanics. A divergence function also generates the Riemannian metric and affine connection structures on the diagonal $\Delta_Q$ (Lemma 1), in addition to generating the symplectic structure on $Q \times Q$. Viewed in this way, a natural question is to what extent we can view the divergence function as corresponding to the exact Lagrangian flow of an associated continuous Lagrangian. We can show the following.
Theorem 3.
The exact discrete Lagrangian associated with the geodesic flow, with respect to the induced metric g, can be approximated by a divergence function up to third order accuracy,
if and only if Q is a Hessian manifold, i.e., $D$ is the Bregman divergence $B_\Phi$, for some strictly convex function Φ.
Proof.
Let us expand the exact discrete Lagrangian to obtain,
where .
From the definition of a divergence function:
Differentiating with respect to q,
so
Differentiating with respect to q again,
Observe that the left-hand side is the metric induced by the divergence function,
Expanding around for :
we obtain
where
Clearly, , and
Comparing the corresponding terms in powers of h, we obtain,
This, according to Proposition 1, demonstrates that the manifold is Hessian, and hence dually-flat. So, for the expansions to agree to , the inducing divergence function must be the Bregman divergence . ☐
6. Summary
In this paper, we show the differences and connections between geometric mechanics and information geometry in canonically prescribing differential geometric structures on a smooth manifold Q. The Legendre transform plays crucial roles in both; however, it serves very different purposes in each. In geometric mechanics, the fiberwise Legendre map $\mathbb{F}L$ serves to link the cotangent bundle $T^*Q$ with the tangent bundle $TQ$. In information geometry, by contrast, the Legendre transform relates the pair of biorthogonal coordinates, which are special coordinates on a dually-flat manifold Q. More specifically, $\mathbb{F}L$ (or its inverse $\mathbb{F}H$) is invoked to establish the isomorphism between $TQ$ and $T^*Q$ in geometric mechanics. In information geometry, a Hessian metric $g$ built upon a convex function $\Phi$ on Q is used for the correspondence between two coordinate systems on Q, and also for potentially (but not necessarily) establishing a correspondence between $TQ$ and $T^*Q$.
The link between information geometry and discrete mechanics is much stronger when one considers the discrete version (as opposed to the traditional, continuous version) of geometric mechanics. Both endow $Q \times Q$ with a symplectic structure, through the use of a discrete Lagrangian in the case of geometric mechanics and a divergence function in the case of information geometry. In fact, both are Type I generating functions that induce $\omega$ on $Q \times Q$ via pullback from the canonical symplectic structure on $T^*Q$. Using the Legendre transform, Type II generating functions can be constructed, which lead to the (right) discrete Hamiltonian in geometric mechanics and to the dual divergence function in information geometry.
Our analyses draw a distinction between three uses of the Legendre transform:
- the fiberwise Legendre map, which is used in the continuous mechanics setting;
- the Legendre transform between biorthogonal coordinates, which is used in information geometry;
- the Legendre transform between Type I and Type II generating functions, which is used in the setting of both discrete geometric mechanics and information geometry.

The distinctions are more prominent when one considers the Pontryagin bundle $TQ \oplus T^*Q$. There, we can construct a divergence function that actually measures the duality gap between the Lagrangian function $L$ and the Hamiltonian function $H$ that generate a pair of (forward and backward) Legendre maps. In so doing, we demonstrate that information geometry can be viewed as an extension of geometric mechanics based on Dirac mechanics and geometry, with a full-blown duality between the Lagrangian and Hamiltonian components.
7. Discussion and Future Directions
Noda [24] showed that, with respect to the symplectic structure on , the Hamiltonian flow of the canonical divergence induces geodesic flows for ∇ and . He interpreted biorthogonal coordinates as a single coordinate system on , in a way that is consistent with treating as the Type I generating function on . It remains unclear how the resulting Hamiltonian flow is related to dynamical flow on the Dirac manifold.
In another related work, Ay and Amari [25] sought to characterize the canonical form of divergence functions for general (non-dually-flat) manifolds. They investigated the retraction map between $TQ$ and $Q \times Q$, which we discussed in Section 2.5, and used the exponential map associated to any torsion-free affine connection ∇ on $Q$. This approach, based on parallel transport, in essence generates a semispray on $TQ$. It is quite different from characterizing the dynamics using the Hamiltonian flow on $T^*Q$. Note that even though one may define a symplectic structure (through pullback) on $TQ$ as well, Ay and Amari [25] treat the semispray on $TQ$ as the primary geometric object. Future research will clarify its relation to our approach, which is based on defining a symplectic structure on $Q \times Q$ directly.
Finally, comparing information geometry with geometric mechanics may shed light on universal machine learning algorithms. In machine learning or state estimation applications, we wish to have the estimated distribution be influenced by the observations, so that the estimated distribution eventually becomes consistent with the observed data. Let denote the sequence of predictions by (possibly a series of) model distributions , and let denote the actual data generated by an unknown distribution that we are trying to estimate. In practice, the divergence functions are constructed so that the pseudo-distance between two distributions and can be computed using only complete information about and samples from . As such, we can measure the mismatch between the current prediction and the actual data using , since the asymmetry in the definition of is such that we only require samples from the true but unknown distribution. So, adding a momentum term to ensure gentle change in model predictions, a possible choice of a discrete Lagrangian for generating the discrete dynamics for the machine learning application might be given by
where the first term can be interpreted as the action associated with the kinetic energy, and the second term is the action associated with the potential energy. By construction, the term vanishes when the prediction is consistent with the actual observation , and it is positive otherwise, so the term can be viewed as a potential energy term that penalizes mismatch between the estimated distribution and the observational data. Our variational error analysis may thus shed light on an asymptotic theory of inference where sample size is akin to discretization step .
The link between geometric mechanics and information geometry, as revealed through our present investigation, is still rather preliminary. The possibility of a unified mathematical framework for information and mechanics is intriguing and remains a challenge for future research.
Acknowledgments
We thank the anonymous reviewers for helping to improve this paper. The first author is supported by NSF grants CMMI-1334759 and DMS-1411792. The second author is supported by DARPA/ARO Grant W911NF-16-1-0383.
Author Contributions
Both authors contributed equally to the research and writing of the manuscript. Both authors have read and approved the final manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Marsden, J.E.; Ratiu, T.S. Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems, 2nd ed.; Springer: New York, NY, USA, 1999.
- Amari, S. Differential-Geometrical Methods in Statistics; Lecture Notes in Statistics; Springer: New York, NY, USA, 1985.
- Amari, S.; Nagaoka, H. Methods of Information Geometry (Translations of Mathematical Monographs); Translated from the 1993 Japanese original by Daishi Harada; American Mathematical Society: Providence, RI, USA; Oxford University Press: Oxford, UK, 2000.
- Zhang, J. Divergence Functions and Geometric Structures They Induce on a Manifold. In Geometric Theory of Information; Nielsen, F., Ed.; Springer: Berlin, Germany, 2014; pp. 1–30.
- Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. 2004, 16, 159–195.
- Zhang, J.; Matsuzoe, H. Dualistic Riemannian Manifold Structure Induced from Convex Functions. In Advances in Applied Mathematics and Global Optimization: In Honor of Gilbert Strang; Gao, D., Sherali, H., Eds.; Springer: Boston, MA, USA, 2009; pp. 437–464.
- Abraham, R.; Marsden, J. Foundations of Mechanics, 2nd ed.; Benjamin/Cummings Publishing: Reading, MA, USA, 1978.
- Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part I: Implicit Lagrangian systems. J. Geom. Phys. 2006, 57, 133–156.
- Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part II: Variational structures. J. Geom. Phys. 2006, 57, 209–250.
- Tulczyjew, W.M.; Urbanski, P. A slow and careful Legendre transformation for singular Lagrangians. Acta Phys. Pol. B 1999, 30, 2909–2978.
- Marsden, J.; West, M. Discrete mechanics and variational integrators. Acta Numer. 2001, 10, 357–514.
- Lall, S.; West, M. Discrete variational Hamiltonian mechanics. J. Phys. A 2006, 39, 5509–5519.
- Leok, M.; Ohsawa, T. Variational and Geometric Structures of Discrete Dirac Mechanics. Found. Comput. Math. 2011, 11, 529–562.
- Simon, U. Affine differential geometry. In Handbook of Differential Geometry; Elsevier Science: Amsterdam, The Netherlands, 2000; Volume I, pp. 905–961.
- Lauritzen, S. Statistical manifolds. In Differential Geometry in Statistical Inference; IMS Lecture Notes; Amari, S., Barndorff-Nielsen, O., Kass, R., Lauritzen, S., Rao, C., Eds.; IMS: Hayward, CA, USA, 1987; Volume 10, pp. 163–216.
- Eguchi, S. Geometry of minimum contrast. Hiroshima Math. J. 1992, 22, 631–647.
- Bregman, L.M. The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Comput. Math. Phys. 1967, 7, 200–217.
- Zhang, J. Dual scaling of comparison and reference stimuli in multi-dimensional psychological space. J. Math. Psychol. 2004, 48, 409–424.
- Zhang, J. Reference duality and representation duality in information geometry. AIP Conf. Proc. 2015, 1641, 130–146.
- Shima, H. The Geometry of Hessian Structures; World Scientific Publishing: Hackensack, NJ, USA, 2007.
- Barndorff-Nielsen, O.E.; Jupp, P.E. Yokes and symplectic structures. J. Stat. Plan. Inference 1997, 63, 133–146.
- Zhang, J.; Li, F. Symplectic and Kähler Structures on Statistical Manifolds Induced from Divergence Functions. In Proceedings of the Geometric Science of Information, Paris, France, 28–30 August 2013.
- Goldstein, H. Classical Mechanics, 2nd ed.; Addison-Wesley Series in Physics; Addison-Wesley Publishing: Reading, MA, USA, 1980.
- Noda, T. Symplectic structures on statistical manifolds. J. Aust. Math. Soc. 2011, 90, 371–384.
- Ay, N.; Amari, S. A novel approach to canonical divergences within information geometry. Entropy 2015, 17, 8111–8129.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).