Abstract
The statistical bundle is the set of couples $(Q,W)$ of a probability density $Q$ and a random variable $W$ such that $\mathbb{E}_Q[W] = 0$. On a finite state space, we assume $Q$ to be a probability density with respect to the uniform probability and give an affine atlas of charts such that the resulting manifold is a model for Information Geometry. Velocity and acceleration of a one-dimensional statistical model are computed in this set-up. The Euler–Lagrange equations are derived from the Lagrange action integral. An example Lagrangian using minus the entropy as potential energy is briefly discussed.
1. Introduction
The set-up of classical Lagrangian Mechanics is a finite-dimensional Riemannian manifold; see, for example, the monographs by V.I. Arnold ([1], Chapters III–IV), R. Abraham and J.E. Marsden ([2], Chapter 3), and J.E. Marsden and T.S. Ratiu ([3], Chapter 7). Classical Information Geometry, as it was first defined in the monograph by S.-I. Amari and H. Nagaoka [4], views parametric statistical models as manifolds endowed with a pair of dually flat connections. In a recent paper, M. Leok and J. Zhang [5] have pointed out the natural relation between these two topics and have given a wide overview of the mathematical structures involved.
In the present paper, we take up the same research program with two further qualifications. First, we adopt a non-parametric approach by considering the full set of positive probability functions on a finite set, as was done, for example, in our review paper [6]. The discussion is restricted here to a finite state space to avoid difficult technical problems. Second, we consider a specific expression of the tangent space of the statistical manifold, which is a Hilbert bundle that we call the statistical bundle. Our aim is to emphasize the basic statistical intuition of the geometric quantities involved. For this reason, we systematically use the language of non-parametric differential geometry as it is developed in the monograph by S. Lang [7].
Herein, we use our version of Information Geometry; see the review paper [6]. Preliminary versions of this paper have been presented at the SigmaPhy2017 Conference held in Corfu, Greece, 10–14 July 2017, and at a seminar held at Collegio Carlo Alberto, Moncalieri, on 5 September 2017. In these early versions, we did not refer to Leok and Zhang’s work, which we were unaware of at that time.
In Section 2, we review the definition and properties of the statistical bundle, and of the affine atlas that endows it with both a manifold structure and a natural family of transports between the fibers. In Section 3, we develop the formalism of the tangent space of the statistical bundle and derive the expression of the velocity and the acceleration of a one-dimensional statistical model in the given affine atlas. Section 4 introduces Lagrangian functions on the statistical bundle and their partial derivatives, and Section 5 derives the Euler–Lagrange equations from the action integral; a running example is discussed throughout Sections 4 and 5.
2. Statistical Bundle
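We consider a finite sample space $\Omega$, with $\#\Omega = N$. The probability simplex is $\Delta(\Omega)$, and $\Delta^\circ(\Omega)$ is its interior. The uniform probability on $\Omega$ is denoted as $\mu$, $\mu(x) = 1/N$, $x\in\Omega$. The maximal exponential family $\mathcal{E}(\mu)$ is the set of all strictly positive probability densities of $(\Omega,\mu)$. The expected value of $f\colon\Omega\to\mathbb{R}$ with respect to the density $P$ is denoted $\mathbb{E}_P[f] = \mathbb{E}_\mu[fP]$.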
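In [6,8,9], we made the case for the statistical bundle being the key structure of Information Geometry. The statistical bundle with base $\mathcal{E}(\mu)$ is
$$S\mathcal{E}(\mu) = \left\{(Q,W) \mid Q\in\mathcal{E}(\mu),\ \mathbb{E}_Q[W] = 0\right\}.$$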
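The statistical bundle is a semi-algebraic subset of $\mathbb{R}^\Omega\times\mathbb{R}^\Omega$; i.e., it is defined by algebraic equations and strict inequalities. It is trivially a real manifold. At each $Q\in\mathcal{E}(\mu)$, the fiber $S_Q\mathcal{E}(\mu) = \{W \mid \mathbb{E}_Q[W] = 0\}$ is endowed with the scalar product
$$\langle V,W\rangle_Q = \mathbb{E}_Q[VW],\qquad V,W\in S_Q\mathcal{E}(\mu).$$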
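To this structure we add a special affine atlas of charts, which gives it the structure of an affine manifold of interest in statistical applications. The exponential atlas of the statistical manifold is the collection of charts given for each $P\in\mathcal{E}(\mu)$ by
$$s_P\colon S\mathcal{E}(\mu)\ni(Q,W)\mapsto\left(s_P(Q),\ {}^P\mathbb{U}_QW\right)\in S_P\mathcal{E}(\mu)\times S_P\mathcal{E}(\mu), \tag{1}$$
where (with a slight abuse of notation)
$$s_P(Q) = \log\frac{Q}{P} - \mathbb{E}_P\left[\log\frac{Q}{P}\right],\qquad {}^P\mathbb{U}_QW = W - \mathbb{E}_P[W]. \tag{2}$$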
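As $s_P(P,W) = (0,W)$, we say that $s_P$ is the chart centered at P. If $s_P(Q,W) = (U,V)$, it is easy to derive the exponential form of Q as a density with respect to P; namely, $Q = \mathrm{e}^{U - K_P(U)}\cdot P$. As $\mathbb{E}_P[U] = 0$, then $K_P(U) = -\mathbb{E}_P[\log(Q/P)]$, and, as $\mathbb{E}_\mu[Q] = 1$, the cumulant function $K_P$ is defined on $S_P\mathcal{E}(\mu)$ by
$$K_P(U) = \log\mathbb{E}_P\left[\mathrm{e}^U\right] = \mathbb{E}_P\left[\log\frac{P}{Q}\right] = D(P\|Q);$$
that is, $K_P$ is the expression in the chart at P of the Kullback–Leibler divergence $Q\mapsto D(P\|Q)$, and we can write
$$Q = \mathrm{e}^{U - K_P(U)}\cdot P = e_P(U).$$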
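The patch centered at P is
$$s_P^{-1}\colon S_P\mathcal{E}(\mu)\times S_P\mathcal{E}(\mu)\ni(U,V)\mapsto\left(e_P(U),\ {}^{e_P(U)}\mathbb{U}_PV\right)\in S\mathcal{E}(\mu),$$
where ${}^{Q}\mathbb{U}_PV = V - \mathbb{E}_Q[V]$.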
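In statistical terms, the random variable $\log(Q/P)$ is the point-wise information of Q relative to the reference P, while $s_P(Q) = \log(Q/P) - \mathbb{E}_P[\log(Q/P)]$ is its deviation from its mean value at P. The expression of the other divergence in the chart centered at P is
$$D(Q\|P) = \mathbb{E}_Q\left[\log\frac{Q}{P}\right] = \mathbb{E}_{e_P(U)}[U] - K_P(U) = dK_P(U)[U] - K_P(U),$$
because the derivative of the cumulant function is $dK_P(U)[H] = \mathbb{E}_{e_P(U)}[H]$.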
The equation above shows that the two divergences are convex conjugate functions in the proper charts; see [10].
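The chart, the patch, and the two divergence formulas can be checked numerically on a small sample space. The following is a minimal NumPy sketch, assuming $\Omega = \{0,\dots,N-1\}$ and densities with respect to the uniform $\mu$ stored as positive arrays with mean 1; the helper names (`expect`, `chart`, `patch`, `kl`, `density`) are hypothetical. The last check uses the standard Donsker–Varadhan variational inequality, $\mathbb{E}_Q[U'] - K_P(U') \le D(Q\|P)$ for every $U'\in S_P\mathcal{E}(\mu)$, with equality at the chart coordinate $U$ of Q; it illustrates the conjugacy and is not taken from the text.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P], with mu the uniform probability on {0, ..., N-1}."""
    return np.mean(f * P)


def K(P, U):
    """Cumulant function K_P(U) = log E_P[exp(U)]."""
    return np.log(expect(P, np.exp(U)))


def chart(P, Q, W):
    """s_P(Q, W) = (log(Q/P) - E_P[log(Q/P)], ^P U_Q W), with ^P U_Q W = W - E_P[W]."""
    r = np.log(Q / P)
    return r - expect(P, r), W - expect(P, W)


def patch(P, U, V):
    """s_P^{-1}(U, V) = (e_P(U), ^{e_P(U)} U_P V), with e_P(U) = exp(U - K_P(U)) P."""
    Q = np.exp(U - K(P, U)) * P
    return Q, V - expect(Q, V)


def kl(A, B):
    """Kullback-Leibler divergence D(A || B) = E_A[log(A / B)]."""
    return expect(A, np.log(A / B))


def density(rng, N):
    """Random strictly positive density w.r.t. mu (its mean is 1)."""
    p = rng.random(N) + 0.1
    return p / p.mean()


rng = np.random.default_rng(0)
N = 5
P, Q = density(rng, N), density(rng, N)
W = rng.standard_normal(N)
W -= expect(Q, W)                          # (Q, W) is a point of the bundle

U, V = chart(P, Q, W)
Q_back, W_back = patch(P, U, V)            # round trip through the chart at P

dPQ, dQP = kl(P, Q), kl(Q, P)
# Variational check: E_Q[U'] - K_P(U') <= D(Q || P) for all U' in S_P.
gaps = []
for _ in range(200):
    Up = rng.standard_normal(N)
    Up -= expect(P, Up)
    gaps.append(dQP - (expect(Q, Up) - K(P, Up)))
print(K(P, U) - dPQ, expect(Q, U) - K(P, U) - dQP, min(gaps))
```

The round trip confirms that $s_P^{-1}\circ s_P$ is the identity on the bundle, and the printed differences should be at rounding level.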
3. The Tangent Space of the Statistical Bundle
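Let us compute the expression of the velocity at time t of a smooth curve $I\ni t\mapsto Q(t)\in\mathcal{E}(\mu)$, with I an open interval, in the chart centered at P. The expression of the curve is
$$U(t) = s_P(Q(t)) = \log\frac{Q(t)}{P} - \mathbb{E}_P\left[\log\frac{Q(t)}{P}\right],$$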
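and hence we have, by denoting the derivative in t by the dot,
$$\dot U(t) = \frac{d}{dt}\log Q(t) - \mathbb{E}_P\left[\frac{d}{dt}\log Q(t)\right] = {}^P\mathbb{U}_{Q(t)}\frac{d}{dt}\log Q(t) \tag{3}$$
and, because $\mathbb{E}_{Q(t)}\left[\frac{d}{dt}\log Q(t)\right] = \frac{d}{dt}\mathbb{E}_\mu[Q(t)] = 0$,
$$\frac{d}{dt}\log Q(t) = \dot U(t) - \mathbb{E}_{Q(t)}\left[\dot U(t)\right] = {}^{Q(t)}\mathbb{U}_P\dot U(t). \tag{4}$$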
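If we define the velocity of $t\mapsto Q(t)$ to be
$$\delta Q(t) = \frac{d}{dt}\log Q(t) = \frac{\dot Q(t)}{Q(t)},$$
then $t\mapsto(Q(t),\delta Q(t))$ is a curve in the statistical bundle whose expression in the chart centered at P is $t\mapsto(U(t),\dot U(t))$. The velocity as defined above is nothing other than the score function of the one-dimensional statistical model; see, e.g., the textbook by B. Efron and T. Hastie (Section 4.2, [11]). The variance of the score function (i.e., the squared norm $\langle\delta Q(t),\delta Q(t)\rangle_{Q(t)}$ of $\delta Q(t)$ in $S_{Q(t)}\mathcal{E}(\mu)$) is classically known as the Fisher information at t.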
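As a numerical sketch of Equations (3) and (4), assume the hypothetical curve $U(t) = tA + t^2B$ in the chart centered at P, with $A,B\in S_P\mathcal{E}(\mu)$ drawn at random. The score is approximated by central differences of $\log Q(t)$ and compared with the transported chart velocity; the step size and tolerances are illustrative choices, not part of the text.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def e(P, U):
    """e_P(U) = exp(U - K_P(U)) P, computed by normalizing exp(U) P."""
    w = np.exp(U) * P
    return w / np.mean(w)


rng = np.random.default_rng(2)
N = 4
P = rng.random(N) + 0.1
P /= P.mean()
A = rng.standard_normal(N)
A -= expect(P, A)                  # A and B belong to the fiber at P
B = rng.standard_normal(N)
B -= expect(P, B)


def U(t):
    """Hypothetical test curve in the chart centred at P."""
    return t * A + t**2 * B


def U_dot(t):
    return A + 2 * t * B


def Q(t):
    return e(P, U(t))


def velocity(t, h=1e-5):
    """delta Q(t) = d/dt log Q(t) (the score), by central differences."""
    return (np.log(Q(t + h)) - np.log(Q(t - h))) / (2 * h)


t = 0.3
Qt, dQ = Q(t), velocity(t)
fisher = expect(Qt, dQ**2)         # squared norm of the score: Fisher information at t
print(fisher)
```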
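We define the second statistical bundle to be
$$S^2\mathcal{E}(\mu) = \left\{(Q,W,X,Y) \mid Q\in\mathcal{E}(\mu),\ W,X,Y\in S_Q\mathcal{E}(\mu)\right\},$$
with charts
$$s_P(Q,W,X,Y) = \left(s_P(Q),\ {}^P\mathbb{U}_QW,\ {}^P\mathbb{U}_QX,\ {}^P\mathbb{U}_QY\right);$$
we can identify the second bundle with the tangent space of the first bundle as follows.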
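For each curve $t\mapsto(Q(t),W(t))$ in the statistical bundle, define its velocity at t to be
$$\left(Q(t),W(t),\delta Q(t),DW(t)\right),\qquad DW(t) = \dot W(t) - \mathbb{E}_{Q(t)}\left[\dot W(t)\right].$$
This is justified because the velocity is a curve in the second statistical bundle, and its expression in the chart at P is $t\mapsto(U(t),V(t),\dot U(t),\dot V(t))$, with $V(t) = {}^P\mathbb{U}_{Q(t)}W(t)$; that is, its last two components are the time derivatives of the first two, as in Equations (3) and (4).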
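The independence of $DW$ from the chart can be illustrated numerically. In this sketch, the curve $Q(t)$ (a normalized geometric interpolation between two random densities) and the field $W(t)$ are hypothetical choices, and all time derivatives are central finite differences; computing $DW(t)$ directly and through the charts at two different centers should give the same vector up to rounding.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def density(rng, N):
    """Random strictly positive density w.r.t. mu (mean 1)."""
    p = rng.random(N) + 0.1
    return p / p.mean()


rng = np.random.default_rng(3)
N = 5
Q0, Q1 = density(rng, N), density(rng, N)
C1, C2 = rng.standard_normal(N), rng.standard_normal(N)


def Q(t):
    """Hypothetical curve in E(mu): normalized geometric interpolation of Q0 and Q1."""
    w = Q0 ** (1 - t) * Q1 ** t
    return w / np.mean(w)


def W(t):
    """Hypothetical field along Q(t), centred so that E_{Q(t)}[W(t)] = 0."""
    F = np.cos(t) * C1 + t * C2
    return F - expect(Q(t), F)


def D_direct(t, h=1e-5):
    """DW(t) = Wdot(t) - E_{Q(t)}[Wdot(t)]."""
    Wd = (W(t + h) - W(t - h)) / (2 * h)
    return Wd - expect(Q(t), Wd)


def D_via_chart(t, P, h=1e-5):
    """^{Q(t)}U_P applied to d/dt V(t), where V(t) = ^P U_{Q(t)} W(t) = W(t) - E_P[W(t)]."""
    V = lambda s: W(s) - expect(P, W(s))
    Vd = (V(t + h) - V(t - h)) / (2 * h)
    return Vd - expect(Q(t), Vd)


t = 0.4
d0 = D_direct(t)
d1 = D_via_chart(t, density(rng, N))
d2 = D_via_chart(t, density(rng, N))
print(np.max(np.abs(d1 - d0)), np.max(np.abs(d2 - d0)))
```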
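In particular, consider a curve of the form $t\mapsto(Q(t),\delta Q(t))$. Its velocity is $(Q(t),\delta Q(t),\delta Q(t),D^2Q(t))$, where the acceleration is
$$D^2Q(t) = \frac{d}{dt}\delta Q(t) - \mathbb{E}_{Q(t)}\left[\frac{d}{dt}\delta Q(t)\right] = \frac{\ddot Q(t)}{Q(t)} - \left(\frac{\dot Q(t)}{Q(t)}\right)^2 + \mathbb{E}_{Q(t)}\left[\left(\frac{\dot Q(t)}{Q(t)}\right)^2\right]. \tag{5}$$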
It should be noted that the acceleration has been defined without explicitly mentioning the relevant connection. In fact, the connection here is implicitly defined by the transports ${}^Q\mathbb{U}_P$, which is unusual in Differential Geometry, but is quite natural from the probabilistic point of view; see P. Gibilisco and G. Pistone [12]. We shall see below that the non-parametric approach to Information Geometry allows the definition of a dual transport, hence of a dual connection as in [4]. Because of that, we could have defined other types of acceleration together with the one we have defined. Namely, besides the exponential acceleration ${}^eD^2Q(t) = D^2Q(t)$ of Equation (5), we could consider a mixture acceleration ${}^mD^2Q(t) = \ddot Q(t)/Q(t)$ and a Riemannian acceleration
$${}^0D^2Q(t) = \frac12\left({}^eD^2Q(t) + {}^mD^2Q(t)\right) = \frac{\ddot Q(t)}{Q(t)} - \frac12\left(\left(\frac{\dot Q(t)}{Q(t)}\right)^2 - \mathbb{E}_{Q(t)}\left[\left(\frac{\dot Q(t)}{Q(t)}\right)^2\right]\right), \tag{6}$$
each acceleration being associated with a specific connection; see the review paper [6]. We do not further discuss the different second-order geometries associated with the statistical bundle in this paper.
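The three accelerations are simple to evaluate once $\dot Q$ and $\ddot Q$ are available. This sketch assumes the hypothetical curve $U(t) = tA + t^2B$ in the chart at P and finite-difference derivatives. For such a curve, a short computation from Equation (5) gives the exponential acceleration as $2B - \mathbb{E}_{Q(t)}[2B]$, that is, the transport to $S_{Q(t)}\mathcal{E}(\mu)$ of $\ddot U(t) = 2B$, which the code uses as a check.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def e(P, U):
    """e_P(U) = exp(U - K_P(U)) P."""
    w = np.exp(U) * P
    return w / np.mean(w)


def centred(P, X):
    """X - E_P[X]."""
    return X - expect(P, X)


rng = np.random.default_rng(4)
N = 4
P = rng.random(N) + 0.1
P /= P.mean()
A, B = centred(P, rng.standard_normal(N)), centred(P, rng.standard_normal(N))


def Q(t):
    """Hypothetical curve with chart expression U(t) = t A + t^2 B."""
    return e(P, t * A + t**2 * B)


def derivatives(t, h=1e-4):
    """Q(t), Qdot(t), Qddot(t) by central finite differences."""
    q0, qp, qm = Q(t), Q(t + h), Q(t - h)
    return q0, (qp - qm) / (2 * h), (qp - 2 * q0 + qm) / h**2


def accelerations(q, qd, qdd):
    """Exponential (Equation (5)), mixture and Riemannian accelerations."""
    s = qd / q                                   # score delta Q
    acc_e = qdd / q - s**2 + expect(q, s**2)
    acc_m = qdd / q
    acc_0 = 0.5 * (acc_e + acc_m)
    return acc_e, acc_m, acc_0


t = 0.2
q, qd, qdd = derivatives(t)
acc_e, acc_m, acc_0 = accelerations(q, qd, qdd)
print(np.max(np.abs(acc_e - centred(q, 2 * B))))   # compare with the transport of Uddot = 2B
```

All three accelerations should lie in the fiber at $Q(t)$, i.e. have zero $Q(t)$-expectation up to discretization error.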
Example 1 (Boltzmann–Gibbs).
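Let us compare the formalism we have introduced above with standard computations in Statistical Physics. The Boltzmann–Gibbs distribution assigns to the point $x\in\Omega$ the probability $\mathrm{e}^{-\beta E(x)}/Z(\beta)$, with energy function $E\colon\Omega\to\mathbb{R}$ and partition function $Z(\beta) = \sum_{x\in\Omega}\mathrm{e}^{-\beta E(x)}$; see Landau and Lifshitz ([13], Chapter 3). As a curve in $\mathcal{E}(\mu)$, it is $Q(\beta) = \mathrm{e}^{-\beta E - \psi(\beta)}$, with $\psi(\beta) = \log Z(\beta) - \log N$, because of the reference to the uniform probability. The velocity defined above becomes in this case $\delta Q(\beta) = -\left(E - \mathbb{E}_{Q(\beta)}[E]\right)$, while the acceleration of Equation (5) is $D^2Q(\beta) = 0$. Notice that the acceleration vanishes because $\frac{d}{d\beta}\delta Q(\beta) = \frac{d}{d\beta}\mathbb{E}_{Q(\beta)}[E] = -\mathrm{Var}_{Q(\beta)}(E)$ is constant on $\Omega$.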
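The Boltzmann–Gibbs computation can be reproduced numerically. The energy levels below are hypothetical, and the derivatives in $\beta$ are central finite differences; the score should match $-(E - \mathbb{E}_{Q(\beta)}[E])$ and the acceleration of Equation (5) should vanish up to discretization error.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


energy = np.array([0.0, 1.0, 1.5, 3.0])          # hypothetical energy levels E(x)


def Q(beta):
    """Boltzmann-Gibbs density w.r.t. the uniform mu: N exp(-beta E) / Z(beta)."""
    w = np.exp(-beta * energy)
    return w / np.mean(w)


beta, h = 0.7, 1e-4
q = Q(beta)
qd = (Q(beta + h) - Q(beta - h)) / (2 * h)
qdd = (Q(beta + h) - 2 * q + Q(beta - h)) / h**2
score = qd / q                                    # velocity delta Q(beta)
acc_e = qdd / q - score**2 + expect(q, score**2)  # acceleration of Equation (5)

print(np.max(np.abs(score + (energy - expect(q, energy)))))
print(np.max(np.abs(acc_e)))
```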
Following the original construction of Amari's Information Geometry [4], we have defined on the statistical bundle a manifold structure that is both affine and Riemannian. The base manifold is actually a Hessian manifold with respect to any of the convex functions $K_P$, $P\in\mathcal{E}(\mu)$ (see [14]). Many computations are actually performed using the Hessian structure. The following equations, where $Q = e_P(U)$ and $H,H_1,H_2\in S_P\mathcal{E}(\mu)$, are easily checked and frequently used:
$$dK_P(U)[H] = \mathbb{E}_Q[H], \tag{7}$$
$$d^2K_P(U)[H_1,H_2] = \mathrm{Cov}_Q(H_1,H_2). \tag{8}$$
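We have defined a centering operation that can be thought of as a transport among fibers,
$${}^Q\mathbb{U}_P\colon S_P\mathcal{E}(\mu)\ni V\mapsto V - \mathbb{E}_Q[V]\in S_Q\mathcal{E}(\mu), \tag{9}$$
whose adjoint is the mixture transport ${}^P\mathbb{U}^m_Q\colon S_Q\mathcal{E}(\mu)\ni W\mapsto\frac{Q}{P}W\in S_P\mathcal{E}(\mu)$. In fact, for $V\in S_P\mathcal{E}(\mu)$ and $W\in S_Q\mathcal{E}(\mu)$,
$$\left\langle{}^Q\mathbb{U}_PV,W\right\rangle_Q = \mathbb{E}_Q[VW] = \left\langle V,{}^P\mathbb{U}^m_QW\right\rangle_P. \tag{10}$$
Moreover, if $Q = e_P(U)$ and $H_1,H_2\in S_P\mathcal{E}(\mu)$, then
$$d^2K_P(U)[H_1,H_2] = \left\langle{}^Q\mathbb{U}_PH_1,{}^Q\mathbb{U}_PH_2\right\rangle_Q. \tag{11}$$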
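The Hessian identities and the adjoint relation can be checked with finite differences of $K_P$. This is a minimal sketch on a random density P and random directions (all hypothetical inputs); `centred` is a helper name introduced here for the transport $X\mapsto X - \mathbb{E}_Q[X]$.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def K(P, U):
    """Cumulant function K_P(U) = log E_P[exp(U)]."""
    return np.log(expect(P, np.exp(U)))


def centred(P, X):
    """Transport onto the fiber at P: X - E_P[X]."""
    return X - expect(P, X)


rng = np.random.default_rng(5)
N = 5
P = rng.random(N) + 0.1
P /= P.mean()
U, H1, H2 = (centred(P, rng.standard_normal(N)) for _ in range(3))
Q = np.exp(U - K(P, U)) * P                 # Q = e_P(U)
h = 1e-4

# First derivative of K_P versus E_Q[H1].
dK = (K(P, U + h * H1) - K(P, U - h * H1)) / (2 * h)
# Mixed second derivative of K_P versus Cov_Q(H1, H2).
d2K = (K(P, U + h * H1 + h * H2) - K(P, U + h * H1 - h * H2)
       - K(P, U - h * H1 + h * H2) + K(P, U - h * H1 - h * H2)) / (4 * h**2)
cov = expect(Q, H1 * H2) - expect(Q, H1) * expect(Q, H2)
# Scalar product at Q of the transported vectors ^Q U_P H1 and ^Q U_P H2.
inner = expect(Q, centred(Q, H1) * centred(Q, H2))

# Adjoint: <^Q U_P V, W>_Q = <V, (Q/P) W>_P for V in S_P and W in S_Q.
V, W = H1, centred(Q, rng.standard_normal(N))
lhs = expect(Q, centred(Q, V) * W)
rhs = expect(P, V * (Q / P) * W)
print(dK - expect(Q, H1), d2K - cov, inner - cov, lhs - rhs)
```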
Example 2 (Entropy flow).
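This example is taken from [8]. The entropy $\mathcal{H}(Q) = -\mathbb{E}_Q[\log Q]$ is a scalar field on the statistical bundle with no dependence on the fiber. If $Q(t) = e_P(U(t))$ is a smooth curve in $\mathcal{E}(\mu)$ expressed in the chart centered at P, then, since $\log Q(t) = U(t) - K_P(U(t)) + \log P$, we can write
$$\mathcal{H}(Q(t)) = K_P(U(t)) + \mathcal{H}(P) - \mathbb{E}_{Q(t)}\left[U(t) + \log P + \mathcal{H}(P)\right] = K_P(U(t)) + \mathcal{H}(P) - dK_P(U(t))\left[U(t) + \log P + \mathcal{H}(P)\right],$$
where the argument of the last expectation belongs to the fiber $S_P\mathcal{E}(\mu)$ and we have expressed the expected value as a derivative by using Equation (7).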
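Again using Equations (7) and (9), we compute the derivative of the entropy $\mathcal{H}$ along the given curve as
$$\frac{d}{dt}\mathcal{H}(Q(t)) = dK_P(U(t))[\dot U(t)] - d^2K_P(U(t))\left[U(t) + \log P + \mathcal{H}(P),\dot U(t)\right] - dK_P(U(t))[\dot U(t)] = -d^2K_P(U(t))\left[U(t) + \log P + \mathcal{H}(P),\dot U(t)\right].$$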
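Since $d^2K_P(U)[H_1,H_2] = \mathrm{Cov}_Q(H_1,H_2) = \left\langle{}^Q\mathbb{U}_PH_1,{}^Q\mathbb{U}_PH_2\right\rangle_Q$ with $Q = e_P(U)$, we now use the equations
$${}^{Q(t)}\mathbb{U}_P\left(U(t) + \log P + \mathcal{H}(P)\right) = \log Q(t) + \mathcal{H}(Q(t))$$
and ${}^{Q(t)}\mathbb{U}_P\dot U(t) = \delta Q(t)$ to obtain
$$\frac{d}{dt}\mathcal{H}(Q(t)) = -\left\langle\log Q(t) + \mathcal{H}(Q(t)),\delta Q(t)\right\rangle_{Q(t)}.$$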
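We have identified the gradient of the entropy $\mathcal{H}(Q) = -\mathbb{E}_Q[\log Q]$ in the statistical bundle,
$$\operatorname{grad}\mathcal{H}(Q) = -\left(\log Q + \mathcal{H}(Q)\right)\in S_Q\mathcal{E}(\mu). \tag{12}$$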
Notice that the previous computation could also have been done using an exponential family. See the computation of the gradient flow in [8].
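A numerical sketch of the gradient computation, assuming a hypothetical curve given by the normalized geometric interpolation $Q(t)\propto Q_0^{1-t}Q_1^t$ between two random densities (an exponential family), compares a finite-difference derivative of the entropy with $\langle\operatorname{grad}\mathcal{H}(Q),\delta Q\rangle_Q$, where $\mathcal{H}(Q) = -\mathbb{E}_Q[\log Q]$ and $\operatorname{grad}\mathcal{H}(Q) = -(\log Q + \mathcal{H}(Q))$.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def entropy(Q):
    """H(Q) = -E_Q[log Q] for a density Q w.r.t. the uniform mu."""
    return -expect(Q, np.log(Q))


def grad_entropy(Q):
    """grad H(Q) = -(log Q + H(Q)), an element of the fiber at Q."""
    return -(np.log(Q) + entropy(Q))


rng = np.random.default_rng(6)
N = 6
Q0 = rng.random(N) + 0.1
Q0 /= Q0.mean()
Q1 = rng.random(N) + 0.1
Q1 /= Q1.mean()


def Q(t):
    """Hypothetical smooth curve: normalized geometric interpolation of Q0 and Q1."""
    w = Q0 ** (1 - t) * Q1 ** t
    return w / np.mean(w)


t, h = 0.35, 1e-5
q = Q(t)
score = (np.log(Q(t + h)) - np.log(Q(t - h))) / (2 * h)      # delta Q(t)
dH = (entropy(Q(t + h)) - entropy(Q(t - h))) / (2 * h)       # d/dt H(Q(t))
print(dH - expect(q, grad_entropy(q) * score))
```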
In the next section, we extend the computation illustrated in the example above to scalar fields on the statistical bundle.
4. Lagrangian Function
A Lagrangian function is a smooth scalar field on the statistical bundle,
$$L\colon S\mathcal{E}(\mu)\ni(Q,W)\mapsto L(Q,W)\in\mathbb{R}.$$
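At each fixed density $Q\in\mathcal{E}(\mu)$, the partial mapping
$$L(Q,\cdot)\colon S_Q\mathcal{E}(\mu)\ni W\mapsto L(Q,W)$$
is defined on the vector space $S_Q\mathcal{E}(\mu)$; hence, we can use the ordinary derivative, which in this case is called the fiber derivative,
$$d_2L(Q,W)[H] = \frac{d}{d\epsilon}L(Q,W + \epsilon H)\Big|_{\epsilon=0} = \left\langle\partial_2L(Q,W),H\right\rangle_Q,\qquad H\in S_Q\mathcal{E}(\mu),$$
where $\partial_2L(Q,W)\in S_Q\mathcal{E}(\mu)$ is the gradient with respect to the scalar product of the fiber.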
Example 3 (Running Example 1).
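If
$$L(Q,W) = \frac12\langle W,W\rangle_Q + \mathcal{H}(Q),\qquad \mathcal{H}(Q) = -\mathbb{E}_Q[\log Q], \tag{15}$$
then $d_2L(Q,W)[H] = \langle W,H\rangle_Q$, that is, the fiber derivative is $\partial_2L(Q,W) = W$. The example is suggested by the form of the classical Lagrangian function in mechanics, where the first term is the kinetic energy and $-\mathcal{H}(Q)$, minus the entropy, is the potential energy.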
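The fiber derivative of this Lagrangian can be checked with a central difference along a direction $H\in S_Q\mathcal{E}(\mu)$. A minimal sketch with a random density and random fiber vectors as hypothetical inputs; since L is quadratic in W, the central difference is exact up to rounding.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def entropy(Q):
    """H(Q) = -E_Q[log Q]."""
    return -expect(Q, np.log(Q))


def lagrangian(Q, W):
    """Running example: L(Q, W) = 1/2 <W, W>_Q + H(Q)."""
    return 0.5 * expect(Q, W * W) + entropy(Q)


rng = np.random.default_rng(7)
N = 5
Q = rng.random(N) + 0.1
Q /= Q.mean()
W = rng.standard_normal(N)
W -= expect(Q, W)                  # W in the fiber at Q
Hd = rng.standard_normal(N)
Hd -= expect(Q, Hd)                # direction H in the fiber at Q

eps = 1e-4
fiber_derivative = (lagrangian(Q, W + eps * Hd) - lagrangian(Q, W - eps * Hd)) / (2 * eps)
print(fiber_derivative - expect(Q, W * Hd))   # d_2 L(Q, W)[H] - <W, H>_Q
```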
As the fiber $S_Q\mathcal{E}(\mu)$ of the statistical bundle changes with the base point Q, the computation of the partial derivative of the Lagrangian with respect to the first variable requires some care. We want to compute the expression of the total derivative in a chart of the affine atlas defined in Equations (1) and (2).
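Let $t\mapsto(Q(t),W(t))$ be a smooth curve in the statistical bundle. In the chart centered at P, we have
$$Q(t) = e_P(U(t)),\qquad W(t) = {}^{Q(t)}\mathbb{U}_PV(t),$$
with $t\mapsto(U(t),V(t))$ being a smooth curve in $S_P\mathcal{E}(\mu)\times S_P\mathcal{E}(\mu)$. Let us compute the rate of change of the Lagrangian L along the curve $t\mapsto(Q(t),W(t))$: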
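$$\frac{d}{dt}L(Q(t),W(t)) = \frac{d}{dt}L_P(U(t),V(t)) = d_1L_P(U(t),V(t))[\dot U(t)] + d_2L_P(U(t),V(t))[\dot V(t)], \tag{16}$$
with $L_P = L\circ s_P^{-1}$, that is, $L_P(U,V) = L\left(e_P(U),{}^{e_P(U)}\mathbb{U}_PV\right)$. It follows that, for fixed U, the mapping $V\mapsto L_P(U,V)$ is the composition of the linear transport ${}^{e_P(U)}\mathbb{U}_P$ with the partial mapping $L(e_P(U),\cdot)$.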
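If we write $Q = e_P(U)$ and $W = {}^Q\mathbb{U}_PV$, then we have
$$d_2L_P(U,V)[\dot V] = d_2L(Q,W)\left[{}^Q\mathbb{U}_P\dot V\right] = \left\langle\partial_2L(Q,W),{}^Q\mathbb{U}_P\dot V\right\rangle_Q, \tag{17}$$
where $\partial_2L$ is the fiber derivative of L. As $\dot U(t) = {}^P\mathbb{U}_{Q(t)}\delta Q(t)$ and ${}^{Q(t)}\mathbb{U}_P\dot V(t) = \dot W(t) - \mathbb{E}_{Q(t)}[\dot W(t)] = DW(t)$, it follows from Equations (16) and (17) that
$$\frac{d}{dt}L(Q(t),W(t)) = d_1L_P(U(t),V(t))\left[{}^P\mathbb{U}_{Q(t)}\delta Q(t)\right] + \left\langle\partial_2L(Q(t),W(t)),DW(t)\right\rangle_{Q(t)}.$$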
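In the equation above, the first term on the RHS does not depend on P because the LHS and the second term of the RHS do not depend on P. Hence, we define the first partial derivative of the Lagrangian function to be the unique $\partial_1L(Q,W)\in S_Q\mathcal{E}(\mu)$ such that
$$\left\langle\partial_1L(Q,W),H\right\rangle_Q = d_1L_P\left(s_P(Q,W)\right)\left[{}^P\mathbb{U}_QH\right],\qquad H\in S_Q\mathcal{E}(\mu), \tag{18}$$
so that the derivative of L along the curve $t\mapsto(Q(t),W(t))$ becomes
$$\frac{d}{dt}L(Q(t),W(t)) = \left\langle\partial_1L(Q(t),W(t)),\delta Q(t)\right\rangle_{Q(t)} + \left\langle\partial_2L(Q(t),W(t)),DW(t)\right\rangle_{Q(t)}. \tag{19}$$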
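In particular, if $W(t) = \delta Q(t)$, then
$$\frac{d}{dt}L(Q(t),\delta Q(t)) = \left\langle\partial_1L(Q(t),\delta Q(t)),\delta Q(t)\right\rangle_{Q(t)} + \left\langle\partial_2L(Q(t),\delta Q(t)),D^2Q(t)\right\rangle_{Q(t)};$$
see Equation (5).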
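The chain rule for a Lagrangian along a curve can be checked numerically for the running Lagrangian of Example 3, whose fiber derivative is W. In this sketch, the chart curves $U(t)$ and $V(t)$ at P are hypothetical, the first term $d_1L_P[\dot U]$ is computed by freezing the fiber coordinate at time t, and it is recomputed in the chart at a second, random center to illustrate that it does not depend on P.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def entropy(Q):
    return -expect(Q, np.log(Q))


def lagrangian(Q, W):
    """Running example: L(Q, W) = 1/2 <W, W>_Q + H(Q); its fiber derivative is W."""
    return 0.5 * expect(Q, W * W) + entropy(Q)


def e(P, U):
    """e_P(U) = exp(U - K_P(U)) P."""
    w = np.exp(U) * P
    return w / np.mean(w)


def L_chart(P, U, V):
    """L_P(U, V) = L(e_P(U), ^{e_P(U)}U_P V)."""
    Q = e(P, U)
    return lagrangian(Q, V - expect(Q, V))


def chart(P, Q, W):
    """s_P(Q, W)."""
    r = np.log(Q / P)
    return r - expect(P, r), W - expect(P, W)


def density(rng, N):
    p = rng.random(N) + 0.1
    return p / p.mean()


rng = np.random.default_rng(10)
N = 5
P, P2 = density(rng, N), density(rng, N)
A, B, C = (rng.standard_normal(N) for _ in range(3))
A, B, C = (X - expect(P, X) for X in (A, B, C))      # elements of S_P


def Q(t):
    return e(P, np.sin(t) * A + t * B)               # hypothetical U(t)


def W(t):
    V = np.cos(t) * C + t**2 * B                     # hypothetical V(t)
    return V - expect(Q(t), V)                       # W(t) = ^{Q(t)}U_P V(t)


t, h = 0.5, 1e-5


def ddt(f):
    """Central difference of a function of time at t."""
    return (f(t + h) - f(t - h)) / (2 * h)


total = ddt(lambda s: lagrangian(Q(s), W(s)))        # d/dt L(Q(t), W(t))
Wdot = ddt(W)
DW = Wdot - expect(Q(t), Wdot)
second = expect(Q(t), W(t) * DW)                     # <d_2 L, DW>_Q with d_2 L = W


def first_term(Pc):
    """d_1 L_Pc(U, V)[Udot] with the fiber coordinate frozen at time t."""
    _, Vt = chart(Pc, Q(t), W(t))
    return ddt(lambda s: L_chart(Pc, chart(Pc, Q(s), W(s))[0], Vt))


print(total - (first_term(P) + second), first_term(P) - first_term(P2))
```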
Example 4 (Running Example 2).
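With the Lagrangian of Equation (15), we have, in the chart centered at P,
$$L_P(U,V) = \frac12d^2K_P(U)[V,V] + K_P(U) + \mathcal{H}(P) - dK_P(U)\left[U + \log P + \mathcal{H}(P)\right];$$
see Equations (9) and (11). The first partial derivative is
$$d_1L_P(U,V)[H] = \frac12d^3K_P(U)[V,V,H] - d^2K_P(U)\left[U + \log P + \mathcal{H}(P),H\right] = \frac12\mathbb{E}_Q\left[W^2\,{}^Q\mathbb{U}_PH\right] - \left\langle\log Q + \mathcal{H}(Q),{}^Q\mathbb{U}_PH\right\rangle_Q,$$
where $Q = e_P(U)$, $W = {}^Q\mathbb{U}_PV$, and we have used Equations (9) and (10) together with the third derivative of the cumulant function, $d^3K_P(U)[H_1,H_2,H_3] = \mathbb{E}_Q\left[{}^Q\mathbb{U}_PH_1\,{}^Q\mathbb{U}_PH_2\,{}^Q\mathbb{U}_PH_3\right]$, and with ${}^Q\mathbb{U}_P\left(U + \log P + \mathcal{H}(P)\right) = \log Q + \mathcal{H}(Q)$.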
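We have found that
$$\partial_1L(Q,W) = \frac12\left(W^2 - \langle W,W\rangle_Q\right) - \left(\log Q + \mathcal{H}(Q)\right),$$
and also, along a curve with $W(t) = \delta Q(t)$,
$$\partial_1L(Q(t),\delta Q(t)) = \frac12\left(\delta Q(t)^2 - \langle\delta Q(t),\delta Q(t)\rangle_{Q(t)}\right) - \left(\log Q(t) + \mathcal{H}(Q(t))\right).$$
Using the fiber derivative $\partial_2L(Q,W) = W$ computed in the first part of the running example, we find
$$\frac{d}{dt}L(Q(t),\delta Q(t)) = \frac12\mathbb{E}_{Q(t)}\left[\delta Q(t)^3\right] - \left\langle\log Q(t) + \mathcal{H}(Q(t)),\delta Q(t)\right\rangle_{Q(t)} + \left\langle\delta Q(t),D^2Q(t)\right\rangle_{Q(t)}.$$
Notice that Equation (12) shows that the term $-(\log Q + \mathcal{H}(Q))$ in the equations above is the gradient of the entropy, $\operatorname{grad}\mathcal{H}(Q)$.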
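The closed form of $\partial_1L$ found in this example can be compared with a finite-difference evaluation of its defining relation in two different charts. A minimal sketch with random, hypothetical Q, W, X and centers P; the helper names are introduced here for illustration only.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def entropy(Q):
    return -expect(Q, np.log(Q))


def lagrangian(Q, W):
    """Running example: L(Q, W) = 1/2 <W, W>_Q + H(Q)."""
    return 0.5 * expect(Q, W * W) + entropy(Q)


def e(P, U):
    """e_P(U) = exp(U - K_P(U)) P."""
    w = np.exp(U) * P
    return w / np.mean(w)


def L_chart(P, U, V):
    """Expression of L in the chart centred at P: L(e_P(U), ^{e_P(U)}U_P V)."""
    Q = e(P, U)
    return lagrangian(Q, V - expect(Q, V))


def d1L_numeric(P, Q, W, X, h=1e-5):
    """<partial_1 L(Q, W), X>_Q as d_1 L_P(s_P(Q, W))[^P U_Q X], by central differences."""
    r = np.log(Q / P)
    U, V, dU = r - expect(P, r), W - expect(P, W), X - expect(P, X)
    return (L_chart(P, U + h * dU, V) - L_chart(P, U - h * dU, V)) / (2 * h)


def d1L_closed(Q, W):
    """Closed form of Example 4: 1/2 (W^2 - <W, W>_Q) - (log Q + H(Q))."""
    return 0.5 * (W**2 - expect(Q, W**2)) - (np.log(Q) + entropy(Q))


def density(rng, N):
    p = rng.random(N) + 0.1
    return p / p.mean()


rng = np.random.default_rng(8)
N = 5
Q = density(rng, N)
W = rng.standard_normal(N)
W -= expect(Q, W)
X = rng.standard_normal(N)
X -= expect(Q, X)
values = [d1L_numeric(density(rng, N), Q, W, X) for _ in range(2)]   # two centres P
closed = expect(Q, d1L_closed(Q, W) * X)
print(values, closed)
```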
5. Action Integral
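If $t\mapsto Q(t)$, $t\in[0,1]$, is a smooth curve in the exponential manifold $\mathcal{E}(\mu)$, then the action integral
$$A(Q) = \int_0^1L(Q(t),\delta Q(t))\,dt$$
is well defined. We consider the expression of Q in the chart centered at P, $Q(t) = e_P(U(t))$.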
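Given a smooth $H\colon[0,1]\to S_P\mathcal{E}(\mu)$ with $H(0) = H(1) = 0$, for each $t\in[0,1]$ and $\epsilon\in\mathbb{R}$, we define the perturbed curve
$$Q_\epsilon(t) = e_P\left(U(t) + \epsilon H(t)\right).$$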
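We have $Q_0(t) = Q(t)$, $Q_\epsilon(0) = Q(0)$, $Q_\epsilon(1) = Q(1)$, and
$$\delta Q_\epsilon(t) = {}^{Q_\epsilon(t)}\mathbb{U}_P\left(\dot U(t) + \epsilon\dot H(t)\right),$$
whose expression in the chart centered at P is $\dot U(t) + \epsilon\dot H(t)$.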
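Let us consider the variation in $\epsilon$ of the action integral. We apply Equation (19) to the smooth curve in $S\mathcal{E}(\mu)$ given by
$$\epsilon\mapsto\left(Q_\epsilon(t),\delta Q_\epsilon(t)\right),$$
where t is fixed. As
$$\frac{d}{d\epsilon}\log Q_\epsilon(t)\Big|_{\epsilon=0} = {}^{Q(t)}\mathbb{U}_PH(t)$$
and
$$\left(\frac{d}{d\epsilon}\delta Q_\epsilon(t) - \mathbb{E}_{Q_\epsilon(t)}\left[\frac{d}{d\epsilon}\delta Q_\epsilon(t)\right]\right)\Big|_{\epsilon=0} = {}^{Q(t)}\mathbb{U}_P\dot H(t),$$
we obtain
$$\frac{d}{d\epsilon}A(Q_\epsilon)\Big|_{\epsilon=0} = \int_0^1\left(\left\langle\partial_1L(Q(t),\delta Q(t)),{}^{Q(t)}\mathbb{U}_PH(t)\right\rangle_{Q(t)} + \left\langle\partial_2L(Q(t),\delta Q(t)),{}^{Q(t)}\mathbb{U}_P\dot H(t)\right\rangle_{Q(t)}\right)dt.$$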
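If $t\mapsto Q(t)$ is a critical curve of the action integral, then $\frac{d}{d\epsilon}A(Q_\epsilon)\big|_{\epsilon=0} = 0$; hence, for all $P\in\mathcal{E}(\mu)$ and all H as above, we have
$$\int_0^1\left(\left\langle\partial_1L(Q(t),\delta Q(t)),{}^{Q(t)}\mathbb{U}_PH(t)\right\rangle_{Q(t)} + \left\langle\partial_2L(Q(t),\delta Q(t)),{}^{Q(t)}\mathbb{U}_P\dot H(t)\right\rangle_{Q(t)}\right)dt = 0.$$
As $\langle V,{}^Q\mathbb{U}_PH\rangle_Q = \langle(Q/P)V,H\rangle_P$ for $V\in S_Q\mathcal{E}(\mu)$, an integration by parts of the second term, whose boundary terms vanish because $H(0) = H(1) = 0$, shows that $\frac{d}{dt}\left((Q(t)/P)\,\partial_2L(Q(t),\delta Q(t))\right) = (Q(t)/P)\,\partial_1L(Q(t),\delta Q(t))$. This in turn implies that for each $t\in[0,1]$ and $H\in S_P\mathcal{E}(\mu)$, the Euler–Lagrange equation holds:
$$\frac{d}{dt}\left\langle\partial_2L(Q(t),\delta Q(t)),{}^{Q(t)}\mathbb{U}_PH\right\rangle_{Q(t)} = \left\langle\partial_1L(Q(t),\delta Q(t)),{}^{Q(t)}\mathbb{U}_PH\right\rangle_{Q(t)}. \tag{20}$$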
Example 5 (Running Example 3).
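For the Lagrangian of Equation (15), we can use Equation (20) in the form
$$\frac{d}{dt}\left\langle\delta Q(t),{}^{Q(t)}\mathbb{U}_PH\right\rangle_{Q(t)} = \left\langle\partial_1L(Q(t),\delta Q(t)),{}^{Q(t)}\mathbb{U}_PH\right\rangle_{Q(t)}$$
with $\partial_1L(Q,\delta Q) = \frac12\left(\delta Q^2 - \langle\delta Q,\delta Q\rangle_Q\right) - \left(\log Q + \mathcal{H}(Q)\right)$, as found in Example 4. For the other term, we have
$$\left\langle\delta Q(t),{}^{Q(t)}\mathbb{U}_PH\right\rangle_{Q(t)} = \mathbb{E}_{Q(t)}\left[\delta Q(t)H\right] = \mathbb{E}_\mu\left[\dot Q(t)H\right],$$
whose derivative is
$$\frac{d}{dt}\mathbb{E}_\mu\left[\dot Q(t)H\right] = \mathbb{E}_\mu\left[\ddot Q(t)H\right] = \left\langle\frac{\ddot Q(t)}{Q(t)},{}^{Q(t)}\mathbb{U}_PH\right\rangle_{Q(t)}.$$
Dropping the generic H, the Euler–Lagrange equation becomes
$$\frac{\ddot Q(t)}{Q(t)} = \frac12\left(\delta Q(t)^2 - \langle\delta Q(t),\delta Q(t)\rangle_{Q(t)}\right) - \left(\log Q(t) + \mathcal{H}(Q(t))\right),$$
that is,
$$\frac{\ddot Q(t)}{Q(t)} - \frac12\left(\left(\frac{\dot Q(t)}{Q(t)}\right)^2 - \mathbb{E}_{Q(t)}\left[\left(\frac{\dot Q(t)}{Q(t)}\right)^2\right]\right) = -\left(\log Q(t) + \mathcal{H}(Q(t))\right):$$
the left-hand side is the Riemannian acceleration, and the right-hand side is the gradient of the entropy.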
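The Euler–Lagrange equation of this example can be integrated numerically as a second-order system for $Q(t)$, namely $\ddot Q = Q\,F(Q,\dot Q)$ with $F = \frac12\left(\delta Q^2 - \mathbb{E}_Q[\delta Q^2]\right) - (\log Q + \mathcal{H}(Q))$. This sketch uses the classical fourth-order Runge-Kutta scheme, a standard choice that is not prescribed by the text, with hypothetical initial conditions close to the uniform density. Because $\mathbb{E}_Q[F] = 0$, the exact flow keeps $\mathbb{E}_\mu[Q] = 1$ and $\mathbb{E}_\mu[\dot Q] = 0$, and the sketch checks that the numerical solution keeps them up to rounding; it also checks that the uniform density at rest is an equilibrium, since there $\delta Q = 0$ and $\log Q + \mathcal{H}(Q) = 0$.

```python
import numpy as np


def expect(P, f):
    """E_P[f] = E_mu[f P] for a density P w.r.t. the uniform mu."""
    return np.mean(f * P)


def entropy(q):
    return -expect(q, np.log(q))


def rhs(q, qd):
    """Qddot from the Euler-Lagrange equation of the running example:
    qdd = q * [1/2 (s^2 - E_q[s^2]) - (log q + H(q))], with s = qd / q."""
    s = qd / q
    force = 0.5 * (s**2 - expect(q, s**2)) - (np.log(q) + entropy(q))
    return q * force


def rk4(q, qd, dt, steps):
    """Classical fourth-order Runge-Kutta for q' = qd, qd' = rhs(q, qd)."""
    for _ in range(steps):
        k1q, k1v = qd, rhs(q, qd)
        k2q, k2v = qd + 0.5 * dt * k1v, rhs(q + 0.5 * dt * k1q, qd + 0.5 * dt * k1v)
        k3q, k3v = qd + 0.5 * dt * k2v, rhs(q + 0.5 * dt * k2q, qd + 0.5 * dt * k2v)
        k4q, k4v = qd + dt * k3v, rhs(q + dt * k3q, qd + dt * k3v)
        q = q + dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        qd = qd + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return q, qd


rng = np.random.default_rng(9)
N = 4
q0 = rng.uniform(0.7, 1.3, N)
q0 /= q0.mean()                                   # initial density, E_mu[Q] = 1
v0 = 0.1 * rng.standard_normal(N)
v0 -= v0.mean()                                   # initial Qdot, E_mu[Qdot] = 0
q1, v1 = rk4(q0, v0, dt=1e-3, steps=1000)
print(q1, q1.mean(), v1.mean())

# The uniform density with zero velocity should stay at rest.
qu, vu = rk4(np.ones(N), np.zeros(N), dt=1e-3, steps=100)
```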
6. Discussion
We have shown that the research program of applying concepts taken from Classical Mechanics to Statistics makes sense, even though no practical application has been produced in this paper. Some simple examples have been discussed to show that the language of classical mechanics is indeed suggestive when applied to typical concepts in Statistics, such as the Fisher score and the statistical entropy. The derivation of the Euler–Lagrange equations is classically done in the set-up of Riemannian geometry, while here we have used the affine structure of Information Geometry. These provisional results prompt a generalization to non-finite sample spaces and the development of applications. Finally, the related Hamiltonian formalism remains to be investigated.
Acknowledgments
The Author gratefully thanks Hiroshi Matsuzoe (Nagoya Institute of Technology, Japan), Lamberto Rondoni (Politecnico di Torino, Italy), Antonio Scarfone (CNR and Politecnico di Torino, Italy), Tatsuaki Wada (Ibaraki University, Japan), for their interesting comments on early versions of this piece of research. He thanks two anonymous referees for their useful and enlightening comments. He acknowledges the support of de Castro Statistics, Collegio Carlo Alberto, and of GNAMPA-INdAM.
Conflicts of Interest
The author declares no conflict of interest.
References
- Arnold, V.I. Mathematical Methods of Classical Mechanics, 2nd ed.; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1989; Volume 60, p. xvi+516.
- Abraham, R.; Marsden, J.E. Foundations of Mechanics, 2nd ed.; Advanced Book Program, Reading, Mass; Benjamin/Cummings Publishing Co., Inc.: San Francisco, CA, USA, 1978; pp. xxii+m–xvi+806.
- Marsden, J.E.; Ratiu, T.S. Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems, 2nd ed.; Texts in Applied Mathematics; Springer: New York, NY, USA, 1999; Volume 17, p. xviii+582.
- Amari, S.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000; p. x+206.
- Leok, M.; Zhang, J. Connecting Information Geometry and Geometric Mechanics. Entropy 2017, 19, 518.
- Pistone, G. Nonparametric information geometry. In Geometric Science of Information, Proceedings of the First International Conference, GSI 2013, Paris, France, 28–30 August 2013; Nielsen, F., Barbaresco, F., Eds.; Lecture Notes in Computer Science; Springer: Heidelberg, Germany, 2013; Volume 8085, pp. 5–36.
- Lang, S. Differential and Riemannian Manifolds, 3rd ed.; Graduate Texts in Mathematics; Springer: Berlin, Germany, 1995; Volume 160, p. xiv+364.
- Pistone, G. Examples of the application of nonparametric information geometry to statistical physics. Entropy 2013, 15, 4042–4065.
- Lods, B.; Pistone, G. Information Geometry Formalism for the Spatially Homogeneous Boltzmann Equation. Entropy 2015, 17, 4323–4363.
- Pistone, G.; Rogantin, M. The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli 1999, 5, 721–760.
- Efron, B.; Hastie, T. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science; Cambridge University Press: New York, NY, USA, 2016; Volume 5, p. xix+475.
- Gibilisco, P.; Pistone, G. Connections on non-parametric statistical manifolds by Orlicz space geometry. IDAQP 1998, 1, 325–347.
- Landau, L.D.; Lifshits, E.M. Course of Theoretical Physics. Statistical Physics, 3rd ed.; Butterworth-Heinemann: Oxford, UK, 1980; Volume 5.
- Shima, H. The Geometry of Hessian Structures; World Scientific Publishing Co. Pte. Ltd.: Hackensack, NJ, USA, 2007; p. xiv+246.
© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).