Lie Group Cohomology and (Multi)Symplectic Integrators: New Geometric Tools for Lie Group Machine Learning Based on Souriau Geometric Statistical Mechanics

In this paper, we describe and exploit a geometric framework for Gibbs probability densities and the associated concepts in statistical mechanics, which unifies several earlier works on the subject, including Souriau’s symplectic model of statistical mechanics, its polysymplectic extension, the Koszul model, and approaches developed in quantum information geometry. We emphasize the role of equivariance with respect to Lie group actions and the role of several concepts from geometric mechanics, such as momentum maps, Casimir functions, coadjoint orbits, and Lie-Poisson brackets with cocycles, as unifying structures appearing in various applications of this framework to information geometry and machine learning. For instance, we discuss the expression of the Fisher metric in the presence of equivariance and we exploit the property of the entropy of the Souriau model as a Casimir function to apply a geometric model for energy-preserving entropy production. We illustrate this framework with several examples including multivariate Gaussian probability densities, and the Bogoliubov-Kubo-Mori metric as a quantum version of the Fisher metric for quantum information on coadjoint orbits. We exploit this geometric setting and Lie group equivariance to present symplectic and multisymplectic variational Lie group integration schemes for some of the equations associated with Souriau symplectic and polysymplectic models, such as the Lie-Poisson equation with cocycle.


Introduction
A geometric theory of statistical mechanics was developed by Souriau [1], motivated by the observation that Gibbs equilibrium states do not satisfy the usual physical covariance assumptions. This geometric theory, which he called Lie Groups Thermodynamics, is based on a Hamiltonian action of a Lie group on a symplectic manifold, to which are associated generalized Gibbs states, indexed by a Lie algebra parameter β playing the role of a geometric (Planck) temperature. Usual Gibbs states defined from a Hamiltonian appear as special cases in which the Lie group is a one-parameter group. The generalized Gibbs states become compatible with Galileo relativity in classical mechanics. In machine learning, Gibbs posteriors show excellent performance in diverse tasks, such as classification, regression, and ranking. The usual recommendation is to sample from a Gibbs posterior using MCMC (Markov chain Monte Carlo). With the covariant Souriau Gibbs density, it is possible to extend the MCMC and Gibbs sampler approaches to Lie group machine learning.
More recently, perturbation techniques were proposed as an alternative to MCMC techniques for sampling. These results were extended to the conditional random fields loss, proving that the maximum in expectation with low-rank perturbations provides an upper bound on the log partition function (what we call the Massieu characteristic function). New lower bounds on the partition function and a new unbiased sequential sampler for the Gibbs distribution based on low-rank perturbations were introduced. All these methods are based on sampling from the Gibbs distribution and upper bounding the log partition function. These results are synthesized in [12], where a new general method is also proposed, with connections to the recently proposed Fenchel-Young losses [13], using a doubly stochastic scheme for the minimization of these losses, for unsupervised and supervised learning. This is a generalization to the Gibbs distribution.
Methods for learning the parameters of a Gibbs distribution from data $(y_i)_{i=1,\dots,n}$ are based on maximization of the likelihood, which is optimized by gradient methods using the gradient of the empirical log-likelihood, given by $\nabla_\theta \hat{\ell}_n(\theta) = \hat{y}_n - \mathbb{E}_{\mathrm{Gibbs},\theta}(y)$.
For this method of moment matching, computing the expectation of the Gibbs distribution is a challenge in some cases. This approach was replaced by a method called "perturb-and-MAP", which learns the parameters of the model using a proxy for the log-likelihood. This minimization is equivalent to maximizing the log-likelihood above with the log-partition function $\log\psi(\theta)$ replaced by $F_\varepsilon(\theta) = \mathbb{E}[F(\theta + \varepsilon V)] = \mathbb{E}[\max_{y\in C}\langle y, \theta + \varepsilon V\rangle]$, with a random noise vector $V$ and $\varepsilon > 0$. This approach can be linked with the use of Fenchel-Young losses [13]. In the perturbed model, the Fenchel-Young loss is given by $L_\varepsilon(\theta; y) = F_\varepsilon(\theta) + \Omega(y) - \langle\theta, y\rangle = D_\Omega(y, y^*_\varepsilon(\theta))$, with loss gradient $\nabla_\theta L_\varepsilon(\theta; y) = \nabla_\theta F_\varepsilon(\theta) - y = y^*_\varepsilon(\theta) - y$, where $y^*_\varepsilon(\theta) = \mathbb{E}_{p_\theta(y)}[y] = \mathbb{E}[\arg\max_{y\in C}\langle y, \theta + \varepsilon V\rangle]$, $\Omega$ is the Fenchel dual of $F_\varepsilon$, and $D_\Omega$ is the Bregman divergence associated with $\Omega$. As $F_\varepsilon$ generalizes the log-sum-exp function on the simplex, its dual $\Omega$ is a generalization of the negative entropy (which is the Fenchel dual of log-sum-exp). These connections were studied in [14].
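The perturbed maximizer and the loss gradient described above can be sketched numerically. The following is only an illustration under assumptions that are not taken from [12-14]: C is the set of one-hot vectors (vertices of the simplex), ε = 1, and V has i.i.d. standard Gumbel entries, in which case the expectation of the arg max is the softmax of θ. The function names `perturbed_argmax` and `fenchel_young_grad` are hypothetical.

```python
import numpy as np

def perturbed_argmax(theta, eps=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of y*_eps(theta) = E[argmax_{y in C} <y, theta + eps V>].

    Assumption: C is the set of one-hot vectors and V has i.i.d. standard
    Gumbel entries, so each maximizer is the one-hot vector of the largest entry.
    """
    rng = np.random.default_rng(seed)
    d = theta.shape[0]
    noise = rng.gumbel(size=(n_samples, d))
    winners = np.argmax(theta + eps * noise, axis=1)
    counts = np.bincount(winners, minlength=d)
    return counts / n_samples

def fenchel_young_grad(theta, y, **kwargs):
    """Gradient of the perturbed Fenchel-Young loss: y*_eps(theta) - y."""
    return perturbed_argmax(theta, **kwargs) - y

def softmax(theta):
    """Reference value: for Gumbel noise and eps = 1, E[argmax] = softmax(theta)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.array([1.0, 0.0, -0.5])
y = np.array([1.0, 0.0, 0.0])          # observed one-hot label (illustrative)
estimate = perturbed_argmax(theta)
grad = fenchel_young_grad(theta, y)
print("Monte Carlo estimate:", estimate)
print("softmax reference:   ", softmax(theta))
print("loss gradient:       ", grad)
```

With other noise distributions or other sets C no closed form is available in general, and only the Monte Carlo estimate remains; the Gumbel choice is made here solely so that the estimate can be compared with a known value.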
In this paper, we describe a geometric framework for the study of Gibbs probability densities in statistical mechanics and information geometry, as well as the associated concepts of thermodynamic heat, entropy, and Fisher metric, inspired by Souriau's symplectic model of statistical mechanics. This geometric framework unifies several earlier works on the subject, including Souriau's symplectic model of statistical mechanics, its polysymplectic extension, the Koszul model, and approaches developed in quantum information geometry. This approach helps to identify the common geometric structures appearing in various examples and provides a body of geometric tools for information geometry and Lie group machine learning. The emphasis is put on the role of equivariance with respect to Lie group actions. For instance, we discuss the expression of the Fisher metric in the presence of equivariance, we consider the associated Lie-Poisson equations with cocycle (also called affine Lie-Poisson equations) as well as their field theoretic versions, and we exploit the property of the entropy of the Souriau model as a Casimir function to apply a geometric model for energy-preserving entropy production on Lie algebras. In our developments, we make heavy use of several concepts from geometric mechanics, such as momentum maps, Casimir functions, coadjoint orbits, and Lie-Poisson brackets, as unifying concepts appearing in various applications of this framework to information geometry and machine learning. We consider in detail the Koszul model, the polysymplectic extension of the Souriau model, the case of multivariate Gaussian probability densities, and models of information geometry for quantum systems. We exploit the geometric framework to build geometric numerical integration schemes for some of the equations associated with Souriau's model and its polysymplectic extension.
This is achieved by identifying the variational principles underlying these equations and by discretizing these principles, following the techniques of variational discretization, which result in schemes that preserve coadjoint orbits, (multi)symplectic structures, and discrete versions of Noether theorems.
The content of the paper is as follows. In Section 2.1 we present the general geometric framework for Gibbs probability densities that will be used in the paper. In particular, we review the definition of the Massieu potential, the thermodynamic heat, the entropy, the identification of the Fisher metric with the Hessian of the Massieu potential, and the maximum entropy principle. These results are independent of the existence of Lie group symmetries of the theory. The implications of such symmetries are studied in detail in Section 2.2, where we present a Lie group equivariant setting that includes as special cases the Souriau model, its polysymplectic extension, and the case of multivariate Gaussian probability densities. The Souriau model is reviewed in Section 2.3, where we show that the associated entropy is a Casimir function for the Lie-Poisson bracket with cocycle and, motivated by an approach developed in quantum information geometry, we take advantage of this property to formulate a geometric model for entropy production. We also present the stochastic Hamiltonian equations associated with the Lie-Poisson bracket with cocycle. The polysymplectic model is reviewed in Section 2.4, where we show that the entropy also satisfies a natural extension of the Casimir property and we formulate a polysymplectic extension of the Lie-Poisson equations with cocycle. Finally, in Section 2.5 we give a general expression of the Fisher metric on orbits when equivariance is assumed. In Section 3 we apply the framework considered in Section 2 to various examples and identify common underlying geometric structures. We start in Section 3.1 with the case of multivariate Gaussian probability densities as an illustration of the general framework for which a cocycle is needed and which does not fall into the setting of the Souriau model. We apply Noether's theorem to derive invariant quantities for geodesics of the Fisher metric.
We then highlight in Section 3.2 the strong analogies with quantum information geometry by considering Lie algebras with a unitary representation, and show that the Fisher metric, as defined from the generalized heat capacity in Section 2.1, coincides with the Bogoliubov-Kubo-Mori metric. In this particular case the equation with Casimir dissipation/production reproduces a dissipative model used in quantum information geometry. Finally, in Section 3.3 we consider in detail the case of the Euclidean group of the plane SE(2), the associated Fisher metric, the Lie-Poisson equations with cocycle, and the entropy production equations. In Section 4, we make use of this geometric setting to propose geometric integrators for some of the equations associated with the Souriau model and its polysymplectic extension. We first review some facts on variational integrators on Lie groups in Section 4.1 and on central extensions of Lie groups and the associated Euler-Poincaré equations in Section 4.2. This allows us to obtain a variational formulation for the Lie-Poisson equations with cocycle. Based on this, we present a symplectic integrator for the Lie-Poisson equation with cocycle in Section 4.3 and a multisymplectic integrator for the Lie-Poisson field equations with cocycle in Section 4.4.

A Class of Generalized Gibbs Probability Densities, Its Associated Entropy and Fisher Metric
In this section, we present a general framework for Gibbs probability densities in statistical mechanics and information geometry that includes the classes considered, for instance, in the Koszul and Souriau models, as well as multivariate exponential families. In particular, we review the importance of the logarithm of the characteristic function, identified as the Massieu potential, from which the entropy arises as its Legendre transform and the Fisher information metric as its Hessian. We also discuss the relation of these Gibbs states with the maximum entropy principle. While the concepts manipulated here are standard, our aim is to organize them in a general setting that is appropriate for the developments made in this paper.
The results described in this paragraph are independent of possible Lie group symmetries of the theory whose implications will be discussed in Section 2.2.
Let $E$ be a vector space, whose elements will be denoted $\beta$ since they are generalisations of the inverse temperature. The duality pairing between elements $\nu$ of the dual space $E^*$ and elements $\beta \in E$ is denoted by $\langle \nu, \beta\rangle$. Besides the vector space $E$, the setting also involves a manifold $M$, endowed with a volume form $d\mu$.
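Let $U : M \to E^*$ be a smooth function defined on $M$ with values in $E^*$. Denote by $\Omega \subset E$ the largest open set such that for all $\beta \in \Omega$ the two integrals
$$\int_M e^{-\langle U(m), \beta\rangle}\, d\mu \quad\text{and}\quad \int_M U(m)\, e^{-\langle U(m), \beta\rangle}\, d\mu \tag{1}$$
converge. We denote by $\psi : \Omega \to \mathbb{R}$ the partition function (or characteristic function), given by
$$\psi(\beta) = \int_M e^{-\langle U(m), \beta\rangle}\, d\mu. \tag{2}$$
For all $\beta \in \Omega$, we consider the generalized Gibbs probability densities
$$p_\beta(m) = \frac{1}{\psi(\beta)}\, e^{-\langle U(m), \beta\rangle}. \tag{3}$$
For applications in information geometry it is required that $\beta \mapsto p_\beta$ is injective. It is important to note that the Gibbs densities are not defined on the whole vector space $E$ but only on the open subset $\Omega$. An element $\beta \in \Omega$ is called a geometric temperature. From now on we assume that $\Omega$ is not empty.
The Massieu potential is the function $\Phi : \Omega \to \mathbb{R}$ defined by
$$\Phi(\beta) = -\log \psi(\beta), \tag{4}$$
from which we can write the generalized Gibbs probability densities as $p_\beta(m) = e^{\Phi(\beta) - \langle U(m), \beta\rangle}$. The thermodynamic heat $Q : \Omega \to E^*$ is the first derivative of the Massieu potential, i.e.,
$$Q(\beta) = D\Phi(\beta) = \int_M U(m)\, p_\beta(m)\, d\mu = \mathbb{E}_\beta(U), \tag{5}$$
where $\mathbb{E}_\beta$ denotes the expectation with respect to $p_\beta$.
We denote by $\Omega^*$ the image of $\Omega$ under $Q$ and assume that $Q = D\Phi : \Omega \to \Omega^*$ is a diffeomorphism. In this case, we can define the entropy $s : \Omega^* \to \mathbb{R}$ as the Legendre transform of the Massieu potential $\Phi : \Omega \to \mathbb{R}$, namely
$$s(\nu) = \langle \nu, \beta\rangle - \Phi(\beta), \tag{6}$$
where $\beta = Q^{-1}(\nu)$. In other words, $\beta \in \Omega$ in (6) is such that $D\Phi(\beta) = \nu$. The name entropy for this Legendre transform is justified by the following result.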

Lemma 1.
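For every $\beta \in \Omega$, we have the equality
$$s(Q(\beta)) = S(p_\beta),$$
where $Q(\beta)$ is the thermodynamic heat and $S(p) = -\int_M p \log p \, d\mu$ is the entropy of the probability density $p$.
Proof. On one hand, using the definition of $s$ in Equation (6) and $\Phi$ in Equation (4), we have
$$s(Q(\beta)) = \langle Q(\beta), \beta\rangle - \Phi(\beta) = \langle Q(\beta), \beta\rangle + \log\psi(\beta).$$
On the other hand, we compute
$$S(p_\beta) = -\int_M p_\beta \log p_\beta\, d\mu = \int_M p_\beta \big(\langle U, \beta\rangle + \log\psi(\beta)\big)\, d\mu = \langle Q(\beta), \beta\rangle + \log\psi(\beta).$$
These expressions are equal.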
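As a minimal numerical sketch of Lemma 1, consider the following one-dimensional case, which is an illustrative assumption and not an example from the paper: M = (0, ∞) with the Lebesgue measure, E = R, and U(m) = m. Then Ω = (0, ∞), ψ(β) = 1/β, Q(β) = 1/β, and s(ν) = 1 + log ν. The integrals are approximated on a truncated grid with a hand-written trapezoidal rule.

```python
import numpy as np

def trapezoid(f, x):
    """Composite trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

beta = 2.0                               # a geometric temperature in Omega = (0, inf)
m = np.linspace(0.0, 40.0, 400001)       # truncation of M = (0, inf)
U = m                                    # assumed energy function U(m) = m

weights = np.exp(-U * beta)
psi = trapezoid(weights, m)              # partition function, exact value 1/beta
Phi = -np.log(psi)                       # Massieu potential
p = weights / psi                        # generalized Gibbs density p_beta
Q = trapezoid(U * p, m)                  # thermodynamic heat E_beta(U), exact value 1/beta

# Legendre transform evaluated at nu = Q(beta)
s_legendre = Q * beta - Phi
# Entropy -int p log p, using log p = -U*beta - log(psi) to avoid log(0)
log_p = -U * beta - np.log(psi)
S_entropy = -trapezoid(p * log_p, m)

print(psi, Q, s_legendre, S_entropy, 1.0 + np.log(Q))
```

The two entropy values agree up to rounding, and both match the closed form 1 + log Q(β) of this particular example.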
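The Koszul 1-form [18], defined as the differential of $-\log\psi(\beta)$, coincides with the thermodynamic heat $Q : \Omega \to \Omega^*$ of the general setting above. It reads
$$Q(\beta) = -D\log\psi(\beta) = \int_{\Omega^*} m\, p_\beta(m)\, dm.$$
The Koszul metric, defined as the second derivative of $\log\psi(\beta)$, coincides with the Fisher metric of information geometry from Proposition 2. It reads
$$I(\beta)(\delta\beta_1, \delta\beta_2) = D^2\log\psi(\beta)(\delta\beta_1, \delta\beta_2) = \mathbb{E}_\beta\big[\langle m - Q(\beta), \delta\beta_1\rangle \langle m - Q(\beta), \delta\beta_2\rangle\big].$$
From Proposition 4, given $\nu \in \Omega^*$, the Koszul density of the cone $\Omega$, $p_\beta(m) = \frac{1}{\psi(\beta)} e^{-\langle m, \beta\rangle}$ with $\beta = Q^{-1}(\nu)$, satisfies the maximum entropy principle
$$s(\nu) = \max_{p}\Big\{ -\int_{\Omega^*} p \log p\, dm \;:\; \int_{\Omega^*} p\, dm = 1,\ \int_{\Omega^*} m\, p\, dm = \nu \Big\},$$
see [3] for a direct proof.
An important example is $\Omega := \mathrm{sym}_+(n) \subset E = \mathrm{sym}(n)$, the cone of symmetric positive definite $n \times n$ matrices. The dual space is chosen as $E^* = \mathrm{sym}(n)$ with duality pairing $\langle \nu, \beta\rangle = \mathrm{Tr}(\nu^T\beta)$. In this case, it is well known that $\Omega^* = \Omega$. The generalized Gibbs probability densities are
$$p_\beta(m) = \frac{1}{\psi(\beta)}\, e^{-\mathrm{Tr}(m\beta)},$$
where the Koszul-Vinberg characteristic function can be explicitly computed as
$$\psi(\beta) = \int_{\mathrm{sym}_+(n)} e^{-\mathrm{Tr}(m\beta)}\, dm = \det(\beta)^{-\frac{n+1}{2}}\, \psi(I_n).$$
The Massieu potential is deduced as
$$\Phi(\beta) = -\log(\psi(\beta)) = \frac{n+1}{2}\log(\det(\beta)) - \log(\psi(I_n)) \tag{10}$$
and the thermodynamic heat and entropy are
$$Q(\beta) = \frac{n+1}{2}\,\beta^{-1}, \qquad s(\nu) = \frac{n+1}{2}\log(\det(\nu)) + s(I_n).$$
We can thus write the generalized Gibbs probability densities as
$$p_\beta(m) = \frac{\det(\beta)^{\frac{n+1}{2}}}{\psi(I_n)}\, e^{-\mathrm{Tr}(m\beta)}.$$
Finally, the expression of the Fisher metric on $\Omega$ is found by using Equation (10) as
$$I(\beta)(\delta\beta_1, \delta\beta_2) = -D^2\Phi(\beta)(\delta\beta_1, \delta\beta_2) = \frac{n+1}{2}\,\mathrm{Tr}\big(\beta^{-1}\delta\beta_1\,\beta^{-1}\delta\beta_2\big)$$
for every $\delta\beta_1, \delta\beta_2 \in E$.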
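The formulas for the cone of symmetric positive definite matrices can be checked by finite differences. The sketch below assumes, from Equation (10), that Φ(β) equals ((n+1)/2) log det β up to an additive constant, and compares its numerical derivatives with the closed forms Q(β) = ((n+1)/2) β⁻¹ and I(β)(δβ₁, δβ₂) = ((n+1)/2) Tr(β⁻¹δβ₁β⁻¹δβ₂), taking the Fisher metric as minus the Hessian of Φ. All function names are hypothetical.

```python
import numpy as np

def massieu(beta):
    """Phi(beta) up to the additive constant -log psi(I_n): (n+1)/2 * log det(beta)."""
    n = beta.shape[0]
    _, logdet = np.linalg.slogdet(beta)
    return 0.5 * (n + 1) * logdet

def heat(beta):
    """Q(beta) = (n+1)/2 * beta^{-1}, for the pairing <nu, beta> = Tr(nu^T beta)."""
    n = beta.shape[0]
    return 0.5 * (n + 1) * np.linalg.inv(beta)

def fisher(beta, d1, d2):
    """I(beta)(d1, d2) = (n+1)/2 * Tr(beta^{-1} d1 beta^{-1} d2)."""
    n = beta.shape[0]
    binv = np.linalg.inv(beta)
    return 0.5 * (n + 1) * np.trace(binv @ d1 @ binv @ d2)

def sym(a):
    """Symmetric part of a square matrix."""
    return 0.5 * (a + a.T)

rng = np.random.default_rng(1)
n = 3
a = rng.normal(size=(n, n))
beta = a @ a.T + n * np.eye(n)            # a point of sym_+(n)
d1 = sym(rng.normal(size=(n, n)))         # tangent directions in sym(n)
d2 = sym(rng.normal(size=(n, n)))

h = 1e-4
# Directional derivative D Phi(beta).d1 by central differences
dPhi_fd = (massieu(beta + h * d1) - massieu(beta - h * d1)) / (2 * h)
dPhi_exact = np.trace(heat(beta).T @ d1)
# Mixed second derivative D^2 Phi(beta)(d1, d2) by central differences
d2Phi_fd = (massieu(beta + h * d1 + h * d2) - massieu(beta + h * d1 - h * d2)
            - massieu(beta - h * d1 + h * d2) + massieu(beta - h * d1 - h * d2)) / (4 * h * h)

print(dPhi_fd, dPhi_exact)
print(-d2Phi_fd, fisher(beta, d1, d2))
```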

Equivariance with Respect to Lie Group Actions
In this section, we study the consequences of the equivariance of the function U appearing in the generalized Gibbs probability densities. More precisely, given a Lie group G, we assume that U : M → E * is G-equivariant with respect to an action of the Lie group on M and an affine action of the Lie group on E * . This setting includes as special cases the Souriau symplectic model of statistical mechanics [24], its polysymplectic extension [5], the case of multivariate Gaussian densities, as treated for instance in [4], and approaches developed in quantum information geometry [25], for which the Fisher metric will be shown to coincide with the Bogoliubov-Kubo-Mori metric in Section 3.2.
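Let $G$ be a Lie group, and let
$$\varphi : G \times M \to M, \qquad (g, m) \mapsto \varphi_g(m)$$
be a left action of $G$ on $M$, i.e., $\varphi$ satisfies $\varphi_g \circ \varphi_h = \varphi_{gh}$ and $\varphi_e = \mathrm{id}_M$ for every $g, h \in G$, with $\mathrm{id}_M$ the identity on $M$. We denote by $\mathfrak{g}$ the Lie algebra of $G$. The infinitesimal generator of the action corresponding to $\xi \in \mathfrak{g}$ is the vector field $\xi_M$ on $M$ defined by
$$\xi_M(m) = \left.\frac{d}{dt}\right|_{t=0} \varphi_{\exp(t\xi)}(m) \tag{12}$$
for every $m \in M$, where $\exp : \mathfrak{g} \to G$ is the Lie group exponential map. We also consider a left linear action of $G$ on the vector space $E$, $g \mapsto \rho_g \in L(E, E)$. We denote by $\rho^*_g \in L(E^*, E^*)$ the dual map, defined by $\langle \rho^*_g(\nu), \beta\rangle = \langle \nu, \rho_g(\beta)\rangle$, so that $g \mapsto \rho^*_{g^{-1}}$ is a left linear action of $G$ on $E^*$. We recall that a group one-cocycle with respect to $\rho^*$ is a map $\theta \in C^\infty(G, E^*)$ such that
$$\theta(gh) = \theta(g) + \rho^*_{g^{-1}}(\theta(h))$$
for every $g, h \in G$. Equivalently, a group one-cocycle $\theta \in C^\infty(G, E^*)$ with respect to $\rho^*$ is such that
$$\nu \mapsto \rho^*_{g^{-1}}(\nu) + \theta(g)$$
is an affine left action of $G$ on $E^*$. Finally, we recall that the Jacobian of the action $\varphi_g : M \to M$ relative to the volume form $d\mu$ is the function $J\varphi_g : M \to \mathbb{R}$ defined by $\varphi_g^* d\mu = J\varphi_g\, d\mu$, where $\varphi_g^*$ denotes the pull-back of the $n$-form $d\mu$ by the diffeomorphism $\varphi_g$. We will be interested in actions for which $J\varphi_g = c(g)$ is a constant on $M$. Please note that $c(gh) = c(g)c(h)$ for every $g, h \in G$. The particular case $c(g) = 1$ corresponds to volume preserving diffeomorphisms.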

Proposition 5.
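Assume that the action $\varphi$ of $G$ on $M$ satisfies $\varphi_g^* d\mu = c(g)\, d\mu$ and the function $U$ is $G$-equivariant:
$$U(\varphi_g(m)) = \rho^*_{g^{-1}}(U(m)) + \theta(g) \tag{16}$$
for all $g \in G$ and $m \in M$, where $\theta \in C^\infty(G, E^*)$ is a group one-cocycle. Then the open subset $\Omega \subset E$ is invariant under the action of $G$ on $E$, the partition function $\psi$ satisfies
$$\psi(\rho_g(\beta)) = \psi(\beta)\, c(g)\, e^{\langle \theta(g^{-1}), \beta\rangle}$$
for every $g \in G$, and the probability density $p_\beta$ satisfies $\varphi_g^*\, p_{\rho_g(\beta)} = p_\beta$, where $\varphi_g^* p = (p \circ \varphi_g)\, J\varphi_g$ is the pull-back of a density. As a consequence, the Massieu potential $\Phi(\beta)$, the thermodynamic heat $Q(\beta)$, the entropy $s(\nu)$, and the heat capacity $K(\beta)$ satisfy the following equivariance properties
$$\begin{aligned}
\Phi(\rho_g(\beta)) &= \Phi(\beta) - \log(c(g)) - \langle \theta(g^{-1}), \beta\rangle \\
Q(\rho_g(\beta)) &= \rho^*_{g^{-1}}(Q(\beta)) + \theta(g) \\
s(\rho^*_{g^{-1}}(\nu) + \theta(g)) &= s(\nu) + \log(c(g)) \\
\langle K(\rho_g(\beta)) \cdot \rho_g(\delta\beta_1), \rho_g(\delta\beta_2)\rangle &= \langle K(\beta) \cdot \delta\beta_1, \delta\beta_2\rangle
\end{aligned} \tag{17}$$
for every $g \in G$.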
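Proof. Using Equation (16) and the change of variables $m = \varphi_g(m')$, we have
$$\psi(\rho_g(\beta)) = \int_M e^{-\langle U(\varphi_g(m')), \rho_g(\beta)\rangle}\, c(g)\, d\mu(m') = c(g)\, e^{-\langle \theta(g), \rho_g(\beta)\rangle} \int_M e^{-\langle U(m'), \beta\rangle}\, d\mu(m') = \psi(\beta)\, c(g)\, e^{\langle \theta(g^{-1}), \beta\rangle},$$
where we used $\rho^*_g(\theta(g)) = -\theta(g^{-1})$, which follows from the cocycle identity. The other statements are checked in a similar way, by using Equations (13)-(16).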
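A small closed-form check of these equivariance properties can be written for a one-dimensional Gaussian family. The setting below is an illustrative assumption, not the paper's treatment of the multivariate Gaussian case: M = R with the Lebesgue measure, U(m) = (m, m²), and G = R acting by translations m ↦ m + a, so that c(g) = 1. The linear action `rho`, its dual `rho_dual_inv`, and the cocycle `theta` are derived from U(m + a) = (m + a, m² + 2am + a²).

```python
import numpy as np

def psi(beta):
    """Partition function int exp(-b1 m - b2 m^2) dm over R, defined for b2 > 0."""
    b1, b2 = beta
    return np.sqrt(np.pi / b2) * np.exp(b1 ** 2 / (4.0 * b2))

def heat(beta):
    """Q(beta) = E_beta(U) = (mean, second moment) of the Gaussian density p_beta."""
    b1, b2 = beta
    mean, var = -b1 / (2.0 * b2), 1.0 / (2.0 * b2)
    return np.array([mean, var + mean ** 2])

def rho(a, beta):
    """Linear action rho_g on E for the translation g = a."""
    b1, b2 = beta
    return np.array([b1 - 2.0 * a * b2, b2])

def rho_dual_inv(a, nu):
    """Dual action rho*_{g^{-1}} on E* for g = a."""
    n1, n2 = nu
    return np.array([n1, n2 + 2.0 * a * n1])

def theta(a):
    """One-cocycle in U(m + a) = rho*_{g^{-1}}(U(m)) + theta(a)."""
    return np.array([a, a ** 2])

a, b = 0.7, -0.4
beta = np.array([0.3, 1.5])

# psi(rho_g(beta)) = psi(beta) c(g) exp(<theta(g^{-1}), beta>), with c(g) = 1
psi_lhs = psi(rho(a, beta))
psi_rhs = psi(beta) * np.exp(theta(-a) @ beta)

# Q(rho_g(beta)) = rho*_{g^{-1}}(Q(beta)) + theta(g)
Q_lhs = heat(rho(a, beta))
Q_rhs = rho_dual_inv(a, heat(beta)) + theta(a)

# Cocycle identity theta(gh) = theta(g) + rho*_{g^{-1}}(theta(h))
cocycle_lhs = theta(a + b)
cocycle_rhs = theta(a) + rho_dual_inv(a, theta(b))

print(psi_lhs, psi_rhs)
print(Q_lhs, Q_rhs)
print(cocycle_lhs, cocycle_rhs)
```

In this example the cocycle cannot be removed: translating the mean shifts both the first and the second moment, which is exactly the affine term θ(a).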
This proposition unifies in a single statement, several Lie group equivariance properties observed in several models for information geometry and Lie group machine learning, see, e.g., [3][4][5]7,8]. Before discussing the symplectic and polysymplectic models we illustrate below these equivariance properties for the Koszul model recalled above.
Equivariance in the Koszul model. For the Koszul model recalled above, see [3] and references therein, $G = \mathrm{Aut}(\Omega)$ is the group of linear isomorphisms that preserve $\Omega \subset E$. Given $g \in \mathrm{Aut}(\Omega)$, we have $\rho_g : \Omega \to \Omega$ and it is clear that the dual action $\rho^*_g$ preserves the dual cone $\Omega^*$. In this very special case, $M = \Omega^*$ and the $G$ action on $M$ is chosen as $\varphi_g := \rho^*_{g^{-1}}$. Since $U : \Omega^* \to E^*$ is the identity, there is no cocycle. However, we have $c(g) = J\varphi_g$, which is not equal to one in general and, for instance, the transformation Equation (17) of the Massieu potential reads $\Phi(\rho_g(\beta)) = \Phi(\beta) - \log(c(g))$.
Let us consider as a special case the cone of symmetric positive definite matrices $\Omega = \mathrm{sym}_+(n) \subset E = \mathrm{sym}(n)$. The dual space is chosen as $E^* = \mathrm{sym}(n)$ with duality pairing $\langle \nu, \beta\rangle = \mathrm{Tr}(\nu^T\beta)$ and we have $\Omega^* = \Omega$.
We consider the left action of GL(n) on E = sym(n) given by Therefore, we have Proposition 5 directly yields the following equivariance properties

Souriau Symplectic Model of Statistical Mechanics
In this section, we show that the Souriau symplectic model of statistical mechanics [24] arises as a special case of the preceding setting, by considering (M, ω) a symplectic manifold and dµ the Liouville form associated with ω.
We then exploit this setting to show that the entropy in the Souriau model is a Casimir function of the Lie-Poisson bracket with Lie algebra cocycle associated with the nonequivariance cocycle of the momentum map, i.e., it Poisson commutes with every function. Based on this we formulate a dynamical geometric model for dissipation/production of this Casimir, following the Lie algebraic setting proposed in [26,27]. This allows us to clarify the link between the geometry underlying Souriau symplectic models and that underlying models proposed in [25] in the framework of quantum physics by information geometry for some Lie algebras, see also [28]. Details will be given in Section 3.2. Finally, we present a stochastic perturbation of the Lie-Poisson equations with cocycle within the setting of stochastic Hamiltonian dynamics.
To present the Souriau model, we first quickly recall below the notion of momentum map and nonequivariance cocycle for symplectic manifolds, see, e.g., [29][30][31]. Consider a symplectic manifold $(M, \omega)$, i.e., a manifold $M$ endowed with a closed non-degenerate two-form $\omega$. The associated Liouville form is $d\mu = \frac{(-1)^{n(n-1)/2}}{n!}\, \omega \wedge \dots \wedge \omega$ ($n$ times), where $2n = \dim M$. Given a function $h : M \to \mathbb{R}$, the Hamiltonian vector field associated with $h$ is the vector field $X_h$ defined by
$$i_{X_h}\omega = dh.$$
Recall that the symplectic form $\omega$ defines the Poisson bracket (see Remark 7)
$$\{f, g\} = \omega(X_f, X_g) \tag{24}$$
on functions $f, g \in C^\infty(M)$.
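A Lie group action $\varphi : G \times M \to M$ of $G$ on $M$ is symplectic if it preserves the symplectic form, i.e., $\varphi_g^*\omega = \omega$ for every $g \in G$. Taking the derivative of this identity with respect to $g$ at $g = e$, we get $\pounds_{\xi_M}\omega = 0$ for every $\xi \in \mathfrak{g}$, where $\xi_M$ is the infinitesimal generator associated with the Lie algebra element $\xi \in \mathfrak{g}$, see Equation (12), and $\pounds$ is the Lie derivative. Equivalently, since $\omega$ is closed, we have
$$d(i_{\xi_M}\omega) = 0$$
for every $\xi \in \mathfrak{g}$, i.e., the one-form $i_{\xi_M}\omega$ is closed, hence locally exact. If it is globally exact, i.e., if $\xi_M$ is a Hamiltonian vector field for every $\xi \in \mathfrak{g}$, then the action is called Hamiltonian and admits a momentum map $J : M \to \mathfrak{g}^*$, which satisfies
$$i_{\xi_M}\omega = d\langle J, \xi\rangle \quad\text{for every } \xi \in \mathfrak{g}.$$
When $M$ is connected, there is a well-defined group one-cocycle $\theta : G \to \mathfrak{g}^*$, called the nonequivariance cocycle, given by
$$\theta(g) = J(\varphi_g(m)) - \mathrm{Ad}^*_{g^{-1}} J(m),$$
where $m \in M$ can be arbitrarily chosen. It characterizes the nonequivariance of the momentum map with respect to the action of $G$ on $M$ and the coadjoint action of $G$ on $\mathfrak{g}^*$. The group one-cocycle property is
$$\theta(gh) = \theta(g) + \mathrm{Ad}^*_{g^{-1}}\theta(h)$$
for every $g, h \in G$. We consider its differential $\Theta := T_e\theta : \mathfrak{g} \to \mathfrak{g}^*$, seen as a map $\Theta : \mathfrak{g} \times \mathfrak{g} \to \mathbb{R}$, $\Theta(\xi, \eta) = \langle T_e\theta(\xi), \eta\rangle$. Taking the derivative of the relation above, we get
$$\Theta(\xi, \eta) = \langle J, [\xi, \eta]\rangle - \{\langle J, \xi\rangle, \langle J, \eta\rangle\},$$
where the last term uses the Poisson bracket Equation (24). The map $\Theta : \mathfrak{g} \times \mathfrak{g} \to \mathbb{R}$ is bilinear, skew-symmetric, and, as can be readily verified, satisfies the Lie algebra two-cocycle identity
$$\Theta([\xi, \eta], \zeta) + \Theta([\eta, \zeta], \xi) + \Theta([\zeta, \xi], \eta) = 0.$$
We refer to [29][30][31] for detailed introductions to these concepts.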
Remark 6 (Lie group and Lie algebra cohomology). A group one-cocycle $\theta \in C^\infty(G, \mathfrak{g}^*)$ is called a group one-coboundary if there is a $\lambda \in \mathfrak{g}^*$ such that
$$\theta(g) = \lambda - \mathrm{Ad}^*_{g^{-1}}\lambda$$
for every $g \in G$. The quotient space of one-cocycles modulo one-coboundaries is called the first group cohomology of $G$ and is denoted by $H^1(G, \mathfrak{g}^*)$. These definitions extend to an arbitrary representation of $G$ on a vector space, as in Equation (14). A Lie algebra two-cocycle $\Theta$ is called a Lie algebra two-coboundary if there is $\lambda \in \mathfrak{g}^*$ such that
$$\Theta(\xi, \eta) = \langle \lambda, [\xi, \eta]\rangle$$
for all $\xi, \eta \in \mathfrak{g}$. The quotient space of Lie algebra two-cocycles by Lie algebra two-coboundaries is called the second Lie algebra cohomology of $\mathfrak{g}$ and is denoted by $H^2(\mathfrak{g}, \mathbb{R})$.
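As a short worked illustration of a nonequivariance cocycle that is not a coboundary, consider translations of the plane. This example is not taken from the paper, and it assumes the conventions $i_{X_h}\omega = dh$, $\{f, g\} = \omega(X_f, X_g)$, $i_{\xi_M}\omega = d\langle J, \xi\rangle$, and $\Theta(\xi, \eta) = \langle T_e\theta(\xi), \eta\rangle$.

```latex
% Translations of the plane acting on (R^2, dq ^ dp); conventions as stated in the lead-in.
\begin{align*}
&M = \mathbb{R}^2 \ni (q,p), \qquad \omega = dq \wedge dp, \qquad
  G = \mathbb{R}^2, \qquad \varphi_{(a,b)}(q,p) = (q+a,\; p+b), \\
&\xi_M = \xi_1\,\partial_q + \xi_2\,\partial_p, \qquad
  i_{\xi_M}\omega = \xi_1\,dp - \xi_2\,dq = d(\xi_1 p - \xi_2 q)
  \;\Longrightarrow\; \langle J(q,p), \xi\rangle = \xi_1 p - \xi_2 q, \\
&\theta(a,b) = J(q+a,\,p+b) - \mathrm{Ad}^*_{(a,b)^{-1}} J(q,p) = (b,\,-a)
  \qquad (\mathrm{Ad}^* \text{ is trivial since } G \text{ is abelian}), \\
&\Theta(\xi,\eta) = \langle T_e\theta(\xi), \eta\rangle = \xi_2\eta_1 - \xi_1\eta_2
  = -\{\langle J,\xi\rangle, \langle J,\eta\rangle\}.
\end{align*}
```

Since the Lie algebra is abelian, every two-coboundary $\langle \lambda, [\xi, \eta]\rangle$ vanishes, so the non-zero $\Theta$ above represents a non-trivial class in $H^2(\mathfrak{g}, \mathbb{R})$ and no choice of momentum map can remove it.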

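Souriau Symplectic Model of Statistical Mechanics
The Souriau symplectic model of statistical mechanics is obtained by considering the following specific situation in the setting described in Section 2.2:
$M$ : a symplectic manifold
$d\mu$ : the Liouville volume
$\varphi_g$ : a Hamiltonian action
$E = \mathfrak{g}$ : the Lie algebra of $G$
$\rho_g = \mathrm{Ad}_g$ : the adjoint action of $G$ on $\mathfrak{g}$
$U = J : M \to \mathfrak{g}^*$ : a momentum map.
In particular, the thermodynamic heat becomes $Q(\beta) = \mathbb{E}_\beta(J)$ and the Fisher metric on $\Omega \subset \mathfrak{g}$ is
$$I(\beta)(\delta\beta_1, \delta\beta_2) = \langle K(\beta) \cdot \delta\beta_1, \delta\beta_2\rangle = -D^2\Phi(\beta)(\delta\beta_1, \delta\beta_2).$$
Since a symplectic action preserves the Liouville volume, $c(g) = 1$ and Proposition 5 directly yields the following equivariance properties
$$\Phi(\mathrm{Ad}_g\beta) = \Phi(\beta) - \langle \theta(g^{-1}), \beta\rangle \tag{28}$$
$$Q(\mathrm{Ad}_g\beta) = \mathrm{Ad}^*_{g^{-1}}(Q(\beta)) + \theta(g) \tag{29}$$
$$s(\mathrm{Ad}^*_{g^{-1}}\nu + \theta(g)) = s(\nu) \tag{30}$$
$$\langle K(\mathrm{Ad}_g\beta) \cdot \mathrm{Ad}_g\delta\beta_1, \mathrm{Ad}_g\delta\beta_2\rangle = \langle K(\beta) \cdot \delta\beta_1, \delta\beta_2\rangle,$$
for every $g \in G$. Note also that $\Omega^*$ is invariant under the affine action $\nu \mapsto \mathrm{Ad}^*_{g^{-1}}\nu + \theta(g)$. From Proposition 4, given $\nu \in \Omega^* \subset \mathfrak{g}^*$, the generalized Gibbs probability density $p_\beta$ with $\beta = Q^{-1}(\nu)$ satisfies the maximum entropy principle. We refer to [2] for a detailed presentation of Souriau's model and to [32] for recent developments exploiting it. As mentioned earlier in the general case, it is important to note that the generalized Gibbs densities are not defined on the whole Lie algebra $\mathfrak{g}$ but only on the open subset $\Omega \subset \mathfrak{g}$ of geometric temperatures. As already observed by Souriau, the set $\Omega$ can be empty in some examples, such as the case of the action of the Galilean group. In this case, Souriau's method considers Gibbs densities associated with one-parameter subgroups of the acting Lie group.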

Lie-Poisson Equations with Cocycle and Property of the Entropy in Souriau's Model
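From Equation (30), we note that the entropy $s$ is constant on the affine coadjoint orbits defined by
$$\mathcal{O} = \{\mathrm{Ad}^*_{g^{-1}}\mu_0 + \theta(g) \mid g \in G\} \tag{32}$$
for $\mu_0 \in \mathfrak{g}^*$. It is well known that affine coadjoint orbits are symplectic manifolds, with symplectic form given by
$$\omega_{\mathcal{O}}(\mu)\big(\mathrm{ad}^*_\xi\mu - \Theta(\xi, \cdot),\; \mathrm{ad}^*_\eta\mu - \Theta(\eta, \cdot)\big) = \langle \mu, [\xi, \eta]\rangle - \Theta(\xi, \eta) \tag{33}$$
for $\mu \in \mathcal{O}$, $\xi, \eta \in \mathfrak{g}$, where $\langle \mathrm{ad}^*_\xi\mu, \eta\rangle = \langle \mu, [\xi, \eta]\rangle$. This is an extension to the affine case of the well-known Kirillov-Kostant-Souriau symplectic form on coadjoint orbits. The connected components of the affine coadjoint orbits Equation (32) are the symplectic leaves in the Poisson manifold $(\mathfrak{g}^*, \{\cdot, \cdot\}_\Theta)$, where
$$\{f, g\}_\Theta(\mu) = \Big\langle \mu, \Big[\frac{\delta f}{\delta\mu}, \frac{\delta g}{\delta\mu}\Big]\Big\rangle - \Theta\Big(\frac{\delta f}{\delta\mu}, \frac{\delta g}{\delta\mu}\Big) \tag{34}$$
is the Lie-Poisson bracket with cocycle, see, e.g., [29]. The Hamiltonian system (see Remark 7) associated with the Lie-Poisson bracket with cocycle Equation (34) and with a given Hamiltonian function $h : \mathfrak{g}^* \to \mathbb{R}$ is given by the Lie-Poisson equations with cocycle (or affine Lie-Poisson equations)
$$\frac{d}{dt} f = \{f, h\}_\Theta \quad\text{for every } f, \tag{35}$$
which yield the dynamical system
$$\frac{d}{dt}\mu + \mathrm{ad}^*_{\frac{\delta h}{\delta\mu}}\mu = \Theta\Big(\frac{\delta h}{\delta\mu}, \cdot\Big) \tag{36}$$
for a curve $\mu(t) \in \mathfrak{g}^*$. This dynamical system preserves each affine coadjoint orbit Equation (32) and defines on each of them a Hamiltonian system with respect to the symplectic form Equation (33). The Lie-Poisson equations with cocycle have important applications; in particular, they appear in the geometric formulation of complex fluids, see [33][34][35], and geometrically exact (Cosserat) rods, see [36,37]. See [38] for another point of view on Lie-Poisson equations with cocycle. These equations are also referred to as Lie-Poisson equations with non-zero cohomology.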
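A minimal numerical sketch of a Lie-Poisson equation with cocycle is given below, under assumptions that are not taken from the paper: the Lie algebra is the abelian algebra R² (so the coadjoint term vanishes), the two-cocycle is Θ(ξ, η) = ξ₂η₁ − ξ₁η₂, the Hamiltonian is h(μ) = |μ|²/2, and the equation is taken in the form dμ/dt + ad*μ = Θ(δh/δμ, ·). The implicit midpoint rule is used purely for illustration; it is not one of the variational integrators of Section 4.

```python
import numpy as np

def theta_flat(xi):
    """Covector Theta(xi, .) for the assumed cocycle Theta(xi, eta) = xi_2 eta_1 - xi_1 eta_2."""
    return np.array([xi[1], -xi[0]])

def hamiltonian(mu):
    """Assumed Hamiltonian h(mu) = |mu|^2 / 2."""
    return 0.5 * float(mu @ mu)

def grad_h(mu):
    """delta h / delta mu for h(mu) = |mu|^2 / 2."""
    return mu

def midpoint_step(mu, dt, n_iter=20):
    """One implicit midpoint step for d mu/dt = Theta(dh/dmu, .); ad* = 0 since g is abelian."""
    new = mu.copy()
    for _ in range(n_iter):                 # fixed-point iteration, contractive for small dt
        mid = 0.5 * (mu + new)
        new = mu + dt * theta_flat(grad_h(mid))
    return new

mu0 = np.array([1.0, 0.0])
dt, n_steps = 0.01, 1000
mu = mu0.copy()
for _ in range(n_steps):
    mu = midpoint_step(mu, dt)

# Exact flow of d mu/dt = (mu_2, -mu_1): a rotation of the plane
t = dt * n_steps
exact = np.array([np.cos(t) * mu0[0] + np.sin(t) * mu0[1],
                  -np.sin(t) * mu0[0] + np.cos(t) * mu0[1]])

print("energy drift:", hamiltonian(mu) - hamiltonian(mu0))
print("error vs exact flow:", np.linalg.norm(mu - exact))
```

In this toy case the whole plane is a single affine coadjoint orbit, and the midpoint rule preserves the quadratic Hamiltonian up to rounding, which is why the energy drift stays at machine precision while the phase error grows slowly.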
Remark 7 (Poisson brackets and reduction, see [31]). Recall that a Poisson bracket on a manifold $M$ is a Lie algebra structure $\{\cdot, \cdot\}$ on $C^\infty(M)$ which is a derivation in each factor:
$$\{fg, h\} = f\{g, h\} + \{f, h\}g \quad\text{for all } f, g, h \in C^\infty(M).$$
An important example is the Lie-Poisson bracket
$$\{f, g\}(\mu) = \Big\langle \mu, \Big[\frac{\delta f}{\delta\mu}, \frac{\delta g}{\delta\mu}\Big]\Big\rangle \tag{37}$$
on the dual of any Lie algebra $\mathfrak{g}$, as well as its affine modified version Equation (34) by a two-cocycle $\Theta$.
The Hamiltonian system associated with a Poisson bracket and a given Hamiltonian $h \in C^\infty(M)$ is the dynamical system characterized by the condition
$$\frac{d}{dt} f = \{f, h\}$$
for every function $f \in C^\infty(M)$, see for instance Equations (35) and (36). An important point for applications in mechanics is the understanding of such Poisson structures as being induced from a canonical symplectic form (or, equivalently, from the associated canonical Poisson bracket) on a cotangent bundle, via reduction by symmetry relative to a Lie group action. This is the case for the Lie-Poisson bracket Equation (37), which is induced by the canonical symplectic form on $T^*G$ and the action of $G$ on $T^*G$ given by the cotangent lifted action of right or left translation. The Lie-Poisson bracket with cocycle Equation (34) is induced by the canonical symplectic form on $T^*G$ and an affine modified cotangent lifted action of right or left translation ([33]).
Corollary 8. The entropy $s$ of the Souriau model is a Casimir function for the Lie-Poisson bracket with cocycle Equation (34), i.e., $\{s, f\}_\Theta = 0$ for every smooth function $f : \mathfrak{g}^* \to \mathbb{R}$.
Proof. From Equation (30), we have $s(\mathrm{Ad}^*_{g^{-1}}\nu + \theta(g)) = s(\nu)$ for every $g \in G$. Taking the derivative at $g = e$ in the direction $\xi \in \mathfrak{g}$ gives
$$\Big\langle -\mathrm{ad}^*_\xi\nu + \Theta(\xi, \cdot), \frac{\delta s}{\delta\nu}\Big\rangle = 0 \quad\text{for every } \xi \in \mathfrak{g},$$
which is the Casimir property.
As a consequence of the above, the information manifold foliates into level sets of the entropy, each containing a family of coadjoint orbits, which can be interpreted thermodynamically: motion remaining on these level sets is non-dissipative, whereas motion transversal to these level sets is dissipative. The affine Kirillov-Kostant-Souriau form makes each orbit into a homogeneous symplectic manifold. Hamiltonian motion on these affine coadjoint orbits is given by the solutions of the Lie-Poisson equations with cocycle Equation (36). We shall present below a geometric way to introduce dissipation and hence motion through affine coadjoint orbits.

Elementary examples.
A particularly simple case of the Souriau symplectic model is when the symplectic manifold is a cotangent bundle $M = T^*Q$ endowed with the canonical symplectic form. Let $G$ be a Lie group acting on the left on $Q$. Then its cotangent lifted action on $T^*Q$ is symplectic and admits the momentum map $J : T^*Q \to \mathfrak{g}^*$ given by
$$\langle J(\alpha_q), \xi\rangle = \langle \alpha_q, \xi_Q(q)\rangle, \qquad \alpha_q \in T^*_qQ,\ \xi \in \mathfrak{g}.$$
In this case, there is no cocycle, which yields obvious simplifications in the properties Equations (28)-(30).
Another case without cocycle is when $M$ is an affine coadjoint orbit $M = \mathcal{O} = \{\mathrm{Ad}^*_{g^{-1}}\mu + \theta(g) \mid g \in G\}$ endowed with the symplectic form Equation (33). In this case, the momentum map is simply the inclusion $J : \mathcal{O} \to \mathfrak{g}^*$ of the affine coadjoint orbit in the dual of the Lie algebra $\mathfrak{g}^*$, [29]. While this example is simple, it plays an important role in the applications, e.g., [8,39]. An example with nonequivariance cocycle will be treated in detail in Section 3.3 for the special Euclidean group of the plane.
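The cotangent-lift momentum map can be illustrated with the standard case of rotations, chosen here as an assumption for illustration: G = SO(3) acting on Q = R³, with so(3) and its dual identified with R³ through the cross product. The momentum map is then the angular momentum J(q, p) = q × p, and equivariance without cocycle reads J(Rq, Rp) = R J(q, p).

```python
import numpy as np

def momentum_map(q, p):
    """J(q, p) = q x p for the cotangent lift of SO(3) acting on Q = R^3
    (so(3)* identified with R^3; an assumed identification)."""
    return np.cross(q, p)

def infinitesimal_generator(xi, q):
    """xi_Q(q) = xi x q for xi in so(3) identified with R^3."""
    return np.cross(xi, q)

def rotation(axis, angle):
    """Rodrigues formula for the rotation about `axis` by `angle`."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

rng = np.random.default_rng(2)
q, p, xi = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
R = rotation(np.array([1.0, 2.0, -0.5]), 0.8)

# Defining relation <J(q, p), xi> = <p, xi_Q(q)>
pairing_lhs = momentum_map(q, p) @ xi
pairing_rhs = p @ infinitesimal_generator(xi, q)

# Equivariance: the lifted action is (q, p) -> (R q, R p) and Ad*_{R^{-1}} mu = R mu
equiv_lhs = momentum_map(R @ q, R @ p)
equiv_rhs = R @ momentum_map(q, p)

print(pairing_lhs, pairing_rhs)
print(equiv_lhs, equiv_rhs)
```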

Dynamics with Casimir Dissipation/Production
We take advantage of the Casimir function s associated with the Souriau model, to formulate a dynamical geometric model for dissipation/production of this Casimir. This allows us to clarify the link between Souriau symplectic models and models proposed in [25] in the framework of quantum physics by information geometry for some Lie algebras, see also [28].
We follow the general Lie algebraic approach developed in [26,27] for Casimir dissipation, slightly extended here to take a cocycle into account and to cover a wider class of dissipation.
Given a symmetric positive bilinear form $\gamma : \mathfrak{g} \times \mathfrak{g} \to \mathbb{R}$, a Hamiltonian $h : \mathfrak{g}^* \to \mathbb{R}$, a parameter $\Lambda \neq 0$, and a function $k : \mathfrak{g}^* \to \mathbb{R}$ such that we consider the modification of the Lie-Poisson equations with cocycle Equation (35) given by for every $f$. We denote by $\flat : \mathfrak{g} \to \mathfrak{g}^*$ the flat operator associated with $\gamma$. That is, the linear form $\xi^\flat \in \mathfrak{g}^*$ is given by $\xi^\flat(\eta) = \gamma(\xi, \eta)$, for all $\xi, \eta \in \mathfrak{g}$. Please note that the flat operator need not be either injective or surjective. Using the equality For $\Theta = 0$ and $h = k$, this is the model proposed in [26,27] and applied there in the infinite dimensional setting, with applications to geophysical fluids and magnetohydrodynamics.
The main properties of system Equation (41) are the following.
(i) Energy conservation: taking $f = h$ in Equation (40), we obtain because of Equation (39) and since $\{h, h\}_\Theta = 0$. Hence the total energy $h$ is preserved.
(ii) Casimir dissipation ($\Lambda > 0$) or production ($\Lambda < 0$): taking $f = s$ in Equation (40), and using $\{s, f\}_\Theta = 0$, we obtain
We will explain in Section 3.2 how system Equation (41) recovers the model proposed in [25] in the context of information geometry for quantum systems for Lie algebras with unitary representation.

Stochastic Hamiltonian Dynamics
We shall briefly discuss here a stochastic perturbation of the Lie-Poisson equation with cocycle Equation (35) within the setting of stochastic Hamiltonian dynamics, see [40][41][42], which preserves the affine coadjoint orbits. This theory was recently extended for stochastic geometric modeling in fluid dynamics via variational principles in [43], see also [44,45].
In the context of this paper, this stochastic extension is motivated in geometric statistical mechanics by the modeling of the Gibbs density in the case of a centrifuge with random vibration along its axis (an open problem for industrial centrifuges, because for large equipment it is difficult to reduce the vibration of this axis). In statistical machine learning, the problem is motivated by small data analytics, where the Gibbs density, as the maximum entropy density of first order, is only an approximation. In this case, there are some fluctuations in the estimation of the mean momentum map, due to the fact that the true Gibbs density is a density of higher order. This approximation could be modeled by an additional noise on the momentum map.
In the setting of the Lie-Poisson equations with cocycle given in Equation (35), we consider the stochastic Hamiltonian dynamics given by
$$df = \{f, h\}_\Theta\, dt + \sum_{i=1}^N \{f, h_i\}_\Theta \circ dW_i(t),$$
where $h_i : \mathfrak{g}^* \to \mathbb{R}$, $i = 1, \dots, N$ are given Hamiltonians and $W_i(t)$, $i = 1, \dots, N$ are independent Brownian motions introduced in the Stratonovich sense, as indicated by the symbol $\circ$. Please note that the contribution of each Hamiltonian is inserted via the Lie-Poisson bracket with cocycle $\{\cdot, \cdot\}_\Theta$ given in Equation (34). This results in the following Stratonovich differential equation for the stochastic process $\mu(t) \in \mathfrak{g}^*$:
$$d\mu + \Big(\mathrm{ad}^*_{\frac{\delta h}{\delta\mu}}\mu - \Theta\Big(\frac{\delta h}{\delta\mu}, \cdot\Big)\Big)\, dt + \sum_{i=1}^N \Big(\mathrm{ad}^*_{\frac{\delta h_i}{\delta\mu}}\mu - \Theta\Big(\frac{\delta h_i}{\delta\mu}, \cdot\Big)\Big) \circ dW_i(t) = 0. \tag{43}$$
The Itô form of Equation (43) can be obtained by the usual conversion formula. It can be expressed in a concise and general way as In a similar way to its deterministic version in Equation (36), the system Equation (43) restricts to a stochastic Hamiltonian system on each affine coadjoint orbit Equation (32) with respect to the Kirillov-Kostant-Souriau symplectic structure with cocycle Equation (33). From Corollary 8 it follows that the entropy $s$ of the Souriau model is preserved by the stochastic dynamics Equation (43).
In the absence of the cocycle, Equation (43) can be formally obtained from the variational principle for variations $\delta g$ and $\delta\mu$ of $(g, \mu) \in G \times \mathfrak{g}^*$. More precisely, the variations $\delta g$ and $\delta\mu$ give the two conditions which yield Equation (43) in the special case $\Theta = 0$. Such variational principles play an essential role in stochastic geometric modelling [43][44][45], where the emphasis is placed on the Lagrangian side. For instance in [44] the Lagrangian version of Equation (44) given by is used, for variations $\delta g$, $\delta\xi$, $\delta\mu$ of $(g, \xi, \mu) \in G \times \mathfrak{g} \times \mathfrak{g}^*$, where $\ell : \mathfrak{g} \to \mathbb{R}$ is the Lagrangian associated with $h : \mathfrak{g}^* \to \mathbb{R}$ in the hyperregular case. Variations $\delta g$, $\delta\xi$, and $\delta\mu$ give the three conditions which yield equations equivalent to Equation (43) with $\Theta = 0$.
To extend Equation (44) to cover the case of a Lie algebra two-cocycle Θ ≠ 0, we shall formulate the variational principle on the central extension Ĝ = G × R of the Lie group G with respect to a group two-cocycle B : G × G → R that integrates Θ : g × g → R. We refer to Section 4.2 below for a quick review of the main formulas for central extensions and their use in connection with the Lie-Poisson equations with cocycle. For the application to Equation (43) here, we just need to recall the expression of the group multiplication (g, between Θ and B. Based on this, we consider the variational principle for variations δg, δα, δµ, and δa of (g, α) ∈ Ĝ = G × R and (µ, a) ∈ ĝ* = g* × R. In the first term, the operations are associated with the tangent lift of the multiplication on the central extension Ĝ and the . A computation, using the fact that B integrates Θ, i.e., Equation (46), shows that the variations δg, δα, δµ, and δa give the four conditions One notes that, taking the initial condition a = 1, we get the stochastic Lie-Poisson system with cocycle Equation (43) as desired. This approach using central extensions can easily be used on the Lagrangian side too, and yields the appropriate extension of Equation (45) to handle a cocycle Θ ≠ 0.

Polysymplectic Model of Statistical Mechanics
Polysymplectic geometry, as developed in [47], arises as a special case of multisymplectic geometry, which is the natural geometric setting of classical field theories, see, e.g., [48]. When used in conjunction with the general setting developed in Sections 2.1 and 2.2, the polysymplectic setting furnishes a natural generalisation of the Souriau symplectic model, to which many properties extend. This extension was proposed in [5]. Here we emphasize this model as a specific case of the general framework described in Sections 2.1 and 2.2, which allows all the properties of this framework to be transposed immediately to the polysymplectic model. In particular, we will see that the entropy of the polysymplectic model enjoys a natural extension of the Casimir property observed in Section 2.3.2. The relevant equation here is a Lie-Poisson field equation with cocycle, which we describe in detail below. This model is motivated by higher-order models of statistical physics. For instance, for small data analytics (rarefied gases, sparse statistical surveys, ...), the maximum entropy density should take higher-order moment constraints into account, so that the Gibbs density is not defined by the first moment alone: fluctuations require second-order and higher moments, as introduced in [49][50][51][52][53][54].
Polysymplectic manifolds. We only need a limited number of notions from polysymplectic geometry, which are straightforward extensions of those recalled above in the symplectic context. We refer to [47] for more information. A polysymplectic manifold (M, ω) is a manifold M endowed with a closed nondegenerate R^n-valued 2-form. We can identify ω with a collection (ω 1 , . . . , Similarly as before, this implies that i_{ξ_M} ω is a closed R^n-valued one-form on M. If this form is exact, then the action is called Hamiltonian and admits a polysymplectic momentum map and one defines the map Θ : Taking the derivative of the relation above, we get As a consequence, Θ is skew-symmetric and satisfies the two-cocycle identity see [47].
Polysymplectic model. The polysymplectic model of statistical mechanics is obtained by considering the following specific situation in the equivariant setting described in Section 2.2:
M : a polysymplectic manifold
dµ : a volume form
φ_g : a volume preserving Hamiltonian action
E = L(R^n, g) : the linear maps from R^n to the Lie algebra of G
ρ_g = (Ad_g)^n : the action induced on L(R^n, g) by the adjoint action of G on g
Here the space E = L(R^n, g) of linear maps is identified with the Cartesian product g^n = g × . . . × g, and (Ad_g)^n acts on β ∈ E as (Ad_g)^n (β_1, . . . , β_n) = (Ad_g β_1, . . . , Ad_g β_n).
The thermodynamic heat becomes a map Q : Ω ⊂ L(R n , g) → Ω * ⊂ L(g, R n ) with Q(β) = E β (J) ∈ Ω * ⊂ L(g, R n ) and the Fisher metric on Ω ⊂ L(R n , g) is Proposition 5 directly yields the following equivariance properties for every g ∈ G. Note also that Ω * is invariant under the affine action From Proposition 4, given ν ∈ Ω * ⊂ L(g, R n ), the generalized Gibbs probability density

Particular cases.
A particularly simple case of polysymplectic Souriau model is given by the manifold is the projection onto the k th factor of the sum and Ω can is the canonical symplectic form on T * Q. Let G be a Lie group acting on the left on Q. Then its naturally induced action on T * Q ⊕ . . . ⊕ T * Q is polysymplectic and admits the polysymplectic momentum map J : where J : T * Q → g * is the momentum map associated with the cotangent lifted action of G on T * Q given in Equation (38). In this case, there is no cocycle. Another case without cocycle in the polysymplectic momentum map is when M is chosen as an given by µ → (Ad * g −1 ) n µ + θ(g), with θ ∈ C ∞ (G, L(g, R n )) a group one-cocycle. This orbit M is endowed with a natural polysymplectic form ω = (ω 1 , . . . , ω n ) with ω i defined by where Θ is given in Equation (47), which is the polysymplectic version of Equation (33). In this case, the polysymplectic momentum map is simply the inclusion J : O → L(g, R n ) of the orbit in L(g, R n ).
Property of the entropy and polysymplectic Lie-Poisson equations with cocycle. In the context of the polysymplectic model, a natural generalisation of the Lie-Poisson equations with cocycle Equation (36) are for a map µ : Hamiltonian. In absence of the cocycle, such a field theoretic version of the Lie-Poisson equation appears, for instance, for the spacetime Lagrangian and Hamiltonian theoretic description of Cosserat rods and molecular strands, see [36,37].
From the invariance property Equation (52), we have For h = s, Equation (54) thus reduces to $\sum_{k=1}^{n} \frac{\partial}{\partial x_k}\mu_k = 0$. This is the natural extension of the Casimir property of s observed in the Souriau model in Section 2.3.2, given there by the condition $\operatorname{ad}^*_{\frac{\delta s}{\delta\mu}}\mu - \Theta\left(\frac{\delta s}{\delta\mu}, \cdot\right) = 0$, giving $\frac{d}{dt}\mu = 0$.

The Fisher Metric on Orbits and Equivariance
We give here a general expression of the Fisher metric on orbits of the action ρ : G × E → E, in the general setting described in Sections 2.1 and 2.2. This clarifies the link between the Fisher metric and the metric on adjoint orbits considered by Souriau, as highlighted in [4].
As in Section 2.1 we consider a manifold M, a vector space E, a function U : M → E*, and the class of generalized Gibbs probability densities As in Section 2.2, given a Lie group G, we consider an action φ : G × M → M and a representation ρ : G × E → E. We denote by ξ_E(β) ∈ E and ξ_{E*}(ν) ∈ E* the infinitesimal generators of the representations ρ_g and ρ*_g associated with ξ ∈ g. We will use the equality ⟨ξ_{E*}(ν), β⟩ = ⟨ν, ξ_E(β)⟩. Given the group one-cocycle θ ∈ C∞(G, E*) associated with the function U, see Equation (16), we define Θ ∈ C∞(g, E*) by for ξ ∈ g and β ∈ E. Recall that the Fisher metric is $I(\beta) = -\mathbb{E}_\beta[D^2 \log p_\beta]$ and coincides with the generalized heat capacity, see Proposition 2.

Proposition 9.
On the G-orbit through β ∈ Ω, the Fisher metric is written in terms of Θ and Q as follows Proof. Taking the derivative with respect to g at e of the equality Equation (18) given by for every γ ∈ E, we get Therefore, from Proposition 2, we can write which proves the result.
We illustrate this result for the Souriau model, its polysymplectic extension, and the Koszul model.

Corollary 10.
On the adjoint orbit through β in g, the Fisher metric is written as follows Please note that Equation (57) can be written as is a two-cocycle. In particular, the last term is a coboundary. We refer to [2] for more information.
Corollary 11. On the adjoint orbit through β in L(R n , g), the Fisher metric is written as follows Koszul model. For the Koszul model with Ω = sym(n) + the cone of positive definite matrices and the Lie group G = GL(n), the actions Equations (21) and (22) have the associated infinitesimal generators In this case, Θ = 0 and Proposition 9 is satisfied by noting the equalities

Applications
In this section, we show how the framework considered in Section 2 applies to various examples and helps identify common underlying geometric structures. We start with the case of multivariate Gaussian probability densities as an illustration of the general framework, for which a cocycle is needed and which does not fall into the setting of the Souriau model. We then highlight the strong analogies with quantum information geometry by considering Lie algebras with unitary representations, and show that the Fisher metric, as defined from the generalized heat capacity in Section 2.1, coincides with the Bogoliubov-Kubo-Mori metric. In this particular case the equation with Casimir dissipation/production considered in Section 2.3.3 reproduces a dissipative model of [25]. Finally, we consider in detail the case of the Euclidean group of the plane SE(2), since it allows explicit and relatively easy computations while exhibiting the interesting feature of having a cocycle. This example fits into the setting of the Souriau symplectic model.

Multivariate Gaussian Probability Densities
In this paragraph we study in detail the case of multivariate Gaussian densities, following the approach developed in Sections 2.1 and 2.2. A first treatment in this spirit was given in [4], Section 8.
Here we clarify several steps in this approach by following systematically the general setting presented in Sections 2.1 and 2.2, while we note that this example is not a particular case of the Souriau model. We present explicitly the cocycle, which is here defined on the general affine group, with values in the Cartesian product of symmetric matrices and the Euclidean space.
Gaussian probability densities in generalized Gibbs form. Consider a multivariate Gaussian density with symmetric and positive definite covariance matrix R ∈ sym+(n) and mean m ∈ R^n. The Gaussian probability density is written in the generalized Gibbs form p_β discussed above in Section 2.1 as follows: for every z ∈ R^n. In the last equality above, we have defined the energy function U and the vector β ∈ sym+(n) × R^n in terms of (R, m) as The general theory of Section 2.1 will be applied here with the manifold M = R^n, the vector space E = sym(n) × R^n, and the open subset Ω = sym+(n) × R^n. It is important to note that the element β of the general theory is not given by the couple (R, m), but is related to (R, m) via Equation (59). This plays a central role in understanding the equivariance properties below.
Characteristic function, thermodynamic heat, and entropy. The Massieu potential is computed in terms of β ∈ Ω as where we defined the constant K = −(n/2) log(π). To compute the derivative, we consider the dual space E* = sym(n) × R^n, with duality pairing ⟨(ν_1, ν_2), (β_1, β_2)⟩ = Tr(ν_1 β_1) + ν_2 · β_2, for (ν_1, ν_2) ∈ E* and (β_1, β_2) ∈ E. With respect to this duality pairing we have so we get the thermodynamic heat Q : In terms of the covariance matrix R and the mean m, this is written as The entropy in terms of β = (β_1, β_2) and (R, m) is computed by taking the Legendre transform of Φ as Its expression s : Ω* → R in terms of (ν_1, ν_2) is found by using

Fisher information metric. We compute the generalized heat capacity K(β) := −D²Φ(β) as follows, see Section 2.1: From Proposition 2 this coincides with the Fisher metric. Let us verify that this is the case by rewriting these five terms in terms of the mean and covariance matrix (m, R). The above expression equals which gives the Fisher metric I(R, m) for multivariate Gaussian densities.
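The chain Massieu potential, thermodynamic heat, entropy described above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the parametrization β_1 = R⁻¹/2, β_2 = −R⁻¹m, which is our reading of Equation (59), chosen because it is compatible with U(z) = (zzᵀ, z) and with the constant K = −(n/2) log(π). The function names `to_beta`, `massieu`, and `heat` are hypothetical.

```python
import numpy as np

def to_beta(R, m):
    # Assumed form of Equation (59): beta1 = R^{-1}/2, beta2 = -R^{-1} m
    R_inv = np.linalg.inv(R)
    return 0.5 * R_inv, -R_inv @ m

def massieu(beta1, beta2):
    # Phi(beta) = -log of the integral of exp(-(z^T beta1 z + beta2 . z)) over R^n,
    # evaluated with the closed-form Gaussian integral
    n = beta2.size
    _, logdet = np.linalg.slogdet(beta1)
    quad = beta2 @ np.linalg.solve(beta1, beta2)
    return 0.5 * logdet - 0.25 * quad - 0.5 * n * np.log(np.pi)

def heat(beta1, beta2, eps=1e-6):
    # Q = D Phi by central differences, with respect to the pairing
    # <(nu1, nu2), (beta1, beta2)> = Tr(nu1 beta1) + nu2 . beta2
    n = beta2.size
    Q1, Q2 = np.zeros((n, n)), np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        Q2[i] = (massieu(beta1, beta2 + e) - massieu(beta1, beta2 - e)) / (2 * eps)
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] += 0.5 * eps
            E[j, i] += 0.5 * eps  # perturb along a symmetric direction
            Q1[i, j] = (massieu(beta1 + E, beta2) - massieu(beta1 - E, beta2)) / (2 * eps)
    return Q1, Q2

R = np.array([[2.0, 0.3], [0.3, 1.0]])
m = np.array([0.5, -1.0])
beta1, beta2 = to_beta(R, m)
Q1, Q2 = heat(beta1, beta2)

# Legendre transform s = <Q(beta), beta> - Phi(beta)
entropy = np.trace(Q1 @ beta1) + Q2 @ beta2 - massieu(beta1, beta2)
```

Under these assumptions the heat is the mean of U, that is (R + mmᵀ, m), and the Legendre transform returns the usual Gaussian entropy (1/2) log((2πe)ⁿ det R).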
Equivariance with respect to the general affine group. We consider the general affine group We consider the left action of GA(n) on R n given by φ (A,a) (z) = Az + a.
We note that Jφ_(A,a) = det(A), a constant function on R^n, hence φ satisfies the hypothesis of Lemma 5. It is instructive to observe that the expression is not linear in (R, m), compare with Equation (17). However, such a statement is true when it is expressed in terms of the variables (β_1, β_2). We first need the expression of the action of GA(n) on (β_1, β_2). This is done in the next lemma.

Lemma 12.
The left action of GA(n) induced on (β 1 , β 2 ) ∈ sym(n) × R n by the action Ψ in Equation (60) is given by

Its dual left action is
Proof. This is a direct computation using Equation (59).
The situation is illustrated in the following commuting diagram.
The following result shows that the equivariant setting developed in Section 2.2 applies here with the action φ (A,a) and the representation ρ (A,a) (not Ψ (A,a) ).

Lemma 13.
The energy function U(z) = (zz T , z) satisfies the relation for the group one-cocycle θ : GA(n) → sym(n) × R n given by θ(A, a) = (aa T , a).
The Massieu potential, the thermodynamic heat, and the entropy satisfy the equivariance properties The other results follow from Proposition 5 and from Jφ_(A,a) = det(A). Alternatively, we can compute explicitly The identity relating the Fisher information metric, the cocycle, and the thermodynamic heat follows from the general Equation (56) as where (β_1, β_2) ∈ sym+(n) × R^n, ξ = (ξ_1, ξ_2), ζ = (ζ_1, ζ_2) ∈ ga(n), Θ(ξ_1, ξ_2) = (0, ξ_2) and the infinitesimal generators are

Geodesics on multivariate Gaussian densities and Noether theorem. Let us consider the Lagrangian L : TΩ = Ω × E → R given by the kinetic energy of the Fisher metric The associated Euler-Lagrange equations are In accordance with Proposition 5, see Equation (20), the Fisher metric is invariant with respect to the action of GA(n) on (R, m) ∈ Ω given in Equation (60). As a consequence, the Lagrangian is invariant under the tangent lifted action of GA(n) on TΩ given by From Noether's theorem, the corresponding momentum map is conserved. The momentum map J_L : TΩ → ga(n)* associated with this Lagrangian and this action is given by with J : T*Ω → ga(n)* the momentum map of the cotangent lifted action of GA(n) relative to the canonical symplectic form, see Equation (38). Using the expression of the infinitesimal generator of Ψ given by (U, u)_Ω (R, m) = ((R, UR + RUᵀ), (m, Um + u)), From Noether's theorem, we have the conservation laws We also refer to [55,56].

Unitary Representations and Quantum Fisher Metric
In this paragraph, we highlight the strong analogies between the equivariant setting considered in this paper, and techniques in quantum information geometry, as developed in [25], see also [28]. In particular, when this setting is considered in the quantum context, the Fisher metric, as defined from the derivative of the generalized heat capacity, coincides with the Bogoliubov-Kubo-Mori metric. We also illustrate how the general equations with Casimir dissipation/production considered above reproduce the dissipative model proposed in [25].
In [25], information geometry was studied for some Lie algebras where, for certain unitary representations, the statistical manifold of states was defined as a convex cone on which the partition function is finite, with reference to the Bogoliubov-Kubo-Mori metric. Note that only the case with zero cohomology for the Lie algebras g = so(3) and g = sl(2, R) was studied.
Let G be a Lie group, acting on a complex Hilbert space by a unitary left representation, U_g : H → H. We denote by β_H the associated infinitesimal generator, giving the Lie algebra representation, and consider the self-adjoint operator iβ_H. We assume dim H < ∞. The following class of density matrices is considered for β ∈ g, with partition function ψ(β) = Tr(exp(−iβ_H)). We adopted in Equation (64) a general form for the class of density matrices, which includes the class considered in [25] and references therein. Note that Equation (64) is a model of faithful quantum states. Note also that the map β → ρ_β is not necessarily injective in general. We do not assume injectivity for the development below, but it is required in quantum information geometry. As in Section 2.1, we adopt the following definitions corresponding to the Massieu potential, the thermodynamic heat, and the generalized heat capacity. We note that ⟨Q(β), δβ⟩ = Tr(ρ_β iδβ_H) = ⟨iδβ_H⟩_{ρ_β}, for all δβ ∈ g, which gives the expectation value of the observable iδβ_H in the quantum state ρ_β. This is a result analogous to Equation (5) in the classical case. The generalized heat capacity is computed as thereby giving the covariance of the observables i(δβ_1)_H and i(δβ_2)_H in the quantum state ρ_β. In [25], K is called the Bogoliubov-Kubo-Mori metric and is chosen as the quantum version of the Fisher metric. Such a choice is geometrically natural in view of Proposition 2, which identifies K with the Fisher metric in the classical case. The von Neumann entropy of the density matrix can be expressed in terms of Φ and Q as −Tr(ρ_β log ρ_β) = Tr(ρ_β iβ_H) + log ψ(β) = ⟨Q(β), β⟩ − Φ(β) = s(ν), for ν = Q(β) = ⟨i(·)_H⟩_β ∈ g*. This is analogous to the result of Lemma 1 giving the entropy as the Legendre transform of Φ(β), thus giving a quantum version of the Clairaut equation. See also [57] for a link with the Fisher metric.
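As a small numerical illustration of these definitions, the sketch below evaluates Φ(β) = −log Tr(exp(−iβ_H)), the heat Q(β), and the generalized heat capacity K(β) = −D²Φ(β) for a spin-1/2 example. The choice of the representation β_H = −(i/2) Σ_a β_a σ_a of su(2) on C², the finite-difference step, and all function names are assumptions made for illustration; they are not taken from [25] or from the text above.

```python
import numpy as np

# Pauli matrices; assumed representation: i * beta_H = (1/2) sum_a beta_a sigma_a
SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def i_beta_H(beta):
    # Self-adjoint operator i * beta_H
    return 0.5 * np.einsum("a,ajk->jk", beta, SIGMA)

def massieu(beta):
    # Phi(beta) = -log Tr exp(-i beta_H), from the eigenvalues of the Hermitian matrix
    w = np.linalg.eigvalsh(i_beta_H(beta))
    return -np.log(np.sum(np.exp(-w)))

def density(beta):
    # rho_beta = exp(-i beta_H) / psi(beta)
    w, V = np.linalg.eigh(i_beta_H(beta))
    p = np.exp(-w)
    p /= p.sum()
    return (V * p) @ V.conj().T

def heat(beta):
    # <Q(beta), e_a> = Tr(rho_beta * i (e_a)_H)
    rho = density(beta)
    return np.array([np.trace(rho @ (0.5 * SIGMA[a])).real for a in range(3)])

def heat_capacity(beta, eps=1e-5):
    # K = -D^2 Phi = -D Q, by central differences, symmetrized
    K = np.zeros((3, 3))
    for b in range(3):
        e = np.zeros(3)
        e[b] = eps
        K[:, b] = -(heat(beta + e) - heat(beta - e)) / (2 * eps)
    return 0.5 * (K + K.T)

beta = np.array([0.3, -0.4, 1.2])   # |beta| = 1.3
K = heat_capacity(beta)
```

For this representation the partition function is 2 cosh(|β|/2), so K should be positive definite with one radial eigenvalue sech²(|β|/2)/4 and a double tangential eigenvalue tanh(|β|/2)/(2|β|).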

Souriau Symplectic Model for SE(2), Lie-Poisson Equations with Cocycle, and Casimir Dissipation
In this paragraph, we illustrate many aspects of the geometric setting by considering the special Euclidean group of the plane, as it allows explicit and relatively easy computations while having a nonequivariant momentum map. We present the Lie-Poisson equations with cocycle (affine Lie-Poisson equations) with Casimir dissipation/production associated with the entropy of the Souriau symplectic model.

Momentum map and cocycle.
Consider the special Euclidean group of the plane SE(2) = SO(2) ⋉ R² with semidirect product group multiplication where R_ϕ is a rotation of angle ϕ. It acts on the plane R² as with infinitesimal generator (λ, u)_{R²}(x) = −λJx + u for (λ, u) ∈ se(2) = so(2) ⋉ R², where we identify so(2) with R and with We consider on R² the symplectic form ω(x, y) = x · Jy. It is easy to see that the action (69) is symplectic and admits the momentum map This momentum map is not equivariant, with nonequivariance cocycle given by

Gibbs densities, entropy, and Fisher metric. The generalized Gibbs probability densities are here given on M = R² by where β = (λ, u) ∈ Ω ⊂ se(2), with Ω = (−∞, 0) × R² and the partition function and Massieu potential are computed to be From this, we get the thermodynamic heat Q : Ω ⊂ se(2) → Ω* ⊂ se(2)* as 1 λ u and we note that Ω* = {(µ, m) ∈ se(2)* | µ + |m|²/2 < 0}. The entropy s : Ω* → R is obtained as the Legendre transform of Φ : Ω → R as We note the relation δs δm between the partial derivatives of s. From Proposition 2, the Fisher metric is found as for every β = (λ, u) ∈ Ω ⊂ se(2), i.e.,

Affine Lie-Poisson equations and Casimir dissipation. The affine coadjoint action associated with Equation (70) is found as from which we directly observe that the entropy Equation (71) is constant on affine coadjoint orbits O_Θ ⊂ se(2)* and hence is a Casimir of the Lie-Poisson bracket with cocycle on se(2)*. Using the expression Θ((λ, u), (γ, v)) = −ω(u, v) of the associated two-cocycle, we get the Lie-Poisson bracket with cocycle Given a Hamiltonian h : se(2)* → R, one gets the Lie-Poisson equations with cocycle as the following system of ODEs These equations determine Hamiltonian dynamics on affine coadjoint orbits that are the level sets of the entropy. From the point of view of thermodynamics, motion remaining on these surfaces is non-dissipative, whereas motion transversal to these surfaces is dissipative.
We apply below the geometric approach to include dissipation and hence, motion through affine coadjoint orbits, as considered in general in Section 2.3.3.
Given the entropy Equation (71) = h), the Casimir dissipative/production Equation (40) gives here δk δm for every f , therefore, the following equations emerge which have the property of preserving the Hamiltonian while dissipating/producing entropy as They are the SE(2) version of the Equation (67) proposed in the quantum context.
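The SE(2) quantities of this subsection can be explored numerically with the sketch below. It is a reconstruction under stated assumptions, not the paper's formulas: we assume that the Gibbs density is proportional to exp(λ|x|²/2 − u · Jx), that J is the rotation by π/2, and that Φ = −log ψ. These choices are consistent with the domain Ω = (−∞, 0) × R² and with the set Ω* given above. The closed forms in `massieu`, `heat`, and `entropy` follow from the Gaussian integral under these assumptions, and the quadrature routine is included only to cross-check the partition function.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])  # assumed sign convention for J

def massieu(lam, u):
    # Phi = -log psi, with psi = (2 pi / (-lam)) * exp(-|u|^2 / (2 lam)) for lam < 0
    return np.log(-lam) - np.log(2 * np.pi) + (u @ u) / (2 * lam)

def heat(lam, u):
    # Q = D Phi = (dPhi/dlam, dPhi/du)
    return 1.0 / lam - (u @ u) / (2 * lam**2), u / lam

def entropy(mu, m):
    # Legendre transform of Phi written in the variables (mu, m)
    return 1.0 + np.log(2 * np.pi) + np.log(-(mu + 0.5 * (m @ m)))

def partition_function_quadrature(lam, u, half_width=12.0, n=1201):
    # Riemann sum of exp(lam |x|^2 / 2 - u . Jx) on a square grid
    xs = np.linspace(-half_width, half_width, n)
    h = xs[1] - xs[0]
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    JX = J[0, 0] * X + J[0, 1] * Y
    JY = J[1, 0] * X + J[1, 1] * Y
    integrand = np.exp(0.5 * lam * (X**2 + Y**2) - (u[0] * JX + u[1] * JY))
    return integrand.sum() * h * h

lam, u = -1.5, np.array([0.4, -0.7])
mu, m = heat(lam, u)
psi_numeric = partition_function_quadrature(lam, u)
s_legendre = lam * mu + u @ m - massieu(lam, u)
```

In this reconstruction µ + |m|²/2 = 1/λ < 0, so the heat lands in Ω*, and the entropy depends on (µ, m) only through the combination µ + |m|²/2, which is in line with the Casimir property stated above.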

Variational Principles and (Multi)Symplectic Integrators
In this section, we make use of the geometric setting presented above to propose geometric integrators for some of the equations described earlier. Geometric integrators are numerical schemes designed to preserve as much as possible of the geometric structure underlying the equations they discretize [58]. It turns out that the preservation of geometric structures not only produces an improved qualitative behaviour, but also allows for a more accurate long-time integration. One efficient way to derive geometric integrators is to exploit the variational formulation of the continuous equations and to mimic this formulation at the spatial and/or temporal discrete level. For instance, for the ODEs of classical mechanics, a time discretization of the Lagrangian variational formulation permits the derivation of numerical schemes, called variational integrators, that are symplectic, exhibit good energy behavior, and inherit a discrete version of Noether's theorem which guarantees the exact preservation of momenta arising from symmetries, see [59]. These methods are especially well-suited for systems on Lie groups [60].
Variational integrators were extended to PDEs in various ways, one way being given by multisymplectic variational integrators ( [61][62][63][64]) in which the starting point is a spacetime discretization of the Hamilton principle. Here also, a discrete version of Noether's theorem for field theories is available in presence of symmetries. We refer to [64][65][66] for recent applications of multisymplectic variational discretizations.
In this section, we will present a geometric discretization of the Lie-Poisson equations with cocycle, see Section 2.3, that is symplectic and preserves the affine coadjoint orbits. We will then extend this approach to treat the case of the polysymplectic version of these Lie-Poisson equations with cocycle, see Section 2.4, by constructing a multisymplectic integrator. To achieve these goals, we will first present the variational principles attached to these equations, by looking at them from the Lagrangian side. Then these variational principles will be discretized in time or in space and time.

Preliminaries on Variational Lie Group Integrators
We very briefly recall the broad idea of variational integrators and refer to [59] for a detailed description. They are based on a discrete version of the Hamilton principle given, for a Lagrangian L : TQ → R, as $\delta \int_0^T L(q(t), \dot q(t))\, dt = 0$ (74) for arbitrary variations of the curve q(t) with fixed extremities at t = 0, T.

Euler-Poincaré and Lie-Poisson equations.
We will be especially interested in the case where the configuration manifold is a Lie group, Q = G, and the Lagrangian L : TG → R is right G-invariant.
In this case, L induces a reduced Lagrangian on the quotient space (TG)/G identified with the Lie algebra g, i.e., we get ℓ : g → R defined by the relation L(g, ġ) = ℓ(ġg⁻¹). The Euler-Lagrange equations for L are equivalent to equations on g written in terms of the reduced Lagrangian ℓ : g → R, called the Euler-Poincaré equations. They are obtained by computing the variational principle for ℓ induced by the Hamilton principle Equation (74). It is given by $\delta \int_0^T \ell(\xi)\, dt = 0$ for variations $\delta\xi = \partial_t \eta - [\xi, \eta]$ (75), and yields the Euler-Poincaré equations $\frac{d}{dt}\frac{\delta \ell}{\delta \xi} + \operatorname{ad}^*_{\xi}\frac{\delta \ell}{\delta \xi} = 0$ for the curve ξ(t) ∈ g. In Equation (75), η(t) ∈ g is an arbitrary curve vanishing at the extremities, see [31].

Variational integrators.
Let Q be a configuration manifold and let L : TQ → R be a Lagrangian. Suppose that a time step ∆t has been fixed, denote by {t_k = k∆t | k = 0, . . . , N} the sequence of times, and by q_d : {t_k} → Q, q_d(t_k) = q_k, a discrete curve. A discrete Lagrangian is a map L_d : Q × Q → R, L_d = L_d(q_k, q_{k+1}), that approximates the action integral of L along the curve segment between q_k and q_{k+1}, that is, we have $L_d(q_k, q_{k+1}) \approx \int_{t_k}^{t_{k+1}} L(q(t), \dot q(t))\, dt$, where q(t_k) = q_k and q(t_{k+1}) = q_{k+1}. Usually this approximation is related to some numerical quadrature rule of the integral above. The discrete analogue of Hamilton's principle Equation (74) reads $\delta \sum_{k=0}^{N-1} L_d(q_k, q_{k+1}) = 0$ (78) for all variations δq_d of q_d with vanishing endpoints. After taking variations and applying a discrete integration by parts formula (change of indices), we obtain the discrete Euler-Lagrange equations: $D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0$ (79), for k = 1, . . . , N − 1. These equations define, under appropriate conditions, an algorithm which solves for q_{k+1} knowing the two previous configuration variables q_k and q_{k−1}.
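To make the discrete Euler-Lagrange algorithm concrete, here is a minimal sketch for Q = R and L(q, q̇) = q̇²/2 − V(q). The discrete Lagrangian L_d(a, b) = ∆t ((b − a)²/(2∆t²) − V(a)) is one possible rectangle-rule choice made for this illustration, not a choice prescribed by the text; with it, the discrete Euler-Lagrange equations reduce to the classical Störmer-Verlet recursion. The start-up rule for q_1 and the harmonic potential are likewise assumptions.

```python
import numpy as np

def del_step(q_prev, q_curr, dt, grad_V):
    # Solve D2 L_d(q_prev, q_curr) + D1 L_d(q_curr, q_next) = 0 for q_next, with
    # L_d(a, b) = dt * (0.5 * ((b - a) / dt)**2 - V(a))
    return 2.0 * q_curr - q_prev - dt**2 * grad_V(q_curr)

def integrate(q0, v0, dt, n_steps, grad_V):
    q = np.empty(n_steps + 1)
    q[0] = q0
    # Start-up step (an assumption): second-order Taylor expansion from (q0, v0)
    q[1] = q0 + dt * v0 - 0.5 * dt**2 * grad_V(q0)
    for k in range(1, n_steps):
        q[k + 1] = del_step(q[k - 1], q[k], dt, grad_V)
    return q

dt = 0.05
q = integrate(q0=1.0, v0=0.0, dt=dt, n_steps=20000, grad_V=lambda x: x)  # V(q) = q^2 / 2

# Energy evaluated with a centered velocity at the interior nodes
v = (q[2:] - q[:-2]) / (2 * dt)
energy = 0.5 * v**2 + 0.5 * q[1:-1] ** 2
```

For this linear test problem the energy error oscillates with amplitude ∆t²/8 and shows no secular drift, which illustrates the good long-time energy behavior mentioned above.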
To define the discrete momentum maps, one first needs to consider the discrete Legendre transforms defined by Then, given a Lie group action Φ : G × Q → Q, the discrete Lagrangian momentum maps J + If the discrete curve {q j } N j=0 satisfies the discrete Euler-Lagrange equations then we have the equality If the discrete Lagrangian L d is G-invariant under the diagonal action of G induced by Φ on Q × Q, then the two discrete momentum maps coincide, J − L d = J + L d =: J L d , therefore from Equation (82), we obtain that J L d is a conserved quantity along the discrete curve solution of Equation (79), that is, This result is referred to as the discrete Noether's theorem.
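The discrete Noether theorem can be observed on a small example. The sketch below takes Q = R², the rotation group SO(2) acting on the plane, and a rotation-invariant potential V(q) = |q|²/2 + |q|⁴/4. The discrete Lagrangian L_d(a, b) = ∆t (|b − a|²/(2∆t²) − V(a)) is invariant under the diagonal action. All of these choices, and the function names, are assumptions made for illustration. The discrete momentum map is evaluated as the pairing of D_2 L_d(q_k, q_{k+1}) with the infinitesimal generator at q_{k+1}.

```python
import numpy as np

def grad_V(q):
    # Gradient of the rotation-invariant potential V(q) = |q|^2 / 2 + |q|^4 / 4
    return (1.0 + q @ q) * q

def del_step(q_prev, q_curr, dt):
    # Discrete Euler-Lagrange update for L_d(a, b) = dt * (|b - a|^2 / (2 dt^2) - V(a))
    return 2.0 * q_curr - q_prev - dt**2 * grad_V(q_curr)

def discrete_momentum(q_k, q_next, dt):
    # <D2 L_d(q_k, q_next), xi_Q(q_next)> for xi = 1, with generator xi_Q(q) = (-q_y, q_x)
    p = (q_next - q_k) / dt
    return p @ np.array([-q_next[1], q_next[0]])

dt, n_steps = 0.01, 5000
q0, v0 = np.array([1.0, 0.0]), np.array([0.0, 0.8])
traj = [q0, q0 + dt * v0 - 0.5 * dt**2 * grad_V(q0)]  # start-up step is an assumption
for _ in range(n_steps - 1):
    traj.append(del_step(traj[-2], traj[-1], dt))
traj = np.array(traj)

momenta = np.array([discrete_momentum(traj[k], traj[k + 1], dt) for k in range(n_steps)])
```

Because the discrete force is parallel to q_k, the discrete angular momentum stays constant up to round-off, independently of the step size.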
The symplectic character of the integrator is obtained by showing that the scheme (q k−1 , q k ) → (q k , q k+1 ) preserves the discrete symplectic two-forms Ω ± where Ω can is the canonical symplectic two-form on T * Q, see [59].
Discrete Euler-Poincaré equations. For Lie groups, variational discretization and the associated discrete Lagrangian reduction were initiated in [60,67], and the resulting schemes are referred to as Lie group variational integrators. The essential idea behind such integrators is to discretize Hamilton's principle and to update group elements using group operations. For the case of invariant systems on Lie groups, one chooses a discrete Lagrangian that inherits the invariance of the continuous Lagrangian, i.e., L d : From this invariance, one defines the reduced discrete Lagrangian L d on the associated quotient space (G × G)/G identified with G with quotient map (g_k, g_{k+1}) ∈ G × G → g_{k+1} g_k⁻¹ ∈ G, i.e., the two discrete Lagrangians are related as L d (g_k, g_{k+1}) = L d (g_{k+1} g_k⁻¹); this is the point of view developed in [60]. The discrete Hamilton principle Equation (78) for L d induces a discrete Euler-Poincaré variational principle for the reduced discrete Lagrangian that yields the discrete Euler-Poincaré equations on G. Numerically, it is desirable to obtain the algorithm on a vector space rather than on a Lie group. To this end, a local diffeomorphism τ : g → G with τ(0) = e is introduced to express small discrete changes in the group configuration through unique Lie algebra elements. Such a map is referred to as a retraction map ([68,69]).
The discrete reduced Lagrangian is transported into a discrete Lagrangian ℓ_d defined on a neighborhood of 0 in g via the relation The relation on the right in Equation (84) is thought of as a discrete version of ξ = ġg⁻¹.
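A common concrete choice of retraction map on a matrix group is the Cayley transform. The sketch below uses it on G = SO(3) to illustrate the discrete velocity ξ_k = (1/∆t) τ⁻¹(g_{k+1} g_k⁻¹) used in the next paragraph. The Cayley map is an assumption made for illustration (the text does not fix τ), and the helper names are hypothetical.

```python
import numpy as np

def hat(w):
    # Identify R^3 with so(3), the skew-symmetric 3x3 matrices
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def cayley(xi):
    # tau(xi) = (I - xi/2)^{-1} (I + xi/2), a local diffeomorphism with tau(0) = I
    I = np.eye(3)
    return np.linalg.solve(I - 0.5 * xi, I + 0.5 * xi)

def cayley_inv(g):
    # tau^{-1}(g) = 2 (g - I)(g + I)^{-1}, defined away from rotations of angle pi
    I = np.eye(3)
    return 2.0 * (g - I) @ np.linalg.inv(g + I)

def discrete_velocity(g_k, g_next, dt):
    # xi_k = (1/dt) tau^{-1}(g_next g_k^{-1}); for rotations g_k^{-1} = g_k^T
    return cayley_inv(g_next @ g_k.T) / dt

dt = 0.1
w = np.array([0.2, -0.5, 1.0])
g_k = cayley(hat([0.3, 0.1, -0.4]))
g_next = cayley(dt * hat(w)) @ g_k      # update the group element with a group operation
xi_k = discrete_velocity(g_k, g_next, dt)
```

The update stays on the group exactly (up to round-off), and the Lie algebra element ξ_k is recovered uniquely from the pair (g_k, g_{k+1}).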
The discrete Euler-Poincaré equations for ℓ_d are obtained by computing the discrete variational principle induced on the discrete action $\sum_{k=0}^{N-1} \ell_d(\xi_k)$ from the discrete Hamilton principle $\delta \sum_{k=0}^{N-1} L_d(g_k, g_{k+1}) = 0$ recalled above in Equation (78). The main step in this process is to compute the variations δξ_k of ξ_k = (1/∆t) τ⁻¹(g_{k+1} g_k⁻¹) induced by arbitrary variations δg_k. One finds the expression where η_k = δg_k g_k⁻¹ and d L τ⁻¹(ξ) : g → g is the inverse of the left trivialized derivative of τ, d L τ(ξ) : The discrete Euler-Poincaré variational principle thus reads with respect to variations δξ_k of the form Equation (85) with η_k vanishing at the endpoints. It yields the discrete Euler-Poincaré equations.
Being equivalent to the discrete Euler-Lagrange equations on the Lie group, this scheme is equivalent to a symplectic scheme (g k−1 , g k ) → (g k , g k+1 ) on G × G. From the discrete Noether theorem, the scheme also preserves the discrete momentum map and the coadjoint orbits O ⊂ g * . Moreover, the scheme µ k−1 ∈ O → µ k ∈ O is symplectic on coadjoint orbits with respect to the Kirillov-Kostant-Souriau symplectic form, see [60]. Please note that the discrete momentum map is computed as which is readily seen to be preserved, J L d (g k−1 , g k ) = J L d (g k , g k+1 ), along the solutions of Equation (118)

Central Extensions and Variational Principle for the Lie-Poisson Equations with Cocycle
We considered in Equation (36) the Lie-Poisson equations with cocycle given by $\frac{d}{dt}\mu + \operatorname{ad}^*_{\frac{\delta h}{\delta \mu}}\mu - \Theta\left(\frac{\delta h}{\delta \mu}, \cdot\right) = 0$ (89), associated with the Souriau symplectic model. Our aim is to derive a geometric integrator for this system that is symplectic and preserves the affine coadjoint orbits for a general Hamiltonian. One systematic step is to look at Equation (89) from the Lagrangian side, as was done for the ordinary Lie-Poisson equations above. Assuming that h is hyperregular, we can take the associated Lagrangian ℓ : g → R and rewrite the equations as $\frac{d}{dt}\frac{\delta \ell}{\delta \xi} + \operatorname{ad}^*_{\xi}\frac{\delta \ell}{\delta \xi} - \Theta(\xi, \cdot) = 0$ (90) for a curve ξ(t) ∈ g. However, in general (i.e., for arbitrary ℓ, arbitrary g, and arbitrary Θ) there is no natural variational principle for these equations, in the sense of a variational principle induced from the ordinary Hamilton principle for a Lagrangian L : TG → R. Nevertheless, there is a way to interpret the system Equation (90) as being induced by ordinary Euler-Poincaré equations on a central extension of the Lie group G, integrating the Lie algebra cocycle Θ. This is related to the well-known fact that affine coadjoint orbits can be seen as ordinary coadjoint orbits of a central extension. We recall this fact below.
Lie group operations on central extensions. We shall focus on topologically trivial central extensions of finite dimensional Lie groups by R. The central extended group is thus of the form where B : G × G → R is a group two-cocycle, i.e., it satisfies for all f, g, h ∈ G. It can always be chosen such that B(e, g) = B(g, e) = 0, in which case we have B(g, g⁻¹) = B(g⁻¹, g) and (g, α)⁻¹ = (g⁻¹, −α − B(g⁻¹, g)). One obtains from this the expression of the adjoint and coadjoint actions as Ad*_(g,α) (µ, a) = (Ad*_g µ + aθ(g⁻¹), a) where the group one-cocycle θ ∈ C∞(G, g*) is defined by Equation (92) shows that the ordinary coadjoint orbits of Ĝ through (µ, 1) are affine coadjoint orbits of G. We have the corresponding formulas

Euler-Poincaré and Lie-Poisson equations on central extensions. From Equation (95), the Euler-Poincaré equations for a reduced Lagrangian ℓ̂ : ĝ = g × R → R take the form

They are the critical conditions for the Euler-Poincaré variational principle which is just a special instance of Equation (75) applied to central extensions. In Equation (97), η(t) ∈ g and v(t) ∈ R are arbitrary curves vanishing at the extremities. Given a Lagrangian ℓ : g → R, one can then define the Lagrangian ℓ̂(ξ, u) = ℓ(ξ) + All these considerations are standard, see, e.g., [29,31].
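The group operations on a central extension can be illustrated on the simplest nontrivial case. The sketch below assumes the multiplication (g, α)(h, β) = (gh, α + β + B(g, h)), which is consistent with the inverse formula quoted above, and the cocycle identity in the form B(f, g) + B(fg, h) = B(f, gh) + B(g, h). As a hypothetical example it takes G = (R², +) and B(g, h) = ω(g, h)/2 with ω the standard area form, which gives the Heisenberg group; neither this group nor the function names come from the text.

```python
import numpy as np

def omega(u, v):
    # Standard symplectic (area) form on R^2
    return u[0] * v[1] - u[1] * v[0]

def B(g, h):
    # Group two-cocycle on (R^2, +); it satisfies B(0, g) = B(g, 0) = 0
    return 0.5 * omega(g, h)

def multiply(x, y):
    # Assumed multiplication on G x R: (g, a)(h, b) = (gh, a + b + B(g, h))
    (g, a), (h, b) = x, y
    return g + h, a + b + B(g, h)

def inverse(x):
    # (g, a)^{-1} = (g^{-1}, -a - B(g^{-1}, g))
    g, a = x
    g_inv = -g
    return g_inv, -a - B(g_inv, g)

rng = np.random.default_rng(0)
f, g, h = (rng.normal(size=2) for _ in range(3))
x, y, z = (f, 0.3), (g, -1.1), (h, 2.0)

# Cocycle identity, which is what makes the extended multiplication associative
cocycle_defect = B(f, g) + B(f + g, h) - B(f, g + h) - B(g, h)
left = multiply(multiply(x, y), z)
right = multiply(x, multiply(y, z))
identity = multiply(x, inverse(x))
commutator_gap = multiply(x, y)[1] - multiply(y, x)[1]  # equals omega(f, g)
```

Although G is abelian here, the extended group is not: the gap between xy and yx in the central component is ω(f, g), which is the Lie algebra two-cocycle integrated by B in this example.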

Variational Symplectic Integrators for the Lie-Poisson Equations with Cocycle
Here we shall present a geometric symplectic Lie group integrator for Lie-Poisson equations with cocycle Equation (36) that preserves the affine coadjoint orbits for general Hamiltonian. In particular, the scheme preserves the affine Kirillov-Kostant-Souriau symplectic form on these affine coadjoint orbits. We shall use the Euler-Poincaré variational formulation on central extensions presented in Section 4.2.
Some useful identities. Given a central extension G = G × R, we shall consider the retraction map whereτ : g → G is a retraction map for G. To derive the discrete Euler-Poincaré equations we shall need several identities involving d L τ and d Lτ , see Equation (86), that are shown in the next Lemma.

Lemma 14.
For a local diffeomorphism $\hat\tau$ of the form Equation (99) on a central extension, we have the following identities (a) and (b), where $B : G \times G \to \mathbb{R}$ is the group two-cocycle.

Proof. These identities are proven as follows.

(a) Using the definition of $d^L\hat\tau$, we compute the left-hand side; in the third equality of this computation, we use the formula for the tangent lift of left translation on $\hat G$. Using the properties of the group two-cocycle $B$, we get an identity valid for all $g \in G$ and $\eta \in \mathfrak{g}$, from which the result follows.

(b) Taking the dual map and using (a), we get the result.

Proposition 15. The following statements are equivalent. (a) The discrete curve $(\xi_k, u_k)$ is critical for the discrete Euler-Poincaré variational principle on the central extension with respect to the variations induced by $(\eta_k, v_k)$, where $\eta_k \in \mathfrak{g}$ and $v_k \in \mathbb{R}$ are arbitrary discrete curves vanishing at the endpoints. (b) The discrete curve $(\xi_k, u_k)$ is a solution of the discrete Euler-Poincaré equations, Equation (100).

Proof. We use the discrete Euler-Poincaré formulation, Equations (87)-(118). For (a), we use Equation (85) and Lemma 14 to compute the induced variations. Using the identity $\langle\theta(g), \xi\rangle + D_2B(g, e) \cdot \mathrm{Ad}_{g^{-1}}\xi = D_1B(e, g) \cdot \eta$, we get the desired result. For (b), we use the formula for the coadjoint action on the central extension, which proves Equation (100). Equation (101), which gives the momenta $(\mu_k, a_k)$, is then obtained from a further identity.

We note that the relation with the solution $(g_k, \alpha_k)$ of the discrete Euler-Lagrange equations on the Lie group $\hat G$ is given through the retraction map $\hat\tau$, by the relations of Equation (102). Similarly, the variations $\eta_k, v_k$ used in the discrete Euler-Poincaré variational principle are related to the variations $\delta g_k, \delta\alpha_k$ used in the discrete Hamilton principle via the equality $(\eta_k, v_k) = (\delta g_k, \delta\alpha_k)(g_k, \alpha_k)^{-1} = (\delta g_k\, g_k^{-1}, \delta\alpha_k + D_1B(g_k, g_k^{-1}) \cdot \delta g_k)$.

The discrete momentum map $J_{L_d} : \hat G \times \hat G \to \hat{\mathfrak{g}}^*$ is computed in terms of $(\mu_k, a_k)$, where $(\mu_k, a_k)$ are given in Equation (101) and the relations of Equation (102) are assumed. It is readily seen that $J_{L_d}$ is preserved along the solutions of Equation (100). The symplectic integrator for the Lie-Poisson equations with cocycle is deduced as follows.

Proposition 16. (Symplectic integrator for Lie-Poisson equations with cocycle)
Let $h : \mathfrak{g}^* \to \mathbb{R}$ be a Hamiltonian assumed to be hyperregular, with associated Lagrangian $\ell : \mathfrak{g} \to \mathbb{R}$. Then the numerical scheme of Equations (103) and (104) is a symplectic scheme for the Lie-Poisson equations with cocycle, Equation (36). It preserves the affine coadjoint orbits, and $\mu_{k-1} \mapsto \mu_k$ is symplectic relative to the affine Kirillov-Kostant-Souriau symplectic form.

Proof. It is a direct consequence of Proposition 15, by choosing the reduced Lagrangian Equation (98), taking the initial condition $a_0 = 1$ and noting that $a_{k+1} = a_k = 1$.
It is possible to rewrite the scheme in a way that is more advantageous from the point of view of implementation. By inserting Equation (104) in Equation (103) and using an additional identity, we get the scheme in terms of $\xi_k$, Equation (106). It is also often assumed that the retraction map $\tau$ satisfies $\tau(-\xi)\tau(\xi) = e$. In this case, we have the identity $\mathrm{Ad}^*_{\tau(\xi)}\,(d^L\tau^{-1}(-\xi))^* = (d^L\tau^{-1}(\xi))^*$, see [69], and the scheme Equation (106) takes a simpler form, Equation (107). In the absence of the last two terms, we recover the form of the ordinary discrete Euler-Poincaré equations that is most commonly used in practice, e.g., [69]. The last two terms correspond to a discretization of the cocycle which ensures that the resulting scheme is symplectic on each affine coadjoint orbit. Such a form is unlikely to be guessed from the continuous equations without having the discrete variational principle at hand.
Remark 17 (Choice of retraction map). For an exposition of retraction maps, such as canonical coordinates of the first and second kind, and their applications to Lie group methods, the reader is referred to [68]. A possible choice is the exponential map $\exp : \mathfrak{g} \to G$. In this case, $d^L\exp(\xi) \cdot \eta$ and $d^L\exp^{-1}(\xi) \cdot \eta$ are given as series, which are truncated in order to achieve a desired order of accuracy [58]. Another standard choice is the Cayley map $\mathrm{cay} : \mathfrak{g} \to G$ defined by $\mathrm{cay}(\xi) = (e - \xi/2)^{-1}(e + \xi/2)$, which is valid for a general class of quadratic matrix groups (which include the groups $SO(3)$, $SE(2)$, and $SE(3)$). Based on this simple form, the derivative maps $d^L\mathrm{cay}(\xi) \cdot \eta$ and $d^L\mathrm{cay}^{-1}(\xi) \cdot \eta$ are available in closed form for each $\xi, \eta \in \mathfrak{g}$.
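As a small illustration of the Cayley map of Remark 17, the following sketch evaluates $\mathrm{cay}(\xi) = (e - \xi/2)^{-1}(e + \xi/2)$ on $\mathfrak{so}(3)$ and checks numerically that the result lies in $SO(3)$ and that $\mathrm{cay}(-\xi)\,\mathrm{cay}(\xi) = e$. The helper names `hat` and `cay` and the sample vector are assumptions made for this example only.

```python
import numpy as np

def hat(w):
    """Map a vector in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    w1, w2, w3 = w
    return np.array([[0.0, -w3, w2],
                     [w3, 0.0, -w1],
                     [-w2, w1, 0.0]])

def cay(xi):
    """Cayley map cay(xi) = (e - xi/2)^{-1} (e + xi/2) for a matrix Lie algebra element."""
    identity = np.eye(xi.shape[0])
    # Solve (e - xi/2) X = (e + xi/2) rather than forming the inverse explicitly.
    return np.linalg.solve(identity - xi / 2, identity + xi / 2)

xi = hat([0.3, -1.2, 0.7])
Q = cay(xi)

orth_err = np.linalg.norm(Q.T @ Q - np.eye(3))          # Q^T Q = e
det_err = abs(np.linalg.det(Q) - 1.0)                   # det Q = 1
inv_err = np.linalg.norm(cay(-xi) @ cay(xi) - np.eye(3))  # tau(-xi) tau(xi) = e

print(orth_err, det_err, inv_err)
```

All three residuals should be at the level of floating-point round-off, which is the property that makes the simplified form of the scheme applicable for this choice of retraction map.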

Multisymplectic Lie Group Variational Integrators
In this paragraph, we briefly indicate how the discrete variational setting of the previous section can be extended to variational discretization in several independent variables, i.e., when the unknown is a field rather than a curve. In the continuous case, the underlying geometric variational setting is the multisymplectic framework of field theories, see, e.g., [48]. Discrete multisymplectic variational versions of this setting were developed and applied in [61,62]. Multisymplectic variational discretization on Lie groups and the discrete Euler-Poincaré field equations were carried out in [63,64].
We will focus on the special case of fields defined on an open subset $U$ of $\mathbb{R}^n$ with smooth boundary, with values in a configuration manifold $Q$. We also assume that the Lagrangian only depends on the values of the fields and their first derivatives, not on the parameter $x \in \mathbb{R}^n$, so it is a map $L : TQ \oplus \ldots \oplus TQ \to \mathbb{R}$. Hamilton's principle for a field $q : U \subset \mathbb{R}^n \to Q$ is $\delta \int_U L(q(x), \partial_1 q(x), \ldots, \partial_n q(x))\,dx = 0$, for arbitrary variations of the field $q$ that vanish on the boundary of $U$, from which the Euler-Lagrange equations for the field $q(x)$ are obtained.
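For reference, and assuming local coordinates on $Q$ (an assumption made only to write the formula), the critical point condition of Hamilton's principle above takes the familiar form of the Euler-Lagrange field equations:

```latex
\frac{\partial L}{\partial q}
  - \sum_{i=1}^{n} \frac{\partial}{\partial x^i}
    \frac{\partial L}{\partial(\partial_i q)} = 0 .
```

The boundary terms produced by integration by parts vanish because the variations of $q$ vanish on the boundary of $U$.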
We shall focus on the case $Q = G$ a Lie group and on right-invariant Lagrangians, i.e., $L(gh, v_1h, \ldots, v_nh) = L(g, v_1, \ldots, v_n)$ for every $v_1, \ldots, v_n \in T_gG$ and every $h \in G$. In this case, $L$ induces the reduced Lagrangian $\ell : \mathfrak{g} \oplus \ldots \oplus \mathfrak{g} \to \mathbb{R}$ defined by $\ell(v_1g^{-1}, \ldots, v_ng^{-1}) = L(g, v_1, \ldots, v_n)$. As in the ordinary Euler-Poincaré case recalled above, Hamilton's principle yields a reduced variational principle for $\ell$, with constrained variations induced by an arbitrary field $\eta : U \to \mathfrak{g}$ vanishing on the boundary, which results in the Euler-Poincaré field equations, Equation (109). To guarantee the existence of a field $g : U \to G$ such that $\xi_k = \partial_k g\,g^{-1}$, $k = 1, \ldots, n$, the fields $\xi_i$ in Equation (109) must satisfy the relation $\partial_k\xi_i - \partial_i\xi_k = [\xi_k, \xi_i]$. In terms of the associated Hamiltonian $h : \mathfrak{g}^* \oplus \ldots \oplus \mathfrak{g}^* \to \mathbb{R}$, these equations give Equation (54) without cocycle, i.e., with $\Theta_k = 0$.
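The compatibility relation $\partial_k\xi_i - \partial_i\xi_k = [\xi_k, \xi_i]$ for $\xi_k = \partial_k g\,g^{-1}$ can be checked numerically. The sketch below is a hedged illustration, not part of the paper: it assumes $n = 2$, takes the hypothetical field $g(x, y) = \mathrm{cay}(xA)\,\mathrm{cay}(yB)$ in $SO(3)$ with arbitrarily chosen $A, B \in \mathfrak{so}(3)$, and approximates all derivatives by central finite differences.

```python
import numpy as np

def hat(w):
    """Map a vector in R^3 to a skew-symmetric matrix in so(3)."""
    w1, w2, w3 = w
    return np.array([[0.0, -w3, w2],
                     [w3, 0.0, -w1],
                     [-w2, w1, 0.0]])

def cay(xi):
    """Cayley map (e - xi/2)^{-1} (e + xi/2)."""
    identity = np.eye(3)
    return np.linalg.solve(identity - xi / 2, identity + xi / 2)

# Hypothetical Lie algebra elements defining the test field.
A = hat([0.3, -0.2, 0.5])
B = hat([-0.4, 0.1, 0.2])

def g(x, y):
    """Illustrative smooth field g : R^2 -> SO(3)."""
    return cay(x * A) @ cay(y * B)

def xi_fields(x, y, h=1e-4):
    """Right-trivialized derivatives xi_x = d_x g g^{-1}, xi_y = d_y g g^{-1} (central differences)."""
    g_inv = np.linalg.inv(g(x, y))
    xi_x = (g(x + h, y) - g(x - h, y)) / (2 * h) @ g_inv
    xi_y = (g(x, y + h) - g(x, y - h)) / (2 * h) @ g_inv
    return xi_x, xi_y

def compatibility(x, y, h=1e-4):
    """Return (d_x xi_y - d_y xi_x - [xi_x, xi_y], [xi_x, xi_y]) at the point (x, y)."""
    dx_xi_y = (xi_fields(x + h, y)[1] - xi_fields(x - h, y)[1]) / (2 * h)
    dy_xi_x = (xi_fields(x, y + h)[0] - xi_fields(x, y - h)[0]) / (2 * h)
    xi_x, xi_y = xi_fields(x, y)
    bracket = xi_x @ xi_y - xi_y @ xi_x
    return dx_xi_y - dy_xi_x - bracket, bracket

defect, bracket = compatibility(0.4, 0.7)
print(np.linalg.norm(defect), np.linalg.norm(bracket))
```

The defect should be small compared with the norm of the bracket, up to finite-difference error; the step size `h` is a trade-off between truncation and round-off and is not tuned.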
To include the case with cocycle in a variational setting, we shall proceed exactly as in Section 4.2, by passing to a central extension of G. This is here done in the context of the Euler-Poincaré field equations, rather than for the ordinary Euler-Poincaré equations. This is the content of the next paragraph.
Variational principle for the Lie-Poisson field equations with cocycle. The goal of this paragraph is to obtain a variational principle for the Lie-Poisson field equations with cocycle, Equation (54), associated with Souriau's polysymplectic model. By considering the Euler-Poincaré field equations, Equation (109), on a central extension, we get a system for the fields $(\xi_k, u_k)$, $k = 1, \ldots, n$. These equations are the critical conditions for the variational principle $\delta \int_0^T \hat\ell((\xi_1, u_1), \ldots, (\xi_n, u_n))\,dt = 0$, which is just a special instance of Equation (109) applied to central extensions. Provided that a field $(g, \alpha)$ with values in $\hat G$ exists, and for an appropriate choice of $\hat\ell$, this system yields Equation (54), as desired, where it is assumed that $\Theta_k = \Theta$ for all $k$.
The same reasoning also directly applies on the Hamiltonian side, in which case the Lie-Poisson field equation with cocycle Equation (89) is an invariant subsystem of an ordinary Lie-Poisson field equation associated with a central extension of G.
Remark 18 (Polysymplectic vs multisymplectic setting). The field theories that are used in this paper can be described within the restricted setting of polysymplectic geometry. For such particular field theories the configuration bundle of the theory is trivial, the base is Euclidean, and the Lagrangian does not depend on the variables in the base. General classical field theories cannot be described by polysymplectic geometry and fit into the more general setting of multisymplectic geometry. This mainly comes from the fact that the field theoretic analogue of the cotangent bundle of mechanics (endowed with the canonical symplectic form) is the dual jet bundle of the configuration bundle (endowed with a canonical multisymplectic form). We shall apply below multisymplectic variational integrators to field equations belonging to the setting of polysymplectic geometry. In this case, they could logically be called polysymplectic variational integrators. We have kept the original name, multisymplectic variational integrators, since the theory applies to general multisymplectic field theories, not just polysymplectic ones. See, e.g., [62,64] for applications of multisymplectic integrators to situations that are not covered by the polysymplectic formalism.

Multisymplectic Lie group integrators.
To present multisymplectic integrators, we shall focus on the two dimensional case and assume that the fields are defined on a rectangle $U = [0, A] \times [0, B] \subset \mathbb{R}^2$. We shall write $(x_1, x_2) = (x, y)$. Let $Q$ be a configuration manifold and let $L : TQ \oplus TQ \to \mathbb{R}$ be a Lagrangian. We shall consider the very special case of a discrete grid determined by $\{(x_k, y_a) = (k\Delta x, a\Delta y) \mid k = 0, \ldots, N_1,\ a = 1, \ldots, N_2\}$ with given $\Delta x$ and $\Delta y$. We shall denote by $q_d$ the discrete field, with value $q^a_k \in Q$ at the node $(x_k, y_a)$, and by $L_d(q^a_k, q^a_{k+1}, q^{a+1}_k)$ a discrete Lagrangian $L_d : Q \times Q \times Q \to \mathbb{R}$ that approximates the action integral of $L$ on the rectangle $[x_k, x_{k+1}] \times [y_a, y_{a+1}]$ for a field interpolating the values $q^a_k$, $q^a_{k+1}$, $q^{a+1}_k$. The discrete Hamilton principle, Equation (113), reads $\delta \sum_{k,a} L_d(q^a_k, q^a_{k+1}, q^{a+1}_k) = 0$ for all variations $\delta q_d$ of $q_d$ with vanishing boundary values. The discrete Euler-Lagrange equations are obtained as the critical point condition for a discrete field $q_d$.

Given a Lie group action $\Phi : G \times Q \to Q$, the discrete Lagrangian field momentum maps $J^i_{L_d} : Q \times Q \times Q \to \mathfrak{g}^*$, $i = 1, 2, 3$, are defined by Equation (114) and satisfy $J^1_{L_d} + J^2_{L_d} + J^3_{L_d} = 0$. We refer to [61,62] for an introduction to multisymplectic variational integrators, including the notion of discrete multisymplecticity, discrete Cartan forms, and discrete field momentum maps, see also [71]. These integrators also satisfy a discrete Noether theorem in the presence of symmetries, as we shall see below in the special case of Lie groups.
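To make the critical point condition concrete, here is a minimal sketch for a scalar field ($Q = \mathbb{R}$). Everything specific in it is an assumption chosen for illustration: the continuous Lagrangian $L = \tfrac12 q_x^2 - \tfrac12 q_y^2$, the forward-difference discrete Lagrangian `L_d`, and the grid sizes. For a discrete Lagrangian depending on the three nodes $(q^a_k, q^a_{k+1}, q^{a+1}_k)$, each interior value $q^a_k$ enters three terms of the discrete action, so the discrete Euler-Lagrange residual is the sum of three partial derivatives; the code compares this sum with a finite-difference gradient of the discrete action.

```python
import numpy as np

dx, dy = 0.1, 0.05  # hypothetical grid spacings

def L_d(q0, q1, q2):
    """Illustrative discrete Lagrangian on the nodes (q^a_k, q^a_{k+1}, q^{a+1}_k)."""
    return dx * dy * (0.5 * ((q1 - q0) / dx) ** 2 - 0.5 * ((q2 - q0) / dy) ** 2)

# Partial derivatives of L_d with respect to its first, second and third argument.
def D1(q0, q1, q2):
    return dx * dy * (-(q1 - q0) / dx**2 + (q2 - q0) / dy**2)

def D2(q0, q1, q2):
    return dx * dy * (q1 - q0) / dx**2

def D3(q0, q1, q2):
    return -dx * dy * (q2 - q0) / dy**2

def discrete_action(q):
    """Sum of L_d over all cells; q[k, a] stores q^a_k."""
    n1, n2 = q.shape
    return sum(L_d(q[k, a], q[k + 1, a], q[k, a + 1])
               for k in range(n1 - 1) for a in range(n2 - 1))

def del_residual(q, k, a):
    """Discrete Euler-Lagrange residual at an interior node: the three terms containing q^a_k."""
    return (D1(q[k, a], q[k + 1, a], q[k, a + 1])
            + D2(q[k - 1, a], q[k, a], q[k - 1, a + 1])
            + D3(q[k, a - 1], q[k + 1, a - 1], q[k, a]))

def fd_gradient(q, k, a, eps=1e-6):
    """Central finite-difference derivative of the discrete action with respect to q^a_k."""
    qp, qm = q.copy(), q.copy()
    qp[k, a] += eps
    qm[k, a] -= eps
    return (discrete_action(qp) - discrete_action(qm)) / (2 * eps)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 6))
gap = abs(del_residual(q, 2, 3) - fd_gradient(q, 2, 3))

# A field whose second differences in x and y coincide solves the discrete equations.
ks, As = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
q_exact = (ks * dx) ** 2 + (As * dy) ** 2
res_exact = abs(del_residual(q_exact, 2, 3))
print(gap, res_exact)
```

For this particular `L_d` the residual reduces to the standard five-point discretization of the wave equation, which is why the quadratic field `q_exact` gives a vanishing residual.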
Multisymplectic variational integrators on Lie groups were developed in [63,71], for application to geometrically exact (Cosserat) rods. As above, we shall focus on the two dimensional case and U = [0, A] × [0, B] ⊂ R 2 . For Q = G a Lie group, the discrete Lagrangian is a map L d : G × G × G → R.
We assume that the continuous Lagrangian is $G$-invariant and that the discrete Lagrangian $L_d$ inherits this invariance, i.e., $L_d(g^a_k h, g^a_{k+1} h, g^{a+1}_k h) = L_d(g^a_k, g^a_{k+1}, g^{a+1}_k)$ for every $h \in G$. Hence, by passing to the quotient associated with this action we get a reduced discrete Lagrangian on $G \times G$, written with the same symbol, $L_d(g^a_{k+1}(g^a_k)^{-1}, g^{a+1}_k(g^a_k)^{-1}) = L_d(g^a_k, g^a_{k+1}, g^{a+1}_k)$. As mentioned earlier, it is advantageous to introduce a retraction map $\tau : \mathfrak{g} \to G$, $\tau(0) = e$, from which the discrete reduced Lagrangian $\ell_d$ can be defined on a neighborhood of $(0, 0)$ in $\mathfrak{g} \times \mathfrak{g}$ via the relation $\ell_d(\xi^a_k, \zeta^a_k) = L_d(g^a_{k+1}(g^a_k)^{-1}, g^{a+1}_k(g^a_k)^{-1})$ with $\tau(\Delta x\,\xi^a_k) = g^a_{k+1}(g^a_k)^{-1}$ and $\tau(\Delta y\,\zeta^a_k) = g^{a+1}_k(g^a_k)^{-1}$.
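The reduced variables $\xi^a_k$ and $\zeta^a_k$ defined by the retraction map can be computed explicitly once $\tau$ is fixed. The sketch below assumes $G = SO(3)$ and the Cayley map as retraction (one possible choice among others), with hypothetical helper names `cay_inv` and `reduced_variables` and randomly generated group elements. It checks that the reduced variables are unchanged under right multiplication of the three nodes by a common $h \in G$, which is the invariance used to define $\ell_d$.

```python
import numpy as np

def hat(w):
    """Map a vector in R^3 to a skew-symmetric matrix in so(3)."""
    w1, w2, w3 = w
    return np.array([[0.0, -w3, w2],
                     [w3, 0.0, -w1],
                     [-w2, w1, 0.0]])

def cay(xi):
    """Cayley map (e - xi/2)^{-1} (e + xi/2)."""
    identity = np.eye(3)
    return np.linalg.solve(identity - xi / 2, identity + xi / 2)

def cay_inv(Q):
    """Inverse Cayley map xi = 2 (Q - e)(Q + e)^{-1}, defined when -1 is not an eigenvalue of Q."""
    identity = np.eye(3)
    # Solve X (Q + e) = (Q - e) through the transposed system.
    return 2.0 * np.linalg.solve((Q + identity).T, (Q - identity).T).T

def reduced_variables(g, g_right, g_up, dx, dy):
    """Return (xi, zeta) with cay(dx xi) = g_right g^{-1} and cay(dy zeta) = g_up g^{-1}."""
    g_inv = np.linalg.inv(g)
    xi = cay_inv(g_right @ g_inv) / dx
    zeta = cay_inv(g_up @ g_inv) / dy
    return xi, zeta

rng = np.random.default_rng(1)
# Three neighbouring node values g^a_k, g^a_{k+1}, g^{a+1}_k and a symmetry element h.
g, g_right, g_up, h = (cay(hat(rng.normal(scale=0.3, size=3))) for _ in range(4))
dx, dy = 0.1, 0.05

xi, zeta = reduced_variables(g, g_right, g_up, dx, dy)
xi_h, zeta_h = reduced_variables(g @ h, g_right @ h, g_up @ h, dx, dy)

invariance_err = np.linalg.norm(xi - xi_h) + np.linalg.norm(zeta - zeta_h)
reconstruction_err = np.linalg.norm(cay(dx * xi) @ g - g_right)
print(invariance_err, reconstruction_err)
```

The second residual confirms that the neighbouring value $g^a_{k+1}$ is recovered from $g^a_k$ and $\xi^a_k$ by $g^a_{k+1} = \tau(\Delta x\,\xi^a_k)\,g^a_k$, which is how a reduced scheme is typically turned back into a field on the group.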
The discrete Euler-Poincaré field equations for $\ell_d$ are obtained by computing the discrete variational principle induced on the discrete action $\sum_{k,a} L_d(g^a_k, g^a_{k+1}, g^{a+1}_k)$ recalled above in Equation (113). The main step in this process is to compute the variations $\delta\xi^a_k$ and $\delta\zeta^a_k$ induced by arbitrary variations $\delta g^a_k$; their expression is given in Equation (116). The discrete field Euler-Poincaré variational principle thus reads $\delta \sum_{k,a} \ell_d(\xi^a_k, \zeta^a_k) = 0$ with respect to variations $\delta\xi^a_k$, $\delta\zeta^a_k$ of the form Equation (116) with $\eta^a_k$ vanishing at the boundary. It yields the discrete Euler-Poincaré field equations.
We refer to [63,71] for details, including the treatment of boundary conditions, the description of the associated discrete Cartan forms, the discrete field momentum maps, as well as the symplectic and multisymplectic characters of the scheme.
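We just recall below the expression of the field momentum maps, Equation (114), which take the following form: $J^2_{L_d}(g^a_k, g^a_{k+1}, g^{a+1}_k) = \frac{1}{\Delta x}\mathrm{Ad}^*_{g^a_k}\mu^a_k$ and $J^3_{L_d}(g^a_k, g^a_{k+1}, g^{a+1}_k) = \frac{1}{\Delta y}\mathrm{Ad}^*_{g^a_k}\nu^a_k$, while $J^1_{L_d}(g^a_k, g^a_{k+1}, g^{a+1}_k) = -\frac{1}{\Delta x}\mathrm{Ad}^*_{g^a_k}\mu^a_k - \frac{1}{\Delta y}\mathrm{Ad}^*_{g^a_k}\nu^a_k$, in agreement with $J^1_{L_d} + J^2_{L_d} + J^3_{L_d} = 0$.
The discrete Noether theorem then asserts that a certain $\mathfrak{g}^*$-valued discrete integral of $J^i_{L_d}$ along the boundary of any subgrid domain is zero, see [71].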

Multisymplectic variational discretization for Lie-Poisson field equations with cocycle.
Based on the previous result, we first give below a multisymplectic integrator for the Euler-Poincaré field equations on central extensions. Then we deduce a multisymplectic integrator for the Lie-Poisson field equations with cocycle appearing in the polysymplectic Souriau model. The next proposition is the multisymplectic version of Proposition 15.

(a) The discrete field $(\xi^a_k, u^a_k, \zeta^a_k, w^a_k)$ is critical for the discrete Euler-Poincaré field variational principle $\delta \sum_{a,k} \hat\ell_d((\xi^a_k, u^a_k), (\zeta^a_k, w^a_k)) = 0$, with respect to the variations induced by $(\eta^a_k, v^a_k)$, which include cocycle terms such as $D_1B(e, \tau(\Delta y\,\zeta^a_k)) \cdot \eta^{a+1}_k$, where $\eta^a_k \in \mathfrak{g}$ and $v^a_k \in \mathbb{R}$ are arbitrary discrete fields vanishing at the boundary.

(b) The discrete field $(\xi^a_k, u^a_k, \zeta^a_k, w^a_k)$ is a solution of the discrete Euler-Poincaré field equations, Equation (120).

Proof. The proof can be obtained by an appropriate extension of the proof of Proposition 15, using the multisymplectic variational setting recalled in the previous paragraph.
We note that the solution of the discrete Euler-Poincaré field equations is related to the solution $(g^a_k, \alpha^a_k)$ of the discrete Euler-Lagrange field equations on the Lie group $\hat G$ through the retraction map, as in the symplectic case. The resulting numerical scheme is a multisymplectic scheme for the Lie-Poisson field equations with cocycle, Equation (54), in the two dimensional case. If the retraction map $\tau$ satisfies $\tau(-\xi)\tau(\xi) = e$, the scheme can be rewritten in a simpler way, as done in Equation (107) in the symplectic case.
The benefit of the structure preserving properties of the proposed numerical schemes will be exploited in a future work.

Conclusions
In the context of artificial intelligence, machine learning algorithms increasingly use methodological tools coming from physics and statistical mechanics. The laws and principles that underpin this physics can shed new light on the conceptual basis of artificial intelligence. Thus, the principles of maximum entropy and François Massieu's notions of characteristic functions enrich the variational formalism of machine learning. Conversely, the difficulties encountered by artificial intelligence in extending its application domains call into question the foundations of statistical physics, for instance through the generalization of Gibbs densities to more elaborate representation spaces such as data on homogeneous symplectic manifolds and Lie groups. The porosity between the two disciplines has been established since the birth of artificial intelligence, with the use of Boltzmann machines and the problem of robust methods for calculating the partition function. More recently, gradient algorithms for neural network learning use large-scale robust extensions of the natural gradient of Fisher-based information geometry (to ensure reparameterization invariance), and stochastic gradient based on the Langevin equation (to ensure regularization), or their coupling called "Natural Langevin Dynamics". Concomitantly, during the last fifty years, statistical physics has been the object of new geometrical formalizations (contact, Dirac, or symplectic geometry, variational principles, etc.) that aim to give a new covariant formalization of the thermodynamics of dynamical systems, such as Lie Groups Thermodynamics. Finally, the study of geometric integrators such as symplectic integrators, with good covariance and stability properties (use of symmetries, preservation of invariants and momentum maps), will open the door to a new generation of numerical schemes.
Machine learning inference processes are just beginning to adapt these new integration schemes and their remarkable stability properties to increasingly abstract data representation spaces. Artificial intelligence currently uses only a very limited portion of the conceptual and methodological tools of statistical physics. The purpose of this paper was to encourage constructive dialogue around a common foundation, to allow the establishment of new principles and laws governing the two disciplines in a unified approach.