Exponential Families with External Parameters

In this paper we introduce a class of statistical models consisting of exponential families depending on additional parameters, called external parameters. The main source for these statistical models resides in the Maximum Entropy framework where we have thermal parameters, corresponding to the natural parameters of an exponential family, and mechanical parameters, here called external parameters. In the first part we study the geometry of these models, introducing a fibration of parameter space over external parameters. In the second part we investigate a class of evolution problems driven by a Fokker-Planck equation whose stationary distribution is an exponential family with external parameters. We discuss applications of these statistical models to thermodynamic length and isentropic evolution of thermodynamic systems and to a problem in the dynamics of quantitative traits in genetics.


Introduction
This work is a first attempt to study the geometrical properties and potential applications of a class of statistical models consisting of exponential families depending on additional parameters, called external parameters. The main source for these statistical models comes from the application of E.T. Jaynes's Maximum Entropy framework [1] to thermodynamical systems, where we can identify in a natural way thermal parameters (corresponding to natural parameters in an exponential family) and mechanical parameters, here called external parameters. While the construction of equilibrium Statistical Mechanics from the Maximum Entropy principle is a well established domain of science, little attention is paid in the literature to the intrinsic geometrical structure of these statistical models. Given the widespread application of the Maximum Entropy principle to disparate fields of science, it is reasonable to assume that a closer scrutiny of these models can pave the way to further applications outside statistical thermodynamics.
Here is the plan of the paper: in Section 2 we recall the definitions of regular statistical model and of exponential family. The main point is that we are dealing with a finite dimensional Riemannian manifold with respect to the Fisher metric. In Section 3 we introduce the exponential families with external parameters, we state the conditions that render them a regular statistical model and we compute the Fisher metric. The additional geometrical structure that we get with these exponential families is a fibration over the space of external parameters U in the sense that for every fixed u ∈ U the fiber is a standard exponential family. The notions of Ehresmann connection on a fibered bundle and of parallel transport are recalled in Section 4. In Section 5 we outline some applications of these parameterized exponential families: we give a formula for the thermodynamic length of a process described by a path in both natural and external parameters and we give conditions for the isentropic evolution of the system. Section 6 is motivated by a model problem in quantitative genetics (briefly recalled in Appendix A) where the dynamics of the system is given by a Fokker-Planck equation with gradient drift and the equilibrium or stationary distribution is an exponential family with external parameters. We recast the dynamic approximation procedure presented in [2][3][4] in the framework of exponential families with external parameters and we give a generalization of the ODE that drives the approximating dynamics. We think that considering the problem presented in [2][3][4] from the present point of view may shed light on some still poorly understood aspects of the model.

Exponential Families in Statistical Thermodynamics
To help locate the contribution of the present paper in the scientific literature we briefly review and compare some of the geometrical approaches to statistical mechanics that are most relevant for our argument. A line of research initiated by the influential papers of Weinhold [5] and Ruppeiner [6] investigates the Riemannian metric structure on parameter space related to the Boltzmann-Gibbs canonical distribution. This Riemannian metric is the one defined by the Hessian matrix of the free energy ψ = log Z (which coincides with the Fisher metric) with respect to the canonical parameters or by its inverse, which is the Hessian of the entropy S, related to ψ by the Legendre transform. The Levi-Civita connection with respect to this metric allows one to define the Riemannian curvature tensor and its sectional and scalar curvature. For a two-dimensional parameter space the divergence of the scalar curvature is a signal of the existence of a phase transition in the underlying physical system. This theory has been applied to Ising and Potts lattice systems, to the ideal and Van der Waals gas and to black hole thermodynamics (see e.g., [7][8][9][10]). However, in dimension greater than two the scalar curvature has a less stringent role and care must be taken in the interpretation of the results.
In this work we also start from the Boltzmann-Gibbs distribution but we stress the different role of natural or thermal parameters θ, which occur linearly in (15), and external parameters u which may enter nonlinearly in the Boltzmann-Gibbs distribution. In particular we are interested in using the external parameters as control parameters on the evolution of the system. The related geometrical framework exposed in Section 4 adopts the connection and curvature associated to the Ehresmann connection on the fibration locally described by (θ, u) → u, which is fit for describing the isentropic evolution of the system or the dependence of the work control protocol on the global geometric structure i.e., the holonomy of the path of the external control space.
A second line of research relating information geometry and statistical thermodynamics concerns the notion of thermodynamic length (see [11][12][13]), which is important in the design of optimal driving protocols for the non-equilibrium evolution of (small) thermodynamic systems, see [14,15], both for classical and quantum descriptions. In this work (see Section 5.1) we investigate the notion of thermodynamic length using our geometric framework and we give a formula for thermodynamic length that highlights the contribution of natural and external (controlled) parameters.
For the sake of completeness we cite the statistical models introduced by J. Naudts (see [16,17]), called generalized exponential families, and the q-exponential families of Amari-Ohara [18]. In these models the exponential function is generalized by introducing the so-called q-deformed exponential. In practice one considers simultaneously two elements of an exponential family, the second of which is called escort distribution. These deformed exponential families are useful for describing Tsallis thermostatistics [19], which gives a more accurate description of thermodynamic systems where the extensivity of the classical entropy fails. However this highly debated topic is not relevant for the present work.
This paper is a first attempt to study the exponential families with external parameters using geometrical tools. Even if we were inspired by the Maximum Entropy formalism our results are completely general. In particular we investigated the case where the family (with respect to the natural and external parameters) is a regular statistical model. This is only a first step in the analysis of these parameterized models; a further step would be in the direction of singular (as opposed to regular) statistical models (see [20]), a domain which nowadays receives increasing attention in the information geometry community. A drawback of this work is that most of the results are presented in a coordinate-dependent way and have a local character. We hope to resolve these issues in a subsequent work. Some of the results presented here were introduced in a less refined form in [21].

Statistical Models and Exponential Families
Before introducing their generalization in Section 3 below, we recall the definitions of regular statistical model and of exponential family (see [22][23][24]). Let (X, B, dx) be a probability space where X may be a discrete or continuous set. We stipulate that in case of a discrete set the integrals over X with respect to the measure dx are substituted by sum symbols. Let $P(X) = \{ p : X \to \mathbb{R} \;:\; p \ge 0,\ \int_X p\,dx = 1 \}$ be the infinite dimensional space of probability densities over X. Let $Z \subset \mathbb{R}^d$ be the open set of the parameters, $f : Z \to P(X)$ be a given smooth map and consider the subset of P(X) given by $S = \{ p_z = f(z) \;:\; z \in Z \}$. To avoid technicalities, we stipulate that the support of p, i.e., the set where p > 0, is the same for all p ∈ S and that it coincides with X. We now state the conditions under which S is a regular d-dimensional statistical model (see [22,24,25]).
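Definition 1 (Regular statistical model). S is a regular statistical model if the following conditions are satisfied: 1. (injectivity) the map $f : Z \to P(X)$, $z \mapsto p_z$, is one to one; 2. (regularity) the d functions $p_i(x; z) = \partial p(x; z)/\partial z^i$, $i = 1, \dots, d$, defined on X are linearly independent as functions on X for every z ∈ Z.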
A statistical model which is not regular is called singular (see [20] for a comprehensive discussion on singular models). If condition 1. holds the model is called identifiable, otherwise it is called unidentifiable. If condition 2. fails the main consequence is that the Fisher metric (22) is only positive semidefinite because condition (23) fails. Many statistical models, e.g., Boltzmann machines, Bayes networks and hidden Markov models, are singular. Note that for a regular statistical model the inverse ϕ of the map f, ϕ(p) = z, defines a global coordinate system for S.
To check regularity condition 2. it is convenient to introduce the so-called log-likelihood $l = \ln p$ of p and the score base $l_i = \partial_i l = p_i / p$, $i = 1, \dots, d$. Since $l_i$ and $p_i$ are proportional, the regularity condition 2. holds if and only if the elements of the score base are linearly independent on X.

Exponential Family
Fundamental examples of statistical models are the exponential families. Let us introduce $a$ observable functions $h : X \to \mathbb{R}^a$, $h = (h_1, \dots, h_a)$, and suppose that the $a + 1$ functions $h_1(x), \dots, h_a(x), 1$ (2) are linearly independent as functions over X, where 1 denotes the constant function over X. Moreover, let $k = k(x)$ be a function defined on X and let us introduce the free energy $\psi : \Theta \subset \mathbb{R}^a \to \mathbb{R}$, $\psi = \psi(\theta)$, as $\psi(\theta) = \ln \int_X e^{\theta \cdot h(x) + k(x)}\,dx$ (here $\theta \cdot h$ denotes the scalar product in $\mathbb{R}^a$), where the parameter space Θ is the subset of $\mathbb{R}^a$ where $e^{\psi(\theta)} < +\infty$. The $a$ real numbers θ are called natural parameters. It is known that the set Θ is convex and that ψ is a convex function of θ (see [23,26]); here we take Θ to be open.
The following subset of the infinite dimensional space P(X), $E = \{ p(x;\theta) = e^{\theta \cdot h(x) - \psi(\theta) + k(x)} \;:\; \theta \in \Theta \}$ (4), is called exponential family. We show that E is an a-dimensional regular statistical model.
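In fact $l = \ln p(x;\theta) = \theta \cdot h(x) - \psi(\theta) + k(x)$, therefore the injectivity condition 1. above holds if and only if for all $\theta, \theta' \in \Theta$ the implication $(\theta - \theta') \cdot h(x) - (\psi(\theta) - \psi(\theta')) = 0$ for all $x \in X$ $\Rightarrow$ $\theta = \theta'$ (6) holds, and this is true by the independence condition (2) above. To check regularity condition 2. above, we compute the elements of the score base. They are (here we use the shorthand notation $\partial_i f = \partial f/\partial z^i$ and $\langle f \rangle = \int_X f p\,dx$; moreover summation over repeated indices is understood) $l_\alpha = \partial_\alpha l = h_\alpha - \partial_\alpha \psi = h_\alpha - \langle h_\alpha \rangle$. The last equality $\partial_\alpha \psi = \langle h_\alpha \rangle$ holds if we assume that the integrability condition $\langle |h_\alpha| \rangle < +\infty$ is satisfied for every α. It is not restrictive to assume that $\langle h_\alpha \rangle = 0$, therefore the regularity condition 2. holds if and only if the $d = a$ functions $h_\alpha$ are linearly independent over X, which again follows from (2). One can show (see [22,27]) that every smooth diffeomorphism θ → m(θ) gives an equivalent parameterization of the elements of the exponential family. In this sense E has the structure of a smooth manifold, called statistical manifold. Another coordinate system for E (we will denote it with p = p(x; η)) is provided by the so-called expectation parameters $\eta_\alpha = \langle h_\alpha \rangle = \partial_\alpha \psi(\theta)$, whose inverse map is $\theta^\alpha = \partial \phi / \partial \eta_\alpha$, where $\phi(\eta) = \theta \cdot \eta - \psi(\theta)$ is the Legendre transform of ψ (see [22]).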
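The identities used above, namely that the gradient of ψ gives the expectations of the observables and that its Hessian gives their covariance, can be checked numerically. The following is a minimal sketch under assumptions that are not in the paper: a hypothetical finite state space X = {0, ..., 4}, k = 0, the two observables h_1(x) = x and h_2(x) = x², and finite differences in place of exact derivatives.

```python
import math

# Hypothetical toy model (not from the paper): X = {0,...,4}, k = 0,
# observables h_1(x) = x and h_2(x) = x^2, so that h_1, h_2, 1 are independent.
X = list(range(5))
H = [lambda x: float(x), lambda x: float(x * x)]

def exponent(theta, x):
    """Scalar product theta . h(x)."""
    return sum(t * h(x) for t, h in zip(theta, H))

def psi(theta):
    """Free energy psi(theta) = ln sum_x exp(theta . h(x))."""
    return math.log(sum(math.exp(exponent(theta, x)) for x in X))

def density(theta):
    """p(x; theta) = exp(theta . h(x) - psi(theta)) for every x in X."""
    p0 = psi(theta)
    return [math.exp(exponent(theta, x) - p0) for x in X]

def mean_and_cov(theta):
    """Expectations <h_a> and covariance <l_a l_b> of the observables."""
    p = density(theta)
    mean = [sum(px * h(x) for px, x in zip(p, X)) for h in H]
    cov = [[sum(px * (hi(x) - mean[i]) * (hj(x) - mean[j]) for px, x in zip(p, X))
            for j, hj in enumerate(H)] for i, hi in enumerate(H)]
    return mean, cov

def shifted(theta, i, eps):
    """Copy of theta with the i-th component shifted by eps."""
    out = list(theta)
    out[i] += eps
    return out

def grad_psi(theta, eps=1e-5):
    """Central finite-difference gradient of psi."""
    return [(psi(shifted(theta, i, eps)) - psi(shifted(theta, i, -eps))) / (2 * eps)
            for i in range(len(theta))]

def hess_psi(theta, eps=1e-4):
    """Central finite-difference Hessian of psi."""
    n = len(theta)
    def f(i, si, j, sj):
        return psi(shifted(shifted(theta, i, si * eps), j, sj * eps))
    return [[(f(i, 1, j, 1) - f(i, 1, j, -1) - f(i, -1, j, 1) + f(i, -1, j, -1))
             / (4 * eps * eps) for j in range(n)] for i in range(n)]

theta = [0.3, -0.2]
mean, cov = mean_and_cov(theta)
grad, hess = grad_psi(theta), hess_psi(theta)
```

In this sketch `grad` should agree with `mean` and `hess` with `cov` up to the finite-difference error, which is the discrete counterpart of the relations between ψ, the expectation parameters and the score base.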

Exponential Families Depending on External Control Parameters
These statistical models are introduced by supposing that the observables h that define an exponential family depend on so-called external parameters $u \in U \subset \mathbb{R}^b$, which are to be distinguished from the natural parameters θ. These generalized exponential families arise naturally when one applies the Maximum Entropy formalism to equilibrium Statistical Mechanics, which we briefly recall here (see E.T. Jaynes's books [1,28]).
It is well known that when the information consists of the average values of some random variables $h_\alpha$ describing observables of interest for the system, the maximum entropy probability densities are exponential families. Indeed, if we introduce the Shannon entropy functional for a probability density p ∈ P(X), $H(p) = -\int_X p \ln p\,dx$, then the probability density that maximizes H on the set of probability densities that satisfy the constraints $\langle h \rangle = \int_X h p\,dx = c \in \mathbb{R}^a$ has the form of an exponential family as in (4) with k = 0. If the observables of interest for the system h = h(x, u) depend on extra parameters, the exponential family inherits naturally a dependence on the external parameters, see (15) below. Typical examples of external parameters are the magnetic or electric field applied to the system or the length of a polymer chain (see [12,29]). Also, for a quantum system confined in an infinite square well potential, the discrete energy levels $h_i$ depend on the width L of the well. Another typical example of a thermodynamic system subject to an external parameter is an ideal gas in a container of variable volume V; however in this case the parameter V affects the state space X = X(V) and not the observables h, therefore this important system is not described by a generalized exponential family (see [21] for a discussion of this point).
An important difference between the natural parameters θ and the external ones u is that the former are the Lagrange multipliers associated to the constraints when one solves the constrained extremization problem for H using the Lagrange multipliers method, while the latter are parameters in the problem formulation that can be controlled by an agent external to the system under consideration. This difference is displayed when we consider the variation of $\langle h \rangle$ for p = p(x; θ, u). If we suppose, as we will always do, that we can exchange the order of integration and differentiation with respect to a parameter, we have $d\langle h_\alpha \rangle = \int_X h_\alpha\,dp\,dx + \langle \partial_k h_\alpha \rangle\,du^k = dQ_\alpha + \langle \partial_k h_\alpha \rangle\,du^k$ (10), where dQ has the meaning of generalized heat exchanged and $\langle \partial_u h \rangle\,du$ of generalized work exchanged (see [28]). Moreover, while the value of the external parameters u is controlled and can be varied by an agent external to the system, the value of the natural parameters θ can be varied only by putting the system in contact with a heat bath at a prescribed value of the inverse temperature θ (see again [28]). The Kullback-Leibler divergence, also called relative entropy (see [27]), is defined for p, q ∈ P(X) and q > 0 as $D(p|q) = \int_X p \ln \frac{p}{q}\,dx$. It is well known that the probability density $\hat p$ that minimizes D on the set of probability densities that satisfy the constraints $\langle h \rangle = \int_X h p\,dx = c \in \mathbb{R}^a$ has the form of an exponential family as in (4) with $k = \ln q$, that is $\hat p(x;\theta) = q(x)\,e^{\theta \cdot h(x) - \psi(\theta)}$. The probability distribution $\hat p$ is the distribution that gives the minimum information gain when one wants to update the current statistical description of the system given by q using the new available information $\langle h \rangle = c$. We will refer in the sequel to this as the minimum Relative Entropy principle. The parameters θ of $\hat p(x;\theta)$ in (4) are uniquely determined as $\theta = \hat\theta(c)$ by the constraint conditions since the gradient map $\theta \mapsto \partial_\theta \psi$ is invertible. Note that for θ = 0 we have p(x; 0) = q(x), therefore the case $\hat\theta(c) = 0$ corresponds uniquely to the constraint value $c = \int_X h q\,dx$, meaning that the constraints do not represent a new piece of information on the system. We will use this fact in the following.
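The inversion of the gradient map that determines θ from the constraint value c can be illustrated with a short numerical sketch. Everything specific here is an assumption made for illustration: a hypothetical finite state space X = {0, ..., 5}, a uniform reference density q, the single observable h(x) = x, and a plain Newton iteration (which relies on the second derivative of ψ being the variance of h).

```python
import math

# Hypothetical setup: X = {0,...,5}, uniform reference density q, observable h(x) = x.
X = list(range(6))
q = [1.0 / len(X)] * len(X)
h = [float(x) for x in X]

def tilted(theta):
    """p_hat(x; theta) = q(x) exp(theta h(x) - psi(theta))."""
    w = [qx * math.exp(theta * hx) for qx, hx in zip(q, h)]
    z = sum(w)
    return [wx / z for wx in w]

def mean_var(theta):
    """Mean and variance of h under p_hat(.; theta)."""
    p = tilted(theta)
    m = sum(px * hx for px, hx in zip(p, h))
    v = sum(px * (hx - m) ** 2 for px, hx in zip(p, h))
    return m, v

def solve_theta(c, theta=0.0, tol=1e-12, max_iter=50):
    """Newton iteration for d psi / d theta = c; psi'' is the variance of h."""
    for _ in range(max_iter):
        m, v = mean_var(theta)
        if abs(m - c) < tol:
            break
        theta -= (m - c) / v
    return theta

def kl(p, r):
    """Relative entropy D(p | r) on the finite set X."""
    return sum(px * math.log(px / rx) for px, rx in zip(p, r) if px > 0)

theta_hat = solve_theta(3.2)        # constraint carrying new information
theta_trivial = solve_theta(2.5)    # 2.5 is the mean of h under q
```

With these choices `theta_trivial` stays at zero, since the constraint value coincides with the average of h under q, while `theta_hat` is nonzero and the corresponding information gain `kl(tilted(theta_hat), q)` is strictly positive.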
Having exposed the motivations for considering these probability distributions, in the sequel we will investigate the geometrical properties of exponential families with external parameters or controlled exponential families for short.

Exponential Families with External Parameters
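Let $U \subset \mathbb{R}^b$ be the external parameter space and consider the $a$ observables $h_\alpha = h_\alpha(x, u)$, $\alpha = 1, \dots, a$, $x \in X$, $u \in U$. Let k(x) be a function on X and define the free energy $\psi : Z \subset \mathbb{R}^a \times U \to \mathbb{R}$ as $\psi(\theta, u) = \ln \int_X e^{\theta \cdot h(x,u) + k(x)}\,dx$ (13), where the parameter space Z is the subset of $\mathbb{R}^d$, $d = a + b$, where $e^\psi < +\infty$. We suppose that (i) Z is open and we introduce the map $\pi : Z \to U$, $\pi(\theta, u) = u$ (14). We consider the following subset of the infinite dimensional space P(X), $F = \{ p(x;\theta,u) = e^{\theta \cdot h(x,u) - \psi(\theta,u) + k(x)} \;:\; (\theta, u) \in Z \}$ (15), and we suppose that (ii) for every $u \in \pi(Z)$ the set $E(u) = \{ p(x;\theta,u) \;:\; \theta \in \pi^{-1}(u) \}$ is an exponential family. As a consequence $\pi^{-1}(u)$ is a convex subset in θ and $h_\alpha(x, u), 1$ are $a + 1$ functions linearly independent over X.
A natural question is whether the set F can be seen as a foliated manifold whose leaves are the statistical manifolds E(u). Note however that if θ = 0 is allowed (that is $\int_X e^k dx < +\infty$) we have for θ = 0 in (13) $\psi(0) = \psi(0, u)$ and $p(x; 0, u) = e^{k(x) - \psi(0)}$ for every $u \in \pi(Z)$, therefore $e^{k - \psi(0)} \in E(u)$ for every $u \in \pi(Z)$. So the statistical manifold leaves are not disjoint.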
A second natural question is whether F can be given the structure of a regular statistical model. To this end we need to check conditions 1. and 2. in Definition 1 above. Concerning injectivity condition 1. for the map $z \mapsto f(z)$, we have that $f(0, u) = e^{k - \psi(0)}$ for all u ∈ U, so injectivity condition 1. may fail for controlled exponential families at θ = 0. However, if we recall the statistical mechanics interpretation of controlled exponential families made in Section 3 and in particular in (12), we can consider the point of singularity θ = 0 outside the domain of application of the statistical model (see however [20] for a discussion of this point). If we assume θ ≠ 0, due to the possibly nonlinear dependence of h(x, u) on u, condition (6) to assess injectivity for a controlled exponential family becomes $\theta \cdot h(x,u) - \psi(\theta,u) = \theta' \cdot h(x,u') - \psi(\theta',u')$ for all $x \in X$ $\Rightarrow$ $(\theta, u) = (\theta', u')$ (17). Condition (17) seems hard to satisfy even if we assume hypothesis (ii), as the following example shows. Suppose that the observables h depend linearly on u, $h_\alpha(x, u) = A_{\alpha k}(x)\,u^k$ (18), and (see (ii)) suppose that the $a + 1$ functions $h_\alpha(x, u), 1$ in (18) are linearly independent over X for every fixed u. Note that the elements of F in (15) depend on θ, u through the scalar quantity θ · Au. To prove injectivity of the map $z \mapsto p_z$, $z = (\theta, u)$, we need to prove that the equality $\theta \cdot A u - \psi(\theta, u) = \theta' \cdot A u' - \psi(\theta', u')$ as functions on X implies $(\theta, u) = (\theta', u')$. But this is not true if for example $\theta' = \lambda\theta$ and $u' = u/\lambda$ for $\lambda \neq 0$. So the model (18) is singular. This should not be a surprise because elements of the family F are not characterized by the observables $A_{\alpha k}(x)$ but by the linear space spanned by the $A_{\alpha k}(x)$. Indeed, if we set $\theta = B\theta'$ and $u = Cu'$ where B and C are nonsingular square matrices, then $\theta \cdot A u = \theta' \cdot B^T A C\,u'$, hence the family F is equally described by $A' = B^T A C$ with respect to the parameters $(\theta', u')$. Another lesson we can draw from this example is that for an exponential family linearly dependent on the external parameters, the distinction between natural and external ones is lost, as their roles can be interchanged.
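The loss of injectivity for observables that are linear in u can be seen on a small numerical instance. The sketch below is only illustrative: the finite state space, the choice k = 0 and the entries of the matrix-valued function A(x) are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy data: X = {0,1,2,3}, a = b = 2, k = 0, and arbitrary
# functions A_{alpha k}(x) defining h_alpha(x, u) = A_{alpha k}(x) u^k.
X = [0.0, 1.0, 2.0, 3.0]

def A(x):
    """Matrix of functions A_{alpha k}(x), chosen arbitrarily for illustration."""
    return [[x, x * x], [math.sin(x), 1.0 / (1.0 + x)]]

def density(theta, u):
    """p(x; theta, u) proportional to exp(theta . A(x) u) on the finite set X."""
    def expo(x):
        Ax = A(x)
        return sum(theta[a] * Ax[a][k] * u[k] for a in range(2) for k in range(2))
    w = [math.exp(expo(x)) for x in X]
    z = sum(w)
    return [wx / z for wx in w]

theta, u, lam = [0.4, -0.3], [0.5, 0.2], 3.0
p1 = density(theta, u)
# Rescaled parameters (lam * theta, u / lam): a different point of parameter space.
p2 = density([lam * t for t in theta], [uk / lam for uk in u])
max_gap = max(abs(a - b) for a, b in zip(p1, p2))
```

Here `max_gap` is zero up to rounding, so two distinct parameter points give the same density, which is the failure of condition 1. discussed for the linear model.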
All that said, we stipulate that (15) is an exponential family depending on the parameters z = (θ, u) if (i) for every fixed u the set E (u) is an exponential family and (ii) F is a regular d = a + b statistical model for a suitable choice of the open parameter set Z ⊂ R a × U.
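In the case of an exponential family (15) depending on natural and external parameters, in addition to the $a$ natural parameters score base vectors $l_\alpha = \partial_\alpha l = h_\alpha - \partial_\alpha \psi = h_\alpha - \langle h_\alpha \rangle$ (19) we have $b$ external parameters score base vectors $l_k = \partial_k l = \theta^\alpha \partial_k h_\alpha - \partial_k \psi = \theta^\alpha L_{\alpha k}$, where $L_{\alpha k} = \partial_k h_\alpha - \langle \partial_k h_\alpha \rangle$ (20). Note that $\langle l_\alpha \rangle = 0$ and $\langle l_k \rangle = 0$ because $\langle L_{\alpha k} \rangle = 0$. Moreover, one can always assume that $\langle h_\alpha \rangle = 0$ and $\langle \partial_k h_\alpha \rangle = 0$, therefore the regularity condition 2. above holds if and only if the $d = a + b$ functions $h_\alpha$, $\theta^\alpha \partial_k h_\alpha$ are linearly independent over X.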

Fisher Metric for an Exponential Family with External Parameters
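Regular statistical models can be endowed with a Riemannian metric defined on their parameter space Z. This is called Fisher metric [30] and it has the form $g_{ij}(z) = \langle l_i l_j \rangle = \int_X l_i(x; z)\,l_j(x; z)\,p(x; z)\,dx$, $i, j = 1, \dots, d$ (22). The Fisher matrix is symmetric and positive definite, therefore it defines a Riemannian metric on Z (see [24], p. 24). In fact we have $g_{ij} v^i v^j = \langle (v^i l_i)^2 \rangle > 0$ for all $v \neq 0$ (23), since the score vectors $l_i$ are linearly independent over X. Note also (see [24]) that g is invariant with respect to change of coordinates in the state space X and covariant (as an order 2 tensor) with respect to change of coordinates in the parameter space Z.
The elements of the Fisher matrix (22) relative to an exponential family with external parameters (15) can be detailed as follows: $g_{\alpha\beta} = \langle l_\alpha l_\beta \rangle = \langle (h_\alpha - \langle h_\alpha \rangle)(h_\beta - \langle h_\beta \rangle) \rangle$; using (19) we also have $g_{\alpha k} = \langle l_\alpha l_k \rangle = \theta^\beta \langle (h_\alpha - \langle h_\alpha \rangle)(\partial_k h_\beta - \langle \partial_k h_\beta \rangle) \rangle$ and from (20) $g_{km} = \langle l_k l_m \rangle = \theta^\alpha \theta^\beta \langle (\partial_k h_\alpha - \langle \partial_k h_\alpha \rangle)(\partial_m h_\beta - \langle \partial_m h_\beta \rangle) \rangle$. It is useful to set $A_{\alpha\beta} = g_{\alpha\beta}$, $M_{\alpha k} = g_{\alpha k}$, $B_{km} = g_{km}$ and introduce a block representation of the symmetric (a + b)-dimensional Fisher matrix g as $g = \begin{pmatrix} A & M \\ M^T & B \end{pmatrix}$ (27). We now give the expression of the Fisher metric coefficients using the free entropy function ψ in (13), which is also called the moment generating function because its derivatives with respect to the θ parameters give the different moments of the random variables h. We thus have the well known relation $\partial_\alpha \partial_\beta \psi = g_{\alpha\beta}$ (28). By direct computation on (13) we have also $\partial_\alpha \partial_k \psi = \langle \partial_k h_\alpha \rangle + g_{\alpha k}$ (29). Moreover we have $\partial_k \psi = \theta^\alpha \langle \partial_k h_\alpha \rangle$, hence $\partial_k \partial_m \psi = \theta^\alpha \langle \partial_k \partial_m h_\alpha \rangle + g_{km}$ (31). We see that, unlike the case of natural parameters θ, second order derivatives of the free entropy ψ with respect to mixed or external parameters do not coincide with the elements of the Fisher matrix.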
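The statement that mixed and external second derivatives of ψ differ from the Fisher matrix elements can be made concrete with a small numerical check. The sketch below assumes a hypothetical finite model that is not in the paper: X = {0, 1, 2, 3}, k = 0, one natural parameter θ and one external parameter u entering through h(x, u) = cos(u x). The correction terms it tests are ⟨∂_u h⟩ for the mixed derivative and θ⟨∂_u² h⟩ for the external one, obtained by differentiating ψ directly.

```python
import math

# Hypothetical toy model: X = {0,1,2,3}, k = 0, one natural parameter theta and
# one external parameter u entering nonlinearly through h(x, u) = cos(u x).
X = [0.0, 1.0, 2.0, 3.0]

def h(x, u):
    return math.cos(u * x)

def dh(x, u):
    return -x * math.sin(u * x)         # d h / d u

def d2h(x, u):
    return -x * x * math.cos(u * x)     # d^2 h / d u^2

def psi(theta, u):
    """Free energy psi(theta, u) = ln sum_x exp(theta h(x, u))."""
    return math.log(sum(math.exp(theta * h(x, u)) for x in X))

def expect(f, theta, u):
    """Expectation of f(x) under p(x; theta, u)."""
    p0 = psi(theta, u)
    return sum(f(x) * math.exp(theta * h(x, u) - p0) for x in X)

def fisher(theta, u):
    """Fisher elements from the scores l_theta = h - <h>, l_u = theta (dh - <dh>)."""
    mh = expect(lambda x: h(x, u), theta, u)
    mdh = expect(lambda x: dh(x, u), theta, u)
    l_t = lambda x: h(x, u) - mh
    l_u = lambda x: theta * (dh(x, u) - mdh)
    g_tt = expect(lambda x: l_t(x) ** 2, theta, u)
    g_tu = expect(lambda x: l_t(x) * l_u(x), theta, u)
    g_uu = expect(lambda x: l_u(x) ** 2, theta, u)
    return g_tt, g_tu, g_uu

def second_derivatives(theta, u, e=1e-4):
    """Central finite differences of psi: (psi_tt, psi_tu, psi_uu)."""
    p_tt = (psi(theta + e, u) - 2 * psi(theta, u) + psi(theta - e, u)) / e ** 2
    p_uu = (psi(theta, u + e) - 2 * psi(theta, u) + psi(theta, u - e)) / e ** 2
    p_tu = (psi(theta + e, u + e) - psi(theta + e, u - e)
            - psi(theta - e, u + e) + psi(theta - e, u - e)) / (4 * e ** 2)
    return p_tt, p_tu, p_uu

theta, u = 0.7, 0.9
g_tt, g_tu, g_uu = fisher(theta, u)
p_tt, p_tu, p_uu = second_derivatives(theta, u)
corr_tu = expect(lambda x: dh(x, u), theta, u)             # <d_u h>
corr_uu = theta * expect(lambda x: d2h(x, u), theta, u)    # theta <d_u^2 h>
```

In this instance `p_tt` matches `g_tt`, while `p_tu` and `p_uu` match the Fisher elements only after adding `corr_tu` and `corr_uu` respectively, up to finite-difference error.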

Example 1.
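As a toy model, we introduce the following example of a controlled exponential family. Let X = [0, +∞) and U = [0, +∞) and consider the two observables $h_1(x) = \ln x$, $h_2(x, u) = \ln(x + u)$ (32), where x ∈ X, u ∈ U. For this example we set k(x) = −ln x. We check that we have an integrable free energy function: $e^\psi = \int_X e^{\theta \cdot h + k}\,dx = \int_0^{+\infty} e^{(\theta_1 - 1)\ln x + \theta_2 \ln(x + u)}\,dx = u^{\theta_1 + \theta_2}\,\frac{\Gamma(\theta_1)\,\Gamma(-\theta_1 - \theta_2)}{\Gamma(-\theta_2)}$, which is finite if $\theta_1 > 0$, $u > 0$ and $\theta_2 + \theta_1 < 0$. Here Γ(z) is the Gamma function defined as $\Gamma(z) = \int_0^{+\infty} t^{z-1} e^{-t}\,dt$. Note that since $e^{k(x)} = 1/x$ is non integrable over X, $\theta_1 = \theta_2 = 0$ is a non feasible value. By inspection $h_1, h_2, 1$ are linearly independent over X for every fixed u, and the map $(\theta, u) \mapsto p(\cdot\,; \theta, u)$ is injective. From the likelihood $l = (\theta_1 - 1)\ln x + \theta_2 \ln(x + u) - \psi(\theta, u)$ the elements of the score base are $l_1 = \ln x - \partial_1 \psi$, $l_2 = \ln(x + u) - \partial_2 \psi$, $l_u = \frac{\theta_2}{x + u} - \frac{\theta_1 + \theta_2}{u}$, which are linearly independent over X. So the statistical model defined by (32) is a 2 + 1 dimensional controlled exponential family. Note that the probability density $p(x; \theta, u) = \frac{\Gamma(-\theta_2)}{\Gamma(\theta_1)\,\Gamma(-\theta_1 - \theta_2)}\,u^{-\theta_1 - \theta_2}\,x^{\theta_1 - 1}(x + u)^{\theta_2}$ is known as a (possible formulation of a) compound Gamma distribution; moreover, for u = 1, this is the Beta distribution of second kind [31]. We now compute the Fisher matrix elements for this example. Let us introduce the Polygamma function $\psi^{(n)}(z) = \frac{d^{n+1}}{dz^{n+1}} \ln \Gamma(z)$. We have $\psi(\theta, u) = (\theta_1 + \theta_2)\ln u + \ln\Gamma(\theta_1) + \ln\Gamma(-\theta_1 - \theta_2) - \ln\Gamma(-\theta_2)$ and from relation (28) above we have $A_{11} = \psi^{(1)}(\theta_1) + \psi^{(1)}(-\theta_1 - \theta_2)$, $A_{12} = \psi^{(1)}(-\theta_1 - \theta_2)$, $A_{22} = \psi^{(1)}(-\theta_1 - \theta_2) - \psi^{(1)}(-\theta_2)$, so the A block of g depends only on θ. Moreover from (29) and (31) we have $M_{1u} = \frac{1}{u}$, $M_{2u} = -\frac{\theta_1}{\theta_2 u}$, $B_{uu} = \frac{\theta_1(\theta_1 + \theta_2)}{(\theta_2 - 1)\,u^2}$.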
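The Fisher elements of Example 1 that involve the external parameter can be cross-checked by quadrature. The sketch below is a numerical illustration, not the authors' computation: it assumes the observables ln x and ln(x + u) with k(x) = −ln x, uses the substitution x = u t/(1 − t) (an assumption of this sketch) to map [0, +∞) onto (0, 1), and compares the resulting covariances of the scores with the closed forms 1/u, −θ1/(θ2 u) and θ1(θ1 + θ2)/((θ2 − 1)u²) obtained by differentiating ψ. The function name and the parameter values are hypothetical.

```python
import math

def example1_fisher(theta1, theta2, u, n=20_000):
    """Numerical Fisher elements (g_1u, g_2u, g_uu) for h1 = ln x, h2 = ln(x + u).

    With x = u t / (1 - t), t in (0, 1), the density becomes proportional to
    t^(theta1 - 1) (1 - t)^(-theta1 - theta2 - 1); a midpoint rule is used on t.
    """
    a, b = theta1, -theta1 - theta2          # feasibility requires a > 0, b > 0
    ts = [(i + 0.5) / n for i in range(n)]
    w = [t ** (a - 1) * (1 - t) ** (b - 1) for t in ts]
    z = sum(w)
    w = [wi / z for wi in w]                 # normalized quadrature weights
    xs = [u * t / (1 - t) for t in ts]

    def mean(vals):
        return sum(wi * v for wi, v in zip(w, vals))

    def centered(vals):
        m = mean(vals)
        return [v - m for v in vals]

    l1 = centered([math.log(x) for x in xs])          # score for theta1
    l2 = centered([math.log(x + u) for x in xs])      # score for theta2
    lu = centered([theta2 / (x + u) for x in xs])     # score for u

    def cov(p, q):
        return mean([pi * qi for pi, qi in zip(p, q)])

    return cov(l1, lu), cov(l2, lu), cov(lu, lu)

theta1, theta2, u = 2.0, -5.0, 1.5
g_1u, g_2u, g_uu = example1_fisher(theta1, theta2, u)
closed = (1 / u,
          -theta1 / (theta2 * u),
          theta1 * (theta1 + theta2) / ((theta2 - 1) * u ** 2))
```

For the chosen feasible point the three numerical values agree with `closed` to quadrature accuracy; all three are positive there, since θ1 > 0, θ2 < 0 and θ1 + θ2 < 0.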

A Synopsis of Ehresmann Connections
On a smooth fibration π : M −→ N, where M, N are smooth manifolds, with dim M = m, dim N = n, the set V M = ker Tπ of the vectors that project onto the null space of TN is an integrable subbundle of TM called the vertical bundle.
An Ehresmann connection (see e.g., [32]) on π : M → N is the assignment of a distribution HM transversal to VM, so that HM ⊕ VM = TM. The elements of HM are the horizontal vectors; since Tπ restricted to HM is an isomorphism, it has a fiberwise defined inverse, the horizontal lift: $\mathrm{hor} : T_{\pi(z)}N \to T_zM$, $\mathrm{hor}(X) \in H_zM$. Let $X = X^h + X^v$ be the splitting of a vector in $T_zM$ into its horizontal and vertical components. The projection on VM with respect to the horizontal subspace defines the vector-valued connection one-form $\omega : TM \to VM$, $\omega(X) = X^v$, whose kernel is the horizontal distribution. The assignment of a horizontal distribution, of a horizontal lift operator or of a connection one-form are equivalent ways to define a connection on π : M → N. The curvature of the connection is the VM-valued two-form defined as $\Omega(X, Y) = -\omega([X^h, Y^h])$, where $X^h$, $Y^h$ are extended to horizontal vector fields, which shows that the curvature measures the failure of the horizontal distribution to be integrable. Moreover, the curvature relates the Lie brackets of vector fields X, Y on the base manifold N with the Lie bracket of their horizontal lifts through the formula $[\mathrm{hor}X, \mathrm{hor}Y] = \mathrm{hor}[X, Y] - \Omega(\mathrm{hor}X, \mathrm{hor}Y)$. Again, we find that if the curvature is vanishing the horizontal distribution, spanned by vectors of the type hor X, is involutive hence integrable. Next we give the local expressions of a connection in a fibered chart. Let z = (x, y) be a fibered chart on U ⊂ M, π(x, y) = y.
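Then the vertical space is $V_zM = \mathrm{span}\{\partial/\partial x^\alpha\}$, α = 1, …, a = dim M − dim N, and the connection one-form ω is $\omega = \omega^\alpha\,\frac{\partial}{\partial x^\alpha}$, $\omega^\alpha = dx^\alpha + A^\alpha_l(z)\,dy^l$ (37). The $A^\alpha_l(z)$ are the connection's coefficients. The horizontal vectors have the coordinate expression $X^h = Y^l\left(\frac{\partial}{\partial y^l} - A^\alpha_l(z)\,\frac{\partial}{\partial x^\alpha}\right)$ while the horizontal lift of a base vector $U = U^l\,\frac{\partial}{\partial y^l} \in T_{\pi(z)}N$ has the form $\mathrm{hor}(U) = U^l\left(\frac{\partial}{\partial y^l} - A^\alpha_l(z)\,\frac{\partial}{\partial x^\alpha}\right)$. We now specialize the above relations to the important case where the horizontal distribution $H_zM$ is defined to be the g-orthogonal of $V_zM$ with respect to a Riemannian metric g on M. Referring to a block representation of the metric g in the coordinates (x, y) like the above one (27) for (θ, u), we ask that every horizontal vector be g-orthogonal to every vertical one, that is $M_{\beta l} - A_{\beta\alpha} A^\alpha_l = 0$, hence $A^\alpha_l = (A^{-1})^{\alpha\beta} M_{\beta l}$ (39). The connection one-form (37) becomes from (39) $\omega = dx + A^{-1} M\,dy$ (40) and it is called mechanical connection in the control theory for mechanical systems, where g is the kinetic energy of a mechanical system. In the orthogonal splitting case the metric g has the simpler form by (39) $g(X, X) = \omega(X)^T A\,\omega(X) + Y^T K\,Y$, $K = B - M^T A^{-1} M$ (41), for a vector $X = (\dot x, Y)$. Since $X^v = (\omega(X), 0)$ and using again the block representation (27) of g we have $g(X^v, X^v) = \omega(X)^T A\,\omega(X)$ and $g(X^h, X^h) = Y^T K\,Y$.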
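The orthogonal splitting can be verified on concrete numbers. The sketch below is a minimal illustration with hypothetical blocks (a = 2 fiber coordinates, b = 1 base coordinate, values chosen arbitrarily): it builds the connection coefficients as A⁻¹M, forms K = B − MᵀA⁻¹M, and checks that horizontal and vertical parts are g-orthogonal and that the quadratic form splits accordingly.

```python
# Hypothetical numbers: a = 2 fiber coordinates, b = 1 base coordinate.
A = [[2.0, 0.5], [0.5, 1.0]]     # fiber block (symmetric positive definite)
M = [0.3, -0.4]                  # mixed block, a x 1
B = 1.5                          # base block, 1 x 1

def solve2(mat, rhs):
    """Solve the 2 x 2 linear system mat y = rhs by Cramer's rule."""
    det = mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]
    return [(rhs[0] * mat[1][1] - mat[0][1] * rhs[1]) / det,
            (mat[0][0] * rhs[1] - mat[1][0] * rhs[0]) / det]

conn = solve2(A, M)                              # connection coefficients A^{-1} M
K = B - sum(m * c for m, c in zip(M, conn))      # K = B - M^T A^{-1} M

def g(X, Y):
    """Full metric g on vectors written as (x1_dot, x2_dot, y_dot)."""
    G = [[A[0][0], A[0][1], M[0]],
         [A[1][0], A[1][1], M[1]],
         [M[0],    M[1],    B]]
    return sum(X[i] * G[i][j] * Y[j] for i in range(3) for j in range(3))

def split(X):
    """Return (omega(X), horizontal part, vertical part) of the vector X."""
    omega = [X[0] + conn[0] * X[2], X[1] + conn[1] * X[2]]
    vert = [omega[0], omega[1], 0.0]
    hor = [X[0] - omega[0], X[1] - omega[1], X[2]]
    return omega, hor, vert

X = [0.7, -1.2, 0.9]
omega, hor, vert = split(X)
quad_fiber = sum(omega[i] * A[i][j] * omega[j] for i in range(2) for j in range(2))
```

With these values `g(hor, vert)` vanishes and `g(X, X)` equals `quad_fiber + K * X[2] ** 2`, up to rounding; K is the Schur complement of the fiber block and is positive whenever the full matrix is positive definite.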

Parallel Transport Equation
Let γ : [0, T] → N be a smooth path in the base manifold and let $z_0 \in \pi^{-1}(\gamma(0))$. The parallel transport equation is the following ODE for the horizontal lift vector field: $\dot z(t) = \mathrm{hor}_{z(t)}(\dot\gamma(t))$, $z(0) = z_0$. The connection is called complete if the parallel transport equation has a solution defined on the whole [0, T]. If in (41) we have K = K(y) then the metric g is called bundle-like metric. The main geometric consequence is that if we introduce the Riemannian manifold (N, K) then the horizontal lift is an isometry and the solution z(t) of the parallel transport equation is a curve that projects onto γ and has the same length.

Some Applications of Exponential Families with External Parameters
In this Section we apply the geometric framework of the previous Section 4 to the fibration π : Z → U, π(θ, u) = u introduced in (14). We can also consider the inverse ϕ of the map $z \mapsto f(z) = p_z$ and introduce the fibration $\tilde\pi = \pi \circ \phi : F \to U$. Since $\tilde\pi^{-1}(u) = E(u)$, fibers of $\tilde\pi$ are exponential families for every fixed value of the external parameters. One can show that the orthogonal splitting of TZ induces an orthogonal splitting of TF with respect to the Fisher metric (see [21]).

Thermodynamic Length
Let $t \mapsto z(t) = (\theta(t), u(t)) \in Z$, $t \in [0, T]$, be a path in parameter space. Define the time-dependent relative entropy along the path as $D(t) = D(p(z(t))\,|\,p(z(0)))$ and compute the Taylor expansion of D(t) at t = 0. A direct computation shows that $D(0) = 0$, $D'(0) = 0$, hence $D(t) = \frac{1}{2}\,\|\dot z(0)\|_g^2\,t^2 + o(t^2)$, where $\|\dot z\|_g^2 = g_{ij}\,\dot z^i \dot z^j$ is the scalar product with respect to the Fisher metric in (Z, g). It holds that $D(p(z(t + dt))\,|\,p(z(t))) = \frac{1}{2}\,\|\dot z(t)\|_g^2\,dt^2 + o(dt^2)$. The quantity $2D(dt) = \|\dot z\|_g^2\,dt^2$ can be related to the entropy change rate dσ/dt of the heat bath and to the total system entropy production rate dF/dt in a non quasi-static evolution of the system by the formula (see [14]) Therefore $\|\dot z\|_g^2$ is a measure of the system entropy production rate $d\sigma_{sys}/dt$ in a non quasi-static evolution of the system. When integrated along the finite time evolution protocol z(t), the quantity $J = \int_0^T \|\dot z\|_g^2\,dt$ is called action of the path and can be interpreted as the thermodynamic cost (loss in the entropy transfer due to the system entropy production) associated to the protocol, therefore it is a measure of the dissipated (non available) work. The quantity (see [11,12,15]) $L = \int_0^T \|\dot z\|_g\,dt$ is called the thermodynamic length of the path z(·). By the Cauchy-Schwarz inequality one obtains the inequality (see [14]) $L^2 \le T\,J$, showing that the thermodynamic length (TL) gives a lower bound on the dissipated work in a non quasi-static evolution of the system [11,15]. The above relation is used when studying the controlled evolution of classical and quantum small thermodynamic systems, e.g., molecular motors (see [15]). Using the representation (41) of the scalar product with respect to the Fisher metric g, with $K = B - M^T A^{-1} M$, we have the interesting formula for the TL of a controlled exponential family $L = \int_0^T \sqrt{\omega(\dot z)^T A\,\omega(\dot z) + \dot u^T K\,\dot u}\;dt$. In particular, if the path z is the horizontal lift of a path u in the external parameter space then $\dot z = \mathrm{hor}(\dot u)$ and $\omega(\dot z) = 0$. If moreover the metric g is bundle-like with respect to the fibration π we have K(z) = K(π(z)) and the thermodynamic length can be expressed as $L = \int_0^T \sqrt{\dot u^T K(u)\,\dot u}\;dt$, showing that TL depends solely on the external parameters evolution u = u(t).
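The behaviour of the thermodynamic length along a horizontal lift can be illustrated numerically. The following sketch rests entirely on hypothetical choices: a = b = 1, metric blocks A = 1/θ², M = 1/(2θu), B = 1/u² (picked so that K = B − M²/A = 3/(4u²) depends on u only, i.e. the metric is bundle-like), the protocol u(t) = 1 + t on [0, 1], and the normalization of the action as the plain time integral of the squared speed. The lift is integrated with a classical Runge-Kutta scheme and the integrals with the trapezoid rule.

```python
import math

# Hypothetical metric blocks for a = b = 1, chosen so that K depends on u only.
def A(th, u):
    return 1.0 / th ** 2

def M(th, u):
    return 0.5 / (th * u)

def B(th, u):
    return 1.0 / u ** 2

def u_path(t):
    return 1.0 + t          # external protocol on [0, T]

def u_dot(t):
    return 1.0

def theta_dot(t, th):
    """Horizontal lift condition: omega(z_dot) = theta_dot + (M / A) u_dot = 0."""
    u = u_path(t)
    return -M(th, u) / A(th, u) * u_dot(t)

def speed2(t, th):
    """Squared norm of z_dot = (theta_dot, u_dot) in the full metric along the lift."""
    u, td, ud = u_path(t), theta_dot(t, th), u_dot(t)
    return A(th, u) * td ** 2 + 2 * M(th, u) * td * ud + B(th, u) * ud ** 2

def lift_and_measure(theta0, T=1.0, n=2000):
    """RK4 integration of the lift; trapezoid rule for length L and action J."""
    dt, th, L, J = T / n, theta0, 0.0, 0.0
    for i in range(n):
        t = i * dt
        s0 = speed2(t, th)
        k1 = theta_dot(t, th)
        k2 = theta_dot(t + dt / 2, th + dt / 2 * k1)
        k3 = theta_dot(t + dt / 2, th + dt / 2 * k2)
        k4 = theta_dot(t + dt, th + dt * k3)
        th += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        s1 = speed2(t + dt, th)
        L += dt * (math.sqrt(s0) + math.sqrt(s1)) / 2
        J += dt * (s0 + s1) / 2
    return th, L, J

theta_T, L, J = lift_and_measure(theta0=2.0)
```

For this bundle-like example the length of the lifted path equals the length of the protocol measured with K alone, here sqrt(3/4)·ln 2, independently of the initial value of θ, and the computed values satisfy L² ≤ T·J with T = 1.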

Isentropic Evolution Driven by External Parameters
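We have recalled in Section 3 that the elements of a controlled exponential family F, where $q = e^k \in P(X)$, are the solution of the constrained minimization problem for the relative entropy D(p|q) of the form (11), $\hat p(x; c, u) = e^{\hat\theta \cdot h(x,u) - \psi(\hat\theta, u) + k(x)}$, where $\hat\theta = \hat\theta(c, u)$ is uniquely determined by inverting the gradient map $\partial_\theta \psi(\theta, u) = c$. We have that $D(\hat p\,|\,q) = \hat\theta \cdot \partial_\theta \psi - \psi = -S(c, u)$, where $S(c, u) = \psi - \theta \cdot \partial_\theta \psi$ is the entropy of the statistical system when the information on the system is described by the constraint $\langle h \rangle = c$. In the following we consider D(c, u) as a function of (θ, u) knowing that θ is in a one-to-one correspondence with c. Let us compute the differential of D(θ, u) corresponding to an infinitesimal variation of the parameters z = (θ, u). We have $dD = \partial_\beta D\,d\theta^\beta + \partial_k D\,du^k$. More in detail, using (28) we obtain $\partial_\beta D = \theta^\alpha \partial_\alpha \partial_\beta \psi = \theta^\alpha g_{\alpha\beta}$ and using (29) we obtain $\partial_k D = \theta^\alpha \partial_\alpha \partial_k \psi - \partial_k \psi = \theta^\alpha g_{\alpha k}$, so collecting the results and using (40) we have $dD(z) = \theta^\alpha (g_{\alpha\beta}\,d\theta^\beta + g_{\alpha k}\,du^k) = \theta^\alpha (A_{\alpha\beta}\,d\theta^\beta + M_{\alpha k}\,du^k) = \theta^\alpha A_{\alpha\beta}\,\omega^\beta$ and the following proposition holds. Proposition 1.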
(1) The variation of entropy for an infinitesimal change in the parameters z = (θ, u) can be expressed using the g-orthogonal Ehresmann connection ω on π : Z → U as $dS = -dD = -\theta^\alpha A_{\alpha\beta}\,\omega^\beta$; (2) the change in entropy for the system along a given path z = z(t), t ∈ [0, T], in parameter space is given by $\Delta S = -\int_0^T \theta^\alpha A_{\alpha\beta}\,\omega^\beta(\dot z)\,dt$; (3) since $\omega(\dot z) = 0$ for a horizontal path, the horizontal lift $\dot z = \mathrm{hor}\,\dot\gamma$ of a path γ in the external parameter space U gives an isentropic (∆S = 0) evolution of the system.
Note that the horizontal lift does not represent all the possible isentropic evolutions of the system. These are characterized by the weaker (with respect to $\omega(\dot z) = 0$) condition $\theta \cdot A\,\omega(\dot z) = 0$. Let us investigate this condition using the general relation (10) that we can now write as $d\langle h_\alpha \rangle = A_{\alpha\beta}\,\omega^\beta + \langle \partial_k h_\alpha \rangle\,du^k = dQ_\alpha + dW_\alpha$. If we want to gain insight into the above relation using a thermodynamic analogy, then $\langle h_\alpha \rangle$ is the α-type energy, $dQ_\alpha = A_{\alpha\beta}\,\omega^\beta$ is the α-type heat exchanged and $dW_\alpha$ the α-type work exchanged. If we interpret the natural parameters θ as the α-type inverse temperature $\theta^\alpha = 1/T_\alpha$ then the condition $\theta \cdot A\,\omega(\dot z) = 0$ displays as $\sum_\alpha dQ_\alpha / T_\alpha = 0$. Therefore a horizontal path corresponds to the condition $dQ_\alpha = 0$ for all α and certainly it represents an isentropic evolution of the system, but we can have an isentropic evolution even if $dQ_\alpha \neq 0$, provided the heat fluxes divided by their temperatures have a zero sum. As a final remark, note that in the exponential family we have the scalar product θ · h, hence the inverse temperature vector $\theta \in \mathbb{R}^a$ should be seen as an element of the dual space of the $h \in \mathbb{R}^a$ vector space and not as a point in a local coordinate chart. See [33] on this point.

Information Geometry of Gradient Systems
In this Section we consider a class of evolution problems described by a Fokker-Planck type equation (FPE) on a regular connected domain $X \subset \mathbb{R}^n$ which is open and bounded. We write FPE as in [34] (i, j = 1, . . . , n, repeated indices are summed) $\partial_t p = -\nabla \cdot S$ (47), where $D_i(x)$ is the drift field, $D_{ij}(x)$ is the symmetric diffusion matrix, ∇· denotes divergence and S is the probability current $S_i = D_i\,p - \partial_j (D_{ij}\,p)$ (48). To ensure that a solution p(x, t) is normalized to one for all t ≥ 0 we need to ask that S · ν = 0 on ∂X, where ν is the normal to ∂X. We restrict to the case that the diffusion matrix is diagonal $D_{ij} = d_i(x)\,\delta_{ij}$ and positive definite ($d_i > 0$) and therefore we rewrite S as $S_i = D_i\,p - \partial_i (d_i\,p)$ (no sum over i). Moreover we suppose that the drift field is of the form where φ is a function defined on X. A stationary solution $p_\infty$ of FPE is obtained if we have $S_i(p_\infty) = 0$ for all i that is from (48) One can show ([34], Chapter 6) that in this setting the stationary solution to FPE (47) is unique. We can rewrite the FPE using $p_\infty$ from (48) and (49) as follows or in compact notation as where D is the diagonal diffusion matrix. The trend to the equilibrium can be studied We are free to set the value of the natural parameters and we set $\theta = \theta_1$ where $\theta_1$ is a feasible value. The explicit solution of FPE (51) is difficult to study and one could be content with the study of the time evolution of the average values of the observables $h_\alpha$, that is the functions along the unknown solution of FPE. With this aim, it is natural to consider the following: (Approximation problem): to find the time evolution of the natural parameters θ = θ(t) such that the density has the same average values of the unknown solution of FPE i.e., This strategy (called Dynamic Maximum Entropy method in [3]) seems reasonable because the exponential density (55) is the maximum entropy distribution which satisfies the constraints of the form $\langle h \rangle = c$, therefore it contains exactly the required amount of information needed to satisfy the average values constraint.
In the following we will investigate the interplay between the following three densities: (1) the unknown solution $p$ of FPE, (2) the approximating exponential density $p_\theta$ and (3) the exponential equilibrium density $p_\infty = p_{\theta_1}$.
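The gradient structure invoked in this Section can be made explicit under one concrete choice of drift. The following is a minimal sketch and not necessarily the convention used in [34] or in the equations referenced above: the drift $D_i = \partial_i d_i + d_i \partial_i \phi$ and the normalization constant $Z$ are assumptions, chosen so that the stationary density is proportional to $e^{\phi}$. The opposite sign convention (stationary density proportional to $e^{-\phi}$) is handled in the same way.

```latex
\begin{align*}
D_i &= \partial_i d_i + d_i\,\partial_i\phi
  && \text{(assumed drift, no sum over } i\text{)} \\
S_i &= D_i\,p - \partial_i(d_i\,p)
     = d_i\,\bigl(p\,\partial_i\phi - \partial_i p\bigr) \\
S_i(p_\infty) = 0
  &\iff \partial_i p_\infty = p_\infty\,\partial_i\phi
   \iff p_\infty = Z^{-1} e^{\phi},
   \qquad Z = \int_X e^{\phi}\,dx \\
S_i &= -\,d_i\,p_\infty\,\partial_i\!\left(\frac{p}{p_\infty}\right) \\
\partial_t p &= -\nabla\cdot S
     = \nabla\cdot\!\left(D\,p_\infty\,\nabla\frac{p}{p_\infty}\right)
\end{align*}
```

Under this assumed convention, a potential of the form $\phi = \theta_1 \cdot h + k$ would make the stationary density a member of the exponential family, which is the situation considered in the rest of the Section.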

Triangular Relation
To start with, note that for (56) (dropping the explicit time dependence in $\theta$ and $p$)

$$\langle h \rangle_\theta = \partial_\theta \psi(\theta),$$

therefore the condition (57) can be rewritten as

$$\partial_\theta \psi(\theta) = \langle h \rangle_p. \tag{59}$$

Note that the equation $\partial_\theta \psi(\theta) = \langle h \rangle_p$ has a unique solution $\theta = \theta(t)$ for all $t \ge 0$.

Next, let us compute the distance in entropy between the solution of FPE and its stationary solution (55) with $\theta = \theta_1$, and the distance between $p$ and $p_\theta$ (here $\theta = \theta(t)$ is the value of the approximating solution satisfying (59)), which coincides with the Bregman divergence (see [27]) of the convex function $\psi$. Collecting the above results (58), (60), (61) and summing the right-hand sides we obtain the triangular relation (see [27], Theorem 1.2 or [22], Theorem 3.7)

$$D(p|p_{\theta_1}) = D(p|p_\theta) + D(p_\theta|p_{\theta_1}) + (\theta - \theta_1) \cdot \bigl(\langle h \rangle_p - \langle h \rangle_\theta\bigr). \tag{62}$$

It follows that

Proposition 2. The function $\theta(t)$ satisfies condition (57) of the Approximation problem if and only if the following relation (called generalized Pythagorean theorem in [27]) holds (see Figure 1)

$$D(p|p_{\theta_1}) = D(p|p_{\theta(t)}) + D(p_{\theta(t)}|p_{\theta_1}),$$

meaning that $p_{\theta(t)}$ is the geodesic projection of $p$ on the exponential family (flat submanifold) $E$, satisfying $D(p|p_{\theta(t)}) = \min\{D(p|p_\theta) : p_\theta \in E\}$; that is, $p_{\theta(t)}$ is the best approximation of $p$ on $E$ with respect to the information gain.
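The Pythagorean relation $D(p|p_{\theta_1}) = D(p|p_\theta) + D(p_\theta|p_{\theta_1})$ can be checked numerically on a toy model. The sketch below is illustrative only: the finite grid, the single observable $h(x) = x$, the choice $k = 0$, the test density `p` and the helper names (`exp_family`, `match_mean`, `kl`) are all assumptions introduced here, and the moment-matching value of $\theta$ is found by a plain Newton iteration on $\partial_\theta \psi(\theta) = \langle h \rangle_p$.

```python
import math

def exp_family(theta, h, k):
    """Density p_theta(x) proportional to exp(theta*h(x) + k(x)) on a finite grid."""
    weights = [math.exp(theta * hx + kx) for hx, kx in zip(h, k)]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p, f):
    """Average of f under the discrete density p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def kl(p, q):
    """Relative entropy D(p|q) = sum p log(p/q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def match_mean(target, h, k, theta=0.0, tol=1e-13, max_iter=100):
    """Newton iteration for d(psi)/d(theta) = target; psi'' is Var_theta(h)."""
    for _ in range(max_iter):
        p = exp_family(theta, h, k)
        m = mean(p, h)
        var = mean(p, [hx * hx for hx in h]) - m * m
        step = (target - m) / var
        theta += step
        if abs(step) < tol:
            break
    return theta

# Toy setting (all choices hypothetical): grid on [0, 1], h(x) = x, k = 0.
xs = [i / 20 for i in range(21)]
h = xs
k = [0.0] * len(xs)

# An arbitrary density p that is not in the exponential family.
raw = [1.0 + 0.8 * math.sin(6.0 * x) for x in xs]
p = [r / sum(raw) for r in raw]

theta1 = -2.0                              # equilibrium parameter
p_eq = exp_family(theta1, h, k)

theta_t = match_mean(mean(p, h), h, k)     # moment-matching parameter
p_th = exp_family(theta_t, h, k)

lhs = kl(p, p_eq)
rhs = kl(p, p_th) + kl(p_th, p_eq)

# With a theta that does not match the mean, the relation fails.
p_bad = exp_family(0.5, h, k)
gap_unmatched = kl(p, p_eq) - kl(p, p_bad) - kl(p_bad, p_eq)
```

In this toy setting `lhs` and `rhs` agree up to rounding, while `gap_unmatched` equals $(\theta - \theta_1)(\langle h \rangle_p - \langle h \rangle_\theta)$ for the mismatched parameter and is visibly non-zero.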
where we have introduced the symmetric matrix By comparing (66) and (68) we obtain a closed form ODE for $\theta$, since $\theta - \theta_1 \neq 0$, which is equation (5.2) in [3] or equation (12) in [4]. It can be given normal form since $g_{\alpha\beta}$ is invertible. In these papers the above equation is solved numerically and it is shown that it gives very good (sometimes surprisingly good) estimates of $\langle h \rangle_p$ using $\langle h \rangle_\theta$ even if $\theta$ is far from $\theta_1$.
Using the tools of information geometry we have shown that the above triangular relation (62) holds independently of the assumption that $\theta$ be close to $\theta_1$ (called quasi-equilibrium approximation in [2]). Moreover, it is evident from inspection of (65) that the substitution of $p$ with $p_\theta$ renders $\dot{b} = 0$, therefore $\dot{a} = \dot{c}$. Note that if $D_{ij} = d\,\delta_{ij}$ and we substitute $p$ with $p_\theta$ in the above formula (67) we get Hence, if $p_{\theta_1}$ satisfies a LSI, we have exponential speed of convergence of $p_\theta$ to the equilibrium distribution $p_{\theta_1}$, which explains the good behavior of the approximation.
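To illustrate the kind of closed ODE discussed above, the following sketch integrates a one-parameter example with an explicit Euler scheme. It rests on assumptions that are not taken from the text: the ODE is assumed to have the form $g(\theta)\,\dot\theta = -B(\theta)\,(\theta - \theta_1)$ with $g = \mathrm{Var}_\theta(h)$ and $B = d\,\langle (h')^2 \rangle_\theta$, which is what one obtains by substituting $p_\theta$ into a gradient-form FPE with constant diffusion $d$; this form should be checked against equation (5.2) in [3]. The grid, the observable $h(x) = x$, $k = 0$ and the function names are hypothetical.

```python
import math

def averages(theta, h, dh, k):
    """Return (g, b) under p_theta proportional to exp(theta*h + k) on a grid.

    g = Var_theta(h) is the Fisher metric for one natural parameter,
    b = <(h')^2>_theta enters the assumed right-hand side of the ODE.
    """
    w = [math.exp(theta * hi + ki) for hi, ki in zip(h, k)]
    z = sum(w)
    p = [wi / z for wi in w]
    m = sum(pi * hi for pi, hi in zip(p, h))
    g = sum(pi * (hi - m) ** 2 for pi, hi in zip(p, h))
    b = sum(pi * di * di for pi, di in zip(p, dh))
    return g, b

def integrate_theta(theta0, theta1, d, h, dh, k, dt=1e-3, steps=5000):
    """Explicit Euler for the assumed ODE  g * dtheta/dt = -d * b * (theta - theta1)."""
    theta = theta0
    path = [theta]
    for _ in range(steps):
        g, b = averages(theta, h, dh, k)
        theta += dt * (-d * b * (theta - theta1) / g)
        path.append(theta)
    return path

# Hypothetical setting: midpoint grid on (0, 1), h(x) = x, k = 0, d = 1.
n = 200
xs = [(i + 0.5) / n for i in range(n)]
h = xs
dh = [1.0] * n          # derivative of h(x) = x
k = [0.0] * n

theta1 = -1.0           # equilibrium value of the natural parameter
path = integrate_theta(theta0=3.0, theta1=theta1, d=1.0, h=h, dh=dh, k=k)
```

In this example `path` relaxes monotonically from the initial value towards `theta1`, even though the initial value is far from equilibrium.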

A Dynamic Approximation Problem with External Parameters
We now suppose that the drift field (54) which defines the FPE depends on external parameters because $h = h(x, u)$. We consider the same dynamic approximation problem of Section 6.1 with the extra degrees of freedom given by the external parameters $u$. The approximation condition (57) now reads

Appendix A. An Approximation Problem in Quantitative Genetics
We give a brief account of a classical problem in the dynamics of quantitative traits, whose approach to equilibrium, described by a Fokker-Planck equation, has been investigated in [2][3][4]. See also [37] for a gentle introduction to the dynamics of populations. We consider a polygenic trait located in $n$ biallelic loci $(A, a)$. If we consider a sufficiently large population, the frequencies of genotype $AA$ at locus $i$ are described by $n$ independent random variables $x = (x_1, \dots, x_n)$, $x_i \in [0, 1]$. The dynamics of these allele frequencies can be described by a diffusion process under the action of stochastic forces which represent the effect of directional selection, dominance, mutation and random drift. This diffusion process is described by a linear Fokker-Planck equation whose equilibrium distribution has long been known (see [38,39])

$$p(x) = e^{\theta \cdot h(x, u) - \psi(\theta, u) + k(x)}, \qquad x = (x_1, \dots, x_n) \in X = [0, 1]^n.$$

Below we show that it can be seen as a generalized exponential distribution with natural parameters $\theta = (\theta_1, \theta_2, \theta_3, \theta_4)$ and external parameters $u = (v, w) \in \mathbb{R}^{2n}_+$. Setting $d(x_i) = x_i(1 - x_i)$, we introduce the following four observables: $h_1$ is the directional selection and $v = (v_1, \dots, v_n)$ are external parameters describing the effects on loci; is the (non-integrable) neutral distribution of allele frequencies in the absence of selection and mutation.

Since the random variables $x_i$ are independent, the free energy can be factorized using Fubini's theorem. We have $e^{\psi} < +\infty$ if $\theta_3, \theta_4 > 0$, so $\theta = 0$ is not a feasible value. Moreover $p(x)$ is integrable but tends to $+\infty$ for $x$ tending to $\partial X$ if $1 > \theta_3, \theta_4 > 0$, while $p = 0$ on $\partial X$ if $\theta_3, \theta_4 > 1$. Since the observables $h_1$ and $h_2$ depend linearly on the external parameters $v$ and $w$, this statistical model fails the injectivity condition 1 and is therefore not identifiable.
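The integrability threshold and the factorization of the free energy can be illustrated in the selection-free case. The sketch below is built on assumptions that are not stated in the text: with $\theta_1 = \theta_2 = 0$, each locus is assumed to contribute a weight $x^{\theta_3 - 1}(1 - x)^{\theta_4 - 1}$, a form chosen because it reproduces the thresholds quoted above (integrable if and only if $\theta_3, \theta_4 > 0$, vanishing on the boundary if $\theta_3, \theta_4 > 1$). Under that assumption the per-locus normalizer is the Beta function, and the function names are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b) = log of the integral of x^(a-1) (1-x)^(b-1) over [0, 1]."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def free_energy(theta3, theta4, n_loci):
    """Assumed selection-free free energy: n identical loci, factorized by independence.

    Returns +inf outside the feasible region theta3, theta4 > 0.
    """
    if theta3 <= 0.0 or theta4 <= 0.0:
        return math.inf
    return n_loci * log_beta(theta3, theta4)

def locus_integral_midpoint(a, b, m=20000):
    """Midpoint-rule check of the per-locus integral (reliable for a, b >= 1)."""
    total = 0.0
    for i in range(m):
        x = (i + 0.5) / m
        total += x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)
    return total / m

psi_one_locus = log_beta(2.0, 3.0)            # B(2, 3) = 1/12
psi_four_loci = free_energy(2.0, 3.0, 4)      # four independent loci
numeric = locus_integral_midpoint(2.0, 3.0)
infeasible = free_energy(0.0, 1.0, 5)         # theta3 = 0 is not feasible
```

The factorized free energy is additive over loci, so `psi_four_loci` is four times `psi_one_locus`, and the quadrature value `numeric` agrees with the closed-form Beta integral.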
The Fokker-Planck equation can be given the form (51), where $D_{ij} = d(x_i)\delta_{ij} = x_i(1 - x_i)\delta_{ij}$ is diagonal, but $D = 0$ on $\partial X$; therefore inequalities like (52) that govern the trend to equilibrium are not strict, due to the degeneracy of the diffusion matrix $D$ on $\partial X$. This is a major source of difficulties in the analysis of this model, as reported in [3]. A final remark is that the LSI sufficient condition $\mathrm{Hess}(V) \ge \sigma I$ becomes in this case ($V = \theta \cdot h - \psi + k$) Therefore $\mathrm{Hess}(V)$ is not bounded below and the LSI condition fails. In the above cited papers the dynamic approximation procedure presented in Section 6 is introduced to compute the evolution of the averages of the observables $h$ without solving the FPE, which is a computationally hard problem.
In this paper we have shown that this model problem is partly amenable to the controlled exponential family framework, which provides some insight; however, peculiar difficulties remain that prevent a complete analysis of the model. We refer the interested reader to the specific literature, see [3].