On the Limiting Behaviour of the Fundamental Geodesics of Information Geometry

The Information Geometry of extended exponential families has received much recent attention in a variety of important applications, notably categorical data analysis, graphical modelling and, more specifically, log-linear modelling. The essential geometry here comes from the closure of an exponential family in a high-dimensional simplex. In parallel, there has been a great deal of interest in the purely Fisher Riemannian structure of (extended) exponential families, most especially in the Markov chain Monte Carlo literature. These parallel developments raise challenges, addressed here, at a variety of levels: both theoretical and practical and, relatedly, both conceptual and methodological. To this end, this paper makes explicit the underlying geometry of these two areas via an analysis of the limiting behaviour of the fundamental geodesics of Information Geometry, namely Amari's (+1)- and (0)-geodesics, respectively. Overall, this provides a substantially more complete account of the Information Geometry of extended exponential families than has hitherto been available. We illustrate the importance and benefits of this novel formulation through applications.


Introduction
Information Geometry has developed enormously, both theoretically and in its range of applications, since the seminal works of [1][2][3]. Excellent summaries of this approach, which we shall call classical, can be found in [4] and, more recently, [5]. This approach has the property that the fundamental geometric objects are smooth manifolds. In particular, they are open sets of constant dimension. However, there has been recent interest in studying the Information Geometry of closures of exponential families, as defined in [6], these closures typically being unions of manifolds of varying dimension. As discussed in [7], this development gives a more exact duality between sample and model space, which is the key to the intrinsic duality of Information Geometry. From an applications point of view, studying closures of statistical manifolds is very natural in categorical data analysis [8,9] and in graphical [10], random graph [11], and log-linear [12] models. A closely related approach, which gives an excellent treatment of the closure of statistical models, uses algebraic geometry; see, for example, [13,14].
This paper focuses on extending the manifold-based approach of classical Information Geometry by looking at the limiting behaviour of key objects: (α)-geodesics. We follow the standard notation of Information Geometry, in which α = +1 corresponds to the exponential representation and α = 0 to the Fisher, or spherical, representation of the manifold. To be precise, we use the term "limiting" to denote the behaviour of a geodesic as it approaches the boundary of the closure of an exponential family. This may mean that a natural parameter tends to infinity, in the case of a (+1)-geodesic, or that a path-length parameter tends to a finite value, in the case of a (0)-geodesic.
We look specifically at the finite, discrete case and study two key types of geodesics, α = +1 and α = 0. In particular, we show that these two types of geodesics have fundamentally different boundary behaviours. By studying the first of these, we construct an explicit representation of the limiting behaviour of finite-dimensional exponential families. The behaviour of the second was introduced in our recent paper [15], which studied applications involving Markov chain Monte Carlo (MCMC) methods. This paper gives the theoretical foundations for the MCMC applications found in that earlier work. It also extends the results on (0)-geodesics found there, and shows both the asymptotic and the limiting behaviour of (+1)-geodesics. We begin with an insightful example.
Example 1. The extended trinomial model.
Figure 1 shows, in the simple case of the extended trinomial model, the behaviour of key geodesics. The model is plotted in a mean, or (−1)-affine, parameterisation, where the boundaries are completely explicit, being the points where at least one π_i = 0. In this figure, three geodesics, passing through the same point and having the same initial tangent vector, have been computed. The (−1)-geodesic in this parameterisation is, of course, a straight line, and it cuts the boundary of the extended family (see panel (a)). The (0)-geodesic, panel (b), smoothly touches the boundary. We show that this is generic behaviour. We note here that the closed-loop nature of the (0)-geodesic is not generic extended exponential family behaviour. Instead, as we explain in Section 3, it reflects something quite specific about the multinomial distribution. The (+1)-geodesic, panel (c), reaches the boundary at a vertex which, as we also show, is generic behaviour. Furthermore, it approaches the vertex close to one of the edges of the simplex, and in fact does so exponentially fast. Again, we show that this behaviour is quite general.

The rest of the paper is organised as follows. Section 2 looks at the limiting behaviour of (+1)-geodesics. This allows us to characterise explicitly the (sometimes subtle and surprising) boundary behaviour of general discrete exponential families. The results clearly illustrate the differences between the open-set, manifold-based classical Information Geometry and the geometry required to take into account the boundaries that naturally occur in categorical data analysis. Section 3 looks at the limiting behaviour of Fisher, or (0)-, geodesics. We show how the boundary behaviour of these geodesics allows them to be used as tools with important applications, including the design of efficient MCMC algorithms and the solution of optimisation problems on the closures of exponential families. Throughout, we illustrate our results visually with simple but representative examples.

Limits of (+1)-Geodesics
This paper looks at general finite exponential families, as used in categorical data analysis, graphical modelling, random discrete graph models, and log-linear modelling. Each of these models can be embedded in a sufficiently high-dimensional closed simplex.
The key intuition behind the behaviour of (+1)-geodesics is that they are normalised exponentials of linear functions (see Definition 1). Hence, in the limit, their behaviour is determined by the maximisers of these linear functions. The structure of these maximisers is, in turn, determined by the polar duality of the support set of the exponential model. For illustrative examples, see Figures 2 and 3; Theorem 1 gives explicit asymptotic expressions for this behaviour.
[Figure panel (a): mean parameters, axes µ1 and µ2.]

Notation
We start with some notational issues. Define
$$\Delta_k := \Big\{\pi = (\pi_0, \ldots, \pi_k)^T \in \mathbb{R}^{k+1} : \pi_i \ge 0 \text{ for all } i,\ \sum_{i=0}^{k} \pi_i = 1\Big\},$$
where k ≥ 1 is the dimension of the simplex. Let I := {0, ..., k} be the labels that we associate with the vertices of Δ_k. In a convenient mild abuse of notation, identify any proper face of Δ_k (i.e., any face other than the relative interior (r.i.) of Δ_k itself) with the set ∅ ⊂ F ⊂ I of vertices spanning it, i.e., {i ∈ I : π_i > 0}, or, equivalently, with the complementary set ∅ ⊂ F^C ⊂ I, i.e., {i ∈ I : π_i = 0}. For any m ≥ 1, let 1_m denote the vector of m 1s, 𝒞_m := {c ∈ R^m : 1_m^T c = 0} the (m − 1)-dimensional subspace of all centred (i.e., zero-sum) vectors, C_m := I_m − m^{−1} 1_m 1_m^T the (Euclidean) orthogonal projector of R^m onto 𝒞_m, and put Q := {the unit Euclidean sphere in R^p} ≡ {q ∈ R^p : q^T q = 1}.
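As a quick numerical check of this notation, the following Python sketch (NumPy is assumed; the helper name `centring_projector` is ours, not the paper's) builds C_m and confirms that it is symmetric, idempotent and of rank m − 1, and that applying it to a vector simply subtracts the vector's mean, so the result lies in 𝒞_m.

```python
import numpy as np


def centring_projector(m: int) -> np.ndarray:
    """Return C_m = I_m - m^{-1} 1_m 1_m^T, the orthogonal projector onto centred vectors."""
    return np.eye(m) - np.ones((m, m)) / m


C = centring_projector(4)
x = np.array([3.0, 1.0, 0.0, 4.0])
cx = C @ x  # x minus its mean (here 2), so the entries sum to zero

print(cx)
print(np.allclose(C @ C, C))     # idempotent
print(np.allclose(C, C.T))       # symmetric
print(np.linalg.matrix_rank(C))  # rank m - 1
```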
Using this notation, we can define a p-dimensional full exponential family in r.i.(Δ_k) as follows.
Since these exponential families are (+1)-affine sets, the geometry of all one-dimensional affine subsets, i.e., (+1)-geodesics, determines the underlying geometry. Thus, for each q ∈ Q, define v(q) = (v_i(q)) := Vq, a centred unit vector in R^{k+1}. The set M_q, comprising all π(θ_q) = (π_i(θ_q)), θ_q ∈ R, with i ∈ I, is a one-dimensional exponential sub-family of M. Indeed, M_q is a (+1)-geodesic in both M and r.i.(Δ_k). As q varies over Q, we obtain all such (+1)-geodesics in M, and the strategy of this section is to analyse carefully the boundary behaviour of each M_q.
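The explicit form of Definition 1 is not repeated in this section. Assuming the standard form in which M_q consists of the points π_i(θ) ∝ π^0_i exp{θ v_i(q)} for a base point π^0 ∈ r.i.(Δ_k), a point of M_q can be evaluated as in the sketch below. The function name `plus_one_geodesic`, the base point and the trinomial choice of V are illustrative assumptions; working on the log scale keeps the computation stable for large |θ|, which is exactly the regime studied in this section.

```python
import numpy as np


def plus_one_geodesic(pi0, V, q, theta):
    """Point pi(theta) on the (+1)-geodesic M_q through pi0 (sketch).

    Assumes pi_i(theta) proportional to pi0_i * exp(theta * v_i(q)) with v(q) = V q,
    evaluated in log space so that large |theta| does not overflow.
    """
    v = V @ q
    log_w = np.log(pi0) + theta * v
    log_w -= log_w.max()  # log-sum-exp shift: the largest weight becomes exp(0) = 1
    w = np.exp(log_w)
    return w / w.sum()


# Saturated trinomial (k = 2, p = 2): the columns of V form an orthonormal
# basis of the centred vectors in R^3.
V = np.column_stack([
    np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0),
    np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0),
])
pi0 = np.array([0.2, 0.3, 0.5])
q = np.array([np.cos(0.3), np.sin(0.3)])  # a point of Q, the unit circle in R^2

v = V @ q  # centred unit vector v(q)
for theta in (0.0, 2.0, 10.0, 200.0):
    print(theta, plus_one_geodesic(pi0, V, q, theta))
```

For large θ the output concentrates on the bin where v(q) is largest, in line with the intuition that the limit is governed by the maximisers of a linear function.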

Limits at the Boundary
In [15], we gave an explicit representation of how the p-dimensional model (1) is attached to the boundary of Δ_k. The key idea was to analyse the polar dual of the convex hull of the columns of V. This convex hull defines the extremal points of the mean parameters, and its polar dual determines the directions of recession [12]. These are the directions in the natural parameter space that attain these extreme points. Here, we look much more explicitly at the way in which these limits are attained.
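Without loss of generality, relabel the bins so that the elements of v are in non-increasing order and, within each group of m_j tied elements, the corresponding values of π^0_i are also in non-increasing order. Then, we may replace the single index "i" by a double index "(j, r)", thus:
$$v = \big(v_{(0)}\mathbf{1}_{m_0}^T, v_{(1)}\mathbf{1}_{m_1}^T, \ldots, v_{(g)}\mathbf{1}_{m_g}^T\big)^T, \qquad v_{(0)} > v_{(1)} > \cdots > v_{(g)},$$
and, correspondingly,
$$\pi^0 = \big(\pi^{0\,T}_{(0)}, \pi^{0\,T}_{(1)}, \ldots, \pi^{0\,T}_{(g)}\big)^T,$$
where π^0_(j) = (π^0_(j,r)), j = 0, ..., g, r = 1, ..., m_j.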
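The relabelling can be sketched in a few lines of Python. The function `double_index` below is a hypothetical helper, not code from the paper: it sorts bins by non-increasing v and, within ties, by non-increasing π^0, then groups tied values (up to a tolerance `tol`, our choice) into the levels j = 0, ..., g, so that each m_j is the size of the corresponding returned group.

```python
import numpy as np


def double_index(v, pi0, tol=1e-12):
    """Group bin labels into levels j = 0, ..., g of the double index (j, r).

    Bins are sorted so that v is non-increasing and, within each tied level,
    pi0 is non-increasing; consecutive bins whose v-values agree to within `tol`
    share a level.  Returns one array of original bin labels per level.
    """
    order = np.lexsort((-pi0, -v))  # last key is primary: sort by -v, then by -pi0
    levels = []
    for i in order:
        if levels and abs(v[levels[-1][0]] - v[i]) <= tol:
            levels[-1].append(i)
        else:
            levels.append([i])
    return [np.array(level) for level in levels]


pi0 = np.array([0.2, 0.5, 0.3])
tied = double_index(np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0), pi0)
generic = double_index(np.array([2.0, -1.5, -0.5]) / np.sqrt(6.5), pi0)
print([lvl.tolist() for lvl in tied])     # g = 1, with m_0 = 2 and m_1 = 1
print([lvl.tolist() for lvl in generic])  # g = 2, three singleton levels
```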
Definition 3. Using this double index, we can define some key notation, the requirement j > 0 being implicit in all terms but the first:

Before giving the main results of this section, we make some comments on these terms. Since all terms δ_j > 0, each of the θ-dependent terms above, including the first (unindexed) one, tends to zero exponentially fast as θ → ∞, and we compute first-order expansions in these terms. Although these expansions are only first order, we emphasise that the convergence is exponentially fast, as noted in Example 1.
We look at the limiting behaviour of key geometric terms (probabilities, tangent vectors, and the Fisher information) as θ → ∞. We note that, since we are working in the closure of an exponential family, we cannot assume that the usual, open-set-based, geometric intuition holds. For example, even the existence of tangent vectors, and their transformation rules, need careful checking.

Theorem 1. With 1 ≤ r_0 ≤ m_0 defining a bin (0, r_0), and for 1 ≤ r_j ≤ m_j, we have the following asymptotic expansions as θ → +∞.
(i) For the probabilities, we have:
(ii) The mean parameter has the expansion:
(iii) The Fisher information has the expansion:
(iv) Finally, for tangent vectors, with respect to µ, we have the expansions:
Proof. See Appendix A.
Corollary 1. The set of limit points of a p-dimensional exponential family in r.i.(Δ_k) is a finite union of exponential families, each lying in its own specific proper face of Δ_k.
Proof.The limit points in Theorem 1 are functions of an initial point π (0) and an initial direction v.However, the support set-denoted by I(v)-of the limit points is purely a function of v. Furthermore, since the initial point can be anywhere in the p-dimensional exponential family, it can be written as having general element The corresponding limit points have general elements positively proportional to which is an exponential family with support I(v).It is not, of course, necessarily in the minimal form since the columns defining V, once restricted to the subset I(v), need not be linearly independent.
It is important to note that, for a fixed {π^0_i}, the set of limit points of (+1)-geodesics of the form given by Equation (3) does not form the complete closure of a statistical model (see Example 3 for an illustration of this fact).
which is consistent with the fact that ∂π(θ)/∂µ is a tangent vector in (−1)-coordinates and, hence, is centred.
Proof.By direct calculation.

Example 2. Extended Trinomial Model.
We return to the extended trinomial model in order to visualise and interpret the results of Theorem 1 and its corollaries. In Figure 2a, we select a fixed π^0 and a number of different unit vectors, q, to define a set of (+1)-geodesics, i.e., one-dimensional exponential families. For each value of q, we compute the corresponding double index. The "generic" case has g = 2 and m_0 = m_1 = m_2 = 1; that is, v has no ties and thus has a unique maximum and a unique minimum. These cases are plotted with solid lines in the figure. We see that they all converge to a vertex, exponentially approaching one of the edges, as predicted. Panel (b) shows this convergence in detail.
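The exponential approach to an edge can be checked numerically. Assuming the form π_i(θ) ∝ π^0_i exp(θ v_i) for the (+1)-geodesic, the log-ratio of any two coordinates is exactly linear in θ: log(π_a/π_b) = log(π^0_a/π^0_b) + θ(v_a − v_b). In the sketch below (the base point and v are illustrative choices), the coordinate of the bin with the smallest v therefore decays exponentially faster than that of the middle bin, so the curve hugs the edge joining the top two vertices while it converges to the top vertex.

```python
import numpy as np


def plus_one_geodesic(pi0, v, theta):
    """pi(theta) proportional to pi0 * exp(theta * v) (assumed form), computed in log space."""
    log_w = np.log(pi0) + theta * v
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


pi0 = np.array([0.2, 0.3, 0.5])
v = np.array([2.0, -1.5, -0.5])
v = v / np.linalg.norm(v)       # centred (entries sum to 0) and unit length, no ties
top, mid, low = np.argsort(-v)  # bins at levels j = 0, 1, 2

for theta in (5.0, 10.0, 20.0, 40.0):
    p = plus_one_geodesic(pi0, v, theta)
    # p[top] -> 1 (the vertex), while log(p[low] / p[mid]) falls linearly in theta,
    # i.e. the curve approaches the edge joining vertices `top` and `mid`.
    print(theta, p[top], np.log(p[low] / p[mid]))
```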
There are two non-generic cases, in which there is a tie for either the maximum or the minimum value. Note that, since v is centred and non-zero, its three values cannot all be equal. In these cases, we have g = 1 and either m_0 = 2, m_1 = 1 or m_0 = 1, m_1 = 2. As Theorem 1 shows, the limit at one end of such a geodesic lies on the edge spanned by the two bins with the largest (respectively, smallest) value, while the limit at the other end is a vertex. For a tie at the maximum, the position of the limit point on the edge, as θ → +∞, is determined by the expression
$$\frac{\pi^0_{(0,r)}}{\sum_{s=1}^{m_0} \pi^0_{(0,s)}}, \qquad r = 1, \ldots, m_0.$$
These geodesics are plotted with dashed lines in the figure. We note that, in this special case, these (+1)-geodesics are also (−1)-geodesics.
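A sketch of the tied case, under the same assumed form π_i(θ) ∝ π^0_i exp(θ v_i) and with illustrative values: with v ∝ (1, 1, −2), the limit as θ → +∞ is π^0 restricted to the two tied bins and renormalised, the limit as θ → −∞ is the remaining vertex, and the ratio π_0/π_1 stays fixed along the whole curve, so the curve is a straight line in the mean parameterisation.

```python
import numpy as np


def plus_one_geodesic(pi0, v, theta):
    """pi(theta) proportional to pi0 * exp(theta * v) (assumed form), computed in log space."""
    log_w = np.log(pi0) + theta * v
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


pi0 = np.array([0.2, 0.3, 0.5])
v = np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0)  # tie for the maximum: g = 1, m_0 = 2, m_1 = 1

edge_limit = plus_one_geodesic(pi0, v, 100.0)     # stands in for theta -> +infinity
vertex_limit = plus_one_geodesic(pi0, v, -100.0)  # stands in for theta -> -infinity
predicted = np.array([pi0[0], pi0[1], 0.0]) / (pi0[0] + pi0[1])  # renormalised tied bins
print(edge_limit, predicted, vertex_limit)

# pi_0 / pi_1 does not depend on theta, so the curve is a straight line in the
# mean parameterisation, i.e. also a (-1)-geodesic.
ratios = [plus_one_geodesic(pi0, v, t)[0] / plus_one_geodesic(pi0, v, t)[1]
          for t in (-3.0, 0.0, 5.0)]
print(ratios)
```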
When we look at the set of tangent vectors, we see behaviour considerably at variance with what would be expected in a manifold-based setting. As mentioned above, since we are not working in open sets, care is needed in checking even standard properties of tangent vectors. First, we note that the set of tangent vectors to (+1)-geodesics that meet at a vertex has a conical, rather than a vector space, structure. In addition, in the "generic" case where g = 2, Theorem 1 (iv) shows that, for all such (+1)-geodesics, all limiting tangent vectors are parallel to (a permutation of) (1, −1, 0). These are plotted with solid lines in the figure, and the close-up of the local behaviour in panel (b) shows clearly that all these tangent vectors have the same limit. This means that the exponential map, which maps tangent vectors to points in a manifold, cannot be uniquely defined at a boundary point. The fact that the set of limiting directions at a vertex is a (closed) cone comes not principally from the "generic" case but, rather, from the case where g = 1.
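The common limiting tangent direction can also be checked numerically. Under the assumed form π_i(θ) ∝ π^0_i exp(θ v_i), one has dπ_i/dθ = π_i ∑_j π_j (v_i − v_j). The sketch below (helper names and test values are ours) normalises this vector for two different generic directions v that share the same ordering of bins and shows that both approach (e_top − e_mid)/√2, a permutation of (1, −1, 0)/√2. Writing the derivative with pairwise differences, rather than as v_i minus the mean of v, avoids catastrophic cancellation when π is extremely close to a vertex.

```python
import numpy as np


def plus_one_geodesic(pi0, v, theta):
    """pi(theta) proportional to pi0 * exp(theta * v) (assumed form), computed in log space."""
    log_w = np.log(pi0) + theta * v
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


def unit_tangent(pi0, v, theta):
    """Normalised d pi / d theta, with d pi_i / d theta = pi_i * sum_j pi_j (v_i - v_j)."""
    p = plus_one_geodesic(pi0, v, theta)
    d = p * ((v[:, None] - v[None, :]) @ p)  # pairwise differences avoid cancellation
    return d / np.linalg.norm(d)


pi0 = np.array([0.2, 0.3, 0.5])
edge_dir = np.array([1.0, 0.0, -1.0]) / np.sqrt(2.0)  # e_top - e_mid for both directions below
for raw in (np.array([2.0, -1.5, -0.5]), np.array([3.0, -2.0, -1.0])):
    v = raw / np.linalg.norm(raw)
    print(v, unit_tangent(pi0, v, 100.0))  # both are (numerically) edge_dir
```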
In the figure, one such geodesic converging to (0, 0) is shown with a dashed line. In this case, however, the limiting tangent direction depends on π^0, so all directions in the relative interior of the cone can be attained, while the boundary directions of the cone come from the generic case.
A two-dimensional extension of the binomial family, as described by Altham in [16], is given by where we take T(y) = (y − ȳ)², y = 0, ..., k. This allows both over- and under-dispersion relative to the binomial model and, for large k, can be thought of as a finite, discrete approximation to the normal model (see [15] for more details). Figure 3 shows, for k = 12, some of the details of the boundary convergence of the (+1)-geodesics that define this family. Panel (a) is shown in the mean, or (−1)-affine, parameterisation. The solid lines are "generic" (+1)-geodesics that converge to a vertex, as predicted by Theorem 1. As close inspection shows, each of these geodesics has a limiting tangent vector parallel to the corresponding edge. The dashed lines correspond to the case where there is a tie in the largest component of the initial direction of the geodesic.
Panel (b) of Figure 3 shows the same geodesics in the natural, or (+1), parameters. Here, the dual polytope is shown as the convex hull of a set of vertices, each of which corresponds to a "direction of recession" (for details, see [12] or [15]). The polar duality can be clearly seen: the (+1)-geodesics that cut an edge in panel (b) meet the corresponding vertex in panel (a), while the dashed lines, which hit vertices in panel (b), meet edges in panel (a).

Fisher Geodesics
We turn now to the Fisher, or (0)-, geodesic in an exponential family embedded in a finite simplex. For completeness, we recall that the (0)-representation maps the simplex to the sphere, and the Fisher metric is the pullback of the standard metric on the sphere. Here, we define a new class of geometric object, the extended Fisher geodesic, which lies naturally in the extended exponential family.
In an exponential family, Fisher geodesics are the geodesics of the Levi-Civita connection and have the property of being (local) minimisers of path length and energy [17]. They were one of the first differential geometric objects studied in statistics [1]. Except in a few special cases, they cannot be computed in closed form, but they can be computed numerically using their defining differential equations. Since these equations are defined only on open sets, the analysis given here is needed to understand their limiting behaviour in the closure.
In Figure 1b, we see a Fisher geodesic in the extended trinomial model. This is a case where there is a closed form (see [1], p. 32). It can be calculated directly in the (0)-representation of the simplex, given by ξ_i = √π_i, i = 0, 1, ..., k. The image of the Fisher geodesic connecting ξ and ξ′ is the set of points of the form where i = 0, ..., k, and c(t) is the positive normalising constant that ensures ∑_{i=0}^k ξ_i(t)² = 1. Since we are working in the extended multinomial model, there is no positivity constraint on ξ_i(t), and the figure shows the image of the full great circle in the sphere, which is the Fisher geodesic in the (0)-representation. We can alternatively think of it as the union of Fisher geodesics in the relative interior, smoothly connected at the boundary. It is this smooth touching of the boundary that motivated the results in this section. In fact, the curve in Figure 1b was computed by solving the underlying differential equation numerically, using the methods of Section 3.2 below. The local solution is guaranteed to exist in open neighbourhoods, but the numerical solution was extended smoothly into, and out of, the boundary. The main result of this section shows, both theoretically and numerically, that this is a general property of extended Fisher geodesics in extended exponential families in the simplex.
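Since the displayed formula is not reproduced here, the following sketch uses an equivalent unit-speed parameterisation of the great circle (our assumption): ξ(t) = cos(t) ξ_a + sin(t) η, where ξ_a = √π_a and η is the normalised component of ξ_b = √π_b orthogonal to ξ_a. Mapping to the simplex by π_i = ξ_i², every time at which some ξ_i changes sign gives π_i = 0 together with dπ_i/dt = 2 ξ_i ξ_i′ = 0, so the image touches the boundary tangentially; and since ξ changes sign after half a revolution, the image in the simplex is a closed loop. The two endpoints are illustrative values.

```python
import numpy as np

pi_a = np.array([0.2, 0.3, 0.5])
pi_b = np.array([0.6, 0.3, 0.1])
xi_a, xi_b = np.sqrt(pi_a), np.sqrt(pi_b)  # points on the unit sphere

eta = xi_b - (xi_a @ xi_b) * xi_a           # tangent at xi_a pointing towards xi_b
eta /= np.linalg.norm(eta)


def circle(t):
    """Return pi(t) = xi(t)**2 and d pi / dt along the great circle."""
    xi = np.cos(t) * xi_a + np.sin(t) * eta
    dxi = -np.sin(t) * xi_a + np.cos(t) * eta
    return xi**2, 2.0 * xi * dxi


# xi_i(t) = 0 exactly when (cos t, sin t) is parallel to (eta_i, -xi_a_i).
touch_times = np.arctan2(-xi_a, eta)
for i, t in enumerate(touch_times):
    p, dp = circle(t)
    print(i, t, p[i], dp[i])  # both p[i] and dp[i] are (numerically) zero
```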
We note that the way in which the (0)-geodesic smoothly meets the boundary in Figure 1b is generic in all exponential families, as shown in Theorem 6. However, for clarity, we also note that the closed nature of the (0)-geodesic seen in the figure is a special property of multinomial models. This follows since, under their Fisher Riemannian structure, they are equivalent to standard spheres. Hence, in this special case, the extended geodesics are the images of great circles and are therefore closed. In general, as illustrated in Example 3, the geodesics do not form closed loops.

The Fisher Geodesic and the Boundary
In order to define extended Fisher geodesics, we need to consider how to measure the energy of a curve in an extended model. In particular, we need to understand the energy of a path whose limit lies in the boundary, as seen in Figure 1b. The following result on how the Fisher information behaves near the boundary follows from results in [18] and was stated in [15]. It shows that the metric is singular at the boundary, in both the mean and the natural parameters. This result emphasises that standard Riemannian geometry does not extend directly to the boundary of the extended exponential family.
Theorem 2. (a) Let {µ_i} be a sequence of points in the mean parameter space of an exponential family lying in r.i.(Δ_k), converging to a point µ that lies on a face of the boundary polytope, this face being defined by a half-space of the form ⟨a, µ⟩ ≤ 1 for a unit normal vector a.
Let I(µ) be the Fisher information, λ_min(µ) its minimum eigenvalue, assumed simple, and e_min(µ) a corresponding unit eigenvector, unique up to overall sign. Then, lim_{i→∞} λ_min(µ_i) = 0 and lim_{i→∞} e_min(µ_i) = a.
(b) Let {θ_i} be the sequence in the natural parameters corresponding to {µ_i}, I(θ) := I(µ(θ))^{−1} the Fisher information, λ_max(θ) its maximum eigenvalue, assumed simple, and e_max(θ) a corresponding unit eigenvector, unique up to overall sign. Then, lim_{i→∞} λ_max(θ_i) = 0 and lim_{i→∞} e_max(θ_i) = a, which is the vertex of the polar that corresponds to the face in (a).
From the proof of Corollary 1, the closure, M̄, of an exponential family M ⊂ r.i.(Δ_k) can be written explicitly as a finite union of exponential families, each lying in its own proper face.
We first define what it means for a curve to be smooth in the closure.
It is defined to be S-smooth in the closure when it can be partitioned into the union of smooth subpaths each in an exponential family, for j = 1, ..., J − 1, where: (iii) For each j = 2, . . ., J, and s = 1, . . ., S, We denote the set of S-smooth curves as C S .
A curve γ in C_1, parameterised by t ∈ [0, 1], has finite arc length if
$$\lim_{t \to 1} \int_0^t \langle \dot\gamma(s), \dot\gamma(s) \rangle_{\gamma(s)}^{1/2}\, ds < \infty,$$
where ⟨·, ·⟩_π is the Fisher information metric on Δ_k. Furthermore, the curve has finite energy if
$$\lim_{t \to 1} \int_0^t \langle \dot\gamma(s), \dot\gamma(s) \rangle_{\gamma(s)}\, ds < \infty.$$
It is common, and convenient, in Riemannian geometry [17] (Theorem 13, p. 128) to characterise Levi-Civita geodesics as local minimisers of the energy functional, since these are the same paths that locally minimise the length functional. We follow this approach here when extending the definition of a (0)-geodesic to the extended exponential family.
We can now define an extended geodesic on an extended exponential family.
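Definition 5. Let M̄ ⊆ Δ_k be an extended exponential family, and let π, π′ ∈ M̄. Define the set of finite-energy paths from π to π′ by
$$D(\pi, \pi') := \{\gamma \in C_1 : \gamma([0,1]) \subset \bar{M},\ \gamma(0) = \pi,\ \gamma(1) = \pi',\ \gamma \text{ has finite energy}\}.$$
Definition 6. If γ ∈ D(π, π′), we call γ an extended Fisher geodesic if it (locally) minimises the energy functional.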
Theorem 3. (a) Consider a curve γ ∈ C_1, where γ(t) ∈ r.i.(Δ_k) for t ∈ [0, 1) and lim_{t→1} γ(t) lies in the proper face defined by a support set I* ⊂ I. Then, if the curve has finite energy, lim_{t→1} dγ/dt lies in the tangent space to the proper face defined by the support set I*; that is, if γ_i(1) = 0, then γ_i′(1) = 0.
(b) Let M ⊂ r.i.(Δ_k) be a p-dimensional exponential family with closure M̄. Let π ∈ M, and let π′ ∈ M̄ lie in a proper face of Δ_k defined by the index set F_1. If γ ∈ D(π, π′) is an extended Fisher geodesic with γ(t) ∈ M for t ∈ [0, 1), then the following hold. In the relative interior, after writing γ|_M in terms of the mean parameters of M as (µ_1(t), ..., µ_p(t)), we have
$$\frac{d^2 \mu_k}{dt^2} + \sum_{i,j=1}^{p} \Gamma^k_{ij}(\mu)\, \frac{d\mu_i}{dt}\, \frac{d\mu_j}{dt} = 0, \qquad k = 1, \ldots, p,$$
where Γ^k_ij(µ) are the Christoffel symbols of the Levi-Civita connection of the Fisher metric. On the boundary, lim_{t→1} dπ/dt(t) is tangent to the face containing π′.
Proof.See Appendix B.

Computing the Extended Fisher Geodesic
Figure 1b shows an example of a Fisher geodesic's limiting behaviour; in particular, it is smooth on the boundary. However, it shows more: a smooth curve in the extended multinomial model that has three points lying on the edges and consists of three disconnected Fisher geodesics in the relative interior. These geodesics are smoothly connected in the extended family. This example motivated our investigation of the properties of extended Fisher geodesics. In this section, we investigate whether it is possible to find extended Fisher geodesics numerically for arbitrary exponential families, and we investigate their limiting properties numerically. From the limiting properties of (+1)-geodesics, we know that, near the boundary, the exponential family lies almost parallel to a low-dimensional face of the simplex. Thus, locally, (0)-geodesics in general exponential families will behave rather like (0)-geodesics in multinomial families, as shown in Figure 1b, and reflect back into the interior. At least locally, the geodesic behaves like the projection of the continuation of geodesics on the sphere. Example 4 below gives an explicit solution of this kind, in which we have added the so-called reflection principle (Definition 7) to ensure both uniqueness and numerical stability.
The characterisation of the exponential family via Equation (1) is the familiar explicit representation in terms of the natural parameters, but it is numerically problematic for representing the limiting distributions, since the natural parameter must be unbounded to attain the boundaries. Thus, we replace it with a (−1)-representation and then, invoking the reflection principle, work numerically in a (0)-representation.
From above, a p-dimensional exponential family is defined by v^(1), ..., v^(p), an orthonormal set of centred vectors. This set can be extended to an orthonormal basis of the k-dimensional space of centred vectors in R^{k+1} by selecting u^(p+1), ..., u^(k). Within r.i.(Δ_k), a (0)-geodesic in this exponential family can be characterised by a set of differential equations, Equations (16)-(18), where Equation (16) constrains the curve to lie in the space of unit measures, Equation (17) forces the solution to lie in the p-dimensional exponential family and, finally, Equation (18) constrains the solution to be a Fisher geodesic inside that family.
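A minimal sketch of this construction for Altham's model with k = 12, assuming sufficient statistics y and T(y) = (y − ȳ)² with ȳ taken to be k/2 (our assumption; ȳ is not restated in this section). The v^(s) are obtained by centring and orthonormalising the statistics with a reduced QR decomposition, which changes the parameterisation but not the family, and the u^(s) are read off a full SVD as the orthogonal complement of the span of 1 and the v^(s). Both helper names are hypothetical.

```python
import numpy as np


def centred_orthonormal_basis(T):
    """Columns v^(1), ..., v^(p): an orthonormal basis of the centred span of T's columns."""
    Tc = T - T.mean(axis=0)  # C_m applied to each column
    V, _ = np.linalg.qr(Tc)  # reduced QR; assumes Tc has full column rank
    return V


def complete_basis(V):
    """Columns u^(p+1), ..., u^(k) completing V to an orthonormal basis of the centred vectors."""
    m, p = V.shape
    A = np.column_stack([np.ones(m) / np.sqrt(m), V])
    U_full, _, _ = np.linalg.svd(A, full_matrices=True)
    return U_full[:, p + 1:]  # left singular vectors orthogonal to the range of A


k = 12
y = np.arange(k + 1, dtype=float)
T = np.column_stack([y, (y - k / 2) ** 2])  # assumed statistics for Altham's model
V = centred_orthonormal_basis(T)
U = complete_basis(V)
print(V.shape, U.shape)  # (13, 2) (13, 10)
```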
In order to solve these equations numerically, we discretise in the standard way but, near the boundary, we recognise that Equations (16)-(18) are numerically unstable for small values of π_i. From the analysis of Section 2, we know that small values are a fundamental part of the limit process. To illustrate such a solution, consider the following example.

Example 4. We return to Altham's two-dimensional extension of the binomial family. We take Equations (16)-(18) and solve them numerically to obtain an extended Fisher geodesic. This numerical solution is shown in Figure 4, with panel (a) showing the complete extended exponential family and panel (b) showing detail of the boundary behaviour.
As can be seen, in this example the extended Fisher geodesic smoothly touches the boundary in two places. In fact, we can think of the path as the smooth union of a set of extended geodesics. To ensure smoothness, we employ the following idea, which ensures that the paths 'reflect' at a boundary and join in a smooth way.

Definition 7. The reflection principle.
To the conditions of Theorem 3, we add the condition that the limit of the second derivative, d²π/dt², is finite and continuous on the path, and we use this to connect extended Fisher geodesics smoothly at the boundary.
In Appendix C, we discuss how we implemented code that computes smooth unions of extended Fisher geodesics numerically using this principle, working in the ξ = √π parameters.
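For the saturated multinomial, where no constraint of the form (17) is needed, the benefit of working in ξ = √π can be sketched directly. On the unit sphere the geodesic equation is ξ″ = −|ξ′|² ξ, which is perfectly regular where some ξ_i passes through zero, whereas the corresponding π_i = ξ_i² merely touches zero and reflects. The classical fourth-order Runge-Kutta integrator below is our illustrative choice, not necessarily the scheme of Appendix C; the endpoints and step size are likewise assumptions, and the result is compared with the closed-form great circle.

```python
import numpy as np


def rhs(state):
    """State = (xi, dxi); geodesic equation of the unit sphere: xi'' = -|xi'|^2 xi."""
    xi, dxi = state
    return np.array([dxi, -(dxi @ dxi) * xi])


def rk4(state, h, n_steps):
    """Classical fourth-order Runge-Kutta; returns the xi-trajectory, one row per step."""
    path = [state[0].copy()]
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * h * k1)
        k3 = rhs(state + 0.5 * h * k2)
        k4 = rhs(state + h * k3)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(state[0].copy())
    return np.array(path)


xi_a = np.sqrt(np.array([0.2, 0.3, 0.5]))
xi_b = np.sqrt(np.array([0.6, 0.3, 0.1]))
eta = xi_b - (xi_a @ xi_b) * xi_a
eta /= np.linalg.norm(eta)  # unit initial velocity, tangent to the sphere

h, n_steps = 1e-3, 2000     # integrate over t in [0, 2]
xi_path = rk4(np.array([xi_a, eta]), h, n_steps)
pi_path = xi_path**2        # image in the simplex; stays on the simplex automatically

t_end = h * n_steps
exact = np.cos(t_end) * xi_a + np.sin(t_end) * eta
print(np.abs(xi_path[-1] - exact).max())  # small: RK4 tracks the great circle
print((xi_path < 0).any(axis=0))          # coordinates that crossed zero (boundary touches)
```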

Applications
While Fisher geodesics were studied very early in the Information Geometry literature, their practical utility is still an open question. The geodesic and the corresponding geodesic distance are natural objects to study and have, for example, found applications in image analysis (see [19,20]).
The seminal paper [21] illustrates a very important way in which Fisher Riemannian geometry can have an impact on statistical practice. It considers, under regularity, parameter spaces of statistical models as smooth manifolds, and designs highly efficient Markov chain Monte Carlo algorithms by using Langevin methods on Riemannian geometric structures. In our recent paper [18], we showed how Fisher geodesics can smoothly attach to the boundaries of exponential families, and how this is one of the ways in which the MCMC method achieves its efficiency. This paper gives the details of the results announced there.
In the paper [18], it was also shown how the boundary effects in extended exponential families mean that the log-likelihood can be very far from approximately quadratic in the mean and natural parameters.
Example 5. Returning to Altham's model, Figure 5 shows an example of the shape of the log-likelihood function, in the mean parameters, when the maximum likelihood estimate is near the boundary. We see that the log-likelihood is very far from being approximately quadratic.

Example 5 illustrates, in a simple visual way, how the log-likelihood can be far from approximately quadratic near the boundary. A maximum likelihood estimate on, or close to, the boundary is very common in categorical data analysis [9] and other discrete models [11]. This means that standard iterative gradient-based methods, such as Newton's method, can fail in the mean or natural parameters. This was explored in [22], where it was shown that boundary effects mean that commonly used first-order asymptotic analyses, in, say, logistic regression, can also fail.
We propose here that the smooth way in which the Fisher Riemannian geometry deals with the boundary can be a useful tool for addressing these problems. If we explore the model space using its Riemannian geodesic structure, we can smoothly reach the boundary. This is in sharp contrast to working in the mean parameters, where Newton's method can jump outside the boundary, or in the natural parameters, where a gradient approach can never reach the boundary in finite time. We note, in fact, that Amari's highly efficient natural gradient method [23], while often motivated by divergence ideas, uses exactly the Fisher Riemannian geometry.
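A deliberately tiny, one-dimensional illustration of this contrast, using a hypothetical binomial sample in which all n trials are successes, so that the maximum likelihood estimate p̂ = 1 lies on the boundary. Newton's method for the log-likelihood ℓ = n log p is run in three parameterisations: the mean parameter p, the natural parameter θ = log{p/(1 − p)}, and an angle φ with p = sin²φ, a one-dimensional stand-in for the (0)-representation ξ = √p. This is our own toy example, not the natural gradient method of [23], and it says nothing by itself about higher-dimensional models.

```python
import numpy as np

n = 10  # hypothetical sample: n successes out of n trials, so the MLE p = 1 is on the boundary


def newton(x, grad, hess, steps):
    """Plain Newton iteration x <- x - grad(x) / hess(x); returns all iterates."""
    xs = [x]
    for _ in range(steps):
        x = x - grad(x) / hess(x)
        xs.append(x)
    return np.array(xs)


# Mean parameter p: l(p) = n log p.
p_iter = newton(0.6, lambda p: n / p, lambda p: -n / p**2, 1)
# Natural parameter theta: l(theta) = n (theta - log(1 + e^theta)).
theta_iter = newton(0.0, lambda th: n / (1 + np.exp(th)),
                    lambda th: -n * np.exp(th) / (1 + np.exp(th)) ** 2, 5)
# Angle phi with p = sin(phi)^2: l(phi) = 2 n log sin(phi).
phi_iter = newton(0.3, lambda ph: 2 * n / np.tan(ph),
                  lambda ph: -2 * n / np.sin(ph) ** 2, 8)

print(p_iter)               # one step from 0.6 lands at 1.2, outside [0, 1]
print(theta_iter)           # grows by more than 1 per step and never becomes infinite
print(np.sin(phi_iter)**2)  # approaches p = 1 rapidly from inside
```

In the angle parameterisation the Newton update reduces to φ ← φ + sin φ cos φ, which converges to π/2 (that is, p = 1) from below without overshooting.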

Figure 1. Key geodesics in the extended trinomial model.

Definition 4. A (−1)-representation of a curve in the closure, M̄, of an exponential family in