
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

A broad view of the nature and potential of computational information geometry in statistics is offered. This new area suitably extends the manifold-based approach of classical information geometry to a simplicial setting, in order to obtain an operational universal model space. Additional underlying theory and illustrative real examples are presented. In the infinite-dimensional case, challenges inherent in this ambitious overall agenda are highlighted and promising new methodologies indicated.

The application of geometry to statistical theory and practice has seen a number of different approaches developed. One of the most important can be traced to Efron’s seminal paper [

The strengths usually claimed of such a result are that, for a worker fluent in the language of information geometry, it is explicit, insightful as to the underlying structure and of clear utility in statistical practice. We agree entirely. However, the overwhelming evidence of the literature is that, while the benefits of such inferential improvements are widely acknowledged in principle, in practice, the overhead of first becoming fluent in information geometry prevents their routine use. As a result, a great number of powerful results of practical importance lie severely underused, locked away behind notational and conceptual bars.

This paper proposes that this problem can be addressed computationally by the development of what we call computational information geometry. This gives a mathematical and computational framework in which the results of information geometry can be encoded as “black-box” numerical algorithms, allowing direct access to their power. Essentially, this works by exploiting the structural properties of information geometry, which are such that all formulae can be expressed in terms of four fundamental building blocks, defined and detailed in Amari [

The paper is structured as follows. Section 2 looks at the case of distributions on a finite number of categories where the extended multinomial family provides an exhaustive model underlying the corresponding information geometry. Since the aim is to produce a computational theory, a finite representation is the ultimate aim, making the results of this section of central importance. The paper also emphasises how the simplicial structures introduced here are foundational to a theory of computational information geometry. Being intrinsically constructive, a simplicial approach is useful both theoretically and computationally. Section 3 looks at how simplicial structures, defined for finite dimensions, can be extended to the infinite dimensional case.

This section shows how the results of classical information geometry can be applied in a purely computational way. We emphasise that the framework developed here can be implemented algorithmically, allowing direct access to a powerful information geometric theory of practical importance.

The key tool, as explained in [ ], is the extended multinomial family, whose parameter space is the closed k-dimensional simplex

Δ^k := {π = (π_0, …, π_k)^t : π_i ≥ 0, Σ_{i=0}^{k} π_i = 1},

with a label associated with each vertex. Here, Δ^k includes its boundary, so that distributions whose support is any non-empty subset of the k + 1 categories are included.

This paper builds on the theory of information geometry following that introduced by Amari, in which the ±1-geometries play the central role. Throughout, for a finite label set A, a (discretised) distribution is written as π = (π_i)_{i∈A}.

Definition 1. The −1-affine (mixture) structure is the affine space (X_mix, V_mix, +), where X_mix := {x = (x_i)_{i∈A} : Σ_{i∈A} x_i = 1} and the translation space is V_mix := {v = (v_i)_{i∈A} : Σ_{i∈A} v_i = 0}.

In Definition 1, the space of (discretised) distributions is a −1-convex subspace of the affine space, (X_mix, V_mix, +).
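As a minimal computational sketch (our own illustration, not code from the paper; the function name `mixture_geodesic` is hypothetical), the −1-affine operations reduce to ordinary vector arithmetic on the simplex: a −1-geodesic between two discretised distributions is their pointwise convex combination, which stays inside the simplex.

```python
import numpy as np

def mixture_geodesic(pi0, pi1, t):
    """Point at parameter t on the -1 (mixture) geodesic from pi0 to pi1.

    Both inputs are points of the simplex (non-negative, summing to one);
    the path (1 - t) * pi0 + t * pi1 stays inside the simplex for t in [0, 1],
    which is exactly the -1-convexity discussed above.
    """
    pi0, pi1 = np.asarray(pi0, float), np.asarray(pi1, float)
    return (1.0 - t) * pi0 + t * pi1

pi0 = np.array([0.7, 0.2, 0.1])
pi1 = np.array([0.1, 0.3, 0.6])
mid = mixture_geodesic(pi0, pi1, 0.5)
assert np.all(mid >= 0) and np.isclose(mid.sum(), 1.0)
```

Closure of the simplex under such combinations is what makes the −1-structure well suited to mixture modelling, discussed further below.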

Examples 2 and 3 are used for illustration. The second of these is a moderately high dimensional family, where the way that the boundaries of the simplex are attached to the model is of great importance for the behaviour of the likelihood and of the maximum likelihood estimate. In general, working in a simplex, boundary effects mean that standard first order asymptotic results can fail, while the much more flexible higher order methods can be very effective. The other example is a continuous curved exponential family, where both higher order asymptotic sampling theory results and geometrically-based dimension reduction are described.


These building blocks are the Fisher information, g_{ij}, its inverse, g^{ij}, and the ±1-connections, Γ^{(+1)} and Γ^{(−1)}.


s_1 = (1, 2, 3, 4) and s_2 = (1, 4, 9, −1).

One of the most powerful sets of results from classical information geometry is the way that geometrically-based tensor analysis is perfectly suited to multi-dimensional higher order asymptotic analysis; see [

Two more fundamental issues, which the global geometric approach of this paper highlights, concern numerical stability. The ability to invert the Fisher information matrix is vital in most tensorial formulae, and so understanding its spectrum, discussed in Section 2.4, is vital. Secondly, numerical underflow and overflow near boundaries require careful analysis, and so, understanding the way that models are attached to the boundaries of the extended multinomial models is equally important. The four-cycle model, to which we now return, illustrates computational information geometry doing this effectively.
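The underflow/overflow issue can be handled with the standard log-sum-exp device. The following sketch is our own illustration (the name `natural_to_simplex` is hypothetical): bin probabilities are computed from natural parameters entirely in the log domain.

```python
import numpy as np

def natural_to_simplex(theta):
    """Map natural parameters to bin probabilities pi_i = exp(theta_i) / sum_j exp(theta_j).

    Subtracting the maximum before exponentiating (the log-sum-exp trick)
    avoids overflow for large positive theta_i and retains accuracy near
    the boundary of the simplex, where some probabilities underflow.
    """
    theta = np.asarray(theta, float)
    w = np.exp(theta - theta.max())
    return w / w.sum()

# theta_0 = 1000 would overflow a naive exp(); the shifted version is stable.
pi = natural_to_simplex([1000.0, 0.0, -1000.0])
assert np.isclose(pi.sum(), 1.0) and pi[0] > 1.0 - 1e-12
```

The returned point lies essentially on the boundary of the simplex, illustrating how models are attached to the boundary of the extended multinomial family in a numerically stable way.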

We focus now on the second numerical issue identified above. In any multinomial, the Fisher information matrix and its inverse are explicit. Indeed, the 0-geodesics and the corresponding geodesic distance are also explicit; see [
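The explicitness just mentioned is easy to check numerically. The following sketch (our own illustration, with an arbitrary choice of π) builds the Fisher information in the +1 form for a small multinomial and verifies its explicit inverse, diag(1/π_i) + (1/π_0) 11^t, which follows from the Sherman–Morrison formula.

```python
import numpy as np

pi_rest = np.array([0.3, 0.2, 0.1])     # pi_{-0}; hence pi_0 = 0.4
pi0 = 1.0 - pi_rest.sum()

# Fisher information in the +1 form: diag(pi_{-0}) - pi_{-0} pi_{-0}^t.
fisher = np.diag(pi_rest) - np.outer(pi_rest, pi_rest)

# Explicit inverse via Sherman-Morrison: diag(1/pi_i) + (1/pi_0) * ones ones^t.
fisher_inv = np.diag(1.0 / pi_rest) + np.ones((3, 3)) / pi0

assert np.allclose(fisher @ fisher_inv, np.eye(3))
```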

With π_{−0} denoting the vector of all bin probabilities, except π_{0}, we can write the Fisher information matrix (in the +1 form) as

diag(π_{−0}) − π_{−0} π_{−0}^t.

This has an explicit spectral decomposition, which can be computed by using interlacing eigenvalue results (see, for example, [ ]): writing λ_1 > … > λ_g > 0 for the distinct eigenvalues, the spectrum interlaces the ordered bin probabilities π_1 ≥ … ≥ π_k.

We give a complete account of the spectral decomposition (SpD) of the Fisher information, taking the bin probabilities to be ordered as π_1 ≥ … ≥ π_k. There are four cases to consider.
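Numerically, the interlacing of the spectrum by the ordered bin probabilities is straightforward to check (an illustrative sketch with our own choice of π, not the paper's computation):

```python
import numpy as np

p = np.array([0.4, 0.25, 0.2, 0.1])           # ordered pi_{-0}; pi_0 = 0.05 > 0
fisher = np.diag(p) - np.outer(p, p)          # Fisher information, +1 form
lam = np.sort(np.linalg.eigvalsh(fisher))[::-1]  # eigenvalues, largest first

# Interlacing: pi_1 >= lambda_1 >= pi_2 >= ... >= pi_k >= lambda_k > 0.
eps = 1e-12
for i in range(len(p)):
    upper = p[i]
    lower = p[i + 1] if i + 1 < len(p) else 0.0
    assert lower - eps <= lam[i] <= upper + eps
```

The smallest eigenvalue stays strictly positive here because π_0 > 0; as π approaches the boundary of the simplex, it degenerates, which is exactly the numerical-stability issue discussed above.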

Case 1. Some of the entries of π_{−0} vanish: the sub-case π_0 = 1 ⇔ π_+ = 0, where π_+ := Σ_{i>0} π_i, being trivial, suppose the non-zero entries of π_{−0} are (π_1, …, π_l)^t. The SpD of diag(π_{−0}) − π_{−0}π_{−0}^t then

follows at once from that of the corresponding matrix built from (π_1, …, π_l)^t, the remaining eigenvalues being zero.

Case 2

Case 3. All k entries of π_{−0} are equal, necessarily to π_+/k. In this case, the Fisher information becomes

(π_+/k) P_k + (π_+/k) π_0 (J_k/k),

where P_k = I_k − J_k/k is the orthogonal projection onto 1_k^⊥, while J_k/k is the orthogonal projection onto 1_k. Its eigenvalues are, therefore, π_+/k, with multiplicity k − 1, and (π_+/k) π_0, with multiplicity one.

Case 4. The distinct values among the entries of π_{−0} occur with multiplicities (m_1, …, m_g), and the distinct eigenvalues of the Fisher information satisfy λ_1 > … > λ_g > 0.

This is the generic case. Each distinct value of π_{−0} with multiplicity m_i > 1 contributes an eigenvalue with multiplicity m_i − 1, while the remaining eigenvalues strictly interlace the distinct values of π_{−0}.

In particular, consecutive eigenvalues λ_i > λ_{i+1} can be well separated. For, considering the graph of the secular equation between λ_{i+1} and λ_i, parametrise the points between them as (λ_i + λ_{i+1})/2 + δ((λ_i − λ_{i+1})/2), with −1 < δ < 1, giving a function of δ

whose unique zero δ_∗ over (−1, 1) is positive whenever, as will typically be the case, m_i = m_{i+1} (both will usually be one), while (m_{i+1}λ_{i+1})/(m_iλ_i) < 1/2. Indeed, a straightforward analysis shows that, for any m_i and m_{i+1}, δ_∗ → 1 as λ_{i+1}/λ_i → 0.

Mixture modelling is an exemplar of a major area of statistics in which computational information geometry enables distinctive methodological progress. The −1-convex hull of an exponential family is of great interest, mixture models being widely used in many areas of statistical science. In particular, they are explored further in [
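To make the −1-convex hull concrete, here is a small sketch (our own choice of components, not one of the paper's examples; `discretised_poisson` is a hypothetical helper): a two-component mixture of discretised Poisson distributions is simply a −1-convex combination of two points of the extended multinomial simplex.

```python
import numpy as np
from math import exp, factorial

def discretised_poisson(mu, k):
    """Poisson(mu) probabilities on bins 0..k-1, with all remaining tail mass
    collected in a final bin, giving a point of the extended simplex."""
    p = np.array([exp(-mu) * mu**i / factorial(i) for i in range(k)])
    return np.append(p, 1.0 - p.sum())

# A two-component mixture is a -1-convex combination of two simplex points,
# i.e., a point of the -1-convex hull discussed above.
rho = 0.3
mix = rho * discretised_poisson(2.0, 20) + (1.0 - rho) * discretised_poisson(7.0, 20)
assert np.isclose(mix.sum(), 1.0) and np.all(mix >= 0)
```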


This section starts to explore the question of whether the simplicial structure, which describes the finite dimensional space of distributions, can extend to the infinite dimensional case. We examine some of the differences from the finite dimensional case, illustrating them with clear, commonly occurring examples.

In the previous sections, the underlying computational space is always finite dimensional. This section looks at issues related to an infinite dimensional extension of that theory. There is a great deal of literature concerning infinite dimensional statistical models. The discussion here concentrates on information geometric, parametrisation and boundary issues.

The information geometry theory of Amari [

In this paper, in contrast to previous authors, a simplicial, rather than a manifold-based, approach is taken. This allows distributions with varying support, as well as closures of statistical families to be included in the geometry. Another difference in approach is the way in which geometric structures are induced by infinite dimensional affine spaces rather than by using an intrinsic geometry. This approach was introduced by [

In exponential families, the −1-affine structure is often called the mean parametrisation, and using moments as parameters is one very important part of modelling. In the infinite dimensional case, the use of moments as a parameter system is related to the classical moment problem—when does there exist a (unique) distribution whose moments agree with a given sequence?—which has generated a vast literature in its own right; see [

The geometry of the Fisher information is also much more complex in general spaces of distributions than in exponential families. Simple mixture models, including two-component mixtures of exponential distributions [

The rest of this section looks at the topology and geometry of the infinite dimensional simplex and gives some illustrative examples, which, in particular, show the need for specific Hilbert space structures, discussed in the final section.

For simplicity and concreteness, in this section, we will be looking at models for real valued random variables. In this paper, we restrict attention to the cases where the sample space is ℝ^{+} or ℝ and has been discretised to a countably infinite set of bins, {B_i}_{i∈ℕ}, so that a distribution is represented by a sequence π = (π_i)_{i∈ℕ} and the empirical distribution, π_emp, always has finite support.

Δ_0 := ∪_{|ℐ|<∞} Δ_ℐ,

the union, over all finite subsets ℐ of the bins, of the corresponding finite dimensional simplices.

In what follows, it is important to note that, for any given statistical inference problem, the sample size is finite, so that the empirical distribution always lies in one of the finite dimensional sub-simplices.
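For instance (an illustrative sketch with our own helper name and parameter choices, not from the paper), an infinite-support distribution such as a Poisson can be projected onto a finite sub-simplex by truncation and renormalisation, with the discarded ℓ_1 mass measuring the approximation error:

```python
import math

def truncated_distribution(pmf, n_bins):
    """Project a distribution on the non-negative integers onto the finite
    sub-simplex spanned by the first n_bins bins, renormalising the retained
    mass; also return the discarded l1 tail mass."""
    p = [pmf(i) for i in range(n_bins)]
    total = sum(p)
    return [x / total for x in p], 1.0 - total

poisson3 = lambda i: math.exp(-3.0) * 3.0**i / math.factorial(i)
pmf, tail = truncated_distribution(poisson3, 30)
assert abs(sum(pmf) - 1.0) < 1e-12 and abs(tail) < 1e-12
```

For light-tailed distributions, the tail mass decays rapidly with the number of retained bins, so finite computational representations can be made accurate to machine precision.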

The ℓ_p topologies, for p ∈ [1, ∞], need careful handling: sequences of distributions in Δ can have, as their ℓ_p limit, the zero sequence, which is not itself a distribution, so that Δ is not ℓ_p closed. The ℓ_p extreme points of Δ, for p ∈ [1, ∞], are the unit sequences, e_i (i ∈ ℕ).


It is immediate that the spaces, Δ and Δ_0, are subsets of ℓ_1 and that the ℓ_1 topology on them differs from the ℓ_∞ topology.

In the same way as for the finite case, the −1-geometry can be defined through an affine space structure, using the following definition.

The −1-affine structure on the infinite dimensional simplex is given by the affine space (X_mix, V_mix, +), where X_mix := {x = (x_i)_{i∈ℕ} ∈ ℓ_1 : Σ_i x_i = 1} and V_mix := {v = (v_i)_{i∈ℕ} ∈ ℓ_1 : Σ_i v_i = 0}.

In order to define the +1-geometric structure, we also follow the approach used in the finite case. Initially, to understand the +1-structure, consider the case where all distributions have a common support, so that every component, π_i, is strictly positive.

On sets of +1-points with the same support, we can define the +1-geometry in the same way as in the finite case. With “exp” connoting an exponential family distribution, we have:

Define Δ_exp to be the set of all π ∈ Δ with every component π_i strictly positive. If two distributions lie in Δ_exp, then any normalised geometric mixture of them, with components proportional to π_{0,i}^{1−t} π_{1,i}^{t}, is also in Δ_exp, provided the normalising constant, Σ_i π_{0,i}^{1−t} π_{1,i}^{t}, is finite. This result shows that sets in Δ_exp can be joined by +1-geodesics in the same way as in the finite case.

In order to get a sense of how the +1-geometry works, let us consider a few illustrative examples.

Thus, in this example, the path joining the two distributions is an extended, rather than natural, exponential family, since we have to include the boundary point where the mean is unbounded.
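A minimal numerical sketch of the +1-geodesic construction on a finite simplex (our own illustration; `exp_geodesic` is a hypothetical name) normalises the pointwise geometric mixture, working in the log domain for stability:

```python
import numpy as np

def exp_geodesic(pi0, pi1, t):
    """Point at t on the +1 (exponential) geodesic through pi0 and pi1.

    Requires common (full) support: the path is the normalised geometric
    mixture with components proportional to pi0_i^(1-t) * pi1_i^t, computed
    in the log domain for numerical stability.
    """
    log_pi = (1.0 - t) * np.log(pi0) + t * np.log(pi1)
    w = np.exp(log_pi - log_pi.max())
    return w / w.sum()

pi0 = np.array([0.7, 0.2, 0.1])
pi1 = np.array([0.1, 0.3, 0.6])
assert np.allclose(exp_geodesic(pi0, pi1, 0.0), pi0)
assert np.allclose(exp_geodesic(pi0, pi1, 1.0), pi1)
```

In the finite case the normalising sum is always finite; in the infinite dimensional case it can diverge, which is precisely where the extended, rather than natural, exponential families are needed.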

Consider bins B_i = [x_i, x_{i+1}), where x_n = n × ϵ, for some fixed ϵ > 0.


From this example, we see that the limit points of exponential families can lie in the space, Δ_0, of distributions with finite support.


The last illustrative example is from [


Following these examples, we can consider the Hilbert space structure of exponential families inside the infinite simplex with the following results.

The spaces, Δ_exp and its complement, Δ_exp^c, in Δ behave quite differently. For π ∈ Δ_exp, every component π_i is strictly positive and, on suitable subsets, the Fisher information defines an inner product; for π ∈ Δ_exp^c, some component π_i vanishes, and the finite dimensional exponential family theory does not apply directly.

Hence, this result illustrates the point above regarding the existence of “nice” geometric structure in the sense of Amari’s information geometry developed for finite dimensional exponential families. Infinite dimensional families have a richer structure; for example, they include the possibility of having an infinite Fisher information; see Examples 7 and 10.

The authors would like to thank Karim Anaya-Izquierdo and Paul Vos for many helpful discussions and the UK’s Engineering and Physical Sciences Research Council (EPSRC) for the support of grant number EP/E017878/.

All authors contributed to the conception and design of the study, the collection and analysis of the data and the discussion of the results. All authors read and approved the final manuscript.

The authors declare no conflict of interest.

Undirected graphical model showing the cyclic graph of order four.

The envelope of a set of linear functions. Functions, dashed lines; envelope, solid lines.

Attaching a two-dimensional example to the boundary of the simplex.

Using the Edgeworth expansion near the boundary of the four-cycle model.

Spectrum of the Fisher information matrix of a discretised normal distribution.

Normalising constant for normal-Cauchy exponential mixing example.