Information geometric complexity of a trivariate Gaussian statistical model

We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about a system in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences depends not only on the amount of available information but also on the manner in which the pieces of information are correlated. Finally, we uncover that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics.


Introduction
One of the main efforts in physics is modelling and predicting natural phenomena using relevant information about the system under consideration. Theoretical physics has had a general measure of the uncertainty associated with the behaviour of a probabilistic process for more than 100 years: the Shannon entropy [1]. The Shannon information theory was applied to dynamical systems and became successful in describing their unpredictability [2].
Along a similar avenue we may set Entropic Dynamics [3], which makes use of inductive inference (Maximum Entropy Methods [4]) and Information Geometry.

Statistical models and information geometry complexity
Given $n$ real-valued random variables $X_1, \ldots, X_n$ defined on the sample space $\Omega$ with joint probability density $p : \mathbb{R}^n \to \mathbb{R}$ satisfying the conditions $p(x) \geq 0$ ($\forall x \in \mathbb{R}^n$) and $\int_{\mathbb{R}^n} dx \, p(x) = 1$, let us consider a family $P$ of such distributions and suppose that they can be parametrized using $m$ real-valued variables $(\theta^1, \ldots, \theta^m)$, so that $P = \{p_\theta = p(x|\theta) \mid \theta = (\theta^1, \ldots, \theta^m) \in \Theta\}$, where $\Theta \subseteq \mathbb{R}^m$ is the parameter space and the mapping $\theta \to p_\theta$ is injective. In such a way, $P$ is an $m$-dimensional statistical model on $\mathbb{R}^n$. The mapping $\varphi : P \to \mathbb{R}^m$ defined by $\varphi(p_\theta) = \theta$ allows us to consider $\varphi = [\theta^i]$ as a coordinate system for $P$. Assuming parametrizations which are $C^\infty$, we can turn $P$ into a $C^\infty$ differentiable manifold (thus, $P$ is called a statistical manifold) [5].
The values $x_1, \ldots, x_n$ taken by the random variables define the micro-state of the system, while the values $\theta^1, \ldots, \theta^m$ taken by the parameters define the macro-state of the system.
Let $P = \{p_\theta \mid \theta \in \Theta\}$ be an $m$-dimensional statistical model. Given a point $\theta$, the Fisher information matrix of $P$ at $\theta$ is the $m \times m$ matrix $G(\theta) = [g_{ij}]$, whose $(i, j)$ entry is defined by
$$g_{ij}(\theta) := \int_{\mathbb{R}^n} dx \, p(x|\theta) \, \partial_i \log p(x|\theta) \, \partial_j \log p(x|\theta), \tag{1}$$
with $\partial_i$ standing for $\partial / \partial \theta^i$. The matrix $G(\theta)$ is symmetric, positive semidefinite and determines a Riemannian metric on the parameter space $\Theta$ [5]. Hence, it is possible to define a Riemannian statistical manifold $M := (\Theta, g)$, where $g = g_{ij} \, d\theta^i \otimes d\theta^j$ ($i, j = 1, \ldots, m$) is the metric whose components $g_{ij}$ are given by Eq. (1) (throughout the paper we use the Einstein sum convention).
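As an illustration of the definition of the Fisher information matrix, the following minimal sketch approximates the integral numerically for a univariate Gaussian with coordinates $(\mu, \sigma)$. The choice of model, the function names, the integration window and the grid size are all assumptions made for this example; the score components are the analytic derivatives of $\log p$.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def score(x, mu, sigma):
    """Analytic derivatives of log p(x|mu, sigma) with respect to (mu, sigma)."""
    z = (x - mu) / sigma
    return (z / sigma, (z * z - 1.0) / sigma)

def fisher_matrix(mu, sigma, half_width=12.0, steps=20001):
    """Trapezoidal approximation of g_ij = integral of p * d_i(log p) * d_j(log p).

    The integration window [mu - half_width*sigma, mu + half_width*sigma]
    is an assumption; the Gaussian tails outside it are negligible.
    """
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / (steps - 1)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = a + k * h
        w = h * (0.5 if k in (0, steps - 1) else 1.0)  # trapezoid end-point weights
        p = gaussian_pdf(x, mu, sigma)
        s = score(x, mu, sigma)
        for i in range(2):
            for j in range(2):
                g[i][j] += w * p * s[i] * s[j]
    return g

G = fisher_matrix(mu=0.3, sigma=1.7)
```

For this model the result should be close to the diagonal matrix with entries $1/\sigma^2$ and $2/\sigma^2$, independently of $\mu$.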
Given the Riemannian manifold $M = (\Theta, g)$, it is well known that there exists only one linear connection $\nabla$ (the Levi-Civita connection) on $M$ that is compatible with the metric $g$ and symmetric [12]. We remark that the manifold $M$ has a single chart, since $\Theta$ is an open set of $\mathbb{R}^m$, and the Levi-Civita connection is uniquely defined by means of the Christoffel coefficients
$$\Gamma^k_{ij} = \frac{1}{2} g^{kl} \left( \partial_i g_{lj} + \partial_j g_{il} - \partial_l g_{ij} \right), \tag{2}$$
where $g^{kl}$ is the $(k, l)$ entry of the inverse of the Fisher matrix $G(\theta)$. The idea of curvature is the fundamental tool to understand the geometry of the manifold $M = (\Theta, g)$. Actually, it is the basic geometric invariant and the intrinsic way to obtain it is by means of geodesics. It is well known that, given any point $\theta \in M$ and any vector $v$ tangent to $M$ at $\theta$, there is a unique geodesic starting at $\theta$ with initial tangent vector $v$. Indeed, within the considered coordinate system, the geodesics are solutions of the following nonlinear second-order coupled ordinary differential equations [12]
$$\frac{d^2 \theta^k}{d\tau^2} + \Gamma^k_{ij} \frac{d\theta^i}{d\tau} \frac{d\theta^j}{d\tau} = 0, \tag{3}$$
with $\tau$ denoting the time. The recipe to compute the curvature at a point $\theta \in M$ is the following: first, select a 2-dimensional subspace $\Pi$ of the tangent space to $M$ at $\theta$; second, follow the geodesics through $\theta$ whose initial tangent vectors lie in $\Pi$ and consider the 2-dimensional submanifold $S_\Pi$ swept out by them, which inherits a Riemannian metric from $M$; finally, compute the Gaussian curvature of $S_\Pi$ at $\theta$, which can be obtained from its Riemannian metric as stated in the Theorema Egregium [13]. The number $K(\Pi)$ found in this manner is called the sectional curvature of $M$ at $\theta$ associated with the plane $\Pi$. In terms of local coordinates, to compute the sectional curvature we need the curvature tensor,
$$R^h_{\ ijk} = \partial_i \Gamma^h_{jk} - \partial_j \Gamma^h_{ik} + \Gamma^l_{jk} \Gamma^h_{il} - \Gamma^l_{ik} \Gamma^h_{jl}. \tag{4}$$
For any basis $(\xi, \eta)$ of a 2-plane $\Pi \subset T_\theta M$, the sectional curvature at $\theta \in M$ is given by [12]
$$K(\xi, \eta) = \frac{R(\xi, \eta, \eta, \xi)}{\langle \xi, \xi \rangle \langle \eta, \eta \rangle - \langle \xi, \eta \rangle^2}, \tag{5}$$
where $R$ is the Riemann curvature tensor, which is written in coordinates as $R = R_{ijkl} \, d\theta^i \otimes d\theta^j \otimes d\theta^k \otimes d\theta^l$ with $R_{ijkl} = g_{lh} R^h_{\ ijk}$, and $\langle \cdot, \cdot \rangle$ is the inner product defined by the metric $g$.
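The Christoffel coefficients of Eq. (2) can be evaluated numerically from any metric given as a function of the coordinates. The sketch below does this with central finite differences for a two-dimensional manifold. The example metric $g = \mathrm{diag}(1/\sigma^2, 2/\sigma^2)$, which is the Fisher-Rao metric of a univariate Gaussian in the coordinates $(\mu, \sigma)$, as well as the function names and the step size, are assumptions chosen for illustration.

```python
def metric(theta):
    """Example metric (an assumption): g = diag(1/sigma^2, 2/sigma^2), theta = (mu, sigma)."""
    _, sigma = theta
    return [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric_fn, theta, l, eps=1e-5):
    """Central finite difference of g_ij along coordinate l."""
    up, down = list(theta), list(theta)
    up[l] += eps
    down[l] -= eps
    g_up, g_down = metric_fn(up), metric_fn(down)
    return [[(g_up[i][j] - g_down[i][j]) / (2.0 * eps) for j in range(2)] for i in range(2)]

def christoffel(metric_fn, theta):
    """gamma[k][i][j] = 1/2 g^{kl} (d_i g_lj + d_j g_il - d_l g_ij)."""
    g_inv = inverse_2x2(metric_fn(theta))
    dg = [metric_derivative(metric_fn, theta, l) for l in range(2)]  # dg[l][i][j] = d_l g_ij
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    for k in range(2):
        for i in range(2):
            for j in range(2):
                gamma[k][i][j] = 0.5 * sum(
                    g_inv[k][l] * (dg[i][l][j] + dg[j][i][l] - dg[l][i][j])
                    for l in range(2)
                )
    return gamma

gamma = christoffel(metric, (0.0, 2.0))
```

With zero-based indices, `gamma[0][0][1]` corresponds to $\Gamma^1_{12}$, `gamma[1][0][0]` to $\Gamma^2_{11}$ and `gamma[1][1][1]` to $\Gamma^2_{22}$.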
The sectional curvature is directly related to the topology of the manifold; along this direction the Cartan-Hadamard Theorem [13] is enlightening, stating that any complete, simply connected $n$-dimensional manifold with non-positive sectional curvature is diffeomorphic to $\mathbb{R}^n$.
On the statistical manifold $M = (\Theta, g)$ we can regard the macro-variables $\theta$ as accessible information and then derive the information dynamical equation (3) from a standard principle of least action of Jacobi type [3]. The geodesic equations (3) describe a reversible dynamics whose solution is the trajectory between an initial macrostate $\theta_{\text{initial}}$ and a final macrostate $\theta_{\text{final}}$. The trajectory can be equally traversed in both directions [10]. Actually, an equation relating instability with geometry exists, and it gives hope that some global information about the average degree of instability (chaos) of the dynamics is encoded in global properties of the statistical manifolds [7]. That this might happen is shown by the special case of constant-curvature manifolds, for which the Jacobi-Levi-Civita equation simplifies to [7]
$$\frac{d^2 J^i}{d\tau^2} + K J^i = 0, \tag{6}$$
where $K$ is the constant sectional curvature of the manifold (see (5)) and $J$ is the geodesic deviation vector field. On a positively curved manifold, the norm of the separating vector $J$ does not grow, whereas on a negatively curved manifold the norm of $J$ grows exponentially in time; if the manifold is compact, so that its geodesics are sooner or later obliged to fold, this provides an example of chaotic geodesic motion [14]. Taking these facts into consideration, we single out as a suitable indicator of dynamical (temporal) complexity the information geometric complexity (IGC), defined as the average dynamical statistical volume [15]
$$\widetilde{\mathrm{vol}}\left[ D_\Theta^{(\text{geodesic})}(\tau) \right] := \frac{1}{\tau} \int_0^\tau d\tau' \, \mathrm{vol}\left[ D_\Theta^{(\text{geodesic})}(\tau') \right], \tag{7}$$
where
$$\mathrm{vol}\left[ D_\Theta^{(\text{geodesic})}(\tau') \right] := \int_{D_\Theta^{(\text{geodesic})}(\tau')} \sqrt{\det G(\theta)} \, d\theta, \tag{8}$$
with $G(\theta)$ the information matrix whose components are given by Eq. (1). The integration space $D_\Theta^{(\text{geodesic})}(\tau')$ is defined as
$$D_\Theta^{(\text{geodesic})}(\tau') := \left\{ \theta = (\theta^1, \ldots, \theta^m) : \theta^k(0) \leq \theta^k \leq \theta^k(\tau') \right\}, \tag{9}$$
where $\theta^k \equiv \theta^k(s)$ with $0 \leq s \leq \tau'$ such that $\theta^k(s)$ satisfies (3). The quantity $\mathrm{vol}[D_\Theta^{(\text{geodesic})}(\tau')]$ is the volume of the effective parameter space explored by the system at time $\tau'$. The temporal average has been introduced in order to average out the possibly very complex fine details of the entropic dynamical description of the system's complexity dynamics.
Relevant properties of the quantity (8), concerning the complexity of geodesic paths on curved statistical manifolds and its comparison with the Jacobi vector field, are discussed in Ref. [16].
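The temporal average in Eq. (7) is straightforward to evaluate numerically once a volume profile $\mathrm{vol}(\tau')$ is available. The sketch below uses the trapezoidal rule on two purely hypothetical profiles, one growing linearly and one decaying exponentially; neither the profiles nor their parameter values are taken from the models analysed in this paper. They only illustrate how a linearly growing volume gives a linearly growing average, while an exponentially decaying volume gives an average that falls off like $1/\tau$.

```python
import math

def time_averaged_volume(volume, tau, steps=10000):
    """Trapezoidal approximation of (1/tau) * integral_0^tau volume(t) dt."""
    h = tau / steps
    total = 0.5 * (volume(0.0) + volume(tau))
    for k in range(1, steps):
        total += volume(k * h)
    return total * h / tau

def linear_volume(t, a1=0.8, a2=0.5):
    """Hypothetical volume growing linearly in time (illustrative constants)."""
    return a1 * t + a2

def decaying_volume(t, rate=1.0):
    """Hypothetical volume decaying exponentially in time (illustrative rate)."""
    return math.exp(-rate * t)

average_linear = time_averaged_volume(linear_volume, 10.0)     # analytic value: a1*tau/2 + a2
average_decaying = time_averaged_volume(decaying_volume, 50.0)  # analytic value: (1 - exp(-tau))/tau
```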

The Gaussian statistical model
In the following we devote our attention to a Gaussian statistical model $P$ whose elements are multivariate normal joint distributions for $n$ real-valued variables $X_1, \ldots, X_n$ given by
$$p(x|\theta) = \frac{1}{\sqrt{(2\pi)^n \det C}} \exp\left[ -\frac{1}{2} (x - \mu)^T C^{-1} (x - \mu) \right], \tag{10}$$
where $\mu = (E(X_1), \ldots, E(X_n))$ is the $n$-dimensional mean vector and $C$ denotes the $n \times n$ covariance matrix with entries $c_{ij} = E(X_i X_j) - E(X_i) E(X_j)$. Since $\mu$ is an $n$-dimensional real vector and $C$ is an $n \times n$ symmetric matrix, the number of parameters involved in this model is $n + \frac{n(n+1)}{2}$. Moreover, $C$ is a symmetric, positive definite matrix, hence the parameter space is given by
$$\Theta := \left\{ (\mu, C) \mid \mu \in \mathbb{R}^n, \ C \in \mathbb{R}^{n \times n}, \ C > 0 \right\}. \tag{11}$$
Hereafter we consider the statistical model given by Eq. (10) when the covariance matrix $C$ has only the variances $\sigma_i^2 = E(X_i^2) - (E(X_i))^2$ as parameters. In fact we assume that the off-diagonal entry $(i, j)$ of the covariance matrix $C$ equals $\rho \sigma_i \sigma_j$, with $\rho \in \mathbb{R}$ quantifying the degree of correlation.
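We may further notice that the function $f_{ij}(x) := \partial_i \log p(x|\theta) \, \partial_j \log p(x|\theta)$, when $p(x|\theta)$ is given by (10), is a polynomial in the variables $x_i$ ($i = 1, \ldots, n$) whose degree is not greater than four. Indeed, we have that
$$\partial_i \log p(x|\theta) = \partial_i \left( -\frac{1}{2} \log\left[ (2\pi)^n \det C \right] - \frac{1}{2} (x - \mu)^T C^{-1} (x - \mu) \right),$$
and, therefore, the differentiation does not affect the variables $x_i$. With this in mind, in order to compute the integral in (1), we can use the following formula [17]
$$\frac{1}{\sqrt{(2\pi)^n \det C}} \int dx \, f_{ij}(x) \exp\left[ -\frac{1}{2} (x - \mu)^T C^{-1} (x - \mu) \right] = \exp\left[ \frac{1}{2} \sum_{h,k=1}^n c_{hk} \frac{\partial^2}{\partial x_h \partial x_k} \right] f_{ij} \Big|_{x = \mu}, \tag{12}$$
where the exponential denotes the power series over its argument (the differential operator).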
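The operator formula quoted from [17] can be checked in one dimension, where it reduces to $E[f(X)] = \sum_k \frac{(\sigma^2/2)^k}{k!} f^{(2k)}(\mu)$ for a polynomial $f$ and $X \sim N(\mu, \sigma^2)$. The sketch below implements this series for polynomials stored as coefficient lists. The representation and the function names are assumptions made for the example; the series terminates because repeated differentiation of a polynomial eventually gives zero.

```python
import math

def derivative(coeffs):
    """Derivative of a polynomial [c0, c1, c2, ...], where c_k multiplies x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def gaussian_expectation(coeffs, mu, sigma):
    """E[f(X)] for X ~ N(mu, sigma^2) via the power series of exp(sigma^2/2 * d^2/dx^2)."""
    total, term, k = 0.0, list(coeffs), 0
    while term:  # stops once the (2k)-th derivative is identically zero
        total += (sigma**2 / 2.0) ** k / math.factorial(k) * evaluate(term, mu)
        term = derivative(derivative(term))
        k += 1
    return total

second_moment = gaussian_expectation([0, 0, 1], mu=1.0, sigma=2.0)        # f(x) = x^2
fourth_moment = gaussian_expectation([0, 0, 0, 0, 1], mu=1.0, sigma=2.0)  # f(x) = x^4
```

The two results can be compared with the textbook moments $\mu^2 + \sigma^2$ and $\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$.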

The monovariate Gaussian statistical model
We now start to apply the concepts of the previous section to the Gaussian statistical model of Eq. (10) for $n = 1$. In this case, the dimension of the statistical Riemannian manifold $M = (\Theta, g)$ is at most two. Indeed, to describe elements of the statistical model $P$ given by Eq. (10), we basically need the mean $\mu = E(X)$ and the variance $\sigma^2 = E(X - \mu)^2$. We deal separately with the cases when the monovariate model has only $\mu$ as macro-variable (Case 1), when $\sigma$ is the unique macro-variable (Case 2), and finally when both $\mu$ and $\sigma$ are macro-variables (Case 3).

Case 1
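Consider the monovariate model with only $\mu$ as macro-variable by setting $\sigma = 1$. In this case the manifold $M$ is trivially the flat real line, since $\mu \in (-\infty, +\infty)$. Indeed, the integral in (1) is equal to 1 when the distribution is $p(x|\mu) = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2} \right]$, so the metric is $g = d\mu^2$. Furthermore, from Eqs. (2) and (3) we obtain the geodesics $\mu(\tau) = A_1 \tau + A_2$, with $A_1, A_2 \in \mathbb{R}$, and the volume of Eq. (8) is $\mathrm{vol} = \int d\mu = A_1 \tau + A_2$; since this quantity must be positive we assume $A_1, A_2 > 0$. Finally, the behaviour of the IGC (7) is
$$\widetilde{\mathrm{vol}} = \frac{A_1}{2} \tau + A_2. \tag{13}$$
This shows that the complexity increases linearly in time, meaning that acquiring information about $\mu$ and updating it is not enough to increase our knowledge about the micro-state of the system.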

Case 2
Consider now the monovariate Gaussian statistical model of Eq. (10) when $\mu = E(X) = 0$ and the only macro-variable is $\sigma$. In this case the probability distribution function reads $p(x|\sigma) = \frac{1}{\sqrt{2\pi} \, \sigma} \exp\left[ -\frac{x^2}{2\sigma^2} \right]$, while the Fisher-Rao metric becomes $g = \frac{2}{\sigma^2} d\sigma^2$. Emphasizing that in this case the manifold is flat as well, we derive the information dynamics by means of Eqs. (2) and (3) and obtain the geodesic $\sigma(\tau)$ in closed form. Again, positivity of the volume (8) constrains the integration constants, and the resulting IGC, Eq. (14), grows linearly in $\tau$. This shows that also in this case the complexity increases linearly in time, meaning that acquiring information about $\sigma$ and updating it is not enough to increase our knowledge about the micro-state of the system.

Case 3
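The take-home message of the previous cases is that we have to account for both the mean $\mu$ and the variance $\sigma$ as macro-variables to look for a possibly non-increasing complexity. Hence, consider the probability distribution function given by
$$p(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi} \, \sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]. \tag{15}$$
The dimension of the Riemannian manifold $M = (\Theta, g)$ is two, where the parameter space is $\Theta = \{ (\mu, \sigma) \mid \mu \in (-\infty, +\infty), \ \sigma > 0 \}$ and the Fisher-Rao metric reads $g = \frac{1}{\sigma^2} d\mu^2 + \frac{2}{\sigma^2} d\sigma^2$. Here, the sectional curvature given by Eq. (5) is a negative function and, despite the fact that it is not constant, we expect a decreasing behaviour in time of the IGC. Thanks to Eq. (2), we find that the only non-vanishing Christoffel coefficients are $\Gamma^1_{12} = -\frac{1}{\sigma}$, $\Gamma^2_{11} = \frac{1}{2\sigma}$ and $\Gamma^2_{22} = -\frac{1}{\sigma}$. Substituting them into Eq. (3) we derive the following geodesic equations
$$\frac{d^2 \mu}{d\tau^2} - \frac{2}{\sigma} \frac{d\mu}{d\tau} \frac{d\sigma}{d\tau} = 0, \qquad \frac{d^2 \sigma}{d\tau^2} - \frac{1}{\sigma} \left( \frac{d\sigma}{d\tau} \right)^2 + \frac{1}{2\sigma} \left( \frac{d\mu}{d\tau} \right)^2 = 0. \tag{16}$$
The integration of the above coupled differential equations is non-trivial. We follow the method described in [10] and arrive at the closed-form solutions $\mu(\tau)$ and $\sigma(\tau)$ of Eq. (17), in which $\sigma_0$ and $A_1$ are real constants. Then, using (17), we evaluate the volume of Eq. (8), obtaining Eq. (18).
Since the last quantity must be positive, we assume $A_1 > 0$. Finally, employing the above expression into Eq. (7) we arrive at the asymptotic IGC of Eq. (19). We can now see a reduction in time of the complexity, meaning that acquiring information about both $\mu$ and $\sigma$ and updating them allows us to increase our knowledge about the micro-state of the system.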
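The geodesic equations that follow from the quoted Christoffel coefficients, $\ddot{\mu} = \frac{2}{\sigma} \dot{\mu} \dot{\sigma}$ and $\ddot{\sigma} = \frac{1}{\sigma} \dot{\sigma}^2 - \frac{1}{2\sigma} \dot{\mu}^2$, can also be integrated numerically. The sketch below uses a classical fourth-order Runge-Kutta scheme; it is not the analytic method of [10], and the initial condition, step count and function names are assumptions made for illustration. The squared speed $(\dot{\mu}^2 + 2 \dot{\sigma}^2)/\sigma^2$ is conserved along geodesics of the metric $g = \frac{1}{\sigma^2} d\mu^2 + \frac{2}{\sigma^2} d\sigma^2$ and serves as a consistency check.

```python
import math

def geodesic_rhs(state):
    """First-order form of the geodesic equations; state = (mu, sigma, v_mu, v_sigma)."""
    _, sigma, v_mu, v_sigma = state
    a_mu = 2.0 * v_mu * v_sigma / sigma
    a_sigma = (v_sigma**2 - 0.5 * v_mu**2) / sigma
    return (v_mu, v_sigma, a_mu, a_sigma)

def shifted(state, slope, factor):
    """Return state + factor * slope, component by component."""
    return tuple(s + factor * k for s, k in zip(state, slope))

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2.0))
    k3 = geodesic_rhs(shifted(state, k2, h / 2.0))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def speed_squared(state):
    """Squared speed (v_mu^2 + 2 v_sigma^2) / sigma^2, constant along a geodesic."""
    _, sigma, v_mu, v_sigma = state
    return (v_mu**2 + 2.0 * v_sigma**2) / sigma**2

def integrate(state, tau, steps=20000):
    """Integrate the geodesic from time 0 to tau with a fixed step."""
    h = tau / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

# Illustrative initial condition: mu = 0, sigma = 1, unit velocity along mu.
final = integrate((0.0, 1.0, 1.0, 0.0), tau=5.0)
```

For this particular initial condition one can verify by substitution that $\sigma(\tau) = 1/\cosh(\tau/\sqrt{2})$ solves the equations, so the numerical value of `final[1]` can be compared with it; $\sigma$ decreases monotonically while $\mu$ saturates.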
Hence, comparing Eqs. (13), (14) and (19) we conclude that entropic inference on a Gaussian-distributed micro-variable is carried out more efficiently when both its mean and its variance are available as information constraints. Macroscopic predictions made when only one of these pieces of information is available are more complex.

Bivariate Gaussian statistical model
Consider now the Gaussian statistical model $P$ of Eq. (10) when $n = 2$. In this case the dimension of the Riemannian manifold $M = (\Theta, g)$ is at most four. From the analysis of the monovariate Gaussian model in Section 3.1 we have understood that both mean and variance should be considered. Hence the minimal assumption is to consider $E(X_1) = E(X_2) = \mu$ and $E(X_1 - \mu)^2 = E(X_2 - \mu)^2 = \sigma^2$. Furthermore, in this case we also have to take into account the possible presence of (micro) correlations, which appear at the level of macro-states as off-diagonal terms in the covariance matrix. In short, this implies considering the following probability distribution function
$$p(x_1, x_2 | \mu, \sigma) = \frac{1}{2\pi \sigma^2 \sqrt{1 - \rho^2}} \exp\left[ -\frac{(x_1 - \mu)^2 - 2\rho (x_1 - \mu)(x_2 - \mu) + (x_2 - \mu)^2}{2 \sigma^2 (1 - \rho^2)} \right], \tag{20}$$
where $\rho \in (-1, 1)$.
Thanks to Eq. (12) we compute the Fisher information matrix $G$ and find $g = g_{11} \, d\mu^2 + g_{22} \, d\sigma^2$ with
$$g_{11} = \frac{2}{\sigma^2 (1 + \rho)}, \qquad g_{22} = \frac{4}{\sigma^2}. \tag{21}$$
The only non-vanishing Christoffel coefficients (2) are $\Gamma^1_{12} = -\frac{1}{\sigma}$, $\Gamma^2_{11} = \frac{1}{2\sigma(\rho + 1)}$ and $\Gamma^2_{22} = -\frac{1}{\sigma}$. In this case as well, the sectional curvature (Eq. (5)) of the manifold $M$ is a negative function and so we may expect a decreasing asymptotic behaviour for the IGC. From Eq. (3) it follows that the geodesic equations are
$$\frac{d^2 \mu}{d\tau^2} - \frac{2}{\sigma} \frac{d\mu}{d\tau} \frac{d\sigma}{d\tau} = 0, \qquad \frac{d^2 \sigma}{d\tau^2} - \frac{1}{\sigma} \left( \frac{d\sigma}{d\tau} \right)^2 + \frac{1}{2\sigma(1 + \rho)} \left( \frac{d\mu}{d\tau} \right)^2 = 0, \tag{22}$$
whose solutions are given in Eq. (23). Using (23) in Eq. (8) gives the volume, Eq. (24). To have it positive we have to assume $A_1 > 0$. Finally, employing (24) in (7) leads to the IGC of Eq. (25), with $\rho \in (-1, 1)$. We may compare the asymptotic expressions of the IGCs in the presence and in the absence of correlations, obtaining the ratio $R^{\text{strong}}_{\text{bivariate}}(\rho)$ of Eq. (26), where "strong" stands for the fully connected lattice underlying the micro-variables. The ratio $R^{\text{strong}}_{\text{bivariate}}(\rho)$ is a monotonically increasing function of $\rho$. While the temporal behaviour of the IGC (25) is similar to that of the IGC in (19), here correlations play a fundamental role. From Eq. (26), we conclude that entropic inference on two Gaussian-distributed micro-variables on a fully connected lattice is carried out more efficiently when the two micro-variables are negatively correlated. Instead, when such micro-variables are positively correlated, macroscopic predictions become more complex than in the absence of correlations.
Intuitively, this is due to the fact that for anticorrelated variables, an increase in one variable implies a decrease in the other one (different directional change): the variables become more distant, and thus more distinguishable in the Fisher-Rao information metric sense. Similarly, for positively correlated variables, an increase or decrease in one variable always predicts the same directional change for the second variable: the variables do not become more distant, and thus do not become more distinguishable in the Fisher-Rao information metric sense. This may lead us to guess that in the presence of anticorrelations, motion on curved statistical manifolds via the Maximum Entropy updating methods becomes less complex.

Trivariate Gaussian statistical model
In this section we consider the Gaussian statistical model $P$ of Eq. (10) when $n = 3$. In this case as well, in order to understand the asymptotic behaviour of the IGC in the presence of correlations between the micro-states, we make the minimal assumption that, given the random vector $X = (X_1, X_2, X_3)$ distributed according to a trivariate Gaussian, $E(X_1) = E(X_2) = E(X_3) = \mu$ and $E(X_1 - \mu)^2 = E(X_2 - \mu)^2 = E(X_3 - \mu)^2 = \sigma^2$. Therefore, the space of the parameters of $P$ is given by $\Theta = \{ (\mu, \sigma) \mid \mu \in \mathbb{R}, \ \sigma > 0 \}$.
The manifold $M = (\Theta, g)$ changes its metric structure depending on the number of correlations between micro-variables, namely, one, two, or three. The covariance matrices corresponding to these cases read, modulo the congruence via a permutation matrix [17],
$$C_1 = \sigma^2 \begin{pmatrix} 1 & \rho & 0 \\ \rho & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad C_2 = \sigma^2 \begin{pmatrix} 1 & \rho & 0 \\ \rho & 1 & \rho \\ 0 & \rho & 1 \end{pmatrix}, \qquad C_3 = \sigma^2 \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}. \tag{27}$$

Case 1
First, we consider the trivariate Gaussian statistical model of Eq. (10) when $C \equiv C_1$. Proceeding as in Section 3.2 we have $g = g_{11} \, d\mu^2 + g_{22} \, d\sigma^2$, where $g_{11} = \frac{3 + \rho}{(1 + \rho) \sigma^2}$ and $g_{22} = \frac{6}{\sigma^2}$. Also in this case we find that the sectional curvature of Eq. (5) is a negative function. Hence, as stated in Section 2, we may expect a decreasing (in time) behaviour of the information geometric complexity. Furthermore, we obtain the geodesics of Eq. (28), in which $A(\rho) = \frac{A_1^2 (3 + \rho)}{6 (1 + \rho)}$ and $A_1 \in \mathbb{R}$. We remark that $A(\rho) > 0$ for all $\rho \in (-1, 1)$. Then the volume (8) takes the form of Eq. (29), which requires $A_1 > 0$ for its positivity. Finally, using (29) in (7) we arrive at the asymptotic behaviour of the IGC, Eq. (30). Comparing (30) in the presence and in the absence of correlations yields the ratio $R^{\text{weak}}_{\text{trivariate}}(\rho)$ of Eq. (31), where "weak" stands for a low degree of connection in the lattice underlying the micro-variables. Notice that $R^{\text{weak}}_{\text{trivariate}}(\rho)$ is a monotonically increasing function of the argument $\rho \in (-1, 1)$.

Case 2
When the trivariate Gaussian statistical model of Eq. (10) has $C \equiv C_2$, the condition $C > 0$ constrains the correlation coefficient to be $\rho \in \left( -\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2} \right)$. Proceeding again as in Section 3.2 we have $g = g_{11} \, d\mu^2 + g_{22} \, d\sigma^2$, where $g_{11} = \frac{3 - 4\rho}{(1 - 2\rho^2) \sigma^2}$ and $g_{22} = \frac{6}{\sigma^2}$. The sectional curvature of Eq. (5) is a negative function as well, and so we may apply the arguments of Section 2 and expect a decrease in time of the complexity. Furthermore, we obtain the geodesics of Eq. (32), in which $A(\rho) = \frac{A_1^2 (3 - 4\rho)}{6 (1 - 2\rho^2)}$ and $A_1 \in \mathbb{R}$. We have to set $A_1 > 0$ for the positivity of the volume (33), and using it in (7) we arrive at the asymptotic behaviour of the IGC, Eq. (34). Then, comparing (34) in the presence and in the absence of correlations yields the ratio $R^{\text{mildly weak}}_{\text{trivariate}}(\rho)$ of Eq. (35), where "mildly weak" stands for a lattice (underlying the micro-variables) that is neither fully connected nor minimally connected. This is a non-monotonic function of the argument $\rho \in \left( -\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2} \right)$: it attains its maximum at $\rho = \frac{1}{2}$, while at the extrema of the interval it tends to zero.

Case 3
Last, we consider the trivariate Gaussian statistical model of Eq. (10) when $C \equiv C_3$. In this case, the condition $C > 0$ requires the correlation coefficient to be $\rho \in \left( -\frac{1}{2}, 1 \right)$. Proceeding again as in Section 3.2 we have $g = g_{11} \, d\mu^2 + g_{22} \, d\sigma^2$, where $g_{11} = \frac{3}{(1 + 2\rho) \sigma^2}$ and $g_{22} = \frac{6}{\sigma^2}$. We find that the sectional curvature of Eq. (5) is a negative function; hence, we may expect a decreasing (in time) behaviour of the complexity. The geodesics of Eq. (36) follow, in which $A(\rho) = \frac{A_1^2}{2 (1 + 2\rho)}$ and $A_1 \in \mathbb{R}$. We note that $A(\rho) > 0$ for all $\rho \in \left( -\frac{1}{2}, 1 \right)$. Using (36), we compute the volume, Eq. (37). Also in this case we need to assume $A_1 > 0$ to have a positive volume. Finally, substituting Eq. (37) into Eq. (7), we obtain the asymptotic behaviour of the IGC, Eq. (38). The comparison of (38) in the presence and in the absence of correlations yields the ratio $R^{\text{strong}}_{\text{trivariate}}(\rho)$ of Eq. (39), where "strong" stands for a fully connected lattice underlying the (three) micro-variables. We remark that the latter ratio is a monotonically increasing function of the argument $\rho \in \left( -\frac{1}{2}, 1 \right)$.
The non-monotonic behavior of the ratio $R^{\text{mildly weak}}_{\text{trivariate}}(\rho)$ in Eq. (35) corresponds to the information geometric complexities for the mildly weak connected three-dimensional lattice. Interestingly, the growth stops at a critical value $\rho_{\text{peak}} = \frac{1}{2}$, at which $R^{\text{mildly weak}}_{\text{trivariate}}(\rho_{\text{peak}}) = R^{\text{strong}}_{\text{bivariate}}(\rho_{\text{peak}})$. From Eq. (39), we conclude that entropic inference on three Gaussian-distributed micro-variables on a fully connected lattice is carried out more efficiently when the three micro-variables are negatively correlated. Instead, when such micro-variables are positively correlated, macroscopic predictions become more complex than in the absence of correlations. Furthermore, the ratio $R^{\text{strong}}_{\text{trivariate}}(\rho)$ of the information geometric complexities for this fully connected three-dimensional lattice increases monotonically. These conclusions are similar to those presented for the bivariate case. However, there is a key feature of the IGC to emphasize when passing from the two-dimensional to the three-dimensional manifolds associated with fully connected lattices: the effects of negative and positive correlations are amplified with respect to the corresponding uncorrelated scenarios, where $\rho \in \left( -\frac{1}{2}, 1 \right)$. Specifically, carrying out entropic inferences on the higher-dimensional manifold in the presence of anti-correlations, that is for $\rho \in \left( -\frac{1}{2}, 0 \right)$, is less complex than on the lower-dimensional manifold, as is evident from Eq. (40). The converse is true in the presence of positive correlations, that is for $\rho \in (0, 1)$.
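The metric components $g_{11}$ quoted for the three trivariate cases can be cross-checked numerically, since for a common mean $\mu$ the Fisher information is $g_{11} = \mathbf{1}^T C^{-1} \mathbf{1}$. The sketch below does this under two explicit assumptions: the correlation structures labelled "weak", "mildly weak" and "strong" have one, two and three non-zero off-diagonal pairs equal to $\rho$ (a form chosen because it reproduces the quoted $g_{11}$ values), and the complexity ratio is taken to be a monotone function of $g_{11}(0)/g_{11}(\rho)$, which is enough to locate its peak. All function names are hypothetical.

```python
def inverse_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def correlation_matrix(case, rho):
    """Assumed correlation structures (sigma = 1) for the three trivariate cases."""
    if case == "weak":
        return [[1.0, rho, 0.0], [rho, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if case == "mildly weak":
        return [[1.0, rho, 0.0], [rho, 1.0, rho], [0.0, rho, 1.0]]
    if case == "strong":
        return [[1.0, rho, rho], [rho, 1.0, rho], [rho, rho, 1.0]]
    raise ValueError("unknown case: " + case)

def precision_sum(matrix):
    """Sum of all entries of the inverse, i.e. sigma^2 * g_11 for a common mean."""
    return sum(sum(row) for row in inverse_3x3(matrix))

def normalized_ratio(case, rho):
    """g_11(0) / g_11(rho); the factor sigma^2 cancels."""
    return precision_sum(correlation_matrix(case, 0.0)) / precision_sum(correlation_matrix(case, rho))

# Scan the mildly weak case on a grid inside (-sqrt(2)/2, sqrt(2)/2).
grid = [k / 1000.0 for k in range(-700, 701)]
rho_peak = max(grid, key=lambda r: normalized_ratio("mildly weak", r))
```

On this grid the maximum falls at $\rho = 0.5$, and the value of `normalized_ratio("mildly weak", 0.5)` coincides with the corresponding bivariate quantity $1 + \rho$ evaluated at the same point, in line with the equality of the two ratios at $\rho_{\text{peak}}$.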

Concluding remarks
In summary, we have considered low-dimensional Gaussian statistical models (up to a trivariate model) and investigated their dynamical (temporal) complexity. This has been quantified by the volume of geodesics for the parameters characterizing the probability distribution functions.
We uncover that, in order to have a reduction in time of the complexity, one has to consider both mean and variance as macro-variables. This leads to different topological structures of the parameter space in (11); in particular, we have to consider at least a 2-dimensional manifold in order to have effects such as a power-law decay of the complexity. Hence, the minimal hypothesis in a multivariate Gaussian model consists in considering all mean values equal and all covariances equal. In such a case, however, the complexity shows interesting features depending on the correlation among micro-variables (as summarized in Fig. 1). For a trivariate model with only two correlations, the information geometric complexity ratio exhibits a non-monotonic behaviour in $\rho$ (the correlation parameter), taking the value zero at the extrema of the range of $\rho$. In contrast, for closed configurations (bivariate and trivariate models with all micro-variables correlated with each other) the complexity ratio exhibits a monotonic behaviour in terms of the correlation parameter. The fact that in such a case this ratio cannot be zero at the extrema of the range of $\rho$ is reminiscent of the geometric frustration phenomenon that occurs in the presence of loops [11]. Specifically, recall that a geometrically frustrated system cannot simultaneously minimize all interactions because of geometric constraints [11,18]. For example, geometric frustration can occur in an Ising model, which is an array of spins (for instance, atoms that can take states ±1) that are magnetically coupled to each other. If one spin is, say, in the +1 state, then it is energetically favorable for its immediate neighbours to be in the same state in the case of a ferromagnetic model. On the contrary, in antiferromagnetic systems, nearest-neighbour spins want to align in opposite directions. This rule can be easily satisfied on a square.
However, due to geometrical frustration, it is not possible to satisfy it on a triangle: for an antiferromagnetic triangular Ising model, any three neighboring spins are frustrated. Geometric frustration in triangular Ising models can be observed by considering spin configurations with total spin J = ±1 and analyzing the fluctuations in energy of the spin system as a function of temperature. There is no peak at all in the standard deviation of the energy in the case J = −1, and a monotonic behaviour is recorded. This indicates that the antiferromagnetic system does not have a phase transition to a state with long-range order. Instead, in the case J = +1, a peak in the energy fluctuations emerges. This significant change in the behaviour of energy fluctuations as a function of temperature in triangular configurations of spin systems is a signature of the presence of frustrated interactions in the system [19].
In this article, we observe a significant change in the behaviour of the information geometric complexity ratios as a function of the correlation coefficient in the trivariate Gaussian statistical models. Specifically, in the fully connected trivariate case, no peak arises and a monotonic behaviour in $\rho$ of the information geometric complexity ratio is observed. In the mildly weak connected trivariate case, instead, a peak in the information geometric complexity ratio is recorded at $\rho_{\text{peak}} \geq 0$. This dramatic disparity of behaviour can be ascribed to the fact that, when carrying out statistical inferences with positively correlated Gaussian random variables, the maximum entropy favorable scenario is incompatible with these working hypotheses. Thus, the system appears frustrated.
These considerations lead us to conclude that we have uncovered a very interesting information geometric resemblance of the more standard geometric frustration effect in Ising spin models. However, for a conclusive claim of the existence of an information geometric analog of the frustration effect, we feel we have to further deepen our understanding. A forthcoming research project along these lines will be a detailed investigation of both arbitrary triangular and square configurations of correlated Gaussian random variables, where we take into consideration the presence of different intensities and signs of pairwise interactions ($\rho_{ij} \neq \rho_{ik}$ if $j \neq k$, $\forall i$).