On Fisher Information and the Spatial Dependence Structure of Isotropic Pairwise Gaussian-Markov Random Field Models

Markov Random Field (MRF) models are powerful tools for contextual modelling in the study of complex systems. However, little is known about how the spatial dependence between their elements is encoded in terms of information-theoretic measures. In this paper, we shed light on the connection between Fisher information, Shannon entropy and spatial properties of a random field of Gaussian variables (a Gaussian Markov random field, or GMRF), by defining analytical expressions to compute local and global versions of these measures using Besag's pseudo-likelihood function (conditional independence assumption). The proposed expressions provide analytical tools for the analysis of contextual patterns in data, allowing, among other things, the detection of the most informative ones. Besides, we show that, from a statistical inference perspective, these measures are directly related to the uncertainty in the estimation of the global system behavior by means of the asymptotic variance of the maximum pseudo-likelihood estimator of the spatial dependence parameter. Moreover, the results indicate that the accuracy of the estimation of the global behavior of a GMRF (inverse temperature) depends essentially on two quite intuitive conditions: concentration of patterns with high local log-likelihood value (minimization of type-I Fisher information), which means patterns "aligned" with the expected global behavior, and concentration of patterns showing high local log-likelihood curvature (maximization of type-II Fisher information), which means that small perturbations of the data cannot cause abrupt changes in the system's spatial dependence structure (stability). Inspired by these findings, we define L-information, a measure given by the ratio of the first two derivatives of the log-pseudo-likelihood function with respect to the spatial dependence parameter of an MRF.
Computational experiments with both simulated and real data show the effectiveness of the proposed equations in practical applications involving spatially dependent random variables.


INTRODUCTION
With the increasing value of information in modern society and the massive volume of digital data available nowadays, there is an urgent need for novel methodologies for knowledge discovery from the vast sets of symbols found in several domains of science. In this scenario, where the focus is on the data itself, complex systems arise as a powerful framework for a new model of data analysis and interpretation. Patterns that at first may appear locally irrelevant may turn out to be extremely informative from a more global perspective. Hence, the concept of what constitutes an informative pattern is a top priority when dealing with a complex system, whose components (elements, patterns and clusters) often relate in a non-linear fashion, making the whole greater than the sum of its parts, since relevant information is encoded in these complex relationships.
Within this context, information-theoretic measures play a fundamental role in a huge variety of applications, since they represent statistical knowledge in a systematic, elegant and formal framework. Since the first works of Shannon [1], and later with many other generalizations [2][3][4], the concept of entropy has been adapted and successfully applied to almost every field of science, among which we can cite physics [5], mathematics [6][7][8], economics [9] and, fundamentally, information theory [10][11][12]. Similarly, the concept of Fisher information has been shown to reveal important properties of statistical procedures, from lower bounds on estimation methods [13][14][15] to information geometry [16,17]. Basically, Fisher information can be thought of as the likelihood analog of entropy, which is a probability-based measure of uncertainty.
In general, classical statistical inference is focused on capturing information about the location and dispersion of the unknown parameters of a given family of distributions, and on studying how this information is related to uncertainty in estimation procedures. In typical situations, an exponential family of distributions and the independence hypothesis (independent random variables) are assumed, giving the likelihood function a series of desirable mathematical properties [13][14][15].
Although mathematically convenient, in many applications, such as image processing, contextual data mining and complex systems analysis, the independence assumption may not be reasonable [18,19], because much of the information is encoded in the relations between the random variables. In this scenario, Markov Random Field (MRF) models appear as a natural generalization of the classical model through the simple replacement of the independence assumption by a conditional independence assumption. Roughly speaking, in every MRF, knowledge of a finite-support neighborhood around a given variable isolates it from all the remaining variables. A further simplification is to consider a pairwise interaction model, constraining the size of the maximum clique to two. Moreover, if the MRF model is isotropic, which means that the spatial dependence parameter does not depend on direction, all information regarding the spatial dependence structure of the system is conveyed by a single parameter, from now on denoted by β (also known as the inverse temperature).
In this paper, we assume an isotropic pairwise Gaussian Markov Random Field (GMRF) model [20,21] (also known as the auto-normal or conditional auto-regressive model [22,23]). Basically, the question that motivated this work and that we try to elucidate here is: what kind of information is encoded by the β parameter in such a model? We want to know how this parameter, and consequently the whole spatial dependence structure of a complex system modelled by a Gaussian Markov random field, is related to both local and global information-theoretic measures, more precisely the observed and expected Fisher information, as well as self-information and Shannon entropy.
In searching for answers to our fundamental question, our investigations led us to an exact expression for the asymptotic variance of the maximum pseudo-likelihood (MPL) estimator of the spatial dependence parameter of a pairwise GMRF model as a ratio of two forms of Fisher information, indicating that asymptotic efficiency is not guaranteed, since the information equality fails. An approximation for the asymptotic variance of the spatial dependence parameter using the observed Fisher information has been proposed in [24]. Here, however, we use the expected Fisher information as it appears in the Cramér-Rao lower bound.
To the best of our knowledge, closed-form expressions for the expected Fisher information in the GMRF model have not been derived before. In the context of statistical data analysis, Fisher information plays a central role in providing tools and insights for modeling the interactions between complex systems and their components. The advantage of MRF models over traditional statistical ones is that MRFs take into account the dependence between pieces of information as a function of the parameter that controls the temperature of the system, which may vary through time. Briefly, this research aims to explore ways to measure and quantify distances between complex systems in different thermodynamical conditions by analyzing and comparing the behavior of local patterns observable in a regular 2D lattice, and by determining how informative these patterns are regarding a particular inverse temperature (β).
The remainder of the paper is organized as follows. Section 2 discusses maximum pseudo-likelihood (MPL) estimation and provides derivations for the expected Fisher information regarding the spatial dependence parameter β using both the first and second derivatives of the pseudo-likelihood function of a pairwise isotropic GMRF model. Intuitive interpretations of these two measures are also discussed. In Section 3 we derive an expression for the global entropy of a GMRF model (a probability-based counterpart to Fisher information), given by the expected value of self-information, a local uncertainty measure based on the observation of a contextual configuration pattern defined by a Markovian neighborhood.
The results suggest a connection between the maximum pseudo-likelihood and minimum entropy conditions on an MRF. Section 4 presents an exact expression for the asymptotic variance of the MPL estimator of β as a ratio of the two forms of Fisher information, showing that accuracy in the estimation of β depends essentially on the massive presence of contextual patterns satisfying two intuitive conditions: high local likelihood (minimization of one form of Fisher information) and stability, that is, a local log-likelihood that is not flat, which means that small variations in the data cannot cause abrupt changes in the spatial dependence structure (maximization of the other form of Fisher information). Section 6 shows some illustrative computational experiments using Markov Chain Monte Carlo simulations and some results on detecting and extracting relevant information from GMRF outcomes, represented here by noisy images. Finally, Section 7 presents the conclusions, final remarks and possibilities for future work.

FISHER INFORMATION ON PAIRWISE GMRF'S
The remarkable Hammersley-Clifford theorem [25] states the equivalence between Gibbs Random Fields (GRF) and Markov Random Fields (MRF), which implies that any MRF can be defined either in terms of a global model (the joint Gibbs distribution) or a local one (a set of local conditional density functions). For our purposes, we choose the latter representation.
Let X = {x_1, x_2, . . . , x_n} be a set of Gaussian random variables defined on a rectangular lattice and η a non-causal neighborhood system. A GMRF is completely characterized by a set of n (the number of variables) local conditional density functions (LCDF's), given by [26]:

p(x_i \mid \eta_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}\left[ x_i - \mu - \beta \sum_{j \in \eta_i} (x_j - \mu) \right]^2 \right\}    (1)

where θ = (μ, σ², β) is the vector of parameters, with μ and σ² denoting the mean (expected value) and variance, β denoting the spatial dependence parameter (inverse temperature) and η_i representing the neighborhood around the i-th random variable in the field. It is interesting to note that for β = 0 the expression degenerates to a Gaussian density. From an information geometry perspective [16,17], this means we are constrained to a sub-manifold within the Riemannian manifold of probability distributions, where the natural Riemannian metric (tensor) is given by the Fisher information. It has been shown that the geometric structure of exponential family distributions exhibits constant curvature. However, little is known about information geometry on more general statistical models, such as Gaussian Markov random fields.
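As a concrete illustration, the local conditional density just described can be evaluated numerically. The sketch below is our own (not code from the paper); it implements the auto-normal LCDF under the parametrization θ = (μ, σ², β), and the function name and argument conventions are illustrative.

```python
import numpy as np

def gmrf_lcdf(x_i, neighbors, mu, sigma2, beta):
    """Local conditional density p(x_i | eta_i, theta) of an isotropic pairwise
    GMRF (Besag's auto-normal model): a Gaussian whose mean is shifted by beta
    times the sum of the neighbors' deviations from mu."""
    shift = beta * np.sum(np.asarray(neighbors) - mu)
    resid = x_i - mu - shift
    return np.exp(-resid**2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
```

For β = 0 the value coincides with the ordinary Gaussian density, as the text notes.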

Maximum Pseudo-Likelihood Estimation
Maximum likelihood estimation is intractable for MRF parameters due to the presence of the partition function in the joint Gibbs distribution. An alternative, proposed by Besag [22], is maximum pseudo-likelihood estimation, which is based on the conditional independence principle. The pseudo-likelihood function is defined as the product of the LCDF's over all n elements of the complex system, modelled as a random field. So, for a GMRF model, the log pseudo-likelihood function is defined by:

\log L(\theta; X) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[x_i - \mu - \beta\sum_{j\in\eta_i}(x_j-\mu)\right]^2    (2)

By differentiating equation (2) with respect to each parameter and solving the resulting pseudo-likelihood equations, one obtains the following MPL estimators:

\hat{\mu}_{MPL} = \frac{1}{n(1-k\beta)}\sum_{i=1}^{n}\left[x_i - \beta\sum_{j\in\eta_i}x_j\right], \qquad \hat{\sigma}^2_{MPL} = \frac{1}{n}\sum_{i=1}^{n}\left[x_i - \hat{\mu} - \beta\sum_{j\in\eta_i}(x_j-\hat{\mu})\right]^2

\hat{\beta}_{MPL} = \frac{\sum_{i=1}^{n}(x_i-\hat{\mu})\sum_{j\in\eta_i}(x_j-\hat{\mu})}{\sum_{i=1}^{n}\left[\sum_{j\in\eta_i}(x_j-\hat{\mu})\right]^2}

where k denotes the cardinality of the non-causal neighborhood set η_i. Furthermore, assuming a regular neighborhood system such as a 2D lattice (each variable depends on a fixed number of neighboring variables), we can rewrite β̂_MPL as:

\hat{\beta}_{MPL} = \frac{\sum_{j\in\eta}\hat{\sigma}_{ij}}{\sum_{j\in\eta}\sum_{k\in\eta}\hat{\sigma}_{jk}}

where σ_ij denotes the covariance between the central variable x_i and a variable x_j belonging to the neighborhood system (typical choices are first- or second-order systems, which correspond to the four and eight nearest neighbors on a rectangular bidimensional lattice) and σ_jk denotes the covariance between two variables belonging to the neighborhood of x_i. Note that if β = 0, the MPL estimators of both μ and σ² become the widely known sample mean and sample variance.
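A minimal numerical sketch of the β estimator may help. The function below is our own illustration (not the paper's code); it assumes a first-order (4-nearest-neighbor) system on a 2D lattice, restricts the sums to interior sites, and plugs the sample mean in for μ, as suggested by the β = 0 case.

```python
import numpy as np

def beta_mpl(field):
    """Plug-in MPL estimate of the spatial dependence parameter beta for a
    2D lattice with a first-order neighborhood (interior sites only)."""
    mu = field.mean()
    d = field - mu
    # s holds, for each interior site, the sum of its 4 neighbors' deviations
    s = d[:-2, 1:-1] + d[2:, 1:-1] + d[1:-1, :-2] + d[1:-1, 2:]
    c = d[1:-1, 1:-1]  # central deviations
    return np.sum(c * s) / np.sum(s * s)
```

On an uncorrelated (white noise) field the estimate should be close to zero, consistent with the β = 0 degenerate case.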

Fisher information of spatial dependence parameters
Basically, Fisher information measures the amount of information that the observation of a random variable conveys about an unknown parameter. It can be thought of as the likelihood analog of entropy, which is a probability-based measure of uncertainty. Often, when we are dealing with independent and identically distributed (i.i.d.) random variables, the computation of the global Fisher information present in a random sample X = {x_1, x_2, . . . , x_n} is quite straightforward, since each observation x_i, i = 1, 2, . . . , n, brings exactly the same amount of information. However, this is not true for spatial dependence parameters in MRF's, since different configuration patterns provide distinct contributions to the local observed Fisher information, which can be used to derive a reasonable approximation to the global Fisher information [27].

Observed Fisher information
Considering a complex system modelled by a GMRF defined by a set of LCDF's (eq. 1), the observed Fisher information can be calculated in terms of the pseudo-likelihood equation as (we will refer to this measure as the type-I observed Fisher information): and it can be estimated by a sample average, justified by the Law of Large Numbers: where p(x_i |η_i, θ) is the LCDF of the Markovian model. Thus, φ_β is an unbiased estimator of the observed Fisher information, that is, E[φ_β] = I^{(1)}_{obs}(β), making φ_β a good approximation to I^{(1)}_{obs}(β). Replacing equation (1) in (8), and after some manipulations, a closed expression for the type-I observed Fisher information, φ_β, in the pairwise GMRF model is given by the following: Note that φ_β is simply an average of the local observed Fisher information along the random field. Thus, we can think of φ_β(x_i), for i = 1, 2, . . . , n, as the information that a particular contextual pattern contributes to the global observed Fisher information. In this sense, the observed Fisher information is explicitly defined in terms of local measures. Note the similarity between φ_β(x_i) and self-information, −log p(x_i).
Basically, the main difference is that while the former is based on the likelihood, the latter is based on the probability.
Alternatively, one can compute the observed Fisher information from the negative of the second derivative (we will refer to this measure as the type-II observed Fisher information): which, after some basic algebra, results in the following approximation: Note that while φ_β is a function of β, ψ_β does not depend explicitly on the spatial dependence parameter. Once again, ψ_β is the average of another local Fisher information measure, ψ_β(x_i), over the entire random field.
Therefore, with these two local measures, φ_β(x_i) and ψ_β(x_i), we can assign two information values to every element of a complex system modelled by an MRF. Note that the same measures can be derived for other models, such as the Potts model [28], with promising results [29,30]. Basically, ψ_β(x_i) is a measure of how sure or confident we are about the local spatial dependence structure at a given point x_i: in other words, it measures the degree of agreement among the neighboring elements of the pattern themselves, and a high average curvature is desired for predicting the system's global spatial dependence structure (better accuracy of the MPL estimator of β). These rather informal arguments form the basis for understanding the meaning of the asymptotic variance of maximum pseudo-likelihood estimators, as will be discussed in the next Sections.
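Differentiating the local log-likelihood of eq. (1) once and twice with respect to β gives the per-site quantities directly: the squared score yields φ_β(x_i) and the negative second derivative yields ψ_β(x_i). The sketch below is our own illustration (first-order neighborhood assumed, interior sites only); the function and variable names are ours.

```python
import numpy as np

def local_fisher_maps(field, mu, sigma2, beta):
    """Type-I (phi) and type-II (psi) local observed Fisher information maps
    for every interior site, from the first and second derivatives of the
    local log-likelihood with respect to beta."""
    d = field - mu
    # sum of neighbor deviations around each interior site
    s = d[:-2, 1:-1] + d[2:, 1:-1] + d[1:-1, :-2] + d[1:-1, 2:]
    resid = d[1:-1, 1:-1] - beta * s        # x_i - mu - beta * s_i
    phi = (resid * s) ** 2 / sigma2 ** 2    # squared score (type-I)
    psi = s ** 2 / sigma2                   # negative second derivative (type-II)
    return phi, psi
```

The per-site ratio φ_β(x_i)/ψ_β(x_i) is the local L-information used later in the experiments.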

Expected Fisher information
Unlike the observed Fisher information, the expected Fisher information is strictly a global measure. It is defined as the expected value of the squared score function. In our approach, we replace the likelihood function (the intractable joint Gibbs distribution) by the pseudo-likelihood function in the definition of the score function: or, equivalently, as in classical inference (exponential family, i.i.d. random variables), by taking the expectation of the negative of the second derivative, which can be interpreted as an average curvature: In the following, closed-form expressions for both Φ_β and Ψ_β in the GMRF model are presented, showing that, in general, the information equality Φ_β = Ψ_β fails. Plugging equation (2) into (12) and after some algebra, we obtain the following expression, which is composed of three main terms: Expanding the first term of the previous expression gives: According to Isserlis' theorem [31], for zero-mean normally distributed random variables we can compute higher-order moments in terms of the covariance matrix through the following identity:

E[x_1 x_2 x_3 x_4] = E[x_1 x_2]E[x_3 x_4] + E[x_1 x_3]E[x_2 x_4] + E[x_1 x_4]E[x_2 x_3]    (16)

which finally leads us to: We now proceed to the expansion of the second main term of (14). By expanding the square, multiplying the summations and using the identity in (16), we have: Finally, the third term of (14) is given by: Therefore, combining all the parts, the complete expression for Φ_β (the Fisher information of the pairwise GMRF spatial dependence parameter given by the square of the score function) is given by the following equation: At this point some observations can be made. First, the summation involving the variable i may be interpreted in two ways: in a temporal sense, when we observe the outcomes of a random pattern along time (a fixed position of the lattice is assumed to produce several outcomes); or in a spatial sense, when the patterns are arranged in a 2D lattice and a moving window is used to extract the observed patterns from the entire lattice at a given time.
Thus, the proposed expression allows us to model and analyze different kinds of data: spatial, temporal or even spatiotemporal. In the case of static observations, such as an image (or possibly a random regular graph) in which each vertex represents a Gaussian random variable, all possible observed patterns extracted with a moving window are treated as outcomes of a single random variable at a fixed time instant (in this case we are interested only in studying spatial correlations). In this case, the summation over n in the previous equation vanishes, since we have n = 1. From now on, we will consider this specific situation, where all the patterns at a given time instant t are outcomes of a single random variable, in order to estimate the spatial correlations σ_ij and σ_kl given a neighborhood system.
Note also that in this situation all information concerning the system's global dependence structure (i.e., the information conveyed by β, the inverse temperature) at a time instant t can be extracted directly from the covariance matrix of the contextual configuration patterns defined by the neighborhood system (as if each pattern were a vector sampled from a multivariate Gaussian density). In this sense, the proposed expression shows an explicit connection between the covariance matrix of the patterns and the system's global behavior, or inverse temperature (in other words, in a GMRF model, temperature is a quantity that depends only on second-order statistics, as would be expected).
Thus, to make computations easier, it is possible to rewrite equation (20) using Kronecker products, vectors and submatrices extracted from the covariance matrix of the contextual patterns, from now on denoted by Σ_p. For example, considering a given neighborhood system (K nearest neighbors), the general form of Σ_p is given by: where A_+ denotes the simple summation of all the entries of the matrix A (not to be confused with a matrix norm) and ⊗ denotes the Kronecker product. From an information geometry perspective, the presence of tensor products indicates the intrinsic differential geometry of a manifold in the form of the Riemann curvature tensor [16].
Following the same methodology, a closed-form expression for Ψ_β can also be obtained using the second derivative of the pseudo-likelihood function. Plugging equation (2) into (13) leads us to the following expression. Note that, unlike Φ_β, Ψ_β does not depend explicitly on β (the inverse temperature); Φ_β is a quadratic function of the spatial dependence parameter.
Assuming the observed data to be the outcome of a complex system at a given time instant t (n = 1), Ψ_β can be rewritten as: Looking at the derived Fisher information expressions, we note that a trivial condition for the information equality Φ_β = Ψ_β is β = 0 and σ_ij = 0, ∀j, which are essentially equivalent conditions expressing no correlation between the MRF variables. Formally, the condition for information equilibrium (Φ_β = Ψ_β) is achieved when: Solving this simple quadratic equation leads us to the following values for β*: Experiments and empirical analysis with MCMC simulations shown in Section 6 reveal an interesting relationship between Φ and Ψ, indicating, among other things, that they can be useful for extracting relevant information from data and for estimating and predicting the behavior of a dynamical system.
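The single-pattern (n = 1) covariance-based computation can be sketched numerically. The function below is our own illustration, not the paper's code; it assumes Σ_p is ordered with the central variable first, and it evaluates Φ_β via the Gaussian fourth-moment identity E[a²s²] = Var(a)Var(s) + 2Cov(a,s)², which follows from Isserlis' theorem for the zero-mean jointly Gaussian pair a = (x_i − μ) − βs and s = Σ_{j∈η_i}(x_j − μ).

```python
import numpy as np

def expected_fisher(Sigma_p, sigma2, beta):
    """Phi_beta and Psi_beta for a single contextual pattern (n = 1), computed
    from the covariance matrix Sigma_p of [x_i, x_j1, ..., x_jK]."""
    var_x = Sigma_p[0, 0]
    cov_xs = Sigma_p[0, 1:].sum()              # Cov(x_i, s), s = sum of neighbors
    var_s = Sigma_p[1:, 1:].sum()              # Var(s)
    var_a = var_x - 2 * beta * cov_xs + beta**2 * var_s
    cov_as = cov_xs - beta * var_s
    phi = (var_a * var_s + 2 * cov_as**2) / sigma2**2  # E[(a s)^2] / sigma^4
    psi = var_s / sigma2                                # E[s^2] / sigma^2
    return phi, psi
```

Consistent with the text, Φ_β is quadratic in β while Ψ_β does not depend on it, and for β = 0 with σ_ij = 0 the two forms coincide (information equilibrium).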
Before the experimental simulations, the next Section discusses another important information-theoretic connection: the relation between Fisher information, Shannon entropy and the inverse temperature parameter (β) in pairwise GMRF's.

ENTROPY ON PAIRWISE GMRF'S
In the classical situation of independent and identically distributed random variables X_1, X_2, . . . , X_n, the global Shannon entropy H is given in terms of the expected value of the self-information, H(X_i) = E[I(X_i)], where I(X_i) = −log p(X_i), by simply multiplying H(X_i) by the sample size n. In this Section we derive an expression for the global entropy by considering the pseudo-likelihood function. Considering a pairwise GMRF model, the expression for the global entropy of n spatially dependent observations of a random variable is given by the following: After some algebra we have: Analogously to the previous expressions, assuming a matrix-vector notation, we have: where H_G denotes the entropy of a Gaussian random variable.
Note that Shannon entropy is a quadratic function of the spatial dependence parameter β. Since the coefficient of the quadratic term is strictly non-negative (it is an expected Fisher information), entropy is a convex function of β. Also, as expected, when β = 0 the resulting expression is the traditional equation for the entropy of a Gaussian random variable. Thus, calculating the value β_MH that minimizes this form of entropy leads to: showing that maximum pseudo-likelihood and minimum-entropy estimation are essentially equivalent in GMRF models. Moreover, using the derived equations, the relation between Φ_β, Ψ_β and H_β is: where the functional ∆_β(ρ, Σ_p) that represents the difference between Φ_β and Ψ_β is given by equation (25). These equations relate the entropy and one form of Fisher information (Ψ_β) in GMRF models, showing that Ψ_β can be roughly viewed as the curvature of H_β. In this sense, in the degenerate information equilibrium condition Ψ_β = Φ_β = 0, the entropy's curvature is null. Since Ψ_β is in general strictly positive (being a Fisher information), β_MH is in fact a minimum. The results also suggest that an increase in the value of Ψ_β, which means stability (a measure of how much the neighboring variables of a given point agree in their behavior), contributes to curving, and therefore to changing, the entropy of the system.
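Under the same single-pattern setting, the entropy reduces to a quadratic in β whose minimizer coincides with the covariance-ratio form of the MPL estimator, which makes the pseudo-likelihood/minimum-entropy equivalence checkable numerically. The sketch below is our own illustration, with the same Σ_p conventions as before (central variable first).

```python
import numpy as np

def entropy_beta(Sigma_p, sigma2, beta):
    """Pseudo-likelihood based entropy E[-log p(x_i | eta_i)] for one pattern,
    a quadratic (hence convex) function of beta."""
    cov_xs = Sigma_p[0, 1:].sum()
    var_s = Sigma_p[1:, 1:].sum()
    var_a = Sigma_p[0, 0] - 2 * beta * cov_xs + beta**2 * var_s
    return 0.5 * np.log(2 * np.pi * sigma2) + var_a / (2 * sigma2)

def beta_min_entropy(Sigma_p):
    """Minimizer of the quadratic above: the same covariance ratio as the
    MPL estimator of beta."""
    return Sigma_p[0, 1:].sum() / Sigma_p[1:, 1:].sum()
```

Note that the coefficient of β² in `entropy_beta` is Ψ_β/2, matching the observation that Ψ_β is the curvature of H_β.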

Experiments with both Markov Chain Monte Carlo simulation and real image data show that entropy can be quite useful in detecting the appearance of informative patterns. More details can be found in Section 6. In the next Section, we move forward to see how Fisher information is also related to the uncertainty in the estimation of the inverse temperature parameter β, which determines the global system behavior, by means of the asymptotic variance of maximum pseudo-likelihood estimators.


ASYMPTOTIC VARIANCE OF MPL ESTIMATORS

Asymptotic evaluations uncover the most fundamental properties of a mathematical procedure, providing a powerful and general tool for statistical analysis. In this Section we derive an expression for the asymptotic variance of the MPL estimator of the pairwise GMRF spatial dependence parameter β. It is known from the statistical inference literature that both ML and MPL estimators share two important properties, consistency and asymptotic normality [32,33], making it possible to completely characterize their behavior in the limiting case. In other words, β̂_MPL ≈ N(β, υ_β), where υ_β denotes the asymptotic variance. It is known that the asymptotic covariance matrix of MPL estimators is given by [34]: where H and J denote the Jacobian and Hessian matrices of the log pseudo-likelihood function, respectively. Thus, in the uniparametric case we have the following definition for the asymptotic variance υ_β: Noting that the expected value of the derivative of the log pseudo-likelihood function is zero: we finally obtain the resulting expression for the asymptotic variance of β̂_MPL as the ratio between Φ_β and Ψ²_β: Using the matrix-vector notation and considering the observed data as outcomes of a single random pattern, the above equation simplifies to: Note that, since υ_β = Φ_β/Ψ²_β, if we had equivalence between the two forms of Fisher information, that is, Φ_β = Ψ_β, the expression for υ_β would reduce to the traditional Cramér-Rao lower bound. The interpretation of this equation indicates that the accuracy of the estimation of β depends essentially on two main factors: 1) minimization of Φ_β, which means the variance of the local log-likelihood functions is close to zero (concentration of patterns "aligned" with the expected global behavior, that is, patterns showing a high degree of agreement between the central element and its neighbors); and 2) maximization of Ψ_β, which essentially means that, on average, the local log-likelihood functions are not flat, that is, small variations in the patterns cannot cause abrupt changes in the spatial dependence structure and therefore in the expected global behavior (that is, concentration of stable patterns showing a high degree of agreement among the neighboring elements themselves). Finally, since we cannot obtain a unique lower bound in terms of Fisher information (the information equality fails), it is not possible to conclude that the estimator is asymptotically efficient, as would happen if we had independent observations (maximum-likelihood estimation).
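The ratio υ_β = Φ_β/Ψ²_β can be evaluated directly from the covariance matrix of the patterns. The following self-contained sketch is ours (same single-pattern conventions as the earlier blocks) and also makes the Cramér-Rao reduction visible: whenever Φ_β = Ψ_β it returns 1/Ψ_β.

```python
import numpy as np

def asymptotic_variance(Sigma_p, sigma2, beta):
    """Asymptotic variance of the MPL estimator of beta as Phi / Psi^2
    for a single contextual pattern (n = 1)."""
    cov_xs = Sigma_p[0, 1:].sum()
    var_s = Sigma_p[1:, 1:].sum()
    var_a = Sigma_p[0, 0] - 2 * beta * cov_xs + beta**2 * var_s
    cov_as = cov_xs - beta * var_s
    phi = (var_a * var_s + 2 * cov_as**2) / sigma2**2  # type-I (squared score)
    psi = var_s / sigma2                                # type-II (curvature)
    return phi / psi**2
```

In the uncorrelated case (β = 0, σ_ij = 0) the two information forms coincide and the variance equals the Cramér-Rao form 1/Ψ_β.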

Analysis of local Fisher information
To illustrate the application of both forms of local observed Fisher information, φ_β and ψ_β, we generated synthetic images using the Metropolis-Hastings algorithm with a white noise Gaussian field as initialization. Figure 2 shows an example of an input and a valid outcome for a second-order (8 nearest neighbors) pairwise isotropic GMRF model using the following parameter settings: μ = 0, σ² = 5 and β = 0.125. The number of iterations used in the MCMC simulation was 1000.
Three Fisher information maps were generated from the resulting synthetic image. The first one was obtained by calculating the value of the type-I local Fisher information (equation 9 for n = 1), φ_β, for every element of the system. Similarly, the second one was obtained by using the type-II local Fisher information (equation 11 for n = 1), ψ_β. For the last one, we used the ratio of the type-I and type-II observed Fisher information, motivated by the fact that boundaries are mostly composed of patterns that are not expected to be "aligned" with the global behavior (high φ_β) and are also somehow unstable (low ψ_β). We call this measure, φ_β/ψ_β, L-information, since it is defined in terms of the first two derivatives of the local log-pseudo-likelihood function. Figure 3 shows the visual results. Note that while φ_β has a strong response at boundaries, ψ_β has a weak one, an evidence in favor of considering L-information in detection procedures.
The same experiment was repeated for grayscale images corrupted by additive Gaussian noise, a setting in which boundary detection is known to be difficult in the image processing and computer vision literature. Another result concerns the Boat image, which is known to have a complex structure due to the presence of fine details, thin lines and some texture. Basically, the same methodology described in the previous experiment was adopted here. Figure 5 shows the obtained results.
Again, a reasonable amount of relevant information could be extracted with the proposed method in comparison with the Laplacian and Canny techniques.
To illustrate how entropy in GMRF models can be used to quantify and measure variability in data, we show an example using some grayscale images. For this experiment we would like to know which of four images (Baboon, Lena, Cameraman and a texture piece) has the highest entropy in terms of a GMRF model, that is, which one of them has more variation of contextual patterns. Figure 7 shows the values of Fisher information and entropy for each image. The results indicate that the Baboon image has the lowest entropy in comparison to the others and the texture piece has the highest value. This fact is possibly due to the presence of strong edges and some fine details making the distribution of patterns more uniform, defining a more unstable configuration. Finally, to show how entropy is related to the distribution of patterns along the GMRF, we compared its value for 4 versions of the Lena image, from a very blurred one (low variability of contextual patterns) to a very noisy one (high variability of contextual patterns). Figure 6 shows the obtained results. As expected, the entropy of the noisy images is higher.

Analysis of expected Fisher information and entropy
In order to study the behavior of both forms of the expected Fisher information and the entropy in GMRF models, a sequence of valid GMRF outcomes was generated by the Metropolis-Hastings algorithm. The purpose of this experiment is to observe what happens to Φ_β, Ψ_β and H_β when the system evolves from a random initial state to other configurations.
To simulate a system in contact with a heat reservoir that can control the temperature (or, in other words, the parameter β), we update β along the simulation: the next value of β is defined as the current one minus ∆β, while β > 0. This process is repeated for a fixed number of iterations during the MCMC simulation. As a result of this algorithm, a sequence of GMRF samples is produced. We used this sequence to calculate Φ_β, Ψ_β and H_β. Figure 8 shows some of the system's configurations along a MCMC simulation.
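The sampling scheme can be sketched as a standard single-site Metropolis-Hastings sweep under the GMRF local conditionals. The code below is our own illustration (first-order neighborhood, symmetric Gaussian proposal); the proposal scale and parameter values are illustrative, not the paper's exact settings.

```python
import numpy as np

def metropolis_gmrf_sweep(field, mu, sigma2, beta, rng):
    """One Metropolis-Hastings sweep over the interior of a 2D lattice using
    the isotropic pairwise GMRF local conditional density as the target."""
    n_rows, n_cols = field.shape
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            s = (field[i-1, j] + field[i+1, j] +
                 field[i, j-1] + field[i, j+1]) - 4 * mu
            mean_i = mu + beta * s  # conditional mean at site (i, j)
            old = field[i, j]
            new = old + rng.normal(0.0, np.sqrt(sigma2))  # symmetric proposal
            # log acceptance ratio of the local conditional densities
            log_a = ((old - mean_i)**2 - (new - mean_i)**2) / (2 * sigma2)
            if np.log(rng.uniform()) < log_a:
                field[i, j] = new
    return field
```

Lowering β between sweeps, as described above, reproduces the cooling schedule used to generate the sample sequence.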
A plot of both forms of the expected Fisher information, Φ_β and Ψ_β, at each iteration of the MCMC simulation is shown in Figure 9. The graph produced by this experiment shows some interesting results. First of all, regarding upper and lower bounds on these quantities, it is possible to note that when there is no induced spatial dependence structure (β ≈ 0), Φ_β = Ψ_β, the information equilibrium condition. This information equilibrium means that the variables are statistically independent, in the sense that all observed configuration patterns convey the same amount of information, so it is hard to distinguish informative from non-informative patterns (since they all behave in a similar way and there is no informative pattern to highlight). Moreover, in information equilibrium, Ψ_β reaches its lower bound, indicating that this condition occurs when the system is most susceptible to a global change, in the sense that a modification of the behavior of a small subset of patterns may guide the system to a different path of evolution. The graph also shows that the difference between Φ_β and Ψ_β is maximal when the system has large values of β, that is, when organization emerges and the system shows a strong dependence between its elements, with clearly visible clusters and boundaries. In such a state, it is expected that the majority of patterns are aligned with the global behavior, which causes the appearance of few but highly informative patterns: those defining the boundaries between different regions. Also, the simulations suggest that it takes more time for the system to go from the information equilibrium state to organization than the opposite.
From a statistical inference perspective, according to the asymptotic variance of the maximum pseudo-likelihood (MPL) estimator of β, the uncertainty in the estimation of the inverse temperature is minimized when the system is more organized (for larger values of β). This is somewhat intuitive, since stability (Ψβ) is higher and the universe of possible values for the temperature of the system is restricted (it is bounded by a critical value). Finally, the results suggest that both Φβ and Ψβ are bounded above by a value related to the size of the neighborhood system (with zero as the obvious lower bound).
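Sample versions of the two quantities can be estimated directly from an observed configuration. The sketch below (same first-order toroidal neighborhood assumption as before; the paper's closed-form expressions are not reproduced in this excerpt) takes Φβ as the mean squared local score of Besag's pseudo-likelihood and Ψβ as the mean negative local curvature:

```python
import numpy as np

def fisher_information(x, mu, sigma2, beta):
    """Sample estimates of the two expected Fisher information forms for the
    spatial dependence parameter beta, from the local conditional likelihoods.

    For each site, the local score is d/d(beta) of log p(x_i | neighbors)
    and the local curvature is its second derivative (which here does not
    depend on x_i itself).
    """
    # Sums of centered neighbor values (toroidal boundary via np.roll).
    nb = (np.roll(x, 1, 0) + np.roll(x, -1, 0)
          + np.roll(x, 1, 1) + np.roll(x, -1, 1)) - 4.0 * mu
    score = (x - mu - beta * nb) * nb / sigma2   # local score (first derivative)
    phi = np.mean(score ** 2)                    # type-I: mean squared score
    psi = np.mean(nb ** 2) / sigma2              # type-II: mean negative curvature
    return phi, psi
```

For an independent field (β ≈ 0) the two estimates agree up to sampling noise, which is the information equilibrium condition described above.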
Despite showing similar behavior, entropy can take larger values than both Φβ and Ψβ. Moreover, the graphs show that knowledge of both Fisher information forms allows us to predict an increase in Hβ. Our experiments suggest that a necessary condition for the entropy of a GMRF model to increase is a divergence between the values of Φβ and Ψβ, that is, the information equilibrium condition cannot prevail. Looking at Figures 9 and 10, Φβ and Ψβ start to diverge around iteration 100, while entropy starts to grow a little later, around iteration 170.
Another interesting measure is the global L-information, the ratio between the two forms of expected Fisher information, Φβ/Ψβ. A useful property of this measure is that its values are always bounded to the [0, 1] interval. With this single measurement it is possible to estimate and predict the global system behavior, as Figure 11 shows. A value close to one indicates the system is approaching information equilibrium, while a value close to zero indicates the system tends toward a highly organized and cohesive (segmented) behavior, with clearly informative patterns (boundaries).
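Given the two information estimates, the global L-information is a one-line computation (shown here only to make the definition concrete):

```python
def l_information(phi, psi):
    """Global L-information: the ratio Phi/Psi of the type-I to the type-II
    expected Fisher information, bounded to [0, 1] for the GMRF model.
    Values near 1 indicate information equilibrium; values near 0 indicate
    an organized, segmented regime with clearly informative patterns."""
    return phi / psi
```

Tracking this single scalar along the MCMC iterations reproduces the behavior summarized in Figure 11.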
FIG. 11. Evolution of L-information along a MCMC simulation where the parameter β is modified in order to control the system's behavior. When L-information approaches one the system tends to information equilibrium. For values close to zero, we have stable, organized and informative configuration patterns coexisting throughout the system (different classes of patterns are observable).

To show the intrinsic non-linear relation between Φβ, Ψβ and Hβ, we plotted the trajectories of the system's state along the MCMC simulation in the information space defined by these quantities. Each configuration of the system at a given iteration becomes a 3-D vector Iβ = (Φβ, Ψβ, Hβ). Figure 12 shows the results. It is clear from the graph that what causes entropy to increase is a given combination of values of Φβ and Ψβ, more precisely when they start to diverge, that is, a decrease in the L-information value. Hence, the empirical analysis suggests that entropy and L-information are inversely proportional, as Figure 13 shows. Note also that, from a differential geometry perspective, the majority of the points of the parametric curve (those showing higher curvature values) are concentrated in two main regions of the space: A) around the information equilibrium condition (disorganization) and B) around the maximum entropy value, where the divergence between the information values is maximum (organization emerges). The few points in the trajectory connecting these two regions possibly represent the system going through a phase transition.
During such a transition, its properties change rapidly and in an asymmetric way (the path from state A to state B is different from the path from state B to state A).
Finally, to show the influence of the system's global dependence structure on the asymptotic variance of the inverse temperature MPL estimator (the uncertainty about the β parameter), we plot its variance for different values of this parameter. As expected, the uncertainty is higher in states close to information equilibrium, since there is no relevant information in the observed contextual patterns (the patterns tend to behave similarly). Figure 14 illustrates this behavior with the evolution of β (starting at 0, with fixed increments until a maximum value is reached, and back) and β* (given by equation 26) along a MCMC simulation. It is interesting to mention that for larger values of β, the corresponding information equilibrium value β* is lower than β, but not null (greater than zero). In our MCMC simulations, the measured values of β* were always less than 0.0577, indicating an upper bound for the observation of the information equilibrium condition.

FIG. 12. Trajectories of the system in the information spaces defined by (Φβ, Ψβ) and (Φβ, Ψβ, Hβ) along a MCMC simulation. The graph shows a parametric curve obtained by varying the β parameter from 0 to 0.15 and back, repeatedly. Note that, from a differential geometry perspective, as the divergence between Φβ and Ψβ increases, the torsion of the parametric curve becomes evident (the curve leaves the plane).
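The paper's exact expression for this variance (the equation referenced above) is not reproduced in this excerpt; as a generic sketch, the standard sandwich (Godambe) form expresses it in terms of the same two information quantities:

```python
def mpl_asymptotic_variance(phi, psi, n):
    """Asymptotic variance of the MPL estimator of beta in the sandwich
    (Godambe) form Psi^{-1} * Phi * Psi^{-1} / n, where phi and psi are the
    per-site type-I and type-II expected Fisher information and n is the
    number of lattice sites. A sketch of the standard form, not necessarily
    the paper's exact formula."""
    return phi / (n * psi ** 2)
```

This form makes the qualitative claim visible: small Φβ (patterns aligned to the global behavior) and large Ψβ (stability) both shrink the variance, i.e. the uncertainty about β.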
During the experiments we verified that, given an observed global configuration at a time instant t, a less abrupt way to perturb the system's behavior, inducing small changes and controlling the evolution of some of its components, is to set β = β* instead of setting it directly to zero. Figures 15 and 16 show the evolution of Φβ and Ψβ after the system is disturbed by setting β = 0 and β = β*, respectively, for the next five consecutive iterations of the MCMC simulation, with β then returning to its original value. When the system is disturbed by making β null, the simulations indicate that the system is not successful in recovering components of its previous stable configuration (note that Φβ and Ψβ clearly touch one another in the graph). When the same perturbation is applied using the smaller of the two β* values (the minimum solution of equation 26), after a short stabilization period the system is able to recover parts of its previous stable state. This result suggests that this kind of perturbation is not enough to remove all the information encoded within the spatial dependence structure of the configuration patterns, preserving some short- and long-range correlations in the data, while slightly remodeling large clusters and removing smaller ones.

FIG. 14. The uncertainty about the spatial dependence parameter increases as the system tends to information equilibrium, since it is hard to extract information from the observed patterns.
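The perturbation protocol described above can be summarized as a small helper that overrides β during a short window (the variable names and the window length default are illustrative):

```python
def perturbed_beta(t, beta, beta_star, t0, duration=5):
    """Beta value at iteration t under the perturbation protocol: the nominal
    beta everywhere except a window of `duration` consecutive iterations
    starting at t0, where beta is replaced by beta_star (or by 0.0 for the
    abrupt variant), after which the original value is restored."""
    return beta_star if t0 <= t < t0 + duration else beta
```

Running the sampler with this schedule and recording Φβ and Ψβ before, during and after the window reproduces the experiments of Figures 15 and 16.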
The first row indicates that the system was not able to recover the components of its previous configuration (regarding the simulation depicted in the graph of Figure 15), while the second row indicates that the system was able to evolve to a stable configuration quite similar to the previous one, recovering most of the original components (regarding the simulation depicted in the graph of Figure 16). Future work may include a detailed comparison of the evolution of complex systems in two situations: normal evolution versus evolution after an induced perturbation.
This would allow us to study how similarly the systems behave at a future time, both in terms of cluster structure (whether their major components remain similar) and information metrics.

CONCLUSIONS AND FINAL REMARKS
In this paper we addressed the problem of characterizing the spatial dependence structure of a pairwise isotropic Gaussian Markov random field defined on a lattice by means of information-theoretic measures. Analytical expressions for the observed and expected Fisher information regarding the spatial dependence parameter of a GMRF model were derived using the pseudo-likelihood function, elucidating the connection between the inverse temperature parameter β and the covariance matrix of the contextual patterns present in the data.
FIG. 15. Variation of Φβ and Ψβ after the system is disturbed by changing the value of β to zero along a MCMC simulation. Note that Φβ and Ψβ touch one another, indicating that no residual information is kept, as if the simulation had been restarted from a random configuration.
Intuitive physical interpretations for these information-theoretic quantities were discussed in the context of MRF models. However, to compute these measures, a proper estimate of the β parameter is required. Maximum pseudo-likelihood estimators for the GMRF model parameters were derived, showing that β̂_MPL is completely defined in terms of spatial correlation coefficients. Using the same methodology, an expression for the Shannon entropy of a GMRF was derived, showing its relation to Fisher information and maximum pseudo-likelihood estimation. Finally, using the derived expressions for the Fisher information, an exact expression for the asymptotic variance of β̂_MPL in the GMRF model was obtained.

FIG. 16. Variation of Φβ and Ψβ after the system is disturbed by changing the value of β to β* along a MCMC simulation. Note that the perturbation is not enough to remove all the information within the spatial patterns, allowing the system to recover a significant part of its original configuration.

ACKNOWLEDGMENTS
The author would like to thank CNPq (Brazilian Council for Research and Development) for the financial support through research grant number 475054/2011-3.