Analyzing Uncertainty in Complex Socio-Ecological Networks

Socio-ecological systems are recognized as complex adaptive systems whose multiple interactions might change as a response to external or internal changes. Due to their complexity, the behavior of these systems is often uncertain. Bayesian networks provide a sound approach for handling complex domains endowed with uncertainty. The aim of this paper is to analyze the impact of the Bayesian network structure on the uncertainty of the model, expressed as the Shannon entropy. In particular, three strategies for model structure have been followed: naive Bayes (NB), tree augmented network (TAN) and network with unrestricted structure (GSS). Using these network structures, two experiments are carried out: (1) the impact of the Bayesian network structure on the entropy of the model is assessed and (2) the entropy of the posterior distribution of the class variable obtained from the different structures is compared. The results show that GSS consistently outperforms both NB and TAN when it comes to evaluating the uncertainty of the entire model. On the other hand, NB and TAN yielded lower entropy values of the posterior distribution of the class variable, which makes them preferable when the goal is to carry out predictions.


Introduction
Socio-ecological systems (SESs) constitute an outstanding example of complex systems, where multiple social and ecological components interact with each other in space and time [1,2]. SESs are complex adaptive systems whose interactions might change as a response to external events or endogenous changes [3,4]. As a consequence, the state of the SES evolves to a new one to adapt to these changes [5]. This brings about challenges not only from the modeling perspective but also when it comes to making predictions and diagnosing problems. An example of such complex socio-ecological systems is cultural landscapes, which are the outcome of the interaction of humans and nature over time [6]. Cultural landscapes [7] are typically heterogeneous systems providing diverse ecosystem services as the result of a complex relationship between human cultural management and the ecosystem.
Furthermore, there is a strong relationship between cultural landscapes and the socio-economy [8][9][10], and this relationship must be appropriately modeled in order to make well-founded decisions on, for instance, implementing suitable landscape conservation policies [9]. Traditional analysis methods have been applied to this problem [11][12][13], but they sometimes fail to capture the complexity of the cultural landscape elements, connections and cause-effect relations, especially when ecosystem services are taken into account [14].
Another key issue is handling the uncertainty in data and in the predictions made by the models. In this sense, Bayesian networks (BNs) [15] provide a sound approach for handling complex domains endowed with uncertainty. The underlying formalism for uncertainty treatment is probability theory, which makes it possible to quantify the uncertainty associated with the decisions made from BNs using measures such as Shannon entropy [16].
BNs have been widely used in the last decade as a modeling tool in environmental problems in general [17] and in cultural landscapes applications in particular [18]. A recent example employs the so-called object-oriented Bayesian networks (OOBNs) which are basically a structured way of representing Bayesian networks taking advantage of repeated and hierarchical components [19] so that the modeling task is simplified [20].
In this paper, we analyze the resulting model uncertainty when complex socio-ecological systems are modeled using Bayesian networks. More precisely, we investigate the impact of different network structures on the value of Shannon entropy from an experimental point of view. This analysis is relevant for practitioners when making decisions, since less uncertain models are potentially more reliable when making predictions using the model.

Materials and Methods
From now on, we will use uppercase letters to denote random variables and lowercase letters to denote a value of a random variable. Boldfaced characters will be used to denote random vectors (i.e., multidimensional random variables). The set of all possible values of a random vector $\mathbf{X}$ (also called its support) is denoted as $\Omega_{\mathbf{X}}$. A Bayesian network [15] with variables $\mathbf{X} = \{X_1, \ldots, X_n\}$ is a directed acyclic graph with $n$ nodes, each of which corresponds to a variable in $\mathbf{X}$. Attached to each node $X_i \in \mathbf{X}$, there is a conditional distribution of $X_i$ given its parents in the network, $Pa(X_i)$, so that the joint distribution of the random vector $\mathbf{X}$ factorizes as
$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid pa(x_i)), \tag{1}$$
where $pa(x_i)$ denotes a configuration of the values of the parents of $X_i$. A simple example of a Bayesian network representing the joint distribution of variables $X_1, \ldots, X_5$ is shown in Figure 1. The factorization it encodes, Equation (2), is the product of five conditional distributions, one for each variable given its parents in the graph. From a modeling perspective, one advantage of Bayesian networks is that the induced factorization avoids the specification of large multivariate distributions, which are replaced by a set of smaller ones. These are more easily specified, since the number of parameters is lower. For example, the factorization in Equation (2) replaces the specification of a joint distribution over 5 variables by the specification of 5 smaller distributions, each one of them with at most 3 variables. Another advantage is that the network structure describes the interaction between the variables in the model in a way that can be easily interpreted.
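The saving in parameters can be made concrete with a small sketch. The graph below is a hypothetical five-variable network chosen only for illustration (it is not claimed to be the one in Figure 1), all variables are assumed binary, and the probability values are made up. The sketch counts free parameters with and without the factorization and checks that the product of local conditionals is a proper joint distribution.

```python
from itertools import product

# Hypothetical DAG over five binary variables (an assumption for illustration,
# not necessarily the structure of Figure 1): node -> tuple of parents.
parents = {
    "X1": (),
    "X2": ("X1",),
    "X3": ("X1",),
    "X4": ("X2", "X3"),
    "X5": ("X3",),
}
order = list(parents)  # the insertion order above is a topological order
cardinality = {node: 2 for node in parents}


def free_parameters(parents, cardinality):
    """Free parameters needed by all conditional tables of the network."""
    total = 0
    for node, pa in parents.items():
        n_configs = 1
        for p in pa:
            n_configs *= cardinality[p]
        total += (cardinality[node] - 1) * n_configs
    return total


def full_joint_parameters(cardinality):
    """Free parameters needed by an unfactorized joint table."""
    size = 1
    for k in cardinality.values():
        size *= k
    return size - 1


# Made-up tables: P(node = 1 | parent configuration).
p_one = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.5, (1,): 0.9},
    "X4": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.8},
    "X5": {(0,): 0.25, (1,): 0.65},
}


def joint(assignment):
    """Joint probability as the product of the local conditionals."""
    prob = 1.0
    for node in order:
        pa_values = tuple(assignment[p] for p in parents[node])
        p1 = p_one[node][pa_values]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


# The factorized joint must sum to one over all 2**5 configurations.
total_mass = sum(
    joint(dict(zip(order, values)))
    for values in product((0, 1), repeat=len(order))
)

print(free_parameters(parents, cardinality))  # 11
print(full_joint_parameters(cardinality))     # 31
```

Under these assumptions the factorized model needs 11 free parameters instead of 31, and no local table involves more than 3 variables.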
One of the most successful areas of application of Bayesian networks is classification [21], which is a prediction task in which there is a discrete target variable $C$, called the class, whose value is to be predicted from the values of a set of feature variables $\mathbf{X} = \{X_1, \ldots, X_n\}$. The predicted value $c^*$ of $C$ is computed as the one that maximizes the posterior distribution of $C$ given the observed values of the features, that is,
$$c^* = \arg\max_{c \in \Omega_C} p(c \mid x_1, \ldots, x_n). \tag{3}$$
Note that
$$p(c \mid x_1, \ldots, x_n) \propto p(c)\, p(x_1, \ldots, x_n \mid c), \tag{4}$$
which means that solving the classification problem requires the specification of an $n$-dimensional distribution for $X_1, \ldots, X_n$ given $C$. The problem can be simplified by representing the joint distribution using a Bayesian network and taking advantage of the factorization encoded by its structure. The strongest simplification is achieved when the network is forced to adopt a naive Bayes (NB) structure, where the feature variables are assumed to be conditionally independent given the class.
The NB structure is depicted in Figure 2a. Adopting an NB structure entails a strong independence assumption, but in practice this is compensated by the low number of parameters that need to be specified. Notice that, in this case, the factorization results in
$$p(c \mid x_1, \ldots, x_n) \propto p(c) \prod_{i=1}^{n} p(x_i \mid c), \tag{5}$$
meaning that $n$ one-dimensional conditional distributions must be specified, instead of one $n$-dimensional conditional distribution. The independence assumption underlying NB models can be relaxed, resulting in more expressive models that still keep a reduced number of parameters. This is the motivation of the tree augmented network (TAN) structure [21], where each feature variable is allowed to have another feature as a parent, besides the class, as long as the resulting subgraph containing the features is a tree (i.e., each feature has at most one other feature as a parent and the subgraph contains no cycles). An example of a TAN model is given in Figure 2b; in the corresponding factorization, Equation (6), each feature is conditioned on the class and on at most one other feature.
Given that there are multiple structures that one can choose when facing classification problems, ranging from NB to unrestricted Bayesian networks, a natural question is whether this choice has an impact on the performance of the classification model. This problem has been analyzed from the point of view of the accuracy of the classification model [21]. In this paper we are more interested in analyzing the impact of the model structure on the uncertainty over the predictions, which in this context can be evaluated as the uncertainty of the Bayesian network used.
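As an illustration of how the NB factorization turns classification into a product of small tables, the following sketch computes the class posterior and the predicted class for one observation. The class labels, the three binary features and all probability values are hypothetical; they are not taken from the network studied in this paper.

```python
import math

# Made-up NB model: a two-state class and three binary features.
prior = {"low": 0.6, "high": 0.4}
# likelihood[i][c] = P(X_i = 1 | C = c)
likelihood = [
    {"low": 0.2, "high": 0.7},
    {"low": 0.5, "high": 0.9},
    {"low": 0.3, "high": 0.4},
]


def nb_posterior(x):
    """Posterior P(C | x), proportional to p(c) * prod_i p(x_i | c)."""
    unnormalized = {}
    for c, pc in prior.items():
        # Work in log space so that long products do not underflow.
        log_score = math.log(pc)
        for xi, table in zip(x, likelihood):
            p1 = table[c]
            log_score += math.log(p1 if xi == 1 else 1.0 - p1)
        unnormalized[c] = math.exp(log_score)
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


def predict(x):
    """Class value that maximizes the posterior."""
    posterior = nb_posterior(x)
    return max(posterior, key=posterior.get)


posterior = nb_posterior((1, 1, 0))
prediction = predict((1, 1, 0))
print(posterior, prediction)
```

A TAN model would differ only in that each feature table is additionally indexed by the value of at most one other feature.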
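After all, a Bayesian network represents a probability distribution, and a well-known approach to quantifying the uncertainty of a probability distribution is to use Shannon entropy [16]. The Shannon entropy of a discrete random variable $X$ is
$$H(X) = -\sum_{x \in \Omega_X} p(x) \log p(x). \tag{7}$$
Analogously, it can be defined over a random vector $\mathbf{X} = \{X_1, \ldots, X_n\}$ as
$$H(\mathbf{X}) = -\sum_{\mathbf{x} \in \Omega_{\mathbf{X}}} p(\mathbf{x}) \log p(\mathbf{x}), \tag{8}$$
which in the case of a Bayesian network can be written as
$$H_{BN}(\mathbf{X}) = -\sum_{\mathbf{x} \in \Omega_{\mathbf{X}}} \left( \prod_{i=1}^{n} p(x_i \mid pa(x_i)) \right) \sum_{i=1}^{n} \log p(x_i \mid pa(x_i)). \tag{9}$$
Particularly, for a Bayesian network with NB structure and variables $\mathbf{X} = \{C, X_1, \ldots, X_n\}$, the entropy can be computed as
$$H_{NB}(\mathbf{X}) = -\sum_{c \in \Omega_C} \sum_{x_1, \ldots, x_n} p(c) \prod_{i=1}^{n} p(x_i \mid c) \left( \log p(c) + \sum_{i=1}^{n} \log p(x_i \mid c) \right). \tag{10}$$
Shannon entropy is usually preferred to other entropies as a measure of uncertainty within the context of Bayesian networks due to its decomposability properties, which allow it to be computed efficiently by taking advantage of the factorization of the distribution induced by the Bayesian network.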
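The decomposability mentioned above can be checked numerically on a toy NB model. The sketch below computes the joint entropy in two ways: by enumerating every configuration, and as the entropy of the class plus the expected entropy of each feature given the class. The model values are made up, and the natural logarithm is an assumption, since the base of the logarithm is not fixed in the text.

```python
import math
from itertools import product

# Made-up NB model: binary class and three binary features.
prior = {0: 0.6, 1: 0.4}
# p_feature_one[i][c] = P(X_i = 1 | C = c)
p_feature_one = [{0: 0.2, 1: 0.7}, {0: 0.5, 1: 0.9}, {0: 0.3, 1: 0.4}]


def bernoulli_entropy(p):
    """Entropy (in nats) of a binary variable with P(1) = p."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def entropy_by_enumeration():
    """Joint entropy summed over every configuration (exponential cost)."""
    h = 0.0
    for c, pc in prior.items():
        for x in product((0, 1), repeat=len(p_feature_one)):
            p = pc
            for xi, table in zip(x, p_feature_one):
                p *= table[c] if xi == 1 else 1.0 - table[c]
            if p > 0.0:
                h -= p * math.log(p)
    return h


def entropy_by_decomposition():
    """H(C) + sum_i H(X_i | C), with cost linear in the number of features."""
    h_class = -sum(pc * math.log(pc) for pc in prior.values() if pc > 0.0)
    h_features = sum(
        pc * bernoulli_entropy(table[c])
        for table in p_feature_one
        for c, pc in prior.items()
    )
    return h_class + h_features


print(entropy_by_enumeration(), entropy_by_decomposition())
```

Both functions return the same value up to floating-point error, but only the second one scales to models with many features.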

Experimental Analysis
In order to study the impact of the Bayesian network structure on the model uncertainty, we have conducted an experiment taking as a basis a Bayesian network that models a complex socio-ecological system. More precisely, we use the network described in [20]. It models the entire region of Andalusia (southern Spain) which contains a wide variety of scenarios from an ecological point of view.
We conducted two experiments.

Experiment 1
The goal of this experiment is to assess the impact of the Bayesian network structure on the entropy of the model. The starting point was the Bayesian network in [20], which will be referred to as Original BN. Its structure is displayed in Figure 3 and gives an idea of the complexity of the described system. From Original BN, we generated samples of sizes ranging from 500 to 100,000. From each sample, we constructed 9 networks with NB structure, each one of them with a different class variable; 9 networks with TAN structure, with the same class variables as NB; and 1 network with no restriction imposed on its structure. NB and TAN networks were built using the package bnlearn in R [22], while the unrestricted network was constructed using the greedy search (GSS) method implemented in Hugin (http://www.hugin.com).
Instead of computing the entropy of each of the obtained networks using Equations (9) and (10), we decided to estimate them. The reason is that a direct application of those formulas requires summing over a number of terms that grows exponentially with the number of variables. For instance, in the case of Original BN, which contains 75 variables, even if all of them had only 2 possible values, evaluating the entropy would require summing over $2^{75}$ terms (approximately $3.8 \times 10^{22}$). Given a sample of size $m$ drawn from the distribution encoded by the network, we estimated $H_{BN}(\mathbf{X})$ as
$$\hat{H}_{BN}(\mathbf{X}) = -\frac{1}{m} \sum_{r=1}^{m} \sum_{j=1}^{n} \log p(x_j^{(r)} \mid pa(x_j^{(r)})),$$
where $x_j^{(r)}$ denotes the value of variable $X_j$ in the $r$-th element of the sample and $pa(x_j^{(r)})$ is the value of the parent variables of $X_j$ in the $r$-th element of the sample.
Similarly, we estimated $H_{NB}(\mathbf{X})$ as
$$\hat{H}_{NB}(\mathbf{X}) = -\frac{1}{m} \sum_{r=1}^{m} \left( \log p(c^{(r)}) + \sum_{j=1}^{n} \log p(x_j^{(r)} \mid c^{(r)}) \right).$$
Note that $\hat{H}_{BN}(\mathbf{X})$ and $\hat{H}_{NB}(\mathbf{X})$ are, respectively, unbiased estimators of $H_{BN}(\mathbf{X})$ and $H_{NB}(\mathbf{X})$. This can be easily proved by taking into account that
$$H_{BN}(\mathbf{X}) = E_p\left[ -\sum_{i=1}^{n} \log p(X_i \mid pa(X_i)) \right],$$
where $E_p$ denotes the expectation computed with respect to the distribution $\prod_{i=1}^{n} p(x_i \mid pa(x_i))$. Therefore, $\hat{H}_{BN}(\mathbf{X})$ is just the sample mean estimator of $H_{BN}(\mathbf{X})$, which is known to be unbiased. Likewise, $\hat{H}_{NB}(\mathbf{X})$ is the sample mean estimator of $H_{NB}(\mathbf{X})$. Since both estimators are unbiased, their accuracy can be measured using their variance or, equivalently, their standard deviation, as variance coincides with mean squared error for unbiased estimators.
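A minimal sketch of the sample mean estimator follows, assuming a toy NB model with made-up probabilities instead of the 75-variable network. Cases are drawn from the model itself, the negative log-probability of each case is averaged, and the standard deviation of the estimate is approximated by the standard error of the mean. The function names, the seed and the use of natural logarithms are illustrative choices, not details taken from the paper.

```python
import math
import random

# Made-up NB model: binary class and three binary features.
prior = {0: 0.6, 1: 0.4}
# p_feature_one[i][c] = P(X_i = 1 | C = c)
p_feature_one = [{0: 0.2, 1: 0.7}, {0: 0.5, 1: 0.9}, {0: 0.3, 1: 0.4}]


def sample_case(rng):
    """Draw (c, x) by forward sampling: first the class, then each feature."""
    c = 0 if rng.random() < prior[0] else 1
    x = tuple(1 if rng.random() < table[c] else 0 for table in p_feature_one)
    return c, x


def log_joint(c, x):
    """log p(c) + sum_j log p(x_j | c) for one sampled case."""
    value = math.log(prior[c])
    for xi, table in zip(x, p_feature_one):
        value += math.log(table[c] if xi == 1 else 1.0 - table[c])
    return value


def estimate_entropy(m, seed=0):
    """Sample mean of -log p over m cases, with its standard error."""
    rng = random.Random(seed)
    terms = [-log_joint(*sample_case(rng)) for _ in range(m)]
    mean = sum(terms) / m
    variance = sum((t - mean) ** 2 for t in terms) / (m - 1)
    return mean, math.sqrt(variance / m)


def exact_entropy():
    """Exact value via H(C) + sum_i H(X_i | C), feasible for this toy model."""
    h = -sum(pc * math.log(pc) for pc in prior.values())
    for table in p_feature_one:
        for c, pc in prior.items():
            p1 = table[c]
            h -= pc * (p1 * math.log(p1) + (1.0 - p1) * math.log(1.0 - p1))
    return h


estimate, std_error = estimate_entropy(20000)
print(estimate, std_error, exact_entropy())
```

With 20,000 cases the estimate should fall within a few standard errors of the exact value; the cost is linear in the sample size and in the number of variables, rather than exponential.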

Experiment 2
In this experiment we used the same networks as in Experiment 1. Then we generated three scenarios in the socio-ecosystem described by the Bayesian network. Each scenario corresponds to a particular configuration of values of some variables in the network. For each scenario, we computed the posterior distribution of the class variable (see Equation (4)) from each one of the nine networks in Experiment 1 and estimated the entropy of the posterior distribution as we describe next. The prior distribution of the class variable corresponds to the marginal distribution of variable $C$ in the corresponding network in Experiment 1, without taking into account the data corresponding to the three scenarios analyzed here. This is equivalent to adopting a parametric empirical Bayes approach, where the parameters of the prior distribution are estimated by maximum likelihood. This is the usual way of approaching prediction problems with Bayesian networks when we have a large initial sample without missing values. If we denote by $q(c)$ the posterior distribution of the class variable for one particular scenario, then the entropy in this experiment is calculated as
$$H(q) = -\sum_{c \in \Omega_C} q(c) \log q(c).$$
Note that in this case there is no need to estimate the entropy from the sample, as we only need to sum over the values of the class variable.
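Since the sum runs only over the values of the class variable, the posterior entropy can be computed exactly in a few lines. The sketch below uses two hypothetical three-state posteriors with made-up values and natural logarithms (the base is an assumption). Both posteriors lead to the same predicted class, but they carry different uncertainty.

```python
import math


def posterior_entropy(q):
    """Shannon entropy (in nats) of a class posterior given as {value: prob}."""
    if abs(sum(q.values()) - 1.0) > 1e-9:
        raise ValueError("posterior probabilities must sum to one")
    return -sum(p * math.log(p) for p in q.values() if p > 0.0)


# Two made-up posteriors over a three-state class variable.
peaked = {"a": 0.90, "b": 0.05, "c": 0.05}
smooth = {"a": 0.40, "b": 0.30, "c": 0.30}

for name, q in (("peaked", peaked), ("smooth", smooth)):
    predicted = max(q, key=q.get)
    print(name, predicted, posterior_entropy(q))
```

The peaked posterior has the lower entropy, which corresponds to a prediction that discriminates the most probable class value more sharply.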

Results and Discussion
The results of Experiment 1 are reported in Figure 4. The dashed line corresponds to the Original BN, which constitutes the ground truth. The dots represent the estimated entropy values, while the bars centered on each point represent the standard deviation, and thus the accuracy of the estimated value. It can be seen that in this case the network with unrestricted structure (GSS) consistently outperforms both NB and TAN. In fact, the entropy of the GSS network converges to the exact one (Original BN) as the sample size increases. Focusing on the classification-oriented networks, the uncertainty is clearly lower (lower entropy) for TAN compared to NB. This comes as no surprise, as the structure of the NB is the simplest one and is therefore less likely to capture the exact model accurately, which is reflected in the model uncertainty. In the case of NB and TAN, the increase in sample size does not lead to a reduction in the entropy. This is also consistent with the inability of both structures to fit the right model, due to their independence assumptions.
With respect to Experiment 2, the results for the three scenarios considered are similar, as can be inferred from Figures 5-7. The comparison carried out in this experiment is fairer to NB and TAN because it refers to prediction scenarios, in which case we are only interested in the distribution over the target variable and not in the entire model. In the three scenarios, the entropy corresponding to NB and TAN, like that of GSS, also converges to the entropy of the class posterior distribution computed with the original network. For smaller sample sizes, the uncertainty of GSS is typically higher than the exact one, which is in line with the result obtained in Experiment 1 for this network. However, the uncertainty of the class posterior obtained from NB and TAN structures is often below the entropy of the Original BN and, in general, clearly below the uncertainty obtained from GSS. The extreme case is the posterior of variable MCR in scenario 1 computed from NB (bottom left panel of Figure 5). The observed behavior of the analyzed models supports the idea of using NB and TAN for classification instead of unrestricted Bayesian network structures. The fact that the uncertainty is lower means that the class posterior distribution is less smooth. In other words, it better discriminates the most probable value of the class, which is in fact the value that corresponds to the outcome of the prediction model, as seen in Equation (3). This is precisely the effect that is sought by NB and TAN, which are focused on being accurate in the predictions rather than on goodness of fit.

Conclusions
In this paper we have carried out two experiments analyzing the uncertainty in various Bayesian network structures representing complex environmental networks. More precisely, we have tested unrestricted, NB and TAN structures for modeling a complex socio-ecological system with 75 variables.
According to the results of the experiments, the conclusion is that, from the point of view of uncertainty, unrestricted structures are preferable when the goal is the representation of the entire complex system, that is, the full model. However, if the goal is to carry out predictions, then NB and TAN yield less uncertain results.