Modeling Semiarid River–Aquifer Systems with Bayesian Networks and Artificial Neural Networks

In semiarid areas, precipitations usually appear in the form of big and brief floods, which affect the aquifer through water infiltration, causing groundwater temperature changes. These changes may have an impact on the physical, chemical and biological processes of the aquifer and, thus, modeling the groundwater temperature variations associated with stormy precipitation episodes is essential, especially since this kind of precipitation is becoming increasingly frequent in semiarid regions. In this paper, we compare the predictive performance of two popular tools in statistics and machine learning, namely Bayesian networks (BNs) and artificial neural networks (ANNs), in modeling groundwater temperature variation associated with precipitation events. More specifically, we trained a total of 2145 ANNs with different node configurations, from one to five layers. On the other hand, we trained three different BNs using different structure learning algorithms. We conclude that, while both tools are equivalent in terms of accuracy for predicting groundwater temperature drops, the computational cost associated with the estimation of Bayesian networks is significantly lower, and the resulting BN models are more versatile and allow a more detailed analysis.


Introduction
Rivers of small basins have a limited thermal capacity and are very sensitive to transient thermal disturbances [1], especially those originated from storm episodes. In semiarid areas, precipitations are characterized by a high space-time variability with strong stormy events [2], which affect the aquifer through water infiltration [3], causing groundwater temperature changes [4][5][6]. Groundwater temperature variations have an impact on physical, chemical, and biological processes, including solubility of gases in groundwater, geochemical reaction rates, microbial respiration rates or pathogen propagation [7][8][9]. Since groundwater is a common source of drinking water and irrigation supply, especially in semiarid areas, the identification of the type of precipitation is essential for the proper management of these water bodies, and becomes crucial regarding the fact that the extension of semiarid areas is likely to increase in the future [10].
Groundwater temperature is also a matter of study in other disciplines. For instance, from the renewable energy viewpoint, there is an increasing interest in groundwater temperature due to its potential use as heat pump systems [11,12]. Since the temperature of groundwater is relatively constant, it can be used as a source of geothermal energy for heating in winter and cooling in summer. Moreover, heat is a natural tracer broadly used to typify and quantify surface water-groundwater exchange due to its easiness at measuring [6,[13][14][15][16][17][18]. Many works started using water temperature measurements to assess the vertical flows [15,[19][20][21] combined with hydraulic data [22][23][24], hydrochemical [25,26], developed a Decision Support System based on Dynamic Bayesian Networks for assessing the impacts in an over-exploited aquifer system caused by different Climate Change scenarios. The authors of [53] used BNs to model extreme river discharges in Europe using only geographical properties of catchments; the proposed model was as accurate as other large-scale hydrological models in simulating mean annual maximum of daily discharges being able to create basic flood scenarios at ungauged locations. The authors of [54] improved the capability of BNs for uncertainty estimation by developing a framework for incorporating the uncertainties associated with parameters, input and structures into BNs, and evaluated its relevance in hydrologic forecasting. In addition, Ref. [55] improved the efficiency of the learning of the hierarchical BN by using an incremental data selection algorithm to update the training BN when modeling the flow rate values for small rivers.
In this work, we compare the efficiency of BNs and ANNs in modeling groundwater behaviors. In particular, we are interested in modeling the groundwater temperature variations associated with stormy precipitation episodes. To do so, we have carried out a comparative study, in which all possible configurations of ANNs with one to four layers, as well as a sample of ANNs with 5 layers, have been trained and their predictive performance compared in terms of 6 different metrics. In addition, we have trained and evaluated three different BNs using different structure learning algorithms: naive Bayes, tree augmented network and hill-climbing. The remaining of the paper is organized as follows. Section 2 is devoted to the methodological aspects of the paper. In particular, we describe the study area, data source and data pre-processing in Sections 2.1 and 2.2; we briefly introduce ANNs and BNs in Section 2.3; and propose scenarios of change in Section 2.4. The performance of the models is analyzed in Section 3, and the results discussed in Section 4. The paper ends with conclusions in Section 5.

Study Area and Data Description
The study area ( Figure 1) is located in the Andarax river (Southeastern Iberian Peninsula), in a temporal stream reach [56]. This area has some characteristics of a typical semiarid climate, including mild temperatures in the winter and long, dry and warm summers. Precipitation tends to be approximately 200 mm/year [57] and occurs in the form of convective storm systems [58,59], with up to 70% of the annual precipitation taking place over 25% of the rainy days [60]. The hydrology is conditioned by the special rainfall regime, which, as a Mediterranean river, is distinguished by rain and snow and has a swift hydrologic response [61]. The river flows over some Quaternary alluvial deposits (number 5 in Figure 1) that form a small and highly permeable aquifer [62]. A loamy and sandy-loamy formation number 6 in Figure 1 with low permeability [63] is situated on both sides of the alluvial materials [62,64], disconnecting them from the limestone and dolomites (number 8 in Figure 1) that make up the carbonated aquifer system of the Sierra de Gádor range [65]. Downstream is the Plio-Quarternary aquifer, made up of fluvial-deltaic, sandy-loamy conglomerates of continental origin (number 9 in Figure 1) [66] and in connection with the alluvial aquifer.
Since we are interested in modeling the groundwater temperature variations associated with stormy precipitation episodes, the variables considered for the model are those that change over time. Therefore, some other factors that influence groundwater behavior but are constant, such as the geological characteristics of the catchment, were not considered as they should not influence the groundwater temperature variations. The data used in this study were extracted from a control network made up of sampling points of air temperature and precipitation at two meteorological stations; surface water temperature and flow level at the gauging station; and groundwater temperature and piezometric level ( Figure 1, Table 1). All the measurements were obtained at hourly intervals from October 2015 to September 2017 ( Figure 2). Precipitation and water flow level data come from two meteorological stations and one gauging station of the Ministry of the Environment and Territorial Planning of the Regional Government of Andalusia, situated in the subsystem IV of the province of Almeria-River basin of the Andalusian Mediterranean Watersheds (Figure 1). For the piezometric level and temperature data, HOBO U20-001-02 pressure and temperature data loggers were used: two for air temperature, one in the river for surface water temperature, and another to measure the groundwater temperature (8 meters) ( Figure 1). The place where the groundwater data logger is installed is situated over the alluvial at a short distance from the surface water. Air and water temperature was measured in • C, groundwater and flow level in meters, and precipitation in mm.

Data Pre-Processing
Raw groundwater temperature data were used to compute the groundwater temperature change (DT), which represents the difference in temperature between two consecutive time steps. More precisely, given two consecutive time steps, t and t + 1, where T t+1 is the groundwater temperature at time t + 1 and T t is the groundwater temperature at time t. On the other hand, the remaining raw data measures were shifted based on the observed lag between the response variable (groundwater temperature change) and each predictive variable. The lags represent the delay between a change in a variable and its corresponding response in the groundwater temperature. In order to do that, the crosscorrelation between the response and the predictive variables was computed. Table 2 shows the observed lag, used to shift the time series. Table 2. Observed lag (in hours) between the groundwater temperature and the predictive variables.

Predictive Variable Lag (in Hours)
Groundwater After shifting the time series, the initial 17,352 observations were filtered to keep only those in which precipitations events occurred. This process removed nearly 98% of the instances, yielding a dataset of 383 observations.
Next, an exploratory analysis was carried out to find redundant variables. Since severe collinearity was found between variables T1 and T2 (with variance inflation factors equal to 14.31 and 15.43, respectively, correlation coefficient equal to 0.9626 and a p-value under 0.001 in the Bartlett's test of sphericity), a Principal Component Analysis was performed to obtain a new variable (T) that picks up the information of T1 and T2. To sum up, the data pre-processing yielded a dataset of 383 observations taking values over 7 variables. Figure 3 shows the histograms of the six predictive variables used to model the response variable (DT). Finally, since our goal is to determine when the temperature in the aquifer decreases (DT < 0), the variable DT was discretized into two classes (D = 'temperature drop' and N = 'no temperature drop'), with 100 and 283 observations in each class, respectively. We have used the raw (continuous) predictive variables to train the ANNs. On the other hand, given that the software we have used for estimating the BNs only handles discrete or Gaussian variables, and the predictive variables involved in this study are not Gaussian, we have discretized them into equal frequency interval classes by the function equal_ f req from the R package funModelling [67]. We have tried to split the variables in as many classes as possible, taking 40 interval classes for variables G, SWT and T and 16, 22 and 27 for PPT1, PPT2, and Q, respectively, since the high skewness of these variables causes many empty interval classes when they are split in more intervals.

Artificial Neural Networks and Bayesian Networks
An artificial neural network (ANN) is a structure formed of several layers with nodes arranged in each one. The first layer, called input layer, consists of the input variables for the problem. The last layer, called output layer, consists of the values predicted by the ANN, constituting the model output. Between the input and output layers, there are one or more hidden layers whose nodes are fully connected to those in the previous and next layers but not to those in the same layer. Figure 4 shows an example of an ANN with one hidden layer and three nodes in it.
Hidden Layer Output Layer The connection link between nodes i and j has an associated weight, W ij , which represents the connection strength between both nodes. Besides, each node has associated a threshold or bias that must be exceeded in order to activate it. If a node is activated, its output is computed as the value of a so-called activation function, with respect to the product of the vector of inputs and weights associated with the node, minus the threshold. The role of the activation function can range from simply activating the node to transforming input signals to output signals.
Once the number of layers and nodes in each layer is fixed, as well as the activation function to be used, the training process, or learning, consists of finding the optimal weights and thresholds which minimize the error in the ANN output. The activation function can be chosen from a wide range of functions (the most commonly used are the logistic, hyperbolic tangent, ReLu, and Softmax functions) and the training process from a set of learning algorithms available in the literature. However, the hardest task when deciding the architecture of the ANN is to decide the number of hidden layers and the number of nodes in each layer because there is no procedure to determine both attributes and, thus, they are usually determined ad hoc. The chosen number of hidden layers can affect significantly the performance of the ANN: too few nodes may lead to poor approximations whereas too many nodes are likely to overfit the training data.
Bayesian networks (BN) offer an alternative procedure to avoid this issue. A BN [46,68] is a statistical multivariate probabilistic graphical model described by two components: the structure of the network and the conditional distributions of each node given its parents. The structure of the network consists of a directed acyclic graph (DAG), where each variable is represented by a node and the presence of an edge linking two nodes indicates the existence of statistical dependence between them. Figure 5 shows an example of the structure of a BN with five variables. This structure allows to analyze which variables have an effect on the target variables without the need for numerical calculations since the network structure encodes the conditional independence relations between the variables according to the d-separation criterion [69]. Moreover, the network structure encodes a factorization over the joint distribution of all the variables, which, for the network in Figure 5 is Hence, a BN is fully specified just by giving a conditional distribution for each variable given its parents.
The structure of a BN can be learned by hand (from expert knowledge) or from data, using any structure learning algorithm. See [70] for a thorough review on structure learning algorithms and available software tools. A commonly used score-based algorithm is the Hill-Climbing (HC) search [71], which, briefly speaking, starts from a DAG with no edges and then adds, deletes, and reverses one at a time in order to increase the network score. For BN classifiers, one can also choose fixed or restricted structures, including the Naive Bayes (NB) [72,73] and Tree Augmented Network (TAN) [73] structures. The former is the simplest structure, in which the variable of interest (called the class) is the parent of the remaining predictive variables, which are independent of each other given the class (Figure 6, left). This independence assumption is relaxed in the TAN structure, since it expands the NB structure by allowing each predictive variable to have one more parent besides the class (Figure 6, right). In general, there are several possible TAN structures for a set of variables so, in order to choose among them, a maximum weight spanning tree containing the predictive variables is constructed, using the mutual information between the linked variables, conditioned on the class, as the weight of each edge [73]. To compare the performance of the classifiers based on ANNs and BNs, we have trained the two most commonly used feed-forward back-propagation neural networks: multilayer perceptron (MLP) [74] and radial basis function networks (RBFN) with the dynamic decay adjustment (DDA) algorithm [75]. The activation function used was also the most commonly used in the bibliography for binary classification, this is the logistic (or sigmoid) activation. Although the use of only one hidden layer is more common, to study in depth the classification skill of the ANNs, we have considered all the possible configurations of nodes from one up to four layers and, due to the high computational cost, a random sample of node configurations for ANN with five layers was drawn. In total, 2.145 ANNs were trained and evaluated. The R package RSNNS [76] was used to build the ANNs.
In the case of BN, we have chosen the simplest structure (NB), the tree-augmented network (TAN) structure and the hill-climbing (HC) algorithm for structure learning. The R package bnlearn [77] was used to learn the parameters from data, using the maximum likelihood estimate (MLE) and Bayesian parameter estimation, and to learn the network structure, in the case of TAN and HC. Classification is carried out by the Bayesian networks as a prediction task in which the predicted class c of the target variable DT is computed as the one maximizing the posterior distribution of DT given the observed values of the predictive variables: To evaluate the performance of the different classifiers, the k-fold cross-validation technique [78] was used, with k = 5, and as the accuracy criteria, we have computed the following measures:

Scenarios of Change
Of special interest is the analysis of different scenarios to study how some changes in the predictive variables influence the behavior of the response variable. The study area shows typical characteristics of a semi-arid climate, with precipitations being around 200 mm/year [57], reaching up to 700 mm/year in the highest peaks. Commonly, precipitations appear as stormy episodes [59], where 70% of the annual precipitation takes place over hardly 25% of the rainy days [60]. This kind of rainfall is usually associated with convective storm phenomena [58]. On the other hand, Mediterranean rivers have a character markedly associated with rainfall and snow, showing an immediate hydrological response [61]. The special precipitation regime directly conditions the hydrological regime in the study area.
Both, ANNs and BNs are able to obtain the predicted class c of the target variable given known values of the regressors. Moreover, BNs are able to return not only the predicted class but also the full probability distribution of a variable of interest thanks to its probabilistic nature, which allows to identify to which extent changes in the regressors have an impact on the response variable. In other words, once we know which variables (X E ) have a direct bearing on the target variable (X T ), the distribution p(x T |x E ) can be used as a measurement of the likelihood of X T given each possible scenario of X E . This probability distribution can be computed by means of algorithms that make use of the factorization encoded by the network structure [79,80]. There are a number of efficient exact and approximate inference algorithms, such as Variable elimination [81] or Penniless propagation [82], providing a powerful combination of predictive, diagnostic, and explanatory reasoning [83].
Based on the meteorological and hydrological characteristics of the study area, we are interested in the posterior probability distribution of DT when the precipitation events are abundant (i.e., PPT1 ≥ 5 mm/hour or PPT2 ≥ 5 mm/hour) and when they are poor (i.e., PPT1 < 1 mm/hour or PPT2 < 1 mm/hour). To do so, we learnt a new BN, with the predictive variables PPT1 and PPT2 being discretized into the values Low (L), for values in the interval class [0, 1), Moderate (M) for values in the interval class [1,5) and High (H) for values in the interval class [5,30). The hill-climbing algorithm was used to learn the model structure. The R package gRain [84] was used to compute the posterior probability distribution of the response variable.

Predictive Accuracy Comparison between ANNs and BNs
The goal of the built models is to classify the behavior of the groundwater, measured as the difference of temperature (DT) and labeled depending on whether the temperature decreases or not, between two consecutive time steps (hours). Table 3 displays the average metrics resulting from the cross-validation. In the case of ANN, it is shown the best configuration of nodes for each number of hidden layers. Given a certain number of hidden layers, the best configuration represents the configuration of nodes that yield the highest area under the ROC curve (AUC). This measure has been chosen to summarize the classification performance because it shows the ability of the model at distinguishing between classes. Among 590 randomly sampled MLP with five hidden layers, none of them could classify properly the positive class (D=Temperature drop) and neither could 50.08% of the MLP with four hidden layers (649 networks). This might be due to the limited sample size available for the classification task. In BNs, the best scores were obtained using the Bayesian parameter estimation for the Naive Bayes structure and the Maximum Likelihood parameter estimation for the HC structure. Table 3 shows high AUC values, indicating a good level of performance of all the models with exception of the TAN model. MLP with three hidden layers gives the best score in both accuracy and AUC, and BN-based models score higher than RBF in these two metrics. RBF and Naive Bayes classifiers attained the best G-mean. In general, ANN classifiers obtained lower results in RECALL and F-SCORE than BN classifiers, indicating that BN classifiers are more robust than ANN classifiers when a huge amount of data is not available.
An important advantage of BNs is that the computational cost is dramatically reduced, compared with ANNs. Besides, BNs do not require a specification of the number of layers and the configuration of nodes in each layer in advance. This is not an insignificant question because the precision of the classification can vary in an important way. For example, in our dataset, the G-MEAN of the classification given by the MLP with three layers and node configuration (6, 5, 5) is 0.850109032, whereas the same MLP, with four layers and node configuration (2, 1, 3, 2) yield a G-MEAN of 0.154010708; this is more than five times lower than the previous one. The variability between the other metrics can also be important depending on the configuration of the nodes: MLP with configuration (4, 2, 5) almost doubles the RECALL of the classifier MLP with configuration (3, 6) (0.96 and 0.578947368, respectively) and the FSCORE of the MLP with three hidden layers and node configuration (4, 2, 5) is 52.6% higher than the FSCORE of the MLP with two hidden layers and node configuration (3,6).
Classifiers based on RBF with DDA algorithm do not present this problem but classifiers based on BNs yield higher or similar values in all the performance metrics. Moreover, BNs allow to analyze to which extent changes in the predictive variables have an impact on the response variable thanks to the computation of the conditional probability of the target variable given its parents in the network as we show in the next subsection.

Evidence Propagation in Bayesian Networks
In this study, the quantitative component of the BNs was used to identify the drops in the temperature of the aquifer (variable DT) caused by changes in the precipitation. More specifically, we are interested in obtaining the posterior probability distribution of DT when the precipitation events are abundant, moderate and poor. To do so, we built a new BN, with variables PPT1 and PPT2 discretized in three intervals, instead of 16 and 22 intervals, respectively, used in the classification problem. We used the hill-climbing algorithm for the structure learning to analyze the relationships obtained between the variables. Figure 7 shows the DAG of the model learned to perform the probabilistic update. It can be noted that three out of the six predictive variables are not connected in the network structure, which means that they do not have any effect on the target variable. Therefore, only variable PPT2 was used as evidence to obtain the posterior probability distribution of DT. The other three predictive variables are connected to DT as in the naive Bayes structure. Therefore, they are independent of the other variables given the target variable, DT. Table 4 shows the performance metrics of this classifier. These metrics indicate a good performance of the classifier and are higher than those reported by HC in the comparison of BNs and ANNs (Table 3).  Table 5 and Figure 8 show the behavior of the groundwater temperature depending on the intensity of the precipitation: light precipitations (<1 mm) hardly cause a change in the temperature, whereas the probability of a drop in the groundwater temperature when the precipitation is moderate (1-5 mm) increases from 26.1% to 32%. When precipitations are greater than 5 mm/h at the meteorological station 2, the probability of a drop in the temperature of the aquifer is 83%.

Discussion
The rivers of the small basins have a limited thermal capacity and are very sensitive to transient thermal disturbances [1], revealing a vulnerability in the case of any disruption [85], especially with storm episodes that affect the aquifer through water infiltration [3]. The infiltration and flow of the river are directly associated with the precipitations, but the way in which they take place conditions the infiltration of water in the aquifer [4,5]. The arrival of the water to the aquifer causes temperature changes [6], with the most extreme temperatures being associated with intense water infiltration. It is necessary to understand this behavior, especially in areas of potentially high risk of waste discharge or presence of pollutants, since episodes of quick aquifer recharge may affect the dissolution, precipitation, and mobilization of substances, which could imply a potential risk in terms of groundwater pollution. The proposed methodology might be useful to anticipate these events, and can be applied to other aquifer systems (detritial and carbonate), whose results will depend on their environmental characteristics and particularities.
While the excellent performance of MLP as classifiers is unquestionable, BNs perform better at classifying the positive class (temperature drops), scoring much higher on RECALL (true positive rate) and F-SCORE than ANNs. Therefore, classifiers based on NB or HC structures are both precise and robust in the sense that they do not miss a significant number of instances, saving computational cost and avoiding the decision about the configuration of hidden layers and nodes. Furthermore, BNs offer two advantages to improve the understanding of the problem: the structure of the BN and the probability distribution of the target variable.
The structure of the BN offers a framework of the problem that allows the identification of the variables with an impact on the target variable through the statistical dependency denoted by the edges of the graph. In this sense, the structure of the BN in Figure 7 reveals that precipitations at the meteorological station 1 have no significant effect on the temperature of the groundwater. According to [86], small variations in temperature appear in alluvial groundwater due to precipitations, which become increased after storms. In our case, the use of the structure of the BN highlights the relevance of the distance between the rainfall episode and the aquifer.
On the other hand, the evidence propagation operation allows to determine which variables (or which values of the variables) have a significant impact on the target variable. The propagation analysis in our study shows that precipitation intensities above 5mm/h at the meteorological station 2 cause a large increase in the probability of a drop of temperature, which indicates a fast infiltration of water to the aquifer [87].

Conclusions
Bayesian networks constitute an appropriate methodology to reach a better understanding of the system and the relation between storm episodes and the groundwater. The experiments carried out in this paper show that, in general, both ANNs and BNs perform well as classifiers, with BNs showing better performance skills at classifying the positive class. Moreover, BNs show other advantages over ANNs. The computational cost associated with the estimation of BN models is significantly lower, and the resulting model is more versatile. In addition, the BN structure (in the case of HC) is informative about the predictive variables that have an impact on the target variable. Furthermore, BNs facilitate the analysis of different scenarios using the evidence propagation operation, where the likelihood of the outcomes resulting from instantiating the observable variables can be easily computed, which lets us assess to what extent changes in the values of the predictors modify the target variable. These features of the BN might be useful to ensure the proper management of resources.  Data Availability Statement: Data was requested to the regional Andalusian government through http://www.redhidrosurmedioambiente.es/saih/ (accessed on 4 October 2021).

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: