1. Introduction
One of the most important sources of water supply for industrial, drinking, and irrigation purposes is groundwater (GW). GW plays a significant role in economic development, environmental management, and ecosystem sustainability [1,2]. However, in recent years overexploitation has put tremendous pressure on GW resources, resulting in a GW crisis [3]. As a result, the GW level (GWL) in different regions of the world has been declining rapidly. Furthermore, widespread pollution of surface water is severely affecting GW. A decline in GWL can also be caused by climatic factors and can lead to a number of eco-environmental problems [4]. For proper water resources management, particularly the effective utilization and sustainable management of groundwater resources, accurate and reliable prediction of GWL is essential [5,6]. Thus, predicting the Ardebil groundwater level is necessary for water resources management. Mathematical models incorporating GW dynamics are applied to predict GWL for optimizing groundwater use, optimal management, and the development of conservation plans [5,7]. Since such models are costly, time-consuming, and data-intensive, their use in practice is limited by data scarcity [8,9]. In such cases, when geological and hydro-geological data are insufficient, soft computing models become an attractive option [10]. Artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), genetic programming (GP), support vector machines (SVM), and decision tree models are among the important soft computing models suited for modeling dynamic and uncertain nonlinear systems [7].
Recently, soft computing models have been widely used worldwide to predict GWL. Jalalkamali et al. [11] evaluated neuro-fuzzy (NF) and ANN models to estimate GWL using rainfall, air temperature, and GWLs in neighboring wells, and showed that the NF model performed better than the ANN model. Identifying the lag time of the observed rainfall time series by correlation analysis, Trichakis et al. [12] used the ANN model to predict GWL and found it useful for modeling karst aquifers that are difficult to simulate with numerical models. Using evaporation, rainfall, and water levels in observation wells as input, Fallah-Mehdipour et al. [13] applied ANFIS and genetic programming models for predicting GWL and showed that GP decreased the root mean square error (RMSE) compared to the RMSE of ANFIS. Moosavi et al. [14] evaluated the ANN, ANFIS-wavelet, and ANN-wavelet models and showed that predicted GWLs were more accurate for 1 and 2 months ahead than for 3 and 4 months ahead. Predicting GWL in the Bastam plain with ANFIS and ANN models, Emamgholizadeh et al. [15] confirmed that if the water shortage of the aquifer remained equal to the pumping rate from the wells, the minimum reduction of GWL occurred. Suryanarayana et al. [16] proposed a hybrid model integrating the SVM model with the wavelet transform and indicated that the SVM-wavelet model was more accurate in predicting GWL. Using rainfall, pan evaporation, and river stage as input, Mohanty et al. [17] indicated that the ANN model performed better at shorter lead times for GWL prediction than at longer lead times. Yoon et al. [18] demonstrated that the SVM model was superior to the ANN model in predicting GWL. Zhou et al. [19] found that the wavelet-SVM model was better than the wavelet-ANN model for modelling GWL. Comparing ANN and autoregressive integrated moving average (ARIMA) models, Choubin and Malekian [20] showed that the ARIMA model was more accurate than ANN in modelling GWL. Das et al. [21] found ANFIS to be better than ANN for predicting GWL.
The literature review shows that although soft computing models are capable of predicting groundwater level, they have weaknesses and uncertainties [22]. ANN models have different parameters, such as connection weights and biases, and need training algorithms to fine-tune these parameters. ANFIS and SVM models have nonlinear and linear parameters and use different kinds of training algorithms, such as the backpropagation algorithm, the gradient descent method, etc. However, the standard training algorithms have two major defects: slow convergence and getting trapped in local optima [22]. Recently, nature-inspired optimization algorithms have been developed to find appropriate values of the model parameters and so improve the ANN, ANFIS, and SVM models. Jalalkamali and Jalalkamali [23] applied a hybrid model of ANN and genetic algorithm (ANN-GA) to find the best number of neurons for the hidden layer and predict GWL in an individual well. Mathur [24] applied a hybrid SVM-PSO (particle swarm optimization) model for predicting GWL in the Rentachintala region of Andhra Pradesh, India, where the optimal parameters of the SVM were determined using PSO. Results showed that SVM-PSO was more accurate than the ANN, ANFIS, and ARMA models. Hosseini et al. [25] hybridized ANN and ant colony optimization (ACO) to predict the GWL in the Shabestar plain, Iran, and found that the hybrid ANN-ACO model reduced overtraining errors. Zare and Koch [26] demonstrated that the hybridized wavelet-ANFIS model was superior to other regression models in modelling GWL. Balavalikar et al. [27] found that the hybrid ANN-PSO model was better than the classical ANN model in predicting monthly GWL of the Udupi district, India. Malekzadeh et al. [28] evaluated ANN, wavelet extreme learning machine (WELM), SVM, wavelet-SVM, and wavelet-ANN models for predicting GWL and concluded that WELM was the most accurate. These studies reveal that hybrid models are more accurate and efficient than single models in predicting GWL. It can be inferred from them that meta-heuristic optimization algorithms are superior to the classical ones, but that artificial intelligence models still require uncertainty analysis.
New hybrid intelligent optimization models can be regarded as appropriate alternative methods with an acceptable range of error for predicting GWL. Among the nature-inspired optimization algorithms, the grasshopper optimization algorithm (GOA) is a novel and robust meta-heuristic method that mimics the swarming behavior of grasshoppers in nature. The GOA is a population-based (multi-solution) algorithm that avoids local optima during the optimization process and has a high convergence ability toward the optimum [29]. Its mechanisms differ from those of other optimization algorithms, enabling it to find the best solution in the search space with high probability. Therefore, this algorithm escapes from local optima and finds the global optimum in the search space. This capability is considered an advantage of GOA [30] and is the reason for its selection in the current study. Several researchers have used GOA for predicting monthly river flow [31], the soil compression coefficient [32], the coefficients of sediment rating curves [33], and concrete slump [34], but its uncertainty analysis and its application to GWL modeling have not yet been studied.
These models are robust tools for modeling many nonlinear hydrologic processes, such as rainfall-runoff, streamflow, and groundwater level, but they have some drawbacks in previous studies that are addressed in the current paper. Despite the wide application of soft computing models, few studies have investigated the capability of novel optimization algorithms, such as GOA integrated with typical predictive methods, for GWL prediction, uncertainty evaluation, and spatial variation modeling. The main problem in developing these models is the use of an appropriate training procedure. In particular, AI models tend to be very data-intensive in the training stage, and there appears to be no established methodology for the design and successful implementation of the training procedure and error minimization. Therefore, some questions about AI tools must be studied further, and important aspects such as local trapping, uncertainty analysis of results, uncertainty due to meta-heuristic optimization algorithms in training, and spatial change modelling with hybrid models must be explored. To the best of the authors' knowledge, no published papers evaluate the uncertainty of different meta-heuristic optimization algorithms hybridized with ANN, ANFIS, and SVM for groundwater level prediction. The main contribution and novelty of the present study are the comparative uncertainty analysis of the novel hybrid models and the modelling of spatial changes, with PCA used as an appropriate input selection method in regard to the uncertainty results.
State-of-the-art models, including ANN, ANFIS, and SVM, have been employed to predict GWL, but these models are easily trapped in local optima and often need longer training times. Hence, the main contribution of this study is to develop and assess the applicability of hybrid ANFIS-GOA, SVM-GOA, and ANN-GOA models for predicting monthly GWL, and the uncertainty of the results, in the Ardebil basin in Iran. The GOA method integrated with the ANN, ANFIS, and SVM models is useful for searching for the best numerical weights of neurons and bias values. The other objectives of this paper were to (1) compare the GOA with the particle swarm optimization (PSO), weed algorithm (WA), cat algorithm (CA), and genetic algorithm (GA); (2) evaluate the uncertainty of the hybridized models for predicting monthly GWL; (3) use principal component analysis to select the appropriate input combinations from time-series data up to a 12-month lag; and (4) model the spatial variation of GWL by using the hybrid intelligence model results in geospatial analysis.
2. Materials and Methods
2.1. Case Study and Data
The Ardebil plain, with an area of 990 km², is located in the northwest of Iran between latitudes 38°3′ and 38°27′ N and longitudes 47°55′ and 48°20′ E (Figure 1). The average annual rainfall is 304 mm. The hottest month in this plain is May and the driest month is July. The average annual temperature is 9 °C. In the Ardebil plain, groundwater supplies water for drinking, agricultural, and industrial purposes. There is a negative balance of about 550 million m³ in the Ardebil aquifer. The GWL decreases by 20–30 cm per year, a rapid rate of decline. The Ardebil plain has 89 villages that use groundwater for agriculture. The current condition of the GWL in the Ardebil plain has negative impacts on the farmers as its main users. In this study, the following parameters were used as input to the hybrid ANN, ANFIS, and SVM models, and principal component analysis was then used to select the best input combination up to a 12-month lag.
where H(t) is the GWL at month t, H(t−1) is the 1-month lagged H, H(t−2) is the 2-month lagged H, H(t−3) is the 3-month lagged H, and H(t−12) is the 12-month lagged H. The data of 140 months (January 2000–September 2012) were selected for the current study. A total of 80% of the data set was used for training and 20% for testing, selected randomly. Nine observation wells (wells 6, 9, 10, 24, 11, 4, 7, 8, and 1) were used to provide the spatiotemporal variation of GWL for different months. Each piezometer had 140 monthly data points, with one measurement made during each month.
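As a simple illustration of this split, the following sketch partitions a synthetic 140-month record at random into 80% training and 20% testing subsets (the series values are hypothetical stand-ins for one observation well):

```python
import random

random.seed(42)
n_months = 140  # January 2000 - September 2012

# hypothetical monthly GWL series standing in for one observation well
gwl = [1340.0 + random.gauss(0, 0.5) for _ in range(n_months)]

# random 80/20 split of the record, as described in the text
idx = list(range(n_months))
random.shuffle(idx)
n_train = int(0.8 * n_months)  # 112 months for training
train = [gwl[i] for i in idx[:n_train]]
test = [gwl[i] for i in idx[n_train:]]
print(len(train), len(test))  # prints: 112 28
```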
2.2. ANFIS Model
The ANFIS model uses fuzzy inference systems, which employ fuzzy if-then rules, to construct a predictive model. The ANFIS model has been widely used for predicting rainfall [33], temperature [34], runoff [35], evaporation [36], and sediment load [37].
Figure 1 shows the structure of the ANFIS model in the framework of the study. The square nodes and circle nodes show the adaptive and fixed nodes, respectively. The ANFIS model has five layers [38]. (1) The inputs are fuzzified in the first layer, whose nodes are adaptive. The membership grades of the inputs are the output of the first layer:
where O1,i is the output of the first layer, and μAi and μBi−2 are the fuzzy membership functions for the fuzzy sets Ai and Bi−2, respectively. The bell-shaped membership function is selected for the current study due to its smoothness and concise notation:
where a, b, and c are the premise parameters (the training algorithms obtain these parameters).
(2) The nodes of the second layer are labelled with M, which shows that they carry out a simple multiplier function. The fuzzy firing strengths wi of each rule are the output of the second layer:
(3) The nodes of the third layer are also fixed. The fuzzy strengths from the previous layer are normalized in the third layer. The sum of weight functions is used to compute the normalization factor. The normalized fuzzy strengths are the output of the third layer:
(4) The nodes of the fourth layer are adaptive, and their outputs are computed as:
where pi, qi, and ri are the consequent parameters.
(5) The single node of the fifth layer is fixed and labelled with S. This layer computes the total summation of all the incoming signals:
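For reference, the five layers can be written compactly in the standard first-order Sugeno (two-rule) formulation; this is the common textbook form, using the symbols defined above:

```latex
% Standard first-order Sugeno ANFIS layer equations (common two-rule
% textbook form; symbols follow the definitions in the text above)
\begin{align}
O^1_i &= \mu_{A_i}(x), \; i = 1,2; \qquad O^1_i = \mu_{B_{i-2}}(y), \; i = 3,4 \\
\mu_{A_i}(x) &= \frac{1}{1 + \left| (x - c_i)/a_i \right|^{2 b_i}} \\
O^2_i &= w_i = \mu_{A_i}(x)\,\mu_{B_i}(y) \\
O^3_i &= \bar{w}_i = \frac{w_i}{w_1 + w_2} \\
O^4_i &= \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right) \\
O^5 &= \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}
\end{align}
```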
In the classical training approach, a combination of the least squares and gradient descent methods is commonly used as a hybrid learning algorithm to adjust the parameters of the ANFIS model. The consequent parameters of the ANFIS model are updated by applying the least squares method in the forward pass, and in the backward pass, the gradient descent method is used for updating the premise parameters. In the hybridized schemes of this study, the consequent and premise parameters are instead tuned and adjusted by the optimization algorithms, which serve as the hybrid training scheme.
2.3. ANN Model
The artificial neural network is inspired by the behavior of biological neural systems and provides a general framework for modeling complex mechanisms. It consists of three layers, the input, hidden, and output layers, and includes processing units named neurons, which are arranged in these layers [39]. The connection weights link the neurons of preceding layers to the neurons of the following layers. The output of the middle (hidden) layer is used as the input to the following layer. The input data are received by the input layer, while the last layer generates the final output of the ANN model. The middle layers receive and transmit the input data to the connected nodes in the following layers. The weighted sum of inputs is used by the hidden neurons to produce intermediate outputs. The ANN model uses activation functions to compute the outputs of the hidden and output neurons, and bias values to offset the weighted sum of inputs to each neuron. The process of ANN modelling has two major levels: (1) preparing the network structure, and (2) adjusting the connection weights. The literature indicates that the backpropagation training algorithm is widely used in different fields, such as water engineering [40]. First, the output of the ANN model is obtained as the model's response. Next, the error between the observed and estimated values is minimized to find the weights of the model. If the output differs from the observed value, the modification of weights and biases starts in order to decrease the error. However, the backpropagation algorithm has a slow convergence rate, and to overcome this inherent weakness, meta-heuristic optimization algorithms are used in the present study.
Figure 1 shows the structure of the ANN model and its hybridization with intelligence algorithms.
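A minimal sketch of the forward computation described above, for a one-hidden-layer network with sigmoid activation (all weights, biases, and inputs are hypothetical values chosen for illustration):

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer ANN forward pass: each hidden neuron applies a
    sigmoid to its weighted input sum plus bias; the output neuron is linear."""
    hidden = []
    for w_row, b in zip(w_hidden, b_hidden):
        z = sum(wi * xi for wi, xi in zip(w_row, x)) + b
        hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    # linear output neuron: weighted sum of hidden outputs plus bias
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# two inputs (e.g., lagged GWLs), three hidden neurons, one output
y = forward([0.4, 0.7],
            w_hidden=[[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]],
            b_hidden=[0.0, 0.1, -0.1],
            w_out=[0.6, -0.3, 0.2],
            b_out=0.05)
print(round(y, 4))  # 0.2374
```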
2.4. SVM Model
The SVM model has been widely used for predicting solar radiation [41], rainfall [42], landslides [43], and drought [44]. In the SVM model, the input data are divided into testing and training samples. The selected input vector (training sample) is mapped into a high-dimensional feature space, and the optimal decision function is then generated [44].
44]. Equation (7) shows the regression estimation function of the SVM model:
where φ(x) is the nonlinear mapping function that maps the sample data (x) into an m-dimensional feature vector, b is the bias, and w is the weight vector of the independent function. w and b are computed by minimizing the following function:
where D(f) is the generalized optimal function, ‖w‖²/2 is the complexity term of the model, C is the penalty parameter, and ε is the error control parameter. Thus, the optimization problem is defined as follows:
where ξi and ξi* are the relaxation (slack) factors. Setting the partial derivatives with respect to w, b, ξi, and ξi* to 0 and using the Lagrangian equation, the optimization problem can be formulated as follows:
where K(xi, xj) is the kernel function. The most popular kernel function is the radial basis function, K(xi, xj) = exp(−γ‖xi − xj‖²), where γ is the radial basis function parameter. The SVM-based model uses the grid search algorithm (GS) to find the optimal values of the parameters C and γ. Specifically, a set of initial values is chosen for both parameters γ and C. To select γ and C using cross-validation, the available data are divided into k subsets. Each subset in turn is regarded as testing data and is assessed using the remaining k−1 training subsets. The cross-validation error is then computed from the split errors of the SVM model for different values of C and γ. Various combinations of the parameters C and γ are evaluated, and the one yielding the lowest cross-validation error is chosen and used to train the SVM model on the whole dataset. The structure of the SVM model is shown in Figure 2.
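The grid search with k-fold cross-validation can be sketched as follows; a one-parameter ridge regression stands in for the SVM (in practice one would tune C and γ of an actual SVM implementation via a library), so only the mechanics of the procedure are shown:

```python
import random, statistics

def cross_val_error(fit, param, data, k=5):
    """k-fold cross-validation: each subset serves once as the test set
    while the remaining k-1 subsets train the model."""
    rng = random.Random(0)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = fit(train, param)
        errors.append(statistics.mean((model(x) - y) ** 2 for x, y in test))
    return statistics.mean(errors)

def fit_ridge(train, C):
    """1-D ridge regression through the origin, standing in for the SVM:
    as in SVR, a larger penalty parameter C means weaker regularization."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    slope = sxy / (sxx + 1.0 / C)
    return lambda x: slope * x

# exact linear data; the grid search keeps the C with the lowest
# cross-validation error (the largest C here, since the noise-free
# data need no regularization)
data = [(float(x), 2.0 * x) for x in range(1, 21)]
grid = [0.001, 0.1, 10.0, 1000.0]
best_C = min(grid, key=lambda C: cross_val_error(fit_ridge, C, data, k=5))
print(best_C)  # 1000.0
```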
2.5. Optimization Algorithms
2.5.1. Grasshoppers Optimization Algorithm (GOA)
Grasshoppers are regarded as pests because they damage agricultural crops. They are a group of insects that can generate large swarms. The swarming behavior of grasshoppers is described mathematically by the following equation [45]:
where Xi is the position of the ith grasshopper, Si is the social interaction, Gi is the gravity force on the ith grasshopper, and Ai is the wind advection. The social interaction is simulated as follows:
where dij is the distance between the ith and jth grasshoppers, and s is a function defining the strength of the social forces. The function s is computed as s(r) = f·e^(−r/l) − e^(−r), where f is the intensity of attraction and l is the attractive length scale. The distance between grasshoppers ranges between 0 and 15. Repulsion occurs in the interval [0, 2.079]. A grasshopper at a distance of 2.079 units from another grasshopper enters the comfort zone, where neither attraction nor repulsion acts.
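With the commonly used values f = 0.5 and l = 1.5 from the original GOA paper, the zero crossing of s at about 2.079 units can be checked numerically:

```python
import math

def s(r, f=0.5, l=1.5):
    """Social-force strength between two grasshoppers (GOA):
    s(r) = f*exp(-r/l) - exp(-r), with f and l as in the original paper."""
    return f * math.exp(-r / l) - math.exp(-r)

# negative s -> repulsion, positive s -> attraction,
# s = 0 at the comfort-zone boundary (about 2.079 units)
print(s(1.0) < 0)            # short range: repulsion
print(s(4.0) > 0)            # long range: attraction
print(abs(s(2.079)) < 1e-3)  # comfort-zone boundary
```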
The G component is computed as follows:
where g is the gravitational constant and êg is a unit vector toward the center of the earth. The A component is computed as follows:
where u is a constant drift and êw is a unit vector in the direction of the wind. Finally, the new position of a grasshopper is computed using its current position, the position of the food source, and the positions of all other grasshoppers:
where N is the number of grasshoppers. However, Equation (18) cannot be directly used for optimization because the grasshoppers do not converge to a specified point. Thus, a corrected equation is used to update the grasshopper's position:
where ubd is the upper bound, lbd is the lower bound, T̂d is the value of the Dth dimension in the target (the best solution found so far), and c is a decreasing coefficient that shrinks the comfort zone, repulsion zone, and attraction zone.
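In the original GOA formulation, the corrected update described above reads as follows (the iteration index t and the bounds c_max and c_min are that paper's notation, stated here as an assumption about the exact variant used):

```latex
% Corrected GOA position update and decreasing coefficient
% (standard form; symbols as defined in the text above)
X_i^d = c \left( \sum_{\substack{j=1 \\ j \ne i}}^{N}
        c \, \frac{ub_d - lb_d}{2} \;
        s\!\left( \lvert x_j^d - x_i^d \rvert \right)
        \frac{x_j - x_i}{d_{ij}} \right) + \hat{T}_d ,
\qquad
c = c_{\max} - t \, \frac{c_{\max} - c_{\min}}{t_{\max}}
```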
Figure 2 shows the flowchart of GOA.
2.5.2. Weed Algorithm (WA)
Weeds have a very adaptive nature that makes them undesirable plants in agriculture. Figure 3 shows the flowchart of the WA [46]. The WA starts by initializing a random population of weeds in the search space: a predefined number of weeds are randomly distributed over the entire dimensional space, indicated as the solution space. The fitness of each weed is assessed by evaluating the fitness function of the optimization problem. Each agent of the current population can produce seeds within a predefined region around its own location. The number of produced seeds depends on the agent's fitness relative to the best and worst solutions in the population, as observed in Figure 3. The number of seeds is computed as follows [46]:
where fworst is the worst fitness value, fbest is the best fitness value, smin is the minimum number of seeds, smax is the maximum number of seeds, and fi is the fitness of the ith weed. The seeds are distributed randomly over the search space with zero mean and standard deviation σ. The standard deviation of the distribution of seeds varies as follows:
where itermax is the maximum number of iterations, σiter is the standard deviation at the current iteration, σfinal is the final value of the standard deviation, σinitial is the predefined initial value of the standard deviation, and n is the nonlinear modulation index. Seeds are produced by each weed and then distributed over the space. Competitive exclusion is the final step in the WA. If a weed does not generate seeds, it becomes extinct. If all the weeds generated seeds, the number of weeds would increase exponentially; therefore, the number of weeds is limited to a maximum value (Pmax). The weeds with better fitness values are allowed to reproduce, while weeds with worse fitness values are removed (see Figure 4).
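The seed-count and dispersion rules described above take the following form in the standard invasive weed optimization formulation (a reconstruction consistent with the symbols defined in the text):

```latex
% Standard IWO seed-count and dispersion equations
% (a reconstruction; symbols as defined in the text above)
S_i = \left\lfloor s_{\min} +
      \frac{f_i - f_{\text{worst}}}{f_{\text{best}} - f_{\text{worst}}}
      \left( s_{\max} - s_{\min} \right) \right\rfloor ,
\qquad
\sigma_{\text{iter}} =
      \frac{\left( \text{iter}_{\max} - \text{iter} \right)^{n}}
           {\left( \text{iter}_{\max} \right)^{n}}
      \left( \sigma_{\text{initial}} - \sigma_{\text{final}} \right)
      + \sigma_{\text{final}}
```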
2.5.3. Cat Swarm Optimization (CSO)
Recently, CSO has gained popularity among optimization algorithms because of its exploration ability and is widely used in different fields, such as wireless sensor networks [47], robotics [48], data clustering [49], and dynamic multi-objective algorithms [50]. Chu et al. [51] introduced the cat swarm algorithm.
Figure 5 shows the flowchart of CSO. The CSO uses hunting and resting skills for optimization. First, the initial population of cats is initialized randomly. The seeking mode and the tracing mode are the two important operation modes in the CSO model. The seeking mode represents the resting behavior of cats, which change their position while remaining alert; this mode is regarded as a local search for solutions. The seeking memory pool (SMP), the seeking range of the selected dimension (SRD), and the counts of dimensions to change (CDC) affect the cat's behavior. SMP denotes the number of duplicate cats, CDC indicates how many dimensions are to be mutated, and SRD denotes the change value of the chosen dimensions. In the seeking mode, cats spend most of their time resting, even though they remain alert [52]. The seeking mode includes the following steps:
Generate replicas of the cats as per SMP.
The position of each copy is updated as follows:
where Xkd(new) is the new position of the kth cat in the dth dimension, r is a random number, N is the number of cats, D is the number of dimensions, and Xjd(old) is the old position of the jth cat in the dth dimension.
Compute the objective function for all copies and choose the best objective function value (xbest) of the cat.
Substitute xj,p with the best cat if the xbest is better than xj,p in terms of the objective function value.
The hunting skill of cats is represented by the tracing mode. Cats trace the objectives with high energy by changing their locations with their own velocities. The velocity is updated as follows:
where w is the inertia weight, c1 is a constant, vjd is the velocity of the jth cat in the dth dimension, and vjd(new) is the new velocity of the jth cat. The position of cats in the tracing mode is updated as follows:
where xjd(new) is the new position of the jth cat in the dth dimension.
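The tracing-mode updates described above are commonly written as follows (a standard CSO formulation, assumed here; r is a random number in [0, 1] and x_best is the best position found so far):

```latex
% Tracing-mode velocity and position updates (standard CSO form)
v_{j,d}^{\,new} = w \, v_{j,d} + r \, c_1 \left( x_{best,d} - x_{j,d} \right),
\qquad
x_{j,d}^{\,new} = x_{j,d} + v_{j,d}^{\,new}
```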
2.5.4. Particle Swarm Optimization (PSO)
In PSO, a set of randomly generated particles search for the best adjacent solutions during optimization. The updating equations for the new position and velocity of the particles are written as [53]:
where d is the number of dimensions, w is the inertia weight, r1 and r2 are random values, c1 and c2 are the acceleration coefficients, gbest is the global best position obtained by the neighbors, and pbest is the personal best position.
The particles find the solutions of optimization problems by adjusting the position and velocity of particles. The main advantages of PSO are easy implementation and computational efficiency.
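With the symbols defined above, the standard PSO updating equations are:

```latex
% Standard PSO velocity and position updates; symbols as defined above
v_i^{t+1} = w \, v_i^{t}
          + c_1 r_1 \left( pbest_i - x_i^{t} \right)
          + c_2 r_2 \left( gbest - x_i^{t} \right),
\qquad
x_i^{t+1} = x_i^{t} + v_i^{t+1}
```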
2.5.5. Genetic Algorithm (GA)
The genetic algorithm is one of the most popular algorithms and is extensively applied to optimization problems. Each chromosome in GA is a candidate solution [19]. The genes of the chromosomes simulate the optimization variables. First, the initial population of chromosomes is randomly initialized, and the selection operator is used to select the best chromosomes for producing the next generation. Chromosomes with better fitness values have a greater chance of being chosen by the selection operator. The crossover operator is used to exchange genes between two chromosomes to produce new solutions. Finally, the mutation operator is used to cause changes in the genes, generating different solutions. If the convergence criteria are satisfied, the algorithm stops; otherwise, it runs again. A drawback of GA is that it requires a high number of iterations [20].
2.5.6. Krill Herd Algorithm (KHA)
Gandomi and Alavi [54] introduced the KHA based on the behavior of krill in nature. The KHA is widely used in different fields, such as text document clustering analysis [55] and structural seismic reliability analysis [56]. The KHA acts based on three main mechanisms: (1) motion induced by other krill, (2) foraging motion, and (3) physical diffusion. The following formulation combines the three behaviors mentioned above [54]:
where Xi is the location of the ith krill, Ni is the motion induced by other krill, Fi is the foraging motion, and Di is the physical diffusion of the ith krill. Equation (28) describes the motion induced by other individual krill:
where Nmax is the maximum induced speed, αi(local) is the local effect of the neighbors, αi(target) is the target direction effect, ωn is the inertia weight of the induced motion, and Ni(old) is the previous induced motion of the ith krill. The foraging motion can be formulated as:
where Vf is the foraging speed, βi(food) is the food attraction, βi(best) is the effect of the best fitness of the ith krill so far, and Fi(old) is the last foraging motion. The diffusion can be computed as:
where Dmax is the maximum diffusion speed and δ is a random direction vector.
Finally, the position of a krill is computed as follows:
where Xi(t + Δt) is the next location of the ith krill, Xi(t) is its current position, and Δt is a scaling constant.
Figure 5 shows the flowchart of the krill algorithm.
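The three mechanisms combine into the Lagrangian motion model of the original KHA, from which the position update follows (symbols as defined above):

```latex
% KHA Lagrangian motion model and position update
% (a reconstruction consistent with the symbol definitions in the text)
\frac{dX_i}{dt} = N_i + F_i + D_i ,
\qquad
X_i(t + \Delta t) = X_i(t) + \Delta t \, \frac{dX_i}{dt}
```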
2.6. Principal Component Analysis (PCA)
PCA is a statistical orthogonal transformation that converts a set of observations into a set of linearly uncorrelated variables (principal components). When the user has a number of candidate inputs but cannot identify the appropriate ones, PCA is used to reduce the number of inputs. The final data set should demonstrate most of the variance of the original input data through this variable reduction [57]. PCA can be explained based on the following equation [57]:
where zi denotes the principal component, ai is the related eigenvector, and xi is the input variable. The required information is obtained by solving Equation (34):
where S is the variance-covariance matrix, I is the unit matrix, and λ denotes the eigenvalues.
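A small numerical sketch of this procedure for two correlated (hypothetical) input series: the variance-covariance matrix S is built, the eigenvalues are found from the characteristic equation |S − λI| = 0, and the variance explained by the first principal component is reported:

```python
import statistics

# two correlated hypothetical input series (e.g., lagged GWLs)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]

# variance-covariance matrix S of the two variables
s11 = statistics.variance(x1)
s22 = statistics.variance(x2)
m1, m2 = statistics.mean(x1), statistics.mean(x2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (len(x1) - 1)

# eigenvalues from |S - lambda*I| = 0 (quadratic formula for the 2x2 case)
tr, det = s11 + s22, s11 * s22 - s12 * s12
disc = (tr * tr - 4 * det) ** 0.5
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

# share of the total variance carried by the first principal component
explained = lam1 / (lam1 + lam2)
print(round(explained, 3))  # 0.998
```

Here the first component captures nearly all of the variance, so a single principal component could replace both inputs.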
2.7. Taguchi Model
The random parameters of optimization algorithms are among the most important parameters affecting their outputs. Thus, determining appropriate values of the random parameters is necessary for constructing the optimization models. The Taguchi model is widely used to design the parameters of different experiments or experimental models. First, an initial level is determined for each of the random parameters of the optimization algorithms. In the Taguchi method, parameters are classified into two groups: (1) controllable and (2) uncontrollable (noise). In the Taguchi model, the parameter combination with the highest S/N (signal-to-noise) ratio is regarded as the best combination [58]:
where n is the number of data points and yi is the fitness function value obtained by the Taguchi model. For example, consider the PSO algorithm with four parameters, each with three levels. When the population size is at level 1, the acceleration coefficient is tested at levels 1, 2, and 3; similarly, the inertia coefficient is tested at levels 1, 2, and 3.
2.8. Hybrid ANN, ANFIS, and SVM Models with Optimization Algorithms
The optimization algorithms can be used as a robust training algorithm for the ANN models. The process starts with the initialization of a group of random agents (particles, chromosomes, krill, grasshoppers, weeds, or cats). The position of agents represents the ANN weights and biases. Following this level, using the initial biases and weights (i.e., the initial position of agents), the hybrid ANN-optimization algorithms are trained, and the error between the observed and estimated value is calculated. At each iteration, the calculated error is decreased by the updating of agent locations.
The model procedure in ANFIS-optimization algorithm models starts with the initialization of a set of agents (particles, chromosomes, krill, grasshoppers, weeds, or cats) and continues with the random choice of agents and finally adjusts a location for each agent. First, the ANFIS model is trained. Then, the consequent and premise parameters are optimized by the optimization algorithms. The root mean square error (RMSE) is defined as an objective function. The aim of optimization algorithms is to minimize the objective function value with finding the appropriate values of consequent and premise parameters.
In SVM, the C parameter and kernel function parameters have significant effects on the accuracy of the SVM. The random population of agents (particles, chromosomes, krill, grasshoppers, weeds, or cats) are initialized for training the SVM parameters. The RMSE is defined as an objective function. The aim of hybrid SVM-optimization algorithm models is to minimize model errors.
Figure 2 shows the developed framework of hybrid ANN, ANFIS, and SVM-optimization models for modeling groundwater level.
Thus, the model parameters are considered as decision variables for the optimization algorithms, which aim to minimize the error function to find the optimal parameter values. The PCA selects the appropriate input combinations; the hybrid and standalone models then use these input combinations, together with the optimized model parameters, to accurately forecast monthly GWL.
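The shared training pattern of all the hybrids above (initialize random agents over the model parameters, evaluate the RMSE objective, and keep better solutions) can be sketched with a toy hill-climbing stand-in; this is a generic illustration, not GOA or any specific algorithm, and the two-parameter linear model is a placeholder for the ANN/ANFIS/SVM parameters:

```python
import math, random

def rmse(params, data):
    """Objective function: root mean square error of a simple linear
    predictor whose two parameters stand in for model weights/biases."""
    a, b = params
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in data) / len(data))

def metaheuristic_train(data, n_agents=20, n_iter=300, seed=0):
    """Toy population-based training loop: candidate solutions are
    generated around the best-known position, and the RMSE objective
    is minimized.  A generic stand-in, not any specific algorithm."""
    rng = random.Random(seed)
    best = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    best_err = rmse(best, data)
    for _ in range(n_iter):
        for _ in range(n_agents):
            candidate = [p + rng.gauss(0, 0.2) for p in best]
            err = rmse(candidate, data)
            if err < best_err:  # keep only improving moves
                best, best_err = candidate, err
    return best, best_err

# fit y = 2x + 1 from ten noise-free samples
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
params, err = metaheuristic_train(data)
print(err < 0.05)  # True
```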
2.9. Uncertainty Analysis of Soft Computing Models
The input data and the inadequacy of the model structure are the sources of uncertainty. In this research, an integrated framework is developed to evaluate the input data and model structure simultaneously.
Input Data Uncertainty
The combined Bayesian uncertainty framework was used to compute the uncertainty contributed by the input data. The input error model was used to account for the uncertainty of the input data [59]:
where Ht* is the adjusted groundwater level (GWL), Ht is the observed GWL, t is the given month, and Kt is a normally distributed random variable with mean m and variance σ². For each soft computing model, m and σ² were added to the system. A dynamically dimensioned search was used to find the values of m and σ², as defined by [59].
Model Structure Uncertainty
Bayesian model averaging (BMA) is used to quantify model structure uncertainty. The posterior model probability and averaging over the best models were used to estimate the uncertainty of the models. The weighted average prediction of the target variable is computed as follows [59]:
where Fj is the point prediction of each model, ej is the noise, w is the weight vector of the models, H is the set of n observations of GWL, k is the number of models, and j is the model index. For accurate application of the BMA model, the weights and the standard deviations of the normal probability distribution functions should be estimated accurately. The log-likelihood function is used to calculate the weights and standard deviations as follows [59]:
where wML is the maximum likelihood Bayesian weight. Markov chain Monte Carlo (MCMC) simulations are used to compute the log-likelihood function. The integrated framework is defined as follows:
A number of models are selected to simulate the GWL.
The prior probability is assigned to each model.
An error input model is defined.
The posterior distribution of input error models and model parameters are obtained.
A predetermined number of GWLs for each model is provided using probabilistic parameter estimations obtained from level 2 to level 4.
The variance and weight of models are estimated.
The weights for ensemble members of models are summed to compute the weight models.
The framework is then applied to the experimental soft computing models. The following indices were used to quantify the uncertainty of the models:
where k is the number of observed data points, XU is the upper bound of the data, XL is the lower bound of the data, σx is the standard deviation of the observed data, p is the percentage of observations bracketed by the 95% prediction uncertainty (95PPU), d is the distance between the upper and lower bounds, and d̄ is the average distance between the upper and lower bounds [59,60].
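In the common 95PPU formulation, these indices are computed as follows (a reconstruction using the symbols defined above):

```latex
% Common 95PPU bracketing percentage and d-factor
% (a reconstruction; symbols as defined in the text above)
p = \frac{100}{k} \, \mathrm{count}\!\left( X_t^{L} \le X_t \le X_t^{U} \right),
\qquad
\bar{d} = \frac{1}{k} \sum_{t=1}^{k} \left( X_t^{U} - X_t^{L} \right),
\qquad
d\text{-factor} = \frac{\bar{d}}{\sigma_x}
```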
2.10. Statistical Indices for Evaluation of Different Models
In this study, the following indices were used to evaluate the performance of models:
Nash–Sutcliffe efficiency: NSE = 1 − [Σ(Ho − Ps)²] / [Σ(Ho − Hmean)²]
Percent bias (PBIAS): PBIAS = 100 × [Σ(Ho − Ps)] / [ΣHo]
where N is the number of data points, Ho is the observed value, Hmean is the mean of the observed values, and Ps is the predicted value; all sums run from i = 1 to N.
RMSE and MAE indicate a perfect match between observed and estimated values when they equal 0. The NSE indicates a perfect match when it equals 1. The best value of PBIAS is zero.
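The four indices can be computed as follows (standard definitions of RMSE, MAE, NSE, and PBIAS; the observed series is hypothetical):

```python
import math

def metrics(obs, sim):
    """Standard goodness-of-fit indices used in the text:
    RMSE, MAE, Nash-Sutcliffe efficiency (NSE), and percent bias (PBIAS)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)
    mae = sum(abs(o - s) for o, s in zip(obs, sim)) / n
    nse = 1 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / \
              sum((o - mean_obs) ** 2 for o in obs)
    pbias = 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
    return rmse, mae, nse, pbias

obs = [10.0, 12.0, 14.0, 16.0]
rmse, mae, nse, pbias = metrics(obs, obs)  # perfect prediction
print(rmse, mae, nse, pbias)  # 0.0 0.0 1.0 0.0
```

For a perfect prediction the indices reach their ideal values (RMSE = MAE = 0, NSE = 1, PBIAS = 0), matching the interpretation given above.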