An Enhanced Multiple Linear Regression (MLR) Model for Monthly Groundwater Level Prediction

Abstract: Groundwater level (GWL) prediction is essential for monitoring water resources. Our study introduces a novel model, convolutional neural network (CNN)–long short-term memory neural network (LSTM)–multiple linear regression (MLR), for groundwater level prediction. We combine two deep learning models with the MLR model to predict GWL and overcome the limitations of the MLR model. The current paper has several innovations. Our study develops an advanced hybrid model for predicting GWLs and presents a novel feature selection method for selecting optimal input scenarios. Finally, an advanced method is developed to examine the impact of inputs and model parameters on output uncertainty. The current paper introduces the gannet optimization algorithm (GOA) for choosing the optimal input scenario. A CNN-LSTM-MLR model (CLM), CNN, LSTM, MLR, CNN-MLR (CNM), LSTM-MLR (LSM), and CNN-LSTM (CNL) models were built to predict one-month-ahead GWLs using climate data and lagged GWL data. Output uncertainty was also decomposed into parameter uncertainty (PU) and input uncertainty (IU) using the analysis of variance (ANOVA) method. Based on our findings, the CLM model can successfully predict GWLs, reduce the uncertainty of the CNN, LSTM, and MLR models, and extract spatial and temporal features. The findings also indicate that combining linear models with deep learning models can improve the performance of linear models in predicting outcomes, and that the GOA method can contribute to feature selection and input selection. The study findings indicated that the CLM model improved the training Nash–Sutcliffe efficiency coefficient (NSE) of the CNL, LSM, CNM, LSTM, CNN, and MLR models by 6.12%, 9.12%, 12%, 18%, 22%, and 30%, respectively. The width intervals (WIs) of the CLM, CNL, LSM, and CNM models were 0.03, 0.04, 0.07, and 0.12, respectively, based on IU, and 0.05, 0.06, 0.09, and 0.14, respectively, based on PU. Our study proposes the CLM model as a reliable model for predicting GWLs in different basins.


Introduction
Groundwater plays a key role in managing water demands [1]. Groundwater is a valuable resource for reducing droughts. Groundwater level prediction is essential for sustainable water resource management [2]. Predicting groundwater levels can be used to monitor water availability during droughts [3]. Coastal areas may experience saltwater intrusion into freshwater aquifers due to excessive groundwater extraction.
Groundwater level (GWL) prediction is necessary to prevent saltwater intrusion into freshwater aquifers [3,4]. GWL prediction is complex because it is affected by many variables, including human activities and climatic conditions [5]. Human activities can affect groundwater levels. Climate parameters also influence the GWL [5]. Variables such as precipitation, temperature, humidity, and evaporation can alter groundwater levels [6].
Machine learning and numerical models can be used to predict the GWL. There are different packages and numerical models for predicting the GWL. For example, the modular groundwater flow model is a mathematical groundwater model that can predict the GWL. Numerical models have complex equations and boundary conditions [7]. The complexity of numerical models can limit their interpretation [8].
A machine learning model can predict a variety of hydrological variables [7]. Machine learning models (MLMs) can capture nonlinear components of time series data [8]. Since MLM models use different layers and operators, they can accurately predict the GWL [8]. Hydrologists look for simple and accurate models to predict hydrological variables [8].
Multiple linear regression (MLR) is a simple and accurate machine learning model. The MLR model provides a linear relationship between inputs and outputs [9]. Since the MLR model is user-friendly, scholars use it for hydrological predictions [9]. Table 1 displays a review of studies implementing the MLR model for GWL prediction.

Table 1. A review of studies implementing the MLR model for GWL prediction (Reference — Results — Discussion).

Sahoo and Jha [10] — Results: Developed the MLR model to predict groundwater level (GWL). Rainfall, river stage, and temperature were used as inputs, and different input combinations were tested. They reported that the MLR model was a reliable tool for groundwater modeling. — Discussion: The paper used the original SVM and MLR models for predicting GWL. However, the original versions of these models may not fully extract important features and accurately predict GWL. Furthermore, random input selection may not result in high accuracy.

Ebrahimi and Rajaee [11] — Results: Coupled the wavelet method with the MLR, artificial neural network (ANN), and support vector machine (SVM) models to predict GWL. The wavelet method was used to decompose time series into multiple sub-time series. Compared to other wavelets, the Db5 wavelet performed best, and the wavelet-MLR model improved the precision of the MLR model for predicting GWL. — Discussion: The wavelet preprocessed the data points. Wavelet transforms can be used to extract valuable features from time series data; these features allow predictive models to capture both high-frequency and low-frequency components.

Bahmani and Ouarda [12] — Results: Used the decision tree model, the genetic programming (GEP) model, and the MLR model for predicting GWL. Ensemble empirical mode decomposition (EEMD) was used to preprocess the data. Wavelet-GEP performed better than the other models. — Discussion: Since the MLR model lacked advanced operators for analyzing nonlinear data, its accuracy was lower than that of the GEP and EEMD-GEP models. EEMD decomposes the original signal into IMFs that capture more detailed and local features.

Poursaeid et al. [13] — Results: Used the MLR, extreme learning machine (ELM), and SVM models to predict GWL. The study concluded that the ELM model performed better than the MLR model. — Discussion: Due to the lack of robust optimizers, the models did not achieve high accuracy; robust optimizers were needed to improve the performance of the ELM, SVM, and MLR models.
The MLR model has been widely used to predict GWLs, but its main limitations have not been addressed. Studies have failed to improve the performance of the MLR model for analyzing nonlinear data. While GWL data exhibit spatial and temporal variations, MLR models have not been developed to extract spatial and temporal features. The main problem is that MLR assumptions cannot handle GWL data and nonlinear relationships.
The MLR model does not capture nonlinear components of time series data, so it cannot accurately predict the GWL [10]. MLR uses linear assumptions to predict the GWL, but inputs and outputs may have intricate and nonlinear relationships [13]. By improving the efficiency of the MLR model, we can ensure that it captures the underlying relationships between variables more effectively.
Additionally, MLR cannot effectively handle output uncertainty. Furthermore, the performance of the MLR model depends on the correct selection of inputs [10]. In recent years, researchers have used hybrid models to overcome the limitations of classical machine learning models [14]. By combining multiple models, a hybrid model improves prediction accuracy and overcomes the shortcomings of traditional models [14]. We present an innovative model for predicting GWLs that addresses the limitations of the MLR model. This new model improves the ability of the MLR model to capture nonlinear components, quantify uncertainty values, and determine the best input scenarios.
Since linear models such as MLR cannot fully extract important features from GWL data [10], our article focuses on developing robust models to predict GWLs spatially and temporally. To overcome the drawbacks of linear models, this article combines two robust deep learning models with MLR to predict GWLs spatially and temporally. Since deep learning models use advanced operators to extract spatial and temporal features, we choose them to improve the performance of the MLR model for extracting features and predicting data.
A deep learning model consists of multiple layers that analyze and predict data with high accuracy [4]. A convolutional neural network (CNN) is a deep learning algorithm that extracts valuable features from time series data [15]. The CNN model is a reliable tool for spatially analyzing GWL data and extracting important patterns [16]. The long short-term memory (LSTM) neural network is another robust tool for temporal analysis of GWL data [17]. CNN and LSTM models can be used to improve the accuracy of the MLR model by analyzing and predicting nonlinear and complex data [15,16].
Ghasemlounia et al. [18] reported that different variants of the LSTM model could improve the accuracy of GWL predictions. They indicated that the bidirectional LSTM model could successfully predict the GWL. Sheikh Khozani et al. [19] stated that LSTM models were robust tools for improving the accuracy of linear models. A linear-LSTM model was used to predict groundwater levels, and the hybrid models enhanced the linear models' performance. Afshari Nia et al. [14] stated that the CNN model played a key role in extracting spatial features from hydrological time series data.
Our study employs a new model called convolutional neural network-long short-term memory neural network-MLR (CLM) to predict GWLs. The new method uses CNNs and LSTMs to analyze GWL data spatially and temporally. As a result, the hybrid structure of the new model can improve the predictive capability of the MLR model. A hybrid model improves the ability of the MLR model to analyze nonlinear time series data. This paper also proposes an efficient method for quantifying output uncertainty, which is necessary for evaluating the accuracy of models.
To select the optimal input scenario for groundwater level prediction, the current study proposes a novel feature selection method and compares it with other methods. The main novelties of this paper are the development of a new predictive model for groundwater level prediction and the introduction of a new method for selecting the best input scenarios.
The current paper includes the following novelties:

• An advanced machine learning model is developed to predict the monthly GWL. We combine three models to predict monthly groundwater levels. As our proposed model consists of two robust deep learning models, it can be superior to linear models such as the MLR model. Since our proposed model uses the capabilities of the CNN and LSTM models simultaneously, it outperforms standalone CNN and LSTM models. The CLM model is presented as a reliable tool for predicting GWLs in different basins; since it can be adapted to different watersheds, it can be a reliable tool for water resource managers around the world.

• A new optimizer is introduced to identify the best input combination. An optimization algorithm determines the best input combination, while the correlation method only identifies important inputs. To identify the most optimal set of features, the new method performs an extensive search and explores various combinations. The new optimization method handles high-dimensional feature spaces and can be fine-tuned to maximize model accuracy, minimize error, or enhance specific evaluation metrics. As a result, our study contributes to feature selection.

• An effective method is suggested to quantify uncertainty values. This study contributes to the quantification of output uncertainty values and illustrates how model parameters and inputs contribute to model uncertainties. Through the analysis of variance (ANOVA) method, the study provides insight into the sources and values of output uncertainty.

• We use an advanced method to decompose the output uncertainty into different sources of uncertainty. Our study proposes a new method to calculate the percentage contribution of inputs and model parameters to the overall uncertainty.
This research contributes to water resource management by providing a reliable model for predicting groundwater levels. It addresses the limitations of traditional methods, introduces innovative techniques, and provides a comprehensive analysis of uncertainty and model performance.
The second section of the current study presents the materials and methods. The third, fourth, and fifth sections present the case study, results, and discussion, respectively.

MLR Model
The MLR model is a linear regression model that uses regression coefficients to determine the relationship between inputs and outputs [20]. The MLR model assumes a linear relationship between input and output data and optimizes its regression coefficients by minimizing the prediction error. Equation (1) provides the output of the MLR model [20]:

Out = β0 + β1X1 + β2X2 + … + βnXn (1)

where Out: output variable; X1: first input variable; Xn: nth input variable; and β0, β1, …, βn: regression coefficients. As the MLR model does not use advanced operators, it may not be able to predict complex phenomena accurately. Since the original MLR model was unable to analyze nonlinear data, our study developed the MLR model to overcome these limitations.
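Fitting the coefficients of Equation (1) by minimizing the prediction error amounts to ordinary least squares. A minimal sketch in NumPy (synthetic data, not the paper's dataset):

```python
import numpy as np

# Minimal sketch of the MLR model (Equation (1)): Out = b0 + b1*X1 + ... + bn*Xn.
# The data below are synthetic and only illustrate the fitting procedure.
rng = np.random.default_rng(0)
X = rng.random((100, 3))                     # e.g. lagged GWL, rainfall, temperature
true_beta = np.array([2.0, 0.5, -1.0, 0.3])  # b0..b3
y = true_beta[0] + X @ true_beta[1:]         # noise-free target for clarity

# Append a column of ones so the intercept b0 is estimated jointly.
A = np.hstack([np.ones((X.shape[0], 1)), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

y_pred = A @ beta_hat
```

Because the toy target is noise-free and exactly linear, the estimated coefficients recover the true ones; on real GWL data the residual error reflects the nonlinear components MLR cannot capture.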

Mathematical Model of the LSTM
The LSTM is a type of recurrent neural network that utilizes information gates and advanced structures to predict outputs [21]. The LSTM automatically extracts relevant features from sequences without the need for manual feature engineering. The input gate is one of the most important components of an LSTM model [22]; it controls the flow of information. The current input data and the previous hidden state are processed using the tanh (hyperbolic tangent) activation function. LSTM models use forget gates to remember and forget information at each time step [23]. Output gates monitor the flow of information between memory cells and model outputs. Equations (2)-(6) provide the outputs.
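Equations (2)-(6) are not legible in this extraction; in the standard LSTM formulation they take the following form (with σ the sigmoid function, ⊙ element-wise multiplication, W and U weight matrices, and b bias vectors):

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o), \quad h_t &= o_t \odot \tanh(c_t) && \text{(output gate and hidden state)}
\end{aligned}
```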

Mathematical Model of the CNN Model
The CNN model extracts spatial features from time series data [24]. CNNs use convolutional layers to automatically learn hierarchical features from input data. These layers consist of small filters (also called kernels) that slide over the input data and extract spatial features [25]. Convolutional layers produce feature maps. The pooling layer is the next layer of the CNN model; it reduces the spatial dimension of the feature maps, for example through average pooling [26]. The fully connected layer is the final layer of the CNN model.
CNNs usually have one or more fully connected layers (also called dense layers) that perform classification or regression tasks [15]. Before applying the LSTM model, the CNN output should be flattened into a one-dimensional vector [24].
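The convolution, pooling, and flattening steps described above can be sketched with plain NumPy on a toy series (illustrative numbers only, not the paper's trained filters):

```python
import numpy as np

# Sketch of the CNN building blocks described above: a small kernel slides over
# a 1D series (convolution), average pooling shrinks the feature map, and the
# result is flattened before being passed to the LSTM.
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
kernel = np.array([0.5, 0.5])              # a simple moving-average filter

# Valid 1D convolution: feature map length = len(series) - len(kernel) + 1
feature_map = np.array([series[i:i + 2] @ kernel for i in range(len(series) - 1)])

# Average pooling with window 2 and stride 2 halves the spatial dimension.
pooled = feature_map[: len(feature_map) // 2 * 2].reshape(-1, 2).mean(axis=1)

flat = pooled.ravel()                       # one-dimensional input for the LSTM
```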

Input Selection
Different climate parameters can change GWLs, so choosing the right input combination is important. Binary optimization algorithms are powerful tools for selecting the best input scenario. These algorithms can find the optimal input scenario by accurately searching the problem space [27]. Correlation coefficients only identify key input parameters, while binary algorithms automatically identify the best input combination. The gannet optimization algorithm (GOA) is a novel algorithm for solving optimization problems, inspired by the hunting behavior of gannets. Pan et al. [28] reported that the GOA was capable of solving mathematical functions and engineering problems and converged faster than other algorithms. Our study selected this method because it provided promising results. Binary and continuous versions of the GOA were used to select inputs and determine the LSTM and CNN parameters.

Structure of GOA
Gannets are heavy-bodied birds with slender necks. A gannet is a seabird that resides in cold-temperate regions and uses unique methods to catch fish [28]. Gannets form a straight line to catch fish, and a gannet dives below the surface of the water to capture its prey [28]. First, the initial positions of the gannets are initialized:

GH_ij = r1 × (UB_j − LB_j) + LB_j (7)

where GH_ij: location of the ith gannet in the jth dimension; UB_j and LB_j: upper and lower bounds of the decision variable, respectively; and r1: random value. Gannets follow prey using two types of diving patterns. Gannets use the U-shaped dive to reach deeper depths and V-shaped dives to rapidly enter and exit the water [28]. Equations (8)-(10) are used to simulate these dive patterns, Equations (9) and (10) define the controller parameters, and Equation (11) defines a function based on the V-shaped dive to update the locations of gannets.
where φ: the current solution; a and b: controller parameters; V(φ): the V-shaped dive function; IT: iteration number; IT_max: maximum number of iterations; and r2 and r3: random numbers. Equations (13)-(16) define the controller parameters for updating the locations of gannets. One of the two diving strategies is chosen to update each agent's location.
where GH_r(t): the random location of a gannet. Gannets pursue fish based on two scenarios. The first scenario assumes that the gannet has enough energy to chase fish, whereas the second scenario assumes that the gannet lacks the energy to chase fish [28]. To determine the energy level of gannets, a parameter called capture ability is defined. Equation (17) defines the capture ability parameter; it consists of two controller parameters, whose values are defined by Equations (19) and (20).
where r6: random number; M: mass of the gannet (2.5 kg); and VEL: velocity (1.5 m/s). When the gannet's energy level exceeds the threshold, its position is updated and the fish is captured; otherwise, the gannet moves randomly [28].
where μ and σ: random values, and ψ: a constant value. A comparison is made between the GOA and several other algorithms to evaluate its capabilities in selecting inputs and adjusting model parameters. The following algorithms were selected because they can provide accurate solutions.

Bat Algorithm (BA)
The bat optimization algorithm was inspired by bat behavior. The echolocation ability of bats helps them locate prey [29]. A bat makes loud noises and receives echoes from its surroundings to find prey [30]. The optimization process begins with the initialization of frequencies, and bats change their velocity at each iteration.

The frequency, velocity, and position of each bat are updated as follows:

fr_i = fr_min + (fr_max − fr_min) ε
VE_i^t = VE_i^(t−1) + (X_i^(t−1) − X*) fr_i
X_i^t = X_i^(t−1) + VE_i^t

where ε: a random value; fr_i: the ith frequency; fr_min: minimum frequency; fr_max: maximum frequency; VE_i^t: the ith velocity at iteration t; X_i^(t−1): the position of the ith agent at iteration t − 1; and X*: the best location of the bats. The bats perform a local search based on loudness:

X_i^new = X_i^old + ξ A^t

where X_i^new: the new location of the bat; X_i^old: the old location of the bat; A^t: loudness; and ξ: a random number. Finally, the loudness and rate of pulse emission (RAP) are updated at each iteration:

A_i^t = α A_i^(t−1)
RAP_i^t = RAP_i^0 [1 − exp(−γ t)]

where A_i^(t−1): the ith loudness at iteration t − 1; α and γ: constants; RAP_i^0: the previous value of RAP; and RAP_i^t: the new value of RAP.

Particle Swarm Optimization (PSO)
The PSO was inspired by the collective behavior of particles in a swarm. The PSO is selected for optimization and input selection due to its simple structure and high accuracy [31]. The velocity and position of each particle are updated at every iteration [32]:

VE_i^(t+1) = w VE_i^t + c1 r1 (P_i − X_i^t) + c2 r2 (G − X_i^t)
X_i^(t+1) = X_i^t + VE_i^(t+1)

where r1 and r2: random parameters; w: inertia weight; c1 and c2: acceleration coefficients; P_i: the best position found by the ith particle; and G: the global best position. The optimization process begins with the initialization of a population of particles, the velocity and position updates are applied, and the process continues until convergence is reached.

Genetic Algorithm (GA)
The GA is one of the most popular algorithms for solving optimization problems. A GA consists of a population of chromosomes [33]. Three operators, namely selection, crossover, and mutation, evolve the population. The selection operator chooses the best solutions. Crossover plays a key role in the evolution of solutions: a crossover operator combines genetic information to create new offspring [34]. The mutation operator randomly alters genes to maintain genetic diversity in the population.

Binary Version of Optimization Algorithms
Binary algorithms select inputs. Converting a continuous algorithm to a binary version is necessary for feature selection. Transfer functions convert continuous values to binary values [35]. The S-shaped function is used as the transfer function, as previous articles reported that it gave the most accurate results [36].
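The continuous-to-binary conversion can be sketched as follows; the sigmoid form of the S-shaped function and the stochastic thresholding rule are common choices in the literature, not necessarily the exact variant used in the paper:

```python
import numpy as np

# Sketch of how a continuous position is mapped to a binary input-selection
# vector with an S-shaped (sigmoid) transfer function.
def s_shaped(v):
    return 1.0 / (1.0 + np.exp(-v))

def to_binary(position, rng):
    # A bit becomes 1 when a uniform random draw falls below the sigmoid value.
    return (rng.random(position.shape) < s_shaped(position)).astype(int)

rng = np.random.default_rng(42)
position = np.array([-6.0, 6.0, 0.0])   # strongly off, strongly on, undecided
bits = to_binary(position, rng)
```

Large positive positions almost always map to 1, large negative positions to 0, and values near zero are decided by chance, which keeps the binary search stochastic.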

Hybrid CLM Model for Predicting GWL
Our study uses the CLM model to predict GWLs. The following steps are used to create the hybrid CLM model.

• The parameters of the CNN model and the names of the inputs are initialized as the initial population of the binary GOA. In the modeling process, it is important to determine the training and testing data. K-fold cross-validation (KCV) is a suitable method for partitioning input data into training and testing datasets [37]. The KCV method evaluates model accuracy by dividing the data into K folds; the model trains on K − 1 folds and is tested on the remaining fold. The model is run K times, and the average error function value is used to evaluate model performance and determine the optimal number of folds [38].
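The KCV step described above can be sketched as follows, with a simple linear model and synthetic data standing in for the paper's CLM setup:

```python
import numpy as np

# Minimal K-fold cross-validation sketch. The data are split into K folds;
# each fold serves once as the test set while the remaining K-1 folds train
# the model, and the K errors are averaged.
def kfold_rmse(X, y, K=5):
    folds = np.array_split(np.arange(len(y)), K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.hstack([folds[j] for j in range(K) if j != k])
        A_tr = np.hstack([np.ones((len(train), 1)), X[train]])
        beta, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.hstack([np.ones((len(test), 1)), X[test]])
        errors.append(np.sqrt(np.mean((A_te @ beta - y[test]) ** 2)))
    return np.mean(errors)

rng = np.random.default_rng(1)
X = rng.random((60, 2))
y = 1.0 + X @ np.array([2.0, -1.0])      # exactly linear, so CV error ~ 0
avg_rmse = kfold_rmse(X, y, K=5)
```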

• Each possible input scenario is encoded as a binary string, where each bit represents the presence (1) or absence (0) of a specific input variable. An initial population of potential solutions (binary strings) is generated; these strings represent various combinations of input variables. For example, the binary string 11111 indicates that all inputs are considered during modeling. The algorithm identifies the most influential input combinations by exploring various combinations and evaluating their performance against the defined fitness function.
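The encoding can be sketched as follows; the candidate variable names are illustrative, not the paper's full 72-variable pool:

```python
# Sketch of the binary encoding of input scenarios: each bit marks whether a
# candidate input variable enters the model. The candidate list below is a
# hypothetical subset using the paper's abbreviations.
candidates = ["GWL(t-1)", "GWL(t-2)", "RAIN(t-1)", "TEM(t-1)", "WSP(t-1)"]

def decode(bits, names):
    # "1" keeps the variable, "0" drops it.
    return [n for b, n in zip(bits, names) if b == "1"]

scenario = decode("11010", candidates)   # one candidate bit string
```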

• The CNN model is executed using the training data. Our study uses the RMSE as the fitness function; it is minimized by finding the input combinations with the smallest prediction errors. The GWL prediction model is trained and tested using the selected subset of input variables (denoted by "1" in the binary string). The model's predictions are compared with the actual GWL values, and the RMSE value is then calculated to determine the quality of the selected input combination. If a solution violates the constraints (the upper and lower values of the decision variables), the decision variables are corrected to obtain feasible solutions: when an infeasible solution is identified, boundary correction methods modify the solution. For example, if a decision variable xi violates its upper or lower bound, it is updated using Equation (35). Figure 1 shows the values of the decision variables and constraints. We benchmark the CLM model against the LSTM-MLR (LSM), CNN-MLR (CNM), CNN-LSTM (CNL), LSTM, CNN, and MLR models. In order to create the LSM, CNL, and CNM models, the input data are fed into the first model, and the outputs of the first model are fed into the second model.
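A minimal sketch of the fitness evaluation and one common boundary-correction rule (clipping to the nearest bound); Equation (35) itself is not legible in this extraction, so clipping is an assumption:

```python
import numpy as np

# RMSE scores a candidate input scenario, and decision variables that leave
# their bounds are moved back to the nearest bound (one common correction).
def rmse(observed, predicted):
    return np.sqrt(np.mean((np.asarray(observed) - np.asarray(predicted)) ** 2))

def correct_bounds(x, lower, upper):
    # Infeasible components are clipped to the feasible interval.
    return np.clip(x, lower, upper)

fitness = rmse([10.0, 12.0, 11.0], [9.0, 12.0, 13.0])     # sqrt((1+0+4)/3)
x_fixed = correct_bounds(np.array([-0.5, 0.3, 1.7]), 0.0, 1.0)
```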

Quantitation of Uncertainty Values
The input parameters are one of the most important sources of uncertainty; the model parameters are another source of uncertainty [39]. Therefore, it is important to quantify the output uncertainty based on both the input and model parameters. The normal distribution is used as the prior distribution of the input parameters. Normal distributions cannot be used as prior distributions for the model parameters, because they may not have any physical meaning [39]; instead, we study the variability of the parameters during the training process to calculate the prior distribution of the model parameters. The Monte Carlo simulation method is applied to regenerate sample parameters, including the model and input parameters. Then, the Nash-Sutcliffe model efficiency (NSE) coefficient is applied to estimate the likelihood function value. If the likelihood value of a parameter is below a threshold, the parameter is removed from the calculation cycle. Finally, the posterior distribution of the parameter is calculated based on the prior distribution and the likelihood function.
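This Monte Carlo filtering step can be sketched as a GLUE-style procedure, with a linear toy model standing in for the CLM and an illustrative threshold:

```python
import numpy as np

# Parameter samples are drawn, each sample is scored with the Nash-Sutcliffe
# efficiency (NSE) as a likelihood, and samples below a threshold are dropped.
def nse(obs, sim):
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 50)
obs = 2.0 * x + 1.0                                   # toy "observed" GWL

# Prior: normal samples of (slope, intercept) around plausible values.
samples = rng.normal(loc=[2.0, 1.0], scale=0.3, size=(500, 2))
scores = np.array([nse(obs, a * x + b) for a, b in samples])
behavioural = samples[scores > 0.5]                   # retained posterior samples
```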

Determination of Contribution of Input Parameters and Models to Output Uncertainty
In order to study the effect of input parameters and models on output uncertainty, it is necessary to decompose output uncertainty into parameter uncertainty and input uncertainty. The analysis of variance (ANOVA) method can be used to decompose output uncertainty [40]. The ANOVA method decomposes the total sum of squared errors (SSE) to determine the percentage contribution of each uncertainty factor [40]. Equations (34)-(40) explain the mathematical model of ANOVA.
where GWL_ij: predicted GWL based on the ith input sample and the jth model parameter sample. Finally, Equations (40)-(42) are used to determine the contribution of each uncertainty source to the total uncertainty, including the percentage contribution of the interaction between inputs and model parameters to the output uncertainty.
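The decomposition can be sketched as the standard two-way ANOVA identity on a matrix of predictions (synthetic numbers; the interaction term absorbs what the input and parameter effects do not explain):

```python
import numpy as np

# Rows = input samples, columns = parameter samples. The total sum of squares
# splits exactly into input, parameter, and interaction components, and each
# share of the total gives the percentage contribution to output uncertainty.
rng = np.random.default_rng(7)
pred = rng.normal(size=(20, 15))                 # toy GWL predictions

grand = pred.mean()
row_mean = pred.mean(axis=1, keepdims=True)      # input effect
col_mean = pred.mean(axis=0, keepdims=True)      # parameter effect

ss_input = pred.shape[1] * np.sum((row_mean - grand) ** 2)
ss_param = pred.shape[0] * np.sum((col_mean - grand) ** 2)
ss_inter = np.sum((pred - row_mean - col_mean + grand) ** 2)
ss_total = np.sum((pred - grand) ** 2)

shares = 100.0 * np.array([ss_input, ss_param, ss_inter]) / ss_total
```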

Trend Analysis
Water resources management requires trend detection of groundwater levels (GWLs). GWL trend detection helps determine groundwater sustainability and availability in basins. Our study uses the Innovative-Şen trend (IŞT) method for trend analysis of GWLs [41]. The time series data are split into two equal subsets, and the data of the first and second halves are plotted on the x-axis and y-axis, respectively. The slope is calculated using Equation (44):

S = 2(X̄2 − X̄1)/N (44)

where S: slope; X̄2: the average value of the second subset; X̄1: the average value of the first subset; and N: the number of data points. If S is higher than S_cr (the critical slope), the trend is significant.
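The slope of Equation (44) can be sketched directly from the definitions above:

```python
import numpy as np

# Innovative-Şen trend slope: split the series into two equal halves and
# compute S = 2 * (mean of second half - mean of first half) / N.
def sen_slope(series):
    x = np.asarray(series, dtype=float)
    n = len(x)
    first, second = x[: n // 2], x[n // 2:]
    return 2.0 * (second.mean() - first.mean()) / n

slope = sen_slope([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

For this steadily rising toy series the half-means are 2.5 and 6.5, giving S = 2 × 4 / 8 = 1, a positive (rising) trend.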

Details of Case Study
Yazd-Ardekan, located in the desert, has a dry and hot climate. Water scarcity is a serious challenge in the study area due to the dry climate and excessive groundwater consumption. To predict the groundwater level in the study area, the GWL of all piezometers is estimated. Thiessen polygons are defined by the geometric scope of each piezometer. The area of each polygon is multiplied by the groundwater level of the corresponding piezometer; this process is repeated for all piezometers, the products are summed, and the area-weighted average represents the groundwater level in the area. Figure 2a,b show the study area and time series data, and Table 2 lists the characteristics of the data points. The lagged GWL data and climate parameters are used to predict one-month-ahead GWLs. The climate parameters include wind speed (WSP), evaporation (EVA), relative humidity (REH), rainfall, and temperature (TEM). There are 1194 pumping wells in this basin, and overexploitation of the aquifer has reduced groundwater storage by 65.93 MCM. Since the study area experiences hot and dry summers, water supply is one of the real challenges for decision-makers. May, June, and July have the lowest rainfall amounts and the highest observed evaporation rates. The data of 71 piezometers, collected from 1995 to 2010, were used to predict GWLs.
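The Thiessen-polygon averaging described above can be sketched as follows (illustrative areas and levels, not the basin's 71 piezometers):

```python
import numpy as np

# Each piezometer's GWL is weighted by the area of its Thiessen polygon, and
# the weighted mean represents the basin-wide groundwater level.
areas = np.array([12.0, 8.0, 20.0])        # polygon areas (km^2), illustrative
gwl = np.array([30.0, 28.0, 32.0])         # piezometer GWLs (m), illustrative

basin_gwl = np.sum(areas * gwl) / areas.sum()
```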
Our study used different inputs, such as lagged climate and GWL data, to predict GWLs. Data augmentation involves creating modified or synthetic versions of existing data to increase a dataset's size, diversity, and richness. Feature augmentation enhances the richness of the input data and helps the model learn from the temporal dependencies and trends of the historical data. A lagged dataset consists of features that represent historical GWL measurements over different periods of time; adding lagged data can enrich the available information and enhance the model's understanding of the temporal dependencies and patterns of GWL time series data. While feature augmentation using lagged data is valuable for predicting GWLs, it does not create new data samples or artificially augment the dataset. Future studies can use augmentation methods such as window slicing, time warping, and data decomposition to improve model performance.
Our study uses Equations (46)-(53) to explore the potential of the models for predicting GWLs:

• Width interval (WI)
• Prediction interval coverage probability (PICP)

where L_i and U_i: lower and upper bound values of the variables, y_i: observed data, and R: the difference between the maximum and minimum values.
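The two uncertainty metrics can be sketched as follows; the normalization of WI by R follows the definitions above, though the exact averaging in Equations (46)-(53) is not legible in this extraction:

```python
import numpy as np

# PICP: fraction of observations falling inside the prediction bounds.
# WI: mean bound width normalized by the observed range R (assumed form).
def picp(obs, lower, upper):
    obs = np.asarray(obs)
    return np.mean((obs >= lower) & (obs <= upper))

def width_interval(obs, lower, upper):
    R = np.max(obs) - np.min(obs)
    return np.mean(np.asarray(upper) - np.asarray(lower)) / R

obs = np.array([10.0, 12.0, 14.0, 16.0])
lower = obs - 1.0
upper = obs + 1.0
upper[1] = 11.5                             # second bound misses its observation

coverage = picp(obs, lower, upper)          # 3 of 4 points covered
wi = width_interval(obs, lower, upper)
```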

Error Function for Evaluating the Performance of Optimization Algorithms in Choosing Inputs
There is a good balance between the exploration and exploitation capabilities of an ideal algorithm. Equations (58) and (59) are used to compute the dimension-wise diversity, and Equations (60) and (61) are used to compute the percentages of exploration and exploitation, where div_max: maximum diversity. The average selection size (ASS) index is used to evaluate the performance of each optimizer in selecting the best input scenario. The ideal algorithm selects the minimum number of inputs that guarantees the highest accuracy for predicting GWL [42].
where M: the number of runs of the optimization algorithms (M = 1000); NU: the number of total inputs; so_i: the size of the chosen feature subset at each run; and SD: standard deviation.
Low values of U95 and ASS indicate the best algorithm for choosing inputs. We couple the different algorithms with the CLM model to determine the optimal input scenarios and model parameters for predicting GWLs. The algorithms send a chosen input scenario to the CLM model to predict GWLs. Then, we compute the U95 of the optimization algorithms to assess their performance in choosing inputs.
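The ASS index can be sketched as the mean fraction of selected inputs across runs (an assumed form consistent with the definitions above):

```python
import numpy as np

# Average selection size: the mean fraction of candidate inputs an optimizer
# selects across its runs. Smaller values mean more parsimonious scenarios.
def ass(selected_sizes, total_inputs):
    sizes = np.asarray(selected_sizes, dtype=float)
    return np.mean(sizes / total_inputs)

# e.g. over five runs the optimizer kept 6, 5, 7, 6, and 6 of 72 inputs
score = ass([6, 5, 7, 6, 6], 72)
```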

Determination of Algorithm Parameters
Random parameters affect the performance of algorithms. The maximum number of iterations (MNOI) is one of the important random parameters of the algorithms, and the population size is another. Figure 3 shows the analysis used to select these parameter values. To run the models at the training and testing levels, we also need to determine the number of folds; Figure 4 shows this analysis.

Selection of the Best Optimization Algorithm for Choosing Inputs
Table 3c displays the U95 and ASS values of the various algorithms. The U95 values of the GOA, BA, PSO, and GA were 3.45, 5.75, 8.25, and 9.12, respectively. The ASS values of the GOA, BA, PSO, and GA were 0.04, 0.07, 0.08, and 0.12, respectively. The GOA had the lowest ASS and U95 values: it attained the highest accuracy while using fewer inputs to predict the GWL. Figure 5 evaluates the average exploration and exploitation percentages of the different algorithms. A good optimization algorithm should strike a balance between exploration and exploitation capabilities. The algorithms were run 1000 times. The exploration and exploitation percentages of the GOA were 51% and 49%, respectively, while those of the GA were 71% and 29%, respectively. Thus, the GOA achieved a good balance between exploration and exploitation abilities. Previous studies also showed that optimization algorithms could improve the accuracy of machine learning models [43,44]. Table 2d shows a list of the model parameter values.
Our study used climate inputs and GWL data to predict the one-month-ahead GWL. The lag times of the GWL and climate data varied from t − 1 to t − 12. Thus, there are 72 input variables ((5 climate parameters × 12 lag times) + 12 lagged GWL inputs) and 2^72 − 1 input combinations. The GOA chose the combination of GWL (t − 1), REH (t − 1), WSP (t − 1), GWL (t − 2), rainfall (t − 1), and TEM (t − 1) as the optimal input scenario for predicting GWLs. Previous studies also reported that lagged data were informative inputs for GWL prediction [45], and seasonal and monthly patterns of climate data can affect GWLs [46]. Feature selection is an important step in building accurate predictive models: it involves choosing the most relevant and informative input variables (features) from a large pool of candidates, and selecting the right features can significantly improve model performance. The GOA automates this process by searching for the combination of input features that minimizes prediction errors or maximizes a predictive performance metric (e.g., the Nash-Sutcliffe efficiency coefficient or another relevant metric used in the study). By selecting the most important features, the GOA reduces the dimensionality of the input data, which matters because too many irrelevant or redundant features can lead to overfitting and computational inefficiency. By selecting the most relevant features, the GOA also helps create more interpretable models: the optimization process determines which features produce the best prediction of groundwater levels.

Evaluation of the Performance of Predictive Models in Predicting GWL
Figure 6a shows the NSE and PBIAS values of the models for predicting GWLs. Heatmaps of the NSE and PBIAS of the different models were used to evaluate their performance. The CLM improved the training NSE of the CNL, LSM, CNM, LSTM, CNN, and MLR models by 6.12%, 9.12%, 12%, 18%, 22%, and 30%, respectively. The testing NSE values of the LSM and MLR models were 0.94 and 0.68, respectively. The testing PBIAS values of CLM, CNL, LSM, CNM, LSTM, CNN, and MLR were 7, 11, 14, 17, 21, 23, and 25, respectively. The results showed that the CLM improved the MLR model's PBIAS. Figure 6b shows scatterplots of the different models. The R2 values of the CLM, CNL, LSM, CNM, LSTM, CNN, and MLR were 0.9973, 0.9873, 0.9685, 0.9565, 0.9490, 0.9403, and 0.9351, respectively. The hybrid models improved the NSE of the LSTM, CNN, and MLR models. The CNM model improved the testing NSE values of the CNN, LSTM, and MLR models by 4.08%, 14%, and 17%, respectively. The LSTM model enhanced the testing and training NSE of the CNN model by 5.2% and 7.6%, respectively. The CNN model improved the training and testing PBIAS values of the MLR model by 8.3% and 8.0%, respectively. A Taylor diagram is a graphical tool that assesses the accuracy of a dataset or model against a reference dataset. Taylor diagrams show each dataset or model as a point on a polar coordinate system: the radial distance from the origin represents the standard deviation, the azimuth angle represents the correlation with the reference point, and the contour lines show different values of the centered root mean square error (CRMSE). Figure 6c displays a Taylor diagram comparing the accuracy of the models. The CRMSEs of CLM, CNL, LSM, CNM, LSTM, CNN, and MLR were 0.19, 0.37, 0.52, 0.72, 1.07, 1.42, and 1.77, respectively. The correlation coefficients of CLM, CNL, LSM, CNM, LSTM, CNN, and MLR were 0.99, 0.98, 0.97, 0.95, 0.93, 0.91, and 0.90, respectively.
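The NSE and PBIAS metrics used throughout this comparison can be computed directly from observed and predicted series; a minimal sketch follows (sign convention for PBIAS as in Moriasi et al., where positive values indicate average underestimation).

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency coefficient (1 indicates a perfect fit)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias; positive values indicate underestimation on average."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)
```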
Figure 7 displays the uncertainty bounds of the models based on two scenarios. The PICPs of the CLM model were 0.99 and 0.98 based on input uncertainty (IU) and parameter uncertainty (PU), respectively. The PICPs of the CNL, LSM, and CNM models were 0.97, 0.95, and 0.92 based on IU, and 0.96, 0.92, and 0.91 based on PU. Based on the IU, the WIs of the CLM, CNL, LSM, and CNM models were 0.03, 0.04, 0.07, and 0.12, respectively; based on the PU, they were 0.05, 0.06, 0.09, and 0.14, respectively. The models exhibited higher uncertainty under parameter uncertainty than under input uncertainty. The MLR and CLM models exhibited the largest and smallest uncertainty, respectively. The arrows indicate data points that do not fall within the upper and lower bounds. The combination of CNN and LSTM with MLR enhances the model's capacity to extract both local and global features from the data; by integrating the two deep learning models with the MLR model, the CLM method overcomes the limitations of the traditional MLR model. The CLM model provides narrower width intervals (WIs) for both IU and PU than the other models, indicating a higher level of confidence in its predictions. The CLM model leverages the advantages of the individual models: CNN identifies spatial patterns and features, LSTM models temporal dependencies, and MLR captures linear relationships. By combining these components, the CLM can represent both linear and nonlinear patterns. In summary, the CNN-LSTM-MLR model enhances prediction accuracy by effectively capturing spatiotemporal features, using optimal inputs, and addressing output uncertainty, and it outperforms the other models. The gannet optimization algorithm (GOA) helps identify the most relevant input features, reduces data complexity, and potentially reduces data acquisition costs, making the model more practical for real-world applications.
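The PICP and WI indices reported above are standard prediction-interval diagnostics. As a minimal sketch (assuming WI is the mean interval width normalized by the observation range, one common definition; the paper's exact normalization may differ):

```python
import numpy as np

def picp(obs, lower, upper):
    """Prediction interval coverage probability: share of observations
    falling inside the [lower, upper] uncertainty bound."""
    obs = np.asarray(obs, float)
    return np.mean((obs >= lower) & (obs <= upper))

def width_interval(obs, lower, upper):
    """Mean interval width, normalized by the observation range so that
    models on different scales are comparable."""
    obs = np.asarray(obs, float)
    return np.mean(np.asarray(upper) - np.asarray(lower)) / (obs.max() - obs.min())
```

A well-calibrated model combines a high PICP (most observations covered) with a small WI (tight bounds), which is the pattern the CLM model shows in Figure 7.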
The Wilcoxon signed-rank test can be employed to compare two machine learning models; it computes a p-value to determine statistical significance. Table 4 shows the results of the Wilcoxon signed-rank test. The results indicated that the CLM model outperformed the other models, as the p-values were lower than the 0.05 threshold; thus, H0 was rejected.
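A paired comparison of this kind can be run with `scipy.stats.wilcoxon`; the error values below are hypothetical illustrations, not the paper's data.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical absolute errors of two models on the same test points.
errors_clm = np.array([0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.10, 0.07, 0.12, 0.09])
errors_mlr = errors_clm + np.linspace(0.11, 0.20, 10)

# Paired, non-parametric test; H0: both models have the same median error.
# H0 is rejected when the p-value falls below the 0.05 threshold.
stat, p_value = wilcoxon(errors_clm, errors_mlr)
significant = p_value < 0.05
```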

Evaluation of the Accuracy of Models for Different Horizon Predictions
The CLM model was used for predicting the one-month-ahead GWL. However, it is necessary to assess the performance of the new model over different time horizons. Figure 8a shows the REL index values for different time horizons. The CLM model was used for one-, three-, and six-month-ahead GWL predictions, providing REL index values of 0.85-0.99, 0.76-0.96, and 0.69-0.96, respectively, and RES index values of 0.87-0.96, 0.82-0.90, and 0.65-0.89, respectively. The results revealed that the accuracy of the predictive model decreased as the prediction horizon increased. The system can become more complex over longer horizons, and long-term trends and patterns can change over time because of various factors, such as climate change or human intervention; all of these factors can affect model accuracy.
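One reason accuracy degrades with the horizon is that multi-step forecasts typically reuse earlier predictions as lagged inputs, so errors compound. A minimal sketch of this iterated one-step scheme (the `model` interface and lag structure here are hypothetical, not the paper's code):

```python
import numpy as np

def recursive_forecast(model, history, exog, horizon):
    """Iterated one-step forecasting: each one-month-ahead prediction is
    fed back as a lagged GWL input for the next step, so errors compound
    as the horizon grows."""
    history = list(history)
    preds = []
    for h in range(horizon):
        # Input vector: [GWL(t-2), GWL(t-1)] followed by climate inputs for step h.
        x = np.r_[history[-2:], exog[h]]
        y_hat = model(x)
        preds.append(y_hat)
        history.append(y_hat)   # prediction becomes the next lagged input
    return preds
```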

The Contribution of Input and Parameter Uncertainty to the Output Uncertainty
The ANOVA method was utilized to determine the impact of inputs and model parameters on the total uncertainty. Figure 8b illustrates the relative contribution of inputs and model parameters to the total uncertainty. The total uncertainty of the CLM model consisted of 46% parameter uncertainty, 34% interaction uncertainty, and 20% input uncertainty. The MLR output uncertainty consisted of 60% parameter uncertainty, 20% input uncertainty, and 20% interaction uncertainty. The total uncertainty of the models was significantly influenced by parameter uncertainty.
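The percentage contributions follow the ANOVA idea that each effect's share equals its sum of squares divided by the total (eta-squared); a minimal sketch of that final step, assuming the three sums of squares have already been computed from the model runs:

```python
def uncertainty_shares(ss_input, ss_param, ss_interaction):
    """Split total output variance into percentage contributions of input
    uncertainty, parameter uncertainty, and their interaction, using
    eta-squared = SS_effect / SS_total."""
    total = ss_input + ss_param + ss_interaction
    return {
        "input": 100.0 * ss_input / total,
        "parameter": 100.0 * ss_param / total,
        "interaction": 100.0 * ss_interaction / total,
    }
```

With sums of squares in the ratio 20:46:34, this reproduces the CLM decomposition reported above (20% input, 46% parameter, 34% interaction).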

Memory Usage Percentage of Models
The previous sections reported that the CLM model was more accurate than the other models. However, it is also necessary to evaluate the computational cost of the models. Figure 9 displays the percentage of memory usage of the different models. The memory usage percentages of the CLM, CNL, LSM, CNM, LSTM, CNN, and MLR models were 42.9%, 39%, 37%, 35%, 30%, 27.25%, and 20%, respectively. Although the CLM model requires more memory than the other models, advanced systems and memory can be utilized to run it for predicting outputs.

Trend Detection
Figure 10 shows the S values of the models. All months have negative trends, indicating a decreasing groundwater level. Summer months had lower S values than other months: June, July, and August had S values of −0.04 to −0.12, −0.03 to −0.12, and −0.02 to −0.11, respectively. January had the least negative S values, varying from −0.02 to −0.04. Winter months had higher S values than other months; the SCr denotes the critical S values. Climate and rainfall patterns can improve the GWL in the winter months. Although the groundwater level showed a decreasing trend, different techniques can be used for groundwater recovery. Percolation ponds or basins can be constructed to allow surface water to infiltrate into the ground, and advanced monitoring technologies such as groundwater level sensors and remote sensing can be used to better understand groundwater dynamics.
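Assuming the S values come from a Mann-Kendall-type trend test (the standard choice for monthly hydrological series; the paper's normalization of S is not shown here), the raw statistic is the sum of signs of all pairwise differences:

```python
import numpy as np

def mann_kendall_s(x):
    """Raw Mann-Kendall S statistic: sum of the signs of all pairwise
    differences x[j] - x[i] for j > i; negative values indicate a
    decreasing trend."""
    x = np.asarray(x, float)
    s = 0.0
    for i in range(len(x) - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()
    return s
```

For a monotonically decreasing series every pair contributes −1, giving the most negative S possible for that length.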
Data availability is one of the limitations of the study, as some climate variables may not be available. In addition, advanced computer systems are required to run the models, and a modeler may not have access to such systems. Adjusting the model and algorithm parameters can also be challenging, as these parameters influence output accuracy. Although the suggested models can predict GWLs accurately, they may exhibit high uncertainty because of their many parameters.

Conclusions
Groundwater plays a key role in meeting irrigation and drinking water needs, and groundwater level prediction supports better water resource planning. Our study introduced a new model to predict the one-month-ahead GWL. The CNN-LSTM-MLR (CLM) model was developed to enhance the efficiency of the MLR model for predicting GWL; the CNN and LSTM models help the MLR model extract features and predict GWL accurately. Different optimizers were used to determine the best input scenarios. GOA outperformed the other algorithms and successfully chose the inputs for predicting the GWL: the U95 values of GOA, BA, PSO, and GA were 3.45, 5.75, 8.25, and 9.12, respectively, and the ASS of the GOA was lower than that of the other algorithms. The results indicated that the CLM improved the training NSE of the CNL, LSM, CNM, LSTM, CNN, and MLR models by 6.12%, 9.12%, 12%, 18%, 22%, and 30%, respectively. The uncertainty analysis indicated that the CLM model was the most reliable model for predicting GWL. The input parameters had less uncertainty than the model parameters: the total uncertainty of the CLM model consisted of 46% parameter uncertainty, 34% interaction uncertainty, and 20% input uncertainty. Long prediction horizons decreased the accuracy of the CLM model. Future studies can use the CLM to predict GWL under different water stress conditions, combine climate scenarios and models with the CLM model to predict GWL for future periods, and utilize the GOA algorithm as a robust feature selection method to determine optimal input scenarios and enhance the performance of deep learning models.

where r4 and r5: random numbers; m1: a random number between −a and a; n1: a random number between −b and b; υ: a constant value (υ = 0.20); X_best: the best location of the gannet; Levy(Dim): the Lévy flight function; and X_{m,j}: the jth solution in the mth position.
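The Levy(Dim) term above denotes Lévy-distributed step sizes, which swarm optimizers commonly generate with Mantegna's algorithm; a minimal sketch (the paper's exact parameterization, e.g. the value of beta, is assumed here):

```python
import numpy as np
from math import gamma, sin, pi

def levy(dim, beta=1.5, rng=None):
    """Mantegna's algorithm for Levy-distributed step sizes: heavy-tailed
    random steps that let an optimizer make occasional long jumps."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)   # numerator: normal with Mantegna's sigma
    v = rng.normal(0.0, 1.0, dim)     # denominator: standard normal
    return u / np.abs(v) ** (1 / beta)
```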

Figure 1. Values of decision variables and constraints.
If the current value of xi is below the lower bound, Equation (35) uses the maximum of xi and the lower bound to correct it. If the current value of xi exceeds the upper bound, Equation (35) uses the minimum of xi and the upper bound to correct it. We benchmark the CLM model against the LSTM-MLR (LSM), CNN-MLR (CNM), CNN-LSTM (CNL), LSTM, CNN, and MLR models. To create the LSM, CNL, and CNM models, the input data are fed into the first model, and the outputs of the first model are then fed into the second model.
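The boundary correction described for Equation (35) is an elementwise clamp, which can be sketched as:

```python
import numpy as np

def clip_to_bounds(x, lower, upper):
    """Boundary handling as described for Equation (35): values below the
    lower bound are raised to it (elementwise max), then values above the
    upper bound are lowered to it (elementwise min)."""
    return np.minimum(np.maximum(x, lower), upper)
```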
where GWL_{i,0}: the predicted GWL based on the ith input; GWL_{0,j}: the predicted GWL based on the jth parameter; GWL_o: the mean monthly GWL; N: the number of input parameters; M: the number of model parameters; SS(IN_PA): the SSE based on the interaction between the input parameters and model parameters; SS(IN): the SSE based on the input parameters; and SS(PA): the SSE based on the model parameters.

where η²_IN: the percentage contribution of the input parameters to the output uncertainty; η²_PA: the percentage contribution of the model parameters to the output uncertainty; and η²_IN_PA: the percentage contribution of the input-parameter interaction to the output uncertainty.
where GW_ob: the observed data; N: the number of data points; and δ: a constant value (δ = 0.20, based on the Chinese standard). Equations (55)-(57) are applied to quantify uncertainty values.
where s_{i,j}: the ith solution in the jth dimension; median(s_j): the median value of the jth dimension; and D: the number of dimensions.

Figure 3. Determination of the population size and the maximum number of iterations of the algorithms.
Table 3. (a) Determination of optimal values of BA parameters; (b) determination of optimal values of PSO parameters; (c) U95 and ASS values of different algorithms; and (d) optimal values of model parameters.

Figure 4. Determination of the number of folds for the K-fold cross-validation (KCV) method.

Figure 5. The average exploration and exploitation percentages of different algorithms.

Figure 6. (a) Assessment of the efficiency of models based on NSE and PBIAS values; (b) scatterplots of models; and (c) Taylor diagram for assessment of the precision of models.

Figure 7. Uncertainty bounds of models for predicting GWL.

Figure 8. (a) Evaluation of the accuracy of models for predicting different horizons; and (b) the percentage contribution of uncertainty sources to output uncertainty.

Figure 9. Memory usage of models.

Figure 10. Trend of GWL in the basin.

Table 1. Summary of studies employing multiple linear regression for groundwater level prediction.

Table 2. Characteristics of monthly data.

Table 4. Results of the Wilcoxon signed-rank test.