A Comprehensive Review of Conventional, Machine Learning, and Deep Learning Models for Groundwater Level (GWL) Forecasting

Abstract: Groundwater level (GWL) refers to the depth of the water table or the level of water below the Earth's surface in underground formations. It is an important factor in managing and sustaining the groundwater resources that are used for drinking water, irrigation, and other purposes. Groundwater level prediction is a critical aspect of water resource management and requires accurate and efficient modelling techniques. This study reviews the most commonly used conventional numerical, machine learning, and deep learning models for predicting GWL. Significant advancements have been made in terms of prediction efficiency over the last two decades. However, while researchers have primarily focused on predicting monthly, weekly, daily, and hourly GWL, water managers and strategists require multi-year GWL simulations to take effective steps towards ensuring the sustainable supply of groundwater. In this paper, we consider a collection of state-of-the-art theories to develop and design a novel methodology and improve modelling efficiency in this field of evaluation. We examined 109 research articles published from 2008 to 2022 that investigated different modelling techniques. Finally, we concluded that machine learning and deep learning approaches are efficient for modelling GWL. Moreover, we provide possible future research directions and recommendations to enhance the accuracy of GWL prediction models and improve relevant understanding.


Introduction
Groundwater level (GWL) assessment is crucial for maintaining groundwater resources, as one-third of the world's water requirements are met through this resource [1]. Groundwater is used for domestic water supply and meets irrigation and industrial needs in some parts of the world. Excessive and unplanned extraction depletes this important resource and has become a severe issue globally, particularly in countries with surface water shortages. In response, researchers have developed different models and techniques to simulate GWL. Groundwater modeling ranges from conceptual and numerical methods to artificial intelligence (AI) models. Among numerical techniques, MODFLOW was used extensively to simulate GWL until the previous decade. However, its prediction accuracy depends mainly on the availability of extensive hydrogeological data and the physical characteristics of the aquifer [2]. To minimize the shortcomings of numerical methods, researchers have extensively employed AI models over the last decade [3]. AI models do not require the physical properties of the aquifer to simulate GWL, which makes them appealing to use. AI models include the simplest artificial neural networks (ANNs), often called multilayer perceptrons (MLPs), with two or more hidden layers. ANNs with one hidden layer, known as feed-forward neural networks (FFNNs), were the most widely used models in the early days of AI-based research in hydrological studies [4]. Since GWL time-series data are highly nonlinear and nonstationary, the capability of ANNs is confined to a limited set of variables. Therefore, the adaptive neuro-fuzzy inference system (ANFIS) was developed to analyze complex systems using a backpropagation algorithm and fuzzy logic [5]. It has been reported that AI (machine learning) models used to simulate GWL have shown better results than traditional physical and numerical models because the latter need comprehensive details of the physical properties associated with the aquifers to make a prediction [6].
However, classical machine learning models cannot learn long-term dependencies because they lack an architecture that retains prior information for future predictions. To resolve this problem, researchers investigated recurrent neural networks, including simple RNNs, GRUs, and LSTMs, together with wavelet transform pre-processing, to capture the temporal dependencies between multiscale input variables [6]. As a result, the prediction efficiency of monthly, weekly, daily, and hourly simulations improved significantly. However, less improvement in prediction accuracy, and less work in the literature, has been reported for yearly GWL simulation, even though water management requires multi-year assessments to formulate long-term strategies that keep the balance between the supply of and demand for groundwater. Shahid et al. [7] proposed advanced studies on water treatment technologies and the removal of emerging contaminants. The water consumed in homes, commercial settings, or industry goes underground and contaminates clean groundwater. Wastewater treatment also plays an essential role in water purification. Reverse osmosis technology is widely used on a massive scale for groundwater treatment [8]. Another experimental study on CO2 utilization in membrane-based water treatment systems aimed at reducing ionic precipitation on the membrane surface and successive level expansion [9].
The comparison discussed in this review aims to evaluate the performance [10] of different machine learning (ML) [11][12][13][14] and deep learning models in predicting groundwater level (GWL) [15][16][17][18]. The groundwater level is an essential indicator of the availability of freshwater resources and is closely related to various hydrological and ecological processes. Therefore, accurate groundwater level prediction is crucial for sustainable water management and resource allocation. Machine learning is a branch of artificial intelligence that focuses on developing algorithms to learn patterns from data and make predictions based on that knowledge. There are various types of machine learning models, including decision trees [19][20][21], random forests [22][23][24][25], support vector machines (SVM) [26], and artificial neural networks (ANN). On the other hand, deep learning is a subset of machine learning that focuses on developing artificial neural networks with multiple hidden layers. These deep neural networks can learn complex patterns and relationships in data, making them particularly useful for tasks such as image recognition, natural language processing, and prediction modeling.
Consequently, different machine learning and deep learning models are applied to predict groundwater levels, and their performance is compared. The comparison is based on various evaluation metrics, such as accuracy, precision, recall [27,28], mean absolute error (MAE), and R² [29]. The comparison results provide insight into the strengths and weaknesses of different models and can help researchers and practitioners choose the most appropriate model for their specific application. Overall, comparing groundwater level prediction models built with different machine learning and deep learning approaches provides valuable information for researchers and practitioners working in hydrology and water resources management.
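To make the regression metrics mentioned above concrete, the following minimal Python sketch computes MAE and R² for a pair of observed and simulated GWL series. The function names and sample values are illustrative assumptions rather than data from the reviewed studies, and R² is taken here as the coefficient of determination, which is only one of several definitions in use.

```python
def mean_absolute_error(observed, simulated):
    """Average absolute difference between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical GWL depths in metres (illustrative values only).
observed = [10.0, 12.0, 14.0, 16.0]
simulated = [11.0, 12.0, 13.0, 16.0]

mae = mean_absolute_error(observed, simulated)  # 0.5
r2 = r_squared(observed, simulated)             # 0.9
```

A lower MAE and an R² closer to 1 indicate a closer match between the simulated and observed series.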
In this paper, a collection of new theories for developing and designing a novel methodology and improving modeling efficiency is also considered. We examined the modeling techniques used in all the reviewed studies and found that machine learning and deep learning approaches are efficient enough for modeling GWL. The primary purpose of this paper is to address the following research question: how is GWL predicted? Recent research covers the different stages of groundwater level prediction. At every stage, the methods discussed in the reviewed studies are analyzed and compared based on their benefits and drawbacks. A new model, the wavelet bidirectional LSTM (W-Bi-LSTM), is proposed in this study to simulate yearly GWL.
The structure of the paper is as follows: Section 2 describes the research methodology, and Section 3 presents the groundwater and surface water data sources and their availability. Section 4 illustrates the conventional, ML-, and deep-learning-based groundwater level prediction techniques. Section 5 briefly discusses the performance evaluation of different models, and Section 6 presents the future research directions and discussion. Finally, Section 7 concludes the paper.

Methodology of the Research
In the first stage of this research, the literature on GWL forecasting was comprehensively explored and analyzed. A few major scientific research databases, such as Web of Science and Scopus, were selected to organize the search. Papers with the word "survey" or "review" in the keywords or abstract were reviewed. The majority of the papers examined dealt with GWL and were selected for citation. Only the available research papers on GWL prediction were studied and chosen for our research. The analysis of these papers shows that numerous studies are published every year. Osman et al. [30] surveyed 78 articles, and Tao et al. [31] surveyed 318 articles. As far as we know, no comprehensive study on GWL prediction using deep learning is available. These two review articles were published this year and are growing in popularity as more new research is published.
Data processing and the separation of training and testing data are not included in the analysis. As the global climate continues to change, recent GWL modeling studies have used new kinds of data and applied different methods. For this reason, it is essential to consider the latest algorithms and methods, including deep learning algorithms and hybrid algorithms, along with the proprietary processing methods applied. After the analysis of the available databases was concluded, the search equation used to retrieve the latest and most significant GWL prediction studies was defined. We explored the available databases to capture the latest trends and analyses.
After examining the online databases, 731 papers matched the search strategy. A total of 182 of these papers were rejected as duplicates, and the remaining 549 papers were screened so that those whose main objective was not GWL prediction could be excluded from this review. The analysis was then conducted on the most relevant papers. Figure 1 shows a numerical overview of GWL research using AI-based models during 2008-2022. Several papers were chosen based on specific measures.
The main objectives of this research are:
1. To discuss the conventional methodology for GWL.
2. To explore the current GWL methodologies.
We used different search keywords to find relevant studies. Set 1: "GWL" [32], "Ground-Water-level" [33], and "Groundwater Level prediction" [34]; Set 2: "Prediction", "forecasting", "Deep Learning", "analysis", "estimation". The AND operator was used between Set 1 and Set 2, and the OR operator was used between keywords within a set. Figure 2 illustrates the selection process for relevant and irrelevant papers. The database search returned 731 papers that met the search criteria. One hundred and eighty-two (182) duplicates were excluded from this analysis. After reviewing the titles and journals, 440 papers were excluded from the review because they did not meet the GWL criteria. After a thorough reading of the remaining articles, 109 articles were finally analyzed.
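The keyword combination and the screening arithmetic described above can be sketched as follows. The query syntax is a generic boolean form and an assumption on our part, since each database (e.g., Web of Science or Scopus) has its own field tags; the function name `build_query` is hypothetical.

```python
def build_query(set_1, set_2):
    """Join keywords with OR inside each set and AND between the two sets."""
    group_1 = " OR ".join(f'"{keyword}"' for keyword in set_1)
    group_2 = " OR ".join(f'"{keyword}"' for keyword in set_2)
    return f"({group_1}) AND ({group_2})"


set_1 = ["GWL", "Ground-Water-level", "Groundwater Level prediction"]
set_2 = ["Prediction", "forecasting", "Deep Learning", "analysis", "estimation"]
query = build_query(set_1, set_2)

# Screening counts as reported in the text.
retrieved = 731
duplicates = 182
excluded_after_title_review = 440
analyzed = retrieved - duplicates - excluded_after_title_review  # 109
```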
All relevant papers were selected where the GWL, the data collection time, and the research project variables were tested. Many studies have used technological and water variables to model GWL. However, some studies have considered other factors to measure GWL, such as tree-ring diameter, climatic conditions, area, change in population, duration, elevation, land use data, and paved area. The research results are classified and analyzed in the next section based on the variables used for GWL modeling. Groundwater is a primary source of water for living things around the globe. Large urban areas generate enormous demands for water and food. India, Iran, and China are the main countries in GWL research. Of the 20 countries surveyed, about half of the studies took place in India, Iran, and China. Other studies are centered on data collected from Azerbaijan, Greece, Bangladesh, Taiwan, Serbia, Slovenia, South Korea, the USA, and Canada.

Groundwater and Surface Water Data Sources and Availability
The modeling process in the groundwater-surface water (GW-SW) system is essential for understanding the interactions between these two water sources and how they impact each other. This process requires adjusting the hyperparameters of the system to ensure that the simulations produced are reliable and accurate. However, data availability can sometimes pose a challenge in modeling, particularly in small areas or basins where data may be limited. Despite this, the use of GW and SW models has increased significantly in recent years due to the availability of a growing number of regional and global datasets. Global model products and open data, which contain a large amount of environmental information, have become easily accessible and, combined with advances in remote sensing data, provide a strong foundation for developing water models. One of the advantages of these data is the ability to obtain critical structural features such as watershed boundaries, surface flow direction, and slope. This information can be extracted by processing digital elevation model (DEM) products with the help of GIS spatial analysis. MERIT DEM is a popular product in this field; it is a worldwide map with a resolution of approximately 90 m. It was developed from existing DEMs. Numerous error components, such as stripe noise, speckle noise, tree height bias, and absolute bias, have been removed to provide an unbiased representation of terrain elevation [35].
In conclusion, the modelling process in the GW-SW system is essential for understanding the interactions between these two water sources. Although data availability can sometimes pose a challenge, the use of GW and SW models has increased dramatically in recent years due to the availability of open data and global model products, which provide a solid foundation for building water models. The processing of DEM products allows for the easy extraction of critical morphological features, such as surface flow direction, watershed boundaries, and slope, making it an indispensable tool in this field.
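As an illustration of how flow direction and slope can be derived from a DEM grid, the sketch below applies the D8 (eight-direction) rule, a common GIS convention in which each cell drains to its steepest downslope neighbour. The reviewed text does not state which algorithm is used, so D8, the function name `d8_direction`, and the small elevation grid are assumptions made purely for illustration.

```python
import math

# Offsets (row, col) of the eight neighbours of a grid cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_direction(dem, row, col, cell_size=1.0):
    """Return ((d_row, d_col), slope) for the steepest downslope neighbour.

    The offset is None and the slope is 0.0 when no neighbour is lower
    (a pit or a flat cell).
    """
    best_offset, best_slope = None, 0.0
    for d_row, d_col in NEIGHBOURS:
        r, c = row + d_row, col + d_col
        if not (0 <= r < len(dem) and 0 <= c < len(dem[0])):
            continue  # neighbour lies outside the grid
        # Diagonal neighbours are farther away than cardinal ones.
        distance = cell_size * math.hypot(d_row, d_col)
        slope = (dem[row][col] - dem[r][c]) / distance
        if slope > best_slope:
            best_offset, best_slope = (d_row, d_col), slope
    return best_offset, best_slope


# Hypothetical 3 x 3 elevation grid in metres.
dem = [[9.0, 9.0, 9.0],
       [9.0, 5.0, 2.0],
       [9.0, 9.0, 1.0]]

offset, slope = d8_direction(dem, 1, 1)  # drains east: (0, 1), slope 3.0
```

Note that the diagonal neighbour is lower in absolute terms (1 m versus 2 m), but the eastern neighbour wins because the drop is divided by the travel distance.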
In any case, a DEM with a finer resolution designed for a specific region or nation can also be obtained from light detection and ranging (LiDAR) products or by spatially interpolating point elevations. Spatially distributed soil properties, such as texture (proportion of clay, sand, and silt), organic matter, porosity, bulk density, and hydraulic conductivity, can greatly affect the modeling results, particularly in the surface water (SW) component. These properties play a crucial role in determining soil quality and infiltration capacity [36]. The coherent world soil catalog provides a global distribution of soil characteristics [37]. Additionally, the World Soil Information Service (WoSIS) offers access to over 196,000 soil profiles [38]. This dataset contains standardized soil information that is well suited to soil mapping and Earth system modeling. Hydrological modeling can be affected by a lack of climate data, so many databases have been created to offer high-quality meteorological data. One of these databases is the Climate Forecast System Reanalysis (CFSR) [39], which provides global meteorological information for 36 years at a resolution of less than 1 degree, allowing for detailed historical data analysis. Furthermore, the CORDEX program under the World Climate Research Program (www.euro-cordex.net) provides a platform for the compilation of comprehensive climate data at the continental level, both for historical periods and for future projections. These data are commonly utilized in water modeling [40,41]. Obtaining information about subsurface properties, such as hydraulic conductivity and porosity, typically requires permeability tests, which can be both expensive and time-consuming. An alternative is version 2.0 of the Global Hydrogeology Maps (GLHYMPS) of porosity and permeability [42]. The Copernicus Land Monitoring Service provides information about the spatial distribution and changes in land cover on a continental scale through its CORINE Land Cover (CLC) product, covering the period from 1990 to 2018. In order to obtain accurate results, a multi-constraint measurement is frequently necessary. The availability and use of appropriate data for model validation and measurement is still the crucial issue that determines the effectiveness of the model. To validate and calibrate surface models, various studies have assessed the effectiveness of Moderate Resolution Imaging Spectroradiometer (MODIS) products, with encouraging outcomes. Data related to soil water content, snow cover, the Normalized Difference Vegetation Index (NDVI), and evapotranspiration can be obtained using the AppEEARS interface [43].
In addition to these open datasets, a number of modeling products have become available over the past decade. Two global hydrological models that have been developed are PCR-GLOBWB v2.0 [44] and WaterGAP v2.2d [45], which aim to quantify human use of surface water and groundwater, along with storage, water flows, and resources, at the global level. They can also output post-processed products, such as spatiotemporal groundwater recharge and river flow volume. However, their main limitation is their low spatial resolution. It is important to note that while these globally available datasets can be useful, they should be used with caution, as they may contain errors and inconsistencies that can result in inaccurate simulations. An estimate of the share of groundwater obtained by simulating flood hydrographs with two different temporal rainfall distributions is presented in [46]. Table 1 shows the various datasets available for use in modeling parameters and their prediction possibilities. Table 2 shows the various datasets available for use in modeling parameters and their corresponding links.

Groundwater Level Prediction Techniques
Forecasting uses recent and past data to predict future values. This review treats the evaluation of GWL as a regression problem, for which researchers have investigated different model types: SVM, ANN, DT, ANFIS, GP, hybrid, and genetic models. An additional type (O) was created for new algorithms that do not fit any of the former categories. ANN [47] methods are the most frequently used technique in GWL forecasting, and the number of ANN-based studies increases every year. Figure 3 shows the groundwater prediction process.
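Framing GWL forecasting as a regression problem usually means turning a time series into pairs of past values (inputs) and a future value (target). The sketch below shows one simple way to do this with a sliding window; the function name, the lag count, and the sample series are hypothetical and are not taken from any reviewed study.

```python
def make_lagged_samples(series, n_lags, lead=1):
    """Build (inputs, target) pairs from a time series.

    Each input holds `n_lags` consecutive past values, and the target is
    the value `lead` steps after the last input value.
    """
    samples = []
    last_start = len(series) - n_lags - lead + 1
    for start in range(last_start):
        inputs = series[start:start + n_lags]
        target = series[start + n_lags + lead - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical monthly GWL depths in metres.
gwl = [1.0, 2.0, 3.0, 4.0, 5.0]
pairs = make_lagged_samples(gwl, n_lags=2, lead=1)
# pairs == [([1.0, 2.0], 3.0), ([2.0, 3.0], 4.0), ([3.0, 4.0], 5.0)]
```

Any of the regression models discussed in this section could then be fitted to such pairs, with additional variables such as rainfall or temperature appended to the inputs.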

Physically Based Numerical Method-MODFLOW
Physically based numerical models remain the best methods for studying the characteristics of groundwater because they build on comprehensive details of the physical properties of the aquifer. Among the different physically based numerical models, MODFLOW is the most used model in the literature; it models groundwater movement in three dimensions using finite differences. Until the last decade, MODFLOW was used extensively, especially where sufficient data were available. Depending on the problem, several approaches have been designed for MODFLOW: the head-oriented approach (HOA) is used to determine the three-dimensional flow of groundwater, while the velocity-oriented approach (VOA) is useful for computing the velocity of flowing groundwater [48]. However, certain steps are needed to formulate such a model, namely grid design, boundary setting, time-step definition, and the selection of hydrologic and aquifer characteristic variables. Shukla and Singh [49] calibrated MODFLOW in Uttar Pradesh, India, to simulate groundwater levels. Data consisting mostly of water levels collected between 2005 and 2013 were used in the study. In addition, the impact of pumping and recharge rates on groundwater levels was studied, and the study aimed to predict groundwater levels five years ahead. The results showed a declining trend in groundwater levels in the region.
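To give a feel for the finite-difference idea behind such models, the sketch below solves a strongly simplified case: steady one-dimensional flow, T d²h/dx² + R = 0, with fixed heads at both ends, using Gauss-Seidel iteration. This is not MODFLOW and does not use its API; the function name, the parameters, and the values are assumptions chosen for illustration only.

```python
def steady_heads_1d(h_left, h_right, n_cells, recharge=0.0,
                    transmissivity=1.0, dx=1.0, tol=1e-10, max_iter=100000):
    """Heads at n_cells interior nodes between two fixed-head boundaries.

    Discretised equation at node i:
        h[i-1] - 2*h[i] + h[i+1] = -recharge * dx**2 / transmissivity
    """
    heads = [0.5 * (h_left + h_right)] * n_cells  # initial guess
    source = recharge * dx * dx / transmissivity
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n_cells):
            left = heads[i - 1] if i > 0 else h_left
            right = heads[i + 1] if i < n_cells - 1 else h_right
            new_head = 0.5 * (left + right + source)
            max_change = max(max_change, abs(new_head - heads[i]))
            heads[i] = new_head
        if max_change < tol:
            break  # converged
    return heads


# Without recharge the heads fall linearly from 10 m to 4 m,
# so the result is approximately [9, 8, 7, 6, 5].
heads = steady_heads_1d(h_left=10.0, h_right=4.0, n_cells=5)
```

The grid spacing, boundary heads, and source term in this sketch correspond loosely to the grid design, boundary setting, and hydrologic variables listed above; a real model adds further dimensions, time stepping, and spatially variable aquifer properties.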

Machine Learning-Artificial Neural Networks (ANN)
An ANN is a computational representation of a mathematical model inspired by the biological network of the human brain. An ANN consists of simple elements called neurons that operate in parallel [50]. ANNs are used to approximate unknown functions or to predict future values of a given time series based on historical data. The most basic ANN is a three-layer structure with input, hidden, and output layers [51]. In a classical FFNN, the input layer introduces the data into the network and the output layer computes the desired outcome. The hidden layer nodes, which are situated between the input and output layers, receive a set of scaled inputs and calculate an output after applying an activation function [52].
A sample dataset is used to train the ANN model. Training is the process of fine-tuning the network's adjustable parameters (known as weights and biases) to optimize the output of the algorithm. The Levenberg-Marquardt (LM) algorithm, the backpropagation (BP) algorithm, the Bayesian regularization (BR) algorithm, and the gradient descent with momentum and adaptive learning rate back-propagation (GDX) algorithm are some of the learning algorithms that have been employed to train models in the literature. Feedforward neural networks (FFNNs), usually known as multilayer perceptrons (MLPs), are a popular and robust type of ANN that has been widely used in hydrological studies [53]. Figure 4 shows the different kinds of data used for predicting GWL.
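The three-layer structure and the training process described above can be sketched with a small network trained by plain gradient-descent backpropagation. This is a deliberately simple stand-in and not the LM, BR, or GDX algorithm; the tanh activation, the network size, the learning rate, and the sample data are all assumptions made for illustration.

```python
import math
import random


def forward(x, params):
    """Forward pass: scaled inputs -> tanh hidden layer -> linear output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, output


def train(samples, n_hidden=4, learning_rate=0.05, epochs=500, seed=0):
    """Fit weights and biases by per-sample gradient descent on squared error."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(x, (w_hidden, b_hidden, w_out, b_out))
            error = output - target  # derivative of 0.5 * error**2
            for j in range(n_hidden):
                # Backpropagate through the tanh activation of hidden node j.
                grad_pre = error * w_out[j] * (1.0 - hidden[j] ** 2)
                w_out[j] -= learning_rate * error * hidden[j]
                b_hidden[j] -= learning_rate * grad_pre
                for i in range(n_inputs):
                    w_hidden[j][i] -= learning_rate * grad_pre * x[i]
            b_out -= learning_rate * error
    return w_hidden, b_hidden, w_out, b_out


def mean_squared_error(samples, params):
    return sum((forward(x, params)[1] - t) ** 2 for x, t in samples) / len(samples)


# Hypothetical scaled inputs (e.g., rainfall, previous GWL) and GWL targets in metres.
samples = [([0.1, 0.9], 2.2), ([0.4, 0.6], 2.5),
           ([0.7, 0.3], 2.8), ([0.9, 0.2], 2.9)]

untrained = train(samples, epochs=0)   # random initial parameters
trained = train(samples, epochs=500)   # same start, after training
```

Comparing `mean_squared_error(samples, untrained)` with `mean_squared_error(samples, trained)` shows how tuning the weights and biases reduces the error on the training data.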

ANNs have been widely used in hydrology, hydraulics, rainfall-runoff estimation, and groundwater level and quality forecasting [54][55][56]. Recent GWL modeling studies have reported that ANN simulations show promising results compared to conceptual techniques. In one of the first studies, Lallahem et al. [4] used ANNs to simulate the monthly groundwater level (GWL) of an aquifer. Inputs included evapotranspiration, average temperature, precipitation, rainfall, and the GWL at the previous lag of 13 piezometers, and the primary objective was to predict GWL for a specific piezometer in northern France. The simulation results demonstrated the advantage of the multilayer perceptron (MLP). Krishna et al. [57] compared several types of FFNNs to simulate the monthly GWL in an urban aquifer in Andhra Pradesh, India. The results revealed the merit of an ANN trained with the LM algorithm compared to the BP and BR algorithms. Moreover, the parameters of the best-performing network model were used to predict the GWL in nearby wells.
Sreekanth et al. [5] developed ANFIS and an FFNN with the LM algorithm to estimate GWL for India's Maheshwaram watershed. The input variables included the monthly groundwater level (GWL) of 22 wells, rainfall, temperature, evaporation, and relative humidity. The FFNN outperformed ANFIS in terms of accuracy when the results were compared. Kouziokas et al. [58] compared multiple FFNN networks and learning methods to simulate the daily groundwater level (GWL) in a well located in Montgomery County, Pennsylvania, USA. The best model was found to be an FFNN trained using the LM learning algorithm with humidity, precipitation, and temperature as inputs.

Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS is a hybrid technique that aims to combine the advantages of a fuzzy inference system (FIS) with those of an adaptive neural network (AN). FIS is based on fuzzy logic and is good at capturing uncertainties and noise in data. Jang [59] pioneered the use of fuzzy if-then rules with appropriate membership functions (MFs), together with a neural network learning algorithm, to construct input-output pairs. The fuzzy inference system is further classified into two approaches, namely Mamdani and Sugeno. The Sugeno approach uses linear output MFs, while Mamdani uses fuzzy output MFs. ANFIS consists of five layers. The structural representation of ANFIS is similar to that of the ANN model, except that it has two sets of parameters, linear and nonlinear, which makes it difficult to train. Both sets of parameters are optimized simultaneously in the training process.
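As an illustration of Sugeno-type inference, the sketch below evaluates a single-input system with two rules. The Gaussian membership functions, the rule parameters, and the function names are hypothetical choices; in ANFIS these parameters would be learned from data rather than set by hand. The comments map the steps to the five layers as they are commonly described.

```python
import math


def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (nonlinear parameters)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(x, rules):
    """First-order Sugeno output for one input.

    Each rule is (center, sigma, slope, intercept): the first two define the
    membership function, the last two define the linear rule output.
    """
    # Layers 1-2: fuzzification and rule firing strengths.
    strengths = [gaussian_mf(x, center, sigma) for center, sigma, _, _ in rules]
    # Layer 3: normalise the firing strengths.
    total = sum(strengths)
    weights = [s / total for s in strengths]
    # Layer 4: linear rule outputs (linear parameters).
    outputs = [slope * x + intercept for _, _, slope, intercept in rules]
    # Layer 5: weighted sum of the rule outputs.
    return sum(w * y for w, y in zip(weights, outputs))


# Hypothetical rules: "input is LOW" and "input is HIGH".
rules = [(0.0, 3.0, 1.0, 0.0),
         (10.0, 3.0, 0.0, 2.0)]

# At x = 5 both rules fire equally, so the output is the mean of 5.0 and 2.0.
prediction = sugeno_predict(5.0, rules)  # 3.5
```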
Zhang et al. [60] applied three different algorithms for GWL prediction, namely, the radial basis function neural network (RBFNN), ANFIS, and the grey self-memory (GSM) method. The evaluation revealed the superiority of ANFIS over the other applied algorithms based on the performance metrics (i.e., NSE, RMSE, R², and MARE). Bak and Bae [61] trained the ANFIS algorithm with precipitation (P) and mean temperature (Tmean) to predict GWL and reported an RMSE of 0.1381 and a MAPE of 37.869%.
Gong et al. [62] investigated the prediction accuracy of ANFIS, FNN, and SVM for monthly GWL simulation and concluded that ANFIS was superior to the other algorithms. Previous GWL, lake level, precipitation (P), and Tmean were used as input variables. Khaki et al. [63] investigated the performance of ANFIS, FFNN, and the cascade forward network (CFN) model to simulate monthly GWL in the Langat Basin, in the southeastern part of Selangor state. R and MSE were used as performance metrics. The ANFIS model outperformed FFNN and CFN with R = 0.94 and MSE = 0.005. Emamgholizadeh et al. [64] analyzed the differences in the monthly GWL predictions of ANN and ANFIS in Bastam plain, Iran. The following input variables were used in the study: pumping rate, rainfall recharge, and irrigation return flow. ANFIS performed significantly better than ANN, and it was also found that high accuracy can be achieved by applying different structures. Hydrological time series data can sometimes be highly non-stationary, which makes it hard for models such as ANN and ANFIS to capture the underlying seasonality and thus leads to inaccurate predictions. In this situation, some researchers, such as Hsu and Li [65] and Loboda et al. [66], applied the wavelet data decomposition technique to pre-process the input data first. The wavelet transform can decompose data at various resolution levels to obtain useful information and give insights into trends and irregularities in the data. Therefore, it has several applications in hydrological studies because of the non-stationary nature of the data.
The performance of regular ANNs, ANFISs, and both coupled with the wavelet technique, i.e., WANN and WANFIS, was examined by Moosavi et al. [67]. They conducted a study to simulate monthly GWL for two subbasins in Mashad, Iran. Precipitation (P), evaporation (E), temperature (T), and previous GWL were the input variables. ANN and ANFIS failed to cope with the noise in the data, while the models coupled with the wavelet transform performed considerably better. The authors also reported that the wavelet transform contributes more to the efficiency of ANFIS than to that of ANN. Another study was performed by Ebrahimi and Rajaee [68] to analyze the impact of the wavelet pre-processing technique. They developed wavelet-ANN, wavelet-multi-linear regression (wavelet-MLR), and wavelet-support vector machine (wavelet-SVM) models with up to two decomposition levels, along with their regular counterparts. GWL at the previous lag was used as the only input variable to simulate GWL with a one-month lead. The results showed that data decomposition translates into high prediction accuracy of the models. Overall, wavelet-ANN was reported as the best model. Machine learning models that use prior wavelet data decomposition are good at capturing underlying trends and patterns at various levels in non-linear and non-stationary input data. Figure 5 shows the basic architecture of the ANFIS model.
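The idea of decomposing a series at several resolution levels can be illustrated with the Haar wavelet, the simplest discrete wavelet. The reviewed studies do not necessarily use the Haar wavelet, so the wavelet choice, the function names, and the sample series below are assumptions for illustration only.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Return (approximation, details) after `levels` Haar decomposition steps.

    `details[0]` holds the finest-scale coefficients. The signal length must
    be divisible by 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) * SCALE for a, b in pairs])  # local fluctuations
        approx = [(a + b) * SCALE for a, b in pairs]         # smoothed trend
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) * SCALE)
            rebuilt.append((a - d) * SCALE)
        current = rebuilt
    return current


# Hypothetical monthly GWL depths in metres, decomposed to two levels.
gwl = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
approx, details = haar_decompose(gwl, levels=2)
rebuilt = haar_reconstruct(approx, details)
```

In a wavelet-coupled model, the approximation and detail series would be fed to the ANN, ANFIS, or SVM as separate inputs instead of the raw series.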

Genetic Programming (GP)
Genetic programming (GP) was developed as a generalization of the genetic algorithm (GA) [69]. Like the GA, GP is based on the Darwinian principles of evolution and natural selection. The author in [70] developed a GP-based model to predict GWL changes and to quantify the uncertainty in the forecasts. The paper used Indian monthly rainfall data to predict the GWL. The proposed GP model could successfully predict GWL variations using only hydrometeorological parameters, i.e., the model makes predictions without knowledge of the physical characteristics of the wells. GP has mostly been applied to feature selection and optimization. Furthermore, because of its flexibility and intelligible tree structure, it is increasingly used in groundwater modeling. The author in [71] forecast GWL for the next day and for lead times of up to 7 days using SVM, GP, ANN, and ANFIS, all of which are capable of predicting GWL. Several combinations of GWL, evapotranspiration, and rainfall data gathered at the Hongcheon well station in the Republic of Korea were used as inputs to the prediction models. The autoregressive moving average (ARMA) model was then used for comparison to validate the accuracy. The study concluded that the ARMA methodology performed well in comparison with the ML methods and that the GP model was the most effective.

Deep Learning
Despite the good performance of ANN and ANFIS in predicting GWL, these methods are limited by the vanishing and exploding gradient problem, which hinders their ability to make predictions for long time series. The recurrent neural network (RNN) is a type of neural network that was introduced to handle temporal dependencies when dealing with large-scale data in the temporal domain. However, a regular RNN cannot retain temporal information over long sequences, e.g., in machine translation tasks, and requires large computational resources. To overcome these limitations, the long short-term memory (LSTM) model was proposed, which can retain information over sequences of arbitrary length. LSTM was developed mainly for sequential data such as time series. Recently, it has been employed in various water level assessment studies.
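To make the gating idea concrete, the following minimal sketch implements one step of a standard LSTM cell with a scalar input and a scalar state. The weights are arbitrary illustrative values rather than trained parameters, and the names `lstm_step` and `params` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # share of the old cell state to keep
    i = gate("input", sigmoid)         # share of the new candidate to write
    g = gate("candidate", math.tanh)   # candidate cell content
    o = gate("output", sigmoid)        # share of the cell state to expose
    c = f * c_prev + i * g             # additive update of the cell state
    h = o * math.tanh(c)               # hidden state passed to the next step
    return h, c

# Arbitrary illustrative weights (assumption, not fitted to any dataset)
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1]:  # hypothetical normalized GWL inputs
    h, c = lstm_step(x, h, c, params)
```

The additive cell-state update `c = f * c_prev + i * g` is what lets information persist across many steps, in contrast to a regular RNN, where the state is rewritten through a nonlinearity at every step.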
Zhang et al. [6] proposed an LSTM model to simulate fluctuations in water table levels using monthly water diversion, precipitation, evaporation, temperature, and previous water table level data spanning 14 years (2000-2013). The LSTM achieved a markedly higher R² score (0.789) than the traditional feed-forward neural network (FFNN or regular ANN), whose R² scores ranged from 0.004 to 0.495. To select relevant predictors, the authors used a statistical technique that contributed to the model's ability to generalize to unseen data. The study was performed in five sub-areas of Hetao, China.
GWL fluctuation data are prone to missing values because of several factors, e.g., human negligence and failure of recording equipment. Gaps in the data can make it difficult to grasp hidden trends and seasonality. Missing values are therefore reconstructed so that the data can be fully interpreted and accurate predictions made, allowing strategists to plan water resource management in the long run. Ren et al. [72] evaluated an LSTM model against a traditional gap-filling algorithm, ARIMA, for filling missing temporal observations in a 10-year-long dataset with dynamic gaps. The model was designed to reconstruct specification measurements (groundwater and river water interactions). The results revealed that LSTM is better at filling highly dynamic gaps (daily, weekly, and sub-daily), while ARIMA excelled in reconstructing trend- and seasonality-based gaps. In addition, the authors reported that LSTM can fill gaps of up to 2 days when spatial data from neighboring stations are used to make predictions. Table 3 presents the reviewed research categorized by algorithm: deep learning, GP, MODFLOW, ANFIS, and ANN.

Performance Evaluation
GWL modeling is mainly divided into two categories with regard to time, i.e., long-term and short-term. Long-term forecasting is of great importance in domains such as urban planning and water resource management, and requires years of data to learn long-range dependencies. Short-term prediction is usually conducted to study variations in patterns and trends in the input variables related to the problem, for instance climatic conditions in the case of GWL. Since the target of GWL modeling is a continuous value, regression models are used in such studies. Different evaluation metrics have been used in the literature to measure the efficiency of proposed models. It is important to select appropriate performance metrics, as they measure how well a model's predictions compare against the true values. Root mean square error (RMSE), mean absolute error (MAE), relative error (RE), and coefficient of determination (R²) are the most common choices in the literature. Moreover, the peak elevation criteria (PEC) and low elevation criteria (LEC) are special performance measures used to evaluate the model against critical parameters such as rainfall and groundwater in the case of GWL. Most of the time, however, RMSE and R² have been used in GWL modeling studies. Table 4 shows the performance evaluation measures used by different experts for GWL prediction.
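The three most common metrics can be sketched in a few lines of Python. The function names and the sample values are illustrative assumptions; R² is computed here as one minus the ratio of residual to total sum of squares, whereas some studies instead report the squared Pearson correlation, which can give a different number.

```python
import math

def rmse(obs, pred):
    """Root mean square error: penalizes large deviations more strongly."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error: average magnitude of the deviations."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination relative to the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and simulated GWL values (m)
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.0, 2.0, 3.0, 5.0]

scores = {
    "RMSE": rmse(observed, simulated),
    "MAE": mae(observed, simulated),
    "R2": r2(observed, simulated),
}
```

RMSE and MAE carry the units of GWL, so they are only comparable between studies of similar scale, while R² is dimensionless.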

Future Research Direction and Discussion
We recommend the wavelet Bi-LSTM (W-Bi-LSTM) approach [85] for predicting the groundwater level. It combines two strategies: wavelet data decomposition [104-107] and bi-directional long short-term memory (Bi-LSTM). Satellite-based techniques [108] can also be used for groundwater monitoring by measuring changes in the Earth's gravity field and surface deformation caused by water movement underground. Point-to-point satellite-based techniques, such as interferometric synthetic aperture radar (InSAR) and the global navigation satellite system (GNSS), can detect changes in ground elevation and surface displacement, from which changes in the amount of groundwater can be inferred. These techniques provide valuable information for managing groundwater resources and mitigating the impacts of groundwater depletion.

Wavelet-Bi-LSTM (W-Bi-LSTM)
As discussed above, the wavelet transform (WT) is a data pre-processing tool that decomposes time series in the time-frequency domain. WT can decompose a time series at various scales into several sub-series that give insights into the relationships between time-dependent features. Short time intervals are used to capture high-frequency information and, conversely, long intervals are used to analyze low-frequency information. Researchers report that wavelet-coupled ML models have often achieved higher prediction accuracy than regular ML models [87,88]. WT is categorized into two types: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). CWT is time-consuming and computationally expensive; therefore, DWT is mostly preferred in hydrological problems, particularly in groundwater level simulation. The mathematical equation of a discrete wavelet can be represented as in [89].
In Equation (1), $i$ and $j$ are integers, $a_0$ is a specified fine dilation step, and $b_0$ is the location parameter; the most common choices are $a_0 = 2$ and $b_0 = 1$. For details, refer to Cohen and Kovacevic [90].
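For reference, a discrete wavelet is commonly written in the form below. This is the standard textbook expression, given here under the assumption that it corresponds to Equation (1); the notation in [89] may differ. Here $\psi$ is the mother wavelet, $t$ is time, the integer $i$ sets the dilation (scale), and the integer $j$ sets the translation (position).

```latex
\psi_{i,j}(t) = a_0^{-i/2}\, \psi\!\left( \frac{t - j\, b_0\, a_0^{i}}{a_0^{i}} \right)
```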

Unlike the conventional LSTM [52], Bi-LSTM has a simultaneous two-way flow of information, which allows it to better capture the contextual dependencies between the variables using forward and backward hidden layers [91]. Bi-directional LSTM manages the flux of the input and output variables using gated units called memory cells, whereas classical recurrent neural networks (RNNs) use hidden layer nodes with nonlinear activation functions [92]. Figure 6 shows the graphical representation of Bi-LSTM. $h_f$ and $h_b$ are two memory cells in the Bi-LSTM network that manage the forward and backward computed values: $h_f$ represents the forward layer LSTM output and $h_b$ the backward layer LSTM output. The final output value of the hidden layer is computed by combining the results of the forward and backward layers [93]:

$$o_t = g(w_{o1} h_f + w_{o2} h_b) \tag{4}$$

In Equations (2)-(4), $w_i$ is the weight coefficient matrix that is repeatedly applied at each time step.

We hypothesize that by combining the wavelet transform (WT) with Bi-LSTM (W-Bi-LSTM), groundwater levels can be simulated on a yearly basis with higher prediction efficiency. To the best of our knowledge, no such model has been proposed, as most studies in the literature focus on monthly, weekly, and daily GWL predictions. Water managers and strategists need long-term assessments to keep a balance between the supply and demand of groundwater resources; therefore, yearly simulation of GWL is critical. W-Bi-LSTM can utilize the advantages of both the wavelet transform and Bi-LSTM networks to make year-ahead predictions. As discussed above, collected data (both meteorological and hydrological) are vulnerable to missing values and noise because of factors such as human error and sensor failure. The wavelet transform can decompose large-scale noisy data and find hidden periodic trends, which Bi-LSTM then uses to learn the underlying long-term dependencies between the input variables and make predictions. Considering the data span required to make yearly predictions, a standalone LSTM might not learn the data as well as Bi-LSTM, since the latter processes data in two directions and maintains contextual information. To evaluate the performance of the proposed model, the coefficient of determination (R²) and root mean square error (RMSE) are good choices, as both are widely used in groundwater level studies. We believe that the proposed model can serve to predict yearly GWL with complex input variables [109].
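The way a bi-directional network combines its two passes, o_t = g(w_o1 h_f + w_o2 h_b), can be illustrated with a short sketch. As a loudly stated simplification, each direction uses a plain tanh recurrence as a stand-in for a full LSTM layer; the function names, weights, and input values are hypothetical.

```python
import math

def recurrent_pass(xs, w_x, w_h):
    """Simple tanh recurrence (stand-in for an LSTM layer, by assumption)."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional_output(xs, w_x=0.8, w_h=0.5, w_o1=0.6, w_o2=0.4, g=math.tanh):
    """Combines forward and backward states as o_t = g(w_o1*h_f + w_o2*h_b)."""
    h_f = recurrent_pass(xs, w_x, w_h)              # reads the series past -> future
    h_b = recurrent_pass(xs[::-1], w_x, w_h)[::-1]  # reads future -> past, re-aligned to t
    return [g(w_o1 * f + w_o2 * b) for f, b in zip(h_f, h_b)]

# Hypothetical normalized wavelet sub-series used as input
series = [0.1, 0.3, 0.2, 0.5]
outputs = bidirectional_output(series)
```

Because the backward pass has already seen the end of the series, even the first output depends on the last input, which is the contextual information a one-directional LSTM lacks.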
This study reviews the most widely used conventional numerical, machine learning (ML), and deep learning models for groundwater level (GWL) simulation. Significant advancements have been made in terms of prediction efficiency over the last two decades. Most researchers have focused on predicting GWL on a monthly, weekly, daily, or hourly basis. However, water managers and strategists need multi-year GWL simulations to take effective steps towards the sustainable supply of groundwater. In this paper, a collection of state-of-the-art theories is also considered with a view to developing a novel methodology and improving modeling efficiency in this field. An examination of the modeling techniques used in all the reviewed studies indicates that machine learning and deep learning approaches are efficient for modeling GWL. Moreover, we provide possible future research directions and recommendations to enhance the accuracy of groundwater level prediction models and improve the relevant understanding.

Conclusions
This survey paper provides a brief review of the most commonly used conventional numerical, machine learning, and deep learning models for predicting groundwater levels (GWL) using different simulation or data-driven models. Over the last two decades, significant improvements have been made in terms of prediction accuracy. The survey covers the period 2008-2022 and includes papers from Scopus- and Web of Science-indexed journals. While most researchers have focused on predicting monthly, weekly, daily, and hourly GWL, water experts require multi-year simulations to ensure the sustainable supply of groundwater. This paper also compiles 109 papers that present state-of-the-art concepts and techniques for developing a novel approach and improving modeling efficiency in this field. After examining the modeling techniques used in all the reviewed studies, we find that machine learning and deep learning approaches are effective for modeling GWL. Additionally, we provide recommendations and identify research gaps for improving the accuracy of groundwater level prediction models.

Figure 2. Relevant and irrelevant papers selection process.

Figure 4. Different kinds of data used for prediction of GWL.

Figure 5. A basic architecture of the ANFIS model.

Table 1. Various datasets available for use in modeling parameters and their prediction possibilities [15].

Table 2. Various datasets available for use in modeling parameters.

Table 4. Different performance evaluation measures used by different experts for GWL prediction.