1. Introduction
Groundwater, as a critical component of the global water cycle, plays an irreplaceable role in maintaining ecological balance and supporting socioeconomic development [1,2]. Accurate prediction of groundwater level dynamics is of great significance for water resources management, agricultural irrigation planning, flood mitigation, and ecological conservation [3,4]. However, under intensifying climate change and growing human disturbances, groundwater systems have become increasingly complex in their evolutionary behavior [5,6]. Traditional physically based hydrological models face severe challenges in parameterization and scale matching, particularly in highly heterogeneous aquifer systems such as karst.
Karst aquifers are characterized by dual-porosity structures and highly heterogeneous conduit–matrix systems, resulting in strong nonlinearity and rapid rainfall–response dynamics [7,8]. The coexistence of conduit and matrix flows makes the rainfall–groundwater relationship extremely complex, posing unique challenges for groundwater level prediction in karst regions [9,10,11]. In China, karst landscapes cover approximately one-third of the national territory [12], with the Guangxi Zhuang Autonomous Region exhibiting particularly well-developed karst landforms [13], providing a natural laboratory for advancing karst hydrogeological research.
In recent years, machine learning (ML) and deep learning (DL) methods have gradually become important tools for hydrological time series prediction due to their powerful nonlinear fitting capabilities [14]. In the field of groundwater level prediction, traditional ML models such as Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) have been proven effective in handling moderate-complexity nonlinear relationships [15,16,17]. The introduction of DL models has further enhanced prediction performance: Long Short-Term Memory networks (LSTM) and Convolutional Neural Networks (CNN) can automatically extract temporal features and capture long-term dependencies [18,19]. Hybrid models such as CNN-LSTM demonstrate promising potential in complex hydrological processes by integrating local feature extraction with sequence modeling capabilities [20,21]. Recently, Transformer models with self-attention mechanisms have demonstrated potential in handling long-range dependencies in hydrological forecasting [22].
In the specific domain of groundwater level prediction, studies have evaluated the performance of several models included in the present study. Traditional machine learning models, including RF and XGBoost, have been applied to groundwater level prediction in the Najafabad plain, demonstrating competitive accuracy in handling nonlinear feature interactions under data-limited conditions [23]. LSTM and Transformer models have been assessed for extended-horizon groundwater level forecasting in the Thames Basin, with the Transformer showing advantages in capturing long-range temporal dependencies over traditional recurrent architectures [24]. The CNN-LSTM hybrid architecture has been validated for groundwater level forecasting in arid regions, confirming the benefit of combining local convolutional feature extraction with sequential memory modeling [25]. Although Seq2Seq-LSTM and Attention-Seq2Seq-LSTM have not yet been applied to groundwater level prediction, both architectures have demonstrated strong performance in multi-step time series prediction tasks: Seq2Seq-LSTM has shown effectiveness in capturing temporal dynamics through its encoder–decoder structure [26], while the attention mechanism further enhances prediction stability by enabling the decoder to selectively focus on the most relevant historical states [27]. N-BEATS, despite achieving state-of-the-art results in energy and financial time series forecasting [28] and showing promise in streamflow prediction [29,30], similarly lacks any reported application in groundwater level estimation. The absence of systematic evaluation for these three architectures in groundwater level prediction, particularly in complex karst systems, motivates their inclusion in the present study.
Despite significant progress in hydrological prediction using machine learning and deep learning, their application in karst groundwater level prediction still faces numerous challenges. The rapid proliferation of model architectures has exacerbated the model selection dilemma in karst groundwater level prediction [31,32,33]. Moreover, existing studies predominantly focus on single-step-ahead prediction, with limited attention to multi-step prediction stability, while multi-step prediction is of greater practical relevance for long-term water resource management [34,35,36]. More critically, the influence of hydrogeological conditions on model performance has not been sufficiently elucidated, and their potential dominant role may exceed the differences arising from model architectures themselves.
To address the above challenges, this study systematically evaluates nine ML and DL models (RF, XGBoost, LSTM, CNN, Transformer, N-BEATS, CNN-LSTM, Seq2Seq-LSTM, and Attention-Seq2Seq-LSTM) for rainfall–groundwater level forecasting in a typical karst watershed in Guilin, Guangxi. Three monitoring sites representing distinct hydrogeological zones, namely the recharge, flow, and discharge zones, were selected, and two years of hourly high-frequency data were employed. The central scientific question addressed in this study is to what extent hydrogeological complexity, relative to model architecture, governs prediction feasibility in karst groundwater systems. To investigate this question, three specific objectives are pursued: (1) to quantify the relative influence of hydrogeological conditions versus model architecture on prediction performance by systematically comparing nine models across three sites with contrasting hydrogeological characteristics; (2) to evaluate single-step and multi-step prediction performance and characterize stability degradation trends across increasing forecast horizons; (3) to assess the trade-off between prediction accuracy and computational efficiency and identify model structures best suited to karst groundwater forecasting under different application scenarios. The scientific contributions of this work are threefold. First, we provide systematic cross-site evidence that hydrogeological complexity constitutes a dominant constraint on model predictive skill, consistently outweighing architectural differences across all nine model families, a finding that reframes model selection in karst systems as a hydrogeological problem rather than a purely algorithmic one.
Second, we propose a replicable multidimensional evaluation framework integrating single-step accuracy, multi-step stability, and computational efficiency, which can serve as a methodological reference for model selection and performance assessment in complex hydrological systems. Third, we present the first application of N-BEATS to karst groundwater level forecasting, expanding the methodological toolkit available for heterogeneous aquifer environments.
2. Materials and Methods
2.1. Study Area
The Maocun subterranean river catchment is situated in the southeastern part of Chaotian Township, Lingchuan County, Guilin City, Guangxi Zhuang Autonomous Region (110°30′–110°35′ E, 25°09′–25°13′ N), encompassing an area of approximately 11.2 km² (Figure 1). The region experiences a typical subtropical monsoon climate, with a long-term mean annual precipitation of 1903.9 mm (Figure 2) and a mean annual temperature of 18.6 °C [37]. Precipitation is unevenly distributed throughout the year: the concentrated rainfall period from April to July accounts for 60–70% of the annual total.
Carbonate lithologies occupy 7.6 km² of the catchment, approximately two-thirds of its total area [37]. Karst landforms are well developed, with numerous surface features such as sinkholes and dolines. Subsurface development includes an intricate network of caves and underground river systems, rendering the Maocun catchment a paradigmatic karst watershed [38].
2.2. Data Sources and Data Preprocessing
To compare the performance of different models in karst groundwater level prediction, this study utilized data from three groundwater monitoring wells: Shanwan (ZK1), Zhangshandi (ZK2), and Maocun outlet (ZK3), along with one automatic meteorological station. The three monitoring wells are characterized as follows. ZK1, located in the upper reaches of the watershed, has a shallow water table depth of approximately 2 m. ZK2, situated in the central catchment area, exhibits a significantly deeper water table depth of about 12 m and is characterized by complex topographic and geological conditions. ZK3, near the watershed outlet, also has a shallow water table depth of approximately 2 m. The observation period spans from 15 July 2021 to 10 July 2023, comprising 728 days of continuous monitoring records. Precipitation data were recorded at 15 min intervals, while groundwater level measurements were taken every 30 min.
Karst groundwater level variations exhibit pronounced lag and continuity characteristics, with response times typically measured in hours [39]. Although the original monitoring data were collected at 30 min intervals, groundwater level fluctuations are generally small at such short time scales, and high-frequency noise may interfere with model training. To achieve a balance between data precision and computational efficiency, this study employed mean resampling to convert the original 30 min data to hourly resolution, with precipitation data correspondingly resampled to hourly intervals. This processing strategy preserved the primary variational characteristics of the data while effectively reducing data volume and improving model training efficiency.
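The mean-resampling step above can be sketched as follows. This is an illustrative reconstruction, not the study's actual code: two consecutive 30 min groundwater level readings are averaged into one hourly value, and the function name and toy data are ours.

```python
import numpy as np

def resample_to_hourly(levels_30min):
    """Average consecutive pairs of 30 min readings into hourly means."""
    arr = np.asarray(levels_30min, dtype=float)
    n = (len(arr) // 2) * 2          # drop a trailing unpaired reading
    return arr[:n].reshape(-1, 2).mean(axis=1)

# five 30 min readings yield two full hourly means; the unpaired
# fifth reading is dropped
hourly = resample_to_hourly([2.00, 2.02, 2.10, 2.14, 2.08])
```

The same pairwise-mean logic applies to the precipitation series after it is first aggregated from 15 min to 30 min resolution.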
To ensure data quality and enhance model training effectiveness, several preprocessing procedures were applied to the original time series data. Outliers, defined as values significantly exceeding normal ranges, were identified in approximately 1% of the dataset, primarily caused by sensor handling during data collection. These outliers were replaced with the average of adjacent values to maintain data continuity. Missing values, resulting from sensor malfunctions or signal interruptions, were addressed as follows. For short gaps, linear interpolation was used to fill missing data points. For continuous data gaps exceeding 6 h, regression models were constructed using observations from nearby monitoring stations during corresponding periods to estimate missing values.
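The two gap-handling rules described above (linear interpolation for short gaps, station-regression reconstruction for gaps longer than 6 h) can be sketched as below, assuming missing values are stored as NaN. The function name and threshold handling are illustrative.

```python
import numpy as np

def fill_short_gaps(series, max_gap=6):
    """Linearly interpolate NaN runs no longer than max_gap hours;
    longer runs are left as NaN for regression-based reconstruction."""
    arr = np.asarray(series, dtype=float).copy()
    isnan = np.isnan(arr)
    idx = np.arange(len(arr))
    # interpolate everything first, then re-blank the long gaps
    filled = np.interp(idx, idx[~isnan], arr[~isnan])
    start = None
    for i, m in enumerate(np.append(isnan, False)):
        if m and start is None:
            start = i                        # a NaN run begins
        elif not m and start is not None:
            if i - start > max_gap:          # gap too long: keep as NaN
                filled[start:i] = np.nan
            start = None
    return filled
```

A 1 h gap is filled by interpolation, while a 7 h gap remains NaN and is handed to the cross-station regression step.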
The dataset from monitoring stations ZK1, ZK2, and ZK3 was analyzed for completeness. ZK3 data were complete, while ZK1 and ZK2 exhibited missing values, accounting for approximately 35% of their respective datasets. These missing periods occurred primarily from July to September 2021 and January to March 2022, coinciding with low precipitation and stable water levels. The missing data were attributed to sensor failures. Given the strong correlations between measured data from the stations (all pairwise correlations > 0.8), regression models were employed to reconstruct missing values. The correlation matrix for the measured data, after removing missing values, is shown in Table 1.
Cross-validation of the regression models used for data reconstruction demonstrated high accuracy, as shown in Table 2.
Following preprocessing, the dataset for model training comprised hourly precipitation and groundwater level observations from three monitoring wells, with each well containing 17,442 records (Figure 3). Basic statistical characteristics of the groundwater level data at each monitoring site are summarized in Table 3. To maintain temporal integrity and prevent data leakage, the dataset was chronologically divided into a training set (15 July 2021 to 31 December 2022, totaling 12,823 records, about 70%), a validation set (1 January 2023 to 15 April 2023, totaling 2520 records, about 15%), and a test set (16 April 2023 to 12 July 2023, totaling 2099 records, about 15%). This partitioning strategy ensures the training period captures multiple seasonal cycles for comprehensive pattern learning, while the test period covers the critical spring–summer transition for robust model evaluation.
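The leakage-free chronological split can be expressed as a simple slicing of the time-ordered records; the counts below follow the paper (12,823 + 2520 + 2099 = 17,442), while the function name is illustrative.

```python
def chronological_split(records, n_train=12823, n_val=2520):
    """Split an already time-ordered record list without shuffling,
    so no future observation leaks into training."""
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]
    return train, val, test

records = list(range(17442))          # 17,442 hourly records per well
train, val, test = chronological_split(records)
```

Because the slices preserve temporal order, every training record strictly precedes every validation record, which in turn precedes every test record.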
2.3. Experimental Setup
The experimental environment configuration for this study is presented in Table 4, with all models utilizing identical random seeds (seed = 42) to ensure reproducibility. To establish a fair and consistent comparison, a unified experimental framework was used across all models. All input features and target variables were normalized using the MinMaxScaler from scikit-learn (version 1.5.1). A sliding window approach was adopted, defining an input sequence length of 6 time steps and a prediction window of 12 time steps. A consistent feature engineering process was applied to incorporate temporal dependencies, with details provided in Section 2.4.
While the input/output window was identical across models, the data was formatted differently to suit the inherent structure of each model family. For the tree-based models, namely Random Forest and XGBoost, the input sequence of 6 time steps was flattened into a single feature vector, transforming the time series problem into a standard tabular regression task. In contrast, the deep learning models received the input as a sequence of shape (6, number of features), allowing their architectures to directly process the temporal structure of the data.
Crucially, all models employed a Direct Multi-output strategy to generate the 12-step forecast, effectively avoiding the error accumulation issues inherent in recursive forecasting. For the machine learning models, this was achieved using the MultiOutputRegressor wrapper from the multioutput module of scikit-learn (sklearn.multioutput.MultiOutputRegressor) in Python, which trains an independent regressor for each of the 12 future time steps. For the deep learning models, this strategy was implemented architecturally through a final output layer designed to produce a 12-dimensional vector, where each dimension corresponds to a step in the prediction window.
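The windowing and Direct Multi-output strategy described above can be sketched as follows. The study wraps RF/XGBoost in sklearn.multioutput.MultiOutputRegressor; to keep this example dependency-free, a least-squares linear model stands in for each per-horizon regressor, so this is a structural sketch rather than the paper's pipeline. All names are illustrative.

```python
import numpy as np

IN_LEN, OUT_LEN = 6, 12                               # paper's window sizes

def make_windows(series, in_len=IN_LEN, out_len=OUT_LEN):
    """Flattened 6-step inputs and 12-step targets (tabular formulation)."""
    X, Y = [], []
    for i in range(len(series) - in_len - out_len + 1):
        X.append(series[i:i + in_len])
        Y.append(series[i + in_len:i + in_len + out_len])
    return np.array(X), np.array(Y)

def fit_direct_multioutput(X, Y):
    """One independent linear regressor per forecast step
    (the 'direct' strategy: no recursive error accumulation)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])         # add intercept column
    coefs, *_ = np.linalg.lstsq(Xb, Y, rcond=None)    # one column per horizon
    return coefs

series = np.arange(100, dtype=float)                  # toy ramp signal
X, Y = make_windows(series)
W = fit_direct_multioutput(X, Y)
preds = np.hstack([X, np.ones((len(X), 1))]) @ W      # shape (n_windows, 12)
```

The deep learning models realize the same idea architecturally: the final layer emits a 12-dimensional vector instead of training 12 separate regressors.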
All deep learning models were uniformly trained using the Adam optimizer with an initial learning rate of 0.001, a batch size of 32, and the mean squared error loss function. The training process was regularized using a ReduceLROnPlateau learning rate scheduler (patience = 5) and an early stopping mechanism (patience = 15), with a maximum of 100 training epochs. The machine learning models adopted fixed parameter configurations: Random Forest was set with 100 decision trees (n_estimators = 100) [17], while XGBoost was configured with 100 boosting rounds (n_estimators = 100), a maximum depth of 6 (max_depth = 6), and a learning rate of 0.1 [40].
2.4. Feature Engineering
Groundwater levels in karst regions exhibit distinct lag response characteristics to precipitation, with response patterns varying among monitoring sites due to differences in hydrogeological conditions. Statistical analysis of precipitation events during the study period revealed average response times of 5.75 h at ZK1, 3.95 h at ZK2, and 4.875 h at ZK3 (Supplementary Materials Table S1, Figures S1–S3). To adequately capture diverse response characteristics while establishing a unified modeling framework, this study configured precipitation accumulation windows of 3 and 6 h based on the response time distribution patterns of monitoring sites. The 3 h accumulation window primarily captures early response signals from sites with relatively rapid responses, while the 6 h accumulation window encompasses the main lag response processes of all monitoring sites, ensuring that models can fully utilize temporal information from precipitation–groundwater level responses.
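The 3 h and 6 h accumulation features amount to trailing rolling sums of the hourly precipitation series. A minimal sketch, with illustrative names and toy data, implemented via convolution:

```python
import numpy as np

def trailing_sum(precip_hourly, window):
    """Rainfall summed over the trailing `window` hours at each step."""
    p = np.asarray(precip_hourly, dtype=float)
    kernel = np.ones(window)
    # full convolution truncated to the series length keeps the
    # sum strictly trailing (only past and current hours contribute)
    return np.convolve(p, kernel)[:len(p)]

rain = [0, 0, 5, 10, 0, 0, 0, 2]     # toy hourly rainfall (mm)
p3 = trailing_sum(rain, 3)           # 3 h accumulation feature
p6 = trailing_sum(rain, 6)           # 6 h accumulation feature
```

At hour 3 the 3 h feature equals 5 + 10 = 15 mm, capturing the early pulse, while the 6 h feature continues to carry that rainfall for three further hours, mirroring the slower site responses.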
Karst groundwater level variations exhibit significant temporal autocorrelation, with current water levels largely influenced by historical values [41], particularly the water level at the previous time step, which plays an important role in current predictions. Based on this, we constructed lagged water level features and water level rate-of-change features in this study.
For lagged water level features, we incorporated the previous time step water level value as an input feature, which can be expressed as:

H_lag(t) = H(t − 1)

where H_lag(t) represents the water level value with a 1 h time lag, and H(t − 1) denotes the observed water level at the previous time step. Additionally, to better characterize water level variation trends and dynamic features, we constructed water level differential features as follows:

ΔH(t) = H(t) − H(t − 1)

This feature can effectively quantify the rate of water level change per unit time, enhancing the model’s perception of both the direction and intensity of water level fluctuations, thereby contributing to improved response prediction accuracy.
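Both features above are one-line array operations; the sketch below (illustrative names, toy data) aligns them with the targets, noting that the first time step has no predecessor and therefore no lag or difference value.

```python
import numpy as np

def lag_and_diff(levels):
    """1 h lagged level H(t-1) and hourly rate of change H(t) - H(t-1),
    both aligned with targets H(t) for t >= 1."""
    h = np.asarray(levels, dtype=float)
    h_lag = h[:-1]           # H(t-1), the lagged water level feature
    dh = np.diff(h)          # H(t) - H(t-1), the differential feature
    return h_lag, dh

h_lag, dh = lag_and_diff([2.0, 2.1, 2.4, 2.3])
```

The sign of dh carries the direction of fluctuation (rising vs. falling limb) and its magnitude the intensity, which is the information the differential feature is meant to expose.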
2.5. Prediction Models
2.5.1. Conventional Machine Learning Models
Traditional machine learning models have been widely applied in hydrological and environmental forecasting due to their strong performance in handling small samples, high-dimensional, and nonlinear problems [42,43]. Compared to deep learning methods, traditional machine learning models offer advantages such as faster training speed, better interpretability, and relatively lower data requirements. This study selects the following two representative algorithms.
As a typical ensemble learning method, RF (Figure S4a) constructs a nonlinear regression model by building multiple decision trees and averaging their predictions [44]. This method is capable of handling nonlinear interactions between features, while also being robust, resistant to overfitting, and easy to tune [45,46]. In groundwater forecasting, RF can automatically identify key influencing factors, making it well-suited for handling multivariate and heterogeneous hydrological data [15,47].
As an efficient implementation of gradient boosting algorithms, XGBoost (Figure S4b) constructs weak learners sequentially and combines them through weighted aggregation to form a strong predictive model [48]. The algorithm employs regularization techniques to control model complexity, effectively capturing complex feature–response relationships and performing excellently in structured data modeling scenarios. Its efficient parallel computing capability and superior generalization performance make it an important benchmark model for time series forecasting [40,49].
2.5.2. Single Deep-Learning Architectures
Deep learning models possess powerful nonlinear mapping capabilities and automatic feature extraction abilities, offering inherent advantages when dealing with high-dimensional, dynamic, and nonlinear problems such as groundwater systems [50]. Deep architectures can automatically learn hierarchical feature representations from raw data, eliminating the need for manual feature engineering, and are particularly well-suited for handling groundwater dynamics with complex spatiotemporal dependencies [51]. This study selects the following four deep learning models.
As an important variant of recurrent neural networks (RNNs), LSTM effectively addresses the gradient vanishing problem of traditional RNNs by introducing gating mechanisms (forget gate, input gate, output gate) (Figure S5a), where Xₜ₋₁, Xₜ, and Xₜ₊₁ represent input vectors at consecutive time steps, and hₜ₋₁, hₜ, and hₜ₊₁ denote the corresponding hidden state outputs. The symbol σ represents the sigmoid activation function (output range 0 to 1), tanh denotes the hyperbolic tangent activation function (output range −1 to 1), ⊗ indicates element-wise multiplication (Hadamard product), and ⊕ represents element-wise addition operations. These gating structures work collaboratively to control information flow and effectively capture long-term temporal dependencies. This model selectively remembers and forgets historical information, making it particularly suitable for capturing long-term dependencies and lag effects in groundwater level time series [52]. In karst groundwater systems, LSTM can model the complex lagged response relationships between rainfall, runoff, and groundwater levels [53].
CNN employs local receptive fields and parameter sharing mechanisms for efficient feature extraction [54]. In time series modeling, CNN can automatically identify local patterns, trend changes, and periodic features (Figure S5b). Starting from the input, the architecture extracts local temporal features through one-dimensional convolutional layers (Conv1D), introduces non-linear transformations via ReLU activation functions, further extracts high-level features through additional Conv1D and ReLU layers, applies Global Average Pooling (GAP) for feature dimensionality reduction, and finally generates predictions through a fully connected layer (FC). The 1D convolutional kernel captures short-term dependencies and local correlations in the time series, while the multi-layer convolutional structure extracts features at different temporal scales, providing rich feature information for groundwater level prediction [55,56].
Based on the self-attention mechanism, Transformer models the direct relationships between any two positions in an input sequence, enabling parallel processing of sequence information and capturing global dependencies [57] (Figure S5c). Figure S5c presents the attention-based encoder architecture, where the input time series is converted into high-dimensional vector representations through input embedding, combined with positional encoding to preserve temporal sequence information. The multi-head self-attention mechanism (h = 4 indicates 4 parallel attention heads) computes attention weights across different subspaces in parallel to capture complex intra-sequence dependencies. Add and LayerNorm represent residual connections and layer normalization operations for training stabilization, while the Feed Forward Network consists of two fully connected layers for feature transformation. The encoder stacks two such layers (Transformer Encoder Layer), with final predictions generated through the Last Token Selection mechanism. This model overcomes the limitations of recurrent structures, offering higher computational efficiency and stable training when handling long sequences. The self-attention mechanism can automatically learn the importance weights of different time steps in the sequence, making it particularly suitable for capturing complex, cross-time-scale association patterns in karst groundwater systems [58].
N-BEATS is a deep neural network architecture specifically designed for time series forecasting [28] (Figure S5d). This model adopts residual connections and hierarchical forecasting concepts, decomposing complex time series into trend and seasonal components for modeling. The Trend Stack captures long-term trend components, the Seasonality Stack models periodic patterns, and the Generic Stack handles other complex non-linear patterns. Each stack contains multiple structurally identical blocks (Block 1, Block 2, Block 3), with each block internally composed of four fully connected layers (FC1, FC2, FC3, FC4). The dual-output mechanism generates both forecast and backcast signals, implementing residual learning through “residual = residual − backcast” to progressively remove modeled components, while accumulating predictions from all stacks via “total_forecast += forecast” to achieve decomposed modeling and ensemble prediction of different frequency and pattern components in time series. N-BEATS performs decomposition and reconstruction of time series using learnable basis functions, capturing multi-scale temporal patterns and demonstrating exceptional performance in pure time series forecasting tasks [59].
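The double-residual bookkeeping described above (“residual = residual − backcast”, “total_forecast += forecast”) can be illustrated with a toy forward pass. The “blocks” below are trivial stand-ins for the trained FC stacks, chosen only to make the residual mechanics visible; nothing here reproduces the paper's trained model.

```python
import numpy as np

def nbeats_style_pass(x, blocks, horizon):
    """Run the N-BEATS double-residual loop: each block emits a backcast
    (subtracted from the running residual) and a forecast (accumulated)."""
    residual = np.asarray(x, dtype=float).copy()
    total_forecast = np.zeros(horizon)
    for block in blocks:
        backcast, forecast = block(residual)
        residual = residual - backcast        # remove the modeled component
        total_forecast = total_forecast + forecast
    return residual, total_forecast

# stand-in blocks: the first explains the mean level, the second the rest
mean_block = lambda r: (np.full_like(r, r.mean()), np.full(2, r.mean()))
rest_block = lambda r: (r.copy(), np.full(2, 0.0))

residual, forecast = nbeats_style_pass([1.0, 2.0, 3.0],
                                       [mean_block, rest_block], horizon=2)
```

After both blocks run, the residual is fully explained (all zeros) and the accumulated forecast carries the mean-level component, mimicking how successive stacks peel off trend, seasonality, and generic components.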
2.5.3. Hybrid Deep-Learning Architectures
To leverage the advantages of different network structures and enhance the model’s expressive power, this study introduces three hybrid deep learning models. By combining the strengths of different types of neural networks, hybrid architectures enable complementary advantages in feature extraction, sequence modeling, and temporal prediction, further improving the accuracy of modeling complex groundwater dynamic processes [60].
The Seq2Seq-LSTM model (Figure S6a) adopts a Sequence-to-Sequence (Seq2Seq) encoder–decoder architecture in which two LSTM networks are responsible for encoding the input sequence and generating the output sequence, respectively [26]. The encoder LSTM compresses the input historical groundwater level sequence into a fixed-length context vector, and the decoder LSTM generates the future prediction sequence step by step based on this context vector. This architecture is naturally suited for multi-step prediction tasks, maintaining semantic consistency between the input and output sequences, and provides an effective modeling framework for medium- to long-term groundwater level forecasting.
The CNN-LSTM (Figure S6b) hybrid architecture combines the local feature extraction ability of CNN with the sequence memory capability of LSTM, forming a hierarchical feature learning framework. The CNN layer first performs convolution on the input time series to extract local patterns and short-term dependency features. The LSTM layer then processes the feature sequences extracted by the CNN, modeling long-term temporal dependencies [61,62]. This structure is particularly suitable for handling groundwater level data with multi-scale temporal features, capable of capturing both short-term fluctuations and long-term trends.
Based on the Seq2Seq-LSTM model, the attention mechanism is introduced (Attention-Seq2Seq-LSTM model) (Figure S6c) to address the information bottleneck issue in the traditional encoder–decoder architecture. The attention mechanism allows the decoder to dynamically focus on different hidden states of the encoder when generating each prediction step, rather than relying solely on a fixed context vector. This dynamic attention allocation mechanism improves the model’s efficiency in utilizing key historical information, making it particularly suitable for modeling complex scenarios in groundwater systems where different historical periods contribute differently to future predictions [63].
2.6. Model Evaluation Metrics
To objectively evaluate the performance of constructed models in karst groundwater level prediction, this study employed strict temporal partitioning for model training, validation, and testing to avoid evaluation bias caused by data leakage. Based on the characteristics of regression prediction tasks, three widely adopted evaluation metrics were selected to quantify model performance.
Root Mean Squared Error (RMSE) measures the square root of the mean squared deviation between the predicted values and the observed values [64]. Its mathematical expression is as follows:

RMSE = √[ (1/n) Σᵢ (yᵢ − ŷᵢ)² ]

where yᵢ represents the true value of the i-th observation, ŷᵢ represents the predicted value of the i-th observation, and n is the number of observations. RMSE penalizes large errors more strongly and is sensitive to outliers, making it particularly suitable for evaluating the fitting ability of models in processes with abrupt changes, such as heavy rainfall responses. In karst groundwater level forecasting, RMSE effectively identifies the model’s prediction stability under extreme weather events.
Mean Absolute Error (MAE) reflects the average absolute deviation between the predicted values and the observed values [65]. Its calculation formula is as follows:

MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|

MAE possesses intuitive interpretability, as its unit is consistent with that of groundwater level (meters), enabling direct physical understanding of prediction deviations. Unlike RMSE, MAE is less sensitive to outliers and extreme errors, making it more suitable for characterizing the overall prediction bias. By assigning equal weight to all errors, MAE better reflects the model’s average predictive capability across the entire dataset.
Coefficient of Determination (R²) represents the goodness-of-fit of the model to the trend of groundwater level variations [64] and is defined as:

R² = 1 − [ Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)² ]

where ȳ denotes the mean of the observed values. The closer the R² value approaches 1, the better the model’s fitting performance for groundwater level variation trends, indicating greater effectiveness in capturing the intrinsic patterns of groundwater level fluctuations.
Comprehensive evaluation using these three metrics quantifies model prediction performance from different perspectives, providing objective criteria for selecting appropriate karst groundwater level prediction methods.
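The three metrics can be written out explicitly as follows; these are the standard definitions from Section 2.6 applied to illustrative toy data, with function and variable names of our choosing.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error: sqrt of the mean squared deviation."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))

def mae(y, y_hat):
    """Mean absolute error, in the same units as the water level (m)."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_obs = [2.0, 2.5, 3.0, 2.8]      # toy observed levels (m)
y_pred = [2.1, 2.4, 3.1, 2.7]     # toy predictions, each off by 0.1 m
```

With a uniform 0.1 m error, RMSE and MAE coincide at 0.1 m, while R² reflects how small that error is relative to the variance of the observed series.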
3. Results and Discussion
3.1. Performance Analysis of Single-Step Prediction
To comprehensively evaluate the single-step prediction performance of each model, this study conducted a comparative analysis of nine models across the test sets from three monitoring sites (ZK1, ZK2, and ZK3), employing the RMSE, MAE, and R² metrics for quantitative assessment. Results are presented in Figure 4.
The prediction model performance corresponded well with the hydrogeological characteristics at each monitoring site. ZK1, located in the upper catchment with a shallow water table, achieved the highest prediction accuracy (R² > 0.950, RMSE: 0.130–0.168), which can be attributed to its straightforward hydrological processes dominated by the direct influence of allogenic water and precipitation. The simple and direct rainfall–groundwater coupling at this site provides models with high-quality, low-noise input–output mappings that are amenable to accurate prediction regardless of architectural complexity. In contrast, ZK2, situated in the central catchment with complex topography and geology and a deep water table, presented the greatest prediction challenge (R²: 0.769–0.813, RMSE: 0.606–0.673). Its elevated prediction uncertainty reflects the complexity of groundwater flow paths in the highly karstified zone and the strong nonlinearity introduced by heterogeneous media structures. ZK3, located at the watershed outlet with a shallow water table, exhibited intermediate performance (R²: 0.855–0.877, RMSE: 0.278–0.302), consistent with its role as the regional discharge boundary that effectively dampens high-frequency fluctuations induced by internal catchment complexities.
Overall, deep learning models demonstrated superior performance compared to traditional machine learning methods in groundwater level prediction, consistent with findings from other studies [18]. Hybrid deep learning architectures outperformed single-model structures, with CNN-LSTM surpassing the individual CNN and LSTM models, and Seq2Seq-LSTM and Attention-Seq2Seq-LSTM both exceeding the basic LSTM model, aligning with previous research conclusions [34,66]. The performance gain of hybrid architectures over their single-network counterparts reflects the multi-scale nature of karst groundwater fluctuations, where short-term precipitation pulses and longer-term recession dynamics operate at distinct temporal scales that benefit from complementary feature extraction strategies. Specifically, the Transformer model exhibited optimal overall performance, achieving the lowest RMSE and highest R² values across all three monitoring sites. The CNN-LSTM model also demonstrated strong predictive capability, ranking second only to the Transformer, and obtained lower MAE values than the Transformer at the ZK1 and ZK3 sites, consistent with prior studies [22,62,67,68]. In contrast, traditional machine learning models (XGBoost, Random Forest) showed relatively lower prediction accuracy across all monitoring sites, particularly at the geologically complex ZK2 site, where R² values were only 0.774 and 0.769, significantly lower than those of the deep learning models.
Notably, N-BEATS, a deep learning architecture specifically designed for time series forecasting and initially applied in financial and energy forecasting, has seen limited application in the hydrological field. Based on the current literature review, N-BEATS has only been applied to monthly inflow prediction for dam reservoirs [30], with no reported applications in groundwater level forecasting, particularly in karst regions. The present study demonstrates that the N-BEATS model exhibits promising performance in groundwater level prediction, ranking third after the Transformer and CNN-LSTM models, and achieving lower MAE values than the Transformer at the ZK1 and ZK3 sites. This competitive performance may be attributed to the structural compatibility between the N-BEATS stacked block decomposition mechanism and the multi-scale composition of karst groundwater level signals, which comprise both slowly varying baseflow recession components and high-frequency storm-driven recharge pulses. This provides new evidence for extending N-BEATS applications in hydrological forecasting.
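The stacked block decomposition mechanism referred to above can be illustrated with a toy, untrained sketch of the N-BEATS doubly residual principle: each block fits part of the input window (the backcast), contributes a partial forecast, and passes the residual signal to the next block. The hand-set block rules and window values below are hypothetical stand-ins for the learned basis expansions of the real model.

```python
def trend_block(window, horizon):
    """Crude 'slow component' block: backcast the window mean, forecast a flat level."""
    level = sum(window) / len(window)
    return [level] * len(window), [level] * horizon

def residual_block(window, horizon):
    """Crude 'fast component' block: backcast the last residual and persist it."""
    last = window[-1]
    return [last] * len(window), [last] * horizon

def nbeats_forecast(window, horizon, blocks):
    residual = list(window)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, partial = block(residual, horizon)
        # Doubly residual stacking: subtract each block's backcast from the
        # signal and accumulate its partial forecast.
        residual = [r - b for r, b in zip(residual, backcast)]
        forecast = [f + p for f, p in zip(forecast, partial)]
    return forecast

window = [10.0, 10.5, 11.0, 11.5]  # hypothetical groundwater levels (m)
print(nbeats_forecast(window, horizon=3, blocks=[trend_block, residual_block]))
```

The first block absorbs the slowly varying level and the second absorbs what remains, which loosely mirrors how the architecture can separate baseflow recession from storm-driven pulses.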
3.2. Performance Analysis of Multi-Step Prediction
To evaluate the long-term prediction capability of each model, this study conducted 1- to 12-step prediction experiments at three monitoring sites (ZK1, ZK2, and ZK3), systematically analyzing the impact of prediction horizon on model performance using the RMSE, MAE, and R2 metrics (Figure 5); the complete MAE results are provided in Figure S7.
Results demonstrate that all models exhibit pronounced performance degradation during multi-step prediction, which is an inherent characteristic of long-term time series forecasting [69]. However, significant differences in performance degradation rates across monitoring sites reveal that prediction reliability depends not only on model architecture but is also constrained by the intrinsic hydrodynamic characteristics of groundwater systems. Among the three sites, ZK1 exhibited the slowest performance degradation, followed by ZK3, while ZK2 experienced the most severe deterioration, with R2 values of some models dropping below 0.6, rendering their predictions unreliable.
This spatial variability corresponds closely to the hydrogeological conditions at each site. ZK1 features relatively simple hydrological processes with weak nonlinearity, enabling models to readily capture its regular variations. In contrast, ZK2 is characterized by deep groundwater levels, prolonged system residence time, complex flow pathways, and coupling effects of multiple factors, resulting in significantly elevated prediction uncertainty. ZK3, situated in the watershed discharge zone, is influenced by upstream complex processes yet possesses buffering capacity that attenuates high-frequency fluctuations, thereby achieving intermediate prediction performance. These findings demonstrate that the inherent complexity of groundwater systems constitutes a critical constraint on multi-step prediction capability, and even advanced models remain limited by actual hydrogeological settings [70,71].
Overall, the traditional machine learning models (XGBoost and Random Forest) exhibit a marked decline in performance in multi-step-ahead prediction. As the prediction horizon increases, their R2 continuously decreases, remaining the lowest among all models across the three monitoring sites. This accelerating degradation is structurally consistent with the inability of tree-based models to represent temporal state evolution: because RF and XGBoost encode the input window as a static feature vector, they cannot model the propagation of hydrological states through time, causing predictive skill to collapse as the forecast horizon exceeds the temporal scale captured by the input features. This finding is consistent with previous studies, which have shown that while traditional machine learning models perform well in single-step or short-term predictions, they struggle to effectively model the dynamic evolution of hydrological systems over longer horizons [18]. The single-network LSTM architecture shows similar limitations, with prediction accuracy deteriorating significantly as the forecast step increases, particularly at ZK2, where hydrological responses are highly complex.
To address this challenge, Seq2Seq-LSTM and Attention-Seq2Seq-LSTM adopt encoder–decoder structures, the latter augmented with an attention mechanism, to enhance model capability. Although these architectures improve the model’s ability to focus on critical historical states and show relatively better adaptability at the ZK2 site, their overall performance improvement over the basic LSTM remains limited. This may be attributed to suboptimal network design, attention weight allocation, or gradient propagation efficiency during training, suggesting that further optimization is needed to strengthen their modeling of long-term dependencies.
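The attention-based reweighting of historical states mentioned above can be sketched as scaled dot-product attention for a single query over a short encoded history; the key, value, and query vectors below are hypothetical two-dimensional encodings, not outputs of the trained models.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over a short history.

    Returns attention weights and a weighted combination of `values`, where
    each weight reflects how similar the corresponding key (an encoded
    historical state) is to the query.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Hypothetical encodings of three historical time steps; the query matches
# the second step most closely, so it should receive the largest weight.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0], [20.0], [15.0]]
query = [0.0, 1.0]
weights, context = attention(query, keys, values)
print([round(w, 3) for w in weights], round(context[0], 2))
```

The decoder thus attends most strongly to the historical state most relevant to the current prediction step rather than relying solely on a compressed final hidden state.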
In contrast, the CNN and its hybrid with LSTM (CNN-LSTM) demonstrate superior predictive performance, indicating that convolutional neural networks are effective in extracting local features from input variables. By integrating CNN with LSTM, the hybrid architecture enables synergistic modeling of spatial local patterns and temporal dependencies, thereby improving prediction accuracy. This result aligns with findings from river conductivity forecasting [66] and runoff simulations in the headwater region of the Yellow River [34], validating the effectiveness of the CNN-LSTM framework in hydrological prediction. Nevertheless, its long-term predictive capability still lags behind that of N-BEATS and Transformer, reflecting limitations in modeling distant future dynamics under highly nonlinear conditions.
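The local feature extraction performed by the convolutional layers can be illustrated with a minimal valid-mode 1-D convolution; the level series and the difference kernel are illustrative only, not the learned filters of the actual CNN-LSTM.

```python
def conv1d(series, kernel):
    """Valid-mode 1-D convolution (cross-correlation) over a time series.

    Each output is a weighted sum of a local window, which is how the
    convolutional layers in a CNN-LSTM extract short-term patterns before
    the LSTM models their temporal evolution.
    """
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]

# A first-difference kernel highlights abrupt rises, e.g. storm-driven
# recharge pulses in a groundwater level series (values are illustrative).
levels = [10.0, 10.0, 10.1, 11.2, 11.3, 11.3]
edges = conv1d(levels, kernel=[-1.0, 1.0])
print([round(e, 2) for e in edges])
```

A trained CNN learns many such kernels simultaneously, so the feature maps passed to the LSTM encode several complementary local patterns at once.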
The Transformer model demonstrates a distinct advantage in capturing seasonal fluctuations and abrupt events, exhibiting stronger robustness in medium- to long-term predictions. By dynamically weighting the importance of historical states, the self-attention mechanism effectively models long-range dependencies, thereby mitigating the prediction uncertainty caused by information decay. This advantage is consistent with prior research: the Transformer has been shown to significantly outperform traditional LSTM models in multi-step groundwater level prediction due to its powerful sequence modeling capacity [72]. Comparative experiments in karst spring flow prediction further confirm the applicability of the Transformer in complex aquifer systems [67]. This evidence collectively suggests that the Transformer is particularly well suited for modeling groundwater systems characterized by memory effects, nonlinear responses, and multi-scale dynamics.
Most remarkably, the N-BEATS model demonstrated exceptional performance. Whether at the hydrologically simple ZK1 site or at the dynamically complex, nonlinearly responsive ZK2 and ZK3 sites, N-BEATS consistently exhibited the strongest long-term stability, significantly outperforming the other models. This result is highly consistent with findings from multi-step prediction of harmful algal blooms [73] and monthly reservoir runoff prediction [30], further confirming the superiority of N-BEATS in long-horizon forecasting tasks. The results indicate that N-BEATS not only performs well in single-step prediction but also exhibits robust modeling capability for distant forecast horizons under a direct multi-step prediction framework, offering new perspectives for the selection and optimization of intelligent models in future groundwater prediction.
To provide a more intuitive visualization of model performance across different prediction horizons, this study conducted a comprehensive visual analysis of representative models at the ZK1 site. Figure 6 presents the time series fitting performance of the nine models for 1-step, 6-step, and 12-step predictions, intuitively illustrating their capability to track groundwater level dynamics. Figure 7 quantitatively evaluates the accuracy distribution and consistency of three typical deep learning models (CNN-LSTM, Transformer, and N-BEATS) across different prediction steps through scatter plots of predicted versus observed values. Figure 8 depicts the evolution of cumulative absolute errors over time during the testing period for each model, serving as a measure of their stability in long-term prediction. Together, these three figures systematically reveal how model performance evolves with increasing prediction steps from three perspectives: temporal dynamic fitting, numerical accuracy, and error growth trends. Furthermore, to comprehensively assess model generalization capability under different hydrogeological conditions, the prediction time series and cumulative error evolution for the other two monitoring sites (ZK2 and ZK3) are presented in the Supplementary Materials (Figures S8 and S9 and Figures S10 and S11, respectively).
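Cumulative-absolute-error curves of the kind discussed above are straightforward to reproduce for any model output; a minimal sketch with made-up observation and prediction series:

```python
from itertools import accumulate

def cumulative_abs_error(obs, pred):
    """Running sum of absolute errors over the test period.

    This is the quantity plotted in cumulative-error curves: a flatter curve
    indicates more stable long-term prediction.
    """
    return list(accumulate(abs(o - p) for o, p in zip(obs, pred)))

# Illustrative values: model A makes uniform small errors, while model B's
# errors are concentrated late in the period, which the cumulative curve
# makes visible even when total error magnitudes are comparable.
obs    = [1.0, 1.2, 1.1, 1.3, 1.4]
pred_a = [1.1, 1.1, 1.2, 1.2, 1.5]
pred_b = [1.0, 1.2, 1.1, 1.8, 2.0]
print([round(e, 2) for e in cumulative_abs_error(obs, pred_a)])
print([round(e, 2) for e in cumulative_abs_error(obs, pred_b)])
```

Because the curve is monotone, sudden jumps in slope directly flag the periods (for example, recharge events) where a model loses skill.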
3.3. Comparative Analysis of Computational Efficiency
Computational efficiency represents a critical factor for the practical deployment of groundwater level prediction models. This section evaluates nine models across multiple dimensions, including parameter complexity, training time, and convergence characteristics.
Model parameter counts exhibit substantial variation, ranging from 34,764 parameters for CNN to 490,770 parameters for N-BEATS, representing a roughly 14-fold complexity difference (Table 5). It should be noted that these values reflect static weight file sizes only; runtime memory consumption during training and inference is substantially larger and scales with batch size and sequence length. The traditional machine learning methods (XGBoost, Random Forest) employ tree-ensemble architectures that do not involve trainable parameters in the deep learning sense.
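How parameter counts of this kind arise from layer dimensions can be sketched for a simple dense network; the layer widths below are hypothetical and do not correspond to the architectures evaluated in this study.

```python
def mlp_param_count(layer_sizes):
    """Trainable parameters of a dense network: weights plus biases per layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical layer widths, chosen only to show how counts grow with width
# and depth: a shallow narrow network versus a deeper, wider one.
small = mlp_param_count([24, 64, 1])
large = mlp_param_count([24, 256, 256, 256, 1])
print(small, large)

# The complexity ratio reported in the text, from the Table 5 counts.
print(round(490770 / 34764, 1))
```

The quadratic weight terms (n_in * n_out) dominate, which is why widening hidden layers inflates parameter counts far faster than adding input features.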
Training time comparison results reveal significant differences in computational efficiency across model categories (Table 6). Traditional machine learning methods demonstrate superior training speed, with XGBoost (3.49 s) and Random Forest (15.85 s) completing training within 20 s. Among the deep learning models, CNN (24.25 s) and CNN-LSTM (27.97 s) achieved relatively high training efficiency, while the sequence-to-sequence models required significantly longer training times.
Convergence characteristic analysis reveals differences in convergence speed and stability among the deep learning models during training (Table 7). CNN-LSTM exhibited the fastest convergence, requiring only 28.0 training epochs on average to achieve optimal performance. Sequence-to-sequence architectures demonstrated slower convergence patterns, with Seq2Seq-LSTM requiring an average of 65.3 training epochs.
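Epochs-to-optimal-performance statistics of this kind are typically obtained with patience-based early stopping; a minimal sketch, assuming a patience of five epochs and an illustrative validation-loss curve (both hypothetical, not the training configuration of this study):

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which early stopping would halt training:
    when the validation loss has not improved for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses)

# Illustrative validation-loss curve: improvement stalls after epoch 4, so
# training halts once five further epochs pass without a new best loss.
losses = [0.9, 0.5, 0.4, 0.35, 0.36, 0.37, 0.36, 0.38, 0.37]
print(early_stopping_epoch(losses, patience=5))
```

The epoch of the best validation loss (here, epoch 4), rather than the stopping epoch, is what determines the restored model weights.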
Comprehensive analysis of the trade-off relationship between computational cost and model performance reveals distinct efficiency-accuracy characteristics across model categories. Traditional machine learning methods provide exceptional training efficiency but limited predictive performance in handling complex temporal patterns. Among deep learning models, CNN-based architectures (CNN, CNN-LSTM) offer a favorable balance between accuracy and computational cost. Although N-BEATS requires substantial computational resources (average training time 334.42 s), its superior long-term prediction stability may justify the increased computational investment in applications requiring extended prediction horizons.
From a practical application perspective, CNN-LSTM emerges as the most computationally efficient deep learning model, combining fast convergence (28.0 epochs) and reasonable training time (27.97 s) while maintaining competitive predictive performance. Traditional methods retain advantages in scenarios prioritizing training speed over prediction accuracy. In practical applications, model category selection should consider specific trade-offs between computational constraints and prediction requirements.
3.4. Limitations and Transferability
Data quality. Approximately 35% of the records at ZK1 and ZK2 required regression-based reconstruction prior to model training. Three considerations limit the practical impact of this constraint on the reported conclusions. First, the missing periods were concentrated in hydrologically stable intervals (July–September 2021 and January–March 2022), with low precipitation and minimal water-table variation, so the reconstructed values represent low-variance segments rather than the high-energy recharge events that most challenge model performance. Second, reconstruction accuracy was high for both wells (R2 > 0.90, Table 2), supported by strong inter-well correlations (>0.80, Table 1). Third, and most critically, the test set (April–July 2023) on which all performance metrics are based consists of more than 90% original, uninterpolated observations, ensuring that the comparative evaluation reflects genuine model behaviour. Nevertheless, future studies with continuous long-term monitoring records would eliminate this constraint and enable more robust assessment of model performance under extreme recharge conditions.
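Regression-based reconstruction of this kind can be sketched as ordinary least squares between a donor well and a target well over their overlapping record; the well values and the assumed linear relationship below are illustrative only, not the actual monitoring data.

```python
def fit_linear(x, y):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    a = cov / var
    return a, my - a * mx

# Hypothetical overlapping records at a donor well and a gappy target well;
# here the target is constructed as exactly 2*donor + 0.1 for illustration.
donor  = [10.0, 10.4, 10.8, 11.2, 11.6]
target = [20.1, 20.9, 21.7, 22.5, 23.3]
a, b = fit_linear(donor, target)

# Fill the target well's gaps from concurrent donor observations.
gap_donor = [10.2, 11.0]
filled = [a * d + b for d in gap_donor]
print(round(a, 3), round(b, 3), [round(f, 2) for f in filled])
```

In practice the fit would be validated on held-out overlapping data (as the R2 > 0.90 values in Table 2 report) before the filled segments are used for training.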
Interpretability. The present study concludes that hydrogeological complexity exerts a dominant control on model predictive skill, exceeding the influence of model architecture. This inference is grounded in a consistent cross-site performance gradient that persists across all nine architecturally distinct models, a pattern more parsimoniously explained by site-specific hydrogeological complexity than by any model-specific factor. The systematic and architecture-independent nature of this cross-site evidence provides strong indirect support for the conclusion. Nevertheless, no formal quantitative attribution analysis was conducted in the present study. Future work will couple a MODFLOW-based physically distributed groundwater model with SHAP analysis to explicitly partition the relative contributions of hydrogeological and architectural factors to prediction uncertainty, thereby providing direct mechanistic validation of the hydrogeological dominance finding.
Transferability. All findings derive from a single karst catchment in subtropical southern China, with a well-developed conduit–matrix system and pronounced monsoon seasonality. The transferability of the reported conclusions to karst systems with contrasting geological structures, recharge mechanisms, and climatic regimes remains to be validated. Future research could advance transferability through three directions: replicating the evaluation framework in karst systems with different conduit development and climate regimes; developing transfer learning protocols that leverage pre-trained weights from data-rich sites; and embedding physically interpretable parameters into physics-informed hybrid architectures to improve cross-site generalisation without requiring extensive local calibration.
4. Conclusions
This study addresses a fundamental question in karst hydrology: whether model architecture or hydrogeological complexity is the primary determinant of groundwater level prediction feasibility. By systematically evaluating nine ML and DL models across three hydrogeologically distinct monitoring sites within a unified multidimensional framework, we demonstrate that aquifer complexity exerts a dominant and consistent control on predictive skill that outweighs architectural differences, a finding with direct implications for how model selection should be approached in heterogeneous karst environments. Three principal conclusions emerge from this work.
First, among the evaluated architectures, the Transformer achieves the highest single-step prediction accuracy, benefiting from its self-attention mechanism that effectively captures multi-scale temporal dependencies. N-BEATS demonstrates superior long-term stability in multi-step prediction across all sites, suggesting that its stacked block architecture with backcast–forecast decomposition is particularly well-suited to systems with prolonged hydrological memory. CNN-LSTM achieves the best balance between prediction accuracy and computational cost, making it the most practically deployable option for engineering applications.
Second, and more fundamentally, hydrogeological complexity exerts a dominant control on predictive skill that systematically outweighs differences arising from model architecture. This cross-site performance contrast—persisting consistently across all nine model families—indicates that aquifer complexity, rather than model choice, is the primary constraint on prediction feasibility. Consequently, model selection for karst groundwater prediction should be treated as a hydrogeological problem first and an algorithmic problem second.
Third, this study presents the first application of N-BEATS to karst groundwater level forecasting and proposes a replicable multi-dimensional evaluation framework that can serve as a standardised paradigm for intelligent modelling of complex hydrological systems.
These findings collectively advocate a shift from one-size-fits-all model selection toward a site-adaptive, geology-informed modelling paradigm. Future research should prioritise physics-informed hybrid frameworks that embed hydrological prior knowledge into model design, multi-source data integration, and cross-basin transferability assessments to further advance intelligent modelling of heterogeneous karst systems.