Study on the Evolution of Groundwater Level in Hebei Plain to the South of Beijing and Tianjin Based on LSTM Model

Guo, Wei; Yang, Huifeng; Li, Zeyan; Meng, Ruifang; Bao, Xilin; Bai, Hua

doi:10.3390/su17104394


by Wei Guo ¹, Huifeng Yang ¹,²,³,*, Zeyan Li ¹,²,³, Ruifang Meng ¹,²,³, Xilin Bao ¹,²,³ and Hua Bai ¹,²,³

¹ Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geosciences, Shijiazhuang 050061, China

² Hebei Cangzhou Groundwater and Land Subsidence National Observation and Research Station, Cangzhou 061000, China

³ Key Laboratory of Groundwater Sciences and Engineering, Ministry of Natural Resources, Shijiazhuang 050061, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(10), 4394; https://doi.org/10.3390/su17104394

Submission received: 6 March 2025 / Revised: 24 April 2025 / Accepted: 7 May 2025 / Published: 12 May 2025


Abstract

This study addresses the limitations of machine learning in regional groundwater dynamics research, particularly the insufficient integration of the hydrogeological background and low simulation accuracy. Focusing on the shallow groundwater in the Hebei Plain south of Beijing and Tianjin, we integrate static data, including hydrogeological parameters, with the commonly used time-series data. A novel regionalization strategy based on depositional systems is proposed to enhance the model’s spatial adaptability. A Long Short-Term Memory (LSTM) model, augmented with an attention mechanism, uses the static data to adjust the weights of the dynamic inputs, reflecting geological impacts on groundwater dynamics. Comparative results show that the refined regionalization and the inclusion of static data significantly improve the accuracy of the model. Based on the fitted model, shallow groundwater levels from 2023 to 2040 are predicted under two extraction scenarios. The comparison shows that continued implementation of the groundwater extraction reduction policy accelerates the recovery of water levels, and that the rise in groundwater level differs markedly between regions: the piedmont alluvial fan shows the largest rise and the marine sedimentary plain the smallest. This study provides a new method for analyzing groundwater dynamics under complex hydrogeological conditions and a basis for regional groundwater management and sustainable utilization.

Keywords:

groundwater level prediction; deep learning model; hydrogeological parameters; depositional systems

1. Introduction

Groundwater is a vital strategic resource, serving as the primary water supply for urban populations, agricultural irrigation, and industrial activities. Moreover, it plays a crucial role in supporting national economic development. Accurately understanding groundwater level dynamics is fundamental for the scientific management and sustainable utilization of groundwater resources, particularly in regions facing over-extraction and climate variability. Given that groundwater is stored in subsurface aquifers with inherently complex geological structures, its level dynamics are subject to multiple hydrogeological and climatic influences [1]. The most direct way to monitor groundwater levels is by drilling observation wells for long-term monitoring. However, the installation of observation wells is time-consuming and expensive, and their locations are greatly restricted by topography and geomorphology. Consequently, the data obtained are often spatially sparse and prone to anomalies. In addition to field monitoring, numerical models are extensively utilized to simulate and predict groundwater levels over extended periods [2,3]. Physics-driven models such as MODFLOW [4], HYDRUS [5], and GMS [6] have been extensively applied. These models rely on conceptualized mathematical and physical frameworks to simulate relatively complex groundwater systems. However, these models demand extensive hydrogeological data inputs and significant time for setup and calibration. Additionally, challenges such as aquifer heterogeneity, conceptualization issues, and parameter scale effects often result in reduced accuracy in refined spatial zoning [7].

Machine learning techniques, increasingly applied in hydrological modeling since the 1990s, have shown performance comparable to or exceeding that of traditional numerical models [8,9,10,11,12]. Among these, deep learning neural network models such as Long Short-Term Memory (LSTM), specifically designed for processing time-series data, are the most commonly used in hydrogeological studies [13,14,15]. These models exhibit unique advantages when integrating large datasets. By identifying patterns in input data, the algorithms can automatically learn and capture the nonlinear relationships between input variables and target variables [16]. Using mathematical principles, they derive optimal functions from the provided data to perform tasks such as classification, prediction, or detection [16,17]. In recent years, attention mechanisms have been increasingly incorporated into hydrological modeling. By dynamically assigning weights, these mechanisms allow models to focus on the most relevant parts of an input sequence, enhancing the model’s capacity to capture significant patterns amidst large-scale, noisy, and complex datasets. Notably, the integration of attention mechanisms into LSTM architectures has shown considerable promise in improving the accuracy of groundwater level predictions, particularly in regions with complex geological settings [18,19]. For example, in China’s Hetao Plain, a hybrid model combining CNN, LSTM, and attention mechanisms achieved high-precision spatiotemporal predictions of groundwater vulnerability [20]. Similarly, in the Three Gorges Reservoir area, an attention-enhanced CNN-LSTM model significantly improved simulation accuracy and adaptability [21]. These studies suggest that incorporating attention into the LSTM framework has become an effective and promising approach in groundwater modeling, especially in regions with complex geological conditions and highly variable hydrological environments. 
Despite these advances, research on incorporating static hydrogeological parameters, which are closely related to groundwater behavior, into machine learning models remains limited [22,23,24].

Currently, applications of deep learning to groundwater level dynamics in the North China Plain predominantly rely on time series inputs [13,25]. For instance, models incorporating precipitation and groundwater extraction volumes into LSTM frameworks have been employed to simulate and predict regional groundwater levels [26,27]. Integrating time series data with groundwater level observations enhances the model’s responsiveness to the target variable, thereby improving predictive accuracy. However, the selection of time series data types should be informed by the specific characteristics of the study area and its hydrogeological context. For example, the impact of the Normalized Difference Vegetation Index (NDVI) may be considered negligible in the North China Plain, where agricultural irrigation predominates. Furthermore, as groundwater resides within rock pore spaces, its dynamics are governed not only by temporal variables such as precipitation and extraction, but also by hydrogeological parameters. In the North China Plain, the principal sources of shallow groundwater recharge are precipitation infiltration and ecological river recharge, which are influenced by factors including infiltration coefficient, permeability, and topographic elevation. The primary discharge processes—extraction and evaporation—are governed by specific yield and evaporation coefficients, respectively.

To address these limitations, this study applies Singular Spectrum Analysis (SSA) to denoise time-series data, mitigating the effects of outliers and improving data quality. An LSTM model, enhanced with an attention mechanism and supplemented by hydrogeological parameters, is constructed to dynamically adjust the weighting of time-series inputs based on static characteristics. Furthermore, a refined regional zoning approach based on depositional systems is proposed to enhance spatial adaptability, thereby improving the model’s predictive accuracy under complex hydrogeological conditions.

2. Study Area

The study area is the Hebei Plain to the south of Beijing and Tianjin (Figure 1), located within the North China Plain. It encompasses seven cities, among them Shijiazhuang, and covers an area of about 62,900 km². The area borders the Bohai Sea to the east, the northern Henan and western Shandong Plains to the south, and the Taihang Mountains to the west. The overall terrain slopes from northwest to southeast and can be divided into three geomorphological zones from west to east based on origin: the piedmont alluvial fan, the central alluvial-lacustrine plain, and the eastern alluvial-marine coastal plain [28]. The study area primarily belongs to the Haihe River Basin, with the southeastern edge falling within the Yellow River Basin. Major rivers traversing the area include the Yongding River, Daqing River, Ziya River, and Zhangwei River. Groundwater in the region is predominantly stored in Quaternary pore aquifers, which can be divided into four aquifer groups from top to bottom based on burial characteristics and hydraulic properties. The first aquifer group is an unconfined aquifer with a depth of approximately 40–60 m, while the second to fourth aquifer groups are confined aquifers with burial depths of 120–170 m, 250–350 m, and 350–550 m, respectively [29]. In terms of water supply, groundwater is the primary source for the south-central Hebei Plain to the south of Beijing and Tianjin, with shallow groundwater accounting for 78.1% of the total groundwater supply. Before the official operation of the South-to-North Water Diversion Central Route Project in 2014, over-exploitation of groundwater led to a rapid decline in water levels, resulting in large areas of deep and shallow groundwater depression cones, accompanied by environmental geological problems such as land subsidence [30,31]. Since 2014, the declining trend in groundwater levels has been curbed through water diversion, control of over-exploitation, and ecological water replenishment of rivers, and the groundwater level is gradually recovering [32]. Because the region has a complex aquifer structure and highly active groundwater changes, studying the dynamics of shallow groundwater levels in the Hebei Plain to the south of Beijing and Tianjin has significant practical implications for regional ecological and economic development.

3. Materials and Methods

3.1. Data Sources and Processing

Based on the above analyses, this study collected 10 types of data, including monthly groundwater level elevation, from 170 national shallow monitoring wells in the Hebei Plain south of Beijing and Tianjin for 2018 to 2022. The data were classified as dynamic or static features according to whether they are time series, as shown in Table 1. Dynamic features include time-series variables such as precipitation, groundwater exploitation, evapotranspiration, and ecological recharge [33,34,35,36]. Precipitation and evapotranspiration data are derived from national meteorological stations. Groundwater evapotranspiration is calculated according to the depth to the water table; where this depth exceeds 5 m, it is assumed to be negligible. Groundwater exploitation is derived from county-level water resources reports using an area-based extraction coefficient. The influence range of river ecological recharge is taken as a 2 km zone on both sides of the main rivers.

To mitigate the influence of outliers and noise in the time-series data, Singular Spectrum Analysis (SSA) was applied. SSA is an advanced method for decomposing time-series into trend, periodic, and noise components without requiring prior assumptions [37]. The procedure involves four main steps: embedding, singular value decomposition (SVD), grouping, and reconstruction.

1. Embedding: For a one-dimensional time series (x_{1}, \dots, x_{N}) of length N, a suitable window length L is selected to generate K = N − L + 1 lagged vectors X_i of length L:

X_{i} = (x_{i}, x_{i+1}, \dots, x_{i+L-1})^{T}, \quad 1 \leq i \leq K

(1)

These vectors form the columns of the trajectory matrix X:

X = [X_{1}, X_{2}, \dots, X_{K}]

2. Singular value decomposition: the trajectory matrix X is decomposed by computing the eigenvalues λ_{i} and the eigenvectors u_{i} and v_{i} of X X^{T} and X^{T} X, respectively:

(X X^{T}) u_{i} = λ_{i} u_{i}

(2)

(X^{T} X) v_{i} = λ_{i} v_{i}

(3)

The matrix X is then represented as the sum of rank-one matrices:

X = \sum_{i=1}^{r} σ_{i} u_{i} v_{i}^{T} = X_{1} + X_{2} + \dots + X_{r}

(4)

where r is the rank of X and σ_{i} = \sqrt{λ_{i}} are the singular values.

3. Grouping: the decomposed components are grouped into subsets representing trends or periodicities, based on their singular values.

4. Reconstruction: each group is used to reconstruct a smoother version of the original series by diagonal averaging, resulting in a denoised and trend-enhanced sequence of length N = L + K − 1.
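The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `ssa_denoise` and the grouping rule (keep the leading `n_components` singular triplets as signal) are assumptions made for the example, and the window lengths actually used in the study are those of Table 2.

```python
import numpy as np

def ssa_denoise(series, window, n_components):
    """Basic SSA: embed, decompose, keep leading components, diagonal-average.

    `window` is the window length L; `n_components` is an assumed, simple
    grouping rule that keeps the largest singular triplets as "signal".
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1  # number of lagged vectors, K = N - L + 1

    # Step 1, embedding: column i holds the lagged vector (x_i, ..., x_{i+L-1}).
    traj = np.column_stack([x[i:i + window] for i in range(k)])  # shape (L, K)

    # Step 2, SVD of the trajectory matrix.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)

    # Step 3, grouping: sum of the leading rank-one matrices sigma_i u_i v_i^T.
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]

    # Step 4, reconstruction: average each anti-diagonal, since entry
    # (row, col) of the trajectory matrix corresponds to sample row + col.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for row in range(window):
        recon[row:row + k] += approx[row]
        counts[row:row + k] += 1
    return recon / counts
```

A pure sinusoid has a rank-two trajectory matrix, so `ssa_denoise(signal, window, 2)` returns it essentially unchanged; for noisy monthly records, choosing `n_components` is the judgment call that the grouping step describes.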

Singular spectrum analysis was applied to the time-series data used in this paper to remove noise and abnormal fluctuations, thereby eliminating outliers and improving data quality. To validate the applicability of SSA for noise reduction, it was compared with the Wavelet Transform (WT) using rainfall records from the same rainfall station in the study area. The parameters of the two methods are listed in Table 2. The comparison shows that SSA better captures data periodicity and outliers (Figure 2).

Additionally, to adapt the data to the input format of deep learning neural networks, ordinary Kriging was applied for spatial interpolation across the study area. This method uses a covariance function to perform spatial interpolation of random processes or random fields [38,39]. The spherical model was chosen as the variogram model because it is better suited than the other models to a field with gentle regional variations; the kriging parameters used for the interpolation are shown in Table 3 (groundwater elevation as an example). Point-based original data, such as precipitation and evaporation, were interpolated to obtain corresponding values at each monitoring well location in the study area. The groundwater levels predicted for the forecast period were interpolated with the same method to create maps of predicted groundwater levels for the entire study area.
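As a rough illustration of ordinary kriging with a spherical variogram, the sketch below solves the kriging system for a single target location. The function names, the nugget/partial-sill parameterization, and all numeric values in the usage note are assumptions for the example; the parameters used in the study are those reported in Table 3.

```python
import numpy as np

def spherical_variogram(h, nugget, partial_sill, range_):
    """Spherical semivariogram; reaches nugget + partial_sill at h >= range_."""
    h = np.asarray(h, dtype=float)
    ratio = np.minimum(h / range_, 1.0)
    gamma = nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)
    return np.where(h > 0, gamma, 0.0)  # semivariance is zero at zero lag

def ordinary_kriging(points, values, target, nugget, partial_sill, range_):
    """Return the kriged estimate at `target` and the kriging weights."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(points)

    # Left-hand side: semivariances between observations, bordered by ones
    # for the unbiasedness constraint (weights sum to one).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = spherical_variogram(dists, nugget, partial_sill, range_)
    lhs[n, n] = 0.0

    # Right-hand side: semivariances between observations and the target.
    rhs = np.ones(n + 1)
    rhs[:n] = spherical_variogram(
        np.linalg.norm(points - target, axis=1), nugget, partial_sill, range_
    )

    solution = np.linalg.solve(lhs, rhs)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    return float(weights @ values), weights
```

For four hypothetical wells at the corners of a 10 km square, the estimate at the center is the plain average of the four values, and with a zero nugget the estimate at any well reproduces the observed value there.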

3.2. Sample Division Based on Depositional Systems

Traditional groundwater simulation in the North China Plain mostly uses point-based data or broad geomorphological classification [40,41], which cannot adequately capture the influence of the complex sedimentary environment and therefore limits accuracy. The study area is large and spatially heterogeneous, and groundwater levels respond differently to the input variables in different depositional systems. To improve the model’s adaptability to the study area and enhance simulation accuracy, the area is subdivided by depositional system into seven subzones, including the piedmont alluvial-flood fan, based on the Late Pleistocene geomorphic map of the study area (Figure 3). The boundaries of each subzone were constrained using Late Pleistocene geomorphological maps and verified against hydrogeological boreholes in the study area. The number of monitoring wells, as well as the input dynamic and static features for each subzone, are detailed in Table 4.

3.3. Model Construction

This study constructed a deep learning model based on LSTM combined with an attention mechanism for data fitting and prediction (Figure 4). Long Short-Term Memory (LSTM) is an improved variant of the recurrent neural network (RNN) [42,43]. Compared with traditional RNNs, LSTM introduces a cell state and gate mechanisms. The forget gate, input gate, and output gate collectively control the flow of information within the LSTM cell. These gates address the short-term memory limitations of traditional RNNs and effectively mitigate the vanishing-gradient and exploding-gradient problems during training. In this model, only the dynamic feature data are fed into the LSTM component, and these features are normalized before entering the LSTM layer. The data are then processed by the input gate, forget gate, output gate, and cell state, ultimately producing the hidden state h_t at time t. This hidden state is then combined with the results from the attention mechanism to generate the final output and adjust the model weights.
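To make the gate mechanism concrete, the sketch below runs one time step of a textbook LSTM cell in NumPy. It follows the standard formulation (sigmoid gates, tanh candidate) rather than the authors' exact configuration, and the dictionary layout of `weights` and `biases` is an assumption made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, weights, biases):
    """One step of a standard LSTM cell.

    `weights[k]` has shape (hidden, hidden + n_inputs) and `biases[k]` has
    shape (hidden,) for k in {"f", "i", "g", "o"} (assumed layout).
    """
    z = np.concatenate([h_prev, x_t])             # previous state + new input
    f = sigmoid(weights["f"] @ z + biases["f"])   # forget gate
    i = sigmoid(weights["i"] @ z + biases["i"])   # input gate
    g = np.tanh(weights["g"] @ z + biases["g"])   # candidate cell update
    o = sigmoid(weights["o"] @ z + biases["o"])   # output gate
    c_t = f * c_prev + i * g                      # new cell state
    h_t = o * np.tanh(c_t)                        # new hidden state h_t
    return h_t, c_t
```

Iterating `lstm_step` over the normalized monthly dynamic features of a well yields the hidden state h_t referred to in the text.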

The attention mechanism was originally introduced to address information bottlenecks in sequence modeling, particularly in natural language processing tasks [44]. Traditional models compress an entire input sequence into a fixed-size hidden state, which can result in loss of crucial information and equal weighting of all time steps, limiting the model’s ability to capture global dependencies [45]. Attention mechanisms overcome this by dynamically assigning weights, allowing the model to focus on the most relevant inputs for the task at hand.

In this study, attention is used to link dynamic (time-series) and static (spatial) features, automatically adjusting the importance of dynamic features at different time steps based on static attributes. This enhances the model’s ability to emphasize key drivers of groundwater level variations in different geological settings.

Let X_d and X_s denote the dynamic and static feature matrices, respectively. The attention score is calculated as follows:

\mathrm{score} = \tanh(X_{d} W_{d} + X_{s} W_{s} + b)

(5)

The attention weights α are derived via softmax normalization:

α = \mathrm{softmax}(\mathrm{score})

(6)

These weights are used to compute a context vector Vector_c that summarizes the weighted influence of dynamic features:

\mathrm{Vector}_{c} = \sum_{t=1}^{T} α_{t} \cdot X_{d,t}

(7)

This context vector is concatenated with static features to form a combined feature X_c, which serves as input to the LSTM’s output layer:

X_{c} = \mathrm{Concat}(\mathrm{Vector}_{c}, X_{s})

(8)

The final prediction is made by integrating X_c with the LSTM hidden state h_t:

y = W_{out} \cdot \mathrm{Concat}(h_{t}, X_{c}) + b_{out}

(9)

During training, the weights applied to the dynamic features are continuously adjusted by backpropagation, so that the static features ultimately modulate these weights. For example, in the highly permeable piedmont alluvial fan area, precipitation infiltrates to groundwater more readily, so the model trained for this area assigns higher attention weights to precipitation and ecological recharge; in the plain, lacustrine, and coastal areas, where evaporation is strong and the water table is shallow, the model tends to assign higher weights to evaporation and exploitation, which yields a better fit to the evolution of the groundwater level.
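The forward pass of Equations (5) to (9) can be sketched as follows. The tensor shapes are assumptions chosen for the illustration (the text does not state them): X_d is a (T, D) matrix of dynamic features, X_s is a length-S static vector, W_d and W_s are vectors, and b is a scalar, so that one score is produced per time step.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def static_conditioned_attention(x_d, x_s, w_d, w_s, b):
    """Equations (5)-(8) under the assumed shapes described above."""
    score = np.tanh(x_d @ w_d + x_s @ w_s + b)  # Eq. (5), shape (T,)
    alpha = softmax(score)                      # Eq. (6), weights over time
    context = alpha @ x_d                       # Eq. (7), shape (D,)
    x_c = np.concatenate([context, x_s])        # Eq. (8), shape (D + S,)
    return alpha, x_c

def predict(h_t, x_c, w_out, b_out):
    """Equation (9): linear read-out of the hidden state and X_c."""
    return float(w_out @ np.concatenate([h_t, x_c]) + b_out)
```

With all attention parameters set to zero the weights are uniform and the context vector reduces to the time average of the dynamic features; training moves the weights away from this neutral starting point.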

3.4. Hyperparameter Settings

In this study, the data were preprocessed and divided into training and test sets in the ratio of 7:3 for the training and validation of the models in each partition. The ReLU (Rectified Linear Unit) activation function was used for the output layer of the LSTM model, while the fully connected layer used the Linear function as the final output function. This combination was shown to provide the best fit, where the ReLU activation function is a nonlinear function with the expression:

f(x) = \max(0, x)

(10)

The use of the ReLU activation function in the LSTM model enhances the feature extraction ability of the model for complex time series, which is especially suitable for prediction tasks with significant nonlinear features such as water level fluctuations.

The Linear function, on the other hand, is a simple linear mapping function with the expression:

f(x) = ax + b

(11)

The main advantage of using the Linear function in the fully connected layer is that it maintains the continuity and stability of the model output, making it well suited to regression problems with continuous numerical targets, such as predicting groundwater level elevations. Combining the two functions allows ReLU to enhance the capture of nonlinear features in the LSTM output layer, while the Linear function maps these extracted features to continuous predicted values in the fully connected layer, so that the model can fit complex water level trends accurately.

Meanwhile, to further optimize model performance, targeted hyperparameter tuning was carried out for each partition model, and the optimal hyperparameter combination for each partition was obtained after several rounds of tuning. The specific parameter settings are shown in Table 5. These optimization measures significantly improved the performance of the LSTM model in predicting groundwater level elevation.

3.5. Assessment Methodology

Given that the target variable in this study is groundwater level elevation, which makes this a regression problem, the model employs Mean Squared Error (MSE) as the loss function. MSE and the coefficient of determination (R²) were also used to assess the model’s performance. MSE evaluates the discrepancy between predicted and observed values, with smaller values indicating lower prediction error. R² measures the model’s goodness of fit; its maximum is 1, and values closer to 1 signify a better fit. The equations for MSE and R² are as follows:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}

(12)

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

(13)

where n represents the total number of samples, \hat{y}_{i} denotes the predicted value, \bar{y} is the mean of the observed values, and y_{i} represents the observed value.
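Equations (12) and (13) translate directly into code. The sketch below is a plain NumPy version with illustrative function names; the sample values in the usage note are made up for the example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, Eq. (12)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (13)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```

For observed levels `[1, 2, 3, 4]` and predictions `[1.1, 1.9, 3.2, 3.8]`, the squared errors sum to 0.10, giving an MSE of 0.025 and an R² of 1 − 0.10/5 = 0.98.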

To further assess the model’s fitting ability and robustness, K-fold cross-validation was implemented. The input data were partitioned into five sequential subsets based on temporal order, and R² values were computed for each subset during training. The reliability and robustness of the model were evaluated through comparison of these R² values.
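The partitioning described above can be sketched as follows. Only the split into five sequential blocks and the per-block R² are taken from the text; how each block enters training is not specified there, so the sketch stops at scoring, and the function names are illustrative.

```python
import numpy as np

def sequential_folds(n_samples, n_folds=5):
    """Split sample indices into contiguous blocks that preserve time order."""
    return np.array_split(np.arange(n_samples), n_folds)

def r2_per_fold(y_true, y_pred, n_folds=5):
    """Compute R² separately on each sequential block."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scores = []
    for idx in sequential_folds(len(y_true), n_folds):
        obs, sim = y_true[idx], y_pred[idx]
        ss_res = np.sum((obs - sim) ** 2)
        ss_tot = np.sum((obs - obs.mean()) ** 2)
        scores.append(float(1.0 - ss_res / ss_tot))
    return scores
```

With 60 monthly records (2018 to 2022), `sequential_folds(60)` gives five blocks of 12 consecutive months, that is, one block per calendar year.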

4. Results

4.1. Fitting Result

The training performances of the sub-region models are summarized in Table 6, and the K-fold cross-validation results are presented in Table 7. Analysis of the training, testing, and cross-validation outcomes for the groundwater level prediction models across different regions indicates that the LSTM model developed in this study exhibits strong overall predictive performance and robustness. Specifically, in the training set, all sub-regions achieved mean squared errors (MSE) below 0.1 and coefficients of determination (R²) approaching 1, demonstrating a high degree of model fit. Although performance on the test set decreased slightly, the overall R² value remained above 0.9 and the MSE remained below 0.5, confirming the reliability of the fit. These results surpass those reported in previous studies of the North China Plain that utilized point-based fitting (R² = 0.89) or simplified regional fitting approaches (R² = 0.68) [40,41]. The cross-validation results further confirm the model’s robustness, with average R² values from the five-fold time-series validation exceeding 0.8, suggesting effective mitigation of overfitting. Additionally, the fitted curves of representative monitoring wells (see Figure 5) further illustrate the high precision of the model’s fit.

4.2. Groundwater Level Prediction Results

4.2.1. Scenario Setting for the Forecast Period

Precipitation for the forecast period (2023 to 2040) was predicted with a Random Forest regression model trained on the multi-year precipitation records of the study area (1951–2022) and applied recursively (Figure 6). The model was evaluated using MSE and R²: the training set had an MSE of 1.56 and an R² of 1, while the test set had an MSE of 2.78 and an R² of 0.99.
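Recursive forecasting means that each prediction is appended to the history and reused as an input for the next step. The sketch below shows only that loop; `one_step_model` is a hypothetical callable standing in for the fitted Random Forest regressor, and the number of lags is an assumption, since neither detail is given in the text.

```python
def recursive_forecast(history, one_step_model, n_lags, horizon):
    """Forecast `horizon` steps ahead by feeding predictions back as inputs.

    `one_step_model` is any callable that maps the last `n_lags` values to
    the next value (a stand-in for the fitted regressor).
    """
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        lags = values[-n_lags:]          # most recent observations/predictions
        next_value = one_step_model(lags)
        forecasts.append(next_value)
        values.append(next_value)        # the prediction becomes an input
    return forecasts
```

Because errors are fed back into later steps, the uncertainty of such a forecast generally grows with the horizon, which is worth keeping in mind when reading results near 2040.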

The setting of groundwater extraction volumes is aligned with the ongoing groundwater withdrawal reduction policies in the Beijing-Tianjin-Hebei region. According to official records, by 2022, approximately 70% of the shallow groundwater extraction reduction target had been achieved in the study area, with the ultimate policy goal of achieving a balance between groundwater extraction and recharge by 2035. Based on this target, the extraction volumes were designed to decrease progressively on an annual basis, reaching equilibrium by 2035. To further assess the influence of these policy-driven reductions on groundwater recovery, an additional comparative scenario was established using the current (2022) extraction intensity as a baseline, enabling prediction and evaluation of shallow groundwater level changes under both constrained and unconstrained conditions.
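The two extraction scenarios can be sketched as simple annual schedules. The linear year-by-year reduction and every number in the usage note are assumptions for illustration; the text states only that extraction decreases progressively each year and reaches equilibrium by 2035, while the comparison scenario holds the 2022 intensity constant.

```python
def extraction_scenarios(extraction_2022, equilibrium_extraction,
                         base_year=2022, target_year=2035, end_year=2040):
    """Return (policy, baseline) dictionaries mapping year -> extraction.

    The policy path is assumed to fall linearly from the 2022 level to the
    equilibrium level in `target_year` and to stay there afterwards; the
    baseline path keeps the 2022 level throughout.
    """
    policy, baseline = {}, {}
    n_steps = target_year - base_year
    for year in range(base_year + 1, end_year + 1):
        baseline[year] = extraction_2022
        if year >= target_year:
            policy[year] = equilibrium_extraction
        else:
            fraction = (year - base_year) / n_steps
            policy[year] = extraction_2022 + fraction * (
                equilibrium_extraction - extraction_2022
            )
    return policy, baseline
```

With hypothetical values of 100 units in 2022 and 74 units at equilibrium, the policy path drops by 2 units per year until 2035 and is flat thereafter.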

Ecological recharge was assumed to remain constant throughout the forecast period, reflecting ongoing management policies. Evapotranspiration during the prediction period was based on the multi-year average, and other static features were kept unchanged, forming the basis for the prediction period from 2023 to 2040.

4.2.2. Predicted Results

Groundwater level rise distribution maps were produced at five-year intervals for 2025 to 2040 under two extraction scenarios (Figure 7 and Figure 8). Combined with the total groundwater level recovery values under both extraction scenarios from 2020 to 2040 (Table 8), a systematic analysis was conducted on the interannual variation patterns of groundwater levels in the study area, as well as the characteristics of phreatic surface variations from the piedmont to the coastal regions. Overall, whether groundwater extraction is maintained at current levels or further reduced according to policy, the groundwater level in the study area shows an upward trend from 2020 to 2040. However, there is considerable variation in the magnitude of this rise. A comparison of the overall groundwater level rise during the prediction period indicates that, if the remaining 30% reduction target under the current groundwater extraction reduction policy is achieved, an additional rise of approximately 2.5 m in groundwater level could be attained by 2040. This rise would be more pronounced in specific regions: approximately 4 m in the piedmont alluvial-proluvial fan depositional system, about 3.5 m in the central paleo-channel depositional system, and around 4 m in the central floodplain depositional systems.

During the prediction period, groundwater level rise varies significantly across different depositional systems. The piedmont alluvial-proluvial fan depositional system exhibits the greatest total water level rise, approaching 10 m. This region benefits from its relatively elevated topography, as well as higher permeability and rainfall infiltration coefficients compared to other areas, which facilitate efficient conversion of surface water infiltration into groundwater recharge. Additionally, this area has historically been a major zone of intensive groundwater extraction, and the implementation of groundwater extraction control policies has markedly contributed to the recovery of water levels through substantial reductions in extraction. The central floodplain and central paleo-channel depositional systems show a total water level rise of approximately 7 m. Despite their relatively flat topography, these regions possess good permeability and are situated near major river channels, receiving sustained ecological recharge from river systems. As key agricultural irrigation zones, they also benefit from groundwater extraction reduction policies. The combined influence of these factors results in a pronounced trend of groundwater level recovery. In contrast, groundwater level rise in other depositional systems is constrained by limited infiltration capacity. For instance, in the marine depositional systems, due to its low permeability and rainfall infiltration coefficients, the total rise is limited to only about 1 m. Furthermore, with minimal shallow groundwater extraction in this area, groundwater levels are minimally influenced by extraction reduction policies.

5. Discussion

This study highlights the significant improvement in model accuracy by subdividing the area based on depositional systems and incorporating hydrogeological parameters. Two controlled experiments were conducted: one using a simple zoning model based on three areas (mountain front, central part, and coastal plain) and another excluding hydrogeological parameters.

5.1. Comparison of Simple vs. Refined Partitioning

The results of the simple zoning approach, which divides the area into piedmont, central, and coastal plains, are shown in Table 9. As the coastal plain partition is the same in both models, the comparison focuses on the piedmont and central regions. With simple zoning, the test set achieved an R² of 0.6–0.68 and a mean MSE of 0.68–0.74. The precision distribution maps (Figure 9) show that the refined zoning model fits 158 of 170 wells with high accuracy (R² > 0.75), compared with only 92 under the simple partition. In the moderate-accuracy class (0.5 ≤ R² ≤ 0.75), the refined model has 8 wells, distributed across the piedmont alluvial-flood fan and the central paleochannel zones, while the simple partition has 46. Only 4 wells have low accuracy (R² < 0.5) under the refined model, against 32 under the simple partition, mainly in the piedmont area.

The refined zoning model, based on depositional systems, yields better results. Dividing regions by facies results in more uniform data distributions within each sub-region, reducing heterogeneity and allowing the model to capture local features more effectively. Moreover, reducing region size shortens sequence lengths in the LSTM model, improving its ability to handle local temporal dependencies. This subdivision also helps avoid confusion between spatial and temporal dynamics, making the model more stable and improving its fitting accuracy.

5.2. Comparison of Models with and Without Static Features

Table 10 presents the average fitting results for the refined zoning model without static data. Excluding static parameters markedly degrades model performance: R² values range from 0.7 to 0.85, with MSE values generally above 0.5. According to the accuracy distribution maps of R² and MSE (Figure 10), the refined partition model without static parameters fits 114 of 170 monitoring wells with high accuracy (R² > 0.75, green markers). A further 38 wells with moderate accuracy (0.5 ≤ R² ≤ 0.75) are primarily located in the central paleochannel belt near Handan and Hengshui. The remaining 18 wells have low accuracy (R² < 0.5, red markers) and are mainly located in the central paleochannel belt.

The inclusion of static data notably improves model performance. Hydrogeological parameters, which reflect the physical properties of the groundwater system, enable the model to integrate both temporal changes and physical system characteristics, improving both interpretability and prediction accuracy. Static features also address data heterogeneity, enabling the model to better adapt to regional hydrogeological variability. Furthermore, static data work in synergy with dynamic features: they allow the model to adjust its weights and to capture interactions between dynamic and static features. For example, the effect of precipitation (a dynamic feature) on groundwater levels varies across regions with different permeability coefficients (a static feature). The inclusion of static data allows the model to represent this complex nonlinear relationship more accurately.
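The precipitation and permeability example can be illustrated with a small feature-gating sketch. It is a generic illustration of static features scaling dynamic inputs, not the network used in this study; the functions, the gate weights, and the two site profiles are all hypothetical, and in a trained model the gate parameters would be learned rather than set by hand.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def static_gates(static, gate_weights, gate_bias):
    """One gate in (0, 1) per dynamic feature, computed from static features."""
    gates = []
    for w_row, b in zip(gate_weights, gate_bias):
        z = sum(w * s for w, s in zip(w_row, static)) + b
        gates.append(sigmoid(z))
    return gates


def gated_inputs(dynamic, gates):
    """Scale each dynamic feature by its static-derived gate."""
    return [d * g for d, g in zip(dynamic, gates)]


# Hypothetical, hand-picked parameters; a trained network would learn these.
# Rows: gate for precipitation, gate for extraction.
# Columns: hydraulic conductivity, specific yield (both scaled to 0-1).
GATE_WEIGHTS = [[2.0, 0.5], [0.0, 0.0]]
GATE_BIAS = [-1.0, 0.0]

dynamic = [0.6, 0.4]      # scaled precipitation, scaled extraction
coarse_site = [0.9, 0.8]  # permeable, piedmont-like static profile
fine_site = [0.1, 0.2]    # low-permeability, coastal-like static profile

gates_coarse = static_gates(coarse_site, GATE_WEIGHTS, GATE_BIAS)
gates_fine = static_gates(fine_site, GATE_WEIGHTS, GATE_BIAS)

print(gated_inputs(dynamic, gates_coarse))
print(gated_inputs(dynamic, gates_fine))
```

With these assumed parameters the same precipitation input is passed on more strongly at the permeable site than at the low-permeability site, while the extraction input is treated identically at both.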

In conclusion, the depositional systems-based zoning model, enhanced with static hydrogeological data, significantly improves fitting accuracy and predictive performance in hydrogeological systems. This approach offers a robust framework for studying groundwater dynamics in complex geomorphological regions.

6. Conclusions

This study addresses the complexity of spatiotemporal groundwater level variations in the Hebei Plain, south of Beijing and Tianjin, by proposing a comprehensive analytical framework. The framework integrates deep learning models, multi-source data, and geomorphological zoning, offering a novel methodological approach to regional groundwater dynamics. Combining an LSTM neural network with an attention mechanism, hydrogeological parameters, and depositional systems-based zoning significantly improves the model’s interpretability and predictive accuracy. The key conclusions are as follows:

1. Advantages of the LSTM Model Integrated with Attention Mechanism

This LSTM-based model with an attention mechanism effectively captures the long-term temporal dependencies of dynamic features while adjusting model weights using static features. This integration allows the model to better capture the synergistic effects of the multiple factors influencing groundwater level variations. The method outperforms traditional models by mitigating overfitting caused by data heterogeneity. The model achieved test-set R² values of 0.87–0.98 across the seven zones (Table 6), demonstrating its robustness in complex hydrological systems.

2. Advantages of Introducing Static Data

By incorporating static features, such as hydrogeological parameters, alongside temporal data, this study overcomes the limitations of single-dimensional data. Static features carry essential background information, are closely linked to groundwater level fluctuations, and influence broader hydrological trends. In addition, the spatial heterogeneity of groundwater dynamics, which arises from varying hydrogeological conditions, is better captured when static data are included. With static data, the model attains lower MSE and higher R² values. For long-term simulations or complex regions, integrating high-resolution land-use and deformation data to capture time-varying hydrogeological conditions may further enhance model accuracy.

3. Advantages of Depositional Systems-Based Partitioning

Subdividing the study area into seven zones based on Late Pleistocene depositional systems enhances model accuracy by mitigating the effects of spatial heterogeneity. This partitioning enables the model to better reflect the hydrogeological dynamics of different geomorphological units. For instance, the coarse-grained deposits of the piedmont alluvial-flood fan enhance precipitation infiltration, so a higher proportion of precipitation recharges groundwater, while the fine-grained sediments of the central paleochannel belt delay this process. The partitioning strategy consistently produced test-set MSE values below 0.5 and R² values around 0.9, outperforming the simpler partitioning model. Where mature hydrogeological data are available but detailed depositional maps are lacking, we propose a scalable data-driven zoning strategy: basic geomorphological units are delineated using borehole data and DEM-derived factors and then grouped by spatial clustering with machine learning (e.g., K-means). This strategy can yield a more reasonable partitioning framework for modeling in the absence of detailed maps.
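The clustering step of such a data-driven zoning strategy might look like the sketch below. It is a plain implementation of Lloyd's K-means algorithm, assuming each site is described by two scaled factors (a DEM-derived elevation and a borehole-derived coarse-grained fraction); the feature choice, the site values, and the deterministic seeding are illustrative assumptions, not part of the study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic initial centroids."""
    # Assumption: seed with k points evenly spaced through the input list.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids


# Hypothetical sites: (scaled elevation, scaled coarse-grained fraction).
sites = [
    (0.90, 0.80), (0.85, 0.75), (0.95, 0.85),  # piedmont-like
    (0.50, 0.45), (0.45, 0.50), (0.55, 0.40),  # central-plain-like
    (0.10, 0.15), (0.05, 0.10), (0.15, 0.20),  # coastal-like
]

labels, centroids = kmeans(sites, k=3)
print(labels)
print(centroids)
```

In practice the factors would be standardized first and the number of clusters checked against hydrogeological knowledge; a library implementation such as scikit-learn's `KMeans` would be the usual choice for real data.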

Based on current policies, the study projected groundwater levels from 2023 to 2040 under two extraction scenarios. Against the backdrop of reduced groundwater extraction, ecological recharge, and related policy measures, shallow groundwater levels in the study area show a significant upward trend, and continued implementation of these policies could produce an additional groundwater rise of approximately 2.5 m. Because depositional systems and recharge conditions are heterogeneous, groundwater level recovery varies markedly in space. The maximum rise, in the piedmont alluvial-flood fan, reaches approximately 10 m, while the minimum rise, in the coastal plain, is about 1 m. It can thus be concluded that in areas with favorable shallow groundwater recharge conditions, such as the piedmont alluvial-flood fan, the central paleochannel belt, and the central floodplain, shallow groundwater extraction can be considered under controlled conditions if necessary. Conversely, in the marine depositional plain, where poor permeability leads to slow recovery, stricter regulation of groundwater extraction should be enforced. Overall, the predictions indicate that reducing groundwater extraction intensity and implementing comprehensive recharge and management measures will significantly enhance regional groundwater level recovery, particularly in areas previously affected by severe groundwater over-extraction.
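The scenario comparison can be checked against the zone totals in Table 8. The sketch below uses the June column only and takes a plain unweighted mean over the seven zones; that averaging choice is an assumption, and the study's own regional figure may have been derived differently (for example from the interpolated water-level surfaces).

```python
# June totals (m) transcribed from Table 8 as
# (extraction-reduction policy maintained, current extraction intensity maintained).
june_rise = {
    "Piedmont alluvial-flood fan": (9.46, 6.29),
    "Piedmont lacustrine depression": (4.80, 3.38),
    "Piedmont inner terrace": (3.91, 2.19),
    "Central floodplain": (7.14, 2.65),
    "Central paleochannel belt": (7.20, 3.74),
    "Central lacustrine depression": (4.94, 2.64),
    "Marine depositional plain": (0.63, 0.56),
}

# Extra rise attributable to continuing the policy, zone by zone.
extra_rise = {
    zone: round(policy - current, 2)
    for zone, (policy, current) in june_rise.items()
}

# Assumption: a plain unweighted mean over the seven zones.
mean_extra = sum(extra_rise.values()) / len(extra_rise)

largest = max(june_rise, key=lambda zone: june_rise[zone][0])
smallest = min(june_rise, key=lambda zone: june_rise[zone][0])

for zone, value in extra_rise.items():
    print(f"{zone}: +{value:.2f} m")
print(f"Unweighted mean extra rise: {mean_extra:.2f} m")
print(f"Largest rise under the policy scenario: {largest}")
print(f"Smallest rise under the policy scenario: {smallest}")
```

With the June column the unweighted mean comes out at about 2.4 m, the same order as the roughly 2.5 m quoted above; a well-weighted or area-weighted average would give a different figure.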

In addition, it should be noted that the model developed in this study was trained primarily on historical observational data, and the scenario settings for the prediction period were derived mainly from the current conditions of the study area. The model may therefore not fully capture the impact of extreme future events, such as prolonged droughts, extreme precipitation, or abrupt increases in extraction, which are underrepresented in the historical record. The occurrence of such events may introduce uncertainty into the model’s extrapolative predictions.
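One simple way to make this limitation visible is to flag scenario inputs that fall outside the range seen during training. The sketch below is a hypothetical diagnostic, not something the study reports using; the function name, the `margin` parameter, and the precipitation values are illustrative assumptions.

```python
def outside_training_range(history, scenario, margin=0.0):
    """Return indices of scenario values outside the historical range.

    The range [min, max] of `history` is widened on both sides by
    `margin` times its width before the comparison.
    """
    lo, hi = min(history), max(history)
    pad = (hi - lo) * margin
    return [
        i for i, value in enumerate(scenario)
        if value < lo - pad or value > hi + pad
    ]


# Hypothetical monthly precipitation (mm); not data from the study.
historical_precip = [5.0, 12.0, 30.0, 80.0, 150.0, 60.0, 20.0]
scenario_precip = [10.0, 200.0, 3.0, 90.0]

flagged = outside_training_range(historical_precip, scenario_precip)
print(flagged)
```

Months flagged in this way are those for which the prediction is an extrapolation beyond the training data and therefore deserves extra caution.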

In summary, this study presents a robust data-driven methodology to enhance the understanding of groundwater dynamics in complex geomorphological settings. By integrating static hydrogeological data and employing depositional systems-based zoning, the model’s accuracy and predictive power are substantially improved. This approach is particularly applicable to regions with similar geological variability and offers practical insights for sustainable groundwater management. Future enhancements could include the integration of high-resolution land-use and deformation data to better capture temporal changes in hydrogeological conditions.

Author Contributions

Conceptualization, H.Y. and W.G.; methodology, W.G.; software, W.G.; validation, W.G.; formal analysis, W.G.; investigation, W.G.; resources, Z.L.; data curation, Z.L.; writing—original draft preparation, W.G.; writing—review and editing, W.G. and H.Y.; visualization, W.G., X.B., R.M., and H.B.; supervision, H.Y.; project administration, H.Y.; funding acquisition, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China Joint Foundation Programme (U2244214), the Hebei Provincial Innovation Capacity Enhancement Programme for High-level Talent Team Building Special Project (225A4204D), the Science and Technology Basic Resources Survey (2022FY100104), the National Geological Survey Project (DD20230078), and the Chinese Academy of Geological Sciences, Geological Survey of China, Basic Research Operating Costs Project (SK202101).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available upon request from the corresponding author.

Acknowledgments

The authors would like to thank the editor and two anonymous reviewers for taking the time to provide their helpful feedback and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Study Area.

Figure 2. Comparison of SSA and wavelet transform (WT) noise reduction: (a) SSA; (b) WT.

Figure 3. Late Pleistocene geomorphological map of the study area (Atlas of Sustainable Groundwater Utilisation in the North China Plain).

Figure 4. Flowchart of the model.

Figure 5. Plots of fitted curves for representative single-well training sets (a,b) and test sets (c,d).

Figure 6. Historical precipitation data line graph (a) and predicted precipitation line graph (b).

Figure 7. Simulated June (a–d) and December (e–h) shallow groundwater table elevations in the study area for 2025, 2030, 2035, and 2040 (maintaining groundwater extraction reduction policies).

Figure 8. Simulated June (a–d) and December (e–h) shallow groundwater table elevations in the study area for 2025, 2030, 2035, and 2040 (maintaining the current intensity of extraction).

Figure 9. MSE and R² distributions of model fitting results for the refined and simple partitions: (a) MSE distribution, refined partition model; (b) R² distribution, refined partition model; (c) MSE distribution, simple partition model; (d) R² distribution, simple partition model.

Figure 10. MSE and R² distributions of model fitting results with and without static data inputs: (a) MSE distribution with static parameter inputs; (b) R² distribution with static parameter inputs; (c) MSE distribution without static parameter inputs; (d) R² distribution without static parameter inputs.

Table 1. Types of data, sources and scales.

Name of Data	Type of Data	Scale of Data	Source of Data
Elevation of Water Level	Dynamic data	Monthly	China Geological Survey
Precipitation	Dynamic data	Monthly	National Meteorological Science Data Centre
Evapotranspiration	Dynamic data	Monthly	National Meteorological Science Data Centre
Groundwater extraction	Dynamic data	Monthly	Water Resources Bulletin
Ecological recharge	Dynamic data	Monthly	Investigation and Evaluation on Groundwater Sustained Development in the North China Plain Project
Precipitation infiltration coefficient	Static data		Investigation and Evaluation on Groundwater Sustained Development in the North China Plain Project
Evaporation coefficient	Static data		Investigation and Evaluation on Groundwater Sustained Development in the North China Plain Project
Hydraulic conductivity	Static data		Investigation and Evaluation on Groundwater Sustained Development in the North China Plain Project
Specific yield	Static data		Investigation and Evaluation on Groundwater Sustained Development in the North China Plain Project
Elevation	Static data		GEBCO global land elevation data

Table 2. Parameterization of SSA and WT.

Name	Name of Parameters	Parameters Setting
SSA	Window Length	6
SSA	Number of Principal Components	4
WT	Wavelet function	bior4.4
WT	Level	2
WT	Thresholding mode	Soft

Table 3. Parameterization of Kriging.

Name of Parameters	Parameter Setting
Lag size	25 km
Major range	80 km
Partial sill	1.8 m
Nugget	0.2
Output cell size	1000 m
Search radius	Variable
Number of points	12
Maximum distance	50 km

Table 4. Fine zoning input information.

Name of Sample	Number of Monitoring Wells	Dynamic Features	Static Features
Piedmont alluvial-flood fan	33	Precipitation, Extraction, Ecological recharge	Hydraulic conductivity, Specific yield, Precipitation infiltration coefficient, Elevation
Piedmont lacustrine depression	23	Precipitation, Extraction, Ecological recharge	Hydraulic conductivity, Specific yield, Precipitation infiltration coefficient, Elevation
Piedmont inner terrace	12	Precipitation, Extraction, Ecological recharge	Hydraulic conductivity, Specific yield, Precipitation infiltration coefficient, Elevation
Central floodplain	43	Precipitation, Extraction, Ecological recharge, Evapotranspiration	Hydraulic conductivity, Specific yield, Precipitation infiltration coefficient, Evaporation coefficient, Elevation
Central paleochannel belt	35	Precipitation, Extraction, Ecological recharge, Evapotranspiration	Hydraulic conductivity, Specific yield, Precipitation infiltration coefficient, Ecological recharge, Evaporation coefficient, Elevation
Central lacustrine depression	11	Precipitation, Extraction, Ecological recharge, Evapotranspiration	Hydraulic conductivity, Specific yield, Precipitation infiltration coefficient, Ecological recharge, Evaporation coefficient, Elevation
Marine depositional plain	13	Precipitation, Extraction, Evapotranspiration	Hydraulic conductivity, Specific yield, Precipitation infiltration coefficient, Evaporation coefficient, Elevation

Table 5. LSTM model hyperparameter settings.

Name of Sample	Number of Hidden Layers	Number of Neurons	Batch Size	Number of Iterations
Piedmont alluvial-flood fan	2	100	64	500
Piedmont lacustrine depression	2	100	32	500
Piedmont inner terrace	2	50	12	500
Central floodplain	2	100	64	500
Central paleochannel belt	2	100	64	500
Central lacustrine depression	2	50	12	500
Marine depositional plain	2	50	12	500

Table 6. Partition modeling MSE and R² results.

Name of Sample	Number of Monitoring Wells	Dataset	MSE	R²
Piedmont alluvial-flood fan	33	Training set	0.07	0.98
Piedmont alluvial-flood fan	33	Test set	0.22	0.91
Piedmont lacustrine depression	23	Training set	0.04	0.99
Piedmont lacustrine depression	23	Test set	0.13	0.96
Piedmont inner terrace	12	Training set	0.08	0.99
Piedmont inner terrace	12	Test set	0.18	0.97
Central floodplain	43	Training set	0.05	0.98
Central floodplain	43	Test set	0.31	0.87
Central paleochannel belt	35	Training set	0.04	0.97
Central paleochannel belt	35	Test set	0.23	0.88
Central lacustrine depression	11	Training set	0.04	0.98
Central lacustrine depression	11	Test set	0.33	0.88
Marine depositional plain	13	Training set	0.01	0.99
Marine depositional plain	13	Test set	0.24	0.98

Table 7. R² results of K-fold cross-validation for each sub-region (R² scores for individual segments 1–5 and mean R²).

Name of Sample	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Mean
Piedmont alluvial-flood fan	0.757	0.917	0.964	0.911	0.982	0.906
Piedmont lacustrine depression	0.611	0.931	0.976	0.902	0.983	0.880
Piedmont inner terrace	0.662	0.966	0.989	0.923	0.984	0.905
Central floodplain	0.502	0.922	0.951	0.825	0.974	0.835
Central paleochannel belt	0.490	0.952	0.945	0.921	0.969	0.855
Central lacustrine depression	0.776	0.871	0.914	0.799	0.917	0.855
Marine depositional plain	0.685	0.939	0.992	0.872	0.966	0.891

Table 8. Average increases in groundwater level elevation in the study area over the forecast period, for June (low-water period) and December, under two extraction intensities.

Name of Sample	Scenario	Total Increase of Groundwater Level Elevation (m) (2020.6–2040.6)	Total Increase of Groundwater Level Elevation (m) (2020.12–2040.12)
Piedmont alluvial-flood fan	Maintaining groundwater extraction reduction policies	9.46	9.35
Piedmont alluvial-flood fan	Maintaining the current intensity of extraction	6.29	5.15
Piedmont lacustrine depression	Maintaining groundwater extraction reduction policies	4.8	3.6
Piedmont lacustrine depression	Maintaining the current intensity of extraction	3.38	2.87
Piedmont inner terrace	Maintaining groundwater extraction reduction policies	3.91	2.52
Piedmont inner terrace	Maintaining the current intensity of extraction	2.19	1.32
Central floodplain	Maintaining groundwater extraction reduction policies	7.14	6.92
Central floodplain	Maintaining the current intensity of extraction	2.65	2.57
Central paleochannel belt	Maintaining groundwater extraction reduction policies	7.2	4.59
Central paleochannel belt	Maintaining the current intensity of extraction	3.74	3.45
Central lacustrine depression	Maintaining groundwater extraction reduction policies	4.94	4.59
Central lacustrine depression	Maintaining the current intensity of extraction	2.64	1.4
Marine depositional plain	Maintaining groundwater extraction reduction policies	0.63	1.45
Marine depositional plain	Maintaining the current intensity of extraction	0.56	1.04

Table 9. Average MSE and R² for each zone in the simple partitioning model.

Name of Sample	Number of Monitoring Wells	Dataset	MSE	R²
Piedmont plain	68	Training set	0.37	0.90
Piedmont plain	68	Test set	0.68	0.60
Central plain	86	Training set	0.55	0.73
Central plain	86	Test set	0.74	0.68
Coastal plain	13	Training set	0.01	0.99
Coastal plain	13	Test set	0.24	0.98

Table 10. Average MSE and R² for each zone in the refined model without static data.

Name of Sample	Number of Monitoring Wells	Dataset	MSE	R²
Piedmont alluvial-flood fan	33	Training set	0.07	0.98
Piedmont alluvial-flood fan	33	Test set	0.22	0.91
Piedmont lacustrine depression	23	Training set	0.04	0.99
Piedmont lacustrine depression	23	Test set	0.13	0.96
Piedmont inner terrace	12	Training set	0.08	0.99
Piedmont inner terrace	12	Test set	0.18	0.97
Central floodplain	43	Training set	0.05	0.98
Central floodplain	43	Test set	0.31	0.87
Central paleochannel belt	35	Training set	0.04	0.97
Central paleochannel belt	35	Test set	0.23	0.88
Central lacustrine depression	11	Training set	0.04	0.98
Central lacustrine depression	11	Test set	0.33	0.88
Marine depositional plain	13	Training set	0.01	0.99
Marine depositional plain	13	Test set	0.24	0.98

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
