A New Method for Estimating Groundwater Changes Based on Optimized Deep Learning Models—A Case Study of Baiquan Spring Domain in China

Abstract: Estimating groundwater level (GWL) changes is crucial for the sustainable management of water resources in the face of urbanization and population growth. Existing prediction methods for GWL variations are limited by their inability to account for diverse and irregular patterns of change. This paper introduces an approach to GWL prediction that leverages multisource data and offers a comprehensive analysis of influencing factors. Our methodology goes beyond conventional approaches by incorporating historical GWL data, examining the impacts of precipitation and extraction, and considering policy-driven influences, especially in nations like China. The main contribution of this study is the development of a novel hierarchical framework (HGP) for GWL prediction, which progressively integrates correlations among different hierarchical information sources. In our experimental analysis, we find that extraction has a more substantial impact on GWL changes than precipitation. Building on this insight, our HGP model demonstrates superior predictive performance when evaluated on real-world datasets. The results show that HGP can increase NSE and $R^2$ scores by 2.8% during the test period compared with ANFIS, the more accurate of the current deep learning methods. The model not only enhances GWL prediction accuracy but also provides valuable insight for effective water resource management. By incorporating multisource data and a novel hierarchical framework, our approach advances the state of the art in GWL prediction, contributing to more sustainable and informed decision making in groundwater resource management.


Introduction
Groundwater is an essential natural resource widely used in society. Effectively managing this water resource is critical to ensure an adequate and stable supply for future production and consumption. Analyzing groundwater levels (GWLs) and estimating changes in GWLs are essential to sustainable groundwater management. Weather, groundwater extraction, and land use influence GWLs; accurate data availability, funding, policy structure, and application are critical to groundwater management. An accurate and stable monitoring system is required for adequate groundwater storage, establishing long-term and short-term storage plans, and optimizing infrastructure operations. Optimizing groundwater distribution and supply can mitigate and prevent environmental problems, such as droughts, floods, famines, and landslides [1,2].
Over the years, scholars have proposed many GWL prediction techniques to assist in managing groundwater resources. However, the complexity and dynamics of groundwater flow make accurate and comprehensive simulations challenging [3]. The long time span and wide spatial extent of groundwater data also complicate the choice of the best method for analyzing them.
In recent years, physically based, or traditional, GWL prediction models have been used frequently. Sahoo and Jha studied multiple linear regression (MLR) to predict groundwater levels in nonpressurized aquifer systems [4]. Their research showed that the MLR model developed for predicting GWLs had reasonable accuracy and could be used as a simple GWL modeling tool when data are limited. However, the limitation of this physically based method is that MLR cannot handle input and output variables. In a recent study, Yousefi predicted the GWL in Iran's Karaj for a decade using MATLAB [5]. Their method used MODFLOW2005-NWT, an independent program that improves the solution flow of unconfined groundwater. The GWL modeling focused on three scenarios: positive, negative, and sustained. Regarding this method, Yadav pointed out that predicting the GWL using this approach is complex because of the large number of physical operations in the groundwater system that need to be described [6].
Hydrology, geology, topography, meteorology, and climate contribute to data uncertainty, complicating the calibration and validation of physically based models [7]. The nonlinear relationships among variables in groundwater and other hydrological systems require a large amount of data for modeling, making groundwater level prediction challenging [8]. Many researchers have recently adopted machine learning techniques to overcome the limitations of physical models, which are increasingly important because they can independently adapt to new data and learn from previous computations to make reliable and even accurate predictions [9,10].
In the domain of groundwater level (GWL) prediction, machine learning advancements have progressively tackled the complexities inherent in hydrogeological systems, yet they often grapple with the nuanced interplay of exceptional events, such as droughts, and the critical role of anthropogenic factors [11,12]. Artificial neural networks (ANNs) marked a significant shift from traditional methods, adeptly handling complex scenarios but occasionally constrained by overfitting and computational intensity [13]. Subsequent developments, such as feed-forward neural networks (FFNNs) and recurrent neural networks (RNNs), offered improvements in accuracy and time series management, although they too faced specific limitations such as gradient vanishing [14,15]. Innovations like long short-term memory (LSTM) and gated recurrent unit (GRU) models addressed some of these issues, showing enhanced stability and accuracy in predictions [16–20]. Concurrently, support vector machines (SVMs) and adaptive neuro-fuzzy inference systems (ANFIS) enriched the predictive landscape, especially in handling nonlinear and multivariable scenarios [21]. The recent advent of nonlinear autoregressive networks with exogenous inputs (NARX) further exemplifies the field's evolution, particularly in challenging environments like urbanized and arid aquifers [22]. Despite these technological strides, a holistic approach that encompasses both prediction accuracy and a comprehensive understanding of the underlying physical processes, including the impact of human activities, remains a vital consideration in sustainable groundwater management [23–30].
This study presents an approach to groundwater level (GWL) prediction that integrates data from the micro-, meso-, and macrolevels, with the aim of improving accuracy and gaining a holistic understanding of GWL dynamics. This multitiered strategy, drawing on lessons from prior research [31], harmonizes data across different scales to decode the complexities of groundwater behavior.
At the microlevel, historical GWL records from monitoring wells provide insight into local fluctuations and temporal autocorrelation, offering detailed but narrowly scoped data. The mesolevel extends this analysis by including meteorological and groundwater extraction data from the Baiquan spring domain, shedding light on broader influences such as precipitation's effect on recharge and the impact of extraction practices on GWL. The macrolevel further broadens the perspective by integrating government policies using binary indicators to assess the effects of water resource management and groundwater utilization on GWL.
This data integration sets the stage for the main objective: the development and validation of a hierarchical groundwater level prediction (HGP) model that integrates multisource data across micro-, meso-, and macrolevels to enhance the accuracy of GWL predictions. This model aims to overcome the limitations of traditional methods by providing a more comprehensive understanding of the factors influencing GWL, including environmental variables, human activities, and policy impacts. Through this approach, we seek to advance the field of groundwater management by offering a nuanced and effective tool for predicting GWL.

Dataset Description
The multisource dataset was constructed by collecting and integrating various data streams, including precipitation measurements obtained from satellite observations, historical groundwater level records from well stations, and data on groundwater extraction rates from local water authorities. Additionally, we incorporated policy-related variables, such as regulatory measures, conservation policies, and groundwater management initiatives, to capture the influence of governance on groundwater dynamics (pertinent water management policies were obtained from governmental repositories). Before interpreting the multisource data, we briefly introduce the hydrogeological conditions of the study area, which allows us to explain the impact of multisource data on groundwater levels (GWLs) from a hydrogeological perspective. The Baiquan karst water system is located in the plain area of the eastern foot of the Taihang Mountains in the west of Xingtai and Handan [32]. It is an independent watershed in which the water supply is a vital function. Figure 1 shows an overview of the study area. The system covers an area of 3843 km², with a significant difference in terrain height. The Baiquan karst water system is a complete drainage type, mainly recharged by atmospheric precipitation and supplied to Xingtai city through underground runoff.
Water 2023, 15, x FOR PEER REVIEW 3 of 21

The study area, primarily situated in the eastern foothills of the southern Taihang Mountains in Hebei Province, China, features a topography of low mountains and hills with elevations ranging from 40 to 1200 m. Influenced by river and valley flood activities, a series of alluvial-proluvial fans of varying sizes have formed along the mountain front.
The stratigraphy of the region spans from the Archean to the Cenozoic era, encompassing a diverse range of geological formations.
The aquifer systems in this spring area are categorized into three major types: porous water-bearing rock systems in unconsolidated rocks, water-bearing systems in carbonate rock fractures and karst, and those in bedrock fractures. Karst water in the area is further classified based on burial conditions into exposed, covered, and buried types. The development and distribution of primary karst features, such as solution pores, fissures, and caves, are influenced by lithology, structural geology, and topography, as well as hydrological and hydrodynamic conditions, with rock type and structure being key factors.
Hydrogeologically, the area is bounded by the Inner Hill-Xingtai Arcuate major fault and the Xingtai-Fengfeng fault in the east, forming a water-blocking boundary. The southern boundary's western segment is demarcated by the groundwater divide of the North Ming River (with the Fengfeng Heilongdong spring area of Handan to the southwest), while its eastern segment is defined by coal strata and igneous rock formations. The western boundary aligns with the surface water divide of the Taihang Mountains, and the northern boundary is marked by the groundwater divide in the area of the Inner Hill Northwest Ridge (adjacent to the Shigu spring area of Lincheng, Xingtai, China). These boundaries delineate a largely independent and closed hydrogeological unit, predominantly characterized by karst water.
The system has the advantages of fast recharge, short cycle time, and excellent water quality, but changes in rainfall and large-scale extraction and drainage can have specific effects on the flow of the spring group. The causes of disconnection are groundwater overdraft, defective planning and construction of water supply sources, increased mining drainage, and reduced groundwater recharge. Therefore, an intelligent prediction mechanism is needed to manage groundwater resources.
To ensure data consistency and compatibility, we conducted rigorous preprocessing procedures, including data cleansing, normalization, and temporal alignment. The integrated dataset facilitated a unified framework for conducting in-depth analyses.
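The temporal alignment and normalization steps mentioned above can be illustrated with a minimal sketch. The text does not specify which normalization was used, so min-max scaling is an assumption here, and the function names (`align_daily`, `min_max`) and the toy records are hypothetical.

```python
from datetime import date, timedelta

def align_daily(series_by_name, start, end):
    """Keep only the dates on which every source has a value."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    common = [d for d in days if all(d in s for s in series_by_name.values())]
    aligned = {name: [s[d] for d in common] for name, s in series_by_name.items()}
    return common, aligned

def min_max(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical toy records (date -> value); not the study's data.
d0 = date(2018, 1, 1)
gwl = {d0 + timedelta(days=i): 50.0 + i for i in range(5)}
rain = {d0 + timedelta(days=i): float(i % 2) for i in range(1, 5)}  # first day missing

dates, aligned = align_daily({"gwl": gwl, "rain": rain}, d0, d0 + timedelta(days=4))
scaled = {name: min_max(vals) for name, vals in aligned.items()}
```

Dropping dates that are missing in any source is the simplest alignment rule; interpolation would be an alternative when gaps are short.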

Empirical Observation
Groundwater level prediction is a complex task that requires a comprehensive understanding of the diverse factors influencing groundwater dynamics. To overcome the limitations of traditional single-source data-driven methods, we present a novel approach that leverages multisource data to forecast groundwater levels with enhanced accuracy and scientific rigor. The dataset comprises crucial components, including precipitation data, historical groundwater level records, groundwater extraction, and pertinent policy variables impacting groundwater management. Integrating this diverse dataset enables a more holistic analysis of groundwater behavior and its response to various environmental and anthropogenic influences. The associated observational findings and insight are presented in this section.

1. Microlevel: Historical GWL Observation
In Figure 2 (left), we analyzed historical groundwater levels from seven observation wells, focusing on well #7 in urban Xingtai city. The groundwater levels in the first half of 2018 decreased, followed by a gradual rise from July 2018 to a peak in February 2019. A continuous decline until June 2020 marked the lowest levels, followed by a gradual rise stabilizing in October 2020. A sharp increase occurred in July 2021, followed by stability in November 2021. These trends align closely with data from the Xingtai City Water Resources Bureau, suggesting groundwater levels as a reliable indicator of groundwater resources.
Figure 2 (right) shows the strong autocorrelation in the daily observation data from 2018 to 2022, with the autocorrelation coefficients exceeding 0.77 within a 30-day lag period. This suggests significant short-term memory effects on groundwater levels, influenced by factors like groundwater flow rates, aquifer properties, and external drivers.
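For readers who want to reproduce this kind of check, the following is a minimal sketch of the sample autocorrelation over lags 0 to 30. The sinusoidal series is a hypothetical stand-in for daily GWL records, not the study's data, and `autocorr` is an illustrative helper.

```python
import math

def autocorr(x, lag):
    """Sample autocorrelation of sequence x at a non-negative integer lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

# Hypothetical slowly varying series standing in for four years of daily GWL.
levels = [math.sin(2 * math.pi * t / 365) for t in range(4 * 365)]
acf = [autocorr(levels, k) for k in range(31)]
```

A slowly varying series like this one keeps its autocorrelation high across the whole 30-day window, which is the pattern the text describes for the observed levels.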
While short-term autocorrelation is evident, relying solely on historical groundwater data may not provide precise predictions due to complex interactions involving meteorological patterns, hydrogeological properties, and human activities. To improve predictive models, integrating diverse data sources, including precipitation and groundwater extraction rates, is essential. Precipitation affects aquifer recharge, while groundwater extraction adds an anthropogenic dimension. This comprehensive approach enhances our understanding of groundwater dynamics and supports informed water resource management.

2. Mesolevel: Precipitation and Extraction Observation
Figure 3 highlights seasonal variations in precipitation, with a rainy season from July to November and a dry season from December to June. Peak rainfall, increasing yearly, was particularly pronounced in July 2021 due to unprecedented heavy rainfall. This July 2021 rainfall significantly contributed to a rapid rise in groundwater levels.
Granger causality testing is a statistical method used to assess whether there exists a causal relationship between two time series datasets. This approach relies on the concept of lagged values and employs a vector autoregression (VAR) model [33]. In the context of Granger causality testing, consider two time series: X and Y. The VAR model takes the following general form for each time series:

$$X_t = \sum_{i=1}^{p} \alpha_{1,i} X_{t-i} + \sum_{i=1}^{p} \beta_{1,i} Y_{t-i} + \varepsilon_{1,t}$$
$$Y_t = \sum_{i=1}^{p} \alpha_{2,i} Y_{t-i} + \sum_{i=1}^{p} \beta_{2,i} X_{t-i} + \varepsilon_{2,t}$$

where $X_t$ and $Y_t$ represent the observations of time series X and Y at time $t$; $p$ is the chosen number of lags; $\alpha$ and $\beta$ are the coefficients in the model; and $\varepsilon$ represents the white noise error terms. Granger causality testing involves formulating null and alternative hypotheses. The null hypothesis (H0) assumes that time series X does not Granger cause time series Y, while the alternative hypothesis (H1) posits that time series X Granger causes time series Y, indicating at least one nonzero $\beta$ coefficient. The statistical test uses the F-statistic to examine the null hypothesis, with the resulting p-value indicating the probability of observing a test statistic at least as extreme under the null hypothesis. If the p-value is less than a predetermined significance level (typically 0.05), we reject the null hypothesis, suggesting that time series X does indeed Granger cause time series Y.
Past observations confirm that increased precipitation leads to higher groundwater levels, as rainwater replenishes the underground aquifer. Notably, groundwater levels also respond to shorter-term fluctuations related to precipitation, even during an overall declining trend.
To explore the causal link between precipitation and groundwater levels, we conducted a Granger causality test. The results show a strong causal effect, with precipitation at a 2-day lag significantly impacting current groundwater levels.
However, it is important to consider that while statistically significant, the magnitude of this effect may be relatively small, and other factors like aquifer characteristics and human activities could also influence groundwater levels.
In Figure 4, we observe a general decline in groundwater extraction, attributed to efforts to combat excessive extraction in Hebei Province. Notably, groundwater extraction and groundwater levels show an inverse relationship. When extraction decreased, levels began to recover, indicating a correlation.
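The F-test behind a Granger causality check can be sketched as follows, assuming the usual comparison of a restricted regression (lags of the target only) with an unrestricted one (lags of both series). This is an illustrative stdlib-only implementation on synthetic data, not the authors' code; `ols_rss` and `granger_f` are hypothetical helpers, and the p-value step is omitted because it needs an F-distribution routine.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))

def granger_f(x, y, p):
    """F-statistic for H0: lags 1..p of x do not help predict y."""
    n = len(y)
    targets = y[p:]
    restricted = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(p, n)]
    unrestricted = [row + [x[t - i] for i in range(1, p + 1)]
                    for row, t in zip(restricted, range(p, n))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Synthetic example: "level" responds to "rain" with a 2-step lag.
rng = random.Random(0)
n = 300
rain = [rng.gauss(0.0, 1.0) for _ in range(n)]
level = [0.0, 0.0]
for t in range(2, n):
    level.append(0.5 * level[t - 1] + 0.8 * rain[t - 2] + rng.gauss(0.0, 0.1))

f_rain_to_level = granger_f(rain, level, p=2)
f_level_to_rain = granger_f(level, rain, p=2)
```

In this synthetic setup the statistic for rain driving level is far larger than the reverse direction, which is the asymmetry a Granger test is designed to detect.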
The correlation analysis reveals a modest negative correlation of −0.272 between the extraction and groundwater levels, suggesting that as the extraction increases, the levels tend to decrease. The Granger causality tests further confirmed a significant causal effect, with extraction impacting levels at a 1-day lag.
It is important to note that while precipitation also impacts groundwater levels with a 2-day lag, these two factors alone cannot fully explain the fluctuations. For example, during the low precipitation and high extraction from August 2018 to February 2019, the groundwater levels continued to rise, challenging simple predictions based on precipitation and extraction alone.

3. Macrolevel: Policy Observation
Figure 5 shows that the implementation of water management policies has had a tangible impact on groundwater levels, with a noticeable rising trend and consistently high levels after implementation. This observation highlights the importance of incorporating policy effects into the analysis and prediction of groundwater levels, alongside other key factors like precipitation and groundwater extraction. The grey background in the figure denotes the range of policy influence, marking the periods corresponding with the rising groundwater levels and reduced extraction rates. Notably, significant policy interventions, such as the groundwater replenishment pilot project initiated by the Provincial Water Resources Department in Xingtai city in September 2018 and their subsequent conference in May 2021 emphasizing the reduction of groundwater extraction, have been instrumental in shaping these trends. This correlation between policy initiatives and groundwater levels, particularly the sustained rise following measures to curb extraction, affirms the necessity of integrating policy considerations into groundwater management strategies.
On the basis of these observations, we conclude that a comprehensive analysis based on historical groundwater levels, precipitation, groundwater extraction, and policy measures yields valuable insight into the variations of groundwater levels. This integrated view motivates and guides the prediction of groundwater levels by taking into account the interrelationships among these four factors across the three levels.
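The binary policy indicators used at the macrolevel could be encoded along the following lines. This is only a sketch: the start months follow the interventions named in the text, but the exact days and the end dates of each window are assumptions made for illustration, and `POLICY_WINDOWS` and `policy_indicator` are hypothetical names.

```python
from datetime import date, timedelta

# Hypothetical policy windows (start, end). Start months follow the text;
# exact days and end dates are assumed for illustration only.
POLICY_WINDOWS = [
    (date(2018, 9, 1), date(2019, 2, 28)),
    (date(2021, 5, 1), date(2021, 11, 30)),
]

def policy_indicator(day, windows=POLICY_WINDOWS):
    """Return 1 if `day` falls inside any policy window, else 0."""
    return int(any(start <= day <= end for start, end in windows))

# Daily indicator series around the start of the first window.
days = [date(2018, 8, 30) + timedelta(days=i) for i in range(5)]
x_p = [policy_indicator(d) for d in days]
```

The resulting 0/1 series is aligned day by day with the other inputs, so it can be fed to a model in the same way as precipitation or extraction.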

Model
In this section, we synthesize the insight obtained from the empirical data, as elaborated in Section 2.1.2. This synthesis forms the basis of a novel and comprehensive framework called "hierarchical groundwater level prediction" (HGP). The framework is designed to decipher the multifaceted patterns of variation in groundwater levels, drawing on the analysis of human behavior and data tiers presented in Section 2.1.2.
At the microlevel, our focus narrows to historical groundwater level (GWL) data, denoted as $X_g$. This layer of the model captures the temporal patterns of GWL fluctuations in the historical data. The microlevel is the foundation of our model, providing a granular view of groundwater dynamics over time.
At the mesolevel, we integrate two datasets: precipitation $X_m$ and groundwater extraction $X_e$. These elements serve as critical indicators of the interplay between meteorological conditions and anthropogenic influences on GWL. Together, $X_m$ and $X_e$ inform the model about the external factors that directly or indirectly affect groundwater levels, acknowledging the significant role of environmental and human activities in shaping GWL trends.
The macrolevel of our model, represented by $X_p$, captures the overarching impact of government policies. This dimension extends beyond the immediate physical influences on GWL, offering insight into how policy decisions and regulatory frameworks contribute to the broader groundwater environment. Including $X_p$ reflects the far-reaching implications of policy interventions on groundwater dynamics and the need to incorporate these broader, often indirect, factors into our predictive model.
Our research demonstrates that data from the macro- and mesolevels significantly impact microlevel behavior. To achieve a holistic understanding of groundwater level (GWL) patterns and the interplay of diverse influences, we developed a hierarchical framework. This structured framework extracts nuanced features from different data sources and integrates them progressively.
Traditionally, concatenating data at each time point and subjecting them to a single latent representation is a basic approach in modeling multisource sequences, like multivariate recurrent neural networks (MRNNs). However, this method inflates feature dimensions and may overlook critical interrelationships among data from various hierarchical levels.
Our approach differs from this by processing well data separately at each hierarchical level. Individual factors are processed in dedicated recursive layers, where their latent representations interact and fuse, culminating in the multivariate fusion of data. This approach avoids the limitations of simple concatenation, ensuring the extraction of unique features from each source before the information is fused for predicting future GWL changes.
In Figure 6, we present the structured architecture of the hierarchical groundwater level (GWL) prediction (HGP) framework. The HGP framework is characterized by its multilevel approach to feature extraction and hierarchical fusion, enabling the comprehensive analysis of data from diverse sources across three distinct levels.
At Level 1, the primary focus is on extracting temporal patterns from various observational sequences independently. These sequences include historical GWL data ($h_g$), precipitation ($h_m$), extraction ($h_e$), and policy impact factors ($h_p$), each with its unique temporal dynamics. This level is dedicated to isolating and then integrating these temporal patterns, creating pairwise fusions. The objective here is to accurately model the impacts of macro- and mesolevel factors on historical GWL data, ensuring each factor is appropriately represented in the overall analysis.
Moving to Level 2, the approach shifts from independent temporal pattern analysis to exploring the interplay among these factors. The temporal patterns, initially combined at Level 1, are now integrated to form more cohesive units. This level focuses on the interactions between GWL and other factors: GWL-precipitation ($h_{gm}$), GWL-extraction ($h_{ge}$), and GWL-policy impact factors ($h_{gp}$). The aim is to capture the combined effects of mesolevel factors on GWL, ensuring a comprehensive representation of these influences in relation to the historical data.
Level 3 represents the culmination of the framework, in which the overarching temporal patterns that span multiple data sources are fully integrated. This final level, designated as $h_{gmep}$, embodies the complete hierarchical fusion process. It synthesizes the insight gathered from the previous levels, offering a detailed and holistic understanding of GWL behavior. This comprehensive fusion of data allows for an accurate prediction of future GWL trends.
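The three levels can be read as a dataflow. The following is a minimal sketch of that wiring only, under stated assumptions: `encode` (a running mean) and `fuse` (per-step concatenation) are hypothetical placeholders standing in for the recurrent and fusion layers, and the way the three Level 2 outputs are combined at Level 3 is likewise assumed for illustration.

```python
def encode(seq):
    """Placeholder encoder (assumption): running mean as a stand-in for a recurrent layer."""
    out, total = [], 0.0
    for i, x in enumerate(seq, start=1):
        total += x
        out.append([total / i])  # one feature per time step
    return out


def fuse(a, b):
    """Placeholder fusion (assumption): concatenate feature vectors at each time step."""
    return [u + v for u, v in zip(a, b)]


def hgp_levels(gwl, precip, extraction, policy):
    # Level 1: extract temporal features from each source independently.
    h_g, h_m, h_e, h_p = (encode(s) for s in (gwl, precip, extraction, policy))
    # Level 2: pair the GWL features with each influencing factor.
    h_gm, h_ge, h_gp = fuse(h_g, h_m), fuse(h_g, h_e), fuse(h_g, h_p)
    # Level 3: merge all pairwise representations into h_gmep.
    h_gmep = fuse(fuse(h_gm, h_ge), h_gp)
    return h_gmep


features = hgp_levels(
    gwl=[1.0, 2.0, 3.0],
    precip=[0.0, 4.0, 2.0],
    extraction=[5.0, 5.0, 5.0],
    policy=[0.0, 0.0, 1.0],
)
print(features[0])
```

Each time step of `features` carries six values here (GWL paired with each of the three factors), which makes the progressive widening of the representation visible.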
Each level of the HGP framework is designed to progressively build upon the previous one, ensuring a thorough and detailed analysis of GWL. The framework utilizes sequential input, $I_t^k$, and another source, $\tilde{I}_t^k$, from the k-th level for feature extraction and fusion, following a systematic and rigorous methodology. This structured approach allows the HGP to provide a nuanced and comprehensive understanding of GWL dynamics.
In this context, $h_t^k$ represents the latent representation of $I_t^k$ in the k-th layer at time point t. It is updated by the function $F_{recurrent}$ based on its previous memory $h_{t-1}^k$ and the current input $I_t^k$; the tilde symbol (∼) carries the same meaning for the sequences from another source. $W_*$ is a trainable weight matrix, and the purpose of the function $F_{fuse}$ is to merge the information from $\tilde{I}_t^k$ into the current sequence $I_t^k$. This is followed by an activation function, $F_{act}$, to produce an intermediate representation. Next, we provide detailed insight into the model inference and learning for GWL prediction.
Here, we implement HGP using a neural network and learn its parameters by minimizing specific losses. To capture temporal patterns, we can employ LSTM for the recurrent layer, $F_{recurrent}$. In previous research [34], LSTM has shown better performance compared to GRU, linear RNN, and average pooling for capturing time patterns. Regarding the fusion function, $F_{fuse}$, we introduced a novel hierarchical fusion mechanism to effectively integrate information from different sources. More details are discussed in the next paragraph.
In our hierarchical fusion mechanism, we developed a multitiered approach to analyze the interplay of data across different hierarchical levels. This framework is based on the premise that interactions among data sequences from various levels reveal complex temporal relationships. Our analysis in Section 2.1.2 has shown that factors such as extraction and precipitation have a time-lagged impact on groundwater levels (GWLs). This finding suggests the need for a fusion approach that not only combines current data but also considers historical data to capture these evolving dynamics.
To model these temporal interactions, we employed a specific formula, illustrated in Figure 7. In this schematic, red arrows represent the computational steps that update the state of our model, incorporating new information at each timestep. The blue arrows trace the propagation of the hidden state $h_t$, maintaining the temporal continuity essential for our analysis. In this approach, the current latent representation, denoted as $h_t$, is integrated with another latent representation, $\tilde{h}_t$, derived from a different data source. This method allows us to effectively combine information from various points in time, providing a more comprehensive understanding of the factors influencing GWL. The orange and blue nodes, marked as $h_t$, illustrate the iterative nature of the state across time, crucial for capturing dynamic changes in GWL. By integrating these diverse data sequences, the model aims to offer a more accurate and dynamic representation of groundwater level behavior, considering both present conditions and historical influences.
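$$F_{fuse}(h_t, \tilde{h}_t) = \left(h_t \odot W_{hh}\tilde{h}_{t-1}\right) \oplus \left(h_t \odot W_{hh}\tilde{h}_t\right)$$
In this context, ⊕ represents a concatenation operator, and ⊙ is a set operator. For the t-th time step, $h_t \odot W_{hh}\tilde{h}_{t-1}$ and $h_t \odot W_{hh}\tilde{h}_t$ capture the influence of $\tilde{h}_{t-1}$ and $\tilde{h}_t$ on $h_t$, respectively.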
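A minimal sketch of one fusion step follows, assuming that ⊙ is read as an element-wise product, that $W_{hh}$ is a square matrix applied to the other source's state, and that ⊕ joins the two resulting vectors end to end. The function names and the toy numbers are hypothetical and serve only to make the shape of the computation concrete.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def hadamard(a, b):
    """Element-wise product of two equal-length vectors (assumed reading of the ⊙ operator)."""
    return [x * y for x, y in zip(a, b)]


def fuse(h_t, h_other_prev, h_other_t, W_hh):
    """Combine the current state with the lagged and current states of another source."""
    lagged = hadamard(h_t, matvec(W_hh, h_other_prev))   # influence of the other source at t-1
    current = hadamard(h_t, matvec(W_hh, h_other_t))     # influence of the other source at t
    return lagged + current                              # concatenation (the ⊕ operator)


W_hh = [[1.0, 0.0], [0.0, 2.0]]   # toy weights; trainable in a real model
h_t = [0.5, -1.0]                 # e.g., GWL state at time t
h_other_prev = [2.0, 1.0]         # e.g., extraction state at time t-1
h_other_t = [4.0, 3.0]            # e.g., extraction state at time t

fused = fuse(h_t, h_other_prev, h_other_t, W_hh)
print(fused)  # [1.0, -2.0, 2.0, -6.0]
```

The fused vector is twice as long as $h_t$ because the lagged and current interactions are kept side by side rather than summed, which is how the time-lagged effect stays visible to later layers.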
To predict future groundwater levels (GWLs), our model outputs an expected value. This is achieved by applying a linear transformation, depending on the nature of the data, to the combined features $X_g$, $X_m$, $X_e$, and $X_p$. The model is optimized using the Adam optimizer, a popular choice for its efficiency in handling large datasets and variable parameters. The objective function is the mean square error (MSE), which suits regression tasks because it directly measures the model's prediction error for continuous variables like the GWL.
To achieve optimal model performance, we conducted a rigorous hyperparameter fine-tuning procedure. The learning rate was set at 0.001, a commonly used value for stable convergence in many models. The batch size was set to 128, balancing the computational load and the model's ability to generalize from the training data. Additionally, the model was trained for 1000 epochs to ensure thorough learning from the data without overfitting. This combination of learning rate, batch size, and epochs is crucial for the model's ability to accurately predict future GWLs, striking a balance between complexity, computational efficiency, and prediction accuracy [35,36].
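The training setup can be illustrated with a small, self-contained sketch. It is not the HGP network: it assumes a single linear output (one weight and one bias) fitted to synthetic data, and it implements the standard Adam update by hand with the reported learning rate, batch size, and epoch count. The Adam constants `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumptions here.

```python
import math


def mse(w, b, xs, ys):
    """Mean square error of the linear prediction w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.001, batch_size=128, epochs=1000,
          beta1=0.9, beta2=0.999, eps=1e-8):
    params = [0.0, 0.0]            # [w, b]
    m = [0.0, 0.0]                 # first-moment estimates
    v = [0.0, 0.0]                 # second-moment estimates
    step = 0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            n = len(bx)
            err = [params[0] * x + params[1] - y for x, y in zip(bx, by)]
            # Gradients of the batch MSE with respect to w and b.
            grads = [2.0 * sum(e * x for e, x in zip(err, bx)) / n,
                     2.0 * sum(err) / n]
            step += 1
            for i, g in enumerate(grads):
                m[i] = beta1 * m[i] + (1 - beta1) * g
                v[i] = beta2 * v[i] + (1 - beta2) * g * g
                m_hat = m[i] / (1 - beta1 ** step)   # bias correction
                v_hat = v[i] / (1 - beta2 ** step)
                params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Synthetic data (hypothetical): y = 2x + 1 on 256 evenly spaced points.
xs = [i / 128 for i in range(256)]
ys = [2.0 * x + 1.0 for x in xs]

loss_before = mse(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
loss_after = mse(w, b, xs, ys)
print(loss_before, loss_after)
```

With a learning rate of 0.001, each Adam step moves a parameter by roughly that amount, so the number of epochs largely determines how far the parameters can travel from their initial values.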

Model Evaluation
In this study, the performance of the HGP models was evaluated using the Nash-Sutcliffe efficiency (NSE), normalized root mean square error (NRMSE), mean absolute error (MAE), and coefficient of determination (R 2 ) [37], which were calculated as follows.
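As an assumption, the standard definitions of these four metrics, written with the measured values $R_t$, the predicted values $P_t$, their means $\bar{R}$ and $\bar{P}$, and the prediction range $P_{t,\max} - P_{t,\min}$, are:

```latex
\begin{align}
\mathrm{NSE} &= 1 - \frac{\sum_{t=1}^{n} (R_t - P_t)^2}{\sum_{t=1}^{n} (R_t - \bar{R})^2} \\
\mathrm{NRMSE} &= \frac{1}{P_{t,\max} - P_{t,\min}} \sqrt{\frac{1}{n} \sum_{t=1}^{n} (R_t - P_t)^2} \\
\mathrm{MAE} &= \frac{1}{n} \sum_{t=1}^{n} \left| R_t - P_t \right| \\
R^2 &= \left[ \frac{\sum_{t=1}^{n} (R_t - \bar{R})(P_t - \bar{P})}
      {\sqrt{\sum_{t=1}^{n} (R_t - \bar{R})^2}\, \sqrt{\sum_{t=1}^{n} (P_t - \bar{P})^2}} \right]^2
\end{align}
```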
where n is the total number of data points, $R_t$ and $P_t$ are the measured and predicted values of GWL for the GRU and LSTM models, $\bar{R}$ and $\bar{P}$ are the average GWL measurements and prediction values, and $P_{t,\max}$ and $P_{t,\min}$ are the maximum and minimum values of the GWL predictions, respectively.
In the model performance evaluation, the coefficient of determination (R 2 ), root mean square error (RMSE), mean absolute error (MAE), and Nash-Sutcliffe efficiency (NSE) are the key metrics. R 2 values nearing 1 indicate high predictive accuracy and alignment with the observed data. The RMSE, ideally close to 0, measures the standard deviation of prediction errors, quantifying the typical magnitude of errors in the model's predictions. The MAE, also best when near 0, provides a direct average of error magnitudes, useful for uniformly distributed errors [38]. Finally, the NSE, which ranges from −∞ to 1, assesses the model's predictive skill relative to the mean of the observed data: values closer to 1 signify better performance, a value of 0 matches a prediction that always returns the observed mean, and values below 0 indicate poorer performance than that simple mean prediction. These metrics collectively offer a comprehensive evaluation of a model's accuracy and reliability.
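The behavior of these metrics can be checked on a small example. The sketch below implements the standard definitions in plain Python. The sample values are hypothetical, and computing R 2 as the squared Pearson correlation is an assumption about the form used.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    obs_mean = mean(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, pred):
    return math.sqrt(mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    return mean([abs(o - p) for o, p in zip(obs, pred)])


def r2(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


obs = [1.0, 2.0, 3.0, 4.0]          # hypothetical measured GWL values
good = [1.1, 1.9, 3.2, 3.8]         # close predictions
mean_only = [2.5, 2.5, 2.5, 2.5]    # always predicts the observed mean
poor = [10.0, 10.0, 10.0, 10.0]     # far from every observation

print(nse(obs, good), rmse(obs, good), mae(obs, good), r2(obs, good))
print(nse(obs, mean_only))  # 0.0
print(nse(obs, poor))       # -45.0
```

The last line shows why the lower bound of the NSE is not −1: a sufficiently poor prediction drives it arbitrarily far below zero.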

Results
For groundwater level (GWL) prediction, we used three well-known conventional machine learning methods as a baseline: random forest, XGBoost, and support vector machines (SVMs). Random forest is an ensemble method that integrates multiple decision trees to form robust predictive models; it is effective in time series forecasting due to its ability to discern complex patterns while avoiding overfitting [39]. XGBoost, or extreme gradient boosting, employs an iterative boosting approach that enhances precision in predictions on temporal data [40]. SVM, a staple in supervised learning, handles time series forecasting through optimized hyperplane classification and is able to uncover patterns in sequential data [41]. These established methods offer the versatility and robustness needed for complex forecasting tasks like GWL prediction and therefore form a solid basis for comparison.
To evaluate the capabilities of the hierarchical groundwater level prediction (HGP) framework, we also compared it with a suite of deep learning benchmarks: artificial neural networks (ANNs), feed-forward neural networks (FFNNs), long short-term memory (LSTM) networks, gated recurrent unit (GRU) networks, the adaptive neuro-fuzzy inference system (ANFIS), and nonlinear autoregressive networks with exogenous inputs (NARX). This allowed us to assess the performance of HGP in the context of well-established deep learning methodologies, thereby validating its efficacy in groundwater level prediction.
For the evaluation of this framework, we adopted the Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R 2 ). These metrics are fundamental in hydrological modeling, providing a robust quantitative assessment of a model's accuracy and predictive capabilities. The NSE offers insight into the predictive skill of a model relative to the mean observed data, while the RMSE and MAE measure the average model prediction error. The R 2 quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables, offering a gauge of the model's explanatory power. Together, these metrics form a comprehensive framework for evaluating the HGP model's effectiveness in groundwater level prediction.
We compared the experimental results of the HGP model with the traditional machine learning baselines, as shown in Table 1. The results in Table 1 show that the models differ in their effectiveness for groundwater level (GWL) prediction. Random forest showed a robust training performance (Train NSE: 0.9891, R 2 : 0.9890), but its effectiveness was reduced in the testing phase (Test NSE: 0.9160, R 2 : 0.9137). XGBoost, on the other hand, demonstrated improved accuracy and generalization capabilities, with a Train NSE of 0.9977 and a Test NSE of 0.9573. Similarly, SVM displayed high training accuracy (Train NSE: 0.9987, R 2 : 0.9987) but a slight decline in testing performance (Test NSE: 0.9560, R 2 : 0.9557).
The hierarchical groundwater level prediction (HGP) model, however, outperformed these traditional methods, particularly in the testing phase, which is critical for assessing a model's generalization ability. HGP achieved the highest NSE (0.9933) and R 2 (0.9933) in the test set, coupled with the lowest RMSE (0.0259) and MAE (0.3758), signaling its superior predictive accuracy and reliability. This performance underscores the HGP framework's ability to capture the complexities inherent in GWL data, leading to more precise predictions.
We compared the experimental results of the HGP model with the deep learning baselines, as shown in Table 2.
The results from Table 2 compare the deep learning models for groundwater level (GWL) prediction and highlight the superiority of the hierarchical groundwater level prediction (HGP) model. The ANN, FFNN, LSTM, GRU, ANFIS, and NARX models showed commendable performances, particularly in the training phase, with high NSE and R 2 values close to 1, indicating strong predictive accuracy. However, in the testing phase, these models exhibited varying degrees of decreased effectiveness, as evidenced by the lower NSE and R 2 values compared to their training performance. This decrease is particularly noticeable in the RMSE and MAE values, which increased in the testing phase, indicating reduced accuracy in real-world scenarios. In contrast, the HGP model demonstrated exceptional performance, not only maintaining high NSE (0.9994) and R 2 (0.9933) values in the testing phase but also achieving the lowest RMSE (0.0259) and MAE (0.3758). This indicates that the HGP model not only fits the training data well but also generalizes more effectively to new data. The HGP model's superior performance is attributed to its multitiered approach, which captures the complexities inherent in GWL data more effectively than traditional models.
In summary, while traditional models, like random forest, XGBoost, and SVM, and deep learning models, like ANN, FFNN, LSTM, GRU, ANFIS, and NARX, have shown proficiency in GWL prediction, the HGP model outperforms these methods, particularly in terms of generalization capabilities, as evidenced by its testing phase metrics, which are shown in Figure 8. Its ability to accurately predict GWL under varying conditions demonstrates its potential as a robust tool for groundwater level forecasting and highlights its advanced capacity to handle complex hydrological data.
In the Taylor diagram (Figure 9), each model's performance is denoted by a point, where the angular position represents the correlation coefficient between the model and observed data, the radial distance from the origin indicates the standard deviation, and the distance from the observed data point reflects the centered RMSE. A model that perfectly predicts the observed data would lie on the point marked "Observation".
The Taylor diagram analysis reveals that the hierarchical groundwater level prediction (HGP) model exhibits an exceptional performance in predicting groundwater levels (GWLs). Positioned in close proximity to the "Observation" point, which indicates a perfect prediction with no error, the HGP model demonstrates both a high correlation with the observed data and a notably low root mean square error (RMSE). This proximity suggests that the HGP model accurately captures not only the pattern of the observed data but also its variance, outperforming other models such as GRU, NARX, and ANFIS, which are positioned further from the "Observation" point and, hence, indicate less accuracy. The ANFIS model also shows a close correlation with the observed data, yet it is the HGP model that stands out for its superior predictive capabilities. These findings underscore the HGP model's robustness and its significant potential in hydrological modeling applications.
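The geometry of a Taylor diagram follows from a law-of-cosines identity between the two standard deviations, the correlation coefficient, and the centered root mean square difference. A minimal sketch with hypothetical sample values computes the plotted quantities and checks that identity. The function name and data are assumptions for illustration.

```python
import math


def taylor_stats(obs, pred):
    """Return (std_obs, std_pred, correlation, centered RMS difference)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    dev_o = [o - mean_o for o in obs]
    dev_p = [p - mean_p for p in pred]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    # Centered RMS difference: the means are removed before comparing.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return std_o, std_p, corr, crmsd


obs = [10.2, 10.8, 11.5, 12.1, 11.7, 11.0]    # hypothetical observed GWL
pred = [10.0, 10.9, 11.3, 12.4, 11.9, 10.8]   # hypothetical model output

std_o, std_p, corr, crmsd = taylor_stats(obs, pred)
angle = math.acos(corr)   # polar angle of the model point
# Law of cosines linking the plotted quantities.
rhs = std_o ** 2 + std_p ** 2 - 2.0 * std_o * std_p * corr
print(std_p, math.degrees(angle), crmsd)
```

A model point is drawn at radius `std_p` and angle `angle`, and its straight-line distance to the observation point at radius `std_o` on the horizontal axis equals `crmsd`.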
The results summarized in Table 3 provide a detailed evaluation of the hierarchical groundwater level prediction (HGP) model's performance when individual data components are removed. The comparison illustrates the impact of specific inputs on the model's predictive accuracy for groundwater levels (GWLs).
model's complexity also poses challenges for computational efficiency and interpretability, highlighting areas for future refinement. In summary, this paper presents significant contributions through a meticulous exploration of GWL variations using a multisource dataset and the innovative HGP model. This research underscores the importance of integrating diverse environmental, hydrological, and policy-related factors, marking a significant advancement in groundwater management. We emphasize the innovative use of policy as a predictive variable within the multisource dataset, enhancing the model's accuracy and providing a comprehensive understanding of GWL dynamics, aligned with the rigorous standards of "nature". Future efforts will aim to further refine the model, exploring real-time data incorporation and advanced machine learning techniques to improve responsiveness and address current limitations, ultimately optimizing the model's utility in groundwater management.

Conclusions
In conclusion, this study developed the hierarchical groundwater level prediction (HGP) model, which demonstrates the value of integrating multisource data for the accurate prediction of groundwater levels (GWLs). Our analysis has detailed the impact of precipitation, extraction volumes, and policy changes on groundwater level fluctuations within the region. In rigorous evaluations, the HGP model outperformed established machine learning benchmarks such as GRU, NARX, and ANFIS, as evidenced by its superior performance metrics in the testing phase: a high Nash-Sutcliffe efficiency (NSE = 0.9933) and coefficient of determination (R 2 = 0.9933), and a notably lower root mean square error (RMSE = 0.0259) and mean absolute error (MAE = 0.3758).
The innovative inclusion of policy variables, alongside traditional hydrological data, has allowed the HGP model to capture the subtleties of GWL dynamics more accurately. This approach has provided a nuanced understanding of how governance, alongside environmental factors, contributes to the fluctuating nature of groundwater levels. The HGP model's ability to process and analyze these diverse data streams sets a new benchmark for hydrological modeling.
The insight derived from this research not only enhances our scientific understanding of hydrological systems but also equips decision makers with a robust predictive tool for effective groundwater management. By demonstrating methodological sophistication, this study contributes to the advancement of environmental modeling and underscores the critical role of comprehensive data analysis in water resource management.

Figure 1. Study area. The study area, primarily situated in the eastern foothills of the southern Taihang Mountains in Hebei Province, China, features a topography of low mountains and hills

Figure 3 highlights seasonal variations in precipitation, with a rainy season from July to November and a dry season from December to June. Peak rainfall, increasing yearly, was particularly pronounced in July 2021 due to unprecedented heavy rainfall. This July 2021 rainfall significantly contributed to a rapid rise in groundwater levels.

Figure 2 (right) shows the strong autocorrelation in the daily observation data from 2018 to 2022, with the autocorrelation coefficients exceeding 0.77 within a 30-day lag period. This suggests significant short-term memory effects on groundwater levels, influenced by factors like groundwater flow rates, aquifer properties, and external drivers. While short-term autocorrelation is evident, relying solely on historical groundwater data may not provide precise predictions due to complex interactions involving meteorological patterns, hydrogeological properties, and human activities. To improve predictive models, integrating diverse data sources, including precipitation and groundwater extraction rates, is essential. Precipitation affects aquifer recharge, while groundwater extraction adds an anthropogenic dimension. This comprehensive approach enhances our understanding of groundwater dynamics and supports informed water resource management.
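The lagged autocorrelation coefficients referred to here can be computed as in the following minimal sketch. It uses the standard sample autocorrelation estimator and a synthetic annual sine wave as a hypothetical stand-in for the daily GWL record, so the printed values do not reproduce the 0.77 figure.

```python
import math


def autocorr(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Hypothetical daily series: two years of a smooth annual cycle.
levels = [math.sin(2.0 * math.pi * t / 365.0) for t in range(730)]

# Coefficients for lags of 0 to 30 days.
acf = [autocorr(levels, k) for k in range(31)]
print(min(acf))  # smallest coefficient within the 30-day window
```

A slowly varying series keeps its coefficients high across the whole window, which is the short-term memory effect described above.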


In Figure 4, we observe a general decline in groundwater extraction, attributed to efforts to combat excessive extraction in Hebei Province. Notably, groundwater extraction and groundwater levels show an inverse relationship. When extraction decreased, levels began to recover, indicating a correlation.
effects into the analysis and prediction of groundwater levels, alongside other key factors like precipitation and groundwater extraction. The grey background in the figure denotes the range of policy influence, marking the periods corresponding with the rising groundwater levels and reduced extraction rates. Notably, significant policy interventions, such as the groundwater replenishment pilot project, initiated by the Provincial Water Resources Department in Xingtai city in September 2018, and their subsequent conference in May 2021, emphasizing the reduction of groundwater extraction, have been instrumental in shaping these trends. These policy measures, marked by key dates and actions in the figure, correspond with the periods of rising groundwater levels and reduced extraction rates, underscoring the profound influence that policy decisions exert on groundwater dynamics. This correlation between policy initiatives and groundwater levels, particularly the sustained rise following measures to curb extraction, affirms the necessity of integrating policy considerations into groundwater management strategies.



Figure 7. The architecture of recurrent and fusion layers on multisource sequences.

Figure 8. Comparative performance analysis of the different models.

Figure 9. Taylor's diagram for the different models.


Table 1. Statistical results of the different ML models during the training and testing periods.

Table 2. Statistical results of the different DL models during the training and testing periods.

Table 3. Effect of multisource information.
