Application of the Data-Driven Method and Hydrochemistry Analysis to Predict Groundwater Level Change Induced by the Mining Activities: A Case Study of Yili Coalfield in Xinjiang, Northwest China

As a medium of geological information, groundwater provides an indirect means of addressing the secondary disasters of mining activities. Identifying the groundwater regime of overburden aquifers disturbed by mining is important for mining safety and geological environment protection. This study proposes a novel data-driven approach that combines machine learning methods with hydrochemical analyses to accurately predict anomalous changes in groundwater levels induced by mining activities within the mine and its neighboring areas. The hydrochemistry analysis reveals that the dissolution of carbonate and evaporite and cation exchange are the main hydrochemical processes controlling the groundwater environment. The anomalous change in the hydrochemical characteristics of different aquifers reveals that the hydraulic connection between aquifers is enhanced by mining activities. Continuous wavelet coherence is used to reveal the nonlinear relationship between groundwater level change and external influencing factors. Based on the above analysis, the groundwater level, precipitation, mine water inflow, and unit goal area are taken as the input variables of the hydrological model. Two data-driven algorithms, the Decision Tree and the Long Short-Term Memory (LSTM) neural network, are used to construct the hydrological prediction model. Four error metrics (MAPE, RMSE, NSE and R²) are applied to evaluate the performance of the hydrological model. In terms of NSE, the predictive accuracy of the model constructed using LSTM is 8% higher than that of the Decision Tree algorithm. Accurately predicting the anomalous change in groundwater level caused by mining activities could help ensure the safety of coal mining and prevent secondary disasters.


Introduction
During the past decades, mining activities have caused several geo-environmental problems [1,2], including water loss, the decline of groundwater level, and the deterioration of groundwater quality. Mining-induced secondary disasters [3], geological disasters [4][5][6] and land subsidence [7,8] threaten the safety of mining activities. Groundwater plays an important role during mining because it can amplify the microdynamics of the geological setting caused by mining activities. In addition, mine water inrush induced by mining threatens mine safety [9,10]. Thus, monitoring and predicting the dynamic change in groundwater level is an effective way to prevent the secondary disasters of mining activities.
Naturally, groundwater level is influenced by several external factors, including geological factors [11][12][13], meteorological factors, and anthropogenic factors. These factors cause highly nonlinear dynamics of groundwater level in the time and frequency domains, making it difficult to predict groundwater levels accurately. Traditionally, physically based models have provided a feasible way to predict the variation in groundwater level, but they require stratigraphic structures, aquifer parameters, and boundary conditions. Due to the heterogeneity and anisotropy of aquifer properties, it is difficult to obtain these hydrogeological parameters accurately [14,15]. Nowadays, data-driven methods provide an efficient way to build hydrological models and improve the accuracy of model prediction [16,17]. The significant advantage of data-driven algorithms is that there is no need to explicitly define physical relationships between input and output variables. In previous studies, several machine learning methods have been used in hydrological research; they can be broadly divided into cluster analysis and data prediction [18,19]. The self-organizing neural network is one of the typical algorithms of cluster analysis and can be used to identify the source of groundwater pollution [20][21][22]. The Artificial Neural Network (ANN), Recurrent Neural Network (RNN), Decision Tree and Long Short-Term Memory (LSTM) are used to analyze and predict the variation in groundwater level [23][24][25][26]. Generally, these machine learning algorithms have a three-layer structure consisting of an input layer, a hidden layer, and an output layer. However, the composition of the three-layer structure differs greatly between algorithms. In the ANN, both the hidden and output layers have a nonlinear activation function, which helps to improve the prediction accuracy. The disadvantage of this algorithm is that it cannot take into account the relationship between different input variables. In the RNN, the adjacent neurons in the hidden layer are connected together, which is the biggest difference from the ANN structure. It is helpful to prevent gradient disappearance and gradient explosion. Unlike the previous two methods, the Decision Tree and LSTM algorithms take into account the relationships between different input variables during model training, while extracting valuable information from previous neurons and passing it to the current neuron for prediction. The Decision Tree has been applied to the identification of radon anomalies before seismic activities [27]. The LSTM has been used to predict the variation in groundwater level under the influence of meteorological factors. However, few studies focus on applying machine learning methods to predict anomalous changes in groundwater levels caused by mining activities.
In mine area hydrology, most previous work has focused on the anomalous change in the hydrochemical components and aquifer parameters caused by mining activities, while few studies have concentrated on the application of different data-driven methods to predict the dynamic change in groundwater level. The long-term mining activities in the Yili Coalfield have had a significant impact on the regional groundwater regime. Both the deterioration of the groundwater quality and the decline of the groundwater level threaten the local ecological environment and production safety. In this study, a hydrochemistry analysis and two different data-driven methods are used to predict the variation in the groundwater level induced by mining activities. Based on the monitoring dataset in the Yili Coalfield, the aims are to (1) reveal the genetic mechanism of the hydrochemistry characteristics, using a comparison of hydrochemical data before and after mining activities to identify the effect of mining on the hydraulic connection between different aquifers; (2) identify the potential influencing factors, using continuous wavelet coherence to identify the external factors behind the anomalous change in groundwater level induced by mining activities and then selecting the appropriate input variables for the data-driven prediction model; (3) construct the data-driven models, using the Decision Tree and Long Short-Term Memory neural network algorithms; and (4) evaluate the model performance, using four different error metrics: R², root mean square error (RMSE), mean absolute percentage error (MAPE), and Nash-Sutcliffe efficiency (NSE). The result of this study could provide a new data-driven method to predict groundwater level change caused by mining activities, which could also provide a basis for coal mining safety and water resource management.
Water 2024, 16, 1611

Regional Hydrogeological Setting
The Yili 1# mining area, covering about 208 km², is located in the northwestern part of the Xinjiang Uygur Autonomous Region, northwestern China. It is the first modernized coalfield in Xinjiang, with a production capacity of 10 million t/a. The elevation of the study area ranges from 1000 m to 1300 m [28]. The main extracted coal seams are No. 3# and No. 5#. Longwall mining extracts large rectangular panels of coal, and the roof is temporarily supported using moveable hydraulic supports (Figure 1). The study area has a typical arid climate with low relative humidity and high evapotranspiration. The annual evapotranspiration ranges from 1259 mm to 2381 mm, the average annual temperature is 8 °C, and the annual average precipitation is 90 mm [29,30]. Tectonically, the study area is characterized by a monoclinic structure with an east-west orientation. According to the mineral XRD analysis results, the mineral species include dolomite, calcite, halite, gypsum, albite, quartz and anorthite [28]. The aquifers are classified into the Quaternary pore aquifer (phreatic aquifer) and the Jurassic fissure aquifer (confined aquifer). The aquitards are composed of mudstone and siltstone, which are interbedded with the aquifers in different thicknesses. The isotopic results indicate that both the Quaternary pore aquifer and the Jurassic fissure aquifer are recharged by atmospheric precipitation and meltwater from the Tianshan Mountain. The Quaternary aquifer is characterized by good water yield, with a specific capacity of 0.041 to 0.128 L/s/m, while the Jurassic fissure aquifer is characterized by poor water yield, with a specific capacity of 0.008 to 0.109 L/s/m [28].

Data Collection

Time Series Data
To ensure mining safety and reveal the impact of mining activities on the groundwater regime, the phreatic groundwater level (PGL), confined groundwater level (CGL), mine water inflow (MWI), and unit goal area (UGA) were monitored in the Yili 1# mining area from 2018 to 2023. The meteorological dataset was collected from the China Meteorological Administration (http://data.cma.cn/, accessed on 15 May 2024). Analyzing the dynamic change in the monitored values provides an accessible way to identify the impact of mining activities on the aquifer and its potential influencing factors.
In order to eliminate the effect of different monitoring intervals, the monitored values of the groundwater level, precipitation, mine water inflow, and unit goal area are converted to monthly averages for analysis. As shown in Figure 2, the precipitation shows a significant seasonal variation, with higher levels during the summer wet season and lower levels during the drier winter season. The maximum value of precipitation is 45 mm. The mine water inflow (MWI) increases with the unit goal area (UGA), indicating that the increase in the extent of mining promoted mine water inrush. The maximum values of the mine water inflow and the unit goal area are 311.5 m³/h and 54,300 m², respectively. There is no obvious change in the phreatic groundwater level associated with mining activities; it only shows a step-like change in 2021. In contrast, the confined groundwater level shows a significant decline from 2018 to 2020, associated with increases in MWI and UGA, which indicates that it has been affected by the mining activities.


Hydrochemistry Data
A total of 59 groundwater samples were collected from the study area. Of these, 26 samples were collected during 2004-2006 (before mining activities), including 6 samples from the phreatic aquifer and 20 samples from the confined aquifer. To identify the effect of mining activities on the dynamic variation in groundwater, 33 samples were collected from 2021 to 2023 (after mining activities), including 5 phreatic groundwater samples and 28 confined groundwater samples. All groundwater samples were analyzed in the Key Laboratory of Prevention and Control Technology for Coal Mine Water Hazard, Shaanxi Province. The cations (Na⁺, K⁺, Ca²⁺, Mg²⁺) were measured using ion chromatography. SO₄²⁻ and Cl⁻ were also measured using ion chromatography, while HCO₃⁻ was measured using traditional titration with HCl. To ensure the accuracy and reliability of the hydrochemical analysis results, the charge balance error (CBE) was calculated [8]:

$$\mathrm{CBE} = \frac{\sum \mathrm{cations} - \sum \mathrm{anions}}{\sum \mathrm{cations} + \sum \mathrm{anions}} \times 100$$

The results show that the CBE is within ±10%, indicating that all measurement results are acceptable.
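The charge balance check above can be sketched in a few lines. This is an illustrative sketch only, not the laboratory's procedure: it assumes concentrations are reported in mg/L and converted to meq/L with standard equivalent weights, and the sample values are hypothetical.

```python
# Equivalent weights (g per equivalent) = molar mass / |charge|.
# Assumption: concentrations are given in mg/L and must be converted to meq/L.
EQ_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Ca": 40.08 / 2, "Mg": 24.305 / 2,
    "Cl": 35.45, "SO4": 96.06 / 2, "HCO3": 61.02,
}
CATIONS = ("Na", "K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3")


def charge_balance_error(sample_mg_l):
    """Return the CBE in percent for a dict of ion concentrations in mg/L."""
    meq = {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return (cations - anions) / (cations + anions) * 100.0


# Hypothetical sample, not taken from the paper's dataset.
sample = {"Na": 46.0, "K": 3.9, "Ca": 60.0, "Mg": 12.0,
          "Cl": 35.5, "SO4": 96.0, "HCO3": 244.0}
cbe = charge_balance_error(sample)
accepted = abs(cbe) <= 10.0  # acceptance threshold stated in the text
print(f"CBE = {cbe:.1f}%, accepted = {accepted}")
```

A negative CBE indicates an excess of anions; samples outside the ±10% window would normally be re-analyzed or discarded.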

Hydrochemistry Analysis
Groundwater samples collected from the phreatic and confined aquifers before and after the mining activities underwent a hydrochemistry analysis. The results are summarized in Table 1. Before the mining activities, 6 groundwater samples were collected from the phreatic aquifer. Their pH values range from 7.70 to 7.90, with an average of 7.83. The TDS values are 206-648 mg/L, with an average of 349 mg/L, so the water is considered fresh (TDS < 1000 mg/L). The order of cation abundance is Ca²⁺ > Na⁺ + K⁺ > Mg²⁺ and that of the anions is HCO 3 . The hydrochemistry type is characterized by HCO₃-Ca and HCO₃-Na. The 20 groundwater samples from the confined aquifer have pH and TDS ranges of 7.50-8.00 and 400-3132 mg/L, respectively. Of these samples, 80% are categorized as fresh water (TDS < 1000 mg/L) and 20% as brackish water (1000 mg/L < TDS < 3000 mg/L). The hydrochemistry type is characterized by HCO₃-Ca/Na and SO₄-Ca/Na.
After the mining activities, 5 samples and 28 samples were collected from the phreatic and confined aquifers, respectively. The orders of cation and anion abundance in both aquifers differ from those before the mining activities. For the phreatic aquifer, the pH value is 7.30-8.10, with an average of 7.82. The TDS value ranges from 614 mg/L to 1719 mg/L, with an average of 1264 mg/L, which is more than three times the value before the mining activities. The order of cation abundance is Na⁺ + K⁺ > Ca²⁺ > Mg²⁺ and that of the anions is SO₄²⁻ > HCO₃⁻ + CO₃²⁻ > Cl⁻. The hydrochemistry type is SO₄-Na and SO₄-Ca. For the confined aquifer, the pH value ranges from 6.70 to 8.20, with an average of 7.45. The TDS ranges from 309 mg/L to 1019 mg/L. The order of cation abundance is Ca²⁺ > Na⁺ + K⁺ > Mg²⁺ and that of the anions is SO

Time Series Analysis (Continuous Wavelet Coherence)
The continuous wavelet transform is an efficient mathematical tool for identifying the correlation between different hydrological time series in the time and frequency domains [31,32]. In this study, it is applied to analyze the correlation between external influencing factors and groundwater level change. The Morlet function is selected to conduct the continuous wavelet coherence [33,34]. The mathematical equation is defined as follows:

$$R_n^2(s) = \frac{\left|S\left(s^{-1} W_n^{XY}(s)\right)\right|^2}{S\left(s^{-1}\left|W_n^{X}(s)\right|^2\right) \cdot S\left(s^{-1}\left|W_n^{Y}(s)\right|^2\right)} \quad (1)$$

where W and S represent the continuous wavelet transform and the smoothing operator, respectively; $W^{X}$ and $W^{Y}$ are the transforms of the two time series and $W^{XY}$ is their cross-wavelet transform; s is the wavelet scale and n is the time index. R represents the correlation coefficient. The closer the R value is to 1, the higher the correlation.

Machine Learning

Long Short-Term Memory Neural Network
The Long Short-Term Memory (LSTM) algorithm is composed of three layers: an input layer, a hidden layer and an output layer (Figure 3). The hidden layer is composed of three gates: the input gate, the output gate, and the forget gate (Figure 3). These gates protect the valuable information in the dataset as it is passed along during information transfer and address the problem of gradient explosion and gradient disappearance. In this study, the LSTM model is programmed using MATLAB.
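To make the role of the three gates concrete, the following is a minimal sketch of one forward step of a standard LSTM cell in plain Python. It is not the MATLAB model used in the study; the parameter layout, the helper `zero_params`, and the choice of four inputs and two hidden units are assumptions made only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """Return W @ x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell."""
    z = x + h_prev  # concatenate current input and previous hidden state
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]    # forget gate
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]    # input gate
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(params["Wg"], z, params["bg"])]  # candidate state
    # The cell state keeps part of the old memory and adds gated new information.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero weights, so every gate outputs 0.5."""
    width = n_in + n_hidden
    params = {}
    for gate in ("f", "i", "o", "g"):
        params["W" + gate] = [[0.0] * width for _ in range(n_hidden)]
        params["b" + gate] = [0.0] * n_hidden
    return params


# Four normalized inputs (e.g. PGL, P, MWI, UGA) and two hidden units (assumed sizes).
params = zero_params(4, 2)
h, c = lstm_step([0.2, 0.5, 0.1, 0.9], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero weights the forget gate is 0.5, so the cell state is simply halved. In a trained network the gate values depend on the inputs, which is how the cell decides what to retain from earlier time steps.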



Decision Trees
Classical regression fits curves to build a linear relationship between the independent variables and the dependent variable, but it cannot accurately capture relationships between highly nonlinear variables. Decision Trees address this problem by partitioning the space of the independent variables into axis-parallel rectangles, which provides an efficient way to build a nonlinear relationship. In this study, the WEKA (3.8.6) software is used to build the Decision Tree model with the M5 algorithm. Missing values in the dataset are filled by linear interpolation. Before training the hydrological model, the M5 algorithm conducts pre-pruning. The construction of the Decision Tree stops only when the class variance of all examples in a node is sufficiently small. The model in the selected leaf then predicts the value of the class.
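The axis-parallel partitioning described above can be illustrated with a small sketch that searches for the best threshold on one input variable. It uses the standard deviation reduction criterion usually associated with M5 model trees; this is an assumption for illustration and not a reproduction of the WEKA implementation, and the data are hypothetical.

```python
import statistics


def sd_reduction(parent, left, right):
    """Standard deviation reduction achieved by splitting parent into left/right."""
    n = len(parent)
    return (statistics.pstdev(parent)
            - len(left) / n * statistics.pstdev(left)
            - len(right) / n * statistics.pstdev(right))


def best_split(x, y):
    """Return (threshold, gain) of the best axis-parallel split on one variable."""
    pairs = sorted(zip(x, y))
    targets = [t for _, t in pairs]
    best_threshold, best_gain = None, 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical x values
        gain = sd_reduction(targets, targets[:k], targets[k:])
        if gain > best_gain:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_gain = gain
    return best_threshold, best_gain


# Hypothetical data: the target jumps when the input exceeds roughly 6.5.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
threshold, gain = best_split(x, y)
```

Repeating this search recursively on each side of the threshold, and stopping when the spread of the target values in a node is small enough, yields the tree structure. M5 additionally fits a linear model in each leaf, which this sketch omits.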

Model Development

Splitting the Dataset into Different Subsets
Splitting the dataset is the basic step in constructing the data-driven model. If the training dataset takes up too small a proportion, the data-driven model cannot identify the mathematical characteristics of the time series, which leads to poor prediction accuracy. In contrast, if the training dataset takes up too large a proportion, it may cause overfitting of the data-driven model. Since there is no fixed ratio between the training dataset and the prediction dataset, how to divide the dataset has always been a controversial problem. According to the results of previous studies, the proportion of the training dataset should be more than 50% of the whole dataset. In this study, the dataset is split into two subsets, a training subset and a prediction subset, in a proportion of 7:3. The training subset is used to optimize the data-driven model parameters.
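A 7:3 split of a time series can be sketched as below. The sketch assumes a chronological split (the earliest 70% for training), which is common for time series but is not stated explicitly in the text, and it assumes 72 monthly values for 2018 to 2023.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered sequence into training and prediction subsets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Assumed length: 72 monthly averages (6 years x 12 months); values are placeholders.
monthly_levels = list(range(72))
train, prediction = chronological_split(monthly_levels)
print(len(train), len(prediction))
```

Keeping the order intact means the model is always evaluated on months that come after everything it was trained on, which avoids leaking future information into training.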

Data Normalization and Error Metric
The min-max normalization method is applied to eliminate the influence of differing data dimensions by scaling the input variables into [0, 1]. It is defined as follows:

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (2)$$

where $x$ and $x_{norm}$ represent the monitored value and the normalized value, respectively, and $x_{max}$ and $x_{min}$ are the maximum and minimum monitored values, respectively. After the model is trained, the predictive results are transformed back through the inverse transformation of Equation (2). Four different error metrics are introduced to evaluate the predictive efficiency of the data-driven model: the coefficient of determination (R²), the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the Nash-Sutcliffe efficiency (NSE) [2,34,35]. R² measures the agreement between monitored values and simulated values [35].

The root mean square error (RMSE) [2,34,35]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - y_i^*\right)^2}$$

The mean absolute percentage error (MAPE) [2,34,35]:

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - y_i^*}{y_i}\right|$$

The Nash-Sutcliffe efficiency (NSE) [2,34,35]:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - y_i^*\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the observed value, $y_i^*$ and $\bar{y}$ represent the simulated value and the mean of the observed values, respectively, and N is the number of observations. RMSE evaluates the deviation between monitored values and simulated values. MAPE expresses the prediction error of the data-driven model as a percentage. NSE is applied to evaluate the accuracy of the hydrological model.
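The normalization and three of the error metrics can be sketched as follows. The function names are illustrative, the observed and simulated values are hypothetical, and R² is left out because its exact form is not recoverable from the text.

```python
import math


def minmax_transform(values, x_min, x_max):
    """Scale values into [0, 1] using the given minimum and maximum."""
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ")
    return [(v - x_min) / (x_max - x_min) for v in values]


def minmax_inverse(normalized, x_min, x_max):
    """Map normalized values back to the original units."""
    return [v * (x_max - x_min) + x_min for v in normalized]


def rmse(observed, simulated):
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def mape(observed, simulated):
    """Mean absolute percentage error; observed values must be non-zero."""
    n = len(observed)
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(observed, simulated)) / n


def nse(observed, simulated):
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


# Hypothetical observed and simulated groundwater levels.
obs = [10.0, 12.0, 14.0, 16.0]
sim = [11.0, 12.0, 13.0, 16.0]
lo, hi = min(obs), max(obs)
restored = minmax_inverse(minmax_transform(obs, lo, hi), lo, hi)
scores = {"RMSE": rmse(obs, sim), "MAPE": mape(obs, sim), "NSE": nse(obs, sim)}
```

In practice the minimum and maximum would usually be taken from the training subset only and reused for the prediction subset, so that no information from the prediction period enters the scaling.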

Input Variable Selection
Selecting proper input variables is the prerequisite for improving the predictive accuracy of the hydrological model, as they provide the basic hydrological information. However, no guideline exists on how to select the proper input variables for constructing a hydrological model using a machine learning algorithm. In this study, the hydrochemistry analysis and the continuous wavelet coherence method are combined to analyze the external factors influencing the anomalous change in the groundwater level caused by mining activities, which provides an effective method for determining the input variables of the data-driven model.


Hydrochemistry Change Induced by the Mining Activities
The ionic ratios and saturation indices are introduced to identify the genetic mechanism of the hydrochemical characteristics. As shown in Figure 4, the groundwater samples are distributed in the middle part of the Gibbs diagram, which indicates that water-rock interaction is the main hydrochemical process controlling the ionic concentrations. All groundwater samples are characterized by high Na⁺, Ca²⁺, HCO₃⁻ and SO₄²⁻, which indicates that these ions are primarily derived from the dissolution of evaporite (e.g., halite and gypsum) and carbonate (e.g., calcite and dolomite). The saturation indices (SI) of gypsum, halite, calcite and dolomite are calculated. The SI values of gypsum and halite are <0 (Figure 5a), indicating that gypsum and halite tend to dissolve, while the SI values of calcite and dolomite in most of the groundwater samples are >0 (Figure 5b), revealing that calcite and dolomite tend toward saturation. If halite is the sole source of Na⁺, the Na⁺/Cl⁻ value should be equal to 1 [8]. As shown in Figure 5c, the Na⁺/Cl⁻ value in all the groundwater samples is >1, and the value of Ca²⁺/SO₄²⁻ in some groundwater samples is <1 (Figure 5d), which indicates that another hydrochemical process (e.g., cation exchange) increases the Na⁺ concentration and decreases the Ca²⁺ concentration. The chloro-alkaline indices CAI 1 and CAI 2 and the ratio (Na⁺ + K⁺ − Cl⁻)/((Ca²⁺ + Mg²⁺) − (SO₄²⁻ + HCO₃⁻)) are introduced to evaluate cation exchange. The values of CAI 1 and CAI 2 in most groundwater samples are <0 (Figure 5e), and the values of (Na⁺ + K⁺ − Cl⁻)/((Ca²⁺ + Mg²⁺) − (SO₄²⁻ + HCO₃⁻)) in most groundwater samples are distributed along the 1:1 line (Figure 5f). Both results indicate that cation exchange regulates the concentrations of Na⁺ and Ca²⁺. The values of (Ca²⁺ + Mg²⁺)/(SO₄²⁻ + HCO₃⁻), Ca²⁺/HCO₃⁻ and Ca²⁺/SO₄²⁻ in most groundwater samples are located near the 1:1 line (Figure 5d,g,h), which indicates that the dissolution of carbonate and gypsum is the main source of Ca²⁺ and Mg²⁺ [5]. Some groundwater samples are located above the 1:1 line in Figure 5g, which is attributed to the decrease in Ca²⁺ and Mg²⁺ and the increase in Na⁺ and K⁺ under cation exchange.
As shown in Figure 6, the mining activities cause an anomalous change in the hydrochemical characteristics of the phreatic and confined aquifers. Before the mining activities, groundwater flow in the phreatic aquifer is fast, resulting in weak water-rock interaction and low mineralization. In contrast, the groundwater environment of the confined aquifer is relatively stable and the groundwater circulation is slow, which results in strong water-rock interaction and high mineralization. Previous studies have revealed that mining activities change the aquifer parameters and enhance the hydraulic connection between different aquifers. In this study, the hydrochemical characteristics of the phreatic and confined aquifers have changed due to the increased hydraulic connection between them. After the mining activities, the hydrochemistry type of the phreatic aquifer changes from HCO₃-Ca and HCO₃-Na to SO₄-Ca and SO₄-Na, with increasing TDS. This is attributed to the inflow of highly mineralized groundwater from the confined aquifer. In addition, the mining activities promote the mixing of low-mineralization and high-mineralization groundwater: the inflow of low-TDS groundwater from the phreatic aquifer decreases the mineralization of the high-TDS confined aquifer. After mining activities, a large number of water-conducting fissures are distributed in the overburden aquifer. In this case, the renewal time of the groundwater system is shortened, which promotes the active flushing of groundwater (i.e., groundwater cycling). The water-conducting fissures also bring more CO₂ into the groundwater, which promotes the dissolution of calcite and dolomite (Equations (7) and (8)) [8] and increases the mineralization.

$$\mathrm{CaCO_3 + CO_2 + H_2O \rightarrow Ca^{2+} + 2HCO_3^-} \quad (7)$$

$$\mathrm{CaMg(CO_3)_2 + 2CO_2 + 2H_2O \rightarrow Ca^{2+} + Mg^{2+} + 4HCO_3^-} \quad (8)$$
In summary, the hydrochemistry analysis reveals that the enhancement of the hydraulic connection between the phreatic aquifer and the confined aquifer induced by mining activities causes the anomalous change in the hydrochemical characteristics. Thus, the phreatic groundwater level can be considered as an input variable of the data-driven model for predicting the dynamic change in the groundwater level in the confined aquifer.


The External Influencing Factors
The wavelet coherence method is applied to analyze the relationship between the groundwater level and precipitation (P), mine water inflow (MWI), and the unit goal area (UGA). The analysis results for PGL and CGL are shown in Figures 7 and 8, respectively.
As shown in Figure 7, there is high coherence between the PGL and precipitation within the 128-day band in 2021~2022 and the 256-day band in 2019~2020. The coherence between the PGL and mine water inflow strengthens in the 64-day band during 2019~2020 and in the 128-day band in 2022. The PGL is highly coherent with the unit goal area within the 256-day band in 2020~2022. As shown in Figure 8, the CGL and precipitation show coherence within the 64~128-day band from 2019 to 2021. The CGL and mine water inflow are highly coherent within the 128-day band during 2019~2022. The coherence between the CGL and unit goal area strengthens in the 128-day and 256-day bands from 2020 to 2021.
To sum up, the dynamic change in the groundwater level is highly coherent with several external influencing factors in different frequency bands and time periods. Thus, P, MWI and UGA can be considered as input variables of the data-driven model for predicting the anomalous change in the groundwater level induced by mining activities.

Comparisons of Prediction Performance
Based on the different input variables, the PGL model and the CGL model are constructed to predict the groundwater level change caused by mining activities. Seventy percent of the dataset is used as the training subset, and the remaining 30% is used as the prediction subset. The results predicted using the Decision Tree and LSTM are shown in Figures 9 and 10, and the prediction errors are summarized in Table 2. Although both the Decision Tree and LSTM can predict the anomalous change in the groundwater level induced by mining activities, the error metrics show that the two algorithms perform differently. For the PGL model, the MAPE and RMSE of the LSTM algorithm are smaller than those of the Decision Tree algorithm, and the NSE value of LSTM is larger and closer to 1. In general, the closer the MAPE and RMSE values are to zero and the closer the R² and NSE values are to one, the better the predictive performance of the data-driven model. The error metrics therefore indicate that the PGL model built with the LSTM algorithm outperforms the one built with the Decision Tree algorithm. Similar conclusions can be drawn from the error results of the CGL models.
To further analyze the predictive results of the data-driven models, scatter plots of the simulated and monitored values in the training and prediction stages are shown in Figures 11 and 12. If a data-driven model predicts the anomalous change in the groundwater level caused by mining activities accurately, the points should fall along the line X = Y. In the PGL model, the R² value of the Decision Tree is 0.93 in the training stage and 0.88 in the predicting stage, and the corresponding values of the LSTM algorithm are 0.95 and 0.91. In the CGL model, the R² value of the training stage is 0.91 for the Decision Tree algorithm and 0.85 for the LSTM algorithm, and the values of the predicting stage are 0.96 for the Decision Tree algorithm and 0.90 for the LSTM algorithm. The results reveal that the predictive performance of the LSTM algorithm is better than that of the Decision Tree algorithm. Compared with the prediction results calculated using the Decision Tree algorithm, the prediction accuracy of the hydrological model constructed using the LSTM algorithm is improved by 6%. Thus, the hydrological model based on the LSTM algorithm can accurately predict the anomalous change in the groundwater level caused by mining activities.
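The 70/30 split and the four error metrics can be sketched as follows. This is a minimal illustration that assumes the split is chronological and that the metrics follow their standard textbook definitions, with R² taken as the squared Pearson correlation; the function names and the toy numbers are hypothetical and are not data from the study.

```python
import numpy as np


def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered series into training and prediction subsets."""
    n_train = int(round(len(series) * train_fraction))
    return series[:n_train], series[n_train:]


def mape(obs, sim):
    """Mean absolute percentage error (%); undefined if any observation is 0."""
    return 100.0 * np.mean(np.abs((obs - sim) / obs))


def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    return np.sqrt(np.mean((obs - sim) ** 2))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def r2(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    return np.corrcoef(obs, sim)[0, 1] ** 2


# Toy example: a simulation that is offset by a constant 1.0
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = obs + 1.0
print(rmse(obs, sim), nse(obs, sim), r2(obs, sim))
```

The toy case shows why several metrics are reported together: a constant bias leaves R² at 1 while lowering NSE and raising RMSE, so R² alone would overstate the quality of the fit.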
Unlike the Decision Tree algorithm, the LSTM structure consists of three layers, namely the input layer, the hidden layer and the output layer, where the hidden layer is composed of three gates: input gates, output gates, and forget gates. The composite structure of the hidden layer can effectively mitigate vanishing and exploding gradients during training. In addition, the LSTM algorithm has two significant advantages. One is that it extracts valuable information from the previous neurons and transfers it to the current neurons, which can significantly improve prediction accuracy. The other is that it improves the speed of computation through multithreaded parallel computing. Based on these characteristics, the LSTM algorithm is more accurate than the Decision Tree algorithm.
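The role of the three gates can be made concrete with a single forward step of a standard LSTM cell written in NumPy. This is a sketch of the textbook formulation, not the network configuration used in this study; the weight layout, the hidden size and the four-feature input (for example groundwater level, P, MWI and UGA) are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, D+H) and b has shape (4*H,), stacked in the
    (assumed) order: input gate, forget gate, candidate, output gate.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much cell state to expose
    c = f * c_prev + i * g       # additive update carries information forward
    h = o * np.tanh(c)
    return h, c


def run_sequence(xs, W, b, hidden_size):
    """Feed a (T, D) sequence through the cell and return the final states."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
    return h, c


# Illustrative run: 30 time steps, 4 input features, 8 hidden units
rng = np.random.default_rng(0)
D, H = 4, 8
W = 0.1 * rng.standard_normal((4 * H, D + H))
b = np.zeros(4 * H)
h_last, c_last = run_sequence(rng.standard_normal((30, D)), W, b, H)
print(h_last.shape)
```

The additive form of the cell-state update, `c = f * c_prev + i * g`, is what lets information from earlier time steps pass to later ones with little attenuation when the forget gate stays close to one.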
Yan et al. [36] used the LSTM algorithm to build a hydrological model for predicting the anomalous changes in groundwater levels and hydrochemical components caused by seismic activities. The hydrological model was trained on the dynamic changes in the groundwater level and the hydrochemical components during periods without earthquakes, and was then used to predict the groundwater level before earthquakes and thereby identify pre-seismic anomalies. The MAPE, RMSE, NSE and R² were also used to evaluate the prediction performance. The results show that the hydrological model based on the LSTM algorithm had the best performance and could effectively identify the pre-seismic anomalies. This is consistent with the conclusions obtained here. Applying the LSTM algorithm to construct hydrological models can improve the forecast accuracy.

Conclusions
In this study, the hydrochemistry analysis and time series analysis are used to identify the effect of mining activities on the groundwater regime. The novel data-driven algorithm

Figure 1. The geographic location and hydrogeological profile of the study area.

Cl⁻. The hydrochemistry type is HCO₃-Ca/Na and SO₄-Ca/Na. The genetic mechanisms of the hydrochemistry change caused by mining activities are discussed in Section 5.1.

Figure 4. The Gibbs diagram of groundwater samples.

Figure 6. The Piper diagram of groundwater samples collected from the phreatic and confined aquifers before and after mining activities.

Figure 7. Wavelet coherence during the period of 2018~2023 between the groundwater level in the phreatic aquifer and (a) precipitation, (b) mine water inflow, and (c) unit goal area.

Figure 8. Wavelet coherence during the period of 2018~2023 between the groundwater level in the confined aquifer and (a) precipitation, (b) mine water inflow, and (c) unit goal area.

Figure 9. The training and prediction results of the PGL model using (a) the Decision Tree algorithm and (b) the LSTM algorithm.

Figure 10. The training and prediction results of the CGL model using (a) the Decision Tree algorithm and (b) the LSTM algorithm.

Figure 11. Scatter plot of the monitored value vs. the simulated value in the training and prediction stages of the PGL model. (A) and (a) represent the training and prediction stages using the Decision Tree, respectively. (B) and (b) represent the training and prediction stages using LSTM, respectively.

Figure 12. Scatter plot of the monitored value vs. the simulated value in the training and prediction stages of the CGL model. (A) and (a) represent the training and prediction stages using the Decision Tree, respectively. (B) and (b) represent the training and prediction stages using LSTM, respectively.

Table 1. The hydrochemistry parameters of the phreatic and confined aquifers before and after mining activities.

Table 2. The errors of the PGL model and CGL model in the training and testing stages.